Photo by Jan Kaluza on Unsplash

Number Theory in False Positive and False Negative Scenarios

Kumar Brar
Published in Analytics Vidhya
4 min read · May 23, 2020


In the previous article, we learned about false positives and false negatives. We went through the concept and discussed how limitations in a test or scan can give false results for some of the people, machines, activities, or whatever else is being tested.

The same applies to the current Covid-19 situation: because of limitations in the testing methodology, some people will receive a false positive or a false negative result. The media in certain countries highlighted this, raising concerns about the accuracy of test results.

So, in reality, no test is 100% accurate: every test has limitations that must be taken into account, and results may vary because of them. What matters is knowing how inaccurate a test is, so that proper action can be taken. Otherwise, the raw numbers can create an atmosphere of extreme pessimism (too much negativity) or extreme optimism (too much positivity), which can be dangerous.

In this article, we will work out the true numbers behind a test’s “Yes” and “No”.

You probably know people who are allergic to pollen. Pollen allergy can cause itchy or watery eyes, an itchy throat, a runny or stuffy nose, sneezing, and wheezing.

Let’s say George develops allergy symptoms every spring, and he wants to get tested to find out whether he is allergic to pollen. There is a test for pollen allergy, but it is not always right:

  • For people who really do have the allergy, the test says “Yes” 90% of the time
  • For people who do not have the allergy, the test says “Yes” 5% of the time (“false positive”)

Testing hypothesis in Tabular form

Now, suppose 2% of the population has the allergy, and George’s test says “Yes”. What are the chances that George really has the allergy?

What do you think? Is it 70%, 60%, or something else?

There are three different ways to solve this:

  • “Imagine a 1000”
  • “Tree Diagrams”
  • “Bayes’ Theorem”

Let’s look at them one by one:

Try Imagining A Thousand People

When trying to understand questions like this, just imagine a large group (say 1000) and play with the numbers:

  • Of 1000 people, only 20 really have the allergy (2% of 1000 is 20)
  • The test is 90% right for people who have the allergy, so it will get 18 of those 20 right.
  • But 980 people do not have the allergy, and the test wrongly says “Yes” to 5% of them, which is 49 people (false positives)
  • So out of 1000 people the test says “Yes” to (18+49) = 67 people
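The counting above can be reproduced in a few lines of Python. This is a minimal sketch, and the variable names are illustrative. It uses Python’s `fractions.Fraction` so that, for example, 5% of 980 comes out as exactly 49 rather than a floating-point approximation.

```python
from fractions import Fraction

# Numbers from the example: 1000 people, 2% prevalence,
# the test says "Yes" to 90% of allergic people and 5% of non-allergic people.
population = 1000
prevalence = Fraction(2, 100)
p_yes_if_allergic = Fraction(90, 100)
p_yes_if_not_allergic = Fraction(5, 100)

allergic = population * prevalence              # 20 people
not_allergic = population - allergic            # 980 people

true_positives = allergic * p_yes_if_allergic              # 18 correct "Yes"
false_positives = not_allergic * p_yes_if_not_allergic     # 49 wrong "Yes"
total_yes = true_positives + false_positives               # 67 "Yes" in total

print(f"Allergic: {allergic}, not allergic: {not_allergic}")
print(f"Correct 'Yes': {true_positives}, wrong 'Yes': {false_positives}")
print(f"Total 'Yes': {total_yes}")
print(f"Share of 'Yes' that are real: {float(true_positives / total_yes):.3f}")
```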

Tabular representation:

Tabular Representation of A Thousand People

So, 67 people get a “Yes” but only 18 people really have the allergy:

18/67 ≈ 0.269, i.e. about 27%

So, even though George’s test said “Yes”, it is still only about 27% likely that George is allergic to pollen.
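Instead of an idealised group of exactly 1000, we can also simulate a large random population and count how often a “Yes” belongs to someone who really has the allergy. This is a sketch using Python’s standard `random` module; the function name, the population size, and the seed are arbitrary choices of mine, and the estimate will wobble slightly around 18/67 (about 27%).

```python
import random

def simulate_share_of_true_yes(n_people, prevalence=0.02,
                               p_yes_if_allergic=0.90,
                               p_yes_if_not_allergic=0.05, seed=42):
    """Simulate n_people random test results and return the fraction of
    "Yes" results that belong to people who really have the allergy."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    yes_total = 0
    yes_and_allergic = 0
    for _ in range(n_people):
        has_allergy = rng.random() < prevalence
        p_yes = p_yes_if_allergic if has_allergy else p_yes_if_not_allergic
        if rng.random() < p_yes:
            yes_total += 1
            if has_allergy:
                yes_and_allergic += 1
    return yes_and_allergic / yes_total

estimate = simulate_share_of_true_yes(1_000_000)
print(f"Simulated P(allergic | Yes) = {estimate:.3f}")  # close to 18/67
```

With a million simulated people, roughly 67,000 of them get a “Yes”, which is enough for the simulated share to land close to the exact 18/67.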

As A Tree

Drawing a tree diagram can help us with the calculations. I drew the tree diagram, along with the calculations, using ordinary pen and paper:

Tree Diagram with Calculations

First of all, let’s check that all the percentages add up:

1.8% + 0.2% + 4.9% + 93.1% = 100% (good!)

And the two “Yes” answers add up to 1.8% + 4.9% = 6.7%, but only 1.8% are correct.

1.8/6.7 ≈ 27% (same answer as above)
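The tree can also be written down as two levels of branches, each carrying a probability; the probability of reaching a leaf is the product of the probabilities along its path. A sketch, with branch labels chosen for illustration:

```python
from fractions import Fraction

# First level of the tree: does the person have the allergy?
first_branch = {"allergy": Fraction(2, 100), "no allergy": Fraction(98, 100)}

# Second level: what does the test say, given the first branch?
second_branch = {
    "allergy": {"Yes": Fraction(90, 100), "No": Fraction(10, 100)},
    "no allergy": {"Yes": Fraction(5, 100), "No": Fraction(95, 100)},
}

# Multiply along each path to get the probability of every leaf.
leaves = {
    (status, result): p_status * p_result
    for status, p_status in first_branch.items()
    for result, p_result in second_branch[status].items()
}

for (status, result), p in leaves.items():
    print(f"{status:>10} -> {result:<3}: {float(p):.1%}")

# Sanity check from the text: the four leaves add up to 100%.
assert sum(leaves.values()) == 1

p_yes = leaves[("allergy", "Yes")] + leaves[("no allergy", "Yes")]
print(f"P(Yes) = {float(p_yes):.1%}")
print(f"P(allergy | Yes) = {float(leaves[('allergy', 'Yes')] / p_yes):.1%}")
```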

Bayes’ Theorem

Bayes’ Theorem has a special formula for this kind of thing:

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(\text{not } A)\,P(B \mid \text{not } A)}$$

where

  • P means “Probability of”
  • | means “given that”
  • A in this case is “actually has the pollen allergy”
  • B in this case is “test says Yes”

So, P(A|B) means “The probability that George actually has the allergy given that the test says Yes”

P(B|A) means “The probability that the test says Yes given that George actually has the allergy”

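Therefore, with A = “has the allergy” and B = “test says Yes”:

$$P(A \mid B) = P(\text{has} \mid \text{Yes}) = \frac{P(\text{has})\,P(\text{Yes} \mid \text{has})}{P(\text{has})\,P(\text{Yes} \mid \text{has}) + P(\text{not has})\,P(\text{Yes} \mid \text{not has})}$$

$$P(\text{has} \mid \text{Yes}) = \frac{0.02 \times 0.9}{0.02 \times 0.9 + 0.98 \times 0.05} = \frac{0.018}{0.067} \approx 27\%$$

This is the same answer as before.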
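Bayes’ formula is easy to wrap in a small function. A minimal sketch: the function and parameter names are my own, and it takes the prior P(A), the probability of a “Yes” for allergic people P(B|A), and the probability of a “Yes” for non-allergic people P(B|not A).

```python
def posterior_given_yes(p_a, p_b_given_a, p_b_given_not_a):
    """Bayes' Theorem: P(A|B) for a yes/no condition A and a test result B."""
    numerator = p_a * p_b_given_a
    # The denominator P(B) adds up both ways of getting B:
    # A is true and the test says Yes, or A is false and the test says Yes.
    evidence = numerator + (1 - p_a) * p_b_given_not_a
    return numerator / evidence

# George's numbers: 2% prevalence, 90% "Yes" if allergic, 5% "Yes" if not.
p = posterior_given_yes(0.02, 0.90, 0.05)
print(f"P(has allergy | Yes) = {p:.1%}")  # → P(has allergy | Yes) = 26.9%
```

The same function can be reused for any yes/no test once the prevalence and the test’s two “Yes” rates are known.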

So, this is how numbers give us the real picture of who is allergic. On the surface, a test result may paint an extremely pessimistic or optimistic picture, but the real truth lies in the numbers behind it, which can be calculated in any of the three ways above.

I hope this article, along with the previous one, has helped you get a firm grip on the concept and the numbers behind it.

