Statistics (averages, relationships, trends, graphs) are not always what they seem. There may be more in them than meets the eye, and there may be a good deal less
A well-wrapped statistic is better than a big lie: it misleads, yet it cannot be pinned on the person who presented it
The Sample with the Built-in Bias
If your sample is large enough and properly chosen, it will give a reasonably good approximation of the whole, but you cannot conclude the same from a biased or small sample
People generally don’t tell you about their bad habits and bad deeds, which makes your survey unrealistic
A report based on sampling must use a representative sample, which is one from which every source of bias has been removed
The basic sample is the ‘random’ one: people chosen purely by chance from the ‘universe’, i.e. the whole population being studied
Different interviewers can reach different conclusions from the same research or survey, which gives us a biased result
For example, an interviewer with more money, more education, more information and alertness, better appearance, more conventional behavior, and more settled habits will get different answers because of how people respond to him; and, more often than not, the interviewer favors the answers of people who themselves make a good overall impression
Hence, the overall impression and attitude of both the interviewer and the respondents affect the result
The Well-Chosen Average
One very effective way of lying with statistics is to quote an average without saying which kind of average it is
Let’s say a company states that the average salary of its executives is 10 lakhs per month
A person, H, is attracted by this number and checks the source to see whether it is right
He gets the data, is now sure about the number, and wants a job as one of the company’s executives
But then he reads the same year’s data, about the same company, in an analyst’s report, and the number there is 3 lakhs
He digs deeper to find the source of the analyst’s data
Now he is confused, because both are based on the same data but the average is different
Analyzing further, he concludes that both the company and the analyst were right about the average salary of the company’s executives; they simply used different averages, the company quoting the mean and the analyst the median
The mean is the arithmetic average: the sum of all the values divided by the number of values. Here it suggests that every executive earns 10 lakhs per month; the figure is not wrong, but that interpretation of it is
The median, by contrast, is 3 lakhs per month, which means half of the executives earn more than 3 lakhs and half earn less
When the curve of the data is bell-shaped, the data is symmetrical and the mean, median and mode coincide; when the curve is skewed, the three averages and their interpretations differ
That is the essential beauty of lying with statistics
Hence, when we look at data that contains an average, we need to check which average was used, which average would be more representative, and the source of the data on which the average is based
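The gap between the two averages can be reproduced with a small sketch. The salary list below is hypothetical (the notes do not give the actual figures); it is chosen only so that one very large salary pulls the mean up to 10 lakhs while the median stays at 3 lakhs.

```python
import statistics

# Hypothetical monthly salaries (in lakhs) of eight executives.
# One very high earner skews the distribution to the right.
salaries_lakhs = [2, 2, 3, 3, 3, 3, 4, 60]

mean_salary = statistics.mean(salaries_lakhs)      # sum of values / number of values
median_salary = statistics.median(salaries_lakhs)  # middle value of the sorted list
mode_salary = statistics.mode(salaries_lakhs)      # most frequent value

print(f"mean   = {mean_salary} lakhs")
print(f"median = {median_salary} lakhs")
print(f"mode   = {mode_salary} lakhs")
```

With these made-up numbers, seven of the eight executives earn well below the mean, which is why the mean alone gives the wrong impression.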
The Little Figures That Are Not There
When we use a small group for an experiment, the result can easily come out much better (or worse) than the actual probability, whereas with a big group we usually get a result close to the actual probability
The right size of the group for the experiment is a tricky decision; it depends, among other things, on how large and how varied the population you are studying by sampling is
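A rough simulation illustrates the effect of group size. It is only a sketch under stated assumptions: a fair coin stands in for the experiment, and the group sizes (10 and 1000), the number of repetitions (200) and the helper names are all chosen for illustration.

```python
import random

def heads_fraction(tosses, rng):
    """Fraction of heads in `tosses` flips of a fair coin."""
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses

def spread(tosses, experiments, rng):
    """Lowest and highest heads fraction seen over repeated experiments."""
    fractions = [heads_fraction(tosses, rng) for _ in range(experiments)]
    return min(fractions), max(fractions)

rng = random.Random(0)  # fixed seed so the run is repeatable

small_low, small_high = spread(10, 200, rng)     # small groups of 10 tosses
large_low, large_high = spread(1000, 200, rng)   # large groups of 1000 tosses

print(f"groups of 10:   heads fraction from {small_low:.2f} to {small_high:.2f}")
print(f"groups of 1000: heads fraction from {large_low:.2f} to {large_high:.2f}")
```

The true probability is 0.5 in both cases; the small groups simply wander much further from it, so a flattering result is easy to find among them.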
Then there are the little figures that are not there: the absence of a figure can be damaging, and knowing nothing about a subject is frequently healthier than a little learning
The deceptive thing about the little figure that is not there is that its absence so often goes unnoticed. That, of course, is the secret of its success
Statistics can make something look very good to people when it is not that good
This can be done with wording that presents “good” as “excellent”
Let’s have a look at the profit growth of companies ‘A’ and ‘B’
You may conclude that both companies have almost the same profit growth; now take a look here
Here your eyes are opened, because company ‘A’ is doing far better than ‘B’
‘B’ only looked similar because you were tricked by statistics; it came down to how the axis labels were presented
The Gee-Whiz Graph
Statistics is the presentation of data
See the graph here,
Suppose this is the profit chart of your company, and your task is to present your company to some people
Admittedly there is not much profit growth, but unluckily you are the one who has to present it, and you have no idea how to make it look good
But you can do it, simply by changing the proportion between the ordinate and the abscissa
There’s no rule against it, and it does give your graph a prettier shape
That is impressive, isn’t it? Anyone looking at it can just feel prosperity
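The effect of the ordinate's scale can be put into numbers. This is a minimal sketch with hypothetical profit figures; it assumes the stretching is done by narrowing the range the ordinate displays while the chart keeps the same height, and `apparent_rise` is a helper invented for the illustration.

```python
def apparent_rise(values, axis_min, axis_max):
    """Share of the chart's height covered by the change from first to last value."""
    return (values[-1] - values[0]) / (axis_max - axis_min)

# Hypothetical monthly profits: a 10% rise from 20 to 22 over five months.
profits = [20.0, 20.5, 21.0, 21.5, 22.0]

# Ordinate running from 0 to 24: the line barely climbs.
plain = apparent_rise(profits, axis_min=0, axis_max=24)

# Ordinate running only from 20 to 22: the same line climbs the whole chart.
stretched = apparent_rise(profits, axis_min=20, axis_max=22)

print(f"ordinate 0 to 24:  rise covers {plain:.0%} of the chart height")
print(f"ordinate 20 to 22: rise covers {stretched:.0%} of the chart height")
```

The data are identical in both cases; only the scale chosen for the ordinate changes how steep the growth looks.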
The One-Dimensional Picture
When you present a comparison, say the population 100 years ago and now, you can use a pictorial graph, which is nothing but showing the data with the help of pictures
To present the population, we can draw a small man for the population of 100 years ago and a big man for today’s
The pictorial graph has what is known as eye-appeal. And it is capable of becoming a fluent, devious, and successful liar
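One way a pictorial graph can mislead is through area. The sketch below assumes the picture's height is drawn in proportion to the data and its width is scaled by the same factor to keep the figure's shape; the tripling population and the function name are hypothetical.

```python
def picture_scale(data_ratio):
    """Return (height factor, area factor) for a picture scaled in proportion.

    The height follows the data, but because the width grows by the same
    factor, the area the eye sees grows with the square of the ratio.
    """
    height_factor = data_ratio
    area_factor = data_ratio ** 2
    return height_factor, area_factor

# Hypothetical case: today's population is 3 times that of 100 years ago.
height_factor, area_factor = picture_scale(3)

print(f"the big man is {height_factor}x as tall")
print(f"but covers {area_factor}x the area on the page")
```

Under these assumptions a threefold increase is drawn as a figure nine times as large, so the reader's impression overshoots the data.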
The Semi-attached Figure
If you can’t prove what you want to, demonstrate something else and pretend that they are the same thing
We can show any number, let’s say profit, in many ways: as a percentage of sales, as an increase or decrease from last year, or as a percentage of capital invested
The method is to choose the one that sounds best for our purpose, whether or not it fairly reflects the situation
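As a small illustration, the same profit can be dressed up as three quite different percentages. All the figures below are hypothetical and serve only to show how much the choice of base matters.

```python
# Hypothetical figures for one company, in the same currency unit.
profit_this_year = 10
profit_last_year = 8
sales = 200
capital_invested = 50

# Three ways of reporting the same profit of 10.
pct_of_sales = 100 * profit_this_year / sales                              # 5.0
pct_of_capital = 100 * profit_this_year / capital_invested                 # 20.0
pct_growth = 100 * (profit_this_year - profit_last_year) / profit_last_year  # 25.0

print(f"profit as % of sales:            {pct_of_sales}%")
print(f"profit as % of capital invested: {pct_of_capital}%")
print(f"profit growth over last year:    {pct_growth}%")
```

Someone who wants the profit to look small can quote the first number; someone who wants it to look large can quote the last.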
Does correlation always mean causation?
Correlation between two things, i.e. when one moves in one direction the other moves in the same or the opposite direction, is not proof that one has caused the other
There can be a third factor behind the association between the two
Suppose you read the following statistic in a magazine:
“Children who cycle to school usually score better marks than children who go by vehicle”
Here, the cycle or the vehicle does not directly cause the marks; a third factor is causing them
Perhaps the children who prefer to cycle are the ones who are bright and interested in their studies; the cycle has not caused the good marks
Hence if B follows A, B is not necessarily caused by A; there can be a factor C affecting B
The point is that when there are many reasonable explanations, you are hardly entitled to pick the one that suits your taste and insist on it
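The third-factor explanation can be imitated with a toy simulation. Everything in it is an assumption made for illustration: a hidden "interest in studies" score drives both the choice to cycle and the marks, and cycling never appears in the formula for marks.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

cyclist_marks = []
other_marks = []

for _ in range(2000):
    interest = rng.gauss(0, 1)                     # hidden third factor C
    cycles = interest + rng.gauss(0, 1) > 0        # A: depends on C plus chance
    marks = 60 + 10 * interest + rng.gauss(0, 5)   # B: depends on C only, never on A
    if cycles:
        cyclist_marks.append(marks)
    else:
        other_marks.append(marks)

gap = statistics.mean(cyclist_marks) - statistics.mean(other_marks)
print(f"cyclists average {gap:.1f} marks more than the others")
```

The cyclists come out ahead even though, by construction, cycling has no effect on marks; the association comes entirely from the shared factor.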
How to Statisticulate
Any percentage figure based on a small number of cases is likely to be misleading
Sometimes the attractive figures a company presents are created by the choice of statistic:
- For example, a company sells its product at 50 Rs, and the product costs it 10 Rs
When calculating the profit percentage, the company has two options: it can calculate it on cost or on selling price (the company should say which method it has used)
On the selling price, the profit number will be 80%
But on cost, it will be 400%
- If the company’s profit was 5% and it became 10% the next year, the company can state this in many ways
It can call it a rise of 5 percentage points to sound modest, or a 100% rise in profit to sound impressive!
- Let’s take another example:
Let’s say workers’ salaries were cut by 20% due to lockdown, and the company promised to raise them again by 20% after 2 months
Sounds reasonable, right?
But the base works in favor of the payer (the company): the cut is taken from a bigger base, and the later increase is calculated on a smaller one
Suppose the salary was 500 Rs per day earlier, and is now 400 Rs after the 20% cut
But after 2 months, the increase in salary will be 20% of 400 Rs – i.e. 80 Rs
Hence the salary will be 480 Rs, not 500 Rs
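The three examples above come down to which base a percentage is taken on, and the arithmetic can be checked with a short sketch. The numbers are the ones used in the examples; the helper `percent` and the final figure for the raise needed to restore the salary are additions for illustration.

```python
def percent(part, base):
    """Express `part` as a percentage of `base`."""
    return 100 * part / base

# Example 1: profit percentage on selling price versus on cost.
selling_price, cost = 50, 10
profit = selling_price - cost
profit_on_price = percent(profit, selling_price)  # 80.0
profit_on_cost = percent(profit, cost)            # 400.0

# Example 2: profit rate going from 5% to 10%.
old_rate, new_rate = 5, 10
point_rise = new_rate - old_rate                        # 5 percentage points
relative_rise = percent(new_rate - old_rate, old_rate)  # 100.0 percent

# Example 3: a 20% cut followed by a 20% raise.
original_salary = 500
after_cut = original_salary * (100 - 20) / 100    # 400.0, cut taken on 500
after_raise = after_cut * (100 + 20) / 100        # 480.0, raise taken on 400
raise_needed = percent(original_salary - after_cut, after_cut)  # 25.0

print(f"profit: {profit_on_price}% of price, {profit_on_cost}% of cost")
print(f"profit rate: up {point_rise} points, or up {relative_rise}%")
print(f"salary: {original_salary} -> {after_cut} -> {after_raise} Rs")
print(f"raise needed to get back to {original_salary} Rs: {raise_needed}%")
```

In the salary example, a 25% raise on the reduced salary would be needed to return to 500 Rs, which is why an equal-sounding 20% falls short.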
How to Talk Back to a Statistic
While looking at any statistics, we have to check…
Who says so?
Look for conscious and unconscious bias
How does he know?
If it is from research, is the sample large enough to permit any reliable conclusion?
And is the sample unbiased?
If it is a correlation, check whether it is big enough to mean anything, and whether there are enough cases to support it
What is missing?
You will not always be told how many cases, or which ones, the statistic is based on
Watch out for the average used for that data (mean or median)
Is there any percentage number with an unspecified base?
Did somebody change the subject?
Watch for a switch somewhere between the raw data and the conclusion
(One statistic says that a certain number of people have seen a movie 15 times on average;
the first thing to check is whether that is the mean or the median. What the research actually found is that this many people ‘say’ they watched the movie 15 times on average;
whether they actually did so was not what the research measured)
Overall, the author steers us towards the correct interpretation of statistics, and urges us to examine statistics with open eyes and an alert mind, by describing various examples of statistical tricks