Statistics (averages, relationships, trends, graphs) are not always what they seem. There may be more in them than meets the eye, and there may be a good deal less

A well-wrapped statistic works better than a big lie: it misleads, yet it cannot be pinned on the person who used it

**The Sample with the Built-in Bias**

If your sample is large enough and properly selected, its statistics will approximate the whole population reasonably well, but you can’t conclude the same from a biased or small sample

People generally don’t tell you about their bad habits and bad deeds, which makes your survey results unrealistic

A report based on sampling must use a representative sample, which is one from which every source of bias has been removed

The basic sample is the ‘random’ one, in which people are chosen purely by chance from the ‘universe’, i.e. the whole population being studied
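The difference between a random and a biased sample can be illustrated with a small simulation. This is only a sketch: the income distribution, the sample sizes, and the rule that the survey reaches only the better-off half are all assumptions made up for the illustration:

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical 'universe': monthly incomes of 10,000 people (made-up distribution).
population = [rng.lognormvariate(10, 0.5) for _ in range(10_000)]

# Random sample: every person has the same chance of being picked.
random_sample = rng.sample(population, 1_000)

# Biased sample (assumption): the survey only ever reaches the better-off half.
better_off_half = sorted(population)[len(population) // 2:]
biased_sample = rng.sample(better_off_half, 1_000)

true_mean = statistics.mean(population)
random_error = abs(statistics.mean(random_sample) - true_mean)
biased_error = abs(statistics.mean(biased_sample) - true_mean)

print(f"population mean:           {true_mean:,.0f}")
print(f"random sample is off by:   {random_error:,.0f}")
print(f"biased sample is off by:   {biased_error:,.0f}")
```

Both samples have the same size, so the gap between the two errors comes from how the people were chosen, not from how many were asked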

Different interviewers or surveyors can reach different conclusions from the same research or survey, and the result ends up biased

For example, an interviewer with more money, more education, more information and alertness, a better appearance, more conventional behavior, and more settled habits will get different results because of how people respond to him; and, more often than not, interviewers give more weight to the answers of people who make a good overall impression

Hence, both the overall impression that interviewer and respondents make on each other and the way they approach each other affect the result

**The Well-Chosen Average**

One very effective way of lying with statistics is to quote an ‘average’ without saying which kind of average it is

Let’s say a company states that the average salary of its executives is **10 lakhs** per month

Person **H** is attracted by this number and checks the source to see whether it is correct

He gets the data and confirms the number, and now **H** wants a job as one of the company’s executives

But then he reads data for the same year and the same company in an analyst’s report, and the number there is **3 lakhs**

He digs deeper to find the source of the data, and this is what he gets

Now he is confused, because both sources use the same data but report a different average

By analyzing further, he concludes that both the company and the analyst were right about the average salary of the company’s executives, but each used a different kind of average

The mean is the arithmetic average of the data: the sum of all the points divided by the number of points. Here it is **10 lakhs** per month. The number is not wrong, but reading it as “every executive earns 10 lakhs” is

The median, on the other hand, is **3 lakhs** per month, which means half of the executives earn more than 3 lakhs and half earn less

When the curve of the data is bell-shaped, the data is symmetrical and the mean, median and mode coincide; when the curve is skewed, the three averages, and their interpretations, differ
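A small worked example shows how one skewed data set yields both averages. The nine salaries below are hypothetical (the original data is not reproduced here); they were chosen so that the mean is 10 lakhs and the median is 3 lakhs:

```python
import statistics

# Hypothetical monthly salaries of nine executives, in lakhs (not the source's data).
salaries = [2, 2, 2.5, 3, 3, 3, 4.5, 10, 60]

mean_salary = statistics.mean(salaries)      # sum of all points / number of points
median_salary = statistics.median(salaries)  # middle value once the data is sorted
mode_salary = statistics.mode(salaries)      # most frequent value

# How many executives actually earn less than the 'average' the company quotes?
below_mean = sum(s < mean_salary for s in salaries)

print(f"mean   = {mean_salary} lakhs")
print(f"median = {median_salary} lakhs")
print(f"mode   = {mode_salary} lakhs")
print(f"{below_mean} of {len(salaries)} executives earn less than the mean")
```

One very large salary pulls the mean far above what most executives earn, while the median and mode stay with the bulk of the group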

That is the essential beauty of lying with statistics

Hence, when we look at data that quotes an average, we need to check which average was used, which average would be more representative, and the source of the data on which the average is based

**The Little Figures That Are Not There**

When we use a small group for an experiment, the result can easily come out much better (or worse) than the actual probability, whereas with a big group we usually get a result close to the actual probability
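A coin-toss simulation makes the effect of group size visible. This is a minimal sketch; the group sizes (10 and 1,000 tosses) and the number of repeated experiments (200) are arbitrary choices for illustration:

```python
import random

def heads_share(n_tosses, rng):
    """Share of heads in n_tosses tosses of a fair coin."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

def spread_of_results(n_tosses, n_experiments, rng):
    """Lowest and highest share of heads seen across repeated experiments."""
    shares = [heads_share(n_tosses, rng) for _ in range(n_experiments)]
    return min(shares), max(shares)

rng = random.Random(42)  # fixed seed so the run is repeatable

small_lo, small_hi = spread_of_results(10, 200, rng)     # small groups
large_lo, large_hi = spread_of_results(1000, 200, rng)   # large groups

print(f"10 tosses:   share of heads ranged from {small_lo:.2f} to {small_hi:.2f}")
print(f"1000 tosses: share of heads ranged from {large_lo:.2f} to {large_hi:.2f}")
```

With only 10 tosses, anyone who repeats the experiment often enough can find a run that looks impressive; with 1,000 tosses every run stays near the true probability of 0.5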

How large the group for an experiment should be is a tricky decision; it depends, among other things, on how large and how varied the population you are studying by sampling is

Then there are the little figures that are not there, i.e. the absence of a figure can itself be damaging; knowing nothing about a subject is frequently healthier than a little misleading learning

The deceptive thing about the little figure that is not there is that its absence so often goes unnoticed. That, of course, is the secret of its success

Statistics can make something look very good to people when it is not that good

This can be done by choosing “words” that describe “Good” as “Excellent”

Let’s have a look at the profit growth of companies ‘A’ and ‘B’

You may conclude that both companies have almost the same profit growth. Now take a look here

This one is an eye-opener, because company ‘A’ is doing far better than ‘B’

‘B’ only looked similar; you were tricked by the presentation of the statistics, specifically by how the axes were labeled

**The Gee-Whiz Graph**

Statistics is the presentation of data

See the graph here,

Suppose this is a profit chart of your company, and you have a task to present your company to some people

Yes, there is not much profit growth, but unluckily you are the one who has to present it, and you have no idea how to make it look good

But you can, simply by changing the proportion between the ordinate (vertical axis) and the abscissa (horizontal axis)

There’s no rule against it, and it does give your graph a prettier shape

That is impressive, isn’t it? Anyone looking at it can just feel prosperity
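The effect of changing the proportion between the axes can be expressed as the angle at which the line appears on the page. The sketch below uses hypothetical numbers (a profit rise of 2 units over 12 months) and a hypothetical helper, `apparent_angle`; it is an illustration of the idea, not a charting recipe:

```python
import math

def apparent_angle(rise, run, y_units_per_value, x_units_per_value):
    """Angle in degrees at which a line is drawn, given the scale of each axis."""
    drawn_rise = rise * y_units_per_value   # how tall the change looks on paper
    drawn_run = run * x_units_per_value     # how wide the period looks on paper
    return math.degrees(math.atan2(drawn_rise, drawn_run))

# Hypothetical data: profit grows by 2 units over 12 months.
rise, run = 2, 12

# Same data, two different proportions between ordinate and abscissa.
plain = apparent_angle(rise, run, y_units_per_value=1, x_units_per_value=1)
stretched = apparent_angle(rise, run, y_units_per_value=10, x_units_per_value=1)

print(f"plain scale:     line climbs at about {plain:.0f} degrees")
print(f"stretched scale: line climbs at about {stretched:.0f} degrees")
```

The data is identical in both cases; only the vertical scale was stretched tenfold, which turns a gentle slope into a steep climb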

**The One-Dimensional Picture**

When you show a comparison, say the population of 100 years ago against the population now, you can present it as a pictorial graph, which is nothing but showing data with the help of pictures

To present the population, we can draw a small man for the population of 100 years ago and a big man for today’s

The pictorial graph has what is known as eye-appeal. And it is capable of becoming a fluent, devious, and successful liar, because a figure drawn taller is also drawn wider, so the difference looks much bigger than the numbers say
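How much a scaled-up picture overstates a figure can be worked out directly. The sketch below assumes the picture is enlarged uniformly, so its width grows along with its height; the helper name `apparent_ratios` and the ratio of 2 are hypothetical:

```python
def apparent_ratios(height_ratio):
    """Compare a uniformly enlarged picture with the original one."""
    return {
        "height": height_ratio,        # what the data actually says
        "area": height_ratio ** 2,     # ink on the page: width grows too
        "volume": height_ratio ** 3,   # impression if the figure is read as a solid
    }

# Hypothetical case: today's population is twice that of 100 years ago,
# so the second man is drawn twice as tall.
ratios = apparent_ratios(2)

for name, value in ratios.items():
    print(f"{name}: {value}x")
```

A one-dimensional quantity (2x) ends up looking like a 4x or even 8x difference, which is why a bar of constant width is the safer picture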

**The Semi-attached Figure**

If you can’t prove what you want to, demonstrate something else and pretend that they are the same thing

We can express any number, say profit, in many ways: as a percentage of sales, as an increase or decrease from last year, or as a percentage of capital invested

The method is to choose the one that sounds best for our purpose, whether or not it reflects the situation well
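The same profit can be turned into quite different-sounding figures. All the numbers in this sketch are hypothetical and serve only to show the three ways of stating profit mentioned above:

```python
# Hypothetical figures for one company (illustrative only, in any currency unit).
profit_this_year = 12.0
profit_last_year = 10.0
sales = 400.0
capital_invested = 60.0

# Three honest calculations on the same profit, three different impressions.
as_share_of_sales = 100 * profit_this_year / sales
growth_over_last_year = 100 * (profit_this_year - profit_last_year) / profit_last_year
return_on_capital = 100 * profit_this_year / capital_invested

print(f"profit as % of sales:            {as_share_of_sales:.0f}%")
print(f"profit growth over last year:    {growth_over_last_year:.0f}%")
print(f"profit as % of capital invested: {return_on_capital:.0f}%")
```

A company that wants to look modest quotes the first figure; one that wants to impress quotes one of the other two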

**Does Correlation Always Mean Causation?**

A correlation between two things, i.e. when one moves in one direction the other also moves in the same or the opposite direction, is not proof that one has caused the other

There can be a 3rd factor or reason behind the association between the two

Suppose you read the following statistic in a magazine:

“Children who cycle to school usually score better marks than children who go by motor vehicle”

Here, the cycle or the vehicle doesn’t directly cause the marks; a 3rd factor is causing them

We can say that children who prefer to cycle are the ones who are interested in studies and are bright; the cycle hasn’t caused the good marks

**Hence, if B follows A, B is not necessarily caused by A; there can be a factor C behind both**

The point is that when there are many reasonable explanations, you are hardly entitled to pick one that suits your taste and insist on it
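A simulation can show a correlation appearing without any causal link. In this sketch, interest in studies (factor C) drives both cycling (A) and marks (B); cycling never enters the formula for marks. All probabilities and mark levels are invented for the illustration:

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(1)  # fixed seed so the run is repeatable
children = []
for _ in range(4000):
    interested = rng.random() < 0.5                             # factor C
    cycles = int(rng.random() < (0.8 if interested else 0.2))   # A depends only on C
    marks = (70 if interested else 50) + rng.gauss(0, 10)       # B depends only on C
    children.append((interested, cycles, marks))

def cycle_marks_correlation(rows):
    return pearson([c for _, c, _ in rows], [m for _, _, m in rows])

overall = cycle_marks_correlation(children)
among_interested = cycle_marks_correlation([r for r in children if r[0]])
among_uninterested = cycle_marks_correlation([r for r in children if not r[0]])

print(f"all children:           r = {overall:.2f}")
print(f"interested children:    r = {among_interested:.2f}")
print(f"uninterested children:  r = {among_uninterested:.2f}")
```

Across all children, cycling and marks are clearly correlated; once the children are compared within the same level of interest, the correlation all but disappears, which points to the third factor rather than the cycle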

**How to Statisticulate**

Any percentage figure based on a small number of cases is likely to be misleading

Sometimes the attractive figures a company presents are simply a product of how the statistics are calculated:

- For example, a company sells its product for 50 Rs, and the product costs it 10 Rs

When calculating its profit percentage, the company has two options: it can calculate it on cost or on selling price (the company should say which method it has used)

On the selling price, the profit is 80%

But on cost, it is 400%

- If the company’s profit was 5% and became 10% the next year, the company can state the change in many ways

For instance, the company can call it a rise of 5 percentage points to sound modest, or it can call it a 100% rise in profit!

- Let’s take another example:

Let’s say workers’ salaries were cut by 20% due to a lockdown, and the company promised to raise them again by 20% after 2 months

Sounds reasonable, right?

But the base works in favor of the payer/company: the cut is taken from a bigger base, while the raise is applied to a smaller one

Suppose the salary was 500 Rs per day earlier, and now it is 400 Rs due to a 20% cut

But after 2 months, the increase in salary will be 20% of 400 Rs – i.e. 80 Rs

Hence the salary will be 480 Rs, not 500 Rs
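The three examples in this section can be checked with a few lines of arithmetic. The figures are the ones used above; the helper `percent_of` and the final `raise_needed` calculation are additions for illustration:

```python
def percent_of(part, base):
    """Express part as a percentage of base."""
    return 100 * part / base

# 1. Profit percentage: same 40 Rs profit, two different bases.
price, cost = 50, 10
profit = price - cost
profit_on_price = percent_of(profit, price)
profit_on_cost = percent_of(profit, cost)
print(f"profit on selling price: {profit_on_price:.0f}%, on cost: {profit_on_cost:.0f}%")

# 2. Profit going from 5% to 10%: percentage points vs. relative change.
old_rate, new_rate = 5, 10
rise_in_points = new_rate - old_rate
rise_relative = percent_of(new_rate - old_rate, old_rate)
print(f"rise: {rise_in_points} percentage points, or {rise_relative:.0f}% in relative terms")

# 3. A 20% cut followed by a 20% raise does not restore the salary.
salary = 500
after_cut = salary * (100 - 20) / 100
after_raise = after_cut * (100 + 20) / 100
raise_needed = percent_of(salary - after_cut, after_cut)
print(f"after cut: {after_cut:.0f} Rs, after raise: {after_raise:.0f} Rs")
print(f"raise needed to get back to {salary} Rs: {raise_needed:.0f}%")
```

In each case the percentage is correct; what changes the impression is the base it is calculated on, which is why an unspecified base is a warning sign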

**How to Talk Back to a Statistic**

While looking at any statistics, we have to check…

*Who says so?*

Look for conscious and unconscious bias

*How does he know?*

If it is from research, is the sample large enough to permit any reliable conclusion?

And is the sample unbiased?

And if it is a correlation, check whether it is big enough to mean anything, and whether there are enough cases to support it

*What is missing?*

You will not always be told which sample, or how many cases, the statistic is based on

Watch out for the average used for that data (mean or median)

Is there any percentage number with an unspecified base?

*Did somebody change the subject?*

Watch for a switch somewhere between the raw data and the conclusion

(Suppose a statistic claims that so many people have seen a movie 15 times on average;

The first thing to check is whether that is a mean or a median; what the research actually found is that this many people **‘say’** they watched the movie 15 times on average;

Whether they actually did so was not the subject)

Overall, the author steers us towards the correct interpretation of statistics and, by describing many examples of statistical tricks, urges us to observe and analyze statistics with open eyes and an engaged brain