## Statistics Maths Notes

Statistics is useful in many fields, for example, agriculture, economics, commerce, medicine, botany, biotechnology, physics, chemistry, education, sociology and administration. An experiment can have many outcomes. To assess how likely each outcome is, the experiment has to be carried out on a large scale and the results recorded meticulously. The record can then be used to assess the possibilities of the different outcomes. Statistics formulates rules for this purpose.

**Mean from classified frequency distribution**

When a data set contains a large number of scores, it is tedious to write every score in the formula for the mean, \(\overline{\mathrm{X}}=\frac{\sum f_{i} x_{i}}{\sum f_{i}}\), and take their sum. So we use other methods to find the sum.

Sometimes the large data collected from an experiment is presented in a table in grouped form. In such a case, the exact mean of the data cannot be found. Hence, let us study a method that gives an approximate mean, a number close to the exact mean, by taking the class mark of each class as its representative score.
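As a minimal sketch of the direct method for grouped data, the Python function below takes the class mark of each class as its score \(x_i\) and computes \(\frac{\sum f_i x_i}{\sum f_i}\). The function name `mean_direct` is our own, and the classes and frequencies in the usage lines are hypothetical, chosen only for illustration. `Fraction` keeps the result exact.

```python
from fractions import Fraction

def mean_direct(classes, freqs):
    """Approximate mean of grouped data by the direct method.

    classes: list of (lower, upper) class limits
    freqs:   list of frequencies f_i, one per class
    """
    # The class mark (mid-point) of each class stands in for its scores.
    marks = [Fraction(lower + upper, 2) for lower, upper in classes]
    total = sum(f * x for f, x in zip(freqs, marks))  # sum of f_i * x_i
    return total / sum(freqs)                         # divide by sum of f_i

# Hypothetical data: classes 0-10, ..., 40-50 with frequencies 5, 8, 12, 10, 5
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 10, 5]
print(mean_direct(classes, freqs))  # 51/2, i.e. 25.5
```

Every product \(f_i x_i\) has to be formed here, which is exactly the work the next two methods try to reduce.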

**Assumed mean method**

In the examples solved above, we see that sometimes the products \(x_{i} f_{i}\) are large, which makes it difficult to calculate the mean by the direct method. So let us study another method, called the ‘assumed mean method’. Finding the mean becomes simpler in this method because the numbers involved are smaller.

For example, suppose we have to find the mean of the scores 40, 42, 43, 45, 47 and 48. Observing the scores, we see that the mean of the data is more than 40. So let us assume that the mean is 40. Then 40 – 40 = 0, 42 – 40 = 2, 43 – 40 = 3, 45 – 40 = 5, 47 – 40 = 7 and 48 – 40 = 8. These are called ‘deviations’. Their sum is 0 + 2 + 3 + 5 + 7 + 8 = 25, so their mean is \(\frac{25}{6}\) = 4\(\frac{1}{6}\). Adding this mean to the assumed mean, we get the mean of the data: 40 + 4\(\frac{1}{6}\) = 44\(\frac{1}{6}\).

That is, mean = assumed mean + mean of the deviations

**Using the symbols**

If we use A for the assumed mean, d for a deviation and \(\bar{d}\) for the mean of the deviations, the formula for the mean of the given data can be written briefly as \(\overline{\mathrm{X}}=\mathrm{A}+\bar{d}\).

Let us solve the same example taking 43 as the assumed mean. For this, let us find the deviations by subtracting 43 from each score.

40 – 43 = –3, 42 – 43 = –1, 43 – 43 = 0, 45 – 43 = 2, 47 – 43 = 4, 48 – 43 = 5

The sum of the deviations = -3 -1 + 0 + 2 + 4 + 5 = 7

Now, \(\overline{\mathrm{X}}=\mathrm{A}+\bar{d}\)

= 43 + \(\left(\frac{7}{6}\right)\) (as the number of deviations is 6)

= 43 + 1\(\frac{1}{6}\) = 44\(\frac{1}{6}\)

Note that the assumed mean method reduces the work of calculation.

Also note that taking any score, or any other convenient number, as the assumed mean does not change the mean of the data.
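The worked example can be checked with a short Python sketch. The function name `mean_assumed` is our own, and the sketch assumes integer scores and an integer assumed mean so that `Fraction` can keep results such as 44\(\frac{1}{6}\) exact. It shows that the assumed means 40 and 43 give the same answer.

```python
from fractions import Fraction

def mean_assumed(scores, A):
    """Mean by the assumed mean method: X-bar = A + d-bar (integer inputs)."""
    deviations = [x - A for x in scores]                # d = x - A
    d_bar = Fraction(sum(deviations), len(deviations))  # mean of the deviations
    return A + d_bar

scores = [40, 42, 43, 45, 47, 48]
print(mean_assumed(scores, 40))  # 265/6, i.e. 44 1/6
print(mean_assumed(scores, 43))  # 265/6 again
```

Any choice of A gives the same result because the deviations shift by the same amount that A does.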

**Step deviation method**

We studied the direct method and assumed mean method to find the mean. Now we study one more method which reduces the calculations still further.

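- Choose a convenient number A as the assumed mean.
- Find the deviations \(d_{i}=x_{i}-\mathrm{A}\) and write them in a column.
- If the G.C.D. g of all \(d_{i}\) can be found easily, create a column of \(u_{i}=\frac{d_{i}}{g}\).
- Find the mean \(\bar{u}\) of all \(u_{i}\).
- Find the mean of the data using the formula \(\overline{\mathrm{X}}=\mathrm{A}+\bar{u}\, g\).

Since each \(d_{i}=g\, u_{i}\), the mean of the deviations is \(\bar{d}=\bar{u}\, g\), so \(\overline{\mathrm{X}}=\mathrm{A}+\bar{d}=\mathrm{A}+\bar{u}\, g\).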
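A minimal Python sketch of the step deviation method, assuming integer scores (or class marks) and an integer assumed mean so that the G.C.D. is defined. For frequency data, \(\bar{u}\) is taken as \(\frac{\sum f_i u_i}{\sum f_i}\), that is, each \(u_i\) is counted \(f_i\) times. The function name and the class marks 5, 15, 25, 35, 45 with frequencies 5, 8, 12, 10, 5 are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def mean_step_deviation(marks, freqs, A):
    """Mean by the step deviation method: X-bar = A + u-bar * g.

    marks: integer scores (or class marks) x_i
    freqs: frequency f_i of each x_i (use 1s for ungrouped scores)
    A:     integer assumed mean
    """
    d = [x - A for x in marks]        # d_i = x_i - A
    g = reduce(gcd, d) or 1           # G.C.D. of all d_i; 1 if every d_i is 0
    u = [di // g for di in d]         # u_i = d_i / g (exact, g divides every d_i)
    # Mean of the u_i, each counted f_i times
    u_bar = Fraction(sum(f * ui for f, ui in zip(freqs, u)), sum(freqs))
    return A + u_bar * g

print(mean_step_deviation([5, 15, 25, 35, 45], [5, 8, 12, 10, 5], 25))  # 51/2
```

With A = 25 the deviations are –20, –10, 0, 10, 20, so g = 10 and the \(u_i\) shrink to –2, –1, 0, 1, 2, which keeps the products \(f_i u_i\) small.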

**Median for grouped frequency distribution**

When the number of scores in a data is large, it is difficult to arrange them in ascending order. In such a case, the data is divided into groups. Let us study, with an example, how the median of grouped data is found.

Ex. The scores 6, 8, 10.4, 11, 15.5, 12, 18 are grouped in the following table.

We could not record the scores 10.4 and 15.5 in the first table, as they cannot be placed in any of the classes 6-10, 11-15, 16-20. We know that in such a case the classes are made continuous.

For this, the lower class limits in the first table are reduced by 0.5 and the upper class limits are increased by 0.5 to prepare the second table. In the second table, the score 10.4 is placed in the class 5.5-10.5, and the score 15.5 is placed in the class 15.5-20.5, the class for which it is the lower limit.

Note that if the method of making groups is changed, the frequency distribution may change.

**Let’s remember!**

In the above table, the class mark of 6-10 is \(\frac{6+10}{2}=\frac{16}{2}=8\).

Similarly, the class mark of 5.5-10.5 is \(\frac{5.5+10.5}{2}=\frac{16}{2}=8\).

This shows that the class marks do not change when the classes are made continuous.
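The conversion described above can be sketched in Python. The helper names are our own; `make_continuous` assumes that consecutive classes are separated by the same gap, and `classify` follows the convention used above that a score equal to a class boundary (such as 15.5) goes into the class for which it is the lower limit. A score outside every class is simply not counted.

```python
def make_continuous(classes):
    """Widen (lower, upper) classes by half the gap between consecutive classes."""
    gap = classes[1][0] - classes[0][1] if len(classes) > 1 else 0
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]

def classify(scores, boundaries):
    """Count the scores with lower <= score < upper in each continuous class."""
    counts = [0] * len(boundaries)
    for s in scores:
        for i, (lower, upper) in enumerate(boundaries):
            if lower <= s < upper:
                counts[i] += 1
                break
    return counts

def class_mark(c):
    """Mid-point of a (lower, upper) class."""
    return (c[0] + c[1]) / 2

classes = [(6, 10), (11, 15), (16, 20)]
continuous = make_continuous(classes)
print(continuous)                                            # [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5)]
print(classify([6, 8, 10.4, 11, 15.5, 12, 18], continuous))  # [3, 2, 2]
print([class_mark(c) for c in classes])                      # [8.0, 13.0, 18.0]
print([class_mark(c) for c in continuous])                   # [8.0, 13.0, 18.0]
```

The last two lines confirm that making the classes continuous leaves every class mark unchanged.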

**Let’s remember!**

If the given classes are not continuous, we have to make them continuous to find out the median.

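It is difficult to write the scores in ascending order when the number of scores is large, so the data is classified into groups. The exact median of classified data cannot be found, but an approximate median is found by the formula

\(\text{Median}=\mathrm{L}+\left[\frac{\frac{N}{2}-\mathrm{cf}}{f}\right] \times h\)

where L = lower class limit of the median class, N = total frequency, cf = cumulative frequency of the class preceding the median class, f = frequency of the median class and h = class interval of the median class. The median class is the class whose cumulative frequency first becomes equal to or greater than \(\frac{N}{2}\).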
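A minimal Python sketch of the median formula for continuous classes of equal width h. The function name and the data (classes 0-10, 10-20, ..., 40-50 with frequencies 5, 8, 12, 10, 5) are hypothetical; the median class is taken to be the first class whose cumulative frequency reaches \(\frac{N}{2}\).

```python
from fractions import Fraction

def grouped_median(lowers, h, freqs):
    """Median = L + ((N/2 - cf) / f) * h for continuous classes of width h.

    lowers: lower class limits of the continuous classes
    freqs:  class frequencies
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = Fraction(n, 2)
    cf = 0  # cumulative frequency of the classes before the current one
    for L, f in zip(lowers, freqs):
        if cf + f >= half:            # this is the median class
            return L + (half - cf) / f * h
        cf += f

print(grouped_median([0, 10, 20, 30, 40], 10, [5, 8, 12, 10, 5]))  # 155/6
```

Here N = 40, so \(\frac{N}{2}\) = 20. The cumulative frequencies are 5, 13, 25, ..., so the median class is 20-30 with L = 20, cf = 13, f = 12 and h = 10, giving 20 + \(\frac{7}{12}\) × 10 = \(\frac{155}{6}\) ≈ 25.83.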

**Mode for grouped frequency distribution**

We know that the score that occurs the maximum number of times in a data is called the mode of the data.

For example, a company manufactures bicycles of different colours. To know which colour is in most demand, the company needs to know the mode. Similarly, a company that manufactures many items may want to know which item sells the most. In such cases, the mode is needed.

We have learnt the method of finding the mode of an ungrouped data.

Now let us study the method of estimation of mode of grouped data.

The following formula is used for the purpose:

\(\text{Mode}=\mathrm{L}+\left[\frac{f_{1}-f_{0}}{2 f_{1}-f_{0}-f_{2}}\right] \times h\)

In the above formula,

L = Lower class limit of the modal class.

\(f_{1}\) = Frequency of the modal class (the class with the highest frequency).

\(f_{0}\) = Frequency of the class preceding the modal class.

\(f_{2}\) = Frequency of the class succeeding the modal class.

h = Class interval of the modal class.
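The mode formula can be sketched in Python as below. Treating a missing preceding or succeeding class as having frequency 0, and picking the first class when several share the highest frequency, are our own assumptions; the function name and the data (classes 0-10 to 40-50 with frequencies 5, 8, 12, 10, 5) are hypothetical.

```python
from fractions import Fraction

def grouped_mode(lowers, h, freqs):
    """Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h for continuous classes."""
    i = freqs.index(max(freqs))                     # modal class (first one if tied)
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # preceding class; 0 if none
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0  # succeeding class; 0 if none
    return lowers[i] + Fraction(f1 - f0, 2 * f1 - f0 - f2) * h

print(grouped_mode([0, 10, 20, 30, 40], 10, [5, 8, 12, 10, 5]))  # 80/3
```

For these data the modal class is 20-30, so L = 20, \(f_1\) = 12, \(f_0\) = 8, \(f_2\) = 10 and h = 10, giving 20 + \(\frac{4}{6}\) × 10 = \(\frac{80}{3}\) ≈ 26.67.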

**Let’s remember!**

We have studied the measures of central tendency: mean, median and mode. Before selecting one of these measures, we must clearly know the purpose for which it is needed.

Suppose we have to judge the performance of the five divisions of standard 10 in the internal examination. For this purpose, we have to find the ‘mean’ of the marks of the students in each division.

If we have to divide the students of a division into two groups based on their marks in the examination, we have to find the ‘median’ of their marks.

If a ‘bachat’ (savings) group producing chalks wants to know which colour of chalk is in maximum demand, it will have to choose the ‘mode’.

**Pictorial representation of statistical data**

The mean, median or mode of numerical data, or an analysis of the data, helps us draw useful inferences.

We know that tabulation is one of the methods of presenting numerical data briefly. But a table does not quickly reveal some aspects of the data. A common man is interested in the important aspects of data, for example, the annual budget or information about a game. Let us think of another way of presenting data for this purpose.

**Presentation of data**

Pictorial and graphical presentations are attractive methods of presenting data. The tree chart below shows different methods of data presentation.

**Method of drawing a histogram:**

- If the given classes are not continuous, make them continuous. Such classes are called extended class intervals.
- Show the classes on the X-axis with a proper scale.
- Show the frequencies on the Y-axis with a proper scale.
- Taking each class as the base, draw rectangles with heights proportional to the frequencies.
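The steps above amount to computing one rectangle per class. A minimal Python sketch, assuming equal gaps between consecutive classes and a vertical scale of one unit per unit of frequency; the function name and the frequencies 3, 2, 2 are hypothetical:

```python
def histogram_rectangles(classes, freqs):
    """Return (left edge, width, height) of each rectangle of a histogram.

    classes: (lower, upper) class limits, with equal gaps between classes
    freqs:   class frequencies, used directly as heights (scale factor 1)
    """
    # Step 1: make the classes continuous by closing the gap between them.
    gap = classes[1][0] - classes[0][1] if len(classes) > 1 else 0
    half = gap / 2
    rects = []
    for (lower, upper), f in zip(classes, freqs):
        left, right = lower - half, upper + half
        # Steps 2-4: the class is the base on the X-axis, the frequency the height.
        rects.append((left, right - left, f))
    return rects

print(histogram_rectangles([(6, 10), (11, 15), (16, 20)], [3, 2, 2]))
# [(5.5, 5.0, 3), (10.5, 5.0, 2), (15.5, 5.0, 2)]
```

To draw the figure, these values could be passed to a plotting library; for example, matplotlib's `bar` function accepts left edges, heights and widths when called with `align='edge'`.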

**Frequency polygon :**

The information in a frequency table can be presented in various ways. We have studied a histogram. A frequency polygon is another way of presentation.

Let us study two methods of drawing a frequency polygon.

- With the help of a histogram
- Without the help of a histogram.

(1) We shall use the histogram in figure 6.1 to learn the method of drawing a frequency polygon.

- Mark the mid-point of upper side of each rectangle in the histogram.
- Assume that a rectangle of zero height exists before the first rectangle and mark its mid-point. Similarly, assume a rectangle of zero height after the last rectangle and mark its mid-point.
- Join all mid-points in order by line segments.
- The closed figure so obtained is the frequency polygon.

(2) Observe the following table. It shows how the coordinates of the points are decided to draw a frequency polygon without drawing a histogram.

The points whose coordinates are given in the fifth column are plotted. Joining them in order by line segments, we get a frequency polygon. The polygon is shown in figure 6.3.
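The coordinates used when no histogram is drawn can be sketched in Python. Each point is (class mark, frequency), with an extra point of frequency 0 at the class mark of an imaginary class before the first class and after the last, matching the zero-height rectangles of the histogram method. The function name and the frequencies are hypothetical, and the classes are assumed to be continuous and of equal width.

```python
def frequency_polygon_points(boundaries, freqs):
    """Vertices (class mark, frequency) of a frequency polygon.

    boundaries: continuous (lower, upper) classes of equal width h
    freqs:      class frequencies
    """
    h = boundaries[0][1] - boundaries[0][0]
    marks = [(lower + upper) / 2 for lower, upper in boundaries]
    points = [(marks[0] - h, 0)]        # imaginary class before the first, frequency 0
    points += list(zip(marks, freqs))   # (class mark, frequency) for each class
    points.append((marks[-1] + h, 0))   # imaginary class after the last, frequency 0
    return points

print(frequency_polygon_points([(5.5, 10.5), (10.5, 15.5), (15.5, 20.5)], [3, 2, 2]))
# [(3.0, 0), (8.0, 3), (13.0, 2), (18.0, 2), (23.0, 0)]
```

Joining these points in order by line segments gives the same closed figure as the histogram method.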