How to find the mode and median of a series of numbers. Structural characteristics of a variation series. Definition of the mode in statistics

The mode and the median are a special kind of average used to study the structure of a variation series. They are sometimes called structural averages, in contrast to the power means discussed previously.

The mode is the value of the attribute (variant) that occurs most often in the population, i.e. the one with the highest frequency.

The mode has wide practical application, and in some cases only the mode can characterize a social phenomenon.

The median is the variant that lies in the middle of the ordered variation series.

The median shows the quantitative limit of the value of the variable characteristic, which is reached by half of the population units. The use of the median along with the average or instead of it is advisable if there are open intervals in the variation series, because the calculation of the median does not require the conditional establishment of the boundaries of open intervals, and therefore the absence of information about them does not affect the accuracy of the calculation of the median.

The median is also used when the indicators that would serve as weights are unknown. In statistical methods of product quality control, the median is used instead of the arithmetic mean. The sum of the absolute deviations of the variants from the median is smaller than from any other number.

Consider the calculation of the mode and median in a discrete variation series:

Determine the mode and median.

The mode is Mo = 4 years, since this value corresponds to the highest frequency, f = 5.

That is, the largest group of workers has 4 years of experience.

To calculate the median, we first find half the sum of the frequencies. If the sum of the frequencies is odd, we first add one to it and then divide by two:

$$N_{Me} = \frac{\sum f + 1}{2} = 8$$

The median is the eighth variant.

To find which variant is eighth in order, we accumulate frequencies until the accumulated sum is equal to or greater than half the sum of all frequencies. The corresponding variant is the median.

Me = 4 years.

That is, half of the workers have less than four years of experience and half have more.

If the accumulated frequency for some variant is exactly equal to half the sum of the frequencies, the median is defined as the arithmetic mean of this variant and the next one.
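
The procedure above can be sketched in Python. The table for this example is not reproduced here, so the frequencies below are hypothetical, chosen only so that the total is 15 and the value 4 has the highest frequency, 5; the function names are illustrative as well.

```python
from itertools import accumulate

# Hypothetical data (assumption): years of experience and number of workers.
experience = [2, 3, 4, 5, 6]
frequency = [2, 3, 5, 3, 2]


def discrete_mode(values, freqs):
    """Return the variant with the highest frequency."""
    return max(zip(values, freqs), key=lambda pair: pair[1])[0]


def variant_at(values, freqs, position):
    """Return the variant at a 1-based position of the ordered series."""
    for value, cumulative in zip(values, accumulate(freqs)):
        if cumulative >= position:
            return value
    raise ValueError("position exceeds the sum of frequencies")


def discrete_median(values, freqs):
    """Median of a discrete series, found by accumulating frequencies."""
    total = sum(freqs)
    if total % 2 == 1:
        # Odd total: add one and halve to get the middle position.
        return variant_at(values, freqs, (total + 1) // 2)
    # Even total: average the two central variants.
    lower = variant_at(values, freqs, total // 2)
    upper = variant_at(values, freqs, total // 2 + 1)
    return (lower + upper) / 2


print(discrete_mode(experience, frequency))    # 4
print(discrete_median(experience, frequency))  # 4
```

With an even total, the two central positions are averaged, which covers the case where the accumulated frequency of a variant equals exactly half the sum of the frequencies.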

Calculation of the mode and median in an interval variation series

The mode in an interval variation series is calculated by the formula

$$Mo = x_{Mo} + h_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})}$$

where $x_{Mo}$ is the lower boundary of the modal interval,

$h_{Mo}$ is the width of the modal interval,

$f_{Mo}$, $f_{Mo-1}$, $f_{Mo+1}$ are the frequencies of the modal interval, the interval preceding it, and the interval following it, respectively.

The interval with the highest frequency is called the modal interval.

Example 1

Groups by experience    Number of workers, people    Accumulated frequencies

Determine the mode and median.

The modal interval is the one that corresponds to the highest frequency, f = 35. Then:

$h_{Mo} = 6$, $f_{Mo} = 35$

In addition to power means, statistics uses structural averages, chiefly the mode and the median, to characterize the magnitude of a varying attribute and the internal structure of a distribution series.

The mode is the most common variant of the series. The mode is used, for example, to determine the sizes of clothes and shoes that are in greatest demand among buyers. For a discrete series, the mode is the variant with the highest frequency. To calculate the mode for an interval variation series, first determine the modal interval (the one with the maximum frequency), and then the modal value of the attribute by the formula:

$$Mo = x_{Mo} + h_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})}$$

where $x_{Mo}$ is the lower boundary and $h_{Mo}$ the width of the modal interval, and $f_{Mo}$, $f_{Mo-1}$, $f_{Mo+1}$ are the frequencies of the modal, preceding, and following intervals.

The median is the value of the attribute that lies in the middle of the ranked series and divides it into two parts equal in number.

To determine the median in a discrete series with frequencies, first calculate half the sum of the frequencies, and then determine which variant it falls on. (If the sorted series contains an odd number of values, the position of the median is calculated by the formula:

$$N_{Me} = \frac{n + 1}{2},$$

where n is the number of values in the population; with an even number of values, the median is equal to the average of the two values in the middle of the series.)

When calculating the median for an interval variation series, first determine the median interval, within which the median lies, and then the value of the median by the formula:

$$Me = x_{Me} + h_{Me} \cdot \frac{\frac{1}{2}\sum f - S_{Me-1}}{f_{Me}}$$

where $x_{Me}$ is the lower boundary and $h_{Me}$ the width of the median interval, $S_{Me-1}$ is the accumulated frequency of the interval preceding the median interval, and $f_{Me}$ is the frequency of the median interval.

Example. Find the mode and median.

Solution:
In this example, the modal interval is within the age group of 25-30 years, since this interval accounts for the highest frequency (1054).

Let's calculate the mode value:

This means that the modal age of students is 27 years.

Let's calculate the median. The median interval is also the 25-30 age group, since the variant that divides the population into two equal parts ($\sum f_i / 2 = 3462 / 2 = 1731$) lies within it. Next, we substitute the necessary numerical data into the formula and obtain the value of the median:

This means that one half of the students are under 27.4 years old, and the other half are over 27.4 years old.

In addition to the mode and median, one can use quartiles, which divide the ranked series into 4 equal parts, deciles, which divide it into 10 parts, and percentiles, which divide it into 100 parts.

Basic concepts

For experimental data obtained from a sample, one can calculate a number of numerical characteristics (measures).

The mode is the numerical value that occurs most frequently in the sample. The mode is sometimes denoted Mo.

For example, in the value series (2 6 6 8 9 9 9 10), the mode is 9 because 9 occurs more often than any other number.

The mode is the most frequently occurring value (in this example, 9) and not the frequency of occurrence of that value (in this example, 3).

The mode is found according to the following rules:

1. When all values in the sample occur equally often, the sample is considered to have no mode.

For example, in the sample 5 5 6 6 7 7 there is no mode.

2. When two neighboring (adjacent) values have the same frequency and this frequency is greater than that of any other value, the mode is calculated as the arithmetic mean of these two values.

For example, in the sample 1 2 2 2 5 5 5 6, the frequencies of the adjacent values 2 and 5 are the same and equal to 3. This frequency is greater than the frequency of the other values, 1 and 6 (each of which occurs once).

Therefore, the mode of this series is (2 + 5) / 2 = 3.5.

3. If two non-adjacent values in the sample have equal frequencies that are greater than the frequencies of any other value, then two modes are distinguished. For example, in the series 10 11 11 11 12 13 14 14 14 17, the modes are 11 and 14. In this case, the sample is said to be bimodal.

There may also be so-called multimodal distributions, which have more than two peaks (modes).

4. If the mode is estimated from grouped data, then to find it one must determine the group with the highest frequency of the attribute. This group is called the modal group.
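
Rules 1 to 3 can be sketched as a small Python function. The name `textbook_mode` is hypothetical, and "adjacent" is read here as "neighbors among the distinct sorted values", which is an assumption that matches the examples above. Note that this convention differs from library routines such as `statistics.multimode`, which simply list all most-frequent values.

```python
from collections import Counter


def textbook_mode(sample):
    """Apply rules 1-3; returns None, a single number, or a list of modes."""
    counts = Counter(sample)
    distinct = sorted(counts)
    top = max(counts.values())
    modes = [v for v in distinct if counts[v] == top]

    # Rule 1: every value occurs equally often, so there is no mode.
    if len(distinct) > 1 and len(modes) == len(distinct):
        return None
    if len(modes) == 1:
        return modes[0]
    # Rule 2: exactly two modes that are neighbors among the distinct values.
    if len(modes) == 2 and distinct.index(modes[1]) - distinct.index(modes[0]) == 1:
        return (modes[0] + modes[1]) / 2
    # Rule 3 and the multimodal case: report all modes.
    return modes


print(textbook_mode([5, 5, 6, 6, 7, 7]))                          # None
print(textbook_mode([1, 2, 2, 2, 5, 5, 5, 6]))                    # 3.5
print(textbook_mode([10, 11, 11, 11, 12, 13, 14, 14, 14, 17]))    # [11, 14]
```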

The median is denoted Me and is defined as the value such that at least 50% of the sample values are no greater than it and at least 50% are no less than it.

The median is the value that divides an ordered set of data in half.

Task 1. Find the median of the sample 9 3 5 8 4 11 13.

Solution. First, sort the sample by value: 3 4 5 8 9 11 13. Since there are seven elements in the sample, the fourth element in order has a value greater than the first three and less than the last three. So the median is the fourth element, 8.

Task 2. Find the median of the sample 20, 9, 13, 1, 4, 11.

Solution. Order the sample: 1, 4, 9, 11, 13, 20. Since the number of elements is even, there are two "middles", 9 and 11. In this case, the median is defined as the arithmetic mean of these values: (9 + 11) / 2 = 10.
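
Both tasks can be checked with Python's standard `statistics` module; this is only a quick verification sketch, not part of the original solution.

```python
from statistics import median

task_1 = [9, 3, 5, 8, 4, 11, 13]
task_2 = [20, 9, 13, 1, 4, 11]

# median() sorts the data itself, so the samples can be passed unsorted.
print(sorted(task_1), median(task_1))  # [3, 4, 5, 8, 9, 11, 13] 8
print(sorted(task_2), median(task_2))  # [1, 4, 9, 11, 13, 20] 10.0
```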

Average


The arithmetic mean of a series of n numerical values is calculated as

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

To show how misleading this indicator can be, consider a well-known example: a 60-year-old grandmother shares a train compartment with her four grandchildren: one aged 4, two aged 5, and one aged 6. The arithmetic mean age of the passengers in this compartment is 80/5 = 16. In another compartment there is a group of young people: two 15-year-olds, one 16-year-old, and two 17-year-olds. The mean age of the passengers in this compartment is also 80/5 = 16. Thus, the passengers of the two compartments do not differ in mean age. But if we turn to the standard deviation, the average spread around the mean age is 24.6 in the first case and 1 in the second.
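
The figures in this example can be reproduced with the standard `statistics` module. The quoted values 24.6 and 1 correspond to the sample standard deviation (division by n - 1), which is what `stdev` computes; the variable names are illustrative.

```python
from statistics import mean, stdev

first_compartment = [60, 4, 5, 5, 6]        # grandmother and four grandchildren
second_compartment = [15, 15, 16, 17, 17]   # group of young people

# Both compartments have the same mean age...
print(mean(first_compartment), mean(second_compartment))  # 16 16

# ...but very different spreads (sample standard deviation, n - 1).
print(round(stdev(first_compartment), 1))   # 24.6
print(round(stdev(second_compartment), 1))  # 1.0
```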

In addition, the mean is quite sensitive to very small or very large values that differ from the bulk of the measured values. Suppose 9 people have incomes of $4,500 to $5,200 a month, with an average income of $4,900. If we add to this group a person with an income of $20,000 a month, the average of the whole group shifts to $6,410, although no one in the sample except that one person earns that much.
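
The shift of the mean can be checked with one line of arithmetic, using only the figures quoted above (nine incomes averaging $4,900 and one added income of $20,000):

```latex
\bar{x}_{10} = \frac{9 \cdot 4900 + 20000}{10} = \frac{64100}{10} = 6410
```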

A similar shift, but in the opposite direction, occurs if a person with a very small income is added to the group.

Sample range

The range of a sample is the difference between the maximum and minimum values of the variation series. It is denoted by the letter R.

R = maximum value - minimum value

Clearly, the more the measured attribute varies, the greater the value of R, and vice versa.

However, it may happen that two sample series have the same mean and range, but the nature of the variation of these series will be different. For example, given two samples

Variance

Variance is the most commonly used measure of the spread of a random variable.

Variance is the arithmetic mean of the squared deviations of the values of a variable from its mean.
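
As a minimal sketch, this measure (the variance) can be computed directly from the definition and compared with `statistics.pvariance`, which divides by n in the same way. The helper name `population_variance` is illustrative, and the data are the two train compartments from the example above. The standard deviations quoted there (24.6 and 1) appear to use division by n - 1 instead, so their squares differ from the values below.

```python
from statistics import pvariance


def population_variance(values):
    """Arithmetic mean of the squared deviations from the mean."""
    center = sum(values) / len(values)
    return sum((x - center) ** 2 for x in values) / len(values)


first_compartment = [60, 4, 5, 5, 6]
second_compartment = [15, 15, 16, 17, 17]

print(population_variance(first_compartment))   # 484.4
print(population_variance(second_compartment))  # 0.8
print(pvariance(first_compartment))             # 484.4
```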

Along with mean values, the structural averages, the mode and the median, are calculated as statistical characteristics of a variation series.
The mode (Mo) is the value of the attribute under study that is repeated with the highest frequency, i.e. the value that occurs most often.
The median (Me) is the value of the attribute that falls in the middle of the ranked (ordered) population, i.e. the central value of the variation series.
The main property of the median is that the sum of the absolute deviations of the attribute values from the median is smaller than from any other value: $\sum |x_i - Me| = \min$.
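
The minimum property of the median can be illustrated numerically. The sample below is hypothetical, and the search over a grid of candidate centers is only a brute-force check, not a proof.

```python
from statistics import mean, median


def sum_abs_dev(values, center):
    """Sum of absolute deviations of the values from a chosen center."""
    return sum(abs(x - center) for x in values)


sample = [1, 2, 4, 7, 20]  # hypothetical sample (assumption)
me = median(sample)

# Try candidate centers 0.0, 0.1, ..., 20.0 and keep the best one.
candidates = [c / 10 for c in range(0, 201)]
best = min(candidates, key=lambda c: sum_abs_dev(sample, c))

print(me, sum_abs_dev(sample, me))  # 4 24
print(best)                         # 4.0
```

For comparison, the sum of absolute deviations from the mean of this sample (6.8) is larger than the sum taken from the median.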

Determining Mode and Median from Ungrouped Data

Consider the determination of the mode and median from ungrouped data. Suppose a work crew of 9 people has the following wage categories: 4 3 4 5 3 3 6 2 6. Since the crew has the most workers of the 3rd category, this category is modal: Mo = 3.
To determine the median, the series must be ranked: 2 3 3 3 4 4 5 6 6. The central position in this series is occupied by a worker of the 4th category, so this category is the median. If the ranked series contains an even number of units, the median is defined as the average of the two central values.
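
The crew example can be verified with a short sketch based on Python's standard library; `Counter` is used here only to make the frequencies visible.

```python
from collections import Counter
from statistics import median, mode

categories = [4, 3, 4, 5, 3, 3, 6, 2, 6]  # wage categories of the 9 workers

print(Counter(categories)[3])  # frequency of the 3rd category: 3
print(sorted(categories))      # [2, 3, 3, 3, 4, 4, 5, 6, 6]
print(mode(categories))        # 3
print(median(categories))      # 4
```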
While the mode reflects the most common value of the attribute, the median in practice serves as the average for a heterogeneous population that does not follow the normal distribution. Let us illustrate its usefulness with the following example.
Suppose we need to characterize the average income of a group of 100 people, 99 of whom have incomes between $100 and $200 per month, while the monthly income of the remaining one is $50,000 (Table 1).
Table 1 - Monthly incomes of the studied group of people
If we use the arithmetic mean, we get an average income of about 600 - 700 dollars, which has little in common with the incomes of the bulk of the group. The median, in this case Me = 163 dollars, gives an objective description of the income level of 99% of this group.
Consider the determination of the mode and median from grouped data (a distribution series).
Suppose the distribution of all workers of the enterprise by wage category is as follows (Table 2).
Table 2 - Distribution of workers of the enterprise by wage category

Determining the Mode from a Discrete Variation Series

To find the median, the previously constructed series of attribute values, sorted by value, is used. If the sample size is odd, we take the central value; if it is even, we take the arithmetic mean of the two central values.
Determining the mode from a discrete variation series: the 5th wage category has the highest frequency (60 people), so it is modal. Mo = 5.
To determine the median value of the attribute, the position of the median unit of the series ($N_{Me}$) is found by the formula $N_{Me} = \frac{n + 1}{2}$, where n is the size of the population.
In our case: $N_{Me} = \frac{190 + 1}{2} = 95.5$.
The resulting fractional value, which always occurs when the number of population units is even, indicates that the exact middle lies between the 95th and 96th workers. We need to determine which group the workers with these ordinal numbers belong to. This can be done by calculating the accumulated frequencies. Workers with these numbers are not in the first group, which has only 12 people, nor in the second group (12 + 48 = 60). The 95th and 96th workers are in the third group (12 + 48 + 56 = 116), so the 4th wage category is the median.
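
The walk through the accumulated frequencies can be traced in a few lines of Python. The frequencies 12, 48, 56 and 60 are quoted in the text; the last group of 14 workers in a 6th category is an assumption introduced here so that the total matches a median position of 95.5, and the helper name `category_of` is illustrative.

```python
from itertools import accumulate
from math import ceil, floor

# Categories 2-5 follow from the text; the 6th category with 14 workers
# is an assumption that brings the total to 190.
categories = [2, 3, 4, 5, 6]
workers = [12, 48, 56, 60, 14]

total = sum(workers)                    # 190
position = (total + 1) / 2              # 95.5
cumulative = list(accumulate(workers))  # [12, 60, 116, 176, 190]


def category_of(rank):
    """Return the category of the worker with the given 1-based rank."""
    for category, cum in zip(categories, cumulative):
        if cum >= rank:
            return category
    raise ValueError("rank exceeds the number of workers")


# The middle lies between the 95th and 96th workers.
median = (category_of(floor(position)) + category_of(ceil(position))) / 2
mode = categories[workers.index(max(workers))]

print(position, cumulative)  # 95.5 [12, 60, 116, 176, 190]
print(mode, median)          # 5 4.0
```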

Calculation of mode and median in an interval series

Unlike discrete variation series, determining the mode and median from interval series requires calculations based on the following formulas:
$$Mo = x_0 + i \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})}, \tag{5.6}$$
where $x_0$ is the lower limit of the modal interval (the interval with the highest frequency is called modal);
$i$ is the width of the modal interval;
$f_{Mo}$ is the frequency of the modal interval;
$f_{Mo-1}$ is the frequency of the interval preceding the modal one;
$f_{Mo+1}$ is the frequency of the interval following the modal one.
$$Me = x_0 + i \cdot \frac{\frac{1}{2}\sum f - S_{Me-1}}{f_{Me}}, \tag{5.7}$$
where $x_0$ is the lower limit of the median interval (the median interval is the first interval whose accumulated frequency exceeds half of the total sum of frequencies);
$i$ is the width of the median interval;
$S_{Me-1}$ is the accumulated frequency of the interval preceding the median one;
$f_{Me}$ is the frequency of the median interval.
We illustrate the application of these formulas using the data in Table 3.
The interval with boundaries 60 - 80 in this distribution is modal, because it has the highest frequency. Using formula (5.6), we determine the mode:
$$Mo = 60 + 20 \cdot \frac{12.7 - 11.9}{(12.7 - 11.9) + (12.7 - 11.7)} \approx 68.9 \text{ thousand rubles}$$
To establish the median interval, the accumulated frequency of each successive interval is determined until it exceeds half the total sum of frequencies (in our case, 50%) (Table 4).
The median interval turns out to be the one with boundaries 100 - 120 thousand rubles. We now determine the median:
$$Me = 100 + 20 \cdot \frac{50 - 45.2}{10.0} = 109.6 \text{ thousand rubles}$$

Table 3 - Distribution of the population of the Russian Federation by the level of average per capita nominal cash income in March 1994
Groups by level of average per capita monthly income, thousand rubles    Share of the population, %
up to 20      1.4
20 – 40       7.5
40 – 60       11.9
60 – 80       12.7
80 – 100      11.7
100 – 120     10.0
120 – 140     8.3
140 – 160     6.8
160 – 180     5.5
180 – 200     4.4
200 – 220     3.5
220 – 240     2.9
240 – 260     2.3
260 – 280     1.9
280 – 300     1.5
Over 300      7.7
Total         100.0
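
The mode and median for this table can be recomputed with a short Python sketch of the interval formulas. The function names are illustrative, and the bounds given to the two open intervals ("up to 20" and "over 300") are assumptions; they do not affect the results, because neither interval is modal or median.

```python
# Shares of the population (%) for the intervals 0-20, 20-40, ..., over 300.
shares = [1.4, 7.5, 11.9, 12.7, 11.7, 10.0, 8.3, 6.8,
          5.5, 4.4, 3.5, 2.9, 2.3, 1.9, 1.5, 7.7]
# (lower bound, width, share); width 20 for the open intervals is assumed.
intervals = [(20 * k, 20, share) for k, share in enumerate(shares)]


def interval_mode(rows):
    """Mode of an interval series from the modal interval and its neighbors."""
    k = max(range(len(rows)), key=lambda j: rows[j][2])
    x0, width, f_mo = rows[k]
    f_prev = rows[k - 1][2] if k > 0 else 0.0
    f_next = rows[k + 1][2] if k < len(rows) - 1 else 0.0
    return x0 + width * (f_mo - f_prev) / ((f_mo - f_prev) + (f_mo - f_next))


def interval_median(rows):
    """Median of an interval series from accumulated frequencies."""
    half = sum(f for _, _, f in rows) / 2
    accumulated = 0.0
    for x0, width, f in rows:
        # The median interval is the first whose accumulated frequency
        # exceeds half of the total.
        if accumulated + f > half:
            return x0 + width * (half - accumulated) / f
        accumulated += f
    raise ValueError("empty series")


print(round(interval_mode(intervals), 1))    # 68.9
print(round(interval_median(intervals), 1))  # 109.6
```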

Table 4 - Definition of the median interval
Thus, the arithmetic mean, mode, and median can each serve as a generalized characteristic of the values of an attribute for the units of a ranked population.
The main characteristic of the center of a distribution is the arithmetic mean, which has the property that all deviations from it (positive and negative) sum to zero. The median has the property that the sum of absolute deviations from it is minimal, and the mode is the value of the attribute that occurs most often.
The relationship between the mode, median, and arithmetic mean indicates the nature of the distribution of the attribute in the population and makes it possible to assess its asymmetry. In symmetric distributions, all three characteristics coincide. The greater the discrepancy between the mode and the arithmetic mean, the more asymmetric the series. For moderately skewed series, the difference between the mode and the arithmetic mean is approximately three times the difference between the median and the mean, i.e.:
$|Mo - \bar{x}| \approx 3\,|Me - \bar{x}|$.

Determination of the mode and median by a graphical method

The mode and median of an interval series can be determined graphically. The mode is determined from the histogram of the distribution. To do this, select the tallest rectangle, which is the modal one. Then connect the upper right vertex of the modal rectangle with the upper right corner of the preceding rectangle, and the upper left vertex of the modal rectangle with the upper left corner of the following rectangle. From the point of intersection of these lines, drop a perpendicular to the abscissa axis. The abscissa of the intersection point is the mode of the distribution (Fig. 5.3).


Fig. 5.3. Graphical determination of the mode from a histogram.


Fig. 5.4. Graphical determination of the median from the cumulative frequency curve.
To determine the median, a straight line is drawn from the point on the scale of accumulated frequencies corresponding to 50%, parallel to the abscissa axis, until it intersects the cumulative curve. Then a perpendicular is dropped from the point of intersection to the abscissa axis. The abscissa of the intersection point is the median.

Quartiles, Deciles, Percentiles

Just as the median is found in a variation series, one can find the value of the attribute for any unit of the ranked series. For example, one can find the values of the attribute at the units that divide the series into four equal parts, or into 10 or 100 parts. These values are called quartiles, deciles, and percentiles.
Quartiles are the values of the attribute that divide the ranked population into 4 equal parts.
A distinction is made between the lower quartile ($Q_1$), which separates the ¼ of the population with the lowest values of the attribute, and the upper quartile ($Q_3$), which cuts off the ¼ with the highest values. This means that 25% of the population units are less than $Q_1$; 25% lie between $Q_1$ and $Q_2$; 25% lie between $Q_2$ and $Q_3$; and the remaining 25% exceed $Q_3$. The middle quartile $Q_2$ is the median.
To calculate the quartiles for an interval variation series, the following formulas are used:
$$Q_1 = x_{Q_1} + i \cdot \frac{\frac{1}{4}\sum f - S_{Q_1-1}}{f_{Q_1}}, \qquad Q_3 = x_{Q_3} + i \cdot \frac{\frac{3}{4}\sum f - S_{Q_3-1}}{f_{Q_3}},$$
where $x_{Q_1}$ is the lower limit of the interval containing the lower quartile (the interval is identified as the first whose accumulated frequency exceeds 25%);
$x_{Q_3}$ is the lower limit of the interval containing the upper quartile (the interval is identified as the first whose accumulated frequency exceeds 75%);
$i$ is the interval width;
$S_{Q_1-1}$ is the accumulated frequency of the interval preceding the interval containing the lower quartile;
$S_{Q_3-1}$ is the accumulated frequency of the interval preceding the interval containing the upper quartile;
$f_{Q_1}$ is the frequency of the interval containing the lower quartile;
$f_{Q_3}$ is the frequency of the interval containing the upper quartile.
Consider the calculation of the lower and upper quartiles from the data in Table 3. The lower quartile lies in the interval 60 - 80, whose accumulated frequency is 33.5%. The upper quartile lies in the interval 160 - 180, with an accumulated frequency of 75.8%. With this in mind, we get:
$$Q_1 = 60 + 20 \cdot \frac{25 - 20.8}{12.7} \approx 66.6,$$
$$Q_3 = 160 + 20 \cdot \frac{75 - 70.3}{5.5} \approx 177.1.$$
In addition to quartiles, deciles can be determined in variation series: the variants that divide the ranked series into ten equal parts. The first decile ($d_1$) divides the population in the ratio 1/10 to 9/10, the second decile ($d_2$) in the ratio 2/10 to 8/10, and so on.
They are calculated by the formulas:
$$d_1 = x_{d_1} + i \cdot \frac{\frac{1}{10}\sum f - S_{d_1-1}}{f_{d_1}}, \qquad d_2 = x_{d_2} + i \cdot \frac{\frac{2}{10}\sum f - S_{d_2-1}}{f_{d_2}}.$$
The values of the attribute that divide the series into one hundred parts are called percentiles. The relationships between the median, quartiles, deciles, and percentiles are shown in Fig. 5.5.
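
Quartiles, deciles, and percentiles all follow the same interpolation pattern, so they can be sketched with one function that takes the required fraction p of the population. The function name `interval_quantile` is hypothetical, the data are the income shares from the 1994 distribution table above, and the bounds assumed for the two open intervals are illustrative.

```python
# Shares of the population (%) for the intervals 0-20, 20-40, ..., over 300.
shares = [1.4, 7.5, 11.9, 12.7, 11.7, 10.0, 8.3, 6.8,
          5.5, 4.4, 3.5, 2.9, 2.3, 1.9, 1.5, 7.7]
# (lower bound, width, share); width 20 for the open intervals is assumed.
intervals = [(20 * k, 20, share) for k, share in enumerate(shares)]


def interval_quantile(rows, p):
    """Value below which a fraction p (0 < p < 1) of the population lies."""
    target = p * sum(f for _, _, f in rows)
    accumulated = 0.0
    for x0, width, f in rows:
        # First interval whose accumulated frequency exceeds the target.
        if accumulated + f > target:
            return x0 + width * (target - accumulated) / f
        accumulated += f
    raise ValueError("p must be strictly between 0 and 1")


print(round(interval_quantile(intervals, 0.25), 1))  # lower quartile: 66.6
print(round(interval_quantile(intervals, 0.75), 1))  # upper quartile: 177.1
print(round(interval_quantile(intervals, 0.10), 1))  # first decile: 41.8
```

Calling the function with p = 0.5 gives the median, and p = k/100 gives the k-th percentile.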
