Sample variance in Excel. Calculation of standard deviation in Microsoft Excel

Variance is a measure of dispersion that describes how far the data values deviate from the mean. It is the most widely used measure of dispersion in statistics, calculated by squaring the deviation of each data value from the mean and summing the results. The formula for the sample variance is given below:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where

$s^2$ is the sample variance;

$\bar{x}$ is the sample mean;

$n$ is the sample size (the number of data values);

$(x_i - \bar{x})$ is the deviation from the mean for each value of the data set.

To better understand the formula, let's look at an example. I don't really like cooking, so I rarely do it. However, in order not to starve, from time to time I have to go to the stove and supply my body with proteins, fats and carbohydrates. The data set below shows how many times I, Renat, cook each month:

The first step in calculating variance is to determine the sample mean, which in our example is 7.8 times per month. The rest of the calculations can be made easier using the following table.

The final phase of calculating variance looks like this:

For those who like to do all the calculations in one go, the equation would look like this:

Using the raw count method (cooking example)

There is a more efficient way to calculate variance, known as the "raw count" method. Although the equation may look cumbersome at first glance, it is actually not that scary. You can see this for yourself and then decide which method you like best.

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$

where

$\sum x_i^2$ is the sum of the data values after each has been squared,

$\left(\sum x_i\right)^2$ is the square of the sum of all data values.

Don't lose your mind right now. Let's put this all into a table and you will see that there are fewer calculations here than in the previous example.

As you can see, the result was the same as when using the previous method. The advantages of this method become apparent as the sample size (n) increases.
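To check that the two approaches agree, here is a minimal Python sketch. The data set is hypothetical (the article's own table did not survive), and the function names are made up for illustration; the standard library's `statistics.variance` serves as a reference value.

```python
from statistics import mean, variance

# Hypothetical data: number of times someone cooked in each of five months.
meals_per_month = [3, 5, 8, 10, 4]


def variance_by_definition(values):
    """Sample variance: squared deviations from the mean, summed, over n - 1."""
    n = len(values)
    x_bar = mean(values)
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def variance_raw_count(values):
    """Sample variance from the sum of squares and the square of the sum."""
    n = len(values)
    sum_of_squares = sum(x * x for x in values)
    square_of_sum = sum(values) ** 2
    return (sum_of_squares - square_of_sum / n) / (n - 1)


print(variance_by_definition(meals_per_month))
print(variance_raw_count(meals_per_month))
print(variance(meals_per_month))  # library reference value
```

All three lines print the same number. The raw count version needs only two running totals, which is why it becomes convenient as n grows.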

Variance calculation in Excel

As you have probably guessed, Excel has functions for calculating variance. Starting with Excel 2010, you can find four of them:

1) VAR.S - Returns the variance of a sample. Boolean values and text are ignored.

2) VAR.P - Returns the variance of a population. Boolean values and text are ignored.

3) VARA - Returns the variance of a sample, taking Boolean and text values into account.

4) VARPA - Returns the variance of a population, taking Boolean and text values into account.

First, let's understand the difference between a sample and a population. The purpose of descriptive statistics is to summarize or display data so that you can quickly get an overall picture, an overview, so to speak. Statistical inference lets you draw conclusions about a population from a sample of data taken from that population. The population is the set of all possible outcomes or measurements that are of interest to us. A sample is a subset of a population.

For example, we are interested in a group of students from one of the Russian universities and we need to determine the average score of the group. We can calculate the average performance of students, and then the resulting figure will be a parameter, since the whole population will be involved in our calculations. However, if we want to calculate the GPA of all students in our country, then this group will be our sample.

The formulas for the variance of a sample and of a population differ only in the denominator: for a sample it is (n-1), and for a population it is n.

Now let's look at the variance functions whose names end in A (VARA and VARPA), whose descriptions state that text and logical values are taken into account in the calculation. When the data array contains non-numeric values, these functions treat text and the Boolean value FALSE as 0, and the Boolean value TRUE as 1.

So, if you have a data array, calculating its variance will not be difficult using one of the Excel functions listed above.
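The difference between the four functions can be illustrated with a rough Python sketch. It is only an emulation under stated assumptions: the range is a hypothetical list with no empty cells, and the two helper functions are invented for this example and are not part of Excel or of any library.

```python
from statistics import pvariance, variance


def numeric_only(cells):
    """Keep real numbers; drop text and Booleans (the VAR.S / VAR.P rule)."""
    return [c for c in cells
            if isinstance(c, (int, float)) and not isinstance(c, bool)]


def coerce_text_and_booleans(cells):
    """Count TRUE as 1, FALSE and text as 0 (the VARA / VARPA rule)."""
    coerced = []
    for c in cells:
        if isinstance(c, bool):
            coerced.append(1 if c else 0)
        elif isinstance(c, str):
            coerced.append(0)
        else:
            coerced.append(c)
    return coerced


cells = [4, 8, "n/a", True, 6]  # hypothetical worksheet range

var_s = variance(numeric_only(cells))                # like VAR.S, divides by n - 1
var_p = pvariance(numeric_only(cells))               # like VAR.P, divides by n
var_a = variance(coerce_text_and_booleans(cells))    # like VARA
var_pa = pvariance(coerce_text_and_booleans(cells))  # like VARPA

print(var_s, var_p, var_a, var_pa)
```

Because the A variants count the text cell as 0 and TRUE as 1, they work with five values instead of three, so their results differ noticeably from those of VAR.S and VAR.P on the same range.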

Good afternoon

In this article, I decided to look at how standard deviation works in Excel using the STDEV function. I simply have not described or commented on it for a very long time, and it is also a very useful function for those who study higher mathematics. And helping students is sacred; I know from experience how difficult the subject is to master. In practice, the standard deviation functions can be used to assess the stability of product sales, set prices, adjust or build an assortment, and run other equally useful analyses of your sales.

Excel offers several variations of this standard deviation function: STDEV.S and STDEV.P, STDEVA and STDEVPA, and the older STDEV and STDEVP, which are kept for compatibility.


Mathematical theory

First, a little theory: how the standard deviation function can be described in mathematical language before it is used in Excel to analyze, for example, sales statistics (more on that later). I warn you right away, I will write a lot of incomprehensible words...)))) If anything below is unclear, go straight to the practical implementation in the program.

What exactly does the standard deviation do? It estimates the root-mean-square deviation of a random variable X about its expected value, based on an unbiased estimate of its variance. Agree, it sounds confusing, but I think students will understand what we are actually talking about!

First, we need to define the root-mean-square deviation, so that we can then calculate the standard deviation. This formula will help us:

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

where $x_i$ are the observed values, $\bar{x}$ is their arithmetic mean and $n$ is the number of values. The formula can be described as follows: the root-mean-square deviation is measured in the same units as the random variable itself and is used when calculating the standard error of the arithmetic mean, when constructing confidence intervals, when testing statistical hypotheses, or when measuring a linear relationship between random variables. It is defined as the square root of the variance of the random variable.

Now we can define the standard deviation: it is an estimate of the root-mean-square deviation of a random variable X about its expected value, based on an unbiased estimate of its variance. The formula is written like this:

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

I note that both estimates are biased. In the general case it is not possible to construct an unbiased estimate, but the estimate based on the unbiased variance is consistent.

Practical implementation in Excel

Well, now let's move away from the boring theory and see in practice how the STDEV function works. I will not go through every variation of the standard deviation function in Excel; one is enough, shown by example. As an example, let's look at how the stability of sales is determined.

First, look at the syntax of the function. As you can see, it is very simple:

=STDEV.P(number1, number2, …), where the arguments are the numbers, or references to the cells holding the numbers, whose standard deviation is required.


Now let's create an example file and use it to see how this function works. Since analytical calculations need at least three values, as does any statistical analysis in principle, I took three periods; a period could be a year, a quarter, a month or a week. In my case it is a month. For maximum reliability, I recommend taking as many periods as possible, but no fewer than three. All the data in the table is kept very simple to make the workings of the formula clear.

First, we need to calculate the average value over the months. We will use the AVERAGE function for this and get the formula: =AVERAGE(C4:E4).
Now we can find the standard deviation using the STDEV.P function, passing it the sales of the product for each period. The result is a formula of the following form: =STDEV.P(C4,D4,E4).
Well, half the work is done. The next step is to calculate the "Variation": divide the standard deviation by the average value and convert the result to a percentage. We get the following table:
Well, the basic calculations are complete; all that remains is to work out whether sales are stable or not. Let us take as our condition that a variation below 10% counts as stable, from 10% to 25% is a small deviation, and anything above 25% is no longer stable. To get the result under these conditions, we will use the logical IF function and write the formula:

=IF(H4<0.1,"stable",IF(H4<0.25,"normal","unstable"))

All ranges are taken for clarity; your tasks may have completely different conditions.
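The same chain of steps (average, population standard deviation, variation, label) can be sketched in Python. The product names and sales figures are hypothetical, the function name is invented for this example, and the 10% and 25% thresholds are simply the ones chosen above.

```python
from statistics import mean, pstdev


def sales_stability(sales, stable_below=0.10, normal_below=0.25):
    """Return (variation, label), where variation = population std dev / mean."""
    variation = pstdev(sales) / mean(sales)
    if variation < stable_below:
        label = "stable"
    elif variation < normal_below:
        label = "normal"
    else:
        label = "unstable"
    return variation, label


# Hypothetical sales of three products over three months.
monthly_sales = {
    "Product A": [100, 105, 98],
    "Product B": [100, 130, 85],
    "Product C": [40, 120, 80],
}

for product, sales in monthly_sales.items():
    variation, label = sales_stability(sales)
    print(f"{product}: {variation:.1%} -> {label}")
```

With these made-up figures the three products land in the three different bands, which mirrors what the nested IF formula does row by row.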
When your table has thousands of rows, you can make the data easier to read with conditional formatting, which highlights cells in color according to the conditions you set; this is very clear.

First, select the cells to which you will apply conditional formatting. On the "Home" tab, select "Conditional Formatting", then in the drop-down menu select "Highlight Cells Rules" and click the menu item "Text that Contains...". A dialog box appears in which you enter your conditions.

After you have written down the conditions, for example, “stable” - green, “normal” - yellow and “unstable” - red, we get a beautiful and understandable table in which you can see what to pay attention to first.

Using VBA for the STDEV.P function

Anyone interested can automate their calculations using macros and use the following function:

Function MyStDevP(Arr)
    Dim x, aCnt&, aSum#, aAver#, tmp#
    For Each x In Arr
        aSum = aSum + x                  ' accumulate the sum of the array elements
        aCnt = aCnt + 1                  ' count the elements
    Next x
    aAver = aSum / aCnt                  ' average value
    For Each x In Arr
        tmp = tmp + (x - aAver) ^ 2      ' accumulate the squared differences between the elements and the average
    Next x
    MyStDevP = Sqr(tmp / aCnt)           ' same result as STDEV.P()
End Function


Conducting any statistical analysis is unthinkable without calculations. In this article we will look at how to calculate variance, standard deviation, coefficient of variation and other statistical indicators in Excel.

Maximum and minimum value

Average linear deviation

The average linear deviation is the average of the absolute values (moduli) of the deviations from the arithmetic mean in the analyzed data set. The mathematical formula is:

$$a = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$$

$a$ - average linear deviation,

$X_i$ - analyzed indicator,

$\bar{X}$ - average value of the indicator,

$n$ - the number of values in the analyzed data set.

In Excel this function is called AVEDEV.

After selecting the AVEDEV function, specify the data range over which the calculation should be made and click "OK".
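For readers who want to check such a result by hand, here is a small Python sketch of the same calculation. The data set is hypothetical and the function name is invented for illustration.

```python
from statistics import mean


def average_linear_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    x_bar = mean(values)
    return sum(abs(x - x_bar) for x in values) / len(values)


data = [2, 4, 4, 6, 9]  # hypothetical observations, mean = 5
print(average_linear_deviation(data))  # (3 + 1 + 1 + 1 + 4) / 5 = 2.0
```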

Variance

Perhaps not everyone knows what variance is, so I'll explain: it is a measure that characterizes the spread of data around the expected value. Usually, however, only a sample is available, so the following variance formula is used:

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}$$

$s^2$ - sample variance calculated from observational data,

$X_i$ - individual values,

$\bar{X}$ - arithmetic mean for the sample,

$n$ - the number of values in the analyzed data set.

The corresponding Excel function is VAR.P. When analyzing relatively small samples (up to about 30 observations), you should use the unbiased sample variance, which is calculated using the following formula:

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

The difference, as you can see, is only in the denominator. Excel has a function for calculating the unbiased sample variance: VAR.S.

Select the desired option (population or sample), specify the range, and click the "OK" button. The resulting value may be very large because the deviations are squared first. Variance is a very important indicator in statistics, but it is usually used not in its pure form but as an input to further calculations.

Standard deviation

The root-mean-square deviation is the square root of the variance. This indicator is also called the standard deviation and is calculated using the formulas:

for the general population

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}}$$

for a sample

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

You can simply take the square root of the variance, but Excel has ready-made functions for the standard deviation: STDEV.P and STDEV.S (for the general population and for a sample, respectively).

Root-mean-square deviation and standard deviation, I repeat, are synonyms.

Next, as usual, indicate the desired range and click on “OK”. The standard deviation has the same units of measurement as the analyzed indicator, and therefore is comparable to the original data. More on this below.

The coefficient of variation

All the indicators discussed above are tied to the scale of the source data and do not give an intuitive idea of the variation in the analyzed population. To obtain a relative measure of data spread, use the coefficient of variation, which is calculated by dividing the standard deviation by the mean. The formula for the coefficient of variation is simple:

$$V = \frac{s}{\bar{X}}$$

where $s$ is the standard deviation and $\bar{X}$ is the arithmetic mean.

There is no ready-made function for calculating the coefficient of variation in Excel, which is not a big problem. The calculation can be made by simply dividing the standard deviation by the mean. To do this, write in the formula bar:

=STDEV.P()/AVERAGE()

The data range goes in the parentheses. If necessary, use the sample standard deviation (STDEV.S).

The coefficient of variation is usually expressed as a percentage, so you can apply the percentage format to the cell containing the formula. The required button is located on the ribbon on the "Home" tab:

You can also change the format from the context menu: select the desired cell, right-click it and choose "Format Cells".

The coefficient of variation, unlike other indicators of the scatter of values, is used as an independent and very informative indicator of data variation. In statistics, it is generally accepted that if the coefficient of variation is less than 33%, then the data set is homogeneous, if more than 33%, then it is heterogeneous. This information can be useful for preliminary characterization of the data and for identifying opportunities for further analysis. In addition, the coefficient of variation, measured as a percentage, allows you to compare the degree of scatter of different data, regardless of their scale and units of measurement. Useful property.
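A short Python sketch shows how the coefficient makes data on different scales comparable. Both data sets are hypothetical, the function names are invented for the example, the sample standard deviation is used (the STDEV.S variant), and the 33% threshold is the rule of thumb mentioned above.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the arithmetic mean."""
    return stdev(values) / mean(values)


def homogeneity(values, threshold=0.33):
    """Apply the 33% rule of thumb to a data set."""
    if coefficient_of_variation(values) < threshold:
        return "homogeneous"
    return "heterogeneous"


heights_cm = [160, 170, 180]  # hypothetical heights, centimetres
prices = [10, 50, 90]         # hypothetical prices, any currency

for name, data in (("heights", heights_cm), ("prices", prices)):
    print(f"{name}: V = {coefficient_of_variation(data):.1%}, {homogeneity(data)}")
```

The standard deviations here (10 cm and 40 currency units) cannot be compared directly, while the two percentages can.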

Oscillation coefficient

Another indicator of data spread for today is the oscillation coefficient. This is the ratio of the range of variation (the difference between the maximum and minimum values) to the mean. There is no ready-made Excel function, so you will have to combine three functions: MAX, MIN and AVERAGE.

The coefficient of oscillation shows the extent of the variation relative to the average, which can also be used to compare different data sets.
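Combining the three functions amounts to the following calculation, shown here as a minimal Python sketch with hypothetical data and an invented function name.

```python
from statistics import mean


def oscillation_coefficient(values):
    """Range of variation (max - min) divided by the arithmetic mean."""
    return (max(values) - min(values)) / mean(values)


data = [12, 15, 9, 18, 6]  # hypothetical observations, mean = 12
print(oscillation_coefficient(data))  # (18 - 6) / 12 = 1.0
```

In a worksheet the same idea would be a single formula that divides the difference of MAX and MIN by AVERAGE over one range.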

In general, using Excel, many statistical indicators are calculated very simply. If something is not clear, you can always use the search box in the function insert. Well, Google is here to help.


The coefficient of variation is used to compare the dispersion of two random variables. Even when the variables have different units of measurement, it gives a comparable result. This coefficient is needed when preparing a statistical analysis.

With it, investors can calculate risk indicators before making investments in selected assets. It is useful when the selected assets have different returns and degrees of risk. For example, one asset may have a high income and a high degree of risk, while another, on the contrary, may have a low income and a correspondingly lower degree of risk.

Standard Deviation Calculation

Standard deviation is a statistical value. By calculating this value, the user will receive information about how much the data deviates in one direction or another relative to the average value. Standard deviation in Excel is calculated in several steps.

Prepare the data: open the page where the calculations will take place. In our case this is a picture, but it could be any other file. The main thing is to gather the information that you will put in the table for the calculation.

Enter the data into any spreadsheet editor (in our case Excel), filling in the cells from left to right and starting from column "A". Put the headings in the top row and, below them, the names in the columns the headings refer to. Then enter the date and, to the right of the date, the data to be calculated.

Save this document.

Now let's move on to the calculation itself. Select the cell just below the last value entered.

Type the "=" sign and then the formula below. The equal sign is required; otherwise the program will not calculate anything. The formula is entered without spaces.

Excel will display the names of several functions. Select "STDEV". This is the function for calculating standard deviation. There are two types of calculation:

  • based on a sample;
  • based on the general population.

After selecting one of them, specify the data range. The complete formula will look like this: "=STDEV(B2:B5)".

Then press "Enter". The result will appear in the selected cell.

Calculation of the arithmetic mean

It is calculated when the user needs to prepare a report, for example on wages in their company. This is done as follows:


  • all that remains is to select the range and press "Enter". The cell will now display the result for the data taken above.

Calculation of coefficient of variation

Formula for calculating the coefficient of variation:

V= S/X, where S is the standard deviation and X is the average.

In order to calculate the coefficient of variation in Excel, you need to find the standard deviation and the arithmetic mean. That is, having completed the first two calculations, which were shown above, you can move on to working on the coefficient of variation.

To do this, open Excel and fill in two cells with the standard deviation and the mean obtained above.

Now select the cell set aside for the coefficient of variation. Open the "Home" tab if it is not already open. In the "Number" group, select the percentage format.

Go to that cell and double-click it. Then type the equal sign and select the cell containing the standard deviation. Press the slash key on your keyboard ("/"), select the cell containing the arithmetic mean, and press "Enter". It should look like this:

And here is the result after pressing “Enter”:

You can also use online calculators to calculate the coefficient of variation, for example planetcalc.ru and allcalc.ru. It is enough to enter the necessary numbers and start the calculation, after which you will receive the necessary information.

Standard deviation

Standard deviation in Excel is calculated using two functions: STDEV.P (for the general population) and STDEV.S (for a sample).

In simple words, the root of the variance is extracted. How to calculate variance is discussed below.

The root-mean-square deviation is synonymous with the standard deviation and is calculated in exactly the same way. Select a cell for the result below the numbers to be processed, insert one of the two functions, and press "Enter". The result appears.

Oscillation coefficient

The ratio of the range of variation to the mean is called the oscillation coefficient. There is no ready-made function for it in Excel, so several functions have to be combined into one formula.

The functions to combine are AVERAGE, MAX and MIN. This coefficient is used to compare data sets.

Variance

Variance is a measure that characterizes the spread of data around the expected value. It is calculated using the following equation:

The variables take the following values:

Excel has two functions that determine variance: VAR.P (for the general population) and VAR.S (for a sample).


To make the calculation, select a cell below the numbers to be processed. Open the Insert Function dialog, select the "Statistical" category, choose one of the functions from the drop-down list and press "Enter".

Maximum and minimum

The maximum and minimum functions save you from searching manually through a large set of numbers for the largest or smallest one.

To calculate the maximum, select the whole range of numbers in the table plus a separate cell for the result, then click the arrow next to "Σ" ("AutoSum"). In the list that appears, select "Max" and press "Enter" to get the desired value.

Do the same to get the minimum; just select the "Min" function.

In this article I will talk about how to find standard deviation. This material is extremely important for a full understanding of mathematics, so a math tutor should devote a separate lesson or even several to studying it. In this article you will find a link to a detailed and understandable video tutorial that explains what standard deviation is and how to find it.

Standard deviation makes it possible to evaluate the spread of the values obtained by measuring a certain parameter. It is denoted by the symbol σ (the Greek letter "sigma").

The formula for calculation is quite simple. To find the standard deviation, you need to take the square root of the variance. So now you have to ask, “What is variance?”

What is variance

The definition of variance goes like this: variance is the arithmetic mean of the squared deviations of the values from the mean.

To find the variance, perform the following calculations in order:

  • Determine the mean (the simple arithmetic mean of the series of values).
  • Subtract the mean from each value and square the resulting difference (this gives the squared difference).
  • Calculate the arithmetic mean of the resulting squared differences (why squares, exactly, is explained below).

Let's look at an example. Let's say you and your friends decide to measure the height of your dogs (in millimeters). As a result of the measurements, you received the following height measurements (at the withers): 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.

Let's calculate the mean, variance and standard deviation.

First let's find the mean. As you already know, to do this you add up all the measured values and divide by the number of measurements. Calculation progress:

$$\bar{x} = \frac{600 + 470 + 170 + 430 + 300}{5} = \frac{1970}{5} = 394 \text{ mm}$$

So, the mean (arithmetic mean) is 394 mm.

Now we need to determine the deviation of each dog's height from the mean: 600 - 394 = 206; 470 - 394 = 76; 170 - 394 = -224; 430 - 394 = 36; 300 - 394 = -94 (all in mm).

Finally, to calculate the variance, we square each of these differences and then find the arithmetic mean of the results:

$$\sigma^2 = \frac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{5} = \frac{108520}{5} = 21704 \text{ mm}^2$$

Thus, the variance is 21704 mm².

How to find standard deviation

So how can we now calculate the standard deviation, knowing the variance? As we remember, we take its square root. That is, the standard deviation is equal to:

$$\sigma = \sqrt{21704} \approx 147 \text{ mm}$$ (rounded to the nearest whole number in mm).

Using this method, we found that some dogs (for example, Rottweilers) are very large dogs. But there are also very small dogs (for example, dachshunds, but you shouldn’t tell them that).

The most interesting thing is that the standard deviation carries useful information. Now we can show which of the height measurements fall within the interval we get by marking off one standard deviation on either side of the mean, here from 394 - 147 = 247 mm to 394 + 147 = 541 mm.

That is, using the standard deviation, we obtain a “standard” method that allows us to find out which of the values ​​is normal (statistical average), and which is extraordinarily large or, conversely, small.

What is standard deviation

But... everything will be a little different if we analyze sample data. In our example we considered the general population. That is, our 5 dogs were the only dogs in the world that interested us.

But if the data is a sample (values selected from a larger population), then the calculations need to be done differently.

If there are n values, then:

  • for the general population, divide by n when calculating the variance;
  • for a sample, divide by n - 1.

All other calculations are carried out in the same way, including the determination of the mean.

For example, if our five dogs are just a sample from the population of dogs (all dogs on the planet), we must divide by 4, not 5, namely:

Sample variance = 108520 / 4 = 27130 mm², where 108520 is the sum of the squared differences.

In this case, the standard deviation for the sample is equal to √27130 ≈ 165 mm (rounded to the nearest whole number).

We can say that we have made some “correction” in the case where our values ​​are just a small sample.
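The dog example is easy to verify with Python's standard `statistics` module, which has separate functions for the population case (`pvariance`, `pstdev`) and the sample case (`variance`, `stdev`). This is only a cross-check of the arithmetic; the variable names are chosen for this sketch.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

heights_mm = [600, 470, 170, 430, 300]  # the five dogs from the example

average = mean(heights_mm)

# The five dogs are the whole population: divide by n.
population_variance = pvariance(heights_mm)
population_std = pstdev(heights_mm)

# The five dogs are only a sample: divide by n - 1.
sample_variance = variance(heights_mm)
sample_std = stdev(heights_mm)

print(average, population_variance, round(population_std))
print(sample_variance, round(sample_std))
```

The printed values reproduce the figures from the text: a mean of 394 mm, a variance of 21704 mm² with a standard deviation of about 147 mm for the population, and 27130 mm² with about 165 mm for the sample.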

Note. Why exactly squared differences?

But why do we take exactly the squared differences when calculating the variance? Let's say that when measuring some parameter you obtained the following set of deviations from the mean (differences): 4; 4; -4; -4. If we simply add the deviations together, the negative values cancel out the positive ones:

$$\frac{4 + 4 - 4 - 4}{4} = 0$$

It turns out that this option is useless. Then maybe it's worth trying the absolute values of the deviations (that is, their moduli)?

$$\frac{|4| + |4| + |-4| + |-4|}{4} = \frac{16}{4} = 4$$

At first glance, it turns out well (the resulting value, by the way, is called the mean absolute deviation), but not in all cases. Let's try another example. Let the measurement give the following set of deviations: 7; 1; -6; -2. Then the mean absolute deviation is:

$$\frac{|7| + |1| + |-6| + |-2|}{4} = \frac{16}{4} = 4$$

Wow! Again we got a result of 4, although the differences have a much larger spread.

Now let's see what happens if we square the differences (and then take the square root of their mean).

For the first example it will be:

$$\sqrt{\frac{4^2 + 4^2 + (-4)^2 + (-4)^2}{4}} = \sqrt{\frac{64}{4}} = 4$$

For the second example it will be:

$$\sqrt{\frac{7^2 + 1^2 + (-6)^2 + (-2)^2}{4}} = \sqrt{\frac{90}{4}} \approx 4.74$$

Now it’s a completely different matter! The greater the spread of the differences, the greater the standard deviation...which is what we were aiming for.

In fact, this method uses the same idea as when calculating the distance between points, only applied in a different way.

And from a mathematical point of view, using squares and square roots provides more benefits than we could get from absolute deviation values, making standard deviation applicable to other mathematical problems.
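The comparison in this note can be repeated in a few lines of Python. The two lists are the sets of differences from the note; the function names are invented for this sketch.

```python
from math import sqrt


def mean_absolute(deviations):
    """Average of the absolute values of the deviations."""
    return sum(abs(d) for d in deviations) / len(deviations)


def root_mean_square(deviations):
    """Square root of the average of the squared deviations."""
    return sqrt(sum(d * d for d in deviations) / len(deviations))


first = [4, 4, -4, -4]
second = [7, 1, -6, -2]

print(mean_absolute(first), mean_absolute(second))        # the same for both sets
print(root_mean_square(first), root_mean_square(second))  # larger for the second set
```

The mean absolute value is 4 for both sets, while the root of the mean square is 4 for the first and about 4.74 for the second, so only the squared version reacts to the wider spread.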

Sergey Valerievich told you how to find the standard deviation


