Nuclear Safeguards Education Portal
  

Mean and Variance of a Dataset

When we make a measurement, we are trying to determine the true value of a particular characteristic of some material or process. However, every measurement involves uncertainty (i.e., no measurement is perfect), so the value we obtain is not the true value but an estimate of it. If we measure the same sample 20 times, we do not expect to obtain the same measured value every time; instead, we get a distribution of measured values. For example, Figure 1 shows the results of 20 measurements of the same sample. We need to be able to determine our best estimate of the true value of the sample based on this distribution.

Figure 1. Measured values from 20 measurements of the same sample.

If we plot the frequency of each measured value, we obtain the plot shown in Figure 2. This distribution suggests that the next measurement we make is likely to be close to 8 or 9, but there is some chance that it will be higher or lower than that. However, it is very unlikely that it will be lower than 3 or higher than 16.
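As a minimal sketch of how such a frequency plot can be tabulated, the snippet below counts how often each value occurs in a list of readings. The `measurements` list and the `frequency_table` helper are hypothetical illustrations; the values are made up and are not the data plotted in Figures 1 and 2.

```python
from collections import Counter

# Hypothetical integer readings from 20 repeated measurements of one sample.
# Illustrative values only; NOT the data behind Figures 1 and 2.
measurements = [9, 8, 12, 7, 9, 10, 5, 8, 11, 9,
                6, 8, 13, 9, 7, 10, 8, 4, 12, 9]


def frequency_table(values):
    """Return (value, count) pairs sorted by measured value."""
    counts = Counter(values)
    return sorted(counts.items())


table = frequency_table(measurements)
for value, count in table:
    # One '#' per occurrence gives a quick text histogram.
    print(f"{value:>3} | {'#' * count}")
```

The tallest rows of the printed histogram mark the values that a further measurement is most likely to fall near.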

Figure 2. Frequency of each measurement value occurring in the dataset of 20 measurements.

We characterize this dataset by calculating the experimental mean ($x_{\text{mean}}$), given by

$$x_{\text{mean}} = \frac{1}{N} \sum_{n=1}^{N} x_n$$

where $x_n$ are the measured values for the $n = 1, 2, \ldots, N$ data points. In this example, $N$ was equal to 20. This experimental mean tells us that "on average" we expect the true value of $x$ to be close to $x_{\text{mean}}$. But how close would we expect the true value to be to $x_{\text{mean}}$? We can begin to estimate that by looking at the variance of the dataset.

We characterize the spread in this dataset by calculating the experimental variance ($x_{\text{variance}}$) of the dataset, given by

$$x_{\text{variance}} = \frac{1}{N-1} \sum_{n=1}^{N} \left( x_n - x_{\text{mean}} \right)^2$$

The standard deviation is the square root of the variance. A large standard deviation implies that if we make another measurement, then we will have a low confidence that it will be close to the mean. A small standard deviation means that if we make another measurement, then we have a high confidence that it will be close to the mean. One measure of the size of the standard deviation is given by the relative standard deviation (S), which is the ratio of the standard deviation to the mean or

$$S = \frac{\sqrt{x_{\text{variance}}}}{x_{\text{mean}}}$$

For the dataset shown in Figure 2, the experimental mean is 8.80 and the standard deviation is 2.78. The relative standard deviation is 0.317 or 31.7%. This suggests that there is a large variation in the measured data points. If we make one more measurement, we would have a low confidence that it would be close to the mean.
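The quantities defined above can be sketched in a few lines of Python. This is only an illustration under two assumptions: the variance uses the N − 1 (sample) denominator, and the `measurements` list is hypothetical data, not the dataset from Figure 2. The function names are chosen here for readability and do not come from the original material.

```python
import math


def mean(values):
    """Experimental mean: sum of the readings divided by their number N."""
    return sum(values) / len(values)


def sample_variance(values):
    """Experimental variance, assuming the N - 1 (sample) denominator."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


def std_dev(values):
    """Standard deviation: square root of the variance."""
    return math.sqrt(sample_variance(values))


def relative_std(values):
    """Relative standard deviation S: standard deviation divided by the mean."""
    return std_dev(values) / mean(values)


# Hypothetical readings (illustrative only).
measurements = [9, 8, 12, 7, 9, 10, 5, 8, 11, 9,
                6, 8, 13, 9, 7, 10, 8, 4, 12, 9]

print(f"mean     = {mean(measurements):.2f}")
print(f"variance = {sample_variance(measurements):.2f}")
print(f"std dev  = {std_dev(measurements):.2f}")
print(f"S        = {relative_std(measurements):.1%}")
```

A larger printed value of S indicates a wider spread relative to the mean, and therefore lower confidence that a further measurement will land close to the mean.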

The spread of this distribution depends on the uncertainty in the measured values and, specifically, on the uncertainties of the measurement instrument used. Assume we have a sample whose true mass is 20.00 g. We make 155 measurements of this sample using an instrument that has low uncertainties. The value of each of the 155 measurements is shown in Figure 3, and a frequency plot showing the distribution of these measured values is shown in Figure 4. The experimental mean of these values is 19.986 g and the standard deviation is 1.034 g. The relative standard deviation (S) of this dataset is 5.2%. Thus, if we make another measurement of this sample, we would have high confidence that it would be close to the mean. Note that since we know the true value of the characteristic (20.00 g), we can confirm that the experimental mean differs from the true value by only a small amount (0.014 g).
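As a check on the arithmetic, the quoted relative standard deviation and the offset from the true value follow directly from the reported mean and standard deviation:

```latex
S = \frac{1.034\ \text{g}}{19.986\ \text{g}} \approx 0.0517 \approx 5.2\%,
\qquad
\lvert 20.00\ \text{g} - 19.986\ \text{g} \rvert = 0.014\ \text{g}
```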

Figure 3. Measurement values (in grams) from 155 measurements of a sample using an instrument with low measurement uncertainty.

Figure 4. Frequency of each measurement value occurring in the dataset of 155 measurements with low uncertainty.

Now assume that we make another 155 measurements of the same 20.00 g sample, but this time we use an instrument that has a higher uncertainty. The value of each of the 155 measurements is shown in Figure 5, and a frequency plot showing the distribution of these measured values is shown in Figure 6. The experimental mean of these values is 19.956 g and the standard deviation is 3.102 g. The relative standard deviation is 15.5%. Thus, if we perform an additional measurement, we have lower confidence than in the previous example that it will be close to the experimental mean.
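The same arithmetic applied to this second dataset, together with a comparison against the low-uncertainty instrument (standard deviation 1.034 g), shows that the spread is three times as wide while the mean barely moves:

```latex
S = \frac{3.102\ \text{g}}{19.956\ \text{g}} \approx 0.155 = 15.5\%,
\qquad
\frac{3.102\ \text{g}}{1.034\ \text{g}} = 3.00
```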

Figure 5. Measurement values (in grams) from 155 measurements of a sample using an instrument with medium measurement uncertainty.

Figure 6. Frequency of each measurement value occurring in the dataset of 155 measurements with medium uncertainty.

The examples above show how repeated measurements of the same sample with the same instrument can be used to determine characteristics of the measurement system. By measuring the same sample over and over again, we can determine the expected uncertainties of the measurement instrument and the shape of the distribution of measured values.
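The idea of characterizing an instrument through repeated measurements can be explored with a small simulation. The sketch below rests on assumptions that are not stated in the text: the instrument noise is taken to be Gaussian, and the noise levels of 1.0 g and 3.0 g are chosen only to resemble the two examples above. The `simulate` and `summarize` helpers are hypothetical names, and the simulated readings are not the data shown in Figures 3 to 6.

```python
import random
import statistics

TRUE_MASS_G = 20.00   # true sample mass quoted in the text
N_MEASUREMENTS = 155  # number of repeated measurements quoted in the text


def simulate(sigma_g, seed, n=N_MEASUREMENTS):
    """Draw n readings, ASSUMING Gaussian noise with standard deviation sigma_g."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.gauss(TRUE_MASS_G, sigma_g) for _ in range(n)]


def summarize(readings):
    """Return (mean, standard deviation, relative standard deviation)."""
    m = statistics.mean(readings)
    s = statistics.stdev(readings)  # square root of the N - 1 variance
    return m, s, s / m


# Assumed noise levels (in grams) for the two hypothetical instruments.
for label, sigma, seed in [("low", 1.0, 1), ("higher", 3.0, 2)]:
    m, s, rsd = summarize(simulate(sigma, seed))
    print(f"{label:>6} uncertainty: mean = {m:.3f} g, "
          f"std = {s:.3f} g, S = {rsd:.1%}")
```

In both runs the experimental mean should stay close to 20 g, while the standard deviation recovered from the readings tracks the noise level assumed for each instrument.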
