# Section2.3revised.doc

Computer Science
Computer Science 1032A/B
Vicki Olds

## Measuring the Spread (Measures of Variation)

### Example

We are to compare four factories that produce 600-ml bottles of water, and we need to determine which factory is doing the best job. We take a sample of 5 bottles of water from a selected hour for each factory. The actual amount of water (in ml) poured into each 600-ml bottle is given below:

| Factory   | Bottle 1 | Bottle 2 | Bottle 3 | Bottle 4 | Bottle 5 |
|-----------|----------|----------|----------|----------|----------|
| Factory 1 | 598      | 602      | 600      | 597      | 603      |
| Factory 2 | 598      | 601      | 599      | 595      | 597      |
| Factory 3 | 604      | 591      | 598      | 605      | 592      |
| Factory 4 | 603      | 595      | 607      | 592      | 603      |

How do you determine which factory is doing the best job of filling the bottles?

### Main Measures of Spread

1. Range = highest observation − lowest observation
   • not very useful; it ignores the bulk of the observations
2. Standard Deviation
   • this is the most commonly used measure of spread
   • it is a measure of the variation of the data about its mean (it looks at how far the observations are from their average value)

### Formula

We first calculate what is called the variance, $s^2$, of a set of $n$ observations $x_1, x_2, \ldots, x_n$:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

The standard deviation is the square root of the variance, $s = \sqrt{s^2}$.

### Shortcut Formula

This form is a bit easier to use when doing calculations by hand:

$$s = \sqrt{\frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1}}$$

### Notes

1. The larger the value of $s$, the more scattered the observations are about the mean, $\bar{x}$ (the more spread out the distribution is)
2. Standard deviation should only be used when the mean is chosen as the measure of centre
3. $s = 0$ can only occur when all observations are the same (no spread)
   • otherwise, $s > 0$
4. $s$ is not a resistant measure of spread; it is influenced by outlying observations
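As a worked illustration, both formulas can be applied by hand to the Factory 1 sample from the example (598, 602, 600, 597, 603), with $n = 5$. The arithmetic is a sketch for checking one's own calculations.

$$\bar{x} = \frac{598 + 602 + 600 + 597 + 603}{5} = \frac{3000}{5} = 600$$

$$\sum (x_i - \bar{x})^2 = (-2)^2 + 2^2 + 0^2 + (-3)^2 + 3^2 = 26$$

$$s^2 = \frac{26}{5 - 1} = 6.5, \qquad s = \sqrt{6.5} \approx 2.55$$

Using the shortcut formula instead:

$$\sum x_i^2 = 1\,800\,026, \qquad \frac{\left(\sum x_i\right)^2}{n} = \frac{3000^2}{5} = 1\,800\,000$$

$$s = \sqrt{\frac{1\,800\,026 - 1\,800\,000}{4}} = \sqrt{6.5} \approx 2.55$$

Both routes give the same value, as they should.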

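The range, variance, and standard deviation defined above can also be computed for all four factory samples with a short script. The following is a minimal Python sketch, not part of the original notes. The names `samples`, `spread_summary`, and `shortcut_sd` are illustrative choices, and the data are copied from the example.

```python
import math
import statistics

# Sample data from the bottle-filling example (amounts in ml).
samples = {
    "Factory 1": [598, 602, 600, 597, 603],
    "Factory 2": [598, 601, 599, 595, 597],
    "Factory 3": [604, 591, 598, 605, 592],
    "Factory 4": [603, 595, 607, 592, 603],
}


def spread_summary(xs):
    """Return (mean, range, variance, standard deviation) for one sample."""
    n = len(xs)
    mean = sum(xs) / n
    data_range = max(xs) - min(xs)
    # Definitional formula: squared deviations from the mean, divided by n - 1.
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, data_range, variance, math.sqrt(variance)


def shortcut_sd(xs):
    """Standard deviation via the shortcut formula."""
    n = len(xs)
    return math.sqrt((sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1))


for name, xs in samples.items():
    mean, data_range, variance, sd = spread_summary(xs)
    # Cross-check against the shortcut formula and the standard library.
    assert math.isclose(sd, shortcut_sd(xs))
    assert math.isclose(sd, statistics.stdev(xs))
    print(f"{name}: mean={mean:.1f}  range={data_range}  "
          f"s^2={variance:.2f}  s={sd:.2f}")
```

On these samples, Factories 1 and 2 have the smallest standard deviations (about 2.55 and 2.24), while Factories 3 and 4 are far more variable (about 6.52 and 6.24). Factories 1 and 4 average exactly 600 ml, whereas Factories 2 and 3 average 598 ml, so centre and spread both need to be considered when judging which factory fills best.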