
# BIOS 1500 Lecture 5: Lecture_5_Descriptive Statistical Measures_2017


Department
Biostatistics
Course Code
BIOS 1500
Professor
Kevin O'brien

Description
Numerical Descriptive Statistical Measures
BIOS 1500, Spring 2017
Book Chapter 1

Purpose
These measures are used to ‘characterize’ the distribution of a variable under study. Some measure central tendency or location, others variability, and others shape. Which measures apply depends on the type of variable (or measurement) one is analyzing: Nominal, Ordinal, Interval, or Ratio.

Measures of Frequency
The simple frequency with which each value of a variable occurs in the sample provides a basic summary of the distribution. This summary is often presented together with relative frequencies or proportions (frequency divided by the number of observations). The frequency and relative frequency are the basic elements of a ‘frequency distribution’.

Outlier
An outlier is a value in the sample that lies outside the pattern containing most of the observed values. Can there be an outlier for a nominal variable? Outliers can occur in either tail of a distribution and can be referred to as extreme values. However, the term outlier is descriptive, as it captures the idea of being outside the typical pattern or ‘main body’ of values.

One of the reasons for engaging in a descriptive statistical analysis is to identify outliers. Another is to locate gaps in the distribution: areas of low or zero frequency. And, of course, to describe high-frequency areas, variability, central tendency, and shape.

Describing Distributions with Numbers
One approach we have seen to describing a distribution is that of specifying the frequency and relative frequency of each value and giving certain quantiles or percentiles of that distribution. Now we will expand our tool kit, though many of you may already know the arithmetic mean, the median, and the mode as measures of center; the variance, standard deviation, and range as measures of variability; and skewness and kurtosis as measures of shape.
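The frequency distribution described under Measures of Frequency can be sketched in a few lines of Python. This is only an illustration: the blood-type sample below is hypothetical and does not come from the lecture, and `freq` and `rel_freq` are names chosen for this sketch.

```python
from collections import Counter

# Hypothetical sample of a nominal variable (blood type); not from the lecture.
sample = ["O", "A", "O", "B", "A", "O", "AB", "O"]

n = len(sample)

# Frequency: how many times each value occurs in the sample.
freq = Counter(sample)

# Relative frequency: frequency divided by the number of observations.
rel_freq = {value: count / n for value, count in freq.items()}

# Print the frequency distribution, most frequent value first.
for value, count in freq.most_common():
    print(f"{value:>2}  frequency={count}  relative frequency={rel_freq[value]:.3f}")
```

The relative frequencies always sum to 1, which is a quick check that no observation was dropped from the tally.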
Measures of Central Tendency
Recall that the purpose here is to describe, and possibly present, a typical value or ‘location’ where a large number of the values ‘cluster’. Measures of center or location are most relevant for a unimodal distribution. Often-used measures are the Arithmetic Mean, the Median, and the Mode.

Saturday, April 29, 2017 KOB

Sample Values
Suppose we have a sample of values for some variable Y. A common abstract way to show a sample of size n is: $\{y_1, y_2, \ldots, y_n\}$. Example: we have Y as weight in lbs, and a sample of n = 5: {173, 110, 187, 210, 168}.

Arithmetic Mean
Usually just referred to as the mean. This measure of central tendency is obtained by taking the set of values for a variable, adding them, and then dividing the sum by the number of values. An often-seen formula:

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

The $\Sigma$ is a symbol for summation.

Example
Using the 5 weights in lbs {173, 110, 187, 210, 168}, the mean weight is:

$$\bar{y} = \frac{173 + 110 + 187 + 210 + 168}{5} = \frac{848}{5} = 169.6 \text{ lbs}$$

Arithmetic Mean
The mean is a commonly used measure of centrality. It uses all of the data. The mean is influenced by outliers on one side (negative or positive). Large positive values drag the mean toward the right-hand tail. Large negative values drag it toward the left tail. In very skewed distributions the mean may not be the best measure of central tendency.

Median
The median is the 50th percentile, or second quartile. Half of the values are smaller and half of the values are larger. It is the middle value of the sample when the values are laid out from smallest to largest. To obtain the median, the values are listed in ascending magnitude and ranked 1 to n. If n is odd, the median is the value with rank (n+1)/2. If n is even, the median is the average of the values with ranks n/2 and n/2 + 1.

Example: Median
{173, 110, 187, 210, 168} is our sample of weights.
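As a quick check of the arithmetic mean worked out above, the same calculation can be sketched in Python. The weights are the lecture's sample; the helper name `arithmetic_mean` is hypothetical and chosen only for this illustration.

```python
# Sample of n = 5 weights in lbs, taken from the lecture example.
weights = [173, 110, 187, 210, 168]


def arithmetic_mean(values):
    """Add the values, then divide the sum by the number of values."""
    return sum(values) / len(values)


total = sum(weights)                    # numerator of the formula
mean_weight = arithmetic_mean(weights)  # sum divided by n

print(f"sum = {total}, n = {len(weights)}, mean = {mean_weight} lbs")
```

Every value enters the sum, which is why a single extreme value can move the mean noticeably.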
Median Example

| Value | Rank |
|-------|------|
| 110   | 1    |
| 168   | 2    |
| 173   | 3    |
| 187   | 4    |
| 210   | 5    |

Here n is odd, so the median is the value with rank (5+1)/2 = 3. The value with that rank is 173 lbs, and that is the median.

Median
The median relies only on the middle one or two values and the ranks of all the values. Because of this, it is not overly influenced by an extremely large or extremely small value. If you replace the 210 lb value with 1000 lbs, the mean shifts to 327.6 lbs while the median is still 173 lbs. As such, the median may be the preferred measure of central tendency in highly skewed distributions. It is rare that persons use the median in their data analysis or report it in journals.

Mode
The mode is the most frequently occurring value for the variable in the sample. Note that this may make it a great measure of central tendency. Neither the mean nor the median need be an observed value in the sample, but the mode will always be an actual observed value. So it has a ‘leg up’ on description, especially in unimodal situations. However, it is a ‘large’ sample statistic. In small samples a mode may not exist. Again, it is rare that you see it used in research or scientific publications.

Applications

| Type     | Mean | Median | Mode |
|----------|------|--------|------|
| Nominal  | no   | no     | yes  |
| Ordinal  | ok   | yes    | yes  |
| Interval | yes  | yes    | yes  |
| Ratio    | yes  | yes    | yes  |

Measures of Variability
The concepts of variability and spread attempt to capture how different the values in the set are and how spread out the distribution may be. All measures of variability require interval or ratio data, as they are based on numerical differences.

Range
The range is the difference between the maximum value and the minimum value in the sample. The range clearly gives us some idea of the spread of the distribution. The value of the range is greatly influenced by extreme values.

Example
{173, 110, 187, 210, 168} is our sample of weights. The maximum value is 210 lbs and the minimum is 110 lbs.
The range is: 210 lbs - 110 lbs = 100 lbs.

The range is 100 lbs, but it is common practice to give the min and max values and call them the range. That practice is incorrect but omnipresent.

Inter Quartile Range
The Inter Quartile Range (IQR) is defined as the difference between the 75th and 25th percentiles, or the third and first quartiles. The IQR has the advantage of not being susceptible to outliers or extreme values.

Example

| Weights in lbs | Rank |
|----------------|------|
| 110            | 1    |
| 133            | 2    |
| 137            | 3    |
| 143            | 4    |
| 145            |      |
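The rank-based median rule and the range (maximum minus minimum) can be sketched in Python using the lecture's five-weight sample. The helper names `median_by_rank` and `sample_range` are hypothetical, chosen for this illustration. For even n the sketch averages the values with ranks n/2 and n/2 + 1, the usual definition of the two middle values. The IQR is left out because percentile conventions differ between textbooks and software.

```python
def median_by_rank(values):
    """Median via ranks: sort ascending, then pick the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the value with rank (n + 1) / 2 (ranks start at 1).
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the values with ranks n/2 and n/2 + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def sample_range(values):
    """Range as a single number: maximum minus minimum."""
    return max(values) - min(values)


weights = [173, 110, 187, 210, 168]        # lecture sample, in lbs
with_outlier = [173, 110, 187, 1000, 168]  # 210 replaced by 1000, as in the notes

print("median:", median_by_rank(weights), "lbs")
print("range:", sample_range(weights), "lbs")

# The mean moves with the extreme value; the median does not.
print("mean with outlier:", sum(with_outlier) / len(with_outlier), "lbs")
print("median with outlier:", median_by_rank(with_outlier), "lbs")
```

Running the sketch on the modified sample shows the contrast drawn in the notes: the range and the mean both react strongly to the single 1000 lb value, while the median stays at the middle rank.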