Lecture 5

School

East Carolina University
Department

Biostatistics

Course Code

BIOS 1500

Professor

Kevin O'brien

Description

Numerical Descriptive
Statistical Measures
BIOS 1500
Spring 2017
Book Chapter 1
Purpose
These measures are used to ‘characterize’ the distribution of a variable under study.
Some measure central tendency or location, others variability, and others shape.
They rely on the type of variable (or measurement) one is analyzing: Nominal, Ordinal,
Interval, or Ratio.
Measures of Frequency
The simple frequency with which each value of a variable occurs in the sample provides a
basic summary of the distribution.
This basic summary is often presented with relative frequencies or proportions (frequency
divided by the number of observations).
The frequency and relative frequency are the basic elements of a ‘frequency distribution’.
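As a minimal sketch of a frequency distribution, the following Python snippet counts each value and divides by the number of observations. The blood-type sample is hypothetical and is not data from the lecture.

```python
from collections import Counter

# Hypothetical nominal sample (blood types); NOT data from the lecture.
blood_types = ["A", "O", "B", "O", "A", "O", "AB", "O"]

n = len(blood_types)               # number of observations
frequency = Counter(blood_types)   # value -> simple frequency
# Relative frequency = frequency divided by the number of observations.
relative_frequency = {value: count / n for value, count in frequency.items()}

# Print the frequency distribution, most frequent value first.
for value, count in frequency.most_common():
    print(f"{value:>2}  frequency={count}  relative={relative_frequency[value]:.3f}")
```

The relative frequencies always sum to 1, which makes distributions from samples of different sizes comparable.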
Outlier
An outlier is a value in the sample that lies outside the pattern containing most of the
observed values. Can there be an outlier for a nominal variable?
Outliers can occur in either tail of a distribution and can be referred to as extreme values.
However, the term outlier is descriptive as it captures the idea of being outside the typical
pattern or ‘main body’ of values.
Outlier
One of the reasons for engaging in a descriptive statistical analysis is to identify outliers.
Another is to locate gaps in the distribution: areas of low or zero frequency values.
And, of course, to describe high frequency areas, variability, central tendency, and shape.
Describing Distributions with Numbers
One approach we have seen to describing a distribution is that of specifying the frequency
and relative frequency of each value and giving certain quantiles or percentiles of that
distribution.
Now we will expand our tool kit, though many of you may already know of the arithmetic
mean, the median, and the mode as measures of center; the variance, standard deviation,
and range as measures of variability; and skewness and kurtosis as measures of shape.
Measures of Central Tendency
Recall that the purpose here is to describe and possibly present a typical value or ‘location’
of where a large number of the values ‘cluster’.
Measures of center or location are most relevant for a unimodal distribution.
Often used measures are the Arithmetic Mean, the Median, and the Mode.
Saturday, April 29, 2017 KOB
Sample Values
Suppose we have a sample of values for some variable Y. A common abstract way to show
a sample of size n is: $\{y_1, y_2, \ldots, y_n\}$.
Example: we have Y as weight in lbs, and a sample of n=5: {173, 110, 187, 210, 168}.
Arithmetic Mean
Usually just referred to as the mean. This measure of central tendency is obtained by taking
the set of values for a variable, adding them and then dividing the sum by the number of
values.
An often seen formula:
$$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
The $\Sigma$ is a symbol for summation.
Example
Using the 5 weights in lbs {173, 110, 187, 210, 168} the mean weight is:
$\bar{y} = (173 + 110 + 187 + 210 + 168)/5$
= 848/5 = 169.6 lbs.
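The same arithmetic can be checked with a few lines of Python. This is only a sketch of the sum-then-divide recipe; the variable names are illustrative.

```python
# The five weights (lbs) from the lecture example.
weights = [173, 110, 187, 210, 168]

n = len(weights)         # number of values
total = sum(weights)     # add the values
mean_weight = total / n  # divide the sum by the number of values

print(f"sum = {total}, n = {n}, mean = {mean_weight:.1f} lbs")
```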
Arithmetic Mean
The mean is a commonly used measure of centrality. It uses all of the data.
The mean will be influenced by outliers to one side (negative or positive). Large positive
values will drag the mean out toward the right hand tail. Large negative values will drag it
toward the left tail.
In very skewed distributions the mean may not be the best measure of central tendency.
Median
The median is the 50th percentile or second quartile. Half of the values are smaller and half
the values are larger. It is the middle value of the sample when the values are laid out from
smallest to largest.
To obtain the median the values are listed in ascending magnitude and ranked 1 to n.
If n is odd then the median is the value with the rank (n+1)/2.
If n is even then the median is the average of the values with ranks n/2 and (n/2)+1.
Example Median
{173, 110, 187, 210, 168} is our sample of weights.
Median Example
Value Rank
110 1
168 2
173 3
187 4
210 5
Here n is odd so the median is the value with rank (5+1)/2 = 3. The value with that rank is
173 lbs, and is the median.
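The ranking procedure can be sketched in Python as follows. The function name `median` and the four-value sample used for the even case are illustrative assumptions; ranks start at 1 while Python list indices start at 0, hence the `- 1`.

```python
def median(values):
    """Median by ranking: sort ascending, then take the middle rank(s)."""
    ordered = sorted(values)   # rank 1 is ordered[0], rank n is ordered[n - 1]
    n = len(ordered)
    if n % 2 == 1:
        # n odd: the value with rank (n + 1) / 2
        return ordered[(n + 1) // 2 - 1]
    # n even: average the two middle values, ranks n/2 and n/2 + 1
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

weights = [173, 110, 187, 210, 168]      # lecture sample, n = 5 (odd)
print(median(weights))                   # value with rank 3

# Hypothetical even-sized sample (the lecture sample without 210).
print(median([173, 110, 187, 168]))      # average of ranks 2 and 3
```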
Median
The median relies only on the middle one or two values, and the ranks of all the values.
Because of this it is not overly influenced by an extremely large value or one that is extremely
small.
If you replace the 210 lb value with 1000 lbs the mean would shift to 327.6 lbs while the
median is still 173 lbs.
As such the median may be the preferred measure of central tendency in highly skewed
distributions.
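A short Python check of the 1000 lb replacement, using the standard library's `statistics` module, shows the mean moving while the median stays put. The name `with_outlier` is illustrative.

```python
from statistics import mean, median

weights = [173, 110, 187, 210, 168]        # lecture sample (lbs)
# Same sample with the 210 lb value replaced by 1000 lbs.
with_outlier = [1000 if w == 210 else w for w in weights]

print(f"original:     mean = {mean(weights):.1f}, median = {median(weights)}")
print(f"with outlier: mean = {mean(with_outlier):.1f}, median = {median(with_outlier)}")
```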
It is rare that persons use the median in their data analysis or report it in journals.
Mode
The Mode is the most frequently occurring value for the variable in the sample.
Note that this may make it a great measure of central tendency. Neither the mean nor the
median may be observed values in the sample, but the mode always will be an actual
observed value. So it has a ‘leg up’ on description especially in unimodal situations.
However it is a ‘large’ sample statistic. In small samples a mode may not exist.
Again it is rare that you see it used in research or scientific publications.
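One way to find the mode in Python is sketched below. Treating a sample in which every value occurs exactly once as having no mode is an assumption made here to match the small-sample remark above; other conventions exist. The helper name `modes` and the second sample are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Assumed convention: every value occurs once, so no mode exists.
        return []
    return [value for value, count in counts.items() if count == highest]

weights = [173, 110, 187, 210, 168]   # lecture sample: all values distinct
print(modes(weights))                 # no mode in this small sample

ratings = [2, 3, 3, 4, 3, 5]          # hypothetical sample with repeats
print(modes(ratings))
```

Returning a list also covers bimodal samples, where two values tie for the highest frequency.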
Applications
Type      Mean  Median  Mode
Nominal   no    no      yes
Ordinal   ok    yes     yes
Interval  yes   yes     yes
Ratio     yes   yes     yes
Measures of Variability
The concepts of variability and spread attempt to capture how different the set of values are,
and how spread out the distribution may be.
All measures of variability require interval or ratio data as they are based on numerical
differences.
Range
The range is the difference between the maximum value and the minimum value in the
sample.
The range clearly gives us some idea of the spread of the distribution.
The value of the range is greatly influenced by extreme values.
Example
{173, 110, 187, 210, 168} is our sample of weights. The maximum value is 210 lbs and the
minimum is 110 lbs.
The range is: 210 lbs – 110 lbs = 100 lbs
The range is 100 lbs, but it is common practice to give the min and max values and call them
the range. That practice is incorrect but omnipresent.
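In Python the range, as a single number, can be computed as below. This is a sketch only; the variable names are illustrative.

```python
weights = [173, 110, 187, 210, 168]   # lecture sample (lbs)

lowest, highest = min(weights), max(weights)
sample_range = highest - lowest       # the range is one number: max minus min

print(f"min = {lowest} lbs, max = {highest} lbs, range = {sample_range} lbs")
```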
Inter Quartile Range
The Inter Quartile Range (IQR) is defined as the difference between the 75th and 25th
percentiles, or the third and first quartiles.
The value of the IQR has an advantage of not being susceptible to outliers or extreme
values.
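The sketch below computes an IQR with Python's `statistics.quantiles`. The notes do not say which quartile convention the course uses, so the `"inclusive"` method here is an assumption; other conventions can give noticeably different quartiles in a sample this small. The five-weight sample from the earlier sections is reused, along with a copy in which 210 lbs is replaced by 1000 lbs.

```python
from statistics import quantiles

def iqr(values, method="inclusive"):
    """Third quartile minus first quartile.

    The quartile convention ("inclusive") is an assumption; the lecture
    does not specify one.
    """
    # With n=4, quantiles() returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = quantiles(values, n=4, method=method)
    return q3 - q1

weights = [173, 110, 187, 210, 168]        # sample from earlier sections
with_outlier = [173, 110, 187, 1000, 168]  # 210 replaced by an extreme value

for label, sample in [("original", weights), ("with outlier", with_outlier)]:
    sample_range = max(sample) - min(sample)
    print(f"{label}: range = {sample_range}, IQR = {iqr(sample)}")
```

Under this convention the extreme value changes the range a great deal but leaves the IQR unchanged, because the quartiles depend only on the middle of the ordered sample.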
Example
Weights in lbs Rank
110 1
133 2
137 3
143 4
145