Statistics 9/9/2013 10:59:00 AM
Polls, studies, surveys & other data-collecting tools collect data from a
small part of a larger group so that we can learn something about the larger
group. This is a common & important goal of statistics: learn about a large group.
Data- collections of observations (such as measurements, genders, survey responses)
Statistics- science of planning studies and experiments, obtaining data, and
then organizing, summarizing, presenting, analyzing, interpreting, and
drawing conclusions based on data.
Population- the complete collection of all individuals (scores, people,
measurements) to be studied; the collection is complete in the sense that it
includes all of the individuals being studied
Census- collection of data from every member of population *all students at
ODU taking STAT 130 (971 students)
Sample- subcollection of members selected from a population *students in
Ms. Hinton's 130 class (200 students)
Context: What do the values represent?
Where did the data come from?
Why were they collected?
-An understanding of context will directly affect the statistical procedure used
Source of Data: Is the source objective? (clear goal?)
Is the source biased? (data partial to collector?)
Is there incentive to distort/spin results to support some self-serving interest?
Is there something to gain/lose by distorting results?
Be vigilant and skeptical of studies from sources that may be biased.
Sampling Method: The method chosen can have great influence on the
validity.
-voluntary response (not necessarily valid)
-other methods are more likely to produce good results.
Conclusions: -make statements that are clear to those without an
understanding of statistics and terminology
-avoid making statements not justified by statistical analysis
Practical Implications- state practical implications of the results; results may
have statistical but not practical significance
Statistical significance:- likelihood of getting the results by chance
-if results could easily occur by chance, then not statistically significant
-if likelihood of getting results is small, results are statistically significant
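To make "likelihood of getting the results by chance" concrete, here is a minimal sketch, assuming a fair-coin scenario that is not part of the notes. The function name `prob_at_least` is hypothetical. It computes the chance of seeing k or more heads in n flips.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more successes in n independent trials,
    each with success chance p (binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 52 or more heads in 100 flips could easily occur by chance.
print(prob_at_least(52, 100))

# 70 or more heads in 100 flips is very unlikely to occur by chance.
print(prob_at_least(70, 100))
```

The first probability is large, so such a result would not be statistically significant; the second is very small, so it would be.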
Key concept- statistics is largely about using sample data to make inferences about an
entire population
Parameter- a numerical measurement describing some characteristic of a
population
Statistic- a numerical measurement describing some characteristic of a sample
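A small sketch of the parameter vs. statistic distinction. The scores are made-up illustrative data, and taking the first four values stands in for a sample only for simplicity (a real study would use a sound sampling method).

```python
from statistics import mean

# Hypothetical population: scores for every student in a small course.
population = [72, 85, 90, 66, 78, 95, 81, 70, 88, 75]

# Hypothetical sample: a subcollection of the population.
sample = population[:4]

parameter = mean(population)  # describes the whole population
statistic = mean(sample)      # describes only the sample

print(parameter, statistic)
```

The two numbers differ, which is why conclusions about a population drawn from a sample are inferences, not exact statements.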
Variables and Types of Data: Qualitative and Quantitative (discrete and
continuous). Discrete data are countable; continuous data can take any value in a range, including decimals.
Levels of Measurement
Nominal level: data that consists of names, labels, or categories only, and
the data cannot be arranged in an ordering scheme (low to high) ex. Yes,
no, undecided survey
Ordinal level: data that can be arranged in some order, but differences
between data values either can't be determined or are meaningless. Ex:
course grades: ABCD
Interval level: like ordinal level, with the additional property that the difference
between any 2 values is meaningful; however, there is no natural zero
starting point (where none of the quantity is present).
Ratio level: interval level w/ the additional property that there is a natural zero
starting pt. (where zero indicates that none of the quantity is present). Ex:
prices of college textbooks, distance
Important Characteristics of Data
1. Center- A representative value that indicates where the middle of the data
is located.
Frequency Distribution
-shows how a data set is partitioned among all of several categories (or
classes) by listing all of the categories along w/ the number of data values
in each category.
Constructing a Frequency Distribution for Qualitative Data
Step 1: List distinct values of the observations in the data set in the first
column of the table
Step 2: For each observation, place a tally mark in the second column of
the table in the row of the appropriate distinct value.
Step 3: Count number of tally marks and record it in the third column
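The three steps above can be sketched in a few lines. The survey responses are hypothetical, and `Counter` does the tallying and counting in one pass.

```python
from collections import Counter

# Hypothetical survey responses (qualitative data).
responses = ["yes", "no", "yes", "undecided", "no", "yes", "yes", "no"]

# Steps 1-3: distinct values become keys, and each occurrence is tallied.
frequency = Counter(responses)

# Print value, tally marks, and frequency as three columns.
for value, count in frequency.items():
    print(f"{value:<10} {'|' * count:<6} {count}")
```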
Frequency Distributions for Quantitative Data
Constructing a Frequency Distribution
1. Determine the number of classes (should be between 5 and 20).
2. Calculate the class width (round up):
class width = (maximum value - minimum value) / (number of classes)
3. Starting point: Choose the minimum data value or a convenient value
below it as the first lower class limit.
4. Using the first lower class limit and class width, proceed to list the other
lower class limits.
5. List the lower class limits in a vertical column and proceed to enter the
upper class limits
6. Take each individual data value and put a tally mark in the appropriate
class. Add the tally marks to get the frequency.
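A minimal sketch of steps 1-6, assuming whole-number data and using the minimum value as the first lower class limit. The function name and the scores are hypothetical. The width is rounded up to the next whole number even when the division is exact, so the maximum value always lands in the last class.

```python
def frequency_distribution(data, num_classes):
    """Return a list of (lower limit, upper limit, frequency) for whole-number data."""
    low, high = min(data), max(data)
    # Step 2: class width = (max - min) / number of classes, rounded up.
    width = (high - low) // num_classes + 1
    table = []
    for i in range(num_classes):
        # Steps 3-5: lower limits start at the minimum and step by the width;
        # each upper limit is one less than the next lower limit.
        lower = low + i * width
        upper = lower + width - 1
        # Step 6: count the data values that fall in this class.
        freq = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, freq))
    return table

# Hypothetical test scores, grouped into 5 classes (step 1).
scores = [52, 55, 61, 64, 67, 70, 71, 73, 75, 78, 80, 82, 85, 88, 91, 94, 97, 99]
table = frequency_distribution(scores, 5)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```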
Relative Frequency Distribution
Cumulative Frequency Distribution- the cumulative frequency for a class is the sum of the frequencies for that class and all previous classes
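A short sketch of both ideas, assuming the usual textbook definitions: relative frequency is a class frequency divided by the total number of data values, and cumulative frequency is a running total of the class frequencies. The class frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical class frequencies from a frequency distribution.
frequencies = [3, 4, 4, 4, 3]
total = sum(frequencies)

# Relative frequency: class frequency divided by the total number of values.
relative = [f / total for f in frequencies]

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

print(relative)
print(cumulative)
```

The relative frequencies add up to 1 (100%), and the last cumulative frequency equals the total number of data values.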
Critical Thinking Interpreting Frequency Distributions
In later chapters, there will be frequent reference to data w/ a normal
distribution. One key characteristic of a normal distribution is that it has a
"bell" shape:
-The frequencies start low, then increase to one or two high frequencies,
then decrease to a low frequency
-The distribution is approximately symmetric, with frequencies preceding the
maximum being roughly a mirror image of those that follow the maximum.
-The presence of gaps can show that we have data from two or more
different populations. However, the converse is not true, because data from
different populations do not necessarily result in gaps.
SECTION 2.3 HISTOGRAMS-
Histogram- (graphic version of a freq. dist.) a graph consisting of bars of
equal width drawn adjacent to each other (w/o gaps). The horizontal scale
represents the classes of quantitative data values and the vertical scale represents
the frequencies. The heights of the bars correspond to the frequency values.
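A rough text-only sketch of a histogram built from a frequency distribution. The classes and frequencies are hypothetical, and the bars are drawn sideways (one row per class) instead of upright, so bar length here plays the role of bar height.

```python
# Hypothetical frequency distribution: (class label, frequency) pairs.
classes = [("52-61", 3), ("62-71", 4), ("72-81", 4), ("82-91", 4), ("92-101", 3)]

def text_histogram(classes, symbol="#"):
    """Return one line per class; the bar length equals the class frequency."""
    return [f"{label:>7} {symbol * freq}" for label, freq in classes]

lines = text_histogram(classes)
for line in lines:
    print(line)
```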
Critical Thinking Interpreting Histograms
- The objective is not simply to construct a histogram, but (pwpt)
Chapter 3
Center- There are four measures of center: Mean, Median, Mode & Midrange
1. Mean- (Arithmetic mean) the mean of a set of data is the sum of all data
values divided by the total number (#) of data values.
Mean = Σx / n
Σx - sum of all data values
n - total # of data values
x̄ ("x bar") - represents the sample mean
μ (mu) - represents the population mean
- relatively reliable, i.e. sample means tend to be more consistent than other
measures of center.
-takes every data value into account
-Sensitive to extreme values (outliers)
*The mean is not a resistant measure. Resistant measures are not
influenced by outliers.
$2.0, 4.9, 6.9, 2.1, 5.1, 3.2, 5.7, 6.6
Mean = 36.5 / 8 = 4.56 (rounded)
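A small sketch of why the mean is not a resistant measure. The data values are hypothetical and chosen only for illustration; the median is shown alongside for contrast.

```python
from statistics import mean, median

# Hypothetical data set with no extreme values.
values = [2, 4, 4, 5, 5, 6, 7, 7]

# The same data with one outlier appended.
with_outlier = values + [100]

# The mean is the sum of the values divided by how many there are.
print(mean(values), mean(with_outlier))

# The median barely reacts to the outlier.
print(median(values), median(with_outlier))
```

One extreme value pulls the mean from 5 to about 15.6, while the median stays at 5.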
2. Median- of a data set is the number that divides the bottom 50% from the
top 50%. With the original data value