Introduction to the Practice of Statistics

Introduction

- Statistics: The science of learning from data (numerical facts)

- Context of data includes understanding the variables that are recorded, some of which are measured with special instruments

- Individuals: Objects described by a set of data (sometimes people; when they are not people, they are called cases)

- Variables: Any characteristic of an individual; a variable can take different values for different individuals. 2 types:

•Categorical: Places an individual into one of two or more groups/categories (e.g. sex = female or male)

•Quantitative: Takes numerical values for which arithmetic operations such as adding or averaging make sense

•Distribution of a variable tells us what values it takes and how often it takes these values

1.1 Displaying Distributions with Graphs

- Exploratory Data Analysis: Using statistical tools and ideas to examine data in order to describe their main features. 2 basic strategies:

•Begin by examining each variable by itself, then move on to study relationships among variables

•Begin with a graph or graphs, then add numerical summaries of specific aspects of the data

- Categorical Variables:

•The distribution of a categorical variable lists the categories and gives the count or percent of individuals that fall in each category

•Graphs to Represent Variables:

A.Bar Graphs

B.Pie Chart: Requires that you include all categories that make up a WHOLE; use only when you want to emphasize each category’s relation to the whole
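
The distribution of a categorical variable described above (the count and percent of individuals in each category) can be tabulated in a few lines. This is only a minimal sketch: the data values and the function name `categorical_distribution` are made up for illustration and do not come from the notes.

```python
from collections import Counter

# Hypothetical values of a categorical variable (made up for illustration).
sex = ["female", "male", "female", "female", "male"]


def categorical_distribution(values):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


dist = categorical_distribution(sex)
for category, (count, percent) in dist.items():
    print(f"{category}: count={count}, percent={percent:.1f}%")
```

The counts are what a bar graph would display; the percents add up to 100 only because every category is included, which is the condition a pie chart needs.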

- Quantitative Variables:

www.notesolution.com

•Graphs to Represent Variables:

A.Histogram: Breaks the range of the variable into sub-intervals (classes, or bins) and displays only the count or percent of observations that fall into each class

X-axis covers the RANGE of the data

Can learn the “shape” of the distribution (e.g. bell shape, skewed left/right, unimodal)

Extreme values of the distribution are in its tails

Shows the distribution at a glance (visual inspection)

3 types of histograms:

1.Frequency: Height of the bar is the count (e.g. # of individuals whose mpg falls in the bin)

2.Relative Frequency: Height of the bar is the proportion of individuals that fall into that bin

3.Density: Area of the bar is the proportion of individuals in the data set that fall into that bin
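
To compare the three kinds of histogram, the sketch below computes the bar heights from the same bin counts. It assumes the usual definitions (relative frequency = count / total, density = relative frequency / bin width, so that the bar areas sum to 1). The counts, the bin width and the function name `histogram_heights` are hypothetical.

```python
def histogram_heights(counts, bin_width):
    """Return bar heights for frequency, relative-frequency and density histograms."""
    total = sum(counts)
    frequency = list(counts)                              # height = count
    relative = [c / total for c in counts]                # height = proportion
    density = [c / (total * bin_width) for c in counts]   # area = proportion
    return frequency, relative, density


# Hypothetical bin counts for three bins of equal width 10.
counts = [2, 5, 3]
bin_width = 10

freq, rel, dens = histogram_heights(counts, bin_width)
areas = [d * bin_width for d in dens]  # area of each density bar

print("frequency:", freq)
print("relative frequency:", rel)
print("density:", dens)
```

All three histograms have the same shape when the bins have equal width; only the vertical scale changes.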

Example: Distribution of IQ Scores:

1.Divide the range of the data into classes of equal width. The scores range from 81 to 145.

2.Count the # of individuals in each class; these counts are called frequencies, and the table of frequencies for all classes is a frequency table

3.Draw the histogram
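
The steps above can be sketched in code. The scores below are NOT the IQ data from the example; they are made-up values inside the stated 81 to 145 range, and the starting point 75 and class width 10 are assumptions chosen so that the classes cover that range. The function name `frequency_table` is also hypothetical.

```python
def frequency_table(values, start, width, n_classes):
    """Count observations in equal-width classes [low, low + width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(n_classes)
    ]


# Hypothetical scores (made up), all within 81-145.
iq_scores = [81, 89, 96, 101, 103, 104, 110, 112, 118, 124, 131, 145]

# Step 1: classes of equal width; step 2: count the individuals in each class.
table = frequency_table(iq_scores, start=75, width=10, n_classes=8)

# Step 3: a rough text histogram, one star per individual.
for low, high, count in table:
    print(f"{low:>3}-{high:<3} | {count:>2} {'*' * count}")
```

Each class includes its lower endpoint and excludes its upper one, so every observation lands in exactly one class.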

B.Stemplot/Stem-and-Leaf Plot: Gives a quick picture of the shape of the distribution while including the actual numerical values in the graph; works best for small numbers of observations that are all >0

How to make a Stemplot:

1.Separate each observation into a stem (all but the final digit) and a leaf (the final digit); stems may have as many digits as needed but each leaf contains only 1 digit

2.Write the stems in a vertical column with the smallest at the top and draw | at the right of the column

3.Write each leaf in the row to the right of its stem (optionally, in increasing order)

Stemplots do NOT work well for LARGE DATA SETS, where each stem must hold a large # of leaves. 2 modifications to distribute observations:

1.Splitting each stem into 2: One with leaves 0-4 and another with leaves 5-9

2.Truncating (rounding DOWN): Shortening the # by removing the last digit(s) before making the stemplot
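
The stemplot recipe and its two modifications can be sketched as follows, assuming non-negative whole-number observations. The function names `stemplot` and `truncate` and the data are hypothetical, not part of the notes.

```python
from collections import defaultdict


def truncate(values, digits=1):
    """Round DOWN by dropping the last `digits` digit(s) of each value."""
    return [v // 10 ** digits for v in values]


def stemplot(values, split=False):
    """Return the rows of a stemplot for non-negative integers.

    With split=True each stem gets two rows: leaves 0-4, then leaves 5-9.
    """
    rows = defaultdict(list)
    for v in sorted(values):            # sorted, so leaves come out in order
        stem, leaf = divmod(v, 10)      # leaf = final digit, stem = the rest
        half = 1 if (split and leaf >= 5) else 0
        rows[(stem, half)].append(leaf)

    stems = [s for s, _ in rows]
    width = len(str(max(stems)))
    lines = []
    for stem in range(min(stems), max(stems) + 1):   # smallest stem at the top
        for half in ((0, 1) if split else (0,)):
            leaves = "".join(str(leaf) for leaf in rows.get((stem, half), []))
            lines.append(f"{stem:>{width}} | {leaves}".rstrip())
    return lines


# Hypothetical observations (made up for illustration).
data = [81, 89, 96, 101, 103, 104, 110, 112]

print("\n".join(stemplot(data)))
print()
print("\n".join(stemplot(data, split=True)))
```

Stems with no observations are still printed so that gaps in the distribution stay visible. Calling `truncate` first, for example `stemplot(truncate(values))`, applies the second modification.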

- Examining Distributions:

•In any graph of data, look for the overall pattern and for striking deviations from that pattern

•You can describe the overall pattern of a distribution by its shape, center and spread

•An important kind of deviation is an outlier (an individual value that falls outside the overall pattern)

•Does the distribution have 1 or more major peaks (modes)? A distribution with 1 peak is called unimodal.

•A distribution is symmetric if the values smaller and larger than its midpoint are mirror images of each other. It is skewed to the right if the right tail (larger values) is much longer than the left tail (smaller values)

- Time Plots: A time plot of a variable plots each observation against the time at which it was measured. Time is on the horizontal scale and the variable is on the vertical scale. Connecting the data points with lines helps emphasize any change over time.

•Sometimes a time plot shows a trend (a persistent, long-term rise or fall)

•A pattern in a time series that repeats itself at known regular intervals of time is called seasonal variation. Because of seasonal variation, agencies often adjust their data for it (seasonally adjusted) to help avoid misinterpretation.

•Many interesting data sets are time series (measurements of a variable taken at regular intervals over time)

1.2 Describing Distributions with Numbers

- Measuring Center: The Mean – The average value

•To find the mean of a set of observations, add their values and divide by the number of observations
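
As a quick check of the recipe for the mean, here is a minimal sketch. The observations are made up for illustration and the function name `mean` is just a convenient label.

```python
def mean(observations):
    """Add the values and divide by the number of observations."""
    if not observations:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(observations) / len(observations)


# Hypothetical observations (made up for illustration).
observations = [2, 4, 6, 8]
x_bar = mean(observations)  # (2 + 4 + 6 + 8) / 4
print(x_bar)
```

Python's standard library offers the same calculation as `statistics.mean`.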
