
# PSYB07H3 Chapter Notes - Chapter 2: Interquartile Range, Exploratory Data Analysis, Frequency Distribution

Department
Psychology
Course Code
PSYB07H3
Professor
Dwayne Pare
Chapter
2

Textbook Notes B07 September 21, 2016
Chapter #2 Lec 2
Chapter #2 Describing and Exploring Data
2.1 – Plotting Data
- One of the simplest ways to reorganize data and make them more intelligible is to
plot them in some sort of graphical form:
o Frequency distributions
o Histograms
o Stem-and-leaf displays
Frequency Distributions
- Frequency distribution: a way of organizing data in some sort of logical order, typically by listing each value alongside the number of times it occurs.
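A frequency distribution can be sketched in a few lines of Python. This is only an illustration: the scores in `scores` and the function name `frequency_distribution` are made up for the example and do not come from the textbook.

```python
from collections import Counter

# Hypothetical scores, invented purely for illustration.
scores = [3, 5, 4, 3, 5, 5, 2, 4, 5, 3]


def frequency_distribution(values):
    """Return (value, frequency) pairs, ordered from smallest to largest value."""
    counts = Counter(values)
    return sorted(counts.items())


freq_table = frequency_distribution(scores)

# Print a simple two-column table: value, then how often it occurred.
print("Value  Freq")
for value, count in freq_table:
    print(f"{value:>5}  {count:>4}")
```

Sorting the values is what puts the data "in logical order"; the counts then show at a glance which scores are common and which are rare.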
2.2 – Histograms
- Histogram: a graph in which rectangles represent the frequencies of observations
within each interval.
- Goal: to obscure some of the random “noise” that is not likely to be meaningful while
still preserving important trends in the data.
- Real lower limit: the point halfway between the top of the interval below and the bottom of
this interval.
o The smallest value that would be classed as falling into the interval.
- Real upper limit: the point halfway between the top of this interval and the bottom of
the next.
o The largest value that would be classed as being in the interval.
- Midpoints: the averages of the upper and lower limits, presented for
convenience.
o When we plot the data, we often plot the points as if they all fell at the
midpoints of their respective intervals.
- People often ask about the optimal number of intervals to use when grouping data.
o There are at least six possible rules for determining the optimal number of intervals,
o but these rules are primarily intended for those writing software, who need rules
to handle the general case.
o Somewhere around 10 or 12 intervals is usually reasonable.
o It is best to use natural breaks in the number system.
o If another kind of limit makes the data more interpretable, then use those limits.
- Outlier: a value that is widely separated from the rest of the data.
Outliers frequently represent errors in recording data, but in the example discussed here the outlier was
just a trial in which the subject couldn’t decide which button to push.
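As a rough sketch of how intervals, real limits, and midpoints fit together, the Python below groups whole-number scores into intervals that start at natural breaks (multiples of 10). The scores in `times`, the function name `grouped_frequencies`, and the width of 10 are illustrative assumptions, not values from the textbook. The half-unit adjustment also assumes scores recorded as whole numbers.

```python
# Hypothetical whole-number scores, invented purely for illustration.
times = [36, 42, 45, 51, 53, 58, 58, 60, 67, 72]


def grouped_frequencies(values, width=10):
    """Group integer scores into intervals of `width` starting at multiples of `width`.

    Each row is (low, high, real_lower, real_upper, midpoint, frequency).
    """
    start = (min(values) // width) * width  # natural break at or below the minimum
    rows = []
    while start <= max(values):
        low, high = start, start + width - 1  # apparent limits, e.g. 30-39
        real_lower = low - 0.5                # halfway between 29 and 30
        real_upper = high + 0.5               # halfway between 39 and 40
        midpoint = (real_lower + real_upper) / 2
        frequency = sum(1 for v in values if low <= v <= high)
        rows.append((low, high, real_lower, real_upper, midpoint, frequency))
        start += width
    return rows


rows = grouped_frequencies(times)

# A text histogram: one asterisk per observation in the interval.
for low, high, real_lower, real_upper, midpoint, frequency in rows:
    print(f"{low}-{high}  [{real_lower}, {real_upper}]  mid={midpoint}  {'*' * frequency}")
```

Changing `width` changes the number of intervals, which is one way to experiment with the "around 10 or 12 intervals" guideline on a larger data set.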
2.3 – Fitting Smoothed Lines to Data
- A number of people have pointed out that histograms, common as they are,
often fail to give a clear description of the data.
Fitting a Normal Curve
- We will often assume that our data are normally distributed, and superimposing a
normal distribution on the histogram will give us some idea of how reasonable that
assumption is.
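One way to check the normality assumption numerically is to compare the observed frequency in each interval with the frequency a normal curve would predict. The sketch below does this with Python's standard `statistics.NormalDist`. The scores in `times`, the interval limits, and the helper name `expected_count` are illustrative assumptions rather than anything from the textbook.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical whole-number scores, invented purely for illustration.
times = [36, 42, 45, 51, 53, 58, 58, 60, 67, 72]

# A normal distribution with the same mean and standard deviation as the sample.
fitted = NormalDist(mean(times), stdev(times))

# Real limits of the intervals 30-39, 40-49, ..., 70-79 (assumed for this example).
intervals = [(29.5, 39.5), (39.5, 49.5), (49.5, 59.5), (59.5, 69.5), (69.5, 79.5)]


def expected_count(dist, real_lower, real_upper, n):
    """Number of the n scores a normal curve would place between the real limits."""
    return n * (dist.cdf(real_upper) - dist.cdf(real_lower))


observed = [sum(1 for t in times if lo < t < hi) for lo, hi in intervals]
expected = [expected_count(fitted, lo, hi, len(times)) for lo, hi in intervals]

for (lo, hi), obs, exp in zip(intervals, observed, expected):
    print(f"{lo:>5}-{hi:<5} observed={obs}  expected under normal={exp:.2f}")
```

If the observed and expected columns are far apart, the superimposed curve would visibly miss the histogram bars, which suggests the normality assumption is questionable.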
Kernel Density Plot
- Kernel density plot: a plot that fits a smooth curve to the data while at the same
time taking account of the fact that there is a lot of random noise in the observations
that should not be allowed to distort the curve too much. Kernel density plots pay no
attention to the mean and standard deviation of the observations.
- The idea behind a kernel density plot is that each observation might have been slightly
different.
o For example, on a trial where the respondent’s reaction time was 80 hundredths
of a second, the score might reasonably have been 79 or 82 instead.
o It is even conceivable that the score could have been 73 or 86, but it is not at all
likely that the score would have been 20 or 100.
o In other words, there is a distribution of alternative possibilities around any
obtained value, and this is true for all obtained values.
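The "distribution of alternative possibilities around each obtained value" can be sketched by placing a small bell-shaped curve on every observation and averaging the curves. The code below is a minimal version of that idea. The Gaussian kernel, the bandwidth of 4, the reaction times in `reaction_times`, and the function name `kernel_density` are all assumptions made for illustration. The textbook's plots may use a different kernel or bandwidth.

```python
from statistics import NormalDist

# Hypothetical reaction times in hundredths of a second, invented for illustration.
reaction_times = [62, 70, 75, 79, 80, 80, 82, 86, 91]

STANDARD_NORMAL = NormalDist(0, 1)


def kernel_density(x, observations, bandwidth):
    """Estimated density at x: the average of Gaussian bumps centred on each observation.

    `bandwidth` (assumed here, not estimated) controls how far each observation's
    "alternative possibilities" spread; larger values give a smoother curve.
    """
    total = sum(STANDARD_NORMAL.pdf((x - obs) / bandwidth) for obs in observations)
    return total / (len(observations) * bandwidth)


bandwidth = 4  # assumed smoothing width, in hundredths of a second

# Evaluate the smooth curve on a grid and print a rough text plot.
for x in range(55, 100, 5):
    density = kernel_density(x, reaction_times, bandwidth)
    print(f"{x:>3}  {'#' * round(density * 1000)}")
```

With this choice of kernel, a score of 80 contributes a lot of density near 79 or 82, a little near 73 or 86, and essentially none near 20, which matches the reasoning in the bullets above.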
2.4 – Stem-and-leaf Displays
- Histograms, frequency distributions, and kernel density plots all have drawbacks:
o Because histograms often portray observations that have been grouped into
intervals, they frequently lose the actual numerical values of the individual
scores in each interval.
o Frequency distributions, on the other hand, retain the values of the individual
observations, but they can be difficult to use when they do not summarize the
data sufficiently.
- Stem-and-leaf display: a graphical display that presents the original data arranged into a
histogram.
- Exploratory data analysis (EDA): a set of techniques developed by Tukey for presenting
data in visually meaningful ways.
- Leading digits (most significant digits): the left-most digits of a
number.
- Stem: the vertical axis of the display, formed by the leading digits.
- Trailing digits (less significant digits): the right-most digits of a number.
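A stem-and-leaf display can be built by splitting each score into its leading digits (the stem) and its trailing digit (conventionally called the leaf). The sketch below assumes non-negative whole-number scores and uses the tens digit as the stem. The scores in `times` and the function name `stem_and_leaf` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical whole-number scores, invented purely for illustration.
times = [36, 42, 45, 51, 53, 58, 58, 60, 67, 72]


def stem_and_leaf(values):
    """Map each stem (leading digits) to the sorted list of its leaves (units digits)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 58 -> stem 5, leaf 8
        stems[stem].append(leaf)
    return dict(stems)


display = stem_and_leaf(times)

# Print every stem in range, including empty ones, so that gaps stay visible.
for stem in range(min(display), max(display) + 1):
    leaves = "".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```

Each row is as long as the number of scores sharing that stem, so the display has the shape of a histogram turned on its side while still showing every original value.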