
# PSYB07H3 Chapter Notes - Chapter 2: Interquartile Range, Exploratory Data Analysis, Frequency Distribution

by OC1062122

Department: Psychology

Course Code: PSYB07H3

Professor: Dwayne Pare

Chapter: 2

Textbook Notes B07 September 21, 2016

Chapter #2 Lec 2

Chapter #2 – Describing and Exploring Data

2.1 - Plotting Data

- One of the simplest methods of reorganizing data to make them more intelligible is to plot them in some sort of graphical form, such as:

  o Frequency distributions

  o Histograms

  o Stem-and-leaf displays

Frequency Distributions

- Frequency distribution: a way of organizing the data in some logical order, listing each observed value (or interval of values) together with the number of times it occurs.
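A frequency distribution can be sketched in Python as a count of how often each value occurs, sorted by value. This is a minimal illustration assuming whole-number scores; the reaction times in `rts` are hypothetical values made up for the example, and `frequency_distribution` is an illustrative helper, not something from the text.

```python
from collections import Counter


def frequency_distribution(scores):
    """Return (value, frequency) pairs in ascending order of value."""
    counts = Counter(scores)  # maps each distinct score to how often it occurs
    return sorted(counts.items())


# Hypothetical reaction times in hundredths of a second (illustration only)
rts = [36, 38, 38, 40, 41, 41, 41, 43, 45, 45]

print("Value  Freq")
for value, freq in frequency_distribution(rts):
    print(f"{value:>5}  {freq:>4}")
```

The frequencies always sum to the number of observations, which is a quick check that no score was lost when tabulating.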

2.2 – Histograms

- Histogram: a graph in which rectangles are used to represent the frequencies of observations within each interval.

- Goal: to obscure some of the random “noise” that is not likely to be meaningful, while still preserving important trends in the data.

- Real lower limit: the point halfway between the bottom of an interval and the top of the interval below it.

  o The smallest value that would be classed as falling into the interval.

- Real upper limit: the point halfway between the top of an interval and the bottom of the next interval.

  o The largest value that would be classed as being in the interval.

  o Example: for whole-number scores, the interval 35-39 has a real lower limit of 34.5 and a real upper limit of 39.5.

- Midpoints: the averages of the upper and lower limits of each interval, presented for convenience.

  o When we plot the data, we often plot the points as if they all fell at the midpoints of their respective intervals.

- People often ask about the optimal number of intervals to use when grouping data.

  o There are at least six possible rules for determining the optimal number of intervals, but these rules are primarily intended for people writing software, who need rules that handle the general case.

  o Somewhere around 10 or 12 intervals is usually reasonable.


  o It is best to use natural breaks in the number system (e.g., intervals of 10-19 and 20-29 rather than 7-16 and 17-26).

  o If another kind of limit makes the data more interpretable, then use those limits.

- Outlier: a value that is widely separated from the rest of the data. Outliers frequently represent errors in recording data, but in the text's example the extreme value was simply a trial on which the subject couldn't decide which button to push.
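The interval grouping described in this section (stated limits, real limits, midpoints) can be sketched in Python. This minimal version assumes whole-number scores, so the real limits sit half a unit outside the stated limits. The starting point of 35 and width of 5 are arbitrary choices for the small hypothetical data set; with real data you would aim for roughly 10 to 12 intervals with natural breaks. `grouped_frequencies` is a hypothetical helper.

```python
import math


def grouped_frequencies(scores, start, width):
    """Group whole-number scores into equal-width intervals.

    Assumes integer data and start <= min(scores). Each row holds the stated
    limits, the real limits (half a unit beyond the stated limits), the
    midpoint, and the frequency.
    """
    if start > min(scores):
        raise ValueError("start must not exceed the smallest score")
    n_intervals = math.ceil((max(scores) - start + 1) / width)
    rows = []
    for i in range(n_intervals):
        low = start + i * width        # stated lower limit
        high = low + width - 1         # stated upper limit
        real_low, real_high = low - 0.5, high + 0.5
        rows.append({
            "interval": (low, high),
            "real_limits": (real_low, real_high),
            "midpoint": (real_low + real_high) / 2,
            # with whole-number data no score sits exactly on a real limit
            "frequency": sum(real_low < x < real_high for x in scores),
        })
    return rows


# Hypothetical reaction times in hundredths of a second (illustration only)
rts = [36, 38, 38, 40, 41, 41, 41, 43, 45, 45]

for row in grouped_frequencies(rts, start=35, width=5):
    low, high = row["interval"]
    # A crude text histogram: one asterisk per observation in the interval
    print(f"{low}-{high}  midpoint {row['midpoint']:.1f}  " + "*" * row["frequency"])
```

Note that the individual values inside each interval are no longer visible in the output, which is the loss of information that stem-and-leaf displays are meant to avoid.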

2.3 – Fitting Smoothed Lines to Data

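- A number of people have pointed out that histograms, as common as they are, often fail as a clear description of data: their apparent shape can change noticeably depending on the interval width and where the intervals begin.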

Fitting a Normal Curve

- We will often assume that our data are normally distributed, and superimposing a normal distribution on the histogram will give us some idea of how reasonable that assumption is.
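Superimposing a normal curve amounts to drawing, over each interval, the frequency that a normal distribution with the sample's mean and standard deviation would predict. The sketch below uses Python's standard-library `statistics.NormalDist` and assumes the same hypothetical data grouped into width-5 intervals; approximating each interval's expected count by the density at its midpoint times the interval width is a simplification, and `normal_curve_heights` is an illustrative name.

```python
from statistics import NormalDist, mean, stdev


def normal_curve_heights(scores, midpoints, width):
    """Heights of a fitted normal curve at each interval midpoint.

    The curve uses the sample mean and standard deviation. Multiplying the
    density by n * width rescales it from probability density to expected
    frequency per interval, so it can be drawn on top of a frequency histogram.
    """
    dist = NormalDist(mu=mean(scores), sigma=stdev(scores))
    n = len(scores)
    return [n * width * dist.pdf(m) for m in midpoints]


# Hypothetical reaction times and the midpoints of width-5 intervals
rts = [36, 38, 38, 40, 41, 41, 41, 43, 45, 45]
midpoints = [37.0, 42.0, 47.0]

for m, h in zip(midpoints, normal_curve_heights(rts, midpoints, width=5)):
    print(f"midpoint {m:.1f}: expected frequency {h:.2f}")
```

Comparing these heights with the observed frequency in each interval gives the rough, visual check of normality described above; it is not a formal test.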

Kernel Density Plot

- Kernel density plot: tries to fit a smooth curve to the data while taking into account that there is a lot of random noise in the observations, which should not be allowed to distort the curve too much. Unlike a fitted normal curve, a kernel density plot pays no attention to the mean and standard deviation of the observations.

- The idea behind a kernel density plot is that each observation might have been slightly different.

  o For example, on a trial where the respondent’s reaction time was 80 hundredths of a second, the score might reasonably have been 79 or 82 instead.

  o It is even conceivable that the score could have been 73 or 86, but it is not at all likely that the score would have been 20 or 100.

  o In other words, there is a distribution of alternative possibilities around any obtained value, and this is true for all obtained values.
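The idea of a distribution of alternative possibilities around each obtained value translates into a Gaussian kernel density estimate: each score is replaced by a small normal curve centred on it, and the curves are averaged. This is a minimal sketch assuming a normal-shaped kernel and a hand-picked bandwidth `h` of 2; statistical software usually chooses the kernel and bandwidth for you, possibly differently. `kde` and `gaussian_kernel` are illustrative names, and the data are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde(scores, x, bandwidth):
    """Kernel density estimate at point x.

    Every observation contributes a small normal curve centred on itself,
    expressing that the score might have come out slightly different.
    The curves are averaged; no mean or standard deviation of the whole
    sample is used.
    """
    n = len(scores)
    return sum(gaussian_kernel((x - xi) / bandwidth) for xi in scores) / (n * bandwidth)


# Hypothetical reaction times in hundredths of a second (illustration only)
rts = [36, 38, 38, 40, 41, 41, 41, 43, 45, 45]
h = 2.0  # assumed bandwidth; larger values give a smoother curve

for x in range(30, 52, 2):
    density = kde(rts, x, h)
    print(f"{x:>3}  {density:.3f}  " + "#" * round(density * 200))
```

The bandwidth plays the role of "how different might the score have been": a small `h` lets the random noise through, while a very large `h` smooths away real features of the data.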


2.4 – Stem-and-leaf Displays

- Histograms, frequency distributions, and kernel density functions each have drawbacks:

  o Because histograms often portray observations that have been grouped into intervals, they frequently lose the actual numerical values of the individual scores in each interval.

  o Frequency distributions, on the other hand, retain the values of the individual observations, but they can be difficult to use when they do not summarize the data sufficiently.

- Stem-and-leaf display: a graphical display that presents the original data values arranged into the shape of a histogram.

- Exploratory data analysis (EDA): a set of techniques developed by Tukey for presenting data in visually meaningful ways.

- Leading digits (most significant digits): the left-most digit(s) of a number.

- The leading digits form the stem: the vertical axis of the display.

- Trailing (less significant) digits: the right-most digit(s) of a number; in the display, these form the leaves.
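A stem-and-leaf display for two-digit scores can be sketched by splitting each value into its leading digits (the stem) and its final trailing digit (the leaf). This minimal version assumes non-negative integers and a stem unit of 10; `stem_and_leaf` is a hypothetical helper, and the data are made up for illustration.

```python
from collections import defaultdict


def stem_and_leaf(scores):
    """Build a stem-and-leaf display for non-negative integers.

    Assumes all digits except the last form the stem and the last
    (trailing) digit forms the leaf, i.e. a stem unit of 10.
    """
    leaves = defaultdict(list)
    for x in sorted(scores):
        stem, leaf = divmod(x, 10)
        leaves[stem].append(leaf)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        # Empty stems are kept so gaps in the data remain visible
        lines.append(f"{stem:>3} | " + "".join(str(d) for d in leaves.get(stem, [])))
    return lines


# Hypothetical reaction times in hundredths of a second (illustration only)
rts = [36, 38, 38, 40, 41, 41, 41, 43, 45, 45]

for line in stem_and_leaf(rts):
    print(line)
```

Unlike the grouped histogram, every original score can be read back from the display: the row `4 | 0111355` stands for 40, 41, 41, 41, 43, 45, and 45.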
