Psychology

PSYB01H3

Connie Boudens

Fall

Chapter 10: Quantitative Data Analysis
o Dunn found an interesting association between increased happiness and spending money on others
o Performed three different types of research:
1. a cross-sectional national representative survey
2. longitudinal field study
3. experimental laboratory investigation
o In each study, the researchers reported statistics that supported their hypothesis that spending money on other people
has a more positive impact on happiness than spending money on oneself
o 60 research participants rate their happiness in the morning and then we give each participant an envelope that
contains either $5 or $20, which they are to spend by 5:00 p.m. that day
Randomly assign participants to either (1) personal spending or (2) prosocial spending (buying things for other
people)
2 independent variables: (1) amount of money given ($5 vs. $20) and (2) spending (personal vs. prosocial)
Assign 15 research participants to each of the resulting four conditions: (1) personal spending $5; (2) personal
spending $20; (3) prosocial spending $5; and (4) prosocial spending $20
We call participants after 5:00 p.m. on that same day and ask them to rate their happiness from 1 to 10,
with 1 = very unhappy and 10 = very happy
We have a 2 x 2 between-subjects factorial design
Dependent measure is happiness rating
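As a rough illustration of the random assignment described above, the following Python sketch shuffles an equal number of condition labels and hands them out to 60 participants. The function name `assign_conditions`, the participant IDs, and the seed are assumptions made for this example; the notes do not say how the assignment was actually carried out.

```python
import random
from collections import Counter


def assign_conditions(participant_ids, seed=None):
    """Randomly assign participants, in equal numbers, to the four cells of a 2 x 2 design."""
    amounts = ["$5", "$20"]                 # independent variable 1: amount of money
    spending = ["personal", "prosocial"]    # independent variable 2: type of spending
    cells = [(s, a) for s in spending for a in amounts]

    if len(participant_ids) % len(cells) != 0:
        raise ValueError("participants cannot be split evenly across the cells")

    # One slot per participant, with every cell repeated equally often.
    slots = cells * (len(participant_ids) // len(cells))
    rng = random.Random(seed)
    rng.shuffle(slots)
    return dict(zip(participant_ids, slots))


# Hypothetical participant IDs 1..60; the seed only makes the example repeatable.
assignment = assign_conditions(list(range(1, 61)), seed=42)
cell_sizes = Counter(assignment.values())
for cell, size in sorted(cell_sizes.items()):
    print(cell, size)
```

Shuffling a fixed pool of labels, rather than picking a condition at random for each person, is what guarantees exactly 15 participants per cell.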
STATISTICAL APPROACH
o 2 major types of statistics:
1. Descriptive statistics - are used to describe the variables in a study, both one at a time and in terms of their
relations to each other
2. Inferential statistics - are used to estimate characteristics of a population from those we found in a random
sample of that population
To test hypotheses about the relationship between variables (How confident can we be that the effect
we observed was not simply due to chance?)
Level of Measurement Matters
o Before we calculate statistics involving a variable, we must identify that variable’s level of measurement
o There are 4 levels of measurement:
1. Nominal
2. Ordinal
3. Interval
4. Ratio
o In the example, spending (personal vs. prosocial) reflects a nominal scale
Participants are asked whether they’d spend the money on themselves or others, so the variable is categorical
The happiness scale (1-10) is ordinal
o Even though we know that 1 is for very unhappy and 10 is for very happy, it does not really mean that a response of
10 is 9 units more happiness than a response of 1
o It is at the interval and ratio levels of measurement that the values of the variable reflect actual numbers on a scale
with fixed intervals
o Although conceptually different, ratio and interval measures can be analyzed with the same statistical procedures.
These procedures are more powerful than those that are allowable with an ordinal (or nominal) level of measurement.
The levels of measurement of both independent and dependent variables are thus important factors to consider when
designing experiments and other types of research and when planning statistical analyses of the resulting data
UNIVARIATE DISTRIBUTIONS
o Both frequency distributions and graphs are used to describe the distribution of variables one at a time
Frequency Distributions
o Frequency Distribution – shows the number of cases and/or the percentage of cases that receive each possible score
on a variable
o For many descriptive purposes, the analysis may go no further than a frequency distribution
o Many frequency distributions (and graphs) require grouping of some values after the data are collected. There are two
reasons for grouping:
1. There are more than 15 to 20 values to begin with, a number too large to be displayed in an easily readable
table
2. The distribution of the variable will be clearer or more meaningful if some of the values are combined
o (see figure 10.3, p. 318) once some of the values are combined it is much easier to discern the distribution’s shape
o Once we decide to group values, or categories, we have to be sure that we do not distort the distribution. Following
these guidelines will prevent many problems:
Categories should be logically defensible and preserve the distribution’s shape
Categories should be mutually exclusive and exhaustive so that every case should be classifiable in one and
only one category
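To make the idea of a frequency distribution and of grouping concrete, here is a minimal Python sketch. The happiness ratings and the three categories are invented for illustration, and the helper names `frequency_distribution` and `grouped_distribution` are assumptions, not something taken from the textbook.

```python
from collections import Counter


def frequency_distribution(values):
    """Return (value, count, percentage) rows, one per distinct value, in sorted order."""
    counts = Counter(values)
    n = len(values)
    return [(v, counts[v], 100 * counts[v] / n) for v in sorted(counts)]


def grouped_distribution(values, bins):
    """Group values into inclusive (low, high) categories.

    Raises ValueError unless every case falls in one and only one category,
    i.e. unless the categories are mutually exclusive and exhaustive.
    """
    counts = [0] * len(bins)
    for v in values:
        hits = [i for i, (low, high) in enumerate(bins) if low <= v <= high]
        if len(hits) != 1:
            raise ValueError(f"value {v} falls in {len(hits)} categories")
        counts[hits[0]] += 1
    n = len(values)
    return [(f"{low}-{high}", c, 100 * c / n) for (low, high), c in zip(bins, counts)]


# Hypothetical happiness ratings on the 1-10 scale.
ratings = [7, 3, 8, 5, 7, 9, 6, 7, 4, 8]
ungrouped = frequency_distribution(ratings)
grouped = grouped_distribution(ratings, [(1, 3), (4, 6), (7, 10)])

for label, count, pct in grouped:
    print(f"{label:>5}  n={count}  {pct:.0f}%")
```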
Graphing
o Several different types of graph are commonly used to depict frequency distributions.
The most common and most useful: bar charts and histograms
o Bar Chart – contains solid bars separated by spaces
It is a good tool for displaying the distribution of variables measured at the nominal level because there is, in
effect, a gap between each of the categories
o Histogram – displays a frequency distribution of a quantitative variable
o Both graphs and frequency distributions allow the researcher to display the distribution of cases across the categories
or scores of a variable
o Graphs have the advantage of providing a picture that is easier to comprehend, although frequency distributions are
preferable when exact numbers of cases having particular values must be reported and when many distributions must
be displayed in a compact form
Beware of Deceptive Graphs
o If graphs are misused, they can distort, rather than display, the shape of a distribution
o Two graphs of the same data can look very different simply because of how they are drawn, e.g., drawing a graph that
begins at 15 rather than 0
o Adherence to several guidelines will help you spot these problems and avoid them in your own work:
The difference between bars can be exaggerated by cutting off the bottom of the vertical axis and displaying
less than the full height of the bars. Instead, begin the graph of a quantitative variable at 0 on both axes. It
may be reasonable, at times, to violate this guideline, as when an age distribution is presented for a sample of
adults, but in this case be sure to mark the break clearly on the axis.
Bars of unequal width, including pictures instead of bars, can make particular values look as if they carry
more weight than their frequency warrants. Always use bars of equal width.
Either shortening or lengthening the vertical axis will obscure or accentuate the differences in the number of
cases between values. The two axes should be of approximately equal length.
Avoid chart junk that can confuse the reader and obscure the distribution’s shape (a lot of verbiage or
umpteen marks, lines, lots of cross-hatching, etc.)
DESCRIPTIVE STATISTICS
o Three features of shape are important in a graph:
1. Central Tendency
2. Variability
3. Skewness (lack of symmetry)
Measures of Central Tendency
o Central tendency is usually summarized with one of three statistics:
1. the mode
2. the median
3. the mean
o To choose an appropriate measure of central tendency the researcher must consider level of measurement, the
skewness of the distribution of the variable (if it is a quantitative variable), and the purpose for which the statistic is
used
Mode
o Mode (probability average) – most frequent and most probable value in a distribution
For a nominal scale of measurement, the mode, defined as the most frequent score, is the only measure of
central tendency. The mode indicates the most frequently occurring value
It does not use the actual values of a scale, as we have seen in the gender example (the most frequent score is
53 in the questionnaire)
o The mode is used much less often than the other two measures of central tendency because it can so easily give a
misleading impression of a distribution’s central tendency
o Problems with mode:
When a distribution is bimodal instead of unimodal
Might happen to fall far from the main clustering of cases in a distribution
o With such a distribution, it would be misleading to say simply that the variable’s central tendency was whatever the
modal value was. However, when the issue is what the most probable value is, the mode is the appropriate statistic.
Which ethnic group is most common in a given school? The mode provides the answer
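The mode, and the bimodal problem mentioned above, can be sketched in a few lines of Python. The group labels and scores are hypothetical, and `statistics.multimode` assumes Python 3.8 or later.

```python
from statistics import multimode

# Hypothetical nominal data: group membership of six students.
groups = ["A", "B", "A", "C", "A", "B"]
group_modes = multimode(groups)      # every value tied for most frequent

# Hypothetical bimodal scores: two values are equally frequent.
scores = [2, 2, 2, 5, 8, 8, 8]
score_modes = multimode(scores)

print(group_modes)   # a single mode answers "which group is most common?"
print(score_modes)   # two modes: reporting only one would be misleading
```

Returning every tied value, instead of a single mode, makes a bimodal distribution visible rather than hiding it.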
Median
o Median – is the position average or the point that divides the distribution in half (the 50th percentile)
o The median is inappropriate for variables measured at the nominal level because their values cannot be put in order,
and so there is no meaningful middle position
o If the median point falls between two cases the median is defined as the average of the two middle values
o The median in a frequency distribution is determined by identifying the value corresponding to a cumulative
percentage of 50
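A minimal sketch of the cumulative-percentage approach to the median follows, including the case where the middle falls between two values. The function name `median_from_frequencies` and the ratings are assumptions for illustration; the result is checked against Python's built-in `statistics.median`.

```python
from collections import Counter
from statistics import median


def median_from_frequencies(values):
    """Find the median by walking up the cumulative frequency distribution."""
    if not values:
        raise ValueError("median of an empty distribution is undefined")
    counts = Counter(values)
    n = len(values)
    ordered = sorted(counts)
    cumulative = 0
    for i, v in enumerate(ordered):
        cumulative += counts[v]
        if 2 * cumulative == n:
            # Cumulative percentage is exactly 50: the middle lies between
            # this value and the next one, so average the two.
            return (v + ordered[i + 1]) / 2
        if 2 * cumulative > n:
            # Cumulative percentage has passed 50 within this value.
            return v


ratings = [7, 3, 8, 5, 7, 9, 6, 7, 4, 8]   # hypothetical ordinal ratings
print(median_from_frequencies(ratings), median(ratings))
print(median_from_frequencies([1, 2, 3, 4]))   # middle falls between 2 and 3
```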
Mean
o Mean (Arithmetic Average) – takes into account the value of each case in a distribution; it is a weighted average
o Mean = Sum of the values of the cases / Number of cases
o Because computing the mean requires adding up the values of the cases, it makes sense to compute a mean only if the
values of the cases can be treated as actual quantities, that is, if they reflect an interval or ratio level of measurement,
or if they are ordinal and we assume that ordinal measures can be treated as interval.
o It would make no sense to calculate the mean of a variable measured at the nominal level, such as religion
o Ex. For a group of 4 people (2 Protestants, 1 Catholic, 1 Jew), calculating the mean would not make sense:
Protestant + Protestant + Catholic + Jew = ?
Median Vs. Mean
o Both the median and the mean are used to summarize the central tendency of quantitative variables, but their
suitability for a particular application must be carefully assessed
o The key issues to be considered in this assessment are the variable’s:
1. level of measurement
2. the shape of its distribution
3. the purpose of the statistical summary
o Level of measurement is a key concern because to calculate the mean, we must add up the values of all the cases - a
procedure that assumes the variable is measured at the interval or ratio level
So even though we know that coding Agree as 2 and Disagree as 3 does not really mean that Disagree is 1
unit more of disagreement than Agree, the mean assumes this evaluation to be true. Because calculation of the
median requires only that we order the values of cases, we do not have to make this assumption. Technically
speaking, then, the mean is simply an inappropriate statistic for variables measured at the ordinal level (and
you already know that it is completely meaningless for variables measured at the nominal level). In practice,
however, many social researchers use the mean to describe the central tendency of variables measured at the
ordinal level.
o The shape of the distribution of a variable should also be taken into account when deciding whether to use the median
or mean
The values of the mean and median are affected differently by skewness, or the presence of cases with
extreme values on one side of the distribution but not the other side
Median is not affected by extreme values
Mean is affected by extreme values as it will be pulled in the direction of exceptionally high (or low) values
Mean > Median: the distribution is skewed in a positive (+) direction, with proportionately more cases with higher than
lower values
Mean < Median: the distribution is skewed in a negative (-) direction.
o If the purpose is to report the middle position in one or more distributions, then the median is the proper statistic,
whether or not the distribution is skewed
If the purpose of the research is to show how likely different groups are to have age related health problems,
the measure of central tendency for these groups should take into account people’s ages, not just the number
of people who are older and younger than a particular age. For this purpose, the median would be
inappropriate; the mean is the better choice (it reflects a higher number of older people, which the median does not)
o Keep in mind that it is not appropriate to use either the median or the mean as a measure of central tendency for
variables measured at the nominal level, because at this level of measurement, the different attributes of a variable
cannot be ordered as higher or lower
Measures of Central Tendency (MCT) and Scales of Measurement
Level of Measurement   Most Appropriate MCT   Potentially Useful MCT   Definitely Inappropriate MCT
Nominal                Mode                   None                     Median, Mean
Ordinal                Median                 Mean, Mode               None
Interval, Ratio        Mean                   Median, Mode             None
o In general, the mean is the most common measure of central tendency for quantitative variables, both because it takes
into account the value of all cases in the distribution and because it is the foundation for many other advanced
statistics. However, the mean’s very popularity results in its use in situations for which it is inappropriate. Keep an
eye out for this problem.
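The different reactions of the mean and the median to extreme values, discussed in this section, can be seen in a small Python sketch. The income figures are invented, and `skew_direction` is a hypothetical helper that applies only the mean-versus-median rule of thumb from these notes, not a formal skewness statistic.

```python
from statistics import mean, median


def skew_direction(values):
    """Compare mean and median: 'positive', 'negative', or 'none'."""
    m, md = mean(values), median(values)
    if m > md:
        return "positive"
    if m < md:
        return "negative"
    return "none"


incomes = [30, 32, 35, 38, 40]            # hypothetical, in thousands of dollars
incomes_with_outlier = incomes + [400]    # one exceptionally high value

print(mean(incomes), median(incomes))     # symmetric: mean equals median
print(mean(incomes_with_outlier), median(incomes_with_outlier))
print(skew_direction(incomes_with_outlier))
```

Adding the single high value pulls the mean up sharply while the median barely moves, which is why the median is preferred for reporting the middle position of a skewed distribution.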
Measures of Variation
o A summary of distributions based only on their central tendency can be very incomplete, even misleading
o Ex. Town A (homogeneous – middle class), Town B (heterogeneous) and Town C (polarized = mostly very poor and
very rich with few in between) can all have the same median income
o The way to capture these differences is with statistical measures of variation or variability
o Popular measures of variability in a set of data are:
1. Range
2. Variance
3. Standard deviation (most popular)
o To calculate each of these measures, the variable must be at the interval or ratio level
o Measures of variability are descriptive statistics that capture only part of what we need to be concerned with about the
distribution of a variable. They do not tell us about the extent to which a distribution is skewed, which we’ve seen is very
important for interpreting measures of central tendency
Range
o Range – is a simple measure of variation
Range = Highest value – Lowest value + 1
o It often is important to report the range of a distribution to identify the whole range of possible values that might be
encountered. However, because the range can be drastically altered by just one exceptionally high or low value
(termed an outlier), it does not do an adequate job of summarizing the extent of variability in a distribution.
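The range formula given above, and its sensitivity to a single outlier, can be sketched as follows. The ages are made-up values and `value_range` is a hypothetical helper name.

```python
def value_range(values):
    """Range as defined in these notes: highest value - lowest value + 1."""
    return max(values) - min(values) + 1


ages = [21, 23, 24, 25, 27]          # hypothetical ages
ages_with_outlier = ages + [90]      # one exceptionally high value

print(value_range(ages))
print(value_range(ages_with_outlier))
```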
Variance
o Variance – is the average squared deviation of each case from the mean, so it takes into account the amount by which
each case differs from the mean: $\sigma^2 = \frac{\sum (Y_i - \bar{Y})^2}{N}$
o Symbol key: $\bar{Y}$ = mean; $N$ = number of cases; $\sum$ = sum over all cases; $Y_i$ = value of case i on variable Y
Standard Deviation
o Standard Deviation – is simply the square root of the variance: $\sigma = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{N}}$
o When the standard deviation is calculated from sample data, the denominator is supposed to be N – 1, rather than N,
an adjustment that has no discernible effect when the number of cases is reasonably large
o The squared deviations in the formula accentuate the impact of relatively large deviations, because squaring a large
number makes that number count much more.
o The standard deviation can be used to answer two simple questions:
1. How much variation, dispersion, or spread is there in a set of scores, numbers, or data points?
2. How far from the average is a particular score?
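The variance and standard deviation, and the two questions they answer, can be worked through in a short Python sketch. The scores are hypothetical, the helper names are assumptions, and the functions divide by N as in the description above (with N - 1 used only in the sample comparison at the end).

```python
from math import sqrt
from statistics import stdev


def variance(values):
    """Average squared deviation of each case from the mean (denominator N)."""
    n = len(values)
    m = sum(values) / n
    return sum((y - m) ** 2 for y in values) / n


def standard_deviation(values):
    """Square root of the variance."""
    return sqrt(variance(values))


def distance_from_mean(score, values):
    """How far a particular score lies from the mean, in standard deviation units."""
    m = sum(values) / len(values)
    return (score - m) / standard_deviation(values)


scores = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical scores with mean 5

print(variance(scores))                  # question 1: how much spread?
print(standard_deviation(scores))
print(distance_from_mean(9, scores))     # question 2: how far is a score of 9?
print(stdev(scores))                     # sample version, N - 1 in the denominator
```

With only 8 cases the N - 1 version is noticeably larger than the N version; the gap shrinks as the number of cases grows.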
SAMPLING DISTRIBUTIONS
o The mean or any other statistic we obtain for a random sample is not likely to equal exactly the population mean or,
more generally, the population parameter
o Sampling Distribution – the hypothetical distribution of a statistic across all the random samples that could be drawn
from a population
o The mean of the distribution of sample means is the actual population mean
o We can calculate a statistic called the standard error of the mean to indicate the degree to which the means of the
samples vary from the population mean
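A sampling distribution can be built by hand for a very small population. The sketch below assumes a made-up population of five values and lists every possible sample of size 2 (drawn without replacement), then checks that the mean of the sample means matches the population mean.

```python
from itertools import combinations
from statistics import mean, pstdev

population = [2, 4, 6, 8, 10]     # hypothetical population of five cases
sample_size = 2

# Every sample of size 2 that could be drawn, and the mean of each one.
sample_means = [mean(sample) for sample in combinations(population, sample_size)]

population_mean = mean(population)
mean_of_sample_means = mean(sample_means)

# Spread of the sample means around the population mean: the standard error.
standard_error = pstdev(sample_means)

print(sorted(sample_means))
print(population_mean, mean_of_sample_means)
print(standard_error)
```

Individual sample means range from 3 to 9, so any one sample can miss the population mean, yet their average lands on it exactly.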
INFERENTIAL STATISTICS
o A normal distribution is a distribution that results from chance variation around a mean. It is symmetric and tapers off
in a characteristic shape from its mean. If a variable is normally distributed 68% of the cases will be between plus and
minus 1 standard deviation from the distribution’s mean, and 95% of the cases will lie between plus and minus 1.96
standard deviations from the mean
o This correspondence of the standard deviation to the normal distribution enables us to infer how confident we can be
that the mean (or some other statistic) of a population sampled randomly is within a certain range of the sample
mean.
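o From the standard deviation, 4 more steps lead to the confidence limits:
1. Calculate the standard error (SE). This is the estimated value of the standard deviation of the sampling
distribution from which your sample was selected: $SE = \frac{s}{\sqrt{N}}$, where $s$ is the standard deviation
calculated from the sample (with N – 1 in the denominator) and $N$ is the number of cases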
2. Decide on the degree of confidence that you wish to have that the population parameter falls within the
confidence interval you compute. It is conventional to calculate the 95%, 99%, or even the 99.9% confidence
limits around the mean. Most often, the 95% confidence limits are used, so we will just show the calculation
for this estimate.
3. Multiply the value of the SE by 1.96. This is because 95% of the area under the normal curve falls within ±
1.96 standard deviation units of the mean.
4. Add and subtract the number in step 3 from the sample mean. The resulting numbers are the upper and lower
confidence limits.
o Use of inferential statistics: to estimate a population parameter from a sample statistic
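The four steps for the 95% confidence limits can be followed in a minimal Python sketch. The sample of happiness ratings is invented, `confidence_limits_95` is a hypothetical helper, and the standard error is assumed to be the sample standard deviation (N - 1 denominator) divided by the square root of the number of cases.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_limits_95(sample):
    """Return (lower, upper) 95% confidence limits around the sample mean."""
    n = len(sample)
    se = stdev(sample) / sqrt(n)     # step 1: standard error of the mean
    z = 1.96                         # step 2: 95% confidence chosen
    margin = z * se                  # step 3: multiply the SE by 1.96
    m = mean(sample)
    return m - margin, m + margin    # step 4: subtract from and add to the mean


# Hypothetical happiness ratings from a random sample of 10 people.
sample = [6, 7, 5, 8, 7, 6, 9, 7, 6, 7]
lower, upper = confidence_limits_95(sample)
print(f"mean = {mean(sample):.2f}, 95% limits = ({lower:.2f}, {upper:.2f})")
```

The value 1.96 comes from the normal curve; with a sample this small a researcher might prefer a t-based multiplier, which these notes do not cover.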
Hypothesis Testing
o Inferential statistics are also used to test hypotheses
o The logic of hypothesis testing centers on two hypotheses: the null hypothesis and the research (or alternative)
hypothesis
o In a test of the difference between two means, the null hypothesis states that the population me
