PSYB01H3
Chapter 10

Connie Boudens

Chapter 10: Quantitative Data Analysis

o Dunn found an interesting association between increased happiness and spending money on others
o The researchers performed three different types of research:
  1. a cross-sectional, nationally representative survey
  2. a longitudinal field study
  3. an experimental laboratory investigation
o In each study, the researchers reported statistics that supported their hypothesis that spending money on other people has a more positive impact on happiness than spending money on oneself
o Example: 60 research participants rate their happiness in the morning, and then we give each participant an envelope that contains either $5 or $20, which they are to spend by 5:00 p.m. that day
  - Randomly assign participants to either (1) personal spending or (2) prosocial spending (buying things for other people)
  - 2 independent variables: (1) amount of money given ($5 vs. $20) and (2) spending (personal vs. prosocial)
  - Assign 15 research participants to each of the resulting four conditions: (1) personal spending $5; (2) personal spending $20; (3) prosocial spending $5; and (4) prosocial spending $20
  - We call participants after 5:00 p.m. on that same day and ask them to rate their happiness from 1 to 10, with 1 = very unhappy and 10 = very happy
  - We have a 2 x 2 between-subjects factorial design
  - The dependent measure is the happiness rating

STATISTICAL APPROACH
o 2 major types of statistics:
  1. Descriptive statistics - are used to describe the variables in a study, both one at a time and in terms of their relations to each other
  2. Inferential statistics - are used to estimate characteristics of a population from those we found in a random sample of that population
     → They are also used to test hypotheses about the relationship between variables (How confident can we be that the effect we observed was not simply due to chance?)

Level of Measurement Matters
o Before we calculate statistics involving a variable, we must identify that variable's level of measurement
o There are 4 levels of measurement:
  1. Nominal
  2. Ordinal
  3. Interval
  4. Ratio
o In the example, spending (personal vs. prosocial) reflects a nominal scale
  - Participants are asked whether they'd spend the money on themselves or others → categorical
  - Happiness scale (1-10) → ordinal
o Even though we know that 1 is for very unhappy and 10 is for very happy, it does not really mean that a response of 10 is 9 units more happiness than a response of 1
o It is at the interval and ratio levels of measurement that the values of the variable reflect actual numbers on a scale with fixed intervals
o Although conceptually different, ratio and interval measures can be analyzed with the same statistical procedures. These procedures are more powerful than those that are allowable with an ordinal (or nominal) level of measurement. The levels of measurement of both independent and dependent variables are thus important factors to consider when designing experiments and other types of research and when planning statistical analyses of the resulting data

UNIVARIATE DISTRIBUTIONS
o Both frequency distributions and graphs are used to describe the distribution of variables one at a time

Frequency Distributions
o Frequency distribution - shows the number of cases and/or the percentage of cases that receive each possible score on a variable
o For many descriptive purposes, the analysis may go no further than a frequency distribution
o Many frequency distributions (and graphs) require grouping of some values after the data are collected. There are two reasons for grouping:
  1. There are more than 15 to 20 values to begin with, a number too large to be displayed in an easily readable table
  2. The distribution of the variable will be clearer or more meaningful if some of the values are combined
o (see figure 10.3, p. 318) → once some of the values are combined, it is much easier to discern the distribution's shape
o Once we decide to group values, or categories, we have to be sure that we do not distort the distribution. Following these guidelines will prevent many problems:
  - Categories should be logically defensible and preserve the distribution's shape
  - Categories should be mutually exclusive and exhaustive, so that every case is classifiable in one and only one category

Graphing
o Several different types of graph are commonly used to depict frequency distributions
  - The most common and most useful: bar charts and histograms
o Bar chart - contains solid bars separated by spaces
  - It is a good tool for displaying the distribution of variables measured at the nominal level because there is, in effect, a gap between each of the categories
o Histogram - displays a frequency distribution of a quantitative variable
o Both graphs and frequency distributions allow the researcher to display the distribution of cases across the categories or scores of a variable
o Graphs have the advantage of providing a picture that is easier to comprehend, although frequency distributions are preferable when exact numbers of cases having particular values must be reported and when many distributions must be displayed in a compact form

Beware of Deceptive Graphs
o If graphs are misused, they can distort, rather than display, the shape of a distribution
o Two graphs of the same data can look very different simply because of changes in how they are drawn → e.g., drawing a graph that begins at 15 rather than 0
o Adherence to several guidelines will help you spot these problems and avoid them in your own work:
  - The difference between bars can be exaggerated by cutting off the bottom of the vertical axis and displaying less than the full height of the bars. Instead, begin the graph of a quantitative variable at 0 on both axes.
    It may be reasonable, at times, to violate this guideline, as when an age distribution is presented for a sample of adults, but in this case be sure to mark the break clearly on the axis.
  - Bars of unequal width, including pictures instead of bars, can make particular values look as if they carry more weight than their frequency warrants. Always use bars of equal width.
  - Either shortening or lengthening the vertical axis will obscure or accentuate the differences in the number of cases between values. The two axes should be of approximately equal length.
  - Avoid chart junk that can confuse the reader and obscure the distribution's shape (a lot of verbiage or umpteen marks, lines, lots of cross-hatching, etc.)

DESCRIPTIVE STATISTICS
o Three features of a distribution's shape are important:
  1. Central tendency
  2. Variability
  3. Skewness (lack of symmetry)

Measures of Central Tendency
o Central tendency is usually summarized with one of three statistics:
  1. the mode
  2. the median
  3. the mean
o To choose an appropriate measure of central tendency, the researcher must consider the level of measurement, the skewness of the distribution of the variable (if it is a quantitative variable), and the purpose for which the statistic is used

Mode
o Mode (probability average) - the most frequent and most probable value in a distribution
  - For a nominal scale of measurement, the mode, defined as the most frequent score, is the only measure of central tendency
  - It does not use the actual values of a scale, as we have seen in the gender example → the most frequent score is 53 in the questionnaire
o The mode is used much less often than the other two measures of central tendency because it can so easily give a misleading impression of a distribution's central tendency
o Problems with the mode:
  - The distribution may be bimodal instead of unimodal
  - The mode might happen to fall far from the main clustering of cases in a distribution
o With such a distribution, it would be misleading to say simply that the variable's central tendency was whatever the modal value was. However, when the issue is what the most probable value is, the mode is the appropriate statistic. Which ethnic group is most common in a given school? The mode provides the answer

Median
o Median - the position average, or the point that divides the distribution in half (the 50th percentile)
o The median is inappropriate for variables measured at the nominal level because their values cannot be put in order, and so there is no meaningful middle position
o If the median point falls between two cases, the median is defined as the average of the two middle values
o The median in a frequency distribution is determined by identifying the value corresponding to a cumulative percentage of 50

Mean
o Mean (arithmetic average) - takes into account the value of each case in a distribution; it is a weighted average
o Mean = Sum of values of cases / Number of cases
o Because computing the mean requires adding up the values of the cases, it makes sense to compute a mean only if the values of the cases can be treated as actual quantities, that is, if they reflect an interval or ratio level of measurement, or if they are ordinal and we assume that ordinal measures can be treated as interval
o It would make no sense to calculate the mean of a variable measured at the nominal level, such as religion
o Ex. A group of 4 people (2 Protestants, 1 Catholic, 1 Jew) → calculating the mean would not make sense → Protestant + Protestant + Catholic + Jew?

Median vs. Mean
o Both the median and the mean are used to summarize the central tendency of quantitative variables, but their suitability for a particular application must be carefully assessed
o The key issues to be considered in this assessment are:
  1. the variable's level of measurement
  2. the shape of its distribution
  3. the purpose of the statistical summary
o Level of measurement is a key concern because to calculate the mean, we must add up the values of all the cases, a procedure that assumes the variable is measured at the interval or ratio level
  - So even though we know that coding Agree as 2 and Disagree as 3 does not really mean that Disagree is 1 unit more of disagreement than Agree, the mean assumes this evaluation to be true. Because calculation of the median requires only that we order the values of cases, we do not have to make this assumption. Technically speaking, then, the mean is simply an inappropriate statistic for variables measured at the ordinal level (and you already know that it is completely meaningless for variables measured at the nominal level). In practice, however, many social researchers use the mean to describe the central tendency of variables measured at the ordinal level.
o The shape of the distribution of a variable should also be taken into account when deciding whether to use the median or mean
  - The values of the mean and median are affected differently by skewness, or the presence of cases with extreme values on one side of the distribution but not the other side
  - The median is not affected by extreme values
  - The mean is affected by extreme values, as it will be pulled in the direction of exceptionally high (or low) values
  - Mean > Median → the distribution is skewed in a positive (+) direction, with proportionately more cases with higher than lower values
  - Mean < Median → the distribution is skewed in a negative (-) direction
o If the purpose is to report the middle position in one or more distributions, then the median is the proper statistic, whether or not the distribution is skewed
  - If the purpose of the research is to show how likely different groups are to have age-related health problems, the measure of central tendency for these groups should take into account people's ages, not just the number of people who are older and younger than a particular age.
    For this purpose, the median would be inappropriate → use the mean, which reflects the ages of the older people that the median ignores
o Keep in mind that it is not appropriate to use either the median or the mean as a measure of central tendency for variables measured at the nominal level, because at this level of measurement, the different attributes of a variable cannot be ordered as higher or lower

Measures of Central Tendency (MCT) and Scales of Measurement

  Level of Measurement   Most Appropriate MCT   Potentially Useful MCT   Definitely Inappropriate MCT
  Nominal                Mode                   None                     Median, Mean
  Ordinal                Median                 Mean, Mode               None
  Interval, Ratio        Mean                   Median, Mode             None

o In general, the mean is the most common measure of central tendency for quantitative variables, both because it takes into account the value of all cases in the distribution and because it is the foundation for many other advanced statistics. However, the mean's very popularity results in its use in situations for which it is inappropriate. Keep an eye out for this problem.

Measures of Variation
o A summary of distributions based only on their central tendency can be very incomplete, even misleading
o Ex. Town A (homogeneous: middle class), Town B (heterogeneous), and Town C (polarized: mostly very poor and very rich, with few in between) can all have the same median outcome
o The way to capture these differences is with statistical measures of variation or variability
o Popular measures of variability in a set of data are:
  1. Range
  2. Variance
  3. Standard deviation (most popular)
o To calculate each of these measures, the variable must be at the interval or ratio level
o Measures of variability are descriptive statistics that capture only part of what we need to be concerned with about the distribution of a variable → they do not tell us about the extent to which a distribution is skewed, which we've seen is very important for interpreting measures of central tendency

Range
o Range - a simple measure of variation
  Range = Highest value - Lowest value + 1
o It often is important to report the range of a distribution to identify the whole range of possible values that might be encountered. However, because the range can be drastically altered by just one exceptionally high or low value (termed an outlier), it does not do an adequate job of summarizing the extent of variability in a distribution.

Variance
o Variance - the average squared deviation of each case from the mean, so it takes into account the amount by which each case differs from the mean
  $\sigma^2 = \frac{\sum (Y_i - \bar{Y})^2}{N}$
o Symbol key: $\bar{Y}$ = mean; $N$ = number of cases; $\sum$ = sum over all cases; $Y_i$ = value of case i on variable Y

Standard Deviation
o Standard deviation - simply the square root of the variance
  $\sigma = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{N}}$
o When the standard deviation is calculated from sample data, the denominator is supposed to be N - 1, rather than N, an adjustment that has no discernible effect when the number of cases is reasonably large
o Squaring the deviations in the formula accentuates the impact of relatively large deviations, because squaring a large number makes that number count much more
o The standard deviation can be used to answer two simple questions:
  1. How much variation, dispersion, or spread is there in a set of scores, numbers, or data points?
  2. How far from the average is a particular score?
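
The descriptive measures covered so far (mode, median, mean, range, variance, standard deviation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ratings` list is made-up data, not from the chapter, and `describe` is a hypothetical helper name. The range uses the notes' "+ 1" convention, the variance and standard deviation use the N denominator from the formulas above, and the skew direction applies the notes' Mean vs. Median rule of thumb.

```python
from statistics import mean, median, multimode, pstdev, pvariance


def describe(scores):
    """Summarize a list of interval/ratio scores (hypothetical helper)."""
    m = mean(scores)
    med = median(scores)
    if m > med:
        skew = "positive"   # mean pulled toward high values
    elif m < med:
        skew = "negative"   # mean pulled toward low values
    else:
        skew = "none"
    return {
        "modes": multimode(scores),                # all most-frequent values
        "median": med,
        "mean": m,
        "skew_direction": skew,
        "range": max(scores) - min(scores) + 1,    # notes' "+ 1" convention
        "variance": pvariance(scores),             # N in the denominator
        "sd": pstdev(scores),                      # square root of the variance
    }


# Made-up happiness ratings, for illustration only.
ratings = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(ratings)
print(summary)
```

For sample data, `statistics.variance` and `statistics.stdev` apply the N - 1 denominator mentioned above instead.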
SAMPLING DISTRIBUTIONS
o The mean or any other statistic we obtain for a random sample is not likely to equal exactly the population mean or, more generally, the population parameter
o Sampling distribution - the hypothetical distribution of a statistic across all the random samples that could be drawn from a population
o The mean of the distribution of sample means is the actual population mean
o We can calculate a statistic called the standard error of the mean to indicate the degree to which the means of the samples vary from the population mean

INFERENTIAL STATISTICS
o A normal distribution is a distribution that results from chance variation around a mean. It is symmetric and tapers off in a characteristic shape from its mean. If a variable is normally distributed, 68% of the cases will be between plus and minus 1 standard deviation from the distribution's mean, and 95% of the cases will lie between plus and minus 1.96 standard deviations from the mean
o This correspondence of the standard deviation to the normal distribution enables us to infer how confident we can be that the mean (or some other statistic) of a population sampled randomly is within a certain range of the sample mean
o Once the standard deviation is known, 4 more steps give the confidence limits around the mean:
  1. Calculate the standard error (SE). This is the estimated value of the standard deviation of the sampling distribution from which your sample was selected: $SE = \frac{s}{\sqrt{N}}$, where $s$ is the sample standard deviation (N - 1 denominator) and $N$ is the number of cases
  2. Decide on the degree of confidence that you wish to have that the population parameter falls within the confidence interval you compute. It is conventional to calculate the 95%, 99%, or even the 99.9% confidence limits around the mean. Most often, the 95% confidence limits are used, so we will just show the calculation for this estimate.
  3. Multiply the value of the SE by 1.96. This is because 95% of the area under the normal curve falls within ± 1.96 standard deviation units of the mean.
  4. Add the number in step 3 to the sample mean and subtract it from the sample mean. The resulting numbers are the upper and lower confidence limits.
o Use of inferential statistics: to estimate a population parameter from a sample statistic

Hypothesis Testing
o Inferential statistics are also used to test hypotheses
o The logic of hypothesis testing centers on two hypotheses: the null hypothesis and the research (or alternative) hypothesis
o In a test of the difference between two means, the null hypothesis states that the population me
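
Returning to the four confidence-limit steps under Inferential Statistics, a minimal Python sketch might look like the following. The function name `confidence_limits_95` and the `ratings` data are assumptions for illustration, not from the chapter. The sketch uses the sample standard deviation (N - 1 denominator) divided by the square root of N as the standard error, and the notes' 1.96 multiplier from the normal curve.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_limits_95(sample):
    """Return (lower, upper) 95% confidence limits around the sample mean."""
    n = len(sample)
    sample_mean = mean(sample)
    # Step 1: standard error = sample SD (N - 1 denominator) / sqrt(N)
    se = stdev(sample) / sqrt(n)
    # Steps 2-3: 95% confidence, so multiply the SE by 1.96
    margin = 1.96 * se
    # Step 4: add and subtract the margin from the sample mean
    return sample_mean - margin, sample_mean + margin


# Made-up happiness ratings, for illustration only.
ratings = [2, 4, 4, 4, 5, 5, 7, 9]
lower, upper = confidence_limits_95(ratings)
print(f"95% confidence limits: {lower:.2f} to {upper:.2f}")
```

The 1.96 multiplier follows the notes and assumes the normal curve is a reasonable approximation. With a sample as small as this one, a multiplier from the t distribution is commonly used instead.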