
# Chapter 9.docx


**School:** Western University
**Department:** Statistical Sciences
**Course:** Statistical Sciences 2244A/B
**Professor:** Jennifer Waugh
**Semester:** Spring

## 9.1 Overview

- This chapter introduces important methods for making inferences based on sample data that come in pairs.
- The objective is to determine whether there is an association between two variables and, if such an association exists, to describe it with an equation that can be used for predictions.

## 9.2 Correlation

- The main objective of this section is to analyze a collection of paired sample data and determine whether there appears to be an association between the two variables.
  - We refer to such an association as a correlation.

### Part 1: Basic Concepts of Correlation

- A correlation exists between two variables when one of them is related to the other in some way.

**Exploring the Data**

- We should always begin an investigation of the association between two variables by constructing a graph called a scatterplot (or scatter diagram).
- A scatterplot is a graph in which the paired (x, y) sample data are plotted with a horizontal x-axis and a vertical y-axis; each individual (x, y) pair is plotted as a single point.
- When examining a scatterplot, study the overall pattern of the plotted points; note its direction and whether there are any outliers.

**Linear Correlation Coefficient**

- Because visual examination of scatterplots is largely subjective, we need more objective measures; we use the linear correlation coefficient r, which is useful for detecting straight-line patterns.
- The linear correlation coefficient r measures the strength of the linear association between the paired x and y quantitative values in a sample.
- Because r is calculated using sample data, it is a sample statistic used to measure the strength of the linear correlation between x and y.
  - If we had every pair of population values for x and y, the result would be a population parameter, ρ.

**Rounding the Linear Correlation Coefficient**

- Round r to three decimal places so that its value can be directly compared to the critical values in Table A-6.

**Interpreting the Linear Correlation Coefficient**

- Given the way the formula for r is constructed, the value of r must always fall between -1 and +1 inclusive.
  - If r is close to 0, we conclude that there is no significant linear correlation between x and y.
  - If r is close to -1 or +1, we conclude that there is a significant linear correlation between x and y.
- When there really is no linear correlation between x and y, Table A-6 lists values that are critical in this sense: they separate usual values of r from those that are unusual.
- Properties of the linear correlation coefficient r:
  - The value of r is always between -1 and +1 inclusive.
  - The value of r does not change if all values of either variable are converted to a different scale.
  - The value of r is not affected by the choice of x or y; interchange all x and y values and the value of r will not change.
  - r measures the strength of a linear association; it is not designed to measure the strength of an association that is not linear.

**Interpreting r: Explained Variation**

- If we conclude that there is a significant linear correlation between x and y, we can find a linear equation that expresses y in terms of x, and that equation can be used to predict values of y for given values of x.
- In Section 9-3 we describe a procedure for finding such equations and for predicting values of y when given x.
- However, a predicted value of y will not necessarily be the exact result, because in addition to x there are other factors affecting y, such as random variation and other characteristics not included in the study.

**Common Errors Involving Correlation**

- A common source of error is concluding that correlation implies causation.
  - A lurking variable is one that affects the variables being studied but is not included in the study.
- Another source of error arises with data based on averages.
  - Averages suppress individual variation and may inflate the correlation coefficient.
- A third source of error involves the property of linearity.
  - An association may exist between x and y even when there is no significant linear correlation.

### Part 2: Beyond the Basic Concepts of Correlation

**Formal Hypothesis Test**

- We present two methods for using a formal hypothesis test to determine whether there is a significant linear correlation between two variables.
- Method 1 uses the Student t distribution with a test statistic of the form t = r/s_r, where s_r denotes the sample standard deviation of r values.
- Generally, the hypothesis tests in this section are two-tailed, with null and alternative hypotheses as follows:
  - H₀: ρ = 0
  - H₁: ρ ≠ 0
- However, we will often see one-tailed tests with a claim of a positive linear correlation or a claim of a negative linear correlation:
  - H₀: ρ = 0
  - H₁: ρ < 0 or H₁: ρ > 0
- Given a collection of paired (x, y) data, the point (x̄, ȳ) is called the centroid.
- The statistic r is based on the sum of the products (x − x̄)(y − ȳ).
- In any scatterplot, vertical and horizontal lines through the centroid (x̄, ȳ) divide the diagram into four quadrants.
- If the points of the scatterplot tend to approximate an uphill line, individual values of the product (x − x̄)(y − ȳ) tend to be positive, because most of the points are found in the first and third quadrants.
- If the points of the scatterplot approximate a downhill line, most of the points are in the second and fourth quadrants, where (x − x̄) and (y − ȳ) are opposite in sign, so the sum of the products (x − x̄)(y − ȳ) is negative.
- Points that follow a nonlinear pattern tend to be scattered among the four quadrants, so the sum of (x − x̄)(y − ȳ) tends to be close to 0.
- Therefore, we can use the sum of (x − x̄)(y − ȳ) as a measure of how the points are arranged.

**Confidence Intervals**

- In preceding chapters we discussed methods of inferential statistics by addressing methods of hypothesis testing and methods for constructing confidence interval estimates.
- A similar procedure can be used to construct a confidence interval for ρ; however, it involves complicated transformations, so it is not covered here.

## 9.3 Regression

- In Section 9-2 we analyzed paired data with the goal of determining whether there is a significant linear correlation between two variables.
- The main objective of this section is to describe the association between two variables by finding the graph and equation of the straight line that represents the association.
- This straight line is called the regression line, and its equation is called the regression equation.

### Part 1: Basic Concepts of Regression

- The regression equation expresses an association between x (called the independent variable, predictor variable, or explanatory variable) and ŷ (called the dependent variable, or response variable).
- The typical equation of a straight line, y = mx + b, is expressed in the form ŷ = b₀ + b₁x,
  - where b₀ is the y-intercept and b₁ is the slope.
  - This notation shows that b₀ and b₁ are sample statistics used to estimate the population parameters β₀ and β₁.
- We will use paired sample data to estimate the regression equation.
- Once we have evaluated b₀ and b₁, we can identify the estimated regression equation.
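The sum-of-products idea from Section 9-2 leads directly to r: the sum of (x − x̄)(y − ȳ) is scaled by the spread of each variable so the result always lands between −1 and +1. A minimal sketch (the function name `linear_correlation` is my own, not from the text):

```python
import math

def linear_correlation(xs, ys):
    """Linear correlation coefficient r for paired sample data.

    Built from the sum of products (x - x_bar)(y - y_bar) around the
    centroid, scaled by the spread of x and y so that -1 <= r <= +1.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Note how this exhibits the listed properties: rescaling either variable (e.g. converting y to a different unit) multiplies both numerator and denominator by the same factor, so r is unchanged, and swapping x with y leaves the formula symmetric.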
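Method 1's test statistic t = r/s_r can be sketched as follows, assuming the usual textbook form s_r = √((1 − r²)/(n − 2)) with n − 2 degrees of freedom (the notes abbreviate this; the function name is mine):

```python
import math

def correlation_t_statistic(r, n):
    """Test statistic for H0: rho = 0 (Method 1, Student t distribution).

    t = r / s_r, where s_r = sqrt((1 - r^2) / (n - 2)); compare against
    a t critical value with n - 2 degrees of freedom.
    """
    s_r = math.sqrt((1 - r ** 2) / (n - 2))
    return r / s_r
```

For a two-tailed test, reject H₀: ρ = 0 when |t| exceeds the critical t value for n − 2 degrees of freedom at the chosen significance level.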
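The "complicated transformations" that the notes set aside for confidence intervals on ρ are typically done with Fisher's z transformation. A hedged sketch under that assumption (the function name and the 1.96 default for roughly 95% confidence are my choices, not from the text):

```python
import math

def rho_confidence_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for rho via Fisher's z transformation.

    z = atanh(r) is approximately normal with standard error
    1 / sqrt(n - 3); build the interval on the z scale, then map the
    endpoints back with tanh.
    """
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)
```

Transforming back with tanh keeps both endpoints inside (−1, +1), which a naive interval built directly on r would not guarantee.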
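Assuming the usual least-squares estimates that Section 9-3 goes on to develop, b₀ and b₁ in ŷ = b₀ + b₁x can be computed from paired sample data like so (a sketch; the function name is mine):

```python
def regression_line(xs, ys):
    """Least-squares estimates b0 (intercept) and b1 (slope) for
    the regression equation y-hat = b0 + b1 * x.

    b1 reuses the sum of products (x - x_bar)(y - y_bar) from the
    correlation section; the fitted line passes through the centroid.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    return b0, b1
```

Because b0 = ȳ − b1·x̄, substituting x = x̄ gives ŷ = ȳ: the estimated line always passes through the centroid (x̄, ȳ).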