Western University
Statistical Sciences
Statistical Sciences 1024A/B
Mary Millard

Chapter 1: Summary
• A data set contains information on a number of individuals. Individuals may be people, animals, or things. For each individual, the data give values for one or more variables. A variable describes some characteristic of an individual, such as a person's height, sex, or salary.
• Some variables are categorical and others are quantitative. A categorical variable places each individual into a category, such as male or female. A quantitative variable has numerical values that measure some characteristic of each individual, such as height in centimetres or salary in dollars.
• Exploratory data analysis uses graphs and numerical summaries to describe the variables in a data set and the relations among them.
• After you understand the background of your data (individuals, variables, units of measurement), the first thing to do is almost always to plot your data.
• The distribution of a variable describes what values the variable takes and how often it takes these values. Pie charts and bar graphs display the distribution of a categorical variable. Bar graphs can also compare any set of quantities measured in the same units. Histograms and stemplots graph the distribution of a quantitative variable.
• When examining any graph, look for an overall pattern and for notable deviations from the pattern.
• Shape, centre, and spread describe the overall pattern of the distribution of a quantitative variable. Some distributions have simple shapes, such as symmetric or skewed. Not all distributions have a simple overall shape, especially when there are few observations.
• Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.
• When observations on a variable are taken over time, make a time plot that graphs time horizontally and the values of the variable vertically. A time plot can reveal trends, cycles, or other changes over time.
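As a rough illustration of the Chapter 1 ideas, the sketch below tabulates the distribution of a categorical variable (counts and percents, the numbers behind a bar graph or pie chart) and builds a simple stemplot for a quantitative variable. The function names and both data sets are made up for this example, and the stemplot assumes non-negative whole numbers with the tens as stems and the ones digit as leaves.

```python
from collections import Counter, defaultdict

def categorical_distribution(values):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

def stemplot(values):
    """Return stemplot lines for a non-empty list of non-negative whole numbers.

    Assumption for this sketch: stem = value // 10, leaf = ones digit.
    Stems with no observations are still listed so gaps stay visible.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        lines.append(f"{stem:>2} | {''.join(str(d) for d in leaves[stem])}")
    return lines

# Hypothetical data, invented for illustration only.
eye_colour = ["brown", "blue", "brown", "green", "brown", "blue"]
heights_cm = [152, 158, 161, 163, 163, 170, 171, 175, 189]

for category, (count, percent) in categorical_distribution(eye_colour).items():
    print(f"{category}: {count} ({percent:.1f}%)")

for line in stemplot(heights_cm):
    print(line)
```

Reading the stemplot sideways gives the same picture as a histogram: most heights sit in the 160s and 170s, and 189 stands a little apart from the rest.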
Chapter 2: Summary
• A numerical summary of a distribution should report at least its centre and its spread or variability.
• The mean x-bar and the median M describe the centre of a distribution in different ways. The mean is the arithmetic average of the observations, and the median is the midpoint of the values.
• When you use the median to indicate the centre of the distribution, describe its spread by giving the quartiles. The first quartile Q1 has one-fourth of the observations below it, and the third quartile Q3 has three-fourths of the observations below it.
• The five-number summary, consisting of the median, the quartiles, and the smallest and largest individual observations, provides a quick overall description of the distribution. The median describes the centre, and the quartiles and extremes show the spread.
• Boxplots based on the five-number summary are useful for comparing several distributions. The box spans the quartiles and shows the spread of the central half of the distribution. The median is marked within the box. Lines extend from the box to the extremes and show the full spread of the data.
• The variance s^2 and especially its square root, the standard deviation s, are common measures of spread about the mean as centre. The standard deviation s is zero when there is no spread and gets larger as the spread increases.
• A resistant measure of any aspect of a distribution is relatively unaffected by changes in the numerical value of a small proportion of the total number of observations, no matter how large these changes are. The median and quartiles are resistant, but the mean and the standard deviation are not.
• The mean and standard deviation are good descriptions for symmetric distributions without outliers. They are most useful for the Normal distributions introduced in Chapter 3. The five-number summary is a better description for skewed distributions.
• Numerical summaries do not fully describe the shape of a distribution. Always plot your data.
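The Chapter 2 summaries above can be sketched in a few lines of Python. Two points here are assumptions of the sketch rather than statements from these notes: the quartiles are taken as the medians of the lower and upper halves of the sorted data (leaving out the overall median when the number of observations is odd), and the standard deviation s divides by n - 1. Other software may use slightly different quartile rules. The data values are invented.

```python
from math import sqrt

def median(sorted_xs):
    """Midpoint of an already sorted list."""
    n = len(sorted_xs)
    mid = n // 2
    if n % 2 == 1:
        return sorted_xs[mid]
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2

def five_number_summary(xs):
    """Return (min, Q1, M, Q3, max).

    Assumed quartile rule: Q1 and Q3 are the medians of the observations
    below and above the position of the overall median.
    """
    s = sorted(xs)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return (s[0], median(lower), median(s), median(upper), s[-1])

def mean(xs):
    """Arithmetic average x-bar."""
    return sum(xs) / len(xs)

def std_dev(xs):
    """Sample standard deviation s (assumed divisor: n - 1)."""
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

# Hypothetical data; the second list replaces the largest value with an outlier.
data = [2, 4, 4, 5, 7, 9, 11]
with_outlier = [2, 4, 4, 5, 7, 9, 110]

print(five_number_summary(data), mean(data), std_dev(data))
print(five_number_summary(with_outlier), mean(with_outlier), std_dev(with_outlier))
```

Comparing the two printed lines shows resistance in action: the median and quartiles are the same for both lists, while the mean and standard deviation jump once the outlier is present.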
• A statistical problem has a real-world setting. You can organize many problems using the four steps state, plan, solve, and conclude.
Chapter 3: Summary
• We can sometimes describe the overall pattern of a distribution by a density curve. A density curve has total area 1 underneath it. An area under a density curve gives the proportion of observations that fall in a range of values.
• A density curve is an idealized description of the overall pattern of a distribution that smooths out the irregularities in the actual data. We write the mean of a density curve as mu and the standard deviation of a density curve as sigma to distinguish them from the mean (x-bar) and standard deviation (s) of the actual data.
• The mean, the median, and the quartiles of a density curve can be located by eye. The mean is the balance point of the curve. The median divides the area under the curve in half. The quartiles and the median divide the area under the curve into quarters. The standard deviation sigma cannot be located by eye on most density curves.
• The mean and median are equal for symmetric density curves. The mean of a skewed curve is located farther toward the long tail than is the median.
• The Normal distributions are described by a special family of bell-shaped, symmetric density curves, called Normal curves. Mu and sigma completely specify a Normal distribution N(mu, sigma). The mean is the centre of the curve, and sigma is the distance from mu to the change-of-curvature points on either side.
• To standardize any observation x, subtract the mean of the distribution and then divide by the standard deviation. The resulting z-score, z = (x - mu)/sigma, says how many standard deviations x lies from the distribution mean.
• All Normal distributions are the same when measurements are transformed to the standardized scale. In particular, all Normal distributions satisfy the 68-95-99.7 rule, which describes what percent of observations lie within one, two, and three standard deviations of the mean.
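Standardizing and the 68-95-99.7 rule can be checked numerically. The sketch below is only an illustration: it computes standard Normal cumulative proportions with the error function `erf` from Python's `math` module, which is a choice made for this example and not the method used in these notes, and the N(170, 10) distribution of heights is hypothetical.

```python
from math import erf, sqrt

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

def standard_normal_cdf(z):
    """Proportion of standard Normal observations that are less than z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def proportion_within(k):
    """Proportion of a Normal distribution within k standard deviations of the mean."""
    return standard_normal_cdf(k) - standard_normal_cdf(-k)

# Hypothetical example: heights following N(mu=170, sigma=10), in centimetres.
z = z_score(185, mu=170, sigma=10)
print(z)                        # 1.5 standard deviations above the mean
print(standard_normal_cdf(z))   # proportion of heights below 185 cm

# Numerical check of the 68-95-99.7 rule.
for k in (1, 2, 3):
    print(k, round(100 * proportion_within(k), 1))
```

Because the calculation uses only z, the same proportions apply to any Normal distribution once its observations are standardized.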
• If x has the N(mu, sigma) distribution, then the standardized variable z = (x - mu)/sigma has the standard Normal distribution N(0, 1) with mean 0 and standard deviation 1. Table A gives the cumulative proportions of standard Normal observations that are less than z for many values of z. By standardizing, we can use Table A for any Normal distribution.
Chapter 4: Summary
• To study relationships between variables, we must measure the variables on the same group of individuals.
• If we think that a variable x may explain or even cause changes in another variable y, we call x an explanatory variable and y a response variable.
• A scatterplot displays the relationship between two quantitative variables measured on the same individuals. Mark values of one variable on the horizontal axis (x axis) and values of the other variable on the vertical axis (y axis). Plot each individual's data as a point on the graph. Always plot the explanatory variable, if there is one, on the x axis of a scatterplot.
• Plot points with different colours or symbols to see the effect of a categorical variable in a scatterplot.
• In examining a scatterplot, look for an overall pattern showing the direction, form, and strength of the relationship, and then for outliers or other deviations from this pattern.
• Direction: If the relationship has a clear direction, we speak of either positive association (high values of the two variables tend to occur together) or negative association (high values of one variable tend to occur with low values of the other variable).
• Form: Linear relationships, where the points show a straight-line pattern, are an important form of relationship between two variables. Curved relationships and clusters are other forms to watch for.
• Strength: The strength of a relationship is determined by how close the points in the scatterplot lie to a simple form such as a line.
• The correlation r measures the direction and strength of the linear association between two quantitative variables x and y.
Although you can calculate a correlation for any scatterplot, r measures only straight-line relationships.
• Correlation indicates the direction of a linear relationship by its sign: r > 0 for a positive association and r < 0 for a negative association. Correlation always satisfies -1 ≤ r ≤ +1 and indicates the strength of a relationship by how close it is to -1 or +1. Perfect correlation, r = ±1, occurs only when the points on a scatterplot lie exactly on a straight line.
• Correlation ignores the distinction between explanatory and response variables. The value of r is not affected by changes in the unit of measurement of either variable. Correlation is not resistant, so outliers can greatly change the value of r.
Chapter 5: Summary
• A