Stat2507 Lecture Notes

STAT 2507
Stat 2507 Notes

Masoud Nasari, mmnasari@math.carleton.ca
HP5250, 12pm Mondays

5 x Assignments, 5% each
Midterm 25% (good practice for final)
Final 50% (closed book, cumulative)

Assignments:
  computer component (software: MINITAB)
  written answer component
  Due dates: not yet determined

What is the definition of Statistics?
  Branch of mathematics with applications in almost every facet of our lives.
  Making educated decisions in the presence of uncertainty is what statistics does for us.

Variable: a characteristic that changes/varies over time and/or for different individuals or objects of consideration.
  E.g. height of a person at this moment.

Experimental unit: individual or object on which the variable is measured.

Data value: a single measurement/value of the variable, taken from a single measurement of one experimental unit.

Population: is perfect knowledge, the entire target group.
  Sets definitional boundaries, sets full area of study.
  It is seldom realistic to test or get data from the entire population.
  Set of all measurements of interest.
  E.g. the height of all Canadians is a population:
    each Canadian is an experimental unit
    each height is a data value
    the entire set is the population

Sample: a (hopefully) representative subset of the population; something that can be compared to the population. Approximation / best guess.

Univariate data: results when a single variable is measured on an experimental unit, e.g. height.

Bivariate/multivariate data: when two or more variables are measured over an experimental unit, e.g. measuring both height and gender, or both weight and blood pressure. Often related or dependent.

Interval between a and b, (a, b): includes all values BETWEEN, BUT NOT INCLUDING, a and b.

Quantitative variables (numbers), two types:
  1. DISCRETE variable: assumes a finite or countable number of values
     a. e.g. number of Canadians that will buy a car in 2014 (min = 0, max = 34 x 10^6)
     b. e.g. number of times to flip a coin to get heads (infinite but countable)
  2. CONTINUOUS variable: assumes an infinite (uncountable) # of values (interval, ratio scales)
     a. e.g. a person's height or weight (> 0); in general,
        DISTANCE-related or TIME-related variables are CONTINUOUS.

Descriptive stats: describes the data.
Descriptive statistical tools:
  Graphs
    1. Pie charts
    2. Line charts
    3. Stem and leaf charts
    4. Histogram
    5. Box plot
  Numerical descriptions
    1. Mean
    2. Median
    3. Variance
    4. Std. deviation
    5. Range
    6. Std. err.
Inferential statistics (to be covered later): used to make predictions or inferences from the data.

Qualitative variables (nominal)

What two types of information should a graph provide?
  1. The measured values of the variable of interest
  2. How often the values occurred

Pie charts and bar charts for qualitative/nominal variables
  Use colour to distinguish different qualitative variables, e.g. measure of party support in student population.
  To produce a pie or bar chart:
  1. F = Frequency = # of times each value is observed
     a. F = frequency of a given data point, e.g. number of times a shoe size of 10 is measured
  2. RF = Relative Frequency = F / total number of observations (n)
     e.g. size 10 observed 6 times, total number of observations = 60: RF = 6/60 = 1/10
  3. % = RF x 100, i.e. 10%
  4. Angle = RF x 360 degrees

Pie and bar charts for quantitative variables
  Same idea as for qualitative.
  E.g. F = avg annual income for a category = x
       RF = x / total of all annual incomes

Line charts for time series data
  Allow you to discern/identify a pattern or trend that will most likely continue to hold into the immediate future.
  X axis = time
  Y axis = data values

Stem and Leaf Plots
  1. Divide each measurement into two parts: part on left is stem, part on right is leaf
  2. List the stems in a column from smallest to largest
  3. Record the leaves for each stem
  4. Order the leaves from lowest to highest; include LEAF UNITS
  E.g. The following 15 numbers represent the shoe size of 15 people:
    34 31 30
    38 30 41
    42 36 43
    40 37 46
    45 41 39
  3 and 4 become the stems (30 and 40):
    3 | 0 0 1 4 6 7 8 9
    4 | 0 1 1 2 3 5 6
  Unit of leaf = 1, unit of stem = 10

Frequency Histograms
  1. Choose a number of bins between 5 and 12
     a. To calculate the number of bins to use, take the square root of the sample size
     b. ALWAYS
        ROUND UP, e.g. for a sample size of 37 the total number of bins should be 7
  2. Range of X axis = largest measurement - smallest measurement
  3. Width of each bin = Range / # of bins; ALWAYS ROUND UP, e.g. if width calc = 1.65, round up to 1.7 or 2.0
  4. If the measurements are discrete and the sample size is small (i.e. sample size of 12 or less), then each distinct value can be taken as a bin
  5. Identify the boundaries of bins
     a. First boundary of the first bin = the smallest measurement
     b. Second boundary of first bin = smallest measurement + width of a bin
     c. First boundary of 2nd bin = 2nd boundary of first bin; each later boundary = previous boundary + width of a bin
  6. Determine the total number of samples that reside in each bin (frequency). Construct a statistical table based on the frequency of each bin. Sum of frequencies should = total sample size n
  7. Plot the RF for each bin

  Bin    Bin Boundary                Frequency  RF
  1      Boundary 1 to Boundary 2    F1         F1/n
  2      Boundary 2 to Boundary 3    F2         F2/n
  3      Boundary 3 to Boundary 4    F3         F3/n
  Total                              n          1

  E.g. Number of liters of milk purchased by 25 households (Quantitative, Discrete)
  Liters of milk:
    0 3 5 4 3
    2 1 3 1 2
    1 1 2 0 1
    4 3 2 2 2
    2 2 2 3 4
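
The frequency table described in steps 6 and 7 can be sketched in a few lines of Python for the milk example. This is only an illustration, not part of the course material (the course software is MINITAB). It assumes the 25 values read as a 5 x 5 grid of single-digit numbers, and it takes each distinct value as its own bin since the data are discrete with few distinct values. The names `milk`, `suggested_bins` and `frequency_table` are made up for this sketch.

```python
from collections import Counter
from math import ceil, sqrt

# Liters of milk for 25 households, as read from the example above
# (assumption: the listing is a 5 x 5 grid of single-digit values).
milk = [0, 3, 5, 4, 3,
        2, 1, 3, 1, 2,
        1, 1, 2, 0, 1,
        4, 3, 2, 2, 2,
        2, 2, 2, 3, 4]


def suggested_bins(n):
    """Square-root rule from step 1: sqrt(sample size), always rounded up."""
    return ceil(sqrt(n))


def frequency_table(data):
    """Return rows (value, F, RF), using one bin per distinct value."""
    n = len(data)
    counts = Counter(data)
    return [(value, counts[value], counts[value] / n) for value in sorted(counts)]


table = frequency_table(milk)

print("Liters    F     RF")
for value, f, rf in table:
    print(f"{value:6d} {f:4d} {rf:6.2f}")

total_f = sum(f for _, f, _ in table)
total_rf = sum(rf for _, _, rf in table)
print(f" Total {total_f:4d} {total_rf:6.2f}")
```

The Total row is a quick check on the table: the frequencies should add up to n = 25 and the relative frequencies to 1. Calling `suggested_bins(37)` reproduces the notes' example of 7 bins for a sample of 37.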
