Lecture

Stats Notes Ch. 1-3 + R Tips.doc

13 Pages

Department
Mathematics & Statistics (Sci)
Course Code
MATH 203
Professor
Patrick Reynolds

Description
Descriptive statistics utilizes numerical and graphical methods to look for patterns in data sets. Inferential statistics utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data.

Experimental unit: an object about which we collect data.
Population: a set of units to study.
Variable: a characteristic or property of an individual unit.
Sample: a subset of the units of the population.
A measure of reliability is a statement (usually quantitative) about the degree of uncertainty associated with a statistical inference.

Four Elements of Descriptive Statistical Problems:
1. Population or sample of interest
2. One or more variables that are to be investigated
3. Tables, graphs, or numerical summary tools
4. Identification of patterns in the data

Five Elements of Inferential Statistical Problems:
1. Population of interest
2. One or more variables that are to be investigated
3. The sample of population units
4. The inference about the population based on information contained in the sample
5. A measure of the reliability of the inference

Quantitative and Qualitative:
Quantitative data are measurements recorded on a numerical scale.
Qualitative (or categorical) data are measurements that cannot be measured on a natural numerical scale; they can only be classified into one of a group of categories.

A designed experiment is a data collection method where the researcher exerts full control over the characteristics of the experimental units sampled. These experiments typically involve a group of experimental units that are assigned the treatment and an untreated (or control) group. (There can also be two different treatment groups.)

An observational study is a data collection method where the experimental units sampled are observed in their natural setting. No attempt is made to control the characteristics of the experimental units sampled. (E.g.
surveys)

If we wish to infer something from sample data, the sample should be a representative sample: a sample that exhibits characteristics typical of those possessed by the target population.

How do we get a representative sample? A random sample of n experimental units is a sample selected from the population in such a way that every different sample of size n has an equal chance of selection.

Types of Error
Selection bias results when a subset of the experimental units in the population is excluded so that these units have no chance of being selected in the sample.
Nonresponse bias results when the researchers conducting a survey or study are unable to obtain data on all experimental units selected for the sample.
Measurement error refers to inaccuracies in the values of the data recorded. In surveys, this kind of error may be due to ambiguous or leading questions and the interviewer's effect on the respondent.

A class is one of the categories into which qualitative data can be classified.
The class frequency is the number of observations in the data set that fall into a particular class.
The class relative frequency is the class frequency divided by the total number of observations in the data set.
The class percentage is the class relative frequency multiplied by 100.

Ways to represent qualitative data:
- Table
- Bar graph
- Pie chart
- Pareto diagram: bar graph with the classes in decreasing order of frequency.

Ways to represent quantitative data:
- Table
- A stem-and-leaf display presents data in a convenient format. We'll take the stem to be the portion of the value left of the decimal point, and the rest (to the right of the decimal point) is called the leaf. The stems are listed in one column, and the leaf for each observation in another column. E.g.
A data set of these numbers: 31.5, 31.7, 33.6, 35.0, 37.1, 37.2, 37.8 would be shown as (| represents the decimal point):

31 | 57
32 |
33 | 6
34 |
35 | 0
36 |
37 | 128

**For R: DATASETNAME ...

Chebyshev's Rule: For any number k > 1, at least (1 - 1/k^2) of the measurements will fall within k standard deviations of the mean.

The Empirical Rule: For data sets with frequency distributions that are mound-shaped and symmetric (like a bell curve, so the mean, median and mode are roughly the same):
- Approximately 68% of the measurements will fall within 1 standard deviation of the mean.
- Approximately 95% of the measurements will fall within 2 standard deviations of the mean.
- Approximately 99.7% of the measurements will fall within 3 standard deviations of the mean.

Example: Rats run through a maze. Thirty times are recorded and stored in a file, RATMAZE. We wish to determine what percentage of measurements fall within xbar +/- s and xbar +/- 2s.

In R:
> ratmaze <- read.csv("RATMAZE.csv")
> ratmaze
   RUNTIME
1     1.97
2     1.74
3     3.77
4     0.60
5     2.75
6     5.36
7     4.02
8     3.81
9     1.06
10    3.20
11    9.70
12    1.71
13    1.15
14    8.29
15    2.47
16    6.06
17    5.63
18    4.25
19    4.44
20    5.21
21    1.93
22    2.02
23    4.55
24    5.15
25    3.37
26    7.60
27    2.06
28    3.65
29    3.16
30    1.65
> str(ratmaze)
'data.frame':	30 obs. of  1 variable:
 $ RUNTIME: num  1.97 1.74 3.77 0.6 2.75 5.36 4.02 3.81 1.06 3.2 ...
> mean(ratmaze$RUNTIME)
[1] 3.744333
> hist(ratmaze$RUNTIME)
> ratmaze$RUNTIME
[1] 1.97 1.74 3.77 0.60 2.75 5.36 4.02 3.81 1.06 3.20 9.70 1.71 1.15 8.29 2.47 6.06 5.63 4.25 4.44 5.21 1.93 2.02 4.55 5.15 3.37 7.60 2.06 3.65 3.16 1.65
> mean(ratmaze$RUNTIME)
[1] 3.744333
> # This is a comment.
> var(ratmaze$RUNTIME)
[1] 4.832287
> # That's how you can find the variance
> sd(ratmaze$RUNTIME)
[1] 2.198246
> # That's the standard deviation
> # To check that sd is the square root of the variance:
> sqrt(var(ratmaze$RUNTIME))
[1] 2.198246
> # 'Tis all good!
> # To avoid retyping all that, name your values:
> xbar <- mean(ratmaze$RUNTIME)
> s <- sd(ratmaze$RUNTIME)
> # So let's compute:
> xbar + sd
Error in xbar + sd : non-numeric argument to binary operator
> # oops (sd is the function; the stored value is s)
> xbar + s
[1] 5.94258
> xbar - s
[1] 1.546087
> # So now, we ask how many measurements (runtimes) fall within (xbar-s, xbar+s) = (1.55, 5.94)
> # Let's just see the measurements that are less than xbar+s
> ratmaze$RUNTIME[ratmaze$RUNTIME < xbar + s]
[1] 1.97 1.74 3.77 0.60 2.75 5.36 4.02 3.81 1.06 3.20 1.71 1.15 2.47 5.63 4.25 4.44 5.21 1.93 2.02 4.55 5.15 3.37 2.06 3.65 3.16 1.65

Frequency distributions: a graph of the frequency of measurements (e.g. a histogram).
Graphically: the median divides the graph into two equal areas; the mean is the balancing point (a little trickier to visualize).

***We can use this in R to verify the empirical rule:
> ratmaze$RUNTIME[ratmaze$RUNTIME < xbar + s & ratmaze$RUNTIME > xbar - s]
 [1] 1.97 1.74 3.77 2.75 5.36 4.02 3.81 3.20 1.71 2.47 5.63 4.25 4.44 5.21
[15] 1.93 2.02 4.55 5.15 3.37 2.06 3.65 3.16 1.65
> length(ratmaze$RUNTIME[ratmaze$RUNTIME < xbar + s & ratmaze$RUNTIME > xbar - s])
[1] 23
> 23/30
[1] 0.7666667
> # proportion of the values between xbar +/- s
> # ~77% of the values fall within one standard deviation of the mean. GREAT!
> length(ratmaze$RUNTIME[ratmaze$RUNTIME < xbar + 2*s & ratmaze$RUNTIME > xbar - 2*s])
[1] 28
> 28/30
[1] 0.9333333

Percentiles, Quartiles
For any set of n measurements (arranged in order), the pth percentile is a number such that p% of the measurements fall below it, and (100-p)% fall above it.
Quartiles partition the data set into 4 groups, each containing 25% of the measurements.
Lower quartile (Ql): the 25th percentile
Middle quartile (Qm or M): the 50th percentile, aka the median
Upper quartile (Qu): the 75th percentile

***In R:
> quantile(ratmaze$RUNTIME)
    0%    25%    50%    75%   100%
0.6000 1.9825 3.5100 5
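The one- and two-standard-deviation checks from the RATMAZE session above can be wrapped in a small function so that each interval does not have to be retyped. This is only a sketch under a few assumptions: the run times are typed in directly from the listing in these notes rather than read from RATMAZE.csv, and prop_within() is a hypothetical helper name, not a base R function. It sets the observed proportions beside Chebyshev's lower bound 1 - 1/k^2 and the Empirical Rule's reference values.

```r
# Run times copied from the RATMAZE listing in the notes
# (typed in here so the example runs without the CSV file).
runtime <- c(1.97, 1.74, 3.77, 0.60, 2.75, 5.36, 4.02, 3.81, 1.06, 3.20,
             9.70, 1.71, 1.15, 8.29, 2.47, 6.06, 5.63, 4.25, 4.44, 5.21,
             1.93, 2.02, 4.55, 5.15, 3.37, 7.60, 2.06, 3.65, 3.16, 1.65)

# prop_within() is a hypothetical helper (not part of base R):
# proportion of values strictly inside (xbar - k*s, xbar + k*s).
prop_within <- function(x, k) {
  xbar <- mean(x)
  s <- sd(x)
  mean(x > xbar - k * s & x < xbar + k * s)
}

k <- 1:3
observed <- sapply(k, function(kk) prop_within(runtime, kk))

# Chebyshev's lower bound; it is 0 at k = 1, so it only says something for k > 1.
chebyshev <- 1 - 1 / k^2

# Empirical Rule reference values for mound-shaped, symmetric data.
empirical <- c(0.68, 0.95, 0.997)

summary_table <- data.frame(k, observed, chebyshev, empirical)
print(summary_table)
```

The first two observed proportions should match the 23/30 and 28/30 found by hand in the session. Taking the mean of a logical vector gives the proportion of TRUE values, which avoids the separate length(...) and division steps.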