York University

# Statistics


Professor: hassanq
Semester: Fall

Description
Statistics is a way to get information from data. Statistics is a tool for creating new understanding from a set of numbers.

Descriptive statistics deals with methods of organizing, summarizing, and presenting data in a convenient and informative way. One form of descriptive statistics uses graphical techniques, which allow statistics practitioners to present data in ways that make it easy for the reader to extract useful information. Another form of descriptive statistics uses numerical techniques to summarize data. The mean and median are popular numerical techniques to describe the location of the data. The range, variance, and standard deviation measure the variability of the data. The actual method used depends on what information we would like to extract. Are we interested in…
• measure(s) of central location? and/or
• measure(s) of variability (dispersion)?

Inferential statistics is a body of methods used to draw conclusions or inferences about characteristics of populations based on sample data. E.g. exit polls, wherein a random sample of voters who exit the polling booth is asked for whom they voted. Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.

Population: the group of all items of interest to a statistics practitioner. Frequently very large; sometimes infinite. E.g. all 5 million Florida voters, per Example 12.5. Can you think of other examples?

Sample: a set of data drawn from the population. Potentially very large, but smaller than the population. E.g. a sample of 765 voters exit polled on election day.

Parameter: a descriptive measure of a population.

Statistic: a descriptive measure of a sample.

We use statistics to make inferences about parameters. Therefore, we can make an estimate, prediction, or decision about a population based on sample data.

The confidence level is the proportion of times that an estimating procedure will be correct. E.g.
a confidence level of 95% means that estimates based on this form of statistical inference will be correct 95% of the time.

When the purpose of the statistical inference is to draw a conclusion about a population, the significance level measures how frequently the conclusion will be wrong in the long run. E.g. a 5% significance level means that, in the long run, this type of conclusion will be wrong 5% of the time. If we use α (Greek letter “alpha”) to represent significance, then our confidence level is 1 - α.

A variable is some characteristic of a population or sample. E.g. student grades. Typically denoted with a capital letter: X, Y, Z…

The values of a variable are the range of possible values for that variable. E.g. student marks (0..100).

Data (singular: datum) are the observed values of a variable. E.g. student marks: {67, 74, 71, 83, 93, 55, 48}

Interval data
• Real numbers, e.g. heights, weights, prices, etc.
• Also referred to as quantitative or numerical.

Nominal data
• The values of nominal data are categories. E.g. responses to questions about marital status, coded as: Single = 1, Married = 2, Divorced = 3, Widowed = 4
• Nominal data are also called qualitative or categorical.

Ordinal data
• Ordinal data appear to be categorical in nature, but their values have an order, a ranking to them. E.g. college course rating system: poor = 1, fair = 2, good = 3, very good = 4, excellent = 5

Interval: Values are real numbers. All calculations are valid. Data may be treated as ordinal or nominal.
Ordinal: Values must represent the ranked order of the data. Calculations based on an ordering process are valid. Data may be treated as nominal but not as interval.
Nominal: Values are the arbitrary numbers that represent categories. Only calculations based on the frequencies of occurrence are valid. Data may not be treated as ordinal or interval.

We can summarize the data in a table that presents the categories and their counts, called a frequency distribution.
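As a rough illustration of a frequency distribution for nominal data, the sketch below counts how often each category occurs and also reports each count divided by the total number of observations. The coding (Single = 1, Married = 2, Divorced = 3, Widowed = 4) comes from the notes; the `responses` list and the function names are made up for this example.

```python
from collections import Counter

# Coding taken from the notes: Single = 1, Married = 2, Divorced = 3, Widowed = 4.
LABELS = {1: "Single", 2: "Married", 3: "Divorced", 4: "Widowed"}


def frequency_distribution(codes):
    """Return {category name: count} for a list of nominal codes."""
    counts = Counter(codes)
    return {LABELS[code]: counts.get(code, 0) for code in sorted(LABELS)}


def relative_frequency_distribution(codes):
    """Return {category name: proportion}, i.e. count / total observations."""
    freq = frequency_distribution(codes)
    total = sum(freq.values())
    return {category: count / total for category, count in freq.items()}


# Hypothetical survey responses, invented purely for illustration.
responses = [1, 2, 2, 3, 1, 2, 4, 2, 1, 2]

freq = frequency_distribution(responses)
rel = relative_frequency_distribution(responses)

for category in freq:
    print(f"{category:<9} count={freq[category]:>2}  proportion={rel[category]:.2f}")
```

Only counting is done on the codes, which matches the rule that calculations on nominal data are limited to frequencies of occurrence; averaging the codes 1 to 4 would be meaningless.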
A relative frequency distribution lists the categories and the proportion with which each occurs (each category's count divided by the total count).

Techniques applied to single sets of data are called univariate. Bivariate techniques depict the relationship between two variables. A cross-classification table is used to describe the relationship between two nominal variables. If the two variables are unrelated, the patterns exhibited in the bar charts should be approximately the same. If some relationship exists, then some bar charts will differ from others.

Histogram
1) Collect the data.
2) Create a frequency distribution for the data. How?
a) Determine the number of classes to use: number of classes = 1 + 3.3 log(n), where log is the base-10 logarithm and n is the number of observations.
b) Determine how large to make each class. How? Look at the range of the data, that is,
Range = Largest Observation - Smallest Observation
Range = \$119.63 - \$0 = \$119.63
Then each class width becomes: Range ÷ (# classes) = 119.63 ÷ 8 ≈ 15

Symmetry: A histogram is said to be symmetric if, when we draw a vertical line down the center of the histogram, the two sides are identical in shape and size.

Skewness: A skewed histogram is one with a long tail extending to either the right or the left.

Modality: A unimodal histogram is one with a single peak, while a bimodal histogram is one with two peaks.

Bell shape: A special type of symmetric unimodal histogram is one that is bell shaped.

We create an ogive in three steps…
1) Calculate relative frequencies: Relative Frequency = (# of observations in a class) ÷ (total # of observations)
2) Calculate cumulative relative frequencies by adding the current class’ relative frequency to the previous class’ cumulative relative frequency.
3) Graph the cumulative relative frequencies.

Observations measured at the same point in time are called cross-sectional data. Observations measured at successive points in time are called time-series data.
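The histogram and ogive arithmetic described earlier can be sketched in a few lines. This is only a sketch: the function names, the sample size of 100, and the class counts are assumptions made for illustration, while the range of \$119.63 and the 8 classes come from the notes.

```python
import math


def suggested_classes(n):
    """Suggested number of classes: 1 + 3.3 * log10(n)."""
    return 1 + 3.3 * math.log10(n)


def class_width(largest, smallest, num_classes):
    """Class width = range / number of classes (round up to a convenient value)."""
    return (largest - smallest) / num_classes


def cumulative_relative_frequencies(class_counts):
    """Ogive heights: running sum of each class' relative frequency."""
    total = sum(class_counts)
    heights = []
    running = 0.0
    for count in class_counts:
        running += count / total  # add this class' relative frequency
        heights.append(running)
    return heights


# Hypothetical sample size, chosen only to show the formula.
print(round(suggested_classes(100), 1))

# Figures from the notes: range of 119.63 split into 8 classes.
print(round(class_width(119.63, 0.0, 8), 2))

# Hypothetical class counts for a five-class frequency distribution.
counts = [4, 6, 5, 3, 2]
print([round(h, 2) for h in cumulative_relative_frequencies(counts)])
```

The formula result is a guide rather than a rule, so it is normally rounded to a whole number of classes, and the computed width (about 14.95 here) is rounded to a convenient value such as 15. The last cumulative relative frequency always reaches 1, which is a quick check on the table before graphing the ogive.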
Time-series data are often graphed on a line chart, which plots the value of the variable on the vertical axis against the time periods on the horizontal axis.

To explore the relationship between two variables, we employ a scatter diagram, which plots the two variables against one another. The independent variable is labeled X and is usually placed on the horizontal axis, while the other, dependent variable, Y, is mapped to the vertical axis.

Factors That Identify When to Use Frequency and Relative Frequency Tables, Bar and Pie Charts
1. Objective: Describe a single set of data.
2. Data type: Nominal

Factors That Identify When to Use a Histogram, Ogive, or Stem-and-Leaf Display
1. Objective: Describe a single set of data.
2. Data type: Interval

Factors That Identify When to Use a Cross-classification Table