LS 280 All Chapter and lecture notes.pdf

29 Pages

Department
Legal Studies
Course
LS 221
Professor
Owen Gallupe
Semester
Winter

Description
LS280 - Chapter 1 2014-02-07 1:47 PM

Statistics - the science and practice of developing human knowledge through the use of empirical data.
• Based on statistical theory, which uses probability theory to estimate population values.

Mathematical Statistics - the study of how to create statistical methods using mathematical principles.

Applied Statistics - the practice of developing knowledge by using statistical methods to analyze data and make inferences about the population from which the data came.

Conducting Research in the Social World
• First, the theory that underlies the phenomenon is considered
• Next, the researcher develops the research question regarding the situation of interest
• Hypotheses are developed to test the theory in light of the research question
• Following hypothesis development, research design involves determining things such as how concepts are measured, the type of data needed, how the data will be collected, and the required sample size
• After data is collected, data analysis is conducted and hypotheses are tested
• The researcher draws conclusions about his/her hypotheses and research question
• When quantitative research methods are used, there is a tendency to consider statistics only in the data analysis stage.
• Statistics should be considered from the research question development stage through to the results and conclusion stage.

Population versus the Sample
• Population is the total number of individuals, objects or items you are interested in
• Sample is a subset of the population
• Representative of the population: when we randomly select individuals from a population, each person has an equal chance of being in the sample, so the sample tends to represent the population.
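As a minimal sketch of the random selection described above, the Python snippet below draws a simple random sample in which every member of the population has an equal chance of being chosen. The population of 1,000 ID numbers, the sample size of 50, and the seed are illustrative assumptions, not values from the course.

```python
import random

# Hypothetical population: 1,000 individuals identified by ID number.
population = list(range(1, 1001))

# Fixed seed so the illustration is repeatable (an assumption for the demo).
rng = random.Random(280)

# sample() draws without replacement, and every individual is equally
# likely to be selected: a simple random sample.
sample = rng.sample(population, k=50)

print(len(sample))                     # size of the sample
print(set(sample) <= set(population))  # the sample is a subset of the population
```

Because the draw is without replacement, no individual can appear in the sample twice.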
Descriptive Statistics versus Inferential Statistics
• Descriptive - describe a phenomenon about society or people
  o Explains how frequently something of interest occurs in the observations you make
  o Ex: how many books the average Canadian reads per year
  o Describes how often things occur
• Inferential - interested in knowing if there is a relationship
  o Generalizes the results found in a sample to the entire population of interest
  o Ex: overall mental health decreases as the number of alcoholic drinks consumed per week increases

Ethics and Statistics
• We must present our data and findings in an honest and professional manner
• Four major areas of responsibility, including:
  o Responsibility to society
  o Responsibility to employers and clients
  o Responsibility to other statistical practitioners

• Variable - a phenomenon of interest that can take on different values and can be measured

Independent versus Dependent Variables
• Dependent variable - changes as a result of a change in an independent variable
• Independent variable - the variable that is hypothesized or suggested to influence the dependent variable

Nominal Level of Measurement
• Nominal - the level of measurement when the difference within the variable is just a name or a symbol.
  o Ex: bus #101 and bus #202 - qualitative categories
• Respondent - the person being observed in the research study; one who completes a survey

Ordinal Level of Measurement
• Ordinal - when the answers to the qualitative categories or attributes have some order to them
  o Looking at the answers, we can see the order of the respondent's preferences
  o Ex: three runners in a race, or medals - the rank of the medals shows only the order of performance.

Interval Level of Measurement
• Interval and ratio variables are usually treated the same way in analysis; they differ only in whether the variable has an absolute zero.
• Interval refers to the space between things: there is an equal difference between the levels of the variable, but the variable itself does not have an absolute 0 value.
Ratio Level of Measurement
• When the intervals between the values are equal and comparable and the zero value means the absence of something.
• Ex: age is a ratio level variable.

LS280 - Chapter 2

Empirical data is gathered from objects or participants of a research study. Examples include data on cultural norms, stock fluctuations, and gender differences in academic performance. Social science disciplines gather some form of empirical data from the real world. After it is gathered, the data needs to be organized and presented in a manner that provides summary information about the phenomena of interest.

Measurement and Coding
• Nominal - Gender (Male = 1, Female = 2)
• Ordinal - Age (20-25 = 1, 26-35 = 2)
• Interval - Satisfied with life (Strongly agree = 1, Disagree = 2)
• Ratio - Internet hours (actual number of hours)
• When we assess a variable with a specific level of measurement, we say that it produces a specific type of data

Frequency Distributions and Tables
• Nominal and ordinal variables have qualitative categories as potential values, whereas interval and ratio variables have quantitative values.
• Given that a variable can have different values, we can count its frequency - the number of observations of a specific value within a variable.
• Example: gender has two possible categories, male and female, so if there are 46 males and 54 females, there are two values with different frequencies.
• Frequency distribution is the summary of the values of a variable based on the frequencies with which they occur. We look at how the values of the variable are distributed across all of the cases in the data
• Relative frequency is a comparative measure of the proportion of observed values to the total number of responses within a variable. It provides a proportion or fraction of one occurrence relative to other occurrences.
  o Equation: relative frequency = f / n, where f = frequency of a specific response and n = total number of responses
• Percentage frequency also provides a useful way of displaying the frequency of data. It is expressed as a percentage value and is calculated as:
  o % frequency = (f / n) x 100, where f = frequency of responses and n = total number of responses within the variable
• Relative frequencies are written as a decimal
• Cumulative percentage frequency gives the percentage of observations up to the end of a specific value.

Simple Frequency Tables for Nominal and Ordinal Data
• A simple frequency table displays the frequency distribution of one variable at a time - nominal, ordinal, interval, or ratio
• List the possible values the variable can have in one column
• Record the number of times that each value occurs in another column
• Frequency tables are easy to create for nominal and ordinal variables because there is a limited range of potential values

Simple Frequency Tables for Interval and Ratio Data
• When there are too many values on which to report frequencies, the table becomes less useful as a device to communicate summary information about the data
• Use class intervals to create frequency tables for interval and ratio variables that have a large range of potential values
• Class intervals - values that are combined into a single group for a frequency table. Each has a class width (its range) and starting and ending values, which are the class limits.
• Class intervals must be exhaustive, covering the entire range of the data, and mutually exclusive, meaning that an observed value can be placed into only one class interval.
• Step 1: Determine the range of the data (largest value - smallest value)
• Step 2: Determine the width and number of class intervals
  o Divide the range of the data by the number of class intervals
  o Intervals should be of equal width
• Step 3: Determine class boundaries
  o With continuous data, gaps between class limits cause problems when values fall between them.
  o To calculate the class boundaries, subtract 0.50 from the lower class limit and add 0.50 to the upper class limit of each interval.
  o Unlike class limits, adjacent boundaries have no gap separating them because they are continuous
• Step 4: Determine each class interval midpoint
  o The midpoint is the average value of the class interval - often used as a rough estimate of the average case in each interval.
  o Add the lower and upper limits and divide by 2
• In putting the frequency table together: use the class intervals and record the number of observations that fall between the class limits of each interval

Cross-Tabulations for Nominal, Ordinal, Interval and Ratio Data
• Cross-tabulations display a summary of the distribution of two or more variables.
• Cross-tabs allow you to observe how the frequency distribution of one variable relates to that of one or more other variables
• A cross-tab tabulates the frequencies by categories or class intervals of the variables being compared
• A cross-tab can include any combination of nominal, ordinal, interval and ratio variables

Comparing the Distribution of Frequencies

Percentage Change
• When data spans different time periods, we can calculate the change from one year to another and report this change as a percentage
• P = ((f2 - f1) / f1) x 100, where f1 = frequency of a specific response at time 1 and f2 = frequency of the same response at time 2

Ratios
• Comparison of two values of a variable based on their frequency
• Ratio = fv1 / fv2, where fv1 = frequency of the first value to be compared and fv2 = frequency of the second value to be compared

Rates
• Ratios are useful when the values being compared are in the same units
• When you need to compare values where other factors affect those values, rates are more useful.
• Rate = (number of events in the population of interest / total population of interest) x 10,000
  o Multiply by 10,000 to avoid small decimals (the rate is then per 10,000 people)

Graphing Data

Pie Charts and Bar Charts, for Nominal and Ordinal Data
• A pie chart displays the distribution of a variable out of 100 percent, where 100 percent represents the entire pie
• Frequency or percentage frequency may be used in constructing the chart
• Pie charts are useful for nominal and ordinal variables
• A bar chart displays the frequency of a variable with the variable categories along the x-axis and the variable frequencies on the y-axis

Frequency Polygon and Cumulative Percentage Frequency Polygon, for Interval and Ratio Level Data
• A frequency polygon is a line graph of the frequency distribution of interval or ratio data and is constructed by placing class intervals on the x-axis and frequencies on the y-axis.
• A frequency polygon can be used to compare the distribution of a variable across groups of respondents
  o Example: at what age did you begin drinking alcohol - compare the frequency distributions (male vs female) of the age at which respondents stated they began drinking alcohol.
• A cumulative frequency polygon graphs the cumulative frequency column in a frequency polygon - used for comparing the frequency of a variable across groups

Histograms
• A histogram is a plot of the frequency of interval or ratio data.
• Histograms are useful for representing interval and ratio variables because they show the continuous nature of the data without necessarily creating class intervals.
• Class intervals were constructed to reduce the variable from a ratio level to four classes.

Stem and Leaf Plots
• Provides a graphical representation of the frequency of interval or ratio data
• A drawback of histograms is that the actual values of the data are not shown; a stem and leaf plot keeps them visible
• In a stem and leaf plot, the shape formed by the leaves corresponds to the shape of the histogram

Boxplot
• Graphical summary of data based on percentiles
• The box represents the distribution of the data between the 25th and 75th percentiles
• The line in the middle of the box represents the median (50th percentile), and the lines coming out of the box extend to the lowest and highest values in the data, which provides the range

LS280 - Chapter 1 Synopsis

Chapter 1 Summary
• Background
• Definitions
• Three Branches of Statistics
  o Description - more prevalent in daily life.
  o Association - the degree to which we can calculate the extent to which patterns move together and are associated
  o Inference - the ability to take information based on smaller, convenient samples and generalize it to the entire population
• Populations vs. Samples
  o Sampling Error - an error or assumption of error (refers to things which are random or improbable). Randomness causes problems in interpretation.
    ▪ Assume it happens all the time
    ▪ Build an estimate into our calculations
• Hypotheses
  o Independent and dependent variables
    ▪ Independent variables INFLUENCE the dependent variables. Usually 'x'
    ▪ Dependent variables are outcome variables that we find interesting. Chosen to see what influences them.
    ▪ The independent variables explain the dependent variable
  o Scope conditions
• Units of Analysis - represents what we gather the data on
  o Individuals
  o Families
  o Households
  o Communities
  o Universities
  o Classes
  o Couples, etc.
• Measurement Levels
  o Nominal
  o Ordinal
  o Interval
  o Ratio

LS280 - Chapter 3

Measures of Central Tendency
• Which value lies in the middle of a distribution.
• Mode - the value that occurs with the greatest frequency.
  o When there is too much data, construct a frequency table in ascending order by frequency
  o One mode - unimodal
• Median - the middle point in the distribution, separating the upper and lower 50 percent of the values.
  o Numbers have to be put in ascending order
  o If there is an odd number of values, the median is the value at position (n+1)/2
  o If there is an even number of values, the middle point is the average of the two middle values, at positions n/2 and (n/2)+1
• Mean - the average score in the distribution
  o To obtain the mean score, add up all of the x scores and divide the total by the sample size (n).

When to use Mean, Median and Mode as Measures of Central Tendency
• Mode for nominal data
  o Categories in a nominal level of measurement are qualitative labels, so the differences within the variable are really just a name or symbol; they are not numeric.
  o The mode tells you which is the most commonly occurring qualitative category
• Median for ordinal data
  o Data for ordinal level variables have some order to them, so the value that sits at the center of the distribution is meaningful.
• Mean for interval/ratio data
  o The responses have an order to them (e.g., first, second, third) and there are equal intervals between the responses
  o However, if the data is denser on one end of the distribution than the other, the distribution is skewed, in which case the median is the better measure
  o The mean is sensitive to extreme values (also referred to as outliers) because it uses the actual values in its calculation; the median is not, because it depends only on the count and order of the values
  o Use mean and median

Measures of Dispersion
• Describe the variability in the data.
• Variability is the extent to which the data varies from its mean - it tells us how spread out the data is across the range of values
• Percentile
  o Based on the percentage frequency and cumulative percentage frequency columns from the frequency table
  o Percentiles indicate the percentage of scores that fall within a given area of the distribution.
• Calculating the percentile
  o Place the numbers in order from lowest to highest, then number them in order of their position.
  o To determine an unknown score for a known (or desired) percentile, multiply the desired percentile by the sample size; the result is the position of the score in the ordered list
  o Ex: to find the score for the 50th percentile, multiply the desired percentile (0.50) by the sample size
  o To determine the percentile for a known score, divide the position number of the known score by the sample size
• The Range
  o The range is the value of the largest observation minus the value of the smallest observation
  o The interquartile range divides the data into quartiles (at the 25th, 50th, and 75th percentiles) and removes the first and fourth quartiles from the calculation, so it considers only the middle 50 percent of the data (25th percentile to 75th percentile).
• Calculating Interquartile Range:
  o Calculate the 25th and 75th percentiles
  o Since the interquartile range only considers the values between the 25th and the 75th percentile, it can be obtained by subtracting the value at the 25th percentile from the value at the 75th percentile
• The Variance
  o The extent to which the data varies from its mean
  o If we subtract the mean from each observation, we get the deviation, meaning how far the observation deviates from the mean
  o Square each deviation value and sum them up
  o Divide the sum of squares of the deviations by the sample size minus 1 to get the average variability within the data
• Standard Deviation
  o The average amount by which the data scores vary (positively and negatively) from the mean; calculated as the square root of the variance.
  o The most commonly used measure of dispersion because it allows the researcher to report the deviation in the units of measurement used in the study
  o Higher standard deviation
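
The measures of dispersion above can be sketched in Python as follows. This is only an illustration: the eight scores are invented, and rounding a fractional percentile position up to the next whole position is an assumption, since the notes do not say how to handle that case.

```python
import math
import statistics

# Hypothetical scores, invented for illustration only.
scores = [4, 8, 2, 9, 5, 7, 4, 9]
ordered = sorted(scores)  # lowest to highest
n = len(ordered)

def score_at_percentile(p):
    """Score at percentile p (0 < p <= 1), using position = p * n.

    Rounding a fractional position up is an assumption made here.
    """
    position = math.ceil(p * n)   # 1-based position in the ordered list
    return ordered[position - 1]

# Range: largest observation minus smallest observation.
data_range = ordered[-1] - ordered[0]

# Interquartile range: 75th percentile value minus 25th percentile value.
iqr = score_at_percentile(0.75) - score_at_percentile(0.25)

# Variance: sum of squared deviations from the mean, divided by n - 1.
mean = sum(ordered) / n
deviations = [x - mean for x in ordered]
variance = sum(d ** 2 for d in deviations) / (n - 1)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

# Cross-check against the standard library's sample statistics.
assert math.isclose(variance, statistics.variance(scores))
assert math.isclose(std_dev, statistics.stdev(scores))

print(data_range, iqr)                              # 7 4
print(mean, round(variance, 3), round(std_dev, 3))  # 6.0 6.857 2.619
```

Dividing by n - 1 rather than n matches the sample variance described in the notes, which is also what `statistics.variance` computes.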