UOIT – Business

Stats - Midterm 1 notes


BUSI 1450U
William Goodman

Statistics Notes: Chapters 1 – 7.1

Chapter 1: Introduction to Statistical Data

1.1 Why Study Statistics?
- Statistics can be defined as the art of collecting, classifying, interpreting, and reporting numerical information related to a particular subject.
- Population → the total set of objects or measurements that are of interest to a decision maker.
- Descriptive statistics → focused on summarizing and presenting information, such as pie graphs or bar graphs comparing the populations of different provinces.
- Data set → the set of all observations for a given project or purpose.
- Inferential statistics → goes beyond the data set at hand; designed for making estimates, or inferences, about the characteristics of a population, based on information found in one or more samples.
- Samples → representative subsets of the population.

Two main types of problems call for inferential statistics:
1) Estimate a characteristic of a population, based on data from a sample.
   a. Example: A survey finds that 52% of questioned voters support a certain piece of legislation. Estimate the proportion of all voters that support the legislation.
2) Test a claim or assumption about a characteristic of a population, based on data from a sample.
   a. Example: A report claims that "a majority" of voters support a certain piece of legislation. A survey finds that 52% of questioned voters support that legislation.

1.2 Populations, Samples, and Inference
- Infinite population → it is not realistically possible to identify and count every member of the population (for example, cars for sale anywhere in the world, including on the internet).
- Finite population → it is possible and realistic to count every member of the population.
- Census → a set of observations taken from every member of a population.
  o Example: Statistics Canada takes a census of all households in Canada.
  o Possible only for a finite population.
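The first type of inference above (estimating a population proportion from a sample) can be sketched in Python. This is a minimal illustration, not part of the course material; the population size, sample size, and 52% support rate are made-up values:

```python
import random

random.seed(1)

# Hypothetical population of 10,000 voters; True = supports the legislation.
# The 52% rate mirrors the survey example above.
population = [random.random() < 0.52 for _ in range(10_000)]

# Draw a simple random sample and compute the sample statistic.
sample = random.sample(population, 400)
p_hat = sum(sample) / len(sample)

# The sample proportion is our estimate of the population parameter.
print(f"Sample proportion: {p_hat:.2%}")
```

With a sample of 400, the estimate typically lands within a few percentage points of the true 52%, which is the sense in which a statistic is "close to" the parameter.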
- Parameter → a value summarizing the measurements of a quantifiable feature of the population.
  o Example: if 9 of all 10 members of a population share a trait, the parameter is 9/10.
- Statistic → a value summarizing the measurements of a quantifiable feature of a sample. We estimate, or infer, that the sample statistic is "close to" or approximately equal to the population parameter.
  o Example: if 1 of 12 sampled members shares the trait, we estimate that roughly 8/100 of the population does.
- Several valid reasons for using samples:
  1) Time constraints: it may be too time-consuming to survey or observe all the members of the population. Several possible reasons for this include:
     a. Decision deadlines: sampling allows people to make their close/don't-close decisions for each situation in a timely fashion and to post necessary signs.
     b. Trends: if, for example, you are expanding your business internationally, a census would take so long that the population parameters would be changing even as you collect the data.
     c. Seasonality: some variables increase and decrease in value over time in a cyclical pattern.
  2) Cost constraints: it costs less to survey fewer people or make fewer observations than to survey or observe a whole population.
  3) Unknown population size: infinite populations are in this category.
  4) Destructive tests.
  5) Greater accuracy of some samples: a sample can possibly yield more accurate results than a census.
- A good sample should be representative of the population that it is drawn from.

1.3 Data Types and Levels of Measurement
- Datum → a single observation.
- Data → the raw materials for analysis; sets of numeric or nonnumeric facts that represent records of observations.
  o Constant → an observed characteristic whose value does not change over time or in different experiments.
  o Variable → an observed characteristic whose values can change from observation to observation.
    - Random variable → something that happens by chance or is unplanned, so its value cannot be known in advance of its discovery by observation or experiment.
    - Qualitative (categorical) random variable → does not result in a numerical value; observes a trait or characteristic that can be classified into one of a number of categories (e.g., eye colour).
    - Quantitative (numerical) random variable → takes values that vary in magnitude (e.g., number of hits at a ball game).
    - Discrete random variable → there are restraints on the values that could be observed (counting cars, you cannot have an in-between value; there cannot be 1.5 cars).
    - Continuous random variable → can possibly assume any value over a particular range of values (e.g., a person's height).
- If a measuring instrument is capable of precision only to 4 significant digits, then in practice there is a small gap between the measured values that could be observed.

Levels of Measurement
- The higher the level of measurement of data, the more information the data contain.
- Nominal-level measurement → the lowest level of measurement.
  o Qualitative data are generally measured at this level, unless the data values imply some kind of ranking.
  o Example: mines classified as surface or underground.
  o There is no order.
  o Two important rules must be observed:
    - The full set of categories should cover all possible outcomes.
    - All of the categories should be mutually exclusive.
- Ordinal-level measurement → the second level of measurement; conveys more information.
  o Values are both categorized and ranked.
  o Example: gold, silver, bronze.
  o The mathematical differences between rank numbers have no interpretation; only the relative order of the numbers conveys information.
- Interval-level measurement → these numbers can be compared meaningfully to distinguish higher from lower ranks.
  o Equal distances on the measurement scale correspond to equal differences in the real-world characteristic being measured.
  o Example: temperature.
  o Both the relative order of the numbers and the mathematical differences between the numbers convey information.
  o The ratios between numbers on these scales have no interpretation.
- Ratio-level measurement → the relative orders of the numbers, the mathematical differences between the numbers, and the ratios between numbers on the scale all apply meaningfully to the characteristic being measured.
  o Example: 20 grams, 40 grams, 60 grams.
  o This last property of meaningful ratios can apply only when the scale originates at a "true" zero.

Chapter 2: Obtaining the Data

2.1 Sources of Data
- Scientific approach → the search for data begins with the recognition of a problem to investigate:
  1) Determine the objectives of the research.
  2) Determine the sources of the data.
  3) Design the data-gathering instruments.
  4) Develop a sampling plan.
  5) Collect and analyze the data.
  6) Report and follow up on the findings.

Types of Research
1) Exploratory research → preliminary research, conducted when very little is known as yet about the problem being studied.
   a. Examples: pilot studies or focus groups.
   b. These projects or interactive sessions can help in better formulating the research problems and fine-tuning the questions in survey instruments.
2) Conclusive research → can be conducted when the research objectives are clear and the problems are unambiguous.
   a. Descriptive research is used to describe the characteristics of a population; the goal is to make predictions or inferences about a population.
   b. Causal research designs (experimental research designs) go beyond describing a population as it is, to exploring possible cause-and-effect relationships among the observed factors.

Types of Data
- Primary sources:
  o Observation (personal, mechanical)
  o Survey (personal interview, mail)
  o Experiment (laboratory, test market)
- Secondary sources:
  o Internal (accounting records, previous studies)
  o External (government reports, published research)

Primary Data
- Not available prior to the research.
  o Observation → involves actually observing what is happening in order to gather the data.
  o Surveys → obtain data by asking people questions about their experiences, opinions, preferences, backgrounds, and many other variables.
  o Experiments → formal testing of a causal hypothesis requires an experiment.
    - Not always feasible in studies of human behaviour, or in studies where human beings are affected by the outcomes.

Secondary Data
- Used when previously collected data serve as input to your own research.
- May not exactly match your own objectives.
  o Internal secondary data → originally collected in one's own organization for other purposes.
  o External secondary data → originally collected outside of one's own organization.

2.2 Designing a Sampling Plan
- Two basic reasons for sampling:
  o To estimate the values of characteristics in a population based on their values in the sample.
  o To evaluate previous assumptions or hypotheses that have been made about a population.
- Statistical sampling (probability sampling) → uses random selection to best ensure that the collected samples are representative of the population.
  o If sampling is conducted using random selection, then the likelihood or probability of obtaining a non-representative sample can be estimated and taken into account.
- Sampling frame → a list of all individuals or objects from which the sample will be drawn.
- Simple random sampling → a sample is selected in such a way that:
  o Each object in the sampling frame has an equal chance of being selected.
  o Each possible combination of objects of a given size has an equal chance of being selected from the sampling frame.
- Stratified random sampling → used if the data contain subpopulations that are relatively small and you want to ensure that all subpopulations have reasonable representation in the sample.
- Simple cluster sampling → the population is divided into clusters or groups.
- Systematic sampling → when an ordered list (such as one sorted by last name) is available, the procedure is to determine a sampling interval (k).
  o Next, randomly choose a starting position in the list between 1 and the kth position.
  o Example: there are 54 fish and she decides to collect a sample of 6, so k = 54/6 = 9. She starts at position 4 and then takes every 9th fish.

Nonstatistical Sampling Techniques
- Data can be collected in many ways without using the randomizing strategies that are fundamental to statistical sampling.
- From the viewpoint of quantitative statistics, these alternatives are, at best, compromises to be used with caution and, at worst, error-prone techniques.
- These methods can play a role in qualitative statistics, which aims for a deeper view of how people are thinking in certain contexts and how they relate their ideas.

Convenience Sampling
- Example: a crowd mingles before a big sports playoff and a reporter questions some individuals. There is no way to know whether these convenient-to-get samples are representative of the entire population.

Judgment Sampling
- Has many variations, with names like critical case sampling, typical case sampling, and extreme case sampling.
- The particular individuals chosen can provide key information that might be missed or diluted in a broader random sample.
- Only as good as the judgment of the researcher.

Quota Sampling
- A special case of judgment sampling.
- With quota sampling, the researcher deliberately chooses a number of individuals to question from each gender, income group, and so on, based on his or her expectations of their exposure to (for example) an ad.

Errors associated with sampling involve tradeoffs among:
1) Time
2) Cost
3) Quality

Sampling Error
- Best defined as the difference between the information in the sample and the information in the population that occurs simply because the sample is a subset of the population.

Nonsampling Error
- The difference between information in the sample and information in the population that is due to missing data or incorrect measurement.
- Can also be called measurement error.
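The random sampling designs above can be sketched in Python. This is an illustrative sketch only: the sampling frame of 54 fish matches the systematic-sampling example, while the "species" field and stratum sizes are made-up details added to show stratification:

```python
import random

random.seed(42)

# Hypothetical sampling frame: 54 fish, with a small "trout" subpopulation
# (ids 1-6) to illustrate why stratification can be useful.
frame = [{"id": i, "species": "trout" if i <= 6 else "bass"}
         for i in range(1, 55)]

# Simple random sampling: every combination of 6 fish is equally likely.
srs = random.sample(frame, 6)

# Stratified random sampling: sample separately within each subpopulation,
# guaranteeing the small "trout" stratum some representation.
strata = {}
for fish in frame:
    strata.setdefault(fish["species"], []).append(fish)
stratified = [f for group in strata.values() for f in random.sample(group, 3)]

# Systematic sampling: interval k = 54 // 6 = 9; random start between 1 and k,
# then every kth fish (with start = 4 this picks ids 4, 13, 22, 31, 40, 49).
k = len(frame) // 6
start = random.randint(1, k)
systematic = frame[start - 1::k]

print(len(srs), len(stratified), len(systematic))
```

A simple random sample might, by chance, miss the trout entirely; the stratified design rules that out by construction, which is the point made in the notes above.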
Nonresponse Error
- A different class of error.
- Many people who receive questionnaires choose not to reply, or refuse to take part in a telephone survey.
- Such response patterns could bias the results; that is, cause the collected data to tend misleadingly more toward one possible conclusion than another.

Chapter 3: Displaying Data Distributions

3.1 Constructing Distribution Tables
- Univariate descriptive statistics → descriptive statistics for one variable.
- Descriptive statistics characterize available data in terms of patterns, clusters, and other features.
- Clean data → the recorded numbers or computer codes must be correct.
- Outliers → values that appear remote from all or most of the other values for that variable.
- Frequency distribution → counts how many times (the frequency) each specific data value appears in the data set; or, if the data values are grouped into ranges, counts how many values fall into each range.

Discrete Quantitative Frequency Tables
- Point-spread data are quantitative and discrete, because the projections are limited to specific numeric values – those ending in ".0" or ".5".
- Any specific data value that occurred can be examined to determine how often it appeared in the overall data set.
- Percentage_x = (frequency_x / n) × 100%
- Cumulative percent frequency distribution → the cumulative percent is the percentage of all cases having values up to and including the value displayed at the left of that row.
  o Cumulative percent for the first data row = percent frequency for that row.
  o Cumulative percent for any subsequent row = cumulative percent for the previous row + percent frequency for the current row.

Continuous Quantitative Frequency Tables
- When the data are continuous, the number of possible values is potentially infinite.
- Values displayed in a continuous quantitative frequency table are grouped into ranges (called classes).
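The percent and cumulative percent formulas for a discrete frequency table can be sketched as follows. The point-spread values are made up for illustration (ten projections ending in .0 or .5):

```python
from collections import Counter

# Hypothetical point-spread data (discrete: values end in .0 or .5).
data = [3.0, 3.5, 3.0, 7.0, 3.5, 3.0, 10.5, 7.0, 3.0, 3.5]
n = len(data)

freq = Counter(data)          # frequency of each distinct value
cumulative = 0.0

print(f"{'Value':>6} {'Freq':>5} {'Pct':>7} {'Cum %':>7}")
for value in sorted(freq):
    pct = freq[value] / n * 100   # percentage_x = (frequency_x / n) * 100%
    cumulative += pct             # cum % = previous cum % + current pct
    print(f"{value:>6} {freq[value]:>5} {pct:>6.1f}% {cumulative:>6.1f}%")
```

The last row's cumulative percent is 100%, mirroring the rule that the class frequencies must account for every data value.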
- The frequencies in the table represent how many individual data values fall into each of the classes.
- The basic principles of constructing and interpreting this table are the same as for a discrete frequency distribution table.

Guidelines for Manually Constructing the Tables
1) The displayed classes should be mutually exclusive: any specific data value should fall into one, and only one, of the classes.
2) The classes should be collectively exhaustive: there is a class to which every data value can be assigned.
3) Try to use the same class width for all classes.
4) Try to include all classes, even if the frequency is zero.
5) Select convenient numbers for class limits: select limits that appear reasonable and are easy to interpret, yet are consistent with the other guidelines.
6) Choose an appropriate number of classes: typically the number of classes is between 5 and 15, depending on the amount of data available.
7) The sum of the class frequencies must equal the number of original data values: this serves as a check on your work.
8) Combine these guidelines – with a touch of common sense.

Steps for Manually Constructing Tables from Raw Data
1) Sort the data: the first step is to put the data into an ordered array, which means that the data are sorted into either ascending order (lowest to highest) or descending order (highest to lowest).
   a. We can also immediately identify the range of the full data set, defined as the spread or difference between the highest and lowest values.
2) Choose a tentative number of classes: as a general rule, more classes are needed when more raw data are available.
   a. Sturges' Rule: estimated number of classes = 1 + 3.3 log(n)
   b. n = the number of values in the data set.
   c. Example: for a data set of 100 points, 1 + 3.3 log(100) ≈ 8 classes.
3) Choose a tentative class width: defined as the difference between the lower limit of one class and the lower limit of the next class.
   a. Tentative class width = range of the full data set / tentative number of classes.
4) Define all classes: considering the results from steps 1–3 and keeping the guidelines in mind, establish specific limits for all of the classes.
   a. Once you have determined the lower limit for the first class, each successive class begins exactly one class width larger in value.
   b. Midpoint = (lower limit of class + upper limit of class) / 2
5) Construct the table: group and count the data, determining the total count of values assigned to each class.

Qualitative Frequency Tables
- Qualitative data → nonnumerical in character.
- A qualitative frequency distribution table can help you visualize how the data values are distributed; it shows how often each particular value or category of the qualitative variable has occurred.
- The column of frequencies in a qualitative frequency distribution table can be supplemented by additional columns for percent and cumulative percent frequencies.
- Percentage_x = (frequency_x / n) × 100%

Constructing Qualitative Frequency Tables
1) Define and code the categories: for fixed-answer variables, this step occurs before the data are collected.
   a. It is generally advisable not to have too many categories, which would thin out the frequencies for particular choices.
   b. An advantage of open-ended variables is that they may elicit some unexpected responses that add to the researcher's understanding of the variable.
2) Construct the table: constructing a qualitative frequency table is, in practice, very similar to constructing a discrete quantitative frequency table.

3.2 Graphing Quantitative Data
- A graph is especially useful for revealing the shape of the distribution of quantitative data, which (if the future is like the past) may give a sense of approximately what values the data tend "naturally" to fall into.

The Histogram
- A graph that uses the lengths of bars to represent the frequency of values in each class of a frequency distribution.
- Percent frequency histogram → analogous to a frequency histogram, but with the frequencies converted to percent frequencies.
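The manual table-construction steps from section 3.1 (sort, Sturges' Rule, tentative class width, convenient class limits, counts and midpoints) can be sketched as a rough Python script. The 16-value data set and the chosen limits are made up for illustration:

```python
import math

# Hypothetical raw data; the steps follow section 3.1 above.
data = [12, 47, 35, 18, 29, 51, 44, 23, 38, 15, 41, 27, 33, 49, 20, 36]

# Step 1: sort the data and find the range.
data.sort()
data_range = data[-1] - data[0]                       # 51 - 12 = 39

# Step 2: Sturges' Rule for a tentative number of classes.
num_classes = round(1 + 3.3 * math.log10(len(data)))  # ≈ 5 for n = 16

# Step 3: tentative class width, then round to a convenient number
# (guideline 5: convenient class limits).
width = math.ceil(data_range / num_classes)           # 39 / 5 -> 8
width = 10                                            # convenient choice

# Steps 4-5: define classes from a convenient lower limit, then count.
lower, total = 10, 0
for _ in range(num_classes):
    upper = lower + width                             # exclusive upper bound
    count = sum(lower <= x < upper for x in data)
    midpoint = (lower + (upper - 1)) / 2              # (lower + upper limit) / 2
    total += count
    print(f"{lower}-{upper - 1}: freq={count}, midpoint={midpoint}")
    lower = upper

# Guideline 7: the class frequencies must sum to the number of data values.
print(total == len(data))                             # True
```

The final check implements guideline 7 directly; if the classes were not mutually exclusive and collectively exhaustive, the totals would disagree.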