Econ 2B03 Ch1, Ch2.docx

McMaster University
Bridget O' Shaughnessy

Econ 2B03 Chapter 1 & 2: Presenting Data: Summary Statistics

Sep 8, 2011

Course Website:

Statistics
- Is a branch of mathematics dealing with the collection, presentation, and interpretation of data.
- The field can be divided into two broad categories:
  - Descriptive statistics, which seeks to describe general characteristics of a set of data.
  - Inferential statistics, which seeks to draw inferences about (unknown) features of a population based on a (known) sample drawn from that population.
- Broadly speaking, statistics provides a rigorous framework for describing, analyzing, and drawing inferences on the basis of data.

Q: Why do we need statistics?
A: Mainly due to uncertainty (we cannot know many things in life with certainty, even though they clearly exist), e.g. the actual average height of all people in the world at a given time. Instead, we use ‘sample’ information, which is based on limited information.

- In any statistical study, we analyze information/data obtained from individuals.
- Individuals can be any object of interest (e.g. people, plants).
- A variable is any characteristic of an individual (e.g. age, income).
- A variable varies among individuals, as opposed to a constant, which does not.
- The distribution of a variable tells us what values the variable takes and how often it takes these values.
- Population: the set of all possible observations on some characteristic (e.g. age, height, income).
- Sample: a subset of a population.
- Random sample: one obtained if every member of the population has an equal chance of being in the sample.

Two Types of Population
- Categorical population: a population whose characteristic is inherently non-numerical (e.g. sex, race).
- Quantitative population: a population whose characteristic is numerical.
- For categorical populations, we are typically interested in proportions (e.g. the proportion who are male).
- For quantitative populations, we are typically interested in, e.g.,
the average and how ‘spread out’ the variable is.

Types of Categorical Data
- Nominal data
  - Numbers merely label differences in kind
  - Cannot be manipulated arithmetically
  - Can only be counted (e.g. 0 = male, 1 = female)
- Ordinal data
  - Label differences in kind
  - Order or rank observations based on importance
  - No meaningful arithmetic analysis possible (e.g. ranking firms according to profit)

R:
R-studio:

Sep 12, 2011

Presenting Data: Tables & Graphs
- First split data into groups/classes
- The split depends on the data type
- Data types
  - Categorical (nominal or ordinal)
  - Quantitative (discrete or continuous)
- Data classes should be:
  - Collectively exhaustive (must exhaust all logical possibilities for classifying available data)
  - Mutually exclusive (must not overlap or have data in common)
- Desirable class number:
  - Should fit the data type
  - Often recommended: between 5 and 20
  - Sturges's rule: the desirable number of classes is k, the integer closest (using the usual rounding rules) to $1 + 3.3 \log_{10} n$, where n is the sample size and $\log_{10} n$ is the power to which the base (10) must be raised to yield n.
- Desirable class widths:
  - Class width = the difference between the lower and upper limits of a class
  - To achieve uniform class widths in a table, divide the data set width by the desirable class number.
- Approximate class width = (largest value - smallest value) / desirable class number

Tabular Components
- First consider creating effective tabular summaries
- An effective table includes:
  - Number (often based on chapter or page no.)
  - Title (focus on what, where, when)
  - Caption (brief verbal summary)
  - Footnotes (e.g. size of sampling error)
  - Decimals (consistent rounding rules)
  - Class sums (sum of data pertaining to each class)

Frequency Distributions
- An effective way to summarize data
- Absolute Frequency Distributions
  - Absolute class frequency: the absolute number of observations that fall into a given class
  - Absolute frequency distribution:
    - Tabular summary of a data set
    - Shows the absolute number
of observations that fall into each of several data classes
- Relative Frequency Distributions
  - Relative class frequency: the ratio of a particular class's number of observations to the total number of observations made
  - Relative frequency distribution:
    - Tabular summary of a data set
    - Shows the proportions of all observations that fall into each of several data classes
    - Relative frequency = absolute frequency / total no. of observations
- Cumulative Frequency Distributions
  - Cumulative class frequency: the sum of all class frequencies up to and including the class in question
  - LE distribution (typical; called the ‘cumulative’ distribution)
    - LE: less than or equal to the upper class limit
    - Moves from lesser to greater classes
  - ME distribution (less used; often called the ‘survivor’ distribution)
    - ME: more than or equal to the lower class limit
    - Moves from greater to lesser classes

Sep 13, 2011

Cross Tabulations
- Are tabular summaries for two variables
  - One represented by row heads
  - One represented by column heads
- Information for both variables is entered in the table cells
- In R we use the xtabs( ) function

e.g. Suppose that 7 out of 10 males are admitted to engineering school while 4 out of 10 females are admitted:

                  Admission
  Sex             + (Yes)    - (No)
  + (Male)           7          3
  - (Female)         4          6

Graph Types
- Core graphs
  - Two-dimensional graphs
  - E.g. histograms, scatter diagrams, etc.
- Specialty graphs
  - Combine elements from core graphs to display data in unique ways
  - E.g. bar graphs, pie charts, etc.
- 3D graphs
  - Display with height, width, and depth
- Frequency Histograms
  - Horizontal axis: identifies the upper and lower limits of each data class
  - Vertical axis:
    - Shows the number of observations in each class (absolute frequencies), or
    - Shows the ratio of class frequency to class width (relative frequencies)
  - Rectangles: represent the corresponding class frequencies by area, or by height (if all class intervals are alike)
  - In R we use the hist( ) function
- Frequency Polygons
  - Portray frequency distributions as many-sided figures
  - Class mark: the average of the two class limits
  - The final points on the horizontal axis lie half a class width below the lowest (or above the highest) class limit
- Density Estimator
  - Portrays frequency distributions as smooth curves
  - Removes irregularities from histograms depicting information gathered in sample surveys
  - Estimates how histograms would appear if census information were graphed with many tiny classes
  - In R we use density( )

Graphing Two Variables
- Scatter plot diagrams graph the relationship between two quantitative variables
- Each dot represents one individual: its value of one variable is measured on the horizontal axis and its value of the other variable on the vertical axis
- R: plot( )

Ways to Chart Categorical Data
- Bar graph: each category is represented by a bar; R: barplot( )
- Pie chart: each category is represented by a slice; R: pie( )

Sep 15, 2011

Summary Statistics: Symbolic Expression
- Population parameters (typically unknown)
  - Summary statistics based on population data
  - Designated by Greek letters (e.g. μ, σ)
- Sample statistics (computed from a sample)
  - Summary statistics based on sample data
  - Designated by Roman letters (e.g. X̄, s)
- Observed values
  - Traditionally represented by x or X
  - Different values indicated by subscripts 1, 2, 3, etc. (e.g. $X_1, X_2, X_3, \ldots$)
- Observation totals
  - Population total = N
  - Sample total = n

Summation Notation
- In statistics, we often need to ‘accumulate’ or ‘sum’ some
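
Worked R sketch for the Sep 12 and Sep 13 material: Sturges's rule, the approximate class width, and the absolute, relative, and cumulative (LE) frequency distributions, followed by the admissions cross tabulation built with xtabs( ). This is only an illustration under stated assumptions, not the course's own code. The `scores` values are made up, and the variable names (`scores`, `freq_table`, `admissions`, and so on) are hypothetical. The admission counts (7 of 10 males, 4 of 10 females) are the ones given in the notes.

```r
# Hypothetical sample: 20 exam scores (made-up values, not from the notes).
scores <- c(52, 55, 58, 61, 63, 64, 66, 67, 68, 70,
            71, 72, 74, 75, 77, 79, 82, 85, 88, 94)
n <- length(scores)

# Sturges's rule as stated in the notes: integer closest to 1 + 3.3 * log10(n).
k <- round(1 + 3.3 * log10(n))

# Approximate class width: (largest value - smallest value) / class number.
width <- (max(scores) - min(scores)) / k

# k equal-width classes. include.lowest = TRUE keeps the smallest value in the
# first class, so the classes are collectively exhaustive and mutually exclusive.
breaks  <- seq(min(scores), max(scores), length.out = k + 1)
classes <- cut(scores, breaks = breaks, include.lowest = TRUE)

abs_freq <- table(classes)                     # absolute class frequencies
freq_table <- data.frame(
  class      = names(abs_freq),
  absolute   = as.vector(abs_freq),
  relative   = as.vector(abs_freq) / n,        # absolute / total no. of observations
  cumulative = cumsum(as.vector(abs_freq))     # LE (cumulative) distribution
)
print(freq_table)

# Cross tabulation of the admissions example from the notes:
# one row per applicant, then counted by sex and admission outcome.
admissions <- data.frame(
  sex      = rep(c("Male", "Female"), each = 10),
  admitted = c(rep(c("Yes", "No"), times = c(7, 3)),
               rep(c("Yes", "No"), times = c(4, 6)))
)
xt <- xtabs(~ sex + admitted, data = admissions)
print(xt)
```

The same `scores` vector could be passed to hist( ) or density( ) to draw the graphs described above. Note that xtabs( ) orders the row and column labels alphabetically, so the printed table lists Female before Male and No before Yes.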