Econ 2B03 Vocabs
Statistics: a branch of mathematics dealing with the collection, presentation, and interpretation of data
Descriptive statistics: describe the general characteristics of a set of data
Inferential statistics: draw inferences about (unknown) features of a population based on a (known) sample drawn from that population
Variable: any characteristic of an individual
Distribution: the values a variable takes and how often (or how likely) it takes these values
Population: the set of all possible observations on some characteristic (e.g., age, height, income)
Sample: a subset of a population
Random sample: one obtained if every member of the population has an equal chance of being in the sample
Categorical population: a population whose characteristic is inherently non-numerical (e.g., sex, race)
Quantitative population: a population whose characteristic is inherently numerical
Sturges's rule: the desirable number of classes is $k$, the integer closest to $1 + 3.3\log_{10} n$ (using standard rounding rules), where $n$ is the number of observations
Class width: the difference between the lower and upper limits of a class
Absolute class frequency: the absolute number of observations that fall into a given class
Absolute frequency distribution: a tabular summary of a data set showing the absolute numbers of observations that fall into each of several data classes
Relative class frequency: the ratio of a particular class's number of observations to the total number of observations made
Relative frequency distribution: a tabular summary of a data set showing the proportions of all observations that fall into each of several data classes
Cumulative class frequency: the sum of all class frequencies up to and including the class in question
Density estimator: estimates how a histogram would appear if census information were graphed with many tiny classes
Population parameters (typically unknown): summary statistics based on population data
Sample statistics (computed from a sample): summary statistics based on sample data
Median: divides the ordered data into two equal halves
Variance (average squared deviation): the most common measure of dispersion; measures the typical squared deviation about the mean, $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$ for a population and $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ for a sample
Standard deviation: the positive square root of the variance; it falls in the same range of magnitude (and is expressed in the same units) as the observations themselves
Interfractile ranges: measure the difference between two values in the ordered array (called fractiles)
Quartiles: divide the ordered array into four quarters
Interquartile range (IQR): the difference between the 3rd and 1st quartiles, $Q_3 - Q_1$ (contains the middle 50% of the data)
Skewness: a frequency distribution's degree of distortion from horizontal symmetry, measured as $(\text{mean} - \text{mode}) / \text{standard deviation}$ (Pearson's coefficient of skewness)
Kurtosis: a frequency curve's degree of peakedness
Five-number summary: the smallest observation, the first quartile, the median, the third quartile, and the largest observation of a set of observations, written in order from smallest to largest: Minimum, $Q_1$, Median, $Q_3$, Maximum
Suspected outliers: observations lying unusually far from the bulk of the data; a common rule of thumb flags values more than $1.5 \times \text{IQR}$ below $Q_1$ or above $Q_3$
Box-and-whisker plot (box plot): a histogram-like graphical display of the five-number summary
Normal distribution: a special type of population whose relative frequency distribution is symmetric and bell-shaped, with a single peak at which the mean, median, and mode coincide at the center of the distribution
Chebyshev's theorem: a rule that applies to all distributions, not only the Normal distribution: for any $k > 1$, at least $1 - 1/k^2$ of the observations lie within $k$ standard deviations of the mean
Coefficient of variation: the standard deviation divided by the mean, $s/\bar{x}$ (often expressed as a percentage); used to compare degrees of dispersion among data sets
Origins of data: data can originate in a number of ways
Internal data: created as by-products of regular activities
External data (typical source for this course): created by entities other than the person, firm, or government that wants to use the data
Census: a complete survey of every member of the population
Sample: a partial survey in which data are collected for only a subset of the population
Nonprobability sample: occurs when a sample is taken from an existing population in a haphazard fashion, without the use of a randomizing device that assigns each member a known (positive) probability of selection
Voluntary response sample: e.g., a phone survey (self-selection issue)
Convenience sample: e.g., the most easily selected persons
Judgement sample: e.g., units chosen based on the data collector's judgement
Probability sample: occurs when a sample is taken with the help of a randomizing device that assures each member a known (positive, not necessarily equal) probability of selection
Simple random sample: obtained if every member of the population has an equal chance of being in the sample (the main type)
Systematic random sample: randomly select the 1st element, then include every kth element thereafter until the sample is complete
Stratified random sample: take random samples from every stratum (clearly distinguishable subgroup) in a population
Clustered random sample: the population is naturally subdivided into geographically distinct clusters, and samples are created by taking censuses in randomly chosen clusters
Selection bias: a systematic tendency to include elementary units with particular characteristics (while excluding units with other, opposite characteristics)
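The descriptive measures in this glossary (Sturges's rule, absolute, relative, and cumulative frequencies, variance and standard deviation, quartiles and the IQR, the five-number summary, suspected outliers, Pearson's skewness, the coefficient of variation, and Chebyshev's bound) can be tied together in a short Python sketch using only the standard library. It is illustrative, not the course's official procedure: the `ages` data and all function names are hypothetical, equal-width classes start at the minimum, quartiles use `statistics.quantiles(..., method="inclusive")` (one of several quartile conventions, so hand-computed answers may differ slightly), and the 1.5 × IQR outlier rule is an assumed rule of thumb.

```python
import math
import statistics

def sturges_classes(n):
    """Sturges's rule: the integer closest to 1 + 3.3*log10(n), rounding halves up."""
    return math.floor(1 + 3.3 * math.log10(n) + 0.5)

def frequency_table(data, k):
    """Equal-width classes with absolute, relative and cumulative frequencies.

    Assumes the data are not all identical (otherwise the class width is 0).
    Each class is [lower, upper), except that the maximum goes in the last class.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        counts[min(int((x - lo) / width), k - 1)] += 1
    n, cumulative, rows = len(data), 0, []
    for i, count in enumerate(counts):
        cumulative += count
        rows.append((lo + i * width, lo + (i + 1) * width, count, count / n, cumulative))
    return rows

def describe(data):
    """Sample measures of center, dispersion and shape."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "mean": mean,
        "variance": statistics.variance(data),
        "std_dev": s,
        "five_number": (min(data), q1, statistics.median(data), q3, max(data)),
        "iqr": iqr,
        # Assumed 1.5 x IQR rule of thumb for suspected outliers
        "suspected_outliers": [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr],
        # Pearson's skewness: (mean - mode) / standard deviation
        "skewness": (mean - statistics.mode(data)) / s,
        # Coefficient of variation: standard deviation relative to the mean
        "cv": s / mean,
    }

def chebyshev_share_within_k_sd(data, k):
    """Return (observed share within k SDs of the mean, Chebyshev bound 1 - 1/k**2)."""
    mean, s = statistics.mean(data), statistics.stdev(data)
    observed = sum(abs(x - mean) <= k * s for x in data) / len(data)
    return observed, 1 - 1 / k**2

ages = [23, 25, 25, 27, 29, 31, 33, 35, 38, 41, 44, 60]  # hypothetical sample
k = sturges_classes(len(ages))
for row in frequency_table(ages, k):
    print("{:.1f} to {:.1f}: abs={} rel={:.3f} cum={}".format(*row))
summary = describe(ages)
print(summary["five_number"], summary["suspected_outliers"])
print(chebyshev_share_within_k_sd(ages, 2))
```

With 12 observations Sturges's rule gives 5 classes, the five-number summary is (23, 26.5, 32.0, 38.75, 60), and 60 is flagged as a suspected outlier. The positive skewness value reflects the long right tail created by that observation.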

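The four probability-sampling designs above differ only in how the randomizing device is applied, which a small Python sketch with the standard `random` module can show. The population of 100 numbered units, the strata and cluster names, and the function names are all hypothetical; the systematic design assumes the common interval choice $k = N // n$ with a random start among the first $k$ units.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has the same chance of selection; drawn without replacement."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Randomly pick the 1st element among the first k, then take every kth element.

    Assumes k = len(population) // n as the sampling interval.
    """
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample from every stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters, then take a census of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

rng = random.Random(2024)  # fixed seed so the draws are reproducible
population = list(range(1, 101))  # hypothetical population of 100 units
print(simple_random_sample(population, 10, rng))
print(systematic_sample(population, 10, rng))
strata = {"urban": population[:60], "rural": population[60:]}
print(stratified_sample(strata, 5, rng))
clusters = {f"town_{i}": population[i * 10:(i + 1) * 10] for i in range(10)}
print(cluster_sample(clusters, 2, rng))
```

Note the contrast between the last two designs: a stratified sample draws some units from every subgroup, while a cluster sample keeps every unit from only a few randomly chosen subgroups.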