Econ 2B03 Vocabs
Statistics: a branch of mathematics dealing with the collection, presentation, and interpretation of
data
Descriptive statistics: describe general characteristics of a set of data.
Inferential statistics: draw inferences about (unknown) features of a population based
on a (known) sample drawn from a population
Variable: any characteristic of an individual
Distribution: the values the variable takes and how often (or how likely) it takes these values
Population: Set of all possible observations on some characteristic (e.g., age, height,
income)
Sample: Subset of a population
Random sample: One obtained if every member of the population has an equal chance
of being in the sample
Categorical population: A population whose characteristic is inherently non-numerical
(e.g., sex, race)
Quantitative population: A population whose characteristic is inherently numerical
Sturges’s rule: desirable number of classes = k, where k is the integer closest to
(use standard rules for rounding) 1 + 3.3 log10(n), and n is the number of observations
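A minimal sketch of Sturges's rule in Python, assuming the argument of the logarithm is the number of observations n. The function name `sturges_classes` is hypothetical, and rounding halves up is only one reading of "standard rules for rounding".

```python
import math


def sturges_classes(n: int) -> int:
    """Return the integer closest to 1 + 3.3 * log10(n)."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    # floor(x + 0.5) rounds halves up (assumed rounding convention).
    return math.floor(1 + 3.3 * math.log10(n) + 0.5)


print(sturges_classes(100))  # 1 + 3.3 * 2 = 7.6, so 8 classes
```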
Class width: the difference between the lower and upper limits of a class
Absolute class frequency: Absolute number of observations that fall into a given class
Absolute frequency distribution: Tabular summary of a data set, shows absolute
numbers of observations that fall into each of several data classes
Relative class frequency: Ratio of a particular class’s number of observations to the
total number of observations made
Relative Frequency Distribution: Tabular summary of a data set, shows proportions of
all observations that fall into each of several data classes
Cumulative class frequency: The sum of all class frequencies up to and including the
class in question
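The three kinds of class frequency above can be sketched together in Python. This is an illustration only: the function name `frequency_table`, the equal-width classes, the convention that each class includes its lower limit (with the last class also including its upper limit), and the sample values are all assumptions, not part of the course material.

```python
from itertools import accumulate


def frequency_table(data, lower, width, k):
    """Return (absolute, relative, cumulative) class frequencies
    for k equal-width classes starting at `lower`."""
    if not data:
        raise ValueError("data must not be empty")
    upper = lower + k * width
    counts = [0] * k
    for x in data:
        if not lower <= x <= upper:
            raise ValueError(f"{x} lies outside the classes")
        # Each class includes its lower limit; the last class also
        # includes its upper limit (assumed convention).
        idx = min(int((x - lower) // width), k - 1)
        counts[idx] += 1
    n = len(data)
    relative = [c / n for c in counts]      # proportions of all observations
    cumulative = list(accumulate(counts))   # running totals of absolute counts
    return counts, relative, cumulative


values = [2, 4, 4, 7, 8, 11, 13, 14, 14, 19]  # made-up observations
absolute, relative, cumulative = frequency_table(values, lower=0, width=5, k=4)
print(absolute)    # [3, 2, 4, 1]
print(relative)    # [0.3, 0.2, 0.4, 0.1]
print(cumulative)  # [3, 5, 9, 10]
```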
Density Estimator: Estimates how a histogram would appear if census information were
graphed with many tiny classes
Population Parameters (typically unknown): Summary statistics based on population
data
Sample Statistics (computed from sample): Summary statistics based on sample
data
Median: Divides data into equal halves
Variance (average (squared) deviation): Most common measure of dispersion,
measures the typical squared deviation about the mean
Standard Deviation: The positive square root of the variance, falls in the same range of
magnitude (and appears in the same units) as the observations themselves
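A short Python sketch of variance and standard deviation. The function names are hypothetical. Dividing by n gives the population version (the literal "average squared deviation"); the `sample=True` option divides by n - 1, which is the usual convention for sample data and is an assumption here, since the list above does not state the divisor.

```python
import math


def variance(data, sample=False):
    """Average squared deviation about the mean."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    # Population: divide by n. Sample: divide by n - 1 (assumed convention).
    return squared_deviations / (n - 1) if sample else squared_deviations / n


def std_dev(data, sample=False):
    """Positive square root of the variance (same units as the data)."""
    return math.sqrt(variance(data, sample))


values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations with mean 5
print(variance(values))  # 4.0
print(std_dev(values))   # 2.0
```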
Interfractile Ranges: Measure difference between 2 values in the ordered array (called
‘fractiles’)
Quartiles: divide the array into 4 quarters
Interquartile range: difference between the 3rd and 1st quartiles (contains middle 50% of
data)
Skewness: A frequency distribution’s degree of distortion from horizontal symmetry
Skewness = (mean – mode) / Standard deviation
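The skewness formula above can be tried out in Python with the standard `statistics` module. This is a sketch: the function name `mode_skewness` is hypothetical, and using the population standard deviation (`pstdev`) is an assumption, since the formula does not say which version is meant.

```python
from statistics import fmean, mode, pstdev


def mode_skewness(data):
    """Skewness = (mean - mode) / standard deviation."""
    sd = pstdev(data)  # population standard deviation (assumed choice)
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return (fmean(data) - mode(data)) / sd


print(mode_skewness([1, 2, 2, 2, 3]))      # 0.0 for symmetric data
print(mode_skewness([1, 2, 2, 3, 7]) > 0)  # True: mean is pulled above the mode
```

A positive value indicates a longer right tail (mean above mode); a negative value indicates a longer left tail.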
Kurtosis: A frequency curve’s degree of peakedness
Five-number summary: the five-number summary of a set of observations consists of the
smallest observation, the first quartile, the median, the third quartile, and the largest
observation, written in order from smallest to largest: Minimum, Q1, Median, Q3, Maximum
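Suspected Outliers: observations that lie unusually far from the rest of the data (a common
criterion flags values more than 1.5 × IQR below the 1st quartile or above the 3rd quartile)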
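A sketch of the five-number summary and an outlier check in Python. Several choices here are assumptions rather than course definitions: quartiles are taken as the medians of the lower and upper halves of the ordered data (other quartile conventions give slightly different values), suspected outliers are flagged with the common 1.5 × IQR criterion, and the function names and data are made up.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]
    # For odd n the overall median is left out of both halves.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def suspected_outliers(data, k=1.5):
    """Values more than k * IQR below Q1 or above Q3 (assumed criterion)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low_fence or x > high_fence]


values = [1, 3, 4, 5, 6, 7, 8, 9, 30]
print(five_number_summary(values))  # (1, 3.5, 6, 8.5, 30)
print(suspected_outliers(values))   # [30]
```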
Box-and-whisker plot (box plot): a histogram-like method of displaying data
Normal distribution: a special type of population whose relative frequency distribution
is characterized by: Single peak with mean, median and mode coinciding at the center of
distribution
Chebyshev’s Theorem: a rule that applies to all distributions, not only the
Normal distribution: at least 1 − 1/k² of the observations lie within k standard deviations
of the mean (for any k > 1)
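Chebyshev's theorem, in its usual form, guarantees that at least 1 − 1/k² of the observations lie within k standard deviations of the mean for any k > 1. A small Python check, with hypothetical function names and made-up data, compares that bound with the share actually observed:

```python
from statistics import fmean, pstdev


def chebyshev_bound(k):
    """Minimum share of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


def share_within(data, k):
    """Observed share of values within k standard deviations of the mean."""
    mean, sd = fmean(data), pstdev(data)
    inside = [x for x in data if abs(x - mean) <= k * sd]
    return len(inside) / len(data)


skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 100]  # deliberately non-Normal
print(chebyshev_bound(2))                             # 0.75
print(share_within(skewed, 2) >= chebyshev_bound(2))  # True
```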
Coefficient of Variation: Ratio of the standard deviation to the mean; used to compare
degrees of dispersion among data sets
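As a sketch, the coefficient of variation is commonly computed as the standard deviation divided by the mean, which makes it unit-free. The function name and the use of the population standard deviation are assumptions, and the data are made up.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(data):
    """Standard deviation expressed as a fraction of the mean."""
    mean = fmean(data)
    if mean == 0:
        raise ValueError("undefined when the mean is zero")
    return pstdev(data) / mean


dollars = [2, 4, 4, 4, 5, 5, 7, 9]
cents = [100 * x for x in dollars]  # same data in different units
print(coefficient_of_variation(dollars))
print(coefficient_of_variation(cents))
```

Both calls should give the same value up to floating-point rounding, because rescaling the units changes the mean and the standard deviation by the same factor.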
Origins of Data: Data can originate in a number of ways
Internal Data: Created as by-products of regular activities
External Data (typical source for this course): Created by entities other than the person,
firm, or government that wants to use the data
Census: a complete survey of every member in the population
Sample: a partial survey in which data is collected for only a subset of the population
Nonprobability Sample: occurs when a sample is taken from an existing population in a
haphazard fashion without the use of some randomizing device assigning each member
a known (positive) probability of selection
Voluntary response sample (e.g., phone survey [self-selection issue])
Convenience sample (e.g., most easily selected persons)
Judgement sample (e.g., based on data collector’s judgement)
Probability Sample: occurs when a sample is taken with the help of a randomizing
device that assures each member a known (positive, not necessarily equal) probability of
selection
Simple random sample: obtained if every member of the population has an equal
chance of being in the sample (main type)
Systematic random sample: randomly select the 1st element, then include every kth
element thereafter until the sample is complete
Stratified random sample: take random samples from every stratum (clearly
distinguishable subgroups) in a population
Clustered random sample: population naturally subdivided into geographically distinct
clusters, and samples are created by taking censuses in randomly chosen clusters
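The four probability sampling designs above can be sketched with Python's standard `random` module. All function names, the stratum and cluster labels, and the populations are hypothetical. The systematic version assumes the random start is drawn from the first k elements, and the stratified version assumes the same sample size in every stratum.

```python
import random


def simple_random_sample(population, n, rng):
    """Every member has an equal chance of selection."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Random start among the first k elements, then every kth element."""
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(strata, n_per_stratum, rng):
    """Take a random sample from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Take a census of each randomly chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


rng = random.Random(2)  # fixed seed only so reruns give the same draw
people = list(range(100))
print(simple_random_sample(people, 5, rng))
print(systematic_sample(people, 10, rng))
print(stratified_sample({"low": people[:50], "high": people[50:]}, 3, rng))
print(cluster_sample({"north": [1, 2, 3], "south": [4, 5], "east": [6]}, 2, rng))
```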
Selection Bias: Systematic tendency to include elementary units with particular
characteristics (while excluding those units with other [opposite] characteristics)

