Lecture 1

Jan. 8th

Definitions:

● Random variable: a property that can take on different (at least 2) values (i.e., it varies). These values have associated probabilities, and we can thus talk about their associated probability distributions

○ Symbolized by capital letters such as X or Y

● Discrete random variables are those that can only take on particular values that are made up of disjoint categories

○ E.g., a discrete random variable such as a count can be 3 or 4 but cannot be something like 3.1

○ Not necessarily represented by numbers

● Continuous random variables can take on values along an entire interval of the number line, and these values are not disjoint

○ E.g. random variable Y that can take on values between 2 and 5 and everything in

between (such as 4.14)
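
The distinction between the two kinds of random variable can be sketched in a few lines of Python. This is only an illustration under assumptions not in the notes: X is taken to be a fair six-sided die (a hypothetical discrete example), and Y is drawn uniformly from the interval 2 to 5 used above.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Discrete random variable X (assumed example): a fair six-sided die.
# Its possible values are disjoint categories, each with a probability.
die_values = [1, 2, 3, 4, 5, 6]
x_probabilities = {value: 1 / 6 for value in die_values}
x = random.choice(die_values)

# Continuous random variable Y: any value in the interval from 2 to 5,
# including everything in between (such as 4.14).
y = random.uniform(2, 5)

print("X =", x)  # always one of 1..6, never something like 3.1
print("Y =", y)  # some real number between 2 and 5
```

The probabilities attached to the values of X sum to 1, which is what makes them a probability distribution.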

● Data: numerical (or sometimes non-numerical) information collected by the researcher; these are usually the observed values of random variables

● Data matrix: organizes data in an array of rows and columns with order n x p

○ n = number of rows of the matrix (usually # of observations)

○ p = number of columns of the matrix (usually # of variables)

○ If p=1, the data is univariate
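
As a rough illustration of the n x p layout, the sketch below stores a small data matrix as a list of rows. The variable names and values are made up for the example and are not from the lecture.

```python
# Hypothetical data: 4 observations (rows) on 3 variables (columns).
variable_names = ["age", "height_cm", "score"]
data_matrix = [
    [23, 170.2, 88],
    [31, 165.0, 92],
    [27, 180.5, 75],
    [45, 158.3, 81],
]

n = len(data_matrix)     # number of rows (observations)
p = len(data_matrix[0])  # number of columns (variables)
print(f"order: {n} x {p}")

# Keeping a single column gives an n x 1 matrix, i.e. univariate data.
univariate = [[row[0]] for row in data_matrix]
print(f"univariate order: {len(univariate)} x {len(univariate[0])}")
```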

● Population versus Sample

○ Census population: all individuals or objects of interest to a researcher

■ Almost impossible to achieve (?)

○ Statistical population: the entire set of possible outcomes on a variable of interest

and their associated probabilities and frequencies

■ Usually what we mean when we talk about a population

○ Sample: a subset or portion of scores or measurements taken from a population

● Parameter versus Statistic

○ Parameters: numerical properties that describe statistical populations

■ E.g., population mean, population variance, etc.

○ Statistics: real-valued quantities that describe various features of a data set (the

sample)

■ E.g. sample variance

● Parameters are to populations as statistics are to samples
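
The parameter/statistic pairing can be made concrete with a small simulation. This is a sketch under stated assumptions: a simulated list of 1,000 scores stands in for the population (a real statistical population is the full set of possible outcomes with their probabilities), and the mean of 100, standard deviation of 15, and sample size of 50 are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Assumed stand-in for a population: 1,000 simulated scores.
population = [random.gauss(100, 15) for _ in range(1000)]

# Parameters: numerical properties that describe the population.
population_mean = statistics.mean(population)
population_variance = statistics.pvariance(population)  # divides by N

# Sample: a subset of measurements taken from the population.
sample = random.sample(population, 50)

# Statistics: quantities that describe the sample.
sample_mean = statistics.mean(sample)
sample_variance = statistics.variance(sample)  # divides by n - 1

print("population mean:", round(population_mean, 2))
print("sample mean:    ", round(sample_mean, 2))
print("population variance:", round(population_variance, 2))
print("sample variance:    ", round(sample_variance, 2))
```

The sample values will generally be close to, but not equal to, the population values; drawing a different sample gives different statistics while the parameters stay fixed.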

● Empirical vs. Theoretical Distributions:

○ Empirical distributions: based on observed (raw) data

