Department: Statistics

Course Code: STA260H5

Professor: Ingrid L. Stefanovic

In applying statistics to a scientific, industrial, or societal problem, it is necessary to begin with a

population or process to be studied. Populations can be as diverse as "all persons living

in a country" or "every atom composing a crystal". A population can also be composed of

observations of a process at various times, with the data from each observation serving as a

different member of the overall group. Data collected about this kind of "population" constitutes

what is called a time series.

For practical reasons, a chosen subset of the population called a sample is studied — as

opposed to compiling data about the entire group (an operation called a census). Once a sample

that is representative of the population is determined, data are collected for the sample members

in an observational or experimental setting. These data can then be subjected to statistical

analysis, serving two related purposes: description and inference.

* Descriptive statistics summarize the population data by describing what was observed in the

sample numerically or graphically. Numerical descriptors include mean and standard deviation for

continuous data types (like heights or weights), while frequency and percentage are more useful

for describing categorical data (like race).

* Inferential statistics uses patterns in the sample data to draw inferences about the population

represented, accounting for randomness. These inferences may take the form of: answering

yes/no questions about the data (hypothesis testing), estimating numerical characteristics of the

data (estimation), describing associations within the data (correlation) and modeling relationships

within the data (for example, using regression analysis). Inference can extend to forecasting,

prediction and estimation of unobserved values either in or associated with the population being

studied; it can include extrapolation and interpolation of time series or spatial data, and can also

include data mining.
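
The two purposes above can be illustrated with a minimal sketch. It assumes a synthetic population of heights (generated here, not real data), draws a random sample rather than taking a census, computes the descriptive statistics named above, and then makes one simple inference: an approximate 95% confidence interval for the population mean under a normal approximation. All names, cutoffs, and parameters are hypothetical choices made for illustration.

```python
import math
import random
import statistics
from collections import Counter

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 synthetic heights in cm (an assumption, not real data).
population = [random.gauss(170, 10) for _ in range(10_000)]

# Study a sample instead of compiling data about the entire group (a census).
sample = random.sample(population, 100)

# Descriptive statistics for continuous data: mean and standard deviation.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

# Descriptive statistics for categorical data: frequency and percentage.
# The category cutoffs are arbitrary and only serve the illustration.
categories = [
    "short" if h < 165 else "medium" if h < 175 else "tall" for h in sample
]
frequency = Counter(categories)
percentage = {label: 100 * count / len(sample) for label, count in frequency.items()}

# Inference (estimation): approximate 95% confidence interval for the
# population mean, using the normal approximation with z = 1.96.
standard_error = sample_sd / math.sqrt(len(sample))
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print(f"sample mean = {sample_mean:.2f}, sample sd = {sample_sd:.2f}")
print(f"frequencies = {dict(frequency)}")
print(f"percentages = {percentage}")
print(f"approx. 95% CI for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The first three printed lines only describe the sample. The interval is the inferential step: it is a statement about the population, and its width reflects the randomness of sampling.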

“... it is only the manipulation of uncertainty that interests us. We are not concerned with the

matter that is uncertain. Thus we do not study the mechanism of rain; only whether it will rain.”

Dennis Lindley, "The Philosophy of Statistics", The Statistician (2000).

The concept of correlation is particularly noteworthy for the potential confusion it can cause.

Statistical analysis of a data set often reveals that two variables (properties) of the population

under consideration tend to vary together, as if they were connected. For example, a study of

annual income that also looks at age of death might find that poor people tend to have shorter

lives than affluent people. The two variables are said to be correlated; however, they may or may

not be the cause of one another. The correlation could be caused by a third,

previously unconsidered phenomenon, called a lurking variable or confounding variable. For this

reason, there is no way to immediately infer the existence of a causal relationship between the

two variables. (See Correlation does not imply causation.)
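
A small simulation can make the lurking-variable point concrete. The sketch below assumes synthetic data in which a hidden variable `z` drives both `x` and `y`, while `x` and `y` have no direct effect on each other. The variable names and the `pearson` helper are hypothetical, and because the data are constructed, the contribution of `z` is known exactly and can simply be subtracted; real data would require estimating it.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


n = 5000

# Lurking variable: influences both x and y (synthetic, by assumption).
z = [random.gauss(0, 1) for _ in range(n)]

# x and y each depend on z plus independent noise; neither causes the other.
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

# The two variables vary together even though there is no causal link.
r_xy = pearson(x, y)

# Removing the known contribution of z leaves only the independent noise,
# so the association should largely disappear.
x_without_z = [xi - zi for xi, zi in zip(x, z)]
y_without_z = [yi - zi for yi, zi in zip(y, z)]
r_after = pearson(x_without_z, y_without_z)

print(f"correlation of x and y:           {r_xy:.3f}")
print(f"correlation after accounting for z: {r_after:.3f}")
```

An analyst who observed only `x` and `y` would see a clear correlation and could not tell from it alone whether one causes the other or whether both respond to something unmeasured.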

For a sample to be used as a guide to an entire population, it is important that it truly be

representative of that overall population. Representative sampling assures that the inferences

and conclusions can be safely extended from the sample to the population as a whole. A major

problem lies in determining the extent to which the sample chosen is actually representative.

Statistics offers methods to estimate and correct for any random trending within the sample and

data collection procedures. There are also methods of experimental design for experiments that

can lessen these issues at the outset of a study, strengthening its capability to discern truths

about the population. Statisticians describe stronger methods as more "robust".
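
Why representativeness matters can be sketched with a short simulation. It assumes a synthetic population and compares a simple random sample with a deliberately unrepresentative one drawn only from members above the population median. The population parameters and sample sizes are hypothetical choices for illustration.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 synthetic measurements (not real data).
population = [random.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Representative sample: every member has the same chance of selection.
random_sample = random.sample(population, 500)

# Unrepresentative sample: drawn only from members above the median,
# mimicking a collection procedure that systematically misses part of the group.
cutoff = statistics.median(population)
upper_half = [value for value in population if value > cutoff]
biased_sample = random.sample(upper_half, 500)

# Compare how far each sample mean lands from the population mean.
random_error = abs(statistics.mean(random_sample) - population_mean)
biased_error = abs(statistics.mean(biased_sample) - population_mean)

print(f"population mean:              {population_mean:.2f}")
print(f"error of random sample mean:  {random_error:.2f}")
print(f"error of biased sample mean:  {biased_error:.2f}")
```

Taking a larger biased sample would not help: its error comes from how the members were selected, not from how many were selected.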

Randomness is studied using the mathematical discipline of probability theory. Probability is used

in "mathematical statistics" (alternatively, "statistical theory") to study the sampling distributions of

sample statistics and, more generally, the properties of statistical procedures. The use of any
