STAT 206 Fall 2013 1/2 Course Notes


University of Waterloo
STAT 206
Eddie Dupont

STAT 206 - Statistics for Engineers
Kevin James
Fall 2013

Introduction

Statistics is the collection, organization, analysis, interpretation, and presentation of data. In effect, it is a quantification of uncertainty.

Process

To conduct an empirical or statistical study, we must first identify the population of interest.

    population: the set of elements your query pertains to

Individual items of this population are called units.

    unit: a single element, usually a person or object, whose characteristics we are interested in

We also define the hypothesis, or question we would like answered. We then select a subset of units from the population to form our sample,

    sample: a subset of the population from which measurements are actually made

which must have a pre-determined size and should make an attempt to reduce or eliminate sample error.

    sample error: an error which occurs randomly due to the uncertainty of the sample

We must also determine how we can measure the variable of interest.

    variable of interest: a measure of the interesting characteristic of a unit

This variable can often be measured in a multitude of ways, though many of them will be somewhat lacking in value. You must take into account not only what this variable is and how it is collected, but also ways to minimize bias, such as by randomizing and repeating your experiments. We should also attempt to avoid study errors,

    study error: a systematic error which occurs because the sample does not accurately reflect the population

or else we will find ourselves with a large amount of error and/or uncertainty.

After the experiment, we need to analyze our data and come to a conclusion. It is generally a good idea to graph the data, as this gives us a highly visual method of analysis. We can use two main branches of statistics to analyze our data: descriptive statistics

    descriptive statistics: a summary of the collected data, both visual and numerical

or inferential statistics.

    inferential statistics: generalized results for the population based on the sample data

We will be focusing on inferential statistics, which include a quantification of uncertainty, in this course.

Finally, we use the results of our study to answer the original hypothesis or research question. We must also be sure to address the limitations of our study.

Types of Variables

Our variables may be categorical, discrete, or continuous.

    categorical: a qualitative measure belonging to one of K possible classes
    discrete: a quantitative measure with some countable value
    continuous: a quantitative measure with some uncountable value, such as a range of values

Plots

We can design a stem-and-leaf plot by writing all first digits in a single column and all of the remaining digits on the corresponding right-hand side. For example, for a standard bell-curve grading scheme:

4 | 24
5 | 0068
6 | 24556
7 | 4556678889
8 | 00022223334558
9 | 0334469

We can also use grouped frequency tables by sorting the data into frequency bins, for example:

    Average | Frequency
    90+     | 18
    80+     | 43
    70+     | 87
    60+     | 92

Histograms follow a similar pattern: we select bins such as 40-49, 50-59, 60-69, 70-79, 80-89, 90-100 and diagram the count in each bin. If we have differently sized bins (e.g. 1, 2, 3-4), we want to examine the "area" of the bars instead of their "height".
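As a quick illustration, here is a minimal Python sketch of how a stem-and-leaf plot can be built: group each value by its tens digit (the stem) and list the final digits (the leaves) beside it. The grade data is made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Print a stem-and-leaf plot: tens digit as stem, final digit as leaf."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    for stem in sorted(stems):
        leaves = "".join(str(leaf) for leaf in stems[stem])
        print(f"{stem} | {leaves}")

# Made-up grades, roughly matching the shape of the example above
grades = [42, 44, 50, 50, 56, 58, 62, 64, 65, 65, 66, 74, 85, 90, 93]
stem_and_leaf(grades)
```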
Measures of Central Tendency

The sample mean of a set of n values is denoted

    $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

The median is the number $\tilde{x}$ such that half the values are below it and half are above. If we denote the i-th smallest value as $x_{(i)}$, then

    $\tilde{x} = x_{((n+1)/2)}$

if n is odd, or

    $\tilde{x} = \frac{x_{(n/2)} + x_{(n/2+1)}}{2}$

if n is even.

Measures of Dispersion

The sample variance of a set of n values is denoted

    $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

The standard deviation, denoted s, is the square root of the sample variance. The range of the set is the difference between the maximum and minimum values. A graph which summarizes the data with the median, the quartiles, and the extremes is called a box-and-whiskers plot.
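A minimal Python sketch of these formulas (the data values are made up); note the n - 1 divisor in the sample variance.

```python
import math

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    # x_((n+1)/2) if n is odd; average of the two middle values if n is even
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def sample_variance(xs):
    xbar = sample_mean(xs)
    # Divide by n - 1, not n
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

data = [4.2, 3.9, 5.1, 4.8, 4.4]  # made-up sample
print(sample_mean(data), sample_median(data))
print(sample_variance(data), math.sqrt(sample_variance(data)))  # s^2 and s
```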
Probability

Classical probability is the "common sense" probability related to discrete events such as coin flips, dice rolls, etc. Though useful, this form of probability has some severe limitations: namely, the definition of what "equally likely" actually means. In effect, we can use this type of probability to find an answer, but cannot use that answer for anything.

Relative frequency probability is slightly more useful: we repeat an experiment some number of times and record the relative frequency of the various outcomes. This type of probability analysis, however, is extremely impractical.

Finally, we have subjective probability, which is based on a person's experiences and subjective knowledge. Obviously, this method also has some severe limitations and is far too abstract to be used scientifically.

When discussing probability, we always refer to experiments

    experiment: a repeatable phenomenon or process

or their various trials.

    trial: an iteration of an experiment

These experiments have a sample space

    sample space: the set of possible outcomes for an experiment

which is either discrete or continuous, depending on whether or not this set is countable. We will be attaching a mathematical model to the sample space to obtain our definition of probability. Any probability model must obey the following axioms, for any sample space S and events A and B:

    - $0 \le P(A) \le 1$ for all $A \subseteq S$
    - $P(S) = 1$
    - $P(A \cup B) = P(A) + P(B)$ for any mutually exclusive events A and B

The classical model would suggest that for a sample space $S = \{a, b, c\}$, each outcome has probability $P = 1/3$. This is referred to as a uniform distribution, and is incorrect for most non-trivial samples.

Permutations and Combinations

A common problem requires us to create an arrangement using r of n objects. In such a setting, the number of permutations is

    $n^{(r)} = \frac{n!}{(n-r)!}$

If we don't care about the order of the arrangement, we can use the formula for a combination. The number of ways to choose r of n items is

    $\binom{n}{r} = \frac{n!}{r!(n-r)!}$

Set Operations

$A \cap B$ is the intersection of two events, event A and event B; $P(A \cap B)$ is the probability that both events occur. It is also written as AB. Note that if $P(A \cap B) = 0$, the two events are mutually exclusive.

$P(A \cup B)$ is the union of events A and B, and is defined by $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. This is the probability of one event or the other happening.

We also define the complement of A: $P(\bar{A}) = 1 - P(A)$. This is the probability of the event not occurring.

We define conditional probabilities with the following notation: the probability of A conditional on B is

    $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

Obviously, if the probability of B is zero, this is nonsensical.

Two events are independent if and only if $P(A \cap B) = P(A)P(B)$. Note that this also tells us that $P(A \mid B) = P(A)$ and vice versa (the probability of either event is the same regardless of whether the other has occurred).

Law of Total Probability

If we have some partition of our sample space such that $S = B_0 \cup B_1 \cup \cdots \cup B_n$, then for any event A we can find

    $P(A) = \sum_{i=0}^{n} P(A \mid B_i)P(B_i)$

Bayes' Theorem

For any two events in a sample space,

    $P(B \mid A) = \frac{P(AB)}{P(A)} = \frac{P(A \mid B)P(B)}{P(A \mid B)P(B) + P(A \mid \bar{B})P(\bar{B})}$

Discrete Random Variables

A random variable is one which may take on any value in its range $R(X)$ of possible values. We denote random variables with upper-case letters and observed values with lower-case letters. If a variable can take on only two possible values, we refer to it as binary.

We denote the probability distribution (i.e. the chance of the random variable being equal to a certain value) as $f(x) = P(X = x)$. The sum of $f(x)$ over all possible x is equal to 1. We also define the cumulative distribution function as $F(x) = P(X \le x)$.

The mean or expected value of a random variable X is defined as

    $\mu = E(X) = \sum_x x f(x)$

Expectation is linear, thus we have $E(aX + bY) = aE(X) + bE(Y)$.

Variance is the expectation of the squared difference from the mean:

    $Var(X) = E((X - E(X))^2) = \sum_x f(x)(x - \mu)^2$

We sometimes write this as $Var(X) = E(X^2) - E(X)^2$.

Bernoulli Distributions

A Bernoulli distribution arises when an experiment with binary results is repeated several times. The outcome of each trial must be independent of the others, and the probability of any given outcome must be identical over all trials. We say that X follows a Bernoulli distribution, $X \sim \text{Bernoulli}(p)$, where p is the probability of success, if

    $f(x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$

Note that for all Bernoulli distributions $E(X) = p$ and $Var(X) = p(1 - p)$.
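To close this half of the notes, here are a few short Python sketches of the results above; all data and parameter values in them are made up for illustration. First, the permutation and combination formulas: Python's math module provides perm and comb directly, so we can check them against their factorial definitions.

```python
import math

n, r = 10, 3

# n^(r) = n! / (n - r)!   (ordered arrangements of r of n objects)
print(math.perm(n, r), math.factorial(n) // math.factorial(n - r))

# C(n, r) = n! / (r! (n - r)!)   (unordered selections of r of n objects)
print(math.comb(n, r), math.factorial(n) // (math.factorial(r) * math.factorial(n - r)))
```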
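Next, a sketch of the Set Operations definitions under a uniform (classical) model: by enumerating the sample space of two fair dice rolls we can verify an independence claim directly from $P(A \cap B) = P(A)P(B)$. The choice of events is arbitrary.

```python
from fractions import Fraction

# Sample space: ordered pairs from two fair dice, uniform model
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def P(event):
    """Probability of an event under the uniform model: favourable / total."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def A(s):
    return s[0] % 2 == 0       # first die is even

def B(s):
    return s[0] + s[1] == 7    # sum of the dice is 7

p_ab = P(lambda s: A(s) and B(s))
# Independence: P(A and B) = P(A)P(B), equivalently P(A|B) = P(A)
print(p_ab, P(A) * P(B))          # 1/12 and 1/12
print(p_ab / P(B) == P(A))        # True: P(A|B) = P(A)
```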
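Bayes' Theorem and the Law of Total Probability can be illustrated with a made-up diagnostic-test example: suppose 1% of a population has a condition (event B), a test flags it with probability 0.95 when present ($P(A \mid B)$), and falsely flags it with probability 0.05 when absent ($P(A \mid \bar{B})$).

```python
p_b = 0.01              # P(B): prevalence of the condition
p_a_given_b = 0.95      # P(A|B): true positive rate
p_a_given_not_b = 0.05  # P(A|~B): false positive rate

# Law of Total Probability: P(A) = P(A|B)P(B) + P(A|~B)P(~B)
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

# Bayes' Theorem: P(B|A) = P(A|B)P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a

print(p_a, p_b_given_a)  # ~0.059 and ~0.161: most positives are false positives
```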
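Finally, a sketch that checks $E(X) = p$ and $Var(X) = p(1 - p)$ for a Bernoulli variable, both from the definitions of expectation and variance and by a relative-frequency simulation over many independent trials (p = 0.3 is arbitrary).

```python
import random

p = 0.3
f = {1: p, 0: 1 - p}  # Bernoulli probability function f(x) = P(X = x)

# From the definitions: E(X) = sum of x f(x), Var(X) = sum of f(x)(x - mu)^2
mu = sum(x * f[x] for x in f)
var = sum(f[x] * (x - mu) ** 2 for x in f)
print(mu, var)  # 0.3 and 0.21 = p(1 - p)

# Relative-frequency check over many independent trials
trials = [1 if random.random() < p else 0 for _ in range(100_000)]
xbar = sum(trials) / len(trials)
s2 = sum((x - xbar) ** 2 for x in trials) / (len(trials) - 1)
print(xbar, s2)  # close to 0.3 and 0.21
```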