
# POL222H1 Lecture 9: Univariate Distribution

Department: Political Science
Course: POL222H1
Professor: Kenichi Ariga
Semester: Fall

Description
POL222 Week 10 Lecture 9
• Univariate Distribution
• Standard deviation is useful for linear regression analysis
• One use is to compare the different values in different cases
• Variance
  o Related concept: variance is the square of the standard deviation and is also a measure of the variability of Y
  o S² is often used to denote the variance
  o Recall S is often used to denote the standard deviation
• Variability 2: IQR
  o Percentile: the p-th percentile is the value of a variable such that p% of the observations fall below that value and (100-p)% fall above it
• Quartiles:
  o Lower quartile: equals the 25th percentile. One quarter of the data fall below the lower quartile
  o Upper quartile: equals the 75th percentile. One quarter of the data fall above the upper quartile
• Inter Quartile Range
  o IQR is the difference between the upper and lower quartiles
• A box plot is useful for comparing the distributions of 2 variables
• Histograms and box plots are ways to visualize the values we have
• Normal distribution: the mean = the median
  o It is not common to see a true normal distribution for any variable, but there are many variables
  o Important draw for when we conduct statistical
• Right-skewed distribution: the long tail is on the right side of the chart
  o Also called positively skewed: the mean is normally greater than the median
  o Because the mean is sensitive to extreme values
  o The difference between mean and median is half
  o Relative level of spending, not the actual level
• Left-skewed (negatively skewed) distribution: the longer tail is on the left, toward negative values
  o The mean is normally smaller than the median
• Bimodal distribution
  o With three or more peaks, we may call it multimodal
  o Two peaks in this distribution, and a small number of observations in the middle
  o In the UN, there are usually two variable distributions
• Mean for a Binary Variable
  o Dichotomous categories: e.g., voted for the government party or not, won an election or not, engaged in war or not, changed
  o We assign either y = 1 or y = 0 to each of the dichotomous outcomes
  o Then, the sample mean of y is the proportion of observations with y = 1
  o The mean is the sum of the values of the variable divided by the number of observations
• Bivariate analysis:
  o Relationship between y and x; the interest is the causal relationship between y and x
  o Linear regression analysis is used to describe the relationship
  o Focus on summarizing the relationship in the sample data
  o Conditional mean of Y given X: represents how Y varies on average as X varies
• Scatterplot: each dot represents the values of one observation in the data
  o Do Y and X have a negative or positive relationship? The scatterplot will tell you
• Conditional mean
  o Represents how Y varies across different values of X
  o Average value of Y when X takes a specific value
  o E.g. 2 variables = Y and X
• Conditional mean of Y given X
• E.g. 1: ho
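
The spread measures in these notes (standard deviation, variance, percentiles, quartiles, IQR) and the mean-versus-median pattern of a right-skewed variable can be checked on a small example. The following is a minimal sketch, not the course's own method: the `spending` values are made up for illustration, the helper names `percentile` and `summarize` are hypothetical, the standard deviation is assumed to be the sample version (dividing by n - 1), and the percentile rule is linear interpolation, which is only one of several conventions in use.

```python
import statistics


def percentile(values, p):
    """Return the p-th percentile by linear interpolation between sorted values.

    Assumption: this is one common convention; textbooks and software differ.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (p / 100) * (len(ordered) - 1)  # fractional position in the sorted list
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def summarize(values):
    """Collect the centre and spread measures discussed in the notes."""
    s = statistics.stdev(values)  # sample standard deviation (n - 1), an assumption
    q1 = percentile(values, 25)   # lower quartile
    q3 = percentile(values, 75)   # upper quartile
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": s,
        "variance": s ** 2,       # variance is the square of the standard deviation
        "lower_quartile": q1,
        "upper_quartile": q3,
        "iqr": q3 - q1,           # difference between upper and lower quartile
    }


# Hypothetical right-skewed data: one very large value pulls the mean upward.
spending = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]
summary = summarize(spending)

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In this made-up sample the single large value raises the mean well above the median, while the IQR is barely affected by it, which is the sensitivity to extreme values that the notes point to.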

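The two ideas of a binary-variable mean and a conditional mean of Y given X can be illustrated together. This is a rough sketch under stated assumptions: the data are invented, `y` is a hypothetical 0/1 outcome (for example, voted or not), `x` is a hypothetical three-level explanatory variable, and the helper names `mean` and `conditional_means` are not from the lecture.

```python
from collections import defaultdict


def mean(values):
    """Sum of the values divided by the number of observations."""
    return sum(values) / len(values)


def conditional_means(xs, ys):
    """Average value of Y within each distinct value of X."""
    groups = defaultdict(list)
    for x_value, y_value in zip(xs, ys):
        groups[x_value].append(y_value)
    return {x_value: mean(group) for x_value, group in sorted(groups.items())}


# Hypothetical sample of eight observations.
x = [1, 1, 1, 1, 2, 2, 3, 3]   # assumed explanatory variable with three levels
y = [1, 0, 0, 0, 1, 0, 1, 1]   # assumed binary outcome coded 1 or 0

# For a 0/1 variable, the sample mean equals the proportion of observations with y = 1.
proportion_y1 = mean(y)

# Conditional mean of Y given X: how Y varies on average as X varies.
cond = conditional_means(x, y)

print("proportion with y = 1:", proportion_y1)
for x_value, y_bar in cond.items():
    print(f"mean of y when x = {x_value}: {y_bar}")
```

Here the conditional mean of `y` rises as `x` increases, which is the kind of positive relationship a scatterplot of the same data would suggest.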