University of Guelph
ECON 2740
David Prescott

Chapter 1: Univariate Distributions
Econometrics Text by D. M. Prescott

1 Descriptive Statistics

The most basic application of statistical concepts is to describe data. In many situations large quantities of data are available to researchers, and typically the most urgent problem is to find a way of presenting the data so that the most important features can be highlighted. One useful approach is to construct a diagram known as a histogram for each variable.[1] Figure 1.1 is a histogram that was constructed from 3921 observations on the hourly pay earned by full-time Canadian workers in 1995. The data have been sorted into 10 bins. The centre of each bin is recorded on the horizontal axis. For example, the first bin contains all the wage rates in the sample that lie between $2.00 and $6.00 per hour, and thus its centre is at $4.00 per hour.

The number of observations within a bin is called the frequency. This type of histogram is known as a frequency distribution because it shows how the frequencies are distributed amongst the bins. Since each observation falls in only one bin, the sum of the frequencies is the sample size, 3921. By rescaling the vertical axis, the heights of the bars in Figure 1.1 can also be interpreted as the relative frequencies, which are obtained by dividing each frequency by the sample size. For example, the relative frequency of the first bin is 177/3921 = 0.045. In other words, 4.5% of the sample falls in the first bin. Clearly the sum of the relative frequencies, or shares, must be unity.

It will be useful if some notation is used to refer to key concepts. The size of the entire sample is defined to be \(n\) (\(n = 3921\) in the example). The number of bins is \(m\), where \(m < n\), and in the wage example \(m = 10\). The frequency of observations in the \(j\)th bin is denoted by \(f_j\) for \(j = 1, 2, \ldots, m\). In the example \(f_1 = 177\). The sum of the frequencies must equal the total number of observations in the sample:[2]

\[
f_1 + f_2 + \cdots + f_m = \sum_{j=1}^{m} f_j = n \tag{1.1}
\]

[Figure 1.1: Distribution of Wages. Vertical axis: Relative Frequency. Horizontal axis: Wage Rates ($/hour).]

[Figure 1.2: Distribution of Wages. Vertical axis: Density (Relative Freq. / bin width). Horizontal axis: Wage Rates ($/hour).]

By dividing each frequency by the sample size \(n\) we obtain the relative frequencies, \(r_j = f_j / n\) for \(j = 1, 2, \ldots, m\). The fact that the sum of the relative frequencies is unity can be confirmed by dividing all the terms in equation 1.1 by the sample size \(n\):

\[
\frac{f_1}{n} + \frac{f_2}{n} + \cdots + \frac{f_m}{n} = \frac{1}{n} \sum_{j=1}^{m} f_j = \frac{n}{n} = 1
\]
\[
r_1 + r_2 + \cdots + r_m = \sum_{j=1}^{m} r_j = 1 \tag{1.2}
\]

The picture of the data that we get from the histogram depends on how the bin boundaries are defined, and there is no unique way to do this. Over most of the data's range a bin width of $4.00 per hour seems to present a useful picture of the data. However, at higher wages the data are very thinly spread. If all bin widths were $4.00 per hour, 24 bins would be needed to cover the entire sample, since the maximum hourly wage in this sample is $96.15 per hour, and many bins would be empty. To accommodate the thinness of data and the large maximum wage, a single bin has been created that spans wages from $38.00 per hour to $97.99. The centre of this bin is $68.00. There are 88 observations in the last bin, compared to 59 observations in the bin that spans wages between $34.00 and $37.99 per hour.

Although Figure 1.1 provides a useful representation of the data, it does have some deficiencies. It is natural to judge the relative importance of a bin not so much by the bin's height as by its area. This creates a problem when the bin widths vary in size. The difficulty is that the areas of wider bins are too large in relation to the narrower bins: wider bins simply look more important than they should. Figure 1.2 corrects this problem by using the area of a bin to represent its relative frequency. When the area is used to represent the relative frequency, the height of the bar is referred to as the density, \(d\). To calculate the density of the data in the \(j\)th bin, \(d_j\), we use the fact that the area of the bar (the relative frequency \(r_j\)) must equal the width of the bin \(w_j\) times the height, or density, \(d_j\):

\[
d_j w_j = r_j
\]

The density is therefore

\[
d_j = \frac{r_j}{w_j}
\]

Figure 1.2 is similar to a probability density function, in which areas rather than heights represent probabilities.[3] Notice that the total area under the distribution in Figure 1.2 is exactly unity because the areas represent relative frequencies and, as noted in equation 1.2, the relative frequencies add up to unity.

Several characteristics of the wage data are evident in Figure 1.2. First, not all wages are the same; rather, they are spread out or dispersed. Second, most wage rates are close to the central wage, but further away the frequencies diminish. Third, wage rates are not distributed symmetrically about the centre: the maximum wage is much further away from the central wage than the minimum wage. The distribution is stretched out, or skewed, to the right.

Each of the concepts (centre, dispersion and skewness) can be quantified using specific formulae. However, there are often several ways to measure each concept, each having its own way of capturing some essential aspect of centre, dispersion, etc.

1.1 Measures of the Centre of a Distribution

The midrange, or half-way point between the minimum and maximum values, can be used to define the centre of a distribution, but this is clearly unsatisfactory when the data are severely skewed. In Figure 1.2, 99.5% of the wages are below the midrange of $49.6, so this number hardly represents the central wage. Often the bin with the greatest frequency, referred to as the mode, is a useful measure of

[1] This sample is drawn from the Survey of Consumer Finance (SCF), 1996. This large survey questionnaire was completed by almost 100,000 adult Canadians and provides information on sources of income, hours of work and family characteristics during 1995. Further information about the SCF is available at httptrexeconuoguelphcadprescotcoursesscfinfohtm. The subsample of 3921 individuals used here is a random subset of the full sample. Some restrictions were imposed when the sample was drawn. In particular, only workers who stated they worked full-time throughout the year were included.

[2] Refer to the appendix of this chapter for details on the properties of the summation operator.

[3] Consider the widely encountered normal distribution, which is reviewed later in this chapter. It is areas under the bell-shaped normal density function that represent probabilities.
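
The relative-frequency and density definitions above can be checked numerically. The following is a minimal Python sketch, not part of the original notes. The function names are hypothetical, the bin limits for the wage example are assumed to be [2, 6), [34, 38) and [38, 98) dollars per hour, and the `toy_` values are made-up numbers used only to confirm that relative frequencies and bar areas each sum to one.

```python
def relative_frequencies(freqs):
    """Return r_j = f_j / n, where n is the sum of the frequencies."""
    n = sum(freqs)
    return [f / n for f in freqs]


def densities(freqs, edges):
    """Return d_j = r_j / w_j for bins defined by consecutive edges."""
    if len(edges) != len(freqs) + 1:
        raise ValueError("need one more bin edge than frequencies")
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    return [r / w for r, w in zip(relative_frequencies(freqs), widths)]


# Values quoted in the text: n = 3921, f_1 = 177, and the last two bins
# hold 59 and 88 observations. The bin widths (4 and 60) follow from the
# assumed limits [34, 38) and [38, 98).
n = 3921
r_first = 177 / n        # relative frequency of the first bin
d_ninth = (59 / n) / 4   # density of the $34 to $38 bin
d_last = (88 / n) / 60   # density of the wide $38 to $98 bin

# Hypothetical frequencies and edges, used only to check the identities.
toy_freqs = [3, 5, 2]
toy_edges = [0, 4, 8, 20]
toy_r = relative_frequencies(toy_freqs)
toy_d = densities(toy_freqs, toy_edges)
toy_area = sum(
    d * (hi - lo) for d, lo, hi in zip(toy_d, toy_edges, toy_edges[1:])
)

print(round(r_first, 3))  # 0.045
print(d_last < d_ninth)   # True
print(toy_r)              # [0.3, 0.5, 0.2]
```

Although the last bin holds more observations than the one before it (88 against 59), its density is lower because its relative frequency is spread over a much wider interval. This is the correction that Figure 1.2 applies to Figure 1.1.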