ECON1203 Notes

CHAPTER 1: What is Statistics?

Statistics is a way to get information from data

Descriptive statistics deals with methods of organising, summarising, and presenting

data in a convenient and informative way

o e.g. using graphical techniques (visual allows easier extraction of info) or

numerical techniques (finding the mean)

o A measure of central location is a value that attempts to describe a set of

data by identifying the central position within that set of data, e.g. the mean,

median, mode, etc.

o A measure of variability describes how much the values in the data set

as a whole are spread around the mean or median,

e.g. the range, standard deviation, interquartile range, variance, etc.
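The numerical techniques listed above can be sketched with Python's standard `statistics` module. The marks below are hypothetical values invented for illustration, and the variance and standard deviation shown are the sample versions (dividing by n - 1), which is an assumption about which version is intended.

```python
import statistics

# Hypothetical exam marks, invented purely for illustration.
marks = [48, 55, 61, 61, 70, 74, 83, 94]

# Measures of central location
mean_mark = statistics.mean(marks)      # arithmetic average
median_mark = statistics.median(marks)  # middle value of the sorted data
mode_mark = statistics.mode(marks)      # most frequent value

# Measures of variability
mark_range = max(marks) - min(marks)
variance = statistics.variance(marks)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(marks)       # sample standard deviation
q1, q2, q3 = statistics.quantiles(marks, n=4)  # quartiles
iqr = q3 - q1                           # interquartile range

print(mean_mark, median_mark, mode_mark, mark_range)
print(round(variance, 2), round(std_dev, 2), iqr)
```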

Inferential statistics is a body of methods used to draw conclusions or inferences

about characteristics of populations based on sample data – it involves three key

concepts: the population, the sample, and the statistical inference

o A population is the group of all items of interest to a statistics practitioner –

the group can be unlimited in size and doesn't necessarily refer to a group of

people, e.g. the 100,000 people on campus

A parameter is a descriptive measure of a population – in most

applications of inferential statistics, the parameter represents the

information we need, e.g. the mean number of sodas drunk by the 100,000

o A sample is a set of data drawn from the studied population, e.g. 1000 of the

100,000 are interviewed

A statistic is a descriptive measure of a sample, e.g. the average

number of sodas drunk by the 1000

We can then use statistics to make inferences about

parameters, e.g. using the sample mean to infer the value of

the population mean, which is the parameter of interest

o Statistical inference is the process of making an estimate, prediction, or

decision about a population based on sample data

Necessary because populations can be unlimited in size and thus

investigation would be impracticable and expensive

In order to ensure conclusions and estimates are as accurate as

possible, a measure of reliability is built into the statistical inference

– two such measures:

The confidence level is the proportion of times that an

estimating procedure will be correct, e.g. an estimate of the

number of sodas consumed by all 100,000 has a confidence

level of 95%, meaning estimates based on this form of

statistical inference will be correct 95% of the time

The significance level measures how frequently the

conclusion will be wrong
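The idea of a confidence level can be illustrated with a small simulation. This is a sketch under stated assumptions: the population of soda counts is invented, and the interval used (sample mean plus or minus 1.96 standard errors) is a common normal-approximation estimator that these notes do not define. The point is only that, over many repeated samples, roughly 95% of the intervals contain the population mean.

```python
import math
import random
import statistics

random.seed(1203)  # fixed seed so the sketch is repeatable

# Hypothetical population: sodas drunk per week by each of 100,000 people.
population = [random.randint(0, 10) for _ in range(100_000)]
parameter = statistics.mean(population)  # population mean (normally unknown)


def sample_interval(pop, n=1_000, z=1.96):
    """Draw n items and return an approximate 95% interval for the mean.

    Assumption: normal approximation, i.e. mean +/- z * s / sqrt(n).
    """
    sample = random.sample(pop, n)
    statistic = statistics.mean(sample)  # sample mean
    margin = z * statistics.stdev(sample) / math.sqrt(n)
    return statistic - margin, statistic + margin


# Repeat the estimating procedure and count how often it is "correct",
# i.e. how often the interval contains the population mean.
trials = 200
hits = 0
for _ in range(trials):
    low, high = sample_interval(population)
    if low <= parameter <= high:
        hits += 1

coverage = hits / trials
print(f"population mean = {parameter:.3f}, coverage = {coverage:.2%}")
```

The observed coverage should land near the 95% confidence level; the remaining share of intervals that miss corresponds to the significance level.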

CHAPTER 2: Graphical Descriptive Techniques I

A variable is some characteristic of a population or sample, e.g. the mark on an exam

since not all students achieve the same mark, i.e. they will vary

1. Usually represented with uppercase letters such as X, Y, and Z

The values of the variable are the possible observations of the variables, e.g. an exam

marked out of 100 will have values as the integers between 0 and 100

Data are the observed values of a variable, e.g. 58/100, 100/100, etc.

1. Data is the plural of datum – the mark of one student is a datum

There are three types of data:

1. Interval data are real numbers, e.g. height, weight, income, distance, etc.

Also referred to as quantitative or numerical (or ratio) data

All calculations are permitted – a set of interval data is usually

described by calculating the average

2. Nominal data are categories, e.g. responses to questions about marital status

The values of this variable are: single, married, divorced, and

widowed (not numbers but words that describe the categories)

Nominal data is often recorded with arbitrary assignments of

numbers to each category, e.g. single = 1, married = 50, divorced =

48, etc.

Also referred to as qualitative or categorical data

Since the values are appointed arbitrary numbers, calculations for

nominal data using the numbers are meaningless

Only calculations allowed involve counting or calculating the

percentages of the occurrences of each category and

reporting the frequency

3. Ordinal data is similar to nominal in appearance, but the difference is that

the order of the values for ordinal types of data indicates a higher rating, thus

when assigning codes to the values, the order must be maintained, e.g. poor =

1, fair = 2, good = 3, very good = 4, excellent = 5

The magnitudes of the values aren't important, only the order is, i.e. poor = 1,

fair = 50, good = 51, very good = 100, excellent = 6310 implies the

same meaning as the previous example

The only permissible calculations are those involving a ranking

process, e.g. placing all the data in order and selecting the value that

lies in the middle (the median)

Critical difference between ordinal and interval data is that the intervals between

values of interval data are consistent and meaningful

The data types can be placed in order of the permissible calculations:

1. Interval data – all calculations are allowed

2. Ordinal data – calculations based on ranking (e.g. the median) are allowed

3. Nominal data – no calculations other than determining frequencies are

permitted
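The difference between interval and ordinal data can be checked numerically. In this sketch the heights and ratings are hypothetical, and the two codings are the ones given in the ordinal example. The mean of the codes changes with the coding and so carries no meaning, while the median category, which depends only on order, stays the same.

```python
import statistics

# Interval data: arithmetic on the values is meaningful.
heights_cm = [158.0, 164.5, 171.0, 180.5]  # hypothetical heights
mean_height = statistics.mean(heights_cm)

# Ordinal data: two order-preserving codings of the same ratings.
ratings = ["good", "poor", "excellent", "fair", "good"]  # hypothetical responses
coding_a = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}
coding_b = {"poor": 1, "fair": 50, "good": 51, "very good": 100, "excellent": 6310}


def median_rating(values, coding):
    """Rank the ratings by their codes and return the middle category.

    Assumes an odd number of ratings, so there is a single middle value.
    """
    ordered = sorted(values, key=coding.get)
    return ordered[len(ordered) // 2]


# The mean of the codes depends on the arbitrary coding ...
mean_a = statistics.mean(coding_a[r] for r in ratings)
mean_b = statistics.mean(coding_b[r] for r in ratings)

# ... but the median category does not.
median_a = median_rating(ratings, coding_a)
median_b = median_rating(ratings, coding_b)

print(mean_height, mean_a, mean_b, median_a, median_b)
```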

Higher-level data types may be treated as lower-level ones (the reverse is not

possible), e.g. in UNSW, marks (interval data) are converted to letter grades (ordinal)

o This leads to loss of information, e.g. an 83 provides more information than a

D (since D implies a mark between 70-85)

The variables whose observations constitute our data will be given the same name as

the type of data, e.g. interval data are the observations of an interval variable

For nominal data, the only allowable calculation is to count the frequency or compute

the percentage each value of the variable represents

o This data can be summarised in a table, presenting the categories and their

counts, called a frequency distribution

o A relative frequency distribution lists the categories and the proportion

with which each occurs – graphical techniques are commonly used to depict a

picture of the data

Bar chart and the pie chart – used because they are eye-catching

and enhance an individual's ability to grasp the substance of the data

A bar chart is often used to display frequencies

A pie chart graphically shows relative frequencies

(percentages)
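A frequency distribution and a relative frequency distribution for nominal data can be built by counting. The responses below are hypothetical and are invented only to show the calculation; the counts are what a bar chart would display and the proportions are what a pie chart would display.

```python
from collections import Counter

# Hypothetical marital-status responses, invented for illustration.
responses = [
    "single", "married", "married", "divorced",
    "single", "married", "widowed", "married",
]

# Frequency distribution: each category with its count.
frequency = Counter(responses)

# Relative frequency distribution: each category with its proportion.
total = sum(frequency.values())
relative_frequency = {
    category: count / total for category, count in frequency.items()
}

for category, count in frequency.most_common():
    print(f"{category:<10}{count:>3}{relative_frequency[category]:>8.1%}")
```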

For ordinal data, there are no specific graphical techniques – to describe a set of

ordinal data, they're treated as if they were nominal

o The difference is that:

In bar charts, the bars should be arranged in ascending (or descending)

ordinal values

In pie charts, the wedges are arranged clockwise in ascending (or

descending) order

Techniques applied to single sets of data are called univariate

When depicting the relationship between variables, bivariate methods are required

A cross-classification table (also referred to as a cross-tabulation table) is used to

describe the relationship between two nominal variables

o A variation of the bar chart is employed to graphically describe the

relationship – the same technique is used to compare two or more sets of

nominal data

To describe the relationship between two nominal variables, only the

frequency of the values can be determined (since they're nominal) –

therefore a cross-classification table that lists the frequency of each

combination of the values of the two variables needs to be produced

If the two variables are unrelated, then the patterns exhibited

in the bar charts should be approximately the same – if some

relationships exist, then some bar charts will differ from

others
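A cross-classification table can be sketched by counting each combination of values. The two variables and all observations below are hypothetical. Comparing the row proportions plays the same role as comparing the bar charts: similar rows suggest the variables are unrelated, while differing rows suggest a relationship.

```python
from collections import Counter

# Hypothetical survey: (occupation, newspaper read), invented for illustration.
observations = [
    ("blue collar", "Post"), ("blue collar", "Post"),
    ("blue collar", "Post"), ("blue collar", "Globe"),
    ("white collar", "Globe"), ("white collar", "Globe"),
    ("white collar", "Globe"), ("white collar", "Post"),
]

# Frequency of each combination of the two nominal variables.
table = Counter(observations)

rows = sorted({occupation for occupation, _ in observations})
cols = sorted({paper for _, paper in observations})

# Row relative frequencies: the pattern of newspapers within each occupation.
row_totals = {r: sum(table[(r, c)] for c in cols) for r in rows}
row_relative = {
    r: {c: table[(r, c)] / row_totals[r] for c in cols} for r in rows
}

print(f"{'':<14}" + "".join(f"{c:>8}" for c in cols))
for r in rows:
    print(f"{r:<14}" + "".join(f"{table[(r, c)]:>8}" for c in cols))
```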

CHAPTER 3: Graphical Descriptive Techniques II

For interval data, the most common graphical method is the histogram

o A histogram is created by drawing rectangles whose bases are the intervals

and whose heights are the frequencies – the frequency distribution provides

information about how the numbers are distributed

A frequency distribution for interval data is created by counting the

number of observations that fall into each of a series of intervals

called classes, that cover the complete range of observations

The number of class intervals selected depends on the number of

observations in the data set – the more observations, the larger the

number of intervals to enable a useful histogram

Sturges' formula recommends that the number of

class intervals be determined by the following: Number of

class intervals = 1 + 3.3*log(n), where log is the base-10 logarithm and n is the number of

observations, e.g. if there are 50 observations, number of class

intervals = 1 + 3.3*log(50) = 1 + 3.3(1.7) = 6.6, which is

rounded to 7

The width of the class intervals is determined by the formula: class

width = (largest observation - smallest observation) / number of classes,

e.g. (119.63 - 0) / 8 = 14.95, which is rounded up to 15, thus the first class is defined as

“Greater than or equal to 0 but less than or equal to 15”

Sturges' formula acts as a guideline only – it's more important to

choose classes that are easy to interpret, e.g. marks of an exam out of

100 where the highest is 94 and the lowest is 48 would lead to 8 class
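The histogram steps described in this chapter (choosing the number of classes, choosing the class width, then counting observations per class) can be sketched as follows. The function names and the sample data are hypothetical. The class boundaries follow the first-class definition quoted above, with each later class assumed to exclude its lower limit and include its upper limit.

```python
import math


def sturges_classes(n):
    """Number of class intervals suggested by 1 + 3.3*log10(n), rounded."""
    return round(1 + 3.3 * math.log10(n))


def class_width(largest, smallest, classes):
    """Class width, rounded up to a whole number for easy interpretation."""
    return math.ceil((largest - smallest) / classes)


def frequency_distribution(data, start, width, classes):
    """Count the observations that fall into each class.

    Assumptions: every observation lies between start and
    start + classes * width; the first class includes its lower limit,
    and each later class excludes its lower limit but includes its upper one.
    """
    counts = [0] * classes
    for x in data:
        if x == start:
            index = 0
        else:
            index = math.ceil((x - start) / width) - 1
        counts[index] += 1
    return counts


# Worked figures from the notes: 50 observations, range 0 to 119.63, 8 classes.
n_classes = sturges_classes(50)      # 1 + 3.3 * 1.7 = 6.6, rounded to 7
width = class_width(119.63, 0, 8)    # 14.95, rounded up to 15

# Hypothetical observations, invented to show the counting step.
bills = [0, 7.5, 15, 15.01, 42.19, 119.63]
counts = frequency_distribution(bills, start=0, width=width, classes=8)
print(n_classes, width, counts)
```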
