ECON1203 Notes.docx


Department: Economics
Course Code: ECON1203
Professor: Yongdoek

ECON1203 Notes
CHAPTER 1: What is Statistics?
Statistics is a way to get information from data
Descriptive statistics deals with methods of organising, summarising, and presenting
data in a convenient and informative way
o e.g. using graphical techniques (visual allows easier extraction of info) or
numerical techniques (finding the mean)
o A measure of central location is a value that describes a set of data by
identifying the central position within it, e.g. the mean, median, mode, etc.
o A measure of variability quantifies how spread out the observations in a data
set are, often relative to a central value such as the mean or median,
e.g. the range, standard deviation, interquartile range, variance, etc.
Inferential statistics is a body of methods used to draw conclusions or inferences
about characteristics of populations based on sample data; it involves three key
concepts: the population, the sample, and the statistical inference
o A population is the group of all items of interest to a statistics practitioner;
the group can be unlimited in size and doesn't necessarily refer to a group of
people, e.g. the 100,000 on campus
A parameter is a descriptive measure of a population; in most
applications of inferential statistics, the parameter represents the
information we need, e.g. the mean number of sodas drunk by the 100,000
o A sample is a set of data drawn from the studied population, e.g. 1000 of the
100,000 are interviewed
A statistic is a descriptive measure of a sample, e.g. the average
number of sodas drunk by the 1000
We can then use statistics to make inferences about
parameters, e.g. using the sample mean to infer the value of
the population mean, which is the parameter of interest
o Statistical inference is the process of making an estimate, prediction, or
decision about a population based on sample data
Necessary because populations can be unlimited in size and thus
investigation would be impracticable and expensive
In order to ensure conclusions and estimates are as accurate as
possible, a measure of reliability is built into the statistical inference;
there are two such measures:
The confidence level is the proportion of times that an
estimating procedure will be correct, e.g. if an estimate of the
number of sodas consumed by all 100,000 has a confidence
level of 95%, estimates based on this form of
statistical inference will be correct 95% of the time
The significance level measures how frequently the
conclusion will be wrong
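The difference between a parameter and a statistic, and the meaning of a 95% confidence level, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population of 100,000 soda counts is randomly generated rather than real data, and the interval uses the normal approximation (sample mean plus or minus 1.96 standard errors), which these notes have not yet introduced.

```python
import random
import statistics

random.seed(1203)  # fixed seed so the run is repeatable

# Hypothetical population: sodas drunk per week by each of 100,000 people.
population = [random.randint(0, 14) for _ in range(100_000)]
parameter = statistics.mean(population)  # population mean (the parameter)


def interval_from_sample(sample):
    """Return (low, high), an approximate 95% interval for the population mean."""
    mean = statistics.mean(sample)  # sample mean (the statistic)
    std_error = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - 1.96 * std_error, mean + 1.96 * std_error


# Repeat the estimating procedure many times and count how often the
# interval actually contains the parameter.
trials = 300
hits = 0
for _ in range(trials):
    sample = random.sample(population, 1000)  # interview 1000 of the 100,000
    low, high = interval_from_sample(sample)
    if low <= parameter <= high:
        hits += 1

coverage = hits / trials
print(f"population mean (parameter): {parameter:.3f}")
print(f"share of intervals containing it: {coverage:.3f}")
```

The printed share should land close to 0.95, which is what "correct 95% of the time" refers to: the procedure, not any single interval, has that success rate.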
CHAPTER 2: Graphical Descriptive Techniques I
A variable is some characteristic of a population or sample, e.g. the mark on an exam,
which varies because not all students achieve the same mark
1. Usually represented with uppercase letters such as X, Y, and Z
The values of the variable are the possible observations of the variable, e.g. an exam
marked out of 100 has the integers from 0 to 100 as its values
Data are the observed values of a variable, e.g. 58/100, 100/100, etc.
1. Data is the plural of datum; the mark of one student is a datum

There are three types of data:
1. Interval data are real numbers, e.g. height, weight, income, distance, etc.
Also referred to as quantitative or numerical (or ratio) data
All calculations are permitted; a set of interval data is usually
described by calculating the average
2. Nominal data are categories, e.g. responses to questions about marital status
The values of this variable are: single, married, divorced, and
widowed (not numbers but words that describe the categories)
Nominal data are often recorded by arbitrarily assigning a
number to each category, e.g. single = 1, married = 50, divorced =
48, etc.
Also referred to as qualitative or categorical data
Since the values are assigned arbitrary numbers, calculations on
those numbers are meaningless for nominal data
The only calculations allowed involve counting the
occurrences of each category or calculating their percentages, and
reporting the frequency
3. Ordinal data are similar to nominal data in appearance, but the order of their
values is meaningful (a higher value indicates a higher rating), so
when assigning codes to the values, the order must be maintained, e.g. poor =
1, fair = 2, good = 3, very good = 4, excellent = 5
The magnitudes of the codes aren't important, only their order is, i.e. poor = 1,
fair = 50, good = 51, very good = 100, excellent = 6310 carries the
same meaning as the previous example
The only permissible calculations are those involving a ranking
process, e.g. placing all the data in order and selecting the value that
lies in the middle (the median)
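The claim that only the order of ordinal codes matters can be checked directly. In the sketch below, the list of responses is hypothetical; the two codings are the ones given in the notes. The median category is the same under both codings, while the mean changes with the arbitrary magnitudes of the codes.

```python
import statistics

# Hypothetical survey responses on a five-point rating scale.
responses = ["good", "poor", "excellent", "fair", "good", "very good", "good"]

# The two codings from the notes; both preserve the order of the categories.
coding_a = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}
coding_b = {"poor": 1, "fair": 50, "good": 51, "very good": 100, "excellent": 6310}


def median_category(responses, coding):
    """Return the category whose code is the median of the coded responses."""
    codes = [coding[r] for r in responses]
    middle = statistics.median_low(codes)  # always one of the observed codes
    label_for = {code: label for label, code in coding.items()}
    return label_for[middle]


median_a = median_category(responses, coding_a)
median_b = median_category(responses, coding_b)
print(median_a, median_b)  # same category under both codings

# The mean, by contrast, depends on the arbitrary magnitudes of the codes.
mean_a = statistics.mean([coding_a[r] for r in responses])
mean_b = statistics.mean([coding_b[r] for r in responses])
print(mean_a, mean_b)
```

`median_low` is used so that the result is always an actual observed code, which keeps the lookup back to a category well defined even with an even number of responses.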
The critical difference between ordinal and interval data is that the intervals between
values of interval data are consistent and meaningful
The data types can be placed in order of the permissible calculations:
1. Interval data: all calculations are allowed
2. Ordinal data: only calculations based on a ranking process are allowed
3. Nominal data: no calculations other than determining frequencies are
permitted
Higher-level data types may be treated as lower-level ones (but not vice versa),
e.g. at UNSW, marks (interval data) are converted to letter grades (ordinal)
o This leads to loss of information, e.g. an 83 provides more information than a
D (since D implies a mark between 70-85)
The variables whose observations constitute our data will be given the same name as
the type of data, e.g. interval data are the observations of an interval variable
For nominal data, the only allowable calculation is to count the frequency or compute
the percentage each value of the variable represents
o This data can be summarised in a table, presenting the categories and their
counts, called a frequency distribution
o A relative frequency distribution lists the categories and the proportion
with which each occurs; graphical techniques are commonly used to present a
picture of the data
Bar charts and pie charts are used because they are eye-catching
and enhance an individual's ability to grasp the substance of the data
A bar chart is often used to display frequencies
A pie chart graphically shows relative frequencies
(percentages)
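A frequency distribution and a relative frequency distribution for nominal data can be sketched as follows. The ten marital-status responses are made up for illustration, and the function names are hypothetical; the counts are what a bar chart would display and the proportions are what a pie chart would display.

```python
from collections import Counter

# Hypothetical nominal data: marital status of ten respondents.
statuses = ["single", "married", "married", "divorced", "single",
            "married", "widowed", "single", "married", "married"]


def frequency_distribution(values):
    """Count how many times each category occurs."""
    return Counter(values)


def relative_frequency_distribution(values):
    """Return the proportion of observations that fall in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


freq = frequency_distribution(statuses)
rel_freq = relative_frequency_distribution(statuses)

# Print a small table: category, count, percentage.
for category, count in freq.most_common():
    print(f"{category:<10}{count:>3}{rel_freq[category]:>8.0%}")
```

Only counting is involved, which is consistent with the restriction that no other calculation is meaningful for nominal data.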
For ordinal data, there are no specific graphical techniques; a set of
ordinal data is treated as if it were nominal
o The difference is that:
In bar charts, the bars should be arranged in ascending (or descending)
order of the ordinal values
In pie charts, the wedges are arranged clockwise in ascending (or
descending) order
Techniques applied to single sets of data are called univariate
When depicting the relationship between variables, bivariate methods are required
A cross-classification table (also referred to as a cross-tabulation table) is used to
describe the relationship between two nominal variables
o A variation of the bar chart is employed to graphically describe the
relationship; the same technique is used to compare two or more sets of
nominal data
To describe the relationship between two nominal variables, only the
frequency of the values can be determined (since the data are nominal);
therefore a cross-classification table that lists the frequency of each
combination of the values of the two variables needs to be produced
If the two variables are unrelated, then the patterns exhibited
in the bar charts should be approximately the same; if some
relationship exists, then some bar charts will differ from
others
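A cross-classification table can be built by counting each combination of values. The sketch below assumes a made-up data set of (occupation, newspaper) pairs; the variable names and values are illustrative only. Printing row proportions makes it easy to see whether the pattern differs from row to row, which is what the bar charts would show.

```python
from collections import Counter

# Hypothetical paired observations of two nominal variables:
# (occupation, newspaper read). The values are made up for illustration.
observations = [
    ("blue collar", "Post"), ("blue collar", "Post"), ("blue collar", "Sun"),
    ("white collar", "Sun"), ("white collar", "Sun"), ("white collar", "Post"),
    ("professional", "Sun"), ("professional", "Sun"), ("professional", "Sun"),
]


def cross_classify(pairs):
    """Return {row value: Counter of column values} for paired observations."""
    table = {}
    for row_value, column_value in pairs:
        table.setdefault(row_value, Counter())[column_value] += 1
    return table


table = cross_classify(observations)
columns = sorted({column for _, column in observations})

print(f"{'':<14}" + "".join(f"{c:>6}" for c in columns))
for row_value, counts in table.items():
    total = sum(counts.values())
    # Row proportions make it easy to compare the pattern across rows.
    cells = "".join(f"{counts[c] / total:>6.2f}" for c in columns)
    print(f"{row_value:<14}{cells}")
```

In this made-up data the rows have clearly different proportions, which would suggest the two variables are related; similar proportions in every row would suggest they are not.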
CHAPTER 3: Graphical Descriptive Techniques II
For interval data, the most common graphical method is the histogram
o A histogram is created by drawing rectangles whose bases are the intervals
and whose heights are the frequencies; the frequency distribution provides
information about how the numbers are distributed
A frequency distribution for interval data is created by counting the
number of observations that fall into each of a series of intervals,
called classes, that cover the complete range of observations
The number of class intervals selected depends on the number of
observations in the data set; the more observations, the larger the
number of intervals needed for a useful histogram
Sturges' formula recommends that the number of
class intervals be determined by the following: Number of
class intervals = 1 + 3.3*log(n), where log is the base-10 logarithm and n is
the number of observations, e.g. if there are 50 observations, number of class
intervals = 1 + 3.3*log(50) = 1 + 3.3(1.7) = 6.6, which is
rounded to 7
The width of the class intervals is determined by the formula: class
width = (largest observation - smallest observation) / number of classes,
e.g. (119.63 - 0) / 8 = 14.95, which is rounded up to 15, thus the first class
is defined as "Greater than or equal to 0 but less than or equal to 15"
Sturges' formula acts as a guideline only; it's more important to
choose classes that are easy to interpret, e.g. marks of an exam out of
100 where the highest is 94 and the lowest is 48 would lead to 8 class
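The two calculations above, Sturges' formula and the class width, can be sketched in a few lines, together with the counting step that produces the frequency distribution behind a histogram. The function names and the six sample observations are hypothetical. The counting rule assumes each class includes its upper limit, with the first class also including its lower limit, matching the first class as defined in the notes.

```python
import math


def sturges_classes(n):
    """Number of class intervals suggested by 1 + 3.3*log10(n), rounded."""
    return round(1 + 3.3 * math.log10(n))


def class_width(largest, smallest, classes):
    """Class width, rounded up to a whole number so the classes cover the range."""
    return math.ceil((largest - smallest) / classes)


def frequency_table(data, start, width, classes):
    """Count observations per class; each class includes its upper limit.

    Assumes every observation lies between start and start + width * classes.
    """
    counts = [0] * classes
    for x in data:
        # The first class also includes its lower limit (start).
        index = max(0, math.ceil((x - start) / width) - 1)
        counts[index] += 1
    return counts


print(sturges_classes(50))        # the 50-observation example from the notes
print(class_width(119.63, 0, 8))  # the class width example from the notes

# Hypothetical observations, only to show how boundary values are counted.
sample = [0, 3.5, 15, 15.01, 42, 119.63]
print(frequency_table(sample, start=0, width=15, classes=8))
```

Because the formula is a guideline only, the result of `sturges_classes` is best treated as a starting point that can be adjusted to give class limits that are easy to read.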