
# Chapter 2- Statistics 1131.docx

York University

Mathematics and Statistics

MATH 1131

Cindy Fu

Fall

Chapter 2
2.1
Definitions 1:
-Univariate: a data set consisting of observations of only a single characteristic of the individuals/objects.
-Bivariate: a data set consisting of observations of two characteristics of the individuals/objects.
-Multivariate: a data set consisting of observations of more than two characteristics of the individuals/objects.
Definitions 2:
-A categorical/qualitative data set: consists of non-numerical observations that may be placed in categories.
-A numerical/quantitative data set: consists of observations that are numbers.
Examples:
1-Sneaker preference
A sample of 12 people is asked what their favorite brand of sneakers is. This is a qualitative data set, since the responses are brand names (Nike, Adidas, etc.). It is a non-numerical, univariate study.
2-Egg Weights
Suppose a new enzyme is tested: 20 eggs are randomly selected and weighed to test the nutritional benefits of the enzyme, and the resulting weights are recorded in a table. Since the observations are numerical, this is a univariate quantitative data set.
Definitions 3:
-Discrete data: data whose possible values are separate, countable points (such as whole numbers). It is recognized by the word counting. Ex: The number of lightning strikes that hit Ontario in one day can be 5 but not 5.5, and you can count them.
-Continuous data: data whose values can fall anywhere on an interval. It is recognized by the word measuring. Ex: Barometric pressure can be any value between 960 and 1070 mmHg. It can be 970.67.
Univariate data
  Categorical data
  Numerical data
    Discrete data
    Continuous data
Questions: classify the following as categorical or numerical. If numerical, then classify as discrete or continuous.
1-The number of books read by middle-school students during the academic year. You are counting the number of books. Number = numerical, and counting = discrete.
2-The length of time (in minutes) it takes to get a haircut. Time is numerical, and it is measured, since a haircut can take 15.4 minutes. Therefore, it is continuous.
3-The type of candy received at a house on Halloween. Type = quality = categorical. Therefore, this is a categorical data set.
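The classification questions above can be mimicked with a short Python sketch. This is only a rough heuristic and not part of the textbook: the function name classify is hypothetical, and it assumes that counts are stored as whole numbers and measurements as decimals, which real data sets do not always respect (a measured time can be recorded as a whole number of minutes).

```python
from numbers import Integral, Real


def classify(values):
    """Rough guess at the type of a univariate sample (heuristic only)."""
    # Any non-numerical observation makes the data set categorical.
    if not all(isinstance(v, Real) for v in values):
        return "categorical"
    # Whole numbers only: assume the values came from counting.
    if all(isinstance(v, Integral) for v in values):
        return "numerical, discrete"
    # Otherwise assume the values came from measuring.
    return "numerical, continuous"


candy = classify(["chocolate", "gum", "chocolate"])   # type of candy
books = classify([3, 12, 7])                          # number of books read
haircut = classify([15.4, 22.0, 18.75])               # haircut time in minutes
print(candy, books, haircut, sep="\n")
```

The sample values are made up for illustration; only the three variables (candy type, books read, haircut time) come from the questions.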
2.2
Definitions:
-Frequency distribution for categorical data: a summary table that presents categories, counts, and proportions. Refer to Table 2.1 on page 22 of the textbook.
-Class: the label of each category in the data set.
-Frequency: the count for each class.
-Relative frequency/sample proportion: the frequency of the class divided by the total number of observations.
Examples:
Class           Frequency   Relative frequency
Bahamas          2          2/25 = 0.08
Bermuda          4          4/25 = 0.16
Caribbean        6          6/25 = 0.24
Mediterranean    3          3/25 = 0.12
Southampton     10          10/25 = 0.40
Total           25          1.00
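The frequency and relative frequency columns can be reproduced with a few lines of Python. This is a minimal sketch, assuming the 25 raw observations are rebuilt from the counts in the table; the variable names are illustrative only.

```python
from collections import Counter

# Raw observations rebuilt from the table counts (assumption: 25 ships in total).
destinations = (
    ["Bahamas"] * 2
    + ["Bermuda"] * 4
    + ["Caribbean"] * 6
    + ["Mediterranean"] * 3
    + ["Southampton"] * 10
)

# Frequency: the count for each class.
frequency = Counter(destinations)
total = sum(frequency.values())

# Relative frequency: class frequency divided by the total number of observations.
relative_frequency = {cls: count / total for cls, count in frequency.items()}

for cls, count in frequency.items():
    print(f"{cls:<14}{count:>4}{relative_frequency[cls]:>8.2f}")
print(f"{'Total':<14}{total:>4}{sum(relative_frequency.values()):>8.2f}")
```

The relative frequencies always add up to 1, which is a quick check that no class was missed.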
1-What is the proportion of cruise ships that did not go to Southampton?
0.4 went to Southampton, so 1 - 0.4 = 0.6 did not go to Southampton.
2-Draw a bar graph for the above table.
[Bar graph: classes (Bahamas, Bermuda, Caribbean, Mediterranean, Southampton) on the x-axis; frequency on the y-axis, scaled from 0 to 12.]
The key here is that the classes go on the x-axis and the frequencies go on the y-axis.
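3-Draw a pie chart.
[Pie chart: one slice per class (Bahamas, Bermuda, Caribbean, Mediterranean, Southampton).]
You take the relative frequency of a class and multiply it by 360 to get the angle (in degrees) of its slice. Each piece of the pie is a class. Ex: Southampton's slice is 0.4 x 360 = 144 degrees.
2.3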
Definitions:
-Outliers: values that are very far from the rest of the data.
-Variability: refers to how spread out or compact the data are (data crowded together have little variability).
A stem-and-leaf plot is a graphical procedure used to describe the shape, centre, and variability of the distribution of numerical data.
How to draw a stem-and-leaf plot:
Each value is split into a stem and a leaf. Ex: 520 has stem 52 and leaf 0.
Stem | Leaf
46 | 6
47 |
48 |
49 | 8 7 3 8 7
50 | 2 8 2 6 4 1 5 1
51 | 5 3 1 3 2
52 | 0 2 5 3 7 3
53 | 3
54 | 0 8 8 4
55 | 7 6 7
56 | 7 4
57 | 0 0 2 0 4
58 | 9 5
59 | 8 7 6
60 | 1 9 4 4
61 | 2
- For 520, the 0 is placed in the 52 stem row.
- Data is organized so that one digit (the leaf) is on the right and the rest (the stem, up to 2 digits here) is on the left. For example, if you have 502 and 503, then the 2 and the 3 are in the same stem row (50).
- Notice that the 2-digit stems are listed in increasing order.
- The centre of the data (the typical value, the value in the middle, where the data is clustered) is around stem 52 or 53 here.
- The outlying value here is 466, since it is far from the data cluster.
- The data can be described by its variability (spread) and its outliers.
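The splitting rule described above (last digit is the leaf, remaining digits are the stem) can be sketched in Python. This is an illustrative sketch, not the textbook's procedure: the function name stem_and_leaf is hypothetical, the sample contains only a few of the values read off the plot, and the leaves are sorted within each row, which the plot in these notes does not do.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: sorted leaves} for whole-number data."""
    rows = defaultdict(list)
    for value in values:
        # divmod(520, 10) gives (52, 0): stem 52, leaf 0.
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    # List every stem from smallest to largest, including empty rows,
    # so that gaps (and therefore outliers) stay visible.
    lowest, highest = min(rows), max(rows)
    return {stem: sorted(rows.get(stem, [])) for stem in range(lowest, highest + 1)}


# A few values taken from the plot, including the outlier 466.
sample = [466, 498, 497, 493, 502, 508, 520, 522, 533]
plot = stem_and_leaf(sample)

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Keeping the empty 47 and 48 rows in the output is what makes 466 stand out as far from the rest of the data.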
Real World Analogy:
-In
