-Univariate: a data set consisting of observations of only a single characteristic of the individuals/objects.
-Bivariate: a data set consisting of observations of two characteristics of the individuals/objects.
-Multivariate: a data set consisting of observations of more than two characteristics of the
individuals/objects.
-A categorical/qualitative data set: consists of non-numerical observations that may be placed in
categories.
-A numerical/quantitative data set: consists of observations that are numbers.
A sample of 12 people is asked what their favorite brand of sneakers is. This is a qualitative data set,
since the responses are brand names such as Nike, Adidas, etc. It is a non-numerical, univariate data set.
Suppose a new enzyme is tested: 20 eggs are randomly selected and weighed to test the nutritional
benefits of the enzyme, and the resulting weights are recorded in a table. Since the observations are
numerical, this is a univariate quantitative data set.
-Discrete data: is data whose possible values can be counted (isolated points on the number line). It is
recognized by the word "counting". Ex: The number of lightning strikes that hit Ontario in one day can be
5 but not 5.5, and you can count them.
-Continuous data: is data whose values can fall anywhere on an interval. It is recognized by the word "measuring". Ex:
Barometric pressure can be any value between 960 and 1070 hPa (millibars), such as 970.67.
Questions: classify the following as categorical or numerical. If numerical, then classify as discrete or
continuous.
1-The number of books read by middle-school students during the academic year. You are counting the
number of books. Number=numerical, and counting=discrete.
2-The length of time (in minutes) it takes to get a haircut. Time is numerical, and it is measured, since a
haircut can take 15.4 minutes. Therefore, it is continuous.
3-The type of candy received at a house on Halloween. Type=quality=categorical. Therefore, this is a
categorical data set.
-Frequency distribution for categorical data: is a summary table that presents categories, counts, and
proportions. Refer to table 2.1 on page 22 of the textbook.
-Class: the label of each category in a categorical data set.
-Frequency: is the count for each class.
-Relative frequency/sample proportion: is the frequency of the class divided by the total number of
observations.
Class            Frequency   Relative frequency
Bahamas           2          2/25 = 0.08
Bermuda           4          4/25 = 0.16
Caribbean         6          6/25 = 0.24
Mediterranean     3          3/25 = 0.12
Southampton      10          10/25 = 0.40
Total            25          1.00
1-What is the proportion of cruise ships that did not go to Southampton?
0.4 went to Southampton, so 1-0.4=0.6 did not go to Southampton.
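The relative frequencies and the answer to question 1 can be checked with a short calculation. This is a minimal Python sketch; the names `frequencies` and `relative_frequencies` are illustrative and do not come from the textbook.

```python
# Frequencies copied from the cruise-ship table above.
frequencies = {
    "Bahamas": 2,
    "Bermuda": 4,
    "Caribbean": 6,
    "Mediterranean": 3,
    "Southampton": 10,
}


def relative_frequencies(counts):
    """Return each class's frequency divided by the total number of observations."""
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


rel = relative_frequencies(frequencies)
for label, value in rel.items():
    print(f"{label:<15}{frequencies[label]:>4}{value:>8.2f}")

# Proportion that did NOT go to Southampton: 1 minus Southampton's proportion.
not_southampton = 1 - rel["Southampton"]
print(f"Not Southampton: {not_southampton:.2f}")
```

The relative frequencies always add up to 1, which is a quick way to catch an arithmetic slip in a hand-built table.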
2- Draw a bar graph for the above table.
[Figure: bar graph with the classes Bahamas, Bermuda, Caribbean, Mediterranean, and Southampton on the x-axis and the frequency on the y-axis.]
The key here is that the classes are on the x-axis and the frequencies are on the y-axis.
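As a rough stand-in for a drawn bar graph, the sketch below prints a text version with the classes along the bottom and the frequency up the side. It is only an illustration in plain Python; the function name `vertical_bar_chart` and the use of `#` marks for the bars are choices made here, not something from the textbook.

```python
# Frequencies copied from the cruise-ship table above.
frequencies = {
    "Bahamas": 2,
    "Bermuda": 4,
    "Caribbean": 6,
    "Mediterranean": 3,
    "Southampton": 10,
}


def vertical_bar_chart(counts):
    """Return the chart as a list of text rows, tallest frequency level first."""
    labels = list(counts)
    height = max(counts.values())
    col = max(len(label) for label in labels) + 2  # width of one column
    rows = []
    for level in range(height, 0, -1):
        # A class gets a mark on this row if its frequency reaches this level.
        marks = "".join(
            ("#" if counts[label] >= level else "").center(col) for label in labels
        )
        rows.append((f"{level:>2} |" + marks).rstrip())
    rows.append("   +" + "-" * (col * len(labels)))          # x-axis line
    rows.append("    " + "".join(label.center(col) for label in labels))
    return rows


chart = vertical_bar_chart(frequencies)
print("\n".join(chart))
```

Each column's height equals the class frequency, so Southampton's bar is the only one that reaches the top row.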
3- Draw a pie chart.
You take the relative frequency of a class and multiply it by 360 to get the angle (in degrees) of that class's
slice. Each piece of the pie is a class. Ex: Southampton gets 0.4 x 360 = 144 degrees.
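The slice angles can be worked out for every class at once. In this minimal sketch each angle is the class's share of the total (its relative frequency) times 360 degrees; the function name `pie_angles` is an illustrative choice.

```python
# Frequencies copied from the cruise-ship table above.
frequencies = {
    "Bahamas": 2,
    "Bermuda": 4,
    "Caribbean": 6,
    "Mediterranean": 3,
    "Southampton": 10,
}


def pie_angles(counts):
    """Return the slice angle in degrees for each class: relative frequency * 360."""
    total = sum(counts.values())
    return {label: count / total * 360 for label, count in counts.items()}


angles = pie_angles(frequencies)
for label, angle in angles.items():
    print(f"{label:<15}{angle:>7.1f} degrees")
```

The angles should add up to 360 degrees, just as the relative frequencies add up to 1.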
-Outliers: values that are very far from the rest of the data.
-Variability: refers to how spread out or compact the data are. Data that are crowded together have little variability.
A stem-and-leaf plot is a graphical procedure used to describe the shape, centre, and variability
of the distribution of numerical data.
How to draw a stem-and-leaf plot:
Example: 520 is split into stem 52 and leaf 0.

Stem | Leaf
46 | 6
49 | 8 7 3 8 7
50 | 2 8 2 6 4 1 5 1
51 | 5 3 1 3 2
52 | 0 2 5 3 7 3
53 | 3
54 | 0 8 8 4
55 | 7 6 7
56 | 7 4
57 | 0 0 2 0 4
58 | 9 5
59 | 8 7 6
60 | 1 9 4 4
61 | 2

- The 0 of 520 is placed in the 52 stem row.
- Data is organized so that one digit (the leaf) is on the right and the rest (the stem) is on the left, with up to 2 digits on the left. If, for example, you have 502 and 503, then the 2 and the 3 are in the same stem row.
- Notice that the 2-digit numbers (the stems) are organized in increasing order.
- The centre of the data (the typical value, the value in the middle, where the data is clustered) is 52 or 53 here.
- The outlying value here is 466, since it is far from the rest of the data.
- Data can be described in terms of its variability (spread) and its outliers.
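The splitting rule above (last digit is the leaf, the remaining digits are the stem) can be sketched in Python. The `data` list is read back from the plot by joining each stem with its leaves, which is an assumption about the original observations; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Values read back from the plot above (stem followed by leaf); assumed data.
data = [
    466,
    498, 497, 493, 498, 497,
    502, 508, 502, 506, 504, 501, 505, 501,
    515, 513, 511, 513, 512,
    520, 522, 525, 523, 527, 523,
    533,
    540, 548, 548, 544,
    557, 556, 557,
    567, 564,
    570, 570, 572, 570, 574,
    589, 585,
    598, 597, 596,
    601, 609, 604, 604,
    612,
]


def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    rows = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)  # e.g. 520 -> stem 52, leaf 0
        rows[stem].append(leaf)
    # Stems in increasing order; leaves sorted within each stem.
    return {stem: sorted(rows[stem]) for stem in sorted(rows)}


plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Two choices in this sketch differ from the hand-drawn plot: the leaves are sorted within each row, and only stems that actually occur in the data are printed.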
Real World Analogy: