STAT1008 Study Guide - Final Guide: Bar Chart, Standard Deviation, Quartile

17 May 2018
Describing Data
2.1 Categorical (Discrete) Variables
One Categorical Variable
Frequency table shows the number of cases that fall in each category.
The proportion in a category = number of cases in that category/total number of cases.
Proportion for a sample: p-hat
Proportion for a population: p
Relative frequency table shows the proportion of cases that fall in each category.
Bar charts or pie charts can be used to visualise the data in one categorical variable.
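A minimal Python sketch of a frequency table and a relative frequency table. The `responses` data and the function names are made up for illustration; they are not from the course material.

```python
from collections import Counter

# Hypothetical sample: one categorical variable recorded for 10 cases
responses = ["red", "blue", "blue", "green", "red",
             "blue", "green", "blue", "red", "blue"]

def frequency_table(values):
    """Number of cases that fall in each category."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Proportion of cases in each category: count / total."""
    total = len(values)
    return {category: count / total
            for category, count in Counter(values).items()}

freq = frequency_table(responses)
rel_freq = relative_frequency_table(responses)
print(freq)
print(rel_freq)
```

Each entry of `rel_freq` is a sample proportion (p-hat) for one category, and the proportions sum to 1.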
Two Categorical Variables
A two-way table shows the relationship between two categorical variables.
The categories for one variable are listed in rows and the categories for the second variable
are listed in columns.
A difference in proportions is a difference in proportions for one categorical variable
calculated for different levels of the other categorical variable.
A segmented bar chart or a side-by-side bar chart can be used to visualise the relationship
between two categorical variables. These are comparative plots.
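A rough sketch of building a two-way table and computing a difference in proportions. The `cases` data, the group labels "A"/"B" and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical cases: (group, answer) pairs for two categorical variables
cases = [("A", "yes"), ("A", "yes"), ("A", "no"), ("A", "yes"),
         ("B", "yes"), ("B", "no"), ("B", "no"), ("B", "no")]

def two_way_table(pairs):
    """Rows = categories of the first variable, columns = categories of the second."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

def proportion(table, row, col):
    """Proportion of cases in `row` that fall in `col`."""
    return table[row][col] / sum(table[row].values())

table = two_way_table(cases)
# Difference in the proportion answering "yes" between group A and group B
diff = proportion(table, "A", "yes") - proportion(table, "B", "yes")
print(table)
print(diff)
```

Here the proportion of "yes" is computed separately within each level of the group variable, then the two proportions are subtracted.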
2.2 Quantitative (Continuous) Variables
One Quantitative Variable: Shape and Centre
Visualised using a dotplot.
Histograms - the height of each bar corresponds to the number of cases within that range of
the variable.
The sample size, the number of cases in the sample, is denoted n.
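To make the histogram idea concrete, here is a small sketch that counts the cases in each range (bin). The data, the bin width and the function name are assumptions chosen for illustration.

```python
def histogram_counts(values, start, width, n_bins):
    """Count the cases in each bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # values outside the bins are ignored
            counts[index] += 1
    return counts

# Hypothetical sample of one quantitative variable
data = [1, 2, 2, 3, 5, 6, 6, 7, 9]
n = len(data)  # sample size
counts = histogram_counts(data, start=0, width=2, n_bins=5)

# Text version of the histogram: one star per case in the bin
for i, c in enumerate(counts):
    lo, hi = i * 2, (i + 1) * 2
    print(f"[{lo}, {hi}): " + "*" * c)
```

The bar heights (`counts`) add up to the sample size n when every value falls inside one of the bins.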
Symmetric and Skewed Distributions
Symmetric - if the two sides approximately match when folded on a vertical centre line.
Skewed - if the data are piled up on the left or the right and the tail extends relatively far out
to the other side.
Bell-shaped - if the data are symmetric and in addition, have the shape shown in 2.9c.
Bimodal - two peaks.
Other terms - asymmetric, peak and range.
The Centre of Distribution
Mean = sum (Σ) of all data values/number of data values.
Sample mean: -ar
Population mean: u
The ea is pulled i the diretio of skewess.
Median (m) - the middle value when the data are ordered.
If there are an even number of values in the dataset, then we use the average of the two
middle values.
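A short sketch of the mean and median as defined above, including the even-count rule. The datasets are made up; the last one is right-skewed to show the mean being pulled towards the tail.

```python
def mean(values):
    """Sum of all data values / number of data values."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [3, 1, 4, 1, 5]                      # ordered: 1, 1, 3, 4, 5
even = [2, 8, 4, 6]                        # ordered: 2, 4, 6, 8
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # long tail to the right

print(mean(odd), median(odd))
print(mean(even), median(even))
print(mean(right_skewed), median(right_skewed))
```

For the right-skewed data the mean (4.75) sits above the median (3.0), in the direction of the tail.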
Outlier - an observed value that is notably distinct from the other values in a dataset.
Outliers should be kept in the data unless they are a mistake or don't belong to the
population.
A statistic is resistant if it is relatively unaffected by extreme values.
The median is resistant, while the mean is not.
The mode is the most common value.
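A quick check of resistance using Python's standard `statistics` module. The numbers are hypothetical; the value 100 is added as an outlier.

```python
from statistics import mean, median, multimode

# Hypothetical data, then the same data with one extreme value added
data = [10, 12, 12, 13, 15]
with_outlier = data + [100]

mean_shift = mean(with_outlier) - mean(data)        # how far the mean moves
median_shift = median(with_outlier) - median(data)  # how far the median moves
modes = multimode(data)                             # most common value(s)

print(mean(data), mean(with_outlier))
print(median(data), median(with_outlier))
print(modes)
```

The outlier moves the mean from 12.4 to 27 but the median only from 12 to 12.5, which is what "resistant" means in practice.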
2.3 One Quantitative Variable: Measures of Spread
Standard Deviation
      

Document Summary

Symmetric - if the two sides approximately match when folded on a vertical centre line.
Mean = sum (Σ) of all data values/number of data values.
Sample mean: x-bar
Population mean: mu
The mean is "pulled" in the direction of skewness.
Median (m) - the middle value when the data are ordered.
Mean = (x1 + x2 + ... + xn)/n
Standard deviation measures the spread of the data.
Divide by n for populations.
A larger standard deviation = more variability = the data values are more spread out.
Population standard deviation: sigma
If a distribution of data is approx. bell-shaped, about 95% of the data should fall within two standard deviations of the mean.
For a population, 95% of the data will be between mu - 2 sigma and mu + 2 sigma.
Z-score - the number of standard deviations a value falls from the mean.
For bell-shaped distributions, 95% of all the z-scores fall between -2 and +2.
Five number summary = minimum, Q1, median, Q3, maximum.
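The spread measures in the summary can be sketched as below. Two assumptions go beyond the notes: dividing by n - 1 for a sample is the usual convention (the notes only state dividing by n for populations), and the quartiles are taken as the medians of the lower and upper halves of the ordered data, which is one of several conventions and may differ from the course textbook or software. The data and function names are made up.

```python
from math import sqrt
from statistics import median

def std_dev(values, population=False):
    """Standard deviation: divide by n for a population, n - 1 for a sample (assumed convention)."""
    n = len(values)
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return sqrt(squared_deviations / (n if population else n - 1))

def z_score(x, mean, sd):
    """Number of standard deviations that x falls from the mean."""
    return (x - mean) / sd

def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum); quartiles = medians of the two halves (assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, leave the middle value out of both halves
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Hypothetical data with mean 5
data = [2, 4, 4, 4, 5, 5, 7, 9]
mu = sum(data) / len(data)
sigma = std_dev(data, population=True)

print(sigma)                     # population standard deviation
print(std_dev(data))             # sample standard deviation (slightly larger)
print(z_score(9, mu, sigma))     # how many sigmas the value 9 is above the mean
print(five_number_summary(data))
```

With these numbers the value 9 has a z-score of 2, so it sits exactly at mu + 2 sigma, the upper edge of the 95% range described for bell-shaped data.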