BSB123 - Data Analysis Mid-Semester Exam Notes (Summary of Lectures 1 - 6)

Management and Human Resources

Data Analysis Mid-Semester Notes

Lecture 1: Introduction to Statistics

Statistics: processing and analysing data
Descriptive: collecting, presenting and characterising data
Inferential: using a sample to draw conclusions about the population
Population: the whole dataset
Sample: a subset of the population
Parameter: numerical measure that describes a characteristic of a population
Statistic: numerical measure that describes a characteristic of a sample
Primary source: data you collect yourself or internally
Secondary source: data bought or obtained from an external source

Collecting Data
Important sources
1. Data distributed by an organisation or individual
2. Designed experiment
3. Survey
4. Observational study

Data
• Categorical
    • Nominal: no order
    • Ordinal: order to categories
• Numerical
    • Discrete (countable values, e.g. 1, 2, 3) OR continuous (any value in a range, e.g. 1.2713…)
    • Interval (differences are meaningful, ratios are not) OR ratio (ratio comparisons are meaningful)
    • Time series (time element) OR cross-sectional (one point in time)

Graphing/tables
• Categorical data
    • Summary table
    • Bar graph (good for frequency comparisons)
    • Pie graph (good for proportions)

Christina Meyers BSB 123 Data Analysis

Lecture 2: Presenting data in tables and charts & Introduction to descriptive measures

Numerical Data
Ordered array: data ordered from smallest to largest
Frequency distribution: summary table in which data is arranged into numerically ordered classes
Histogram: graph of continuous data in a frequency distribution (no gaps between bars)

Class intervals and boundaries
Range: $\text{Range} = X_{\max} - X_{\min}$
Class width: $\text{Class width} = \dfrac{\text{Range}}{\text{no. of classes}}$
Class mid-point: $\text{Class mid-point} = \dfrac{\text{Lower boundary} + \text{Upper boundary}}{2}$
Rule of thumb: usually at least 5 classes, but no more than 15

Two Variables: Bivariate Data
x: independent variable
y: dependent variable
Does the dependent variable (y) depend on the independent variable (x)?
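Returning to the class-interval formulas from Lecture 2 above, the following is a minimal Python sketch of building a frequency distribution with equal-width classes. The function name `class_intervals` and the sample values are made up for illustration and do not come from the lecture notes. The sketch also assumes the last class includes its upper boundary so that the maximum value is counted.

```python
def class_intervals(data, num_classes):
    """Return (lower, upper, midpoint, frequency) for equal-width classes.

    Illustrative helper, not part of the original notes.
    """
    smallest, largest = min(data), max(data)
    data_range = largest - smallest           # range = max - min
    width = data_range / num_classes          # class width = range / no. of classes
    classes = []
    for k in range(num_classes):
        lower = smallest + k * width          # lower boundary of class k
        upper = lower + width                 # upper boundary of class k
        midpoint = (lower + upper) / 2        # class mid-point
        if k == num_classes - 1:
            # Assumption: the last class is closed so the maximum is counted.
            freq = sum(1 for v in data if lower <= v <= largest)
        else:
            freq = sum(1 for v in data if lower <= v < upper)
        classes.append((lower, upper, midpoint, freq))
    return classes


# Hypothetical data set of 13 values, split into 5 classes (the rule-of-thumb minimum).
sample = [2, 4, 5, 7, 8, 10, 12, 13, 15, 17, 18, 20, 22]
for lower, upper, midpoint, freq in class_intervals(sample, 5):
    print(f"{lower:5.1f} to {upper:5.1f}  mid-point {midpoint:5.1f}  frequency {freq}")
```

In practice the computed width is often rounded up to a convenient number before the boundaries are set. The sketch keeps the raw value so that it matches the formula directly.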
• Categorical vs categorical
    • Contingency table (allows for cross-tabulation of data)
    • Clustered bar chart/stacked bar chart (converts a contingency table into graphical form)
• Numerical vs numerical
    • Scatter plot (x, y pairs of data)
    • Line chart (time series: time is the independent variable, plotted against a dependent variable)
• Numerical vs categorical
    • Pivot table

Positive relationship: as x increases, y increases
Negative relationship: as x increases, y decreases
No relationship: no consistent pattern between x and y

Central tendency
Mean: average (sum of the values divided by the number of values)
$\bar{x} = \dfrac{\sum x_i}{n}$
Median: middle number of an ordered array
$\text{Median position} = \dfrac{n + 1}{2}$
Rule 1: If the number of values is even, the median is the average of the two middle ranked values
Rule 2: If the number of values is odd, the median is the middle ranked value
Mode: most frequently observed value (there may be no mode, or several modes, e.g. bimodal)

Quartiles
Split the ranked data into 4 segments, each holding 25% of the values.
Q1 position: $\dfrac{n + 1}{4}$
Q2 position (median): $\dfrac{n + 1}{2}$
Q3 position: $\dfrac{3(n + 1)}{4}$
The fourth segment runs from Q3 to the maximum value.
Rules
Rule 1: If the result is an integer, the quartile is equal to the ranked value at that position
Rule 2: If the result is a fractional half (e.g. 2.5), the quartile is equal to the mean of the two corresponding ranked values
Rule 3: If the result is neither an integer nor a fractional half, round the result to the nearest integer and select that ranked value

Box and whisker plot: graphs the minimum, Q1, median, Q3 and maximum

Lecture 3: Numerical descriptive measures

Variation
Range: $\text{Range} = X_{\max} - X_{\min}$
Outliers can make the range a misleading measure of spread
Interquartile range: $\text{IQR} = Q_3 - Q_1$
Ignores extreme values

Variance: measure of variation based on squared deviations from the mean
Population variance: $\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$
Sample variance: $S^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

Standard deviation: square root of the variance; describes the variation of a single variable
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $S = \sqrt{S^2}$

Coefficient of variation: relative measure of variation, the standard deviation divided by the mean, multiplied by 100%
$CV = \dfrac{S}{\bar{x}} \times 100\%$

Covariance
Measures the direction (positive or negative) of the linear relationship between two numerical variables
$\text{Cov}(X, Y) = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$

Correlation
Measures the direction and strength of the linear relationship
$r_{XY} = \dfrac{\text{Cov}(X, Y)}{S_x S_y}$

Z score
The difference between a given observation and the mean, divided by the standard deviation
$Z = \dfrac{x - \bar{x}}{S}$
An observation with a Z score above 3 or below -3 is considered an outlier. Outliers can distort numerical measures and give a misleading picture of overall trends.

Shape
• Left-skewed: longer left tail
• Right-skewed: longer right tail

Lecture 4: Simple Linear Regression and Introduction to Probabili
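As a recap of the quartile rules from Lecture 2 and the variation formulas from Lecture 3 above, here is a minimal Python sketch using only the standard library. All function names and the sample values are illustrative assumptions, not part of the lecture notes. The quartile function follows the three position rules as stated in the notes, so its results can differ from the defaults in spreadsheet or statistics packages.

```python
import math


def mean(values):
    return sum(values) / len(values)


def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) from position k(n + 1)/4 and the three rules."""
    ranked = sorted(values)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two values")
    pos = k * (n + 1) / 4
    if pos == int(pos):                        # Rule 1: integer position
        return ranked[int(pos) - 1]
    if pos * 2 == int(pos * 2):                # Rule 2: fractional half
        return (ranked[int(pos) - 1] + ranked[int(pos)]) / 2
    return ranked[math.floor(pos + 0.5) - 1]   # Rule 3: nearest integer position


def sample_variance(values):
    x_bar = mean(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


def sample_std(values):
    return math.sqrt(sample_variance(values))


def coefficient_of_variation(values):
    return sample_std(values) / mean(values) * 100   # expressed in percent


def sample_covariance(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    products = ((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


def correlation(xs, ys):
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


def z_scores(values):
    # Assumes the values are not all identical (otherwise S = 0).
    x_bar, s = mean(values), sample_std(values)
    return [(x - x_bar) / s for x in values]


# Hypothetical paired data, chosen so that y rises exactly in line with x.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2 * x + 1 for x in xs]

iqr = quartile(xs, 3) - quartile(xs, 1)
outliers = [x for x, z in zip(xs, z_scores(xs)) if abs(z) > 3]
print("IQR:", iqr)
print("Sample variance:", sample_variance(xs))
print("CV (%):", coefficient_of_variation(xs))
print("Covariance:", sample_covariance(xs, ys))
print("Correlation:", correlation(xs, ys))
print("Outliers by Z score:", outliers)
```

Because the covariance depends on the units of x and y, only its sign is easy to interpret. Dividing by both standard deviations gives the correlation, which always lies between -1 and 1.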