# Class Notes for DATA1001 at University of Sydney

Foundations of Data Science

##### DATA1001 Lecture Notes - Lecture 13: Scatter Plot, Standard Deviation, Dependent And Independent Variables

Lecture 13: The Scatter Plot and Correlation. Data story: Can we predict a son's height from his father's height? What do you notice about the heights? Plotting the pairs of heights creates a cloud of points. Generally, taller...

##### DATA1001 Lecture Notes - Lecture 11: Standard Deviation, Kilogram, Measuring Instrument

Lecture 11: Measurement Error. Data story: Is Coles overcharging me? Data: Here we simulate the weight of 20 trays of lamb chops from Coles, using pricing from 221117. `set.seed(1)` `chopweight = rnorm(20, 550, 5)` `chopweight` [...

##### DATA1001 Lecture Notes - Lecture 19: Conditional Probability, Almost Surely

Lecture 19: Chance. Data story: Did OJ murder his wife? Prosecutor's Fallacy: The prosecutor's fallacy is a mistake in statistical thinking, whereby it is assumed that the probability of a random match is equal to the probability tha...

##### DATA1001 Lecture Notes - Lecture 16: Standard Deviation, Normal Distribution, Homoscedasticity

Lecture 16: The Residual Plot. A residual is the vertical distance (or gap) of a point above or below the regression line. It represents the error between the prediction and the actual value. E.g. if the actual value is 67 and the ...

##### DATA1001 Lecture Notes - Lecture 10: Normal Distribution, Normality Test

Lecture 10: The Normal Curve. Data story: How likely is it to find an elite netball goal player in Australia? The Normal Curve: The Normal curve was discovered around 1720 by Abraham de Moivre. The Normal curv...

##### DATA1001 Lecture Notes - Lecture 7: Standard Deviation, Box Plot

Lecture 7: Centre. Data story: How much does a property in Newtown cost? The type of people who would be interested in this data: stakeholders, investors. Possible research questions: ...

##### DATA1001 Lecture Notes - Lecture 5: Scatter Plot, Box Plot

Lecture 5: Quantitative Data. Data story: Australian Fatalities. Histogram: Higher boxes = more data = crowding. We use a histogram for quantitative data. A histogram highlights the percentage of data in one ...

##### DATA1001 Lecture Notes - Lecture 6: Data Visualization, Domain Knowledge, Scatter Plot

Lecture 6: Data Visualization. Data story: What is the price point for a diamond? Domain knowledge: Demand is predicted to outweigh supply of diamonds. In terms of usage and demand, 30% of diamonds are needed for jew...

##### DATA1001 Lecture Notes - Lecture 9: Web Scraping, Data Scraping, Data Wrangling

Lecture 9: Data Wrangling. Data wrangling is whatever is needed to get the data ready for analysis. It is also called data munging or data janitor work. Involves: ...

##### DATA1001 Lecture 3: Lecture 3- Observational Studies

Lecture 3. Data story: Does smoking cause cancer? Observational study: A study where the investigator cannot use randomization for allocation to groups. To study the effects of smoking, investigators cannot choose which subjects w...

##### DATA1001 Lecture 2: Lecture 2- Controlling Data

Lecture 2: Controlling Data. Data story: Does an anti-acne drug cause depression? Domain knowledge: Questions on Roaccutane might include: What is Roaccutane prescribed for and how does it chemically work? What are known side...

##### DATA1001 Lecture 13: Tests for Relationships


##### DATA1001 Lecture 12: Tests for Relationships


##### DATA1001 Lecture 11: Tests for Mean


##### DATA1001 Lecture 10: Hypothesis Testing


