STAT 100 Lecture 6: how to do EDA
-Missing values
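(options for handling missing values)
  1. Drop records with missing values (what can go wrong?) → implications for the sample
  2. Impute the missing values → can introduce bias
     → might be convenient, but it is a risky process
     → a matter of preference (guess and estimate)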
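
A minimal sketch of the two options, to make the trade-off concrete. The `heights_cm` values are made up, and mean imputation is only one assumed way to "guess and estimate"; the notes do not name a method.

```python
from statistics import mean, pstdev

# Hypothetical field with missing values recorded as None.
heights_cm = [160.0, None, 172.0, 181.0, None, 167.0]

# Option 1: drop records with a missing value (the sample gets smaller).
dropped = [h for h in heights_cm if h is not None]

# Option 2: impute each missing value with the mean of the observed values
# (an assumed choice of imputation method).
fill = mean(dropped)
imputed = [fill if h is None else h for h in heights_cm]

print(len(heights_cm), len(dropped), len(imputed))
print(mean(dropped), mean(imputed))
print(pstdev(dropped), pstdev(imputed))
```

Dropping shrinks the sample from 6 to 4 records. Mean imputation keeps all 6 and leaves the mean unchanged, but the spread of the imputed field is smaller than the spread of the observed values, which is one way imputation can distort later analysis.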
-Time zone inconsistencies
  -convert to a common time zone (UTC)
  -convert to the time zone of the location
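
Both conversions can be sketched with the standard `datetime` module. The two timestamps and the fixed offsets `PACIFIC` and `EASTERN` are assumptions for illustration; real data would normally use named zones that follow daylight-saving rules.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets (no daylight-saving handling).
PACIFIC = timezone(timedelta(hours=-8), "PST")
EASTERN = timezone(timedelta(hours=-5), "EST")

# Two records logged in different local time zones.
records = [
    datetime(2024, 1, 15, 9, 0, tzinfo=PACIFIC),
    datetime(2024, 1, 15, 12, 0, tzinfo=EASTERN),
]

# Option 1: convert everything to a common time zone (UTC).
in_utc = [t.astimezone(timezone.utc) for t in records]

# Option 2: convert everything to the time zone of the location under study.
in_pacific = [t.astimezone(PACIFIC) for t in records]

print([t.hour for t in in_utc])
print([t.hour for t in in_pacific])
print(in_utc[0] == in_utc[1])
```

The two records look three hours apart on the wall clock but describe the same instant, which only becomes visible once both are expressed in one time zone.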
-Duplicated records or fields
  -identify the primary key
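
A small sketch of using a primary key to find duplicated records. The `rows` data and the choice of `id` as the primary key are hypothetical, and keeping the first record per key is an assumed policy.

```python
from collections import Counter

# Hypothetical records; "id" is assumed to be the primary key.
rows = [
    {"id": 1, "name": "Ana"},
    {"id": 2, "name": "Ben"},
    {"id": 2, "name": "Ben"},
    {"id": 3, "name": "Cy"},
]

# A primary key should be unique, so any key seen more than once is a duplicate.
counts = Counter(r["id"] for r in rows)
duplicate_keys = [key for key, n in counts.items() if n > 1]

# Keep only the first record seen for each key.
seen = set()
deduped = []
for r in rows:
    if r["id"] not in seen:
        seen.add(r["id"])
        deduped.append(r)

print(duplicate_keys, len(rows), len(deduped))
```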
-Spelling errors
  -apply corrections; records not in dict(ionary) → implications
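
One possible way to apply corrections against a dictionary of valid values, using `difflib.get_close_matches` from the standard library. The `valid` list, the `raw` values, and the 0.8 similarity cutoff are all assumptions for illustration.

```python
from difflib import get_close_matches

# Hypothetical dictionary of valid spellings and a raw field to clean.
valid = ["berkeley", "oakland", "richmond"]
raw = ["berkeley", "berkley", "oaklnd", "zzzz"]

corrected = []
not_in_dict = []
for value in raw:
    # Best match above the (assumed) similarity cutoff, if there is one.
    match = get_close_matches(value, valid, n=1, cutoff=0.8)
    if match:
        corrected.append(match[0])
    else:
        # No plausible correction: set aside for a decision (drop, fix by hand).
        not_in_dict.append(value)

print(corrected)
print(not_in_dict)
```

Whatever ends up in `not_in_dict` has to be dropped or handled manually, and either choice changes the sample.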
-Units not specified or not consistent
  -infer units, check that values are reasonable
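
A sketch of inferring units by checking which interpretation gives a reasonable value. The height data, the assumption that values are either centimeters or inches, and the plausible range are all hypothetical.

```python
# Hypothetical height field with unspecified units (assumed cm or inches).
heights = [170.0, 65.0, 182.0, 70.0, 158.0]

# Assumed plausible range for an adult height in centimeters.
LOW_CM, HIGH_CM = 120.0, 220.0


def to_cm(value):
    """Return the value in cm if some unit makes it plausible, else None."""
    if LOW_CM <= value <= HIGH_CM:
        return value  # already reasonable as centimeters
    as_cm = value * 2.54  # try interpreting the value as inches
    if LOW_CM <= as_cm <= HIGH_CM:
        return as_cm
    return None  # not reasonable under either unit: flag for review


cleaned = [to_cm(h) for h in heights]
print(cleaned)
```

Values that are unreasonable under every candidate unit come back as `None` rather than being silently converted.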
-Truncated data (limits)
  -be aware of consequences in analysis
How do you do EDA?
-Examine data and metadata
-Examine each field/attribute
-Examine pairs (groups) of fields
Along the way:
-visualize
-validate assumptions
-identify and address anomalies
-apply data transformations
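
A rough sketch of examining each field and validating one assumption along the way. The `rows` data, the field names, and the helper `summarize` are hypothetical.

```python
from collections import Counter

# Hypothetical records with two fields.
rows = [
    {"offense": "THEFT", "hour": 14},
    {"offense": "BURGLARY", "hour": 2},
    {"offense": "THEFT", "hour": None},
    {"offense": "theft", "hour": 23},
]


def summarize(rows, field):
    """Basic per-field summary: size, missing count, distinct values, top value."""
    values = [r[field] for r in rows]
    present = [v for v in values if v is not None]
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1)[0],
    }


for field in ("offense", "hour"):
    print(field, summarize(rows, field))

# Validate an assumption: every offense label is upper case.
all_upper = all(r["offense"].isupper() for r in rows)
print(all_upper)
```

Here the check fails because of the lower-case `theft` record, an anomaly to identify and address before counting categories.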
Document Summary
Visualization:
-Distribution plots: look for spread, shape, modes (e.g., a second mode); relative frequency of bin values
-Bar plots: look for skew, frequent and rare categories
-Box plots: useful for summarizing a distribution and for comparing multiple distributions; show outliers, upper quartile, median (50% of the data), lower quartile
-Bar charts: used to compare nominal and ordinal data; consider sorting by category or by frequency
Example checks:
-Calls data: check size, records, sort
-Offense field: all capitals; check whether any values are missing
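
The quantities a box plot summarizes can be computed directly. This is a sketch on made-up `data`; the 1.5 × IQR rule for flagging outliers is a common convention assumed here, not something the notes specify, and `statistics.quantiles` is used with its default method.

```python
from statistics import median, quantiles

# Hypothetical numeric field.
data = [2, 4, 4, 5, 7, 9, 30]

# Lower quartile, median, upper quartile.
q1, q2, q3 = quantiles(data, n=4)

# Assumed convention: points beyond 1.5 * IQR from the quartiles are outliers.
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(q1, q2, q3)
print(outliers)
```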