Sociology and Anthropology
SOAN 3120
Michelle Dumas

SOAN 3120 Chapter Notes

Chapter 1: Picturing Distributions with Graphs

Individuals and Variables

Individuals: the objects described by a set of data; they may be people, animals, or things.
Variable: any characteristic of an individual. A variable can take different values for different individuals.
Categorical Variable: places an individual into one of several groups or categories. Ex: sex, major.
Quantitative Variable: takes numerical values for which arithmetic operations make sense. These values are usually recorded in a unit of measurement. Ex: height in inches, age in years.

Categorical Variables: Pie Charts and Bar Graphs

Exploratory Data Analysis: examining data to describe its main features.
Begin by examining each variable by itself, then study the relationships among the variables.
Begin with a graph or graphs, then add numerical summaries of specific aspects of the data.
To examine a single variable, display its distribution.
Distribution: tells us what values a variable takes and how often it takes these values.
The distribution of a categorical variable lists the categories and gives the count or percent of individuals who fall in each category.
Roundoff Error: when the percentages in a distribution don't add up to exactly 100 because they were rounded. This error doesn't point to a mistake, just the effect of rounding.
Pie Chart: used to display the distribution of a categorical variable.
Must include all the categories that make up a whole.
Used to emphasize each category's relation to the whole.
Bar Graph: represents each category as a bar.
Can compare any set of quantities that are measured in the same units.

Quantitative Variables: Histograms

Histogram: the most common graph of the distribution of one quantitative variable.
To create a histogram:
1. Choose the Classes: divide the range of the data into classes of equal width.
2. Count the Individuals in each class.
3. Draw the Histogram.
Choosing too few classes will cause all values to fall into a few classes ("Skyscraper").
Choosing too many classes will cause many classes to have few or no values ("Pancake").
When examining a histogram, look for the overall pattern and for striking deviations from that pattern.
Use shape, center, and spread to describe the pattern.
Outlier: an individual value that falls outside the overall pattern.
Midpoint: the value with roughly half the observations taking smaller values and half taking larger values.
Spread: described by giving the smallest and largest values of a distribution.
Symmetric: a distribution in which the left and right sides of the histogram are approximately mirror images.
Skewed to the right: the right side of the histogram extends much farther than the left side.
Skewed to the left: the left side of the histogram extends much farther than the right side.
Note: the direction of skewness is the direction of the long tail, NOT the direction where more observations are clustered (the hump).

Quantitative Variables: Stemplots

For small data sets, a stemplot is quicker to make and presents more detailed information.
To make a stemplot:
1. Separate each observation into a stem (all but the final digit) and a leaf (the final digit). Stems can have as many digits as needed, but each observation has only one leaf digit.
2. Write the stems in a vertical column, smallest to largest, and draw a line separating them from the leaves. Include all the stems, even if they have no leaf.
3. Write each leaf in the row to the right of its stem, in increasing order.
Ex:
1 | 3
2 | 4 6
3 | 2 5 6
4 | 1 3
5 | 7
This would represent the data 13, 24, 26, 32, 35, 36, 41, 43, 57.
A stemplot looks like a histogram turned on end.
Unlike a histogram, a stemplot preserves the actual value of each observation.
Note: stemplots do NOT work well for large data sets, where each stem must hold a large number of leaves.
Split Stems: you can split the stems in a stemplot to double the number of stems and reduce the number of leaves on each stem.

Time Plots

Time Plot: plots each observation against the time at which it was measured.
Always put time on the horizontal scale of your plot and the variable you are measuring on the vertical scale.
Cycles: regular up and down movements.
Trend: a long-term upward or downward movement over time.
Time Series Data: time plots show the change in one variable over time.
Cross-Sectional Data: data on one or more variables collected at a single point in time; histograms display this kind of data.

Chapter 2: Describing Distributions with Numbers

Measures of Center: The Mean ($\bar{x}$)

To find the mean of a set of observations, add their values and divide by the number of observations:
$\bar{x} = \frac{1}{n} \sum x_i$
The mean is sensitive to the influence of a few extreme observations (outliers) as well as skewed distributions, which pull the mean toward the tail.
The mean is NOT considered a resistant measure of center.

Measures of Center: The Median (M)

The median is the midpoint of a distribution: the number such that half the observations are smaller and the other half are larger.
To find the median of a distribution:
Arrange all the observations in order of size, smallest to largest.
If n is odd, M is the center observation in the list.
If n is even, M is midway between the two center observations in the list.
You can always locate M by counting $(n + 1)/2$ observations from the start of the list. This does not give M itself, just its position.
The median is a resistant measure of center.

Comparing the Mean and the Median

Symmetric Distribution: Mean = Median.
Skewed Distribution: the mean goes towards the tail.

Measuring Spread: The Quartiles

To calculate the quartiles:
Arrange the observations in order and locate M.
Q1: the median of the observations below the median M.
Q3: the median of the observations above the median M.

Five-Number Summary and Box Plots

The Five-Number Summary of a distribution consists of: Min, Q1, M, Q3, Max.
Offers a reasonably complete description of the center and spread of a distribution.
Box Plot: a graph of the five-number summary.
A central box spans the quartiles Q1 and Q3.
A line in the box marks the median M.
Lines extend from the box to the smallest and largest observations.
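
The Chapter 2 calculations can be tried out on the stemplot data from Chapter 1. The following is a minimal Python sketch, not part of the course material. The function name five_number_summary and the variable names are made up for illustration. It assumes the hand rule from these notes, where Q1 and Q3 are the medians of the observations on either side of the median's position; statistical software often uses slightly different quartile rules, so its answers may differ a little.

```python
from statistics import mean, median


def five_number_summary(values):
    """Return (Min, Q1, M, Q3, Max) using the hand rule from these notes.

    Hypothetical helper for illustration only.
    """
    data = sorted(values)              # arrange observations smallest to largest
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]                # observations before the median's position
    upper = data[half + n % 2:]        # observations after it (skips the middle value when n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Data from the stemplot example in Chapter 1.
observations = [13, 24, 26, 32, 35, 36, 41, 43, 57]

print("mean:", mean(observations))
print("five-number summary:", five_number_summary(observations))

# Resistance check: replace the largest value with an extreme outlier.
with_outlier = observations[:-1] + [570]
print("mean with outlier:", mean(with_outlier))      # pulled toward the tail
print("median with outlier:", median(with_outlier))  # unchanged
```

With n = 9 observations, the median sits at position (9 + 1)/2 = 5 in the sorted list, which is the value 35. Replacing 57 with 570 changes the mean a great deal but leaves the median at 35, which is what it means for the median to be a resistant measure of center.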
