SOAN 2120 Chapter Notes - Chapter 8: Barcode, Codebook, Applied Mathematics

Sociology and Anthropology
Course Code
SOAN 2120
David Walters

SOAN 2120
Chapter 8 summary
NOTE: all charts are for example purposes only.
Data coding symmetrically reorganizing raw numerical data into a format that is easy to
analyze through the use of computers.
Each category of a variable and missing information needs a code
Codebook is a document of one or more pages that describes the coding procedure
and the location of the data for variables in a format that computers ca use. Keeps data
organized and detailed
Pre-coding means placing the code categories (ie 1 male and 2 female) on the survey
- If no pre-coding is done, a codebook must be created as the data is collected. The
1’s are always in column 2 and the 10’s in column 1.
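A codebook can be pictured as a lookup table from variable names to column positions and codes. The sketch below is only an illustration: the variable names, column positions, and the missing-data codes 9 and 99 are assumptions, not taken from the chapter.

```python
# Hypothetical codebook: names, columns, and codes are illustrative only.
CODEBOOK = {
    # columns are 1-indexed and inclusive, as on a coding sheet
    "sex": {"columns": (1, 1), "codes": {1: "male", 2: "female", 9: "missing"}},
    # two-digit age: 10's digit in the first column, 1's digit in the second
    "age": {"columns": (2, 3), "codes": {99: "missing"}},
}


def read_variable(record: str, name: str) -> int:
    """Pull one variable out of a fixed-width record using the codebook."""
    start, end = CODEBOOK[name]["columns"]
    return int(record[start - 1:end])


record = "234"  # sex = 2 (female), age = 34
print(read_variable(record, "sex"), read_variable(record, "age"))
```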
Four ways to get raw quantitative data into a computer:
1) Code sheet: gather the information, transfer it to a grid format, and then
type it into a computer.
2) Direct entry method, including CATI (computer-assisted telephone interviewing):
observe/listen to the information and enter it, or have the respondent enter it themselves.
3) Optical scan: gather the information and enter it onto optical scan sheets, or have the
respondent do it by filling in the correct “dots.” Use an optical scanner to transfer the
information to the computer.
4) Bar code: gather the information and convert it into bar codes specific to
numerical values, then use a bar code reader to scan the information into the computer.
Coding must be very accurate. To verify it, recode a random 10-15% sample of the data. If no
errors are found, the coding is fine. If an error is found, all the data must be checked.
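The verification step can be sketched in a few lines. This is a minimal illustration, assuming the codes are stored in dictionaries keyed by case id; the function names and the 10% default are hypothetical.

```python
import random


def sample_for_recoding(case_ids, fraction=0.10, seed=0):
    """Pick a random fraction of the cases to code a second time."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    ids = list(case_ids)
    k = max(1, round(len(ids) * fraction))
    return sorted(rng.sample(ids, k))


def find_disagreements(first_pass, second_pass):
    """Return the case ids whose second coding differs from the first."""
    return [cid for cid, code in second_pass.items() if first_pass[cid] != code]


first = {1: 2, 2: 1, 3: 1}
second = {1: 2, 2: 2}  # cases 1 and 2 were coded again
print(find_disagreements(first, second))
```

If the list of disagreements is empty, the coding is accepted; if not, all the data are checked.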
Possible code cleaning/wild code checking: checking the categories of all variables
for impossible codes (e.g. a 4 for a variable coded only 1 or 2). Used to verify coding.
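A wild code check can be sketched as a comparison of every recorded value against the set of codes the codebook allows. The variable name and valid codes below are assumptions made for the example.

```python
# Hypothetical valid codes: 1 = male, 2 = female, 9 = missing.
VALID_CODES = {"sex": {1, 2, 9}}


def find_wild_codes(rows, valid_codes):
    """Return (row index, variable, value) for every code outside the valid set."""
    problems = []
    for i, row in enumerate(rows):
        for var, allowed in valid_codes.items():
            if row[var] not in allowed:
                problems.append((i, var, row[var]))
    return problems


rows = [{"sex": 1}, {"sex": 4}, {"sex": 2}]
print(find_wild_codes(rows, VALID_CODES))
```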
Contingency cleaning/consistency checking: cross-classifying two variables and
looking for logically impossible combinations. Used to verify coding, e.g. education is
cross-classified by occupation.
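As a rough sketch of a consistency check, two variables can be cross-classified and each case compared against a list of combinations judged impossible. The codes and the "impossible" pair below are invented for illustration.

```python
from collections import Counter

# Hypothetical codes: education 1 = less than high school,
# occupation 7 = physician. The pair (1, 7) is treated as impossible.
IMPOSSIBLE = {(1, 7)}


def crosstab(rows, var_a, var_b):
    """Count how many cases fall in each (var_a, var_b) combination."""
    return Counter((row[var_a], row[var_b]) for row in rows)


def impossible_cases(rows, var_a, var_b, impossible):
    """Return the indexes of the cases with a logically impossible combination."""
    return [i for i, row in enumerate(rows)
            if (row[var_a], row[var_b]) in impossible]


rows = [
    {"education": 4, "occupation": 7},
    {"education": 1, "occupation": 7},  # should be flagged
    {"education": 1, "occupation": 2},
]
print(crosstab(rows, "education", "occupation"))
print(impossible_cases(rows, "education", "occupation", IMPOSSIBLE))
```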
Statistics: a set of collected numbers as well as a branch of applied mathematics used
to manipulate and summarize the features of the numbers


Descriptive statistics: describe numerical data
- Univariate statistics: describe one variable. The easiest way to describe the
numerical data of one variable is with a frequency distribution. It can also be
represented in graphs:
Histogram: usually an upright bar graph, used for interval or ratio level data
Bar chart: used for discrete variables. Has a vertical or horizontal orientation
with a small space between the bars
Frequency polygon: plots interval or ratio level data
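A frequency distribution is simply a count of how many cases fall in each category. A minimal sketch, with made-up codes and a hypothetical function name:

```python
from collections import Counter


def frequency_distribution(values):
    """Return (value, count, percent) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, 100 * count / n)
            for value, count in sorted(counts.items())]


codes = [1, 2, 2, 1, 2]  # invented data, e.g. 1 = male, 2 = female
for value, count, percent in frequency_distribution(codes):
    print(value, count, f"{percent:.0f}%")
```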
3 measures of central tendency, or measures of the center of the frequency distribution:
1) Mode: the easiest to use and can be used with nominal, ordinal, interval, or ratio data. It is
the most common or frequently occurring number, e.g. in the list 6 5 7 10 9 5 3 5 the mode
is 5. In the list 5 6 1 2 5 9 7 4 7 there are two modes, 5 and 7.
2) Median: the middle point, with half of the cases above it and half below. It can be used
with ordinal, interval and ratio level data. To find it, order the scores from highest to lowest,
then count to the middle. With an odd number of cases, e.g. 7 people aged 12 17 20 27 30 55
80, the median age is 27. With an even number of cases, e.g. 6 people aged 17 20 26 30 50 70,
the median is between 26 and 30. Add the two numbers together and divide
by 2 to get the median: (26 + 30) / 2 = 28.
3) Mean (arithmetic average): the most widely used measure of central tendency. Only
used with interval or ratio level data. Find the mean by adding up all the scores and
dividing by the number of scores, e.g. (17 + 20 + 26 + 30 + 50 + 70) / 6 = 213 / 6 = 35.5. The mean is
strongly affected by changes in extreme values.
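The three measures can be checked with Python's standard `statistics` module (`multimode` needs Python 3.8 or later). The lists are the ones used in the notes. The last two lines are an added illustration, not from the chapter, of how one extreme value moves the mean but not the median.

```python
import statistics

# Mode: the most frequently occurring value(s)
print(statistics.multimode([6, 5, 7, 10, 9, 5, 3, 5]))    # [5]
print(statistics.multimode([5, 6, 1, 2, 5, 9, 7, 4, 7]))  # [5, 7]

# Median: middle case (odd n) or average of the two middle cases (even n)
print(statistics.median([12, 17, 20, 27, 30, 55, 80]))    # 27
ages = [17, 20, 26, 30, 50, 70]
print(statistics.median(ages))                            # 28.0

# Mean: sum of the scores divided by the number of scores
print(statistics.mean(ages))                              # 35.5

# Illustration only: replace 70 with an extreme value of 700
extreme = [17, 20, 26, 30, 50, 700]
print(statistics.median(extreme), statistics.mean(extreme))  # 28.0 140.5
```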
If the frequency distribution forms a “normal” or bell-shaped curve, the three measures
are equal. If it is a skewed distribution, i.e. more cases fall at the high or the low end, the measures
are unequal.
Zero variation: means that there is no variation across the data, i.e. every case has the same score
Variation is measured in 3 ways:
1) Range: the distance between the largest and smallest scores, e.g. the age range for the bus stop in
front of a bar is 25 to 35. Subtract 25 from 35 and the range is 10 years. Used for ordinal,
interval, ratio level data
2) Percentiles: tell the score at a specific place within the distribution, e.g. with 100 people,
to find the 25th percentile, rank the scores and count up from the bottom until
number 25 is reached. If the total is not 100, adjust the distribution to a percentage
basis. Used for ordinal, interval, ratio level data
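The range and the "count up from the bottom" percentile rule can be sketched as follows. The ages are invented. The percentile function uses the nearest-rank convention with a whole-number percentile, which is one common way to adjust to a percentage basis when the total is not 100; it is an assumption, not necessarily the textbook's exact rule.

```python
import math


def value_range(scores):
    """Largest score minus smallest score."""
    return max(scores) - min(scores)


def percentile_nearest_rank(scores, p):
    """Score at the p-th percentile (p a whole number from 1 to 100).

    Ranks the scores and counts up from the bottom; the position is
    scaled to the number of cases when the total is not 100.
    """
    ordered = sorted(scores)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


ages = [25, 27, 30, 31, 35]  # hypothetical ages
print(value_range(ages))

scores = list(range(1, 101))  # 100 people with scores 1 to 100
print(percentile_nearest_rank(scores, 25))
```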