
School: University of Guelph

Department: Sociology and Anthropology

Course Code: SOAN 2120

Professor: David Walters

Chapter: 8

SOAN 2120

Chapter 8 summary

NOTE: all charts are for example purposes only.

•Data coding: systematically reorganizing raw numerical data into a format that is easy to analyze using computers.

•Each category of a variable, as well as missing information, needs a code.

•Codebook: a document of one or more pages that describes the coding procedure and the location of the data for variables in a format that computers can use. It keeps the data organized and detailed.

•Pre-coding means placing the code categories (e.g., 1 = male, 2 = female) on the survey questionnaire.

- If no pre-coding is done, a codebook must be created as the data is collected. When a code takes up two columns, the 1's digit always goes in column 2 and the 10's digit in column 1.
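As a minimal sketch of how a codebook ties each code to a column location, the Python below decodes one fixed-column record. The layout is a hypothetical assumption, not from the textbook: age fills columns 1-2 (10's digit in column 1, 1's digit in column 2), sex fills column 3 coded 1 = male and 2 = female, and 9 is an assumed code for missing information. `CODEBOOK` and `decode_record` are illustrative names.

```python
# Hypothetical codebook: variable name -> (first column, last column, code labels).
# Columns are numbered from 1, as in a printed codebook.
CODEBOOK = {
    "age": (1, 2, None),  # two-digit number: 10's digit in column 1, 1's digit in column 2
    "sex": (3, 3, {1: "male", 2: "female", 9: "missing"}),
}

def decode_record(line: str) -> dict:
    """Read one fixed-column record using the codebook."""
    record = {}
    for name, (first, last, labels) in CODEBOOK.items():
        code = int(line[first - 1:last])  # turn 1-based columns into a string slice
        record[name] = labels[code] if labels else code
    return record

print(decode_record("271"))  # {'age': 27, 'sex': 'male'}
print(decode_record("459"))  # {'age': 45, 'sex': 'missing'}
```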

•Four ways to get raw quantitative data into a computer:

1) Code sheet: gather the information, transfer it to a grid format (the code sheet), and then type it into a computer.

2) Direct-entry method, including CATI (computer-assisted telephone interviewing): observe or listen to the information and enter it directly, or have the respondents enter it themselves.

3) Optical scan: gather information and enter it onto optical scan sheets, or have the respondents do so by filling in the correct “dots.” Then use an optical scanner to transfer the information to the computer.

4) Bar code: gather information and convert it into bar codes that stand for specific numerical values, then use a bar code reader to scan the information into the computer.

•Coding must be very accurate. To verify it, randomly select 10-15% of the data and recode it: if no errors are found, the coding is acceptable; if an error is found, all the data must be checked.
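The random-sample verification can be sketched in Python as follows. `verify_coding` is a hypothetical helper: its `recode` argument stands in for coding the same cases a second time from the original questionnaires, and the 10% default, the 15% used in the example, and the fixed `seed` are assumptions chosen only for illustration.

```python
import random

def verify_coding(coded, recode, fraction=0.10, seed=None):
    """Recode a random sample of cases and compare the results with the original codes.

    coded:    dict mapping case id -> code entered during coding
    recode:   function returning the correct code for a case id (a second coding pass)
    fraction: share of cases to recheck (the notes suggest 10-15%)
    """
    rng = random.Random(seed)
    sample_size = max(1, round(len(coded) * fraction))
    sample = rng.sample(sorted(coded), sample_size)  # random cases, without replacement
    errors = [case for case in sample if coded[case] != recode(case)]
    return "acceptable" if not errors else "recheck all data"

# Hypothetical data: 20 cases whose correct codes alternate between 1 and 2.
correct = {case: case % 2 + 1 for case in range(1, 21)}
print(verify_coding(dict(correct), correct.get, fraction=0.15, seed=1))  # acceptable

# Every code wrongly entered as 9, so any sample finds an error.
miscoded = {case: 9 for case in correct}
print(verify_coding(miscoded, correct.get, fraction=0.15, seed=1))  # recheck all data
```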

•Possible code cleaning (wild code checking): checking the categories of all variables for impossible codes. Used to verify coding.
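A minimal sketch of wild code checking in Python, assuming a hypothetical codebook in which sex may only be 1, 2, or 9 (missing) and education only 1-4 or 9. `ALLOWED` and `wild_codes` are illustrative names; any value outside a variable's allowed set is reported as an impossible code.

```python
# Hypothetical allowed codes for each variable (9 = missing).
ALLOWED = {
    "sex": {1, 2, 9},
    "education": {1, 2, 3, 4, 9},
}

def wild_codes(cases):
    """Return (case id, variable, value) for every code the codebook does not allow."""
    problems = []
    for case_id, answers in cases.items():
        for variable, value in answers.items():
            if value not in ALLOWED[variable]:
                problems.append((case_id, variable, value))
    return problems

data = {
    101: {"sex": 1, "education": 3},
    102: {"sex": 3, "education": 2},  # sex = 3 is an impossible code
    103: {"sex": 2, "education": 7},  # education = 7 is an impossible code
}
print(wild_codes(data))  # [(102, 'sex', 3), (103, 'education', 7)]
```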

•Contingency cleaning (consistency checking): cross-classifying two variables and looking for logically impossible combinations. Used to verify coding, e.g., education cross-classified by occupation.
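Contingency cleaning can be sketched in Python by counting cases in each education-by-occupation cell and flagging cells that cannot logically occur. The rule set `IMPOSSIBLE` (for example, no schooling combined with being a physician) and the sample cases are hypothetical assumptions for illustration.

```python
from collections import Counter

# Hypothetical rule set: education/occupation pairs that cannot logically occur.
IMPOSSIBLE = {("no schooling", "physician"), ("no schooling", "lawyer")}

def contingency_check(cases):
    """Cross-classify education by occupation and flag impossible combinations.

    cases: dict mapping case id -> (education, occupation)
    Returns the cross-classification (cell -> count) and the flagged case ids.
    """
    table = Counter(cases.values())  # number of cases in each (education, occupation) cell
    flagged = [case_id for case_id, cell in cases.items() if cell in IMPOSSIBLE]
    return table, flagged

data = {
    1: ("university", "physician"),
    2: ("no schooling", "physician"),  # logically impossible combination
    3: ("high school", "clerk"),
    4: ("high school", "clerk"),
}
table, flagged = contingency_check(data)
print(table[("high school", "clerk")])  # 2
print(flagged)                          # [2]
```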

•Statistics: both a set of collected numbers and a branch of applied mathematics used to manipulate and summarize the features of numbers.


•Descriptive statistics: describe numerical data.

- Univariate statistics: describe one variable. The easiest way to describe the numerical data of one variable is with a frequency distribution. It can also be represented in graphs such as histograms (usually upright bar graphs, used for interval or ratio level data) and bar charts (used for discrete variables; they have a vertical or horizontal orientation with a small space between the bars).

•Frequency polygon: a graph of interval or ratio level data in which the frequencies are plotted as points joined by lines.
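A frequency distribution for one variable can be built with Python's `collections.Counter`. This sketch uses hypothetical answers to a 1-5 rating question and prints a crude text bar chart (one `#` per case); `frequency_distribution` is an illustrative name, and the output is only meant to show the counting, not to replace proper charting software.

```python
from collections import Counter

def frequency_distribution(values):
    """Return (value, count, percent) rows, sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(value, n, 100 * n / total) for value, n in sorted(counts.items())]

# Hypothetical answers to a 1-5 rating question.
answers = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]
for value, count, pct in frequency_distribution(answers):
    # Crude horizontal bar chart: one "#" per case.
    print(f"{value}  {count:2d}  {pct:5.1f}%  {'#' * count}")
```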

Three measures of central tendency (measures of the center of the frequency distribution):

1) Mode: the easiest to use; it can be used with nominal, ordinal, interval, or ratio data. It is the most common or most frequently occurring number, e.g., in the list 6 5 7 10 9 5 3 5 the mode is 5. A distribution can have more than one mode: in the list 5 6 1 2 5 9 7 4 7 the mode is both 5 and 7.

2) Median: the middle point, with half of the cases above it and half below. It can be used with ordinal, interval, and ratio level data. To find it, rank the scores from highest to lowest, then count to the middle. With an odd number of cases, e.g., 7 people aged 12 17 20 27 30 55 80, the median age is 27. With an even number of cases, e.g., 6 people aged 17 20 26 30 50 70, the median falls between the two middle scores, 26 and 30: add them together and divide by 2 to get the median, (26 + 30) / 2 = 56 / 2 = 28.

3) Mean (arithmetic average): the most widely used measure of central tendency. It is used only with interval or ratio level data. Find the mean by adding up all the scores and dividing by the number of scores, e.g., (17 + 20 + 26 + 30 + 50 + 70) / 6 = 213 / 6 = 35.5. The mean is strongly affected by changes in extreme values.
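The three measures can be checked with Python's standard `statistics` module, using the example lists from the notes. `median_by_hand` is a hypothetical helper that follows the counting-to-the-middle steps; it sorts from lowest to highest, which gives the same middle value as sorting from highest to lowest. The last lines change the largest age to a hypothetical extreme value of 700 to show why the mean is sensitive to extreme values while the median is not.

```python
from statistics import mean, median, multimode

# 1) Mode: multimode returns every most-frequent value, in order of first appearance.
print(multimode([6, 5, 7, 10, 9, 5, 3, 5]))    # [5]
print(multimode([5, 6, 1, 2, 5, 9, 7, 4, 7]))  # [5, 7]  (two modes)

# 2) Median, following the steps in the notes.
def median_by_hand(values):
    ordered = sorted(values)  # rank the scores
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:            # odd number of cases: take the middle score
        return ordered[middle]
    # even number of cases: add the two middle scores and divide by 2
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median_by_hand([12, 17, 20, 27, 30, 55, 80]))  # 27
print(median_by_hand([17, 20, 26, 30, 50, 70]))      # 28.0

# 3) Mean: add up all the scores and divide by the number of scores.
ages = [17, 20, 26, 30, 50, 70]
print(sum(ages), len(ages), mean(ages))  # 213 6 35.5

# Hypothetical: change the largest age to an extreme value.
ages_extreme = [17, 20, 26, 30, 50, 700]
print(mean(ages_extreme))    # 140.5 (the mean jumps)
print(median(ages_extreme))  # 28.0 (the median is unchanged)
```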

•If the frequency distribution forms a “normal” or bell-shaped curve, the three measures are equal. If it is a skewed distribution (i.e., more cases at the upper or lower scores), the three measures are unequal.
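To illustrate the point about symmetric and skewed distributions, the sketch below compares the mode, median, and mean of two small hypothetical data sets: one symmetric around 3 and one with most cases at the low end. `centre` is an illustrative helper built on the standard `statistics` module.

```python
from statistics import mean, median, multimode

def centre(values):
    """Return (mode(s), median, mean) for a list of scores."""
    return multimode(values), median(values), mean(values)

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # hypothetical bell-shaped data
skewed = [1, 1, 1, 2, 2, 3, 4, 8, 10]    # hypothetical data with most cases at the low end

print(centre(symmetric))  # ([3], 3, 3): all three measures are equal
print(centre(skewed))     # mode 1, median 2, mean about 3.56: the measures differ
```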

•Zero variation means that there is no variation across the data: every case has the same score.

Variation is measured in 3 ways:

1) Range: consists of the largest and smallest scores, e.g., the ages of people at the bus stop in front of a bar range from 25 to 30. Subtract 25 from 30, and the range is 5 years. Used for ordinal, interval, and ratio level data.

2) Percentiles: tell the score at a specific place within the distribution. For example, with 100 people, to find the 25th percentile, rank the scores and count up from the bottom until the 25th score is reached. If the total is not 100, adjust the distribution to a percentage basis. Used for ordinal, interval, and ratio level data.
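The range and percentile procedures can be sketched in Python as below. `value_range` and `percentile` are hypothetical helpers, and the percentile uses the nearest-rank method (rank = ceil(p × n / 100)), which is one common way to put a distribution on a percentage basis; statistical software often uses other definitions that interpolate between scores. All data values are made up for illustration.

```python
import math

def value_range(scores):
    """Range: subtract the smallest score from the largest."""
    return max(scores) - min(scores)

def percentile(scores, p):
    """Nearest-rank percentile: rank the scores from the bottom and count up to
    rank = ceil(p * n / 100). With exactly 100 cases this is the p-th score.
    """
    if not 0 < p <= 100:
        raise ValueError("p must be greater than 0 and at most 100")
    ordered = sorted(scores)                  # rank the scores, lowest first
    rank = math.ceil(p * len(ordered) / 100)  # percentage basis for any number of cases
    return ordered[rank - 1]

# Hypothetical ages at the bus stop, from 25 to 30.
ages = [25, 27, 30, 26, 28]
print(value_range(ages))  # 5

# 100 hypothetical scores: the 25th percentile is the 25th score from the bottom.
scores = list(range(1, 101))
print(percentile(scores, 25))  # 25

# 8 hypothetical scores: 25% of 8 cases is 2, so count up to the 2nd score.
eight = [40, 10, 70, 20, 80, 30, 60, 50]
print(percentile(eight, 25))  # 20
```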
