# SOAN 2120 Chapter Notes - Chapter ch 4-9: Sampling Frame, Nonverbal Communication, Descriptive Statistics


14 Jan 2018


Introductory Methods Textbook Notes

Chapter #4, Analysis of Quantitative Data, Page 62-88

Dealing with Data

Coding Data

• Data coding means systematically reorganizing raw numerical data into a format that is easy to analyze using computers. It can be simple at times, but also complex when the data isn't well organized or in the form of numbers. Researchers (R) develop rules to assign numbers to variable attributes (1=single people, 2=married people, 3=common law people, 4=divorced people, etc.). Each category of a variable and missing information need a code. Codebook: a document that describes the procedure for coding variables and their location in a format for computers. R begin to think about a coding procedure and codebook before they collect data. Precoding: placing the code categories (1=man, 2=woman, 3=trans) on the questionnaire.
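A minimal sketch of how a coding rule like the marital-status example could be applied, assuming a hypothetical missing-data code of 9 and made-up raw answers (neither comes from the textbook):

```python
# Hypothetical codebook entry for one variable (marital status).
# The numeric codes follow the example in the notes; the missing code 9 is an assumption.
MARITAL_CODES = {"single": 1, "married": 2, "common law": 3, "divorced": 4}
MISSING_CODE = 9


def code_response(raw):
    """Return the numeric code for a raw answer, or MISSING_CODE if it is blank or unknown."""
    if raw is None:
        return MISSING_CODE
    return MARITAL_CODES.get(raw.strip().lower(), MISSING_CODE)


# Made-up raw answers, including a blank and a missing one.
raw_answers = ["Single", "married", "", "Common law", None, "divorced"]
coded = [code_response(answer) for answer in raw_answers]
print(coded)  # [1, 2, 9, 3, 9, 4]
```

Every attribute and the missing-information case get a code, which is the point of writing the rule down before data collection.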

Entering Data

• In the grid, each row represents a respondent, subject, or case. A column or a set of columns represents a specific variable. It is possible to go from a column and row location (row 7, column 5) back to the original source of data.

• 1 row = 1 student, so 250 students = 250 rows. The number of columns reflects how many questions were asked.

• There are four ways to get raw quantitative data into a computer. #1) Code sheet: paper with a printed grid on which a R records information so that it can be easily entered into a computer. It is an alternative to the direct-entry method and to optical-scan sheets. #2) Direct-entry method: entering data into a computer by typing it in without code sheets or optical-scan sheets. This can include an online survey, where the R or subject enters the data manually. #3) Optical scan: gather information, then enter it onto optical-scan sheets, or have respondents enter it themselves by filling in the correct dots. It works like a Scantron, where a machine scans the data and transfers it to a computer. #4) Bar code: gather information and convert it into bars of different widths that are associated with specific numerical values, then scan it into a computer.
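The row-and-column grid described at the start of this section can be pictured as a list of rows. This is only an illustrative sketch: the values are made-up codes and `lookup` is a hypothetical helper.

```python
# Each inner list is one respondent (row); each position is one variable (column).
# All values are made-up codes for illustration.
grid = [
    [1, 2, 34],  # respondent 1
    [2, 1, 27],  # respondent 2
    [1, 4, 51],  # respondent 3
]


def lookup(grid, row, column):
    """Return the value at a 1-based row and column, as in "row 7, column 5"."""
    return grid[row - 1][column - 1]


print(len(grid))          # 3 rows = 3 respondents
print(len(grid[0]))       # 3 columns = 3 variables
print(lookup(grid, 3, 2)) # 4
```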

Cleaning Data

• Results can get ruined if data isn't entered into the computer correctly. After data has been entered into a computer, R verify the coding in two ways. #1) Possible code cleaning (or wild code checking): cleaning data using a computer in which the R looks for responses or answer categories that cannot have cases, that is, checking each variable for impossible codes (an age of 300 would be an error). #2) Contingency cleaning (or consistency checking): cleaning data using a computer in which the R looks at the combination of categories for two variables for logically impossible cases. Cross-classifying two variables can reveal them (someone who didn't finish HS says they are a doctor).
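Both cleaning checks can be sketched in a few lines. The records, field names, education codes, and the valid age range below are all assumptions made up for illustration:

```python
# Hypothetical coded records.
# education: 1 = did not finish high school, 2 = high school, 3 = university (assumed codes)
records = [
    {"id": 1, "age": 34, "education": 3, "occupation": "teacher"},
    {"id": 2, "age": 300, "education": 2, "occupation": "clerk"},
    {"id": 3, "age": 45, "education": 1, "occupation": "doctor"},
]


def wild_codes(records, field, valid):
    """Possible code cleaning: ids of cases whose code is outside the valid set."""
    return [r["id"] for r in records if r[field] not in valid]


def contingency_errors(records):
    """Contingency cleaning: ids of cases with a logically impossible combination."""
    return [
        r["id"]
        for r in records
        if r["education"] == 1 and r["occupation"] == "doctor"
    ]


print(wild_codes(records, "age", range(0, 121)))  # [2]
print(contingency_errors(records))                # [3]
```

The first check looks at one variable at a time; the second needs two variables cross-classified.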

find more resources at oneclass.com


Results with One Variable

Frequency Distributions

• The word statistics can mean a set of collected numbers, as well as the mathematics used to manipulate and summarize the features of these numbers. Descriptive statistics: a general type of simple statistics used by researchers to describe basic patterns in the data. They are categorized by the number of variables involved: univariate, bivariate, and multivariate (1, 2, and 3 or more variables). Univariate statistics: statistical measures that deal with one variable only. The easiest way to describe the numerical data of one variable is with a frequency distribution: a table that shows the distribution of cases into the categories of one variable (the number or percent of cases in each category, like the gender of 400 people). You can present the information in graphic form, like a histogram: a type of bar chart used to visually display the distribution of a continuous variable (no gaps between bars). Bar chart: a display of quantitative data from one variable in the form of rectangles, where longer rectangles indicate more cases in a variable category. Usually, it is used with discrete data, and there is a small space between rectangles (gaps). Rectangles can have a horizontal or vertical orientation. Also called a bar graph. Pie chart: a display of numerical information on one variable that divides a circle into fractions by lines representing the proportion of cases in the variable's attributes.
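A frequency distribution is just a count and a percent per category. A minimal sketch, using ten made-up coded answers and the precoding scheme from earlier in the notes (1=man, 2=woman, 3=trans):

```python
from collections import Counter

# Made-up coded answers for illustration only.
gender_codes = [1, 2, 2, 1, 2, 3, 2, 1, 2, 2]


def frequency_distribution(values):
    """Return {category: (count, percent)} ordered by category code."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in sorted(counts.items())}


for category, (count, percent) in frequency_distribution(gender_codes).items():
    print(f"{category}: {count} cases ({percent:.1f}%)")
# 1: 3 cases (30.0%)
# 2: 6 cases (60.0%)
# 3: 1 cases (10.0%)
```

The same counts are what a bar chart or pie chart would draw.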

Measures of Central Tendency

• Mean: the average (the sum of all the scores divided by the total number of scores.) The mean is

sensitive to outliers.

• Median: the middle score when the scores are ordered from lowest to highest (the 50th percentile).

• Mode: the most common score.

- bimodal: a distribution with two modes.

- multimodal: a distribution with more than one mode.
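The three measures can be checked with Python's standard `statistics` module. The scores below are made up; adding one extreme score shows why the mean is called sensitive to outliers while the median and mode barely move:

```python
import statistics

scores = [2, 3, 3, 4, 5, 6, 7]   # made-up scores
with_outlier = scores + [90]     # same scores plus one extreme value

for label, data in (("original", scores), ("with outlier", with_outlier)):
    print(
        f"{label}: mean={statistics.mean(data):.2f}, "
        f"median={statistics.median(data)}, mode={statistics.mode(data)}"
    )
# original: mean=4.29, median=4, mode=3
# with outlier: mean=15.00, median=4.5, mode=3

# A bimodal example: two values tie for most common.
bimodal_modes = statistics.multimode([1, 1, 2, 2, 3])
print(bimodal_modes)  # [1, 2]
```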

Normal distribution: a bell-shaped frequency polygon for a distribution of cases, with a peak in the centre and identical curving slopes on either side of the centre. It is the distribution of many naturally occurring phenomena and is the basis for much statistical theory. In a normal distribution, the mean, median, and mode equal each other.

Skewed distribution: a distribution of cases among the categories of a variable that is not normal (not a bell shape). Instead of an equal number of cases at both ends, more are at one of the extremes (more on one side than the other). Here, the mean, median, and mode differ, mainly when there are a few extremely high or low scores.

Right skew: most cases have lower scores, with a few high scores. The mean will be the highest, then the median, and the mode the lowest.

Left skew: most cases have higher scores, with a few low scores. The mean will be the lowest, then the median, and the mode the highest.

Right skewed (positive), from lowest to highest value:

Mode, Median, Mean

Left skewed (negative), from lowest to highest value:

Mean, Median, Mode


Mean is to the left of the median = left skewed = negative

Mean exceeds mode: right

Mean less than mode: left

Right, POSITIVE: mean > median

Left, NEGATIVE: mean < median
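The mean-versus-median rule above can be written as a small check. This is a sketch of the rule of thumb from these notes, not a formal skewness statistic; the function name and the sample scores are made up:

```python
import statistics


def skew_direction(scores):
    """Apply the rule of thumb: mean > median is right skew, mean < median is left skew."""
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    if mean > median:
        return "right (positive)"
    if mean < median:
        return "left (negative)"
    return "no skew indicated"


mostly_low = [1, 2, 2, 3, 3, 3, 20]        # a few high scores pull the mean up
mostly_high = [1, 18, 19, 19, 20, 20, 20]  # a few low scores pull the mean down

print(skew_direction(mostly_low))       # right (positive)
print(skew_direction(mostly_high))      # left (negative)
print(skew_direction([1, 2, 3, 4, 5]))  # no skew indicated
```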

Measures of Variation

• Measures of central tendency are one-number (univariate) summaries of a distribution, but they only give us the centre. Another characteristic of a distribution is its spread, dispersion, or variability around the centre. Zero variation: every score is the same (each family makes $35,600 a year).

• Variation can be measured in 3 ways. Range: a measure of dispersion for one variable based on the highest and lowest scores. The biggest and smallest numbers form the range: for 25, 26, 27, 30, 33, 34, 35, the range is 35-25=10. Percentiles: a measure of dispersion for one variable that indicates the percentage of cases at or below a score or point. The median is the 50th percentile; at the 25th percentile, 25% of the items in the distribution have that score or a lower one. Standard deviation: a measure of dispersion for one variable that indicates an average distance between the scores and the mean. It simply means standard (average) deviation (difference), or average difference. The bigger the standard deviation (SD), the more spread out the scores are; the smaller the SD, the more similar and closer together the scores.
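The range and the "percent of cases at or below a score" idea can be sketched with the seven scores from the range example. The helper names are hypothetical:

```python
scores = [25, 26, 27, 30, 33, 34, 35]  # the scores from the range example


def score_range(values):
    """Range: highest score minus lowest score."""
    return max(values) - min(values)


def percent_at_or_below(values, point):
    """Percentage of cases with a score at or below the given point."""
    at_or_below = sum(1 for v in values if v <= point)
    return 100 * at_or_below / len(values)


print(score_range(scores))                        # 10
print(round(percent_at_or_below(scores, 30), 1))  # 57.1
print(percent_at_or_below(scores, 35))            # 100.0
```

With only seven cases the percentages move in large steps, so the middle score (30) has 57.1% of cases at or below it rather than exactly 50%.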

• Computing the Standard Deviation: #1) compute the mean #2) subtract the mean from each score #3) square the resulting difference for each score #4) total up the squared differences to get the sum of squares #5) divide the sum of squares by the number of cases to get the variance (the example below treats the scores as a sample and divides by the number of cases minus 1) #6) take the square root of the variance, which is the standard deviation. Example: 15, 12, 12, 10, 16, 18, 8, 9; mean = 12.5. Subtract the mean from each score: 15-12.5=2.5, 12-12.5=-0.5, 12-12.5=-0.5, 10-12.5=-2.5, 16-12.5=3.5, 18-12.5=5.5, 8-12.5=-4.5, 9-12.5=-3.5. Square each difference: 2.5²=6.25, (-0.5)²=0.25, (-0.5)²=0.25, (-2.5)²=6.25, 3.5²=12.25, 5.5²=30.25, (-4.5)²=20.25, (-3.5)²=12.25. The sum of squares is 6.25+0.25+0.25+6.25+12.25+30.25+20.25+12.25=88. The variance is the sum of squares divided by the number of cases minus 1: 8-1=7, so 88/7=12.571. The standard deviation is then the square root of the variance, √12.571=3.546. In short: find the mean, subtract the mean from each score, square the differences (all become positive), add up the squared differences, divide the sum by the number of cases minus 1 to get the variance, then take the square root of the variance to get the standard deviation. The smaller the SD, the more similar the cases.
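The worked example can be reproduced step by step. The sketch below computes both versions of the final division, since the listed steps divide by the number of cases while the worked numbers divide by the number of cases minus 1 (the sample formula); variable names are illustrative:

```python
import math
import statistics

scores = [15, 12, 12, 10, 16, 18, 8, 9]

mean = sum(scores) / len(scores)              # step 1: 12.5
deviations = [s - mean for s in scores]       # step 2: score minus mean
squared = [d ** 2 for d in deviations]        # step 3: square each difference
sum_of_squares = sum(squared)                 # step 4: 88.0

# Step 5, sample version (divide by n - 1), as in the worked numbers.
sample_variance = sum_of_squares / (len(scores) - 1)
# Step 5, population version (divide by n), as in the listed steps.
population_variance = sum_of_squares / len(scores)

# Step 6: square root of the variance.
sample_sd = math.sqrt(sample_variance)
population_sd = math.sqrt(population_variance)

print(round(sample_variance, 3), round(sample_sd, 3))          # 12.571 3.546
print(round(population_variance, 3), round(population_sd, 3))  # 11.0 3.317

# The standard library gives the same answers.
assert math.isclose(sample_sd, statistics.stdev(scores))
assert math.isclose(population_sd, statistics.pstdev(scores))
```

Which divisor to use depends on whether the scores are treated as a sample or as the whole population; the two results differ noticeably with only 8 cases.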

• The SD and mean are used to create z-scores: a way to locate a score in a distribution of scores by determining the number of SDs it is above or below the mean (the arithmetic average). This allows R to compare two or more distributions or groups.

• Z-score example: the mean GPA at Cartier U. is 2.62 with a SD of 0.50, whereas at Hudson U. the mean GPA is 3.24 with a SD of 0.40. To get the z-score, you subtract the mean from an individual's GPA, then divide by the SD. One Cartier U. student's GPA is 3.62, so (3.62-2.62)/0.50=1/0.50=2, whereas a Hudson U. student's GPA is 3.64, so (3.64-3.24)/0.40=0.40/0.40=1.
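The z-score calculation is a single subtraction and division. In this sketch the individual GPAs of 3.62 and 3.64 are the values implied by the arithmetic in the example (differences of 1 and 0.40 from the two means); the function name is hypothetical:

```python
def z_score(score, mean, sd):
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd


cartier_z = z_score(3.62, mean=2.62, sd=0.50)
hudson_z = z_score(3.64, mean=3.24, sd=0.40)

print(f"Cartier U. student: z = {cartier_z:.2f}")  # z = 2.00
print(f"Hudson U. student:  z = {hudson_z:.2f}")   # z = 1.00
```

Although the Hudson U. student has the higher raw GPA, the Cartier U. student stands further above their own school's mean, which is the kind of cross-group comparison z-scores allow.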
