SOAN 2120 Chapter Notes - Chapter ch 4-9: Sampling Frame, Nonverbal Communication, Descriptive Statistics

Introductory Methods Textbook Notes
Chapter #4, Analysis of Quantitative Data, Page 62-88
Dealing with Data
Coding Data
Data coding means systematically reorganizing raw numerical data into a format that is easy to analyze
usig oputes. It a e siple at ties, ut also ople he the data ist ell ogaized o i
the form of numbers. Researchers (R) develop rules to assign numbers to variable attributes (1=single
people, 2=married people, 3=common law people, 4=divorced people, etc.) Each category of a variable
and missing information need a code. Codebook: a document that describes the procedure for coding
variables and their location in a format for computers. R begin to think about a coding procedure and
codebook before they collect data. Precoding: means placing the code categories (1=man, 2=woman,
3=trans) on the questionnaire.
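A codebook can be sketched as a simple lookup table. The following is only an illustration: the codes follow the marital-status example above, while the missing-data code 9 and the names marital_status_codebook and code_answer are assumptions made for the sketch, not something taken from the textbook.

```python
# Hypothetical codebook for one variable; codes 1-4 follow the example in the notes.
MISSING = 9  # assumed code for missing information (not from the notes)

marital_status_codebook = {
    1: "single",
    2: "married",
    3: "common law",
    4: "divorced",
    MISSING: "missing",
}


def code_answer(answer):
    """Return the numeric code for a raw text answer, or MISSING if it is not recognized."""
    lookup = {label: code for code, label in marital_status_codebook.items()}
    return lookup.get(answer.strip().lower(), MISSING)


raw_answers = ["Single", "married", "", "Divorced"]
coded = [code_answer(a) for a in raw_answers]
print(coded)  # [1, 2, 9, 4]
```

Every category, including missing information, gets a number, which is the point of the rule above.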
Entering Data
In the grid, each row represents a respondent, subject or case. A column or a set of columns represents
specific variables. It is possible to go from a column and row location (row 7, column 5) back to the
original source of data.
1 row = 1 student, so 250 students = 250 rows, and the number of columns reflects how many questions were asked.
There are four ways to get raw quantitative data into a computer. #1) Code sheet: paper with a printed
grid on which a R records information so that it can be easily entered into a computer. It is an
alternative to the direct-entry method and using optical-scan sheets. #2) Direct-entry method: a
method of entering data into a computer by typing data without code or optical-scan sheets. Can
include an online survey, where the R or subject enters the data manually. #3) Optical scan: gather
information, then enter it into optical-scan sheets, or have a respondent enter it in, by having them fill
in the correct dots. Like a scantron, where a machine scans the data, and transfers it onto a computer.
#4) Bar code: gather information and convert it into different widths of bars that are associated with
specific numerical values. Then scan it onto a computer.
Cleaning Data
‘esults a get uied if data ist added ito the opute oetl. Afte data has ee entered into
a computer, R verify the coding in two ways. #1) Possible code cleaning: or wild code checking,
cleaning data using a computer in which the R looks for responses or answer categories that cannot
have cases, or finding variables for impossible codes (like age 300, that would be an error.) #2)
Contingency cleaning: or consistency checking, cleaning data using a computer in which the R looks at
the combination of categories for two variables for logically impossible cases, found by cross-classifying two
variables (e.g. someone who didn't finish HS says they're a doctor).
Results with One Variable
Frequency Distributions
The word statistics can mean a set of collected numbers, and mathematics used to manipulate and
summarize the features of these numbers. Descriptive statistics: a general type of simple statistics
used by researchers to describe basic patterns in the data. They are characterized by the number of
variables involved: univariate, bivariate, and multivariate (1, 2, and 3 or more variables.) Univariate
statistics: statistical measures which deal with one variable only. The easiest way to describe the numerical
data of one variable is with a frequency distribution: a table that shows the distribution of cases into
the categories of one variable (the number or percent of cases in each category, like using gender of
400 people.) You can present the information in graphic form like a histogram: a type of bar chart used
to visually display the distribution of a continuous variable (no gaps). Bar chart: a display of
quantitative data from one variable in the form of rectangles where longer rectangles indicate more
cases in a variable category. Usually, it is used with discrete data, and there is a small space between
rectangles. Rectangles can have a horizontal or vertical orientation. Also called a bar graph (gaps). Pie
chart: a display of numerical information of one variable that divides a circle into fractions by lines
epesetig the popotio of ases i the aiales attiutes.
Measures of Central Tendency
Mean: the average (the sum of all the scores divided by the total number of scores.) The mean is
sensitive to outliers.
Median: the middle number
Mode: the most common
- bimodal: a distribution with two modes.
- multimodal: a distribution with more than one mode
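The three measures can be checked with Python's standard statistics module. The scores below are invented for illustration; adding one extreme score shows why the mean is described as sensitive to outliers while the median barely moves.

```python
import statistics

scores = [2, 3, 3, 4, 5, 5, 6]

mean = statistics.mean(scores)        # sum of scores / number of scores
median = statistics.median(scores)    # middle score once sorted
modes = statistics.multimode(scores)  # every most-common score (bimodal here)

# Add a single outlier and recompute.
with_outlier = scores + [60]
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

print(mean, median, modes)
print(mean_out, median_out)
```

With these numbers the mean moves from 4 to 11 after the outlier is added, while the median only moves from 4 to 4.5.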
Normal distribution: a ell-shaped feue polgo fo a distiutio of ases, ith a peak i the ete
and identical curving slopes on either side of the centre. It is the distribution of many naturally occurring
phenomena and is the basis for much statistical theory. This happens is the mean, median, and mode equal
each other.
Skewed distribution: a distribution of cases among the categories of a variable that is not normal (not a bell
shape). Instead of an equal number of cases at both ends, more are at one of the extremes (more on one side
than the other). Here, the mean, median, and mode are different, mainly when there are a few extremely high
or low scores.
Most cases have lower scores, with a few high scores: the mean will be the highest, then the median, and the mode lowest (right skew)
Most cases have higher scores, with a few low scores: the mean will be the lowest, then the median, and the mode highest (left skew)
Right Skewed (positive):
Mode (lowest), Median, Mean (highest)
Left Skewed (Negative):
Mean (lowest), Median, Mode (highest)
Mean is to the left of median= left skewed = negative
Mean exceeds mode- right
Mean less than mode- left
Right, POSITIVE; mean > median
Left, NEGATIVE; mean < median
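The mean-versus-median rule of thumb can be tried on small invented data sets. The function name skew_direction and the two score lists are assumptions made for this sketch; it only applies the comparison from the notes and is not a formal skewness statistic.

```python
import statistics


def skew_direction(scores):
    """Classify skew by comparing the mean with the median (rule of thumb only)."""
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    if mean > median:
        return "right (positive)"
    if mean < median:
        return "left (negative)"
    return "no skew indicated"


# Mostly low scores with one very high score: mean 4, median 2.5, mode 2.
right = [1, 2, 2, 2, 3, 3, 4, 15]
# Mostly high scores with one very low score: mean 12, median 13.5, mode 14.
left = [1, 12, 13, 13, 14, 14, 14, 15]

print(skew_direction(right))  # right (positive)
print(skew_direction(left))   # left (negative)
```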
Measures of Variation
Measures of central tendency are one-number (univariate) summaries of a distribution, but they only give us the
centre. Another characteristic of a distribution is its spread, dispersion, or variability around the
centre. Zero variation: every number is the same (each family makes $35,600 a year).
Variation can be measured in 3 ways. Range: a measure of dispersion for one variable indicating
the distance between the highest and lowest scores. For 25, 26, 27, 30, 33, 34, 35 the
range is 35-25=10. Percentiles: a measure of dispersion for one variable that
indicates the percentage of cases at or below a score or point. The median is the 50th percentile; at
the 25th percentile, 25% of the items in the distribution have that score or a lower one.
Standard deviation: a measure of dispersion for one variable that indicates an average distance
between the scores and the mean. It simply means standard (average) deviation (difference), or
average difference. The bigger the standard deviation (SD), the more spread out the scores; the smaller the SD,
the more similar and closer together the scores.
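The range and the "percentage of cases at or below a score" idea can be sketched with the seven scores from the range example. The helper name percent_at_or_below is an assumption for this illustration; with so few cases the percentages are coarse, so they only approximate textbook percentiles.

```python
scores = [25, 26, 27, 30, 33, 34, 35]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)


def percent_at_or_below(data, score):
    """Percentage of cases whose value is at or below the given score."""
    return 100 * sum(1 for x in data if x <= score) / len(data)


print(score_range)                      # 10
print(percent_at_or_below(scores, 27))  # about 42.9 (3 of 7 cases)
print(percent_at_or_below(scores, 35))  # 100.0
```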
Computing the Standard Deviation: #1) compute the mean #2) subtract the mean from each score #3)
square the resulting difference for each score #4) total up the squared differences to get the sum of
squares #5) divide the sum of squares by the number of cases minus 1 to get the variance (this is the sample variance, as used in the example; dividing by the number of cases itself gives the population variance) #6) take the square
root of the variance, which is the standard deviation. Example: 15, 12, 12, 10, 16, 18, 8, 9; mean = 12.5.
Score minus mean: 15-12.5=2.5, 12-12.5=-0.5, 12-12.5=-0.5, 10-12.5=-2.5, 16-12.5=3.5, 18-12.5=5.5, 8-12.5=-4.5, 9-12.5=-3.5.
Squared differences: 2.5²=6.25, (-0.5)²=0.25, (-0.5)²=0.25, (-2.5)²=6.25, 3.5²=12.25, 5.5²=30.25, (-4.5)²=20.25,
(-3.5)²=12.25. Sum of squares:
6.25+0.25+0.25+6.25+12.25+30.25+20.25+12.25=88. The variance is the sum of squares divided by the number of cases minus 1: 8-1=7, so
88/7=12.571. The standard deviation is then the square root of the variance, which is √12.571=3.546.
In short: find the mean, subtract the mean from each score, square the differences (all become positive), add up all the
squared differences, divide the sum by the number of cases minus 1 to get the variance, then take the square root of
the variance to get the standard deviation. The smaller the SD, the more similar the cases.
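The six steps can be followed one line at a time in Python using the eight scores from the example. This sketch divides by the number of cases minus 1, as the worked example does, and cross-checks the result against the standard library's sample standard deviation; the variable names are chosen for the illustration.

```python
import math
import statistics

scores = [15, 12, 12, 10, 16, 18, 8, 9]

mean = sum(scores) / len(scores)               # step 1: the mean
deviations = [x - mean for x in scores]        # step 2: score minus mean
squared = [d ** 2 for d in deviations]         # step 3: square each difference
sum_of_squares = sum(squared)                  # step 4: sum of squares
variance = sum_of_squares / (len(scores) - 1)  # step 5: divide by n - 1 (sample)
sd = math.sqrt(variance)                       # step 6: square root of the variance

print(mean, sum_of_squares, round(variance, 3), round(sd, 3))
# 12.5 88.0 12.571 3.546

# Cross-check: statistics.stdev also uses the n - 1 (sample) formula.
print(math.isclose(sd, statistics.stdev(scores)))  # True
```

Dividing by the number of cases instead (statistics.pstdev) would give the population SD, which is slightly smaller for the same scores.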
The SD and mean are used to create Z-scores: a way to locate a score in a distribution of scores by
determining the number of SDs it is above or below the mean (arithmetic average). This allows R to
compare two or more distributions or groups.
Z-score example: the mean GPA at Cartier U. is 2.62 with a SD of 0.50, whereas at Hudson U. the mean GPA is
3.24 with a SD of 0.40. To get the z-score, you take an individual's GPA, subtract the mean from it,
then divide by the SD. One Cartier U. student's GPA is 3.62, so 3.62-2.62=1 and 1/0.50=2,
whereas a Hudson U. student's GPA is 3.64, so 3.64-3.24=0.40 and 0.40/0.40=1. The Cartier student is 2 SDs above that school's mean and the Hudson student only 1 SD, even though their raw GPAs are nearly the same.