Biostats 2244 Textbook Notes
Chapter 1 – Introduction (Sampling and Study Design)
1-1 Overview
Data – Observations (measurements, genders, survey responses) that have been
collected
Statistics – A collection of methods for planning experiments, obtaining data and then
organizing, summarizing, analyzing, interpreting, presenting and drawing conclusions
based on data
Population – Complete collection of all elements (scores, people, measurements) to be
studied. The collection is complete in that it includes all subjects to be studied
Census – The collection of data from every member of the population
Sample – A subcollection of members selected from a population
Example: A poll asked 1000 adults a question. The 1000 survey subjects constitute a
sample, whereas the population would be all 202 million adult Americans. Every 10
years the government attempts a census of every citizen, but a complete count is
almost impossible.
Note: It is extremely important to obtain sample data that are representative of the
population from which the data are drawn.
1-2 Types of Data
Parameter – A measurement describing some characteristic of a population.
Example: If a freshwater pond is excavated, filled and stocked with 500
rainbow trout with a total weight of 2100 pounds, the average weight is 4.2
pounds. Since the 500 trout are the entire population, the number 4.2 is a parameter.
Statistic – A measurement describing some characteristic of a sample.
Example: In a sample of 877 executives it is found that 45% would not hire
someone with an error on their job application. 45% is a statistic since not every
executive was surveyed.
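The arithmetic behind the two examples above can be sketched in a few lines of Python. This is only an illustration using the figures from the notes; the variable names are made up for the sketch, and the respondent count is an assumption implied by the rounded 45% figure, not a number given in the source.

```python
# Parameter: the mean is computed from every member of the population
# (all 500 trout in the pond), so no sampling is involved.
total_weight_lb = 2100
trout_count = 500
mean_weight_lb = total_weight_lb / trout_count
print(f"Population mean weight (parameter): {mean_weight_lb:.1f} lb")

# Statistic: the 45% describes a sample of 877 executives, not all executives.
sample_size = 877
sample_proportion = 0.45

# Assumption: the notes do not give the respondent count. This is only the
# count implied by the rounded 45% figure.
implied_count = round(sample_size * sample_proportion)
print(f"Sample proportion (statistic): {sample_proportion:.0%} "
      f"(roughly {implied_count} of {sample_size} executives)")
```

The same calculation (a mean or a proportion) gives a parameter or a statistic depending only on whether the data cover the whole population or just a sample.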
Quantitative Data – Consists of numbers representing counts or measurements (height,
weight). It is important to use the appropriate units (dollars, hours, feet).
Discrete Data – Result when the number of possible values is either a finite
number or a countable number (the number of possible values is 0 or 1 or 2 and so on).
Example: Chickens can lay 0, 1, 2, etc. eggs. The count is always a whole
number; a chicken cannot lay 1.5 eggs.
Continuous (numerical) Data – Result from infinitely many possible values that
correspond to some continuous scale that covers a range of values without gaps,
interruptions or jumps.
Example: The amount of milk from cows is continuous because the amounts are
measurements that can assume any value over a continuous span. Between 1 and
2 gallons there are infinitely many possible yields (for example, 1.245 gallons).
Note: Grammar dictates we use “fewer” for discrete amounts and “less” for
continuous amounts
Qualitative Data – Data that can be separated into different non-numerical
categories (eye color, gender).
Nominal Level of Measurement – Characterized by data that consists of names, labels or
categories only. The data cannot be arranged in an ordering scheme (low to high)
Example: Survey responses of yes, no and undecided. Colors of pea pods (green,
yellow)
Ordinal Level of Measurement – Data that can be arranged in some order, but
differences between data values cannot be determined or are meaningless. Ordinal
data provide information about relative comparisons but not the magnitudes of the
differences.
Example: Course grades of A, B, C, D or F. There is an order but the differences
cannot be calculated.
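As a minimal sketch of what "ordered but without meaningful differences" looks like in practice, the Python below sorts and compares letter grades using an explicit ordering. The list of grades is hypothetical, and the rank numbers are only positions in the ordering, not quantities to subtract.

```python
# Explicit ordering of the grade categories, lowest to highest.
GRADE_ORDER = ["F", "D", "C", "B", "A"]
rank = {grade: position for position, grade in enumerate(GRADE_ORDER)}

# Hypothetical grades, made up for this illustration.
grades = ["B", "A", "F", "C", "A", "D"]

# Ordering is meaningful at the ordinal level: grades can be sorted and compared.
sorted_grades = sorted(grades, key=lambda grade: rank[grade])
a_is_higher_than_c = rank["A"] > rank["C"]

print(sorted_grades)
print("A ranks above C:", a_is_higher_than_c)

# Subtracting ranks (e.g. rank["A"] - rank["B"]) would produce a number, but it
# would not measure anything: the gap between A and B need not equal the gap
# between D and F.
```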
Interval Level of Measurement – Similar to the ordinal level but with the additional
property that the difference between any two data values is meaningful. Data at this
level do not have a natural zero starting point.
Example: Temperatures of 98.2°F and 98.6°F have a difference of 0.4°F, but there
is no natural starting point – i.e. 0°F does not represent "no heat".
Ratio Level of Measurement – The interval level but with the additional property that
there is a natural zero starting point. For values at this level, both differences and
ratios are meaningful.
Example: Weights of bald eagles. 0kg represents no weight and 4kg is twice as
heavy as 2kg.
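One way to see the contrast between the interval and ratio levels is to change units and check whether ratios survive. The sketch below is illustrative only: the helper names are hypothetical, and the 50°F and 25°F temperatures are arbitrary values picked for the demonstration. The eagle weights are the ones from the example above.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a Fahrenheit temperature to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def kilograms_to_pounds(weight_kg: float) -> float:
    """Convert kilograms to pounds (1 lb is defined as 0.45359237 kg)."""
    return weight_kg / 0.45359237


# Ratio level: weight has a true zero, so "twice as heavy" holds in any unit.
weight_ratio_kg = 4.0 / 2.0
weight_ratio_lb = kilograms_to_pounds(4.0) / kilograms_to_pounds(2.0)

# Interval level: temperature has an arbitrary zero, so the ratio depends on
# the unit and carries no physical meaning.
temp_ratio_f = 50.0 / 25.0
temp_ratio_c = fahrenheit_to_celsius(50.0) / fahrenheit_to_celsius(25.0)

print(f"Weight ratio: {weight_ratio_kg:.2f} in kg, {weight_ratio_lb:.2f} in lb")
print(f"Temperature ratio: {temp_ratio_f:.2f} in F, {temp_ratio_c:.2f} in C")
```

The weight ratio is 2 in both units, while the temperature ratio is 2 in Fahrenheit and negative in Celsius, because 25°F is below the Celsius zero.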
Note: The difference between the interval and ratio levels can be difficult to see.
25°F is not half as warm as 50°F, but age 4 is half of age 8.
1-3 Design of Experiments
Successful use of statistics typically requires more common sense than mathematical
expertise.
Voluntary Response Sample – One in which the respondents themselves decide whether
to be included. This method is flawed because it often happens that only people with
a strong interest or opinion respond, and thus the responses are not representative
of the whole population.
If sample data are not collected in an appropriate way, the data may be so
completely useless that no amount of statistical analysis can salvage them.
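The bias in a voluntary response sample can be sketched with simple arithmetic. Every number below is hypothetical and chosen only to make the effect visible: a population of 1000 people in which the minority with a strong opinion is far more likely to respond than the majority with a mild one.

```python
# All counts and response rates are hypothetical, made up for this illustration.
population = {
    "strongly opposed": {"count": 300, "response_rate": 0.50},
    "mildly in favor": {"count": 700, "response_rate": 0.05},
}

# True share of the whole population that is in favor.
total_people = sum(group["count"] for group in population.values())
true_share_in_favor = population["mildly in favor"]["count"] / total_people

# Expected number of people in each group who choose to respond.
respondents = {
    name: group["count"] * group["response_rate"]
    for name, group in population.items()
}
total_respondents = sum(respondents.values())

# Share in favor among those who responded.
observed_share_in_favor = respondents["mildly in favor"] / total_respondents

print(f"In favor, whole population: {true_share_in_favor:.1%}")
print(f"In favor, among respondents: {observed_share_in_favor:.1%}")
```

Under these assumed rates, 70% of the population is in favor but only about 19% of the respondents are. Collecting more voluntary responses would not correct the gap, because the problem lies in who chooses to respond.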
Observational Study – We observe and measure specific characteristics but we don’t
attempt to modify the subjects being studied
Cross-Sectional Study – Data are observed, measured and collected at one point
in time.