PSY3307 List of Psychometric Terms Midterm Fall 2013.docx

Department: Psychology
Course: PSY3307
Professor: Darcy Santor
List of Psychometric Terms

Measurement: definition, 2 goals
• Definition: measurement is the assignment of numbers to individuals in a systematic way. It is a way of representing the properties of individuals.
• Formal definition: measurement is the assignment of numbers to aspects of objects or events according to one or another rule or convention (Stevens, 1968).
• Goals of measurement:
  o 1) Controls information
     The questions that are included
      • What you pay attention to, the content domain, differing degrees of control (1 question vs. 250 questions)
  o 2) Streamlines decision making
     Numbers streamline decision making
     Cut-off points, how you add items up
     Yes/no questions streamline decision making
• How does measurement affect the performance of a test?

Scale performance
• What you decide about a person will be determined by the performance of the items and the overall test
• How do we decide that a test is performing well? How do we define performance?
• Ex: What would the distribution of scores on a midterm need to look like to conclude that the scale has performed well?
  o You want a distribution with variability – you don't want everyone lumped together
  o The average could shift up while the distribution stays the same
  o But differences in learning wouldn't be captured if the test was designed for everyone to get 100
  o Mean and standard deviation are reasonable summaries and make sense
• How does scale affect the performance of a test?
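The two goals above – controlling information and streamlining decisions – can be illustrated with a toy sketch. The item names and cut-off are hypothetical, chosen only for illustration; they are not from any actual published scale:

```python
# Toy sketch of the two goals of measurement (hypothetical items and cut-off).

# Controlling information: the test designer decides which questions exist at all.
items = {
    "sad_mood": 1,        # 1 = yes, 0 = no
    "poor_sleep": 0,
    "low_energy": 1,
    "loss_of_interest": 1,
}

# Assigning numbers: each yes/no answer becomes a 0/1 score that can be summed.
total = sum(items.values())

# Streamlining decision making: a single cut-off turns the score into a decision.
CUTOFF = 3  # hypothetical threshold
decision = "flag for follow-up" if total >= CUTOFF else "no follow-up"

print(total, decision)  # 3 flag for follow-up
```

With four yes/no items there are only five possible totals, so the decision rule is trivial to apply – which is exactly the trade-off the notes describe: fewer questions mean less information but faster decisions.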
Performance
• The key question for psychometrics is to understand how the manner in which you define and operationalize a construct, the type of test employed, or the analytical model or statistical test you use influences the ability of the test to measure the construct of interest, with a group of individuals or even just a single individual
• Defining performance:
  o Valid
  o Reliable – internally consistent
  o Able to effectively discriminate among individuals (ex: who is depressed)

Measures of Performance
• Mean: x̄ = Σxᵢ / N
  o xᵢ = your score on the midterm, i = each person, N = number of people
• Variance: s² = Σ(xᵢ − x̄)² / N
  o Computing the average score, every person gets their score replaced by a difference (residual) score relative to the mean
     Tells you about the difference between the class and you – whether you're above or below average
  o Tells us how much people vary
  o What's the impact of the residual on the overall total?
     Sums of squares of deviations
     Variance over-represents the impact of outlying scores – squaring the residuals makes outliers count for much more
  o Large variability = accurate in estimating real differences
     You want a test with large variability
     A test that lumps everyone together tells you that differences in achievement have not been measured accurately
• Standard deviation: s = √(variance)
• Variability
  o Definition: the extent to which scores among a group of people differ
  o Variability, plus the extent to which the variability among scores can be predicted by the variability among scores from some other source, is the foundation of knowing anything
  o The life blood of research – variability is good!
  o How does variability affect the test scores?
• Median – the middle score
• Mode – the most frequent score
• Two ways to capture "variability": sum of absolute differences vs. sum of squared deviations
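The summary statistics above, and the contrast between summing absolute differences and summing squared deviations, can be sketched in Python. The midterm scores are made up for illustration, with one deliberate outlier:

```python
import math

# Hypothetical midterm scores; 98 is an outlier relative to the rest.
scores = [70, 72, 74, 76, 98]
N = len(scores)

mean = sum(scores) / N                      # x̄ = Σxᵢ / N
residuals = [x - mean for x in scores]      # each score relative to the mean

# Two ways to capture variability:
mean_abs_dev = sum(abs(r) for r in residuals) / N   # sum of absolute differences
variance = sum(r ** 2 for r in residuals) / N       # sum of squared deviations
sd = math.sqrt(variance)                            # standard deviation

print(mean, mean_abs_dev, variance, round(sd, 2))   # 78.0 8.0 104.0 10.2
```

The outlier's residual is 20; squaring it gives 400 out of a total sum of squares of 520 (about 77%), whereas it contributes only 20 of the 40 absolute deviations (50%). That is the over-representation of outliers the notes describe.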
Distribution of scores
• The normal distribution arises as an outcome of the central limit theorem, which states that under mild conditions the sum of a large number of random variables is distributed approximately normally
• The bell shape of the normal distribution makes it a good choice for modelling a variety of random variables in practice
• The normal distribution is also known as the Gaussian distribution
  o After Johann Carl Friedrich Gauss
• Almost anything will produce a normal distribution

Deviation

Aptitude test
• Myers-Briggs Type Indicator (MBTI)
• Measures psychological preferences in how people perceive the world and make decisions
• What factors will affect scores? How will that influence how we make decisions and control information?
  o What will affect the ability to control info?
     Types of questions asked, number of questions
  o How does this affect the ability to make decisions?
     Fewer options = streamlined decision making – there are 4 main categories
      • Tells you about your preferences; it's unambiguous

Diagnostic test
• Structured Clinical Interview for DSM-IV (SCID-I) Axis I disorders
• Diagnostic exam to determine major mental disorders and personality disorders
• Structured interview – you have to go through every question
• How does this control information?
  o Contrasted with a self-report measure of depression
  o Very structured – not a simple yes or no
  o The level of control is at the clinician level rather than the patient – the clinician controls whether the person is diagnosed or not
  o How structured interviews exert control:
     1) Control over the clinician – it is structured, you have to ask in a certain way; the test imposes control
      • vs. a family doctor, who can ask any questions, some or all
     2) Control over how you add up the information from each question
      • Diagnostic decision trees
        o 5/9 yes and the rule book says you're depressed, whereas a family doctor might diagnose at 2/9
     3) The judgement lies with the clinician
      • The clinician decides whether an answer is meaningful
      • Compared to self-report, where you have to accept it as fact
     4) The training the clinician goes through
     5) The type of questions
      • Open vs. closed format
      • Semi-structured = you get to ask until you're satisfied with the answer
• How do structured interviews affect decision making?
  o Decision trees – follow them
  o Tries to take out all the variability that comes with someone asking questions however they like
• Clinicians can still be biased

Ability tests
• 1) The Graduate Record Examination (GRE) – standardized test used as an admission requirement by many grad schools
  o The test is normed
     The absolute score doesn't matter; what matters is how well you do compared to everyone else
      • Works to your advantage in general knowledge in psychology, but not with math and reasoning
     Norming can control information
      • If the raw scores have a mean lower than 50 you can norm them to spread them out, shift them up, or both
     Norming facilitates decision making
      • Tells you how much above or below the average someone is
      • Accommodates vast variability across samples
        o Relative to your year, your sample – how well you did relative to everyone else
• 2) Beck Depression Inventory (BDI) – self-report, multiple choice
  o The amount of something you have is important
  o Self-report affects variability
     What will affect your score?
      • Participant factors – demand characteristics – you will answer what you think they want you to
      • Types of questions – personal
        o Content, the way it's asked, the scale of the question
  o How does it control information? How does it facilitate decision making?
     A score between 0–63 on the BDI facilitates decision making
     The cut-off score, or threshold, facilitates decision making

Personality tests
• Revised NEO Personality Inventory (NEO PI-R): 5-factor model – Extraversion, Agreeableness, Conscientiousness, Neuroticism, Openness
• What information are you controlling to make the decision, and what cut-off score are you using to decide whether they have that personality trait or not?

Normative tests
• WISC – Wechsler Intelligence Scale for Children – generates an IQ score
  o Standard scoring – rank order of test content, median score 100, SD of 15
  o The Flynn effect
     Performance on IQ tests peaked in the late 1990s and has been declining moderately since
     But IQ doesn't fall – the questions should be re-normed to make them more applicable to today
  o Designed to locate you relative to others in terms of your innate IQ
  o Not something that can be acquired
• How does it control information and streamline decision making?

Projective tests
• Rorschach Test – Rorschach Inkblot Test
  o Perceptions of inkblots are recorded and analyzed using psychological interpretation and algorithms
  o Personality and emotional functioning
• What will increase variability? How will this control information and facilitate decision making?
• Factors that control or affect variability:
  o Online administration?
  o How long you show the card
  o The response to each card plus the sequence of the cards
  o How it is administered
  o Personal experiences
  o How it is scored
     How is the scoring controlled?
• Clinician-decided – training is involved, and there are books that categorize responses
  o Affects responses – you can't lie; minimizes demand characteristics – you can't consciously manipulate your answer since you don't know the right answer
• Main problem – the mechanism isn't clear with this test
• Advantages compared to forced choice:
  o Not constrained
  o The content domain is open
  o Can't miss anything; there's no limit, people can say anything
• Designed to overcome the shortcomings of self-report questionnaires

Content domain
• If your questions don't cover the entire content domain, a person's score isn't a proper measure and they won't do well on the test
• Do the questions cover the entire content domain that they should, or are they missing something?
• You want the entire content domain covered

Self-report tests

Sources of variability

Systematic variance

Unsystematic variance

Participant factors/demand characteristics

Ethical issues

Psychometric issues

Measures of performance for screening tests
• Sensitivity, specificity, positive predictive value

Sensitivity
• Sensitivity = (Designated Cases) / (True Cases)
• Identifying everybody who has the disorder
• Perfect sensitivity = if 100 people have the disorder and you pick up all 100
• The percentage of people who test positive for a specific disease among a group of people who have the disease
• The number of people with the disorder who are identified
  o How many people with the disorder are picked up by the test?

Positive predictive value
• Positive predictive value = (True Cases) / (Designated Cases)
• The proportion of subjects with a positive test result who actually have the disease
• Perfect positive predictive value: everyone who is designated as having it actually has it
• Of everyone who was designated as depressed, how many of them were actually depressed?
• Out of the people who are designated as having it, the number of people who actually have it

Face validity
• An estimate of whether a test appears to measure a certain criterion
• Whether it looks like it's measuring what it's supposed to measure
• Face validity does not guarantee that the test actually measures phenomena in that domain. Indeed, when a test is subject to faking (malingering), low face validity might make the test more valid.

Content validity
• Content validity is a non-statistical type of validity that involves "the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured" (Anastasi & Urbina, 1997, p. 114).
• For example, does an IQ questionnaire have items covering all areas of intelligence discussed in the scientific literature?

Structural validity
• Evaluates whether a test measures one or more distinct underlying components of the construct
• Ex: lack of happiness vs. depressive symptoms
• Depression tests should measure one underlying idea, whereas personality tests should measure multiple underlying ideas
  o Uni- vs. multidimensional

Construct validity
• The extent to which the test actually measures what it's supposed to measure
• Ex: to what extent is an IQ test actually measuring intelligence?
• Involves the empirical and theoretical support for the underlying idea, ability, condition, etc. that is being assessed
• Evidence includes the statistical analysis of the internal structure of the test
  o Including the relationships between responses to different test items
  o The relationship between the test and measures of other constructs

Criterion validity
• The correlation between the test and a criterion variable (or variables) taken as representative of the construct
  o Ex: a diagnosis, or another test score considered to be valid
• Compares the test with other measures or outcomes (the criteria) already held to be valid
• Ex: employee selection tests
  o Often validated against measures of job performance (the criterion); IQ tests are often validated against measures of academic performance (the criterion)

Incremental validity
• How well one test performs above and beyond another test
• Does the CES-D add anything to evaluating whether a person is depressed, above and beyond what is obtained by using the BDI?

Convergent validity
• The degree to which a measure is correlated with other measures that it is theoretically predicted to correlate with. Convergent validity refers to a convergence among methods of measuring the same thing (e.g., the correlation between self-report and interview measures of anxiety should be high)
• When evaluating a set of measures, correlate them – how well do they match up with other measures of the same construct?
• Two things that are supposed to measure the same thing are correlated = good convergent validity

Divergent/discriminant validity
• The degree to which the operationalization does not correlate with other operationalizations that it theoretically should not be correlated with.
• Divergent validity refers to the distinctiveness of the constructs, demonstrated by the divergence of methods designed to measure different things (e.g., the correlation between self-report measures of anxiety and depression should be low)
• Two things that measure different things ought not to be related
• Two things that are supposed to measure different things are not correlated = good divergent validity
• If two things that are supposed to measure different things are correlated, is this good convergent or poor divergent validity?
  o How can you tell whether it's good convergent or poor divergent?
     If you expect depression and anxiety to be different constructs, then it's poor divergent validity
     You have to answer this with respect to what you expect ought to be the case
     If your theory said depression and anxiety were not different constructs, you would expect them to correlate, and it would be good convergent validity
     You have to look at the hypothesis

Prospective validity
• Predicts something in the future that you'd expect it to predict
• Ex: the midterm should show good prospective validity with the final

Assumptions
• All things stated or unstated about how the scale works
• Ex: what are some assumptions behind DSM-IV measures of depression?
  o Assumes that differences in severity are not important – you either have it or you don't
  o You need all the symptoms listed

Operations/indicators/items
• How many questions/items are used? Could range from 1 item to hundreds
• Operation: the way in which a construct is "operationalized" into a set of questions

Continuous/dichotomous
• Continuous: there's a range of scores
• Dichotomous: yes or no, 2 answers – you either have it or you don't

Cut scores
• The point at which you will designate or categorize someone as something
• Ex: the BDI has 3 different cut scores: mild, moderate, severe
• Which of the two main goals does a cut score help with?
  o It helps with decision making!
     Ex: if you score 49 you're not neurotic; if you score 50 you are

Number of options
• If the number of options changes from one question to the next, this can impact performance
  o If depression gets scored 1–5 and sleep gets scored 1–3, there will be less variability in the sleep scores, which means you will lose information
     Implication: you are overemphasizing mood compared to sleep, assigning more weight to mood than to sleep
     You'll be rated as less severely depressed because the sleep item has fewer options

Response anchors
• Likert or graded response?

Scale: interval, ratio, ordinal, nominal
• Ratio: has a true zero – the zero means an absence
• Interval: doesn't have a true zero – zero means something

Dimensionality: unidimensional vs. multidimensional
• To know whether a test is unidimensional or multidimensional, ask yourself whether a single total score makes sense
• What's good about multidimensional? You can cover a lot of ground
• What's bad about multidimensional? It makes decision making harder

Graded response
• Every option is more severe than the next
• Every option has a unique descriptor
• Ex: A University of Ottawa education is …
  o A) Not worth very much
  o B) Worthwhile
  o C) Very valuable
  o D) Indispensable

The Leap of Faith
• The single most important assumption we make in measuring human abilities, conditions, or personality characteristics occurs when we assign or impose numbers on individual responses to questions.
• Whether the interval between two responses to a single question, or the interval between scores from two different individuals, is truly an interval remains largely unexamined
• It is the single biggest leap of faith we make in measuring people
• Saying that the differences between 0–1, 1–2, 2–3, etc. are the same
  o Saying that one person is twice as depressed as another

Quincunx
• Pachinko-like device, aka bean machine or Galton box, used to demonstrate the law of error and the normal distribution.
• Demonstrates the central limit theorem – the binomial distribution approximates the normal distribution
• Invented by Francis Galton

Equations
• Tell us how scores deviate from the mean

Correlation

Correlation coefficients

Correlation table/matrix

Types of correlation
• Pearson product-moment correlation, Spearman's rank correlation coefficient, point-biserial coefficient, phi coefficient, measures of association, biserial correlation coefficient, tetrachoric correlation coefficient, rank-biserial correlation coefficient, coefficient of nonlinear relationship (eta)
• Know which type of correlation is used with different types of data:

  Y \ X             | Interval/Ratio X     | Ordinal X                        | Nominal X
  Interval/Ratio Y  | Pearson r            | Biserial r_b                     | Point-biserial r_pb
  Ordinal Y         | Biserial r_b         | Spearman rho / Tetrachoric r_tet | Rank-biserial r_rb
  Nominal Y         | Point-biserial r_pb  | Rank-biserial r_rb               | Phi, L, C, Lambda

Homoscedasticity

Heteroscedasticity

Effect size: factors affecting effect size

Measurement model
• How we communicate how we think the scale is structured
• Theoretical model: we all have some idea or theory about everything, which is represented in diagrams as a model
• Beck's model or "theory" of depression is different from Hamilton's
• Models greatly influence how scales are created: what aspects are included and which aren't, how responses are obtained (self-report vs. clinician-rated), and how those items are scored
• You never know which model is correct
  o You can work out which one better explains the construct, is more useful, and is more correct
  o One will better explain how symptoms are related to each other
• Data = model + error
  o We are modelling data, not merely reporting it
• Error represents a measure of fit – how well a model fits the data
• Models that fit well explain more of the variance in the data – they will explain why people have different scores

Formal definition

Operational definition
• Taking a definition and turning it into questions or things to observe – how you actually measure things

Conceptual definition

True score
• Observed Score (X) = True Score (T) + Error (E)
• We cannot see any construct directly
  o We can't see depression, shyness, etc.
• We make inferences about the construct from the observed score
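The true-score model above can be sketched as a small simulation. The true score and the error spread are hypothetical values chosen for illustration:

```python
import random

# Sketch of the classical true-score model: X = T + E.
random.seed(1)  # fixed seed so the run is reproducible

T = 25.0         # the (unobservable) true score
error_sd = 3.0   # spread of the random measurement error

# Each administration yields an observed score: the true score plus error.
observed = [T + random.gauss(0, error_sd) for _ in range(10_000)]

# We never see T directly; we infer it. Averaging many observations lets the
# random errors cancel out, so the mean observed score approaches T.
estimate = sum(observed) / len(observed)
print(round(estimate, 1))  # close to 25.0
```

Any single observed score can miss T by several points; it is only across repeated (or many parallel) measurements that the error averages away, which is why reliability matters for interpreting a single test score.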