EPI Qual Course Notes.pdf

Course Code
Epidemiology EPI202-01
Murray Mittleman

EPI Qual Course Notes/Questions

EPI 202 Notes

Second pass

 Risk = individual; CI = average.
 Can estimate risk directly when follow-up starts at the same time, there is little loss to follow-up, and no death from competing risks.
 Risk is a monotonic measure.
 The number at risk at baseline is difficult to determine with staggered entry.
 Need to assume uninformative censoring in order for the incidence rate / hazard rate to solve the problems of variable entry, loss to follow-up, and competing risks.
 CI ≈ I·Δt is wrong b/c even with no loss to follow-up or competing risks, the # of people at risk decreases, so it will overestimate the # of cases.
 Cross-sectional studies can't distinguish between duration and incidence.
 When there is homogeneity of a measure of effect on the ratio scale, there is usually EMM on the difference scale, and vice versa.
 AR%: fraction of the disease among the exposed attributable to the exposure.
 PAR%: the excess or deficit of risk that would occur in the population if the exposure were removed from the population.
 To check for confounding, treat the confounder as the "exposure" and: 1) check if the confounder is associated with the outcome among the unexposed and calculate the OR; 2) check if the confounder is associated with the exposure in the study base (i.e., controls) and calculate the prevalence ratio.
 Closed cohorts only need one common time scale on which the membership event is defined.
 Risks cannot be directly measured in an open cohort.
 Etiologically relevant exposure depends on time (induction period).
  o The average induction time can be evaluated empirically based on where the RR peaks, OR by using an indicator variable (per 2006 exam, Q4).
 A person-time grid can be used to represent intensity of exposure (measured as average, maximum, lagged, etc.).
 Controls are a sample of the person-time that gave rise to the cases (the study base). They give an estimate of the odds of exposure in the person-time at risk.
  o Controls are a direct random (or conditionally random, i.e., matched) sample of the study base.
  o Exception to the at-risk assumption (i.e., in the study-base pool): sampling by proxy (e.g., blood type and nuns  STDs).
 Sampling fraction
  o If the controls are sampled independently of exposure, the same fraction is taken from the exposed and unexposed person-time pools (i.e., the controls represent the exposure person-time in the study base).
  o If sampled, cases must also be sampled independently of exposure.
 Density sampling: unconditional logistic regression.
  o Risk-set sampling: conditional logistic regression.
 Closed cohort with complete follow-up: 2x2 tables, linear regression (continuous), logistic regression (dichotomous), Poisson if person-time data are available.
 Cumulative incidence sampling: unconditional logistic regression.
  o If CI would have been used in a cohort study, then this sampling can be used.
 Case-cohort sampling: Cox, with variance adjustment.
  o With risk-set matching: CLR, Cox.
 Case-crossover: conditional logistic regression (each individual = stratum).
 Do Cox and CLR always need dichotomous outcome variables? What happens if we have person-time data and want to model a continuous outcome?
 In a cohort study, the exposed group is matched to the unexposed group on the matching factor(s).
  o Crude is unbiased, but adjusted is more precise.
  o Effects of matching factors can be evaluated.
  o Effect modification by matching factors can be evaluated.
 In a case-control study, the case group is matched to the control (referent) group on the matching factor(s).
  o Crude is biased toward the null if the matching factor is a true confounder; must adjust.
  o The effect of the matching factor cannot be evaluated.
  o Effect modification by the matching factor can be evaluated.
  o Matching can improve precision (efficiency), but it is not, in itself, a measure to achieve (or improve) validity  stratification is!
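The density-sampling point above can be checked with arithmetic: if controls are drawn as the same fraction of the exposed and unexposed person-time, the case-control exposure odds ratio equals the cohort rate ratio. A minimal sketch; all person-time and case counts are hypothetical:

```python
# Hypothetical full-cohort data: person-years and cases by exposure.
PT1, PT0 = 10_000.0, 40_000.0   # exposed / unexposed person-years
A1, A0 = 50, 100                # exposed / unexposed cases

true_irr = (A1 / PT1) / (A0 / PT0)   # incidence rate ratio in the cohort

# Density sampling: controls are the same fraction f of person-time in
# each exposure group (i.e., sampled independently of exposure).
f = 0.001
B1, B0 = f * PT1, f * PT0            # expected exposed / unexposed controls

odds_ratio = (A1 / A0) / (B1 / B0)   # case-control exposure odds ratio
print(true_irr, odds_ratio)          # the two coincide
```

The sampling fraction f cancels out of the odds ratio, which is why controls only need to represent exposure in the study-base person-time.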
 o Analysis:
   Appropriate matching: stratified analyses valid, but matched more efficient.
   Unnecessary matching (matching factor uncorrelated with exposure): all analyses valid, but you lose power with stratified analyses (with a binary outcome).
   Overmatching (matching factor uncorrelated with disease) – BAD: the unstratified, unmatched analysis is valid and most precise. Stratified matched and unmatched analyses are also valid, but less precise.
   Matching on an intermediate – really, really bad: only the unstratified, unmatched analysis is valid.
 Confounding and matching should be looked upon conditionally on the other confounding factors that are already accounted for.
 P-value: the probability that a result as extreme or more extreme than the one we observed would occur due to chance variation, if the null were true.
 Hypothesis tests don't give info on magnitude/direction/range/power.
 The chi-square test is always two-tailed! It only gives info about the consistency of the data with the null and provides no info on the consistency of the data with alternative states of nature (doesn't take the alternative into account  consider Bayesian methods or confidence intervals!).
 If the data-based CI excludes the null, then the results are statistically significant. If the CI does not exclude the null, the result is inconclusive (it may still be significant).
  o i.e., when the 100(1−α)% CI does not include E(X|H0), we may reject the null and conclude that our findings are significant at the specified α-level.
 The variance of the binomial distribution reaches its maximum when the binomial proportion = 0.5.
 The p-value changes not only as a function of overall sample size (N1 + N0), but also as a function of other features of the study design, such as balance: N1 / (N1 + N0).
 Source of random variability in an RCT: random treatment assignment  unmeasured confounding. Sources of random variability in an observational study: a) sampling of the conditional relation, i.e., sampling unrepresentative individuals from the superpopulation; b) randomization by nature, i.e., after control for confounding, the exposure can be regarded as being randomly assigned by nature.
 Interpreting a rate difference of 1.85/1000 PY: these data indicate an excess of 1.85 CHD deaths per 1000 PY due to smoking among British male doctors.
 Interpreting a CI: these data are consistent with IRDs ranging from 1.2-2.5/1000 PY with 95% confidence, assuming no confounding, selection bias, or information bias.
 Interpreting a ratio: there is a 2.5-fold increase in CHD risk/rate/odds from high CAT levels in these data.
 The independent relationship between the confounder and the disease is an intrinsic feature of disease biology that cannot be altered by study design.
 The relationship between the confounder and the exposure of interest is a feature of the particular population base chosen for the study and of the specific individuals who are sampled. This association can be altered at the study design stage by:
  o Matching
  o Restriction
  o Selection of a population base that is not characterized by this confounder-exposure association
  o Randomization
 The maximum extent to which the crude relative risk can be attributed to the effects of a confounding variable is limited by the minimum of:
  o The relative risk of the outcome (and exposure) from that confounding variable
  o The ratio of the prevalence of the confounder in the exposed group to its prevalence in the unexposed group
  o Prevalence of the confounder of 50%  maximum bias
 All estimates, regardless of the weights, are unbiased estimates under the assumption (critical for this analysis) of no effect modification.
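The bound on confounding above can be illustrated numerically. The sketch below uses the standard bias-factor formula for a binary confounder under no effect modification; all inputs (rr_cd, p1, p0) are hypothetical:

```python
def bias_factor(rr_cd, p1, p0):
    """Confounding risk ratio (crude RR / adjusted RR) for a binary
    confounder with outcome risk ratio rr_cd and prevalence p1 among
    the exposed, p0 among the unexposed."""
    return (p1 * (rr_cd - 1) + 1) / (p0 * (rr_cd - 1) + 1)

b = bias_factor(rr_cd=3.0, p1=0.6, p0=0.2)
print(round(b, 3))

# The bias never exceeds either limit named in the notes:
assert b <= 3.0          # confounder-outcome relative risk
assert b <= 0.6 / 0.2    # prevalence ratio of the confounder
```

As rr_cd grows the bias factor approaches the prevalence ratio p1/p0, which is why the smaller of the two limits caps the bias.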
 Don't use Miettinen's test-based CI for MH summaries (it gives CIs that are too narrow b/c the variance under the null is smaller than the variance under the observed data).
  o It becomes increasingly biased as the point estimate departs from the null.
 Summary ratio measures:
  o Equal weights: ignores that some strata are more informative than others.
  o Weights proportionate to sample size: doesn't take the balance of the data into account.
  o Inverse-variance weights: when data are sparse, the weight goes to 0 (b/c of a 1/0 in the denominator) and all information in the stratum is lost; most efficient with large samples.
  o MH weights: appropriate for both sparse and large data.
 Test of heterogeneity:
  o H0: OR is the same across all l levels of the stratification variable(s).
  o HA: OR is not the same across all l levels of the stratification variable(s) (i.e., ORi ≠ ORj for some i, j).
  o Failure to reject the null may imply insufficient power and NOT homogeneity.
  o When the null is rejected, it is best to report separate results or a summary from weights that do not reflect arbitrary features of the study design (i.e., standardization with appropriate population-based weights).
 Effect modification is not the only reason for stratum-to-stratum variation in effect estimates. Selection bias, information bias, confounding, and chance may also produce variation. In the case of chance, the test of heterogeneity helps assess whether random sampling variation can explain the difference.
 The MH OR disguises effect modification b/c this average is not taken over any recognizable age distribution representing a population to which we may wish to generalize the results, but over the distribution of within-stratum variances (which reflects arbitrary features of the study design) observed in this particular study.
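The MH weighting scheme above can be made concrete for a person-time rate ratio. A minimal sketch with two hypothetical strata; note that a stratum with zero cases would still enter the sums without producing a division by zero (unlike inverse-variance weights):

```python
# Hypothetical strata: (exposed cases, exposed PY, unexposed cases, unexposed PY)
strata = [(30, 1000.0, 10, 1000.0),
          (5, 200.0, 4, 500.0)]

num = den = 0.0
for a1, t1, a0, t0 in strata:
    T = t1 + t0
    num += a1 * t0 / T   # Mantel-Haenszel weights: no 1/variance terms,
    den += a0 * t1 / T   # so sparse strata still contribute information

irr_mh = num / den       # MH summary incidence rate ratio
print(round(irr_mh, 2))
```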
 An effect modifier may be a risk factor for disease only in the presence of the exposure (not necessarily independent of exposure like a confounder).
 Confounding exists independent of scale.
 Matching is a form of stratification (i.e., in a matched study, a matched set is exactly equivalent to a stratum in an unmatched stratified analysis).
 Use a matched analysis when the "level" of the potential confounders is unique to each case in the study, or nearly so; one must match in order to obtain appropriate controls. Use a stratified analysis when each matched set represents a level of the potential confounders that is observed repeatedly during the study.
 Case-control: match for efficiency, NOT validity.
 Analyze matched case-control data with a matched-pair table!!
 Crude analysis of matched case-control data generally biases the estimate of the OR toward the null.
 Standardization weights are chosen based on the distribution of the effect modifier in the population of interest to you!
 Standardization is computationally identical to stratified analysis methods, just with different weights derived from the population.
 Direct: proportion of cases expected among the unexposed, using the stratum-specific rates observed among the exposed (comparison: [unexposed, had they been exposed] / [unexposed as observed]).
 Indirect: proportion of cases expected among the exposed, using the stratum-specific rates observed among the unexposed (comparison: [exposed as observed] / [exposed, had they been unexposed])  simplifies to O / E!!
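The indirect standardization above reduces to observed-over-expected. A minimal sketch with hypothetical age-stratified counts, computing the SMR as O/E:

```python
# Hypothetical strata: (exposed cases, exposed PY, unexposed cases, unexposed PY)
strata = [(8, 1000.0, 10, 5000.0),    # younger stratum
          (30, 500.0, 60, 3000.0)]    # older stratum

observed = sum(a1 for a1, _, _, _ in strata)

# Expected cases among the exposed if they had experienced the
# stratum-specific rates of the unexposed (weights = exposed person-time).
expected = sum((a0 / t0) * t1 for _, t1, a0, t0 in strata)

smr = observed / expected
print(round(smr, 2))
```

Only the total observed case count among the exposed is needed, which is the computational simplicity noted for the SMR below.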
 SMR advantages: simple to compute (only need the total # of cases; don't need to know who became ill); statistically efficient (uses stratum-specific rates in the unexposed, a bigger group  more stable estimate); counterfactual.
 Traditional approach:
  o The directly standardized rate among the exposed is the crude rate expected in the unexposed if the rates were as observed in the exposed.
  o The directly standardized rate among the unexposed is the crude rate in the unexposed.
 If no EMM, SMR = SRR = MH.
 Sampling in case-control studies estimates the relative pool of T1 to T2 (exposed and unexposed person-time).

Assignments

Exams

EPI 203 Notes

 Closed cohort: assumes exposure is unchanging during follow-up, unless you follow up at multiple time points. (Is the Nurses' Health Study open or closed?)
 For case-cohort (case-base) sampling from an open cohort (person-time), can use Cox with a variance adjustment: the standard hazard ratio estimate, with the variance accounting for the fact that the risk-set samples are not independent of one another.
 In a case-crossover study, can stratify on time of day and day of the week to control for potential cyclic confounders.
 In a matched nested case-control study, use CLR (stratified by time).
 In an open cohort, use Cox (but still, only risk sets with cases will be informative).
 In RCTs, the intervention is the ACT of telling a participant to take the treatment. Noncompliance = instructions not relayed properly.
 Exclude participants not at risk of the outcome (not in the study base).
 Power may only be useful for future studies (not the current one).
 Censoring is a form of missing-data problem. Ideally, both the birth and death dates of a subject are known, in which case the lifetime is known.
  o Right censoring: if it is known only that the date of death is after some date, this is called right censoring. Right censoring will occur for subjects whose birth date is known but who are still alive when they are lost to follow-up or when the study ends.
  o Left censoring: if a subject's lifetime is known to be less than a certain duration, the lifetime is said to be left-censored (i.e., died before end of follow-up = case). For a left-censored datum, we know the subject exists.
 "Community effects" may affect effect estimates (e.g., fumes from a factory reach surrounding areas).
 When you want everybody who can take a drug to take it (last treatment option), you want false positives for hypersensitivity to the drug to be as low as possible (i.e., 100% specificity), e.g., if the drug is lifesaving but hypersensitivity is not fatal.
 A rate is Poisson distributed: with increasing rate, increasing variance (b/c variance = mean).
 High-risk populations  greater # of cases  greater power.
 Risk-set analysis: the covariate distribution comes from the person-time at which events occurred. It is not representative of the entire study period.
 Beware of collinear covariates.
 Controls only have to represent the exposure distribution in the study base from which the cases arose (e.g., blood type  STDs: can use nuns as controls for young men, even though they are not at risk of the outcome).
 When the study base is difficult to define (e.g., a case-crossover study in which cases define a secondary base), can say the study base is the people who would have reported events (can't verify) OR all potential study people (with incomplete ascertainment of cases); the latter may be better.
 When there are no events arising from the person-time in a stratum, the stratum does not contribute to the rate ratio estimate by any technique that incorporates the stratification, such as the MH estimate or a stratified MLE of the hazard ratio (Cox).

Assignments

 Collecting time-varying covariates as non-time-varying will facilitate more practical data collection (collection at only one point).
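The Poisson point above (variance = mean) gives a quick approximate CI for a rate: treat the case count A as Poisson with Var(A) ≈ A. A minimal sketch with hypothetical counts:

```python
import math

cases, py = 40, 10_000.0
rate = cases / py

# Poisson: Var(count) = mean, so SE(count) ~ sqrt(count).
# Approximate Wald 95% CI for the rate (normal approximation):
lo = (cases - 1.96 * math.sqrt(cases)) / py
hi = (cases + 1.96 * math.sqrt(cases)) / py
print(rate, round(lo, 5), round(hi, 5))
```

Because variance grows with the mean, higher-rate (high-risk) populations give more cases and hence narrower relative CIs, which is the power point above.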
 In some situations, even if 1) the controls do not represent the study base with respect to age and 2) age is a confounding factor in the study base, controlling for age in the case-control analysis will yield a valid age-adjusted estimate of the effect of smoking on mortality IF the controls accurately reflect the exposure distribution of the study base within age strata (i.e., the distribution of smokers and non-smokers must be the same in each age stratum of the controls as in that age stratum of the source population).
  o If a factor is a confounder, then the controls must represent the exposure distribution of the study base from which the cases arose within each stratum of the confounder (after stratification).
  o Confounding is also a source of selection bias? When you don't stratify by age, you will have selection bias.
 At any PS, the expected proportion of subjects who are treated will be the same, no matter what covariate level gave rise to the score. Thus, treatment and covariate level are uncorrelated at any specific PS; therefore, the expected proportion of treated at each covariate level is the same as the expected proportion of untreated. So, among persons with a given PS, the distribution of the covariates X is on average the same among the treated and untreated. (Diagram in notes: Covariates  PS  Treatment  Outcome.)
  o Take-home: whether subjects got treatment is independent of covariates at a specific PS.
  o PS and treatment modeled separately:
    PS = linear combination of covariates
    Pr(treatment) = PS + linear combination of other variables
  o Conditional on PS, treatment and covariate levels are uncorrelated, so covariate levels aren't confounders.
  o At each PS, there is a balance of covariates between treated and untreated.
 SMR weighting has smaller variance than PS-matched data (b/c you use everyone). It is also not affected by further uncontrolled confounding attributable to imperfect matching.
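The PS balancing property above can be checked with expected counts. The sketch below is a toy example (all counts and probabilities hypothetical): two different covariate patterns share the same propensity score, and the covariate distribution comes out identical among the treated and the untreated:

```python
# covariate pattern -> (n subjects, Pr(treatment)); same PS = 0.4 for both
patterns = {"X=(0,1)": (100, 0.4), "X=(1,0)": (300, 0.4)}

# Expected counts of each covariate pattern among treated / untreated
treated = {k: n * p for k, (n, p) in patterns.items()}
untreated = {k: n * (1 - p) for k, (n, p) in patterns.items()}

def share(d, k):
    """Proportion of group d with covariate pattern k."""
    return d[k] / sum(d.values())

# Within this PS level, covariates are balanced across treatment groups.
print(share(treated, "X=(0,1)"), share(untreated, "X=(0,1)"))
```

Both proportions are 0.25: at a fixed PS, treatment carries no information about the covariate pattern, which is why covariates are not confounders conditional on the PS.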
*Jenn's notes*

Cross-sectional study: a type of point-in-time survey where all subjects are survivors with characteristics that are all functions of past history/events; the date of the survey is a key operational element; disease present reflects cumulative incidence over the past; can compute prevalence, odds, or any functions of these.

Closed cohort with fixed follow-up: past time is reflected in the cohort baseline characteristics, and effects of time play out during observation.

Open cohorts: person-time categories summarize relevant history; person-time itself is the stage on which effects play out; person-time from different people is considered interchangeable, so it is possible to compare different people, or different periods of life in the same people.

Stratification by time: interested in the risk of an event at a particular moment; continuously measured person-time is unnecessary; only days with events enter the analysis; formation of risk sets; random sampling of risk sets will give the same results as a full cohort survival analysis; conditional logistic regression matched on date; no rare-disease assumption is needed because of the fine stratification of time, and the summary odds ratio across strata is a direct estimate of the rate ratio.

Stratification by person: used in the case-crossover design; only people with events enter the analysis; conditional logistic regression matched on person.

Controls: controls in a case-control study are a group of persons or person-days whose exposure status collectively provides information about the distribution of exposure in the persons or person-time giving rise to the cases.

Effect modification: a change in the biological effect of an exposure by the direct or indirect physical consequences of another factor.

Source population: the individuals from whom the study population is selected.

Standard: the set of weights used for standardization; weights sum to 1.

Study base: the person-time in the study population that gives rise to the cases.

Notes
1. In a cross-sectional study, subject selection does not need to be entirely at random. Just ensure that there is no differential selection due to both exposure and disease. Sometimes population representativeness will suffer in favor of efficiency and comparability.
2. When a study is looking at many outcomes, they will typically be related to one another, so power calculations for one expected effect can reasonably be used for the other outcomes too. These outcomes are probably manifestations of the same underlying process.
3. For clinical trials, you want to treat people in whom the average treatment effect would be beneficial.
4. Power cannot be used to interpret results. We use estimates and confidence intervals to interpret results. Power is computed for some study in the future.
5. Endogenous variables change over time as they interact with each other. Exogenous variables arise from outside the study (e.g., random number generators).
6. Noncompliance could bias effect estimates either way.
7. In a randomized study, residual confounding is more likely when the sample size is small.
8. The power of a study is the probability of rejecting the null hypothesis. When you increase sample size, the distributions for the confidence limits become narrower and closer to one another. However, we should not only consider the null hypothesis, because we could have an upper level of concern (i.e., some other relevant alternative).
9. If interested in group-level effects, compliance at the individual level may not be of concern. However, there could be community-level effects interacting with individual effects, which makes correction for compliance much more difficult, and possibly impossible.
10. In a prospective cohort study, where cohort sizes can become seriously imbalanced in later years, power calculations for the original research questions cannot be applied because the research question has changed over time.
11. We spend so much time studying rates, but with rapidly time-varying exposures, % measures are more clinically useful.
12. Person-time gives a good overall exposure distribution. Risk sets depend on event times and the exposure distribution at the time of assessment.
13. Person-time analysis uses volumes of experience (i.e., the time contributed by each person). By lumping person-time into some category, you assume that within that category the risks are homogeneous. Person-time analysis adjusts for time of follow-up per person.
14. With random risk-set sampling, you can extrapolate exposure-specific follow-up times based on the sampling fraction.
15. In a matched case-control study, an unmatched analysis will be biased (toward the null) when the matching factor has a strong association with exposure. Strata would have only small differences in exposure that get obscured, and you lose contrast.
16. Secondary study bases are created by case eligibility and exclusion criteria.
17. We technically stratify on time with risk-set sampling, but time is not a stratum itself.
18. Controls can clearly not be representative of the source population but still be useful.
19. Effects may be amplified in univariate analyses.
20. Matching redefines risk sets. Stratification of the cohort by matched factors essentially samples from a smaller risk set.
21. The case-crossover is a finely stratified cohort study. The case-crossover design is applicable when the postulated risk factor is common, to allow for ease of exposure recall, and transient, to allow for the ability to create a demarcation of exposure and so that the period of observation is related to the time over which you can reasonably gather information. The time-varying nature of exposure is related to how long we can collect information. We can look at exposure status in relation to risk in many time intervals.
22. Effect measures can be modified by differential loss of information in one group.
23. The PAR% formula can only be used in case-crossover studies if the exposed and unexposed periods correspond to the appropriate demarcations of time (i.e., the windows are correct).
24. Recall bias: in a case-control study, we are concerned with different people having different patterns of recall, but in a case-crossover study, we are concerned with one person having different patterns of recall (i.e., sleep deprivation & sharps injury).
25. Matching in a cohort study ensures that the prevalence of each matched factor is the same in the compared cohorts and that for any combination of covariates there will be at least one control with which to compare a case, and it takes care of any interactions between the matched factors.
26. If matching is impossible using propensity scores, maybe an observational study is not appropriate because the groups aren't comparable.
27. Propensity scores are often used when lots of information is available for starting the analysis. You should get the same results using propensity scores and ordinary covariates in a logistic regression. Propensity scores have the advantage of being just one variable, while there is a 10:1 event-covariate guideline for regression models.
28. Propensity scores should be estimated separately for cohort-accrual blocks when attempting to control for time-varying covariates. This results in a prospective study design that captures a sociologic phenomenon that might not apply to other places and times.
29. A factor of low prevalence affects only a small % of the population, but it can still have a high OR if strongly associated.
30. When thinking about possible confounding variables, think about the possibility of intermediate variables and possible mechanisms of action.
31. Dichotomization of variables is an attempt to limit time-varying characteristics.
32. Selection bias can be introduced when the sampling fractions for individuals in the base depend on an exposure variable in an unknown way. An analysis of the effects of other variables will be unbiased when the source of the dependence can be identified and handled in the analysis as if it were a confounder. Especially problematic are common effects in a case-control study.
33. If a factor among controls is not representative of the study base, a factor-adjusted effect estimate will still be valid if, within each stratum of that factor, the distribution of exposure among controls resembles the distribution in the source population. Thus, controls are representative of the exposure distribution in the study base within each factor stratum, IF we account for the stratification in our analyses.
34. Among persons with a given propensity score, the distribution of the covariates X is on average the same among the treated and untreated, because the propensity score is independent of the actual treatment. The propensity score is a summary score that gives the probability of treatment and does not take into account actual treatment. A propensity score is created from the joint distribution of X's.
35. The standardized rate in the denominator of the SMR represents the constructed crude rate for a hypothetical population that has a population distribution over covariate levels that is the same as the distribution of covariates among the exposed subjects only (i.e., the weights).
36. SMR-weighted estimators can have smaller confidence intervals than estimates arising from propensity matching because SMR-weighted estimators use all the information (i.e., bigger power because of a larger sample size with no loss of information).
37. Matching on risk set is necessary to adjust for time-varying covariates that cannot be controlled through randomization.
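The matched-pair table mentioned in these notes gives the matched OR directly from the discordant pairs, which is also why only discordant pairs contribute to conditional logistic regression. A minimal sketch with hypothetical pair counts:

```python
# Matched case-control pairs cross-classified by exposure (hypothetical):
#                   control exposed   control unexposed
# case exposed            40                 60
# case unexposed          20                 80
pairs = {("E", "E"): 40, ("E", "U"): 60, ("U", "E"): 20, ("U", "U"): 80}

# Concordant pairs (E,E) and (U,U) carry no contrast; the matched OR
# is the ratio of the two discordant cell counts.
or_matched = pairs[("E", "U")] / pairs[("U", "E")]
print(or_matched)
```

Collapsing the same pairs into a crude 2x2 table (100 exposed cases vs 60 exposed controls) would generally move the estimate toward the null when the matching factor is associated with exposure.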
EPI 204

*read papers in notes*
*summarize models*
*watch the modeling lectures*

Notes

Second pass

 When using Cox, the PH assumption is necessary because we assume no effect modification of the main effect (unless stratum-specific coefficients or an ixn term are specifically added).
 Sampling fraction means the fraction sampled from the exposed and nonexposed person-time.
 Do only discordant pairs contribute to conditional logistic regression? A: YES; the cross-product must not equal 0.
 Induction period: the time required for the effects of a specific exposure to become manifest.
  o Latency period: the portion of the induction period during which disease is present but unmanifest.
 Cohort effect: changes in disease frequency that are shared by all members of a group who entered follow-up at a common time.
 Inception cohort: the persons who are under observation at the beginning of an exposure that defines cohort membership.
 Survivor cohort: the persons who remain under observation at some point after the beginning of an exposure that defines cohort membership.
 Age effect: a change in disease incidence that is due to a biological concomitant of aging.
 Period effect: changes in disease frequency that are specific to a calendar time.
  o Commonly the result of secular changes in the definition of disease or diagnostic practices, or secular changes in exposure prevalence (with short induction periods).
 Immortal person-time: person-time prior to becoming eligible for a study (not part of the denominator of a rate calculation when observable person-time lies entirely outside the bounds of cohort membership), i.e., participation = had not died earlier.
 Models
  o Poisson: analysis of rates from cohort studies  baseline risks, excess relative or absolute risks.
    Parameter estimates for Poisson regression models are computed using MLE, with the log-likelihood that would arise if the event counts in the table were independent Poisson random variables.
    Models in which rates depend on parameters through a linear function (GLMs: linear regression) can be computed through least squares.
 Cupples paper: pooled logistic regression
  o Assumptions:
    The underlying risk of the outcome in each interval is the same.
    The relationship between risk factors and outcome is the same for every interval.
    Only the current risk profile is needed to predict the outcome.
  o Generalized person-years approach. Treats each observation interval (of equal length) as a mini follow-up study in which the current risk factor measurements are employed to predict an event of interest in the interval among persons free of the event at the beginning of the interval. Observations over multiple intervals are pooled into a single sample to predict the short-term risk of an event.
  o MODELS
    MH:
     Parameters: I = incidence rate with given covariates at time t; I0 = baseline incidence rate at time t, when covariates = 0; β1 = log incidence rate ratio for a one-unit change in X.
     Assumptions: fewest assumptions; proportional over all the age strata (i.e., no ixn of the main effect with any of the stratifying variables)  so can get a weighted average.
     Implicit model, if stratifying on t, Z, and L:
      o Crude: I = I0 e^(β1X)
      o Stratified 2x2 tables: I(t | X, Z, L) = I0ZL(t) e^(β1X)
      o I(t | X, Z, L) = I0(t) e^(β1X + β2Z + β3L + ixn terms between everything but the main effect…)
    Cox PH: Anderson-Gill
     Assumptions: PH assumption  I(t | X=1) / I(t | X=0) = e^(β1); stratified Cox allows different baseline hazards over time; at each time all betas are the same and proportional; other covariates don't interact (those without specific ixn terms / stratified betas).
     Crude: I = I0 e^(β1X)
     Time-adjusted: I(t | X, Z) = I0(t) e^(β1X + β2Z) (will be the same as crude if t and Z are not confounders)
     Stratified I: I(t | X, Z, S) = I0S(t) e^(β1X + β2Z) (will be the same as time-adjusted if there is no interaction between S and t, and S is not a confounder)
     Stratified II: I(t | X, Z, S) = I0S(t) e^(β1S·X + β2Z) (will be the same as stratified I if there is no EMM by S*X)
    Poisson: Anderson-Gill
     Assumptions: within each category (set of covariates), the rate belongs to everyone in that set.
     Standard: I(t | X, Z) = I0(t) e^(β1X + β2Z)
     Same as Cox, BUT NO stratification: just one giant pool of data!
     Baseline must be explicitly stated  can calculate absolute measures, but also more likely to be misspecified.
     Counts or person-time.
    Pooled logistic (stratified = conditional): Anderson-Gill
     Assumptions: the parametrization fulfills the beta-string; log odds related to exp / (1 + exp); the underlying risk of the outcome in each interval is the same; the relationship between risk factors and outcome is the same for every interval; only the current risk profile is needed to predict the outcome.
     NO stratification! Treat each interval as a cohort study and pool intervals when analyzing.
     Standard model: I(t | X, Z) = e^(β0 + β1X + β2Z) / (1 + e^(β0 + β1X + β2Z))
     Baseline must be specified; modeling logit[I(t)], which is linear in the covariates of the model, on the log of the odds of the incidence rate scale.
     Incidence rate can be directly estimated; inherently multiplicative.
     When disease is rare, logit[I(t)] ≈ log[I(t)].
    Conditional logistic
     Calculating within each set; betas the same across all sets; stratifying on time (if matched on age/calendar year).
     The CLR likelihood is algebraically identical to the Cox PH partial likelihood  the odds ratio estimated will approach the IRR estimated from Cox as the matching ratio goes to infinity (becomes a cohort!).
     Standard model: I_i(t) = I0i(t) e^(β1X), where i = qcpair (encompasses all matching factors).
     Parameters: I0i(t) = baseline incidence rate for subjects when covariates = 0, at the value of the matching factors corresponding to the matching factors of the case in qcpair.
      All possible cross-classifications of the matching factors are controlled for (fully saturated w.r.t. the matching factors).
      Unconditional logistic regression would not have the qcpair variable (i)  unstratified model.

Assignments

Exams
matching factors)  unconditional logistic regression would not have the qcpair variable (i)  unstratified model  Assignments  Exams 14 EPI 289 *read Hernan and Robins 2006 for more IPW info *read Sato paper 2008 Notes Second pass  For this class, we assume: 1) causal effect in the entire population, 2) dichotomous variables, 3) deterministic counterfactuals, 4) no interference  Pseudo-population created by IPW is unconditionally exchangeable (same individuals used to make exposed and unexposed).  Standardization = IPW (algebraically equivalent). Weighting is equivalent of simulating what would happen in the study population if everybody had received treatment a.  Null paradox: plug in standard models to estimate each component of the g-formula, but doesn’t work because even if causal null true, the effect estimate will not be null (no parameter for null hypothesis). Model misspecified before collection of data!  Can use time-varying weights with IPW. Inverse probability of having your observed treatment history through t given your L history through t.  Problem with RCT: loss to follow-up, noncompliance, unblinding, other  In an observational study, average causal effects can be calculated under the assumptions of consistency, postivity, and exchangeability conditional on the covariates (by IPW, standardization). 
 Conditions for causal inference from ideal randomized experiments:
o Consistency
 If you are treated, your counterfactual outcome under treatment is your observed outcome
 If you are not treated, your counterfactual outcome under no treatment is your observed outcome
o Positivity
 Some subjects receive treatment and some subjects receive no treatment
o Exchangeability
 If the treated had been untreated, they would have been expected to have the same average outcome as the untreated (AND the other way around for full exchangeability) → MCAR
 In each level of L if conditional randomization → MAR
 Commonly used methods to control for confounding (e.g. stratification, matching) only estimate effects in a subset of the population.
 The equivalent of pooling when using a regression model is to not include product (interaction) terms between A and L.
 Problems with stratification/causal interpretation of the conditional association measure:
o 1) heterogeneity (minor)
o 2) collapsibility (minor)
o 3) time-varying exposures (potentially major)
 If there is no heterogeneity, then each conditional risk ratio is equal to the IPW/standardized risk ratio. If there is heterogeneity, then each conditional risk ratio is different and pooling makes no sense.
 Matching yields conditional causal effects (i.e. the effect in the exposed: if each exposed subject is matched to one unexposed subject, then the matched sample will have the same distribution of risk factors as the exposed, not as the entire population).
 To compute the marginal effect in the population:
o IPW/standardization require conditional exchangeability of the exposed and the unexposed across the entire population
o Stratification further requires no heterogeneity.
 Selection bias is not an issue of generalizability → lack of exchangeability.
 Problem: detection bias under the null
o A: exogenous estrogens, Y: endometrial cancer, C: vaginal bleeding, Y′: EC diagnosis (mismeasured!)
o [DAG: A, C, Y, Y′ — with C → Y′ the detection arrow]
o Solution: screen women for endometrial cancer every 3–6 months whether they exhibit bleeding or not. This eliminates the arrow going from C → Y′, and we wouldn't have to stratify on C either, since it's a collider.
 D-separation: two variables are marginally or conditionally independent, depending on whether it was necessary to condition on any variables.
o A path is blocked if and only if it contains a noncollider that has been conditioned on, or it contains a collider that has not been conditioned on and has no descendants that have been conditioned on.
 DAGs cannot represent EMM (since DAGs represent population or individual causal effects): unfaithfulness.
o Can have cancellation of arrows (e.g. common-causes bias cancels out with common-effects bias → matching in a cohort study??).
 Identifiability: a causal effect can be identified if 1) no common causes (i.e. no backdoor path, no confounding → exchangeability holds), OR 2) common causes BUT enough data to block the backdoor paths, i.e. no unmeasured confounders.
 Confounder definitions: 1) collapsibility, 2) standard (derived from comparability), 3) causal ~ comparability.
o Problem with the collapsibility defn: the OR is noncollapsible (except at the null); may be looking at a common effect (even in randomized trials).
o Problem with the standard definition: M-bias.
 IV methods: consistently estimate causal effects in the absence of conditional exchangeability (with unmeasured confounding between A and Y). Require FOUR assumptions.
o (1)–(3) Definition of an instrument Z (non-negotiable):
 1) Z and A are associated (b/c of a causal effect – e.g. RCT – or a common cause – e.g. lactose intolerance gene)
 2) Z affects the outcome Y only through A – no direct effect (unverifiable)
 3) Z does not share common causes with Y (unverifiable)
o 4) AN INSTRUMENT IS NOT ENOUGH: for the IV estimator equality to hold, a fourth unverifiable condition must hold.
 NO INTERACTION:
 No between-subject heterogeneity: the effect of A → Y is the same for every individual (extreme no-interaction). Isn't this normally assumed when we calculate RRs? A: Yes, but must assume this to even use the IV. No alternatives (e.g. no standardization can adjust).
 No interaction between instrument and exposure (can assume just no additive interaction – one IV estimator; or no multiplicative – a different IV estimator). The effect of A → Y is the same within levels of Z (or U*) for treated AND untreated.
 MONOTONICITY is an alternative to the no-interaction assumptions. Defined for the individual (i.e. no defiers – there is no individual who would have taken A=0 if assigned Z=1 AND taken A=1 if assigned Z=0).
 Under monotonicity, the IV estimator estimates the average effect of A → Y only in the compliers (but we don't know who these people are! Can't distinguish them from always-takers or never-takers. Generalizability?).
o The intention-to-treat effect in the numerator of the standard IV estimator is inflated by noncompliance (measured in the denominator).
o Problems: 1) it is impossible to verify that Z is an instrument – a bad instrument may be more biased than the unadjusted estimate, 2) a weak instrument Z blows up the bias, 3) instruments are insufficient to estimate causal effects, 4) can't deal with time-varying exposures. Effects of exposure estimated by IV methods may be much larger than effects estimated by conventional adjustment methods (b/c the numerator is small).
o Does adjusting for confounders always get you closer to the truth (never overshoot the true effect)? Not necessarily, with a poorly measured confounder.
o Does an instrument only allow you to compute bounds? The standard IV estimator provides a point estimate, not bounds.
o Differential loss to follow-up can occur in observational and randomized studies.
o Volunteer bias (self-selection bias) cannot occur in an RCT (no association between treatment and selection).
o Healthy worker bias is another selection bias; it will occur in an RCT or observational study b/c the initial study design is just incorrect.
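The standard IV (Wald) estimator described above — the intention-to-treat contrast divided by the compliance difference — can be sketched on toy (Z, A, Y) records (all values invented for illustration):

```python
def iv_estimator(data):
    """Standard IV (Wald) estimator on (z, a, y) records:
    (E[Y|Z=1] - E[Y|Z=0]) / (E[A|Z=1] - E[A|Z=0])."""
    def mean(values):
        return sum(values) / len(values)
    y1 = mean([y for z, a, y in data if z == 1])
    y0 = mean([y for z, a, y in data if z == 0])
    a1 = mean([a for z, a, y in data if z == 1])
    a0 = mean([a for z, a, y in data if z == 0])
    return (y1 - y0) / (a1 - a0)

# Toy records: ITT effect 0.25, compliance difference 0.5,
# so the IV estimate inflates the ITT effect to 0.5.
toy = [(0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 1),
       (1, 1, 1), (1, 1, 1), (1, 0, 0), (1, 0, 0)]
estimate = iv_estimator(toy)
```

This makes the "ITT inflated by noncompliance" point concrete: the smaller the compliance difference in the denominator, the larger the IV estimate relative to the ITT effect.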
o Our example: would have to stratify on age (or the time scale) to eliminate this selection bias. This is time-varying — have to deal with that! [DAG: Vit D, HD, CC, Age]
o Selection bias can be solved by: 1) stratification, or 2) IPW. IPW always works; however, stratification requires that the conditioning variable CANNOT be a collider as well! IPW can simultaneously adjust for confounding and selection bias.
o Hazard at t = Pr[Y_t = 1 | Y_(t−1) = 0], i.e. hazard at t1 = Pr[Y1 = 1], hazard at t2 = Pr[Y2 = 1 | Y1 = 0].
o In RCTs and obs. studies there is a built-in selection bias structure; the data cannot distinguish between selection bias and EMM.
o If the HR reverses, strong indication of a survivor cohort.
o Measurement error: think 1) independence (i.e. the errors for the exposure and the outcome are independent), 2) nondifferentiality (i.e. the error for the exposure (outcome) is indep. of the true value of the outcome (exposure)).
o Measurement bias exists under any of the 4 types of error (can be non-conservative). Nondifferential and independent → no bias under the null.
o Sources of structural bias: 1) pre-existing: common causes, 2) study-related: selection bias, measurement bias.
o GLMs: 1) a functional form, 2) a statistical distribution (e.g. linear models: errors are assumed to be independent and normally distributed ~ N(0, σ²)).
o MSM: marginal = in the pseudo-population; structural = models counterfactuals → causal interpretation.
o Create the pseudo-population by IPW, then fit the model to the pseudo-population to model the counterfactual outcome variables (causal interpretation under cond'l exchangeability). The IP weights can be estimated by using models, too.
o Unstabilized weights don't work well when modeling (high weight to subjects with a low probability of receiving the exposure level that they received; estimators with large variance — only an issue when modeling).
o For IP weights, if you want to adjust for L, then L must not be in the numerator, only in the denominator.
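A minimal numeric sketch of the discrete-time hazard definition above (the cohort counts are made up): the hazard in each interval is events divided by those still at risk, and the cumulative risk follows by multiplying survival across intervals.

```python
def discrete_hazards(at_risk_events):
    """Hazard per interval = Pr[Y_t = 1 | Y_(t-1) = 0]:
    events among those still at risk. Input: list of (n_at_risk, n_events)."""
    return [events / n for n, events in at_risk_events]

# Hypothetical cohort: 1000 at risk in interval 1 (50 events),
# 950 still at risk in interval 2 (38 events).
hazards = discrete_hazards([(1000, 50), (950, 38)])

# Cumulative risk over both intervals = 1 - product of (1 - hazard_t)
surv = 1.0
for h in hazards:
    surv *= (1 - h)
cumulative_risk = 1 - surv
```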
o IPW/g-estimation MSM: 1) model the exposure given covariates (weights), 2) model the outcome given the exposure. The model is misspecified if either is incorrect.
o When a covariate is in both the numerator and the denominator of the IP weights, it is not adjusted for.

2009 Notes
 Two types of violations of positivity:
o Structural: subjects with certain confounder values cannot possibly be exposed
 Causal inferences cannot be made about subsets with structural nonpositivity
o Random: the sample is not infinite, so if you stratify on many confounders you start finding zero cells
 Use parametric models to smooth over the zeros (borrowing information from other subjects) → assuming random nonpositivity (MAR)
 IPW creates a pseudo-population with unconditional exchangeability
 Estimation of the parameters of MSMs (Robins 1998, Hernán 2000, BIO 223):
o Use a weighted regression model under the assumption of conditional exchangeability
o Parameter estimate for β1
o Use the robust variance to compute a conservative 95% CI
 Structural: models for counterfactual outcome variables; marginal: unconditional
 Time-varying exposures: effect estimates from conditional models may not have a causal interpretation, even under conditional exchangeability!
 Using IPW to estimate the parameters of an MSM is analogous to 1) building a PS model (weights), 2) plugging the PS into the model for the outcome!!
 IPW: individuals who are most underrepresented in the relative treatment assignments must be given proportionally higher weights – becomes unstable!
 To show relationships between SNMs:
o We can write conditional exchangeability as: Y^a ⊥ A | L=l for all a ⇒ Pr[A=1 | L=l, Y^a=1] = Pr[A=1 | L=l, Y^a=0] = Pr[A=1 | L=l]
o Thus, the probability of exposure does not depend on the value of the counterfactual outcome, conditional on covariates
 IF we had the counterfactual outcomes, we could fit a logistic model to check for conditional exchangeability:
o logit Pr[A=1 | L, Y^(a=0)] = α0 + α1·Y^(a=0) + α2·L
o check if α1 = 0!
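The IP-weighting points above can be illustrated with empirical weights on made-up (L, A) records: the unstabilized weights W = 1/Pr[A=a|L] create a pseudo-population of size 2N, while the stabilized weights SW = Pr[A=a]/Pr[A=a|L] average to 1, so the pseudo-population stays the same size as the study population.

```python
from collections import Counter

# Toy records of (L, A); probabilities estimated empirically (hypothetical data).
records = [(0, 0)] * 30 + [(0, 1)] * 10 + [(1, 0)] * 15 + [(1, 1)] * 45

n = len(records)
a_counts = Counter(a for _, a in records)   # marginal counts of A
l_counts = Counter(l for l, _ in records)   # counts of L
la_counts = Counter(records)                # joint counts of (L, A)

def weights(l, a):
    pr_a_given_l = la_counts[(l, a)] / l_counts[l]   # Pr[A=a | L=l]
    w = 1 / pr_a_given_l                             # unstabilized weight
    sw = (a_counts[a] / n) / pr_a_given_l            # stabilized: Pr[A=a] / Pr[A=a|L=l]
    return w, sw

sw_mean = sum(weights(l, a)[1] for l, a in records) / n   # averages to 1
w_sum = sum(weights(l, a)[0] for l, a in records)          # pseudo-population of size 2N
```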
 G-null test
o Test of the sharp null hypothesis: Y^(a=0) = Y^(a=1) for all subjects i
o Assume conditional exchangeability holds
o Test: if the sharp null hypothesis is true, then in logit Pr[A=1 | L, Y] = α0 + α1·Y + α2·L we have α1 = 0
o The model used for the g-null test is the same model used for estimating the denominator of the IP weights, plus Y
o Tested the null by regressing exposure on outcome, instead of the other way around → works for time-varying covariates
o IPW estimates the parameters of MSMs; g-estimation estimates the parameters of SNMs
 Can use g-estimation AND IPW simultaneously to adjust for confounding (g-estimation) AND selection bias (IPW)
o i.e., find the value of ψ that produces the value of α1 closest to 0 in the IP-weighted model logit Pr[A=1 | L, H(ψ)] = α0 + α1·H(ψ) + α2·L, using estimates of the weights SW*
 Causal effects with censoring: E[Y^(a=1, c=0)] − E[Y^(a=0, c=0)], i.e., the causal effect of exposure had nobody been censored
 If possible, use both methods separately and compare answers!

Summary: g-methods by regime (Modeling | Non-dynamic regime, non-time-varying | Non-dynamic regime, time-varying | Dynamic regime)

Assumptions
 Modeling: when consistency and conditional exchangeability fail to hold, IPW and the g-formula are still well defined, but have no causal interpretation. When positivity fails to hold for treatment level a, IPW remains well defined but has no causal interpretation, while the g-formula is undefined for treatment level a. When the exposure is unconditionally randomized, both the g-formula and the IPTW formula equal the crude association.
 Non-dynamic, non-time-varying: 1) Consistency: if A=a for a given subject, then Y^a = Y for that subject. 2) Conditional exchangeability: Y^a ⊥ A | L=l for each possible value a of A and l of L. 3) Positivity: there are treated and untreated in every stratum (think about this in terms of intractable confounding).
 Non-dynamic, time-varying: 1) Consistency: if Ā = ā for a given subject, then Y^ā = Y for that subject. 2) Conditional exchangeability: Y^ā ⊥ A(t) | Ā(t−1) = ā(t−1), L̄(t) = l̄(t) for all regimes ā and all l̄(t) (i.e., sequential randomization). 3) Positivity: f[a(t)] > 0 at all t.
 Dynamic: a random dynamic regime = a sequentially randomized experiment. Strengthened identifiability conditions: 1) strengthened consistency, 2) strengthened conditional exchangeability, 3) strengthened positivity. Under these conditions, g-methods can be used to estimate not only E[Y^g] but also the optimal (deterministic) treatment regime, which will always be non-random.

IPW / MSM
 Modeling: effectively simulates the data that would have been observed had, contrary to fact, exposure been unconditionally randomized. Both SW and W create pseudo-populations in which (i) the mean of Y^a is identical to that in the actual study population, but (ii) the exposure A is indep. of L, so there is no confounding in the pseudo-population. Parameters are estimated by calculating means in the pseudo-population (using IPW – i.e., weighted regression). Difference: with W, Pr(A=1) = 1/2 in the pseudo-population, while with SW, Pr(A=1) is as in the actual population (where association = causation). Steps: 1) model the IP weights (PS) using logistic regression; can also include covariates to evaluate effect modification by baseline covariates.
 Non-dynamic, non-time-varying: saturated MSM: E[Y^a] = β0 + β1·a.
 Non-dynamic, time-varying: MSMs for time-varying exposures are typically non-saturated. Saturated (two time points): E[Y^(a0,a1)] = β0 + β1·a0 + β2·a1 + β3·a0·a1; if treatment is assumed additive (non-saturated): E[Y^(a0,a1)] = β0 + β1·(a0 + a1). Weights: SW = ∏ f[A(k) | Ā(k−1)] / f[A(k) | Ā(k−1), L̄(k)]; W = ∏ 1 / f[A(k) | Ā(k−1), L̄(k)] (per-interval components: numerator p*0i = Pr[A0 = a0i], denominator p0i = Pr[A0 = a0i | L0 = l0i]). If the numerator is misspecified, this will not result in bias! Baseline covariates can be added to the numerator, but then they must be controlled for in the MSM!!
 Dynamic: MUST USE UNSTABILIZED WEIGHTS. The IPW estimate of E[Y^(g=(1,L(1)))] is the average of Y among the subjects in the unstabilized pseudo-population who followed regime g = (1, L(1)); so calculate E(Y) only for those who followed the regime.
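The time-varying stabilized weight SW = ∏ₖ f[A(k) | Ā(k−1)] / f[A(k) | Ā(k−1), L̄(k)] is just a running product over intervals; a sketch for one subject, with hypothetical per-interval probabilities:

```python
def stabilized_weight(numerator_probs, denominator_probs):
    """SW = prod_k f[A(k) | A_bar(k-1)] / f[A(k) | A_bar(k-1), L_bar(k)].
    numerator_probs[k]: prob. of the observed A(k) given treatment history;
    denominator_probs[k]: same, additionally given covariate history."""
    sw = 1.0
    for num, den in zip(numerator_probs, denominator_probs):
        sw *= num / den
    return sw

# Hypothetical subject followed for 3 intervals; probabilities invented.
sw = stabilized_weight([0.6, 0.5, 0.7], [0.4, 0.5, 0.9])
```

An interval where the covariate history adds no information (numerator = denominator) contributes a factor of 1, which is why covariates in both numerator and denominator are not adjusted for.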