Biostat Case Studies 2004 Session 2

Biostatistics Case Studies

Peter D. Christenson

Biostatistician

http://gcrc.humc.edu/Biostat

Session 3: Missing Data in Longitudinal Studies

Case Study

Hall S et al: A comparative study of Carvedilol, slow release Nifedipine, and Atenolol in the management of essential hypertension.

J of Cardiovascular Pharmacology 1991;18(4)S35-38.

Case Study Outline

Subjects randomized to one of 3 drugs for controlling hypertension:

A: Carvedilol (new)
B: Nifedipine (standard)
C: Atenolol (standard)

Blood pressure and HR measured at baseline and 4 post-treatment periods.

Primary analysis is unclear, but changes over time in HR and bp are compared among the 3 groups.

Available Data: sitting dbp

Columns: Visit #, Week, Number of Subjects (Paper, Data)

Screen: -1
Baseline: dbp1, 100
Post 1: 100
Post 2
Post 3
Post 4

Sitting dbp from Figure 2

Group A: Baseline and Final dbp

              Week 0                   Last Value: Pre Week 8    Week 8                 Final                   Δ (Week 0 - Final)
Graph         103.04 ± 0.52 (N=100)    -                         90.43 ± 0.96 (N=83)    90.43 ± 0.96 (N=83)     12.61 ± ?
Completers    102.99 ± 0.53 (N=83)     -                         90.43 ± 0.96 (N=83)    90.43 ± 0.96 (N=83)     12.55 ± 1.10
LOCF          103.04 ± 0.52 (N=100)    97.47 ± 3.47 (N=17)       90.43 ± 0.96 (N=83)    91.63 ± 1.02 (N=100)    11.41 ± 1.11

LOCF = Last Observation Carried Forward
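The LOCF row can be checked by hand: its final mean is the N-weighted average of the 17 carried-forward values and the 83 observed week 8 values. A minimal Python sketch, using only the summary figures from the table (the variable names are illustrative and not from the paper):

```python
# Summary figures taken from the Group A table (mean sitting dbp).
n_dropout, mean_last_pre_wk8 = 17, 97.47   # last value before week 8 (dropouts)
n_complete, mean_wk8 = 83, 90.43           # observed week 8 values (completers)
mean_wk0_all = 103.04                      # baseline mean, all 100 subjects

# LOCF "final" mean: weighted average of carried-forward and observed values.
n_total = n_dropout + n_complete
locf_final = (n_dropout * mean_last_pre_wk8 + n_complete * mean_wk8) / n_total

# Baseline-to-final change under LOCF versus using observed week 8 values only.
locf_change = mean_wk0_all - locf_final
observed_change = mean_wk0_all - mean_wk8

print(f"LOCF final mean: {locf_final:.2f}")
print(f"LOCF change:     {locf_change:.2f}")
print(f"Observed change: {observed_change:.2f}")
```

The carried-forward values (mean 97.47) pull the final mean up from 90.43 to 91.63, which shrinks the apparent change from 12.61 to 11.41.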

Wanted: Use N=100 w/o LOCF

Combine:

Info on true 8 week change in 83 subjects.

Info on baseline only in 17 subjects.

Use week0-week8 correlation in 83 subjects.

More generally:

Suppose 9 subjects had only week 0 and 8 subjects had only week 8.

Then, really 2 experiments, 1 paired (N=83) and 1 unpaired (N1=9 and N2=8).

Combining involves weighting Δs from the 2 experiments. Does not impute (substitute) values for the 17 unknown values.

Generalize further to >2 time periods and >1 treatment, etc.
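As a rough illustration of the weighting idea only (a mixed model does this through the likelihood, not through this explicit formula), two independent estimates of Δ can be combined with inverse-variance weights. In this sketch the paired estimate is the completers' 12.55 ± 1.10 from the table, while the unpaired estimate and its SE are hypothetical numbers invented for the example.

```python
import math

def combine_inverse_variance(estimates):
    """Combine independent (estimate, se) pairs with weights 1 / se**2.

    Returns the weighted estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    combined = sum(w * est for w, (est, _) in zip(weights, estimates)) / total
    return combined, math.sqrt(1.0 / total)

paired = (12.55, 1.10)    # completers, from the Group A table
unpaired = (10.0, 3.0)    # HYPOTHETICAL values, for illustration only

combined, combined_se = combine_inverse_variance([paired, unpaired])
print(f"Combined change: {combined:.2f} +/- {combined_se:.2f}")
```

The noisier unpaired experiment gets a small weight, so the combined estimate stays close to the paired one while its SE drops slightly below either input.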

Mixed Models

Mixed models meet this need.

“Mixed” means a combination of fixed effects (e.g., drugs; want info on those particular drugs) and random effects (e.g., centers or patients; not interested in the particular ones in the study).

AKA multilevel models, hierarchical models.

Very flexible: they incorporate unequal patient variability, correlation, pairing, repeated values at multiple levels (e.g., sitting and standing dbp in Fig 2, or subjects clustered within families when genetics is an issue), and data missing at random.

More assumptions required than typical analyses.

Data Structure for Software

Need:

patient week dbp

1 0 97

1 2 101

1 4 88

1 6 89

1 8 86

2 0 109

2 2 72

etc

Not:

patient wk0 wk2 wk4 wk6 wk8

1 97 101 88 89 86

2 109 72 . . .
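Data are often stored in the second ("Not") layout, so a reshaping step is usually needed first. A minimal sketch in plain Python, assuming missing visits (shown as "." above) are represented as None and are simply dropped from the long layout; the function and variable names are illustrative:

```python
# Wide layout: one row per patient; None marks a missing visit.
weeks = [0, 2, 4, 6, 8]
wide = {
    1: [97, 101, 88, 89, 86],
    2: [109, 72, None, None, None],
}

def wide_to_long(wide_rows, week_labels):
    """Return (patient, week, dbp) tuples, skipping missing visits."""
    long_rows = []
    for patient, values in wide_rows.items():
        for week, dbp in zip(week_labels, values):
            if dbp is not None:
                long_rows.append((patient, week, dbp))
    return long_rows

long_rows = wide_to_long(wide, weeks)
print("patient week dbp")
for patient, week, dbp in long_rows:
    print(patient, week, dbp)
```

Patient 2 contributes only the two observed rows, so no value has to be imputed for the missing visits.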

Software

Need to use a mixed model module. Often, options are unclear. Use:

SPSS Analyze > Mixed

SAS proc mixed

Repeated measures modules with options for random factors do not typically handle missing data, e.g.:

SPSS Analyze > GLM > Repeated > … Random

SAS proc glm; model …; random …;

are not in general OK, but will work with certain balanced patterns of missing data.

Mixed Models in SPSS

Select Analyze > Mixed > Linear. First menu:

Mixed Models in SAS

Select Solutions > Analysis > Analyst >

Statistics > ANOVA > Mixed models

Alternatively, typical code is:

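proc mixed;                                      /* no data= option: uses the most recently created data set */
  class week patient;                            /* treat week and patient as categorical */
  model dbp=week/ddfm=satterthwaite;             /* fixed effect of week; Satterthwaite d.f. */
  lsmeans week/cl;                               /* estimated mean dbp at each week, with confidence limits */
  estimate 'Week Diff' week 1 -1;                /* first week level minus second (week 0 - week 8 with 2 weeks) */
  repeated week/subject=patient type=un rcorr;   /* unstructured covariance within patient; print correlations */
  title 'Mixed Model N=100+83 Unstructured';
run;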

Model 1 Results

Estimated Change:

                         Standard
Label        Estimate    Error       DF      t Value    Pr > |t|
Week Diff    12.6058     1.0441      95.6    12.07      <.0001

So, Δ = 12.61 ± 1.04 incorporates 100 + 83 observations.

Estimated Means:

                              Standard
Effect    week    Estimate    Error
week      0       103.04      0.7059
week      8       90.43       0.7749

Group A: Baseline and Final dbp Update

              Week 0                   Last Value: Pre Week 8    Week 8                 Final                   Δ (Week 0 - Final)
Graph         103.04 ± 0.52 (N=100)    -                         90.43 ± 0.96 (N=83)    90.43 ± 0.96 (N=83)     12.61 ± 1.04
Completers    102.99 ± 0.53 (N=83)     -                         90.43 ± 0.96 (N=83)    90.43 ± 0.96 (N=83)     12.55 ± 1.10
LOCF          103.04 ± 0.52 (N=100)    97.47 ± 3.47 (N=17)       90.43 ± 0.96 (N=83)    91.63 ± 1.02 (N=100)    11.41 ± 1.11

LOCF = Last Observation Carried Forward

Is model appropriate? Depends on assumed covariance pattern.

Model 1 Covariance Pattern: Compound Symmetry

Software Output

Estimated R Correlation Matrix for patient 4

Row    Col1        Col2
1      1.0000      0.008760
2      0.008760    1.0000

Covariance Parameter Estimates

Cov Parm    Subject    Estimate
CS          patient    0.4366
Residual               49.3989

Output Interpretation

Estimated Covariance Pattern:

Week    0          8
0       (7.06)²    0.44
8       0.44       (7.06)²

(7.06)² = 49.3989 + 0.4366

Note that this model assumes that variability among subjects is the same at each week, and that there is a correlation between the weeks (estimated at 0.00876).

But: Week 0 SD = 5.2

Week 8 SD = 8.8
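The compound symmetry numbers can be reproduced from the two covariance parameter estimates. A small Python sketch, assuming the usual reading of the output (common variance = CS + Residual, common covariance = CS); the variable names are illustrative:

```python
import math

cs = 0.4366          # "CS" estimate: covariance shared by the two weeks
residual = 49.3989   # "Residual" estimate

variance = cs + residual       # same variance assumed at week 0 and week 8
sd = math.sqrt(variance)       # common SD forced on both weeks
correlation = cs / variance    # week 0 to week 8 correlation

# With a correlation this close to zero, the SE of each week's mean is
# approximately sqrt(variance / N) for that week's N.
se_week0 = math.sqrt(variance / 100)
se_week8 = math.sqrt(variance / 83)

print(f"common SD = {sd:.2f}, correlation = {correlation:.5f}")
print(f"SE week 0 = {se_week0:.4f}, SE week 8 = {se_week8:.4f}")
```

The single pooled SD of about 7.06 sits between the observed SDs of 5.2 and 8.8, which is why this model's standard errors for the week means (0.7059 and 0.7749) differ from the raw ones (0.52 and 0.96).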

Model 2 Covariance Pattern: Unstructured

Software Output

Estimated R Correlation Matrix for patient 4

Row    Col1       Col2
1      1.0000     0.01129
2      0.01129    1.0000

Covariance Parameter Estimates

Cov Parm    Subject    Estimate
UN(1,1)     patient    27.1700
UN(2,1)     patient    0.5169
UN(2,2)     patient    77.2008

Output Interpretation

Estimated Covariance Pattern:

Week    0          8
0       (5.21)²    0.52
8       0.52       (8.79)²

(5.21)² = 27.17

This model allows different variability among subjects at each week, and a correlation between the weeks (estimated at 0.011).

This better models the SDs:

Week 0 SD = 5.2

Week 8 SD = 8.8
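The SDs and the correlation follow directly from the three unstructured covariance estimates. A short Python check, with illustrative variable names:

```python
import math

# Unstructured covariance parameter estimates from the software output.
un11 = 27.1700   # variance at week 0
un21 = 0.5169    # covariance between week 0 and week 8
un22 = 77.2008   # variance at week 8

sd_week0 = math.sqrt(un11)
sd_week8 = math.sqrt(un22)
correlation = un21 / (sd_week0 * sd_week8)

print(f"SD week 0 = {sd_week0:.2f}, SD week 8 = {sd_week8:.2f}")
print(f"correlation = {correlation:.5f}")
```

Each week now gets its own SD (about 5.21 and 8.79), and the covariance of 0.5169 corresponds to the reported correlation of about 0.011.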

Model 3 Covariance: Heterogeneous Uncorrelated

Software Output

Estimated R Correlation Matrix for patient 4

Row    Col1      Col2
1      1.0000
2                1.0000

Covariance Parameter Estimates

Cov Parm    Subject    Estimate
UN(1,1)     patient    27.1701
UN(2,1)     patient    0
UN(2,2)     patient    77.1998

Output Interpretation

Estimated Covariance Pattern:

Week    0          8
0       (5.21)²    0
8       0          (8.79)²

(5.21)² = 27.17

This model allows different variability among subjects at each week, but no correlation between the two weeks.

Matches: Week 0 SD = 5.2

Week 8 SD = 8.8

Choice of Covariance Pattern

Model                Covariance Pattern    -2 Log Likelihood
1: Comp Sym          Corr & = SDs          1230.2
2: Unstructured      Corr & ≠ SDs          1206.0
3: Heterog Uncorr    0 Corr & ≠ SDs        1206.0

Use likelihood ratio test to test whether a more complex model significantly improves fit of the data. Models must be “nested”.

Is model 2 significantly better than model 1?

Χ² = 1230.2 - 1206.0 = 24.2 has a Χ² distribution with d.f. = difference in # of estimated parameters (here 3 - 2 = 1) if model 2 is not an improvement. P-value = Prob(Χ² > 24.2) < 0.0001, so model 2 is needed.

Is model 2 significantly better than model 3? Χ² = 1206.0 - 1206.0 = 0.0 with d.f. = 3 - 2 = 1, so the correlation is not needed. Final choice: model 3.
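The likelihood ratio comparisons can be reproduced from the -2 log likelihoods in the table. This Python sketch uses only the standard library; it relies on the identity that a chi-square variable with 1 d.f. is a squared standard normal, so the function below covers the 1 d.f. case only. The names are illustrative.

```python
import math

def chi2_sf_1df(x):
    """P(chi-square with 1 d.f. > x), via the standard normal tail."""
    return math.erfc(math.sqrt(x / 2.0))

# -2 log likelihoods from the covariance pattern table.
neg2ll = {"comp_sym": 1230.2, "unstructured": 1206.0, "heterog_uncorr": 1206.0}

# Model 2 (3 parameters) versus model 1 (2 parameters): 1 d.f.
lr_2_vs_1 = neg2ll["comp_sym"] - neg2ll["unstructured"]
p_2_vs_1 = chi2_sf_1df(lr_2_vs_1)

# Model 2 (3 parameters) versus model 3 (2 parameters): 1 d.f.
lr_2_vs_3 = neg2ll["heterog_uncorr"] - neg2ll["unstructured"]
p_2_vs_3 = chi2_sf_1df(lr_2_vs_3)

print(f"Model 2 vs 1: LR = {lr_2_vs_1:.1f}, p = {p_2_vs_1:.2g}")
print(f"Model 2 vs 3: LR = {lr_2_vs_3:.1f}, p = {p_2_vs_3:.2g}")
```

Allowing unequal SDs improves the fit markedly, while adding the correlation on top of unequal SDs does not change the likelihood to the precision shown.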

Model 3 Results

Estimated Change:

                         Standard
Label        Estimate    Error       DF     t Value    Pr > |t|
Week Diff    12.6063     1.0963      128    11.50      <.0001

Thus, use Δ = 12.61 ± 1.10 from 100 + 83 observations.

Estimated Means:

                              Standard
Effect    week    Estimate    Error       DF
week      0       103.04      0.5212      99
week      8       90.43       0.9644      82

Conclusions for Group A Week 0 to Week 8 dbp Δ

Last observation carried forward overestimates dbp at week 8.

Essentially 0 correlation between residual week 0 and week 8 dbp.

Use mixed model with heterogeneous uncorrelated covariance pattern.

This mixed model is equivalent to a 2-sample t-test with unequal variance using Satterthwaite’s weighting. This would not happen if either (1) some subjects only had dbp at week 8, or (2) correlation was stronger between weeks 0 and 8, which usually happens.
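The equivalence with the unequal-variance t-test can be checked from the summary statistics alone. The sketch below applies the standard Welch/Satterthwaite formulas to the Model 3 variance estimates (27.1701 and 77.1998) and the two sample sizes; the function name is illustrative.

```python
import math

def welch_from_summary(mean1, var1, n1, mean2, var2, n2):
    """Unequal-variance two-sample t statistic with Satterthwaite d.f."""
    v1, v2 = var1 / n1, var2 / n2      # squared SEs of the two means
    se = math.sqrt(v1 + v2)            # SE of the difference
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    diff = mean1 - mean2
    return diff, se, diff / se, df

# Week 0: N=100; week 8: N=83 (Group A sitting dbp).
diff, se, t_value, df = welch_from_summary(103.04, 27.1701, 100,
                                           90.43, 77.1998, 83)
print(f"diff = {diff:.2f}, SE = {se:.4f}, t = {t_value:.2f}, d.f. = {df:.0f}")
```

With these inputs the sketch gives an SE near 1.10, t near 11.5, and roughly 128 d.f., in line with the Model 3 "Week Diff" row.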

Generalize: Group A with all 5 Time Periods

Covariance Pattern            Parameters    -2 Log Likelihood
Compound Symmetry             2             3193.7
Heterogeneous Uncorrelated    5             3245.4
Toeplitz                      5             3172.0
Heterogeneous Toeplitz        9             3141.4
Unstructured                  15            3111.7

Since LR = 3141.4 - 3111.7 = 29.7 is large for a Χ² with 6 d.f., there is substantial unstructured correlation over weeks.
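The comparison of the heterogeneous Toeplitz and unstructured patterns can be recomputed from the table's -2 log likelihoods. This is a sketch under stated assumptions: the parameter counts use the standard definitions for 5 time periods (5 variances plus 4 lag correlations, versus all 15 variances and covariances), and the tail probability uses a closed form that holds only for even d.f., so no statistics library is needed.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi-square with even d.f. > x), using the closed-form Poisson sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only covers even, positive d.f.")
    half = x / 2.0
    terms = (half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * sum(terms)

t = 5                                     # number of time periods
n_params_unstructured = t * (t + 1) // 2  # all variances and covariances
n_params_het_toeplitz = t + (t - 1)       # t variances + (t - 1) lag correlations
df = n_params_unstructured - n_params_het_toeplitz

lr = 3141.4 - 3111.7                      # difference in -2 log likelihood
p_value = chi2_sf_even_df(lr, df)

print(f"LR = {lr:.1f} on {df} d.f., p = {p_value:.2g}")
```

The p-value is far below 0.001, so the simpler Toeplitz-type structure is rejected in favor of the unstructured pattern.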

Conclusions: Repeated Measures with Mixed Models

Very useful for missing data.

Requires more than usual assumptions.

Mild deviations from assumed covariance pattern do not have a large influence.

Software can be intimidating due to specifying many model assumptions, since the method is so general and flexible.

May be difficult to apply unbiasedly in clinical trials where the primary analysis needs to be specifically detailed.