mixed models

Semiparametric mixed model analysis benefits from variability estimates such as standard errors of effect estimates and variability bars to accompany curve estimates. We show how the underlying variance calculations can be done extremely efficiently compared with the direct naïve approach. These streamlined calculations are linear in the number of subjects, representing a two orders of magnitude improvement.


INTRODUCTION
A current vibrant area of research is the use of non-parametric regression or smoothing techniques in the analysis of longitudinal data. Prominent examples include [1][2][3][4][5]. Summaries may be found in books such as [6][7][8]. Figure 1 shows an example of data that benefit from such methodology. It consists of longitudinal measurements on the spinal bone mineral density (SBMD) of a cohort of young female subjects (source: Reference [9]). One question of interest concerns differences in mean SBMD among the four ethnic groups after accounting for age. An appropriate model is the additive mixed model Here, SBMD i j denotes the jth (1 j n i ) SBMD measurement on subject i (1 i m), f is a smooth, but otherwise unspecified, function for the mean effect of age and black i , hispanic i  and white i are indicator variables for ethnicity. In addition, the U i i.i.d. N(0, 2 U ) are random subject intercepts, and the i j i.i.d. N(0, 2 ), independent of the U i 's, account for within-subject variability. If f is modelled using penalized splines, such as where 1 , . . . , K is a dense set of knots, then (1) reduces to a linear mixed model. The details are given in Section 2. Standard fitting leads to the summary table given in Table I Figure 2. Data from Figure 1 with estimates of f (age) added. The dashed lines correspond to ±2 × the estimated standard error. Figure 2 provides a summary of the estimate of f and its variability. The curves correspond to the function f (age) with vertical shifting according to i , i = 2, 3, 4. The variability bars, shown as dashed lines, are ±2 × the estimated standard errors and correspond to approximate pointwise 95% confidence intervals of mean SBMD.
Underlying Table I and Figure 2 is the estimated covariance matrix of the coefficient estimates This covariance matrix involves the inversion of a (5  Figure 2, can be done in O(m) operations. The key is recognition that the contribution to M from the random-intercept component is an m × m diagonal matrix. Such streamlining essentially removes computational obstacles involving variances for models such as (1) for most practical values of m and, thus, greatly benefits semiparametric mixed model analysis.
Section 2 gives the details of our streamlined approach to variance calculations for models like (1). In Section 3, we describe the extension of the approach to subject-specific curve models. Closing remarks are made in Section 4.

ADDITIVE MIXED MODELS
In this section, we consider a more general version of (1): Here, y i j is the jth (1 j n i ) measurement of the response of the ith subject (1 i m), s i j is a predictor with a possibly non-linear effect and x i j is a p × 1 vector of predictors with a linear effect, with corresponding coefficient vector b The smooth function f is modelled using penalized splines of the form where z k , 1 k K , is an appropriate spline basis. The simplest choice is z k (s) = (s − k ) + for a dense set of knots 1 , . . . , K but many other options exist (e.g. Reference [6, Section 3.7]). If we let u k i.i.d. N(0, 2 u ) then (2) becomes a linear mixed model and the vectors y and e are defined analogously. Note that the random effects have been partitioned into spline coefficients (subscript G) and subject effects (subscript R). Let G G and G R be the restricted maximum likelihood estimates of G G and let G R and let b and u be the empirical best linear unbiased predictors of b and u (e.g. Reference [6, Section 4.5]). Then all variance calculations can be done using the estimated covariance matrix cov b The direct approach to obtaining standard errors of the entries of b (as in Table I) and variability bars for the smooth function estimate (as in Figure 2) involves inversion of M. As mentioned in Section 1, the matrix M will increase in dimension as the number of subjects, m grows, and for very large m it will become too computationally intensive to invert practically, since inversion is an O(m 3 ) process. However, we do not need to find all the entries of the matrix. For example, some entries relate to the correlation between different subjects' responses, which would rarely be useful. We can also exploit the fact that, for large m, most of M is diagonal. Hence, we propose a streamlined approach based around a block decomposition of M: Using standard results on the inverse of a block-partitioned matrix (e.g. Reference means that the matrix products amount to within-subject row sums, which are O(m), since for In addition, M 22 is diagonal: This feature is crucial. It enables us to find the biggest inverse required in (4) ). Thus, we can write M as . Then using (4) we see that This inverse is (2 + p + K ) × (2 + p + K ) and is relatively easy to compute. The summation term renders this whole process O(m). This matrix is all we require to plot global or group error bars, or perform hypothesis tests as in Table I. However, to find the error covariance of the individual subjects' fitted responses we need M 12 and M 22 , where Each submatrix is a constant times (2 + p + K ) × (2 + p + K ) times (2 + p + K ) × 1. However, we must calculate m of them, hence M 12 (and consequently M 21 ) requires O(m) calculations.

441
Finally, we only need to find the diagonal entries of M 22 since we would rarely be interested in the correlation between two subjects' fitted responses. These are given by The relevant diagonal entries can be calculated in O(m) steps. Therefore, the total asymptotic complexity of this process is O(m), representing an improvement of order m 2 over the naïve approach to matrix inversion. As the number of subjects, m increases, the improvements due to streamlining become enormous.
We could alter the model in (2) to include more random subject effects, such as random slopes. This would only affect the final stage of the calculation as it would alter the structure of M 22 . However, the result of this is that M 22 becomes block diagonal, hence, it can still be inverted in O(m) calculations.
The practical benefits of streamlined variance calculations were explored in a simulation study. Data were generated according to where the s i j were generated from the uniform distribution on (0, 1), the x i j were generated from the Bernoulli distribution with P(  Table II summarizes the results. Note that the naïve method approach failed for m = 12 500 due to required storage for M exceeding memory restrictions.
There is little practical difference between the two methods for m = 100 and 500. However, for m = 2500 streamlined variance calculation is much faster-taking about one-tenth of a second on average compared with almost 4 min for the naïve approach. For m = 12 500 the streamlined approach is still well under 1 s, while the naïve approach is not viable for typical 2006 computing environments.

EXTENSION TO SUBJECT-SPECIFIC CURVES
Models (1) and (2) featured random intercepts: the difference between the fitted subject response and the estimated population mean curve is constant. This may not in general be realistic; the subject-specific difference may be as complicated as the underlying function f . Durban et al. [14] develop a subject-specific curves model, based on penalized splines, in which the subject-specific difference is modelled by a random semiparametric function Earlier work on models of this type includes [2,3,5,15,16]. In this new model U i is replaced by where z gk , 1 k K g , is an appropriate spline basis. It is an advantage of our set-up that g i need not share the same spline basis as f . Therefore, the splines of f are labelled z f k , 1 k K f . Model (6) can be written as a linear mixed model and y, b, e, u G and G G are the same as in (3). D is a general, symmetric, 2 × 2 matrix. Note that the Z matrix takes a different form from that given in [14] and Section 9.3 of [6]. It has been changed to make the calculations more manageable.
As before, the variance calculations require the estimated covariance matrix M −1 . This matrix is now even larger than that of Section 2. Therefore, we propose a streamlined approach, 443 again based around a block decomposition of M: Durban et al. [14] fit a semiparametric model with subject-specific curves to longitudinal data on the heights of 190 girls with acute lymphoblastic leukaemia. We fit a similar model in which height is modelled as a smooth function of age: height i j = f (age i j ) + g i (age i j ) + i j (7) where f and g i are modelled using radial cubic splines with 15 and 10 knots, respectively. Figure 3 shows some fitted functions and error bars from this model, superimposed on the data. We have drawn out two subjects' responses that differ from each other in order to show how the subject-specific curves model can produce different estimated smooth curves for each subject. We have plotted error bars according to the subject-specific curves model (7) and also a randomintercepts model such as (2), both obtained via the streamlined method. The random-intercepts error bars are much wider than those of the subject-specific curves model.