EVOLUTION-MANAGER
Edit File: pyears.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Person Years</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for pyears {survival}"><tr><td>pyears {survival}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2> Person Years </h2> <h3>Description</h3> <p>This function computes the person-years of follow-up time contributed by a cohort of subjects, stratified into subgroups. It also computes the number of subjects who contribute to each cell of the output table, and optionally the number of events and/or expected number of events in each cell. </p> <h3>Usage</h3> <pre> pyears(formula, data, weights, subset, na.action, rmap, ratetable, scale=365.25, expect=c('event', 'pyears'), model=FALSE, x=FALSE, y=FALSE, data.frame=FALSE) </pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>formula</code></td> <td> <p>a formula object. The response variable will be a vector of follow-up times for each subject, or a <code>Surv</code> object containing the survival time and an event indicator. The predictors consist of optional grouping variables separated by + operators (exactly as in <code>survfit</code>), time-dependent grouping variables such as age (specified with <code>tcut</code>), and optionally a <code>ratetable</code> term. This latter matches each subject to his/her expected cohort. </p> </td></tr> <tr valign="top"><td><code>data</code></td> <td> <p>a data frame in which to interpret the variables named in the <code>formula</code>, or in the <code>subset</code> and the <code>weights</code> argument. </p> </td></tr> <tr valign="top"><td><code>weights</code></td> <td> <p>case weights. </p> </td></tr> <tr valign="top"><td><code>subset</code></td> <td> <p>expression saying that only a subset of the rows of the data should be used in the fit. </p> </td></tr> <tr valign="top"><td><code>na.action</code></td> <td> <p>a missing-data filter function, applied to the model.frame, after any <code>subset</code> argument has been used. Default is <code>options()$na.action</code>. </p> </td></tr> <tr valign="top"><td><code>rmap</code></td> <td> <p>an optional list that maps data set names to the ratetable names. See the details section below. </p> </td></tr> <tr valign="top"><td><code>ratetable</code></td> <td> <p>a table of event rates, such as <code>survexp.uswhite</code>. </p> </td></tr> <tr valign="top"><td><code>scale</code></td> <td> <p>a scaling for the results. As most rate tables are in units/day, the default value of 365.25 causes the output to be reported in years. </p> </td></tr> <tr valign="top"><td><code>expect</code></td> <td> <p>should the output table include the expected number of events, or the expected number of person-years of observation. This is only valid with a rate table. </p> </td></tr> <tr valign="top"><td><code>data.frame</code></td> <td> <p>return a data frame rather than a set of arrays.</p> </td></tr> <tr valign="top"><td><code>model, x, y</code></td> <td> <p>If any of these is true, then the model frame, the model matrix, and/or the vector of response times will be returned as components of the final result. </p> </td></tr> </table> <h3>Details</h3> <p>Because <code>pyears</code> may have several time variables, it is necessary that all of them be in the same units. For instance, in the call </p> <pre> py <- pyears(futime ~ rx, rmap=list(age=age, sex=sex, year=entry.dt), ratetable=survexp.us) </pre> <p>the natural unit of the ratetable is hazard per day, it is important that <code>futime</code>, <code>age</code> and <code>entry.dt</code> all be in days. Given the wide range of possible inputs, it is difficult for the routine to do sanity checks of this aspect. </p> <p>The ratetable being used may have different variable names than the user's data set, this is dealt with by the <code>rmap</code> argument. The rate table for the above calculation was <code>survexp.us</code>, a call to <code>summary{survexp.us}</code> reveals that it expects to have variables <code>age</code> = age in days, <code>sex</code>, and <code>year</code> = the date of study entry, we create them in the <code>rmap</code> line. The sex variable is not mapped, therefore the code assumes that it exists in <code>mydata</code> in the correct format. (Note: for factors such as sex, the program will match on any unique abbreviation, ignoring case.) </p> <p>A special function <code>tcut</code> is needed to specify time-dependent cutpoints. For instance, assume that age is in years, and that the desired final arrays have as one of their margins the age groups 0-2, 2-10, 10-25, and 25+. A subject who enters the study at age 4 and remains under observation for 10 years will contribute follow-up time to both the 2-10 and 10-25 subsets. If <code>cut(age, c(0,2,10,25,100))</code> were used in the formula, the subject would be classified according to his starting age only. The <code>tcut</code> function has the same arguments as <code>cut</code>, but produces a different output object which allows the <code>pyears</code> function to correctly track the subject. </p> <p>The results of <code>pyears</code> are normally used as input to further calculations. The <code>print</code> routine, therefore, is designed to give only a summary of the table. </p> <h3>Value</h3> <p>a list with components: </p> <table summary="R valueblock"> <tr valign="top"><td><code>pyears</code></td> <td> <p>an array containing the person-years of exposure. (Or other units, depending on the rate table and the scale). The dimension and dimnames of the array correspond to the variables on the right hand side of the model equation. </p> </td></tr> <tr valign="top"><td><code>n</code></td> <td> <p>an array containing the number of subjects who contribute time to each cell of the <code>pyears</code> array. </p> </td></tr> <tr valign="top"><td><code>event</code></td> <td> <p>an array containing the observed number of events. This will be present only if the response variable is a <code>Surv</code> object. </p> </td></tr> <tr valign="top"><td><code>expected</code></td> <td> <p>an array containing the expected number of events (or person years if <code>expect ="pyears"</code>). This will be present only if there was a <code>ratetable</code> term. </p> </td></tr> <tr valign="top"><td><code>data</code></td> <td> <p>if the <code>data.frame</code> option was set, a data frame containing the variables <code>n</code>, <code>event</code>, <code>pyears</code> and <code>event</code> that supplants the four arrays listed above, along with variables corresponding to each dimension. There will be one row for each cell in the arrays.</p> </td></tr> <tr valign="top"><td><code>offtable</code></td> <td> <p>the number of person-years of exposure in the cohort that was not part of any cell in the <code>pyears</code> array. This is often useful as an error check; if there is a mismatch of units between two variables, nearly all the person years may be off table. </p> </td></tr> <tr valign="top"><td><code>tcut</code></td> <td> <p>whether the call included any time-dependent cutpoints.</p> </td></tr> <tr valign="top"><td><code>summary</code></td> <td> <p>a summary of the rate-table matching. This is also useful as an error check. </p> </td></tr> <tr valign="top"><td><code>call</code></td> <td> <p>an image of the call to the function. </p> </td></tr> <tr valign="top"><td><code>observations</code></td> <td> <p>the number of observations in the input data set, after any missings were removed.</p> </td></tr> <tr valign="top"><td><code>na.action</code></td> <td> <p>the <code>na.action</code> attribute contributed by an <code>na.action</code> routine, if any. </p> </td></tr> </table> <h3>See Also</h3> <p><code><a href="ratetable.html">ratetable</a></code>, <code><a href="survexp.html">survexp</a></code>, <code><a href="Surv.html">Surv</a></code>. </p> <h3>Examples</h3> <pre> # Look at progression rates jointly by calendar date and age # temp.yr <- tcut(mgus$dxyr, 55:92, labels=as.character(55:91)) temp.age <- tcut(mgus$age, 34:101, labels=as.character(34:100)) ptime <- ifelse(is.na(mgus$pctime), mgus$futime, mgus$pctime) pstat <- ifelse(is.na(mgus$pctime), 0, 1) pfit <- pyears(Surv(ptime/365.25, pstat) ~ temp.yr + temp.age + sex, mgus, data.frame=TRUE) # Turn the factor back into numerics for regression tdata <- pfit$data tdata$age <- as.numeric(as.character(tdata$temp.age)) tdata$year<- as.numeric(as.character(tdata$temp.yr)) fit1 <- glm(event ~ year + age+ sex +offset(log(pyears)), data=tdata, family=poisson) ## Not run: # fit a gam model gfit.m <- gam(y ~ s(age) + s(year) + offset(log(time)), family = poisson, data = tdata) ## End(Not run) # Example #2 Create the hearta data frame: hearta <- by(heart, heart$id, function(x)x[x$stop == max(x$stop),]) hearta <- do.call("rbind", hearta) # Produce pyears table of death rates on the surgical arm # The first is by age at randomization, the second by current age fit1 <- pyears(Surv(stop/365.25, event) ~ cut(age + 48, c(0,50,60,70,100)) + surgery, data = hearta, scale = 1) fit2 <- pyears(Surv(stop/365.25, event) ~ tcut(age + 48, c(0,50,60,70,100)) + surgery, data = hearta, scale = 1) fit1$event/fit1$pyears #death rates on the surgery and non-surg arm fit2$event/fit2$pyears #death rates on the surgery and non-surg arm </pre> <hr /><div style="text-align: center;">[Package <em>survival</em> version 2.44-1.1 <a href="00Index.html">Index</a>]</div> </body></html>