Trauma Scoring Systems
Characterization of injury severity is crucial to the scientific study of trauma, yet the actual measurement of injury severity began only 50 years ago. In 1969, researchers developed the Abbreviated Injury Scale (AIS) to grade the severity of individual injuries. Since its introduction, researchers modified the AIS, most recently in 1990 (AIS-90). The AIS is the basis for the Injury Severity Score (ISS), which is the most widely used measure of injury severity in patients with trauma. Attempting to summarize the severity of injury in a patient with multiple traumas with a single number is difficult at best; therefore, multiple alternative scoring systems have been proposed, each with its own problems and limitations. This article reviews the conceptual and statistical background necessary to understand injury severity scoring, presents the most common scoring systems, and addresses new ideas and trends in trauma scoring.
APPLICATIONS OF TRAUMA SEVERITY SCORING
An accurate method for quantitatively summarizing injury severity has many potential applications. The ability to predict outcome from trauma (ie, mortality) is perhaps the most fundamental use of injury severity scoring, a use that arises from the patient’s and the family’s desires to know the prognosis. More recently, physicians have suggested that injury severity scoring can provide objective information for end-of-life decision-making and resource allocation. Unfortunately, trauma mortality prediction in the individual patient is limited and fraught with uncertainty. In fact, decisions for individual patients should never be based solely on a statistically derived injury severity score.
Field trauma scoring also is used to facilitate rational prehospital triage decisions, thereby minimizing the time from injury occurrence to definitive management. Similarly, physicians suggest that it can enhance appropriate use of helicopters and timely transfer of severely injured patients to trauma centers. Trauma scoring also is used for quality assurance by allowing evaluation of trauma care both within and between trauma centers, a controversial area that is likely to only increase in importance.
Perhaps the most important role for injury severity scoring is in trauma care research. Scientific study of the epidemiology of trauma and trauma outcomes would not be possible otherwise. Injury severity scoring is indispensable in stratifying patients into comparable groups for prospective clinical trials. Similarly, this technique can be used retrospectively to identify and control for differences in baseline injury severity between patient populations.
BASIC STATISTICAL CONCEPTS
Fundamentally, trauma outcome prediction is a multivariate problem. Researchers use multiple independent variables (eg, age, injury severity) to predict the dependent variable (or outcome). Most physicians are familiar with the simplest form of regression analysis, simple linear regression, which describes the linear relationship between 2 variables. Multiple regression is an extension of this technique, in which more than one independent variable is used to describe a single, continuous dependent variable. Multiple regression is advantageous because it allows one to measure the association between a predictor variable and an outcome variable while controlling for other modifying factors. Researchers use multiple regression, therefore, to control for the effects of many variables and assess the independent effect of a single variable.
In trauma severity scoring, mortality is the outcome that has elicited the most interest. Mortality is a dichotomous variable having only 2 possible values, death or survival. Although several methods are available, multiple logistic regression is the most popular approach when the outcome of interest is dichotomous. Its advantages stem from the logistic function on which it is based: f(x) = 1 / (1 + e^(-x)). Consider this expression as x varies from -infinity to +infinity. The resulting curve is sigmoidal and asymptotically approaches 0 as x approaches -infinity and 1 as x approaches +infinity. Therefore, f(x) must take on a value between 0 and 1 regardless of the value of x. This is the primary reason that logistic regression is so popular; typically it is used to describe the probability of an event occurring, which always is a number between 0 and 1. In epidemiologic terms, this probability describes the risk of an individual experiencing a particular outcome. Other statistical techniques, such as ordinary linear regression, may yield a predicted probability that is less than 0 or greater than 1, which is not possible.
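The bounded, sigmoidal behavior described above can be checked numerically. The following is a minimal sketch, not a fitted trauma model: the function names are illustrative, and any intercept or coefficients passed to `predicted_risk` are assumed to be supplied by the reader.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic function f(x) = 1 / (1 + e^(-x))."""
    # Branch on the sign of x so math.exp never receives a large positive
    # argument (which would raise OverflowError).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predicted_risk(intercept, coefficients, values):
    """Probability from a logistic regression equation.

    The linear predictor is intercept + sum(coefficient * value); the
    logistic function then maps it onto the 0-to-1 probability scale.
    """
    linear = intercept + sum(b * v for b, v in zip(coefficients, values))
    return logistic(linear)


if __name__ == "__main__":
    # The output stays within [0, 1] however extreme x becomes.
    for x in (-10, -2, 0, 2, 10):
        print(f"x = {x:>3}  f(x) = {logistic(x):.5f}")
```

Whatever the value of the linear predictor, the result remains a valid probability, which is the property the text identifies as the main appeal of logistic regression.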
The shape of the logistic function also is an important part of the popularity of logistic regression. The sigmoidal curve is appealing from the epidemiologic standpoint because it may represent the concept of threshold, which may apply to a variety of diseases. This means that an individual's risk for an outcome is minimal until reaching some threshold value. The risk then rises rapidly and plateaus. Logistic regression is mathematically convenient in that one can easily convert the coefficients of the equation into estimates of the risk of developing a disease or outcome given the presence of a particular risk factor. These risk estimates are adjusted for the effects of the other risk factors or covariates included in the logistic regression equation.
Outcome prediction never will be perfect, in part because injury severity is difficult to quantify. Perhaps more important is that the patient's response to injury is complex and difficult to model adequately; therefore, multiple scoring systems have emerged. Practitioners should be able to assess the predictive performance of each system in order to compare them. Measures of predictive performance include explanatory power, discrimination, and calibration. Explanatory power is the proportion of the variation in outcome that can be explained by the model rather than by random variation. This is reflected by the coefficient of determination (r²). Discrimination is the ability of the model to separate the patients into 2 groups; for example, those who survive and those who die. This involves sensitivity, specificity, and accuracy, which are concepts well understood by most physicians. However, when applied to predictive models, these concepts can be problematic. A trauma survival predictive model yields a probability of survival, while in reality, patients can only live or die. Therefore, a prediction rule must be established; typically, researchers assign a cutoff point of 0.5.
Patients with a probability of survival greater than 0.5, therefore, are predicted to have lived, while those with a probability of survival less than or equal to 0.5 are predicted to have died. The problem is that sensitivity, specificity, and accuracy all vary depending on the prediction rule chosen. Receiver operating characteristic (ROC) curve analysis can help evaluate the accuracy and discrimination of a predictive model over a wide range of cutoff points. The ROC curve is constructed by plotting the sensitivity on the y-axis and (1 - specificity) on the x-axis at different cutoff points. The area under the ROC curve measures the accuracy of the model. A straight line arising from the origin at a 45° angle has an area under the curve of 0.5 and represents accuracy no better than flipping a coin. A perfect predictive model has an area under the curve of 1.0. As accuracy and discrimination improve, the ROC curve moves upward and to the left. ROC curves allow one to compare different predictive models used in the same population of patients.
Calibration is the ability of the model to correctly predict outcome over the entire range of risk. Calibration can be assessed graphically by plotting the actual outcome against the predicted outcome. Calibration is assessed statistically by goodness-of-fit testing, most commonly the Hosmer-Lemeshow test. This test involves grouping patients into risk categories and using a modified chi-square analysis to compare the observed and predicted outcomes in each group. The hypothesis tested is that the model’s predictions are the same as the actual outcome; therefore, higher P values are desired and reflect a good fit.
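The dependence of sensitivity and specificity on the cutoff, and the construction of the ROC curve, can be sketched in a few lines. This is an illustration under stated assumptions: survival is treated as the "positive" outcome, the eight predicted probabilities and outcomes are invented for demonstration and are not patient data, and the function names are hypothetical.

```python
def confusion_rates(p_survival, survived, cutoff=0.5):
    """Sensitivity, specificity, and accuracy at a single cutoff.

    A patient is predicted to live when the predicted probability of
    survival is strictly greater than the cutoff. Both outcomes must be
    present in `survived`, otherwise a division by zero occurs.
    """
    tp = fp = tn = fn = 0
    for p, lived in zip(p_survival, survived):
        predicted_to_live = p > cutoff
        if predicted_to_live and lived:
            tp += 1
        elif predicted_to_live:
            fp += 1
        elif lived:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


def roc_auc(p_survival, survived):
    """Area under the ROC curve by the trapezoidal rule.

    One ROC point (1 - specificity, sensitivity) is computed per distinct
    cutoff, running from (0, 0) at the highest cutoff to (1, 1) at the lowest.
    """
    cutoffs = sorted(set(p_survival), reverse=True) + [float("-inf")]
    points = []
    for c in cutoffs:
        sens, spec, _ = confusion_rates(p_survival, survived, c)
        points.append((1.0 - spec, sens))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical predictions for 8 patients (1 = survived, 0 = died).
probs = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
outcome = [1, 1, 1, 0, 1, 1, 0, 0]

if __name__ == "__main__":
    for cutoff in (0.3, 0.5, 0.7):
        print(cutoff, confusion_rates(probs, outcome, cutoff))
    print("AUC:", roc_auc(probs, outcome))
```

Moving the cutoff trades sensitivity against specificity, whereas the area under the curve summarizes discrimination across all cutoffs in a single number.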
PHYSIOLOGIC SCORES
Revised Trauma Score
Physicians apply physiologic injury severity scoring largely in the prehospital setting as a triage tool. The Revised Trauma Score (RTS) is one of the more common physiologic scores. It uses 3 specific physiologic parameters: (1) the Glasgow Coma Scale (GCS), (2) systolic blood pressure (SBP), and (3) the respiratory rate (RR). Practitioners code each parameter from 0-4 based on the magnitude of the physiologic derangement. The RTS has 2 forms depending on its use. When used for field triage, the RTS is determined by adding the coded values together. Thus, the triage RTS ranges from 0-12 and is calculated very easily. An RTS of less than 11 is used to indicate the need for transport to a designated trauma center. The coded form of the RTS is used more frequently for quality assurance and outcome prediction. The coded RTS is calculated as follows, in which SBPc, RRc, and GCSc represent the coded values of each variable:
RTSc = 0.9368(GCSc) + 0.7326(SBPc) + 0.2908(RRc)
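Both forms of the RTS can be sketched as follows. The coding bands (for example, GCS 13-15 coded as 4) and the weights 0.9368, 0.7326, and 0.2908 are the commonly published RTS values; they are an assumption here to the extent that this article does not list them, and they should be checked against a primary reference before any real use. The function names are illustrative.

```python
def code_gcs(gcs: int) -> int:
    """Coded GCS value (0-4); assumed bands 13-15, 9-12, 6-8, 4-5, 3."""
    if gcs >= 13:
        return 4
    if gcs >= 9:
        return 3
    if gcs >= 6:
        return 2
    if gcs >= 4:
        return 1
    return 0


def code_sbp(sbp: float) -> int:
    """Coded systolic blood pressure in mm Hg; assumed bands >89, 76-89, 50-75, 1-49, 0."""
    if sbp > 89:
        return 4
    if sbp >= 76:
        return 3
    if sbp >= 50:
        return 2
    if sbp >= 1:
        return 1
    return 0


def code_rr(rr: float) -> int:
    """Coded respiratory rate per minute; assumed bands 10-29, >29, 6-9, 1-5, 0."""
    if 10 <= rr <= 29:
        return 4
    if rr > 29:
        return 3
    if rr >= 6:
        return 2
    if rr >= 1:
        return 1
    return 0


def triage_rts(gcs, sbp, rr) -> int:
    """Field triage form: unweighted sum of the coded values (0-12)."""
    return code_gcs(gcs) + code_sbp(sbp) + code_rr(rr)


def coded_rts(gcs, sbp, rr) -> float:
    """Weighted form used for quality assurance and outcome prediction."""
    return (0.9368 * code_gcs(gcs)
            + 0.7326 * code_sbp(sbp)
            + 0.2908 * code_rr(rr))


if __name__ == "__main__":
    # Hypothetical patient: GCS 10, SBP 80 mm Hg, RR 32/min.
    print(triage_rts(10, 80, 32), round(coded_rts(10, 80, 32), 4))
```

The largest weight falls on the GCS term, so a depressed level of consciousness lowers the weighted score more than an equal coded deficit in blood pressure or respiratory rate.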
This coded value is more complicated to compute than the simple triage sum, which limits its usefulness in the field. The main advantage of the coded RTS is that the weighting of the individual components emphasizes the significant impact of traumatic brain injury on outcome.
The RTS has several limitations that affect its usefulness. Most of these limitations are related to the GCS. As originally described, the GCS was intended to measure the functional status of the central nervous system. Because of the importance of head injury in determining trauma outcome, the GCS also is used by many as a component of trauma severity scoring. Problems inherent to the GCS (and RTS) include the inability to accurately score patients who are intubated and mechanically ventilated. Determining the verbal component of the GCS and the respiratory rate is difficult in these patients. Moreover, patients who are pharmacologically paralyzed or under the influence of alcohol or illicit drugs also are difficult to score. Alternative approaches in this setting include using the best motor response and the eye-opening response to calculate or predict the verbal response. Research has shown that substitution of the best motor response for the GCS results in no loss of predictive capability. More recently, researchers have shown that the best motor response predicts trauma mortality as well as or better than other trauma severity scores.
Acute Physiology and Chronic Health Evaluation
The Acute Physiology and Chronic Health Evaluation (APACHE), introduced in 1981, is used widely for the assessment of illness severity in intensive care units (ICUs). This system has 2 components: (1) the chronic health evaluation, which incorporates the influence of comorbid conditions (eg, diabetes mellitus, cirrhosis), and (2) the Acute Physiology Score (APS).
The APS consists of weighted variables representing the major physiologic systems, including neurologic, cardiovascular, respiratory, renal, gastrointestinal, metabolic, and hematologic variables. For each variable, the most abnormal value recorded during the first 24 hours is used. In 1985, the APACHE system was revised (ie, APACHE II) by reducing the number of APS variables from 34 to 12, restricting the comorbid conditions, and deriving coefficients for specific diseases.
APACHE II is the most widely applied APACHE system; however, it has several potential limitations when applied to patients with trauma. The GCS, which forms a powerful predictive component of the APS, was not intended to reflect extracranial injuries. Because patients with trauma are a relatively young population, comorbidity is unusual in these patients. The potential also exists for lead-time bias. By using only ICU data and not accounting for prior treatment, APACHE II underestimates mortality in patients who are transferred to the ICU after relative stabilization, and patients with trauma frequently are resuscitated in the emergency department or operating room prior to admission to the ICU. Patients with trauma made up only 8% of the population used to develop APACHE II, with only a 9% case-fatality rate. Moreover, 85% of trauma fatalities were related to traumatic brain injury. In 1992, researchers showed that APACHE II is inferior to the Trauma and Injury Severity Score (TRISS) in predicting mortality in injured patients. Poor performance was related largely to the absence of an anatomic component in the APACHE system.
The most recent version, APACHE III, was published in 1991 and was designed to address many of these issues. The most important modifications were the inclusion of 17 variables; the limitation of comorbid conditions to those affecting immune function; the addition of disease-specific equations, including one for multiple trauma; the distinction between head and nonhead trauma; and an accounting for potential lead-time bias.
APACHE III has not been widely accepted, partly because it is proprietary and expensive. In addition, its accuracy has yet to be convincingly validated in patients with trauma.
ANATOMIC SCORES
Injury Severity Score
Researchers developed the AIS as a simple numerical method for grading and comparing injuries by severity. Although originally intended for use with vehicular injuries, its scope has been expanded to include other injuries. The AIS is a consensus-derived, anatomically based system of grading injuries on an ordinal scale ranging from 1 (minor injury) to 6 (lethal injury). The AIS does not reflect the combined effects of multiple injuries; however, it forms the foundation for the ISS.
Baker et al introduced the ISS in 1974 as a means of summarizing multiple injuries in a single patient. The ISS is defined as the sum of squares of the highest AIS grade in each of the 3 most severely injured body regions. Six body regions are defined, as follows: the thorax, abdomen and visceral pelvis, head and neck, face, bony pelvis and extremities, and external structures. Only one injury per body region is allowed. The ISS ranges from 1-75, and an ISS of 75 is assigned to anyone with an AIS of 6.
The ISS has several limitations. The most obvious limitation is its inability to account for multiple injuries to the same body region. Similarly, it limits the total number of contributing injuries to only 3. This seriously impairs the usefulness of the ISS in penetrating injuries, in which multiple injuries are common. The ISS weights injuries to each body region equally, ignoring the importance of head injuries in mortality from trauma. Furthermore, mortality is not strictly an increasing function of the ISS. For example, the mortality rate for an ISS of 16 is higher than the mortality rate for an ISS of 17 because of the different combinations of AIS scores that make up each score. Another idiosyncrasy of the ISS is that many ISS values cannot occur, while other ISS values can result from multiple different combinations of AIS scores. This makes the ISS a heterogeneous score and reduces its predictive ability.
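The ISS definition and the idiosyncrasies just described can be sketched as follows. This is a minimal illustration under stated assumptions: injuries are assumed to arrive as (body region, AIS grade) pairs, the region labels are free-text strings chosen for this example, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement


def iss(injuries):
    """Injury Severity Score from (body_region, ais_grade) pairs.

    Only the highest AIS grade in each region counts; the 3 highest
    regional grades are squared and summed. Any AIS of 6 yields 75.
    """
    worst = {}
    for region, ais in injuries:
        if not 1 <= ais <= 6:
            raise ValueError(f"AIS grade out of range: {ais}")
        worst[region] = max(worst.get(region, 0), ais)
    if 6 in worst.values():
        return 75
    top_three = sorted(worst.values(), reverse=True)[:3]
    return sum(a * a for a in top_three)


def ais_combinations(target):
    """All sets of 3 regional AIS grades (0 = uninjured region) giving `target`."""
    combos = []
    for triple in combinations_with_replacement(range(6), 3):
        if sum(a * a for a in triple) == target:
            combos.append(tuple(sorted(triple, reverse=True)))
    return sorted(combos)


# ISS values between 1 and 75 that no combination of AIS grades can produce.
unattainable = [v for v in range(1, 76) if not ais_combinations(v)]

if __name__ == "__main__":
    patient = [("head and neck", 4), ("head and neck", 3),
               ("thorax", 3), ("bony pelvis and extremities", 2)]
    print("ISS:", iss(patient))            # second head injury is ignored
    print("ISS 16 from:", ais_combinations(16))
    print("ISS 17 from:", ais_combinations(17))
    print("Unattainable values:", unattainable)
```

In this sketch an ISS of 16 can arise only from a single AIS 4 injury, whereas an ISS of 17 can arise from an AIS 4 plus an AIS 1 injury or from one AIS 3 and two AIS 2 injuries, which illustrates how adjacent scores can describe quite different patients.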
Although the classic use of the ISS is to predict mortality from trauma, the ISS also has been noted to be a consistent predictor of postinjury multiple-organ failure (MOF). In developing predictive models for MOF, researchers categorized risk factors as related to tissue injury severity, cellular shock severity, the magnitude of the systemic inflammatory response to the injury, and host factors (eg, age, sex, comorbidity). Tissue injury severity is a major component of these predictive models, and it is readily quantifiable using the ISS. Recognizing the limitations of the ISS, researchers subsequently investigated the Anatomic Profile (AP) as an alternative measure of tissue injury severity, observing that the AP offered no advantage over the ISS in predicting postinjury MOF. Moreover, they found the AP difficult to calculate, with greater interrater variability than the ISS.
Recently, Osler et al reported a modified ISS (new ISS or NISS) based on the 3 most severe injuries regardless of body region. This simple but significant modification of the ISS avoids many of its previously acknowledged limitations. By preserving the AIS as the framework for injury severity scoring, the NISS remains familiar and user-friendly. Preliminary studies suggest that the NISS is a more accurate predictor of trauma mortality than the ISS, particularly in penetrating trauma. Other researchers demonstrated that the NISS is superior to the ISS as a measure of tissue injury in predictive models of postinjury MOF. Osler et al recommend that the NISS replace the ISS as the standard anatomic measure of injury severity.
Anatomic Profile
In response to the limitations of the ISS, researchers developed the AP. Unlike the ISS, the AP includes all serious injuries in a body region. Moreover, the AP appropriately weights head and torso injuries more heavily than other body regions. This index summarizes all serious injuries (AIS of 3 or greater) into 3 categories.
Category A includes the head and spinal cord. Category B encompasses the thorax and anterior neck. Category C includes all remaining serious injuries. A fourth category, category D, summarizes all nonserious injuries. Practitioners calculate each component as the square root of the sum of squares of the AIS scores of all injuries within that category. A category with no injury receives a score of zero. Using logistic regression, these AP component values are used to calculate a probability of survival. The AP performs better than the ISS in discriminating survivors from nonsurvivors and may provide a more rational basis for comparing injury severity between patients. However, the AP failed to garner much interest or support, probably due to its mathematical complexity and only modest improvement in predictive performance.
International Classification of Diseases
Another, more recent approach to anatomic injury scoring is based on the International Classification of Diseases, Ninth Revision (ICD-9) codes. This method is termed the ICD-9 Injury Severity Score (ICISS) and uses survival risk ratios (SRRs) calculated for each ICD-9 discharge diagnosis. SRRs are derived by dividing the number of survivors with a given ICD-9 code by the total number of patients with the same ICD-9 code. ICISS is calculated as the simple product of the SRRs for each of the patient’s injuries.
ICISS has some advantages over the ISS. First, it represents a true continuous variable that takes on values between 0 and 1. Second, it includes all injuries. Third, ICD-9 codes are readily available and do not require special training or expertise to determine. Finally, ICISS has better predictive power when compared to the ISS. Moreover, ICISS has the potential to better account for the effects of comorbidity on outcome by including the SRR for each comorbidity present.
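The arithmetic behind the AP components and ICISS can be sketched as follows. Several things here are assumptions for illustration only: the threshold of AIS 3 or greater for a "serious" injury, the region labels, the function names, and every SRR value, none of which comes from a real trauma registry.

```python
import math


def ap_components(injuries, serious_threshold=3):
    """Anatomic Profile components A-D from (region_group, ais) pairs.

    region_group is 'head_spine', 'thorax_neck', or 'other'. Injuries
    below `serious_threshold` (an assumed cutoff) fall into category D
    regardless of region. Each component is the square root of the sum
    of squared AIS grades; a category with no injury scores 0.
    """
    sums = {"A": 0, "B": 0, "C": 0, "D": 0}
    for group, ais in injuries:
        if ais < serious_threshold:
            key = "D"
        elif group == "head_spine":
            key = "A"
        elif group == "thorax_neck":
            key = "B"
        else:
            key = "C"
        sums[key] += ais * ais
    return {key: math.sqrt(total) for key, total in sums.items()}


def survival_risk_ratio(survivors: int, total: int) -> float:
    """SRR for one diagnosis code: survivors with the code / all patients with it."""
    return survivors / total


def iciss(srrs) -> float:
    """ICISS: the simple product of the SRRs of all of a patient's injuries."""
    return math.prod(srrs)


if __name__ == "__main__":
    injuries = [("head_spine", 4), ("head_spine", 3),
                ("thorax_neck", 3), ("other", 2)]
    print(ap_components(injuries))

    # Hypothetical SRRs for a patient with 3 coded injuries.
    print(iciss([0.90, 0.80, 0.95]))
```

Unlike the ISS, both measures use every injury: the second head injury raises component A, and each additional diagnosis with an SRR below 1 lowers the ICISS.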
Recently, researchers have shown that the ICISS outperforms the ISS in predicting other outcomes of interest (eg, hospital length of stay, hospital charges). Despite the apparent advantage of the ICISS, it has not yet replaced other methods of outcome analysis. Further validation is needed before it can be used widely.
COMBINED SCORES
Trauma and Injury Severity Score
The predictive capability of any model usually is improved with the inclusion of additional relevant information. Champion and colleagues exemplified this concept with the development of the TRISS. This method combines both anatomic and physiologic measures of injury severity (ISS and RTS, respectively) and patient age in order to predict survival from trauma. Recognizing the difference between blunt and penetrating injury, researchers developed separate models for each mechanism. The logistic regression equation predicts the probability of survival, P(s), as follows:
P(s) = 1 / (1 + e^(-b)), where b = b0 + b1(RTSc) + b2(ISS) + b3(age)
RTSc is the coded version of the RTS, and patient age is categorized such that age is equal to zero if the patient is younger than 55 years and age is equal to one otherwise. The coefficients b0 through b3 differ for blunt and penetrating trauma.
TRISS quickly became the standard methodology for outcome assessment. It appears to be valid for both adult and pediatric patients but has been criticized because (1) it is only moderately accurate for predicting survival; (2) it inherits the problems already noted with the ISS (eg, inhomogeneity, inability to account for multiple injuries to the same body region); (3) it incorporates no information related to preexisting conditions (eg, cardiac disease, chronic obstructive pulmonary disease, cirrhosis); (4) similar to the RTS, it cannot include intubated patients because respiratory rate and verbal responses are not obtainable; and (5) it does not account for patient mix, making comparisons between trauma centers difficult.
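The structure of the TRISS calculation can be sketched as follows. The coefficient values below are round-number placeholders invented for this illustration; they are not the published TRISS coefficients, which must be taken from a current reference. The dictionary and function names are likewise hypothetical.

```python
import math

# ILLUSTRATIVE PLACEHOLDERS ONLY (b0, b1 for RTSc, b2 for ISS, b3 for age).
# These are NOT the published TRISS coefficients.
EXAMPLE_COEFFICIENTS = {
    "blunt": (-1.0, 1.0, -0.08, -2.0),
    "penetrating": (-0.5, 1.1, -0.15, -2.5),
}


def triss_survival(rts_coded, iss, age_years, coefficients):
    """Probability of survival from the TRISS logistic equation.

    b = b0 + b1 * RTSc + b2 * ISS + b3 * age_index, where age_index is 0
    for patients younger than 55 years and 1 otherwise;
    P(s) = 1 / (1 + e^(-b)).
    """
    b0, b1, b2, b3 = coefficients
    age_index = 0 if age_years < 55 else 1
    b = b0 + b1 * rts_coded + b2 * iss + b3 * age_index
    return 1.0 / (1.0 + math.exp(-b))


if __name__ == "__main__":
    blunt = EXAMPLE_COEFFICIENTS["blunt"]
    # Same physiology and anatomy, different ages: only the age term changes.
    print(triss_survival(7.8408, 25, 30, blunt))
    print(triss_survival(7.8408, 25, 70, blunt))
```

A separate coefficient set per mechanism mirrors the separate blunt and penetrating models described above; with negative ISS and age coefficients, higher anatomic severity and older age both reduce the predicted probability of survival.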
A Severity Characterization of Trauma
In an attempt to address these shortcomings, Champion et al introduced A Severity Characterization of Trauma (ASCOT) in 1990 as an improvement over TRISS. ASCOT uses the AP in place of the ISS and categorizes age into deciles. In addition, the individual components of the coded RTS are included as independent predictors in the final logistic regression model. Despite these modifications, the predictive performance of ASCOT is only marginally better than that of TRISS. This, coupled with the complex nature of the AP component, has discouraged widespread acceptance of ASCOT.
ICISS also has been combined with age and the RTS in a manner similar to TRISS analysis. This model has superior predictive power and is better calibrated than TRISS. Moreover, this ICISS-based model is a superior predictor of resource utilization in injured patients.
CONCLUSION
Despite its imperfections, trauma severity scoring remains important for many reasons. ICISS may reflect a significant improvement in methodology, but this requires further validation. Continued research hopefully will improve methodology and make accurate trauma prediction a reality. Until that time, exercise caution in using existing severity scores for purposes for which they were not intended, eg, decisions to withdraw support or allocate limited resources.