Interactive Cardiovascular and Thoracic Surgery 3:319-322(2004)
© 2004 European Association of Cardio-Thoracic Surgery


Work in progress report - Cardiac general

Receiver operating characteristic curves and comparison of cardiac surgery risk stratification systems

Giedrius Vanagas*

Kaunas University of Medicine Heart Center, Department of Cardiac Surgery, Eiveniu 2, Kaunas, Lithuania

* Tel.: +370-650-91393; fax: +370-37-326934
kaunosirdies{at}centras.lt

Received December 15, 2003; received in revised form January 16, 2004; accepted January 19, 2004


    Abstract
We use past experience every day when we choose one therapy over another; we frequently base our decisions on the relative probability that a particular treatment will be successful in an individual patient. Preoperative risk score systems are an essential tool for risk assessment in cardiac surgery, but any risk stratification system will make some diagnostic errors. During the last decade, examining the performance of cardiac surgery risk stratification systems has become very popular. Many studies show that these systems have high predictive value but overpredict mortality in the sample population. These articles leave unclear what drives the overprediction of mortality. We review the main principles of using receiver operating characteristic curves to assess the performance of risk stratification systems and give basic statistical explanations of errors in mortality prediction.

Key Words: Cardiac surgery; Score systems; Validation; Receiver operating characteristic curves; Risk stratification


    1. Introduction
Preoperative risk score systems are an essential tool for risk assessment in cardiac surgery. Twelve score systems have been developed to predict mortality after adult heart surgery [1]. Most cardiac surgery risk stratification systems were designed primarily to predict mortality and postoperative morbidity, the latter being acknowledged as the major determinant of hospital cost, length of stay and quality of care [2,3]. Prognostic scoring helps the doctor, the patient and the family to weigh the risks and benefits of medical care and clarifies their expectations. Accurate and validated risk stratification will result in better communication with patients and their relatives. Treatment is then more likely to be consistent with the patient's value system, and the score can serve as an objective outcome predictor that supports the appropriate allocation of financial and human resources.

For many clinicians, the most important question regarding prognostic scoring systems is how they can help with individual patient care decisions. Many physicians believe that group statistics do not apply to individuals. Although individual patients have unique characteristics, they also share many common features with previous patients, and consideration of these similarities permits us to anticipate patients' responses and predict their outcomes. We use past experience every day when we choose one therapy over another; we frequently base our decisions on the relative probability that a particular treatment will be successful in an individual patient.

Statistical predictions of outcome produced by prognostic scoring systems are in most cases more reliable than clinical predictions and apparently at least as accurate. These findings suggest that the predictions available from prognostic scoring systems could be useful in clinical judgment and decision making for individual patients [4]. Although all these score systems are based on patient-derived data such as age, gender and co-morbidity, there are considerable differences between scores with regard to their design and validity.

During the last decade, examining the performance of cardiac surgery risk stratification systems has become very popular. Many studies show that risk stratification systems have high predictive value but overpredict mortality in the sample population [5–9]. These articles leave unclear what drives the overprediction of mortality. This article reviews the main principles of using receiver operating characteristic (ROC) curves to assess the performance of risk stratification systems and gives basic statistical explanations of errors in mortality prediction.

2. Sensitivity, specificity and predictive value

Any risk stratification system will make some diagnostic errors. Commonly used measures of the performance of a test are the sensitivity (Se) and specificity (Sp). For risk stratification purposes each case is classified as positive (predicted death) or negative (predicted survival) according to the system-predicted mortality. Let us assume that the true outcome is known, so that all patients can be allocated to the groups of survivors or non-survivors without any errors. We can then study how frequently the test we applied gives rise to true positive (TP), false positive (FP), true negative (TN) and false negative (FN) results. This leads to the following matrix (Table 1).


Table 1 Matrix of risk stratification system observed and predicted mortality

 
Se refers to how good a risk stratification system is at correctly identifying patients who will die. When calculating sensitivity we are therefore interested only in the group of non-survivors. From this we can derive the following formula [10]:

$$\mathrm{Se} = \frac{TP}{TP + FN}$$

Sp is concerned with how good the risk stratification system is at correctly identifying patients who will survive cardiac surgery, and is calculated by the following formula [10]:

$$\mathrm{Sp} = \frac{TN}{TN + FP}$$

Ideally we want both Se and Sp to be one. Increasing either Se or Sp will usually result in a decrease in the other measure. That means in practice it is impossible to predict surgery outcomes with 100% Se and Sp.
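The two measures can be illustrated with a minimal sketch, assuming the standard definitions Se = TP/(TP+FN) and Sp = TN/(TN+FP). The counts below are hypothetical and are not taken from any of the cited studies.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Se: fraction of actual non-survivors whom the system predicted to die."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Sp: fraction of actual survivors whom the system predicted to survive."""
    return tn / (tn + fp)


# Hypothetical cohort of 1000 operated patients (illustrative numbers only).
tp, fp, tn, fn = 30, 120, 830, 20

se = sensitivity(tp, fn)
sp = specificity(tn, fp)
print(f"Se = {se:.3f}, Sp = {sp:.3f}")
```

In this invented cohort the system finds 30 of the 50 deaths but also flags 120 of the 950 survivors, so neither measure reaches one.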

Predictive values can also be used to assess the performance of risk stratification systems. The positive predictive value (PPV) is the percentage of patients classified as positive (death) by the risk stratification system who actually die, whereas the negative predictive value (NPV) is the percentage of patients classified as negative (alive) who actually survive. They are calculated by the following formulae [10]:

$$\mathrm{PPV} = \frac{TP}{TP + FP} \times 100\%, \qquad \mathrm{NPV} = \frac{TN}{TN + FN} \times 100\%$$
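As a rough illustration of the predictive values, the sketch below assumes the standard definitions PPV = TP/(TP+FP) and NPV = TN/(TN+FN), returned as fractions rather than percentages. The counts are hypothetical.

```python
def positive_predictive_value(tp: int, fp: int) -> float:
    """PPV: fraction of patients predicted to die who actually died."""
    return tp / (tp + fp)


def negative_predictive_value(tn: int, fn: int) -> float:
    """NPV: fraction of patients predicted to survive who actually survived."""
    return tn / (tn + fn)


# Hypothetical counts (illustrative only).
tp, fp, tn, fn = 30, 120, 830, 20

ppv = positive_predictive_value(tp, fp)
npv = negative_predictive_value(tn, fn)
print(f"PPV = {100 * ppv:.1f}%, NPV = {100 * npv:.1f}%")
```

With a low observed mortality the NPV can be high even when the PPV is poor, because most patients survive whatever the system predicts.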



    3. Basic principles of ROC curves
ROC curves are widely used in the medical literature to assess the performance of a diagnostic test. The ROC curve is a graphical technique for establishing the optimal cut point. It derives from the early days of radar and sonar detection in the Second World War, hence the name receiver operating characteristic, and was only later applied in medicine [8,11]. In order to construct an ROC curve we need to calculate the Se and Sp of the test for each possible cut point value.

In the ROC graph, the x-axis is 1–Sp (the false positive rate) and the y-axis is Se (the true positive rate). We draw a diagonal line on the graph from (0,0) in the lower left hand corner to (1,1) in the upper right hand corner. This line reflects the characteristics of a test with no discriminating power. The underlying assumption of ROC analysis is that a diagnostic variable is used to discriminate between two states after cardiac surgery: non-survivors and survivors. The diagnostic Se and Sp are functions of the selected cut point value. ROC analysis assesses the diagnostic performance of the system in terms of Se and (1–Sp) for each possible cut point value of the test.
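The construction described above can be sketched as follows. The function name `roc_points`, the risk scores and the outcomes are all hypothetical, and the sketch assumes that a higher score means a higher predicted risk, so that a patient is classified as positive when the score is at or above the cut point.

```python
def roc_points(scores, died):
    """Return (1 - Sp, Se) pairs, one per cut point, running from (0, 0) to (1, 1)."""
    n_pos = sum(died)              # observed non-survivors
    n_neg = len(died) - n_pos      # observed survivors
    points = [(0.0, 0.0)]          # cut point above every score: nobody is positive
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, d in zip(scores, died) if s >= cut and d)
        fp = sum(1 for s, d in zip(scores, died) if s >= cut and not d)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical risk scores and outcomes (1 = died, 0 = survived).
scores = [2, 3, 5, 6, 8, 9, 11, 14]
died = [0, 0, 0, 1, 0, 0, 1, 1]

points = roc_points(scores, died)
for fpr, se in points:
    print(f"1-Sp = {fpr:.2f}, Se = {se:.2f}")
```

Lowering the cut point moves along the curve toward the upper right: Se rises, but so does the false positive rate.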

The area under the ROC curve (AUC) is a summary statistic of diagnostic performance. ROC plots for risk stratification systems with perfect discrimination between non-survivors and survivors pass through the coordinates (0,1), which represent 100% Se and Sp, and the AUC would be 1. AUC values can be graded as non-predictive (AUC 0.5), less predictive (0.5<AUC<0.7), moderately predictive (0.7<AUC<0.9), highly predictive (0.9<AUC<1) and perfect prediction (AUC 1) [8,12].
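A minimal sketch of the AUC calculation and the grading above, using the trapezoidal rule on a list of (1–Sp, Se) points. The ROC points are hypothetical. The text leaves the boundary values 0.7 and 0.9 unassigned; assigning them to the higher band, and treating values at or below 0.5 as non-predictive, are assumptions made only for this sketch.

```python
def auc_trapezoid(points):
    """Area under a ROC curve given as (1 - Sp, Se) points sorted by 1 - Sp."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def interpret_auc(auc: float) -> str:
    """Grade an AUC value; boundary handling is an assumption of this sketch."""
    if auc <= 0.5:
        return "non-predictive"
    if auc < 0.7:
        return "less predictive"
    if auc < 0.9:
        return "moderately predictive"
    if auc < 1.0:
        return "highly predictive"
    return "perfect"


# Hypothetical ROC points for one risk stratification system.
roc = [(0.0, 0.0), (0.0, 2 / 3), (0.4, 2 / 3), (0.4, 1.0), (1.0, 1.0)]

auc = auc_trapezoid(roc)
print(f"AUC = {auc:.3f} ({interpret_auc(auc)})")
```

The diagonal from (0,0) to (1,1) gives an area of 0.5 under the same rule, which is why 0.5 marks a test with no discriminating power.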

The AUC summarizes the ROC curve as a whole, and therefore attributes the same weighting to both relevant and irrelevant parts of the curve. In practice, one would not select cut point values from those parts of the ROC curve that have either maximum (lower left part) or minimum slope (upper right part) because other cut point values exist that lead to better Se without loss of Sp or better Sp without loss of Se, respectively [8]. The AUC statistic gives equal weighting to Se and Sp, which should be considered for interpretation.

4. Use of ROC analysis for comparison

ROC curves are most helpful when comparing two or more risk stratification systems. Risk stratification systems may be compared for several reasons. One example is the evaluation of a new risk stratification system against an established reference system. Often it is of interest to compare different risk stratification systems on a sample population to validate their performance [1,5,13,14].

The Se and Sp at a single cut point describe performance only at that cut point rather than the overall diagnostic performance of the test. AUCs are useful measures for comparing the overall diagnostic performance of two tests. However, given the equal weighting attributed to all parts under the curves, it is possible that the comparison of AUCs will be non-significant for two tests that differ in an area of practical relevance. Comparison of crossing ROC curves may also result in misleading inferences from AUC estimates.
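The caveat about crossing curves can be illustrated with two invented ROC curves that have exactly the same AUC but behave very differently at low false positive rates. Both curves and the helper names are hypothetical; the helper `se_at` uses linear interpolation between ROC points, which is an assumption of this sketch.

```python
def auc_trapezoid(points):
    """Area under a ROC curve given as (1 - Sp, Se) points sorted by 1 - Sp."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def se_at(points, fpr):
    """Se read off the curve at a given false positive rate (linear interpolation)."""
    best = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fpr <= x1:
            if x1 == x0:                      # vertical segment: take its top
                best = max(best, y0, y1)
            else:
                best = max(best, y0 + (y1 - y0) * (fpr - x0) / (x1 - x0))
    return best


# Two hypothetical systems whose ROC curves cross.
system_a = [(0.0, 0.0), (0.0, 0.5), (1.0, 1.0)]
system_b = [(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)]

auc_a, auc_b = auc_trapezoid(system_a), auc_trapezoid(system_b)
se_a, se_b = se_at(system_a, 0.1), se_at(system_b, 0.1)
print(f"AUC A = {auc_a:.2f}, AUC B = {auc_b:.2f}")
print(f"Se at 1-Sp = 0.1: A = {se_a:.2f}, B = {se_b:.2f}")
```

Both areas are 0.75, yet at a false positive rate of 0.1 system A detects far more non-survivors than system B, a difference the AUC alone does not show.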

ROC curve analysis does not allow conclusions about the accuracy of the values predicted by a risk stratification system. It is only an analysis of the discriminatory power of a given cardiac surgery risk stratification system, which can be compared with that of another system, and it gives no evidence on the actual accuracy of the predicted values.

Thompson [15] describes the use of accuracy indices for comparing two or more systems. Accuracy values reflect the possible discrepancies between predicted and observed values. The accuracy of a risk stratification system is the percentage of non-survivors and survivors correctly classified on the basis of the risk stratification system results:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \times 100\%$$

Low accuracy values indicate that the risk stratification system predicts mortality imprecisely, with predicted outcomes differing from those observed. High accuracy values indicate that the risk stratification system predicts mortality accurately.
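A short sketch of the accuracy index, assuming it is the fraction of all patients, survivors and non-survivors together, who are correctly classified. The counts are hypothetical.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all patients whose outcome was correctly classified."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical counts (illustrative only).
tp, fp, tn, fn = 30, 120, 830, 20

acc = accuracy(tp, fp, tn, fn)
print(f"Accuracy = {100 * acc:.1f}%")
```

Unlike the AUC, this index depends on the chosen cut point and on the observed mortality in the sample.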

5. Results interpretation. Type I and Type II errors

Each application of a diagnostic test is associated with specific consequences of the possible outcomes. When we use ROC analysis for risk stratification systems' comparison it is possible to err in either of two directions: we can disagree with something that is true or agree with something that is false.

The first possible error is that the risk stratification system predicts death when the patient in fact survives (a false positive result). Taking survival as the null hypothesis, this is a Type I error. The second possible error is that the system predicts survival when the patient in fact dies (a false negative result). This is a Type II error (Table 2).

Table 2 Distinguishing between Type I and Type II errors in mortality prediction

Given the negative and positive predictive values of the risk stratification systems being compared, we can calculate the false negative and false positive percentages (FNP and FPP) by the following formulae:

$$\mathrm{FNP} = 100\% - \mathrm{NPV} = \frac{FN}{TN + FN} \times 100\%, \qquad \mathrm{FPP} = 100\% - \mathrm{PPV} = \frac{FP}{TP + FP} \times 100\%$$

FPP is the percentage of patients wrongly classified as non-survivors, that is, the percentage of Type I errors. FNP is the percentage of patients wrongly classified as survivors, that is, the percentage of Type II errors. This classification of the results can explain why risk stratification systems with a high AUC or high predictive value overpredict the mortality rate: patients are wrongly classified as non-survivors by the risk stratification system, or in other words the percentage of Type I errors is high.
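The two percentages can be sketched numerically. The formulae here are an assumption: FNP and FPP are taken as the complements of NPV and PPV, which matches their description as the share of patients wrongly classified as survivors and as non-survivors respectively. The counts are hypothetical.

```python
def false_negative_percentage(tn: int, fn: int) -> float:
    """FNP (assumed = 100% - NPV): share of predicted survivors who died."""
    return 100.0 * fn / (tn + fn)


def false_positive_percentage(tp: int, fp: int) -> float:
    """FPP (assumed = 100% - PPV): share of predicted non-survivors who survived."""
    return 100.0 * fp / (tp + fp)


# Hypothetical counts (illustrative only).
tp, fp, tn, fn = 30, 120, 830, 20

fnp = false_negative_percentage(tn, fn)
fpp = false_positive_percentage(tp, fp)
predicted_deaths = tp + fp   # patients the system classified as non-survivors
observed_deaths = tp + fn    # patients who actually died
print(f"FNP = {fnp:.1f}%, FPP = {fpp:.1f}%")
print(f"predicted deaths = {predicted_deaths}, observed deaths = {observed_deaths}")
```

In this invented cohort the system predicts 150 deaths against 50 observed, and the gap equals the difference between the false positive and false negative counts.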


    6. Conclusions
Statistical predictions of outcome produced by prognostic scoring systems are in most cases reliable and can be useful for predicting mortality in cardiac surgery. Accurate and validated risk stratification will result in better communication with patients and their relatives.

Risk stratification systems with a large area under the receiver operating characteristic curve or a high predictive value overpredict the mortality rate because they misclassify patients as non-survivors or survivors, with a high percentage of Type I errors.

doi:10.1016/j.icvts.2004.01.008


    References

  1. Pinna-Pintor P, Bobbio M, Colangelo S, Veglia F, Giammaria M, Cuni D, Maisano F, Alfieri O. Inaccuracy of four coronary surgery risk-adjusted models to predict mortality in individual patients. Eur J Cardiothorac Surg. 2002;(199):204
  2. Metz CE, Herman BA, Shen JH. Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat Med. 1998;17(9):1033–1053
  3. Higgins TL. Quantifying risk and assessing outcome in cardiac surgery. J Cardiothorac Vasc Anesth. 1998;12:330–340
  4. Gefeller O, Brenner H. How to correct for chance agreement in the estimation of sensitivity and specificity of diagnostic tests. Methods Inf Med. 1994;33(2):180–186
  5. Vanagas G, Kinduris S, Leveckyte A. Comparison of various score systems for risk stratification in heart surgery. Medicina (Kaunas). 2003;39(8):739–744
  6. Yende S, Wunderink R. Validity of scoring systems to predict risk of prolonged mechanical ventilation after coronary artery bypass graft surgery. Chest. 2002;122(1):239–244
  7. Kurki TS, Jarvinen O, Kataja MJ, Laurikka J, Tarkka M. Performance of three preoperative risk indices; CABDEAL, EuroSCORE and Cleveland models in a prospective coronary bypass database. Eur J Cardiothorac Surg. 2002;21(3):406–410
  8. Greiner M, Pfeiffer D, Smith RD. Principles and practical application of the receiver-operating characteristic analysis for diagnostic tests. Prev Vet Med. 2000;45(1–2):23–41
  9. Wynne-Jones K, Jacson M, Grotte G. Limitations of the Parsonnet score for measuring risk stratified mortality in the north west of England. Heart. 2000;84:71–78
  10. Loong TW. Understanding sensitivity and specificity with the right side of the brain. Br Med J. 2003;327:716–719
  11. Zweig MH, Petrovich GN, Prijovich ZM. ROC plots display test accuracy, but are still limited by the study design. Clin Chem. 1993;39(6):1345–1346
  12. Swets JA. Measuring the accuracy of diagnostic systems. Science. 1988;240(4857):1285–1293
  13. Al-Ruzzeh S, Asimakopoulos G, Ambler G, Omar RZ, Hasan R, Fabri B, El-Gamel A, DeSouza A, Zamvar V, Griffin S, Keenan D, Triverdi U, Pullann M, Cale A, Cowen M, Taylor KM, Amrani M. Validation of four different risk stratification systems in patients undergoing off-pump coronary artery bypass surgery: a UK multicentre analysis of 2223 patients. Heart. 2003;89:432–435
  14. Karabulut H, Toraman F, Alhan C, Camur G, Evrenkaya S, Dagdelen S, Tarcan S. EuroSCORE overestimates the cardiac operative risk. Cardiovasc Surg. 2003;11(4):295–298
  15. Thompson ML. Assessing the diagnostic accuracy of a sequence of tests. Biostatistics. 2003;4(3):341–351


