
Why Isn't Overall Accuracy Sufficient?

Many changes are affecting the IVD world at present: regulatory changes, such as the impending implementation of the IVD Regulation in Europe, and new disease states, such as SARS-CoV-2, that have been affecting all our lives.

 

I was recently reading a performance summary for a SARS-CoV-2 antigen assay in which the manufacturer claimed the assay was 90% accurate. A statement like this is not sufficient by itself to describe product performance. It sounds good, so where is the problem? The problem is that a claim of 90% accuracy alone gives no way to determine the sensitivity, the specificity, or the positive and negative predictive values. Here are some definitions and a couple of examples to illustrate this point.

This is the standard 2x2 table used to compare a new qualitative assay method with a reference method or clinical truth.

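Columns: Reference (or Clinical Truth). Rows: Method X.

| Method X | Reference Positive | Reference Negative | Total   |
|----------|--------------------|--------------------|---------|
| Positive | TP                 | FP                 | TP + FP |
| Negative | FN                 | TN                 | FN + TN |
| Total    | TP + FN            | FP + TN            | N       |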

 

Where:

TP (true positive) = reference positive and method positive

FP (false positive) = reference negative and method positive

FN (false negative) = reference positive and method negative

TN (true negative) = reference negative and method negative

Definitions of key performance statistics:

Accuracy = 100 x (TP+TN)/N

Sensitivity = 100 x TP/(TP+FN)

Specificity = 100 x TN/(FP+TN)

Disease prevalence = 100 x (TP+FN)/N

Positive Predictive Value (PPV) = 100 x TP/(TP+FP)

Negative Predictive Value (NPV) = 100 x TN/(FN+TN)
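The definitions above translate directly into a few lines of code. The following is a minimal sketch, not a validated tool: the function name `diagnostic_metrics` and its dictionary keys are hypothetical, chosen here for illustration, and the counts passed in are those of the Method 1 and Method 2 examples in this post.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return the performance statistics (in percent) for a 2x2 table.

    Hypothetical helper for illustration; it applies the formulas
    from the definitions in this post and nothing more.
    """
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("the 2x2 table is empty")

    def pct(numerator, denominator):
        # A statistic is undefined when its denominator is zero.
        return 100.0 * numerator / denominator if denominator else float("nan")

    return {
        "accuracy": pct(tp + tn, n),
        "sensitivity": pct(tp, tp + fn),
        "specificity": pct(tn, fp + tn),
        "prevalence": pct(tp + fn, n),
        "ppv": pct(tp, tp + fp),
        "npv": pct(tn, fn + tn),
    }


# Counts taken from the Method 1 and Method 2 example tables in this post.
method_1 = diagnostic_metrics(tp=99, fp=50, fn=1, tn=350)
method_2 = diagnostic_metrics(tp=52, fp=2, fn=48, tn=398)

for name, stats in (("Method 1", method_1), ("Method 2", method_2)):
    summary = ", ".join(f"{key}={value:.1f}%" for key, value in stats.items())
    print(f"{name}: {summary}")
```

Returning every statistic together, rather than accuracy alone, mirrors the point of this post: the same four counts are needed for all of them, so there is little reason to report only one.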

 

Accuracy estimates what percentage of the total N samples, across both reference-positive and reference-negative samples, matched the reference or clinical truth. While this is a “feel good” number, it alone is insufficient as an estimate of product performance. For qualitative diagnostic products, the intended use population is typically made up of two groups that the assay is trying to identify: those who have the disease (or, in this case, the virus) and those who don’t.

Accuracy is a weighted average of sensitivity and specificity, with the weights set by the proportions of reference-positive and reference-negative samples. An accuracy of 90% therefore doesn’t tell you what the sensitivity or the specificity is. Because accuracy is a weighted average, it always lies between the two: it is likely that one of them is lower than 90% while the other is higher, although it is also possible that both are close to 90%. The problem is that it is impossible to determine what the performance actually is if the only statistic provided is accuracy. The exceptions are accuracies of 100% and 0%, in which case both sensitivity and specificity must also be 100% (or 0%).
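The weighted-average statement follows from the definitions given earlier. As a short sketch, write p for the prevalence, Se for the sensitivity and Sp for the specificity, all as fractions rather than percentages (these symbols are shorthand introduced here, not notation from a standard):

```latex
\[
\text{Accuracy}
  = \frac{TP + TN}{N}
  = \frac{TP + FN}{N}\cdot\frac{TP}{TP + FN}
  + \frac{FP + TN}{N}\cdot\frac{TN}{FP + TN}
  = p\,\mathrm{Se} + (1 - p)\,\mathrm{Sp}
\]
\[
\text{With } p = 0.20:\quad
0.20 \times 0.99 + 0.80 \times 0.875 = 0.898,
\qquad
0.20 \times 0.52 + 0.80 \times 0.995 = 0.900
\]
```

The two numeric lines use the sensitivity and specificity of the two examples in this post. Very different (Se, Sp) pairs land on essentially the same accuracy, which is why accuracy alone cannot be inverted to recover either one.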

Consider these two examples. In both, the number of reference-positive samples is held constant at 100 out of a total of 500 observations, a prevalence of 20%. Both examples have an estimated accuracy of about 90% (89.8% for Method 1 and 90.0% for Method 2).

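Columns: Reference (or Clinical Truth). Rows: Method 1.

| Method 1 | Reference Positive | Reference Negative | Total |
|----------|--------------------|--------------------|-------|
| Positive | 99                 | 50                 | 149   |
| Negative | 1                  | 350                | 351   |
| Total    | 100                | 400                | 500   |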

 

 

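Columns: Reference (or Clinical Truth). Rows: Method 2.

| Method 2 | Reference Positive | Reference Negative | Total |
|----------|--------------------|--------------------|-------|
| Positive | 52                 | 2                  | 54    |
| Negative | 48                 | 398                | 446   |
| Total    | 100                | 400                | 500   |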

| Statistic   | Method 1 | Method 2 |
|-------------|----------|----------|
| Accuracy    | 90%      | 90%      |
| Sensitivity | 99.0%    | 52.0%    |
| Specificity | 87.5%    | 99.5%    |
| PPV         | 66.4%    | 96.3%    |
| NPV         | 99.7%    | 89.2%    |

Both Method 1 and Method 2 have an accuracy of 90%, but is their performance the same?  No.

Is it clear which one would always be preferred over the other? No. 

It depends on the intended use/purpose of the assay.  Generally speaking, the positive and negative predictive values provide more insight into the benefit and risk of the assay.

Sensitivity and specificity are driven by the performance of the product and are independent of prevalence.  Positive and negative predictive values are driven by both the product performance (sensitivity and specificity) and the prevalence.
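To make that dependence concrete, PPV and NPV can be computed from sensitivity, specificity and prevalence alone. The sketch below is illustrative only: `predictive_values` is a hypothetical helper name, the sensitivity and specificity are those of Method 1 in this post, and the 5% prevalence is an assumed value chosen purely for contrast with the 20% used in the examples.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) as fractions.

    All inputs are fractions strictly between 0 and 1. Hypothetical
    helper for illustration: it rebuilds the four cell proportions of
    the 2x2 table and then applies the PPV and NPV definitions.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Method 1 performance from this post: sensitivity 99.0%, specificity 87.5%.
# 0.20 is the prevalence in the examples; 0.05 is an assumed value.
for prevalence in (0.20, 0.05):
    ppv, npv = predictive_values(0.99, 0.875, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

The sensitivity and specificity passed in are identical on both iterations; only the prevalence changes, and the predictive values move with it.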

The positive predictive value (PPV) estimates how likely it is that the subject or sample is actually positive when the new method gives a positive result.  If the PPV is 66.4% (99/149), as with Method 1, then a sample with a positive assay result (by the new method) has a 66.4% chance of actually being positive by the reference (or clinical truth).  If the PPV is 96.3% (52/54), as with Method 2, then a sample with a positive test result has a 96.3% chance of actually being positive by the reference (or clinical truth).

The NPV works the same way for negative test results of the new method.  For Method 1 the NPV is 99.7% (350/351), so if the Method 1 assay gives a negative result there is a 99.7% chance that this is correct with respect to the reference (or clinical truth).  For Method 2 the NPV is 89.2% (398/446), so there is an 89.2% chance that a negative Method 2 result is actually negative by the reference (or clinical truth).

Because both of these examples have the same accuracy of 90%, it should be clear that this statistic by itself doesn’t provide a sufficient description of an IVD product’s performance.  At a minimum, the sensitivity and specificity are both needed, along with the expected prevalence for the intended use population and the estimated PPV and NPV.  There are other statistics that can be useful in summarizing product performance, but they are outside the scope of this discussion.

Understanding the sensitivity, specificity, PPV and NPV is essential so that clinicians can select devices with the clinical performance they need. Reporting these statistics is also part of the transparency expected in regulatory submissions, including those under the IVD Regulation.

A final thought: some IVD products have multiple intended purposes and/or are used in different subgroups with different levels of prevalence.  This can require different levels of performance from the product and should be considered when developing acceptance criteria.

Next time, we will look at the impact of prevalence on PPV and NPV for different scenarios.

Leonard Buchner
Published: July 29, 2020