Why Isn't Overall Accuracy Sufficient?

Many changes are currently affecting the IVD world: regulatory changes, such as the impending implementation of the IVD Regulation in Europe, and new disease states, such as SARS-CoV-2, which has been affecting all our lives.

I was recently reading a performance summary for a SARS-CoV-2 antigen assay in which the manufacturer claimed the assay was 90% accurate. A statement like this is not sufficient by itself to describe product performance. It sounds good, so where is the problem? The problem is that a claim of 90% accuracy alone gives no way to determine sensitivity, specificity, or the positive and negative predictive values. Here are some definitions and a couple of examples to illustrate this point.

This is a standard 2x2 table used to compare a new qualitative assay method with a reference method or clinical truth.

	Reference (or Clinical Truth)
Method X	Positive	Negative	Total
Positive	TP	FP	TP + FP
Negative	FN	TN	FN + TN
Total	TP + FN	FP + TN	N

Where:

TP (true positive) = reference positive and method positive

FP (false positive) = reference negative and method positive

FN (false negative) = reference positive and method negative

TN (true negative) = reference negative and method negative

Definitions of key performance statistics:

Accuracy = 100 x (TP + TN) / N

Sensitivity = 100 x TP / (TP + FN)

Specificity = 100 x TN / (FP + TN)

Disease prevalence = 100 x (TP + FN) / N

Positive Predictive Value (PPV) = 100 x TP / (TP + FP)

Negative Predictive Value (NPV) = 100 x TN / (FN + TN)

Accuracy estimates what percentage of the total N, across both reference-positive and reference-negative samples, matched the reference or clinical truth. While this is a “feel good” number, it alone is an insufficient estimate of product performance. For qualitative diagnostic products, the intended use population typically consists of two groups that the assay is trying to identify: those that have the disease (or, in this case, the virus) and those that don’t.

Accuracy is a weighted average of sensitivity and specificity, with the weights set by the proportions of reference-positive and reference-negative samples. An accuracy of 90% therefore doesn’t tell you what the sensitivity or specificity is. Because accuracy is a weighted average, it is likely that either sensitivity or specificity is lower than 90% while the other is higher. It is also possible that both are close to 90%. The problem is that it is impossible to determine the actual performance if accuracy is the only statistic provided. The exceptions are accuracies of 100% or 0%, in which case both sensitivity and specificity must also be 100% or 0%.
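To make the weighted-average point concrete, here is a minimal Python sketch. The function names are hypothetical, and the 20% prevalence is an illustrative assumption. It shows several quite different sensitivity/specificity pairs that all produce the same 90% accuracy.

```python
def weighted_accuracy(sensitivity, specificity, prevalence):
    """Accuracy as the prevalence-weighted average of sensitivity and specificity.

    All arguments are proportions between 0 and 1.
    """
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


def specificity_for_accuracy(accuracy, sensitivity, prevalence):
    """Specificity needed to reach a given accuracy at a given sensitivity."""
    return (accuracy - prevalence * sensitivity) / (1.0 - prevalence)


if __name__ == "__main__":
    prevalence = 0.20  # illustrative assumption: 20% of samples are reference positive
    for sensitivity in (0.50, 0.70, 0.90, 1.00):
        specificity = specificity_for_accuracy(0.90, sensitivity, prevalence)
        print(f"sensitivity = {sensitivity:.1%}, specificity = {specificity:.1%}")
```

Under these assumptions, a sensitivity anywhere from 50% to 100% is compatible with a reported accuracy of 90%, which is why the single number cannot be inverted back into the two statistics.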

Consider these two examples. In both, the number of reference-positive samples is held constant at 100 out of a total of 500 observations, a prevalence of 20%. Both examples have an estimated accuracy of approximately 90% (89.8% for Method 1 and 90.0% for Method 2).

	Reference (or Clinical Truth)
Method 1	Positive	Negative	Total
Positive	99	50	149
Negative	1	350	351
Total	100	400	500

	Reference (or Clinical Truth)
Method 2	Positive	Negative	Total
Positive	52	2	54
Negative	48	398	446
Total	100	400	500

Method 1 Statistic	Estimate
Accuracy	89.8%
Sensitivity	99.0%
Specificity	87.5%
PPV	66.4%
NPV	99.7%

Method 2 Statistic	Estimate
Accuracy	90.0%
Sensitivity	52.0%
Specificity	99.5%
PPV	96.3%
NPV	89.2%
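The estimates for both methods can be reproduced directly from the 2x2 counts using the definitions given earlier. The following Python sketch is one way to do that. The class and method names are hypothetical, and it assumes every row and column total is non-zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 comparison of a new method against a reference."""
    tp: int  # reference positive, method positive
    fp: int  # reference negative, method positive
    fn: int  # reference positive, method negative
    tn: int  # reference negative, method negative

    def summary(self):
        """Return the performance statistics as percentages.

        Assumes all row and column totals are non-zero.
        """
        n = self.tp + self.fp + self.fn + self.tn
        return {
            "Accuracy": 100 * (self.tp + self.tn) / n,
            "Sensitivity": 100 * self.tp / (self.tp + self.fn),
            "Specificity": 100 * self.tn / (self.fp + self.tn),
            "Prevalence": 100 * (self.tp + self.fn) / n,
            "PPV": 100 * self.tp / (self.tp + self.fp),
            "NPV": 100 * self.tn / (self.fn + self.tn),
        }


method_1 = TwoByTwo(tp=99, fp=50, fn=1, tn=350)
method_2 = TwoByTwo(tp=52, fp=2, fn=48, tn=398)

if __name__ == "__main__":
    for name, table in (("Method 1", method_1), ("Method 2", method_2)):
        print(name)
        for statistic, value in table.summary().items():
            print(f"  {statistic}: {value:.1f}%")
```

Run this way, Method 1's accuracy comes out as 89.8% (449/500) and Method 2's as 90.0% (450/500), so both round to 90% even though every other statistic differs.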

Both Method 1 and Method 2 have an accuracy of about 90%, but is their performance the same? No.

Is it clear which one would always be preferred over the other? No.

It depends on the intended use or purpose of the assay. Generally speaking, the positive and negative predictive values provide more insight into the benefit and risk of the assay.

Sensitivity and specificity are driven by the performance of the product and are independent of prevalence. Positive and negative predictive values are driven by both the product performance (sensitivity and specificity) as well as the prevalence.
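The dependence of the predictive values on prevalence follows from Bayes' theorem. Here is a minimal Python sketch. The function name is hypothetical, and all inputs are proportions between 0 and 1. It recovers PPV and NPV from sensitivity, specificity, and prevalence alone, using the two example methods at 20% prevalence.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) as proportions, derived via Bayes' theorem.

    Assumes the assay can produce both positive and negative results,
    so neither denominator is zero.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    prevalence = 0.20  # prevalence used in the two examples
    examples = (("Method 1", 0.99, 0.875), ("Method 2", 0.52, 0.995))
    for name, sensitivity, specificity in examples:
        ppv, npv = predictive_values(sensitivity, specificity, prevalence)
        print(f"{name}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Changing only the `prevalence` argument changes both returned values, while the sensitivity and specificity inputs stay fixed.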

The positive predictive value (PPV) estimates how likely it is that the subject or sample is actually positive when the new method gives a positive result. If the PPV is 66.4% (99/149), as with Method 1, then a sample with a positive assay result (by the new method) has a 66.4% chance of actually being positive by the reference (or clinical truth). If the PPV is 96.3% (52/54), as with Method 2, then a sample with a positive test result has a 96.3% chance of actually being positive by the reference (or clinical truth).

The NPV works the same way for negative test results of the new method. For Method 1 the NPV is 99.7% (350/351), so if the Method 1 assay gives a negative result there is a 99.7% chance that it is correct with respect to the reference (or clinical truth). For Method 2 the NPV is 89.2% (398/446), so there is an 89.2% chance that a negative Method 2 result is actually negative by the reference (or clinical truth).

As both of these examples have the same accuracy of about 90%, it should be clear that this statistic by itself doesn’t provide a sufficient description of an IVD product’s performance. At a minimum, both sensitivity and specificity are needed, along with the expected prevalence for the intended use population and the estimated PPV and NPV. There are other statistics that can be useful in summarizing product performance, but they are outside the scope of this discussion.

Understanding sensitivity, specificity, PPV, and NPV is essential so that clinicians can select devices with the clinical performance they need. These statistics are also part of the transparency expected in regulatory submissions, including those under the IVD Regulation.

A final thought: some IVD products have multiple intended purposes and/or are used in different subgroups with different levels of prevalence. This can require different levels of performance from the product and should be considered when developing acceptance criteria.

Next time, we will look at the impact of prevalence on PPV and NPV for different scenarios.

Published on: July 29, 2020