The previous blog, “Why Overall Accuracy Isn’t Sufficient”, discussed the need to review multiple measures of clinical performance for IVD assays and showed that overall accuracy by itself is not sufficient to describe the clinical performance of IVDs. It also introduced the concepts of sensitivity and specificity, along with positive and negative predictive values (PPV and NPV).
This blog dives deeper into PPV and NPV and shows how they can be used to set limits for sensitivity and specificity. The definitions used here are based on the standard 2x2 table (Table 1), which is commonly used to compare a new qualitative assay method to a reference method or clinical truth.
Table 1: Standard 2x2 table used to compare a new assay to a reference method
- TP (true positive) = reference positive and method positive
- FP (false positive) = reference negative and method positive
- FN (false negative) = reference positive and method negative
- TN (true negative) = reference negative and method negative
Definitions of key performance statistics:
- Accuracy = 100 x (TP+TN)/N, where N = TP+FP+FN+TN
- Sensitivity = 100 x TP/(TP+FN)
- Specificity = 100 x TN/(FP+TN)
- Disease prevalence = 100 x (TP+FN)/N
- Positive Predictive Value (PPV) = 100 x TP/(TP+FP)
- Negative Predictive Value (NPV) = 100 x TN/(FN+TN)
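The definitions above can be sketched as a small Python function. This is a minimal illustration only: the function name `performance_stats` and the example counts are hypothetical and are not taken from any table in this blog.

```python
def performance_stats(tp, fp, fn, tn):
    """Return the key performance statistics (in percent) for a 2x2 table of counts."""
    n = tp + fp + fn + tn  # total number of samples in the 2x2 table
    return {
        "accuracy": 100 * (tp + tn) / n,
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (fp + tn),
        "prevalence": 100 * (tp + fn) / n,
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (fn + tn),
    }


# Hypothetical counts, chosen only to exercise the formulas
stats = performance_stats(tp=90, fp=5, fn=10, tn=895)
for name, value in stats.items():
    print(f"{name}: {value:.1f}%")
```

The sketch does not guard against empty rows or columns (for example, no positive results at all), which a real analysis would need to handle.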
As the 2x2 table shows, sensitivity and specificity are independent of prevalence: sensitivity is calculated using only the reference-positive results and specificity using only the reference-negative results, so the ratio of positives to negatives does not enter either calculation. Let’s take a closer look at this with an example.
If we were performing a study to estimate sensitivity and specificity, the expected values of both would remain the same whether the prevalence was 50% or 1% (Tables 2a and 2b).
Table 2: Examples for sensitivity and specificity with A.) a prevalence of 50% and B.) a prevalence of 1%
It’s important to remember that sensitivity and specificity look at performance relative to the clinical truth of the patient or the reference assay. If the patient is truly positive (or negative), what percentage of the time does the new assay get it right?
Positive and negative predictive values look at the 2x2 table from the direction of the new assay’s result. Given a positive (or negative) result from the new assay, what percentage of the time is that result correct? In other words, what percentage of the new assay’s positive (or negative) results are actually positive (or negative) clinically or by the reference assay? PPV answers this question for the new assay’s positive results and NPV answers it for the negative results.
These are key pieces of information a clinician needs to determine the next step in the diagnostic pathway: when the test result is positive (or negative), how likely is it to be true?
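The independence of sensitivity and specificity from prevalence can be illustrated with expected cell counts. The sketch below assumes a hypothetical study size of 10,000 samples and an assay with 95% sensitivity and 99% specificity; the function name `expected_counts` and the study size are assumptions for illustration and are not the counts from Table 2.

```python
def expected_counts(n, prevalence, sensitivity, specificity):
    """Expected 2x2 cell counts (TP, FP, FN, TN); rates are fractions between 0 and 1."""
    positives = n * prevalence      # reference-positive samples
    negatives = n - positives       # reference-negative samples
    tp = positives * sensitivity
    fn = positives - tp
    tn = negatives * specificity
    fp = negatives - tn
    return tp, fp, fn, tn


# Assumed study size of 10,000; same assay at two different prevalences
for prevalence in (0.50, 0.01):
    tp, fp, fn, tn = expected_counts(10_000, prevalence, 0.95, 0.99)
    sens = 100 * tp / (tp + fn)
    spec = 100 * tn / (fp + tn)
    print(
        f"prevalence {prevalence:.0%}: TP={tp:.0f} FP={fp:.0f} FN={fn:.0f} TN={tn:.0f} "
        f"-> sensitivity {sens:.1f}%, specificity {spec:.1f}%"
    )
```

The cell counts change considerably between the two prevalences, but the sensitivity and specificity computed from them do not.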
For example, the two 2x2 tables above (Test A and Test B, Tables 2a and 2b) have exactly the same sensitivity and specificity (95% and 99% respectively), with Test A having a prevalence of 50% and Test B a prevalence of 1%. The PPV goes from 99% at 50% prevalence down to 49% at 1% prevalence. A PPV of 99% indicates that a positive assay result has a 99% chance of being correct. Likewise, with a 49% PPV, there is only a 49% chance that a patient with a positive result is actually positive. Depending on the intended use of the product, one, both, or neither of these predictive values might be sufficient. The important thing to remember is that PPV and NPV show how likely it is that a new assay’s test result is correct.
Because the predictive values depend on the sensitivity and specificity of the assay and on the prevalence of the disease or condition in the intended use population, the same assay used for different intended purposes can have vastly different predictive values. When determining the sensitivity and specificity requirements for an assay, it is critical to use the expected prevalence to estimate the expected PPV and NPV and to review them in the context of the intended use.
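The PPV figures in this example can be reproduced directly from sensitivity, specificity, and prevalence, without building a full 2x2 table. The following is a minimal sketch; the function name `predictive_values` is hypothetical, and all inputs are expressed as fractions rather than percentages.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) in percent; all inputs are fractions between 0 and 1."""
    # Expected proportion of the population falling in each cell of the 2x2 table
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return 100 * tp / (tp + fp), 100 * tn / (fn + tn)


# Same assay (95% sensitivity, 99% specificity) at the two prevalences discussed
for prevalence in (0.50, 0.01):
    ppv, npv = predictive_values(0.95, 0.99, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0f}%, NPV {npv:.1f}%")
```

Note that the NPV moves in the opposite direction: it is higher at 1% prevalence than at 50%, because almost all negative results come from truly negative patients when the condition is rare.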
There are different uses or purposes for IVDs, such as screening, diagnosis, and prognosis. How the clinician will use the assay’s test result should drive the requirements for PPV and NPV. How likely an incorrect result is to occur when using the new assay and the different risks for false positives and false negatives must be considered. How low can the PPV and NPV be and still have a positive benefit-risk for the use of this new assay? This in turn will drive the requirements for sensitivity and specificity, given an estimated prevalence in the intended use population.
For example, let’s assume there is a need for an assay where the PPV must be ≥ 90% and the NPV ≥ 99%, with an expected prevalence of 20%. This means a high level of confidence is needed when the new assay’s result is negative, while the clinicians can tolerate somewhat more false positives from the new assay. Here’s one set of sensitivity and specificity that meets the PPV and NPV requirements at 20% prevalence (Table 3).
Table 3: Set of PPVs and NPVs per prevalence for a 96% sensitivity and 98% specificity
A sensitivity of 96% and a specificity of 98% at 20% prevalence meet the requirements for PPV (≥90%) and NPV (≥99%): the PPV is about 92.3% and the NPV is about 99.0% (98.99% before rounding, so the NPV requirement is met only after rounding). Many combinations of sensitivity and specificity at 20% prevalence will meet these PPV and NPV requirements. Depending on the specifics of the assay, one combination may be easier to achieve than another. This should be considered before setting the sensitivity and specificity requirements for the assay.
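One way to explore which combinations qualify is a simple grid search over candidate sensitivity and specificity values. The sketch below is illustrative only: the function names, the 90% to 99% whole-percent grid, and the choice to round the predictive values to one decimal place before comparing them to the requirements are all assumptions, not part of the original analysis.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) in percent; all inputs are fractions between 0 and 1."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return 100 * tp / (tp + fp), 100 * tn / (fn + tn)


def meets_requirements(sensitivity, specificity, prevalence,
                       min_ppv=90.0, min_npv=99.0, decimals=1):
    """Check PPV/NPV against minimums after rounding (the rounding is an assumption)."""
    ppv, npv = predictive_values(sensitivity, specificity, prevalence)
    return round(ppv, decimals) >= min_ppv and round(npv, decimals) >= min_npv


# Candidate (sensitivity %, specificity %) pairs on an assumed whole-percent grid
candidates = [
    (sens, spec)
    for sens in range(90, 100)
    for spec in range(90, 100)
    if meets_requirements(sens / 100, spec / 100, prevalence=0.20)
]

for sens, spec in candidates:
    ppv, npv = predictive_values(sens / 100, spec / 100, 0.20)
    print(f"sensitivity {sens}%, specificity {spec}%: PPV {ppv:.1f}%, NPV {npv:.1f}%")
```

Setting `decimals` to a larger value makes the check stricter; without rounding, the 96%/98% pair falls just short of the NPV requirement.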
Another reason to review how prevalence impacts PPV and NPV for a given set of sensitivity and specificity values is that all values of prevalence are estimates. Prevalence should be estimated from a clinical trial that truly represents the intended use population and is based on the clinical truth or reference assay (not the new assay). In the example above, if the best estimate for prevalence was 20% but the true value could be as low as 10%, this could be a problem, as the PPV drops below 90% (to about 84% at 10% prevalence). Therefore, it’s important to understand how good the estimates are for the prevalence and how well they relate to the assay’s specific intended use population.
When thinking about setting requirements for an IVD assay, it is important to understand how the assay will be used, what minimum levels for PPV and NPV can be tolerated in the context of the clinical use, and the prevalence in the intended use population. When starting a new assay development project, some or all of this information may be unknown. It is critical during early development to fill in these unknowns to one degree or another in order to avoid developing an IVD assay that doesn’t meet any clinical needs.
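The sensitivity of PPV and NPV to the prevalence estimate can be checked with a simple sweep. This sketch assumes the 96% sensitivity and 98% specificity from the example above; the function name and the particular prevalence values swept are chosen for illustration only.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) in percent; all inputs are fractions between 0 and 1."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return 100 * tp / (tp + fp), 100 * tn / (fn + tn)


# Sweep an assumed range of plausible prevalences around the 20% estimate
for prevalence in (0.05, 0.10, 0.15, 0.20, 0.25, 0.30):
    ppv, npv = predictive_values(0.96, 0.98, prevalence)
    flag = "" if ppv >= 90 else "  <- PPV below 90%"
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1f}%, NPV {npv:.1f}%{flag}")
```

Running the sweep over the plausible range of the prevalence estimate, rather than at a single point, shows how much margin the chosen sensitivity and specificity leave.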