How to Estimate Sensitivity and Specificity for a LinkSolv Model

You can estimate sensitivity and specificity for a linkage model by calculating expected values for the number of True Positive Links, False Positive Links, True Negative Links, and False Negative Links for all merged pairs above the cutoff probability. The Merged Pairs table for each imputation (MP1, MP2, etc.) lists these pairs, each Match Probability (the probability that a pair is a true link), and each linkage status:

LP = Linked Pairs, candidate pairs imputed as true one to one links.

IP = Imputed Pairs, candidate pairs imputed as true many to many links that share a common record with a linked pair.

MP = Merged Pairs, candidate pairs imputed as false links.

Match Probability = probability that a pair is a true link, so (1 – Match Probability) = probability that a pair is a false link.

For a many to many linkage:

Expected True Positives = Sum of Match Probability over pairs with status = LP, IP
Expected False Positives = Sum of (1 – Match Probability) over pairs with status = LP, IP
Expected False Negatives = Sum of Match Probability over pairs with status = MP
Expected True Negatives = Sum of (1 – Match Probability) over pairs with status = MP
Sensitivity = True Positives / (True Positives + False Negatives)
Specificity = True Negatives / (False Positives + True Negatives)

For a one to one linkage, only pairs with status = LP count as pairs imputed as true:

Expected True Positives = Sum of Match Probability over pairs with status = LP
Expected False Positives = Sum of (1 – Match Probability) over pairs with status = LP
Expected False Negatives = Sum of Match Probability over pairs with status = MP
Expected True Negatives = Sum of (1 – Match Probability) over pairs with status = MP
Sensitivity = True Positives / (True Positives + False Negatives)
Specificity = True Negatives / (False Positives + True Negatives)
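
The formulas above can be sketched in a few lines of Python. This is only an illustration, not part of LinkSolv: the function names and the four sample pairs are hypothetical, and the sketch assumes each merged pair is available as a (status, Match Probability) tuple.

```python
def expected_counts(pairs, one_to_one=False):
    """Return expected (TP, FP, FN, TN) from (status, match_probability) pairs.

    Hypothetical helper: pairs imputed as true are LP and IP for a many to
    many linkage, and LP only for a one to one linkage. Pairs imputed as
    false are those with status MP.
    """
    positive = {"LP"} if one_to_one else {"LP", "IP"}
    tp = fp = fn = tn = 0.0
    for status, p in pairs:
        if status in positive:
            tp += p          # probability the imputed link is true
            fp += 1.0 - p    # probability the imputed link is false
        elif status == "MP":
            fn += p          # probability a rejected pair is a true link
            tn += 1.0 - p    # probability a rejected pair is a false link
        # Any other status (IP in the one to one case) is not summed,
        # mirroring the one to one formulas above.
    return tp, fp, fn, tn


def sensitivity_specificity(tp, fp, fn, tn):
    """Sensitivity = TP / (TP + FN); Specificity = TN / (FP + TN)."""
    return tp / (tp + fn), tn / (fp + tn)


# Made-up sample pairs, for illustration only.
sample_pairs = [("LP", 0.9), ("IP", 0.6), ("MP", 0.3), ("MP", 0.1)]

many = sensitivity_specificity(*expected_counts(sample_pairs))
one = sensitivity_specificity(*expected_counts(sample_pairs, one_to_one=True))
print("many to many:", many)
print("one to one:  ", one)
```

The only difference between the two cases is which statuses count as imputed true, so a single flag is enough to switch between them.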

In general, you will get approximately the same Sensitivity and Specificity for each imputation. You can obtain multiply imputed estimates by combining Sensitivity and Specificity over all imputations.

Values for one imputed linkage, which is one draw from the posterior distribution for true matches, don’t really capture the spirit of multiple imputation. We analyze multiple imputations because one draw is seldom the whole story. As an example, suppose each of multiple imputations has Sensitivity = 0.50 and Specificity = 0.99.

Each imputation does a wonderful job of identifying true non-matches. However, each imputation does a relatively poor job of distinguishing true matches from false matches. This happens when many true matches don’t have very high probabilities – often the case for large files unless there are many match fields with high weights.

With multiple imputations, a true match found in any imputation contributes to analysis results when combined using SAS PROC MIANALYZE (c) SAS Institute or IVEware (c) University of Michigan. Effectively, each imputation is an independent diagnosis and we can combine sensitivity and specificity in parallel. Sensitivity increases while specificity decreases, as described in

Combining screening tests in series or parallel, http://www.epidemiolog.net, V. Schoenbach, 9/21/2005

Following this approach,

Multiply Imputed Sensitivity and Specificity

Imputations Combined    Parallel Sensitivity    Parallel Specificity
1                       50.00%                  99.00%
1,2                     75.00%                  98.01%
1,2,3                   87.50%                  97.03%
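
The parallel figures can be reproduced with a short Python sketch. It assumes the usual rule for independent tests combined in parallel, where a pair counts as positive if any imputation links it: combined sensitivity is 1 - (1 - Sensitivity)^n and combined specificity is Specificity^n. The function name is hypothetical, and the independence assumption is the one stated in the text.

```python
def parallel_combine(sensitivity, specificity, n):
    """Combine n independent imputations in parallel.

    A true match is missed only if every imputation misses it, and a true
    non-match stays negative only if every imputation rejects it.
    """
    combined_sensitivity = 1.0 - (1.0 - sensitivity) ** n
    combined_specificity = specificity ** n
    return combined_sensitivity, combined_specificity


for n in (1, 2, 3):
    sens, spec = parallel_combine(0.50, 0.99, n)
    print(f"{n} imputation(s): sensitivity {sens:.2%}, specificity {spec:.2%}")
# 1 imputation(s): sensitivity 50.00%, specificity 99.00%
# 2 imputation(s): sensitivity 75.00%, specificity 98.01%
# 3 imputation(s): sensitivity 87.50%, specificity 97.03%
```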

You have to build custom queries to sum match probabilities. Open LinkSolv and open the Manage Project dialog for the project of interest. Here are examples of queries for Crash Hospital match, Imputation 1:

qryManyToManyTruePos1

SELECT Count(CrashHospital__MP1.MatchProbability) AS CountOfMatchProbability, Sum(CrashHospital__MP1.MatchProbability) AS TruePositives, Sum(1-[MatchProbability]) AS FalsePositives
FROM CrashHospital__MP1
WHERE (((CrashHospital__MP1.KeepStatus)<>'MP'));

qryManyToManyTruePos1
CountOfMatchProbability TruePositives FalsePositives
164,450 147,152 17,298

qryOneToOneTruePos1

SELECT Count(CrashHospital__MP1.MatchProbability) AS CountOfMatchProbability, Sum(CrashHospital__MP1.MatchProbability) AS TruePositives, Sum(1-[MatchProbability]) AS FalsePositives
FROM CrashHospital__MP1
WHERE (((CrashHospital__MP1.KeepStatus)='LP'));

qryOneToOneTruePos1
CountOfMatchProbability TruePositives FalsePositives
164,450 147,152 17,298

qryFalseNegatives1

SELECT Count(CrashHospital__MP1.MatchProbability) AS CountOfMatchProbability, Sum(CrashHospital__MP1.MatchProbability) AS FalseNegatives, Sum(1-[MatchProbability]) AS TrueNegatives
FROM CrashHospital__MP1
WHERE (((CrashHospital__MP1.KeepStatus)='MP'));

qryFalseNegatives1
CountOfMatchProbability FalseNegatives TrueNegatives
1,417,356 146,509 1,270,847
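
If you want to check the query logic outside LinkSolv, the same sums can be tried against a tiny stand-in table. The sketch below uses Python's built-in sqlite3 module rather than the LinkSolv database. The table and column names are copied from the queries above, but the four rows are made up and the helper name is hypothetical.

```python
import sqlite3

# In-memory stand-in for the Merged Pairs table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CrashHospital__MP1 (KeepStatus TEXT, MatchProbability REAL)"
)
conn.executemany(
    "INSERT INTO CrashHospital__MP1 VALUES (?, ?)",
    [("LP", 0.9), ("IP", 0.6), ("MP", 0.3), ("MP", 0.1)],
)


def sum_probabilities(connection, where_clause):
    """Return (count, sum of p, sum of 1 - p) for rows matching where_clause.

    where_clause is trusted, hard-coded text in this sketch; do not build it
    from user input.
    """
    query = (
        "SELECT Count(MatchProbability), Sum(MatchProbability), "
        "Sum(1 - MatchProbability) FROM CrashHospital__MP1 WHERE " + where_clause
    )
    return connection.execute(query).fetchone()


many_to_many = sum_probabilities(conn, "KeepStatus <> 'MP'")  # TP, FP
one_to_one = sum_probabilities(conn, "KeepStatus = 'LP'")     # TP, FP
rejected = sum_probabilities(conn, "KeepStatus = 'MP'")       # FN, TN
print(many_to_many, one_to_one, rejected)
```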

You can use Excel © Microsoft Corporation to do the math:

For MP1, One to One

True Positives = Sum (Match Probability) over Status = LP             147,152
False Positives = Sum (1 – Match Probability) over Status = LP        17,298
False Negatives = Sum (Match Probability) over Status = MP            146,509
True Negatives = Sum (1 – Match Probability) over Status = MP         1,270,847
Sensitivity = True Positives / (True Positives + False Negatives)     50.11%
Specificity = True Negatives / (False Positives + True Negatives)     98.66%
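
If you would rather not use a spreadsheet, the same arithmetic is a few lines of Python. The four sums are the query results reported above for MP1 (one to one); the variable names are just for illustration.

```python
# Sums reported by the queries for Imputation 1 (one to one).
true_positives = 147_152
false_positives = 17_298
false_negatives = 146_509
true_negatives = 1_270_847

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (false_positives + true_negatives)

print(f"Sensitivity = {sensitivity:.2%}")  # Sensitivity = 50.11%
print(f"Specificity = {specificity:.2%}")  # Specificity = 98.66%
```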