You can estimate sensitivity and specificity for a linkage model by calculating expected values for the number of True Positive Links, False Positive Links, True Negative Links, and False Negative Links for all merged pairs above the cutoff probability. The Merged Pairs table for each imputation (MP1, MP2, etc.) lists these pairs, each Match Probability (the probability that a pair is a true link), and each linkage status:

LP = Linked Pairs, candidate pairs imputed as true one to one links.

IP = Imputed Pairs, candidate pairs imputed as true many to many links that share a common record with a linked pair.

MP = Merged Pairs, candidate pairs imputed as false links.

Match Probability = probability that a pair is a true link, so (1 – Match Probability) = probability that a pair is a false link.

For a many to many linkage:

Expected True Positives = Sum of Match Probability over pairs with status = LP, IP

Expected False Positives = Sum of (1 – Match Probability) over pairs with status = LP, IP

Expected False Negatives = Sum of Match Probability over pairs with status = MP

Expected True Negatives = Sum of (1 – Match Probability) over pairs with status = MP

Sensitivity = True Positives / (True Positives + False Negatives)

Specificity = True Negatives / (False Positives + True Negatives)

For a one to one linkage, only pairs with status = LP count as pairs imputed as true:

Expected True Positives = Sum of Match Probability over pairs with status = LP

Expected False Positives = Sum of (1 – Match Probability) over pairs with status = LP

Expected False Negatives = Sum of Match Probability over pairs with status = MP

Expected True Negatives = Sum of (1 – Match Probability) over pairs with status = MP

Sensitivity = True Positives / (True Positives + False Negatives)

Specificity = True Negatives / (False Positives + True Negatives)
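The two sets of formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Pair` type, the function names, and the sample pairs are hypothetical and are not part of LinkSolv. The only things taken from the text are the status codes (LP, IP, MP) and the sums themselves.

```python
from typing import Iterable, NamedTuple, Tuple


class Pair(NamedTuple):
    """Hypothetical stand-in for one row of a Merged Pairs table."""
    status: str               # linkage status: "LP", "IP", or "MP"
    match_probability: float  # probability that the pair is a true link


def expected_counts(pairs: Iterable[Pair], many_to_many: bool) -> Tuple[float, float, float, float]:
    """Return expected (TP, FP, FN, TN) for one imputation."""
    # Statuses counted as "imputed true": LP and IP for many to many, LP only for one to one.
    positive = {"LP", "IP"} if many_to_many else {"LP"}
    tp = fp = fn = tn = 0.0
    for pair in pairs:
        if pair.status in positive:
            tp += pair.match_probability
            fp += 1.0 - pair.match_probability
        elif pair.status == "MP":
            fn += pair.match_probability
            tn += 1.0 - pair.match_probability
        # Any other pair (IP in a one to one linkage) is left out of all
        # four sums, following the formulas above.
    return tp, fp, fn, tn


def sensitivity_specificity(tp: float, fp: float, fn: float, tn: float) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); Specificity = TN / (FP + TN)."""
    return tp / (tp + fn), tn / (fp + tn)


# Made-up example pairs, not LinkSolv output.
pairs = [Pair("LP", 0.9), Pair("IP", 0.6), Pair("MP", 0.2), Pair("MP", 0.1)]
many = sensitivity_specificity(*expected_counts(pairs, many_to_many=True))
one = sensitivity_specificity(*expected_counts(pairs, many_to_many=False))
print(f"many to many: sensitivity={many[0]:.4f}, specificity={many[1]:.4f}")
print(f"one to one:   sensitivity={one[0]:.4f}, specificity={one[1]:.4f}")
```

The only difference between the two linkage types is which statuses feed the True Positive and False Positive sums. The False Negative and True Negative sums are the same in both cases.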

In general, you will get approximately the same Sensitivity and Specificity for each imputation. You can obtain multiply imputed estimates by combining Sensitivity and Specificity over all imputations.

Values for one imputed linkage, which is one draw from the posterior distribution for true matches, don’t really capture the spirit of multiple imputation. We analyze multiple imputations because one draw is seldom the whole story. As an example, suppose each of multiple imputations has Sensitivity = 0.50 and Specificity = 0.99.

Each imputation does a wonderful job of identifying true non-matches. However, each imputation does a relatively poor job of distinguishing true matches from false matches. This happens when many true matches don’t have very high probabilities – often the case for large files unless there are many match fields with high weights.

With multiple imputations, a true match found in any imputation contributes to analysis results when combined using SAS PROC MIANALYZE (c) SAS Corporation or IVEware (c) University of Michigan. Effectively, each imputation is an independent diagnosis and we can combine sensitivity and specificity in parallel. Sensitivity increases while specificity decreases, as described in:

Combining screening tests in series or parallel, http://www.epidemiolog.net, V. Schoenbach, 9/21/2005

Following this approach,

Multiply Imputed Sensitivity and Specificity

| Imputations Combined | Parallel Sensitivity | Parallel Specificity |
|---|---|---|
| 1 | 50.00% | 99.00% |
| 1, 2 | 75.00% | 98.01% |
| 1, 2, 3 | 87.50% | 97.03% |
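The parallel combination can be written out as a short calculation. The sketch below assumes the usual rule for independent tests combined in parallel: a true match is missed only if every imputation misses it, and a true non-match stays negative only if every imputation rejects it. The function name `combine_in_parallel` is hypothetical.

```python
from math import prod
from typing import Sequence, Tuple


def combine_in_parallel(sensitivities: Sequence[float],
                        specificities: Sequence[float]) -> Tuple[float, float]:
    """Combine per-imputation values, assuming the imputations are independent."""
    # A true match is missed only if every imputation misses it.
    sensitivity = 1.0 - prod(1.0 - s for s in sensitivities)
    # A true non-match stays negative only if every imputation rejects it.
    specificity = prod(specificities)
    return sensitivity, specificity


# Each imputation has Sensitivity = 0.50 and Specificity = 0.99, as in the example.
for n in (1, 2, 3):
    se, sp = combine_in_parallel([0.50] * n, [0.99] * n)
    print(f"{n} imputation(s): sensitivity={se:.2%}, specificity={sp:.2%}")
```

With these inputs the loop reproduces the three rows of the table: 1 − 0.5ⁿ for sensitivity and 0.99ⁿ for specificity.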

You have to build custom queries to sum match probabilities. Open LinkSolv and open the Manage Project dialog for the project of interest. Here are examples of queries for Crash Hospital match, Imputation 1:

**qryManyToManyTruePos1**

SELECT Count(CrashHospital__MP1.MatchProbability) AS CountOfMatchProbability, Sum(CrashHospital__MP1.MatchProbability) AS TruePositives, Sum(1-[MatchProbability]) AS FalsePositives

FROM CrashHospital__MP1

WHERE (((CrashHospital__MP1.KeepStatus)<>'MP'));

qryManyToManyTruePos1

| CountOfMatchProbability | TruePositives | FalsePositives |
|---|---|---|
| 164,450 | 147,152 | 17,298 |

**qryOneToOneTruePos1**

SELECT Count(CrashHospital__MP1.MatchProbability) AS CountOfMatchProbability, Sum(CrashHospital__MP1.MatchProbability) AS TruePositives, Sum(1-[MatchProbability]) AS FalsePositives

FROM CrashHospital__MP1

WHERE (((CrashHospital__MP1.KeepStatus)='LP'));

qryOneToOneTruePos1

| CountOfMatchProbability | TruePositives | FalsePositives |
|---|---|---|
| 164,450 | 147,152 | 17,298 |

**qryFalseNegatives1**

SELECT Count(CrashHospital__MP1.MatchProbability) AS CountOfMatchProbability, Sum(CrashHospital__MP1.MatchProbability) AS FalseNegatives, Sum(1-[MatchProbability]) AS TrueNegatives

FROM CrashHospital__MP1

WHERE (((CrashHospital__MP1.KeepStatus)='MP'));

qryFalseNegatives1

| CountOfMatchProbability | FalseNegatives | TrueNegatives |
|---|---|---|
| 1,417,356 | 146,509 | 1,270,847 |
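If you export a Merged Pairs table to another database, the four sums can also be computed in a single pass. The sketch below is an assumption-heavy illustration, not a LinkSolv feature. It uses Python's built-in `sqlite3` module with a small in-memory table of made-up rows, and keeps only the table and column names (`CrashHospital__MP1`, `KeepStatus`, `MatchProbability`) from the queries above. The queries above look like Microsoft Access SQL, where the conditional would likely be written with `IIf` rather than `CASE`.

```python
import sqlite3

# In-memory stand-in for the Merged Pairs table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CrashHospital__MP1 (KeepStatus TEXT, MatchProbability REAL)"
)
conn.executemany(
    "INSERT INTO CrashHospital__MP1 VALUES (?, ?)",
    [("LP", 0.9), ("IP", 0.6), ("MP", 0.2), ("MP", 0.1)],
)

# Many to many version: every status other than MP counts as imputed true.
QUERY = """
SELECT
    SUM(CASE WHEN KeepStatus <> 'MP' THEN MatchProbability ELSE 0 END) AS TruePositives,
    SUM(CASE WHEN KeepStatus <> 'MP' THEN 1 - MatchProbability ELSE 0 END) AS FalsePositives,
    SUM(CASE WHEN KeepStatus = 'MP' THEN MatchProbability ELSE 0 END) AS FalseNegatives,
    SUM(CASE WHEN KeepStatus = 'MP' THEN 1 - MatchProbability ELSE 0 END) AS TrueNegatives
FROM CrashHospital__MP1
"""
tp, fp, fn, tn = conn.execute(QUERY).fetchone()
print(f"TP={tp:.2f} FP={fp:.2f} FN={fn:.2f} TN={tn:.2f}")
conn.close()
```

For a one to one linkage, the first two conditions would test `KeepStatus = 'LP'` instead of `KeepStatus <> 'MP'`.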

**You can use Excel © Microsoft Corporation to do the math:**

| For MP1, One to One | Value |
|---|---|
| True Positives = Sum (Match Probability) over Status = LP | 147,152 |
| False Positives = Sum (1 – Match Probability) over Status = LP | 17,298 |
| False Negatives = Sum (Match Probability) over Status = MP | 146,509 |
| True Negatives = Sum (1 – Match Probability) over Status = MP | 1,270,847 |
| Sensitivity = True Positives / (True Positives + False Negatives) | 50.11% |
| Specificity = True Negatives / (False Positives + True Negatives) | 98.66% |
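The same arithmetic can be checked outside a spreadsheet. This short Python sketch plugs in the four totals reported for Imputation 1 above; the variable names are illustrative only.

```python
# Totals reported for Imputation 1 (one to one) above.
true_positives = 147_152
false_positives = 17_298
false_negatives = 146_509
true_negatives = 1_270_847

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (false_positives + true_negatives)

print(f"Sensitivity = {sensitivity:.2%}")  # 50.11%
print(f"Specificity = {specificity:.2%}")  # 98.66%
```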
