Summary of Recent Changes
Build Build Date Comment
9.0.0405 07-Jan-2016 Summary Fixed syntax error so that Tabulate Duplicates is compatible with SQL Server
9.0.0407 07-Jan-2016 Summary Implemented Self Terminate using VBScript with delay
9.0.0247 28-Sep-2015 Summary Fixed intermittent error — Size of Random Sample not recalculated
9.0.0250 28-Sep-2015 Summary Included pairs with Keep Status IP in Maximum Likelihood Estimate for Set Method Take All Pairs
9.0.0253 23-Oct-2015 Summary Fixed intermittent error — Data Type Mismatch in Initialize, Update Match Probabilities
9.0.0257 23-Oct-2015 Summary Added LSAP 1-1 Pairs option for Set Method in Merge Passes
9.0.0262 24-Nov-2015 Summary Created fake data with proportions and counts closer to specs
9.0.0270 24-Nov-2015 Summary Selected Best Pairs from Maximum Likelihood Estimate using record scores
9.0.0280 24-Nov-2015 Summary Fixed intermittent error when Outcome Tests in Review Match conflict with Perform Match
9.0.0287 24-Nov-2015 Summary Accepted with a warning any input fields with 100% missing
9.0.0289 24-Nov-2015 Summary Fixed intermittent error Truncated tab control with larger form size 5.5″ X 7.25″
9.0.0290 24-Nov-2015 Summary Tested Function.PINYIN2, a better SOUNDEX for PINYIN names
9.0.0315 09-Dec-2015 Summary Improved management of objects opened by LinkSolv
9.0.0334 09-Dec-2015 Summary Fixed intermittent error — Access 2013 on Windows 8.1 crashes on close
9.0.0336 09-Dec-2015 Summary Added cases for Take All SubSets to Match.GroupPairsIntoSets
9.0.0351 10-Dec-2015 Summary Fixed intermittent error — Access running after close on Windows 8.1 and 10.
9.0.0374 10-Dec-2015 Summary Declared Global form UPPER and LEFT locations to be 0.05″ (72 twips)
9.0.0375 10-Dec-2015 Summary Fixed case logic for SelfMatch in SpecifyMatch.TotalMatchesAfter
9.0.0384 22-Dec-2015 Summary Added text to speech and contrasting colors for accessibility
9.0.0390 22-Dec-2015 Summary Fixed errors in new MLE, MLB, and Take All SubSets algorithms
9.0.0399 22-Dec-2015 Summary Fixed errors in new Quit LinkSolv algorithms
1. Software Compatibility
2. Installing LINKSOLV 9.0
3. Updating LINKSOLV 9.0
______________________________________________________________
1. Software Compatibility
Microsoft, Microsoft Windows, Microsoft Office, and Microsoft SQL Server are trademarks or registered trademarks of Microsoft Corporation. LINKSOLV 9.0 is compatible with the latest releases of Windows Vista, 7, 8, 8.1, and 10; Windows Server 2008, 2008 R2, 2012, and 2012 R2; Office 2007, 2010, 2013, and 2016; and SQL Server 2008, 2008 R2, 2012, and 2014. If you are running earlier releases of these products, update the software on your computer before installing LINKSOLV. LINKSOLV may not operate correctly with earlier releases.
________________________________________________________________
2. Installing LINKSOLV 9.0
2.1 Back up and document all custom work if you wish to reference it in new projects.
2.1.1 Create backup copies of all files in the LINKSOLV installation folder (default folder is C:\Program Files\Strategic Matching\LinkSolv 9.0 for 32-bit Windows or C:\Program Files (x86)\Strategic Matching\LinkSolv 9.0 for 64-bit Windows).
2.1.2 Create reports for all old LINKSOLV projects including a) simulation specifications; b) data specifications; c) match specifications; d) Bayesian model checks.
2.2 Download and run the setup program for a new installation
2.2.1 Obtain the serial number and password for LINKSOLV 9.0 new installation from Strategic Matching.
2.2.2 Go to http://www.strategicmatching.com. Click on the Downloads tab.
2.2.3 Search Downloads for your serial number. If more than one link is listed then click on the one that includes your system configuration (for example, Windows 7 and Access 2010).
2.2.4 Click on the link for Setup 9.0 for New Installation. Respond to the download and update dialogs. Enter the password when prompted.
2.3 Set options for Microsoft Access
2.3.1 If necessary, create a folder for new Access databases created during record linkage projects (for example, C:\LinkSolv Projects). For better performance, the LinkSolv Projects folder should be local to the computer where LinkSolv is installed, not on a remote drive.
2.3.2 Enable all macros in the Access Trust Center Settings. Or, set trusted locations for LinkSolv Programs, LinkSolv Projects, and LinkSolv Data.
2.3.3 Go to Access Options, General tab and set the default location for new databases to the LinkSolv Projects folder.
2.3.4 Set Default Database format to Access 2007/2016.
2.3.5 Open LinkSolv and set LinkSolv options for your projects. Set Data Provider to JET for Access databases or SQL for SQL Server before you try to create new projects.
2.3.6 Click on the LinkSolv Help button to open the User Guide. Read the Getting Started section.
2.4 Import Old Projects
2.4.1 You can copy one or more projects from any archived STMTData.mdb (for LinkSolv 8.3) or STMTData.accdb (for LinkSolv 9.0) database.
2.4.2 Open LinkSolv, select the Projects tab, click on Import Project, and respond to the dialog.
________________________________________________________________
3. Updating LINKSOLV 9.0
3.1 Back up LINKSOLV Program Files (default folder is C:\Program Files\Strategic Matching\LinkSolv 9.0 for 32-bit Windows or C:\Program Files (x86)\Strategic Matching\LinkSolv 9.0 for 64-bit Windows).
3.2 Download and run the update program
3.2.1 Obtain the update version number and password for LINKSOLV 9.0 from Strategic Matching.
3.2.2 Go to http://www.strategicmatching.com. Click on the Downloads tab.
3.2.3 Search Downloads for the update version number. From the list of links, click on the one that includes your system configuration (for example, Windows 7 and Access 2010).
3.2.4 Click on the link for LINKSOLV Update. Respond to the download and update dialogs. Enter the password when prompted.
______________________________________________________________
The setup program LinkSolv9.0.0404Update.exe will update earlier releases of LinkSolv 9.0 to 9.0.0404 by replacing the LinkSolv database file (LinkSolv.accdb) and the Strategic Matching Library database file (STMTLibrary.accde) in the LinkSolv installation folder.
The old LinkSolv database file and Strategic Matching Library database file must already exist in the installation folder for LinkSolv 9.0. If not, setup will stop.
The old database files will be moved to a BACKUP folder and renamed. These files can be restored if you uninstall the update.
The updated database files will be placed in the installation folder for LinkSolv 9.0 with the same permissions as the old files.
PINYIN is a standard for approximating the sound of Chinese names using the Western alphabet. Most letters are pronounced roughly as they are in English, with exceptions, and these exceptions can reduce the usefulness of SOUNDEX coding done for linkage purposes. First, SOUNDEX coding suppresses all vowels except a first-letter vowel, but correct Chinese pronunciation depends greatly on the vowels; SOUNDEX codes cannot differentiate between LI, LIU, LU, and LUI. Second, PINYIN Q is pronounced like CH; C and X are pronounced like SH; and ZH is pronounced like J. For example, PINYIN QIANG (SOUNDEX Q520) is pronounced like CHEE-AHNG (SOUNDEX C520) and might be misspelled CHIANG (SOUNDEX C520). The good news is that all of these exceptional consonants are in SOUNDEX group 2, so they are all coded as digit 2 except as first letters. So, if we change the first one or two letters of some PINYIN names, then SOUNDEX codes for the modified names will be more useful for record linkage.
'Step 4 – Change letters from the following groups into the corresponding digits:
'1 = B, F, P, V
'2 = C, G, J, K, Q, S, X, Z
'3 = D, T
'4 = L
'5 = M, N
'6 = R
Public Function ModPinYin(varPinYin As Variant) As Variant
    ModPinYin = Null
    If Not IsNull(varPinYin) Then
        ModPinYin = CStr(varPinYin)
        If Left(ModPinYin, 2) = "CH" Then ' No change
            ModPinYin = ModPinYin
        ElseIf Left(ModPinYin, 1) = "Q" Then ' Replace Q with CH
            ModPinYin = "CH" & Mid(ModPinYin, 2)
        ElseIf Left(ModPinYin, 1) = "C" Then ' Replace C with SH
            ModPinYin = "SH" & Mid(ModPinYin, 2)
        ElseIf Left(ModPinYin, 2) = "HS" Then ' Replace HS with SH
            ModPinYin = "SH" & Mid(ModPinYin, 3)
        ElseIf Left(ModPinYin, 1) = "X" Then ' Replace X with SH
            ModPinYin = "SH" & Mid(ModPinYin, 2)
        ElseIf Left(ModPinYin, 2) = "ZH" Then ' Replace ZH with J
            ModPinYin = "J" & Mid(ModPinYin, 3)
        End If
    End If
End Function
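For illustration, the same initial-letter rewriting can be sketched in Python. This is a hypothetical port of the VBA function above, not part of LinkSolv itself:

```python
def mod_pinyin(name):
    """Rewrite the leading letters of a PINYIN name so that SOUNDEX
    codes better reflect pronunciation (port of VBA ModPinYin)."""
    if name is None:
        return None
    name = str(name)
    if name.startswith("CH"):      # no change
        return name
    if name.startswith("Q"):       # Q sounds like CH
        return "CH" + name[1:]
    if name.startswith("C"):       # C sounds like SH
        return "SH" + name[1:]
    if name.startswith("HS"):      # HS sounds like SH
        return "SH" + name[2:]
    if name.startswith("X"):       # X sounds like SH
        return "SH" + name[1:]
    if name.startswith("ZH"):      # ZH sounds like J
        return "J" + name[2:]
    return name

print(mod_pinyin("QIANG"))  # CHIANG
print(mod_pinyin("CUI"))    # SHUI
```

Note that the tests are ordered so that the two-letter prefixes CH, HS, and ZH are checked before any conflicting single letters can match.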
The Voice of America English News web site has a guide for pronouncing Chinese names.
http://pronounce.voanews.com/browse-oneregion.php?region=China
The Pinyin and Pronounced columns in the following table were copied from this web site. Soundex is the SOUNDEX code for Pinyin. PinyinMod is the same as Pinyin with the recommended changes to initial letters, and Soundex2 is the corresponding SOUNDEX code. Pronounced is Voice of America's guide for English speakers, and Soundex3 is the corresponding SOUNDEX code.
PinYinNames
Pinyin | Soundex | PinyinMod | Soundex2 | Pronounced | Soundex3 |
BEIJING | B252 | BEIJING | B252 | BAY-JEENG | B252 |
BOXIONG | B252 | BOXIONG | B252 | BOH SHAWNG | B252 |
CHEN | C500 | CHEN | C500 | CHUHN | C500 |
CHEN KAIGE | C520 | CHEN KAIGE | C520 | CHUHN K-EYE GUH | C520 |
CUI | C000 | SHUI | S000 | SWAY | S000 |
CUI TIANKAI | C352 | SHUI TIANKAI | S352 | SWAY TEE-EHN K-EYE | S352 |
DING | D520 | DING | D520 | DEENG | D520 |
DING GUIZIONG | D525 | DING GUIZIONG | D525 | DEENG GWAY JAWNG | D525 |
DU | D000 | DU | D000 | DOO | D000 |
DU QINGLIN | D252 | DU QINGLIN | D252 | DOO CHEENG LEEN | D252 |
ENLAI | E540 | ENLAI | E540 | EHN L-EYE | E540 |
FU | F000 | FU | F000 | FOO | F000 |
FU QUANYOU | F250 | FU QUANYOU | F250 | FOO CHWAHNYOH | F250 |
GAO | G000 | GAO | G000 | GAH-OH | G000 |
GAO QIANG | G252 | GAO QIANG | G252 | GAH-OH CHEE-AHNG | G252 |
GAO ZHAN | G250 | GAO ZHAN | G250 | GAH-OH JAHN | G250 |
GUANGLIE | G524 | GUANGLIE | G524 | GWAHNG LAY | G524 |
GUIZIONG | G252 | GUIZIONG | G252 | GWAY JAWNG | G252 |
GUO | G000 | GUO | G000 | GWOH | G000 |
GUO BOXIONG | G125 | GUO BOXIONG | G125 | GWOH BOH SHAWNG | G125 |
HE | H000 | HE | H000 | HUH | H000 |
HE XINTONG | H253 | HE XINTONG | H253 | HUH SHEEN TAWNG | H253 |
HSING | H252 | SHING | S520 | SHEENG | S520 |
HSING HSING | H252 | SHING HSING | S525 | SHEENG SHEENG | S525 |
HU | H000 | HU | H000 | HOO | H000 |
HU JIA | H200 | HU JIA | H200 | HOO JYAH | H200 |
HUANG | H520 | HUANG | H520 | HWAHNG | H520 |
HUANG JU | H520 | HUANG JU | H520 | HWAHNG JOO | H520 |
JIANG | J520 | JIANG | J520 | jee-YAHNG | J520 |
JIANGXI | J520 | JIANGXI | J520 | jee-YAHNG SHEE | J520 |
JIANMIN | J550 | JIANMIN | J550 | JAHN MEEN | J550 |
LI | L000 | LI | L000 | LEE | L000 |
LI XI | L200 | LI XI | L200 | LEE SHEE | L200 |
LIANG | L520 | LIANG | L520 | LYAHNG | L520 |
LIANG GUANGLIE | L525 | LIANG GUANGLIE | L525 | LYAHNG GWAHNG LAY | L525 |
LIN | L500 | LIN | L500 | LEEN | L500 |
LUI | L000 | LUI | L000 | L-YOO | L000 |
LUI XIA | L200 | LUI XIA | L200 | L-YOO SH-YAH | L200 |
MA | M000 | MA | M000 | MAH | M000 |
MA ZHAOXU | M200 | MA ZHAOXU | M200 | MAH JAH-OH SHOO | M200 |
MEI XIANG | M252 | MEI XIANG | M252 | MAY SH-YAHNG | M252 |
QIAN | Q500 | CHIAN | C500 | CHEHN | C500 |
QIAN QICHEN | Q525 | CHIAN QICHEN | C525 | CHEHN CHEE-CHUHN | C525 |
QIANG | Q520 | CHIANG | C520 | CHEE-AHNG | C520 |
QINGDAO | Q523 | CHINGDAO | C523 | CHEENG DOW | C523 |
QINGLIN | Q524 | CHINGLIN | C524 | CHEENG LEEN | C524 |
QUANYOU | Q500 | CHUANYOU | C500 | CHWAHNYOH | C500 |
SILIN | S450 | SILIN | S450 | SUH-LEEN | S450 |
SUN | S500 | SUN | S500 | SOON | S500 |
SUN ZHENYU | S525 | SUN ZHENYU | S525 | SOON JUHN YOO | S525 |
TIANKAI | T520 | TIANKAI | T520 | TEE-EHN K-EYE | T520 |
WANG | W520 | WANG | W520 | WAHNG | W520 |
WANG DAN | W523 | WANG DAN | W523 | WAHNG DAHN | W523 |
WU | W000 | WU | W000 | WOO | W000 |
WU JIANMIN | W255 | WU JIANMIN | W255 | WOO JAHN MEEN | W255 |
XIANG | X520 | SHIANG | S520 | SH-YAHNG | S520 |
XIAOHUA | X000 | SHIAOHUA | S000 | SHAH-OH HWAH | S000 |
XIAOLEI | X400 | SHIAOLEI | S400 | SHYAU LEY | S400 |
XINTONG | X535 | SHINTONG | S535 | SHEEN TAWNG | S535 |
XU | X000 | SHU | S000 | SHOO | S000 |
XU CAIHOU | X200 | SHU CAIHOU | S200 | SHOO TS-EYE HOH | S320 |
YAN | Y500 | YAN | Y500 | YAHN | Y500 |
YANG | Y520 | YANG | Y520 | YAHNG | Y520 |
YANG JIANLI | Y525 | YANG JIANLI | Y525 | YAHNG JAHN LEE | Y525 |
ZHAN | Z500 | JAN | J500 | JAHN | J500 |
ZHANG | Z520 | JANG | J520 | JAHNG | J520 |
ZHANG HONGBAO | Z525 | JANG HONGBAO | J525 | JAHNG HAWNG-BOW | J525 |
ZHANG QIYUE | Z520 | JANG QIYUE | J520 | JAHNG CHEE YOO-WEH | J520 |
ZHAO | Z000 | JAO | J000 | JAH-OH | J000 |
ZHAO YAN | Z500 | JAO YAN | J500 | JAH-OH YAHN | J500 |
ZHAOXU | Z200 | JAOXU | J200 | JAH-OH SHOO | J200 |
ZHENG | Z520 | JENG | J520 | JUHNG | J520 |
ZHENG SILIN | Z524 | JENG SILIN | J524 | JUHNG SUH-LEEN | J524 |
ZHOU | Z000 | JOU | J000 | JOH | J000 |
ZHOU ENLAI | Z540 | JOU ENLAI | J540 | JOH EHN L-EYE | J540 |
ZHU | Z000 | JU | J000 | JOO | J000 |
ZHU XIAOHUA | Z200 | JU XIAOHUA | J200 | JOO SHAH-OH HWAH | J200 |
You can estimate sensitivity and specificity for a linkage model by calculating expected values for the number of True Positive Links, False Positive Links, True Negative Links, and False Negative Links for all merged pairs above the cutoff probability. The Merged Pairs table for each imputation (MP1, MP2, etc.) lists these pairs, each Match Probability (the probability that a pair is a true link), and each linkage status:
LP = Linked Pairs, candidate pairs imputed as true one to one links.
IP = Imputed Pairs, candidate pairs imputed as true many to many links that share a common record with a linked pair.
MP = Merged Pairs, candidate pairs imputed as false links.
Match Probability = probability that a pair is a true link, so (1 – Match Probability) = probability that a pair is a false link.
For a many to many linkage:
Expected True Positives = Sum of Match Probability over pairs with status = LP, IP
Expected False Positives = Sum of (1 – Match Probability) over pairs with status = LP, IP
Expected False Negatives = Sum of Match Probability over pairs with status = MP
Expected True Negatives = Sum of (1 – Match Probability) over pairs with status = MP
Sensitivity = True Positives / (True Positives + False Negatives)
Specificity = True Negatives / (False Positives + True Negatives)
For a one to one linkage, only pairs with status = LP count as pairs imputed as true:
Expected True Positives = Sum of Match Probability over pairs with status = LP
Expected False Positives = Sum of (1 – Match Probability) over pairs with status = LP
Expected False Negatives = Sum of Match Probability over pairs with status = MP
Expected True Negatives = Sum of (1 – Match Probability) over pairs with status = MP
Sensitivity = True Positives / (True Positives + False Negatives)
Specificity = True Negatives / (False Positives + True Negatives)
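The bookkeeping above can be sketched in a few lines of Python. The pair list here is invented for illustration; in practice the probabilities and Keep Status values come from the MP1, MP2, etc. tables:

```python
# Invented (probability, keep status) pairs for illustration only.
pairs = [
    (0.95, "LP"), (0.80, "LP"), (0.70, "IP"),
    (0.40, "MP"), (0.10, "MP"), (0.05, "MP"),
]

def sens_spec(pairs, link_statuses, nonlink_statuses):
    # Expected counts are sums of probabilities, not integer tallies.
    tp = sum(p for p, s in pairs if s in link_statuses)
    fp = sum(1 - p for p, s in pairs if s in link_statuses)
    fn = sum(p for p, s in pairs if s in nonlink_statuses)
    tn = sum(1 - p for p, s in pairs if s in nonlink_statuses)
    return tp / (tp + fn), tn / (fp + tn)

# Many to many: LP and IP both count as links.
sens_m2m, spec_m2m = sens_spec(pairs, {"LP", "IP"}, {"MP"})
# One to one: only LP counts as a link; FN and TN sum over MP.
sens_o2o, spec_o2o = sens_spec(pairs, {"LP"}, {"MP"})
```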
In general, you will get approximately the same Sensitivity and Specificity for each imputation. You can obtain multiply imputed estimates by combining Sensitivity and Specificity over all imputations.
Values for one imputed linkage, which is one draw from the posterior distribution for true matches, don’t really capture the spirit of multiple imputation. We analyze multiple imputations because one draw is seldom the whole story. As an example, suppose each of multiple imputations has Sensitivity = 0.50 and Specificity = 0.99.
Each imputation does a wonderful job of identifying true non-matches. However, each imputation does a relatively poor job of distinguishing true matches from false matches. This happens when many true matches don’t have very high probabilities – often the case for large files unless there are many match fields with high weights.
With multiple imputations, a true match found in any imputation contributes to analysis results when the imputations are combined using SAS PROC MIANALYZE ((c) SAS Institute) or IVEware ((c) University of Michigan). Effectively, each imputation is an independent diagnosis, so we can combine sensitivity and specificity in parallel. Sensitivity increases while specificity decreases, as described in:
Combining screening tests in series or parallel, http://www.epidemiolog.net, V. Schoenbach, 9/21/2005
Following this approach,
Multiply Imputed Sensitivity and Specificity | | |
Imputations Combined | Parallel Sensitivity | Parallel Specificity |
1 | 50.00% | 99.00% |
1,2 | 75.00% | 98.01% |
1,2,3 | 87.50% | 97.03% |
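Combining k independent imputations in parallel uses the standard formulas: combined sensitivity = 1 − (1 − Se)^k (a match found in any imputation counts) and combined specificity = Sp^k (a non-match must hold in every imputation). The table above follows directly:

```python
# Reproduce the parallel-combination table (Se = 0.50, Sp = 0.99 per imputation).
se, sp = 0.50, 0.99
for k in (1, 2, 3):
    parallel_se = 1 - (1 - se) ** k   # match found in ANY of k imputations
    parallel_sp = sp ** k             # non-match must hold in EVERY imputation
    print(k, f"{parallel_se:.2%}", f"{parallel_sp:.2%}")
# 1 50.00% 99.00%
# 2 75.00% 98.01%
# 3 87.50% 97.03%
```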
You have to build custom queries to sum match probabilities. Open LinkSolv and open the Manage Project dialog for the project of interest. Here are examples of queries for Crash Hospital match, Imputation 1:
qryManyToManyTruePos1
SELECT Count(CrashHospital__MP1.MatchProbability) AS CountOfMatchProbability, Sum(CrashHospital__MP1.MatchProbability) AS TruePositives, Sum(1-[MatchProbability]) AS FalsePositives
FROM CrashHospital__MP1
WHERE (((CrashHospital__MP1.KeepStatus)<>'MP'));
qryManyToManyTruePos1 | ||
CountOfMatchProbability | TruePositives | FalsePositives |
164,450 | 147,152 | 17,298 |
qryOneToOneTruePos1
SELECT Count(CrashHospital__MP1.MatchProbability) AS CountOfMatchProbability, Sum(CrashHospital__MP1.MatchProbability) AS TruePositives, Sum(1-[MatchProbability]) AS FalsePositives
FROM CrashHospital__MP1
WHERE (((CrashHospital__MP1.KeepStatus)='LP'));
qryOneToOneTruePos1 | ||
CountOfMatchProbability | TruePositives | FalsePositives |
164,450 | 147,152 | 17,298 |
qryFalseNegatives1
SELECT Count(CrashHospital__MP1.MatchProbability) AS CountOfMatchProbability, Sum(CrashHospital__MP1.MatchProbability) AS FalseNegatives, Sum(1-[MatchProbability]) AS TrueNegatives
FROM CrashHospital__MP1
WHERE (((CrashHospital__MP1.KeepStatus)='MP'));
qryFalseNegatives1 | ||
CountOfMatchProbability | FalseNegatives | TrueNegatives |
1,417,356 | 146,509 | 1,270,847 |
You can use Excel © Microsoft Corporation to do the math:
For MP1
One to One
True Positives = Sum (Match Probability) over Status = LP | 147,152
False Positives = Sum (1 – Match Probability) over Status = LP | 17,298
False Negatives = Sum (Match Probability) over Status = MP | 146,509
True Negatives = Sum (1 – Match Probability) over Status = MP | 1,270,847
Sensitivity = True Positives / (True Positives + False Negatives) | 50.11%
Specificity = True Negatives / (False Positives + True Negatives) | 98.66%
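The same arithmetic as the Excel sheet can be done directly from the query results for imputation MP1:

```python
# Sums taken from the custom queries above for imputation MP1.
true_pos = 147152      # Sum(MatchProbability) over KeepStatus = 'LP'
false_pos = 17298      # Sum(1 - MatchProbability) over KeepStatus = 'LP'
false_neg = 146509     # Sum(MatchProbability) over KeepStatus = 'MP'
true_neg = 1270847     # Sum(1 - MatchProbability) over KeepStatus = 'MP'

sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (false_pos + true_neg)
print(f"{sensitivity:.2%}", f"{specificity:.2%}")  # 50.11% 98.66%
```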
We want to compare types of collisions in order to increase the probability of true links with Name or Birth Date disagreements and also decrease the probability of false links for the same person in two different crashes. Even if collision type is not available on a crash record we should be able to derive collision type from Crash Events for comparison with Standard Collision derived from trauma E Codes.
Table 1 Std. Collide vs. Crash Events (Most Harmful Event, First Harmful Event, or Sequential Event 1)
Std. Collide | Pairs | Pedestrian | Pedal Cycle | Non Collision | Ran Off Road Left | Ran Off Road Right | Overturn Rollover | Any Non Events | Any Objects | Only Non Events | Only Objects | Most Vehicles | First Vehicles | Seq. Vehicle |
 | 31 | 1 | 5 | 0 | 2 | 3 | 6 | 9 | 3 | 9 | 3 | 14 | 21 | 2 |
N/A | 23 | 1 | 1 | 0 | 0 | 3 | 0 | 3 | 6 | 2 | 5 | 14 | 19 | 4 |
NO | 289 | 1 | 1 | 27 | 53 | 103 | 103 | 194 | 79 | 135 | 20 | 65 | 118 | 11 |
OBJ | 96 | 1 | 0 | 3 | 20 | 33 | 10 | 60 | 57 | 22 | 19 | 13 | 36 | 7 |
OTH | 69 | 0 | 24 | 1 | 1 | 7 | 2 | 10 | 5 | 7 | 2 | 28 | 47 | 7 |
PED | 69 | 46 | 8 | 0 | 20 | 3 | 0 | 23 | 1 | 23 | 1 | 10 | 40 | 1 |
VEH | 814 | 5 | 5 | 9 | 25 | 52 | 59 | 139 | 68 | 101 | 30 | 544 | 704 | 135 |
I analyzed high probability linked pairs from a preliminary Crash to Trauma linkage as shown in Table 1. Std. Collide lists the collision types derived from E Codes and Pairs shows the number of pairs with that value. The other columns show the number of pairs for particular events or groups of events. Events were counted if found in Most Harmful Event, First Harmful Event, or Sequential Event 1. A single pair might be counted in multiple columns depending on reported events. Ideally, one group of crash events should show high counts for one and only one Std. Collide value. Clearly, this doesn’t happen very often. For example, 84% of Pedestrian go to PED (46 / 55) with 16% errors, 79% of Most Harmful Vehicle go to VEH (544 / 688) with 21% errors, and 45% of Only No Collision Events go to NO (135 / 299) with 55% errors.
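The agreement percentages quoted above can be checked against the Table 1 column totals:

```python
# Column totals from Table 1 (rows: blank, N/A, NO, OBJ, OTH, PED, VEH).
pedestrian_to_ped = 46 / (1 + 1 + 1 + 1 + 0 + 46 + 5)             # 46 / 55
most_vehicle_to_veh = 544 / (14 + 14 + 65 + 13 + 28 + 10 + 544)   # 544 / 688
only_non_events_to_no = 135 / (9 + 2 + 135 + 22 + 7 + 23 + 101)   # 135 / 299
print(round(pedestrian_to_ped * 100),
      round(most_vehicle_to_veh * 100),
      round(only_non_events_to_no * 100))  # 84 79 45
```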
I found a number of reasons for disagreements and tried to correct them. First, based on details about disagreements observed, I collapsed collision types produced by Std. Collide to PED, VEH (old OTH and VEH), and ONE (old NO and OBJ) and reassigned some E Codes to PED based on their fourth digit. Second, I used number of vehicles to predict collision type except for Pedestrian and Pedal Cycle events.
Table 2 Std. Collide vs. Crash Events for Num. Vehicles = 1 (Most Harmful Event, First Harmful Event, or Sequential Event 1)
New Collide | Pairs | Pedestrian | Pedal Cycle | Non Collision | Ran Off Road Left | Ran Off Road Right | Overturn Rollover | Any Non Events | Any Objects | Only Non Events | Only Objects | Most Vehicles | First Vehicles | Seq. Vehicle |
 | 11 | 0 | 0 | 0 | 1 | 3 | 5 | 7 | 1 | 7 | 1 | 1 | 4 | 0 |
N/A | 8 | 1 | 1 | 0 | 0 | 2 | 0 | 2 | 4 | 2 | 4 | 0 | 4 | 0 |
ONE | 282 | 2 | 1 | 24 | 68 | 117 | 106 | 221 | 117 | 134 | 30 | 24 | 76 | 2 |
PED | 96 | 49 | 39 | 0 | 22 | 3 | 0 | 25 | 0 | 25 | 0 | 7 | 48 | 0 |
VEH | 60 | 2 | 1 | 6 | 13 | 18 | 10 | 41 | 26 | 22 | 7 | 10 | 27 | 1 |
Table 2 shows that comparisons are better for one vehicle cases. 92% of Pedestrian and Pedal Cycle go to PED (88 / 96) with 8% errors and 78% of other one vehicle cases go to ONE (282 / 361) with 22% errors.
Table 3 Std. Collide vs. Crash Events for Num. Vehicles > 1 (Most Harmful Event, First Harmful Event, or Sequential Event 1)
New Collide | Pairs | Pedestrian | Pedal Cycle | Non Collision | Ran Off Road Left | Ran Off Road Right | Overturn Rollover | Any Non Events | Any Objects | Only Non Events | Only Objects | Most Vehicles | First Vehicles | Seq. Vehicle |
 | 12 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | 1 | 2 | 10 | 11 | 2 |
N/A | 15 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 2 | 0 | 1 | 14 | 15 | 4 |
ONE | 103 | 0 | 0 | 6 | 5 | 19 | 7 | 33 | 19 | 23 | 9 | 54 | 78 | 16 |
PED | 15 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 9 | 12 | 1 |
VEH | 788 | 0 | 0 | 4 | 12 | 41 | 51 | 107 | 47 | 85 | 25 | 559 | 709 | 140 |
Table 3 shows that comparisons are also better for multiple vehicle cases. 84% of multiple vehicle cases go to VEH (788 / 933) with 16% errors.
Comparison results with these reclassifications were the basis for a better program to derive type of collision from number of vehicles and reported crash events. The largest number of disagreements is 103 multiple vehicle cases classified ONE in trauma. It might be that these cases involve a vehicle that is not a motor vehicle. If so, these disagreements could be corrected based on vehicle type. I added two custom functions to the User Library database.
Public Function NewCollide(varStdCollide As Variant, varECode As Variant) As Variant
    NewCollide = Null
    Select Case Nz(varECode, "")
        Case "812.6", "812.7", "813.6", "826.0", "826.1"
            NewCollide = "PED"
        Case Else
            Select Case Nz(varStdCollide, "")
                Case "NO", "OBJ"
                    NewCollide = "ONE"
                Case "OTH", "VEH"
                    NewCollide = "VEH"
                Case Else
                    NewCollide = varStdCollide
            End Select
    End Select
End Function
Public Function CrashCollide(varNumVehicles As Variant, varEvents As Variant) As Variant
    CrashCollide = Null
    If IsNumeric(varNumVehicles) Then
        Select Case Val(varNumVehicles)
            Case 1
                CrashCollide = "ONE"
            Case Else
                CrashCollide = "VEH"
        End Select
    End If
    If InStr(1, Nz(varEvents, ""), "PEDAL-CYCLE") Or InStr(1, Nz(varEvents, ""), "PEDESTRIAN") Then
        CrashCollide = "PED"
    End If
End Function
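For illustration, the two library functions can be sketched in Python. These are hypothetical, slightly simplified ports (the VBA versions also accept numeric strings via IsNumeric and Val), not LinkSolv's own code:

```python
# E Codes reassigned to PED based on their fourth digit, per the text above.
PED_ECODES = {"812.6", "812.7", "813.6", "826.0", "826.1"}

def new_collide(std_collide, e_code):
    """Collapse trauma collision types to PED, VEH, or ONE."""
    if (e_code or "") in PED_ECODES:
        return "PED"
    std = std_collide or ""
    if std in ("NO", "OBJ"):
        return "ONE"
    if std in ("OTH", "VEH"):
        return "VEH"
    return std_collide

def crash_collide(num_vehicles, events):
    """Predict collision type from number of vehicles, except for
    pedestrian and pedal cycle events."""
    result = None
    if isinstance(num_vehicles, (int, float)):
        result = "ONE" if int(num_vehicles) == 1 else "VEH"
    if "PEDAL-CYCLE" in (events or "") or "PEDESTRIAN" in (events or ""):
        result = "PED"
    return result

print(new_collide("OBJ", None))                  # ONE
print(crash_collide(3, "PEDESTRIAN STRUCK"))     # PED
```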
LinkSolv Statistical Models for Merging Passes and Creating Linked Datasets
Limited Bayesian Model
Set Cutoff Probability = 0.90 or some other high value in the Specify Match dialog. Set Linked Data Sets = 1 on the Merge Passes tab. LinkSolv uses your estimates of total matches, error probabilities, and frequencies of data values in all probability calculations. All candidate pairs over the specified cutoff probability are accepted as linked pairs and assigned keep status = LP.
Take All Pairs
Use this method if you expect more true links to be many-to-many rather than one-to-one. Set Cutoff Probability = 0.01 or some other low value so that almost all true links are over the cutoff. Set Pairs to Analyze = Take All Pairs on the Merge Passes tab. Set Linked Datasets > 1; 3 to 5 imputations are usually enough to capture uncertainty about linked pairs when you analyze linked datasets. LinkSolv starts with your prior estimates of total matches, error probabilities, and frequencies of data values and uses Markov Chain Monte Carlo (MCMC) iterative methods to calculate posterior estimates that take the observed values into account. Set Number of Iterations large enough that the MCMC posterior estimates converge to stable values; 5 to 10 iterations are usually enough, and you can always add new iterations if necessary after reviewing the Bayesian Model Check report. LinkSolv actually runs twice as many iterations as you specify and ignores the first half as "burn in," which is likely to be too dependent on the prior estimates. Specify Standard Deviation and Standard Error to quantify uncertainty about your Total Matches estimate. These parameters can be calculated from the sample of values you used to estimate Total Matches, where Standard Error = Standard Deviation / Square Root of the Number of Values. Use Standard Deviation = 10% of Total Matches if the estimate is based on only one value.
Take 1-1 Pairs, Take Max Pairs (new label for Take 1-1 Pairs), Draw 1-1 Pairs, or Take LSAP Pairs
Use one of these methods if you expect most true links to be one-to-one rather than many-to-many. Set Cutoff Probability = 0.01 or some other low value so that almost all true links are over the cutoff. Set Pairs to Analyze to one of these values on the Merge Passes tab. Set other merge parameters following the guidelines for Take All Pairs. All of these methods group many-to-many pairs into sets based on having a common record – if two pairs have the same record from table A or from table B then they are assigned to the same set. The methods differ in how one-to-one pairs are selected from each set but one-to-one pairs are always assigned Keep Status = LP and others get Keep Status = IP.
Take Max Pairs includes the pair from each set with maximum probability in the one-to-one linkage and excludes competing pairs (a common record) with lower probabilities. The process repeats until all pairs in all sets have Keep Status = LP or IP.
Draw 1-1 Pairs extends the original Bayesian model so that one-to-one pairs are drawn as part of the overall probability model. This method is preferred by theorists, particularly if you plan to analyze multiple imputations using IVEWARE or SAS PROC MIANALYZE. LinkSolv calculates the probability of each one-to-one permutation of records in each set. For example, if a set includes records A1 and A2 from table A and records B1 and B2 from table B then the one-to-one permutations are (A1, B1; A2, B2) and (A1, B2; A2, B1), either of which might be drawn in each iteration.
Take LSAP Pairs treats selecting linked pairs as a Linear Sum Assignment Problem (LSAP). Given the probabilities of each one-to-one permutation calculated for Draw 1-1 Pairs, Take LSAP Pairs takes the one-to-one permutation which maximizes the sum of match weights, which is the same as the permutation with maximum probability.
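For a small set, the maximum-weight one-to-one permutation can be found by brute force. A minimal Python sketch, with records and weights invented for illustration:

```python
from itertools import permutations

# One set with two records from each table; weights are invented.
a_records = ["A1", "A2"]
b_records = ["B1", "B2"]
weight = {("A1", "B1"): 4.0, ("A1", "B2"): 2.5,
          ("A2", "B1"): 1.0, ("A2", "B2"): 3.0}

# Try every one-to-one permutation and keep the one with maximum total weight.
best = max(permutations(b_records),
           key=lambda perm: sum(weight[(a, b)] for a, b in zip(a_records, perm)))
pairing = list(zip(a_records, best))
print(pairing)  # [('A1', 'B1'), ('A2', 'B2')], total weight 7.0 beats 3.5
```

Production LSAP solvers avoid the factorial blow-up of this brute-force search, but for the two-permutation example in the text the result is the same.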
Best Pairs
LinkSolv ranks pairs in sets by probability from greatest to least and then selects as many pairs from the top of the list as possible given your specified False Positive Rate. Remember that Match Probability = 0.9 means that 9 out of 10 such links are true and 1 out of 10 is false. So, each 0.9 link in the list contributes 0.9 links toward Expected True Positives and 0.1 links toward Expected False Positives, and similarly for other probabilities. For a given sample of pairs from the top, LinkSolv estimates False Positive Rate = Expected False Positives / (Expected False Positives + Expected True Positives).
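The Best Pairs selection rule can be sketched as follows; the probabilities are invented and the real calculation runs over ranked merged pairs:

```python
def best_pairs(probs, max_fpr):
    """Take pairs from the top of the ranked list until the expected
    false positive rate would exceed the target."""
    taken, exp_tp, exp_fp = [], 0.0, 0.0
    for p in sorted(probs, reverse=True):   # greatest probability first
        # Expected FPR if this pair were added to the selection.
        if (exp_fp + (1 - p)) / (exp_fp + (1 - p) + exp_tp + p) > max_fpr:
            break
        taken.append(p)
        exp_tp += p          # each pair contributes p expected true positives
        exp_fp += 1 - p      # and (1 - p) expected false positives
    return taken

print(best_pairs([0.99, 0.95, 0.90, 0.60, 0.30], 0.10))
```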
Maximum Likelihood Linkage
LinkSolv calculates the likelihood of the Keep Status of all merged pairs for each iteration after burn-in. Each iteration produces an unbiased draw from a stable posterior distribution. Each merged pair contributes one of two possible values to the likelihood: if Keep Status = LP then the contribution is the Match Probability for the pair; otherwise, the contribution is (1 – Match Probability). LinkSolv compares the likelihoods for all iterations and saves the one draw with the maximum as a Maximum Likelihood Estimate. The MLE may not be one of the final imputations of linked pairs.
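The likelihood comparison can be sketched in Python. Each iteration is represented as a list of (match probability, keep status) values; the data below are invented and the real calculation runs over all merged pairs:

```python
def likelihood(draw):
    """Product over pairs: p if Keep Status = LP, else (1 - p)."""
    result = 1.0
    for p, status in draw:
        result *= p if status == "LP" else (1 - p)
    return result

# Two invented iterations (draws) over the same two merged pairs.
iterations = [
    [(0.9, "LP"), (0.8, "MP")],   # likelihood 0.9 * 0.2 = 0.18
    [(0.9, "LP"), (0.8, "LP")],   # likelihood 0.9 * 0.8 = 0.72
]
mle = max(iterations, key=likelihood)  # the second draw is the MLE
```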
SQL Server, SQL Server Management Studio, Windows Explorer, Office (c) Microsoft Corporation
Changes and Bug Fixes in LinkSolv 9.0.0245
Added new 1-1 algorithm Draw 1-1 Pairs. This allows drawing 1-1 pairs from all competing imputed pairs (with a common record) based on their likelihoods as part of the full Bayesian model. The old algorithm is still available: Take 1-1 Pairs always selects the one competing pair with maximum likelihood for 1-1 linkages.
Added Max Locks Per File estimate to Execute SQL method. This fixes intermittent error caused by low Max Locks Per File.
Raised error If Prior Error Prob <= 0 Or Prior Error Prob >= 1 in Initialize Match Probabilities. Prior error probabilities are drawn from distributions and might be outside (0, 1) in extreme cases.
Kept merged pairs with Match Probability < 0.90 in Limited Model. This allows confirmation that complete agreements might be under 0.90.
Corrected for Views with invalid IN DATABASE phrase that look like Procedures. This fixes intermittent error caused by moving LinkSolv to a new folder.
Corrected for Table Name AS Alias Name in SQL for Update Join. This allows LinkSolv to use join queries with aliases for better program clarity.
Corrected call to Excel.WorksheetFunction.NormInv(Random X, Mean, Std. Dev) in Draw Total Matches. This fixes incorrect draws that were always centered on the mean and within two standard deviations but not normally distributed.
Corrected call to Execute SQL (…As Is=True) in Create View On Server. This fixes intermittent error when creating a view on SQL Server.
Corrected typo in Count Rows (…MergedPairs0) in Imputation Property. This fixes incorrect estimates of Max Locks Per File.
Raised error if two consecutive draws are out of range in Draw Total Matches. In extreme cases, thousands of draws might otherwise be made, increasing run times.
Corrected arguments passed to Copy Folder, Copy File, and Delete File in Copy Project. This allows copying projects without file warnings.
Corrected New Project Description in Copy Project. This allows opening the new project without warnings.
Added test for Excel subtypes in Restore Access Link. This restores links to .xlsx files.
Added test for ODBC link in JET project in Restore Access Link. This restores links to SQL Server tables.
Increased length of Link Name field to 50 in Source Links and Match Links tables. This allows longer names for data tables.
Called Requery Form in all Prepare Data Actions. This keeps the Link Tables tab current when working on Select Input and Standardize.
Called Update Release in Welcome Open Method. This updates information on the About tab.
Improved management of Connection Pool for Access databases. LinkSolv connects to many different Access databases because each one is limited to 2 GB size. This improvement reduces the time to connect and fixes intermittent errors caused by connection conflicts.
Microsoft ended support for Access 2003 on 4/8/2014 (Access (C) Microsoft Corporation). LinkSolv 9.0 uses the Access 2007-2013 database format (.accdb files). Old LinkSolv 8.3 projects that use the Access 2000-2003 database format (.mdb files) can be imported without conversion.
LinkSolv 9.0 is compatible with both 32-bit and 64-bit Microsoft Access. LinkSolv is still a 32-bit application so integer values are limited to -2,147,483,648 through +2,147,483,647. Record numbers and record counts cannot exceed 2,147,483,647.
LinkSolv 9.0 is compatible with Microsoft Windows (Windows (C) Microsoft Corporation) accessibility tools including on-screen keyboard, touch screen, magnifier, high-contrast display, keyboard shortcuts, narrator, and voice recognition.
The Match Pass object class in LinkSolv 9.0 has been redesigned to better encapsulate pass-specific properties and methods, an important object-oriented design principle.
LinkSolv 9.0 includes algorithms for managing the pool of connections to different Access databases. The new algorithms improve performance and reduce the potential for database access conflicts, especially over networks.
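The pooling idea can be sketched in a few lines. This is an illustrative Python analogue with a hypothetical opener callable, not LinkSolv's actual VBA/DAO implementation:

```python
# Minimal connection-pool sketch (hypothetical API, not LinkSolv's own code).
class ConnectionPool:
    def __init__(self, opener):
        self._opener = opener      # callable: database path -> connection
        self._pool = {}            # one cached connection per database path

    def get(self, path):
        # Reuse an already-open connection instead of reconnecting each time,
        # which saves connect time and avoids conflicting concurrent opens.
        if path not in self._pool:
            self._pool[path] = self._opener(path)
        return self._pool[path]

    def close_all(self):
        # Release every cached connection, e.g. when the project closes.
        for conn in self._pool.values():
            conn.close()
        self._pool.clear()
```

The design choice is one cached connection per database file, so repeated queries against the same .accdb file never trigger a second connect.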
LinkSolv 9.0 includes algorithms for estimating and setting “Max Locks Per File” before running each query. This reduces the potential for two common errors: “File Sharing Lock Count Exceeded. Increase MaxLocksPerFile Registry Entry” and “There is not enough memory or disk space to complete the operation.”
LinkSolv 9.0 fixes an intermittent bug. If a view (a SELECT query) refers to a nonexistent table because a file was moved or renamed, then Access classifies the view as a procedure. LinkSolv now checks both views and procedures to find a SELECT query.
LinkSolv 9.0 includes a new algorithm for merging passes for dual matches. As always, imputed pairs are randomly drawn from merged pairs by comparing estimated match probabilities to random numbers. So, 90% of 0.90 probability pairs, 50% of 0.50 probability pairs, 10% of 0.10 probability pairs, etc. are selected as imputed pairs. Imputed pairs are not necessarily one-to-one. LinkSolv groups imputed pairs into sets by collecting pairs with the same UniqueID or same UniqueID_B. If Pairs to analyze for error rates and set members is set to “Take All Pairs” or “Take 1-1 Pairs” then the pair in a group with the greatest probability is given status LP (Linked Pair) and other pairs in the same group are given status IP (Imputed Pair).
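The random-draw step can be sketched as follows (illustrative Python only; the tuple layout and function name are hypothetical, and the production code is VBA):

```python
# Illustrative sketch of drawing imputed pairs from merged pairs.
import random

def draw_imputed_pairs(merged_pairs, rng=random):
    # Each merged pair is (unique_id_a, unique_id_b, match_probability).
    # A pair is kept when its probability beats a fresh Uniform(0, 1) draw,
    # so about 90% of 0.90 pairs, 50% of 0.50 pairs, etc. survive.
    return [p for p in merged_pairs if rng.random() < p[2]]
```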
If Pairs to analyze for error rates and set members is set to “Draw 1-1 Pairs” then the new algorithm is used. First, all possible one-to-one permutations are identified for each set. Second, the posterior probability that each permutation is true is calculated. Third, one permutation is drawn at random given the cumulative probability distribution for all permutations. In this way, one-to-one pairs are drawn as part of the Bayesian model, the preferred approach. Every pair in each drawn permutation is given status LP and other pairs are given status IP. Sometimes the new results can be quite different.
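The enumerate-weight-draw procedure just described can be sketched as follows (illustrative Python; function names are hypothetical and LinkSolv's production code is VBA):

```python
# Sketch of the "Draw 1-1 Pairs" step: enumerate one-to-one permutations,
# normalize their likelihoods into posterior probabilities, draw one.
import random
from itertools import permutations

def one_to_one_permutations(pair_probs):
    """pair_probs maps (a_id, b_id) -> estimated match probability.
    Returns a list of (pairs, posterior probability) per permutation."""
    a_ids = sorted({a for a, _ in pair_probs})
    b_ids = sorted({b for _, b in pair_probs})
    results = []
    for assignment in permutations(b_ids, len(a_ids)):
        pairs = tuple(zip(a_ids, assignment))
        if all(p in pair_probs for p in pairs):   # use only pairs in the set
            likelihood = 1.0
            for p in pairs:
                likelihood *= pair_probs[p]
            results.append((pairs, likelihood))
    total = sum(like for _, like in results)
    # Normalize likelihoods into a posterior probability per permutation.
    return [(pairs, like / total) for pairs, like in results]

def draw_permutation(pair_probs, rng=random):
    # Draw one permutation given the cumulative probability distribution.
    perms = one_to_one_permutations(pair_probs)
    return rng.choices([p for p, _ in perms],
                       weights=[w for _, w in perms], k=1)[0]
```

With the 2 x 2 example that follows, the two permutations get posterior probabilities 0.09 / 0.49 ≈ 0.18 and 0.40 / 0.49 ≈ 0.82.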
For example, suppose the record pairs in a set consist of 2 A records and 2 B records:

Pair       Match Probability   Pair Status
(A1, B1)   0.90                LP
(A1, B2)   0.80                IP
(A2, B1)   0.50                IP
(A2, B2)   0.10                LP
Barring large swings in probabilities, the above status is applied in all Markov Chain iterations.
There are 2 one-to-one permutations:

Permutation          Permutation Likelihood   Permutation Probability
(A1, B1) (A2, B2)    0.90 x 0.10 = 0.09       0.09 / 0.49 ≈ 0.18
(A1, B2) (A2, B1)    0.80 x 0.50 = 0.40       0.40 / 0.49 ≈ 0.82
The old status is applied in about 18% of Markov Chain iterations:

Pair       Match Probability   Pair Status
(A1, B1)   0.90                LP
(A1, B2)   0.80                IP
(A2, B1)   0.50                IP
(A2, B2)   0.10                LP
The new status is applied in about 82% of Markov Chain iterations:

Pair       Match Probability   Pair Status
(A1, B1)   0.90                IP
(A1, B2)   0.80                LP
(A2, B1)   0.50                LP
(A2, B2)   0.10                IP
Following Statistical Analysis with Missing Data (Little and Rubin, Wiley, 2002), LinkSolv estimates a combined P value for multiply imputed logistic regression given the parameter estimates and covariance matrix from each imputation.
Equation 10.17 gives an estimated Wald statistic W for k>1 parameter components that can be compared to an F random variable to get a P value as Probability[F > W].
We need to calculate trace(B inverse(V bar)) for 10.17, where B is the between-imputation variance (a k x k covariance matrix for a vector parameter) and V bar is the average of the covariance matrices over all imputations.
Equation (10.13) is for a scalar parameter with D draws (imputations), where

(1 / (D - 1)) Sum_{d=1..D} (theta^d - theta^)**2

is identified as B. THIS EQUATION HAS A TYPO! The second theta^ (theta hat) should be theta bar, the average over all imputations. So,

B = (1 / (D - 1)) Sum_{d=1..D} (theta^d - theta bar)**2
The last sentence in the paragraph with equations 10.13 and 10.14 is:

“For vector theta, the variance Vd is replaced by a covariance matrix, and (theta^d - theta bar)**2 is replaced by (theta^d - theta bar) transpose(theta^d - theta bar).”

Note that the order of the factors matters. For a column vector x, transpose(x) x is the inner product ||x||**2, a scalar, while x transpose(x) is a k x k outer-product matrix. Because trace(B inverse(V bar)) in 10.17 requires B to be a k x k matrix, B is the average of outer products:

B = (1 / (D - 1)) Sum_{d=1..D} (theta^d - theta bar) transpose(theta^d - theta bar)
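In code, this matrix form of B is just an average of outer products of deviations from the mean vector. A sketch using plain Python lists (illustrative only; LinkSolv's own implementation is not shown in this document):

```python
# Illustrative sketch: between-imputation covariance matrix B.
def between_imputation_B(theta_hats):
    """theta_hats: list of D parameter vectors, one per imputation."""
    D = len(theta_hats)
    k = len(theta_hats[0])
    theta_bar = [sum(t[i] for t in theta_hats) / D for i in range(k)]
    # B = (1/(D-1)) Sum_d (theta^d - theta bar) transpose(theta^d - theta bar),
    # i.e. an average of k x k outer products.
    B = [[0.0] * k for _ in range(k)]
    for t in theta_hats:
        diff = [t[i] - theta_bar[i] for i in range(k)]
        for i in range(k):
            for j in range(k):
                B[i][j] += diff[i] * diff[j] / (D - 1)
    return B
```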
For example, suppose B = 3 I (3 times the 2 x 2 identity matrix, for simplicity) and

inverse(V bar) = |0.1 0.3|
                 |0.3 0.2|

then

B inverse(V bar) = 3 |0.1 0.3| = |0.3 0.9|
                     |0.3 0.2|   |0.9 0.6|

and

trace(B inverse(V bar)) = 0.3 + 0.6 = 0.9
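The arithmetic above can be checked mechanically with a few lines of Python (plain lists, no external libraries; B is taken as 3 times the 2 x 2 identity so that the product is a matrix):

```python
# Verify the worked trace example: trace(B inverse(V bar)) with B = 3 I.
def mat_mult(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

B = [[3.0, 0.0], [0.0, 3.0]]          # 3 I
V_bar_inv = [[0.1, 0.3], [0.3, 0.2]]  # inverse(V bar) from the example
print(trace(mat_mult(B, V_bar_inv)))  # approximately 0.9 (0.3 + 0.6)
```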