LinkSolv 9.0.0404 Now Available

Summary of Recent Changes

Build     Build Date  Comment

9.0.0247  28-Sep-2015 Summary Fixed intermittent error — Size of Random Sample not recalculated
9.0.0250 28-Sep-2015 Summary Included pairs with Keep Status IP in Maximum Likelihood Estimate for Set Method Take All Pairs
9.0.0253 23-Oct-2015 Summary Fixed intermittent error — Data Type Mismatch in Initialize, Update Match Probabilities
9.0.0257 23-Oct-2015 Summary Added LSAP 1-1 Pairs option for Set Method in Merge Passes
9.0.0262 24-Nov-2015 Summary Created fake data with proportions and counts closer to specs
9.0.0270 24-Nov-2015 Summary Selected Best Pairs from Maximum Likelihood Estimate using record scores
9.0.0280 24-Nov-2015 Summary Fixed intermittent error when Outcome Tests in Review Match conflict with Perform Match
9.0.0287 24-Nov-2015 Summary Accepted with a warning any input fields with 100% missing
9.0.0289 24-Nov-2015 Summary Fixed intermittent error Truncated tab control with larger form size 5.5″ X 7.25″
9.0.0290 24-Nov-2015 Summary Tested Function.PINYIN2, a better SOUNDEX for PINYIN names
9.0.0315 09-Dec-2015 Summary Improved management of objects opened by LinkSolv
9.0.0334 09-Dec-2015 Summary Fixed intermittent error — Access 2013 on Windows 8.1 crashes on close
9.0.0336 09-Dec-2015 Summary Added cases for Take All SubSets to Match.GroupPairsIntoSets
9.0.0351 10-Dec-2015 Summary Fixed intermittent error — Access running after close on Windows 8.1 and 10.
9.0.0374 10-Dec-2015 Summary Declared Global form UPPER and LEFT locations to be 0.05″ (72 FIPS)
9.0.0375 10-Dec-2015 Summary Fixed case logic for SelfMatch in SpecifyMatch.TotalMatchesAfter
9.0.0384 22-Dec-2015 Summary Added text to speech and contrasting colors for accessibility
9.0.0390 22-Dec-2015 Summary Fixed errors in new MLE,MLB and Take All SubSets algorithms
9.0.0399 22-Dec-2015 Summary Fixed errors in new Quit LinkSolv algorithms

Update Instructions

1. Software Compatibility
2. Installing LINKSOLV 9.0
3. Updating LINKSOLV 9.0

______________________________________________________________

1. Software Compatibility

Microsoft, Microsoft Windows, Microsoft Office, and Microsoft SQL Server are trademarks or registered trademarks of Microsoft Corporation. LINKSOLV 9.0 is compatible with the latest releases of Windows Vista, 7, 8, 8.1, and 10, Windows Server 2008, 2008 R2, 2012, and 10; Office 2007, 2010, 2013, 2016, SQL Server 2008, 2008 R2, 2012, and 2014. If you are running earlier releases of these products then update the software on your computer before installing LINKSOLV. LINKSOLV may not operate correctly if you use earlier releases.

________________________________________________________________

2. Installing LINKSOLV 9.0

2.1 Backup and document all custom work if you wish to reference it in new projects.

2.1.1 Create backup copies of all files in the LINKSOLV installation folder (default folder is C:\Program Files\Strategic Matching\LinkSolv 9.0 for 32-bit Windows or C:\Program Files (x86)\Strategic Matching\LinkSolv 9.0 for 64-bit Windows).

2.1.2 Create reports for all old LINKSOLV projects including a) simulation specifications; b) data specifications; c) match specifications; d) Bayesian model checks.

2.2 Download and run the setup program for a new installation

2.2.1 Obtain the serial number and password for LINKSOLV 9.0 new installation from Strategic Matching.

2.2.2 Go to http://www.strategicmatching.com. Click on the Downloads tab.

2.2.3 Search Downloads for your serial number. If more than one link is listed then click on the one that includes your system configuration (for example, Windows 7 and Access 2010).

2.2.4 Click on the link for Setup 9.0 for New Installation. Respond to the download and update dialogs. Enter the password when prompted.

2.3 Set options for Microsoft Access

2.3.1 If necessary, create a folder for new Access databases created during record linkage projects, (for example C:\LinkSolv Projects). For better performance, the LinkSolv Projects folder should be local to the computer where LinkSolv is installed, not on a remote drive.

2.3.2 Enable all macros in the Access Trust Center Settings. Or, set trusted locations for LinkSolv Programs, LinkSolv Projects, and LinkSolv Data.

2.3.3  Go to Access Options, General tab and set the default location for new databases to the LinkSolv Projects folder.

2.3.4 Set Default Database format to Access 2007/2016.

2.3.5 Open LinkSolv and set LinkSolv options for your projects. Set Data Provider to JET for Access databases or SQL for SQL Server before you try to create new projects.

2.3.6 Click on the LinkSolv Help button to open the User Guide. Read the Getting Started section.

2.4 Import Old Projects

2.4.1 You can copy one or more projects from any archived STMTData.mdb (for LinkSolv 8.3) or STMTData.accdb (for LinkSolv 9.0) database.

2.4.2 Open LinkSolv, select the Projects tab, Click on Import Project and respond to the dialog.

________________________________________________________________

3. Updating LINKSOLV 9.0

3.1 Backup LINKSOLV Program Files (default folder is C:\Program Files\Strategic Matching\LinkSolv 9.0 for 32-bit Windows or C:\Program Files (x86)\Strategic Matching\LinkSolv 9.0 for 64-bit Windows).

3.2 Download and run the update program

3.2.1 Obtain the update version number and password for LINKSOLV 9.0 from Strategic Matching.

3.2.2 Go to http://www.strategicmatching.com. Click on the Downloads tab.

3.2.3 Search Downloads for the update version number. From the list of links, click on that includes your system configuration (for example, Windows 7 and Access 2010).

3.1.4 Click on the link for LINKSOLV Update. Respond to the download and update dialogs. Enter the password when prompted.
______________________________________________________________

The setup program LinkSolv9.0.0404Update.exe will update earlier releases of LinkSolv 9.0 to 9.0.0404 by replacing the LinkSolv database file (LinkSolv.accdb) and the Strategic Matching Library database file (STMTLibrary.accde) in the LinkSolv installation folder.

The old LinkSolv database file and Strategic Matching Library database file must already exist in the installation folder for LinkSolv 9.0. If not, setup will stop.

The old database files will be moved to a BACKUP folder and renamed. These files can be restored if you uninstall the update.

The updated database files will be placed in the installation folder for LinkSolv 9.0 with the same permissions as the old files.

 

 

How to Modify PINYIN Names for SOUNDEX Coding and Record Linkage

PINYIN is a standard for approximating the sound of Chinese names using the western alphabet. Most letters are pronounced roughly like they are in English with exceptions. These exceptions can reduce the usefulness of SOUNDEX coding done for linkage purposes. First, SOUNDEX coding suppresses all vowels except for the first letter in a name and the correct Chinese pronunciation depends greatly on the vowels. SOUNDEX codes cannot differentiate between LI, LIU, LU, and LUI. Second, PINYIN Q is pronounced like CH, C and X like SH, and ZH like J. For example, PINYIN QIANG (SOUNDEX Q520) is pronounced like CHEE-AHNG (SOUNDEX C520) and might be misspelled CHIANG (SOUNDEX C520). The good news is that all of these consonants with exceptions are in SOUNDEX group 2 so they are all coded as digit 2 except as first letters. So, if we change the first one or two letters of some PINYIN names then SOUNDEX codes for the modified names will be more useful for record linkage.

‘Step 4 – Change letters from the following groups into the corresponding digits:

‘1 = B, F, P, V

‘2 = C, G, J, K, Q, S, X, Z

‘3 = D, T

‘4 = L

‘5 = M, N

‘6 = R

Public Function ModPinYin(varPinYin As Variant) As Variant

ModPinYin = Null

If Not IsNull(varPinYin) Then

ModPinYin = CStr(varPinYin)

If Left(PinYin2, 2) = “CH” Then ‘ No change

ModPinYin = ModPinYin

ElseIf Left(PinYin2, 1) = “Q” Then ‘ Replace Q with CH

ModPinYin = “CH” & Mid(ModPinYin, 2)

ElseIf Left(PinYin2, 1) = “C” Then ‘Replace C with SH

ModPinYin = “SH” & Mid(ModPinYin, 2)

ElseIf Left(ModPinYin, 2) = “HS” Then Replace HS with SH

ModPinYin = “SH” & Mid(ModPinYin, 3)

ElseIf Left(PinYin2, 1) = “X” Then ‘Replace X with SH

ModPinYin = “SH” & Mid(ModPinYin, 2)

ElseIf Left(ModPinYin, 2) = “ZH” Then Replace ZH with J

ModPinYin = “J” & Mid(ModPinYin, 3)

End If

End If

End Function

The Voice of America English News web site has a guide for pronouncing Chinese names.

http://pronounce.voanews.com/browse-oneregion.php?region=China

The Pinyin and Pronounced columns in the following table were copied from this web site. Soundex is the SOUNDEX code for Pinyin. PinyonMod is the same as Pinyon with recommended changes to initial letters and Soundex2 is the corresponding SOUNDEX code. Pronounced is Voice of America’s guide for English speakers and Soundex3 the corresponding SOUNDEX code.

PinYinNames
Pinyin Soundex PinyinMod Soundex2 Pronounced Soundex3
BEIJING B252 BEIJING B252 BAY-JEENG B252
BOXIONG B252 BOXIONG B252 BOH SHAWNG B252
CHEN C500 CHEN C500 CHUHN C500
CHEN KAIGE C520 CHEN KAIGE C520 CHUHN K-EYE GUH C520
CUI C000 SHUI S000 SWAY S000
CUI TIANKAI C352 SHUI TIANKAI S352 SWAY TEE-EHN K-EYE S352
DING D520 DING D520 DEENG D520
DING GUIZIONG D525 DING GUIZIONG D525 DEENG GWAY JAWNG D525
DU D000 DU D000 DOO D000
DU QINGLIN D252 DU QINGLIN D252 DOO CHEENG LEEN D252
ENLAI E540 ENLAI E540 EHN L-EYE E540
FU F000 FU F000 FOO F000
FU QUANYOU F250 FU QUANYOU F250 FOO CHWAHNYOH F250
GAO G000 GAO G000 GAH-OH G000
GAO QIANG G252 GAO QIANG G252 GAH-OH CHEE-AHNG G252
GAO ZHAN G250 GAO ZHAN G250 GAH-OH JAHN G250
GUANGLIE G524 GUANGLIE G524 GWAHNG LAY G524
GUIZIONG G252 GUIZIONG G252 GWAY JAWNG G252
GUO G000 GUO G000 GWOH G000
GUO BOXIONG G125 GUO BOXIONG G125 GWOH BOH SHAWNG G125
HE H000 HE H000 HUH H000
HE XINTONG H253 HE XINTONG H253 HUH SHEEN TAWNG H253
HSING H252 SHING S520 SHEENG S520
HSING HSING H252 SHING HSING S525 SHEENG SHEENG S525
HU H000 HU H000 HOO H000
HU JIA H200 HU JIA H200 HOO JYAH H200
HUANG H520 HUANG H520 HWAHNG H520
HUANG JU H520 HUANG JU H520 HWAHNG JOO H520
JIANG J520 JIANG J520 jee-YAHNG J520
JIANGXI J520 JIANGXI J520 jee-YAHNG SHEE J520
JIANMIN J550 JIANMIN J550 JAHN MEEN J550
LI L000 LI L000 LEE L000
LI XI L200 LI XI L200 LEE SHEE L200
LIANG L520 LIANG L520 LYAHNG L520
LIANG GUANGLIE L525 LIANG GUANGLIE L525 LYAHNG GWAHNG LAY L525
LIN L500 LIN L500 LEEN L500
LUI L000 LUI L000 L-YOO L000
LUI XIA L200 LUI XIA L200 L-YOO SH-YAH L200
MA M000 MA M000 MAH M000
MA ZHAOXU M200 MA ZHAOXU M200 MAH JAH-OH SHOO M200
MEI XIANG M252 MEI XIANG M252 MAY SH-YAHNG M252
QIAN Q500 CHIAN C500 CHEHN C500
QIAN QICHEN Q525 CHIAN QICHEN C525 CHEHN CHEE-CHUHN C525
QIANG Q520 CHIANG C520 CHEE-AHNG C520
QINGDAO Q523 CHINGDAO C523 CHEENG DOW C523
QINGLIN Q524 CHINGLIN C524 CHEENG LEEN C524
QUANYOU Q500 CHUANYOU C500 CHWAHNYOH C500
SILIN S450 SILIN S450 SUH-LEEN S450
SUN S500 SUN S500 SOON S500
SUN ZHENYU S525 SUN ZHENYU S525 SOON JUHN YOO S525
TIANKAI T520 TIANKAI T520 TEE-EHN K-EYE T520
WANG W520 WANG W520 WAHNG W520
WANG DAN W523 WANG DAN W523 WAHNG DAHN W523
WU W000 WU W000 WOO W000
WU JIANMIN W255 WU JIANMIN W255 WOO JAHN MEEN W255
XIANG X520 SHIANG S520 SH-YAHNG S520
XIAOHUA X000 SHIAOHUA S000 SHAH-OH HWAH S000
XIAOLEI X400 SHIAOLEI S400 SHYAU LEY S400
XINTONG X535 SHINTONG S535 SHEEN TAWNG S535
XU X000 SHU S000 SHOO S000
XU CAIHOU X200 SHU CAIHOU S200 SHOO TS-EYE HOH S320
YAN Y500 YAN Y500 YAHN Y500
YANG Y520 YANG Y520 YAHNG Y520
YANG JIANLI Y525 YANG JIANLI Y525 YAHNG JAHN LEE Y525
ZHAN Z500 JAN J500 JAHN J500
ZHANG Z520 JANG J520 JAHNG J520
ZHANG HONGBAO Z525 JANG HONGBAO J525 JAHNG HAWNG-BOW J525
ZHANG QIYUE Z520 JANG QIYUE J520 JAHNG CHEE YOO-WEH J520
ZHAO Z000 JAO J000 JAH-OH J000
ZHAO YAN Z500 JAO YAN J500 JAH-OH YAHN J500
ZHAOXU Z200 JAOXU J200 JAH-OH SHOO J200
ZHENG Z520 JENG J520 JUHNG J520
ZHENG SILIN Z524 JENG SILIN J524 JUHNG SUH-LEEN J524
ZHOU Z000 JOU J000 JOH J000
ZHOU ENLAI Z540 JOU ENLAI J540 JOH EHN L-EYE J540
ZHU Z000 JU J000 JOO J000
ZHU XIAOHUA Z200 JU XIAOHUA J200 JOO SHAH-OH HWAH J200

How to Estimate Sensitivity and Specificity for a LinkSolv Model

You can estimate sensitivity and specificity for a linkage model by calculating expected values for the number of True Positive Links, False Positive Links, True Negative Links, and False Negative Links for all merged pairs above the cutoff probability. The Merged Pairs table for each imputation (MP1, MP2, etc.) lists these pairs, each Match Probability (the probability that a pair is a true link), and each linkage status:

LP = Linked Pairs, candidate pairs imputed as true one to one links.

IP = Imputed Pairs, candidate pairs imputed as true many to many links that share a common record with a linked pair.

MP = Merged Pairs, candidate pairs imputed as false links.

Match Probability = probability that a pair is a true link, so (1 – Match Probability) = probability that a pair is a false link.

For a many to many linkage:

Expected True Positives = Sum of Match Probability over pairs with status = LP, IP
Expected False Positives = Sum of (1 – Match Probability) over pairs with status = LP, IP
Expected False Negatives = Sum of Match Probability over pairs with status = MP
Expected True Negatives = Sum of (1 – Match Probability) over pairs with status = MP
Sensitivity = True Positives / (True Positives + False Negatives)
Specificity = True Negatives / (False Positives + True Negatives)

For a one to one linkage, only pairs with status = LP count as pairs imputed as true:

Expected True Positives = Sum of Match Probability over pairs with status = LP
Expected False Positives = Sum of (1 – Match Probability) over pairs with status = LP
Expected False Negatives = Sum of Match Probability over pairs with status = MP
Expected True Negatives = Sum of (1 – Match Probability) over pairs with status = MP
Sensitivity = True Positives / (True Positives + False Negatives)
Specificity = True Negatives / (False Positives + True Negatives)

In general, you will get approximately the same Sensitivity and Specificity for each imputation. You can obtain multiple imputed estimates by combining Sensitivity and Specificity over all imputations:

Values for one imputed linkage, which is one draw from the posterior distribution for true matches, don’t really capture the spirit of multiple imputation. We analyze multiple imputations because one draw is seldom the whole story. As an example, suppose each of multiple imputations has Sensitivity = 0.50 and Specificity = 0.99.

Each imputation does a wonderful job of identifying true non-matches. However, each imputation does a relatively poor job of distinguishing true matches from false matches. This happens when many true matches don’t have very high probabilities – often the case for large files unless there are many match fields with high weights.

With multiple imputations, a true match found in any imputation contributes to analysis results when combined using SAS PROC MIANALYZE (c) SAS Corporation or IVEware (c) University of Michigan . Effectively, each imputation is an independent diagnosis and we can combine sensitivity and specificity in parallel. Sensitivity increases while specificity decreases, as described in

Combining screening tests in series or parallel, http://www.epidemiolog.net, V. Schoenbach, 9/21/2005

Following this approach,

Multiply Imputed Sensitivity and Specifity
Imputations Combined Parallel Sensitivity Parallel Specificity
1 50.00% 99.00%
1,2 75.00% 98.01%
1,2,3 87.50% 97.03%

You have to build custom queries to sum match probabilities. Open LinkSolv and open the Manage Project dialog for the project of interest. Here are examples of queries for Crash Hospital match, Imputation 1:

qryManyToManyTruePos1

SELECT Count(CrashHospital__MP1.MatchProbability) AS CountOfMatchProbability, Sum(CrashHospital__MP1.MatchProbability) AS TruePositives, Sum(1-[MatchProbability]) AS FalsePositives

FROM CrashHospital__MP1

WHERE (((CrashHospital__MP1.KeepStatus)<>’MP’));

qryManyToManyTruePos1
CountOfMatchProbability TruePositives FalsePositives
164,450 147,152 17,298

qryOneToOneTruePos1

SELECT Count(CrashHospital__MP1.MatchProbability) AS CountOfMatchProbability, Sum(CrashHospital__MP1.MatchProbability) AS TruePositives, Sum(1-[MatchProbability]) AS FalsePositives

FROM CrashHospital__MP1

WHERE (((CrashHospital__MP1.KeepStatus)=’LP’));

qryOneToOneTruePos1
CountOfMatchProbability TruePositives FalsePositives
164,450 147,152 17,298

qryFalseNegatives1

SELECT Count(CrashHospital__MP1.MatchProbability) AS CountOfMatchProbability, Sum(CrashHospital__MP1.MatchProbability) AS FalseNegatives, Sum(1-[MatchProbability]) AS TrueNegatives

FROM CrashHospital__MP1

WHERE (((CrashHospital__MP1.KeepStatus)=’MP’));

qryFalseNegatives1
CountOfMatchProbability FalseNegatives TrueNegatives
1,417,356 146,509 1,270,847

You can use Excel © Microsoft Corporation to do the math:

For MP1
One to One
True Positives = Sum (Match Probability) over status = LP 147,152
False Positives = Sum (1 – Match Probability) over Status = LP 17,298
False Negatives = Sum (Match Probability) over Status = MP 146,509
True Negatives = Sum  (1 – Match Probability) over Status = MP 1,270,847
Sensitivity = True Positives / (True Positives + False Negatives) 50.11%
Specificity = True Negatives / (False Positives + True Negatives) 98.66%

How to Derive Collision Type from Reported Crash Events

We want to compare types of collisions in order to increase the probability of true links with Name or Birth Date disagreements and also decrease the probability of false links for the same person in two different crashes. Even if collision type is not available on a crash record we should be able to derive collision type from Crash Events for comparison with Standard Collision derived from trauma E Codes.

Table 1 Std. Collide vs. Crash Events

(Most Harmful Event, First Harmful Event, or Sequential Event 1)

Std.

Collide

Pairs

Pedestrian

Pedal

Cycle

Non

Collision

Ran

Off

Road

Left

Ran

Off

Road

Right

Overturn

Rollover

Any

Non

Events

Any

Objects

Only

Non

Events

Only

Objects

Most

Vehicles

First

Vehicles

Seq.

Vehicle

31

1

5

0

2

3

6

9

3

9

3

14

21

2

N/A

23

1

1

0

0

3

0

3

6

2

5

14

19

4

NO

289

1

1

27

53

103

103

194

79

135

20

65

118

11

OBJ

96

1

0

3

20

33

10

60

57

22

19

13

36

7

OTH

69

0

24

1

1

7

2

10

5

7

2

28

47

7

PED

69

46

8

0

20

3

0

23

1

23

1

10

40

1

VEH

814

5

5

9

25

52

59

139

68

101

30

544

704

135

I analyzed high probability linked pairs from a preliminary Crash to Trauma linkage as shown in Table 1. Std. Collide lists the collision types derived from E Codes and Pairs shows the number of pairs with that value. The other columns show the number of pairs for particular events or groups of events. Events were counted if found in Most Harmful Event, First Harmful Event, or Sequential Event 1. A single pair might be counted in multiple columns depending on reported events. Ideally, one group of crash events should show high counts for one and only one Std. Collide value. Clearly, this doesn’t happen very often. For example, 84% of Pedestrian go to PED (46 / 55) with 16% errors, 79% of Most Harmful Vehicle go to VEH (544 / 688) with 21% errors, and 45% of Only No Collision Events go to NO (135 / 299) with 55% errors.

I found a number of reasons for disagreements and tried to correct them. First, based on details about  disagreements observed, I collapsed collision types produced by Std. Collide to PED, VEH (old OTH and VEH), and ONE (old NO and OBJ) and reassigned some E Codes to PED based on their fourth digit. Second, I used number of vehicles to predict collision type except for Pedestrian and Pedal Cycle events.

Table 2 Std. Collide vs. Crash Events for Num. Vehicles = 1

(Most Harmful Event, First Harmful Event, or Sequential Event 1)

New

Collide

Pairs

Pedestrian

Pedal

Cycle

Non

Collision

Ran

Off

Road

Left

Ran

Off

Road

Right

Overturn

Rollover

Any

Non

Events

Any

Objects

Only

Non

Events

Only

Objects

Most

Vehicles

First

Vehicles

Seq.

Vehicle

11

0

0

0

1

3

5

7

1

7

1

1

4

0

N/A

8

1

1

0

0

2

0

2

4

2

4

0

4

0

ONE

282

2

1

24

68

117

106

221

117

134

30

24

76

2

PED

96

49

39

0

22

3

0

25

0

25

0

7

48

0

VEH

60

2

1

6

13

18

10

41

26

22

7

10

27

1

Table 2 shows that comparisons are better for one vehicle cases. 92% of Pedestrian and Pedal Cycle go to PED (88 / 96) with 8% errors and 78% of other one vehicle cases go to ONE (282 / 361) with 22% errors.

Table 3 Std. Collide vs. Crash Events for Num. Vehicles > 1

(Most Harmful Event, First Harmful Event, or Sequential Event 1)

New

Collide

Pairs

Pedestrian

Pedal

Cycle

Non

Collision

Ran

Off

Road

Left

Ran

Off

Road

Right

Overturn

Rollover

Any

Non

Events

Any

Objects

Only

Non

Events

Only

Objects

Most

Vehicles

First

Vehicles

Seq.

Vehicle

12

0

0

0

0

0

1

1

2

1

2

10

11

2

N/A

15

0

0

0

0

1

0

1

2

0

1

14

15

4

ONE

103

0

0

6

5

19

7

33

19

23

9

54

78

16

PED

15

1

2

0

0

0

0

0

1

0

1

9

12

1

VEH

788

0

0

4

12

41

51

107

47

85

25

559

709

140

Table 3 shows that comparisons are also better for multiple vehicle cases. 84% of multiple vehicle cases go to VEH (788 / 933) with 16% errors.

Comparison results with these reclassifications were the basis for a better program to derive type of collision from number of vehicles and reported crash events. The largest number of disagreements is 103 multiple vehicle cases classified ONE in trauma. It might be that these cases involve a vehicle that is not a motor vehicle. If so, these disagreements could be corrected based on vehicle type. I added two custom functions to the User Library database.

Public Function NewCollide(varStdCollide As Variant, varECode As Variant) As Variant

    NewCollide = Null

    Select Case Nz(varECode, “”)

        Case
“812.6”, “812.7”, “813.6”, “826.0”, 826.1″

            NewCollide = “PED”

        Case Else

            Select
Case Nz(varStdCollide, “”)

                Case
“NO”, “OBJ”

                    NewCollide = “ONE”

                Case
“OTH”, “VEH”

                    NewCollide = “VEH”

                Case
Else

                    NewCollide = varStdCollide

            End Select

    End Select

End Function

Public Function CrashCollide(varNumVehicles As Variant, varEvents As Variant) As Variant

    CrashCollide = Null

        If IsNumeric(varNumVehicles) Then

        Select Case Val(varNumVehicles)

            Case 1

                CrashCollide = “ONE”

            Case Else

                CrashCollide = “VEH”

        End Select

    End If       

    If InStr(1, Nz(varEvents, “”),
“PEDAL-CYCLE”) Or InStr(1, Nz(varEvents, “”),
“PEDESTRIAN”) Then

        CrashCollide = “PED”

    End If

End Function

Models for Merging Passes and Creating Linked Datasets

LinkSolv Statistical Models for Merging Passes and Creating Linked Datasets

Limited Bayesian Model

Set Cutoff Probability = 0.90 or some other high value in the Specify Match dialog. Set Linked Data Sets = 1 on the Merge Passes tab. LinkSolv uses your estimates of total matches, error probabilities, and frequencies of data values in all probability calculations. All candidate pairs over the specified cutoff probability are accepted as linked pairs and assigned keep status = LP.

Take All Pairs

Use this method if you expect more true links to be many-to-many rather than one-to-one. Set Cutoff Probability = 0.01 or some other low value so that almost all true links are over the cutoff. Set Pairs to Analyze = Take All Pairs on the Merge Passes tab. Set Linked Datasets > 1 – 3 to 5 imputations are usually enough to capture uncertainty about linked pairs when you analyze linked datasets. LinkSolv starts with your prior estimates of total matches, error probabilities, and frequencies of data values to calculate posterior estimates that take into account observed values using Markov Chain Monte Carlo iterative methods. Set Number of Iterations great enough so that MCMC posterior estimates have converged to stable values – 5 to 10 iterations are usually enough and you can always add new iterations if necessary after reviewing the Bayesian Model Check report. LinkSolv actually runs twice as many iterations as you specify then ignores the first half as “burn in,” likely to be too dependent on prior estimates. Specify Standard Deviation and Standard Error to quantify uncertainty about your Total Matches estimate. These parameters can be calculated from the sample of values you used to estimate Total Matches, where Standard Error = Standard Deviation / Square Root of the Number of Values. Use Standard Deviation = 10% of Total Matches if it’s only based on one value.

Take 1-1 Pairs, Take Max Pairs (new label for Take 1-1 Pairs), Draw 1-1 Pairs, or Take LSAP Pairs

Use one of these methods if you expect most true links to be one-to-one rather than many-to-many. Set Cutoff Probability = 0.01 or some other low value so that almost all true links are over the cutoff. Set Pairs to Analyze to one of these values on the Merge Passes tab. Set other merge parameters following the guidelines for Take All Pairs. All of these methods group many-to-many pairs into sets based on having a common record – if two pairs have the same record from table A or from table B then they are assigned to the same set. The methods differ in how one-to-one pairs are selected from each set but one-to-one pairs are always assigned Keep Status = LP and others get Keep Status = IP.

Take Max Pairs includes the pair from each set with maximum probability in the one-to-one linkage and excludes competing pairs (a common record) with lower probabilities. The process repeats until all pairs in all sets have Keep Status = LP or IP.

Draw 1-1 Pairs extends the original Bayesian model so that one-to-one pairs are drawn as part of the overall probability model. This method is preferred by theorists, particularly if you plan to analyze multiple imputations using IVEWARE or SAS PROC MIANALYZE. LinkSolv calculates the probability of each one-to-one permutation of records in each set. For example, if a set includes records A1 and A2 from table A and records B1 and B2 from table B then the one-to-one permutations are (A1, B1; A2, B2) and (A1, B2; A2, B1), either of which might be drawn in each iteration.

Take LSAP Pairs treats selecting linked pairs as a Linear Sum Assignment Problem (LSAP). Given the probabilities of each one-to-one permutation calculated for Draw 1-1 Pairs, Take LSAP Pairs takes the one-to-one permutation which maximizes the sum of match weights, which is the same as the permutation with maximum probability.

Best Pairs

LinkSolv ranks pairs in sets by probability from greatest to least and then selects as many pairs from the top of the list as possible given your specified False Positive Rate. Remember that Match Probability = 0.9 means that 9 out of 10 such links are true and 1 out of 10 is false. So, each 0.9 link in the list contributes 0.9 links toward Expected True Positives and 0.1 links toward Expected False Positives, and similarly for other probabilities. For a given sample of pairs from the top, LinkSolv estimates False Positive Rate = Expected False Positives / (Expected False Positives + Expected True Positives).

Maximum Likelihood Linkage

LinkSolv calculates the likelihood of the Keep Status of all merged pairs for each iteration after burn-in. Each iteration produces an unbiased draw from a stable posterior distribution. Each merged pair can contributes two possible values to the likelihood. If Keep Status = LP then the contribution is the Match Probability for the pair. Otherwise, the contribution is (1 – Match Probability). LinkSolv compares the likelihoods for all iterations and saves the one draw with the maximum as a Maximum Likelihood Estimate. The MLE may not be one of the final imputations of linked pairs.

Recommendations for Record Linkage with LinkSolv and SQL Server

SQL Server, SQL Server Management Studio, Windows Explorer, Office (c) Microsoft Corporation

  1. Lock pages in memory – Enabled for SQL Server service
  2. Perform volume maintenance tasks – Enabled for SQL Server service
  3. Server authentication – Windows Authentication mode. LinkSolv also supports SQL Server Authentication, in which case UID and PWD parameters are saved in a LinkSolv table.
  4. Follow guidelines for Maximum server memory in https://sqlserverperformance.wordpress.com/2009/10/29/suggested-max-memory-settings-for-sql-server-20052008/. This limit allows a memory cushion to do other things, like checking something in Windows Explorer or SQL Server Management Studio.
  5. Recovery Model – Simple. LinkSolv has not been tested with other recovery models. After the database has been backed up, shrink the log file.
  6. Create a separate database for each LinkSolv project. Database object names are not guaranteed unique outside a project.
  7. LinkSolv creates tables, views, stored procedures, and scalar functions as part of each project. LinkSolv does not create or delete databases. Scalar functions are the same for every project and could by managed by Administrator.
  8. Use ODBC Data Source Administrator in SySWoW, unless you have 64-bit Office, to create a System DSN named LINKSOLV for SQL Server on each client.
  9. Use size of project folders on client for old projects to estimate database sizes and scale Memory and Maximum server memory accordingly.

LinkSolv 9.0.0245 Update is Now Available

Changes and Bug Fixes in LinkSolv 9.0.0245

Added new 1-1 algorithm Draw 1-1 Pairs. This allows drawing 1-1 pairs from all competing imputed pairs (with a common record) based on their likelihoods as part of the full Bayesian model. The old algorithm is still available:  Take 1-1 Pairs always selects the one competing pair with maximum likelihood for 1-1 linkages.

Added Max Locks Per File estimate to Execute SQL method. This fixes intermittent error caused by low Max Locks Per File.

Raised error If Prior Error Prob <= 0 Or Prior Error Prob >= 1 in Initialize Match Probabilities. Prior error probabilities are drawn from distributions and might be outside (0, 1) in extreme cases.

Kept merged pairs with Match Probability < 0.90 in Limited Model. This allows confirmation that complete agreements might be under 0.90.

Corrected for Views with invalid IN DATABASE phrase that look like Procedures. This fixes intermittent error caused by moving LinkSolv to a new folder.

Corrected for Table Name AS Alias Name in SQL for Update Join. This allows LinkSolv to use join queries with aliases for better program clarity.

Corrected call to Excel.WorksheetFunction.NormInv(Random X, Mean, Std. Dev) in Draw Total Matches. This fixes incorrect draws that were always centered on the mean and within two standard deviations but not normally distributed.

Corrected call to Execute SQL (…As Is=True) in Create View On Server. This fixes intermittent error when creating a view on SQL Server.

Corrected typo in Count Rows (…MergedPairs0) in Imputation Property. This fixes incorrect estimates of Max Locks Per File.

Raised error if two consecutive draws are out of range in Draw Total Matches. In extreme cases, thousands of draws might be made and increase run times.

Corrected arguments passed to Copy Folder, Copy File, and Delete File in Copy Project. This allows copying projects without file warnings.

Corrected New Project Description in Copy Project. This allows opening the new project without warnings.

Added test for Excel subtypes in Restore Access Link. This restores links to .xlsx files.

Added test for ODBC link in JET project in Restore Access Link. This restores links to SQL Server tables.

Increased length of Link Name field to 50 in Source Links and Match Links tables. This allows longer names for data tables.

Called Requery Form in all Prepare Data Actions. This keeps the Link Tables tab current when working on Select Input and Standardize.

Called Update Release in Welcome Open Method. This updates information on the About tab.

Improved management of Connection Pool for Access databases. LinkSolv connects to many different Access databases because each one is limited to 2 GB size. This improvement reduces the time to connect and fixes intermittent errors caused by connection conflicts.

LinkSolv 9.0.0190 is Now Available

Microsoft ended support for Access 2003 on 4/8/2014 (Access (C) Microsoft Corporation). LinkSolv 9.0 uses the Access 2007-2013 database format (.accdb files). Old LinkSolv 8.3 projects that use the Access 2000-2003 database format (.mdb files) can be imported without conversion.

LinkSolv 9.0 is compatible with both 32-bit and 64-bit Microsoft Access. LinkSolv is still a 32-bit application so integer values are limited to -2,147,483,648 through +2,147,483,647. Record numbers and record counts cannot exceed 2,147,483,647.

LinkSolv 9.0 is compatible with Microsoft Windows (Windows (C) Microsoft Corporation) accessibility tools including on-screen keyboard, touch screen, magnifier, high-contrast display, keyboard shortcuts, narrator, and voice recognition.

The Match Pass object class in LinkSolv 9.0 has been redesigned to better encapsulate pass-specific properties and methods, an important object oriented design principle.

LinkSolv 9.0 includes algorithms for managing the pool of connections to different Access databases. The new algorithms improve performance and reduce the potential for database access conflicts, especially over networks.

LinkSolv 9.0 includes algorithms for estimating and setting “Max Locks Per File” before running each query. This reduces the potential for two common errors:  “File Sharing Lock Count Exceeded. Increase MaxLocksPerFile Registry Entry” and “There is not enough memory or disk space to complete the operation.”

LinkSolv 9.0 fixes an intermittant bug. If a view (a SELECT query) refers to a non-existant table because a file was moved or renamed then Access classifies the view as a procedure. LinkSolv now checks both views and procedures to find a SELECT query.

LinkSolv 9.0 includes a new algorithm for merging passes for dual matches. As always, imputed pairs are randomly drawn from merged pairs by comparing estimated match probabilities to random numbers. So, 90% of 0.90 probability pairs, 50% of 0.50 probablity pairs, 10% of 0.10 probability pairs, etc. are selected as imputed pairs. Imputed pairs are not necessarily one-to-one. LinkSolv groups imputed pairs into sets by collecting pairs with the same UniqueID or same UniqueID_B. If Pairs to analyze for error rates and set members is set to “Take All Pairs” or “Take 1-1 Pairs” then the pair in a group with the greatest probability is given status LP (Linked Pair) and other pairs in the same group are given status IP (Imputed Pair).

If Pairs to analyze for error rates and set members is set to “Draw 1-1 Pairs” then the new algorithm is used. First, all possible one-to-one permutations are identified for each set. Second, the posterior probability that each permutation is true is calculated. Third, one permutation is drawn at random given the cumulative probability distribution for all permutations. In this way, one-to-one pairs are drawn as part of the Bayesian model, the preferred approach. Every pair in each drawn permutation is given status LP and other pairs are given status IP. Sometimes the new results can be quite different.

For example, suppose the record pairs in a set consist of 2 A records and 2 B records like this:

Pair        Match Probability Pair Status
(A1, B1) 0.90                      LP
(A1, B2) 0.80                      IP
(A2, B1) 0.50                      IP
(A2, B2) 0.10                      LP

Baring large swings in probabilities, the above status is applied in all Markov Chain iterations.

There are 2 one-to-one permutations:

Permutation         Permutation Likelihood  Permutation Probability
(A1, B1) (A2, B2) 0.90 x 0.10 = 0.09          0.18
(A1, B2) (A2, B1) 0.80 x 0.50 = 0.40          0.82

The old status is applied in about 18% of Markov Chain iterations:

Pair        Match Probability Pair Status
(A1, B1) 0.90                      LP
(A1, B2) 0.80                      IP
(A2, B1) 0.50                      IP
(A2, B2) 0.10                      LP

This new status is applied in about 82% of Markov Chain iterations:

Pair        Match Probability Pair Status
(A1, B1) 0.90                      IP
(A1, B2) 0.80                      LP
(A2, B1) 0.50                      LP
(A2, B2) 0.10                      IP

Estimate Combined P Value for Multiply Imputed Logistic Regression

Following Statistical Analysis with Missing Data, Little and Rubin, Wiley 2002,

to estimate combined P value for multiply imputed logistic regression given parameter estimates and covariance matrix for each imputation.

Equation 10.17 gives an estimated Wald statistic W for k>1 parameter components that can be compared to an F random variable to get a P value as Probability[F > W].

We need to calculate trace(B inverse(V bar)) for 10.17, where B is the Between Imputation Variance and V bar is the average of covariance matrices for all imputations.

Equation (10.13) is for a scalar parameter, D draws (imputations), where

( 1 / (D-1)) Sum d=1 to D (theta^d – theta^)**2 is identified as B

THIS EQUATION HAS A TYPO! The second theta^ should be theta bar, the average over all imputations. So,

 B = ( 1 / (D-1)) Sum d=1 to D (theta^d – theta_)**2

 The last sentence in the paragraph with equations 10.13 and 10.14 is

 “For vector theta, the variance Vd is replaced by a covariance matrix, and (theta^d – theta bar)**2 is replaced by (theta^d – theta bar) transpose(theta^d – theta bar).”

(theta^d – theta bar) transpose(theta^d – theta bar) = ||(theta^d – theta bar)||, a scalar.

 B = ( 1 / (D-1)) Sum d=1 to D (theta^d – theta bar) transpose(theta^d – theta bar)

Suppose B = 3 and

inverse(V bar) = |0.1 0.3|

                             |0.3 0.2|

 then B inverse(V bar)  = 3 |0.1 0.3| = |0.3 0.9|

                                               |0.3 0.2|    |0.9 0.6|

 trace  B inverse(V bar) = 0.3 + 0.6