January 7, 2016 Uncategorized Leave a comment

LinkSolv 9.0.0408 Now Available

Summary of Recent Changes

Build Build Date Comment

9.0.0405 07-Jan-2016 Summary Fixed syntax error so that Tabulate Duplicates is compatible with SQL Server
9.0.0407 07-Jan-2016 Summary Implemented Self Terminate using VBScript with delay

January 5, 2016 Uncategorized Leave a comment

LinkSolv 9.0.0404 Now Available

by mikemcglincy

Summary of Recent Changes

Build Build Date Comment

9.0.0247 28-Sep-2015 Summary Fixed intermittent error — Size of Random Sample not recalculated
9.0.0250 28-Sep-2015 Summary Included pairs with Keep Status IP in Maximum Likelihood Estimate for Set Method Take All Pairs
9.0.0253 23-Oct-2015 Summary Fixed intermittent error — Data Type Mismatch in Initialize, Update Match Probabilities
9.0.0257 23-Oct-2015 Summary Added LSAP 1-1 Pairs option for Set Method in Merge Passes
9.0.0262 24-Nov-2015 Summary Created fake data with proportions and counts closer to specs
9.0.0270 24-Nov-2015 Summary Selected Best Pairs from Maximum Likelihood Estimate using record scores
9.0.0280 24-Nov-2015 Summary Fixed intermittent error when Outcome Tests in Review Match conflict with Perform Match
9.0.0287 24-Nov-2015 Summary Accepted with a warning any input fields with 100% missing
9.0.0289 24-Nov-2015 Summary Fixed intermittent error Truncated tab control with larger form size 5.5″ X 7.25″
9.0.0290 24-Nov-2015 Summary Tested Function.PINYIN2, a better SOUNDEX for PINYIN names
9.0.0315 09-Dec-2015 Summary Improved management of objects opened by LinkSolv
9.0.0334 09-Dec-2015 Summary Fixed intermittent error — Access 2013 on Windows 8.1 crashes on close
9.0.0336 09-Dec-2015 Summary Added cases for Take All SubSets to Match.GroupPairsIntoSets
9.0.0351 10-Dec-2015 Summary Fixed intermittent error — Access running after close on Windows 8.1 and 10.
9.0.0374 10-Dec-2015 Summary Declared Global form UPPER and LEFT locations to be 0.05″ (72 FIPS)
9.0.0375 10-Dec-2015 Summary Fixed case logic for SelfMatch in SpecifyMatch.TotalMatchesAfter
9.0.0384 22-Dec-2015 Summary Added text to speech and contrasting colors for accessibility
9.0.0390 22-Dec-2015 Summary Fixed errors in new MLE,MLB and Take All SubSets algorithms
9.0.0399 22-Dec-2015 Summary Fixed errors in new Quit LinkSolv algorithms

Update Instructions

1. Software Compatibility
2. Installing LINKSOLV 9.0
3. Updating LINKSOLV 9.0

______________________________________________________________

1. Software Compatibility

Microsoft, Microsoft Windows, Microsoft Office, and Microsoft SQL Server are trademarks or registered trademarks of Microsoft Corporation. LINKSOLV 9.0 is compatible with the latest releases of Windows Vista, 7, 8, 8.1, and 10, Windows Server 2008, 2008 R2, 2012, and 10; Office 2007, 2010, 2013, 2016, SQL Server 2008, 2008 R2, 2012, and 2014. If you are running earlier releases of these products then update the software on your computer before installing LINKSOLV. LINKSOLV may not operate correctly if you use earlier releases.

________________________________________________________________

2. Installing LINKSOLV 9.0

2.1 Backup and document all custom work if you wish to reference it in new projects.

2.1.1 Create backup copies of all files in the LINKSOLV installation folder (default folder is C:\Program Files\Strategic Matching\LinkSolv 9.0 for 32-bit Windows or C:\Program Files (x86)\Strategic Matching\LinkSolv 9.0 for 64-bit Windows).

2.1.2 Create reports for all old LINKSOLV projects including a) simulation specifications; b) data specifications; c) match specifications; d) Bayesian model checks.

2.2 Download and run the setup program for a new installation

2.2.1 Obtain the serial number and password for LINKSOLV 9.0 new installation from Strategic Matching.

2.2.2 Go to http://www.strategicmatching.com. Click on the Downloads tab.

2.2.3 Search Downloads for your serial number. If more than one link is listed then click on the one that includes your system configuration (for example, Windows 7 and Access 2010).

2.2.4 Click on the link for Setup 9.0 for New Installation. Respond to the download and update dialogs. Enter the password when prompted.

2.3 Set options for Microsoft Access

2.3.1 If necessary, create a folder for new Access databases created during record linkage projects, (for example C:\LinkSolv Projects). For better performance, the LinkSolv Projects folder should be local to the computer where LinkSolv is installed, not on a remote drive.

2.3.2 Enable all macros in the Access Trust Center Settings. Or, set trusted locations for LinkSolv Programs, LinkSolv Projects, and LinkSolv Data.

2.3.3 Go to Access Options, General tab and set the default location for new databases to the LinkSolv Projects folder.

2.3.4 Set Default Database format to Access 2007/2016.

2.3.5 Open LinkSolv and set LinkSolv options for your projects. Set Data Provider to JET for Access databases or SQL for SQL Server before you try to create new projects.

2.3.6 Click on the LinkSolv Help button to open the User Guide. Read the Getting Started section.

2.4 Import Old Projects

2.4.1 You can copy one or more projects from any archived STMTData.mdb (for LinkSolv 8.3) or STMTData.accdb (for LinkSolv 9.0) database.

2.4.2 Open LinkSolv, select the Projects tab, Click on Import Project and respond to the dialog.

________________________________________________________________

3. Updating LINKSOLV 9.0

3.1 Backup LINKSOLV Program Files (default folder is C:\Program Files\Strategic Matching\LinkSolv 9.0 for 32-bit Windows or C:\Program Files (x86)\Strategic Matching\LinkSolv 9.0 for 64-bit Windows).

3.2 Download and run the update program

3.2.1 Obtain the update version number and password for LINKSOLV 9.0 from Strategic Matching.

3.2.2 Go to http://www.strategicmatching.com. Click on the Downloads tab.

3.2.3 Search Downloads for the update version number. From the list of links, click on that includes your system configuration (for example, Windows 7 and Access 2010).

3.1.4 Click on the link for LINKSOLV Update. Respond to the download and update dialogs. Enter the password when prompted.
______________________________________________________________

The setup program LinkSolv9.0.0404Update.exe will update earlier releases of LinkSolv 9.0 to 9.0.0404 by replacing the LinkSolv database file (LinkSolv.accdb) and the Strategic Matching Library database file (STMTLibrary.accde) in the LinkSolv installation folder.

The old LinkSolv database file and Strategic Matching Library database file must already exist in the installation folder for LinkSolv 9.0. If not, setup will stop.

The old database files will be moved to a BACKUP folder and renamed. These files can be restored if you uninstall the update.

The updated database files will be placed in the installation folder for LinkSolv 9.0 with the same permissions as the old files.

October 22, 2015 Uncategorized Leave a comment

How to Modify PINYIN Names for SOUNDEX Coding and Record Linkage

by mikemcglincy

PINYIN is a standard for approximating the sound of Chinese names using the western alphabet. Most letters are pronounced roughly like they are in English with exceptions. These exceptions can reduce the usefulness of SOUNDEX coding done for linkage purposes. First, SOUNDEX coding suppresses all vowels except for the first letter in a name and the correct Chinese pronunciation depends greatly on the vowels. SOUNDEX codes cannot differentiate between LI, LIU, LU, and LUI. Second, PINYIN Q is pronounced like CH, C and X like SH, and ZH like J. For example, PINYIN QIANG (SOUNDEX Q520) is pronounced like CHEE-AHNG (SOUNDEX C520) and might be misspelled CHIANG (SOUNDEX C520). The good news is that all of these consonants with exceptions are in SOUNDEX group 2 so they are all coded as digit 2 except as first letters. So, if we change the first one or two letters of some PINYIN names then SOUNDEX codes for the modified names will be more useful for record linkage.

‘Step 4 – Change letters from the following groups into the corresponding digits:

‘1 = B, F, P, V

‘2 = C, G, J, K, Q, S, X, Z

‘3 = D, T

‘4 = L

‘5 = M, N

‘6 = R

Public Function ModPinYin(varPinYin As Variant) As Variant

ModPinYin = Null

If Not IsNull(varPinYin) Then

ModPinYin = CStr(varPinYin)

If Left(PinYin2, 2) = “CH” Then ‘ No change

ModPinYin = ModPinYin

ElseIf Left(PinYin2, 1) = “Q” Then ‘ Replace Q with CH

ModPinYin = “CH” & Mid(ModPinYin, 2)

ElseIf Left(PinYin2, 1) = “C” Then ‘Replace C with SH

ModPinYin = “SH” & Mid(ModPinYin, 2)

ElseIf Left(ModPinYin, 2) = “HS” Then Replace HS with SH

ModPinYin = “SH” & Mid(ModPinYin, 3)

ElseIf Left(PinYin2, 1) = “X” Then ‘Replace X with SH

ModPinYin = “SH” & Mid(ModPinYin, 2)

ElseIf Left(ModPinYin, 2) = “ZH” Then Replace ZH with J

ModPinYin = “J” & Mid(ModPinYin, 3)

End If

End Function

The Voice of America English News web site has a guide for pronouncing Chinese names.

http://pronounce.voanews.com/browse-oneregion.php?region=China

The Pinyin and Pronounced columns in the following table were copied from this web site. Soundex is the SOUNDEX code for Pinyin. PinyonMod is the same as Pinyon with recommended changes to initial letters and Soundex2 is the corresponding SOUNDEX code. Pronounced is Voice of America’s guide for English speakers and Soundex3 the corresponding SOUNDEX code.

PinYinNames
Pinyin	Soundex	PinyinMod	Soundex2	Pronounced	Soundex3
BEIJING	B252	BEIJING	B252	BAY-JEENG	B252
BOXIONG	B252	BOXIONG	B252	BOH SHAWNG	B252
CHEN	C500	CHEN	C500	CHUHN	C500
CHEN KAIGE	C520	CHEN KAIGE	C520	CHUHN K-EYE GUH	C520
CUI	C000	SHUI	S000	SWAY	S000
CUI TIANKAI	C352	SHUI TIANKAI	S352	SWAY TEE-EHN K-EYE	S352
DING	D520	DING	D520	DEENG	D520
DING GUIZIONG	D525	DING GUIZIONG	D525	DEENG GWAY JAWNG	D525
DU	D000	DU	D000	DOO	D000
DU QINGLIN	D252	DU QINGLIN	D252	DOO CHEENG LEEN	D252
ENLAI	E540	ENLAI	E540	EHN L-EYE	E540
FU	F000	FU	F000	FOO	F000
FU QUANYOU	F250	FU QUANYOU	F250	FOO CHWAHNYOH	F250
GAO	G000	GAO	G000	GAH-OH	G000
GAO QIANG	G252	GAO QIANG	G252	GAH-OH CHEE-AHNG	G252
GAO ZHAN	G250	GAO ZHAN	G250	GAH-OH JAHN	G250
GUANGLIE	G524	GUANGLIE	G524	GWAHNG LAY	G524
GUIZIONG	G252	GUIZIONG	G252	GWAY JAWNG	G252
GUO	G000	GUO	G000	GWOH	G000
GUO BOXIONG	G125	GUO BOXIONG	G125	GWOH BOH SHAWNG	G125
HE	H000	HE	H000	HUH	H000
HE XINTONG	H253	HE XINTONG	H253	HUH SHEEN TAWNG	H253
HSING	H252	SHING	S520	SHEENG	S520
HSING HSING	H252	SHING HSING	S525	SHEENG SHEENG	S525
HU	H000	HU	H000	HOO	H000
HU JIA	H200	HU JIA	H200	HOO JYAH	H200
HUANG	H520	HUANG	H520	HWAHNG	H520
HUANG JU	H520	HUANG JU	H520	HWAHNG JOO	H520
JIANG	J520	JIANG	J520	jee-YAHNG	J520
JIANGXI	J520	JIANGXI	J520	jee-YAHNG SHEE	J520
JIANMIN	J550	JIANMIN	J550	JAHN MEEN	J550
LI	L000	LI	L000	LEE	L000
LI XI	L200	LI XI	L200	LEE SHEE	L200
LIANG	L520	LIANG	L520	LYAHNG	L520
LIANG GUANGLIE	L525	LIANG GUANGLIE	L525	LYAHNG GWAHNG LAY	L525
LIN	L500	LIN	L500	LEEN	L500
LUI	L000	LUI	L000	L-YOO	L000
LUI XIA	L200	LUI XIA	L200	L-YOO SH-YAH	L200
MA	M000	MA	M000	MAH	M000
MA ZHAOXU	M200	MA ZHAOXU	M200	MAH JAH-OH SHOO	M200
MEI XIANG	M252	MEI XIANG	M252	MAY SH-YAHNG	M252
QIAN	Q500	CHIAN	C500	CHEHN	C500
QIAN QICHEN	Q525	CHIAN QICHEN	C525	CHEHN CHEE-CHUHN	C525
QIANG	Q520	CHIANG	C520	CHEE-AHNG	C520
QINGDAO	Q523	CHINGDAO	C523	CHEENG DOW	C523
QINGLIN	Q524	CHINGLIN	C524	CHEENG LEEN	C524
QUANYOU	Q500	CHUANYOU	C500	CHWAHNYOH	C500
SILIN	S450	SILIN	S450	SUH-LEEN	S450
SUN	S500	SUN	S500	SOON	S500
SUN ZHENYU	S525	SUN ZHENYU	S525	SOON JUHN YOO	S525
TIANKAI	T520	TIANKAI	T520	TEE-EHN K-EYE	T520
WANG	W520	WANG	W520	WAHNG	W520
WANG DAN	W523	WANG DAN	W523	WAHNG DAHN	W523
WU	W000	WU	W000	WOO	W000
WU JIANMIN	W255	WU JIANMIN	W255	WOO JAHN MEEN	W255
XIANG	X520	SHIANG	S520	SH-YAHNG	S520
XIAOHUA	X000	SHIAOHUA	S000	SHAH-OH HWAH	S000
XIAOLEI	X400	SHIAOLEI	S400	SHYAU LEY	S400
XINTONG	X535	SHINTONG	S535	SHEEN TAWNG	S535
XU	X000	SHU	S000	SHOO	S000
XU CAIHOU	X200	SHU CAIHOU	S200	SHOO TS-EYE HOH	S320
YAN	Y500	YAN	Y500	YAHN	Y500
YANG	Y520	YANG	Y520	YAHNG	Y520
YANG JIANLI	Y525	YANG JIANLI	Y525	YAHNG JAHN LEE	Y525
ZHAN	Z500	JAN	J500	JAHN	J500
ZHANG	Z520	JANG	J520	JAHNG	J520
ZHANG HONGBAO	Z525	JANG HONGBAO	J525	JAHNG HAWNG-BOW	J525
ZHANG QIYUE	Z520	JANG QIYUE	J520	JAHNG CHEE YOO-WEH	J520
ZHAO	Z000	JAO	J000	JAH-OH	J000
ZHAO YAN	Z500	JAO YAN	J500	JAH-OH YAHN	J500
ZHAOXU	Z200	JAOXU	J200	JAH-OH SHOO	J200
ZHENG	Z520	JENG	J520	JUHNG	J520
ZHENG SILIN	Z524	JENG SILIN	J524	JUHNG SUH-LEEN	J524
ZHOU	Z000	JOU	J000	JOH	J000
ZHOU ENLAI	Z540	JOU ENLAI	J540	JOH EHN L-EYE	J540
ZHU	Z000	JU	J000	JOO	J000
ZHU XIAOHUA	Z200	JU XIAOHUA	J200	JOO SHAH-OH HWAH	J200

September 16, 2015 Uncategorized 1 Comment

How to Estimate Sensitivity and Specificity for a LinkSolv Model

by mikemcglincy

You can estimate sensitivity and specificity for a linkage model by calculating expected values for the number of True Positive Links, False Positive Links, True Negative Links, and False Negative Links for all merged pairs above the cutoff probability. The Merged Pairs table for each imputation (MP1, MP2, etc.) lists these pairs, each Match Probability (the probability that a pair is a true link), and each linkage status:

LP = Linked Pairs, candidate pairs imputed as true one to one links.

IP = Imputed Pairs, candidate pairs imputed as true many to many links that share a common record with a linked pair.

MP = Merged Pairs, candidate pairs imputed as false links.

Match Probability = probability that a pair is a true link, so (1 – Match Probability) = probability that a pair is a false link.

For a many to many linkage:

Expected True Positives = Sum of Match Probability over pairs with status = LP, IP

Expected False Positives = Sum of (1 – Match Probability) over pairs with status = LP, IP

Expected False Negatives = Sum of Match Probability over pairs with status = MP

Expected True Negatives = Sum of (1 – Match Probability) over pairs with status = MP

Sensitivity = True Positives / (True Positives + False Negatives)

Specificity = True Negatives / (False Positives + True Negatives)

For a one to one linkage, only pairs with status = LP count as pairs imputed as true:

Expected True Positives = Sum of Match Probability over pairs with status = LP

Expected False Positives = Sum of (1 – Match Probability) over pairs with status = LP

Expected False Negatives = Sum of Match Probability over pairs with status = MP

Expected True Negatives = Sum of (1 – Match Probability) over pairs with status = MP

Sensitivity = True Positives / (True Positives + False Negatives)

Specificity = True Negatives / (False Positives + True Negatives)

In general, you will get approximately the same Sensitivity and Specificity for each imputation. You can obtain multiple imputed estimates by combining Sensitivity and Specificity over all imputations:

Values for one imputed linkage, which is one draw from the posterior distribution for true matches, don’t really capture the spirit of multiple imputation. We analyze multiple imputations because one draw is seldom the whole story. As an example, suppose each of multiple imputations has Sensitivity = 0.50 and Specificity = 0.99.

Each imputation does a wonderful job of identifying true non-matches. However, each imputation does a relatively poor job of distinguishing true matches from false matches. This happens when many true matches don’t have very high probabilities – often the case for large files unless there are many match fields with high weights.

With multiple imputations, a true match found in any imputation contributes to analysis results when combined using SAS PROC MIANALYZE (c) SAS Corporation or IVEware (c) University of Michigan . Effectively, each imputation is an independent diagnosis and we can combine sensitivity and specificity in parallel. Sensitivity increases while specificity decreases, as described in

Combining screening tests in series or parallel, http://www.epidemiolog.net, V. Schoenbach, 9/21/2005

Following this approach,

Multiply Imputed Sensitivity and Specifity
Imputations Combined	Parallel Sensitivity	Parallel Specificity
1	50.00%	99.00%
1,2	75.00%	98.01%
1,2,3	87.50%	97.03%

You have to build custom queries to sum match probabilities. Open LinkSolv and open the Manage Project dialog for the project of interest. Here are examples of queries for Crash Hospital match, Imputation 1:

qryManyToManyTruePos1

SELECT Count(CrashHospital__MP1.MatchProbability) AS CountOfMatchProbability, Sum(CrashHospital__MP1.MatchProbability) AS TruePositives, Sum(1-[MatchProbability]) AS FalsePositives

FROM CrashHospital__MP1

WHERE (((CrashHospital__MP1.KeepStatus)<>’MP’));

qryManyToManyTruePos1
CountOfMatchProbability	TruePositives	FalsePositives
164,450	147,152	17,298

qryOneToOneTruePos1

SELECT Count(CrashHospital__MP1.MatchProbability) AS CountOfMatchProbability, Sum(CrashHospital__MP1.MatchProbability) AS TruePositives, Sum(1-[MatchProbability]) AS FalsePositives

FROM CrashHospital__MP1

WHERE (((CrashHospital__MP1.KeepStatus)=’LP’));

qryOneToOneTruePos1
CountOfMatchProbability	TruePositives	FalsePositives
164,450	147,152	17,298

qryFalseNegatives1

SELECT Count(CrashHospital__MP1.MatchProbability) AS CountOfMatchProbability, Sum(CrashHospital__MP1.MatchProbability) AS FalseNegatives, Sum(1-[MatchProbability]) AS TrueNegatives

FROM CrashHospital__MP1

WHERE (((CrashHospital__MP1.KeepStatus)=’MP’));

qryFalseNegatives1
CountOfMatchProbability	FalseNegatives	TrueNegatives
1,417,356	146,509	1,270,847

You can use Excel © Microsoft Corporation to do the math:

For MP1

One to One
True Positives = Sum (Match Probability) over status = LP					147,152
False Positives = Sum (1 – Match Probability) over Status = LP					17,298
False Negatives = Sum (Match Probability) over Status = MP					146,509
True Negatives = Sum (1 – Match Probability) over Status = MP					1,270,847
Sensitivity = True Positives / (True Positives + False Negatives)					50.11%
Specificity = True Negatives / (False Positives + True Negatives)					98.66%

September 9, 2015 Uncategorized Leave a comment

How to Derive Collision Type from Reported Crash Events

by mikemcglincy

We want to compare types of collisions in order to increase the probability of true links with Name or Birth Date disagreements and also decrease the probability of false links for the same person in two different crashes. Even if collision type is not available on a crash record we should be able to derive collision type from Crash Events for comparison with Standard Collision derived from trauma E Codes.

Table 1 Std. Collide vs. Crash Events (Most Harmful Event, First Harmful Event, or Sequential Event 1)
Std. Collide	Pairs	Pedestrian	Pedal Cycle	Non Collision	Ran Off Road Left	Ran Off Road Right	Overturn Rollover	Any Non Events	Any Objects	Only Non Events	Only Objects	Most Vehicles	First Vehicles	Seq. Vehicle
	31	1	5	0	2	3	6	9	3	9	3	14	21	2
N/A	23	1	1	0	0	3	0	3	6	2	5	14	19	4
NO	289	1	1	27	53	103	103	194	79	135	20	65	118	11
OBJ	96	1	0	3	20	33	10	60	57	22	19	13	36	7
OTH	69	0	24	1	1	7	2	10	5	7	2	28	47	7
PED	69	46	8	0	20	3	0	23	1	23	1	10	40	1
VEH	814	5	5	9	25	52	59	139	68	101	30	544	704	135

I analyzed high probability linked pairs from a preliminary Crash to Trauma linkage as shown in Table 1. Std. Collide lists the collision types derived from E Codes and Pairs shows the number of pairs with that value. The other columns show the number of pairs for particular events or groups of events. Events were counted if found in Most Harmful Event, First Harmful Event, or Sequential Event 1. A single pair might be counted in multiple columns depending on reported events. Ideally, one group of crash events should show high counts for one and only one Std. Collide value. Clearly, this doesn’t happen very often. For example, 84% of Pedestrian go to PED (46 / 55) with 16% errors, 79% of Most Harmful Vehicle go to VEH (544 / 688) with 21% errors, and 45% of Only No Collision Events go to NO (135 / 299) with 55% errors.

I found a number of reasons for disagreements and tried to correct them. First, based on details about disagreements observed, I collapsed collision types produced by Std. Collide to PED, VEH (old OTH and VEH), and ONE (old NO and OBJ) and reassigned some E Codes to PED based on their fourth digit. Second, I used number of vehicles to predict collision type except for Pedestrian and Pedal Cycle events.

Table 2 Std. Collide vs. Crash Events for Num. Vehicles = 1 (Most Harmful Event, First Harmful Event, or Sequential Event 1)
New Collide	Pairs	Pedestrian	Pedal Cycle	Non Collision	Ran Off Road Left	Ran Off Road Right	Overturn Rollover	Any Non Events	Any Objects	Only Non Events	Only Objects	Most Vehicles	First Vehicles	Seq. Vehicle
	11	0	0	0	1	3	5	7	1	7	1	1	4	0
N/A	8	1	1	0	0	2	0	2	4	2	4	0	4	0
ONE	282	2	1	24	68	117	106	221	117	134	30	24	76	2
PED	96	49	39	0	22	3	0	25	0	25	0	7	48	0
VEH	60	2	1	6	13	18	10	41	26	22	7	10	27	1

Table 2 shows that comparisons are better for one vehicle cases. 92% of Pedestrian and Pedal Cycle go to PED (88 / 96) with 8% errors and 78% of other one vehicle cases go to ONE (282 / 361) with 22% errors.

Table 3 Std. Collide vs. Crash Events for Num. Vehicles > 1 (Most Harmful Event, First Harmful Event, or Sequential Event 1)
New Collide	Pairs	Pedestrian	Pedal Cycle	Non Collision	Ran Off Road Left	Ran Off Road Right	Overturn Rollover	Any Non Events	Any Objects	Only Non Events	Only Objects	Most Vehicles	First Vehicles	Seq. Vehicle
	12	0	0	0	0	0	1	1	2	1	2	10	11	2
N/A	15	0	0	0	0	1	0	1	2	0	1	14	15	4
ONE	103	0	0	6	5	19	7	33	19	23	9	54	78	16
PED	15	1	2	0	0	0	0	0	1	0	1	9	12	1
VEH	788	0	0	4	12	41	51	107	47	85	25	559	709	140

Table 3 shows that comparisons are also better for multiple vehicle cases. 84% of multiple vehicle cases go to VEH (788 / 933) with 16% errors.

Comparison results with these reclassifications were the basis for a better program to derive type of collision from number of vehicles and reported crash events. The largest number of disagreements is 103 multiple vehicle cases classified ONE in trauma. It might be that these cases involve a vehicle that is not a motor vehicle. If so, these disagreements could be corrected based on vehicle type. I added two custom functions to the User Library database.

Public Function NewCollide(varStdCollide As Variant, varECode As Variant) As Variant

NewCollide = Null

Select Case Nz(varECode, “”)

Case
“812.6”, “812.7”, “813.6”, “826.0”, “826.1″

NewCollide = “PED”

Case Else

Select
Case Nz(varStdCollide, “”)

Case
“NO”, “OBJ”

NewCollide = “ONE”

Case
“OTH”, “VEH”

NewCollide = “VEH”

Case
Else

NewCollide = varStdCollide

End Select

End Function

Public Function CrashCollide(varNumVehicles As Variant, varEvents As Variant) As Variant

CrashCollide = Null

If IsNumeric(varNumVehicles) Then

Select Case Val(varNumVehicles)

Case 1

CrashCollide = “ONE”

Case Else

CrashCollide = “VEH”

End Select

End If

If InStr(1, Nz(varEvents, “”),
“PEDAL-CYCLE”) Or InStr(1, Nz(varEvents, “”),
“PEDESTRIAN”) Then

CrashCollide = “PED”

End If

End Function

August 31, 2015 Uncategorized Leave a comment

Models for Merging Passes and Creating Linked Datasets

by mikemcglincy

LinkSolv Statistical Models for Merging Passes and Creating Linked Datasets

Limited Bayesian Model

Set Cutoff Probability = 0.90 or some other high value in the Specify Match dialog. Set Linked Data Sets = 1 on the Merge Passes tab. LinkSolv uses your estimates of total matches, error probabilities, and frequencies of data values in all probability calculations. All candidate pairs over the specified cutoff probability are accepted as linked pairs and assigned keep status = LP.

Take All Pairs

Use this method if you expect more true links to be many-to-many rather than one-to-one. Set Cutoff Probability = 0.01 or some other low value so that almost all true links are over the cutoff. Set Pairs to Analyze = Take All Pairs on the Merge Passes tab. Set Linked Datasets > 1 – 3 to 5 imputations are usually enough to capture uncertainty about linked pairs when you analyze linked datasets. LinkSolv starts with your prior estimates of total matches, error probabilities, and frequencies of data values to calculate posterior estimates that take into account observed values using Markov Chain Monte Carlo iterative methods. Set Number of Iterations great enough so that MCMC posterior estimates have converged to stable values – 5 to 10 iterations are usually enough and you can always add new iterations if necessary after reviewing the Bayesian Model Check report. LinkSolv actually runs twice as many iterations as you specify then ignores the first half as “burn in,” likely to be too dependent on prior estimates. Specify Standard Deviation and Standard Error to quantify uncertainty about your Total Matches estimate. These parameters can be calculated from the sample of values you used to estimate Total Matches, where Standard Error = Standard Deviation / Square Root of the Number of Values. Use Standard Deviation = 10% of Total Matches if it’s only based on one value.

Take 1-1 Pairs, Take Max Pairs (new label for Take 1-1 Pairs), Draw 1-1 Pairs, or Take LSAP Pairs

Use one of these methods if you expect most true links to be one-to-one rather than many-to-many. Set Cutoff Probability = 0.01 or some other low value so that almost all true links are over the cutoff. Set Pairs to Analyze to one of these values on the Merge Passes tab. Set other merge parameters following the guidelines for Take All Pairs. All of these methods group many-to-many pairs into sets based on having a common record – if two pairs have the same record from table A or from table B then they are assigned to the same set. The methods differ in how one-to-one pairs are selected from each set but one-to-one pairs are always assigned Keep Status = LP and others get Keep Status = IP.

Take Max Pairs includes the pair from each set with maximum probability in the one-to-one linkage and excludes competing pairs (a common record) with lower probabilities. The process repeats until all pairs in all sets have Keep Status = LP or IP.

Draw 1-1 Pairs extends the original Bayesian model so that one-to-one pairs are drawn as part of the overall probability model. This method is preferred by theorists, particularly if you plan to analyze multiple imputations using IVEWARE or SAS PROC MIANALYZE. LinkSolv calculates the probability of each one-to-one permutation of records in each set. For example, if a set includes records A1 and A2 from table A and records B1 and B2 from table B then the one-to-one permutations are (A1, B1; A2, B2) and (A1, B2; A2, B1), either of which might be drawn in each iteration.

Take LSAP Pairs treats selecting linked pairs as a Linear Sum Assignment Problem (LSAP). Given the probabilities of each one-to-one permutation calculated for Draw 1-1 Pairs, Take LSAP Pairs takes the one-to-one permutation which maximizes the sum of match weights, which is the same as the permutation with maximum probability.

Best Pairs

LinkSolv ranks pairs in sets by probability from greatest to least and then selects as many pairs from the top of the list as possible given your specified False Positive Rate. Remember that Match Probability = 0.9 means that 9 out of 10 such links are true and 1 out of 10 is false. So, each 0.9 link in the list contributes 0.9 links toward Expected True Positives and 0.1 links toward Expected False Positives, and similarly for other probabilities. For a given sample of pairs from the top, LinkSolv estimates False Positive Rate = Expected False Positives / (Expected False Positives + Expected True Positives).

Maximum Likelihood Linkage

LinkSolv calculates the likelihood of the Keep Status of all merged pairs for each iteration after burn-in. Each iteration produces an unbiased draw from a stable posterior distribution. Each merged pair can contributes two possible values to the likelihood. If Keep Status = LP then the contribution is the Match Probability for the pair. Otherwise, the contribution is (1 – Match Probability). LinkSolv compares the likelihoods for all iterations and saves the one draw with the maximum as a Maximum Likelihood Estimate. The MLE may not be one of the final imputations of linked pairs.

August 24, 2015 Uncategorized Leave a comment

Recommendations for Record Linkage with LinkSolv and SQL Server

by mikemcglincy

SQL Server, SQL Server Management Studio, Windows Explorer, Office (c) Microsoft Corporation

Lock pages in memory – Enabled for SQL Server service
Perform volume maintenance tasks – Enabled for SQL Server service
Server authentication – Windows Authentication mode. LinkSolv also supports SQL Server Authentication, in which case UID and PWD parameters are saved in a LinkSolv table.
Follow guidelines for Maximum server memory in https://sqlserverperformance.wordpress.com/2009/10/29/suggested-max-memory-settings-for-sql-server-20052008/. This limit allows a memory cushion to do other things, like checking something in Windows Explorer or SQL Server Management Studio.
Recovery Model – Simple. LinkSolv has not been tested with other recovery models. After the database has been backed up, shrink the log file.
Create a separate database for each LinkSolv project. Database object names are not guaranteed unique outside a project.
LinkSolv creates tables, views, stored procedures, and scalar functions as part of each project. LinkSolv does not create or delete databases. Scalar functions are the same for every project and could by managed by Administrator.
Use ODBC Data Source Administrator in SySWoW, unless you have 64-bit Office, to create a System DSN named LINKSOLV for SQL Server on each client.
Use size of project folders on client for old projects to estimate database sizes and scale Memory and Maximum server memory accordingly.

August 22, 2015 Uncategorized Leave a comment

LinkSolv 9.0.0245 Update is Now Available

by mikemcglincy

Changes and Bug Fixes in LinkSolv 9.0.0245

Added new 1-1 algorithm Draw 1-1 Pairs. This allows drawing 1-1 pairs from all competing imputed pairs (with a common record) based on their likelihoods as part of the full Bayesian model. The old algorithm is still available: Take 1-1 Pairs always selects the one competing pair with maximum likelihood for 1-1 linkages.

Added Max Locks Per File estimate to Execute SQL method. This fixes intermittent error caused by low Max Locks Per File.

Raised error If Prior Error Prob <= 0 Or Prior Error Prob >= 1 in Initialize Match Probabilities. Prior error probabilities are drawn from distributions and might be outside (0, 1) in extreme cases.

Kept merged pairs with Match Probability < 0.90 in Limited Model. This allows confirmation that complete agreements might be under 0.90.

Corrected for Views with invalid IN DATABASE phrase that look like Procedures. This fixes intermittent error caused by moving LinkSolv to a new folder.

Corrected for Table Name AS Alias Name in SQL for Update Join. This allows LinkSolv to use join queries with aliases for better program clarity.

Corrected call to Excel.WorksheetFunction.NormInv(Random X, Mean, Std. Dev) in Draw Total Matches. This fixes incorrect draws that were always centered on the mean and within two standard deviations but not normally distributed.

Corrected call to Execute SQL (…As Is=True) in Create View On Server. This fixes intermittent error when creating a view on SQL Server.

Corrected typo in Count Rows (…MergedPairs0) in Imputation Property. This fixes incorrect estimates of Max Locks Per File.

Raised error if two consecutive draws are out of range in Draw Total Matches. In extreme cases, thousands of draws might be made and increase run times.

Corrected arguments passed to Copy Folder, Copy File, and Delete File in Copy Project. This allows copying projects without file warnings.

Corrected New Project Description in Copy Project. This allows opening the new project without warnings.

Added test for Excel subtypes in Restore Access Link. This restores links to .xlsx files.

Added test for ODBC link in JET project in Restore Access Link. This restores links to SQL Server tables.

Increased length of Link Name field to 50 in Source Links and Match Links tables. This allows longer names for data tables.

Called Requery Form in all Prepare Data Actions. This keeps the Link Tables tab current when working on Select Input and Standardize.

Called Update Release in Welcome Open Method. This updates information on the About tab.

Improved management of Connection Pool for Access databases. LinkSolv connects to many different Access databases because each one is limited to 2 GB size. This improvement reduces the time to connect and fixes intermittent errors caused by connection conflicts.

July 30, 2015 Uncategorized Leave a comment

LinkSolv 9.0.0190 is Now Available

by mikemcglincy

Microsoft ended support for Access 2003 on 4/8/2014 (Access (C) Microsoft Corporation). LinkSolv 9.0 uses the Access 2007-2013 database format (.accdb files). Old LinkSolv 8.3 projects that use the Access 2000-2003 database format (.mdb files) can be imported without conversion.

LinkSolv 9.0 is compatible with both 32-bit and 64-bit Microsoft Access. LinkSolv is still a 32-bit application so integer values are limited to -2,147,483,648 through +2,147,483,647. Record numbers and record counts cannot exceed 2,147,483,647.

LinkSolv 9.0 is compatible with Microsoft Windows (Windows (C) Microsoft Corporation) accessibility tools including on-screen keyboard, touch screen, magnifier, high-contrast display, keyboard shortcuts, narrator, and voice recognition.

The Match Pass object class in LinkSolv 9.0 has been redesigned to better encapsulate pass-specific properties and methods, an important object oriented design principle.

LinkSolv 9.0 includes algorithms for managing the pool of connections to different Access databases. The new algorithms improve performance and reduce the potential for database access conflicts, especially over networks.

LinkSolv 9.0 includes algorithms for estimating and setting “Max Locks Per File” before running each query. This reduces the potential for two common errors: “File Sharing Lock Count Exceeded. Increase MaxLocksPerFile Registry Entry” and “There is not enough memory or disk space to complete the operation.”

LinkSolv 9.0 fixes an intermittant bug. If a view (a SELECT query) refers to a non-existant table because a file was moved or renamed then Access classifies the view as a procedure. LinkSolv now checks both views and procedures to find a SELECT query.

LinkSolv 9.0 includes a new algorithm for merging passes for dual matches. As always, imputed pairs are randomly drawn from merged pairs by comparing estimated match probabilities to random numbers. So, 90% of 0.90 probability pairs, 50% of 0.50 probablity pairs, 10% of 0.10 probability pairs, etc. are selected as imputed pairs. Imputed pairs are not necessarily one-to-one. LinkSolv groups imputed pairs into sets by collecting pairs with the same UniqueID or same UniqueID_B. If Pairs to analyze for error rates and set members is set to “Take All Pairs” or “Take 1-1 Pairs” then the pair in a group with the greatest probability is given status LP (Linked Pair) and other pairs in the same group are given status IP (Imputed Pair).

If Pairs to analyze for error rates and set members is set to “Draw 1-1 Pairs” then the new algorithm is used. First, all possible one-to-one permutations are identified for each set. Second, the posterior probability that each permutation is true is calculated. Third, one permutation is drawn at random given the cumulative probability distribution for all permutations. In this way, one-to-one pairs are drawn as part of the Bayesian model, the preferred approach. Every pair in each drawn permutation is given status LP and other pairs are given status IP. Sometimes the new results can be quite different.

For example, suppose the record pairs in a set consist of 2 A records and 2 B records like this:

Pair        Match Probability Pair Status
(A1, B1) 0.90                      LP
(A1, B2) 0.80                     IP
(A2, B1) 0.50                     IP
(A2, B2) 0.10                     LP

Baring large swings in probabilities, the above status is applied in all Markov Chain iterations.

There are 2 one-to-one permutations:

Permutation         Permutation Likelihood Permutation Probability
(A1, B1) (A2, B2) 0.90 x 0.10 = 0.09          0.18
(A1, B2) (A2, B1) 0.80 x 0.50 = 0.40          0.82

The old status is applied in about 18% of Markov Chain iterations:

Pair        Match Probability Pair Status
(A1, B1) 0.90                     LP
(A1, B2) 0.80                     IP
(A2, B1) 0.50                     IP
(A2, B2) 0.10                      LP

This new status is applied in about 82% of Markov Chain iterations:

Pair        Match Probability Pair Status
(A1, B1) 0.90                    IP
(A1, B2) 0.80                      LP
(A2, B1) 0.50                      LP
(A2, B2) 0.10                      IP

July 22, 2015 Uncategorized 1 Comment

Estimate Combined P Value for Multiply Imputed Logistic Regression

by mikemcglincy

Following Statistical Analysis with Missing Data, Little and Rubin, Wiley 2002,

to estimate combined P value for multiply imputed logistic regression given parameter estimates and covariance matrix for each imputation.

Equation 10.17 gives an estimated Wald statistic W for k>1 parameter components that can be compared to an F random variable to get a P value as Probability[F > W].

We need to calculate trace(B inverse(V bar)) for 10.17, where B is the Between Imputation Variance and V bar is the average of covariance matrices for all imputations.

Equation (10.13) is for a scalar parameter, D draws (imputations), where

( 1 / (D-1)) Sum d=1 to D (theta^d – theta^)**2 is identified as B

THIS EQUATION HAS A TYPO! The second theta^ should be theta bar, the average over all imputations. So,

B = ( 1 / (D-1)) Sum d=1 to D (theta^d – theta_)**2

The last sentence in the paragraph with equations 10.13 and 10.14 is

“For vector theta, the variance Vd is replaced by a covariance matrix, and (theta^d – theta bar)**2 is replaced by (theta^d – theta bar) transpose(theta^d – theta bar).”

(theta^d – theta bar) transpose(theta^d – theta bar) = ||(theta^d – theta bar)||, a scalar.

B = ( 1 / (D-1)) Sum d=1 to D (theta^d – theta bar) transpose(theta^d – theta bar)

Suppose B = 3 and

inverse(V bar) = |0.1 0.3|

|0.3 0.2|

then B inverse(V bar) = 3 |0.1 0.3| = |0.3 0.9|

|0.3 0.2| |0.9 0.6|

trace B inverse(V bar) = 0.3 + 0.6

Strategic Matching

Using LinkSolv software for record linkage

Category Archives: Uncategorized

LinkSolv 9.0.0408 Now Available

Summary of Recent Changes

Build Build Date Comment

LinkSolv 9.0.0404 Now Available

Summary of Recent Changes

Build Build Date Comment

Update Instructions

How to Modify PINYIN Names for SOUNDEX Coding and Record Linkage

How to Estimate Sensitivity and Specificity for a LinkSolv Model

How to Derive Collision Type from Reported Crash Events

Models for Merging Passes and Creating Linked Datasets

Recommendations for Record Linkage with LinkSolv and SQL Server

LinkSolv 9.0.0245 Update is Now Available

LinkSolv 9.0.0190 is Now Available

Estimate Combined P Value for Multiply Imputed Logistic Regression