Multiple Imputation with Markov Chain Monte Carlo

You can continue MCMC after the last complete imputation instead of restarting from scratch. For example, the Markov Chain may not have converged well after 5 iterations, say, and you want to run 5 more. Or, LinkSolv or Access may have been stopped for any reason before completing all of the imputations you specified.

When you click on the Merge Passes button in the Perform Match dialog, LinkSolv looks for an earlier merge. If there is, you might be given a choice to Continue or Restart. You must have the Confirm Messages Option = “Y” to get the choice. Otherwise, LinkSolv Restarts by default.

Possible Actions by LinkSolv:

  • Last Complete Imputation = 0 And Last Complete Iteration = 0 — Nothing is complete. Restart from scratch.
  • Last Complete Imputation = Number of Imputations and Last Complete Iteration = Number of Iterations — everything is complete. Restart from scratch.
  • Last Complete Imputation > Number of Imputations or Last Complete Iteration > Number of Iterations — everything is complete. Restart from scratch. This happens if you reduce the Number of Imputations or Number of Iterations from a prior run.
  • Last Complete Imputation = 0 And Last Complete Iteration > 0 — Continue Imputation 1 with next Iteration.
  • Last Complete Imputation > 0 And Last Complete Iteration >= 0 — Continue with the next Imputation at Iteration 0.
Advertisements

LinkSolv 8.3.0855 is now available — 1

LinkSolv 8.3.0855 has an improved algorithm for agreements within a tolerance. Ideally, we would compare every value in a frequency table for Source A with every value in the corresponding frequency table for Source B to find which values agree, given a specified comparison method and tolerance. This is called a CROSS JOIN between the two tables. Probabilities for all agreeing values in frequency table B are summed (‘Sum Prob B’) for each value in frequency table A for match probability calculations. However, CROSS JOINs can take very long to run even if the database software doesn’t complain:  365 Dates X 365 Dates = 133,225 comparisons; 1,440 Times X 1,440 Times = 2,073,600 comparisons; 36,500 Birth Dates X 36,500 Birth Dates = 1,332,250,000 comparisons.

In the old algorithm, LinkSolv first compared all values in frequency table A against all values in frequency table B for EXACT equality (this is not a CROSS JOIN so relatively quick). Second, all values in frequency table A were compared against a subset of frequency table B for additional agreements that were not EXACT. The subset was chosen so that the number of comparisons was less than 10,000,000 and included as many of the most common B values as possible — these are the values that make the greatest contribution to Sum Prob B.

For some distributions, the 10,000,000 comparisons limit resulted in small subsets. For frequency tables with, say, about 300,000 different names the subset of B would include only about 33 of the most common names (300,000 X 33 = 9,900,000 comparisons). This happens to work well for PREFIX comparisons of names to initials because many initials show up in the top 33 values, but other situations are not as lucky.

Simply increasing the maximum number of comparisons to 20,000,000 or 40,000,000 was not feasible — 10,000,000 comparisons already take a fairly long time on many computers and 20,000,000 would takes for than twice as long. Also, when Microsoft Windows or Access runs out of system resources and gives up it is usually after hours of chugging along.

In the new algorithm, LinkSolv runs several CROSS JOINs successively in a loop, each comparing a subset of values in frequency table B to all values in frequency table A. Subsets are chosen so that the number of comparisons in each CROSS JOIN is less than 4,000,000 and takes a few minutes to run. Also, subsets are chosen so that values with highest probabilities that have not yet been compared are selected first. LinkSolv limits total run time for all CROSS JOINs to about 20 minutes — about 10 CROSS JOINs. LinkSolv also keeps track of the corresponding fraction of all B records compared. For example, if the most common B value had probability 0.10, the second had 0.05, and the third had 0.01 then comparing just these three values would be equivalent to comparing 16% (10% + 5% + 1%) of all B records.

If not all B values can be compared in 20 minutes then LinkSolv adds an average contribution to Sum Prob B for all uncompared values. The average contribution is based on statistics collected for all CROSS JOINs that were run.

One way to break LinkSolv

Recently I was installing Microsoft updates on a Windows Vista/Server 2008 x64 test platform. One of the updates broke LinkSolv by breaking Access.
Microsoft, Windows, Access, and .Net are trademarks or registered trademarks of Microsoft Corporation.

Log entry copied from Windows Event Viewer:

Log Name:      Application
Source:        SideBySide
Date:          4/20/2014 5:39:04 AM
Event ID:      33
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      HV-7
Description:
Activation context generation failed for “C:\PROGRA~2\COMMON~1\MICROS~1\OFFICE12\ACEOLEDB.DLL”. Dependent Assembly Microsoft.VC90.CRT,processorArchitecture=”x86″,publicKeyToken=”1fc8b3b9a1e18e3b”,type=”win32″,version=”9.0.30729.1″ could not be found.
Please use sxstrace.exe for detailed diagnosis.

How it’s supposed to work:

Copied from Microsoft Developer Network: A Windows side-by-side assembly is described by manifests. A side-by-side assembly contains a collection of resources—a group of DLLs, Windows classes, COM servers, type libraries, or interfaces—that are always provided to applications together. These are described in the assembly manifest… Starting with Windows XP, multiple versions of side-by-side assemblies can be used by applications running at the same time.

How it did not work:

I found that installing an update for Microsoft .Net Framework broke the side-by-side information for Access. ACEOLEDB.DLL was still in the correct folder but Access couldn’t find it.

How I fixed it:

There are several possible fixes described on various web sites, including editing manifests, editing Windows Registry values, repairing Office, and rebuilding MSACCESS.EXE. I did the last step by

copying ACEOLEDB.DLL to the same folder as MSACCESS.EXE without changing anything else. To recreate MSACCESS.EXE using the new DLL, rename the old file and then open Access.

LinkSolv and SQL Server

I’ve been re-working LinkSolv’s use of SQL Server.

SQL Server Express ((c) Microsoft Corporation)

Some of my development and test platforms still used the Microsoft SQL Server Data Engine 2000 (MSDE 2000). All development and test platforms now use either SQL Server Express 2005, 2008, or 2012, and I test cross-platform compatibility.

SQL Server Express is a free version of SQL Server with limited file size. For example, SQL Server Express 2012 allows one 10GB database file per database, while SQL Server 2012 allows about 32,000 16TB database files per database.

LinkSolv creates all tables in the same SQL Server database. To avoid name conflicts, create a different SQL Server database for each LinkSolv project. Give the database the same name as the project. Copy your raw data tables to this database before starting the LinkSolv project. LinkSolv connects to SQL Server using an ODBC System DSN named LINKSOLV that you must create on your computer. If you have 64-bit Windows then use the ODBC Data Sources utility C:\Windows\SYSWoW64\odbcad32.exe, not C:\Windows\System32\odbcad32.exe. BTW — WoW64 means Windows on Windows 64.

SQL Server Management Studio ((c) Microsoft Corporation)

Access 2013 does not support Access Data Projects (*.adp files). I removed ADP references from all LinkSolv programs for cross-platform compatibility.

Before, I used ADP files to manage SQL Server databases and to test LinkSolv. For example, if you create an ADP file in Access 2010 and select File >> Info >> Server you can do the following tasks:

Connection — Open the Data Link Properties dialog box.

Server Properties — Display the Server Properties dialog box.

Link Tables — Link tables from another SQL Server database into this database.

Back Up SQL Database — Create a backup of the current database.

Restore SQL Database – Restore a Microsoft SQL Server database from a backup copy.

Transfer Database — Transfer a database from one server to another.

Copy Database File — Copy a database top another server.

Drop SQL Database — Delete a SQL Server database.

Set Logon Password — Set or reset your logon password.

Now, I use SQL Server Management Studio for these tasks and for testing LinkSolv on all platforms.

CHOOSE and IIF

SQL Server 2012 has two new logic function: CHOOSE and IIF. Access has always had these functions. LinkSolv converts most built-in Access functions to equivalent built-in SQL Server functions for better performance. For example, LinkSolv converts Access IIF statements to SQL Server CASE statements — which is the same way the SQL Server query optimizer handles the new IIF function. There was no simple conversion for Access CHOOSE statements so I wrote a CHOOSE function for SQL Server using Transact SQL. Now I test the SQL Server version before creating my own CHOOSE function.

SOFTWARE LICENSE AND SUPPORT AGREEMENT — 2

SOFTWARE LICENSE AND SUPPORT AGREEMENT
© 2013 Strategic Matching, Inc.
Schedule A — License and Support Fees
Effective October 24, 2013

1. Software License Fees
Item Fee
First LinkSolv Software License for Licensee $3,000.00
Second Software License for the same Licensee $1,500.00
Third or fourth Software License for the same Licensee $750.00

2. Technical Support Fees
Item Fee
Technical support and software upgrades for one year $1,680.00
Technical support and upgrades for second license $840.00
Technical support and upgrades for third or fourth license $420.00

3. Training Fees
Item Fee
Installation and startup training via online sessions No charge
8 hours of custom training via online sessions $840.00
8 hours of custom onsite training $1,680.00*
*Plus travel expenses.

4. Consultation Fees
8 hours of consultation by phone, email, or online sessions $840.00
8 hours of onsite consultation $1,680.00*
*Plus travel expenses.

SOFTWARE LICENSE AND SUPPORT AGREEMENT
© 2013 Strategic Matching, Inc.
Schedule B — Technical Support Policy
Effective October 24, 2013

1. Support services provided by Strategic Matching under the terms and conditions of this agreement will be limited to issues relating to current versions of LinkSolv Software only, including installation, setup, troubleshooting, and operational information for versions 8.1.8432, 8.1.9219, 8.2.9964, and 8.3.0744.
2. Strategic Matching will use reasonable efforts to provide support services in a timely and effective manner by, among other things, (i) accommodating Licensee’s work schedule, (ii) conducting phone or email dialogs with Licensee, (iii) conducting remote desktop and web conference sessions to evaluate Licensee’s installation, and (iv) providing web transfer of large files.
3. Strategic Matching expects Licensee’s cooperation when investigating and resolving issues relating to LinkSolv Software.
4. Strategic Matching will determine the methods, details, and means of performing the above-described services.
5. LinkSolv Software is supported only for the following versions of software from Microsoft Corporation (Windows and Office are registered trademarks of Microsoft Corporation):
– Windows XP, Service Pack 3
– Windows Vista, Service Pack 2
– Windows 7, Service Pack 1
– Windows 8
– Windows 2003 Server, Service Pack 2
– Windows 2008 Server, Service Pack 2
– Windows 2008 Server R2, Service Pack 1
– Office XP, Service Pack 3
– Office 2003, Service Pack 3
– Office 2007, Service Pack 3
– Office 2010, Service Pack 1
– Office 2013

Archive and Recover Projects

Archive a Project. Data for LinkSolv Projects are stored in two places. Project-specific data tables created by LinkSolv are stored in multiple databases in a project folder, such as C:\Linkage Projects\CrashTest83. Project management data for all current projects are kept in one database installed as part of LinkSolv, C:\Program Files\Strategic Matching\LinkSolv 8.3\STMTData.mdb. To archive a project you must archive both STMTData.mdb and all of the databases in the project folder.

Recover a Project to Original Location. Copy all databases archived from the project folder to the original folder location. Copy the archived STMTData.mdb to the LinkSolv installation folder as STMTDataArchive.mdb. Open LinkSolv and click on the Import Project command button. Select any project from the list of archived projects to import. Importing a project only recovers project specifications — project data are not updated in any way. If you want to apply the latest LinkSolv algorithms then you must rerun the project.

Recover a Project to New Location. LinkSolv stores location information in tables in STMTData.mdb. To recover an archived project to a new location, copy all databases archived from an old project folder to a new folder location with original names. Copy the archived STMTData.mdb to the LinkSolv installation folder as STMTDataArchive.mdb. Change path values for your project to the new location in tables in STMTDataArchive.mdb:  Link Description in Match Links, Query Description in Match Queries, Project Home in Projects, Link Description in Source Links, and Link Description in Triple Match Links. Open LinkSolv and click on the Import Project command button. Select one project from the list of archived projects to import.

Recover an Obsolete Project. Project management tables in STMTData.mdb for LinkSolv 8.1 or 8.2 are not compatible as-is for importing into LinkSolv 8.3. However, queries to make necessary changes are available and I can help you run them. It is possible to recover most projects specifications from project-specific databases even if you did not archive a copy of STMTData.mdb, but it is much more work.