LinkSolv 9.0.0245 Update is Now Available

Changes and Bug Fixes in LinkSolv 9.0.0245

Added new 1-1 algorithm Draw 1-1 Pairs. This allows drawing 1-1 pairs from all competing imputed pairs (with a common record) based on their likelihoods as part of the full Bayesian model. The old algorithm is still available:  Take 1-1 Pairs always selects the one competing pair with maximum likelihood for 1-1 linkages.

Added Max Locks Per File estimate to Execute SQL method. This fixes intermittent error caused by low Max Locks Per File.

Raised error If Prior Error Prob <= 0 Or Prior Error Prob >= 1 in Initialize Match Probabilities. Prior error probabilities are drawn from distributions and might be outside (0, 1) in extreme cases.

Kept merged pairs with Match Probability < 0.90 in Limited Model. This allows confirmation that complete agreements might be under 0.90.

Corrected for Views with invalid IN DATABASE phrase that look like Procedures. This fixes intermittent error caused by moving LinkSolv to a new folder.

Corrected for Table Name AS Alias Name in SQL for Update Join. This allows LinkSolv to use join queries with aliases for better program clarity.

Corrected call to Excel.WorksheetFunction.NormInv(Random X, Mean, Std. Dev) in Draw Total Matches. This fixes incorrect draws that were always centered on the mean and within two standard deviations but not normally distributed.

Corrected call to Execute SQL (…As Is=True) in Create View On Server. This fixes intermittent error when creating a view on SQL Server.

Corrected typo in Count Rows (…MergedPairs0) in Imputation Property. This fixes incorrect estimates of Max Locks Per File.

Raised error if two consecutive draws are out of range in Draw Total Matches. In extreme cases, thousands of draws might otherwise be made, increasing run times.

Corrected arguments passed to Copy Folder, Copy File, and Delete File in Copy Project. This allows copying projects without file warnings.

Corrected New Project Description in Copy Project. This allows opening the new project without warnings.

Added test for Excel subtypes in Restore Access Link. This restores links to .xlsx files.

Added test for ODBC link in JET project in Restore Access Link. This restores links to SQL Server tables.

Increased length of Link Name field to 50 in Source Links and Match Links tables. This allows longer names for data tables.

Called Requery Form in all Prepare Data Actions. This keeps the Link Tables tab current when working on Select Input and Standardize.

Called Update Release in Welcome Open Method. This updates information on the About tab.

Improved management of Connection Pool for Access databases. LinkSolv connects to many different Access databases because each one is limited to 2 GB size. This improvement reduces the time to connect and fixes intermittent errors caused by connection conflicts.

LinkSolv 9.0.0190 is Now Available

Microsoft ended support for Access 2003 on 4/8/2014 (Access (C) Microsoft Corporation). LinkSolv 9.0 uses the Access 2007-2013 database format (.accdb files). Old LinkSolv 8.3 projects that use the Access 2000-2003 database format (.mdb files) can be imported without conversion.

LinkSolv 9.0 is compatible with both 32-bit and 64-bit Microsoft Access. LinkSolv is still a 32-bit application so integer values are limited to -2,147,483,648 through +2,147,483,647. Record numbers and record counts cannot exceed 2,147,483,647.

LinkSolv 9.0 is compatible with Microsoft Windows (Windows (C) Microsoft Corporation) accessibility tools including on-screen keyboard, touch screen, magnifier, high-contrast display, keyboard shortcuts, narrator, and voice recognition.

The Match Pass object class in LinkSolv 9.0 has been redesigned to better encapsulate pass-specific properties and methods, an important object oriented design principle.

LinkSolv 9.0 includes algorithms for managing the pool of connections to different Access databases. The new algorithms improve performance and reduce the potential for database access conflicts, especially over networks.

LinkSolv 9.0 includes algorithms for estimating and setting “Max Locks Per File” before running each query. This reduces the potential for two common errors:  “File Sharing Lock Count Exceeded. Increase MaxLocksPerFile Registry Entry” and “There is not enough memory or disk space to complete the operation.”

LinkSolv 9.0 fixes an intermittent bug. If a view (a SELECT query) refers to a non-existent table because a file was moved or renamed then Access classifies the view as a procedure. LinkSolv now checks both views and procedures to find a SELECT query.

LinkSolv 9.0 includes a new algorithm for merging passes for dual matches. As always, imputed pairs are randomly drawn from merged pairs by comparing estimated match probabilities to random numbers. So, 90% of 0.90 probability pairs, 50% of 0.50 probability pairs, 10% of 0.10 probability pairs, etc. are selected as imputed pairs. Imputed pairs are not necessarily one-to-one. LinkSolv groups imputed pairs into sets by collecting pairs with the same UniqueID or same UniqueID_B. If Pairs to analyze for error rates and set members is set to “Take All Pairs” or “Take 1-1 Pairs” then the pair in a group with the greatest probability is given status LP (Linked Pair) and other pairs in the same group are given status IP (Imputed Pair).

If Pairs to analyze for error rates and set members is set to “Draw 1-1 Pairs” then the new algorithm is used. First, all possible one-to-one permutations are identified for each set. Second, the posterior probability that each permutation is true is calculated. Third, one permutation is drawn at random given the cumulative probability distribution for all permutations. In this way, one-to-one pairs are drawn as part of the Bayesian model, the preferred approach. Every pair in each drawn permutation is given status LP and other pairs are given status IP. Sometimes the new results can be quite different.

For example, suppose the record pairs in a set consist of 2 A records and 2 B records like this:

Pair       Match Probability   Pair Status
(A1, B1)   0.90                LP
(A1, B2)   0.80                IP
(A2, B1)   0.50                IP
(A2, B2)   0.10                LP

Barring large swings in probabilities, the above status is applied in all Markov Chain iterations.

There are 2 one-to-one permutations:

Permutation          Permutation Likelihood   Permutation Probability
(A1, B1) (A2, B2)    0.90 x 0.10 = 0.09       0.18
(A1, B2) (A2, B1)    0.80 x 0.50 = 0.40       0.82

The old status is applied in about 18% of Markov Chain iterations:

Pair       Match Probability   Pair Status
(A1, B1)   0.90                LP
(A1, B2)   0.80                IP
(A2, B1)   0.50                IP
(A2, B2)   0.10                LP

This new status is applied in about 82% of Markov Chain iterations:

Pair       Match Probability   Pair Status
(A1, B1)   0.90                IP
(A1, B2)   0.80                LP
(A2, B1)   0.50                LP
(A2, B2)   0.10                IP
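
For readers who want to experiment, here is a minimal Python sketch of the Draw 1-1 Pairs idea described above, applied to the worked example. It is an illustration only, not LinkSolv's internal code; the function name and data layout are made up, and it assumes each set has at least as many B records as A records.

    import itertools
    import random

    def draw_one_to_one(pairs, rng=None):
        """Draw one one-to-one permutation for a set of competing imputed pairs.

        pairs: list of (a_id, b_id, match_probability) tuples from one linked set.
        Returns the pairs that would get status LP; all other pairs would get IP.
        """
        rng = rng or random.Random()
        prob = {(a, b): p for a, b, p in pairs}
        a_ids = sorted({a for a, _, _ in pairs})
        b_ids = sorted({b for _, b, _ in pairs})

        # First, identify all possible one-to-one permutations within the set.
        candidates = []
        for b_choice in itertools.permutations(b_ids, len(a_ids)):
            assignment = list(zip(a_ids, b_choice))
            if all(ab in prob for ab in assignment):
                # Second, the permutation likelihood is the product of pair probabilities.
                likelihood = 1.0
                for ab in assignment:
                    likelihood *= prob[ab]
                candidates.append((assignment, likelihood))

        # Third, draw one permutation from the cumulative probability distribution.
        total = sum(lik for _, lik in candidates)
        r = rng.random() * total
        running = 0.0
        for assignment, likelihood in candidates:
            running += likelihood
            if r <= running:
                return assignment
        return candidates[-1][0]

    # The worked example above: expect (A1, B2), (A2, B1) in about 82% of draws.
    example = [("A1", "B1", 0.90), ("A1", "B2", 0.80),
               ("A2", "B1", 0.50), ("A2", "B2", 0.10)]
    print(draw_one_to_one(example))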

Estimate Combined P Value for Multiply Imputed Logistic Regression

Following Statistical Analysis with Missing Data (Little and Rubin, Wiley 2002), here is how to estimate a combined P value for multiply imputed logistic regression given parameter estimates and a covariance matrix for each imputation.

Equation 10.17 gives an estimated Wald statistic W for k>1 parameter components that can be compared to an F random variable to get a P value as Probability[F > W].

We need to calculate trace(B inverse(V bar)) for 10.17, where B is the Between Imputation Variance and V bar is the average of covariance matrices for all imputations.

Equation (10.13) is for a scalar parameter, D draws (imputations), where

( 1 / (D-1)) Sum d=1 to D (theta^d – theta^)**2 is identified as B

THIS EQUATION HAS A TYPO! The second theta^ should be theta bar, the average over all imputations. So,

B = ( 1 / (D-1)) Sum d=1 to D (theta^d – theta bar)**2

 The last sentence in the paragraph with equations 10.13 and 10.14 is

 “For vector theta, the variance Vd is replaced by a covariance matrix, and (theta^d – theta bar)**2 is replaced by (theta^d – theta bar) transpose(theta^d – theta bar).”

(theta^d – theta bar) transpose(theta^d – theta bar) = ||theta^d – theta bar||**2, a scalar.

 B = ( 1 / (D-1)) Sum d=1 to D (theta^d – theta bar) transpose(theta^d – theta bar)

Suppose B = 3 and

inverse(V bar) = |0.1 0.3|
                 |0.3 0.2|

then B inverse(V bar) = 3 |0.1 0.3| = |0.3 0.9|
                          |0.3 0.2|   |0.9 0.6|

trace(B inverse(V bar)) = 0.3 + 0.6 = 0.9
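
As a quick check of the arithmetic above, here is the same example in numpy (the numbers are just the illustrative values, not real estimates):

    import numpy as np

    B = 3.0                               # between-imputation variance from the example
    V_bar_inv = np.array([[0.1, 0.3],     # inverse(V bar) from the example
                          [0.3, 0.2]])

    product = B * V_bar_inv               # B inverse(V bar)
    print(product)                        # [[0.3 0.9], [0.9 0.6]]
    print(np.trace(product))              # 0.3 + 0.6 = 0.9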

 

Dependent Outcomes

I add match weights for each match field to get a combined match weight for each record pair. The combined weight is accurate only if comparison outcomes (agreements and disagreements) are independent on both matched pairs and unmatched pairs. Sometimes this is not the case so I have to test for dependent outcomes and correct the combined weights if I find them. 

Given linked pairs from Imputation 1 as a sample of all matched pairs, I count pairs for the following 2X2 contingency table to test whether outcomes are independent: 

Field 1 Disagrees and Field 2 Disagrees  |  Field 1 Disagrees and Field 2 Agrees

Field 1 Agrees and Field 2 Disagrees       |  Field 1 Agrees and Field 2 Agrees

Usually, Agree/Agree has the greatest cell count for matched pairs.

I use Change In Deviance from logistic regression models rather than Chi Squared to test significance because it is more robust in case there are low cell counts. Either field can be taken as the independent variable for testing. 

I follow Hosmer and Lemeshow. Applied Logistic Regression (John Wiley & Sons, 2000).

Chapter 1. Introduction to the Logistic Regression Model. 

    Likelihood Ratio Test   

    ChangeInDeviance = NullDeviance – FittedDeviance   

    LikelihoodRatioPValue = NumericalRecipes.ChiSquarePValue(ChangeInDeviance, 1)
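
The same test can be written as a small Python sketch, with scipy standing in for the NumericalRecipes routine; the deviance values below are hypothetical:

    from scipy.stats import chi2

    def likelihood_ratio_p_value(null_deviance, fitted_deviance, df=1):
        """Change-in-deviance test for one added predictor."""
        change_in_deviance = null_deviance - fitted_deviance
        # Probability that a chi-square with df degrees of freedom exceeds the change.
        return chi2.sf(change_in_deviance, df)

    # Hypothetical deviances from the null and fitted logistic regression models.
    print(likelihood_ratio_p_value(null_deviance=4100.0, fitted_deviance=4090.0))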

Any single imputation can be biased so I average cell counts for all imputations to get unbiased results for all  matched pairs. 

I draw about 1,000,000 random pairs as a sample of unmatched pairs to get corresponding cell counts and test for dependency. The count is not exactly 1,000,000 because some pairs have missing values — neither agreement nor disagreement.

Usually, Disagree/Disagree has the greatest cell count for unmatched pairs. 

I take LikelihoodRatioPValue < 0.05 as significant. 

I estimate X factors, similar to Uncertainty Coefficients for entropy, as measures of the strength of significant associations. That is, how much must combined match weights change to correct for dependency?

For entropies H and uncertainty coefficient U, 

H(x,y) = (H(x) + H(y)) (1 – U(x,y)/2) 

For match weights W and X factor X, 

W(x,y) = (W(x) + W(y)) X(x,y)

W(x), W(y), and W(x,y) can all be estimated from the contingency cell counts for matched and unmatched pairs. There are different X factors for each contingency cell. 
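
One way to read that calculation is sketched below in Python. This is my reading of the description above, not LinkSolv's actual code: per-cell weights are log base 2 ratios of matched-pair to unmatched-pair cell probabilities, single-field weights come from the marginals, and the X factor for a cell is the ratio of the joint weight to the sum of the single-field weights. The counts are hypothetical.

    import numpy as np

    def x_factors(matched_counts, unmatched_counts):
        """X factor per contingency cell: X(x,y) = W(x,y) / (W(x) + W(y)).

        Rows are Field 1 (Disagree, Agree); columns are Field 2 (Disagree, Agree).
        """
        m = np.asarray(matched_counts, float)
        u = np.asarray(unmatched_counts, float)
        m = m / m.sum()                                      # cell probabilities on matched pairs
        u = u / u.sum()                                      # cell probabilities on unmatched pairs

        W_joint = np.log2(m / u)                             # W(x,y) for each cell
        W_field1 = np.log2(m.sum(axis=1) / u.sum(axis=1))    # W(x) for each Field 1 outcome
        W_field2 = np.log2(m.sum(axis=0) / u.sum(axis=0))    # W(y) for each Field 2 outcome

        return W_joint / (W_field1[:, None] + W_field2[None, :])

    # Hypothetical counts: matched pairs mostly Agree/Agree, unmatched mostly Disagree/Disagree.
    print(x_factors([[50, 120], [130, 9700]], [[900000, 40000], [50000, 10000]]))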

I follow Press, Teukolsky, Vetterling & Flannery. Numerical Recipes – The art of scientific computing (Cambridge University Press, 1992). 

14.4 Contingency Table Analysis of Two [multinomial] Distributions 

Measures of association based on Entropy 

Here, H(x) is the entropy of x, pij are cell probabilities, pi. and p.j are marginal probabilities.

H(x) = – Sum pi. log base 2 (pi.), H(y) = – Sum p.j log base 2 (p.j), H(x,y) = – Sum pij log base 2 (pij),

U(y|x) defined as (H(y) – H(y|x)) / H(y),  the fraction of y’s entropy lost if x is already known. 

U(x|y) defined as (H(x) – H(x|y)) / H(x),  the fraction of x’s entropy lost if y is already known. 

U(x,y) defined so that H(x,y) = (H(x) + H(y)) (1 – U(x,y)/2). 
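
These entropy measures are easy to compute from a table of cell counts. Here is a small Python sketch following the definitions above; the counts are hypothetical.

    import numpy as np

    def uncertainty_coefficients(counts):
        """Entropies and uncertainty coefficients for a table of cell counts."""
        p = np.asarray(counts, float)
        p = p / p.sum()                    # cell probabilities p_ij
        p_i = p.sum(axis=1)                # marginals p_i.
        p_j = p.sum(axis=0)                # marginals p_.j

        def H(q):
            q = q[q > 0]
            return -(q * np.log2(q)).sum()

        Hx, Hy, Hxy = H(p_i), H(p_j), H(p.ravel())
        U_y_given_x = (Hy - (Hxy - Hx)) / Hy       # uses H(y|x) = H(x,y) - H(x)
        U_x_given_y = (Hx - (Hxy - Hy)) / Hx
        U_xy = 2.0 * (1.0 - Hxy / (Hx + Hy))       # from H(x,y) = (H(x) + H(y)) (1 - U(x,y)/2)
        return Hx, Hy, Hxy, U_y_given_x, U_x_given_y, U_xy

    # Hypothetical matched-pair counts: rows = Field 1 (Disagree, Agree), columns = Field 2.
    print(uncertainty_coefficients([[50, 120], [130, 9700]]))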

Match weights are not entropies because of negative information from noisy outcomes:   match fields sometimes agree on unmatched pairs or disagree on matched pairs. Negative information from noise can be taken into account in entropy calculations.  

I follow Stanford Goldman. Information Theory (Dover Publications, 1968). 

1.9 The transmission of quantized information in the presence of noise 

Let m be the probability of match field agreement on matched pairs with m < 1 because of data errors.

Let u be the probability of match field agreement on unmatched pairs with u > 0 because of agreement by chance.

Then the information agreement on x contributes to the match is 

H(x|noise) = -log base 2(1 / m) – (-log base 2(1 / u)) = log base 2(m/u) = W(x)

 

Bug Reports — Trauma Centers, Continue Merging, Standard Copy

Trauma Centers — The list of hospitals when you use LinkSolv to create fake data might contain Kentucky trauma centers. CORRECTED Build 861 dated 17-Jul-2014. This was caused by a programming error in Build 703 dated 31-May-2013 when I changed the simulation program to use both Emergency and Trauma provider types in the simulation. The reason for the change was to allow comparison of the number of patients taken to trauma centers in fake data with real data. Remember that the simulation program always transports patients to the closest hospital. In the future, it might be possible to give trauma centers priority in certain cases.

Continue Merging — You can continue with Perform Match >> Merge Passes without rerunning earlier Imputations but some earlier summary counts such as merged pairs per decile might not be correctly recalculated. CORRECTED Build 871 dated 22-Jul-2014. This capability was added in Build 622 dated 28-Sep-2012. You might want to do this to increase the number of Imputations or Iterations after finding poor MCMC convergence of parameter estimates. You also might want to continue merging if LinkSolv halted for any reason before completing the specified number of Imputations and Iterations.

Standard Copy — StdCopy copies raw data values without changing them except that non-text values are converted to text. All standardized values are text so that they can be compared in queries. If you enter a comma separated list of values as the Second Input for StdCopy then all values in the list will be converted to missing (NULL values). This is important because Unknown vs. Unknown should not count as agreement and Unknown vs. New York, say, should not count as disagreement. Build 704 dated 01-Jun-2013 changed StdCopy to make SQL commands that were compatible with SQL Server. A programming error caused errors in Access if Second Input was used and a raw value was NULL. CORRECTED Build 876 dated 04-Sep-2014.
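
To illustrate the intended StdCopy behavior (this is a sketch of the semantics, not LinkSolv's actual SQL), raw values are copied as text and any value listed in the optional Second Input becomes missing:

    def std_copy(raw_value, missing_values=()):
        """Copy a raw value as text; values in the missing list become NULL (None).

        missing_values plays the role of StdCopy's comma separated Second Input.
        """
        if raw_value is None:
            return None                    # raw NULLs stay missing
        text = str(raw_value)
        return None if text in missing_values else text

    # "Unknown" should compare as missing, not as agreement or disagreement.
    print(std_copy("Unknown", missing_values=("Unknown", "U")))   # None
    print(std_copy(12345))                                        # "12345"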

LinkSolv and IVEware

Unattributed quotes are from the IVEware website http://www.isr.umich.edu/src/smp/ive/:

“IVEware is Imputation and Variance Estimation Software developed by the Researchers at the Survey Methodology Program, Survey Research Center, Institute for Social Research, University of Michigan.”

IVEware requires SAS Statistical Analysis Software (c) SAS Institute. Input datasets for IVEware must be SAS datasets. SRCware is a limited version of IVEware that does not require SAS. Input datasets for SRCware must be delimited text files. Both IVEware and SRCware licenses are free.

I tested SRCware with linked datasets from LinkSolv. I created delimited text files by exporting the usual JOIN queries from a LinkSolv project database. I saved a permanent export format and used it to create a separate text file for each imputed linkage in the DATAIN folder for SRCware. I also created a similar file for the MLE linked dataset. As far as I know, special fields like imputation or _imputation_ are not used. I used the SRCware METADATA dialog to convert each of my text files into SRCware files. All converted files were much larger. 11,339 KB as delimited txt became 16,691 KB as a SRCware dat file. I found no alternative to converting my text files one at a time using the METADATA dialog even though all specifications were identical. Other SRCware dialogs allow you to paste saved scripts.

“IMPUTE [module] uses a multivariate sequential regression approach for multiply imputing item missing values in a data set.”

Only single converted DATAIN files can be used as input to the IMPUTE module. With PROC MI, all LinkSolv imputations can be in one table and PROC MI can be run BY IMPUTATION, so this might be the case for the %IMPUTE SAS macro. I found it much easier to set up an imputation model with categorical variables than with PROC MI. Once I had created a script to IMPUTE missing values for one LinkSolv imputation it was easy to cut and paste it for the others. I did not find any scripts saved in SRCware so I saved them externally as text files. SRCware imputed datasets are created in the DATAOUT folder either as a single file or one file per imputation. Imputations are referred to as either Imputations or Multiples depending on the context. Imputations can also refer to each regression run by SRCware to impute one variable with missing data.

“DESCRIBE [module] estimates the population means, proportions, subgroup differences, contrasts and linear combinations of means and proportions. For complex surveys, the Taylor Series approach is used to obtain variance estimates. The item missing values can be multiply imputed for the variables while performing the analysis.” If you IMPUTE while performing DESCRIBE only the variables analyzed in DESCRIBE are imputed. This may not be the best imputation model for your data.

The IVEWare manual says, “To perform multiple imputation analysis, more than one SAS data file can follow the DATAIN keyword in the DESCRIBE module. When multiple data sets are specified each is analyzed separately and the inferences–estimates and variances–are combined (Rubin 1987b).”

“REGRESS [module] fits linear, logistic, polytomous, Poisson, Tobit and proportional hazard regression models. The Jackknife Repeated Replication (JRR) approach is used to estimate the sampling variances for complex survey data. The item missing values may be multiply imputed while performing the regression analysis.”

The IVEWare manual says, “More than one SAS data file can follow the DATAIN keyword in the REGRESS module. The use of more than one data file is restricted to the analysis of a multiply imputed data file. When multiple data sets are specified each is analyzed separately and the inferences–estimates and variances–are combined (Rubin 1987b).”

I was not able to get SRCware to recognize multiple DATAIN files for DESCRIBE or REGRESS (see below). Instead, I copied my .dat and .met files from DATAIN to DATAOUT and analyzed them from there with both DESCRIBE and REGRESS without problems. Or, I could have imputed one complete dataset for each LinkSolv imputation to get files in DATAOUT. The IMPUTE module appears to use a single chain of iterations to get multiple imputations. This might mean changing the Random Number Seed to get independent missing value imputations for LinkSolv linkage imputations. I didn’t test this.

Here is SRCware OUTPUT for a logistic regression of fake data:  TYPE of Hospital Treatment for crash victims (ED or IN) predicted by police reported seatbelt use (SAFETY) and injury severity (KABCO):

IVEware Setup Checker, Thu Aug 14 12:26:49 2014                                  1

Setup listing:

datain CMOD_LP CMOD_LP2 CMOD_LP3 CMOD_LP4 CMOD_LP5;
dependent Type;
predictor Safety KABCO;
categorical Safety KABCO;
LINK LOGISTIC;
MDATA SKIP;
PRINT DETAILS;
TITLE LinkSolv LP1 to LP5;
run;

IVEware Jackknife Regression Procedure, Thu Aug 14 12:26:49 2014                 1

LinkSolv LP1 to LP5

Regression type:        Logistic
Dependent variable:     Type 
Predictors:             Safety 
                        KABCO 
Cat. var. ref. codes:   Safety  Y      
                        KABCO  O      
                        Type  IN     

Imputation 1

Valid cases               5888

Degr freedom              5882

-2 LogLike         4014.545188

Variable              Estimate         Std Error         Wald test        Prob > Chi
Intercept            1.9895419         0.3778466          27.72524           0.00000
Safety              -0.0371723         0.0984730           0.14250           0.70581
KABCO.A             -0.0912491         0.3960102           0.05309           0.81776
KABCO.B              0.0480942         0.3844964           0.01565           0.90046
KABCO.C              0.2336027         0.3815882           0.37477           0.54042
KABCO.K              0.1122288         0.5331188           0.04432           0.83327

Variable                  Odds             95% Confidence Interval
                         Ratio             Lower             Upper
Intercept  
Safety               0.9635101         0.7943630         1.1686746
KABCO.A              0.9127903         0.4199702         1.9839172
KABCO.B              1.0492695         0.4937841         2.2296517
KABCO.C              1.2631426         0.5978309         2.6688636
KABCO.K              1.1187688         0.3934208         3.1814377

IVEware Jackknife Regression Procedure, Thu Aug 14 12:26:50 2014                                                       2

LinkSolv LP1 to LP5

Covariance of Estimates

                     Intercept            Safety           KABCO.A           KABCO.B           KABCO.C           KABCO.K
   Intercept       0.142768017   -0.002252033584     -0.1417211472     -0.1420926543     -0.1423574392     -0.1414074213
      Safety   -0.002252033584    0.009696935099   -0.002255638414  -0.0006559817325   0.0004841441046   -0.003606497235
     KABCO.A     -0.1417211472   -0.002255638414      0.1568241015       0.142397591      0.1421323822      0.1430839208
     KABCO.B     -0.1420926543  -0.0006559817325       0.142397591      0.1478374909      0.1422122492      0.1424889743
     KABCO.C     -0.1423574392   0.0004841441046      0.1421323822      0.1422122492      0.1456095533      0.1420649372
     KABCO.K     -0.1414074213   -0.003606497235      0.1430839208      0.1424889743      0.1420649372      0.2842156088

Imputation 2 …

Imputation 3 …

Imputation 4 …

Imputation 5 …

IVEware Jackknife Regression Procedure, Thu Aug 14 12:26:50 2014                11

LinkSolv LP1 to LP5

All imputations

Valid cases             5878.6

Degr freedom        156.300454

-2 LogLike         3981.126462

Variable              Estimate         Std Error         Wald test        Prob > Chi
Intercept            2.1490963         0.4310275          24.86004           0.00000
Safety              -0.0463743         0.0990113           0.21937           0.63952
KABCO.A             -0.2488553         0.4471064           0.30979           0.57781
KABCO.B             -0.1020323         0.4370641           0.05450           0.81541
KABCO.C              0.0894600         0.4333471           0.04262           0.83645
KABCO.K             -0.0351887         0.5704009           0.00381           0.95081

Variable                  Odds             95% Confidence Interval
                         Ratio             Lower             Upper
Intercept  
Safety               0.9546845         0.7850979         1.1609031
KABCO.A              0.7796928         0.3223863         1.8856906
KABCO.B              0.9030004         0.3808515         2.1410175
KABCO.C              1.0935836         0.4646311         2.5739234
KABCO.K              0.9654233         0.3128986         2.9787355

IVEware Jackknife Regression Procedure, Thu Aug 14 12:26:50 2014                                                      12

LinkSolv LP1 to LP5

Covariance of Estimates

                     Intercept            Safety           KABCO.A           KABCO.B           KABCO.C           KABCO.K
   Intercept      0.1857847308    -0.00292855021     -0.1846385183     -0.1850508878     -0.1848255749     -0.1833826597
      Safety    -0.00292855021    0.009803229944   -0.001677674415 -9.548667416e-005    0.001109146047   -0.003134571861
     KABCO.A     -0.1846385183   -0.001677674415      0.1999041094      0.1854123466      0.1845216378      0.1851703385
     KABCO.B     -0.1850508878 -9.548667416e-005      0.1854123466      0.1910250559      0.1846617151      0.1845702783
     KABCO.C     -0.1848255749    0.001109146047      0.1845216378      0.1846617151      0.1877897414      0.1835397425
     KABCO.K     -0.1833826597   -0.003134571861      0.1851703385      0.1845702783      0.1835397425      0.3253571316

Maximum Likelihood Linked Datasets

Maximum Likelihood Linked Datasets with LinkSolv — What datasets to post for public use?

Multiply imputed linked datasets are the best way to provide unbiased analysis results and capture uncertainty about true link status, but this demands a lot from the public user. A single linked dataset consisting of only high probability links is relatively easy to prepare and analyze but provides incomplete and probably biased frequency counts. A better choice for a single linked dataset would be any one of the imputed datasets generated by LinkSolv. The best choice for a single linked dataset is the imputed dataset that maximizes the likelihood of the linkage results. LinkSolv draws match probabilities and linked status from their Bayesian posterior distributions for all merged record pairs for every iteration of every imputation in the MCMC process. For example, if you run 5 imputations with 10 iterations after burn-in, then LinkSolv draws 50 linked datasets, calculates the likelihood of each dataset, and saves the one with the maximum likelihood. This is how the likelihood is calculated:  Suppose merged pair 1 has match probability 0.91 and linked status Matched, pair 2 has match probability 0.12 and linked status Matched, pair 3 has match probability 0.13 and linked status Unmatched, pair 4 has match probability 0.84 and linked status Unmatched, etc.

Linked Status =  Matched, Matched, Unmatched, Unmatched, …

Probability Matched = 0.91, 0.12, 0.13, 0.84, …

Probability Unmatched = 1 – Probability Matched = 0.09, 0.88, 0.87, 0.16, …

The loglikelihood of all merged pairs for this iteration is ln(0.91 * 0.12 * 0.87 * 0.16 * …) = ln(0.91) + ln(0.12) + ln(0.87) + ln(0.16) + … LinkSolv’s maximum likelihood linked pairs algorithm was inspired by Maximum Likelihood Parameter Estimation for a coin toss, but the objectives are different. Likelihoods are calculated in the same way, but for linked pairs we know the probabilities for Linked status from Fellegi and Sunter, while for the coin toss we don’t know anything about the structure of the coin but we assume the probability for Heads is the same for every toss. Let p = probability of Heads, 1 – p = probability of Tails. Let trial outcomes be Heads, Heads, Tails, Tails, … for N trials. Then

likelihood(p) = p * p * (1 – p) * (1 – p) …

This likelihood can be written as a short formula that holds for all outcomes with H Heads, regardless of when they show up in the trials: 

likelihood(p) = p^H * (1 – p)^(N – H).

We can use calculus to find the MLE because the derivative of the likelihood = 0 at maximums and minimums. At maximum likelihood p:  d(likelihood(p)) / dp = 0. By the product rule, d(likelihood(p)) / dp = (H * p^(H – 1) * (1 – p)^(N – H)) – ((N – H) * p^H * (1 – p)^(N – H – 1)). Formulas are simpler for loglikelihood = ln(likelihood(p)).

loglikelihood(p) = H * ln(p) + (N – H) * ln(1 – p).

The loglikelihood has the same maximum likelihood p, where d(loglikelihood(p)) / dp = 0.

d(loglikelihood(p)) / dp = (H / p) – (N – H) / (1 – p) = 0.

Solve for p:  H / p = (N – H) / (1 – p), H * (1 – p) = (N – H) * p, H – H * p = N * p – H * p, H = N * p, p = H / N. That is, the maximum likelihood estimate of parameter p equals the observed proportion of Heads. For example, if H = 45 and N = 100 then p = 45 / 100 = 0.45. For a different algorithm more like LinkSolv that works even when you can’t take derivatives, we can compute the likelihood for several possible values for p and pick the one with the maximum (see table). Note that all loglikelihoods are negative so that the maximum is the least negative. We could also use Bayesian analysis and MCMC techniques to estimate parameters of any binomial or multinomial distributions as LinkSolv does for parameters of linkage models. This link has a short introduction http://www.ccs.neu.edu/home/rjw/csg220/lectures/MLE-vs-Bayes.pdf.

p     ln(p)      ln(1-p)    H    N-H   loglikelihood
0.1   -2.30259   -0.10536   45   55    -109.4111575
0.2   -1.60944   -0.22314   45   55    -84.69760138
0.3   -1.20397   -0.35667   45   55    -73.79589811
0.4   -0.91629   -0.51083   45   55    -69.32849224
0.5   -0.69315   -0.69315   45   55    -69.31471806   MLE
0.6   -0.51083   -0.91629   45   55    -73.38314332
0.7   -0.35667   -1.20397   45   55    -82.26887672
0.8   -0.22314   -1.60944   45   55    -98.56054499
0.9   -0.10536   -2.30259   45   55    -131.3834033
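
The grid search in the table takes only a few lines of Python, and the same calculation gives the loglikelihood of a drawn linked dataset from the earlier example. This is a sketch, not LinkSolv code.

    import math

    # Coin toss grid search: H = 45 Heads in N = 100 trials.
    H, N = 45, 100
    grid = [i / 10 for i in range(1, 10)]
    logliks = {p: H * math.log(p) + (N - H) * math.log(1 - p) for p in grid}
    best = max(logliks, key=logliks.get)
    print(best, logliks[best])   # 0.5 is the best grid value; the exact MLE is H / N = 0.45

    # Same calculation for merged pairs: Matched pairs contribute ln(p), Unmatched ln(1 - p).
    pairs = [(0.91, "Matched"), (0.12, "Matched"), (0.13, "Unmatched"), (0.84, "Unmatched")]
    loglik = sum(math.log(p if status == "Matched" else 1 - p) for p, status in pairs)
    print(loglik)                # ln(0.91) + ln(0.12) + ln(0.87) + ln(0.16)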