
Bayesian inference of multinomial distribution parameters

The multinomial distribution describes data where each observation is one of k possible outcomes. In his book Bayesian Data Analysis (p. 83), Andrew Gelman demonstrates how to use Bayesian methods to make inferences about the parameters of a multinomial distribution, using data from a sample survey taken by CBS News prior to the 1988 presidential election. In his book Bayesian Computation with R (p. 60), Jim Albert used R to perform the same inferences on the same data. I’ll now demonstrate how to perform these inferences in Clojure, with Incanter, using the sample-multinomial-params function. I will then describe the Bayesian model underlying sample-multinomial-params and demonstrate how to do the same calculations directly with the sample-dirichlet function.

First, load the necessary libraries, including the incanter.bayes library.

(use '(incanter core stats bayes charts))

The data consist of 1447 survey responses: 727 respondents supported George Bush, Sr., 583 supported Michael Dukakis, and 137 supported another candidate or were undecided. Define a vector, y, that includes the counts for each of the outcomes.

(def y [727 583 137])

The proportion of supporters for the candidates, calculated by dividing y by the total responses,

(div y 1447)

are (0.502 0.403 0.095). If the survey data follow a multinomial distribution, these proportions are the maximum-likelihood estimates of its parameters. The question is, how good an estimate of the population parameters are these values?
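Equivalently, the total can be computed with the sum function rather than hard-coded:

(div y (sum y))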

Credible intervals for the parameters (the Bayesian counterpart of confidence intervals) can be calculated by applying the quantile function to a sample drawn from an appropriate distribution, which in the case of the parameters of a multinomial distribution is a Dirichlet distribution.

Samples can be drawn from the appropriate Dirichlet distribution with the sample-multinomial-params function.

(def theta (sample-multinomial-params 1000 y))

This returns 1000 samples of the multinomial parameters, or probabilities of each of the outcomes in the data. Now, create variables representing the samples for each parameter,

(def theta1 (sel theta :cols 0))
(def theta2 (sel theta :cols 1))
(def theta3 (sel theta :cols 2))

and, for each sample, calculate the mean, standard deviation, and 95% credible interval, and display a histogram.

(mean theta1)
(sd theta1)
(quantile theta1 :probs [0.025 0.975]) 
(view (histogram theta1
                 :title "Bush, Sr. Support"))

The mean for theta1 is 0.502, the standard deviation is 0.013, and the 95% credible interval is (0.476 0.528).

(mean theta2) 
(sd theta2)
(quantile theta2 :probs [0.025 0.975]) 
(view (histogram theta2
                 :title "Dukakis Support"))

The mean for theta2 is 0.403, the standard deviation is 0.013, and the 95% credible interval is (0.376 0.427).

(mean theta3) 
(sd theta3)
(quantile theta3 :probs [0.025 0.975]) 
(view (histogram theta3
                 :title "Other/Undecided"))

The mean for theta3 is 0.095, the standard deviation is 0.008, and the 95% credible interval is (0.082 0.111).

Using the sampled parameter values, we can also examine the distribution of derived statistics, like the difference in support between Bush, Sr. and Dukakis. View a histogram of the difference in support for the two candidates.

(view (histogram (minus theta1 theta2) 
                 :title "Bush, Sr. and Dukakis Diff"))

The sample-multinomial-params function returns values from a Dirichlet distribution. The Dirichlet is the conjugate prior distribution for the parameters of the multinomial distribution: the posterior is also a Dirichlet, whose parameters equal the sum of the count data, y, and the parameters of the prior Dirichlet, alpha.

 p(\theta | y) = p(y | \theta)\, p(\theta | \alpha) / p(y) 

Where

 y = (y_1, y_2, ..., y_k)  

are the count data that follow a multinomial distribution,

 \theta = (\theta_1, \theta_2, ..., \theta_k)  

are the parameters of the multinomial distribution, and

 \alpha = (\alpha_1, \alpha_2, ..., \alpha_k)  

are the hyper-parameters of the prior Dirichlet distribution. The likelihood function, p(y | \theta), is the multinomial distribution; it describes the probability of seeing the observed counts, y, given the parameter values, \theta.
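To see why the Dirichlet is conjugate, write out the two densities. The multinomial likelihood and the Dirichlet prior are both products of powers of the \theta_i,

 p(y | \theta) \propto \theta_1^{y_1} \theta_2^{y_2} \cdots \theta_k^{y_k}, \qquad p(\theta | \alpha) \propto \theta_1^{\alpha_1 - 1} \theta_2^{\alpha_2 - 1} \cdots \theta_k^{\alpha_k - 1} 

so their product is the kernel of another Dirichlet,

 p(\theta | y) \propto \theta_1^{y_1 + \alpha_1 - 1} \theta_2^{y_2 + \alpha_2 - 1} \cdots \theta_k^{y_k + \alpha_k - 1} 

that is, a Dirichlet with parameters y + \alpha.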

The prior, p(\theta | \alpha), is made uninformative by setting all the alpha values equal to one, which produces a uniform distribution over \theta.

 \alpha = (1, 1, ..., 1) 

Then the posterior Dirichlet becomes

 p(\theta | y) = \mathrm{Dirichlet}(y_1+1, y_2+1, \ldots, y_k+1) 
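With this choice, the posterior mean of each parameter is

 E[\theta_i | y] = \frac{y_i + 1}{n + k} 

where n = 1447 is the total number of responses and k = 3. For the survey data this gives (728/1450, 584/1450, 138/1450), or approximately (0.502 0.403 0.095), in agreement with the sample means computed above.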

Sample the proportions directly from the above Dirichlet distribution using the sample-dirichlet function.

(def alpha [1 1 1])
(def props (sample-dirichlet 1000 (plus y alpha)))

This returns data from the same distribution as the sample-multinomial-params function, so all the means, standard deviations, and credible intervals can be calculated as above.
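As a quick check, the column means of props should approximately match the means computed from theta (the exact values will differ slightly, since both are independent sets of random draws):

(mean (sel props :cols 0))
(mean (sel props :cols 1))
(mean (sel props :cols 2))

These should come out near 0.50, 0.40, and 0.10, respectively.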

The complete code for this example is here.

Monty Hall in Monte Carlo

The Monty Hall problem, named after the host of the Let’s Make a Deal game show, was famously posed in Marilyn vos Savant’s column in Parade magazine in 1990, and people still vigorously debate the answer, as discussed here and here.

The basic premise is that you must choose one of three doors, one of which hides a prize. Once you have selected a door, Monty Hall opens one of the doors you did not select, usually revealing something like a goat. He then asks if you would like to switch your choice to the remaining unopened door. The question is, will switching doors increase your chances of winning, decrease your chances, or have no effect at all?

One thing to keep in mind is that when Monty Hall shows you what’s behind one of the doors you didn’t select, he never reveals the door containing the prize. This fact should change your decision-making process.

We can determine which strategy is best by running a Monte Carlo simulation and calculating the conditional probabilities of winning given that you either switch or stay. The conditional probability of winning given that you switch doors can be written as Prob(win | switch), and is equal to the probability that you switch and win divided by the probability that you switch,

Prob(win | switch) = Prob(switch & win)/Prob(switch)

and the conditional probability of winning given you decide to stay with your original door is equal to the probability you stay and win divided by the probability that you stay.

Prob(win | stay) = Prob(stay & win)/Prob(stay)

In this simulation, we will generate a bunch of initial guesses, a bunch of prize locations, and a bunch of switch decisions and then use them to calculate the probabilities. First, load the necessary libraries.

(use '(incanter core stats charts))

Set a simulation sample size, and generate samples of initial-guesses, prize-doors, and switch decisions with the sample function. The probability that the prize is behind any of the three doors is 1/3, as is the probability that you select a particular door as an initial guess, while the probability that you switch doors is 1/2.

(def n 10000)
(def initial-guesses (sample [1 2 3] :size n))
(def prize-doors (sample [1 2 3] :size n))
(def switches (sample [true false] :size n))

Define a function that returns one if a switch decision results in a win, and zero otherwise.

(defn switch-win? [initial-guess switch prize-door] 
  ;; switching wins exactly when the initial guess was wrong
  (if (and switch (not= initial-guess prize-door)) 1 0))
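For example, (switch-win? 1 true 2) returns 1: the initial guess, door 1, was wrong, so after Monty reveals the remaining losing door, switching lands on the prize. Conversely, (switch-win? 1 true 1) returns 0, since switching away from a correct initial guess always loses.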

Calculate the joint probability of winning and switching by mapping the switch-win? function over the generated data, summing the occurrences of winning after switching, and then dividing this number by the total number of samples, n.

(def prob-switch-win (/ (sum (map switch-win? 
                              initial-guesses 
                              switches 
                              prize-doors)) 
                        n))

Calculate the probability of switching doors. Use the indicator and sum functions to calculate the frequency of switches, and then divide by the total number of samples.

(def prob-switch (/ (sum (indicator true? switches)) n))

Finally, use the formula above to calculate the conditional probability of winning given a switch.

(def prob-win-given-switch (/ prob-switch-win prob-switch))

Now repeat the process to calculate the conditional probability of winning given the decision to stay with your original choice. First, define a function that returns one if a stay decision results in a win, and zero otherwise.

(defn stay-win? [initial-guess switch prize-door] 
  ;; staying wins exactly when the initial guess was right
  (if (and (not switch) (= initial-guess prize-door)) 1 0))

Next, calculate the joint probability of winning and staying,

(def prob-stay-win (/ (sum (map stay-win? 
                            initial-guesses 
                            switches 
                            prize-doors)) 
                      n))

and calculate the probability of staying with the original door.

(def prob-stay (/ (sum (indicator false? switches)) n))

Finally, calculate the conditional probability of winning given that you stay with the original door.

(def prob-win-given-stay (/ prob-stay-win prob-stay))

View the results with the pie-chart function.

(view (pie-chart ["Switch" "Stay"] 
                 [prob-win-given-switch prob-win-given-stay]))

The conditional probability of winning given you switch doors is approximately 0.66, while the probability of winning given you stay with your original choice is approximately 0.33.

To some, this outcome may seem counterintuitive. Here’s why it’s the case: the probability that the prize is behind any particular door is 1/3, so once you have selected a door, the probability that the prize is behind one of the doors you didn’t select is 2/3. Since Monty Hall knows which door has the prize, and never reveals it, all of that 2/3 probability shifts to the remaining unselected door. So switching gives you a 2/3 probability of winning the prize, while sticking with your original choice leaves you with the original 1/3 chance.
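This reasoning can be checked exactly, without sampling, by reusing switch-win? to enumerate all nine equally likely combinations of initial guess and prize location, with the switch decision fixed to true:

(def switch-outcomes
  (for [guess [1 2 3]
        prize [1 2 3]]
    (switch-win? guess true prize)))
(/ (sum switch-outcomes) (count switch-outcomes))

Six of the nine combinations are wins, giving a probability of 6/9 = 2/3, which matches the simulated result.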

The complete code for this example can be found here.