A couple of days ago I posted a Bayesian re-analysis of the data from a paper on prenatal progesterone exposure and sexual orientation. For that analysis, I used uniform priors for both exposed and unexposed subjects – that is, I assumed we pretty much don’t know anything about how common non-heterosexuality is, and that the effects of progesterone exposure could be anywhere from infinity to nothing. These priors didn’t seem very realistic, but the results I got seem fairly intuitive, given the data and outside figures on how common non-heterosexuality is.

Now I’m going to run a different analysis using a model that seems more realistic to me. While the previous analysis was based on the A/B testing example from section 2.2.3 of *Bayesian Methods for Hackers*, this one will be based on the space shuttle example from section 2.2.10. If you’re not a numbers person, I apologize – I’m still fiddling with the format of this blog and who my audience is.

First, set up the data:

`control = [i`

treatment = [i=ssize for i in range(2*ssize)]).astype(int)

Next, set up the logistic model and its priors:

`baserate = 0.035`

intercept = np.log(1/baserate-1)

nb = 0.001

na = 0.001

beta = pm.Normal("beta",0,nb, value=0)

alpha = pm.Normal("alpha", intercept, na, value=0)

`@pm.deterministic`

def p(ex=ex, alpha=alpha, beta=beta):

Note some arbitrary-ish decisions – I chose my prior for the mean of alpha based on my prior knowledge that about 3.5% of Americans identify as other-than-straight, and I chose the prior for the mean of beta based on a conservative assumption that progesterone has no effect on that. Those choices seemed straightforward; the problem is how to choose the standard deviations – which, in the PyMC package, show up in inverse form as “tau.” The *Challenger* example from the book uses 0.001 as the tau for both alpha and beta, which implies an extreme degree of uncertainty about the values of the priors. Let’s start with that…

`bern = pm.Bernoulli("bern", p, value=obs, observed=True)`

model = pm.Model([bern, beta, alpha])

map_ = pm.MAP(model)

map_.fit()

mcmc = pm.MCMC(model)

mcmc.sample(120000,100000,2)

alphas = mcmc.trace('alpha')[:, None]

In that case, our estimate of the population base rate looks like this:

`base = 1/(1+np.exp(alphas))`

plt.hist(base, range=(0,0.1),bins=25)

And for the effect of progesterone:

`betas = mcmc.trace('beta')[:, None]`

odds = np.exp(betas)

plt.hist(odds, range=(0,1000000), bins=100)

What a mess! When we “let the data speak for itself”, our small sample size bites us – we found that zero out of thirty-four unexposed subjects identified as other-than-straight, and now we estimate that an unexposed baby has far less than a 1% chance of growing up to be other-than-straight, and that progesterone exposure makes that baby up to tens of thousands of times more likely to be other-than-straight.

What happens if, instead, I set a more realistic prior for the base rate? We’ll keep the high degree of uncertainty for the effect of progesterone, but set the standard deviation equal to half the mean.

`nb = 0.001`

na = 4.0/(intercept*intercept)

beta = pm.Normal("beta",0,nb, value=0)

alpha = pm.Normal("alpha", intercept, na, value=0)

Now the model thinks there’s about a 1.9% chance of an unexposed baby growing up to be other-than-straight.

And it thinks the chances of an exposed baby being other-than-straight are about ten times that:

1/(1+np.exp(alphas.mean()-betas.mean()))

Odds ratios, hard to interpret:

And I totally believe those results! The results of tweaking the priors suggest to me that in a case like this, with a small sample size, choosing the right prior is **extremely** important. Of course, the odds ratio on the frequentist estimate is even worse – well over 999, which is the maximum that my software package can show – but one could argue that the frequentist analysis makes more modest claims; for example, that the p-value against the null hypothesis is 0.0041.