Predicting place probabilities from win odds
On Betfair, most horse racing events offer both win and place markets, where placing means finishing first, second or third. It is natural to ask how the two markets are related: given the win odds for each horse, how well can we predict the probability that each horse will place?
In this post I will look at a natural way of modelling this mathematically, called the normal ranking model. Whilst this does not necessarily beat place market odds by itself, I have found it a helpful feature for building more complex machine learning models to predict place probabilities.
Most of this post is inspired by Ali (1998). See references below for some other relevant papers.
- Python 3.8+
- A working installation of Julia
- Python packages: numpy, scipy, julia. After pip install julia, you’ll need to run julia.install() manually once.
- Julia packages: Distributions, QuadGK, IterTools, PyCall. Get these by running Pkg.add(["Distributions", "QuadGK", "IterTools", "PyCall"]) from the interactive Julia prompt.
- Some background understanding of statistics and probability.
Suppose we have a race with $N$ horses. Before the race, we can look at the win market, and since win odds represent win probabilities accurately, we have an estimate of the probability that each horse will win (obtained by inverting the decimal odds). Let us denote the win probability of horse $i$ as $y_i$:
$$ y_i = 1 / (\textrm{last traded price of horse i}). $$
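For example, a horse whose last traded price is 4.0 has an implied win probability of 0.25. Here is a minimal sketch of this conversion (the function name and the optional normalisation step are illustrative only; the rest of the post does not rely on them):

import numpy as np

def win_probs_from_prices(prices, normalise=False):
    # Invert the last traded decimal prices to get implied win probabilities.
    # `normalise` optionally rescales them to sum to exactly 1, stripping out
    # any overround in the market.
    y = 1.0 / np.asarray(prices, dtype=float)
    return y / y.sum() if normalise else y

print(win_probs_from_prices([2.0, 4.0, 5.0]))  # -> 0.5, 0.25, 0.2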
Our goal is to build a model that maps these probabilities to place probabilities.
Let us define $N$ random variables $X_1, \ldots, X_N$, with $X_i \sim \mathcal{N}(\alpha_i, 1)$ - that is, each horse gets a normal random variable with some mean $\alpha_i$ and variance $1$. In the normal ranking model, we then model the probability of any given finishing permutation $\pi = (n_1, n_2, \ldots, n_N)$ (horse $n_1$ first, horse $n_2$ second, and so on) as
$$ p(\pi) = P(X_{n_1} < X_{n_2} < \cdots < X_{n_N}). $$
The idea is that better horses have lower $\alpha_i$. It might help to think of the $X_i$ as the finishing time of horse $i$, which we assume is distributed normally with mean $\alpha_i$. The probability of a given finishing permutation is then the probability that the finishing times are ordered as such ($X_i$ doesn’t actually represent the finishing time, but it helps to understand the idea).
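To build some intuition, here is a quick Monte Carlo sketch (purely illustrative, not part of the fitting procedure below) that simulates the model and estimates the probability of one particular finishing order:

import numpy as np

def simulate_finish_order_prob(alpha, order, n_sims=200_000, seed=0):
    # Estimate p(pi) for the finishing order `order` (a list of horse indices,
    # winner first) by drawing X_i ~ N(alpha_i, 1) and ranking the draws.
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=np.asarray(alpha, dtype=float), scale=1.0, size=(n_sims, len(alpha)))
    simulated_orders = np.argsort(x, axis=1)  # smaller X = earlier finisher
    return np.mean(np.all(simulated_orders == np.array(order), axis=1))

# Example: 3 horses with alpha = -1, 0, 1; probability they finish in index order 0, 1, 2
print(simulate_finish_order_prob([-1.0, 0.0, 1.0], order=[0, 1, 2]))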
To use this model, there are two steps:
- Use the known win market probabilities to estimate the parameters of the model (the $\alpha_i$)
- Use the learned model to predict place probabilities.
To estimate the parameters ${\alpha_i}$, we proceed according to Ali (1998) and match the win probabilities as predicted by the model with those given by the true market odds - this gives maximum likelihood estimates.
The first step is therefore to compute the win probabilities predicted by the model:
$$
\begin{align}
\hat{y_i} &= \int_{-\infty}^{\infty} p_i(x_i) \cdot P(\textrm{horse $i$ wins} \vert X_i = x_i) \mathop{dx_i} &\quad\textrm{(law of total probability)} \\
&= \int_{-\infty}^{\infty} p_i(x_i) \cdot P(X_j > x_i \,\, \forall j\neq i) \mathop{dx_i} &\quad\textrm{} \\
&= \int_{-\infty}^{\infty} p_i(x_i) \prod_{\substack{j=1 \\ j\neq i}}^N P(X_j > x_i) \mathop{dx_i} &\quad\textrm{(by independence)} \\
&= \int_{-\infty}^{\infty} p_i(x_i) \prod_{\substack{j=1 \\ j\neq i}}^N (1 - F(x_i;\alpha_j)) \mathop{dx_i} &\quad\textrm{}
\end{align}
$$
where $p_i$ denotes the density of $X_i$ and $F(x_i;\alpha_j)$ denotes the CDF of a normal distribution with mean $\alpha_j$ and unit variance, evaluated at $x_i$.
This integral admits no analytic solution, so it must be evaluated numerically. Let’s try doing this with Python’s scipy.integrate:
import numpy as np
from scipy import integrate, optimize
from scipy.stats import norm
def compute_win_probs(alpha, sigma=1):
    # Win probabilities for the normal ranking model given parameter vector `alpha`.
    # Note: scipy's `scale` parameter is the standard deviation, not the variance.
    y = []
    k = len(alpha)
    for i in range(k):
        # f is the integrand: the density of X_i times the probability that
        # every other horse j finishes behind (i.e. X_j > x).
        def f(x):
            return norm.pdf(x, loc=alpha[i], scale=sigma) * np.prod([
                (1 - norm.cdf(x, loc=alpha[j], scale=sigma))
                for j in range(k) if j != i
            ])
        y.append(integrate.quad(f, -np.inf, np.inf)[0])
    return np.array(y)
This function takes a list containing the ${\alpha}$ and returns a list of the same length containing the win probabilities predicted by the model. We can test it out with a set of 10 runners with uniformly-spaced $\alpha = -5, \ldots, 4$:
def test_compute_win_probs():
    alpha = list(range(-5, 5))
    win_probs = compute_win_probs(alpha)
    print(alpha)
    print(win_probs)
    print(sum(win_probs))
In [1]: test_compute_win_probs()
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
[7.25064308e-01 2.22150574e-01 4.63917256e-02 5.94938896e-03
4.27576698e-04 1.61175056e-05 3.05598359e-07 2.79957299e-09
1.26808414e-11 2.67769610e-14]
0.9999999999621239
The function works as expected - it takes a list of alpha and returns a list of probabilities that sum to 1. We don’t know the $\alpha$ yet, of course, but the function above is necessary for the learning process.
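If you prefer an automated check over eyeballing the output, a couple of assertions along these lines do the job:

win_probs = compute_win_probs(list(range(-5, 5)))
assert np.all(win_probs >= 0)
assert abs(win_probs.sum() - 1.0) < 1e-6  # the win probabilities should form a distribution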
Unfortunately, it is very slow:
In [1]: %timeit test_compute_win_probs()
<truncated>
4.05 s ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
To learn the ${\alpha}$ we will need to call compute_win_probs
many times in succession, so 4 seconds per iteration won’t work. We could try to speed up the Python code, but a much better option is to move the computationally intensive part out of Python entirely.
Julia is a good choice for this. The syntax is easy to learn, it runs very fast, and it is easy to call directly from Python. This means we can write the performance-critical code in Julia, but still use Python for everything else.
Let’s rewrite compute_win_probs
and our test function in Julia. The syntax is similar enough to Python that the conversion is straightforward:
# To install these requirements:
# using Pkg; Pkg.add(["Distributions", "QuadGK", "IterTools", "PyCall"])
using Distributions
using QuadGK
using IterTools
SIGMA = 1
function compute_win_probs(alpha)
    # Win probabilities for normal ranking model given parameter vector `alpha`.
    k = length(alpha)
    y = []
    for i in 1:k
        function f(x)
            return pdf(Normal(alpha[i], SIGMA), x) * prod([
                (1 - cdf(Normal(alpha[j], SIGMA), x))
                for j in 1:k if j != i
            ])
        end
        # atol necessary for when the integrand is almost zero everywhere
        append!(y, quadgk(f, -Inf, Inf, atol=1e-8)[1])
    end
    return y
end
function test_compute_win_probs()
    @time begin
        alpha = float(collect(-5:4))
        win_probs = compute_win_probs(alpha)
        println(alpha)
        println(win_probs)
        println(sum(win_probs))
    end
end
Running the test function, this time we get:
julia> test_compute_win_probs()
[-5.0, -4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
Any[0.7250643084767985, 0.22215057435373503, 0.046391725560646066, 0.00594938895700788, 0.0004275766976697599, 1.6117505559671497e-5, 3.055984397099697e-7, 4.6358648147621366e-9, 1.2911756761429743e-11, 1.6164335517420775e-14]
1.0000000017986495
0.010849 seconds (108.33 k allocations: 2.521 MiB)
We’re getting just over 0.01 seconds per call instead of 4 seconds - almost a 400x speedup! (Note that for accurate timings in Julia, you need to run the function once first to force compilation.)
Now that we have an efficient way of computing the win probabilities predicted by the model, we can turn to the problem of optimising the model parameters ${\alpha}$ so that the predicted win probabilities match the true market odds.
Let us define the sum-of-squares error between the model-based win probabilities and true win probabilities as
$$ S = \sum_{i=1}^N \left( y_i - \hat{y_i} \right)^2. $$
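For reference, computing $S$ for a candidate $\alpha$ only takes a couple of lines with the compute_win_probs function we already have (this helper is just for illustration; the optimiser below works with the unsquared residuals directly):

def sum_of_squares_error(alpha, y):
    # S: sum of squared differences between market and model win probabilities
    residuals = np.array(y) - compute_win_probs(alpha)
    return np.sum(residuals ** 2)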
Our goal now is to find the set of $\alpha$ that minimises this function. For this we will use scipy.optimize.least_squares, which is a general optimisation routine for non-linear least-squares problems. It expects an evaluation function that takes a vector (in our case, a guess of the $\alpha_i$) and returns a vector of residuals, i.e. the non-squared differences between the predicted win probabilities $\hat{y}_i$ and the true values $y_i$.
def learn_alpha(y):
    """Takes a list of true win probabilities y, and returns a list of the alpha that produce y as closely as possible."""
    y = np.array(y)
    # Import our Julia function.
    # For this to work, you need to `pip install julia` and then run julia.install()
    # See https://pypi.org/project/julia/
    import julia
    jl = julia.Julia()
    jl.eval('include("normalmodel.jl")')
    compute_win_probs_jl = jl.eval('compute_win_probs')
    # Now we can call compute_win_probs_jl as if it were a Python function!
    def _get_residuals(alpha):
        # Differences between model-predicted and market win probabilities
        return np.array(compute_win_probs_jl(alpha) - y)
    init = np.arange(0, len(y)/10, 0.1)  # Initial guess: uniform spacing of 0.1 - see text
    res = optimize.least_squares(_get_residuals, init, method='lm', verbose=2)
    return res.x
Here we have used our Julia implementation of the numerical integration and passed it to the optimisation routine. The julia Python package handles the argument conversions between the two languages automatically.
Note about initial guess: Any nonlinear optimisation problem like this requires an initial guess at the solution, and depending on the problem the final result is often sensitive to this. Ali (1998) provides an analytic approximation based on Henery (1981), but I was unable to make sense of the logic behind this and, empirically, it didn’t work very well. Instead, based on trial and error, I’m using a uniform spacing separated by a distance of $0.1$. This works better than other naive initialisations that I tried, such as setting every $\alpha_i$ to zero. There may well be better initial approximations based on the true $y_i$ and/or the live place market probabilities.
Now let’s test this with a hypothetical field of 7 runners with win probabilities $0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05$:
def test_learn_alpha():
    win_probs = [0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]
    alpha = learn_alpha(win_probs)
    print("alpha:", alpha)
    print("market win probabilities:", win_probs)
    print("model win probabilities:", compute_win_probs(alpha))
In [1]: test_learn_alpha()
`xtol` termination condition is satisfied.
Function evaluations 134, initial cost 6.9647e-03, final cost 3.5493e-24, first-order optimality 2.83e-15.
alpha: [-0.28389965 0.00630525 0.00630525 0.44878142 0.44878142 0.83811324
0.83811324]
market win probabilities: [0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]
model win probabilities: [0.3 0.2 0.2 0.1 0.1 0.05 0.05]
We can see that our learned ${\alpha}$ produce win probabilities that match the market-based win probabilities, as desired.
Now that we have successfully fitted the parameters of our normal ranking model, the next step is to use the model to predict place probabilities. We have already shown how to use the model to compute win probabilities; we now extend this to first, second and third place.
First, consider the problem of finding the probability that a given horse $i$ finishes in exactly position $k$ for general $k$. As above, by the law of total probability,
$$ P(\textrm{horse $i$ finishes in position $k$}) = \int_{-\infty}^{\infty} p_i(x_i) \cdot P(\textrm{horse $i$ finishes in position $k$} \vert X_i = x_i) \mathop{dx_i}. $$
Define $A$ to be the subset of horses that finish ahead of horse $i$, and $B$ the subset that finish behind it. For horse $i$ to finish in position $k$ we need $|A| = k - 1$, so we sum over all such partitions of the other horses:
$$
\begin{align}
& \int_{-\infty}^{\infty} p_i(x_i) \cdot P(\textrm{horse $i$ finishes in position $k$} \vert X_i = x_i) \mathop{dx_i} \\
&= \int_{-\infty}^{\infty} p_i(x_i) \cdot \sum_{A, B} P(X_j < x_i \,\,\forall j \in A, \; X_j > x_i \,\,\forall j \in B) \mathop{dx_i} \\
&= \int_{-\infty}^{\infty} p_i(x_i) \cdot \sum_{A, B} \prod_{j\in A}F(x_i;\alpha_j) \prod_{j\in B}(1 - F(x_i;\alpha_j)) \mathop{dx_i}
\end{align}
$$
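It is worth noting how many terms this sum has: there are $\binom{N-1}{k-1}$ ways to choose $A$. A quick illustration in Python for our 7-runner example and third place (the variable names are just for this illustration):

from itertools import combinations

# For a 7-runner field and third place (k = 3), horse 0 needs exactly
# 2 of the other 6 runners to finish ahead of it.
other_runners = list(range(1, 7))
partitions = [
    (set(ahead), set(other_runners) - set(ahead))
    for ahead in combinations(other_runners, 2)
]
print(len(partitions))  # 15 = C(6, 2)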
Again, we must evaluate this numerically. The only significant difference between this and computing the win probabilities is that we now have to sum over all possible configurations of subsets $A$ and $B$. Let’s define another function in Julia to do this:
function compute_exact_finish_prob(alpha, idx, position)
    # Returns the probability that horse `idx` finishes in position `position`
    # according to the model defined by `alpha`.
    # alpha: our learned model parameters
    # idx: index of the horse we want the probability for (zero-based, to match Python)
    # position: finishing position, also zero-based - which conveniently equals the
    #           number of runners that must finish ahead, so no conversion is needed
    idx += 1  # convert to Julia's one-based indexing
    other_runners = collect(1:length(alpha))
    deleteat!(other_runners, idx)
    prob = 0
    for runners_ahead in subsets(other_runners, position)
        runners_behind = setdiff(other_runners, runners_ahead)
        function f(x)
            return pdf(Normal(alpha[idx], SIGMA), x) * prod(Float64[
                cdf(Normal(alpha[j], SIGMA), x)
                for j in runners_ahead
            ]) * prod(Float64[
                (1 - cdf(Normal(alpha[j], SIGMA), x))
                for j in runners_behind
            ])
        end
        prob += quadgk(f, -Inf, Inf, atol=1e-8)[1]
    end
    return prob
end
Let’s test this with the same win probabilities we used above ($0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05$). We will use these win probabilities to learn the ${\alpha}$, and then use the learned model to predict the probability that each horse will finish exactly second:
def test_compute_exact_finish_prob():
    import julia
    jl = julia.Julia()
    jl.eval('include("normalmodel.jl")')
    compute_exact_finish_prob_jl = jl.eval('compute_exact_finish_prob')
    win_probs = [0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]
    alpha = learn_alpha(win_probs)
    second_place_probs = [compute_exact_finish_prob_jl(alpha, i, 1) for i in range(len(win_probs))]
    print("alpha:", alpha)
    print("market win probabilities:", win_probs)
    print("predicted second place probabilities ", second_place_probs)
In [1]: test_compute_exact_finish_prob()
`xtol` termination condition is satisfied.
Function evaluations 134, initial cost 6.9647e-03, final cost 3.5493e-24, first-order optimality 2.83e-15.
alpha: [-0.28389965 0.00630525 0.00630525 0.44878142 0.44878142 0.83811324 0.83811324]
market win probabilities: [0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]
predicted second place probabilities [0.21120089563371375, 0.18813321104507638, 0.18813321104507988, 0.12760081218674318, 0.12760081218674496, 0.0786655289513073, 0.07866552895130477]
Note that we’re predicting the probability that each horse will finish exactly second here, not first or second - so the predicted probabilities make rough sense. As a further check, they sum to 1, as they should, since exactly one horse finishes second.
It is now straightforward to use our model to predict place probabilities. All we need to do is sum the probabilities that each horse will finish first, second or third:
def predict_place_probs(y):
    # Set up the Julia function, as in the test above
    import julia
    jl = julia.Julia()
    jl.eval('include("normalmodel.jl")')
    compute_exact_finish_prob_jl = jl.eval('compute_exact_finish_prob')
    alpha = learn_alpha(y)
    # Place probability = P(first) + P(second) + P(third)
    place_probs = [
        sum([compute_exact_finish_prob_jl(alpha, i, k) for k in range(3)])
        for i in range(len(y))
    ]
    return place_probs
Testing with our hypothetical 7-runner field:
In [1]: predict_place_probs([0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05])
`xtol` termination condition is satisfied.
Function evaluations 134, initial cost 6.9647e-03, final cost 3.5493e-24, first-order optimality 2.83e-15.
Out[1]:
[0.6723801663180751,
0.556947599816801,
0.5569475998168132,
0.37266284384187975,
0.3726628438418848,
0.23419947318571618,
0.23419947318570902]
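As a quick sanity check, exactly three horses place in every race, so these probabilities should sum to 3 - and indeed, summing the values above gives 3.0000 up to numerical error:

print(sum(predict_place_probs([0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05])))  # ~3.0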
Finally we have the predicted probabilities of each horse finishing first, second or third. The model predicts, for example, that the favourite (at a price of $3.33) has a 67% chance of placing. Even though the win odds here were completely made up, the results are surprisingly close to the long-term averages published by Betfair.
What can we do with this model? The most obvious thing is to compare the place probabilities it predicts with the place probabilities in live place markets, and see if we can make a profit. To evaluate the strategy properly, we need access to a lot of Betfair historical data. Spoiler alert: the predictions are surprisingly close to the live markets in most cases, and it’s therefore probably not a profitable strategy on its own once commission is taken into account (I will update this post with some empirical results at some stage).
More realistically, we can use the predictions as a feature in other machine learning models that predict place probabilities from a wider array of data. I have found the normal ranking model to be quite helpful in this regard.
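As a very rough sketch of what that might look like (the extra features, the labels and the choice of classifier here are placeholders, not something from this post):

import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per runner: the normal ranking model's place probability plus
# whatever other features you have for each horse (made up here).
model_place_probs = predict_place_probs([0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05])
other_features = np.random.rand(7, 3)      # placeholders for e.g. form, weight, barrier
X = np.column_stack([model_place_probs, other_features])
placed = np.array([1, 1, 0, 1, 0, 0, 0])   # made-up outcomes: did each horse place?

clf = LogisticRegression().fit(X, placed)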
- Henery, R. J. "Permutation probabilities as models for horse races." Journal of the Royal Statistical Society: Series B (Methodological) 43.1 (1981): 86-91. (link)
- Harville, David A. "Assigning probabilities to the outcomes of multi-entry competitions." Journal of the American Statistical Association 68.342 (1973): 312-316. (link)
- Ali, Mukhtar M. "Probability models on horse-race outcomes." Journal of Applied Statistics 25.2 (1998): 221-229. (link)