qEI with QMCPy

Michael McCourt

July 19, 2020

This post demonstrates how QMCPy low-discrepancy samples can improve Monte Carlo estimation of q-Expected Improvement in Bayesian optimization.

Quasi-Monte Carlo methods (QMC) are a valuable tool for sampling random variables in a structured fashion. This allows for computing key statistics of random variables more efficiently than with i.i.d. sampling. Such quantities can play fundamental roles in larger algorithms, making their efficient computation fundamental to practical implementations of numerous applications. This was the motivation for creating the QMCPy library. In this post, we demonstrate the use of QMC methods in computing a key quantity in Bayesian optimization.

Bayesian Optimization

Bayesian optimization, also called many other names, including sequential model based optimization or Gaussian process optimization, is a broad class of algorithms which involves alternately building statistical models of scattered data and optimally sampling for further data to try to optimize a given function. To conduct this optimal sampling process, a strategy must be applied to take the statistical model and appropriately balance a desire to explore the optimization domain against a desire to exploit the high performing values found thus far and search in their proximity. We refer to this strategy as the acquisition function.

Probably the most popular acquisition function is Expected Improvement (EI), which is widely used both because of its closed form for a Gaussian process surrogate model and because of its effectiveness. When trying to push beyond the sequential model of Bayesian optimization to allow for parallelism, the EI acquisition function can be redefined for optimally choosing the next \(q\) points at which to sample. This qEI acquisition function, first defined here, with more recent discussion here, unfortunately does not have a simple closed form, and is generally computed through Monte Carlo estimation.

qEI Estimation

We consider a simple problem to demonstrate the value of QMC in estimating this qEI quantity. The left panel of Figure 1 depicts 5 noisy observations of a 1D function. We fit a Gaussian process (GP) model to this data, and provide examples of posterior draws in the center panel.

Observed points, Gaussian process posterior draws, and qEI surface — **Figure 1**: *left*: 5 observed points, drawn from a function, with error bars displayed. *center*: 128 quasi-random posterior draws from a Gaussian process fit to the observed points. *right*: The qEI associated with \(q=2\) points given this Gaussian process and the observed points; note the symmetry across the line \(y=x\), which occurs because the points will be sampled at the same time.

The right panel of Figure 1 depicts the qEI quantity for \(q=2\) future points to be sampled. qEI is defined, for a maximization problem, with a \(q\)-dimensional integral as

\[ \operatorname{qEI}(x_1, \ldots, x_q) = \int_{\mathbb{R}^q} \max_{1 \le i \le q}(y_i - y^*)_+ p_{Y_{x_1, \ldots, x_q}}(y_1, \ldots, y_q) \, \mathrm{d}y_1 \cdots \mathrm{d}y_q, \]

where \(Y_{x_1, \ldots, x_q}\) is the joint posterior GP distribution at \(x_1, \ldots, x_q\) and \(y^*\) is the best value observed thus far. Note the symmetric nature of the qEI: since both of the \(q=2\) points will be evaluated simultaneously, the designation of "first" and "second" is arbitrary. The highest qEI value occurs when sampling near \(x_1=0\) and \(x_2=0.6\), points which are near the highest valued draws of the posterior.

This integral is generally estimated using Monte Carlo, as discussed here and here, among other places, with i.i.d. draws. Using Quasi-Monte Carlo, we can reach the same level of accuracy with fewer GP posterior draws. Figure 2 shows the results of using different quasi-random sequences available in QMCPy to estimate qEI for \(q=5\) next points, located at \(0.158\), \(0.416\), \(0.465\), \(0.718\), and \(0.935\).

Convergence of IID Monte Carlo and QMC methods for estimating qEI — **Figure 2**: Monte Carlo estimation, computed both using QMCPy's i.i.d. sampler and NumPy's `normal` sampler, is compared to QMC methods available in QMCPy. The Sobol' and Lattice methods perform comparably, as expected, and converge at roughly \(\mathcal{O}(N^{-1})\), in contrast to the \(\mathcal{O}(N^{-1/2})\) convergence of i.i.d. sampling. Integrals were estimated 50 times with different random seeds: the medians (solid lines) and interquartile ranges (shaded regions) are plotted.

The integrals present in Bayesian optimization are another example of a computation which benefits from QMC. More efficient acquisition function computation gives us the ability to conduct more complicated strategies in a feasible amount of time. Check out the QMCPy documentation or contact us to learn more about the QMCPy project and how QMC can help your work.

For a related executable notebook, see the qEI demo for blog.