# Odds and ends: solvers and optimizers

Welcome back.

I hope you all had nice vacations. As for me, my summer was definitely
a productive one: I finally managed to complete *Implementing
QuantLib*. The full book is
available on Leanpub;
I’ll be also posting the new content here in the next few weeks,
leading up to the next edition of my course in October (more info in
the banner above). In today’s post, we start with a look at solvers
and optimizers.

Follow me on Twitter if you want to be notified of new posts, or subscribe via RSS or email: the buttons for that are in the footer. Also, I’m available for on-site training in Europe and UK: visit my Training page for more information.

### One-dimensional solvers

Solvers were used in the bootstrap routines described in this post, the yield calculations mentioned in this other one, and any code that needs a calculated value to match a target; i.e, that needs, given a function \( f \), to find the \( x \) such that \( f(x) = \xi \) within a given accuracy.

The existing solvers will find the \( x \) such that
\( f(x) = 0 \); of course, this doesn’t make them any less generic,
but it requires you to define the additional helper function
\( g(x) \equiv f(x)-\xi \). There are a few of them, all based on
algorithms which were taken from *Numerical Recipes in C* [1]
and duly reimplemented. (This also goes
for a few multi-dimensional optimizers. In that case, apart from the
obvious copyright issues, we also rewrote them in order to use
idiomatic C++ and start indexing arrays from 0.)

The following listing shows the interface of the class template
`Solver1D`

, used as a base by the available solvers.

It provides some boilerplate code, common to all of them: one overload
of the `solve`

method looks for lower and upper values of \( x \)
that bracket the solution, while the other checks that the solution is
actually bracketed by the passed minimum and maximum values. In both
cases, the actual calculation is delegated to the `solveImpl`

method
defined by the derived class and implementing a specific algorithm.
Other methods allow you to set constraints on the explored range, or
the number of function evaluations.

The forwarding to `solveImpl`

is implemented using the Curiously
Recurring Template Pattern, already described in a previous post. When we
wrote these classes, we were at the height of our template craze (did
I mention we even had an implementation of expression templates?) so
you might suspect that the choice was dictated by the fashion of that
time. However, it wouldn’t have been possible to use dynamic
polymorphism. We wanted the solvers to work with any function pointer
or function object, and `boost::function`

wasn’t around yet, which
forced us to use a template method. Since the latter couldn’t be
virtual, CRTP was the only way to put the boilerplate code in the base
class and let it call a method defined in derived ones.

A few notes to close this subsection. First: if you want to write a
function that takes a solver, the use of CRTP forces you to make it a
template, which might be awkward. To be honest, most of the times we
didn’t bother and just hard-coded an explicit choice of solver. I
won’t blame you if you do the same. Second: most solvers only use the
passed `f`

by calling `f(x)`

, so they work with anything that can be
called as a function, but `Newton`

and `NewtonSafe`

also require that
`f.derivative(x)`

be defined. This, too, might have been awkward if
we used dynamic polymorphism. Third, and last: the `Solver1D`

interface doesn’t specify if the passed accuracy \( \epsilon \)
should apply to \( x \) (that is, if the returned
\( \tilde{x} \) should be within \( \epsilon \) of the true
root) or to \( f(x) \) (that is, if \( f(\tilde{x}) \) should be
within \( \epsilon \) of 0). However, all existing solvers treat
it as the accuracy on \( x \).

### Optimizers

Multi-dimensional optimizers are more complex than 1-D solvers. In a nutshell, they find the set of variables \( \tilde{\mathbf{x}} \) for which the cost function \( f(\mathbf{x}) \) returns its minimum value; but there’s a bit more to it, as we’ll see when we look at the implementation of the cost function.

In this case, we didn’t use templates. Optimizers inherit from the
base class `OptimizationMethod`

, shown in the next listing.

Its only method, apart from the virtual
destructor, is `minimize`

. The method takes a reference to a
`Problem`

instance, which in turn contains references to the function
to minimize and to any constraints, and performs the calculations; at
the end of which, it has the problem of having too many things to
return. Besides the best-solution array \( \tilde{\mathbf{x}} \),
it must return at least the reason for exiting the calculations (did
the minimization converge, or is it returning its best guess after the
maximum number of evaluations was hit?) and it would be nice to return
the value of the function in \( \tilde{\mathbf{x}} \) as well, since
it’s likely that it’s already calculated.

In the current implementation, the method returns the reason for
exiting and stores the other results inside the `Problem`

instance. This is also the reason for passing the problem as a
non-`const`

reference; an alternative solution might have been to
leave the `Problem`

instance alone and to return all required
values in a structure, but I see how this might be seen as more
cumbersome. On the other hand, I see no reason for `minimize`

itself to be non-`const`

: my guess is that it was an oversight on
our part (I’ll get back to this later).

Onward to the `Problem`

class, shown in the listing below.

As I mentioned, it groups together arguments such as the cost function to minimize, the constraints, and an optional guess. It provides methods that call the underlying cost function while keeping track of the number of evaluation, and that I’ll describe when talking about cost functions; a few inspectors for its components; and methods to retrieve the results (as well as to set them; the latter are used by the optimizers).

The problem (no pun intended) is that it takes and stores its
components as non-`const`

references. I’ll talk later about
whether they might be `const`

instead. The fact that they’re
reference is an issue in itself, since it puts the responsibility on
client code to make sure that their lifetimes last at least as much as
that of the `Problem`

instance.

In an alternative implementation in which the optimizer returned its
results in a structure, the issue might be moot: we might do away with
the `Problem`

class and pass its components directly to the
`minimize`

method. This would sidestep the lifetime issue, since
they wouldn’t be stored. The disadvantage would be that each
optimizer would have to keep track of the number of function
evaluations, causing some duplication in the code base.

Also unlike for 1-D solvers, the cost function is not a template
parameter for the minimization method. It needs to inherit from the
`CostFunction`

class, shown in the next listing.

Unsurprisingly, its interface declares the `value`

method, which
returns, well, the value of the function for the given array of
arguments (although you might be a bit surprised that it doesn’t
declare `operator()`

instead); but it also declares a `values`

method
returning an array, which is a bit more surprising until you remember
that optimizers are often used for calibrating over a number of
quotes. Whereas `value`

returns the total error to minimize (let’s
say, the sum of the squares of the errors, or something like it),
`values`

returns the set of errors over each quote; there are
algorithms that can make use of this information to converge more
quickly.

Other methods return the derivatives of the value, or the values, for
use by some specific algorithms: `gradient`

calculates the
derivative of `value`

with respect to each variable and stores it
in the array passed as first argument, `jacobian`

does the same
for `values`

filling a matrix, and the `valueAndGradient`

and `valuesAndJacobian`

methods calculate both values and
derivatives at the same time for efficiency. They have a default
implementation that calculates the derivatives by finite differences;
but of course that’s costly, and I’m not sure that it’s worthwhile to
use derivative-based algorithms if you can’t override the method with
an analytic calculation.

A note: checking the interface of `CostFunction`

shows that all
its methods are declared as `const`

, so passing it as non-{const}
reference to the `Problem`

constructor was probably a goof.
Changing it to `const`

would widen the contract of the
constructor, so it should be possible without breaking backward
compatibility.

Finally, the `Constraint`

class, shown in the following listing.

It works as a base class for constraints to be applied to the domain
of the cost function; the library also defines a few predefined ones,
not shown here, as well as a `CompositeConstraint`

class that can be
used to merge a number of them into a single one.

Its main method is `test`

, which takes an array of variables and
returns whether or not they satisfy the constraint; that is, if the
array belongs to the domain we specified as valid. It also defines
the `upperBound`

and `lowerBound`

methods, which in theory should
specify the maximum and minimum value of the variables but in practice
can’t always specify them correctly; think of the case in which the
domain is a circle, and you’ll see that there are cases in which
\( x \) and \( y \) are both between their upper and lower bounds
but the resulting point is outside the domain.

A couple more notes. First, the `Constraint`

class also defines an
`update`

method. It’s not `const`

, which would make sense if it
updated the constraint; except it doesn’t. It takes an array of
variables and a direction, and extends the original array in the given
direction until it satisfies the constraint. It should have been
`const`

, it should have been named differently, and as the kids say I
can’t even. This might be fixed, though. Second, the class uses the
pimpl idiom (see this post) and the default
constructor also takes an optional pointer to the implementation.
Were I to write the class today, I’d have a default constructor taking
no arguments and an additional constructor taking the implementation
and declared as protected and to be used by derived classes only.

Some short final thoughts on the `const`

-correctness of these classes.
In summary: it’s not great, with some methods that can be fixed and
some others that can’t. For instance, changing the `minimize`

method
would break backwards compatibility (since it’s a virtual method, and
`const`

ness is part of its signature) as well as a few optimizers,
that call other methods from `minimize`

and use data members as a way
to transfer information between methods. (Some of them declare those
data members as `mutable`

, suggesting the the methods might have been
`const`

in the past. As I write this, I haven’t investigated this
further.) We could have avoided this if we had put some more effort in
reviewing code before version 1.0. Let this be a lesson for you,
young coders.

#### Bibliography

[1] W.H. Press, S.A. Teukolsky, W.T. Vetterling and B.P. Flannery,
*Numerical Recipes in C*, 2nd edition. Cambridge University Press,
1992.