Welcome back.

I hope you all had nice vacations. As for me, my summer was definitely a productive one: I finally managed to complete Implementing QuantLib. The full book is available on Leanpub; I’ll also be posting the new content here in the next few weeks, leading up to the next edition of my course in October (more info in the banner above). In today’s post, we start with a look at solvers and optimizers.

Subscribe to my Substack to receive my posts in your inbox, or follow me on Twitter or LinkedIn if you want to be notified of new posts, or subscribe via RSS if you’re the tech type: the buttons for all that are in the footer. Also, I’m available for training, both online and (when possible) on-site: visit my Training page for more information.

One-dimensional solvers

Solvers were used in the bootstrap routines described in this post, the yield calculations mentioned in this other one, and any code that needs a calculated value to match a target; i.e., code that needs, given a function \( f \), to find the \( x \) such that \( f(x) = \xi \) within a given accuracy.

The existing solvers will find the \( x \) such that \( f(x) = 0 \); of course, this doesn’t make them any less generic, but it requires you to define the additional helper function \( g(x) \equiv f(x)-\xi \). There are a few of them, all based on algorithms which were taken from Numerical Recipes in C [1] and duly reimplemented. (This also goes for a few multi-dimensional optimizers. In that case, apart from the obvious copyright issues, we also rewrote them in order to use idiomatic C++ and start indexing arrays from 0.)
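
For instance, if you needed the \( x \) at which some function equals a target \( \xi \), the wrapper could be as simple as a lambda; a minimal sketch, with a made-up function and target:

    Real xi = 1.5;                                // the target value
    auto f = [](Real x) { return x*x; };          // the original function
    auto g = [&](Real x) { return f(x) - xi; };   // g(x) = f(x) - xi
    // the root of g is the x such that f(x) = xi;
    // g can be passed to any of the solvers described next.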

The following listing shows the interface of the class template Solver1D, used as a base by the available solvers.

    template <class Impl>
    class Solver1D : public CuriouslyRecurringTemplate<Impl> {
      public:
        template <class F>
        Real solve(const F& f,
                   Real accuracy,
                   Real guess,
                   Real step) const;
        template <class F>
        Real solve(const F& f,
                   Real accuracy,
                   Real guess,
                   Real xMin,
                   Real xMax) const;
        void setMaxEvaluations(Size evaluations);
        void setLowerBound(Real lowerBound);
        void setUpperBound(Real upperBound);
    };

    class Brent : public Solver1D<Brent> {
      public:
        template <class F>
        Real solveImpl(const F& f,
                       Real xAccuracy) const;
    };

    class Newton : public Solver1D<Newton> {
      public:
        template <class F>
        Real solveImpl(const F& f,
                       Real xAccuracy) const;
    };

It provides some boilerplate code, common to all of them: one overload of the solve method looks for lower and upper values of \( x \) that bracket the solution, while the other checks that the solution is actually bracketed by the passed minimum and maximum values. In both cases, the actual calculation is delegated to the solveImpl method, defined by the derived class and implementing a specific algorithm. Other methods allow you to set the maximum number of function evaluations, or bounds on the explored range.
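
For illustration, here is how the two overloads might be called; a sketch only, with a made-up function and tolerances:

    Brent solver;
    solver.setMaxEvaluations(100);

    auto g = [](Real x) { return x*x*x - 2.0; };   // root at the cube root of 2

    // first overload: a guess and a step, used to bracket the root
    Real x1 = solver.solve(g, 1.0e-8, 1.0, 0.1);

    // second overload: an interval already known to bracket the root
    Real x2 = solver.solve(g, 1.0e-8, 1.0, 0.0, 2.0);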

The forwarding to solveImpl is implemented using the Curiously Recurring Template Pattern, already described in a previous post. When we wrote these classes, we were at the height of our template craze (did I mention we even had an implementation of expression templates?) so you might suspect that the choice was dictated by the fashion of that time. However, it wouldn’t have been possible to use dynamic polymorphism: we wanted the solvers to work with any function pointer or function object, and boost::function wasn’t around yet, which forced us to make solve a method template. Since the latter can’t be virtual, CRTP was the only way to put the boilerplate code in the base class and let it call a method defined in the derived ones.
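
The gist of the forwarding is sketched below (the actual implementation does the bracketing and a number of sanity checks before delegating):

    template <class Impl>
    template <class F>
    Real Solver1D<Impl>::solve(const F& f, Real accuracy,
                               Real guess, Real step) const {
        // ...boilerplate: starting from the guess, look for two
        // values of x that bracket the root...

        // then delegate to the derived class, known at compile time
        return static_cast<const Impl&>(*this).solveImpl(f, accuracy);
    }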

A few notes to close this subsection. First: if you want to write a function that takes a solver, the use of CRTP forces you to make it a template, which might be awkward. To be honest, most of the time we didn’t bother and just hard-coded an explicit choice of solver; I won’t blame you if you do the same. Second: most solvers only use the passed f by calling f(x), so they work with anything that can be called as a function, but Newton and NewtonSafe also require that f.derivative(x) be defined. This, too, would have been awkward if we had used dynamic polymorphism. Third, and last: the Solver1D interface doesn’t specify whether the passed accuracy \( \epsilon \) should apply to \( x \) (that is, whether the returned \( \tilde{x} \) should be within \( \epsilon \) of the true root) or to \( f(x) \) (that is, whether \( f(\tilde{x}) \) should be within \( \epsilon \) of 0). However, all existing solvers treat it as the accuracy on \( x \).
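
As an example of the second note, a function object suitable for Newton might look like the following sketch (the function, of course, is made up):

    // callable like a function, and providing the derivative
    // that Newton and NewtonSafe require
    struct CubeRootOfTwo {
        Real operator()(Real x) const { return x*x*x - 2.0; }
        Real derivative(Real x) const { return 3.0*x*x; }
    };

    Newton solver;
    Real x = solver.solve(CubeRootOfTwo(), 1.0e-8, 1.0, 0.1);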

Optimizers

Multi-dimensional optimizers are more complex than 1-D solvers. In a nutshell, they find the set of variables \( \tilde{\mathbf{x}} \) for which the cost function \( f(\mathbf{x}) \) returns its minimum value; but there’s a bit more to it, as we’ll see when we look at the implementation of the cost function.

In this case, we didn’t use templates. Optimizers inherit from the base class OptimizationMethod, shown in the next listing.

    class OptimizationMethod {
      public:
        virtual ~OptimizationMethod() {}
        virtual EndCriteria::Type minimize(
                        Problem& P,
                        const EndCriteria& endCriteria) = 0;
    };

Its only method, apart from the virtual destructor, is minimize. The method takes a reference to a Problem instance, which in turn contains references to the function to minimize and to any constraints, and performs the calculations; at the end, it has the problem of having too many things to return. Besides the best-solution array \( \tilde{\mathbf{x}} \), it must return at least the reason for exiting the calculations (did the minimization converge, or is it returning its best guess after hitting the maximum number of evaluations?), and it would be nice to return the value of the function at \( \tilde{\mathbf{x}} \) as well, since it has most likely been calculated already.

In the current implementation, the method returns the reason for exiting and stores the other results inside the Problem instance. This is also the reason for passing the problem as a non-const reference; an alternative solution might have been to leave the Problem instance alone and to return all required values in a structure, but I can see how that might be considered more cumbersome. On the other hand, I see no reason for minimize itself to be non-const: my guess is that it was an oversight on our part (I’ll get back to this later).
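
In code, the calling sequence looks something like the sketch below; the optimizer, problem and end criteria are assumed to be already built, and the accessors used to read back the results are those of the Problem class shown in the next listing.

    EndCriteria::Type reason = method.minimize(problem, endCriteria);

    // reason tells us whether we converged, or why we stopped;
    // the other results are read back from the problem itself
    Array x = problem.currentValue();     // the best solution found
    Real y = problem.functionValue();     // the function value at x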

Onward to the Problem class, shown in the listing below.

    class Problem {
      public:
        Problem(CostFunction& costFunction,
                Constraint& constraint,
                const Array& initialValue = Array());

        Real value(const Array& x);
        Disposable<Array> values(const Array& x);
        void gradient(Array& grad_f, const Array& x);
        // ... other calculations ...

        Constraint& constraint() const;
        // ... other inspectors ...

        const Array& currentValue();
        Real functionValue() const;
        void setCurrentValue(const Array& currentValue);
        Integer functionEvaluation() const;
        // ... other results ...
    };

As I mentioned, it groups together arguments such as the cost function to minimize, the constraints, and an optional guess. It provides methods that call the underlying cost function while keeping track of the number of evaluations (I’ll describe them when talking about cost functions); a few inspectors for its components; and methods to retrieve the results (as well as to set them; the latter are used by the optimizers).
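
Building a problem, then, might look like the following sketch; MyCostFunction stands for some hypothetical class derived from CostFunction (described later in this post), and PositiveConstraint for one of the predefined constraints mentioned below.

    MyCostFunction f;                 // hypothetical; inherits from CostFunction
    PositiveConstraint constraint;    // one of the predefined constraints
    Array initialGuess(2, 0.1);

    Problem problem(f, constraint, initialGuess);
    // f and constraint are stored as references (more on this below)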

The problem (no pun intended) is that it takes and stores its components as non-const references. I’ll talk later about whether they might be const instead. The fact that they’re references is an issue in itself, since it puts the responsibility on client code to make sure that their lifetimes last at least as long as that of the Problem instance.

In an alternative implementation in which the optimizer returned its results in a structure, the issue might be moot: we might do away with the Problem class and pass its components directly to the minimize method. This would sidestep the lifetime issue, since they wouldn’t be stored. The disadvantage would be that each optimizer would have to keep track of the number of function evaluations, causing some duplication in the code base.
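
That alternative is not what the library does, of course; but it might look something like this sketch, with made-up names:

    // hypothetical: a structure returned by minimize...
    struct OptimizationResults {
        EndCriteria::Type reason;
        Array minimum;
        Real functionValue;
        Integer functionEvaluations;
    };

    // ...and an interface taking the components directly
    class AlternativeMethod {
      public:
        virtual ~AlternativeMethod() {}
        virtual OptimizationResults minimize(
                        const CostFunction& costFunction,
                        const Constraint& constraint,
                        const Array& initialValue,
                        const EndCriteria& endCriteria) const = 0;
    };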

Also, unlike for the 1-D solvers, the cost function is not a template parameter of the minimization method; it needs to inherit from the CostFunction class, shown in the next listing.

    class CostFunction {
      public:
        virtual ~CostFunction() {}

        virtual Real value(const Array& x) const = 0;
        virtual Array values(const Array& x) const = 0;

        virtual void gradient(Array& grad, const Array& x) const;
        virtual Real valueAndGradient(Array& grad,
                                      const Array& x) const;
        virtual void jacobian(Matrix &jac, const Array &x) const;
        virtual Array valuesAndJacobian(Matrix &jac,
                                        const Array &x) const;
    };

Unsurprisingly, its interface declares the value method, which returns, well, the value of the function for the given array of arguments (although you might be a bit surprised that it doesn’t declare operator() instead); but it also declares a values method returning an array, which is a bit more surprising until you remember that optimizers are often used for calibrating over a number of quotes. Whereas value returns the total error to minimize (let’s say, the sum of the squares of the errors, or something like it), values returns the set of errors over each quote; there are algorithms that can make use of this information to converge more quickly.
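
For instance, a least-squares calibration over a set of quotes might define its cost function along the lines of the sketch below; the market data and the hypothetical modelQuote method are made up for the example.

    class FitError : public CostFunction {
      public:
        explicit FitError(const std::vector<Real>& marketQuotes)
        : marketQuotes_(marketQuotes) {}

        // the error on each quote, for the algorithms that can use them
        Array values(const Array& x) const override {
            Array errors(marketQuotes_.size());
            for (Size i = 0; i < marketQuotes_.size(); ++i)
                errors[i] = modelQuote(x, i) - marketQuotes_[i];
            return errors;
        }

        // the total error, defined here as the sum of the squared errors
        Real value(const Array& x) const override {
            Array errors = values(x);
            return DotProduct(errors, errors);
        }

      private:
        // hypothetical: the quote implied by the model parameters x
        Real modelQuote(const Array& x, Size i) const;

        std::vector<Real> marketQuotes_;
    };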

Other methods return the derivatives of the value, or the values, for use by some specific algorithms: gradient calculates the derivative of value with respect to each variable and stores it in the array passed as first argument, jacobian does the same for values filling a matrix, and the valueAndGradient and valuesAndJacobian methods calculate both values and derivatives at the same time for efficiency. They have a default implementation that calculates the derivatives by finite differences; but of course that’s costly, and I’m not sure that it’s worthwhile to use derivative-based algorithms if you can’t override the method with an analytic calculation.
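
If analytic derivatives happened to be available for the sketch above, one might override gradient as well; again hypothetical, with a made-up partialDerivative method:

    class FitErrorWithGradient : public FitError {
      public:
        using FitError::FitError;

        // analytic derivative of value with respect to each variable,
        // replacing the default finite-difference implementation
        void gradient(Array& grad, const Array& x) const override {
            for (Size i = 0; i < x.size(); ++i)
                grad[i] = partialDerivative(x, i);
        }

      private:
        // hypothetical: the analytic i-th partial derivative
        Real partialDerivative(const Array& x, Size i) const;
    };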

A note: checking the interface of CostFunction shows that all its methods are declared as const, so passing it as a non-const reference to the Problem constructor was probably a goof. Changing it to const would widen the contract of the constructor, so it should be possible without breaking backward compatibility.

Finally, the Constraint class, shown in the following listing.

    class Constraint {
      protected:
        class Impl;
      public:
        bool test(const Array& p) const;
        Array upperBound(const Array& params) const;
        Array lowerBound(const Array& params) const;
        Real update(Array& p,
                    const Array& direction,
                    Real beta);
        Constraint(const shared_ptr<Impl>& impl =
                                             shared_ptr<Impl>());
    };

It works as a base class for constraints to be applied to the domain of the cost function; the library also defines a few predefined ones, not shown here, as well as a CompositeConstraint class that can be used to merge a number of them into a single one.

Its main method is test, which takes an array of variables and returns whether or not they satisfy the constraint; that is, whether the array belongs to the domain we specified as valid. It also defines the upperBound and lowerBound methods, which in theory should return the maximum and minimum values of the variables, but in practice can’t always describe the domain exactly; think of the case in which the domain is a circle, and you’ll see that \( x \) and \( y \) can both be between their upper and lower bounds while the resulting point is still outside the domain.
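
To make the circle example concrete: assuming that the protected Impl class declares a virtual test method with the same signature as the one in Constraint (which is a guess on my part, since its interface is not shown above), a constraint restricting two variables to the unit disc might be sketched as follows. Its bounds can at best say that both coordinates are in \( [-1, 1] \), which is necessary but not sufficient.

    class UnitDiscConstraint : public Constraint {
      private:
        class DiscImpl : public Constraint::Impl {
          public:
            bool test(const Array& p) const override {
                return p[0]*p[0] + p[1]*p[1] <= 1.0;
            }
        };
      public:
        UnitDiscConstraint()
        : Constraint(shared_ptr<Impl>(new DiscImpl)) {}
    };

    UnitDiscConstraint constraint;
    Array p(2, 0.9);                     // within the bounds...
    bool inside = constraint.test(p);    // ...but outside the domain: false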

A couple more notes. First, the Constraint class also defines an update method. It’s not const, which would make sense if it updated the constraint; except it doesn’t. It takes an array of variables and a direction, and extends the original array in the given direction until it satisfies the constraint. It should have been const, it should have been named differently, and, as the kids say, I can’t even. This might be fixed, though. Second, the class uses the pimpl idiom (see this post), and its constructor takes an optional pointer to the implementation, which also lets it work as a default constructor. Were I to write the class today, I’d have a default constructor taking no arguments, and an additional constructor taking the implementation, declared as protected and meant to be used by derived classes only.

Some short final thoughts on the const-correctness of these classes. In summary: it’s not great, with some methods that can be fixed and some others that can’t. For instance, changing the minimize method would break backwards compatibility (since it’s a virtual method, and constness is part of its signature) as well as a few optimizers that call other methods from minimize and use data members as a way to transfer information between methods. (Some of them declare those data members as mutable, suggesting that the methods might have been const in the past. As I write this, I haven’t investigated further.) We could have avoided this if we had put some more effort into reviewing code before version 1.0. Let this be a lesson for you, young coders.

Bibliography

[1] W.H. Press, S.A. Teukolsky, W.T. Vetterling and B.P. Flannery, Numerical Recipes in C, 2nd edition. Cambridge University Press, 1992.