When a “tech issue” becomes a management catastrophe

Amidst all of the change and turbulence in the years since the financial crisis, one constant trend has been a continuing rise in the proportion of trades that are conducted electronically. This brings many advantages: relative to voice trading, e-trading tends to be convenient, fast, consistent, reliable, auditable and cheap. However, when human agency is removed from the mechanics of placing each trade there are also new risks. For example, as a matter of routine, positions can accrue on several different trading venues that the responsible supervising human can only see and validate after the fact. Worse, if there is a coding error – or a code deployment error – in trading logic, there is a risk of a trading algorithm “spinning out of control”. The paradigmatic example was the 2012 Knight Capital event in which a machine erroneously executed around 1,500 trades per second, losing around $165,000 per second, for a period of about 45 minutes. This kind of calamity can only be achieved by automation!

Naturally, market participants are extremely motivated to prevent such errors. Regulators are also increasingly insistent that trading firms can evidence a credible set of controls around automated trading to avoid calamitous market disruption. This is a real and legitimate concern since there can be a tendency for management to view the control environment for automated trading as a “tech issue” – until it fails. Regulators are thus increasingly focusing not only on the important matter of what specific risk controls are in place but also how the process of developing and deploying such controls is managed.

Ways to fail

Every firm that deploys machines to place electronic trades necessarily has checks in place to avoid catastrophic failure. However, this is not easy to do well and there are several ways for the control infrastructure to fall short. Examples include:

  1. Control logic is implemented in the same code line as the trading system. While the reasons to do this are obvious – it’s convenient; it may save the performance cost of crossing a process boundary; the developers with the expertise to write the trading system are likely to have the most appropriate skills and experience – the flaws in this approach are equally plain: the only true check is an independent check written by different staff and implemented in separate software.
  2. The check is too slow to execute. Not all markets move at the same speed, but in the fastest markets microseconds matter. If the trade control infrastructure adds more latency than the trading business is willing to afford, the checks will be compromised or bypassed.
  3. The check is not inline. It is certainly useful to have a risk check that will quickly detect a problem post facto and alert trading supervisors so that it can be rapidly remedied. However, this is not enough, either from the business or regulatory perspective. The risk control framework must check each electronic trade before it is executed so that rogue trades can be intercepted before they are placed.
  4. The controls do not span all trading venues. An important check on electronic trading is to ensure that the net positions built up over a number of trading venues lie, in aggregate, within limits. This requires risk checks to span all relevant trading venues. Single-venue checks, such as those provided by trading venues themselves, do not logically suffice, even if they are collated after execution.
  5. There are gaps in the rule set. There are many ways for an electronic trading agent to mis-perform – for example, trading at the wrong price or in the wrong quantity; accruing positions that are too large; not recognising that an order has been filled and repeatedly re-issuing it; or falling into a pathological race condition with another electronic trading agent. The “rule set” that is checked against must, as far as reasonably possible, cover all such eventualities. It must also meet all regulatory requirements in each product in each jurisdiction. An incomplete rule set, by definition, leaves some risks unaddressed (a minimal sketch of such a pre-trade check appears after this list).
  6. The rule set is too complex. Once a trading firm has the ability to set in place rules to control e-trading, there is a risk that, if the process for doing so is not appropriately managed, a “rules spaghetti” can result. This is just as dangerous as having gaps in the rule set since, if there is insufficient governance, the behaviour of the control environment becomes unpredictable. Any institution that finds itself with dozens or hundreds of such rules should treat this in itself as a red flag.
  7. The control software does not do what the developer intended. Even if there is a sound design for the control environment and its rule set, bugs in implementation can lead to failure.
  8. The wrong software is deployed. Even if a trading firm has software that correctly implements a sound control environment, if the wrong software is deployed to production that too can lead to failure.

These are not simply theoretical possibilities: they have all occurred in real life.
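
As an illustration of points 1, 3, 4 and 5 above, the sketch below shows what a minimal last mile check might look like in C++: it runs before an order is released, is written separately from any trading system, and checks price, size and the net position across all venues against a toy rule set. The names (Order, InstrumentLimits, lastMileCheck) and the limits are assumptions made for this note, not any firm's implementation, and a production rule set would be far broader.

    #include <cmath>
    #include <cstdint>
    #include <cstdlib>
    #include <iostream>
    #include <string>

    // Hypothetical order as seen at the edge of the firm's architecture.
    struct Order {
        std::string instrument;
        double      price;
        int64_t     quantity;   // signed: positive = buy, negative = sell
    };

    // Illustrative per-instrument limits; a real rule set is far broader and
    // must cover every product and jurisdiction in which the firm trades.
    struct InstrumentLimits {
        double  referencePrice;      // e.g. last traded or settlement price
        double  maxPriceDeviation;   // as a fraction, e.g. 0.05 = 5%
        int64_t maxOrderQuantity;
        int64_t maxAbsolutePosition;
    };

    enum class CheckResult { Accept, RejectPrice, RejectQuantity, RejectPosition };

    // The "last mile" check: runs on every order before it leaves the firm and
    // is written and deployed independently of the system that generated it.
    CheckResult lastMileCheck(const Order& o, const InstrumentLimits& lim,
                              int64_t currentPositionAllVenues) {
        const double deviation =
            std::fabs(o.price - lim.referencePrice) / lim.referencePrice;
        if (deviation > lim.maxPriceDeviation)
            return CheckResult::RejectPrice;
        if (std::llabs(o.quantity) > lim.maxOrderQuantity)
            return CheckResult::RejectQuantity;
        // The position check uses the net position across all venues, not just
        // the venue that will receive this order.
        const int64_t projected = currentPositionAllVenues + o.quantity;
        if (std::llabs(projected) > lim.maxAbsolutePosition)
            return CheckResult::RejectPosition;
        return CheckResult::Accept;
    }

    int main() {
        const InstrumentLimits limits{100.0, 0.05, 1000, 5000};
        const Order fatFinger{"XYZ Dec-25 future", 100.0, 25000};  // quantity typo
        const bool rejected =
            lastMileCheck(fatFinger, limits, /*currentPositionAllVenues=*/0)
            != CheckResult::Accept;
        std::cout << std::boolalpha << rejected << '\n';  // prints: true
        return 0;
    }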

What “good” looks like

The kind of control environment that addresses these risks should have the following characteristics:

COMPREHENSIVE It has a “last mile check” or “edge check” that inspects every electronic trade or order before it is committed. This lives on the periphery of the trading firm’s architecture (even if this is run for them as a service).

PERFORMANT  The performance of the last mile check must be sufficient to avoid an adverse impact on trading (or on the adoption and use of the check). Typically, a check should be expected to take around one or two microseconds for major participants in fast markets.
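
One way to keep that budget honest is to measure the check routinely. The sketch below shows a minimal batch-timing harness using std::chrono; the checkOrder function is a trivial stand-in invented for illustration, so the figures it produces say nothing about a production check, which would need to be timed on representative hardware with realistic order flow.

    #include <chrono>
    #include <cstdint>
    #include <iostream>

    // Deliberately trivial stand-in for the real check: the timing harness,
    // not the check itself, is the point of this sketch.
    bool checkOrder(double price, double referencePrice, int64_t quantity) {
        return price > 0.95 * referencePrice && price < 1.05 * referencePrice &&
               quantity > -10000 && quantity < 10000;
    }

    int main() {
        constexpr int64_t iterations = 1'000'000;
        int64_t accepted = 0;  // accumulated so the calls cannot be optimised away

        const auto start = std::chrono::steady_clock::now();
        for (int64_t i = 0; i < iterations; ++i) {
            accepted += checkOrder(100.0 + (i % 7) * 0.01, 100.0, i % 20000) ? 1 : 0;
        }
        const auto end = std::chrono::steady_clock::now();

        const auto nanos =
            std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
        std::cout << "average per check: " << nanos / iterations << " ns "
                  << "(accepted " << accepted << ")\n";
        return 0;
    }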

INDEPENDENT  The last mile check is developed independently of the systems that generate the orders and trades. It is to be expected that the trading systems have their own checking logic too, leaving the last mile check as a failsafe.

REAL-TIME A live view of positions, suitably defined, is maintained on a server spanning all trading venues. Some of the last mile check logic refers to these positions.
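
As an illustration of the idea, the sketch below shows one way such a position view might be maintained from venue fill feeds and consulted by the last mile check. The names (PositionServer, onFill, wouldBreachLimit) are hypothetical, and a production implementation would add per-venue detail, persistence and recovery.

    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <mutex>
    #include <string>

    // Minimal sketch of a position server: a live, signed position per
    // instrument, aggregated across every venue on which the firm trades.
    class PositionServer {
    public:
        // Called from each venue's fill feed.
        void onFill(const std::string& instrument, const std::string& venue,
                    int64_t signedQuantity) {
            std::lock_guard<std::mutex> lock(mutex_);
            netPosition_[instrument] += signedQuantity;
            (void)venue;  // a fuller implementation would also keep per-venue detail
        }

        // Consulted by the last mile check before an order is released: would
        // this order push the firm-wide net position beyond its limit?
        bool wouldBreachLimit(const std::string& instrument, int64_t orderQuantity,
                              int64_t absolutePositionLimit) const {
            std::lock_guard<std::mutex> lock(mutex_);
            const auto it = netPosition_.find(instrument);
            const int64_t current = (it == netPosition_.end()) ? 0 : it->second;
            const int64_t projected = current + orderQuantity;
            return projected > absolutePositionLimit ||
                   projected < -absolutePositionLimit;
        }

    private:
        mutable std::mutex mutex_;
        std::map<std::string, int64_t> netPosition_;
    };

    int main() {
        PositionServer positions;
        positions.onFill("XYZ Dec-25 future", "VenueA", +800);
        positions.onFill("XYZ Dec-25 future", "VenueB", +300);

        // A further buy of 200 would take the net position to 1,300, over a
        // 1,000 limit, even though no single venue sees more than 1,000.
        std::cout << std::boolalpha
                  << positions.wouldBreachLimit("XYZ Dec-25 future", +200, 1000)
                  << '\n';  // prints: true
        return 0;
    }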

DEFINED  There is a “rule set” that defines the logic of the last mile check. While the definitive record of the rule set is the software that implements it, this is well documented and the documentation is maintained.

WELL GOVERNED  There is appropriate governance of the entire rule set by all relevant trading supervisors. Business responsibility for the rule set is clearly assigned. The rules and changes to the rules require appropriate review and sign-off that is evidenced electronically.

FULLY TESTED A full set of unit tests exists to validate correct checks against the rule set across all traded products and markets. Ideally, the running of these tests is automated.
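
A minimal, self-contained illustration of such data-driven tests is sketched below. The acceptOrder function stands in for the real check, and the thresholds (a 5% price band, a 10,000-lot size limit) are invented for the example; a production suite would exercise the real check, unchanged, across every product, market and rule.

    #include <cassert>
    #include <cmath>
    #include <cstdint>
    #include <cstdlib>
    #include <vector>

    // Stand-in for the real last mile check: rejects orders whose price strays
    // more than 5% from the reference price or whose size exceeds 10,000 lots.
    bool acceptOrder(double price, double referencePrice, int64_t quantity) {
        const bool priceOk =
            std::abs(price - referencePrice) / referencePrice <= 0.05;
        const bool sizeOk = std::llabs(quantity) <= 10000;
        return priceOk && sizeOk;
    }

    struct Case {
        double  price;
        double  referencePrice;
        int64_t quantity;
        bool    expectedAccept;
    };

    int main() {
        // One row per rule and per boundary condition.
        const std::vector<Case> cases = {
            {100.0, 100.0,    100, true },   // benign order
            {106.0, 100.0,    100, false},   // price outside the 5% band
            { 95.0, 100.0,    100, true },   // on the edge of the band
            {100.0, 100.0,  20000, false},   // fat-finger buy quantity
            {100.0, 100.0, -20000, false},   // oversized sell
        };
        for (const Case& c : cases) {
            assert(acceptOrder(c.price, c.referencePrice, c.quantity) ==
                   c.expectedAccept);
        }
        return 0;  // all cases passed
    }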

CONFIGURABLE  There is a “configuration, messaging and event” infrastructure for human interaction with the last mile check. Here risk limits and alert levels are set, and messages concerning electronic trades and risk levels are generated, filtered and transmitted to traders and supervisors. There are GUIs for managing and monitoring all of the above, and a full audit trail.

DOCUMENTED  Written procedures exist for testing, staging and releasing the software for the last mile check. Software changes are audited and documented. These procedures are followed and signed off by all relevant technology and business managers.

EFFICIENT  Not all checks on trading activity need to be inline. Checks for which a failure need not prevent the entry of orders or trades are implemented asynchronously (as “taps”). These are catalogued separately from the main rule set and do not impede live trading.
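
The sketch below illustrates one simple form such a tap might take, assuming an in-process queue: the trading path merely enqueues a copy of each event, and a separate thread applies the slower check and raises alerts. The AsyncTap name and the large-fill alert are invented for the example.

    #include <condition_variable>
    #include <cstdint>
    #include <cstdlib>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>

    struct TradeEvent {
        std::string instrument;
        int64_t     signedQuantity;
    };

    // Asynchronous "tap": slow analysis never delays order entry.
    class AsyncTap {
    public:
        AsyncTap() : worker_([this] { run(); }) {}
        ~AsyncTap() {
            { std::lock_guard<std::mutex> lock(mutex_); stopping_ = true; }
            cv_.notify_one();
            worker_.join();
        }

        // Called inline on the trading path: cheap, never blocks on analysis.
        void publish(const TradeEvent& e) {
            { std::lock_guard<std::mutex> lock(mutex_); queue_.push(e); }
            cv_.notify_one();
        }

    private:
        void run() {
            std::unique_lock<std::mutex> lock(mutex_);
            while (true) {
                cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                while (!queue_.empty()) {
                    TradeEvent e = queue_.front();
                    queue_.pop();
                    lock.unlock();
                    // Placeholder for a slower, non-blocking check, e.g. an
                    // intraday P&L or message-rate anomaly alert.
                    if (std::llabs(e.signedQuantity) > 5000)
                        std::cout << "ALERT: large fill in " << e.instrument << '\n';
                    lock.lock();
                }
                if (stopping_) return;
            }
        }

        std::mutex mutex_;
        std::condition_variable cv_;
        std::queue<TradeEvent> queue_;
        bool stopping_ = false;
        std::thread worker_;
    };

    int main() {
        AsyncTap tap;
        tap.publish({"XYZ Dec-25 future", 6000});  // alert raised off the hot path
        return 0;  // destructor drains the queue and stops the worker
    }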

MONITORED  Separately, the network may be monitored to ensure that no unauthorised agents establish live connections to trading gateways. This requires introspection of packets and can be implemented either in software or hardware.

Collaboration

Every firm that places trades electronically is faced with the same requirement to avoid errors. Moreover, it is not in the interest of any firm to see its trading counterparties driven out of business by disastrous e-trading losses. Addressing error-avoidance robustly is both difficult and expensive. Since it doesn’t affect what trades are placed or what business a firm chooses to engage in, there is no intrinsic proprietary advantage in building a custom solution. Indeed, it is to the advantage of everyone to agree on a rule set for a last mile check. Not only would this attain a “wisdom of crowds” benefit from the scrutiny of many parties, it could also be expected to improve the quality of discussion with regulators.

There are a number of levels to which such a collaboration could be taken. 

The most elementary is agreeing on a rule set as a written specification. Although weak, this would already be useful. However, in itself it addresses, at best, only issues 5 and 6 of the eight common failures cited above.

A more meaningful and ambitious level of collaboration would be sharing a “community source code” implementation of the last mile check. This would be far more powerful, potentially addressing all eight of the common failures. Such an implementation could be integrated into each firm’s own configuration, messaging and event infrastructure.

For firms that use third-party software, the shared last mile check could be adopted by the software providers and integrated into their own systems.

A further level of collaboration could be achieved by creating a utility to run a last mile check on behalf of the collaborating firms. While this offers benefits over and above sharing source code amongst the community of participants, it also brings new issues of data security, legal liability and constant critical dependence upon the utility owner. Furthermore, all firms using such a utility must be able to prove that its operation is sufficiently closely under their control that they can fully meet strong regulatory diligence standards without reliance on representations from the technology provider.

We therefore believe that, at least initially, the sweet spot for collaboration lies in community code sharing.

The industry track record of collaboration

The Financial Services sector spends a higher proportion of its revenues, and higher absolute amounts, on IT than any other sector. Nonetheless, post-crisis, while the greater part of this expenditure has been directed towards regulatory-driven market structure changes that all firms must implement to the same ends, successes in collaboration have been few and far between. The same factors that have impeded the realisation of very many other collaboration plays stand in the way of this one too.

Paradoxically, one of the factors that has impeded the realisation of many collaboration plays is a natural tendency to reach directly for the most beneficial possible outcome. As we saw above, the most ambitious play would be to create a utility for executing last mile checks. We also saw that it is the most heroic venture: like community source code sharing, it requires agreement on every point of specification – but almost certainly without the illuminating ability to inspect the code that implements it – while tackling several other kinds of issue (data security, liability, business dependency) at the same time.

Community code sharing, while simpler, has rarely been tried. Partly, this is due to issues of trust in the quality and portability of another firm’s code; concerns about support; and the lack of standardised commercial and legal terms for such a transaction. eCo Financial Technology was incorporated to address precisely these issues.

Beyond this, there are cultural factors arising from the ingrained disposition of leading firms to develop their own software and of smaller firms to license systems from established vendors. However, several reasons that might sometimes weigh against the use of community source are less relevant in the case of a last mile check:

PROPRIETARY EDGE  While it is natural to look for differential advantage in trading logic, the dominating requirement of an edge check is logical and technical soundness. Here, having code-level scrutiny from peer firms is a plus rather than a minus.

SOURCING  In some functional areas it can be hard to identify a code line in use at a relevant institution that can be transplanted conveniently to other firms. There are, though, top quality implementations of last mile checks available in commercial community source form through eCo.

STACK RE-USE  Often a bank will, correctly, strongly favour the re-use of componentry from its own IT stack rather than sourcing equivalent code from elsewhere. However, in the case of a last mile check there is a positive advantage to using code developed independently from the trading software.

COST  Software that comes from third parties is sometimes perceived to have a higher total cost of ownership than software that a firm can develop for itself, especially if it can leverage its existing componentry. In our assessment, few, if any, firms could safely deliver a fully satisfactory solution for edge checking at a cost that compares favourably with licensing a top quality implementation from elsewhere. Moreover, even if the in-house development costs were zero, when the price of failure can run into the millions, tens of millions or hundreds of millions of US Dollars, the prudential benefit of a community source solution is compelling.

If banks and trading firms come together to adopt a common rule set for last mile checking, the chances of any one of them suffering a catastrophic trading error are greatly reduced. If, in addition, these firms also adopt a top quality reference implementation of this rule set in software, the chances are reduced far further. On a community source model, in which each party that chooses to do so can check in enhancements and extensions and augment the test suite within a properly supervised code acceptance framework, the cost to each party of ensuring safe adaptability to future requirements is minimised. This should bring comfort to, and be encouraged by, shareholders and regulators alike.

If, beyond this, once the details of business logic and technical design are hammered out through common use, there is a desire also to address those issues of data security, legal liability and continuing critical dependence that are entrained by the creation of a ‘last mile checking utility’, then the prospects of its realisation would be greatly enhanced.


“The immediate need for stronger controls is paramount. While it is not possible to eliminate risk entirely, firms should strive to reduce risk by applying strong, layered controls.”

Senior Supervisors Group, Algorithmic Trading Briefing Note


Regulatory Pressure

Existing and proposed regulations continue to bring scrutiny to electronic trading and risk controls.

SEC Market Access Rule 15c3-5

“…establish, document, and maintain a system of risk management controls and supervisory procedures reasonably designed to manage the financial, regulatory, and other risks of this business activity…”

The risk management controls must prevent the entry of

  • “orders that exceed appropriate pre-set credit or capital thresholds”
  • “erroneous orders”
  • “orders unless there has been compliance with all regulatory requirements that must be satisfied on a pre-order entry basis”

MiFID II

“…have in place effective systems, procedures and arrangements to reject orders that exceed pre-determined volume and price thresholds or are clearly erroneous.”

“Pre-trade controls should be conducted before an order is submitted to a trading venue. Investment firms should also monitor their trading activity and implement real-time alerts which identify signs of disorderly trading or a breach of their pre-trade limits.”

CFTC Proposed Regulation AT

Proposed regulation requiring firms to “implement pre-trade and other risk controls to address the risks of Algorithmic Trading. These must include pre-trade risk controls (maximum order message and execution frequency per unit time, order price and maximum order size parameters), and order cancellation systems.”


Software or Hardware?

In principle, the encoding of relatively fixed logic for which consistent, very low latency is paramount can sometimes be optimised by an implementation in hardware rather than software. Using a Field-Programmable Gate Array (FPGA), such logic can be encoded directly in an integrated circuit. As well as offering potential performance benefits, providing the logic on a physical card or “appliance” can reduce plant cost and ease deployment. However, this comes at the cost of opacity: expertise in FPGA programming is hard to come by, and programmers without this expertise are unable to validate the implementation. Also, the potential performance benefits are not always realised. For a last mile check, in which the rule set must be versatile, flexible and transparent, we do not, on balance, with the current state of the art, recommend a hardware implementation.