kdb+ Tick

Background

kdb+ is a database designed to handle high-volume time series data (usually financial data). As such, you would expect some basic infrastructure or libraries to exist for capturing and serving that data. Enter kdb+ Tick...

kdb+ Tick is a standardised architecture developed by Kx which allows the capture, processing, and querying of real-time and historical data.

The term 'tick' in the name refers to 'tick data', the most granular level of data available in financial markets. It is usually a sequential dataset containing every single trade and, if not every order/quote, then some aggregation thereof (different levels of Market Data are explained here).

What exactly is kdb+ Tick?

When capturing and processing the data described above, the last thing you want is a bloated, over-engineered architecture that tries to achieve too much at once. Thankfully, and in keeping with the general ethos of the language/database, kdb+ Tick is small and lightweight. 

In fact, at the time of writing, the entire 'vanilla' kdb+ Tick repository consists of three q scripts with a combined total of 34 lines of code. It can be mind-boggling to think that top banks and financial institutions have critical infrastructure based on these 34 lines. However, less code means fewer points of failure and, thanks to the terse nature of q, an awful lot is achieved in those 34 lines.

The scripts are designed to be used by the four main components that constitute the 'vanilla' tick setup, although each real-world implementation will inevitably differ from the rest.

The 'vanilla' Tick setup

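The vanilla tick architecture consists of four components: a Feedhandler (FH), a Tickerplant (TP), a Realtime Database (RDB) and a Historical Database (HDB).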

A Feedhandler is a process that subscribes to an upstream data feed and parses the incoming data into a format suitable for kdb+. Once parsed, it sends the data to the Tickerplant.

Generally FHs are written in C++ or Java, although you can make your own FH as a q process.
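
As a rough illustration of the parsing step, the Python sketch below turns one raw feed line into a typed record. The comma-separated wire format, the field names and the parse_trade helper are all assumptions made for this example. Real feeds use vendor-specific protocols, and a production FH would go on to send each record to the TP.

```python
import datetime as dt
from typing import NamedTuple


class Trade(NamedTuple):
    """One parsed trade, shaped like a row of a hypothetical trade table."""
    time: dt.time
    sym: str
    price: float
    size: int


def parse_trade(raw: str) -> Trade:
    """Parse a raw line such as '09:30:00.123,AAPL,189.5,200' (assumed format)."""
    ts, sym, price, size = raw.strip().split(",")
    # Convert each text field to the type the downstream table would expect.
    return Trade(dt.time.fromisoformat(ts), sym, float(price), int(size))


trade = parse_trade("09:30:00.123,AAPL,189.5,200\n")
```

The point is only that the FH owns the translation from the feed's format to typed rows, so everything downstream can assume clean, consistently typed data.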

The Tickerplant is a q process that is responsible for receiving data from the FH and publishing that data to all relevant subscribers. The TP is a lightweight process and all other q processes in the framework rely on it for data. Importantly, the TP logs all received updates to a log file (or a journal) for recovery in the event of failure.

The TP uses the tick.q script and the u.q script.
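
To make the log-then-publish idea concrete, here is a minimal Python sketch of the pattern. It is not a translation of tick.q or u.q. The Tickerplant class, its method names and the JSON-lines log format are assumptions chosen for illustration only.

```python
import json
import os
import tempfile
from collections import defaultdict


class Tickerplant:
    """Toy tickerplant: journal every update, then fan it out to subscribers."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.subscribers = defaultdict(list)  # table name -> list of callbacks

    def subscribe(self, table, callback):
        self.subscribers[table].append(callback)

    def update(self, table, row):
        # Write to the log first so the update can be recovered after a failure.
        with open(self.log_path, "a") as log:
            log.write(json.dumps([table, row]) + "\n")
        # Then publish only to the subscribers of this table.
        for callback in self.subscribers[table]:
            callback(table, row)

    def replay(self, callback):
        """Read back every logged update in arrival order."""
        with open(self.log_path) as log:
            for line in log:
                table, row = json.loads(line)
                callback(table, row)


log_path = os.path.join(tempfile.mkdtemp(), "tp.log")
tp = Tickerplant(log_path)

received = []
tp.subscribe("trade", lambda table, row: received.append((table, row)))

tp.update("trade", {"sym": "AAPL", "price": 189.5, "size": 200})
tp.update("quote", {"sym": "AAPL", "bid": 189.4, "ask": 189.6})  # nobody subscribed

# A restarted process can rebuild its state from the log.
recovered = []
tp.replay(lambda table, row: recovered.append((table, row)))
```

Note that the quote update still reaches the log even though nothing subscribed to it, which is what makes the log usable for recovery.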

The Realtime Database is a q process which contains all of the day's updates. The RDB subscribes to the TP for all (or a subset of) updates received and keeps them in memory. At end of day (EOD) the RDB writes the day's updates to disk and clears its memory.

The RDB uses the r.q script.
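
The RDB lifecycle described above (accumulate in memory, then write down and clear at EOD) can be sketched in Python as follows. This is an illustration of the pattern, not of r.q itself. The RealtimeDatabase class, the on_update and end_of_day names, and the use of CSV files in a per-date directory are all assumptions. kdb+ writes its own on-disk format.

```python
import csv
import os
import tempfile
from collections import defaultdict


class RealtimeDatabase:
    """Toy RDB: hold today's rows in memory, flush them to disk at end of day."""

    def __init__(self, hdb_dir):
        self.hdb_dir = hdb_dir
        self.tables = defaultdict(list)  # table name -> list of row dicts

    def on_update(self, table, row):
        # Called for each update received from the tickerplant.
        self.tables[table].append(row)

    def end_of_day(self, date):
        day_dir = os.path.join(self.hdb_dir, date)
        os.makedirs(day_dir, exist_ok=True)
        for name, rows in self.tables.items():
            if not rows:
                continue
            path = os.path.join(day_dir, name + ".csv")
            with open(path, "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=list(rows[0]))
                writer.writeheader()
                writer.writerows(rows)
            rows.clear()  # free memory, ready for the next day's data


hdb_dir = tempfile.mkdtemp()
rdb = RealtimeDatabase(hdb_dir)
rdb.on_update("trade", {"sym": "AAPL", "price": 189.5, "size": 200})
rdb.on_update("trade", {"sym": "MSFT", "price": 410.0, "size": 50})

rows_before_eod = len(rdb.tables["trade"])
rdb.end_of_day("2024.01.02")
rows_after_eod = len(rdb.tables["trade"])
saved_path = os.path.join(hdb_dir, "2024.01.02", "trade.csv")
```

After end_of_day runs, the in-memory table is empty and the day's rows live only on disk, where a separate historical process can load them.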

The Historical Database is a q process that loads the on-disk (historical) kdb+ database. 

Other Components

The Real Time Engine (RTE), or Real Time Subscriber, is a q process that performs analytics, calculations or manipulations on incoming data, and either makes the results available to other processes or publishes them back to the tickerplant.

For example, a simple RTE might subscribe to the trade table, calculate VWAP per sym and publish a VWAP table back to the tickerplant.
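
A running VWAP of this kind could be maintained along the lines of the Python sketch below. For each sym it keeps the cumulative price-times-size and the cumulative size, so that VWAP = sum(price * size) / sum(size). The VwapEngine class and its on_trade method are hypothetical names, and the published list stands in for sending rows back to the tickerplant.

```python
from collections import defaultdict


class VwapEngine:
    """Toy RTE: keep running totals per sym and emit the latest VWAP."""

    def __init__(self):
        self.notional = defaultdict(float)  # sym -> sum(price * size)
        self.volume = defaultdict(int)      # sym -> sum(size)

    def on_trade(self, sym, price, size):
        # Assumes size > 0, as for any real trade.
        self.notional[sym] += price * size
        self.volume[sym] += size
        return {"sym": sym, "vwap": self.notional[sym] / self.volume[sym]}


engine = VwapEngine()
published = []  # stand-in for rows published back to the tickerplant
trades = [("AAPL", 100.0, 100), ("AAPL", 102.0, 300), ("MSFT", 50.0, 10)]
for sym, price, size in trades:
    published.append(engine.on_trade(sym, price, size))

# AAPL after two trades: (100*100 + 102*300) / 400 = 101.5
```

Keeping running totals rather than recomputing over the whole trade table means each update costs constant time, which matters when the RTE sits on a live feed.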

Intraday Database (IHDB)

Persistent Database (PDB)

Write Database (WDB)

Writedown (WD)

In Depth Walkthroughs

Let's look in detail at the components and scripts of kdb+ Tick:

Further Reading
