VWAP

What is it?

Volume Weighted Average Price (VWAP) is a measure of average price over a certain time period. Consider the following trades:

Time     | Sym | Price | Volume

-------------------------------

10:01:01 | `AA | 1.234 | 1000

10:01:02 | `AA | 1.3   | 1

10:01:03 | `AA | 1.235 | 2000

10:01:04 | `AA | 1.234 | 500

If we were to take a simple average (mean) of the prices of all trades for symbol `AA, it would be 1.25075. But this is clearly not an accurate reflection of the average price. The one share traded at a clearly erroneous price of 1.3 is given just as much weight in the calculation as the 2000 shares traded at price 1.235. We will get a more accurate reflection of the average price by weighting each price based on the number of shares traded at that price - a volume weighted average price.

How is it calculated normally?

From Wikipedia:

Essentially, rather than adding up all of the prices and dividing by the number of prices, to get a weighted average we should multiply each price and volume, and add up the sum of those, and then divide that by the sum of the volumes. 

First, let's multiply each price and volume to get what is commonly referred to as the notional, and then sum the Volume and Notional rows:

time     | sym | price | volume | notional

------------------------------------------

10:01:01 | `AA | 1.234 | 1000   | 1234

10:01:02 | `AA | 1.3   | 1      | 1.3

10:01:03 | `AA | 1.235 | 2000   | 2470

10:01:04 | `AA | 1.234 | 500    | 617

      SUM | 3501   | 4322.30

The VWAP calculation is the sum of each price*quantity divided by the sum of the quantity, so that is just 4322.30 % 3501 = 1.234590.

How is it calculated in kdb?

Simple VWAP

Although there is no 'vwap' function/keyword in kdb, a VWAP is just a specific term for a weighted average using volume and price. A weighted average can be calculated for not just volume and price, and there is a keyword in kdb for it - 'wavg.' It is a binary function that takes two numerical lists and returns the average of the y list weighted by the x list.

e.g.:

q)1 2 wavg 1 4

3f

To calculate the VWAP of the previous table in kdb we can use 'wavg' with the columns from the table:

q)wavg[trade`volume;trade`price]

1.23459

However this wouldn't work if the table had multiple symbols. Much easier to calculate the VWAP value 'by sym' in an update statement:

q)update vwap:volume wavg price by sym from trade

time     sym price volume vwap

---------------------------------

10:01:01 AA  1.234 1000   1.23459

10:01:02 AA  1.3   1      1.23459

10:01:03 AA  1.235 2000   1.23459

10:01:04 AA  1.234 500    1.23459

Running VWAP

In the above table the 'vwap' column gives the same value in all rows - the VWAP of all four trades.

But it would be useful if we knew the VWAP 'up until that point' i.e. a running VWAP of the total VWAP up to and including each trade. To get the VWAP of a trade and all previous trades, we need to know the total notional value and total volume of all trades up to and including each trade.

q)update totalNotional:sums price*volume, totalVol:sums volume by sym from `trade


q)update rvwap:totalNotional%totalVol from trade

time     sym price volume totalVol totalNotional rvwap

-----------------------------------------------------

10:01:01 AA  1.234 1000   1000     1234          1.234

10:01:02 AA  1.3   1      1001     1235.3        1.234066

10:01:03 AA  1.235 2000   3001     3705.3        1.234688

10:01:04 AA  1.234 500    3501     4322.3        1.23459

Note: a better method of calculating (sums price*volume) is (volume wsum price)

Moving VWAP - by tick

What if we wanted to find the VWAP over a 2 trade window?

q)update vwap2tick: ( 2 msum price * volume ) % 2 msum volume by sym from trade

time     sym price volume vwap2tick

-----------------------------------

10:01:01 AA  1.234 1000   1.234

10:01:02 AA  1.3   1      1.234066

10:01:03 AA  1.235 2000   1.235032

10:01:04 AA  1.234 500    1.2348

Moving VWAP - by time

And if we wanted to find the VWAP over a 1 minute window? Let's make the table a bit more complex so we can better demonstrate:

time                          sym price volume px

-------------------------------------------------------

2023.05.04D14:11:47.841485000 AA  0.964 100    0.964

2023.05.04D14:11:57.841485000 AA  0.714 700    0.74525

2023.05.04D14:12:07.841485000 AA  1.574 700    1.132

2023.05.04D14:12:17.841485000 BB  15.5  2000   15.5

2023.05.04D14:12:27.841485000 AA  1.894 400    1.292421

2023.05.04D14:12:37.841485000 BB  15.75 5000   15.67857

2023.05.04D14:12:47.841485000 AA  1.134 500    1.259417

2023.05.04D14:12:57.841485000 BB  14.89 3000   15.442

2023.05.04D14:13:07.841485000 AA  1.484 400    1.51

2023.05.04D14:13:17.841485000 BB  14.57 4000   15.19286

To calculate the one minute VWAP by sym we could do something like:

q)update vwap:volume wavg price by 1 xbar time.minute, sym from `a

time                          sym price volume vwap

------------------------------------------------------

2023.05.04D14:11:47.841485000 AA  0.964 100    0.74525

2023.05.04D14:11:57.841485000 AA  0.714 700    0.74525

2023.05.04D14:12:07.841485000 AA  1.574 700    1.5165

2023.05.04D14:12:17.841485000 BB  15.5  2000   15.442

2023.05.04D14:12:27.841485000 AA  1.894 400    1.5165

2023.05.04D14:12:37.841485000 BB  15.75 5000   15.442

2023.05.04D14:12:47.841485000 AA  1.134 500    1.5165

2023.05.04D14:12:57.841485000 BB  14.89 3000   15.442

2023.05.04D14:13:07.841485000 AA  1.484 400    1.484

2023.05.04D14:13:17.841485000 BB  14.57 4000   14.57

This gives the VWAP value for each sym across each 1 minute period, so any values within the same minute will be the same. 

Let's go one further: how can we calculate the VWAP of the preceding 1 minute for each trade, by sym?

Here is our trade table:

q)trade

time                          sym price       size

---------------------------------------------------------

2023.05.08D06:31:29.497898000 AA  116.9936556 758.8380747

2023.05.08D06:31:39.497898000 BB  20.58571976 3.467626427

2023.05.08D06:31:49.497898000 AA  110.6224288 722.0672367

2023.05.08D06:31:59.497898000 BB  23.62214103 7.185156806

2023.05.08D06:32:09.497898000 AA  112.9249087 306.9863676

2023.05.08D06:32:19.497898000 BB  27.84005021 81.26024252

2023.05.08D06:32:29.497898000 AA  104.0386648 14.84118844

2023.05.08D06:32:39.497898000 BB  22.34468815 73.76396409

2023.05.08D06:32:49.497898000 AA  102.5257953 480.4908182

2023.05.08D06:32:59.497898000 BB  24.02710968 72.15169056


Using an update statement:

q)update vwap1m1:{[x;y]exec size wavg price from trade where sym=y, time within(x-00:01:00;x)}'[time;sym] from `trade

time                          sym price       size        vwap1m1

---------------------------------------------------------------------

2023.05.08D06:31:29.497898000 AA  116.9936556 758.8380747 116.9936556

2023.05.08D06:31:39.497898000 BB  20.58571976 3.467626427 20.58571976

2023.05.08D06:31:49.497898000 AA  110.6224288 722.0672367 113.8871409

2023.05.08D06:31:59.497898000 BB  23.62214103 7.185156806 22.63374444

2023.05.08D06:32:09.497898000 AA  112.9249087 306.9863676 113.7219228

2023.05.08D06:32:19.497898000 BB  27.84005021 81.26024252 27.23663577

2023.05.08D06:32:29.497898000 AA  104.0386648 14.84118844 113.6422043

2023.05.08D06:32:39.497898000 BB  22.34468815 73.76396409 25.05860581

2023.05.08D06:32:49.497898000 AA  102.5257953 480.4908182 108.4699295

2023.05.08D06:32:59.497898000 BB  24.02710968 72.15169056 24.80722491

In this case we could also use a wj.

w:(-00:01:00 00:00:00)+\:trade.time

q)trade:wj1[w;`sym`time;trade;(`sym`time xasc select time, sym, vwap1m2:price, size from trade;(wavg;`size;`vwap1m2))]

time                          sym price       size        vwap1m1     vwap1m2

---------------------------------------------------------------------------------

2023.05.08D06:31:29.497898000 AA  116.9936556 758.8380747 116.9936556 116.9936556

2023.05.08D06:31:39.497898000 BB  20.58571976 3.467626427 20.58571976 20.58571976

2023.05.08D06:31:49.497898000 AA  110.6224288 722.0672367 113.8871409 113.8871409

2023.05.08D06:31:59.497898000 BB  23.62214103 7.185156806 22.63374444 22.63374444

2023.05.08D06:32:09.497898000 AA  112.9249087 306.9863676 113.7219228 113.7219228

2023.05.08D06:32:19.497898000 BB  27.84005021 81.26024252 27.23663577 27.23663577

2023.05.08D06:32:29.497898000 AA  104.0386648 14.84118844 113.6422043 113.6422043

2023.05.08D06:32:39.497898000 BB  22.34468815 73.76396409 25.05860581 25.05860581

2023.05.08D06:32:49.497898000 AA  102.5257953 480.4908182 108.4699295 108.4699295

2023.05.08D06:32:59.497898000 BB  24.02710968 72.15169056 24.80722491 24.80722491

Alternative wj method:

q)delete notional, cumVol from update vwap1m3: notional%cumVol from wj1[w; `sym`time;trade; ( `sym`time xasc select sym, time, notional: price * size , cumVol:size from trade;( sum; `notional); ( sum; `cumVol) ) ]

time                          sym price       size        vwap1m1     vwap1m2     vwap1m3

---------------------------------------------------------------------------------------------

2023.05.08D06:31:29.497898000 AA  116.9936556 758.8380747 116.9936556 116.9936556 116.9936556

2023.05.08D06:31:39.497898000 BB  20.58571976 3.467626427 20.58571976 20.58571976 20.58571976

2023.05.08D06:31:49.497898000 AA  110.6224288 722.0672367 113.8871409 113.8871409 113.8871409

2023.05.08D06:31:59.497898000 BB  23.62214103 7.185156806 22.63374444 22.63374444 22.63374444

2023.05.08D06:32:09.497898000 AA  112.9249087 306.9863676 113.7219228 113.7219228 113.7219228

2023.05.08D06:32:19.497898000 BB  27.84005021 81.26024252 27.23663577 27.23663577 27.23663577

2023.05.08D06:32:29.497898000 AA  104.0386648 14.84118844 113.6422043 113.6422043 113.6422043

2023.05.08D06:32:39.497898000 BB  22.34468815 73.76396409 25.05860581 25.05860581 25.05860581

2023.05.08D06:32:49.497898000 AA  102.5257953 480.4908182 108.4699295 108.4699295 108.4699295

2023.05.08D06:32:59.497898000 BB  24.02710968 72.15169056 24.80722491 24.80722491 24.80722491

However if you look at the VWAP examples from online resources at the bottom of this page, you will notice that none of them use wj. In fact, they almost all maintain a running total of notional and volume in the trade table and use that to eventually calculate VWAP. Why is that? Performance. If you are receiving realtime updates into a process, performing a full window join on every update is inefficient. In actual fact, even if you have the full dataset and perform the window join once, it is still slow.

Moving VWAP performance

Make the table a lot bigger:

q)n:500000; bigTrade:([]time:.z.p+(0D00:00:10) * til n;sym:n?`AA`BB`CC`DD`EE;price:n?10.0;size:n?1000.0)

q)\ts bigTrade:update vwap1m1:{[x;y]exec size wavg price from bigTrade where sym=y, time within(x-00:01:00;x)}'[time;sym] from bigTrade

629372 16390704

q)\ts w:(-00:01:00 00:00:00)+\:bigTrade.time

2 8389104

q)\ts bigTrade:wj1[w;`sym`time;bigTrade;(`sym`time xasc select time, sym, vwap1m2:price, size from bigTrade;(wavg[;];`size;`vwap1m2))]

890 57168464

q)\ts w:(-00:01:00 00:00:00)+\:bigTrade.time

2 8389104

q)\ts bigTrade:delete notional, cumVol from update vwap1m3: notional%cumVol from wj1[w; `sym`time;bigTrade; ( `sym`time xasc select sym, time, notional: price * size , cumVol:size from bigTrade;( sum; `notional); ( sum; `cumVol) ) ]

877 65558096

The most efficient method is by using an aj:

q)\ts bigTrade: update time,ssize:sums size,sval:sums price*size by sym from bigTrade

24 50333312

q)\ts bigTrade:aj[`sym`time;bigTrade;`sym`time xasc select 1+time+00:01:00,sym,prevssize:ssize,prevsval:sval from bigTrade]

195 46139568

// one nanosecond added otherwise any records exactly one minute ago will be excluded, which is different behaviour than the methods against which we are comparing


q)\ts bigTrade:delete ssize, sval, prevssize, prevsval from update vwap1m4:(sval-0^prevsval)%ssize-0^prevssize from bigTrade

7 16778816


q)bigTrade

time                          sym price  size   vwap1m1 vwap1m2 vwap1m3 vwap1m4

-------------------------------------------------------------------------------

2023.05.09D08:09:27.168023000 EE  3.1759 26.487 3.1759  3.1759  3.1759  3.1759

2023.05.09D08:09:37.168023000 EE  1.2626 83.115 1.7249  1.7249  1.7249  1.7249

2023.05.09D08:09:47.168023000 DD  5.6054 679.48 5.6054  5.6054  5.6054  5.6054

2023.05.09D08:09:57.168023000 BB  3.6454 516.64 3.6454  3.6454  3.6454  3.6454

2023.05.09D08:10:07.168023000 EE  7.1711 754.66 6.4804  6.4804  6.4804  6.4804

2023.05.09D08:10:17.168023000 DD  7.5136 92.603 5.8342  5.8342  5.8342  5.8342

2023.05.09D08:10:27.168023000 AA  7.955  397.7  7.955   7.955   7.955   7.955 

2023.05.09D08:10:37.168023000 CC  3.2651 247.17 3.2651  3.2651  3.2651  3.2651

2023.05.09D08:10:47.168023000 EE  1.0705 551.34 4.5956  4.5956  4.5956  4.5956

2023.05.09D08:10:57.168023000 CC  3.6095 279.14 3.4477  3.4477  3.4477  3.4477

2023.05.09D08:11:07.168023000 DD  3.3319 432.06 4.07    4.07    4.07    4.07  

2023.05.09D08:11:17.168023000 EE  3.7455 710.49 2.5767  2.5767  2.5767  2.5767

2023.05.09D08:11:27.168023000 AA  6.3989 969.1  6.8517  6.8517  6.8517  6.8517  

...

All of the examples so far have involved us running one query against a full table of data. However this may not always be the case. For example, in a tick system we would be getting large numbers of small updates throughout the day. The 'aj' method is one that could be used in a tick setup - the columns created on the first step could be maintained with each update, and the second and third step could be ran as and when needed.

Are there any other possible methods that could be used? Let's take a look...

Calculating VWAP in a tick system - examples

Let's looks at the methods found in the following online resources: 

Method 1

In this method, a VWAP RTE process updates a table called 'vwap' not with the actual VWAP value per symbol but with the total notional and total volume for each sym up to that point. Note the use of '+:' to achieve the running sum of notional and volume per symbol.

if[x~"vwap";t:`trade;

 upd:{[t;x]vwap+:select size wsum price,sum size by sym from x};

 upds:{[t;x]vwap+:select size wsum price,sum size by sym from x;show x}]

At any point the VWAP can be calculated by dividing the two numbers, however once added to the vwap table the running sum of size and notional are lost.

Method 2

Again we have a VWAP RTE process updating a 'vwap' table. In this example the running sum of notional and size are kept in the 'trade' table along with the 'running VWAP' per symbol up to and including each trade. On each upd of the trade table the running sums are calculated and the most recent VWAP per sym is upserted into the 'vwap' table. This 'vwap' table could then be published back to the tickerplant or queried directly.

As the running sum of notional and size are kept in the trade table, a VWAP over time could be calculated easily.

Let's test it out:

//define schemas

q)vwap:([sym:`$()] rvwap:`float$())

q)trade:([]time:`timestamp$();sym:`symbol$();price:`float$();size:`float$();v:`float$();s:`float$();rvwap:`float$())


//sample 10 updates

q)tenUpdates:([]time:.z.p + (00:00:10) * til 10;sym:raze 5#enlist `AA`BB;price:(10?1.0)+((5?100.0 + til 20),5?10.0 + til 20)n;size:((5?1000.0),5?100.0)n:raze flip 5 cut til 10)


// first 5 updates

q)d:5#tenUpdates

q)t:`trade


// simulate upd run

q)d:update s:sums size,v:sums size*price by sym from d;

q)d:d pj select last v,last s by sym from trade;

q)d:update rvwap:v%s from d;

q)`trade insert d;

q)`vwap upsert select last rvwap by sym from trade;


// examine results

q)trade

time                          sym price    size     v        s        rvwap

------------------------------------------------------------------------------

2023.05.07D08:49:39.229090000 AA  112.3186 751.2502 84379.35 751.2502 112.3186

2023.05.07D08:49:49.229090000 BB  21.44102 24.36638 522.44   24.36638 21.44102

2023.05.07D08:49:59.229090000 AA  111.8607 779.6748 171594.3 1530.925 112.0854

2023.05.07D08:50:09.229090000 BB  22.56004 9.386107 734.191  33.75249 21.7522

2023.05.07D08:50:19.229090000 AA  114.1635 788.3966 261600.4 2319.322 112.7918


q)vwap

sym| rvwap

---| --------

AA | 112.7918

BB | 21.7522

So far the methods have been similar to what was discussed in the article. In the next section we introduce a new method:

Method 3

In c.q there are four VWAP methods, however two have already been covered. The other two are below:

/ vwap last 10 ticks

if[x~"vwap2";t:`trade;f:{[p;s](-10#s)wavg -10#p};

 upd:{[t;x].[`.u.t;();,'';select price,size by sym from x];

  vwap::`sym xkey select sym,vwap:f'[price;size]from .u.t}]


/ vwap last minute (60000 milliseconds)

if[x~"vwap3";t:`trade;f:{[t;p;s](n#s)wavg(n:(1+t bin("t"$.z.Z)-60000)-count t)#p};

 upd:{[t;x].[`.u.t;();,'';select time,price,size by sym from x];

  vwap::`sym xkey select sym,vwap:f'[time;price;size]from .u.t}]

Here there is a separate '.u.t' table which is maintaining time and size per sym in list format. Although similar to previous methods, once the data is in list format, the method used to get a 'by time' or 'by tick' VWAP value is slightly different.

Let's test it out:

// define a table to keep time, price and size by sym

q).u.t:([sym:`symbol$()] time:`timestamp$(); price:`float$(); quantity:`float$())


// on upd, add all times, prices and sizes by sym into the .u.t table

q)upd:{[t;x] .u.t:.u.t,'select time, price, size by sym from x}


// sample upd of trade table similar to that in the previous section

q)upd[`trade;trade]


// .u.t shows lists of time, price and size by syms

q).u.t

sym| time                                                                                        ..

---| --------------------------------------------------------------------------------------------..

AA | 14:13:05.227 14:13:15.227 14:13:25.227 14:13:45.227 14:14:05.227 14:14:15.227 14:14:45.227 1..

BB | 14:13:35.227 14:13:55.227 14:14:25.227 14:14:35.227 14:14:55.227 14:15:05.227 14:15:35.227 1..


// define function to calculate 1 min VWAP

// .z.Z replaced with hardcoded time of 14:16:00 to match times in trade update

q)f:{[t;p;s](n#s)wavg(n:(1+t bin("t"$14:16:00)-60000)-count t)#p}


// VWAP table would look like this, showing the VWAP time per sym for the one minute preceding 14:16:00

q)`sym xkey select sym,vwap:f'[time;price;size]from .u.t

sym| vwap

---| --------

AA | 1.234533

BB | 10.15198

The key function here is defined in 'f' and it is doing the following with the time, price and size values for each sym in the .u.t table

f:{[t;p;s](n#s)wavg(n:(1+t bin("t"$.z.Z)-60000)-count t)#p}

Breaking it down:

(1+t bin("t"$.z.Z)-60000)

Subtract 1 minute from current time and then add one to the last index of t where the time is less than or equal to that time.

(n:(1+t bin("t"$.z.Z)-60000)-count t)

Subtract the count of t from this index and set it to variable 'n'.  This variable now contains a negative integer - the count (working backwards from the most recent value) of records in t that are greater than now-1minute. 

(n#s)wavg(n:(1+t bin("t"$14:16:00)-60000)-count t)#p

The final function then takes the last minutes values from the size column and performs a weighted average with the last minutes values from the price column, producing a 1 minute vwap.

Method 4

In this solution, two tables are maintained - a table for running sum values and a separate table for the latest calculations per sym.

A separate metrics function provides VWAP and TWAP metrics for 1min, 5min and 1hr windows using aj. Let's test it out:

Method 5