VWAP
What is it?
Volume Weighted Average Price (VWAP) is a measure of average price over a certain time period. Consider the following trades:
Time | Sym | Price | Volume
-------------------------------
10:01:01 | `AA | 1.234 | 1000
10:01:02 | `AA | 1.3 | 1
10:01:03 | `AA | 1.235 | 2000
10:01:04 | `AA | 1.234 | 500
If we were to take a simple average (mean) of the prices of all trades for symbol `AA, it would be 1.25075. But this is clearly not an accurate reflection of the average price. The one share traded at a clearly erroneous price of 1.3 is given just as much weight in the calculation as the 2000 shares traded at price 1.235. We will get a more accurate reflection of the average price by weighting each price based on the number of shares traded at that price - a volume weighted average price.
How is it calculated normally?
From Wikipedia:
Essentially, rather than adding up all of the prices and dividing by the number of prices, to get a weighted average we should multiply each price and volume, and add up the sum of those, and then divide that by the sum of the volumes.
First, let's multiply each price and volume to get what is commonly referred to as the notional, and then sum the Volume and Notional rows:
time | sym | price | volume | notional
------------------------------------------
10:01:01 | `AA | 1.234 | 1000 | 1234
10:01:02 | `AA | 1.3 | 1 | 1.3
10:01:03 | `AA | 1.235 | 2000 | 2470
10:01:04 | `AA | 1.234 | 500 | 617
SUM | 3501 | 4322.30
The VWAP calculation is the sum of each price*quantity divided by the sum of the quantity, so that is just 4322.30 % 3501 = 1.234590.
How is it calculated in kdb?
Simple VWAP
Although there is no 'vwap' function/keyword in kdb, a VWAP is just a specific term for a weighted average using volume and price. A weighted average can be calculated for not just volume and price, and there is a keyword in kdb for it - 'wavg.' It is a binary function that takes two numerical lists and returns the average of the y list weighted by the x list.
e.g.:
q)1 2 wavg 1 4
3f
To calculate the VWAP of the previous table in kdb we can use 'wavg' with the columns from the table:
q)wavg[trade`volume;trade`price]
1.23459
However this wouldn't work if the table had multiple symbols. Much easier to calculate the VWAP value 'by sym' in an update statement:
q)update vwap:volume wavg price by sym from trade
time sym price volume vwap
---------------------------------
10:01:01 AA 1.234 1000 1.23459
10:01:02 AA 1.3 1 1.23459
10:01:03 AA 1.235 2000 1.23459
10:01:04 AA 1.234 500 1.23459
Running VWAP
In the above table the 'vwap' column gives the same value in all rows - the VWAP of all four trades.
But it would be useful if we knew the VWAP 'up until that point' i.e. a running VWAP of the total VWAP up to and including each trade. To get the VWAP of a trade and all previous trades, we need to know the total notional value and total volume of all trades up to and including each trade.
q)update totalNotional:sums price*volume, totalVol:sums volume by sym from `trade
q)update rvwap:totalNotional%totalVol from trade
time sym price volume totalVol totalNotional rvwap
-----------------------------------------------------
10:01:01 AA 1.234 1000 1000 1234 1.234
10:01:02 AA 1.3 1 1001 1235.3 1.234066
10:01:03 AA 1.235 2000 3001 3705.3 1.234688
10:01:04 AA 1.234 500 3501 4322.3 1.23459
Note: a better method of calculating (sums price*volume) is (volume wsum price)
Moving VWAP - by tick
What if we wanted to find the VWAP over a 2 trade window?
q)update vwap2tick: ( 2 msum price * volume ) % 2 msum volume by sym from trade
time sym price volume vwap2tick
-----------------------------------
10:01:01 AA 1.234 1000 1.234
10:01:02 AA 1.3 1 1.234066
10:01:03 AA 1.235 2000 1.235032
10:01:04 AA 1.234 500 1.2348
Moving VWAP - by time
And if we wanted to find the VWAP over a 1 minute window? Let's make the table a bit more complex so we can better demonstrate:
time sym price volume px
-------------------------------------------------------
2023.05.04D14:11:47.841485000 AA 0.964 100 0.964
2023.05.04D14:11:57.841485000 AA 0.714 700 0.74525
2023.05.04D14:12:07.841485000 AA 1.574 700 1.132
2023.05.04D14:12:17.841485000 BB 15.5 2000 15.5
2023.05.04D14:12:27.841485000 AA 1.894 400 1.292421
2023.05.04D14:12:37.841485000 BB 15.75 5000 15.67857
2023.05.04D14:12:47.841485000 AA 1.134 500 1.259417
2023.05.04D14:12:57.841485000 BB 14.89 3000 15.442
2023.05.04D14:13:07.841485000 AA 1.484 400 1.51
2023.05.04D14:13:17.841485000 BB 14.57 4000 15.19286
To calculate the one minute VWAP by sym we could do something like:
q)update vwap:volume wavg price by 1 xbar time.minute, sym from `a
time sym price volume vwap
------------------------------------------------------
2023.05.04D14:11:47.841485000 AA 0.964 100 0.74525
2023.05.04D14:11:57.841485000 AA 0.714 700 0.74525
2023.05.04D14:12:07.841485000 AA 1.574 700 1.5165
2023.05.04D14:12:17.841485000 BB 15.5 2000 15.442
2023.05.04D14:12:27.841485000 AA 1.894 400 1.5165
2023.05.04D14:12:37.841485000 BB 15.75 5000 15.442
2023.05.04D14:12:47.841485000 AA 1.134 500 1.5165
2023.05.04D14:12:57.841485000 BB 14.89 3000 15.442
2023.05.04D14:13:07.841485000 AA 1.484 400 1.484
2023.05.04D14:13:17.841485000 BB 14.57 4000 14.57
This gives the VWAP value for each sym across each 1 minute period, so any values within the same minute will be the same.
Let's go one further: how can we calculate the VWAP of the preceding 1 minute for each trade, by sym?
Here is our trade table:
q)trade
time sym price size
---------------------------------------------------------
2023.05.08D06:31:29.497898000 AA 116.9936556 758.8380747
2023.05.08D06:31:39.497898000 BB 20.58571976 3.467626427
2023.05.08D06:31:49.497898000 AA 110.6224288 722.0672367
2023.05.08D06:31:59.497898000 BB 23.62214103 7.185156806
2023.05.08D06:32:09.497898000 AA 112.9249087 306.9863676
2023.05.08D06:32:19.497898000 BB 27.84005021 81.26024252
2023.05.08D06:32:29.497898000 AA 104.0386648 14.84118844
2023.05.08D06:32:39.497898000 BB 22.34468815 73.76396409
2023.05.08D06:32:49.497898000 AA 102.5257953 480.4908182
2023.05.08D06:32:59.497898000 BB 24.02710968 72.15169056
Using an update statement:
q)update vwap1m1:{[x;y]exec size wavg price from trade where sym=y, time within(x-00:01:00;x)}'[time;sym] from `trade
time sym price size vwap1m1
---------------------------------------------------------------------
2023.05.08D06:31:29.497898000 AA 116.9936556 758.8380747 116.9936556
2023.05.08D06:31:39.497898000 BB 20.58571976 3.467626427 20.58571976
2023.05.08D06:31:49.497898000 AA 110.6224288 722.0672367 113.8871409
2023.05.08D06:31:59.497898000 BB 23.62214103 7.185156806 22.63374444
2023.05.08D06:32:09.497898000 AA 112.9249087 306.9863676 113.7219228
2023.05.08D06:32:19.497898000 BB 27.84005021 81.26024252 27.23663577
2023.05.08D06:32:29.497898000 AA 104.0386648 14.84118844 113.6422043
2023.05.08D06:32:39.497898000 BB 22.34468815 73.76396409 25.05860581
2023.05.08D06:32:49.497898000 AA 102.5257953 480.4908182 108.4699295
2023.05.08D06:32:59.497898000 BB 24.02710968 72.15169056 24.80722491
In this case we could also use a wj.
w:(-00:01:00 00:00:00)+\:trade.time
q)trade:wj1[w;`sym`time;trade;(`sym`time xasc select time, sym, vwap1m2:price, size from trade;(wavg;`size;`vwap1m2))]
time sym price size vwap1m1 vwap1m2
---------------------------------------------------------------------------------
2023.05.08D06:31:29.497898000 AA 116.9936556 758.8380747 116.9936556 116.9936556
2023.05.08D06:31:39.497898000 BB 20.58571976 3.467626427 20.58571976 20.58571976
2023.05.08D06:31:49.497898000 AA 110.6224288 722.0672367 113.8871409 113.8871409
2023.05.08D06:31:59.497898000 BB 23.62214103 7.185156806 22.63374444 22.63374444
2023.05.08D06:32:09.497898000 AA 112.9249087 306.9863676 113.7219228 113.7219228
2023.05.08D06:32:19.497898000 BB 27.84005021 81.26024252 27.23663577 27.23663577
2023.05.08D06:32:29.497898000 AA 104.0386648 14.84118844 113.6422043 113.6422043
2023.05.08D06:32:39.497898000 BB 22.34468815 73.76396409 25.05860581 25.05860581
2023.05.08D06:32:49.497898000 AA 102.5257953 480.4908182 108.4699295 108.4699295
2023.05.08D06:32:59.497898000 BB 24.02710968 72.15169056 24.80722491 24.80722491
Alternative wj method:
q)delete notional, cumVol from update vwap1m3: notional%cumVol from wj1[w; `sym`time;trade; ( `sym`time xasc select sym, time, notional: price * size , cumVol:size from trade;( sum; `notional); ( sum; `cumVol) ) ]
time sym price size vwap1m1 vwap1m2 vwap1m3
---------------------------------------------------------------------------------------------
2023.05.08D06:31:29.497898000 AA 116.9936556 758.8380747 116.9936556 116.9936556 116.9936556
2023.05.08D06:31:39.497898000 BB 20.58571976 3.467626427 20.58571976 20.58571976 20.58571976
2023.05.08D06:31:49.497898000 AA 110.6224288 722.0672367 113.8871409 113.8871409 113.8871409
2023.05.08D06:31:59.497898000 BB 23.62214103 7.185156806 22.63374444 22.63374444 22.63374444
2023.05.08D06:32:09.497898000 AA 112.9249087 306.9863676 113.7219228 113.7219228 113.7219228
2023.05.08D06:32:19.497898000 BB 27.84005021 81.26024252 27.23663577 27.23663577 27.23663577
2023.05.08D06:32:29.497898000 AA 104.0386648 14.84118844 113.6422043 113.6422043 113.6422043
2023.05.08D06:32:39.497898000 BB 22.34468815 73.76396409 25.05860581 25.05860581 25.05860581
2023.05.08D06:32:49.497898000 AA 102.5257953 480.4908182 108.4699295 108.4699295 108.4699295
2023.05.08D06:32:59.497898000 BB 24.02710968 72.15169056 24.80722491 24.80722491 24.80722491
However if you look at the VWAP examples from online resources at the bottom of this page, you will notice that none of them use wj. In fact, they almost all maintain a running total of notional and volume in the trade table and use that to eventually calculate VWAP. Why is that? Performance. If you are receiving realtime updates into a process, performing a full window join on every update is inefficient. In actual fact, even if you have the full dataset and perform the window join once, it is still slow.
Moving VWAP performance
Make the table a lot bigger:
q)n:500000; bigTrade:([]time:.z.p+(0D00:00:10) * til n;sym:n?`AA`BB`CC`DD`EE;price:n?10.0;size:n?1000.0)
Update statement (oh dear!)
q)\ts bigTrade:update vwap1m1:{[x;y]exec size wavg price from bigTrade where sym=y, time within(x-00:01:00;x)}'[time;sym] from bigTrade
629372 16390704
wj method 1
q)\ts w:(-00:01:00 00:00:00)+\:bigTrade.time
2 8389104
q)\ts bigTrade:wj1[w;`sym`time;bigTrade;(`sym`time xasc select time, sym, vwap1m2:price, size from bigTrade;(wavg[;];`size;`vwap1m2))]
890 57168464
wj method 2
q)\ts w:(-00:01:00 00:00:00)+\:bigTrade.time
2 8389104
q)\ts bigTrade:delete notional, cumVol from update vwap1m3: notional%cumVol from wj1[w; `sym`time;bigTrade; ( `sym`time xasc select sym, time, notional: price * size , cumVol:size from bigTrade;( sum; `notional); ( sum; `cumVol) ) ]
877 65558096
The most efficient method is by using an aj:
aj method
q)\ts bigTrade: update time,ssize:sums size,sval:sums price*size by sym from bigTrade
24 50333312
q)\ts bigTrade:aj[`sym`time;bigTrade;`sym`time xasc select 1+time+00:01:00,sym,prevssize:ssize,prevsval:sval from bigTrade]
195 46139568
// one nanosecond added otherwise any records exactly one minute ago will be excluded, which is different behaviour than the methods against which we are comparing
q)\ts bigTrade:delete ssize, sval, prevssize, prevsval from update vwap1m4:(sval-0^prevsval)%ssize-0^prevssize from bigTrade
7 16778816
q)bigTrade
time sym price size vwap1m1 vwap1m2 vwap1m3 vwap1m4
-------------------------------------------------------------------------------
2023.05.09D08:09:27.168023000 EE 3.1759 26.487 3.1759 3.1759 3.1759 3.1759
2023.05.09D08:09:37.168023000 EE 1.2626 83.115 1.7249 1.7249 1.7249 1.7249
2023.05.09D08:09:47.168023000 DD 5.6054 679.48 5.6054 5.6054 5.6054 5.6054
2023.05.09D08:09:57.168023000 BB 3.6454 516.64 3.6454 3.6454 3.6454 3.6454
2023.05.09D08:10:07.168023000 EE 7.1711 754.66 6.4804 6.4804 6.4804 6.4804
2023.05.09D08:10:17.168023000 DD 7.5136 92.603 5.8342 5.8342 5.8342 5.8342
2023.05.09D08:10:27.168023000 AA 7.955 397.7 7.955 7.955 7.955 7.955
2023.05.09D08:10:37.168023000 CC 3.2651 247.17 3.2651 3.2651 3.2651 3.2651
2023.05.09D08:10:47.168023000 EE 1.0705 551.34 4.5956 4.5956 4.5956 4.5956
2023.05.09D08:10:57.168023000 CC 3.6095 279.14 3.4477 3.4477 3.4477 3.4477
2023.05.09D08:11:07.168023000 DD 3.3319 432.06 4.07 4.07 4.07 4.07
2023.05.09D08:11:17.168023000 EE 3.7455 710.49 2.5767 2.5767 2.5767 2.5767
2023.05.09D08:11:27.168023000 AA 6.3989 969.1 6.8517 6.8517 6.8517 6.8517
...
All of the examples so far have involved us running one query against a full table of data. However this may not always be the case. For example, in a tick system we would be getting large numbers of small updates throughout the day. The 'aj' method is one that could be used in a tick setup - the columns created on the first step could be maintained with each update, and the second and third step could be ran as and when needed.
Are there any other possible methods that could be used? Let's take a look...
Calculating VWAP in a tick system - examples
Let's looks at the methods found in the following online resources:
metrics_tick.q from Jonathon McMurray
Kx Github e/c.q
Method 1
In this method, a VWAP RTE process updates a table called 'vwap' not with the actual VWAP value per symbol but with the total notional and total volume for each sym up to that point. Note the use of '+:' to achieve the running sum of notional and volume per symbol.
if[x~"vwap";t:`trade;
upd:{[t;x]vwap+:select size wsum price,sum size by sym from x};
upds:{[t;x]vwap+:select size wsum price,sum size by sym from x;show x}]
At any point the VWAP can be calculated by dividing the two numbers, however once added to the vwap table the running sum of size and notional are lost.
Method 2
Again we have a VWAP RTE process updating a 'vwap' table. In this example the running sum of notional and size are kept in the 'trade' table along with the 'running VWAP' per symbol up to and including each trade. On each upd of the trade table the running sums are calculated and the most recent VWAP per sym is upserted into the 'vwap' table. This 'vwap' table could then be published back to the tickerplant or queried directly.
As the running sum of notional and size are kept in the trade table, a VWAP over time could be calculated easily.
Let's test it out:
//define schemas
q)vwap:([sym:`$()] rvwap:`float$())
q)trade:([]time:`timestamp$();sym:`symbol$();price:`float$();size:`float$();v:`float$();s:`float$();rvwap:`float$())
//sample 10 updates
q)tenUpdates:([]time:.z.p + (00:00:10) * til 10;sym:raze 5#enlist `AA`BB;price:(10?1.0)+((5?100.0 + til 20),5?10.0 + til 20)n;size:((5?1000.0),5?100.0)n:raze flip 5 cut til 10)
// first 5 updates
q)d:5#tenUpdates
q)t:`trade
// simulate upd run
q)d:update s:sums size,v:sums size*price by sym from d;
q)d:d pj select last v,last s by sym from trade;
q)d:update rvwap:v%s from d;
q)`trade insert d;
q)`vwap upsert select last rvwap by sym from trade;
// examine results
q)trade
time sym price size v s rvwap
------------------------------------------------------------------------------
2023.05.07D08:49:39.229090000 AA 112.3186 751.2502 84379.35 751.2502 112.3186
2023.05.07D08:49:49.229090000 BB 21.44102 24.36638 522.44 24.36638 21.44102
2023.05.07D08:49:59.229090000 AA 111.8607 779.6748 171594.3 1530.925 112.0854
2023.05.07D08:50:09.229090000 BB 22.56004 9.386107 734.191 33.75249 21.7522
2023.05.07D08:50:19.229090000 AA 114.1635 788.3966 261600.4 2319.322 112.7918
q)vwap
sym| rvwap
---| --------
AA | 112.7918
BB | 21.7522
So far the methods have been similar to what was discussed in the article. In the next section we introduce a new method:
Method 3
In c.q there are four VWAP methods, however two have already been covered. The other two are below:
/ vwap last 10 ticks
if[x~"vwap2";t:`trade;f:{[p;s](-10#s)wavg -10#p};
upd:{[t;x].[`.u.t;();,'';select price,size by sym from x];
vwap::`sym xkey select sym,vwap:f'[price;size]from .u.t}]
/ vwap last minute (60000 milliseconds)
if[x~"vwap3";t:`trade;f:{[t;p;s](n#s)wavg(n:(1+t bin("t"$.z.Z)-60000)-count t)#p};
upd:{[t;x].[`.u.t;();,'';select time,price,size by sym from x];
vwap::`sym xkey select sym,vwap:f'[time;price;size]from .u.t}]
Here there is a separate '.u.t' table which is maintaining time and size per sym in list format. Although similar to previous methods, once the data is in list format, the method used to get a 'by time' or 'by tick' VWAP value is slightly different.
Let's test it out:
// define a table to keep time, price and size by sym
q).u.t:([sym:`symbol$()] time:`timestamp$(); price:`float$(); quantity:`float$())
// on upd, add all times, prices and sizes by sym into the .u.t table
q)upd:{[t;x] .u.t:.u.t,'select time, price, size by sym from x}
// sample upd of trade table similar to that in the previous section
q)upd[`trade;trade]
// .u.t shows lists of time, price and size by syms
q).u.t
sym| time ..
---| --------------------------------------------------------------------------------------------..
AA | 14:13:05.227 14:13:15.227 14:13:25.227 14:13:45.227 14:14:05.227 14:14:15.227 14:14:45.227 1..
BB | 14:13:35.227 14:13:55.227 14:14:25.227 14:14:35.227 14:14:55.227 14:15:05.227 14:15:35.227 1..
// define function to calculate 1 min VWAP
// .z.Z replaced with hardcoded time of 14:16:00 to match times in trade update
q)f:{[t;p;s](n#s)wavg(n:(1+t bin("t"$14:16:00)-60000)-count t)#p}
// VWAP table would look like this, showing the VWAP time per sym for the one minute preceding 14:16:00
q)`sym xkey select sym,vwap:f'[time;price;size]from .u.t
sym| vwap
---| --------
AA | 1.234533
BB | 10.15198
The key function here is defined in 'f' and it is doing the following with the time, price and size values for each sym in the .u.t table
f:{[t;p;s](n#s)wavg(n:(1+t bin("t"$.z.Z)-60000)-count t)#p}
Breaking it down:
(1+t bin("t"$.z.Z)-60000)
Subtract 1 minute from current time and then add one to the last index of t where the time is less than or equal to that time.
(n:(1+t bin("t"$.z.Z)-60000)-count t)
Subtract the count of t from this index and set it to variable 'n'. This variable now contains a negative integer - the count (working backwards from the most recent value) of records in t that are greater than now-1minute.
(n#s)wavg(n:(1+t bin("t"$14:16:00)-60000)-count t)#p
The final function then takes the last minutes values from the size column and performs a weighted average with the last minutes values from the price column, producing a 1 minute vwap.
Method 4
In this solution, two tables are maintained - a table for running sum values and a separate table for the latest calculations per sym.
A separate metrics function provides VWAP and TWAP metrics for 1min, 5min and 1hr windows using aj. Let's test it out:
Method 5