wj
What is it?
The window join (wj) is much like the aj in that records are joined from one table to another 'as of' the time in one of the tables. However rather than matching one record from each table, the wj matches mulitple records from one of the tables using a time range rather than a specific time, and aggregates the results into one record when joining them.
For example, if you wanted to get the prevailing bid price at the time of a particular trade, you could use an aj. If you wanted to get the average bid price in the preceding 5 minutes of a particular trade, you could use a window join - this is an aggregation of the values over a particular time period.
The syntax is as follows:
wj [w; c; t; (q; (f0;c0); (f1;c1))]
where:
t and q are simple tables
w is a pair of lists of times/timestamps, begin and end
c are the names of the common columns, usually sym and time. Any column other than the sym column must have an integral type (which can still be dates,times,etc)
f0, f1 are aggregation functions applied to values in q columns c0,c1 over the intervals
Returns for each record in t, a record with additional columns c0 and c1, which are the results of the aggregation functions applied to values over the matching intervals in w.
An example is:
wj[w; `sym`time; trade; (quote;(max;`ask);(min;`bid))]
This would join two columns to the trade table, ask and bid, which would represent the max ask and min bid values in each time window given in w for each trade.
Examples and Use Cases
Let's stick with the example and say we want to calculate the max ask and min bid in the 5 minutes prior to each trade.
So let's create a table of quotes that have been received throughout the day, and a table of trades that have been placed throughout the day.
n:900;
trade:`time xasc ([]time:("p"$2022.01.01) +1000*1000000*n?n;sym:raze 300#/:`MST`APPL`FB;side:n?`B`S;price:(n?1.0)+(300?1000.0+til 200),(300?100.0+til 20),(300?10.0+til 2));
n:900000;
quote:update askPrice:bidPrice*1.10 from `time xasc ([]time:("p"$2022.01.01) +1000*1000000*n?n;sym:raze 300000#/:`MST`APPL`FB;bidPrice:(n?1.0)+(300000?1000.0+til 200),(300000?100.0+til 20),(300000?10.0+til 2));
Calculating the time window
The first thing that will need to be calculated is the time intervals. The time intervals that we want for each trade are:
from:(5 mins before time of trade)
to:(time of trade)
If the time of the trade was 2022.01.01D00:00:00.000000000 then we can get these values as follows:
from: 2022.01.01D00:00:00.000000000 - 00:05:00
to: 2022.01.01D00:00:00.000000000 - 00:00:00
Although it may seem redundant to add the last part, if we retain that then this can be expressed as:
q)2022.01.01D00:00:00.000000000 -/:(00:05:00;00:00:00)
2021.12.31D23:55:00.000000000 2022.01.01D00:00:00.000000000
Or alternatively:
q)2022.01.01D00:00:00.000000000 +/:(-00:05:00;00:00:00)
2021.12.31D23:55:00.000000000 2022.01.01D00:00:00.000000000
Now the single timestamp can be replaced by the list of all timestamps in the trade table:
q)w: trade.time +/:(-00:05:00;00:00:00)
q)w
2021.12.31D23:55:01.000000000 2021.12.31D23:55:01.000000000 2021.12.31D23:55:..
2022.01.01D00:00:01.000000000 2022.01.01D00:00:01.000000000 2022.01.01D00:00:..
Although this is usually expressed in the reverse:
q)(-00:05:00;00:00:00) +\: trade.time
For each record in the trade table, this gives us a pair of timestamps, start and end, (T-5m) to (T). These are the time windows we will use in the window join.
Using the aggregation functions
The aggregation function will be given the matching values in the selected column in list format. So the function you use must be able to operate on that list.
Instead of providing a function, you can provide the identity function (::) which will instead return all of the values that would be passed into the function.
For example:
q)wj[w;`sym`time;trade; (quote;((::);`bidPrice))]
time sym side price bidPrice
-----------------------------------------------------------------------------------
2022.01.01D00:00:01.000000000 MST S 1059.794 ,1103.515
2022.01.01D00:00:01.000000000 MST B 1012.967 ,1103.515
2022.01.01D00:00:02.000000000 MST S 1087.673 1103.515 11.75801 1172.165
2022.01.01D00:00:04.000000000 MST S 1143.472 1103.515 11.75801 1172.165 1143.734
This gives us a better picture of the values that will be passed into the function. As this is just a list, we can expect to use the 'max' and 'min' functions without issue:
q)wj[w;`sym`time;trade; (quote;(min;`bidPrice);(max;`askPrice))]
time sym side price bidPrice askPrice
-----------------------------------------------------------------
2022.01.01D00:00:01.000000000 MST S 1059.794 1103.515 1213.867
2022.01.01D00:00:01.000000000 MST B 1012.967 1103.515 1213.867
2022.01.01D00:00:02.000000000 MST S 1087.673 11.75801 1289.381
2022.01.01D00:00:04.000000000 MST S 1143.472 11.75801 1289.381
Note that the column names returned are the same as the column names provided as inputs to the functions.
Let's take a look at the quote table to confirm that the values are correct:
q)5#quote
time sym bidPrice askPrice side
---------------------------------------------------------
2022.01.01D00:00:00.000000000 MST 1103.515 1213.867 S
2022.01.01D00:00:01.000000000 FB 11.75801 12.93381 S
2022.01.01D00:00:02.000000000 MST 1172.165 1289.381 B
2022.01.01D00:00:04.000000000 MST 1143.734 1258.107 B
Yes - even at 00:00:04 the wj used the maximum ask price from the prior 5 minutes.
Performance
Window join is slow if the correct steps have not been taken to sort table q i.e. the quote table. The quote table should be sorted on `sym`time and have the parted attribute applied to the sym column. See the difference below:
q)\ts wj[w;`sym`time;trade; (quote;(max;`bidPrice))]
6524 5260512
q)\ts wj[w;`sym`time;trade; (`sym`time xasc quote;(max;`bidPrice))]
51 41944496
q)\ts wj[w;`sym`time;trade; (update `p#sym from `sym`time xasc quote;(max;`bidPrice))]
43 41944880
Alternative options
wj1
wj1 works in the same way as wj but with one difference: with wj, the last quote before the start of the window (the 'prevailing quote on entry to the window') is included, whereas in wj1 it is not included. So wj1 strictly only performs the window join with quotes inside the window.
q)quote
time sym bidPrice askPrice
---------------------------------------------------
2022.01.01D00:09:59.999000000 MST 1198.12 1290.187
2022.01.01D00:10:00.000000000 MST 1197.273 1317
2022.01.01D00:10:10.000000000 MST 1113.781 1225.159
q)trade
time sym side price
-----------------------------------------------
2022.01.01D00:10:00.000000000 MST B 1153.621
2022.01.01D00:10:01.000000000 MST S 1076.986
2022.01.01D00:10:11.000000000 MST B 1157.908
q)w: trade.time +/:(-00:00:10;00:00:00)
q)wj[w;`sym`time;trade; (quote;(max;`bidPrice))]
time sym side price bidPrice
--------------------------------------------------------
2022.01.01D00:10:00.000000000 MST B 1153.621 1198.12
2022.01.01D00:10:01.000000000 MST S 1076.986 1198.12
2022.01.01D00:10:11.000000000 MST B 1157.908 1197.273
q)wj1[w;`sym`time;trade; (quote;(max;`bidPrice))]
time sym side price bidPrice
--------------------------------------------------------
2022.01.01D00:10:00.000000000 MST B 1153.621 1198.12
2022.01.01D00:10:01.000000000 MST S 1076.986 1198.12
2022.01.01D00:10:11.000000000 MST B 1157.908 1113.781
Performance considerations
Sorting and attributes
The 'joined' table i.e. the quotes table should be sorted by `sym and `time with the parted attribute applied on the sym column. Not doing so will not only be slower but return incorrect results:
q)\ts res1:wj[w;`sym`time;trade; (quote;(max;`bidPrice))]
4373 5260864
q)quote:update `p#sym from `sym`time xasc quote
q)\ts res2:wj[w;`sym`time;trade; (quote;(max;`bidPrice))]
2 75744
q)res1~res2
0b