aj
What is it?
An 'aj' or 'as of join' performs a left join on two tables with the intention of appending the last matching record in t2 to t1 i.e. joining the 'current' record from t2 'as of' the time of the record in t1.
This facilitates analysis such as finding the most recent quote at the time of each trade.
The syntax is as follows:
aj[c; t1; t2]
where
t1 is a table
t2 is a simple table
c is a symbol list of column names, common to t1 and t2, and of matching type
The last column in c is of a sortable type (typically time)
For example:
aj[`sym`time; trades; quotes]
In this example, the most recent matching quote (by sym) will be joined onto each trade in the trades table.
Examples and Use Cases
The simplest example is probably the most common use case for aj - 'find the most recent quote at the time of each trade'.
So let's create a table of quotes that have been received throughout the day, and a table of trades that have been placed throughout the day.
n:900;
trade:`time xasc ([]time:("p"$2022.01.01) +1000*1000000*n?n;sym:raze 300#/:`MST`APPL`FB;side:n?`B`S;price:(n?1.0)+(300?1000.0+til 200),(300?100.0+til 20),(300?10.0+til 2));
n:900000;
quote:update askPrice:bidPrice*1.10 from `time xasc ([]time:("p"$2022.01.01) +1000*1000000*n?n;sym:raze 300000#/:`MST`APPL`FB;bidPrice:(n?1.0)+(300000?1000.0+til 200),(300000?100.0+til 20),(300000?10.0+til 2));
Now for each trade, let's identify the most recent quote in the market as of that trade:
aj[`sym`time; trade; quote]
time sym side price bidPrice askPrice
------------------------------------------------------------------
2022.01.01D00:00:01.000000000 MST S 1086.117 1016.021 1117.624
2022.01.01D00:00:01.000000000 MST S 1151.809 1016.021 1117.624
2022.01.01D00:00:02.000000000 MST S 1145.576 1016.021 1117.624
2022.01.01D00:00:05.000000000 APPL S 113.2617 106.2035 116.8239
2022.01.01D00:00:06.000000000 MST S 1165.064 1084.372 1192.809
2022.01.01D00:00:06.000000000 FB S 10.8193 10.23714 11.26086
2022.01.01D00:00:07.000000000 FB B 11.24955 10.44155 11.4857
2022.01.01D00:00:09.000000000 APPL B 106.5909 100.9747 111.0722
2022.01.01D00:00:09.000000000 FB S 11.12802 10.44155 11.4857
Now we can also determine if each trade price fell within the bid/ask spread of the most recent quote at that time. If it didn't, it may be cause for concern
q)update outsideSpread:1b from aj[`sym`time; trade; quote] where not price within (bidPrice;askPrice)
time sym side price bidPrice askPrice outsideSpread
--------------------------------------------------------------------------------
2022.01.01D00:00:01.000000000 MST S 1086.117 1016.021 1117.624 0
2022.01.01D00:00:01.000000000 MST S 1151.809 1016.021 1117.624 1
2022.01.01D00:00:02.000000000 MST S 1145.576 1016.021 1117.624 1
2022.01.01D00:00:05.000000000 APPL S 113.2617 106.2035 116.8239 0
2022.01.01D00:00:06.000000000 MST S 1165.064 1084.372 1192.809 0
2022.01.01D00:00:06.000000000 FB S 10.8193 10.23714 11.26086 0
There are some use cases where it is useful to aj a table to itself. For example, when calculating a 1 minute running VWAP:
q)trade:([]time:2024.01.07D10:47:15 + 00:00:10 * til 10;sym:`AAPL;price:10?10.0;size:10?1000.0)
q)trade
time sym price size
------------------------------------------------------------
2024.01.07D10:47:15.000000000 AAPL 3.694601669 17.43440982
2024.01.07D10:47:25.000000000 AAPL 3.543395477 356.6417673
2024.01.07D10:47:35.000000000 AAPL 2.249604766 205.8210415
2024.01.07D10:47:45.000000000 AAPL 7.582184493 811.2821064
2024.01.07D10:47:55.000000000 AAPL 3.63865532 300.4225073
2024.01.07D10:48:05.000000000 AAPL 0.08121281164 501.2357649
2024.01.07D10:48:15.000000000 AAPL 3.076196646 797.6497433
2024.01.07D10:48:25.000000000 AAPL 3.043846667 159.7231755
2024.01.07D10:48:35.000000000 AAPL 8.836728022 907.4049937
2024.01.07D10:48:45.000000000 AAPL 4.56739242 554.0028634
q)trade: update time, ssize:sums size, sval:sums price*size by sym from trade
q)trade:aj[`sym`time; trade; `sym`time xasc select time+00:01:00, sym, prevssize:ssize, prevsval:sval from trade]
q)delete ssize, sval, prevssize, prevsval from update vwap1m:(sval-0^prevsval)%ssize-0^prevssize from trade
time sym price size vwap1m
------------------------------------------------------------------------
2024.01.07D10:47:15.000000000 AAPL 3.694601669 17.43440982 3.694601669
2024.01.07D10:47:25.000000000 AAPL 3.543395477 356.6417673 3.55044268
2024.01.07D10:47:35.000000000 AAPL 2.249604766 205.8210415 3.088740493
2024.01.07D10:47:45.000000000 AAPL 7.582184493 811.2821064 5.709143663
2024.01.07D10:47:55.000000000 AAPL 3.63865532 300.4225073 5.34143225
2024.01.07D10:48:05.000000000 AAPL 0.08121281164 501.2357649 4.139058615
2024.01.07D10:48:15.000000000 AAPL 3.076196646 797.6497433 3.856506386
2024.01.07D10:48:25.000000000 AAPL 3.043846667 159.7231755 3.849974943
2024.01.07D10:48:35.000000000 AAPL 8.836728022 907.4049937 5.245831104
2024.01.07D10:48:45.000000000 AAPL 4.56739242 554.0028634 4.54055501
Alternative options
aj0
aj0 works the same as aj but the 'time' column in the joined table comes from t2 (RHS) rather than t1 (LHS).
q)select from trade where time=2022.01.01D00:00:01.000000000
time sym side price
------------------------------------------------
2022.01.01D00:00:01.000000000 MST S 1011.129
q)select from quote where time within 2022.01.01D00:00:00.000000000 2022.01.01D00:00:02.000000000
time sym bidPrice askPrice
---------------------------------------------------
2022.01.01D00:00:00.000000000 MST 1045.733 1150.306
q)select from aj[`sym`time; trade; quote] where sym=`MST, time within 2022.01.01D00:00:00.000000000 2022.01.01D00:00:02.000000000
time sym side price bidPrice askPrice
-----------------------------------------------------------------
2022.01.01D00:00:01.000000000 MST S 1011.129 1045.733 1150.306
q)select from aj0[`sym`time; trade; quote] where sym=`MST, time within 2022.01.01D00:00:00.000000000 2022.01.01D00:00:02.000000000
time sym side price bidPrice askPrice
-----------------------------------------------------------------
2022.01.01D00:00:00.000000000 MST B 1130.446 1045.733 1150.306
ajf and ajf0
These work the same as aj and aj0 but they will fill any nulls from t2 (RHS) with values from t1 (LHS).
Performance considerations
Keyed tab
aj will be faster when the join columns are keys
Sorting and attributes
Which attribute to use depends on whether or not the table is in memory or on disk. Assume we are joining on sym and time.
If the table is in memory, the best performance will be seen when t2 has the grouped attribute applied to the sym column and the time column is sorted within sym.
q)meta quote
c | t f a
--------| -----
time | p s
sym | s
bidPrice| f
askPrice| f
q)meta trade
c | t f a
-----| -----
time | p s
sym | s
side | s
price| f
q)\ts aj[`sym`time; trade; quote]
3184 5252032
q)\ts aj[`sym`time; trade; update `g#sym from quote]
9 25166736
If the table is on disk, the best performance will be seen when t2 has the parted attribute applied to the sym column and the time column is sorted within sym
q)`:quote set .Q.en[`:.]quote
`:quote
q)`:trade set .Q.en[`:.]trade
`:trade
q)delete trade from `.
`.
q)delete quote from `.
`.
q)system "l ."
q)\ts aj[`sym`time; trade; quote]
3825 5252048
q)`:quote set update `p#sym from `sym xasc quote
`:quote
q)system "l ."
q)\ts aj[`sym`time; trade; quote]
0 26688
The grouped attribute supposedly gives no performance improvement to on-disk data when performing an aj, however that was not found to be the case. Please leave a comment if you can spot any mistakes:
q)quote:update askPrice:bidPrice*1.10 from `time xasc ([]time:("p"$2022.01.01) +1000*1000000*n?n;sym:raze 300000#/:`MST`APPL`FB;bidPrice:(n?1.0)+(300000?1000.0+til 200),(300000?100.0+til 20),(300000?10.0+til 2));
q)`:quote set .Q.en[`:.]quote
`:quote
q)delete quote from `.
`.
q)system "l ."
q)`:quote set update `g#sym from quote
`:quote
q)system "l ."
q)\ts aj[`sym`time; trade; quote]
0 26688
Order of search
The join match will work on the columns from left to right so ensure the order makes sense, for example the first column should be the one that is sorted/has the attribute applied:
*in memory*
q)\ts aj[`side`sym`time; trade; quote]
2568 5252032
q)\ts aj[`sym`side`time; trade; quote]
3228 5252032
q)\ts aj[`sym`side`time; trade; update `g#sym from quote]
10 25166800
q)\ts aj[`side`sym`time; trade; update `g#sym from quote]
2559 25166800
Virtual column
The virtual column (i.e. the column representing the partition, usually date) should not be used if possible as it is created on the fly which takes time and resources.
select froms and where constraints
In memory, there is no need to perform any select or where constraints on t2
On disk, you must map the tables into memory by doing a select from (and adding a where date=x for partitioned tables), however adding any further constraints causes the columns in question to be copied into memory (rather than mapped) which is actually less performant than the basic select.