Attributes
What are attributes?
Attributes are metadata that you can apply to a list to tell kdb+/q that its values are arranged in a particular way. With this knowledge, kdb+/q can interact with the list more efficiently. Consider the following two scenarios:
There is a list of 10 integers in unknown order. You must find the index at which the value '14' appears.
There is a list of 10 integers arranged in ascending order. You must find the index at which the value '14' appears.
It should be obvious that, on average, the former will take more time than the latter. In the first scenario, there is little choice but to pick a starting point and work through the list one item at a time. In the second scenario, starting in the middle and checking whether that number is greater or less than 14 means that half of the remaining list can be discarded at each step.
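As a rough illustration of the second scenario in q itself, the bin keyword performs a binary search on an ascending list, while ? (find) returns the index of the first match. The list values below are made up for the example.

```q
q)l:3 8 14 20 27 31 42 55 60 71   / ten integers in ascending order (made-up values)
q)l?14                            / find: index of the first item equal to 14
2
q)l bin 14                        / bin: binary search, index of the last item <= 14
2
```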
We have gained some information about how the list is arranged and this has allowed us to more efficiently interact with it. The concept is the same in kdb+/q with attributes. Indeed the example above is an example of using the 'sorted' attribute. The attributes available are:
Sorted
As above, this attribute signifies that the list is in ascending sorted order
Unique
This attribute signifies that each value in the list is unique (occurs only once)
Parted
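This attribute signifies that the list is split into blocks of identical values: all occurrences of a given value sit next to one another, but the blocks themselves need not be in sorted order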
Grouped
Unlike the others, this attribute signifies that the list isn't arranged in any particular way but that we would still like to see more efficient interactions with it (more on this later)
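To make the four descriptions concrete, here is a small sketch of a list that satisfies each attribute. The values and the names s, u, p and g are illustrative only; the `s# style syntax is covered in the next section, and attr is the built-in that reports which attribute a list carries.

```q
q)s:`s#1 2 2 5 9      / sorted: ascending, repeats allowed
q)u:`u#4 1 7 3        / unique: no repeats, any order
q)p:`p#3 3 1 1 1 2    / parted: each distinct value sits in one contiguous block
q)g:`g#2 1 2 3 1      / grouped: any order, repeats allowed
q)attr each (s;u;p;g) / attr reports the attribute carried by each list
`s`u`p`g
```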
Applying and using attributes
How to apply attributes
To apply an attribute to a list, or to remove one from it, the syntax is as follows:
/ Method 1
q)l:1 2 3 4 5 6
q)`s#l /sorted
`s#1 2 3 4 5 6
q)`u#l /unique
`u#1 2 3 4 5 6
q)`p#l /parted
`p#1 2 3 4 5 6
q)`g#l /grouped
`g#1 2 3 4 5 6
/ Method 2
q)@[`.; `l; `s#]
`.
q)l
`s#1 2 3 4 5 6
/ Remove
q)l
`g#1 2 3 4 5 6
q)`#l /removed
1 2 3 4 5 6
Note that once an attribute has been applied, the list is displayed with a prefix such as `g# to signify this.
If the attribute denotes a characteristic of the list, such as it being sorted, then:
The list must have that characteristic, otherwise the operation will fail.
q)l:6 1 2 3 4 5
q)`s#l /sorted
's-fail
[0] `s#l /sorted
Adding a value to the list that does not conform to those characteristics will cause the attribute to be lost.
q)l:`s#1 2 3 4 5 /sorted
q)l,:3
q)l /attribute lost
1 2 3 4 5 3
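A short sketch of the other side of this rule, using attr to inspect the list after each step: appending a value that keeps the list ascending leaves the sorted attribute in place, and once it has been lost it can be restored by re-sorting. Behaviour on append differs between attributes, so it is worth checking on your own version of kdb+.

```q
q)l:`s#1 2 3 4 5
q)l,:6               / 6 keeps the list ascending
q)attr l
`s
q)l,:3               / 3 breaks the ordering
q)attr l
`
q)l:asc l            / re-sorting restores the attribute; asc sets `s# on its result
q)attr l
`s
```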
Why use attributes?
Certain operations are more efficient on lists with attributes applied. When a list has the sorted attribute applied, kdb+/q will use binary search for '?' search operations. This can provide a drastic increase in search performance:
q)l:10000000?10000
q)lSorted:`s#asc l
q)\t:100 l?til 1000
2875
q)\t:100 lSorted?til 1000
22
/ What else runs faster?
q)\t:100 l>1000
907
q)\t:100 lSorted>1000
35
Most of the time, attributes are applied to columns in tables to allow more efficient querying. As the columns of a table are just normal lists, an attribute can be applied to a table column like so:
q)t:([]sym:1000000?`2; price:1000000?100)
q)tSorted:update `s#price from `price xasc t
Running meta on the table now shows a value in the 'a' column for each column that has an attribute applied:
q)meta tSorted
c    | t f a
-----| -----
sym  | s    
price| j   s
And now that the column has the sorted attribute applied, as expected the lookup on that column is faster:
q)\t:100 select count i from t where price>25
173
q)\t:100 select count i from tSorted where price>25
97
When to use attributes
The logical conclusion at this point would be to use attributes all the time on as many columns as possible. However there are some drawbacks:
Storage overhead
The size of the column will increase when you apply the unique, grouped or parted attribute to it. This is due to the creation of an underlying data structure that allows for the quicker lookups when these attributes are applied. From smallest to largest storage cost: parted, unique, grouped.
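One way to get a feel for this overhead is to compare memory usage before and after applying an attribute. The sketch below assumes a fresh in-memory session and uses .Q.w, the utility that reports memory statistics; the names l, before and g are illustrative. The figure it produces includes the copy of the data as well as the index, and q allocates memory in blocks, so treat it as approximate.

```q
q)l:1000000?1000         / a million longs with many repeated values
q)before:.Q.w[]`used     / bytes in use before applying the attribute
q)g:`g#l                 / grouped copy of the list, with its index
q)(.Q.w[]`used)-before   / bytes added: the copied data plus the index
```

Repeating the same steps with `u# or `p# on suitably arranged data gives a comparison between attributes for your own data.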
Added complexity
Some attributes will be lost if a value is appended to the list that does not conform to the attribute's characteristics. For example, in the case of sorted, the entire list would have to be re-sorted and the attribute reapplied after each such update, which is simply not feasible if you are receiving large amounts of real-time data.
Not guaranteed to increase performance
First, attributes are unlikely to help on small tables; as a rule of thumb, they make little difference below about 1 million records.
Second, the attribute applied must match the usage profile of the list. For example, if your list is the 'sym' column of an HDB table and your queries always filter on 'sym' as the second where-clause constraint after date, then applying the parted attribute to the 'sym' column will give you a performance increase, whereas applying it to the 'side' column will not.
Therefore, attributes should only be used when they actually provide a performance improvement, and testing is the way to determine this.
The attribute you will see most often in historical databases is 'parted' (on sym, potentially with the grouped attribute on other columns). For in-memory tables it is likely to be 'grouped', for example in the Tickerplant.
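A minimal in-memory sketch of the two patterns follows. The table and the names tGrouped and tParted are hypothetical, and a real historical database would apply the parted attribute to the column on disk rather than to an in-memory copy. Note that grouped can be applied to the rows as they are, whereas parted requires the rows to be arranged first so that each sym forms a single block.

```q
q)t:([]sym:100000?`3; price:100000?100f)   / hypothetical trade-like table
q)tGrouped:update `g#sym from t            / grouped: row order unchanged
q)tParted:update `p#sym from `sym xasc t   / parted: sort by sym first so each sym is one block
q)attr tGrouped`sym
`g
q)attr tParted`sym
`p
```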