Indexing hashes with a set in redis
Recently I was confronting a redis performance issue involving the following:
- keys
- scan
- multiple hgetall calls
- storing a json string in the hash (wtf)
Looking at my new solution now, I was clearly doing everything wrong. What’s weird is that it felt right when I first wrote the code. Maybe that’s because I was stuck in a relational mindset instead of setting it aside and starting to think redis-only.
# The use case
I want to store a series of timed events as hashes. An event consists of:
- time: a timestamp
- name: the event name (e.g. start, end, error)
- key: what the event is about
There will be a lot of those per minute.
Every command here can be copy-pasted into redis-cli.
# The wrong use of a hash
In my first implementation this was stored with:
hset events:test1 1459440507 '{"some": "awful", "json": "string"}'
I knew hmset existed, but I thought it didn’t fit the need because the hash key could not be related to the event time (don’t ask why, sometimes your mind is stuck in a bad pattern). I was also clearly wrong about how to use a hash. For example, the next event would be added to the same hash but with a different field name (here the time):
hset events:test1 1459440510 '{"some": "awful", "json": "string"}'
So that hgetall would get every event belonging to my test1 key.
Yes, this was a really bad design choice, and clearly not good for retrieval performance: every read pulls the whole hash, and each value still has to be JSON-decoded on the client. I’ll let you imagine how messy sorting would be. Please don’t do this.
# Call a hash a hash
What I didn’t think of back then is that I could simply make the time part of the hash key, and add a set holding those hash keys.
This adds one operation on write:
hmset events:test1:1459440510 time 1459440510 name start key test1
sadd events test1:1459440510
The events set holds my hash keys (without the events: prefix).
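The same write can be sketched in Python. This is only a minimal sketch, assuming a redis-py style client is passed in as `client`; the helper names `event_hash_key` and `add_event` are made up for illustration. It uses hset with several fields, which Redis accepts since 4.0.0 as a replacement for hmset.

```python
def event_hash_key(key, timestamp):
    """Name of the hash holding one event, e.g. 'events:test1:1459440510'."""
    return f"events:{key}:{timestamp}"


def add_event(client, key, timestamp, name):
    """Store one event as its own hash and index it in the 'events' set.

    `client` is assumed to be a redis-py style client (hypothetical setup:
    redis.Redis(decode_responses=True)).
    """
    member = f"{key}:{timestamp}"
    # One hash per event, with one field per attribute instead of a JSON blob.
    client.hset(
        event_hash_key(key, timestamp),
        mapping={"time": timestamp, "name": name, "key": key},
    )
    # The set stores the hash key without the 'events:' prefix, so that an
    # 'events:*' pattern can rebuild the full hash name later.
    client.sadd("events", member)
    return member
```

With a real connection this would be called as `add_event(client, "test1", 1459440510, "start")`.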
Retrieving the data feels better now, it feels like “the redis way”, and I can just sort by a field of the hashes:
sort events by events:*->time
1) "test1:1459440510"
This returns every hash key. You can then loop over them and do a clean hgetall on each one (prefixed with events:), without performance issues and without using the keys command.
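The sort-then-loop read path could look like this in Python. Again a sketch under assumptions: `client` is a redis-py style client created with `decode_responses=True` (so set members come back as `str`), and `load_events` is a hypothetical helper name.

```python
def load_events(client):
    """Return every indexed event as a dict, ordered by its 'time' field."""
    # Same as: sort events by events:*->time
    members = client.sort("events", by="events:*->time")
    # One hgetall per event hash; no keys or scan involved.
    return [client.hgetall(f"events:{member}") for member in members]
```

Each loop iteration is one round trip, so for large result sets batching the hgetall calls in a pipeline would be the natural next step.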
I really blame myself for not reading the docs carefully enough, I would’ve seen the magic sooner:
sort events by events:*->time get events:*->key
1) "test1"
I love redis.
# Use multi and exec
One further note: when you have to chain operations, like adding a hash and then storing the hash key in a set, you may find the multi command useful:
127.0.0.1:6379> MULTI
OK
127.0.0.1:6379> hmset events:test1:1459440510 time 1459440510 name start key test1
QUEUED
127.0.0.1:6379> sadd events test1:1459440510
QUEUED
127.0.0.1:6379> EXEC
1) OK
2) (integer) 1
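From client code, the same multi/exec pair is usually reached through a pipeline. A minimal sketch, assuming a redis-py style client whose `pipeline(transaction=True)` wraps the queued commands in multi and exec; `add_event_atomically` is a hypothetical name.

```python
def add_event_atomically(client, key, timestamp, name):
    """Queue the hash write and the set insert, then run them in one transaction."""
    member = f"{key}:{timestamp}"
    # transaction=True asks the client to wrap the queued commands in MULTI ... EXEC.
    pipe = client.pipeline(transaction=True)
    pipe.hset(
        f"events:{member}",
        mapping={"time": timestamp, "name": name, "key": key},
    )
    pipe.sadd("events", member)
    # Nothing is sent until execute(); it returns one reply per queued command.
    return pipe.execute()
```

The returned list has one entry per queued command, in the order they were queued.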
Have fun.