This post is also available in the following languages. Japanese, Korean

Redis Lua script for atomic operations and cache stampede

Kagaya Hokuto2018-11-15

2016年入社。LINE GAME プラットフォームのチームにおいてサーバサイドの開発に従事。

Hello, this is Kagaya, a member of the LINE GAME Platform development team. It's nice to be back after writing a post on Lazybones (Japanese post), in 2016.

Redis and LINE GAME Platform

The LINE GAME Platform uses Redis — in-memory no SQL database — for its main database, mainly for cache. For instance, we are using it for caching the users' social data such as profile and friend list¹, for those who authenticate themselves with a LINE account or Facebook account. We run Redis with Redis Clusters; we use common clusters in a development phase, and individual clusters in the real environment.

In addition to Redis, we also use HBase for the main database. Many components use Redis for cache and HBase for permanent storage. We've presented our use case with HBase, Data Processing behind LINE Game Platform, at LINE DEVELOPER DAY in 2017, so check it out if you are interested.

A common practice in our team is using HBase with Redis. To make using these two easy for applications, we've created a client library, Aegis, for internal use. Aegis is based on an existing Java client on which we've added a feature to manage meta data such as access information. We are using Jedis as Redis client.

Redis Cluster

As I mentioned, we use Redis Clusters, in which each unit of data is stored in a hash slot. The number of hash slots is fixed to 16,384 and clusters are composed by evenly sharding the slots to nodes. Redis uses the CRC16 hash function to allocate a slot for data; divide the CRC16 value by the number of slots, 16,384. To use commands that assign multiple keys, such as MGET or MSET, on Redis Clusters, the keys must be allocated in the same slot. Only then can you run clusters like you are running a single node.

Lua scripting

Lua scripting is a feature available from Redis v2.6.0, where an embedded Lua interpreter on a Redis server processes combined, with the EVAL command or EVALSHA command. There already are a number of libraries embedded in the interpreter available for use by Lua script. One of the strong points of Lua script is that not only can you combine multiple commands, but the script itself is interpreted as a singleton, allowing atomic operation of the script. You have to take care in type conversion and in debugging. For detailed information, please check out the EVAL command page.

Here is an interface for Lua scripting on Jedis.

Object eval(String script, List<String> keys, List<String> args);
 
// or..
Object eval(String script, int keyCount, String... params);

As you can tell by looking at the method arguments above, there are two for the EVAL command, keys and args. In Lua script, you can access the arguments through the KEYS array and the ARGV array. If you are running Redis as a single node, there are not much difference between the two arguments, so you can interchange the two. But, if you are using Redis Cluster, then you must assign the key to the keys parameter, because the node allocated for executing the script depends on the KEYS value.

EVALSHA Command

Sending a whole script every single time with the EVAL command may seem like an inefficient way of using the network bandwidth, especially if the script is long. Redis caches Lua script on the server side using the SCRIPT LOAD command, and executes the script with the EVALSHAcommand, along with the value returned from caching. Our team does use Lua script but not the EVALSHA command. We compared the cost for managing nodes and keys against the cost reduction for network bandwidth; we didn't see a significant merit in using the EVALSHA command. Since our script only consists of a few commands and few lines, sending the key (Real value: SHA-1) with the EVALSHA command doesn't save cost in terms of network bandwidth. Also, to run the SCRIPT LOAD command on Redis Cluster, you have to do it on the node where the key is allocated at, forcing caching the script on all the nodes on the cluster. (If you are running a single node, then you don't need to worry about this since script ran once is cached internally.) Of course, if you have a long script, then it'll be better, to have the application side to run the SCRIPT LOAD command on each node, in advance.

Hash tags

Some of you might be wondering about this already; If we have multiple keys, where would the script be executed? As already explained, if a certain script gets multiple keys that are not stored on the same hash slot, the script cannot be run. Redis hash tags resolves this limitation. If you wrap a part of a key with curly brackets ({}), the part wrapped is recognized as a tag, and the keys with the same tag are allocated to the same slot. The following key examples are all stored in the same slot.

{LINEGAME}:user
{LINEGAME}:friends
{LINEGAME}:blacklist

So, if your script gets multiple keys, use hash tags.

Now, let me share two use cases of using Lua script on the LINE GAME Platform.

Redis Lua script use case 1: Atomic operation

In the beginning, I've mentioned social friend data is managed by Redis. Every now and then signals are sent from the application server to Kafka, and a process operates on this data and stores it on Redis. The following is an example of commands composing a friend list.¹

DEL friends:user1
RPUSH friends:user1 user2 user3 user4 ...
EXPIRE friends:user1 YYYYY

As you see, the code above removes caches, adds a friend list in a Redis list and configures the TTL key. One day, we realized that some friend lists had duplications of friends. Interestingly, the number of the friends was double the original number of friends, which made us suspect if the RPUSH command had been called twice. We found out that the request to compose a friend list had been sent to the server twice, at exactly the same time. The Kafka v0.9.x series, which we are using, has the "At least once" policy, allowing a single message to be delivered twice. Our platform had multiple triggers for creating messages, the probability for delivering the same message multiple times was high anyway. Something like the following was likely to happen anytime. We needed to handle commands atomically.

DEL (by message A)
↓
DEL (by message B)
↓
RPUSH (by message A)
↓
RPUSH (by message B)

There are a number of ways to send requests to a Redis server in bulk:

Use commands that handle multiple keys, like MSET or MGET.
Use pipeline.
Use transactions (MULTI or EXEC command)
Use Lua script.

To combine multiple commands, we are left with the three options excluding the first one. The second option, using pipeline has a slightly different concept compared to the other two, because the purpose of using pipeline is solely to optimize network bandwidth usage. Using pipeline does not guarantee atomic operation. To process commands atomically, you either have to use transactions or Lua script.

Suppose we are using Redis Clusters instead of a single node. If we use transactions (MULTI or EXEC command), then a Redis client cannot determine which node the key to change is in.²If we deliver the key for the transaction to the Redis client in advance, then may be it will be possible. But considering the effects of resharding, implementing to determine the node in advance is difficult, and there is no implementation of this in Jedis. But, if we use Lua script, then we get keys explicitly when running a command, allowing the server side to redirect node calculation to the target server. This is why we decided to go with Lua script.

The following is an example of handling commands atomically on Redis.

String script = "redis.call('del', KEYS[1]);" +
                "redis.call('rpush', KEYS[1], unpack(ARGV, 2));" +
                "redis.call('expire', KEYS[1], ARGV[1]);";
List<String> params = new ArrayList<>();
params.add(String.valueOf(TTL)); // "TTL" is a predefined constant for TTL.
params.addAll(getFriends()); // getFriends() returns users' ID list.
jedisCluster.eval(script, ImmutableList.of(key), params);

We use the first parameter of the eval() function, params for TTL and the rest of the parameters as arguments for the RPUSH command. Here is what the official document says about transactions. Perhaps we might see transactions deprecated with more Redis Clusters taking part in action.

However it is not impossible that in a non immediate future we'll see that the whole user base is just using scripts. If this happens we may deprecate and finally remove transactions. - Redis website

Redis Lua script use case 2: Cache stampede

There are many ways for applications to make cache data. For instance, if there is cached data for a user request, we can use cache, and if not, we could reference master data and respond and at the same time cache the data. With such background, suppose we get an enormous number of simultaneous requests and consume all the cache we have. In such occasion, there will be a tremendous load on master data I/O and also on CPU resource for recalculating cache on the application side. There are a few solutions to prevent such event including Probablistic Early Recomputation(PER).³ I'd like us to see a use case on using this solution on Redis Cluster.

PER was originally announced at VLDB, an international academic conference, and the thesis is available on the Internet. How PER works is this; before a cache entry expires, a "volunteer" request recalculates cache with a certain probability. There is a similar solution where two sets of identical cache are prepared, with different TTLs, but PER is better in a way where we can reduce the size by one entry. The following is pseudocode of the PER algorithm.

def get(key, ttl, recomputer) {
    beta = 1.0 # typically
    data, remained_ttl, computation_time = read_from_cache(key)
    time = now()
    if (!data or time - delta * beta * log(random(0, 1)) >= time + remained_ttl) {
        start = now()
        data = recomputer.recompute(key)
        end = now()
        computation_time = end - start
        write_to_cache(key, data, computation_time, ttl)
    }
    return data;
}

The reason for using this particular distribution probability and how to determine beta are also interesting, but are slightly off the scope for this post. If you are interested, please see the thesis. Getting back to what we were talking about, the pseudocode above can be written in Java and Jedis as follows.

private static final int TTL = 60;
private static final String SET_SCRIPT = "redis.call('mset', KEYS[1], ARGV[1], KEYS[2], ARGV[2]);" +
                                         "redis.call('expire', KEYS[1], ARGV[3]);" +
                                         "redis.call('expire', KEYS[2], ARGV[3])";
private static final String GET_SCRIPT = "return {redis.call('mget', KEYS[1], KEYS[2])," +
                                         "redis.call('ttl', KEYS[1])}";
  
public String get(String key, Function<String, String> recomputer) {
    List<Object> ret = (List<Object>) jedisCluster.eval(GET_SCRIPT, 2, key, getDeltaKey(key));
    List<String> valueList = (List<String>) ret.get(0);
    String data = valueList.get(0);
    String deltaStr = valueList.get(1);
    if (data == null ||
        - Long.valueOf(deltaStr) * beta * Math.log(random.nextDouble()) >= (long) ret.get(1)) {
        long start = System.currentTimeMillis();
        data = recomputer.apply(key);
        long computationTime = (System.currentTimeMillis() - start) * 1000;
        jedisCluster.eval(SET_SCRIPT, 2, key, getDeltaKey(key), data, computationTime, ttl);
   }
   return data;
}
 
public String getDeltaKey(key) {
    // return key for storing delta, that is, computation time.
}
 
 
[..]

When reading data, we call the MGETcommand to get an entry for actual data and an entry for calculation, and get the remaining TTL with the TTL command. When writing data, we use the MSET and EXPIRE commands to save data and calculation time respectively. In addition, we can consider the saving the data and the calculation time in a Redis Hash, in which case we'd use the commands HMSET and HMGET.

The new LINE GAME Platform we introduced in LINE DEVELOPER DAY 2017 also adapted PER. As a result of adapting PER in our performance test, we saw the response time improved three times when all cache expires. (Of course, response time is affected by the size of application process or the size of data retrieved from a main database.) I wanted to talk about utilizing Lua script here so, perhaps I will get a chance to write on improving performance later.

Lastly

I've introduced how our team used Lua script with Redis Clusters for the LINE GAME Platform. Redis Lua Script supports various use cases. For more information, see the official Redis document or the source code.

1: A 'friend list' sounds like a list edited by a user, but in this article, it is the actual list returned by an authentication service like LINE or Facebook. Hence, unless we do a diff, we can't prevent duplication using a Redis SET.↩

2: We can get the hash slot that contains the key ourselves, and then calculate to see which slot is served on which node by using the CLUSTER NODES command.↩

3: For other options and pros and cons of each option, check the slides of the Preventing Cache Stampede with Redis and XFetch session of RedisConf 2017.↩

Blog