Hello, I am Jongyeol Choi, a member of the Redis team at LINE. LINE’s services use various storage systems based on their needs; our messaging service relies on open source storage systems such as Redis, HBase, and Kafka. As a member of the Redis team, I participated in the RedisConf18 conference in San Francisco, U.S.A., as a speaker on April 26th. The topic of my session was “Redis at LINE, 25 billion messages per day”. I’d like to take this opportunity to share the preparation process, the conference itself, and the responses I got for my session.
What is RedisConf?
RedisConf is a conference held by RedisLabs, the company that makes Redise, the enterprise version of Redis, and where the creator of Redis, Salvatore Sanfilippo, works. RedisLabs has been holding the official annual Redis conference, named “RedisConf”, to expand its market and to let its users, both companies and services, share their experience and knowledge.
As a member of the Redis team at LINE, I attended last year’s Redis conference as well, but only as an attendee. At RedisConf17, my interest was in learning how to use Redis better and how others were using it. It was a good opportunity to learn a lot about how other companies used Redis, in so many different ways, and to pick up their know-how. While attending the sessions, a thought struck me: LINE could share its experience too. Why not? After all, the number of users and the traffic at LINE were considerably higher than in any other use case shared at the conference. When you are involved in developing the server side, you soon realize that the amount and variety of issues are proportional to the number of users and the traffic the server has to process. Something told me that people would definitely be interested in our story.
Call for papers and preparation
Many IT conferences in the United States are held between April and June, and RedisConf is no exception; it usually takes place sometime between April and May. It was December last year when I saw the call for papers for RedisConf18. A call for papers is an opening for people to apply as speakers. Not every applicant gets through; I am not sure about the exact acceptance rate, but only some get accepted after screening.
The call for papers was due in December 2017. Content-wise, I wasn’t too worried; my guess was that many engineers would be interested in learning about LINE’s use cases. Having to present in English was what concerned me. Although my spoken English had plenty of room for improvement, I decided to apply, confident that hard work would pay off. After deciding to share the use cases of Redis in LINE’s messaging service and the lessons we learned, I wrote the abstract of my presentation. Along with the abstract, the application form asked for other things, including a diversity section on the applicant’s gender and ethnicity, which I found quite impressive.
Good news came through at the end of December: my application was accepted. It was a joyous moment, but at the same time it marked the commencement of a long journey of preparation. Each session was given a total of 45 minutes, including five to ten minutes of Q&A, which left me 35–40 minutes to present my topic. My presentation was going to be on the 25th or 26th of April, and I was only able to get started on preparation at the end of January. Whenever time permitted, I worked on the slides, rehearsed in front of my team members, and restlessly revised my material. In addition, the technical writing team at LINE helped paraphrase my script to make it sound natural. The script was prepared to fit the time slot I was given, and every spare minute between my arrival in San Francisco and the day before my talk was dedicated to recording, checking, and rehearsing. Thankfully, since my talk was set for the second day of the conference, I had time to look around and check out the room allocated for my session.
RedisConf18 was held for two days, from April 25th to 26th. Approximately 1,000 people registered for the event, and around 60 sessions were held across six conference rooms. Here is a picture of the venue, Pier 27, with “RedisConf18” flags.
After getting a few giveaways, I had a chance to look around the Expo hall on the first floor. The conference provided an app for both iOS and Android, with which you could check the sessions and general information of the conference. As you can see, my session was also on the schedule, at 14:00.
There was a table set up for free stickers, and after checking that I was allowed to, I put out the LINE stickers I had brought with me. People seemed very fond of Brown, “the cute bear” as they put it, and the stickers were gone in just a couple of hours.
The founder of Redis, Salvatore Sanfilippo, opened the conference with a keynote introducing Redis 5.0, which had yet to be released. He announced that it would be released soon, and went over Streams, one of its main features. After Mr. Sanfilippo’s talk, other speakers shared about Redise. Joel Spolsky, the well-known author of Joel on Software and CEO of Stack Overflow, was also on stage as an interviewee.
Day one sessions
After the keynote, each of the six conference rooms was filled with sessions. The first one I chose to attend was Salvatore Sanfilippo’s, on “The Design of Redis Streams”. It went into more detail on the Redis Streams feature introduced in the keynote. Many companies and services use Redis for queuing with Redis Lists, using the BLPOP and RPUSH commands. However, using a Redis List as a queue has its limits, which is where Kafka comes into the picture; LINE uses Redis or Kafka for similar tasks. Redis 5.0 introduces Streams, a new data type with new commands, to better support queuing. Considering that Redis is faster than most other databases around, I expect Streams to be quite handy.
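To illustrate the limitation mentioned above, here is a toy Python model of the two queuing patterns. The class and method names mirror the Redis commands (RPUSH/BLPOP and XADD/XREADGROUP/XACK), but these are illustrative stand-ins I wrote for this post, not a real Redis client: with a List, a popped entry is gone even if the consumer crashes mid-work, while a Stream consumer group keeps unacknowledged entries in a pending list that can be reclaimed.

```python
from collections import deque

class ListQueue:
    """RPUSH/BLPOP pattern: a pop removes the entry immediately."""
    def __init__(self):
        self.items = deque()

    def rpush(self, item):
        self.items.append(item)

    def blpop(self):
        # Once popped, the entry is gone -- if the consumer crashes
        # before finishing its work, the message is lost.
        return self.items.popleft() if self.items else None

class StreamQueue:
    """XADD/XREADGROUP/XACK pattern: entries stay pending until acked."""
    def __init__(self):
        self.entries = {}    # id -> payload
        self.pending = set() # delivered but not yet acknowledged
        self.next_id = 0

    def xadd(self, payload):
        entry_id = self.next_id
        self.entries[entry_id] = payload
        self.next_id += 1
        return entry_id

    def xreadgroup(self):
        # Deliver the oldest entry not already claimed by a consumer.
        for entry_id in sorted(self.entries):
            if entry_id not in self.pending:
                self.pending.add(entry_id)
                return entry_id, self.entries[entry_id]
        return None

    def xack(self, entry_id):
        # Only an explicit ack removes the entry for good.
        self.pending.discard(entry_id)
        del self.entries[entry_id]

    def xpending(self):
        return sorted(self.pending)

lq = ListQueue()
lq.rpush("m1")
lq.blpop()            # consumer crashes here -> "m1" is lost for good

sq = StreamQueue()
eid = sq.xadd("m1")
sq.xreadgroup()       # consumer crashes here -> entry stays pending
print(sq.xpending())  # a recovering consumer can reclaim and retry it
```

In real Redis Streams, reclaiming a crashed consumer's pending entries is done with XCLAIM; the sketch only shows that the entry survives the crash at all, which a List cannot guarantee.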
After the morning sessions, I found food trucks lined up for us to enjoy lunch.
After the lunch break, I went to listen to Slack’s talk, “Scaling Slack’s Job Queue Using Kafka and Redis”, and Lyft’s “Redis at Lyft: 2,000 Instances and Beyond”. Slack, a messaging service especially popular in workplaces, shared its experience of using Redis as a queue and then moving to Kafka alongside Redis to overcome limitations and reliability issues. LINE and Slack operate in a similar industry, so their use cases had both similarities to and differences from ours.
Lyft’s presenter was Daniel Hochman, who had also presented a session at RedisConf17. Lyft leads an open source project, Envoy Proxy, and its main business is on-demand transportation: it matches drivers and customers, and that matching is time sensitive. Due to the nature of their service, they have put more effort into response time than into reliability.
I did attend sessions other than these two, but I was preoccupied with my presentation scheduled for the next day, which kept me from concentrating. There was a networking session with beer and some food, but again, with my talk on my mind, I had to return to my hotel and practice.
The day, D-Day, came. This was it. There were sessions in the morning, but I kept rehearsing in my hotel room. Like the day before, sessions were held in all six conference rooms at the same time. As you can see below, my session was set to be held in Van Ness at 2 p.m.
Finally, the time came. I got myself prepared, geared up with a microphone, and waited for the audience. Thankfully, many people came. Here is a picture of me, a little nervous before my talk.
And it began. Let me briefly fill you in on my talk. LINE uses Redis, HBase, and Kafka in its messaging service. We use more than 10,000 Redis nodes to deliver more than 25 billion messages every day. Many companies opt for a proxy for Redis clustering, but a messaging service is time-critical, so instead of using a proxy, LINE uses in-house Redis clustering, with the Redis client side doing the sharding. Sharding information is stored in ZooKeeper, and applications synchronize with the information stored there. We have a Cluster Manager Server module to handle cases such as a master server of the Redis cluster crashing, triggering automatic failover, or hosts being changed. And, to monitor the vast number of Redis nodes on a per-second basis, we built the Redis Cluster Monitor module with Scala and Akka. In addition, I shared the issues we encountered in bringing in the “official” Redis Cluster introduced in Redis 3.0, and our experience adopting Lettuce, an asynchronous Redis client.
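The core idea of client-side sharding can be sketched in a few lines of Python. Everything here is an illustrative assumption on my part rather than LINE’s actual code: the shard list (normally synchronized from ZooKeeper) is hard-coded, and the hash scheme is a simple modulo over an MD5 digest. The point is only that each application server can map a key to a Redis node by itself, with no proxy hop on the request path.

```python
import hashlib

# Stand-in for the shard list an application would sync from ZooKeeper.
SHARDS = ["redis-node-01:6379", "redis-node-02:6379", "redis-node-03:6379"]

def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard by hashing the key, so no proxy hop is needed."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return shards[int(digest, 16) % len(shards)]

# The mapping is deterministic: every client resolves the same key
# to the same node, as long as they share the same shard list.
print(shard_for("user:12345"))
```

In a real deployment the shard list changes over time (failover, host replacement), which is exactly why a coordination store like ZooKeeper and a manager module are needed to keep every client’s view consistent.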
My slides and a recording of my talk have been shared below. I hope you find them useful.
That was the fastest 45 minutes of my life. During the Q&A, I got many questions from developers from San Francisco and other countries: people with similar concerns, speakers of other sessions, and open source committers. We had to extend the Q&A, and even after the designated time, with the camera and microphone turned off, many people remained with questions and more approached me. Since we had to vacate the room for the next session, we went outside and continued the Q&A there.
Here are some of the questions I got and the answers to them.
Q. In regards to dual writing and read HA, how long does it take when you retry writing via Kafka?
Currently, retrying the write operation uses a local Redis queue on the same box as the application server. We are considering moving this to Kafka. Using Kafka would take longer than using Redis, which takes a few hundred microseconds, but the whole process would still complete in a few milliseconds. Delays in writing to HBase are acceptable because of HBase’s versioning feature.
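The dual-write-with-retry pattern from this answer can be sketched as follows. This is a hedged illustration, not our production code: the `redis_write` and `hbase_write` parameters are stand-ins for real storage clients, and a plain in-process deque stands in for the local retry queue.

```python
from collections import deque

retry_queue = deque()  # locally queued HBase writes that failed

def write_message(key, value, redis_write, hbase_write):
    """Write to the fast store first, then the durable one."""
    redis_write(key, value)           # primary, low-latency write
    try:
        hbase_write(key, value)       # durable write
    except Exception:
        # Park the failed write for later replay. A delayed HBase
        # write is tolerable because HBase versioning orders values.
        retry_queue.append((key, value))

def replay_retries(hbase_write):
    """Drain the local queue once the durable store is healthy again."""
    while retry_queue:
        key, value = retry_queue.popleft()
        hbase_write(key, value)
```

A production version would bound the queue, back off between replays, and give up after a retry budget; moving the queue from a local Redis to Kafka, as mentioned above, mainly trades a little latency for durability of the queue itself.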
Q. For automatic bursting detection, can you detect Redis operation bursting if it happens really quickly?
We detect bursting from the result of the INFO command, which we run every second. We send the MONITOR command afterwards, which means that if bursting happens really quickly, there are times when we may not detect it. However, we are likely to catch a burst past its mid-point, which still lets us identify which part of the code is involved. In addition, if CPU usage is at 100%, running the MONITOR command can make things worse, so we do not attempt to detect bursting automatically once CPU usage goes over a certain level.
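The decision logic in this answer can be sketched roughly like this. The thresholds are made-up numbers for illustration; the real signal is the difference between successive samples of a command counter such as `total_commands_processed` from INFO, combined with a CPU guard before falling through to MONITOR.

```python
# Illustrative thresholds -- not the values we actually use.
OPS_BURST_THRESHOLD = 50_000   # commands/sec considered "bursting"
CPU_CEILING = 0.9              # skip MONITOR above this CPU usage

def should_run_monitor(prev_total, curr_total, interval_sec, cpu_usage):
    """Decide whether to follow up a per-second INFO sample with MONITOR."""
    ops_per_sec = (curr_total - prev_total) / interval_sec
    if ops_per_sec < OPS_BURST_THRESHOLD:
        return False   # no burst visible in this sample
    if cpu_usage >= CPU_CEILING:
        return False   # MONITOR itself would make saturation worse
    return True

# A jump of 80,000 commands in one second with moderate CPU triggers MONITOR.
print(should_run_monitor(1_000_000, 1_080_000, 1.0, 0.5))
```

Because the check runs once per second, a burst that starts and ends inside one interval can slip through, which is exactly the limitation described in the answer above.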
Q. In regards to hot keys and replicated cluster, I understand about the increase in read performance. Can we enhance write performance?
There is only so much a single server can handle, and enhancing write performance for the same key is difficult, so we are left to redesign the business logic to distribute the data for that key across many keys.
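One common shape of that redesign is key splitting, sketched below. This is a generic technique rather than LINE’s specific implementation, and the suffix scheme is my own illustrative assumption: writers scatter across N subkeys (each of which may land on a different shard), while readers must aggregate over all of them.

```python
import random

NUM_SPLITS = 8  # illustrative fan-out factor

def write_key(base_key: str) -> str:
    """Pick one of N subkeys at random for a write, spreading load."""
    return f"{base_key}:{random.randrange(NUM_SPLITS)}"

def read_keys(base_key: str):
    """A reader must gather and merge values from all subkeys."""
    return [f"{base_key}:{i}" for i in range(NUM_SPLITS)]

print(write_key("counter:event"))
```

The trade-off is explicit: write throughput scales with the number of subkeys, but reads become N fetches plus a merge step, so the technique fits write-heavy, aggregate-on-read data such as counters.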
Q. Why is memory used more with Redis Cluster 3.2?
Redis Cluster 3.2 uses a sorted set for mapping keys to slot numbers, which can increase memory usage when there are many keys in the cluster; we experienced double the memory usage compared to standalone Redis. Redis Cluster 4.0 uses a radix tree for this mapping, so its memory usage is lower than Redis Cluster 3.2’s, but still higher than standalone Redis.
Q. If you perform the write operation with the Asynchronous Task Processor, wouldn’t you get inconsistency between Redis and HBase?
There can be momentary inconsistency, but reads are mostly served by Redis rather than HBase. Also, since ours is a messaging service, availability takes priority over momentary inconsistency.
After the last sessions on day two, there was time set aside for developers to get to know each other. Not many people left right away; most stayed to chat. I couldn’t capture it all in a picture, but there were spaces decorated like restaurants, and some people enjoyed beer outside, with the sea right in front of them.
In the evening, there was a standing party just for the speakers, which gave me an opportunity to chat with other speakers, talk shop, introduce LINE, and pick up new ideas for my work. And with that, the long journey came to an end for me.
I have to admit, sharing thoughts and know-how, and having open discussions with engineers from all around the world, was fun. There was a clear difference, in so many ways, between attending a global conference merely as an attendee and attending as a speaker. Having developers interested in my work, and receiving their questions and advice, was an encouraging experience for me.
No pain, no gain. Although the preparation was a rather painful journey that seemed like it would never end, it was rewarding in the end. One of my biggest worries was my English pronunciation. If I could go back, I’d tell myself not to worry. The backgrounds of the attendees were so diverse that, although we were all speaking the same language, English, our pronunciations were all different. My pronunciation probably wasn’t as good as I had hoped, but my audience was still very engaged in the session. I believe I owe that to the actual content: LINE’s use cases and experiences.
So, in the end, I guess it all comes down to technology. Our team is making an effort to share more content with you out there. We are also seeking storage engineers to join our voyage, in both Japan and Korea. For those who are interested, please check the links below. Thanks for taking the time to read my post.