LINE Developer Meetup in Fukuoka #17: Firsthand Report
Hi, I’m Oshiro from LINE Fukuoka’s Data Analysis Team, and I’ll be giving you a firsthand report on the “LINE Developer Meetup in Fukuoka #17” that took place here in Fukuoka on February 23.
This presentation was given by Hongo, a data engineer here at LINE Fukuoka. As data collection is a problem that we inevitably run into when starting to do data analysis, this session introduces how to construct web crawlers that easily let users collect data using various libraries.
Constructing a Crawling Environment that Supports Data Analysis from LINE Corporation
In the first half of the presentation, Hongo explained some basics, including the difference between the terms “crawling” and “scraping”, categorization of crawling acquisition sources, and what we need to be aware of when crawling.
Personally, I had thought that for a small-scale crawler it would be best to scrape each page as part of the crawl itself; however, Hongo recommended separating the crawler and the scraper, citing the two advantages below.
・Crawls do not need to be redone even if scraping fails.
・The crawler and scraper can be scaled independently.
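To make the separation concrete, here is a minimal sketch of the idea, not the code from the presentation. The function names `crawl` and `scrape`, the injected `fetch` callable, and the on-disk layout are all assumptions made for illustration. The crawler only saves raw HTML, and the scraper only reads what was saved, so a scraper bug can be fixed and rerun without fetching anything again.

```python
import hashlib
from html.parser import HTMLParser
from pathlib import Path
from typing import Callable, Dict, Iterable


def crawl(urls: Iterable[str], fetch: Callable[[str], str], store: Path) -> None:
    """Crawler stage: download each page once and save the raw HTML untouched.

    `fetch` is a hypothetical callable that returns the HTML for a URL.
    """
    store.mkdir(parents=True, exist_ok=True)
    for url in urls:
        # Hash the URL to get a stable, filesystem-safe file name.
        path = store / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")
        if path.exists():
            continue  # already crawled; do not hit the site again
        path.write_text(fetch(url), encoding="utf-8")


class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def scrape(store: Path) -> Dict[str, str]:
    """Scraper stage: parse saved pages only; it never touches the network."""
    results: Dict[str, str] = {}
    for path in sorted(store.glob("*.html")):
        parser = TitleParser()
        parser.feed(path.read_text(encoding="utf-8"))
        results[path.name] = parser.title.strip()
    return results
```

Because the two stages share only the directory of saved pages, each one can be run, retried, or scaled on its own schedule.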
The middle part of Hongo’s presentation covered “Scrapy” (a Python framework for crawling and scraping) as his primary tool, and he demonstrated how it can be used for crawling and scraping through a live coding session. According to Hongo, the learning curve for becoming proficient with Scrapy is quite steep. But after watching him complete a scraper in around 10 minutes in his demonstration, I got the impression that Scrapy is an easy tool to use once you've gotten used to it. Scrapy was recommended not only because of the speed with which a scraper can be implemented but also because it's stable and can handle retries and distributed processing.
Finally, Hongo described several tips that he called “crawling techniques”. One good example was his idea of fetching pages from Google's cache to lighten the access load on the services being crawled. You can take a look at the slides for further details.
This session about real-time image conversion implemented at LINE Fukuoka’s Hackathon was given by TKengo, a member of the Data Analysis Team at LINE Fukuoka.
Real-time Style Transfer and Its Future from LINE Corporation
This technical content was introduced in a previous LINE Engineering Blog post, so I’ll just give a summary and my impressions.
Since TKengo is mainly responsible for data analysis, his regular work doesn't have much to do with image conversion services. However, he saw the hackathon as an opportunity to try something that he couldn't usually do at work and decided to take on the challenge of real-time image conversion.
In his presentation, TKengo talked about going through a process of trial and error, as it's almost impossible to get everything right during prototype development. Particularly in the case of real-time image conversion, there is a tradeoff between the two focal points of data volume and calculation speed. He initially built the application to run on an Android device, assuming that this would balance the two, but the image conversion took dozens of seconds when he tried it, so he ultimately changed the architecture. By forwarding the data over RTMP, writing a filter in C, and doing the processing on a server using FFmpeg and TensorFlow, he was able to resolve the problem of slow processing time.
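To get a feel for the tradeoff between data volume and calculation speed, here is a rough back-of-the-envelope sketch. The resolution, frame rate, and pixel format below are assumptions picked for illustration; the presentation's actual figures are not given in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamSpec:
    """Hypothetical description of an uncompressed video stream."""

    width: int
    height: int
    fps: int
    bytes_per_pixel: int = 3  # assumes 8-bit RGB

    @property
    def frame_bytes(self) -> int:
        # Size of one raw frame in bytes.
        return self.width * self.height * self.bytes_per_pixel

    @property
    def raw_bytes_per_second(self) -> int:
        # Raw data volume the converter must consume every second.
        return self.frame_bytes * self.fps

    @property
    def frame_budget_ms(self) -> float:
        # Time available per frame if the conversion is to keep up.
        return 1000.0 / self.fps


def keeps_up(spec: StreamSpec, seconds_per_frame: float) -> bool:
    """True if a converter taking `seconds_per_frame` fits the frame budget."""
    return seconds_per_frame * 1000.0 <= spec.frame_budget_ms


spec = StreamSpec(width=1280, height=720, fps=30)  # assumed 720p at 30 fps
print(spec.frame_bytes)           # 2764800 bytes per frame
print(spec.raw_bytes_per_second)  # 82944000 bytes per second
print(keeps_up(spec, 20.0))       # False: 20 s per frame is far over budget
```

Under these assumed numbers, each frame has to be converted in roughly 33 ms, which shows why a conversion that takes dozens of seconds per image cannot run in real time without a change of approach.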
To be honest, I couldn’t keep up with the image conversion logic portion of the presentation, but my overall impression of the session was that TKengo was passionate about encouraging others to use hackathons as an opportunity to try out new technology.
This may be a little off-topic, but with regard to the real-time processing of streaming video, I felt that our developers really are tackling the challenges of data volume and calculation speed with the company's LINE LIVE service, which gives them the opportunity to learn more about how to improve its architecture.
For your reference, here are some slides about LINE LIVE’s architecture.
A 7 Architecture Sustaining LINE LIVE from LINE Corporation
After all of the presentations were finished, we had a mixer where we raised our glasses to celebrate everyone's participation. It was a lively party with lightning talks that LINE employees and guests could freely join.
From talking with a few participants at the mixer, I got the impression that there are many engineers who want to learn about crawling because they have no data with which to start data analysis. I also felt that while there is a need for data analysis in Fukuoka, data collection itself is one of the challenges.
Although I just joined the Data Analysis Team at LINE Fukuoka this year, I get to work in a productive environment, pursue projects just like I would in Tokyo (to which I take periodic business trips), and I can save time on my commute since the office is near my home.
Anyone interested is welcome to apply for positions at both the Tokyo and Fukuoka offices.