Tag Archives: Hadoop

How We Migrated a Hadoop Cluster with No Downtime

Hello, this is Ono from the Data Platform department. As presented at last year's LINE DEVELOPER DAY, the Data Platform department operates a large-scale Hadoop cluster.


In this article, I will describe how we migrated the Hadoop cluster at that time and what problems we ran into.

Comprehensive Security for Hadoop

(This is the 8th article of LINE Advent Calendar 2016)

Hello everyone, this is Neil Tu from Data Labs. I am in charge of Hadoop architecture at LINE Corp. I build and manage Hadoop clusters and their ecosystems, and provide a highly available, high-performance platform for the engineers and data analysts in our group.

Today's topic is “Comprehensive Security for Hadoop”.

Abstract

Hadoop has become a popular platform for data storage, data analysis, reporting, and distributed computation. By default, a Hadoop cluster is an open platform that gives users the resources and HDFS capacity they need to run queries. However, a Hadoop cluster comprises many different components, such as HDFS, YARN, and Hive, each with its own administration model. To modify access permissions, an administrator has to go into each component separately. This is hard to manage, so a central management tool, or perhaps more accurately a framework, is necessary. There are currently a few unified open source frameworks for this: Ranger for Hortonworks and Sentry for Cloudera. In addition, Ambari, HDFS, and YARN each provide a UI for tracking job status and job history. You may not want this cluster information to be visible to everyone, in which case you need a tool that handles user authentication. Knox meets this requirement. You can regard Knox as a reverse proxy that provides a single REST API access point for authentication and access to Hadoop services.
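To make the "single access point" idea concrete, here is a minimal Python sketch of how a client might address WebHDFS through a Knox gateway instead of contacting the NameNode directly. The host name `knox.example.com` and the helper `knox_webhdfs_url` are hypothetical. Port `8443`, the `gateway` path, and the `default` topology name follow a typical Knox setup, but they are assumptions here and depend on how the gateway is deployed.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host, hdfs_path, op="LISTSTATUS",
                     port=8443, gateway_path="gateway", topology="default"):
    """Build a WebHDFS URL that is routed through a Knox gateway.

    All defaults are assumptions for illustration; check the values
    used by your own Knox deployment.
    """
    # Every proxied service hangs off the same gateway base URL.
    base = f"https://{gateway_host}:{port}/{gateway_path}/{topology}/webhdfs/v1"
    # Percent-encode the HDFS path but keep the slashes.
    return f"{base}{quote(hdfs_path)}?{urlencode({'op': op})}"


# Hypothetical gateway host; no network request is made here.
print(knox_webhdfs_url("knox.example.com", "/tmp"))
```

The client only ever sees the gateway address. Knox authenticates the request and then forwards it to the internal service, so the hosts and ports of the cluster itself do not need to be exposed.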