Setting up a Storm cluster on Azure

This document describes how to configure a Storm cluster on a set of Ubuntu servers. It covers the process on the Azure platform; apart from a few Azure specifics, in-house servers are configured similarly. You might wonder why we do it like this. Well, I wanted the experience of setting up a cluster in the real world. Azure provides a Storm cluster with a few clicks, but that's another story. So if you are planning to use Storm on Azure, just take the shortcut and select HDInsight from the Azure management console.

To install a Storm cluster, perform the following steps:

  • Create a VM on Azure
  • Configure SSH
  • Install Java
  • Install Zookeeper
  • Install Storm
  • Configure
    • Configure Nimbus
    • Configure UI
    • Configure Supervisors (worker nodes)

All Storm nodes are identical on the software side and differ only in configuration. Once you have installed the Storm software, create a template from the VM. This will save you quite a bit of work.

Create the VMs

Create an Azure Linux node. I selected a blank Ubuntu Server 15.04 image.

When creating the server, I took all defaults and selected a password.

Easy SSH login

I have a habit (and I think it's a good one) of generating passwords, more than 25 characters if possible. But remembering those is hard, so whenever I have access to a shell I create a private/public key pair to connect with SSH. The Geekstuff has a nice tutorial on this.
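A minimal sketch of the key-pair step, assuming OpenSSH's ssh-keygen on the machine you connect from. The helper name ensure_ssh_key and its default key path are illustrative; the empty default passphrase is only for brevity, so pass one as the second argument if you want the private key itself protected.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an RSA key pair unless one already exists.
# $1 = private key path (default ~/.ssh/id_rsa), $2 = passphrase (default empty)
ensure_ssh_key() {
  local key_file="${1:-$HOME/.ssh/id_rsa}"
  local passphrase="${2:-}"
  mkdir -p "$(dirname "$key_file")"
  # ssh refuses keys in directories that are too permissive
  chmod 700 "$(dirname "$key_file")"
  if [ ! -f "$key_file" ]; then
    # -q: quiet, -N: passphrase, -f: output file (public key gets .pub)
    ssh-keygen -q -t rsa -b 4096 -N "$passphrase" -f "$key_file"
  fi
}
```

After running ensure_ssh_key, the public key can be copied to the VM with ssh-copy-id, for example `ssh-copy-id azureuser@your-vm.cloudapp.net` (user and host name are placeholders). Running the helper twice leaves an existing key untouched.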

If you are using Windows :-( there is something extra you need to do, because PuTTY cannot use an OpenSSH private key directly: http://meinit.nl/using-your-openssh-private-key-in-putty

Installing Java

The installation of Java is straightforward (see http://tecadmin.net/install-oracle-java-8-jdk-8-ubuntu-via-ppa/).

Take the following steps:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer 
sudo apt-get install oracle-java8-set-default

Installing Zookeeper

sudo apt-get --yes install zookeeper zookeeperd

Start the server with:

sudo service zookeeper start
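To check that ZooKeeper is actually answering, and not just that the service started, you can send it the ruok four-letter command, to which a healthy server replies imok. This is a sketch assuming nc (netcat) is installed, the four-letter commands are enabled, and ZooKeeper listens on its default client port 2181; zk_ruok is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if ZooKeeper answers "imok" to "ruok".
# $1 = host (default localhost), $2 = client port (default 2181)
zk_ruok() {
  local host="${1:-localhost}" port="${2:-2181}" reply
  # -w 2: give up after two seconds instead of hanging on a dead host
  reply="$(printf 'ruok' | nc -w 2 "$host" "$port" 2>/dev/null)"
  [ "$reply" = "imok" ]
}
```

Usage: `zk_ruok && echo "ZooKeeper is up"`. Any other reply, including no reply at all, makes the function return a non-zero status.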

Installing Storm

Prerequisites

Storm installation

Create users for the daemons

On a Linux system we do not want any process to run as the root user, so we begin by creating a storm user and group. This allows us to run the Storm daemons as a dedicated user rather than a default or root user. Execute the following commands to create the user and group for Storm (strangely named storm):

sudo groupadd storm
sudo useradd --gid storm --home-dir /home/storm --create-home --shell /bin/bash storm

Install the Storm software

The package downloaded from the Apache Storm download page is a tar.gz file (I generally take that one). To use it, you need to unzip and untar it. After unpacking, it is placed in the /usr/share directory, with a symlink to a non-version-specific directory. This allows for easy updates to a newer version. A convenience link in /usr/bin is created as well.

The following commands will do this:

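# Download the release (replace the placeholder with a mirror URL) and unpack it
wget [storm download URL]
tar -xzf apache-storm-0.9.4.tar.gz
# Move it to /usr/share and hand it to the storm user
sudo mv apache-storm-0.9.4 /usr/share
sudo chown -R storm:storm /usr/share/apache-storm-0.9.4
# Version-independent link plus a convenience link on the PATH
sudo ln -s /usr/share/apache-storm-0.9.4 /usr/share/storm
sudo ln -s /usr/share/storm/bin/storm /usr/bin/storm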
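The same steps can be wrapped in a small function so that an upgrade is just a matter of running it again with a newer tarball: `ln -sfn` repoints the version-independent storm link, and the /usr/bin link keeps working because it goes through that link. This is a sketch assuming GNU tar and ln; install_storm and its optional owner argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack a Storm release and (re)point the symlinks.
# $1 = release tarball, $2 = install prefix (default /usr/share),
# $3 = directory for the command link (default /usr/bin),
# $4 = optional owner such as storm:storm (chown requires root)
install_storm() {
  local tarball="$1" prefix="${2:-/usr/share}" bin_dir="${3:-/usr/bin}" owner="${4:-}"
  local release
  # Top-level directory inside the archive, e.g. apache-storm-0.9.4
  release="$(tar -tzf "$tarball" | sed -n '1{s|/.*||;p;}')"
  tar -xzf "$tarball" -C "$prefix"
  if [ -n "$owner" ]; then
    chown -R "$owner" "$prefix/$release"
  fi
  # -n treats an existing "storm" link as a file, so it is replaced, not entered
  ln -sfn "$prefix/$release" "$prefix/storm"
  ln -sfn "$prefix/storm/bin/storm" "$bin_dir/storm"
}
```

For the real install, source the function from a file (install-storm.sh is a placeholder name) and run it as root: `sudo bash -c '. ./install-storm.sh && install_storm apache-storm-0.9.4.tar.gz /usr/share /usr/bin storm:storm'`.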

Logging configuration

By default, Storm will log information to $storm.log.dir/ rather than the /var/log directory that most UNIX services use. To change this, execute the following:

sudo mkdir /var/log/storm
sudo chown storm:storm /var/log/storm
sudo sed -i 's/${storm.log.dir}/\/var\/log\/storm/g' /usr/share/storm/logback/cluster.xml
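The escaped slashes in that sed expression are easy to get wrong. A sketch of the same substitution in a hypothetical helper that uses | as the sed delimiter, so the target directory needs no escaping, and that reports failure if any ${storm.log.dir} reference is left afterwards. It assumes GNU sed (for in-place editing with -i).

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace logback's ${storm.log.dir} references with a directory.
# $1 = logback config (e.g. /usr/share/storm/logback/cluster.xml), $2 = log dir
relocate_storm_logs() {
  local xml="$1" log_dir="${2:-/var/log/storm}"
  # \$ and \. match literally; using | as delimiter avoids escaping slashes
  sed -i 's|\${storm\.log\.dir}|'"$log_dir"'|g' "$xml"
  # grep -F: fixed string; succeed only if no placeholder remains
  ! grep -qF '${storm.log.dir}' "$xml"
}
```

Usage (as root): `relocate_storm_logs /usr/share/storm/logback/cluster.xml /var/log/storm`.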

Storm configuration

When we upgrade Storm on the server, we want our configuration to be reused. So we place the configuration under the /etc tree and add a symbolic link to the config file so that Storm can find it.

sudo mkdir /etc/storm
sudo chown storm:storm /etc/storm
sudo mv /usr/share/storm/conf/storm.yaml /etc/storm/
sudo ln -s /etc/storm/storm.yaml /usr/share/storm/conf/storm.yaml
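One caveat with the mv above: a new release ships its own conf/storm.yaml, and repeating the mv after an upgrade would overwrite /etc/storm/storm.yaml with the defaults. A sketch of an idempotent variant (link_storm_conf is a hypothetical name) that only moves the file on the first install and otherwise just replaces the shipped copy with the link; ownership of /etc/storm is left to the chown shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep storm.yaml under /etc and link it into conf/.
# $1 = the release's conf directory, e.g. /usr/share/storm/conf
# $2 = config directory (default /etc/storm)
link_storm_conf() {
  local conf_dir="$1" etc_dir="${2:-/etc/storm}"
  mkdir -p "$etc_dir"
  if [ ! -e "$etc_dir/storm.yaml" ]; then
    # First install: the shipped file becomes the starting configuration
    mv "$conf_dir/storm.yaml" "$etc_dir/storm.yaml"
  fi
  # Replace whatever is in conf/ (shipped file or old link) with the link
  ln -sf "$etc_dir/storm.yaml" "$conf_dir/storm.yaml"
}
```

After an upgrade, /usr/share/storm already points at the new release, so running `link_storm_conf /usr/share/storm/conf` as root reconnects it to the existing configuration.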

With Storm installed, we’re now ready to configure Storm and set up the Storm daemons so they start automatically.

Running the daemons

All of the Storm daemons are fail-fast by design, meaning the process will halt whenever an unexpected error occurs. This allows individual components to safely fail and successfully recover without affecting the rest of the system. This means that the Storm daemons need to be restarted immediately whenever they die unexpectedly. The technique for this is known as running a process under supervision.

We'll use the supervisor package that's readily available on most distributions. Unfortunately, the supervisor name collides with the name of Storm's supervisor daemon. To clarify this distinction, we'll refer to the non-Storm process supervision daemon as supervisord (note the added d at the end) in the text, even though sample code and commands will use the proper name without the added d.

Installing Unix supervisor

To install supervisord on Ubuntu, use the following command:

sudo apt-get --yes install supervisor

This will install and start the supervisord service. See Managing Supervisor and the Supervisor documentation for more details on the configuration. Most importantly, the main configuration file is /etc/supervisor/supervisord.conf. It automatically includes any files matching the pattern *.conf in the /etc/supervisor/conf.d/ directory, and this is where we'll place our config files to run the Storm UI, nimbus and supervisor under supervision.

For each Storm daemon command we want to run under supervision, we'll create a configuration file that contains the following:

  • A unique (within the supervisord configuration) name for the service under supervision.
  • The command to run.
  • The working directory in which to run the command.
  • Whether or not the command/service should be automatically restarted if it exits. For fail-fast services, this should always be true.
  • The user that will own the process. In this case, we will run all Storm daemons with the storm user as the process owner.

Configuration files are required for nimbus, the Storm UI and the Storm supervisor. The following three files set up the Storm daemons to be automatically started (and restarted in the event of unexpected failure) by the supervisord service:

Nimbus

sudo vi /etc/supervisor/conf.d/storm-nimbus.conf

press ‘i’ and paste the following:

[program:storm-nimbus]
command=storm nimbus
directory=/home/storm
autorestart=true
user=storm
redirect_stderr=true

Storm UI

sudo vi /etc/supervisor/conf.d/storm-ui.conf

press ‘i’ and paste the following:

[program:storm-ui]
command=storm ui
directory=/home/storm
autorestart=true
user=storm
redirect_stderr=true

Storm Supervisor

sudo vi /etc/supervisor/conf.d/storm-supervisor.conf

press ‘i’ and paste the following:

[program:storm-supervisor]
command=storm supervisor
user=storm
directory=/home/storm
autostart=true
autorestart=true
startsecs=10
startretries=999
redirect_stderr=true
stdout_logfile=/var/log/storm/supervisor.out
stdout_logfile_maxbytes=20MB
stdout_logfile_backups=10
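Since the three files differ only in the daemon name, they can also be generated instead of pasted into vi on every node. A sketch under these assumptions: write_storm_program_conf is a hypothetical helper, it applies the restart and logging settings of the supervisor file above to every daemon, and the per-daemon /var/log/storm/<daemon>.out file names are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one supervisord program file per Storm daemon.
# $1 = target directory (default /etc/supervisor/conf.d); remaining args =
# daemons (default: nimbus ui supervisor). Writing to /etc requires root.
write_storm_program_conf() {
  local conf_dir="${1:-/etc/supervisor/conf.d}"
  shift || true
  local daemons=("$@")
  [ ${#daemons[@]} -gt 0 ] || daemons=(nimbus ui supervisor)
  local daemon
  for daemon in "${daemons[@]}"; do
    cat > "$conf_dir/storm-$daemon.conf" <<EOF
[program:storm-$daemon]
command=storm $daemon
directory=/home/storm
user=storm
autostart=true
autorestart=true
startsecs=10
startretries=999
redirect_stderr=true
stdout_logfile=/var/log/storm/$daemon.out
stdout_logfile_maxbytes=20MB
stdout_logfile_backups=10
EOF
  done
}
```

On a worker-only node you could generate just the supervisor file with `write_storm_program_conf /etc/supervisor/conf.d supervisor` (run as root).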

Activate the new configuration by executing the command:

sudo service supervisor restart

The supervisord service will load the new configurations and start the Storm daemons. Wait a moment or two for the Storm services to start, and then verify that nimbus and the UI are up and running by visiting the following URL in a web browser (replace localhost with the host name or IP address of the actual machine):

http://localhost:8080

This will bring up the Storm UI graphical interface. It should indicate that the cluster is up with no supervisors.
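On a headless VM the same check can be scripted against the Storm UI's REST API, which the UI serves on the same port. This is a sketch assuming curl is installed and the /api/v1/cluster/summary endpoint of the Storm UI REST API; storm_ui_up is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cluster summary JSON if the UI answers.
# $1 = base URL of the Storm UI (default http://localhost:8080)
storm_ui_up() {
  local base_url="${1:-http://localhost:8080}"
  # -f: fail on HTTP errors, -sS: quiet but show errors, --max-time: do not hang
  curl -fsS --max-time 5 "$base_url/api/v1/cluster/summary"
}
```

Usage: `storm_ui_up || echo "Storm UI not reachable, check the logs"`. On success the JSON summary should, among other fields, report the number of supervisors.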

If for some reason the Storm UI does not come up, check the following log files for errors:

  • Storm UI: Check the ui.log file under /var/log/storm to check for errors
  • Nimbus: Check the nimbus.log file under /var/log/storm to check for errors

Configuring

The main configuration file for Storm is storm.yaml. This file contains all the information required to set up the cluster. In most cases the available defaults are good ones to start with. You can find the file in the $STORM_HOME/conf directory or, as I did, in /etc/storm with a symlink from the conf directory.

Mandatory parameters

The following parameters are mandatory to get your cluster up and running:

  • storm.zookeeper.servers The hostname(s) of the Zookeeper server(s).

    storm.zookeeper.servers:
        - "zookeeper.host.name"

  • nimbus.host Host name of the nimbus server.

  • storm.local.dir Location where nimbus and the supervisor store data.

    storm.local.dir: "/home/storm"
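A minimal sketch that writes these three mandatory settings to storm.yaml in one go, which is handy when turning the template into individual nodes. write_storm_yaml and the example host names are hypothetical; the output uses the YAML list syntax for storm.zookeeper.servers shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the mandatory storm.yaml settings.
# $1 = output file, $2 = nimbus host, $3 = storm.local.dir,
# remaining args = one or more ZooKeeper host names
write_storm_yaml() {
  local file="$1" nimbus_host="$2" local_dir="$3"
  shift 3
  local zk
  {
    echo "storm.zookeeper.servers:"
    for zk in "$@"; do
      printf '    - "%s"\n' "$zk"
    done
    printf 'nimbus.host: "%s"\n' "$nimbus_host"
    printf 'storm.local.dir: "%s"\n' "$local_dir"
  } > "$file"
}
```

Usage, as a user that can write /etc/storm: `write_storm_yaml /etc/storm/storm.yaml nimbus.host.name /home/storm zookeeper.host.name`.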

Some documentation states that the worker port numbers should be configured too. Although that is correct, the defaults are already decent: ports 6700 through 6703, which gives 4 worker slots per supervisor node.
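If you want a different number of workers per node, list the ports explicitly under supervisor.slots.ports. A sketch of a hypothetical helper that prints that setting for N consecutive ports, starting at 6700 by default; its output can be appended to storm.yaml (once, to avoid a duplicate key).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print supervisor.slots.ports for N worker slots.
# $1 = number of slots (default 4), $2 = first port (default 6700)
slot_ports_yaml() {
  local count="${1:-4}" first="${2:-6700}" i
  echo "supervisor.slots.ports:"
  for ((i = 0; i < count; i++)); do
    echo "    - $((first + i))"
  done
}
```

For example, `slot_ports_yaml 2 >> /etc/storm/storm.yaml` configures ports 6700 and 6701, so 2 workers per supervisor node.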

Troubleshooting on Azure

I started creating the supervisor nodes by means of a template: install Storm, configure it and create a template from it. One thing I missed, and which is not documented (or at least I could not find it), is that the storm-local directory contains the ID that is generated and used by the supervisor, an ID like a303f508-d3a0-4c43-bcf5-4af9e4560eb7. When I rolled out the image to several VMs, they showed up as a single supervisor, since all the IDs were the same. So when you copy the image, make sure that the storm-local directory is empty.
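A sketch of the cleanup to run on the template VM just before capturing the image. It assumes storm.local.dir points at a directory used only by Storm (never a home directory, since everything inside it is deleted); clear_storm_local is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: empty Storm's local state directory (including the
# generated supervisor ID) while keeping the directory itself.
# $1 = the storm.local.dir of this node
clear_storm_local() {
  local dir="$1"
  # Refuse empty paths, "/" and non-directories to avoid deleting the wrong tree
  if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
    echo "clear_storm_local: refusing to clear '$dir'" >&2
    return 1
  fi
  # -mindepth 1 keeps $dir itself; -delete removes its contents depth-first
  find "$dir" -mindepth 1 -delete
}
```

Stop the daemon first with `sudo supervisorctl stop storm-supervisor` so it does not write a new ID while you clean up, then run clear_storm_local as root with the node's storm.local.dir.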

References