Hadoop Multi-node setup on Ubuntu

Setting up a Hadoop multi-node instance on Ubuntu can be challenging. In my case I ran two VMs with 2GB of RAM each on my laptop, which made everything a bit slow; thanks to my new Apple MacBook Pro with 8GB of RAM I had no such worries.

I will break this tutorial into a few parts to keep it organized and so you can track your progress. Remember you will have to follow each of these parts twice, once on each of your machines (master and slave).

  • Part 1: Setting up your Ubuntu Environment 
  • Part 2: Configure the /etc/hosts file 
  • Part 3: SSH Setup 
  • Part 4: Downloading and configuring Hadoop
  • Part 5: Configure Master Slave Settings
  • Part 6: Starting Master Slave Setup
  • Part 7: Running first Map Reduce on Multi-Node Setup

Part 1: Setting up your Ubuntu Environment

By default Ubuntu does not come with Sun Java installed so you will have to install it. This is an easy way to install it via the command line.

> sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
> sudo apt-get update
> sudo apt-get install sun-java6-jdk

Java is installed; let's now export JAVA_HOME:

> export JAVA_HOME=/usr/lib/jvm/java-6-sun

By default Ubuntu will not have ssh installed either, so let's install it from the command line:

> sudo apt-get install ssh

It is recommended not to run Hadoop under your current user/group, so we will create a new user and group. We will call the user hduser and the group hd. The commands look as follows:

> sudo addgroup hd
> sudo adduser --ingroup hd hduser

The last thing we need to do is disable IPv6 for Hadoop. After you have downloaded Hadoop, add this line to conf/hadoop-env.sh:

> export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

Part 2: Configure the /etc/hosts file

You need to set up the /etc/hosts file with the IP addresses of the master and slave. Run the following command to edit the hosts file:

> sudo vi /etc/hosts (use gedit if you don't know vi)

Add the following lines:

172.*.*.*       master
172.*.*.*       slave

Run ifconfig on both the master and slave machines to determine their IP addresses, then fill the addresses in where I have *.
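If you want to grab the address without eyeballing the ifconfig output, the inet field can be pulled out with awk. A minimal sketch, run here against a captured sample (the interface name and 172.16.x.x address are made up for illustration; your real output will differ):

```shell
# Captured ifconfig output (illustrative addresses; yours will differ)
sample='eth0      Link encap:Ethernet  HWaddr 00:0c:29:aa:bb:cc
          inet addr:172.16.1.10  Bcast:172.16.1.255  Mask:255.255.255.0'

# Split on colons/spaces and print the field after "inet addr"
ip=$(printf '%s\n' "$sample" | awk -F'[: ]+' '/inet addr/ {print $4}')
echo "$ip"    # -> 172.16.1.10
```

On a live machine you would pipe the real command instead: ifconfig eth0 | awk -F'[: ]+' '/inet addr/ {print $4}'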

Part 3: SSH Setup

Let’s configure ssh. Run:

> su - hduser
> ssh-keygen -t rsa -P ""
> cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

On the master machine, run the following to copy the key to the slave:

hduser@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave

Test that ssh works to both master and slave by running:

> ssh master
> ssh slave

Part 4: Downloading and configuring Hadoop

First we need to download the latest Hadoop release and extract it to our local filesystem. Download the latest Hadoop from: http://www.reverse.net/pub/apache//hadoop/common/

Extract Hadoop: tar -xvf hadoop*.tar.gz

Now we need to change ownership of the extracted Hadoop folder to hduser, which we can do with the following command:

> sudo chown -R hduser:hd /home/user/Downloads/hadoop

It is best to move the hadoop folder out of the Downloads folder, which you can do with the following command:

> sudo mv /home/user/Downloads/hadoop /usr/local/

Now we need to configure $HOME/.bashrc with the Hadoop variables. Enter the following commands:

> cd ~
> sudo vi .bashrc (if you don't know vi, you can type: sudo gedit .bashrc)

Add the following lines to the end

export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
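After you source ~/.bashrc, it is worth confirming that hadoop's bin directory actually landed on the PATH. A minimal sketch using the tutorial's default paths (adjust if your JDK or Hadoop lives elsewhere):

```shell
# Same three lines as in .bashrc above (tutorial defaults)
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

# Confirm the hadoop bin directory was appended to PATH
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "PATH ok" ;;
  *)                      echo "PATH missing $HADOOP_HOME/bin" ;;
esac
```

On a real machine, running hadoop version afterwards is the final check that the scripts resolve.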

Now we are going to create a folder which Hadoop will use to store its data files:

> sudo mkdir -p /app/hadoop/tmp
> sudo chown hduser:hd /app/hadoop/tmp

Good, now we can edit the *-site.xml files in hadoop/conf. We will add properties to 3 files:

  • conf/core-site.xml
  • conf/hdfs-site.xml
  • conf/mapred-site.xml

Add the following property tags to core-site.xml:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>Temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
  <description>Default file system.</description>
</property>

Add the following property tags to mapred-site.xml:

<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
  <description>MapReduce job tracker.</description>
</property>

Add the following property tags to hdfs-site.xml:

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified at create time.
  </description>
</property>
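Note that the snippets above are only the property elements; each belongs inside the top-level <configuration> element of its file. As an illustration, after the edit core-site.xml ends up shaped like this (the XML declaration and any stylesheet line vary by Hadoop release):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>Temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
    <description>Default file system.</description>
  </property>
</configuration>
```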

Part 5: Configure Master Slave Settings

We will configure the following 2 files on both the master and slave machines:

  • conf/masters
  • conf/slaves

Let’s start with the Master machine:

  • Open the following file: conf/masters and change ‘localhost’ to ‘master’:
master
  • Open the following file: conf/slaves and change ‘localhost’ to ‘master’ and ‘slave’:
master
slave

Now on the Slave machine:

  • Open the following file: conf/masters and change ‘localhost’ to ‘slave’:
slave
  • Open the following file: conf/slaves and change ‘localhost’ to ‘slave’:
slave

Part 6: Starting your Master Slave Setup

Note: all of the steps below will be done on the master machine.

The first thing we need to do is format the Hadoop namenode. Run:

> hadoop namenode -format

Starting a multi-node cluster takes two steps:

  • Start the HDFS daemons by running the following command in hadoop/bin:

> ./start-dfs.sh

Run the following command on the master: > jps

14399 NameNode
16244 DataNode
16312 SecondaryNameNode
12215 Jps

Run the following command on the slave: > jps

11501 DataNode
11612 Jps
  • Start the MapReduce daemons by running the following command in hadoop/bin:

> ./start-mapred.sh

Run the following command on the master: > jps

14399 NameNode
16244 DataNode
16312 SecondaryNameNode
18215 Jps
17102 JobTracker
17211 TaskTracker

Run the following command on the slave: > jps

11501 DataNode
11712 Jps
11695 TaskTracker

Part 7: Running first Map Reduce on Multi-Node Setup

If everything was successful you can now run your first multi-node MapReduce job.

Let’s get an ebook in UTF-8 Plain Text format:

http://www.gutenberg.org/ebooks/118

Now we need to push the book to HDFS. Run the following command, editing the path and filename to where you saved the book:

> hadoop dfs -copyFromLocal /home/user/Downloads/*.txt /user/hduser/hdinput

Let’s run the MapReduce example that counts the number of occurrences of each word in the document:

> hadoop jar ../hadoop-examples-1.0.0.jar wordcount /user/hduser/hdinput /user/hduser/hdinput_result
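If you are unsure what the job actually produces: for each distinct word it emits the word and its occurrence count, written into part files under /user/hduser/hdinput_result (view them with hadoop dfs -cat; the exact part-file name depends on the release). The same computation can be sketched locally with plain shell, as a toy stand-in rather than the Hadoop job itself:

```shell
# Toy stand-in for wordcount: count occurrences of each word in a sample line,
# then show the most frequent word with its count
printf 'the quick brown fox jumps over the lazy dog the end\n' \
  | tr ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -1 \
  | awk '{print $1, $2}'   # -> 3 the
```

Hadoop does the same split/group/count, just distributed across the cluster's mappers and reducers.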

Check the following logs on the slave machine to see which MapReduce tasks were completed:
> hadoop-hduser-tasktracker-ubuntu.log
> hadoop-hduser-jobtracker-ubuntu.log
> hadoop-hduser-datanode-ubuntu.log

If you get stuck or get an error check my other blog post with tips when running hadoop on ubuntu:

https://thysmichels.com/2012/02/11/tips-running-hadoop-on-ubuntu/

Hope this was helpful; if you have any questions please leave a comment.
