Setting up a Hadoop multi-node instance on Ubuntu can be challenging. In my case I used my laptop, and it can be tricky: I ran 2 VMs with 2GB RAM, which made everything a bit slow… thanks to my new Apple MacBook Pro with 8GB RAM I had no worries.
I will break this tutorial into a few parts just to make it more organized and so you can track your progress. Remember you will have to follow each of these parts twice, once on each of your machines (master and slave).
- Part 1: Setting up your Ubuntu Environment
- Part 2: Configure the /etc/hosts file
- Part 3: SSH Setup
- Part 4: Downloading and configuring Hadoop
- Part 5: Configure Master Slave Settings
- Part 6: Starting your Master Slave Setup
- Part 7: Running your first MapReduce job on the Multi-Node Setup
Part 1: Setting up your Ubuntu Environment
By default Ubuntu does not come with Sun Java installed, so you will have to install it yourself. This is an easy way to install it via the command line:
> sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
> sudo apt-get update
> sudo apt-get install sun-java6-jdk
Now that Java is installed, let's export JAVA_HOME:
> export JAVA_HOME=/usr/lib/jvm/java-6-sun
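Note that this export only lasts for your current shell session; we will make it permanent in $HOME/.bashrc in Part 4. As a quick sanity check, you can confirm the JDK is installed and the variable is set:

> java -version
> echo $JAVA_HOME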
By default Ubuntu will not have ssh installed, so let's install it from the command line:
> sudo apt-get install ssh
It is recommended not to run Hadoop under your current user/group, so we will create a new user and group. We will call the user hduser and the group hd. The commands look as follows:
> sudo addgroup hd
> sudo adduser --ingroup hd hduser
The last thing we need to do is disable IPv6 for Hadoop. After you have downloaded Hadoop, add this line to conf/hadoop-env.sh:
> export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
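If you prefer, a common alternative (my suggestion, not something Hadoop requires) is to disable IPv6 system-wide. Add these lines to /etc/sysctl.conf and reboot:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

After the reboot you can verify it took effect; this should print 1:

> cat /proc/sys/net/ipv6/conf/all/disable_ipv6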
Part 2: Configure the /etc/hosts file
You need to set up the /etc/hosts file with the details of the master and slave IPs. Run the following command to edit the hosts file:
> sudo vi /etc/hosts (use gedit if you don't know vi)
Add the following lines:
172.*.*.* master
172.*.*.* slave
Run the command ifconfig on both your master and slave machines to determine their IP addresses, then fill the addresses in where I have *.
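A quick way to list just the IPv4 addresses (on older Ubuntu releases ifconfig prints them as "inet addr"; the interface name, e.g. eth0, will vary per machine):

> ifconfig | grep "inet addr"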
Part 3: SSH Setup
Let's configure ssh. Run:
> su - hduser
> ssh-keygen -t rsa -P ""
> cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
On the Master machine run the following
> hduser@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave
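If ssh-copy-id is not available on your machine, a manual equivalent is the following (a sketch that assumes the slave already has the hduser account from Part 1):

> cat $HOME/.ssh/id_rsa.pub | ssh hduser@slave 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'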
To test whether ssh works for master and slave, run:
> ssh master
> ssh slave
Part 4: Downloading and configuring Hadoop
First we need to download the latest Hadoop release and extract it to our local filesystem. Download the latest Hadoop from: http://www.reverse.net/pub/apache//hadoop/common/
Extract Hadoop:
> tar -xvf hadoop*.tar.gz
Now we need to change ownership of the extracted Hadoop folder to hduser. We can do that with the following command:
> sudo chown -R hduser:hd /home/user/Downloads/hadoop
It is best to move the hadoop folder out of the Downloads folder, which you can do with the following command:
> sudo mv /home/user/Downloads/hadoop /usr/local/
Now we need to configure $HOME/.bashrc with the Hadoop variables. Enter the following commands:
> cd ~
> vi .bashrc (if you don't know vi, you can type: gedit .bashrc)
Add the following lines to the end
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
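Reload .bashrc and check that the hadoop binary is now on your PATH; hadoop version should print the release you extracted:

> source ~/.bashrc
> hadoop version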
Now we are going to create a folder which Hadoop will use to store its data files:
> sudo mkdir -p /app/hadoop/tmp
> sudo chown hduser:hd /app/hadoop/tmp
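You can double-check the ownership before moving on; the owner and group shown should be hduser and hd:

> ls -ld /app/hadoop/tmp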
Good, now we can edit the *-site.xml files in hadoop/conf. We will add properties to 3 files:
- conf/core-site.xml
- conf/hdfs-site.xml
- conf/mapred-site.xml
Add the following property tags to core-site.xml:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>Temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
  <description>Default file system.</description>
</property>
Add the following property tags to mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
  <description>MapReduce job tracker.</description>
</property>
Add the following property tags to hdfs-site.xml:
<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication. The actual number of replications
  can be specified when the file is created. The default is used if
  replication is not specified at create time.</description>
</property>
Part 5: Configure Master Slave Settings
We will configure the following 2 files on both the master and slave machines.
- conf/masters
- conf/slaves
Let’s start with the Master machine:
- Open the following file: conf/masters and change ‘localhost’ to ‘master’:
master
- Open the following file: conf/slaves and change ‘localhost’ to ‘master’ and ‘slave’:
master
slave
Now on the Slave machine:
- Open the following file: conf/masters and change ‘localhost’ to ‘slave’:
slave
- Open the following file: conf/slaves and change ‘localhost’ to ‘slave’:
slave
Part 6: Starting your Master Slave Setup
Note: all of the steps below are done on the Master machine.
The first thing we need to do is format the Hadoop namenode. Run:
> hadoop namenode -format
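Two cautions here (based on my setup, so adjust the path to yours): run the format as hduser, and only format a fresh cluster, since formatting erases any data already in HDFS:

> su - hduser
> /usr/local/hadoop/bin/hadoop namenode -format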
Starting a multi-node cluster takes two steps:
- Start the HDFS daemons by running the following command in hadoop/bin:
> ./start-dfs.sh
Run the following command on the master:
> jps
14399 NameNode
16244 DataNode
16312 SecondaryNameNode
12215 Jps
Run the following command on the slave:
> jps
11501 DataNode
11612 Jps
- Start the MapReduce daemons by running the following command in hadoop/bin:
> ./start-mapred.sh
Run the following command on the master:
> jps
14399 NameNode
16244 DataNode
16312 SecondaryNameNode
18215 Jps
17102 JobTracker
17211 TaskTracker
Run the following command on the slave:
> jps
11501 DataNode
11712 Jps
11695 TaskTracker
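Besides jps, you can ask the namenode for a cluster report; with both datanodes up you should see two live nodes. (On Hadoop 1.x you can also browse the NameNode web UI on master:50070 and the JobTracker UI on master:50030.)

> hadoop dfsadmin -report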
Part 7: Running your first MapReduce job on the Multi-Node Setup
If everything was successful, you can now run your first multi-node MapReduce job.
Let’s get some ebooks in UTF-8 format:
http://www.gutenberg.org/ebooks/118
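Once you have saved the plain-text version of the book (I will assume a filename like pg118.txt in your Downloads folder; adjust to whatever you saved), you can confirm it really is UTF-8 text:

> file /home/user/Downloads/pg118.txt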
Now we need to push the book to HDFS. Run the following command, editing the path and filename to match where you saved the book:
> hadoop dfs -copyFromLocal /home/user/Downloads/*.txt /user/hduser/hdinput
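Verify the file landed in HDFS before kicking off the job:

> hadoop dfs -ls /user/hduser/hdinput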
Let's run the map-reduce example that counts the number of words in the document:
> hadoop jar ../hadoop-examples-1.0.0.jar wordcount /user/hduser/hdinput /user/hduser/hdinput_result
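When the job finishes, list the output directory and print the counts. Depending on which wordcount API the examples jar uses, the output file is named part-r-00000 or part-00000, so adjust accordingly:

> hadoop dfs -ls /user/hduser/hdinput_result
> hadoop dfs -cat /user/hduser/hdinput_result/part-r-00000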
Check the following logs to see which map-reduce tasks completed (the tasktracker and datanode logs are on the slave; the jobtracker log is on the master):
> hadoop-hduser-tasktracker-ubuntu.log
> hadoop-hduser-jobtracker-ubuntu.log
> hadoop-hduser-datanode-ubuntu.log
If you get stuck or hit an error, check my other blog post with tips for running Hadoop on Ubuntu:
https://thysmichels.com/2012/02/11/tips-running-hadoop-on-ubuntu/
Hope this was helpful; if you have any questions, please leave a comment.