Hadoop Category

Hadoop in Practice

I had the privilege of receiving an early release of the book Hadoop in Practice from Manning Publications. The book has the following chapters:

Table of Contents
1: Getting started – FREE
Part I: Data Logistics
2: Moving Data in and out of Hadoop
3: Data Serialization: Working with Text and Beyond
Part II: Big Data Patterns
4: Applying MapReduce […]


Hadoop Multi-node setup on Ubuntu

Setting up a Hadoop multi-node instance on Ubuntu can be challenging. I did it on my laptop, which can be tricky: I ran 2 VMs with 2GB RAM, which makes everything a bit slow. Thanks to my new Apple MacBook Pro with 8GB RAM, though, I had no worries. I will […]


Working with HDFS Java Example

This Java example shows how to work with the Hadoop file system. To use the code in Eclipse, first download the following jars and add them to your project libraries: hadoop-core-0.20.2.jar and commons-logging-*.jar. See the comments in the code. In this example we created an HDFS Configuration, specified a Path for our file, Read […]


Tips running Hadoop on Ubuntu

Below are some tips for running Hadoop on Ubuntu. If you run into errors running Hadoop on Ubuntu, please comment with the problem and how you solved it.

Warning: $HADOOP_HOME is deprecated
Solution: add export HADOOP_HOME_WARN_SUPPRESS="TRUE" to hadoop-env.sh.

Cannot create directory `/usr/local/hadoop/libexec/../logs
Solution: sudo chown -R hduser:hadoop /usr/local/hadoop/

Enter passphrase when running ./start-all.sh […]


Installing Hadoop on Windows

Below are the steps you can follow to install Hadoop on Windows:
Step 1. I downloaded the following file: http://www.poolsaboveground.com/apache//hadoop/core/hadoop-0.23.0/hadoop-0.23.0.tar.gz/
Step 2. Copy it into the C:/Cygwin/home folder.
Step 3. Extract: tar -xvf hadoop-0.23.0.tar.gz
Step 4. Open /hadoop/conf/yarn-site.xml and copy the following between <configuration> and </configuration>:
<!-- Site specific YARN configuration properties -->
<property> <name>fs.default.name</name> <value>hdfs://localhost:9100</value> </property>
<property> <name>mapred.job.tracker</name> <value>localhost:9101</value> </property>
[…]
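To check what a configuration file like the one in Step 4 actually contains, the name/value pairs can be read back with the JDK's built-in XML parser. The following is only a sketch: the class name SiteConfigSketch and its SAMPLE string are made up for illustration, and it uses plain javax.xml rather than Hadoop's own configuration loader. Only the two property names and values come from the excerpt above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: not part of Hadoop, just a stand-alone XML reader.
class SiteConfigSketch {
    // The two properties quoted in the excerpt, wrapped in a <configuration> root.
    static final String SAMPLE =
        "<configuration>"
        + "<property><name>fs.default.name</name>"
        + "<value>hdfs://localhost:9100</value></property>"
        + "<property><name>mapred.job.tracker</name>"
        + "<value>localhost:9101</value></property>"
        + "</configuration>";

    // Returns every <property> as a name -> value entry, in document order.
    static Map<String, String> parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name =
                    p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value =
                    p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            // Malformed XML or I/O trouble: surface it as an unchecked error.
            throw new IllegalArgumentException("Could not parse configuration", e);
        }
    }

    public static void main(String[] args) {
        parse(SAMPLE).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

A check like this catches a property pasted outside the configuration element, since the parser rejects a file with more than one root element.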


Java-based HDFS API Tutorial

In this tutorial I show how to use Java to interact with your Hadoop Distributed File System (HDFS) using Hadoop's Java FileSystem API. This Java program creates a file named hadoop.txt, writes a short message into it, then reads it back and prints it to the screen. If the file already exists, it is deleted first.   import java.io.File; […]
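The flow described here (delete the file if it exists, create it, write a message, read it back, print it) can be sketched against the local file system with java.nio as a stand-in for HDFS. This is not the tutorial's code and makes no Hadoop calls; the class name LocalFileRoundTrip, the message text, and the temp-directory location are assumptions made for illustration. Only the file name hadoop.txt comes from the excerpt.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical local-file stand-in for the HDFS round trip in the tutorial.
class LocalFileRoundTrip {
    // Deletes any existing file, writes the message, and reads it back.
    static String writeThenRead(Path file, String message) {
        try {
            // Mirror the tutorial's behaviour: an existing file is removed first.
            Files.deleteIfExists(file);
            Files.write(file, message.getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed location: the JVM's temp directory, not an HDFS path.
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "hadoop.txt");
        System.out.println(writeThenRead(file, "Hello from the sketch"));
    }
}
```

Against a real cluster, the same four steps would go through Hadoop's file system classes instead of java.nio.file.Files, with a Configuration that points at the NameNode.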


Hadoop Distributed File System (HDFS) Tutorial

HDFS, the Hadoop Distributed File System, is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes) and to provide high-throughput access to this information. Files are stored redundantly across multiple machines to ensure their durability against failure and their high availability to highly parallel applications. One of […]
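To make "stored in a redundant fashion across multiple machines" concrete, here is a toy model of the idea: a file is split into fixed-size blocks and each block is copied to several distinct nodes. Everything in it is an assumption chosen for illustration: the class name BlockReplicationSketch, the node names, the 64 MB block size, the replication factor of 3, and above all the round-robin placement, which is a simplification and not the placement policy HDFS actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of block replication; it does not talk to a real cluster.
class BlockReplicationSketch {
    // Number of blocks needed to hold fileSize bytes (rounded up).
    static long blockCount(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    // Assigns each block to `replication` distinct nodes, round-robin.
    static List<List<String>> place(long blocks, List<String> nodes, int replication) {
        if (replication > nodes.size()) {
            throw new IllegalArgumentException("Not enough nodes for replication factor");
        }
        List<List<String>> placement = new ArrayList<>();
        for (long b = 0; b < blocks; b++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                // Consecutive offsets modulo the node count never repeat a node
                // as long as replication <= nodes.size().
                replicas.add(nodes.get((int) ((b + r) % nodes.size())));
            }
            placement.add(replicas);
        }
        return placement;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<String> nodes = Arrays.asList("node1", "node2", "node3", "node4");
        long blocks = blockCount(200 * mb, 64 * mb); // illustrative sizes
        List<List<String>> placement = place(blocks, nodes, 3);
        for (int i = 0; i < placement.size(); i++) {
            System.out.println("block " + i + " -> " + placement.get(i));
        }
    }
}
```

With every block held on three different machines, losing any single node still leaves two copies of each block, which is the durability property the paragraph above describes.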


Running Hadoop on Ubuntu

Below are the steps to run your first Hadoop job after you have installed Hadoop.
Step 1. Format the NameNode. This initializes the directory specified by the dfs.name.dir variable.
sudo -u hdfs hadoop namenode -format
Output:
12/01/30 11:51:33 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = […]


Install Apache Big Top on Ubuntu

Below are the steps you can follow to install Apache Bigtop on Ubuntu. For those who don't know, Apache Bigtop is a project for the development of packaging and tests of the Apache Hadoop ecosystem. Step 1: First you have to install the Bigtop GPG key: wget -O- http://www.apache.org/dist/incubator/bigtop/stable/repos/GPG-KEY-bigtop | sudo apt-key add - […]
