Java-based HDFS API Tutorial

In this tutorial I show how to use Java to interact with your Hadoop Distributed File System (HDFS) using libHDFS.

This Java program creates a file named hadoop.txt, writes a short message into it, then reads the message back and prints it to the screen. If the file already exists, it is deleted first.

 

  import java.io.IOException;

  // Import the Hadoop FileSystem classes
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.Path;

  public class HDFSExample {
    public static final String FileName = "hadoop.txt";
    public static final String message = "My First Hadoop API call!\n";

    public static void main(String[] args) throws IOException {
      // Initialize a new Hadoop Configuration (loads core-site.xml etc. from the classpath)
      Configuration conf = new Configuration();
      // Get the FileSystem that this configuration points at
      FileSystem fs = FileSystem.get(conf);
      // Path of the file inside HDFS
      Path filenamePath = new Path(FileName);

      try {
        // If the file already exists, remove it first
        if (fs.exists(filenamePath)) {
          fs.delete(filenamePath, false);
        }

        // Create the file and write the message into it
        FSDataOutputStream out = fs.create(filenamePath);
        out.writeUTF(message);
        out.close();

        // Open the file and read the message back
        FSDataInputStream in = fs.open(filenamePath);
        String messageIn = in.readUTF();
        System.out.print(messageIn);
        in.close();
      } catch (IOException ioe) {
        System.err.println("IOException during operation: " + ioe.toString());
        System.exit(1);
      }
    }
  }
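
If you launch this outside of the Hadoop scripts, FileSystem.get(conf) may fall back to the local file system because your cluster configuration is not on the classpath. As Joe notes in the comments below, a quick fix is to add core-site.xml to the Configuration explicitly. A minimal sketch, assuming the config lives under /usr/lib/hadoop/conf (adjust the path for your installation):

  // Sketch only: point the Configuration at an explicit core-site.xml
  // (example path taken from the comments; use your own Hadoop config location)
  Configuration conf = new Configuration();
  conf.addResource(new Path("/usr/lib/hadoop/conf/core-site.xml"));
  FileSystem fs = FileSystem.get(conf);

Alternatively, launching the compiled class through the hadoop command itself (for example: hadoop HDFSExample, with the class on HADOOP_CLASSPATH) should keep both the cluster configuration and Hadoop's own dependencies, such as commons-logging, on the classpath.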

For more information:

The complete JavaDoc for the HDFS API is available at http://wiki.apache.org/hadoop/LibHDFS

4 Comments

  1. Joe Futrelle says:

    Thanks, one issue with this is that it doesn’t compile.

    theFilename is undeclared. I assume this is the same variable as FileName?

    I modified the two names to agree, and also modded the code to add the core-site.xml config file from my production HDFS instance, like this:

    conf.addResource(new Path("/usr/lib/hadoop/conf/core-site.xml"));

    and it worked. Thanks for the leg up! But you’re gonna wanna fix that variable name issue.

  2. Thys Michels says:

    Yes correct!

  3. anjucs15 says:

    I'm getting the following error:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
    at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:153)
    at pkgHdfs.HDFSClient.main(HDFSClient.java:192)
    Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    … 2 more
    Java Result: 1

  4. swagatika says:

    Hi,
    Can you please suggest a way to read the contents of a file that is in RCFile format using Java?
