This is a follow-up to the previous post:
https://thysmichels.com/2012/01/31/hadoop-configuring-distributed-file-system-hdfs-tutorial/
Starting Hadoop Distributed File System
Now we will format the file system we just configured. Important: this process should only be performed once.
bin/hadoop namenode -format
Now we can start HDFS:
bin/start-dfs.sh
The start-dfs.sh script starts the NameNode on the master machine and the DataNode daemons on the slave machines. Note: in a single-node setup the NameNode and DataNode run on the same machine. In a clustered HDFS the slaves are read from conf/slaves and the DataNodes are started over SSH, so if passwordless SSH is not set up you will have to ssh into each slave and start the DataNode yourself.
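One quick way to check that the daemons actually came up is to list the running Java processes with jps (it ships with the JDK). On a single-node setup you should see processes named NameNode, DataNode and SecondaryNameNode.
jps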
Interacting with Hadoop Distributed File System
The bulk of the commands that communicate with the cluster are run through a single script, bin/hadoop. It loads the Hadoop system in a Java virtual machine and executes the user command. Commands are specified in the following form:
bin/hadoop moduleName -cmd args...
moduleName : the subset of Hadoop functionality to use (for HDFS this is dfs)
cmd : the command to execute within that module, followed by its arguments
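For example, to list all commands the dfs module supports:
bin/hadoop dfs -help
Here dfs is the moduleName and -help is the cmd.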
Examples of Hadoop Distributed File System commands
Listing files:
bin/hadoop dfs -ls /
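If no path is given, -ls should list your HDFS home directory (/user/username), assuming it has already been created:
bin/hadoop dfs -ls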
Insert Data into the Cluster (3 steps):
Step 1: Create a home directory for your user in HDFS
bin/hadoop dfs -mkdir /user/username
Step 2: Put the file into the cluster
bin/hadoop dfs -put /home/username/DataSet.txt /user/username/
Step 3: Verify the file is in HDFS
bin/hadoop dfs -ls /user/username
Uploading multiple files at once (specify the directory to upload):
bin/hadoop dfs -put /myfiles /user/username
Note: Another synonym for -put is -copyFromLocal.
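For example, the upload from Step 2 could just as well be written as:
bin/hadoop dfs -copyFromLocal /home/username/DataSet.txt /user/username/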
Display Files in HDFS:
bin/hadoop dfs -cat file
(the file contents are printed to the console)
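For a large file it is better not to dump everything to the terminal; since -cat writes to standard output you can pipe it, for example through head (using the DataSet.txt path from above):
bin/hadoop dfs -cat /user/username/DataSet.txt | head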
Copy File from HDFS:
bin/hadoop dfs -get file localfile
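Note: as with -put, -get has a synonym, -copyToLocal:
bin/hadoop dfs -copyToLocal file localfile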
How can we interact with HDFS from an external node (not a DataNode or a NameNode)?
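A minimal sketch of one way to do this, assuming the NameNode runs on a host called namenode-host and listens on port 9000 (replace both with your own values): install the same Hadoop version on the external machine and point it at the cluster, either by setting fs.default.name to hdfs://namenode-host:9000 in the client's conf/core-site.xml, or by passing the NameNode address with the generic -fs option on each command:
bin/hadoop dfs -fs hdfs://namenode-host:9000 -ls /user/username
bin/hadoop dfs -fs hdfs://namenode-host:9000 -put /home/username/DataSet.txt /user/username/
Once the client can reach the NameNode, all of the commands above work unchanged from the external node.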