Archives

Tips for running Hadoop on Ubuntu

Below are some tips for running Hadoop on Ubuntu. If you run into errors while running Hadoop on Ubuntu, please leave a comment describing the problem and how you solved it.

When you get this warning: $HADOOP_HOME is deprecated
Solution: add export HADOOP_HOME_WARN_SUPPRESS="TRUE" to hadoop-env.sh.

Cannot create directory `/usr/local/hadoop/libexec/../logs
Solution: sudo chown -R hduser:hadoop /usr/local/hadoop/

Enter passphrase when running ./start-all.sh […]
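
The $HADOOP_HOME warning fix is a one-line config change, and if you set up several machines it is handy to apply it with a script that does not add the line twice. Below is a minimal Python 3 sketch of that idea. The path /usr/local/hadoop/conf/hadoop-env.sh is an assumption (the Hadoop 1.x layout under the /usr/local/hadoop install used in these tips), and ensure_line is a hypothetical helper, not part of Hadoop.

```python
from pathlib import Path

# Assumption: Hadoop 1.x layout; point this at wherever your hadoop-env.sh lives.
HADOOP_ENV = Path("/usr/local/hadoop/conf/hadoop-env.sh")
SUPPRESS_LINE = 'export HADOOP_HOME_WARN_SUPPRESS="TRUE"'


def ensure_line(path, line):
    """Append `line` to an existing file unless an identical line is already there.

    Returns True if the file was modified, False if the line was already present.
    Raises FileNotFoundError if the file does not exist, so a wrong path fails loudly.
    """
    path = Path(path)
    text = path.read_text()
    if line in (existing.strip() for existing in text.splitlines()):
        return False
    # Make sure the appended line starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + line + "\n")
    return True
```

Usage: run ensure_line(HADOOP_ENV, SUPPRESS_LINE) as a user allowed to write the file (for example hduser after the chown fix). Calling it a second time returns False and leaves the file unchanged.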

Python script for writing and reading large datasets

Below is a script that reads and writes large datasets saved as CSV files.

Writing datasets to a CSV file:

import csv

# writing data into csv file (Python 2 style: the file is opened in binary mode for the csv module)
writer = csv.writer(open('dataset.csv', 'wb', buffering=0))
writer.writerows([
    ('GOOG', 'Google Inc.', 123.44, 0.32, 0.09),
    ('YHOO', 'Yahoo! Inc.', 2.33, 99.23, 0.123),
    ('IBM', 'IBM Inc.', 223.44, 212.32, 6.42)
])
[…]
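
The script above is written for Python 2 (in Python 3, csv.writer needs a file opened in text mode, and buffering=0 is only allowed for binary files). As a hedged sketch assuming Python 3, the version below writes rows one at a time and reads them back lazily, so a large file is streamed instead of being loaded into memory. The function names write_rows, read_rows and column_total are hypothetical, chosen for illustration.

```python
import csv


def write_rows(path, rows):
    """Write an iterable of rows to `path` as CSV, one row at a time."""
    # newline="" is what the csv module documentation recommends for csv files.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)


def read_rows(path):
    """Yield rows from the CSV file at `path` one by one; values come back as strings."""
    with open(path, newline="") as f:
        for row in csv.reader(f):
            yield row


def column_total(path, index):
    """Sum one numeric column while streaming through the file."""
    return sum(float(row[index]) for row in read_rows(path))
```

Usage: write_rows('dataset.csv', [('GOOG', 'Google Inc.', 123.44, 0.32, 0.09), ...]) followed by column_total('dataset.csv', 2) sums the third column without building a list of all rows. Because write_rows accepts any iterable, the rows can themselves come from a generator.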

Running local mrjob streaming Hadoop jobs

Follow the steps below to run a local mrjob job. In this example I run an mrjob job that calculates word frequency.

Prerequisite: Python 2.6 or 2.7 must be installed for this to work.

Step 1: Download mrjob: https://github.com/Yelp/mrjob
Step 2: Navigate to Yelp/mrjob/examples in your terminal.
Step 3: Create a dataset: download a dataset from http://www.infochimps.com.
Step 4: […]
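
To see what a word-frequency job computes before installing anything, here is a standard-library sketch of the map, sort/shuffle and reduce phases that a streaming job goes through. It does not use mrjob's API. The functions mapper, reducer and run_local and the word regex are hypothetical, chosen for illustration. In mrjob itself, the mapper and reducer become methods on a subclass of MRJob, and the library handles the shuffle and runs the job locally or on Hadoop.

```python
import re
from itertools import groupby
from operator import itemgetter

# Simple word pattern chosen for illustration: letters, digits, underscores, apostrophes.
WORD_RE = re.compile(r"[\w']+")


def mapper(line):
    """Map phase: emit (word, 1) for every word in one input line."""
    for word in WORD_RE.findall(line):
        yield word.lower(), 1


def reducer(word, counts):
    """Reduce phase: sum all the counts emitted for one word."""
    yield word, sum(counts)


def run_local(lines):
    """Run map -> sort/shuffle -> reduce in a single process and return {word: count}."""
    mapped = [pair for line in lines for pair in mapper(line)]
    # Hadoop sorts mapper output by key so each reducer sees one key's values together.
    mapped.sort(key=itemgetter(0))
    results = {}
    for word, group in groupby(mapped, key=itemgetter(0)):
        for key, total in reducer(word, (count for _, count in group)):
            results[key] = total
    return results
```

Usage: with open('dataset.txt') as f: print(run_local(f)) prints the word counts for a (hypothetically named) text file. For example, run_local(["the cat sat", "The dog sat"]) returns {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}.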
