Posted on February 11, 2012
Below are some tips for running Hadoop on Ubuntu. If you hit other errors running Hadoop on Ubuntu, please comment with the problem and how you solved it.
Problem: Warning: $HADOOP_HOME is deprecated
Solution: add export HADOOP_HOME_WARN_SUPPRESS="TRUE" to hadoop-env.sh.
Problem: Cannot create directory /usr/local/hadoop/libexec/../logs
Solution: sudo chown -R hduser:hadoop /usr/local/hadoop/
Problem: Enter passphrase when running ./start-all.sh […]
Posted on January 31, 2012
Below is a script for reading from and writing to large datasets saved as CSV files.
Writing a dataset to a CSV file:
import csv
# write the data to a CSV file (Python 2: the file is opened in binary mode)
writer = csv.writer(open('dataset.csv', 'wb', buffering=0))
writer.writerows([
    ('GOOG', 'Google Inc.', 123.44, 0.32, 0.09),
    ('YHOO', 'Yahoo! Inc.', 2.33, 99.23, 0.123),
    ('IBM', 'IBM Inc.', 223.44, 212.32, 6.42)
])
[…]
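For readers on Python 3, here is a minimal sketch of the same write step plus a streaming read. It is not the original script: the helper names write_rows and read_rows and the temporary-directory path are assumptions made for illustration, and the read half is a guess at what the truncated post goes on to cover. Python 3 expects the file in text mode with newline="" instead of 'wb'.

```python
import csv
import os
import tempfile

# Sample rows taken from the post.
rows = [
    ("GOOG", "Google Inc.", 123.44, 0.32, 0.09),
    ("YHOO", "Yahoo! Inc.", 2.33, 99.23, 0.123),
    ("IBM", "IBM Inc.", 223.44, 212.32, 6.42),
]


def write_rows(path, rows):
    """Write rows to a CSV file (hypothetical helper, Python 3 text mode)."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def read_rows(path):
    """Yield one row at a time so a large file is never fully in memory."""
    with open(path, newline="") as f:
        for row in csv.reader(f):
            yield row


# Assumed location: the system temp directory, to avoid cluttering the cwd.
path = os.path.join(tempfile.gettempdir(), "dataset.csv")
write_rows(path, rows)
loaded = list(read_rows(path))
```

Note that csv.reader returns every field as a string, so numeric columns need an explicit float() conversion after reading.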
Posted on January 27, 2012
Follow the steps below to run a local mrjob job. In this example I run an mrjob job to calculate word frequency.
Prereq: Python 2.6 or 2.7 must be installed for this to work.
Step 1: Download mrjob: https://github.com/Yelp/mrjob
Step 2: Navigate to Yelp/mrjob/examples in your terminal.
Step 3: Create a dataset: download a dataset from http://www.infochimps.com.
Step 4: […]
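To show what a word-frequency job computes, here is a rough plain-Python sketch of the map and reduce phases. It does not use mrjob and is not the example shipped in the mrjob repository; the names mapper, reducer, and word_frequency and the sample lines are made up for illustration.

```python
import re
from collections import defaultdict

# A word is a run of letters, digits, underscores, or apostrophes.
WORD_RE = re.compile(r"[\w']+")


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in WORD_RE.findall(line):
        yield word.lower(), 1


def reducer(word, counts):
    """Reduce phase: sum all the counts collected for one word."""
    return word, sum(counts)


def word_frequency(lines):
    """Group mapper output by key, then reduce each group."""
    grouped = defaultdict(list)
    for line in lines:
        for word, one in mapper(line):
            grouped[word].append(one)
    return dict(reducer(word, counts) for word, counts in grouped.items())


# Hypothetical input standing in for a downloaded dataset.
freq = word_frequency(["the quick brown fox", "The lazy dog and the fox"])
```

A MapReduce framework performs the grouping step itself and can spread the mapper and reducer calls across machines; the sketch runs everything in one process.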