Running local mrjob streaming Hadoop jobs
Follow the steps below to run a local mrjob job. In this example I run an mrjob job that calculates word frequency.
Prereq: Python 2.6 or 2.7 must be installed for this to work.
Step 1. Download mrjob:
Step 2. Navigate to Yelp/mrjob/examples in your terminal
Step 3: Create a dataset. You can download one from http://www.infochimps.com.
Step 4: Test your environment and make sure mrjob works. Run:
This should complete with no errors or dependency issues.
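As a separate sanity check (not the command from the step above), here is a minimal sketch that reports whether the interpreter is at least Python 2.6 and whether a module can be imported. The function name check_environment is hypothetical, and the default module name mrjob assumes the package is importable under that name.

```python
import importlib
import sys


def check_environment(module_name="mrjob"):
    """Return (version_ok, module_ok) for the current interpreter.

    version_ok: the interpreter is Python 2.6 or newer.
    module_ok:  module_name can be imported (assumed name: "mrjob").
    """
    version_ok = sys.version_info[:2] >= (2, 6)
    try:
        importlib.import_module(module_name)
        module_ok = True
    except ImportError:
        module_ok = False
    return version_ok, module_ok


if __name__ == "__main__":
    version_ok, module_ok = check_environment()
    print("Python version OK: %s" % version_ok)
    print("mrjob importable:  %s" % module_ok)
```

If the second value comes back False, the install from Step 1 probably did not land in the Python environment you are running.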
Step 5: Run your mrjob job:
python mr_word_freq_count.py log1 > counts
log1 input was (note that each line was tab-delimited):
test one two three four five one two test
counts output was:
"five"	1
"four"	1
"one"	2
"test"	2
"three"	1
"two"	2
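To make the result less of a black box, here is a plain-Python sketch of the map, group, and reduce phases that a word frequency job goes through. It does not use mrjob and is not the code in mr_word_freq_count.py. The function names, the word regex, and the way the sample input is split into two lines are all assumptions for illustration.

```python
import json
import re
from collections import defaultdict

# Assumption: a word is a run of letters, digits, underscores, or apostrophes.
WORD_RE = re.compile(r"[\w']+")


def map_line(line):
    """Map phase: emit a (word, 1) pair for every word on the line."""
    for word in WORD_RE.findall(line):
        yield word.lower(), 1


def reduce_counts(word, counts):
    """Reduce phase: sum all the counts collected for one word."""
    return word, sum(counts)


def word_freq(lines):
    """Run map, group the pairs by word (the shuffle step), then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for word, count in map_line(line):
            grouped[word].append(count)
    # Sort the keys so the output order is deterministic.
    return [reduce_counts(word, grouped[word]) for word in sorted(grouped)]


def format_pairs(pairs):
    """Render each pair as JSON key, tab, JSON value, one pair per line."""
    return "\n".join(
        "%s\t%s" % (json.dumps(word), json.dumps(count)) for word, count in pairs
    )


if __name__ == "__main__":
    # Hypothetical line split of the sample input; fields are tab-delimited.
    sample = ["test\tone\ttwo\tthree\tfour", "five\tone\ttwo\ttest"]
    print(format_pairs(word_freq(sample)))
```

The counts do not depend on how the words are spread across lines, so any line split of the same nine words gives the same totals.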