Apex Code: Auto-Convert Web-to-Lead, Check Duplicates, Create Opportunity

Use Case: A web-to-lead form submission needs to be auto-converted to an Account, Contact and Opportunity. Create a trigger for the web-to-lead record type that checks whether a contact with that email already exists. If it does, just create an opportunity; otherwise convert the lead.

The trigger will look like this:

trigger autoConvertLead on Lead (before insert, after insert, after update) {
    List<Lead> marketoLeadsToBeConverted = new List<Lead>();
    Set<String> emailSet = new Set<String>();
    Boolean isChanged = false;
    Lead oldLeadInfo = null;
    Set<Id> rLDBLeadIdSet = new Set<Id>();
    Map<String, Lead> mobileFALMap = new Map<String, Lead>();

    // Query the record type once, outside the loop, to stay within governor limits
    Id mobileFALRecordTypeId = [SELECT Id FROM RecordType WHERE Name = 'Mobile FAL Leads' LIMIT 1].Id;

    for (Lead leadInfo : Trigger.new) {
        isChanged = false;
        if (Trigger.isAfter) {
            if (!leadInfo.IsConverted && leadInfo.Email != null) {
                if (Trigger.isInsert) {
                    isChanged = true;
                } else if (Trigger.isUpdate) {
                    oldLeadInfo = Trigger.oldMap.get(leadInfo.Id);
                    isChanged = Utils.hasChanges('Email', oldLeadInfo, leadInfo);
                }
                if (isChanged && leadInfo.RLDB__c) {
                    if (leadInfo.RecordTypeId == mobileFALRecordTypeId) {
                        mobileFALMap.put(leadInfo.Email, leadInfo);
                    }
                }
            }
        } else {
            // before insert: stamp the custom created date
            leadInfo.Created_Date__c = System.today();
        }
    }

    if (marketoLeadsToBeConverted != null && marketoLeadsToBeConverted.size() > 0) {
        // ... convert Marketo leads
    }
    if (rLDBLeadIdSet != null && rLDBLeadIdSet.size() > 0) {
        // ... process RLDB leads
    }
    if (!mobileFALMap.isEmpty()) {
        // ... hand Mobile FAL leads off for conversion (see the Apex class below)
    }
}

The Apex class below checks for duplicate emails, converts the leads and creates the opportunities.

public static void convertMobileFALLeads(Map<String, Lead> mobileFALLeads) {
    Set<String> checkDuplicateEmails = new Set<String>();
    checkDuplicateEmails.addAll(mobileFALLeads.keySet());
    Map<String, Contact> contactMap = getContactsMatchingEmailsGiven(checkDuplicateEmails);
    List<Opportunity> createPlanOpportunity = new List<Opportunity>();

    // A contact with this email already exists: skip conversion, just add an opportunity
    for (String duplicateEmailContact : mobileFALLeads.keySet()) {
        if (contactMap.containsKey(duplicateEmailContact)) {
            Contact matchedContact = contactMap.get(duplicateEmailContact);
            createPlanOpportunity.add(ACRoundRobinV5.CreatePlanOpportunityForIncorpOpp(
                matchedContact.AccountId, matchedContact.Account.OwnerId,
                'Pro Legal Plan - Annual (A)', 'Mobile FAL'));
        }
    }

    List<Database.LeadConvert> listOflc = new List<Database.LeadConvert>();
    String convertedStatus = getLeadConvertionStatus();

    // No matching contact: queue the lead for conversion
    for (Lead leadsToBeConverted : mobileFALLeads.values()) {
        if (!contactMap.containsKey(leadsToBeConverted.Email)) {
            Database.LeadConvert lc = new Database.LeadConvert();
            lc.setLeadId(leadsToBeConverted.Id);
            lc.setConvertedStatus(convertedStatus);
            listOflc.add(lc);
        }
    }

    for (Database.LeadConvert convertLead : listOflc) {
        Database.LeadConvertResult mobileFalLCR = Database.convertLead(convertLead);
        createPlanOpportunity.add(ACRoundRobinV5.CreatePlanOpportunityForIncorpOpp(
            mobileFalLCR.getAccountId(), convertLead.getOwnerId(),
            'Pro Legal Plan - Annual (A)', 'Mobile FAL'));
    }
    insert createPlanOpportunity;
}
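Language aside, the decision the trigger and class implement is simple. Below is a minimal Python sketch of the routing rule, for illustration only; the function name and return values are hypothetical and not part of the Salesforce code:

```python
def route_lead(lead_email, existing_contact_emails):
    """If a contact with this email already exists, only create an
    opportunity; otherwise convert the lead first, then create one.
    (Illustrative sketch of the Apex logic above, not Apex code.)"""
    if lead_email in existing_contact_emails:
        return "create_opportunity"
    return "convert_lead_then_create_opportunity"

# Example: one known contact, one brand-new lead
print(route_lead("jane@example.com", {"jane@example.com"}))  # create_opportunity
print(route_lead("new@example.com", {"jane@example.com"}))   # convert_lead_then_create_opportunity
```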

PyCon2012 Intermediate Python Solutions


import types
import unittest

class TestFunctional(unittest.TestCase):
    def test_functional(self):
        # Lambda
        # create a lambda statement that adds 2 to its input
        # assign the statement to a variable named ``add_2``
        # ================================
        add_2 = lambda a: a + 2
        self.assert_(isinstance(add_2, types.LambdaType))
        self.assertEquals(add_2(4), 6)

        # Lamda 2
        # Create an ``is_odd`` lambda statement,
        # that returns True if the input is odd
        # ================================
        is_odd = lambda a: a % 2 != 0
        self.assert_(isinstance(is_odd, types.LambdaType))
        self.assertEquals(is_odd(5), True)
        self.assertEquals(is_odd(4), False)

        # Map
        # Create a list ``digits`` with the numbers from 0 to 9
        # Create a new list ``two_more`` from digits by
        # mapping ``add_2`` to the elements of digits
        # ================================
        digits = range(0, 10)
        two_more = map(add_2, digits)
        self.assertEquals(digits, range(0, 10))
        self.assertEquals(two_more, range(2,12))

        # Reduce
        # Add the values of digits by using
        # ``reduce`` with the ``operator.add``
        # function, store the result in ``digit_sum``
        # ================================
        import operator
        digit_sum = reduce(operator.add, digits)
        self.assertEquals(digit_sum, 45)

        # Filter
        # use ``filter`` to get the odd digits of
        # ``two_more``, store results in ``two_odd``
        # ================================
        two_odd = filter(lambda x: x % 2 != 0, two_more)
        self.assertEquals(two_odd, [3, 5, 7, 9, 11])

if __name__ == '__main__':
    unittest.main()



import types
import unittest

class TestClosures(unittest.TestCase):
    def test_closure(self):
        # Closure
        # Create a function ``echo_creator`` that returns a function that returns what was passed into it
        # ================================
        def echo_creator():
            def echo_in(innum):
                return innum
            return echo_in
        echo = echo_creator()
        self.assert_(isinstance(echo, types.FunctionType))
        self.assertEquals(echo(5), 5)
        self.assertEquals(echo("foo"), "foo")

        # Closure 2
        # Create a function ``mult_factory`` that accepts
        # a number and returns a function that multiples its
        # input by that number
        # ================================
        def mult_factory(num):
            def mul_num(mulnum):
                return num * mulnum
            return mul_num
        mult5 = mult_factory(5)
        self.assert_(isinstance(mult5, types.FunctionType))
        self.assertEquals(mult5(5), 25)
        self.assertEquals(mult5("f"), "fffff")

if __name__ == '__main__':
    unittest.main()


import unittest
import types

class TestIterators(unittest.TestCase):
    def test_iterators(self):
        # Iterators
        # Create a list ``a_list`` with [0,1,2] in it.
        # Get an iterator ``an_iter`` for ``a_list`` by calling
        # the ``iter`` function on it
        # ================================
        a_list = [0, 1, 2]
        an_iter = iter(a_list)
        self.assertEquals(next(an_iter), 0)
        self.assertEquals(next(an_iter), 1)
        self.assertEquals(next(an_iter), 2)
        self.assertRaises(StopIteration, next, an_iter)

        # Iterators
        # Create a list, ``b_list`` with [4, 3, 2] in it.
        # Store an iterator for ``b_list`` in ``b_iter``.
        # Take 2 items off the iterator by calling ``next()`` on it.
        # ================================
        b_list = [4, 3, 2]
        b_iter = iter(b_list)
        next(b_iter)
        next(b_iter)
        self.assertEquals(next(b_iter), 2)
        self.assertRaises(StopIteration, next, b_iter)

if __name__ == '__main__':
    unittest.main()

Blog hits 100 000 visits

I am super stoked that geeks, developers and techies of all stripes have found the blog useful. I have tried to keep the posts current and give as much info in each tutorial as possible so you can try it at home or at work.

My plan for this year is to increase the amount of posts and also make the posts more relevant so you can get more value from it. My mission is to become a well rounded software developer and architect. I am passionate about software and love to work on new projects.

I want to thank everyone who commented on my blog, sent me an email or just connected with me on LinkedIn. I hope I could help and answer your questions. If there is anything specific you want me to blog about, please send me an email.

Salesforce Integration with Pervasive Data Cloud, WebSphere Cast Iron, Informatica Cloud Services and Dell Boomi

The need for integration

With the accelerating growth of Cloud Computing, many companies are getting stuck in migrating, integrating and extracting information from their on-premise to their cloud environment and vice versa.

Given these concerns regarding integration, it is not surprising that many IT organizations have felt hesitation about, if not outright rejection of, Cloud applications. According to a Gartner study conducted in 2009 on why many CIOs were actually transitioning away from Cloud deployments, 56% responded it was due to the impact of integration requirements on their systems. So the complexity of integrating applications has clearly been a limiting factor in the adoption and implementation of Cloud solutions.

To realize the full power of force.com, integration with on-premise databases and other Cloud-based solutions must be simple yet complete.  It must be possible to implement solutions in days, not weeks or months. At the same time, the solution needs the sophistication required to harmonize business processes across multiple cloud and on-premise applications. The integration solution should be able to run anywhere, connect applications deployed anywhere, be managed from anywhere and require limited specialist integration skills or IT infrastructure. These solutions must be easily configurable, flexible and scalable, meaning no coding.

Integration challenges

Below are some of the integration challenges that customers face when integrating on-premise applications with Cloud applications:

  • Firewall mediation – How do you open up the firewall for integrating force.com applications with on-premise apps?
  • Security – How do you encrypt and otherwise protect sensitive information, stored or on the move, Cloud or on-premise?
  • Semantic mediation – How do you account for differences in data structure between the source and target?
  • Performance required – How fast do you need to move data and how quickly does data transformation and routing mechanisms need to function?
  • Data integrity – How do you make sure the right data is delivered to the right target at the right time? This includes ensuring that the data is clean when it arrives at the target database.
  • Maintenance and upgrades – How do you support new Cloud or enterprise system interfaces as they evolve?
  • Governance – How do you monitor all points of integration and log the data being synchronized?

Integration options to consider

There are three broad options to consider, each with trade-offs:
Custom development

If an organization has enough IT resources and programmers to create a one-off custom integration, this can often be a viable solution.

The trade-off is a number of resource-intensive hidden costs in maintenance, support, and any future changes should the need arise to grow the solution to integrate more applications.


On-demand solution specializing in cloud-to-cloud connectivity

This is a low-cost alternative for simple cloud integration projects.

The trade-off is limited scalability and functionality for on-premise or hybrid scenarios: pure on-demand point solutions are not equipped to handle complex processes and back-office applications.

Traditional on-premise solution

Based on a more classic ETL architecture, designed for extracting, processing and storing large quantities of data.

The trade-off is a longer install and implementation time as well as a much larger IT footprint; companies may end up purchasing and maintaining two or more complex systems to solve one problem.

Specific integration tool examples

The following are examples of the force.com integration tools available:

  • Pervasive Data Integrator
  • WebSphere Cast Iron Appliance
  • Informatica Cloud Services
  • Dell Boomi

Pervasive Data Cloud

Pervasive offers highly efficient and reliable force.com integration with ERP, HR and other systems without custom coding, extensive software libraries or a big price tag.

Pervasive can connect force.com to all your data – integrating with your accounting, ERP, SaaS, MIS and any other mission-critical business application. Pervasive can connect, migrate and integrate your data and provides flexible delivery options.

When to choose Pervasive:

  • Data migration
  • Batch/Real-time processing integration
  • Advanced workflow

When not to choose Pervasive:

  • Limited deployment models, as the Pervasive cloud offering is not multi-tenant
  • Limited design capability in the cloud, and no secure connector to on-premise applications
  • Limited APIs made available for 3rd-party product integration

WebSphere Cast Iron

Many enterprises need to synchronize a master list of their current customers, products, prices and all their transaction history between force.com and corporate on-premise systems. WebSphere Cast Iron integration for force.com is a fast, simple solution specifically for integrating force.com with other applications. With a few clicks and configurations you can migrate and integrate your current on-premise applications with force.com.

WebSphere Cast Iron has multiple force.com adapters that provide the capability to migrate, integrate or extract information in a fast and secure manner. WebSphere Cast Iron provides the following options for data migration and data quality:

  • Data profiling – Assess the quality of the data before migrating it.
  • Data cleansing – Remove duplicates from various sources and setup validation rules.
  • Data enrichment – Perform lookups with external providers to enrich data.

It provides the following options for integration and extraction:

  • Connectivity – Configurable connectivity between on-premise applications and force.com.
  • Transformation – Drag and drop user interface for data transformation.
  • Workflow – Visual interface for designing workflow rules.
  • Management – Easy manageability through a single web-based console.

When to use WebSphere Cast Iron:

  • Implementing multiple deployment models
  • Batch/Real time process integration
  • Enterprise connectivity – multiple enterprise adapters for popular enterprise software.
  • UI Mashup
  • Template Development Kit

When not to use WebSphere Cast Iron:

  • Limited data migration functionality, which can only migrate data to and from specific software, not all
  • Limited data quality functionality to clean data to and from the cloud

Informatica Cloud Services

Informatica Cloud Services are specifically designed to meet the data integration needs of line-of-business users. Informatica Cloud Services are based entirely "in the cloud", which makes integrating cloud-based data with force.com quick and easy. It can also be used to synchronize and replicate data between local databases and files.

Informatica Cloud Services can be used to integrate SaaS applications with a variety of common on-premise systems and databases. The tools are delivered as a cloud service and require very little training to set up and administer.

When to use Informatica Cloud Services:

  • Batch processing
  • High availability
  • Web Service Integration
  • Advance Workflow

When not to use Informatica Cloud Services:

  • Real-time integration is needed
  • A secure cloud connection requires installing a security agent
  • Multiple deployment models are required
  • UI mashups are required
  • Multiple environments are required

Dell Boomi

The Dell IT group used the Boomi AtomSphere® application to unify its force.com instances, enabling fully integrated and synchronized customer information across sales groups and business processes.

When you choose Dell Boomi you get:

  • Batch Process Integration
  • Real-time Process Integration
  • Data Migration
  • Advance Workflow
  • Web Services
  • High Availability

Integration Vendor Comparison Chart

The table below gives a breakdown comparison of the different force.com integration tools available:

Capabilities                     Pervasive   WebSphere Cast Iron   Informatica Cloud Services   Boomi
Multiple Deployment Models                   x
Data Migration                   x           x                     x                            x
Batch Process Integration        x           x                     x                            x
Real-time Process Integration    x           x                     x                            x
Advanced Workflow                x                                 x                            x
Enterprise Connectivity          x           x                                                  x
Data Quality                     x           x                     x                            x
UI Mashup                        x           x                                                  x
Multiple Environments                        x                                                  x
Template Development Kit                     x
Web Service API Gateway                      x                     x                            x
Management APIs                              x                                                  x
High Availability                x           x                     x                            x

What to look for when evaluating a cloud integration solution

Below are some of the factors that need to be considered when choosing an integration solution:

Cloud Area                    Pervasive Data Cloud   WebSphere Cast Iron   Informatica Cloud Services   Dell Boomi
Design in the Cloud                                  x                     x                            x
Manage in the Cloud           x                      x                     x                            x
Run in the Cloud              x                      x                     x                            x
Multi-tenant Cloud platform                          x                     x                            x
Cloud to on-premise data                             x                                                  x
APIs for the Cloud                                   x                                                  x
Connector kit                                        x                     x


To get the most value out of your current IT investments, you need to be able to integrate existing applications with force.com applications.

The solutions described above meet the need of today’s businesses by providing a simplified, fast, and low-cost approach to integration projects, with the flexibility to deploy integrations in the cloud or on premise, and the option to change form factors if needed.

Visit to Computer History Museum

This weekend I decided to cycle down the road and visit the Computer History Museum in Mountain View. Wow, what a great place, showcasing some of the oldest computers. One that amazed me was the Babbage Difference Engine No. 2. Babbage designed it over 150 years ago, but the design sat in a museum in London without ever being built, until a few years ago when a group of historians decided to build it.

See the clip of it in action:


Some cool photos from the museum:

Really a place to visit if you are a geek or interested in learning about the evolution of the computer.

WebSphere Application Server Deployment Strategies

If you are thinking of deploying WAS clusters or migrating clusters, there are 5 ways you can do it. Below I look at the 5 strategies and which one is best for each scenario.

Strategy 1: Side by Side – Create a new cell and populate it with scripts or manually. Gives the best results with a comprehensive set of scripts or tools for configuration automation.

Strategy 2: In Place – Copy and replace the cell – Recreates the exact v6.x/v7.0 configuration in v8.0. The deployment manager is migrated to the new cell.

Strategy 3: In Place – Copy and replace the DMgr – Recreates the exact v6.x/v7.0 configuration in v8.0. Add new v8.0 nodes and migrate incrementally by adding each new node to the deployment manager and replacing the old node.

Strategy 4: In Place – Copy and coexist – Recreates the exact v6.x/v7.0 configuration in v8.0. Modify the ports in the new cell and coexist.

Strategy 5: Side by Side – Fine Grained – Create a new cell and incrementally copy the existing configuration. Uses an intermediate profile, runtime migration and PBC tools.

Starting, Interacting and Examples of HDFS

This is a follow-up to the previous post:

Starting Hadoop Distributed File System

Now we will format the file system we just configured. Important: this process should only be performed once.

bin/hadoop namenode -format

Now we can start HDFS:

bin/start-dfs.sh

The start-dfs.sh script starts the NameNode server on the master machine and also starts the DataNode on the slave machines. Note: in a single-machine instance, the NameNode and DataNode run on the same machine. In a clustered HDFS you would otherwise have to ssh into each slave and start the DataNode.

Interacting with Hadoop Distributed File System

The bulk of the commands that communicate with the cluster are performed by a monolithic script named bin/hadoop. It loads the Hadoop system in the Java virtual machine and executes a user command. The commands are specified in the following form:

bin/hadoop moduleName -cmd args...

moduleName : which subset of Hadoop functionality to use (e.g. dfs)

cmd : the command to execute within that module
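To script these calls from Python, a simple helper can build the argv list in the bin/hadoop moduleName -cmd args... shape shown above and hand it to subprocess. The helper names and layout here are illustrative, not part of Hadoop, and this assumes the bin/hadoop script path is valid from your working directory:

```python
import subprocess

HADOOP = "bin/hadoop"  # adjust to the location of your Hadoop install

def hadoop_argv(module, cmd, *args):
    """Build the argv list for `bin/hadoop moduleName -cmd args...`."""
    return [HADOOP, module, "-" + cmd] + list(args)

def run_hadoop(module, cmd, *args):
    """Invoke the hadoop script and return its stdout."""
    return subprocess.check_output(hadoop_argv(module, cmd, *args))

# Example: list the root of HDFS (equivalent to `bin/hadoop dfs -ls /`)
# run_hadoop("dfs", "ls", "/")
```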

Examples of Hadoop Distributed File System

Listing files:

 bin/hadoop dfs -ls /

Insert Data into the Cluster (3 steps):
Step 1: Create a home directory for your user

bin/hadoop dfs -mkdir /user/username

Step 2: Put file to cluster

bin/hadoop dfs -put /home/username/DataSet.txt /user/username/

Step 3: Verify the file is in HDFS

bin/hadoop dfs -ls /user/yourUserName

Uploading multiple files at once (specify directory to upload):

bin/hadoop dfs -put /myfiles /user/username

Note: Another synonym for -put is -copyFromLocal.

Display Files in HDFS:

bin/hadoop dfs -cat file

(displays the file contents in the console)

Copy File from HDFS:

bin/hadoop dfs -get file localfile

Ubuntu Update Manager not working – Failed to download repository information

If your Ubuntu update manager is not working and you get the following message:
“Failed to download repository information”
run the following commands:

sudo apt-get clean
cd /var/lib/apt
sudo mv lists lists.old
sudo mkdir -p lists/partial
sudo apt-get clean
sudo apt-get update

Running mrjob on Amazon Elastic MapReduce

Below are the steps to run your first Amazon Elastic MapReduce job on Amazon EC2.

The first step is to make sure you have completed the steps specified in my previous post:

Ok let’s start:
Step 1: Create a new file called mrjob.conf. The location of the file is important; mrjob searches these locations in order:

  • The location specified by MRJOB_CONF
  • ~/.mrjob.conf
  • ~/.mrjob (deprecated)
  • mrjob.conf in any directory in PYTHONPATH (deprecated)
  • /etc/mrjob.conf

I created mine at /home/thys_michels/.mrjob.conf

Step 2: Below is the mrjob.conf explained. Comment out the lines that do not apply to you and modify parameters where necessary.

Note: Sample of mrjob.conf can be downloaded from: https://github.com/Yelp/mrjob/blob/master/mrjob.conf.example

Below is my .mrjob.conf. The lines I changed are annotated with comments, and I commented out a lot of the lines.

# This is basically the config file we use in production at Yelp, with some
# strategic edits. ;)
# If you don't have the yaml module installed, you'll have to use JSON instead,
# which would look something like this:
# {"runners": {
#   "emr": {
#     "aws_access_key_id": "HADOOPHADOOPBOBADOOP",
#     "aws_region": "us-west-1",
#     "base_tmp_dir": "/scratch/$USER",
#     "bootstrap_python_packages": [
#       "$BT/aws/python-packages/*.tar.gz"
#     ],
#     ...
runners:
  emr:
    aws_access_key_id: ### 
    # See Step 3 on how to create an AWS Access key
    # We run in the west region because we're located on the west coast,
    # and there are no eventual consistency issues with newly created S3 keys.
    aws_region: us-west-1
    # make sure your keys are created in the same aws_region
    aws_secret_access_key: ### 
    # see Step 3 on how to create an access key and access your secret key
    # alternate tmp dir
    base_tmp_dir: /scratch/$USER
    # make sure you have privileges on the /scratch directory
    # $BT is the path to our source tree. This lets us add modules to
    # install on EMR by simply dumping them in this dir.
    ## $BT/aws/python-packages/*.tar.gz
    # specifying an ssh key pair allows us to ssh tunnel to the job tracker
    # and fetch logs via ssh
    ec2_key_pair: mrjobkey2
    ec2_key_pair_file: /home/thys_michels/Documents/mrjobkey2.pem
    # See Step 4 to create key pairs
    # use beefier instances in production
    ec2_instance_type: m1.small
    # make sure to change this from c1.xlarge to m1.small if you are running
    # small MapReduce jobs, as you will be charged more for an xlarge instance
    # but only use one unless overridden
    num_ec2_instances: 1
    # use our local time zone (this is important for deciding when
    # days start and end, for instance)
    TZ: America/Los_Angeles
    # confirm your keys and images are created in this region
    # we create the src-tree.tar.gz tarball with a Makefile. It only contains
    # a subset of our code
    ##python_archives: &python_archives
    ##- $BT/aws/src-tree.tar.gz
    # our bucket also lives in the us-west region
    s3_log_uri: s3://mrbucket1/
    s3_scratch_uri: s3://mrbucket1/tmp/
    # Create this bucket and one tmp folder inside it. Make sure your bucket
    # is in the same region as your keys.
    ##setup_cmds: &setup_cmds
    # these files are different between dev and production, so they're
    # uploaded separately. copying them into place isn't safe because
    # src-tree.tar.gz is actually shared between several mappers/reducers.
    # Another safe approach would be to add a rule to Makefile.emr that
    # copies these files if they haven't already been copied (setup_cmds
    # from two mappers/reducers won't run simultaneously on the same machine)
    ##- ln -sf $(readlink -f config.py) src-tree.tar.gz/config/config.py
    ##- ln -sf $(readlink -f secret.py) src-tree.tar.gz/config/secret.py
    # run Makefile.emr to compile C code (EMR has a different architecture,
    # so we can't just upload the .so files)
    ##- cd src-tree.tar.gz; make -f Makefile.emr
    # generally, we run jobs on a Linux server separate from our desktop
    # machine. So the SSH tunnel needs to be open so a browser on our
    # desktop machine can connect to it.
    ssh_tunnel_is_open: true
    ssh_tunnel_to_job_tracker: true
    # upload these particular files on the fly because they're different
    # between development and production
    ##upload_files: &upload_files
    ##- $BT/config/config.py
    ##- $BT/config/secret.py
  # Note the use of YAML references to re-use parts of the EMR config.
  # We don't currently run our own hadoop cluster, so this section is
  # pretty boring.
  hadoop:
    base_tmp_dir: /scratch/$USER
    ##python_archives: *python_archives
    ##setup_cmds: *setup_cmds
    ##upload_files: *upload_files
  # We don't have gcc installed in production, so if we have to run an
  # MRJob in local mode in production, don't run the Makefile
  # and whatnot; just fall back on the original copy of the code.
  local:
    base_tmp_dir: /scratch/$USER

Step 3: Creating your AWS Access Key. Log in to AWS and navigate to My Account > Security Credentials.
Click on ‘Create New Access Key’. It will create an Access Key ID and a Secret Access Key. Assign them as follows in your mrjob.conf:
aws_access_key_id = Access Key ID
aws_secret_access_key = Secret Access Key

Step 4: Create a key pair. Navigate to AWS Management Console > EC2 tab:
Confirm you are in the right region before you create your key pair.
Click the ‘Create Key Pair’ button and give your key a name.
Important: you will only have one chance to download your key pair. Download the .pem file after it has been created and save it somewhere safe.

Specify your pem file in your mrjob.conf as follows:
ec2_key_pair = name of key value pair
ec2_key_pair_file = location of .pem file

Make sure you have read access on your pem file; run chmod 0400 on it if you are not sure.

After you have done all of this, it is time to run your mrjob job. Use the following command:
python mr_word_freq_count.py log1 -r emr > counts
You will see the following output:

using configs in /home/thys_michels/.mrjob.conf
Uploading input to s3://mrbucket1/tmp/mr_word_freq_count.root.20120127.202504.109284/input/
creating tmp directory /scratch/root/mr_word_freq_count.root.20120127.202504.109284
writing master bootstrap script to /scratch/root/mr_word_freq_count.root.20120127.202504.109284/b.py
Copying non-input files into s3://mrbucket1/tmp/mr_word_freq_count.root.20120127.202504.109284/files/
Waiting 5.0s for S3 eventual consistency
Creating Elastic MapReduce job flow
Job flow created with ID: j-2Y1JXJT4FKQ7Y
Job launched 30.3s ago, status STARTING: Starting instances
Job launched 61.8s ago, status STARTING: Starting instances
Job launched 92.1s ago, status STARTING: Starting instances
Job launched 122.4s ago, status STARTING: Starting instances
Job launched 152.7s ago, status STARTING: Starting instances
Job launched 183.0s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 213.3s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 243.8s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 274.1s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 304.6s ago, status RUNNING: Running step (mr_word_freq_count.root.20120127.202504.109284: Step 1 of 1)
Opening ssh tunnel to Hadoop job tracker
Connect to job tracker at: http://ubuntu:40053/jobtracker.jsp
Job launched 336.3s ago, status RUNNING: Running step (mr_word_freq_count.root.20120127.202504.109284: Step 1 of 1)
 map 100% reduce 100%
Job launched 367.0s ago, status RUNNING: Running step (mr_word_freq_count.root.20120127.202504.109284: Step 1 of 1)
 map 100% reduce 100%
Job completed.
Running time was 52.0s (not counting time spent waiting for the EC2 instances)
Fetching counters from S3...
Waiting 5.0s for S3 eventual consistency
Counters from step 1:
    S3_BYTES_READ: 78
  Job Counters :
    Launched map tasks: 2
    Launched reduce tasks: 1
    Rack-local map tasks: 2
  Map-Reduce Framework:
    Combine input records: 9
    Combine output records: 9
    Map input bytes: 51
    Map input records: 9
    Map output bytes: 78
    Map output records: 9
    Reduce input groups: 6
    Reduce input records: 9
    Reduce output records: 6
    Reduce shuffle bytes: 127
    Spilled Records: 18
Streaming final output from s3://mrbucket1/tmp/mr_word_freq_count.root.20120127.202504.109284/output/
removing tmp directory /scratch/root/mr_word_freq_count.root.20120127.202504.109284
Removing all files in s3://mrbucket1/tmp/mr_word_freq_count.root.20120127.202504.109284/
Removing all files in s3://mrbucket1/j-2Y1JXJT4FKQ7Y/
Killing our SSH tunnel (pid 17859)
Terminating job flow: j-2Y1JXJT4FKQ7Y

Cool, you can open your job tracker as seen in the logs: http://ubuntu:40053/jobtracker.jsp
You will see a nice breakdown of your MapReduce job:
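For reference, the mr_word_freq_count.py job used above ships with mrjob as an example: conceptually, its mapper emits a (word, 1) pair per word and its reducer sums the counts per word. Below is a plain-Python sketch that simulates that map/shuffle/reduce flow locally; the function names are illustrative and this is not the mrjob code itself:

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[\w']+")

def mapper(line):
    """Emit a (word, 1) pair for every word in the input line."""
    for word in WORD_RE.findall(line):
        yield word.lower(), 1

def reducer(word, counts):
    """Sum all counts emitted for one word."""
    return word, sum(counts)

def word_freq(lines):
    """Simulate the shuffle: group mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for word, one in mapper(line):
            grouped[word].append(one)
    return dict(reducer(w, c) for w, c in grouped.items())

print(word_freq(["hello world", "hello emr"]))
# {'hello': 2, 'world': 1, 'emr': 1}
```

On EMR the grouping happens across machines during the shuffle phase, but the per-word result is the same.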