Unit testing a Java Hadoop job

26 Aug


In my previous post I showed how to set up a complete Maven-based project to create a Hadoop job in Java. Of course it wasn’t really complete, because it was missing the unit test part :-). In this post I show how to add MapReduce unit tests to the project I started previously. For the unit tests I use the MRUnit framework.

  • Add the necessary dependency to the pom
    Add the following dependency to the pom:

    <dependency>
       <groupId>org.apache.mrunit</groupId>
       <artifactId>mrunit</artifactId>
       <version>1.0.0</version>
       <classifier>hadoop1</classifier>
       <scope>test</scope>
    </dependency>
    


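    This makes the MRUnit framework available to the project. Because of the test scope it is only on the classpath when compiling and running the tests, and the hadoop1 classifier selects the MRUnit build for the Hadoop 1.x API.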

  • Add unit tests for the MapReduce logic
    Using this framework is quite straightforward, especially for our simple business case. So I will just show the unit test code, with some comments where necessary; I think it is quite obvious how to use it.
    The unit test for the Mapper, ‘MapperTest’:

    package net.pascalalma.hadoop;
    
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Before;
    import org.junit.Test;
    import java.io.IOException;
    
    /**
     * Created with IntelliJ IDEA.
     * User: pascal
     */
    public class MapperTest {
    
        MapDriver<Text, Text, Text, Text> mapDriver;
    
        @Before
        public void setUp() {
            WordMapper mapper = new WordMapper();
            mapDriver = MapDriver.newMapDriver(mapper);
        }
    
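        @Test
        public void testMapper() throws IOException {
            // Three input key/value pairs for the mapper
            mapDriver.withInput(new Text("a"), new Text("ein"));
            mapDriver.withInput(new Text("a"), new Text("zwei"));
            mapDriver.withInput(new Text("c"), new Text("drei"));
            // WordMapper passes every pair through unchanged, so the expected output equals the input
            mapDriver.withOutput(new Text("a"), new Text("ein"));
            mapDriver.withOutput(new Text("a"), new Text("zwei"));
            mapDriver.withOutput(new Text("c"), new Text("drei"));
            // Runs the mapper and fails the test if its actual output differs from the expected output
            mapDriver.runTest();
        }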
    }
    

    This test class is actually even simpler than the Mapper implementation itself. You just define the input of the mapper and the expected output, and then let the configured MapDriver run the test. In our case the Mapper doesn’t do anything specific, but you can see how easy it is to set up a test case.
    For completeness here is the test class of the Reducer:

    package net.pascalalma.hadoop;
    
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
    import org.junit.Before;
    import org.junit.Test;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    
    /**
     * Created with IntelliJ IDEA.
     * User: pascal
     */
    public class ReducerTest {
    
        ReduceDriver<Text, Text, Text, Text> reduceDriver;
    
        @Before
        public void setUp() {
            AllTranslationsReducer reducer = new AllTranslationsReducer();
            reduceDriver = ReduceDriver.newReduceDriver(reducer);
        }
    
        @Test
        public void testReducer() throws IOException {
            // All values that arrive at the reducer for key "a"
            List<Text> values = new ArrayList<Text>();
            values.add(new Text("ein"));
            values.add(new Text("zwei"));
            reduceDriver.withInput(new Text("a"), values);
            // AllTranslationsReducer joins the values into one string, each value prefixed with '|'
            reduceDriver.withOutput(new Text("a"), new Text("|ein|zwei"));
            reduceDriver.runTest();
        }
    }
    
  • Run the unit tests
    With the Maven command “mvn clean test” we can run the tests:
    [Screenshot: output of the “mvn clean test” run]
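The two tests above check the mapper and the reducer in isolation. MRUnit also provides a MapReduceDriver that runs a mapper and a reducer together, with a simulated shuffle in between. To illustrate what such an end-to-end test verifies, here is a minimal plain-Java sketch that needs neither Hadoop nor MRUnit. It assumes that WordMapper emits its input pairs unchanged and that AllTranslationsReducer prefixes every value with ‘|’ and concatenates them; both behaviours are inferred from the expected outputs in the tests above, not taken from the actual classes, and the class name TranslationPipelineSketch is hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical plain-Java sketch (no Hadoop, no MRUnit) of the
 * map -> shuffle -> reduce flow for the translation example.
 * The map and reduce logic is ASSUMED from the expected outputs
 * in MapperTest and ReducerTest, not copied from the real classes.
 */
public class TranslationPipelineSketch {

    // Assumed WordMapper behaviour: emit every key/value pair unchanged.
    static List<Map.Entry<String, String>> map(List<Map.Entry<String, String>> input) {
        return new ArrayList<>(input);
    }

    // Simulated shuffle: group values by key, with keys in sorted order.
    // Values keep their input order here; real Hadoop does not guarantee that.
    static TreeMap<String, List<String>> shuffle(List<Map.Entry<String, String>> mapped) {
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> pair : mapped) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Assumed AllTranslationsReducer behaviour: prefix each value with '|' and concatenate.
    static String reduce(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String value : values) {
            sb.append('|').append(value);
        }
        return sb.toString();
    }

    // Runs map, shuffle and reduce; returns (key, reduced value) pairs in key order.
    static List<Map.Entry<String, String>> run(List<Map.Entry<String, String>> input) {
        List<Map.Entry<String, String>> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : shuffle(map(input)).entrySet()) {
            output.add(new SimpleEntry<>(group.getKey(), reduce(group.getValue())));
        }
        return output;
    }

    static Map.Entry<String, String> pair(String key, String value) {
        return new SimpleEntry<>(key, value);
    }

    public static void main(String[] args) {
        // Same input records as in MapperTest.
        List<Map.Entry<String, String>> input = Arrays.asList(
                pair("a", "ein"), pair("a", "zwei"), pair("c", "drei"));
        List<Map.Entry<String, String>> expected = Arrays.asList(
                pair("a", "|ein|zwei"), pair("c", "|drei"));
        List<Map.Entry<String, String>> actual = run(input);
        // Compare actual and expected output pairs, in order.
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
        System.out.println("Pipeline output: " + actual);
    }
}
```

Running main prints “Pipeline output: [a=|ein|zwei, c=|drei]”. Keeping the map and reduce logic in small plain methods like these is one way to test it very quickly; the MRUnit tests are still what confirms that the real Hadoop classes behave the same way.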

With the unit tests in place I would say we are ready to build the project and deploy it to a Hadoop cluster, which I will describe in the next post.
