Tag Archives: Hadoop

Running PageRank Hadoop job on AWS Elastic MapReduce

In a previous post I described an example PageRank calculation with Apache Hadoop, taken from the Mining Massive Datasets course. In that post I took an existing Hadoop job in Java and modified it somewhat … Continue reading

Posted in AWS, Hadoop | 4 Comments

Calculate PageRanks with Apache Hadoop

Currently I am following the Coursera course ‘Mining Massive Datasets’. I have been interested in MapReduce and Apache Hadoop for some time, and with this course I hope to gain more insight into when and how MapReduce can help to … Continue reading

Posted in Hadoop, MapReduce | 1 Comment

Running MapReduce Design Patterns on Cloudera’s CDH5

One of the better books I have read so far about MapReduce is ‘MapReduce Design Patterns’, as I mentioned in my previous post. In this post I describe the steps to get started with running the Hadoop source code that goes … Continue reading

Posted in Hadoop | Comments Off

Hadoop and MapReduce Design Patterns

Recently I finished my last project, in which I implemented Mule ESB. This gives me some room in my schedule to dive into the world of Big Data again (more specifically the Hadoop ecosystem). I have looked into this … Continue reading

Posted in Hadoop | 2 Comments

Run your Hadoop MapReduce job on Amazon EMR

A while ago I posted how to set up an EMR cluster using the CLI. In this post I will show how to set up the cluster using the Java SDK for AWS. The best way to show how to … Continue reading

Posted in AWS, cloud, Hadoop | Comments Off

Unit testing a Java Hadoop job

In my previous post I showed how to set up a complete Maven-based project to create a Hadoop job in Java. Of course it wasn’t complete, because it was missing the unit test part :-). In this post I show … Continue reading

Posted in Hadoop, Maven | 2 Comments

Writing a Hadoop MapReduce task in Java

Although the Hadoop framework itself is written in Java, MapReduce jobs can be written in many different languages. In this post I show how to create a MapReduce job in Java based on a Maven project like any other Java … Continue reading

Posted in Hadoop | 5 Comments

Hadoop on the Amazon cloud by BigData ‘University’

Last night I stumbled upon this free online training program where you could learn how to combine Hadoop and Amazon AWS! You would even receive a certificate if you passed the test at the end of the course. So … Continue reading

Posted in AWS, cloud, Hadoop | 1 Comment

Running Hive jobs on AWS EMR

In a previous post I showed how to run a simple job using AWS Elastic MapReduce (EMR). In this example we continue to use EMR, but now to run a Hive job. Hive is a data warehouse system … Continue reading

Posted in AWS, cloud | Comments Off