Tag Archives: AWS EMR

Running PageRank Hadoop job on AWS Elastic MapReduce

In a previous post I described an example of a PageRank calculation with Apache Hadoop, which is part of the Mining Massive Datasets course. In that post I took an existing Hadoop job in Java and modified it somewhat … Continue reading

Posted in AWS, Hadoop | 4 Comments

Run your Hadoop MapReduce job on Amazon EMR

A while ago I posted about how to set up an EMR cluster using the CLI. In this post I will show how to set up the cluster using the Java SDK for AWS. The best way to show how to … Continue reading

Posted in AWS, cloud, Hadoop

Hadoop on the Amazon cloud by BigData ‘University’

Last night I stumbled upon this free online training program where you can learn how to combine Hadoop and AWS! You even receive a certificate if you pass the test at the end of the course. So … Continue reading

Posted in AWS, cloud, Hadoop | 1 Comment

(Not so) deep dive on Databases on AWS

Last week I attended an AWS webinar for partners called ‘Deep Dive on Databases on AWS’. Although I know my way around AWS, it never hurts to attend webinars like these to see if there … Continue reading

Posted in AWS, cloud

Running Hive jobs on AWS EMR

In a previous post I showed how to run a simple job using AWS Elastic MapReduce (EMR). In this example we continue to use EMR, but now to run a Hive job. Hive is a data warehouse system … Continue reading

Posted in AWS, cloud

Using AWS ElasticMapReduce with the Command Line Interface

In this post I am going to use the AWS MapReduce service (called Elastic MapReduce) through the CLI for EMR. At a high level, the process of using EMR can be divided into three steps: set up and … Continue reading

Posted in AWS, cloud | 2 Comments