Calculate PageRanks with Apache Hadoop

18 Feb

Currently I am following the Coursera training ‘Mining Massive Datasets’. I have been interested in MapReduce and Apache Hadoop for some time, and with this course I hope to gain more insight into when and how MapReduce can help to solve real-world business problems (I described another way to do so here). The course focuses mainly on the theory behind the algorithms and less on the coding itself. The first week is about PageRank and how Google used it to rank pages. Luckily there is a lot to be found about this topic in combination with Hadoop. I ended up here and decided to have a closer look at this code. Continue reading
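To give an idea of what a PageRank job computes, here is a minimal sketch in plain Python, not the Hadoop code referred to above. It assumes a tiny in-memory link graph in which every page has at least one outgoing link and every link target is itself a page, and it uses the conventional damping factor of 0.85. The function names are my own.

```python
from collections import defaultdict


def pagerank_step(links, ranks, damping=0.85):
    """One PageRank iteration, written as a map phase and a reduce phase."""
    n = len(links)
    # "Map": every page emits an equal share of its rank to each page it links to.
    contributions = defaultdict(float)
    for page, outlinks in links.items():
        for target in outlinks:
            contributions[target] += ranks[page] / len(outlinks)
    # "Reduce": sum the shares per page and apply the damping factor.
    return {
        page: (1 - damping) / n + damping * contributions[page]
        for page in links
    }


def pagerank(links, iterations=50, damping=0.85):
    """Start from a uniform distribution and repeat the step a fixed number of times."""
    ranks = {page: 1.0 / len(links) for page in links}
    for _ in range(iterations):
        ranks = pagerank_step(links, ranks, damping)
    return ranks


if __name__ == "__main__":
    # Hypothetical three-page graph: a links to b and c, b links to c, c links to a.
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, rank in sorted(pagerank(graph).items()):
        print(page, round(rank, 4))
```

In a real MapReduce job each iteration would be one job run, with the map and reduce parts split over the cluster instead of a single loop.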

Sharing your sources stored in a Git repository

12 Feb

I have been using Git for some time now and so far I like it a lot. Especially the setup described by Vincent Driessen, combined with git-flow (or, perhaps even better for many Java projects, its Maven implementation), makes it easy to use.
However, you may end up in a situation, as I did lately, where you have to share your sources with someone who doesn’t use Git. In that case there is a simple Git command to help you out: ‘git archive’. You can use it like this:
git archive --format zip --output my-project-sources.zip develop
In this case a zip file ‘my-project-sources.zip’ is created containing all the sources that are in the ‘develop’ branch.
For lots more Git commands see this page and for more general background info about the way Git works see this book.

Making use of the open sources of WSO2 ESB

28 Jan

When implementing services using the WSO2 stack (or any other open source Java framework) you will sooner or later run into a situation where the framework doesn’t behave the way you expect. Or you just want to verify how a product works. I recently had several of these experiences and dealt with them by setting up a remote debug session, so I could step through the code to see exactly what was happening. Of course this only makes sense if you have the source code available (long live open source :-)).
This post shows an example with the WSO2 ESB (v 4.8.1) in combination with IntelliJ IDEA. Continue reading

Base64 encoding of binary file content

28 Jan

For testing a base64Binary XML type in one of my projects I needed an example of base64-encoded file content. There is a simple command for that (at least when you are working on a Mac). For a file called ‘abc.pdf’ the command is:

openssl base64 -in abc.pdf -out encoded.txt

The result is a file ‘encoded.txt’ with a base64-encoded string:
Continue reading
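If openssl is not at hand, roughly the same result can be produced with a few lines of Python. This is a minimal sketch using only the standard library; the function name is my own and the file names are the ones from the example above. Note that the line wrapping of the output can differ from what openssl writes, while the decoded content is identical.

```python
import base64
from pathlib import Path


def encode_file(src: str, dst: str) -> str:
    """Base64-encode the bytes of src, write the text to dst and return it."""
    raw = Path(src).read_bytes()
    # encodebytes inserts a newline after every 76 output characters;
    # other tools may wrap at a different width.
    encoded = base64.encodebytes(raw).decode("ascii")
    Path(dst).write_text(encoded)
    return encoded


if __name__ == "__main__":
    import sys

    # Usage: python encode_file.py abc.pdf encoded.txt
    encode_file(sys.argv[1], sys.argv[2])
```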

Using your own WSDL with a WSO2 ESB Proxy Service

21 Jan

It is common practice to use an external XSD file in your WSDL. This way you can easily reuse your XSD in other places. However, if you want to use such a WSDL in your WSO2 ESB Proxy Service you have to configure the path to the XSD correctly.
This post describes how to set this up. More background info about this can be found here. I created a Multi Module Maven project and added the WSDL artifact and the XSDs so I got a result like this:
[Screenshot: the resulting project layout]
In the WSDL I imported the ‘EchoElements.xsd’ like this:

<?xml version='1.0' encoding='UTF-8' standalone='no'?>
<wsdl:definitions xmlns:soap='http://schemas.xmlsoap.org/wsdl/soap/' 
xmlns:tns='http://echo.pascalalma.net/echoService' xmlns:elm='http://echo.pascalalma.net/elements' 
xmlns:wsdl='http://schemas.xmlsoap.org/wsdl/' xmlns:xsd='http://www.w3.org/2001/XMLSchema' name='EchoWsdl' targetNamespace='http://echo.pascalalma.net/echoService'>
  <wsdl:types>
    <xsd:schema>
       <xsd:import namespace='http://echo.pascalalma.net/elements'
		schemaLocation='../xsd/EchoElements.xsd' />
    </xsd:schema>
  </wsdl:types>
  ...

As you can see I go up one directory and look for the XSD file in the ‘xsd’ folder.
In the corresponding EchoElements.xsd I also import another XSD like this:

  ...
  <xs:import namespace='http://echo.pascalalma.net/types'  schemaLocation='./EchoTypes.xsd' />
  ...

So in the same folder as the ‘parent’ XSD I am looking for the ‘EchoTypes.xsd’ file.

Now this will translate to the following proxy service configuration:

  <publishWSDL key='gov:wsdl/EchoWsdl.wsdl'>
     <resource location='../xsd/EchoElements.xsd' key='gov:/xsd/EchoElements.xsd'/>
     <resource location='./EchoTypes.xsd' key='gov:/xsd/EchoTypes.xsd'/>
  </publishWSDL>

As you can see, the keys defined here point to the keys of the artifacts in the registry. Another important thing to notice is that the location attribute of the resource has to match the schemaLocation attribute defined in the artifact that imports the resource.
If we, for example, modify the ‘EchoElements.xsd’ and rewrite the import element to this:

  ...
  <xs:import namespace='http://echo.pascalalma.net/types'  schemaLocation='../xsd/EchoTypes.xsd' />
  ...

It would actually point to the same (physical) location as seen from this XSD. However, if we deploy this configuration the ESB will throw an error, because it cannot match the resource with the defined import:


ERROR - ProxyService Error building service from WSDL
org.apache.axis2.AxisFault: WSDLException (at /wsdl:definitions/wsdl:types/xsd:schema/xs:schema): faultCode=PARSER_ERROR: Problem parsing 'file:../xsd/./EchoTypes.xsd'.: java.io.FileNotFoundException: ../xsd/./EchoTypes.xsd (No such file or directory)
at org.apache.axis2.AxisFault.makeFault(AxisFault.java:430)
at org.apache.axis2.description.WSDL11ToAxisServiceBuilder.populateService(WSDL11ToAxisServiceBuilder.java:397)
at org.apache.synapse.core.axis2.ProxyService.buildAxisService(ProxyService.java:503)
at org.apache.synapse.deployers.ProxyServiceDeployer.deploySynapseArtifact(ProxyServiceDeployer.java:73)
at org.wso2.carbon.proxyadmin.ProxyServiceDeployer.deploySynapseArtifact(ProxyServiceDeployer.java:46)
at org.apache.synapse.deployers.AbstractSynapseArtifactDeployer.deploy(AbstractSynapseArtifactDeployer.java:192)
at org.wso2.carbon.application.deployer.synapse.SynapseAppDeployer.deployArtifacts(SynapseAppDeployer.java:100)
at org.wso2.carbon.application.deployer.internal.ApplicationManager.deployCarbonApp(ApplicationManager.java:251)
at org.wso2.carbon.application.deployer.CappAxis2Deployer.deploy(CappAxis2Deployer.java:114)
at org.apache.axis2.deployment.repository.util.DeploymentFileData.deploy(DeploymentFileData.java:136)
at org.apache.axis2.deployment.DeploymentEngine.doDeploy(DeploymentEngine.java:807)
...

If we now also change the resource declaration in the Proxy Service so that it matches the import in the XSD, it works again:

  <publishWSDL key='gov:wsdl/EchoWsdl.wsdl'>
     <resource location='../xsd/EchoElements.xsd' key='gov:/xsd/EchoElements.xsd'/>
     <resource location='../xsd/EchoTypes.xsd' key='gov:/xsd/EchoTypes.xsd'/>
  </publishWSDL>
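The matching rule described in this post can be checked before deploying with a small script. The sketch below is only an illustration in Python and not the ESB’s own logic: it assumes, as observed above, that a resource location must be literally equal to the schemaLocation of the import, and all function names are hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def schema_locations(xsd_text):
    """Return the schemaLocation of every xs:import in the given XSD text."""
    root = ET.fromstring(xsd_text)
    return [el.get("schemaLocation") for el in root.iter(XS + "import")]


def unmatched_imports(schema_locs, resource_locs):
    """Imports without a resource whose location is exactly the same string."""
    declared = set(resource_locs)
    return [loc for loc in schema_locs if loc not in declared]


def same_target(base_dir, loc_a, loc_b):
    """True if both relative locations resolve to the same path under base_dir."""
    resolve = lambda loc: posixpath.normpath(posixpath.join(base_dir, loc))
    return resolve(loc_a) == resolve(loc_b)


if __name__ == "__main__":
    xsd = (
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        "<xs:import namespace='http://echo.pascalalma.net/types' "
        "schemaLocation='../xsd/EchoTypes.xsd'/>"
        "</xs:schema>"
    )
    resources = ["../xsd/EchoElements.xsd", "./EchoTypes.xsd"]
    # Both strings resolve to the same file, yet the literal comparison fails.
    print(same_target("xsd", "./EchoTypes.xsd", "../xsd/EchoTypes.xsd"))
    print(unmatched_imports(schema_locations(xsd), resources))
```

Running such a check over all XSDs of a project would list every import for which the proxy service still needs a resource element with exactly that location.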

Developing with WSO2

29 Nov

For a few months now I have been working with WSO2 products again. In the upcoming posts I will describe some of the (small) issues I ran into and how to solve them.

The first thing I did when setting up my development environment was downloading the Developer Studio (64-bit version) on my Mac. Continue reading

Right-pad values with XSLT

19 Oct

This post shows an XSLT function that right-pads the value of an element with a chosen character to a certain length. No rocket science, but it might come in handy again, so by putting it down here I don’t have to reinvent it later. The function itself looks like this:

<xsl:stylesheet version="2.0"  xmlns:functx="http://my/functions"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Appends $length pad characters to the input and cuts the result to $length characters. -->
    <xsl:function name="functx:pad-string-to-length" as="xsd:string">
        <xsl:param name="stringToPad" as="xsd:string?"/>
        <xsl:param name="padChar" as="xsd:string"/>
        <xsl:param name="length" as="xsd:integer"/>
        <xsl:sequence select="
            substring(
                string-join(
                    ($stringToPad, for $i in (1 to $length) return $padChar),
                    ''),
                1, $length)"/>
    </xsl:function>

</xsl:stylesheet>
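To make the behaviour of the function easier to see, here is the same logic as a small Python sketch (the Python function name is my own). Like the XSLT version it first appends ‘length’ pad characters and then cuts the result to ‘length’ characters, which means a value that is already too long gets truncated.

```python
def pad_string_to_length(string_to_pad, pad_char, length):
    """Right-pad string_to_pad with pad_char and cut the result to length."""
    # None stands in for the empty sequence that the XSLT parameter allows.
    padded = (string_to_pad or "") + pad_char * length
    return padded[:length]


if __name__ == "__main__":
    print(pad_string_to_length("abc", "*", 6))
    print(pad_string_to_length("abcdefgh", "*", 5))
```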
 

Continue reading

Running MapReduce Design Patterns on Cloudera’s CDH5

9 Sep

One of the better books I have read so far about MapReduce is ‘MapReduce Design Patterns’, as I mentioned in my previous post. In this post I describe the steps to get started with running the Hadoop source code that goes with the book on Cloudera’s latest Hadoop distribution, CDH5. I decided to use HDFS and YARN for testing the patterns. Take the following steps to get it all up and running:

  • Get CDH5 and run it
  • Install IntelliJ IDEA
  • Upgrade GIT client
  • Create local directory
  • Checkout source code
  • Install source data
  • Run the job

Continue reading

Hadoop and MapReduce Design Patterns

14 Aug

Recently I finished my last project, in which I was implementing Mule ESB. This gives me some room in my schedule to dive into the world of Big Data again (more specifically the Hadoop ecosystem). I have looked into this subject before, which resulted in several blog posts. This time I started with a refresher by taking the online training of AWS: Big Data Technology Fundamentals. It is about MapReduce, Hadoop, Pig and Hive. After this nice online training I started with the Hadoop training of core-servlets. I had to get used to the form and layout of the training, but now that I have been working with it for a while I realise it contains a lot of information about the way Hadoop works. It comes with a (working!) virtual machine (based on Cloudera’s CDH4) on which Hadoop and the necessary tooling are installed, including all training and exercise materials (and solutions).
Parallel to this (low-level) training I am going through the book MapReduce Design Patterns. This book gives you a good idea of which problems you can solve with the MapReduce framework and in what way. Especially the recommendations on when not to use a certain pattern can be very handy while working with MapReduce.

Mule ESB, ActiveMQ and the DLQ

27 Jul

In this post I show a simple Mule ESB flow to see the DLQ feature of ActiveMQ in action.
I assume you have a running Apache ActiveMQ instance available (if not you can download a version here). In this example I make use of Mule ESB 3.4.2 and ActiveMQ 5.9.0. We can create a simple Mule project based on the following pom file: Continue reading
