Writing a Neo4j extension

This post is part of a multipart series about creating a graph of all available Maven dependencies.

This article describes the drivers behind and the implementation of the Unmanaged Neo4j Extension written for our Maven graph analysis.

When the processing cluster sends the subgraphs that need to be merged into a single database, locking issues occur. We opted for a queuing mechanism to prevent them. Of the options available for implementing this as a component of the Neo4j server, we chose an unmanaged extension.

Graph representation

Dependencies can be described as a graph by means of vertices and edges. The following image shows a simple graph, with its vertices and edges listed in the table next to it.

As you can see, each vertex has an ID and properties associated with it. The Maven dependencies for a single artifact are stored in a similar fashion. As an example, consider a simple table that describes an artifact and its dependencies.

For example, the following dependency tree (partial output from a mvn dependency:tree command):

[INFO] +- javax:javaee-api:jar:7.0:provided
[INFO] |  \- com.sun.mail:javax.mail:jar:1.5.0:provided
[INFO] |     \- javax.activation:activation:jar:1.1:provided

would result in the following tables:

For the exchange of data between the Spark-based resolvers and the plugin we opted for a JSON representation.

JSON format

A (sub)graph message is defined as follows and closely mirrors the layout of the previous tables.

{
  "vertices": [
    {
      "id": -1624024821,
      "groupId": "com.sun.mail",
      "artifactId": "javax.mail",
      "version": "1.5.0",
      "type": "Jar"
    },
    {
      "id": 1561955024,
      "groupId": "javax.activation",
      "artifactId": "activation",
      "version": "1.1",
      "type": "Jar"
    },
    {
      "id": -1466729092,
      "groupId": "javax",
      "artifactId": "javaee-api",
      "version": "7.0",
      "type": "Jar"
    }
  ],
  "edges": [
    {
      "source": -1466729092,
      "destination": -1624024821,
      "relationType": "Provided"
    },
    {
      "source": -1624024821,
      "destination": 1561955024,
      "relationType": "Compile"
    }
  ]
}
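To make the shape of the message concrete, here is a minimal Java sketch of a data model that mirrors the JSON keys above. The class and record names (SubGraphMessage, Vertex, Edge, Graph) and the isSelfContained check are illustrative assumptions, not the types used in the actual extension, and the sketch assumes Java 16 or newer for records. It builds the example message by hand rather than parsing it, so it does not depend on any JSON library.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SubGraphMessage {

    /** One artifact; the component names mirror the JSON keys of a vertex. */
    record Vertex(long id, String groupId, String artifactId, String version, String type) {}

    /** One dependency; source and destination refer to vertex ids. */
    record Edge(long source, long destination, String relationType) {}

    /** The whole message: a list of vertices plus a list of edges. */
    record Graph(List<Vertex> vertices, List<Edge> edges) {

        /** True when every edge refers only to vertices contained in this message. */
        boolean isSelfContained() {
            Set<Long> ids = vertices.stream().map(Vertex::id).collect(Collectors.toSet());
            return edges.stream()
                    .allMatch(e -> ids.contains(e.source()) && ids.contains(e.destination()));
        }
    }

    /** Builds the example message shown above, value for value. */
    static Graph example() {
        return new Graph(
                List.of(
                        new Vertex(-1624024821L, "com.sun.mail", "javax.mail", "1.5.0", "Jar"),
                        new Vertex(1561955024L, "javax.activation", "activation", "1.1", "Jar"),
                        new Vertex(-1466729092L, "javax", "javaee-api", "7.0", "Jar")),
                List.of(
                        new Edge(-1466729092L, -1624024821L, "Provided"),
                        new Edge(-1624024821L, 1561955024L, "Compile")));
    }

    public static void main(String[] args) {
        Graph graph = example();
        System.out.println(graph.vertices().size() + " vertices, " + graph.edges().size() + " edges");
        System.out.println("self-contained: " + graph.isSelfContained());
    }
}
```

A check like isSelfContained is one way a receiver could reject a message whose edges point at vertices that were never sent; whether the real extension validates messages this way is not stated here.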

Queueing

To decouple receiving the subgraphs from writing them, which happens on a single thread, the Java concurrency libraries are used.

Executor service

The executor service is used in a non-standard manner. It is created with a core pool size and a maximum pool size of 1. This ensures that only a single worker thread performs the actual inserts into the database.

ExecutorService executorService = new ThreadPoolExecutor(1, 1, 1, TimeUnit.HOURS, queue, new ThreadPoolExecutor.CallerRunsPolicy());
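The effect of this configuration can be checked with a small, self-contained sketch. It assumes an unbounded LinkedBlockingQueue for the queue variable, which the line above does not define; the class name SingleWriterDemo and the runTasks helper are hypothetical. Several producer threads submit tasks, and each task records the name of the thread that executed it.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SingleWriterDemo {

    /** Submits tasks from several producer threads and returns the names of the threads that ran them. */
    static Set<String> runTasks(int producers, int tasksPerProducer) throws InterruptedException {
        // Assumption: an unbounded queue, so submissions are never rejected while the pool is running.
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        ExecutorService executorService = new ThreadPoolExecutor(
                1, 1, 1, TimeUnit.HOURS, queue, new ThreadPoolExecutor.CallerRunsPolicy());

        Set<String> workerThreads = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            threads[p] = new Thread(() -> {
                for (int i = 0; i < tasksPerProducer; i++) {
                    // Stand-in for a database insert: record which thread does the work.
                    executorService.execute(() -> workerThreads.add(Thread.currentThread().getName()));
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            t.join(); // all tasks are submitted once the producers have finished
        }

        executorService.shutdown(); // already queued tasks still run
        executorService.awaitTermination(10, TimeUnit.SECONDS);
        return workerThreads;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<String> names = runTasks(4, 100);
        System.out.println("threads that executed tasks: " + names);
    }
}
```

One caveat on the rejection policy: CallerRunsPolicy runs a rejected task on the submitting thread. With an unbounded queue, as assumed here, that never happens while the pool is running. With a bounded queue that fills up, rejected tasks would run on the caller's thread, so the inserts would no longer be confined to the single worker.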

Neo4j extension

An unmanaged Neo4j extension is the most flexible way to extend the functionality of the server. The main reason for using this type is that we have complete control over the endpoints, since it is just a REST service that you implement.

This snippet shows the basic implementation of an extension:

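import javax.ws.rs.Consumes;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import org.neo4j.graphdb.GraphDatabaseService;

@Path( "/dependency" )
public class HelloWorldResource
{
    private final GraphDatabaseService database;

    // The server injects the running database through the JAX-RS @Context annotation
    public HelloWorldResource( @Context GraphDatabaseService database )
    {
        this.database = database;
    }

    @POST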
    @Produces( MediaType.TEXT_PLAIN )
    @Consumes(MediaType.APPLICATION_JSON)
    @Path( "/graphs" )
    public Response hello( final DependencyGraph graph )
    {
        //Send data to queue
        return Response.ok().build();
    }
}
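For completeness, this is roughly how a client such as one of the resolvers could address the endpoint. The sketch only builds the request and assumes Java 11 or newer for java.net.http. The class name SubGraphClient, the buildRequest helper and the /maven mount point are hypothetical: the real mount point depends on how the extension is registered in the server configuration. Port 7474 is the default Neo4j HTTP port.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class SubGraphClient {

    /** Builds a POST request for the /dependency/graphs resource below the given mount point. */
    static HttpRequest buildRequest(String baseUrl, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/dependency/graphs"))
                .header("Content-Type", "application/json") // matches @Consumes on the resource
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        // Assumption: the extension is mounted at /maven on a local server.
        HttpRequest request = buildRequest(
                "http://localhost:7474/maven", "{\"vertices\": [], \"edges\": []}");
        System.out.println(request.method() + " " + request.uri());
        // Sending it would look like:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Because the resource only places the message on the queue, a successful response tells the client that the subgraph was accepted, not that it has already been merged into the database.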

Implementation

The implementation of this extension can be found in the GitHub repository.