NEO4J SEARCH: A* OPTIMIZATION

06 MARCH 2018

Written in 2026, backdated to 2018.

First real performance problem. Vessel tracking with Neo4J hit a wall. Need to find shortest paths in a 13,000-vertex graph. Dijkstra’s search, which Neo4J ships with, slowed to a crawl after 4,000.

Spent better part of the week replacing Dijkstra’s algorithm with A* search—Python shop, junior developer, took more than a day to set up toolchain and build the project. Haversine function as heuristic uses distance between ports as the crow flies to steer search:

private double computeHeuristic(
    final double lat1, final double lon1,
    final double lat2, final double lon2) {
    final int earthRadius = 6371;
    final double kmToNM = 0.539957;

    final double latDistance = Math.toRadians(lat2 - lat1);
    final double lonDistance = Math.toRadians(lon2 - lon1);

    final double a = Math.sin(latDistance / 2) 
        * Math.sin(latDistance / 2)
        + Math.cos(Math.toRadians(lat1)) 
        * Math.cos(Math.toRadians(lat2))
        * Math.sin(lonDistance / 2) 
        * Math.sin(lonDistance / 2);

    final double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    return earthRadius * c * kmToNM;
}

Core search loop updates costs when better path found:

private void updateCosts(
    final int source, final int target,
    final double newCost, final double heuristic) {
    final double oldCost = gCosts.getOrDefault(target, Double.MAX_VALUE);
    if (newCost < oldCost) {
        gCosts.put(target, newCost);
        fCosts.put(target, newCost + heuristic);
        path.put(target, source);
    }
}

Verdict: 300x speedup—scaled to 13,000 vertices.

Upstreamed changes: Neo4J v3.4.0 | Full source.

NOTE: Despite impressive gain, performance horizon visible. Unlikely to scale past 16,000 vertices without caching.