In this series of posts, I’m going to try to represent the Minecraft crafting tree in Neo4j. In Part 2 we imported some data, explored it a little and noticed some inconsistencies which we fixed up. We found that we can write some simple and interesting queries against it. In this post we’ll try building a shopping list for a Wooden Sword and realise that our data model isn’t right yet – then we’ll try and refactor the model to something more useful.
Note: You can download the state of our database so far via this Gist, which is a Cypher script to be run into a clean graph. You’ll be able to download the state of play after all our refactoring work from this post at the end.

A problem lurks – output quantities
So far our model is good for the dependency chain of Minecraft items (i.e. ‘some number of these things makes up item X’) but we can’t produce a list of required materials to gather for any given Minecraft item – the shopping list of items-and-quantities that we’d have to go find or mine to make it.
Our graph falls down because we’ve made a bogus assumption – that some variable number of input items produce one unit of output.
For instance – a Torch is made of 1x Stick and 1x Coal – if we have one each of one of those, we produce one Torch and all is well.
But while making Sticks requires 2x Wood, we actually output 4x Sticks for every 2x Wood that comes into the crafting bench. Our graph didn’t encode this at all, so we can’t reliably figure out how much Wood we need to produce one Stick. Or three Sticks.
Representation is everything
It’s not even clear how best to represent this. At the minute our :REQUIRES relationship has a qty
attribute detailing how many of the input item it takes to build one unit of the output item. Our current model has
(stick)-[:REQUIRES { qty: 2 }]->(wood)
which is strictly true (in the sense that we can’t make a Sticks until we have 2x Wood) but obviously doesn’t reflect the actual cost of producing a Stick.
Mathematically more accurate would maybe be to allow fractional quantities – 4 Sticks are produced when we put 2x Wooden Planks together, so the cost of one Stick is in some sense 0.5 units of Wood.
(stick)-[:REQUIRES { qty: 0.5 }]->(wood)
But now we’ve lost the concept of how many input items are needed to build the smallest buildable bundle of Sticks – we can’t bring half a Wood unit to the crafting bench, half-units don’t exist at all and even if they did the recipe requires 2 units to kick off. We can’t infer that from the quantity on the relationship.
Let’s explore a couple of refactorings of our model to help.
Option 1 – add outputSize as a property on our Resource nodes
If we could reliably say that Sticks are only ever produced in bundles of 4 then we could express that on the Stick node in our graph and keep representing input quantities how we are just now (i.e. the actual amount required to kick off the crafting). For example:
(stick { outputSize: 4 })-[:REQUIRES { qty: 2 }]->(wood)
Or:
(arrow { outputSize: 4 })-[:REQUIRES { qty: 1 }]->(flint)
Now to figure out how much it’ll cost to make a Torch we could walk the graph and do some maths along the way:
- 1x Coal is just found by mining
- 1x Stick is 0.25 of the
outputSize
of a Stick crafting run. Since we can’t have fractional crafting runs – we either produce 4x Sticks or none at all – we could maybe express the requirement asceil(1 Stick / outputSize) = 1 Sticks
in whatever application logic we have - To make 1x Stick requires 2x Wood
- Thus the minimum requirement is 1x Coal + 2x Wood, and we’ll end up with some Sticks left over at the end
If there are multiple multiple ways to produce the same item then this falls down – and while our graph just now has only one way to produce things, Minecraft does support variations. For example, you can use an Anvil to repair any Sword by combining two of that Sword type into one.
Thus we could make a sword by building it from scratch, or by combining two broken swords into one new one. Our graph above doesn’t let us express those as different options – if we try to represent it naively we’ll make it look as though the cost of making a sword is that of all its raw ingredients, plus another sword (giving us cycles, if nothing else, and meaning we’d never be able to make the things).
So far it looks OK as an approach so long as we’re willing to accept that we can’t model there being multiple ways to make a given item (which for some games is entirely fine).
The following few statements will update our original graph with the output quantities:
// Recipes that produce 2 of something
MATCH (r:Resource)
WHERE r.name in ['Trapdoor', 'Fence', 'Blaze Powder']
SET r.outputQty = 2;
// Recipes that produce 3 of something
MATCH (r:Resource)
WHERE r.name in ['Fire Charge', 'Ladder', 'Paper', 'Sign', 'Bottle']
SET r.outputQty = 3;
// Recipes that produce 4 of something
MATCH (r:Resource)
WHERE r.name in ['Wooden Plank', 'Stick', 'Torch', 'Arrow', 'Stone Brick', 'Smooth Sandstone', 'Wood Stair', 'Cobblestone Stair', 'Bowl', 'Pumpkin Seed']
SET r.outputQty = 4;
// Recipes that produce 6 of something
MATCH (r:Resource)
WHERE r.name in ['Wooden Slab', 'Stone Slab', 'Stone Brick Slab', 'Brick Slab', 'Cobblestone Slab', 'Nether Brick Fence', 'Cobblestone Wall', 'Mossy Cobblestone Wall']
SET r.outputQty = 6;
// Recipes that produce 8 of something
MATCH (r:Resource)
WHERE r.name in ['Cookie']
SET r.outputQty = 8;
// Recipes that product 15 of something
MATCH (r:Resource)
WHERE r.name in ['Iron Bar', 'Glass Pane']
SET r.outputQty = 16;
// Give a default output quantity of 1 for everything else
MATCH (r:Resource)
WHERE r.outputQty IS NULL
SET r.outputQty = 1;
Option 2 – represent a Recipe as a first-class citizen of the graph
Option 1 is trying to work around the fact that our graph’s representing two things in one go:
- The actual items being crafted – like Sticks and Pickaxes
- The way in which those items are crafted – their dependencies
In short, a node in the graph represents both the output item and the recipe to make it, even though they’re two different concerns.
What if Recipes were nodes in our graph?
A Recipe:
- :REQUIRES Resource nodes (where the
qty
attribute expresses the number required to complete the recipe) - :PRODUCES Resource nodes (where the
qty
attribute expresses how many of that Resource are produced for each run of the Recipe)
We could create Recipe nodes for every Resource node that has any inbound :REQUIRES edges – for example. we won’t end up with a Recipe for Wood (because you find it, you don’t craft it) but will have a Recipe for Wooden Planks.
This feels like a more correct approach somehow, so let’s explore how we might implement it.
Introducing Recipe nodes
First let’s create one Recipe node for each Resource node in the graph that has any inbound :REQUIRES edge (i.e. that isn’t a raw material):
MATCH (out: Resource)-[rel:REQUIRES]->(in: Resource)
MERGE (outRecipe: Recipe { name: out.name })
MERGE (outRecipe)-[:REQUIRES { qty: rel.qty }]->(in)
MERGE (outRecipe)-[:PRODUCES { qty: 1 }]->(out)
RETURN out, in, outRecipe
Added 137 labels, created 137 nodes, set 493 properties, created 356 relationships, started streaming 219 records after 98 ms and completed after 115 ms.
We could also delete the original :REQUIRES relationships:
MATCH (:Resource)-[rel:REQUIRES]->(:Resource)
DELETE rel
Completed after 32 ms.
And let’s throw a unique constraint on the new Recipe node’s name
attribute while we’re at it.
CREATE CONSTRAINT ON (recipe: Recipe) ASSERT recipe.name IS UNIQUE
Added 1 constraint, completed after 104 ms.
Querying with Recipe nodes and pure Cypher
Our first stop should be to ask the updated graph how do you make Wooden Planks. But we’ve got a problem with this new approach – by adding in extra hops, we can’t easily use variable length path querying to figure out how to get from Wooden Plank to all its constituent raw materials and the intermediate Recipe nodes.
If we knew ahead of time that there was only one hop, then the following would work:
MATCH path = (r: Resource { name: 'Wooden Plank' })<-[:PRODUCES]-(recipe: Recipe)-[:REQUIRES]->(dep: Resource)
RETURN path
But if we try that trick on, say – a Stick we’ll only get as far as the Wooden Plank requirement and have to make another query to figure out how to make those:
MATCH path = (r: Resource { name: 'Stick' })<-[:PRODUCES]-(recipe: Recipe)-[:REQUIRES]->(dep: Resource)
RETURN path

Before we’d just have said ‘traverse as many :PRODUCES relationship hops as you need’ – recall our Wood Sword example:
MATCH path= (r:Resource { name: 'Wood Sword' })-[:REQUIRES*]->(:Resource)
RETURN path
We can’t use that trick any more – we sort-of want to wrap up the (Resource)<-[:PRODUCES]-(Recipe)-[:REQUIRES]-(Resource)
double-hop and say ‘run that double-hop as many times as you need’ but that can’t really be done.
We could use a chain of OPTIONAL MATCH
es and some clumsy Cypher to do the job if we knew the most hops we’d ever need. For example, we could support an ‘up to 3-hop’ traversal with two OPTIONAL MATCH
clauses.
MATCH (r: Resource { name: 'Wood Sword' })<-[:PRODUCES]-(recipe1: Recipe)-[:REQUIRES]->(dep1: Resource)
OPTIONAL MATCH (dep1)-[:PRODUCES]-(recipe2: Recipe)-[:REQUIRES]->(dep2: Resource)
OPTIONAL MATCH (dep2)-[:PRODUCES]-(recipe3: Recipe)-[:REQUIRES]->(dep3: Resource)
RETURN r, recipe1, dep1, recipe2, dep2, recipe3, dep3

Here we’d just tack in extra OPTIONAL MATCH
clauses for each level of depth we want to support.
Ugly but workable, and pure Cypher.
Querying with Recipe nodes and APOC
Instead, we can use APOC – the apoc.path.subgraphAll
procedure lets us specify a repeating pattern of relationships to navigate and also limit the traversal to a specific min and max depth.
MATCH (r: Resource { name: 'Wooden Plank' })
CALL apoc.path.subgraphAll(r, {relationshipFilter: '<PRODUCES,REQUIRES>', maxLevel: 10}) YIELD nodes, relationships
RETURN nodes, relationships

Let’s break the query down a bit.
First, start with the resource we want to build:
MATCH (r: Resource { name: 'Wooden Plank' })
Next, ask APOC to grab us the subgraph from that node. We start with our resource r
, but what’s that second argument doing?
CALL apoc.path.subgraphAll(r, {relationshipFilter: '<PRODUCES,REQUIRES>', maxLevel: 10}) YIELD nodes, relationships
Here we’re using a map to configure the traversal with two properties:
relationshipFilter
is an APOC filter string to express the relationships we should seek and traversemaxLevel
is the maximum depth fromr
we should traverse to
The documentation’ll do a better job than me, but essentially:
<PRODUCES
– find :PRODUCES relationships going into ourr
nodeREQUIRES>
– from nodes linked via that :PRODUCES relationship (which we know to be Recipe nodes), find the nodes that are related via an outbound :REQUIRES relationship- The comma between the two patterns asks APOC to apply them in sequence – find a :PRODUCES relationship, then expand out from there using :REQUIRES relationships.
APOC will then iterate that process up to 10 times (our maxLevel
parameter), expanding via :PRODUCES, then via :REQUIRES, then via :PRODUCES again until we run out of nodes or hit our limit.
The same query works for Wood Sword:
MATCH (r: Resource { name: 'Wood Sword' })
CALL apoc.path.subgraphAll(r, {relationshipFilter: '<PRODUCES,REQUIRES>', maxLevel: 10}) YIELD nodes, relationships
RETURN nodes, relationships
It’s essentially the same as the other query, but now we’re relying on APOC and not just pure Cypher to do the work, and it isn’t particularly easier to read. Still – it’s good to have options.
Fixing up our output quantities
We still need to make sure that our output quantities are right. So far we have each Recipe requiring the correct quantity of materials through its :REQUIRES relationship to them, but we assumed that each Recipe only :PRODUCES one of that item. That’s not true in a number of cases, so first let’s fix that.
// Recipes that produce 2 of something
MATCH (:Recipe)-[relProduces:PRODUCES]->(r: Resource)
WHERE r.name in ['Trapdoor', 'Fence', 'Blaze Powder']
SET relProduces.qty = 2;
// Recipes that produce 3 of something
MATCH (:Recipe)-[relProduces:PRODUCES]->(r: Resource)
WHERE r.name in ['Fire Charge', 'Ladder', 'Paper', 'Sign', 'Bottle']
SET relProduces.qty = 3;
// Recipes that produce 4 of something
MATCH (:Recipe)-[relProduces:PRODUCES]->(r: Resource)
WHERE r.name in ['Wooden Plank', 'Stick', 'Torch', 'Arrow', 'Stone Brick', 'Smooth Sandstone', 'Wood Stair', 'Cobblestone Stair', 'Bowl', 'Pumpkin Seed']
SET relProduces.qty = 4;
// Recipes that produce 6 of something
MATCH (:Recipe)-[relProduces:PRODUCES]->(r: Resource)
WHERE r.name in ['Wooden Slab', 'Stone Slab', 'Stone Brick Slab', 'Brick Slab', 'Cobblestone Slab', 'Nether Brick Fence', 'Cobblestone Wall', 'Mossy Cobblestone Wall']
SET relProduces.qty = 6;
// Recipes that produce 8 of something
MATCH (:Recipe)-[relProduces:PRODUCES]->(r: Resource)
WHERE r.name in ['Cookie']
SET relProduces.qty = 8;
// Recipes that produce 16 of something
MATCH (:Recipe)-[relProduces:PRODUCES]->(r: Resource)
WHERE r.name in ['Iron Bar', 'Glass Pane']
SET relProduces.qty = 16;
Asking a basic question – how much Wood do two Wooden Planks take to produce?
We know a Wooden Plank recipe requires 1x Wood to kick off, and produces 4x Wooden Plank. In this case then, making 2x Wooden Plank requires we run the Wooden Plank recipe once (as 4 > 2), which requires 1x Wood. We can’t run the Recipe a fractional number of times, we just over-produce.
Download the sample database so far
You can download the state of the ‘with Recipes’ database so far from this Gist – it’s pure Cypher, so just run it on a clean database in Neo4j Desktop (or whatever version you’re using).