Minecraft Crafting Tree in Neo4j – Part 3

In this series of posts, I’m going to try to represent the Minecraft crafting tree in Neo4j. In Part 2 we imported some data, explored it a little and noticed some inconsistencies which we fixed up. We found that we can write some simple and interesting queries against it. In this post we’ll try building a shopping list for a Wooden Sword and realise that our data model isn’t right yet – then we’ll try and refactor the model to something more useful.

Note: You can download the state of our database so far via this Gist, which is a Cypher script to be run into a clean graph. You’ll be able to download the state of play after all our refactoring work from this post at the end.

A problem lurks – output quantities

So far our model is good for the dependency chain of Minecraft items (i.e. ‘some number of these things makes up item X’) but we can’t produce a list of required materials to gather for any given Minecraft item – the shopping list of items-and-quantities that we’d have to go find or mine to make it.

Our graph falls down because we’ve made a bogus assumption – that some variable number of input items produce one unit of output.

For instance – a Torch is made of 1x Stick and 1x Coal – if we have one each of one of those, we produce one Torch and all is well.

But while making Sticks requires 2x Wood, we actually output 4x Sticks for every 2x Wood that comes into the crafting bench. Our graph didn’t encode this at all, so we can’t reliably figure out how much Wood we need to produce one Stick. Or three Sticks.

Representation is everything

It’s not even clear how best to represent this. At the minute our :REQUIRES relationship has a qty attribute detailing how many of the input item it takes to build one unit of the output item. Our current model has

(stick)-[:REQUIRES { qty: 2 }]->(wood)

which is strictly true (in the sense that we can’t make a Sticks until we have 2x Wood) but obviously doesn’t reflect the actual cost of producing a Stick.

Mathematically more accurate would maybe be to allow fractional quantities – 4 Sticks are produced when we put 2x Wooden Planks together, so the cost of one Stick is in some sense 0.5 units of Wood.

(stick)-[:REQUIRES { qty: 0.5 }]->(wood)

But now we’ve lost the concept of how many input items are needed to build the smallest buildable bundle of Sticks – we can’t bring half a Wood unit to the crafting bench, half-units don’t exist at all and even if they did the recipe requires 2 units to kick off. We can’t infer that from the quantity on the relationship.

Let’s explore a couple of refactorings of our model to help.

Option 1 – add outputSize as a property on our Resource nodes

If we could reliably say that Sticks are only ever produced in bundles of 4 then we could express that on the Stick node in our graph and keep representing input quantities how we are just now (i.e. the actual amount required to kick off the crafting). For example:

(stick { outputSize: 4 })-[:REQUIRES { qty: 2 }]->(wood)

Or:

(arrow { outputSize: 4 })-[:REQUIRES { qty: 1 }]->(flint)

Now to figure out how much it’ll cost to make a Torch we could walk the graph and do some maths along the way:

  • 1x Coal is just found by mining
  • 1x Stick is 0.25 of the outputSize of a Stick crafting run. Since we can’t have fractional crafting runs – we either produce 4x Sticks or none at all – we could maybe express the requirement as ceil(1 Stick / outputSize) = 1 Sticks in whatever application logic we have
  • To make 1x Stick requires 2x Wood
  • Thus the minimum requirement is 1x Coal + 2x Wood, and we’ll end up with some Sticks left over at the end

If there are multiple multiple ways to produce the same item then this falls down – and while our graph just now has only one way to produce things, Minecraft does support variations. For example, you can use an Anvil to repair any Sword by combining two of that Sword type into one.

Thus we could make a sword by building it from scratch, or by combining two broken swords into one new one. Our graph above doesn’t let us express those as different options – if we try to represent it naively we’ll make it look as though the cost of making a sword is that of all its raw ingredients, plus another sword (giving us cycles, if nothing else, and meaning we’d never be able to make the things).

So far it looks OK as an approach so long as we’re willing to accept that we can’t model there being multiple ways to make a given item (which for some games is entirely fine).

The following few statements will update our original graph with the output quantities:

// Recipes that produce 2 of something
MATCH (r:Resource)
WHERE r.name in ['Trapdoor', 'Fence', 'Blaze Powder']
SET r.outputQty = 2;

// Recipes that produce 3 of something
MATCH (r:Resource)
WHERE r.name in ['Fire Charge', 'Ladder', 'Paper', 'Sign', 'Bottle']
SET r.outputQty = 3;

// Recipes that produce 4 of something
MATCH (r:Resource)
WHERE r.name in ['Wooden Plank', 'Stick', 'Torch', 'Arrow', 'Stone Brick', 'Smooth Sandstone', 'Wood Stair', 'Cobblestone Stair', 'Bowl', 'Pumpkin Seed']
SET r.outputQty = 4;

// Recipes that produce 6 of something
MATCH (r:Resource)
WHERE r.name in ['Wooden Slab', 'Stone Slab', 'Stone Brick Slab', 'Brick Slab', 'Cobblestone Slab', 'Nether Brick Fence', 'Cobblestone Wall', 'Mossy Cobblestone Wall']
SET r.outputQty = 6;

// Recipes that produce 8 of something
MATCH (r:Resource)
WHERE r.name in ['Cookie']
SET r.outputQty = 8;

// Recipes that product 15 of something
MATCH (r:Resource)
WHERE r.name in ['Iron Bar', 'Glass Pane']
SET r.outputQty = 16;

// Give a default output quantity of 1 for everything else
MATCH (r:Resource)
WHERE r.outputQty IS NULL
SET r.outputQty = 1;

Option 2 – represent a Recipe as a first-class citizen of the graph

Option 1 is trying to work around the fact that our graph’s representing two things in one go:

  • The actual items being crafted – like Sticks and Pickaxes
  • The way in which those items are crafted – their dependencies

In short, a node in the graph represents both the output item and the recipe to make it, even though they’re two different concerns.

What if Recipes were nodes in our graph?

A Recipe:

  • :REQUIRES Resource nodes (where the qty attribute expresses the number required to complete the recipe)
  • :PRODUCES Resource nodes (where the qty attribute expresses how many of that Resource are produced for each run of the Recipe)

We could create Recipe nodes for every Resource node that has any inbound :REQUIRES edges – for example. we won’t end up with a Recipe for Wood (because you find it, you don’t craft it) but will have a Recipe for Wooden Planks.

This feels like a more correct approach somehow, so let’s explore how we might implement it.

Introducing Recipe nodes

First let’s create one Recipe node for each Resource node in the graph that has any inbound :REQUIRES edge (i.e. that isn’t a raw material):

MATCH (out: Resource)-[rel:REQUIRES]->(in: Resource)
MERGE (outRecipe: Recipe { name: out.name })
MERGE (outRecipe)-[:REQUIRES { qty: rel.qty }]->(in)
MERGE (outRecipe)-[:PRODUCES { qty: 1 }]->(out)
RETURN out, in, outRecipe

Added 137 labels, created 137 nodes, set 493 properties, created 356 relationships, started streaming 219 records after 98 ms and completed after 115 ms.

We could also delete the original :REQUIRES relationships:

MATCH (:Resource)-[rel:REQUIRES]->(:Resource)
DELETE rel

Completed after 32 ms.

And let’s throw a unique constraint on the new Recipe node’s name attribute while we’re at it.

CREATE CONSTRAINT ON (recipe: Recipe) ASSERT recipe.name IS UNIQUE

Added 1 constraint, completed after 104 ms.

Querying with Recipe nodes and pure Cypher

Our first stop should be to ask the updated graph how do you make Wooden Planks. But we’ve got a problem with this new approach – by adding in extra hops, we can’t easily use variable length path querying to figure out how to get from Wooden Plank to all its constituent raw materials and the intermediate Recipe nodes.

If we knew ahead of time that there was only one hop, then the following would work:

MATCH path = (r: Resource { name: 'Wooden Plank' })<-[:PRODUCES]-(recipe: Recipe)-[:REQUIRES]->(dep: Resource)
RETURN path

But if we try that trick on, say – a Stick we’ll only get as far as the Wooden Plank requirement and have to make another query to figure out how to make those:

MATCH path = (r: Resource { name: 'Stick' })<-[:PRODUCES]-(recipe: Recipe)-[:REQUIRES]->(dep: Resource)
RETURN path

Before we’d just have said ‘traverse as many :PRODUCES relationship hops as you need’ – recall our Wood Sword example:

MATCH path= (r:Resource { name: 'Wood Sword' })-[:REQUIRES*]->(:Resource)
RETURN path

We can’t use that trick any more – we sort-of want to wrap up the (Resource)<-[:PRODUCES]-(Recipe)-[:REQUIRES]-(Resource) double-hop and say ‘run that double-hop as many times as you need’ but that can’t really be done.

We could use a chain of OPTIONAL MATCHes and some clumsy Cypher to do the job if we knew the most hops we’d ever need. For example, we could support an ‘up to 3-hop’ traversal with two OPTIONAL MATCH clauses.

MATCH (r: Resource { name: 'Wood Sword' })<-[:PRODUCES]-(recipe1: Recipe)-[:REQUIRES]->(dep1: Resource)
OPTIONAL MATCH (dep1)-[:PRODUCES]-(recipe2: Recipe)-[:REQUIRES]->(dep2: Resource)
OPTIONAL MATCH (dep2)-[:PRODUCES]-(recipe3: Recipe)-[:REQUIRES]->(dep3: Resource)
RETURN r, recipe1, dep1, recipe2, dep2, recipe3, dep3

Here we’d just tack in extra OPTIONAL MATCH clauses for each level of depth we want to support.

Ugly but workable, and pure Cypher.

Querying with Recipe nodes and APOC

Instead, we can use APOC – the apoc.path.subgraphAll procedure lets us specify a repeating pattern of relationships to navigate and also limit the traversal to a specific min and max depth.

MATCH (r: Resource { name: 'Wooden Plank' })
CALL apoc.path.subgraphAll(r, {relationshipFilter: '<PRODUCES,REQUIRES>', maxLevel: 10}) YIELD nodes, relationships
RETURN nodes, relationships

Let’s break the query down a bit.

First, start with the resource we want to build:

MATCH (r: Resource { name: 'Wooden Plank' })

Next, ask APOC to grab us the subgraph from that node. We start with our resource r, but what’s that second argument doing?

CALL apoc.path.subgraphAll(r, {relationshipFilter: '<PRODUCES,REQUIRES>', maxLevel: 10}) YIELD nodes, relationships

Here we’re using a map to configure the traversal with two properties:

  • relationshipFilter is an APOC filter string to express the relationships we should seek and traverse
  • maxLevel is the maximum depth from r we should traverse to

The documentation’ll do a better job than me, but essentially:

  • <PRODUCES – find :PRODUCES relationships going into our r node
  • REQUIRES> – from nodes linked via that :PRODUCES relationship (which we know to be Recipe nodes), find the nodes that are related via an outbound :REQUIRES relationship
  • The comma between the two patterns asks APOC to apply them in sequence – find a :PRODUCES relationship, then expand out from there using :REQUIRES relationships.

APOC will then iterate that process up to 10 times (our maxLevel parameter), expanding via :PRODUCES, then via :REQUIRES, then via :PRODUCES again until we run out of nodes or hit our limit.

The same query works for Wood Sword:

MATCH (r: Resource { name: 'Wood Sword' })
CALL apoc.path.subgraphAll(r, {relationshipFilter: '<PRODUCES,REQUIRES>', maxLevel: 10}) YIELD nodes, relationships
RETURN nodes, relationships

It’s essentially the same as the other query, but now we’re relying on APOC and not just pure Cypher to do the work, and it isn’t particularly easier to read. Still – it’s good to have options.

Fixing up our output quantities

We still need to make sure that our output quantities are right. So far we have each Recipe requiring the correct quantity of materials through its :REQUIRES relationship to them, but we assumed that each Recipe only :PRODUCES one of that item. That’s not true in a number of cases, so first let’s fix that.

// Recipes that produce 2 of something
MATCH (:Recipe)-[relProduces:PRODUCES]->(r: Resource)
WHERE r.name in ['Trapdoor', 'Fence', 'Blaze Powder']
SET relProduces.qty = 2;
// Recipes that produce 3 of something
MATCH (:Recipe)-[relProduces:PRODUCES]->(r: Resource)
WHERE r.name in ['Fire Charge', 'Ladder', 'Paper', 'Sign', 'Bottle']
SET relProduces.qty = 3;
// Recipes that produce 4 of something
MATCH (:Recipe)-[relProduces:PRODUCES]->(r: Resource)
WHERE r.name in ['Wooden Plank', 'Stick', 'Torch', 'Arrow', 'Stone Brick', 'Smooth Sandstone', 'Wood Stair', 'Cobblestone Stair', 'Bowl', 'Pumpkin Seed']
SET relProduces.qty = 4;
// Recipes that produce 6 of something
MATCH (:Recipe)-[relProduces:PRODUCES]->(r: Resource)
WHERE r.name in ['Wooden Slab', 'Stone Slab', 'Stone Brick Slab', 'Brick Slab', 'Cobblestone Slab', 'Nether Brick Fence', 'Cobblestone Wall', 'Mossy Cobblestone Wall']
SET relProduces.qty = 6;
// Recipes that produce 8 of something
MATCH (:Recipe)-[relProduces:PRODUCES]->(r: Resource)
WHERE r.name in ['Cookie']
SET relProduces.qty = 8;
// Recipes that produce 16 of something
MATCH (:Recipe)-[relProduces:PRODUCES]->(r: Resource)
WHERE r.name in ['Iron Bar', 'Glass Pane']
SET relProduces.qty = 16;

Asking a basic question – how much Wood do two Wooden Planks take to produce?

We know a Wooden Plank recipe requires 1x Wood to kick off, and produces 4x Wooden Plank. In this case then, making 2x Wooden Plank requires we run the Wooden Plank recipe once (as 4 > 2), which requires 1x Wood. We can’t run the Recipe a fractional number of times, we just over-produce.

Download the sample database so far

You can download the state of the ‘with Recipes’ database so far from this Gist – it’s pure Cypher, so just run it on a clean database in Neo4j Desktop (or whatever version you’re using).

Download minecraft-with-recipes.cypher.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.