Minecraft Crafting Tree in Neo4j – Part 1

In this series of posts, I’m going to try to represent the Minecraft crafting tree in Neo4j so that we can query it and see how we might answer some basic questions like:

  • How much Wood does it take to make a Wooden Plank?
  • What is the set of recipes I need to produce a Wood Sword?
  • What’s the most involved recipe in the game (in terms of production steps)?

Before we go any further, I should point out that:

  • This isn’t necessarily a very good use-case for Neo (for reasons we’ll come to in the last Retrospective post)
  • We’re going to be writing some JavaScript to do a bunch of the heavy-lifting towards the end
  • We probably should have skipped the graph database bit and just done it in memory in JS

Still – bit of fun, eh?

Setting the scene

Minecraft is a game with a crafting mechanic at its heart – to make item X, you need 3 of item Y and 1 of item Z.

The items you can craft can then be part of bigger recipes. For example, to make a Wood Sword we need 1 Stick and 2 Wooden Planks. A Stick in turn requires 2 Wooden Planks, and Wooden Planks are made out of Wood, which can be cut from trees in the environment.

If you imagine the steps involved in crafting a given item, you might represent it as a graph.

  • Each node is a resource that can be found (a raw material) or constructed from other resources
  • Each edge links a resource to its component parts (where it involves some recipe to make it)

In the above example, we might represent it via :REQUIRES relationships between the item being crafted and its ingredients:

CREATE (wood: Resource { name: 'Wood' })
CREATE (plank: Resource { name: 'Wooden Plank' })
CREATE (stick: Resource { name: 'Stick' })
CREATE (woodsword: Resource { name: 'Wood Sword' })
MERGE (plank)-[:REQUIRES]->(wood)
MERGE (stick)-[:REQUIRES]->(plank)
MERGE (woodsword)-[:REQUIRES]->(stick)
MERGE (woodsword)-[:REQUIRES]->(plank)
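
The same structure can also be held in memory in JavaScript, which makes the arithmetic in the Wood Sword example easy to check. This is only a sketch: the `requires` map and the `countNeeded` helper are names made up for illustration, the quantities are the ones from the example above, and it ignores how many items a recipe yields per craft.

```javascript
// Hypothetical in-memory version of the example graph.
// Keys are crafted items; values list the ingredients and quantities
// taken from the Wood Sword example in the text.
const requires = {
  "Wood Sword": [{ qty: 1, item: "Stick" }, { qty: 2, item: "Wooden Plank" }],
  "Stick": [{ qty: 2, item: "Wooden Plank" }]
};

// Count how many `target` items are needed, directly or indirectly, to
// craft one `item`. Assumes the graph has no cycles and ignores recipe yields.
function countNeeded(item, target) {
  if (item === target) return 1;
  const ingredients = requires[item] || [];
  return ingredients.reduce(
    (total, ing) => total + ing.qty * countNeeded(ing.item, target), 0);
}

console.log(countNeeded("Wood Sword", "Wooden Plank")); // 4
```

One Stick (2 planks) plus 2 planks directly gives 4 Wooden Planks per Wood Sword.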

Let’s get some data

Note: The remainder of this post deals with scraping a web page to pull the information we need into two CSV files for loading into Neo.

If you just want the data, grab the files from this Gist and continue on to the next post. If you want to go spelunking with Google Chrome Developer Tools then more power to you…

We’ll need some data to work with. minecraftsavant.weebly.com has a fairly well-structured table that we can work with, which splits out resources into ‘things you can make at a crafting table’ and ‘things you can make in a furnace’.

The markup of the site’s a bit sketchy because it’s been created using Weebly’s visual editor, but totally workable. Each craftable item has its own <table class="wsite-multicol-table"> element, and it contains two columns that we’re interested in:

  • The name of the item being crafted
  • The ingredients for the item

We’re going to have to write a bit of script to parse that into something we can turn into a graph, but nothing too crazy. And because this is a hack, we’ll just play around in the Chrome F12 developer tools. The full script is available at the end of the post.

Pulling the table contents

For each table that contains a recipe, the first cell contains the name of the item being made and the second contains its component parts. Since the formatting in different bits of the table varies, we’ll keep it simple and just use the text content of the cells.

var tables = Array.from(document.getElementsByClassName("wsite-multicol-table"));

var recipes = tables.map(t =>
{ 
   var toReturn = {};
   
   var item = t.rows[0].cells[0].innerText.trim();
   var ingredients = t.rows[0].cells[1].innerText.split("\n");

   toReturn.item = item;
   // Trim first, so that whitespace-only lines are dropped by the filter
   toReturn.ingredientsUnparsed = ingredients.map(i => i.trim()).filter(i => i.length > 0);

   return toReturn;
});

We have some data quality issues here though:

  • The item quantity is still in the ingredient name
  • When multiple ingredients are required, the ingredient name has ‘and’ at the end
  • Item names are sometimes pluralised when listed as an ingredient when multiple are required
    • But not always – things like ‘Glass’ are listed as ‘3 Glass’ and not ‘3 Glasses’
  • Item names are pluralised when more than one of them is produced by its recipe (for example, ‘Wooden Planks’)
  • Item name casing is sometimes off – we want to canonicalise to title-case

Let’s fix the quantity and ‘and’ issue first, then work on canonicalising the names of items.

recipes.forEach(r => {
   // Optional leading quantity, then the item name, then an optional trailing " and"
   var extractionRegex = /^([0-9]+)? ?(.+?)( and)?$/;

   // Shamelessly nicked from StackOverflow
   // https://stackoverflow.com/a/4068586/677173
   var fixCasing = s => s.replace(/(\w)(\w*)/g,
        function(g0,g1,g2){return g1.toUpperCase() + g2.toLowerCase();});

   var parsed = [];
   for (var i = 0; i < r.ingredientsUnparsed.length; i++) {
       var match = extractionRegex.exec(r.ingredientsUnparsed[i]);
       if (match) {
           // Default to a quantity of 1 when none is given, and keep qty numeric
           var qty = match[1] ? parseInt(match[1], 10) : 1;
           parsed.push({ qty: qty, item: fixCasing(match[2]) });
       }
   }

   r.ingredients = parsed;
});

Our regex optionally matches a leading string of digits, then captures the rest of the string (excluding any trailing ‘and’), so that the first capture group is the quantity and the second is the item name. If no quantity is given, we default it to 1.

We then update each recipe with a new ingredients property, which is an array of objects with a qty and item.
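
To see what the regex does in isolation, here is a small sketch that runs it over a few ingredient strings. The sample strings are made up to look like the scraped cell text rather than copied from the page, and the quantity is converted to a number along the way.

```javascript
// Same expression as in the snippet above.
const extractionRegex = /^([0-9]+)? ?(.+?)( and)?$/;

// Hypothetical ingredient strings, shaped like the scraped cell text.
const samples = ["2 Wooden Planks and", "1 Stick", "Coal"];

const parsedSamples = samples.map(s => {
  const match = extractionRegex.exec(s);
  // match[1] is undefined when there is no leading quantity.
  return { qty: match[1] ? parseInt(match[1], 10) : 1, item: match[2] };
});

console.log(parsedSamples);
```

The lazy `(.+?)` is what lets the optional `( and)?` group claim the trailing ‘and’ instead of it being swallowed into the item name.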

Fixing up pluralisations

Pluralisation’s trickier, so we’ll go with a ‘good enough’ approach. First, which items are pluralised?

recipes.filter(r => r.item.endsWith("s")).map(r => r.item);
(19) ["Wooden Planks", "Sticks", "Torches", "Compass", "Shears", "Arrows", "Leather Leggings", "Iron Leggings", "Gold Leggings", "Diamond Leggings", "Leather Boots", "Iron Boots", "Gold Boots", "Diamond Boots", "Wood Stairs", "Cobblestone Stairs", "Iron Bars", "Pumpkin Seeds", "Melon Seeds"]

While we could blindly strip trailing ‘s’ characters, we’d end up:

  • Breaking ‘Compass’, which would turn into ‘Compas’ – same with Glass -> ‘Glas’
  • Breaking ‘Torches’ which would turn into ‘Torche’

Let’s hard-code those cases, and fix up the rest – this isn’t an exercise in data cleansing, we want to play with a graph.

var depluralise = str => {
    if (!str.endsWith("s") || str === "Compass" || str === "Glass") {
        return str;
    }

    if (str === "Torches") {
        return "Torch";
    }

    return str.substring(0, str.length - 1);
};

recipes.forEach(r => r.item = depluralise(r.item));

Lovely – our item names are now canonical, but they aren’t when they appear as ingredients in other recipes, so let’s go fix that too:

recipes.forEach(r => r.ingredients.forEach(i => i.item = depluralise(i.item)));
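
If more special cases turn up, one variation on the same idea is to keep them in a lookup table so that each new exception is a one-line change. This is a sketch of that alternative, not the code used in the rest of the post; `pluralExceptions` and `depluraliseWithTable` are hypothetical names.

```javascript
// Hypothetical table of names where stripping a trailing "s" would be wrong.
const pluralExceptions = new Map([
  ["Compass", "Compass"],
  ["Glass", "Glass"],
  ["Torches", "Torch"]
]);

const depluraliseWithTable = str => {
  // Special cases win; everything else just loses a trailing "s".
  if (pluralExceptions.has(str)) return pluralExceptions.get(str);
  return str.endsWith("s") ? str.slice(0, -1) : str;
};

const depluralised =
  ["Wooden Planks", "Torches", "Compass", "Shears"].map(depluraliseWithTable);
console.log(depluralised);
```

Note that ‘Shears’ comes out as ‘Shear’ with either version. That’s fine for our purposes as long as item names and ingredient names go through the same function, because the two sides will still match up.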

Before we spit out a CSV, let’s sanity check our data – aside from raw materials (which aren’t crafted but found), were there any typos in the data set that might screw us up?

new Set(recipes.flatMap(r => r.ingredients.map(i => i.item))
.filter(i => recipes.map(r => r.item).indexOf(i) < 0));

Some of these are raw materials, but there are also two problems in the source data:

  • “Wood Plank” appears as a missing item, because our item name is actually “Wooden Plank” – we’ll need to fix that up.
  • “Two Wooden Slab” appears in the ingredients of a Fence Gate, but our parsing code hasn’t handled the Two = 2 equivalence

recipes
    .forEach(r => r.ingredients.filter(i => i.item == "Wood Plank")
    .forEach(i => i.item = "Wooden Plank"));

recipes
    .forEach(r => r.ingredients.filter(i => i.item == "Two Wooden Slab")
    .forEach(i => { i.item = "Wooden Slab"; i.qty = 2; }));
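
It’s worth re-running the sanity check after patching to confirm that only raw materials are left. Here’s a sketch of the check wrapped up as a reusable function, run against a tiny made-up data set; `findUnmatched` and `sample` are hypothetical names, and the sample quantities are illustrative only.

```javascript
// Ingredient names that have no recipe of their own: raw materials or typos.
const findUnmatched = recipeList => {
  const craftable = new Set(recipeList.map(r => r.item));
  return [...new Set(
    recipeList
      .flatMap(r => r.ingredients.map(i => i.item))
      .filter(name => !craftable.has(name))
  )];
};

// Hypothetical miniature data set in the same shape as `recipes`.
const sample = [
  { item: "Wooden Plank", ingredients: [{ qty: 1, item: "Wood" }] },
  { item: "Stick", ingredients: [{ qty: 2, item: "Wood Plank" }] }
];

const before = findUnmatched(sample); // ["Wood", "Wood Plank"]

// Apply the same kind of patch as above, then check again.
sample.forEach(r => r.ingredients
  .filter(i => i.item === "Wood Plank")
  .forEach(i => i.item = "Wooden Plank"));

const after = findUnmatched(sample); // ["Wood"]
```

Using a Set for the craftable names also avoids rebuilding the full item list for every ingredient.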

If we add the ‘missing’ items to our recipe item list, we can now produce two CSVs.

// Item list
var itemList = Array.from(new Set(recipes.map(r => r.item).concat(recipes.flatMap(r => r.ingredients.map(i => i.item))))).join("\n");

// Item ingredient connections
var ingredientList = recipes.flatMap(r => r.ingredients.map(i => `${r.item},${i.qty},${i.item}`)).join("\n");

Let’s get them copied and pasted into Notepad and bash some headers on by hand. We’ll use the following headers for our Recipes.csv file:

  • OutputItem
  • Qty
  • InputItem

Our ‘Ingredients’ CSV is just a single column of item names, which we’ll still give a header: ‘Item’.
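
If you’d rather not add the headers by hand, they can be generated along with the rows. This is a minimal sketch under a few assumptions: `csvRecipes` is a made-up sample in the same shape as `recipes`, and `csvField` and `toCsv` are hypothetical helpers. The quoting is only there in case an item name ever contains a comma.

```javascript
// Hypothetical sample in the same shape as the parsed `recipes` array.
const csvRecipes = [
  { item: "Stick", ingredients: [{ qty: 2, item: "Wooden Plank" }] },
  { item: "Wood Sword", ingredients: [
      { qty: 1, item: "Stick" }, { qty: 2, item: "Wooden Plank" }] }
];

// Wrap a field in quotes if it contains a comma, a quote or a newline.
const csvField = value => {
  const s = String(value);
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
};

// Join a header row and data rows into one CSV string.
const toCsv = (header, rows) =>
  [header, ...rows].map(row => row.map(csvField).join(",")).join("\n");

// One row per (output item, quantity, input item).
const recipesCsv = toCsv(
  ["OutputItem", "Qty", "InputItem"],
  csvRecipes.flatMap(r => r.ingredients.map(i => [r.item, i.qty, i.item]))
);

// One row per distinct item name, whether crafted or used as an ingredient.
const itemNames = new Set(
  csvRecipes.flatMap(r => [r.item, ...r.ingredients.map(i => i.item)]));
const itemsCsv = toCsv(["Item"], [...itemNames].map(name => [name]));

console.log(recipesCsv);
console.log(itemsCsv);
```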

We’ll need to run these same steps on the Furnace Recipes page to get the full list of craftable items. This will give two pairs of CSVs: one pair for the items made at a crafting table and one for the items made in a furnace. We’ll just concatenate the two sets together for data loading.
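
One thing to watch when concatenating: an item that shows up on both pages would appear twice in the combined item list. Whether that matters depends on how the data gets loaded, but de-duplicating is cheap. A sketch, with made-up item lists standing in for the real results from each page:

```javascript
// Hypothetical item lists, standing in for the results from each page.
const craftingItems = ["Wooden Plank", "Stick", "Coal"];
const furnaceItems = ["Glass", "Coal"];

// Concatenate, then drop duplicates while keeping first-seen order.
const allItems = [...new Set([...craftingItems, ...furnaceItems])];

console.log(allItems.join("\n"));
```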

Where did we just get to?

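We now have:

  • An item list CSV, with one item name per row
  • A recipes CSV, linking each output item to the quantity of each input item it needs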

Next steps

Next time we’re going to load the two CSVs up into Neo4j Desktop and see what we’ve got, and start exploring issues with the data we’ve pulled in so far.
