Category Archives: Coding

Minecraft Crafting Tree in Neo4j – Part 1

In this series of posts, I’m going to try to represent the Minecraft crafting tree in Neo4j so that we can query it and see how we might answer some basic questions like:

  • How much Wood does it take to make a Wooden Plank?
  • What is the set of recipes I need to produce a Wood Sword?
  • What’s the most involved recipe in the game (in terms of production steps)?

Before we go any further, I should point out that:

  • This isn’t necessarily a very good use-case for Neo (for reasons we’ll come to in the last Retrospective post)
  • We’re going to be writing some JavaScript to do a bunch of the heavy-lifting towards the end
  • We probably should have skipped the graph database bit and just done it in memory in JS

Still – bit of fun, eh?

Setting the scene

Minecraft is a game with a crafting mechanic at its heart – to make item X, you need 3 of item Y and 1 of item Z.

The items you can craft can then be part of bigger recipes. For example, to make a Wood Sword we need 1 Stick and 2 Wooden Planks. A Stick in turn requires 2 Wooden Planks, and Wooden Planks are made out of Wood, which can be cut from trees in the environment.

If you imagine the steps involved in crafting a given item, you might represent it as a graph.

  • Each node is a resource that can be found (a raw material) or constructed from other resources
  • Each edge links a resource to its component parts (where it involves some recipe to make it)

In the above example, we might represent it via :REQUIRES relationships between the item being crafted and its ingredients:

CREATE (wood: Resource { name: 'Wood' })
CREATE (plank: Resource { name: 'Wooden Plank' })
CREATE (stick: Resource { name: 'Stick' })
CREATE (woodsword: Resource { name: 'Wood Sword' })
MERGE (plank)-[:REQUIRES]->(wood)
MERGE (stick)-[:REQUIRES]->(plank)
MERGE (woodsword)-[:REQUIRES]->(stick)
MERGE (woodsword)-[:REQUIRES]->(plank)
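
For comparison, the same four resources can be held in memory in JavaScript. This is only a rough sketch: the `requires` map and the `allRequirements` helper are names made up for illustration, and quantities are ignored just as they are in the Cypher above.

```javascript
// Hypothetical in-memory version of the same four-node example.
// Each key is a resource; each value lists the resources it directly requires.
const requires = {
  "Wood": [],
  "Wooden Plank": ["Wood"],
  "Stick": ["Wooden Plank"],
  "Wood Sword": ["Stick", "Wooden Plank"]
};

// Walk the REQUIRES edges depth-first and collect every resource reachable from `item`.
function allRequirements(item, seen = new Set()) {
  for (const ingredient of requires[item] || []) {
    if (!seen.has(ingredient)) {
      seen.add(ingredient);
      allRequirements(ingredient, seen);
    }
  }
  return seen;
}

const swordNeeds = Array.from(allRequirements("Wood Sword")).sort();
console.log(swordNeeds); // [ 'Stick', 'Wood', 'Wooden Plank' ]
```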

Let’s get some data

Note: The remainder of this post deals with scraping a web page to pull the information we need into two CSV files for loading into Neo.

If you just want the data, grab the files from this Gist and continue on to the next post. If you want to go spelunking with Google Chrome Developer Tools then more power to you…

We’ll need some data to work with. minecraftsavant.weebly.com has a fairly well-structured table that splits resources into ‘things you can make at a crafting table’ and ‘things you can make in a furnace’.

The markup of the site’s a bit sketchy because it’s been created using Weebly’s visual editor, but totally workable. Each craftable item has its own <table class="wsite-multicol-table"> element, and it contains two columns that we’re interested in:

  • The name of the item being crafted
  • The ingredients for the item

We’re going to have to write a bit of script to parse that into something we can turn into a graph, but nothing too crazy. And because this is a hack, we’ll just play around in the Chrome F12 developer tools. The full script is available at the end of the post.

Pulling the table contents

For each table that contains a recipe, the first cell contains the name of the item being made and the second contains its component parts. Since the formatting in different bits of the table varies, we’ll keep it simple and just use the text content of the cells.

var tables = Array.from(document.getElementsByClassName("wsite-multicol-table"));

var recipes = tables.map(t =>
{ 
   var toReturn = {};
   
   var item = t.rows[0].cells[0].innerText.trim();
   var ingredients = t.rows[0].cells[1].innerText.split("\n");

   toReturn.item = item;
   toReturn.ingredientsUnparsed = ingredients.filter(i => i.length > 0).map(i => i.trim());

   return toReturn;
});

We have some data quality issues here though:

  • The item quantity is still in the ingredient name
  • When multiple ingredients are required, the ingredient name has ‘and’ at the end
  • Item names are sometimes pluralised when listed as an ingredient when multiple are required
    • But not always – things like ‘Glass’ are listed as ‘3 Glass’ and not ‘3 Glasses’
  • Item names are pluralised when more than one of them is produced by its recipe (for example, ‘Wooden Planks’)
  • Item name casing is sometimes off – we want to canonicalise to title-case

Let’s fix the quantity and ‘and’ issue first, then work on canonicalising the names of items.

recipes.forEach(r => {
   // Optional quantity, then the item name, then an optional trailing ' and'
   var extractionRegex = /^([0-9]+)? ?(.+?)( and)?$/;

   // Shamelessly nicked from StackOverflow
   // https://stackoverflow.com/a/4068586/677173
   var fixCasing = s => s.replace(/(\w)(\w*)/g,
        function(g0,g1,g2){return g1.toUpperCase() + g2.toLowerCase();});

   var parsed = [];
   for (var i = 0; i < r.ingredientsUnparsed.length; i++) {
       var match = extractionRegex.exec(r.ingredientsUnparsed[i]);
       if (match) {
           // Default to a quantity of 1 and store it as a number, not a string
           parsed.push({ qty: parseInt(match[1] || "1", 10), item: fixCasing(match[2]) });
       }
   }

   r.ingredients = parsed;
});

Our regex matches an optional leading string of digits, then captures the rest of the string (excluding any trailing ‘and’), so that the first match group is the quantity and the second is the item name. Ingredients with no leading number default to a quantity of 1.

We then update each recipe with a new ingredients property, which is an array of objects with a qty and item.
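
To see what the pattern captures, here is a small standalone check. The three sample lines are made up to mirror the shapes described above (quantity plus trailing ‘and’, quantity only, and no quantity); they are not copied from the site.

```javascript
// Same pattern as above: optional quantity, item name, optional trailing " and".
const extractionRegex = /^([0-9]+)? ?(.+?)( and)?$/;

// Made-up sample lines in the shapes the page uses.
const samples = ["3 Wooden Planks and", "2 Sticks", "Coal"];

const captured = samples.map(line => {
  const match = extractionRegex.exec(line);
  // Group 1 is undefined when no quantity is present, so fall back to 1.
  return { qty: match[1] ? parseInt(match[1], 10) : 1, item: match[2] };
});

console.log(captured);
// [ { qty: 3, item: 'Wooden Planks' }, { qty: 2, item: 'Sticks' }, { qty: 1, item: 'Coal' } ]
```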

Fixing up pluralisations

Pluralisation’s trickier, so we’ll go with a ‘good enough’ approach. First, which items are pluralised?

recipes.filter(r => r.item.endsWith("s")).map(r => r.item);
(19) ["Wooden Planks", "Sticks", "Torches", "Compass", "Shears", "Arrows", "Leather Leggings", "Iron Leggings", "Gold Leggings", "Diamond Leggings", "Leather Boots", "Iron Boots", "Gold Boots", "Diamond Boots", "Wood Stairs", "Cobblestone Stairs", "Iron Bars", "Pumpkin Seeds", "Melon Seeds"]

While we could blindly strip trailing ‘s’ characters, we’d end up:

  • Breaking ‘Compass’, which would turn into ‘Compas’ – same with Glass -> ‘Glas’
  • Breaking ‘Torches’ which would turn into ‘Torche’

Let’s hard-code those cases, and fix up the rest – this isn’t an exercise in data cleansing, we want to play with a graph.

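var depluralise = str => {
    // Leave singular names and the known false positives alone
    if (!str.endsWith("s") || str == "Compass" || str == "Glass") {
        return str;
    }

    // 'Torches' needs its 'es' dropped, not just the 's'
    if (str == "Torches") {
        return "Torch";
    }

    return str.substring(0, str.length - 1);
};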

recipes.forEach(r => r.item = depluralise(r.item));

Lovely – our item names are now canonical. The same names still appear uncleaned in each recipe’s ingredients, though, so let’s fix those too:

recipes.forEach(r => r.ingredients.forEach(i => i.item = depluralise(i.item)));
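
If more irregular names turned up, the hard-coded checks could be moved into a lookup table. The following is a sketch of that alternative rather than what the script actually uses; `exceptions` and `depluraliseWithTable` are names invented here.

```javascript
// Irregular names get an explicit entry; names mapped to themselves are left alone.
const exceptions = new Map([
  ["Compass", "Compass"],
  ["Glass", "Glass"],
  ["Torches", "Torch"]
]);

const depluraliseWithTable = name => {
  if (exceptions.has(name)) return exceptions.get(name);
  // Everything else just loses a trailing 's', as in the hard-coded version.
  return name.endsWith("s") ? name.slice(0, -1) : name;
};

const checked = ["Wooden Planks", "Torches", "Compass", "Glass", "Stick"].map(depluraliseWithTable);
console.log(checked); // [ 'Wooden Plank', 'Torch', 'Compass', 'Glass', 'Stick' ]
```

Like the original, this still turns names such as ‘Shears’ into ‘Shear’. That does no harm to the graph as long as item and ingredient names go through the same function.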

Before we spit out a CSV, let’s sanity check our data – aside from raw materials (which aren’t crafted but found), were there any typos in the data set that might screw us up?

new Set(recipes.flatMap(r => r.ingredients.map(i => i.item))
.filter(i => recipes.map(r => r.item).indexOf(i) < 0));

Some of these are raw materials, but there are also two typos in the source data:

  • “Wood Plank” appears as a missing item, because our item name is actually “Wooden Plank” – we’ll need to fix that up.
  • “Two Wooden Slab” appears in the ingredients of a Fence Gate, but our parsing code hasn’t handled the Two = 2 equivalence

recipes
    .forEach(r => r.ingredients.filter(i => i.item == "Wood Plank")
    .forEach(i => i.item = "Wooden Plank"));

recipes
    .forEach(r => r.ingredients.filter(i => i.item == "Two Wooden Slab")
    .forEach(i => { i.item = "Wooden Slab"; i.qty = 2; }));

If we tack in the ‘missing’ items to our recipe item list, we can now produce two CSVs.

// Item list
var itemList = Array.from(new Set(recipes.map(r => r.item).concat(recipes.flatMap(r => r.ingredients.map(i => i.item))))).join("\n");

// Item ingredient connections
var ingredientList = recipes.flatMap(r => r.ingredients.map(i => `${r.item},${i.qty},${i.item}`)).join("\n");

Let’s get them copied and pasted into Notepad and bash some headers on by hand. We’ll use the following headers for our Recipes.csv file:

  • OutputItem
  • Qty
  • InputItem

Our ‘Ingredients’ CSV is just a single column of item names, which still gets a header row: ‘Item’.
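
The header rows could equally be added in the console before copying. A minimal sketch, using a made-up two-recipe `sampleRecipes` array in place of the scraped data, and assuming no item name contains a comma (nothing here quotes or escapes fields):

```javascript
// Tiny stand-in for the cleaned-up `recipes` array built earlier.
const sampleRecipes = [
  { item: "Wooden Plank", ingredients: [{ qty: 1, item: "Wood" }] },
  { item: "Stick", ingredients: [{ qty: 2, item: "Wooden Plank" }] }
];

// Recipes.csv: header row followed by one line per output/ingredient pair.
const recipeRows = sampleRecipes.flatMap(r =>
  r.ingredients.map(i => `${r.item},${i.qty},${i.item}`));
const recipesCsv = ["OutputItem,Qty,InputItem", ...recipeRows].join("\n");

// Item CSV: header row followed by each distinct item name, crafted or raw.
const names = new Set(
  sampleRecipes.flatMap(r => [r.item, ...r.ingredients.map(i => i.item)]));
const itemsCsv = ["Item", ...names].join("\n");

console.log(recipesCsv);
console.log(itemsCsv);
```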

We’ll need to run these same steps on the Furnace Recipes page to get the full list of craftable items. This will give two pairs of CSVs, one of the craftable items and one of the forgeable ones. We’ll just concatenate the two sets together for data loading.

Where did we just get to?

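We now have:

  • A single-column CSV of item names, covering both crafted items and raw materials
  • A Recipes.csv linking each output item to its input items and quantities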

Next steps

Next time we’re going to load the two CSVs up into Neo4j Desktop and see what we’ve got, and start exploring issues with the data we’ve pulled in so far.

Parsing source maps in .NET

When we minify JavaScript source, or write code in TypeScript or CoffeeScript and compile it down to JavaScript, debugging would be difficult without tools that support source maps.

I’m currently modifying Chutzpah to address a tiny gap in its handling of code coverage for generated source files like those output by the TypeScript compiler, and needed exactly that – a way for .NET code to parse a source map file, then query it to find out which original source line numbers map to a generated source line that’s been covered or not by a unit test.

SourceMapDotNet is my initial, bare-bones attempt at a partial port of the excellent Mozilla source-map library, but intended only to handle that one type of query – not full parsing, definitely not generation.

It’s also up on NuGet.

SonarQube TypeScript plugin

I use SonarQube (live demo) a fair bit to monitor code quality metrics, but there’s no in-built support nor published community plugins for TypeScript analysis – so I’m writing one.

I intend two core features:

  • Measure code quality by running against TsLint
  • Measure unit test coverage by processing an LCOV file

Running an alpha version of SonarTsPlugin against a random TypeScript project from GitHub shows code issues but no code coverage – yet

The first of those two goals isn’t that far away at all – above is a screenshot from the alpha version running locally. If you’re interested in helping, drop me an email!

gh-ticker – a simple ticker for your public GitHub activity

With a spare weekend I put together the ticker widget you can see at the top of the screen just now – iterating through my most recent GitHub activity items every few seconds.

It is, fittingly, available on GitHub for forking and customisation licensed under the BSD 3-Clause.

How it works

The GitHub API is very straightforward, and data that’s already public (such as what appears on your Public Activity tab) can be accessed without authentication and with JSONP – ideal for client-side hackery.

The widget’s architected as a couple of JS files (taking a dependency on jQuery and Handlebars for now), one which contains Handlebars precompiled templates and the other that makes the API call and renders partials befitting the type of each activity item.

Setting it up’s pretty simple – reference the JS and CSS, make sure Handlebars and jQuery are in there too and then whack a DIV somewhere on your page with id ‘gh-ticker’.

<div id="gh-ticker" data-user="pablissimo" data-interval-ms="5000"></div>

The user whose data is pulled and the interval between ticker item flips are configurable as data attributes.

The GitHub Events API

The Events API knows about a set number of event types – for each event type, there’s a Handlebars partial. When we’re wondering how to render an item we look up the relevant partial and whack it into the page.
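
The lookup itself can be sketched with plain functions standing in for the precompiled Handlebars partials. Everything named here (`partials`, `fallback`, `renderEvent`, the sample event and its repository name) is hypothetical and not taken from gh-ticker’s source. The `type`, `actor.login` and `repo.name` fields are the ones the Events API returns.

```javascript
// Hypothetical stand-ins for the precompiled partials, keyed by event type.
const partials = new Map([
  ["PushEvent", e => `${e.actor.login} pushed to ${e.repo.name}`],
  ["WatchEvent", e => `${e.actor.login} starred ${e.repo.name}`],
  ["ForkEvent", e => `${e.actor.login} forked ${e.repo.name}`]
]);

// Used when an event type has no dedicated partial.
const fallback = e => `${e.actor.login} did something on ${e.repo.name}`;

// Pick the renderer for this event's type, falling back to the generic one.
const renderEvent = e => (partials.get(e.type) || fallback)(e);

const sampleEvent = {
  type: "PushEvent",
  actor: { login: "pablissimo" },
  repo: { name: "example/repo" }
};
console.log(renderEvent(sampleEvent)); // pablissimo pushed to example/repo
```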

Since that’s a fair few partials (neat for development in isolation, bad for request count overhead) those partials are precompiled using the Handlebars CLI and put into a single gh-templates.js file.

Improvements

The ticker’s very basic – it just hides or shows the items as required, without any pretty transitions. It also takes a dependency on jQuery which it needn’t, since it’s only using it for the AJAX call and element manipulation both of which are easily covered off by existing browser functionality.

Still – it can be easily styled to be fairly unobtrusive and has at least taught me a little about Handlebars.

NRConfig 1.4.0.0 for New Relic released

I’ve spent a little time working on NRConfig, the tool that generates custom instrumentation files for .NET projects using New Relic, after a bug report that pointed out that the tool was unable to run for an assembly for which dependencies weren’t available. This isn’t likely in production code as you’d likely need the dependencies available to run, but can happen when you want to do an offline run of instrumentation generation against a third-party library.

To this end, NRConfig’s been changed pretty substantially under the hood to support alternatives to .NET reflection for discovering instrumentable types, and Microsoft’s Common Compiler Infrastructure (or CCI) library drafted in as the default discovery provider.

CCI’s slower than reflection by quite a margin – it can now take several seconds to produce instrumentation configuration for large or complex assemblies, but I’m hoping to improve that if it becomes a problem.

Also introduced is support for MSBuild in a new NuGet package, NRConfig.MSBuild. This should make generating instrumentation files for your own code a lot less work – simply add the NRConfig.MSBuild package to any project containing code you want to instrument and mark up the assembly, types or methods with [Instrument] attributes to control the output. On build, a custom instrumentation file is generated in your output directory for you to deploy wherever.

Enabling CORS on your ASP.NET output-cached webservice? Don’t forget to change your varyByHeaders…

If you’re enabling CORS on your ASP.NET web service, you’ll be receiving an ‘Origin’ header and outputting an Access-Control-Allow-Origin header if you’re happy to receive the request. If you’re being strict about your access control policy, you’ll be returning the same origin you got rather than * so that the user agent knows to let the call continue.

This poses a bit of an obstacle when combined with ASP.NET Output Caching, as unless you either tell it to vary its output by all headers or explicitly call out the Origin header you may find that accessing your service from two URLs within your cache lifetime period will see one call succeed and the other fail.

The failing call happens because the Access-Control-Allow-Origin header is being served from the cache, so for the second site it won’t match the Origin that site sent. Since we’ve not configured output caching to vary by the Origin header, it assumes the requests from the two different origins are the same and responds accordingly.

So, we just need to tack the Origin header into our cache configuration’s varyByHeader attribute (separated from other headers with a semicolon, if any others exist) and bingo! The two sites result in correct responses.
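
The effect is easy to reproduce with a toy cache. This is a JavaScript simulation of the idea only, not ASP.NET code; `makeCachedService` and the two example origins are invented for illustration.

```javascript
// Toy output cache: `keyFor` decides which requests count as "the same".
function makeCachedService(keyFor) {
  const cache = new Map();
  return request => {
    const key = keyFor(request);
    if (!cache.has(key)) {
      // Strict CORS policy: echo the caller's origin rather than "*".
      cache.set(key, { "Access-Control-Allow-Origin": request.origin });
    }
    return cache.get(key);
  };
}

const siteA = { url: "/api/data", origin: "https://a.example" };
const siteB = { url: "/api/data", origin: "https://b.example" };

// Keyed on URL only: site B is served site A's cached header, so its browser blocks the call.
const byUrl = makeCachedService(r => r.url);
byUrl(siteA);
const broken = byUrl(siteB)["Access-Control-Allow-Origin"];

// Keyed on URL plus Origin: each site gets a header matching its own origin.
const byUrlAndOrigin = makeCachedService(r => `${r.url}|${r.origin}`);
byUrlAndOrigin(siteA);
const fixed = byUrlAndOrigin(siteB)["Access-Control-Allow-Origin"];

console.log(broken); // https://a.example
console.log(fixed);  // https://b.example
```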

 

Hottest stands at the RMW Whisky Fringe 2012

While I’m not sure if I’m going to re-run the Whisky Fringe Tasting Tracker from last year, I saw heatmap.js for the first time the other day and thought it’d be fun to make a Mansfield Traquair heatmap showing dram-sampling by stand. Here’s the result:

Heatmap of drams sampled during the 2012 RMW Whisky Fringe

The 675 samplings recorded by www.wf2012.co.uk over the 2012 Whisky Fringe

Not bad for a first attempt. That’s 675 samplings tracked by stand – of course, some stands had appreciably more drams to sample than others but there were definite hotspots. Given that we have opinion data too, we can also plot the hotspots of most-liked drams:

Positive opinions recorded at each stand during the 2012 Whisky Fringe – broadly similar but with some interesting detail

If I do run it again this year it’d be great to get heatmap.js combined with the above floorplan image and Pusher for some real-time updates…

Fun with sometimes.rb – in .NET…

Sometimes.rb is a fun set of helpers that give you the ability to express a degree of fuzziness in your Ruby logic. A couple of examples from the docs:

15.percent_of_the_time do
  puts "Howdy, Don't forget to register!"  # be annoying, but only 15% of the time
end
(4..10).times do
  pick_nose  # between 4 and 10 boogers made, it's unpredictable!
end

Given ten minutes and a small Aberlour I thought I’d have a bash at emulating some of it in .NET just for fun:

“Object reference not set to an instance of an object” exception when deploying Azure project

Scenario:

  • Created a new cloud project into which I wanted to deploy an existing bit of code (a new staging service for an existing production system for testing)
  • Right-click Publish… and after a few seconds of thinking the deploy fails with a NullReferenceException (“Object reference not set to an instance of an object”), reporting a fatal error but no other diagnostic information

Problem was that the existing service had an HTTPS endpoint defined using a certificate that I’d not uploaded to my brand-new staging service. Deleting the endpoint (or uploading the certificate) does the trick.

Instrument specific types using wildcards with nrconfig

I just pushed version 1.3.0.0 of the NRConfig.Tool NuGet package and the associated project site – the binary is also available as a direct download.

Only two changes:

  • A fix for nested types showing duplicate method signatures in the output XML file
  • Introduction of the /w flag for wildcard matching of type names to be included in the New Relic custom instrumentation file

The /w switch is pretty straightforward – specify one or more wildcard filter strings that identify types to be included in the output file. So, if we had a project using the Repository pattern we could instrument the public methods of all of our concrete repositories:

nrconfig /i MyAssy.dll /f methods+ /w *Repository

which would match any type whose full name ends with Repository. Or we could instrument types in a specific namespace:

nrconfig /i MyAssy.dll /f methods+ /w MyAssy.Utils.* MyAssy.Controller.*

or limit ourselves to specific types:

nrconfig /i MyAssy.dll /f methods+ /w MyAssy.Controller.HomeController
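
To get a feel for which types a given filter would pick up, wildcard matching of this kind can be approximated in a few lines of JavaScript. This is an illustration of the general technique, not NRConfig’s implementation: `wildcardToRegex` and the type names are made up, and the case-sensitive matching is an assumption.

```javascript
// Turn a wildcard filter into an anchored regular expression:
// "*" matches any run of characters, everything else is matched literally.
function wildcardToRegex(pattern) {
  const escaped = pattern
    .split("*")
    .map(part => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// Hypothetical full type names for illustration.
const typeNames = [
  "MyAssy.Data.UserRepository",
  "MyAssy.Utils.StringHelper",
  "MyAssy.Controller.HomeController"
];

const matching = filter =>
  typeNames.filter(name => wildcardToRegex(filter).test(name));

console.log(matching("*Repository"));    // [ 'MyAssy.Data.UserRepository' ]
console.log(matching("MyAssy.Utils.*")); // [ 'MyAssy.Utils.StringHelper' ]
```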