Minecraft Crafting Tree in Neo4j – Part 2

In this series of posts, I’m going to try to represent the Minecraft crafting tree in Neo4j. In Part 1 we looked briefly at one possible data model and then produced two CSV files ready for import into Neo4j Desktop.

In this post we’re going to do a load of those two CSVs and start exploring the graph that we get as a result.

You can access the CSVs here as part of this Gist.

Note: We’re going to create a database from scratch via those CSVs, then do a couple of data fixes to our database as part of this post.

If you don’t want the step-by-step, you can just directly download the minecraft.cypher file that represents the state of our database by the end of this post.

Data load

Given two CSVs structured as follows:

  • Items.csv – containing a single column of item names
  • Recipes.csv – with three columns
    • OutputItem
    • Qty
    • InputItem

We can do a quick load into Neo4j to see how we’re set.

First, drop the two files into the import folder then do a quick sanity check on both:

LOAD CSV WITH HEADERS FROM 'file:///Items.csv' AS row
RETURN row
LOAD CSV WITH HEADERS FROM 'file:///Recipes.csv' AS row
RETURN row

Our data model will be as we described earlier:

  • Nodes will be labelled with ‘Resource’ for now to cover all bases, but later we’ll maybe refine this out
  • name is the only property of a node so far
  • qty is the only property of a relationship so far
LOAD CSV WITH HEADERS FROM 'file:///Items.csv' AS row
MERGE (r: Resource { name: row.Item })

Added 187 labels, created 187 nodes, set 187 properties, completed after 193 ms.
LOAD CSV WITH HEADERS FROM 'file:///Recipes.csv' AS row
MATCH (rInput: Resource { name: row.InputItem })
MATCH (rOutput: Resource { name: row.OutputItem })
MERGE (rOutput)-[:REQUIRES { qty: toInteger(row.Qty) }]->(rInput)

Set 219 properties, created 219 relationships, completed after 330 ms.
CREATE CONSTRAINT ON (resource: Resource) ASSERT resource.name IS UNIQUE
Added 1 constraint, completed after 79 ms.

Running some queries

What all goes into a Wood Sword?

MATCH path= (r:Resource { name: 'Wood Sword' })-[:REQUIRES*]->(:Resource)
RETURN path

What’s the most convoluted item to craft?

MATCH path = (s:Resource)-[:REQUIRES*]->(e:Resource)
RETURN s, max(length(path))
ORDER BY max(length(path)) DESC, s.name DESC
LIMIT 1

Huh – really?

MATCH path= (r:Resource { name: 'Carrot on a Stick' })-[:REQUIRES*]->(:Resource)
RETURN path

Just graph the whole thing – what requires what?

MATCH path = (:Resource)-[:REQUIRES*]->(:Resource)
RETURN path

One last problem – bricks

Our import isn’t perfect – in fact just by looking around the graph interactively we can see some trouble.

Let’s look at the Brick node:

MATCH path = (:Resource { name: 'Brick' })-[:REQUIRES*]->(:Resource)
RETURN path

Displaying 4 nodes, 3 relationships.

From our Crafting page, we see that Bricks only require Clay Bricks to make:

But we can also make bricks directly from Clay using a furnace:

Surely that can’t be right? From the official documentation you can make Brick (as an item) from Clay in a furnace but you then combine four of those brick materials into a Bricks block.

So our source website isn’t entirely accurate, because its listing for Brick should be Bricks. We can fix up our graph for this instance by using the Clay Brick item when we mean the brick material and Brick when we mean the Brick block. We’ll then rename Brick to match the documentation and force it to be plural.

// First make the Flower Pot be constructed of Clay Brick
MATCH (pot: Resource { name: 'Flower Pot' })-[rel:REQUIRES]->(brick: Resource { name: 'Brick' })
MATCH (clayBrick: Resource { name: 'Clay Brick' })
MERGE (pot)-[:REQUIRES { qty: rel.qty }]->(clayBrick)
DELETE rel
RETURN pot, brick, clayBrick
// Now stop Bricks blocks being made of Clay and Fuel, and point those relationships over to the Clay Brick instead
MATCH (brick: Resource { name: 'Brick' })-[rel:REQUIRES]->(clay: Resource { name: 'Clay' })
MATCH (clayBrick: Resource { name: 'Clay Brick' })
MERGE (clayBrick)-[:REQUIRES { qty: rel.qty }]->(clay)
DELETE rel
RETURN brick, clay, clayBrick
MATCH (brick: Resource { name: 'Brick' })-[rel:REQUIRES]->(fuel: Resource { name: 'Fuel' })
MATCH (clayBrick: Resource { name: 'Clay Brick' })
MERGE (clayBrick)-[:REQUIRES { qty: rel.qty }]->(fuel)
DELETE rel
RETURN brick, fuel, clayBrick

Download the sample database so far

I have exported the state of the database so far to this Gist – it’s pure Cypher, so just run it on a clean database in Neo4j Desktop (or whatever version you’re using).

Download minecraft.cypher.

Next steps

We’ve got something that’s sort-of useful now, but we haven’t yet managed to answer the question of ‘how much Wood does it take to make a Wood Sword‘, or anything similar. We’re going to find that tricky because we’ve forgotten to import some key information from the original data source, and to tidily do that our current representation is going to cause us trouble.

Next time we’ll look at the representation problem, we’ll refactor our graph to make Recipe a first-class citizen and we’ll see what impact that has on our ability to query.

Minecraft Crafting Tree in Neo4j – Part 1

In this series of posts, I’m going to try to represent the Minecraft crafting tree in Neo4j so that we can query it and see how we might answer some basic questions like:

  • How much Wood does it take to make a Wooden Plank?
  • What are the set of recipes I need to produce a Wood Sword?
  • What’s the most involved recipe in the game (in terms of production steps)?

Before we go any further, I should point out that:

  • This isn’t necessarily a very good use-case for Neo (for reasons we’ll come to in the last Retrospective post)
  • We’re going to be writing some Javascript to do a bunch of the heavy-lifting towards the end
  • We probably should have skipped the graph database bit and just done it in memory in JS

Still – bit of fun, eh?

Setting the scene

Minecraft is a game with a crafting mechanic at its heart – to make item X, you need 3 of item Y and 1 of item Z.

The items you can craft can then be part of bigger recipes. For example, to make a Wooden Sword we need 1 Stick and 2 Wooden Planks. A Stick requires 2 Wooden Planks to make on its own, and Wooden Planks are made out of Wood which can be cut from trees in the environment.

If you imagine the steps involved in crafting a given item, you might represent it as a graph.

  • Each node is a resource that can be found (a raw material) or constructed from other resources
  • Each edge links a resource to its component parts (where it involves some recipe to make it)

In the above example, we might represent it via :REQUIRES relationships between the item being crafted and its ingredients:

CREATE (wood: Resource { name: 'Wood' })
CREATE (plank: Resource { name: 'Wooden Plank' })
CREATE (stick: Resource {name: 'Stick' })
CREATE (woodsword: Resource { name: 'Wood Sword' })
MERGE (plank)-[:REQUIRES]->(wood)
MERGE (stick)-[:REQUIRES]->(plank)
MERGE (woodsword)-[:REQUIRES]->(stick)
MERGE (woodsword)-[:REQUIRES]->(plank)

Let’s get some data

Note: The remainder of this post deals with scraping a web page to pull the information we need into two CSV files for loading into Neo.

If you just want the data, grab the files from this Gist and continue on to the next post. If you want to go spelunking with Google Chrome Developer Tools then more power to you…

We’ll need some data to work with. minecraftsavant.weebly.com has a fairly well structured table that we can work with that splits out resources into ‘things you can make at a crafting table’ and ‘things you can make in a furness’.

The markup of the site’s a bit sketchy because it’s been created using Weebly’s visual editor, but totally workable. Each craftable item has its own <table class="wsite-multicol-table"> element, and it contains two columns that we’re interested in:

  • The name of the item being crafted
  • The ingredients for the item

We’re going to have to write a bit of script to parse that into something we can turn into a graph, but nothing too crazy. And because this is a hack, we’ll just play around in the Chrome F12 developer tools. The full script is available at the end of the post.

Pulling the table contents

For each table that contains a recipe, the first cell contains the name of the item being made and the second contains its component parts. Since the formatting in different bits of the table varies, we’ll keep it simple and just use the text content of the cells.

var tables = Array.from(document.getElementsByClassName("wsite-multicol-table"));

var recipes = tables.map(t =>
{ 
   var toReturn = {};
   
   var item = t.rows[0].cells[0].innerText.trim();
   var ingredients = t.rows[0].cells[1].innerText.split("\n");

   toReturn.item = item;
   toReturn.ingredientsUnparsed = ingredients.filter(i => i.length > 0).map(i => i.trim());

   return toReturn;
});

We have some data quality issues here though:

  • The item quantity is still in the ingredient name
  • When multiple ingredients are required, the ingredient name has ‘and’ at the end
  • Item names are sometimes pluralised when listed as an ingredient when multiple are required
    • But not always – things like ‘Glass’ are listed as ‘3 Glass’ and not ‘3 Glasses’
  • Item names are pluralised when more than one of them is produced by its recipe (for example, ‘Wooden Planks’)
  • Item name casing is sometimes off – we want to canonicalise to title-case

Let’s fix the quantity and ‘and’ issue first, then work on canonicalising the names of items.

recipes.forEach(r => {
   var extractionRegex = /^([0-9]+)? ?(.+?)( and)?$/;

   // Shamelessly nicked from StackOverflow
   // https://stackoverflow.com/a/4068586/677173
   var fixCasing = s => s.replace(/(\w)(\w*)/g,
        function(g0,g1,g2){return g1.toUpperCase() + g2.toLowerCase();});

   var parsed = [];
   for (var i = 0; i < r.ingredientsUnparsed.length; i++) {
       var match = extractionRegex.exec(r.ingredientsUnparsed[i]);
       if (match) {
	      parsed.push({ qty: (match[1] || 1), item: fixCasing(match[2]) });
       }
   }

   r.ingredients = parsed;
});

Our regex matches any numeric digit string, and then captures the rest of the string (excluding any trailing ‘and’) so that the first match group is the quantity and the second the item name.

We then update each recipe with a new ingredients property, which is an array of objects with a qty and item.

Fixing up pluralisations

Pluralisation’s trickier, so we’ll go with a ‘good enough’ approach. First, which items are pluralised?

recipes.filter(r => r.item.endsWith("s")).map(r => r.item);
(19) ["Wooden Planks", "Sticks", "Torches", "Compass", "Shears", "Arrows", "Leather Leggings", "Iron Leggings", "Gold Leggings", "Diamond Leggings", "Leather Boots", "Iron Boots", "Gold Boots", "Diamond Boots", "Wood Stairs", "Cobblestone Stairs", "Iron Bars", "Pumpkin Seeds", "Melon Seeds"]

While we could blindly strip trailing ‘s’ characters, we’d end up:

  • Breaking ‘Compass’, which would turn into ‘Compas’ – same with Glass -> ‘Glas’
  • Breaking ‘Torches’ which would turn into ‘Torche’

Let’s hard-code those cases, and fix up the rest – this isn’t an exercise in data cleansing, we want to play with a graph.

var depluralise = str => {
    if (!str.endsWith("s") || str == "Compass" || str == "Glass") {
		return str;
    }

	if (str == "Torches") {
  		return "Torch";
    }
	else {
		return str.substring(0, str.length - 1);
    }
};

recipes.forEach(r => r.item = depluralise(r.item));

Lovely – our item names are now canonical, but when they appear in recipes they’re not so let’s go fix that too:

recipes.forEach(r => r.ingredients.forEach(i => i.item = depluralise(i.item)));

Before we spit out a CSV, let’s sanity check our data – aside from raw materials (which aren’t crafted but found), were there any typos in the data set that might screw us up?

new Set(recipes.flatMap(r => r.ingredients.map(i => i.item))
.filter(i => recipes.map(r => r.item).indexOf(i) < 0));

Some of these are raw materials but there’s also two typos in the source data:

  • “Wood Plank” appears as a missing item, because our item name is actually “Wooden Plank” – we’ll need to fix that up.
  • “Two Wooden Slab” appears in the ingredients of a Fence Gate, but our parsing code hasn’t handled the Two = 2 equivalence
recipes
    .forEach(r => r.ingredients.filter(i => i.item == "Wood Plank")
    .forEach(i => i.item = "Wooden Plank"));

recipes
    .forEach(r => r.ingredients.filter(i => i.item == "Two Wooden Slab")
    .forEach(i => { i.item = "Wooden Slab"; i.qty = 2; }));

If we tack in the ‘missing’ items to our recipe item list, we can now produce two CSVs.

// Item list
var itemList = Array.from(new Set(recipes.map(r => r.item).concat(recipes.flatMap(r => r.ingredients.map(i => i.item))))).join("\n");

// Item ingredient connections
var ingredientList = recipes.flatMap(r => r.ingredients.map(i => `${r.item},${i.qty},${i.item}`)).join("\n");

Let’s get them copied and pasted into Notepad and bash some headers on by hand. We’ll use the following headers for our Recipes.csv file:

  • OutputItem
  • Qty
  • InputItem

Our ‘Ingredients’ CSV is just a single column of item names, which we’ll still put a header on of ‘Item’.

We’ll need to run these same steps on the Furnace Recipes page to get the full list of craftable items. This will give two pairs of CSVs, one of the craftable items and one of the forgeable ones. We’ll just concatenate the two sets together for data loading.

Where did we just get to?

We now have:

Next steps

Next time we’re going to load the two CSVs up into Neo4j Desktop and see what we’ve got, and start exploring issues with the data we’ve pulled in so far.

Graph data modelling – inferred vs explicit categories and labels

When building graph data models we frequently have to deal with a degree of polymorphism for our entities just like the real world.

For instance – I’m a person, but I’m also a parent, a spouse, a sibling, a child, a…

Implicit categorisation

Sometimes the entity categories are entirely defined by relationships to other entities. In most of the above examples, we can categorise me because of how I relate to other people in my family:

  • I’m a husband because I have a ‘married to’ relationship to my wife
  • I’m a sibling because I have a ‘sibling of’ relationship to my brother
  • I’m a child because I have a ‘child of’ relationship to both my mother and father

These categories are all fairly simple one-hop affairs – we can categorise me in different ways by looking at how I’m directly connected to other entities in the graph.

A more involved category is ‘parent’ – in a family tree we could be explicit when dealing with parent/child relationships and add both of them to the graph or just add one (say, :PARENT_OF) and flipping our query around a bit to figure out what should have a ‘child’ category.

These two representations aren’t quite equivalent, but we can answer the same questions with the first as the second by caging everything in terms of the :PARENT_OF relationship

There are disadvantages to the second approach when the relationship is symmetrical (you now have to maintain two relationship types that are logically mutually dependent, additional storage requirements and there’s no longer a canonical way to query the children of a given Person) but it’s still a viable model.

When I say ‘a canonical way to query’ we can consider the question of ‘Who is Tony Stark’s father’. The following two queries work on the right hand graph schema, express the same intent and return the same result:

MATCH (father: Person)-[:PARENT_OF]->(tony: Person { name: 'Tony Stark' })
RETURN father
MATCH (tony: Person { name: 'Tony Stark'})-[:CHILD_OF]->(father: Person)
RETURN father

In the first graph there’s only one way to figure out Tony’s parent, which may be considered an advantage.

Explicit categorisation

There are a couple of situations where we may want to explicitly categorise nodes (via node labels or node properties):

  • The category is not entirely defined by the entity’s labels and how that entity relates to the wider graph
  • On a busy graph we’re frequently interested in just the categories of nodes, without necessarily being interested in how those categories were arrived at (we’ll see an example of this in a bit)

The first is fairly obvious, and we can use an HR example to demonstrate it.

In Big Co, every employee has exactly one manager. Managers can have any number of employees reporting to them, including none. A manager isn’t a specific job title or position – you can manage people as part of your day job, but there are also dedicated staff managers whose only role is line management.

Here it’s easy to think that the ‘manager’ category is dependent on their being ‘REPORTS_TO’ relationships into the node, as follows:

CREATE (gill: Employee { name: 'Gill' })
CREATE (peter: Employee { name: 'Peter' })
CREATE (geoff: Employee { name: 'Geoff' })
MERGE path=(geoff)-[:REPORTS_TO]->(peter)-[:REPORTS_TO]->(gill)
RETURN path

And we can pull a list of managers by just looking for anyone with a :REPORTS_TO into them:

MATCH (:Employee)-[:REPORTS_TO]->(manager: Employee)
RETURN DISTINCT manager.name

manager.name
"Gill"
"Peter"

But we haven’t covered the case where a dedicated staff manager doesn’t have any reports yet. Suppose a new manager called Sandra joins the company reporting to Peter – on their first few days they won’t have any reports as they’re still getting trained up, but they’re still a manager according to our definition.

We now need to explicitly categorise the new node somehow. Either via a label:

CREATE (sandra:Employee:Manager { name: 'Sandra' })
WITH sandra
MATCH (peter:Employee {name: 'Peter' })
MERGE path=(sandra)-[:REPORTS_TO]->(peter)
RETURN path

Or via some property on the Sandra node:

CREATE (sandra:Employee { name: 'Sandra', isManager: true })
WITH sandra
MATCH (peter:Employee {name: 'Peter' })
MERGE path=(sandra)-[:REPORTS_TO]->(peter)
RETURN path

To make sure we get Sandra back in our earlier ‘get all managers’ query we now have a few options. Here’s a couple:

-- Assumes we went with maintaining a 'Manager' label
MATCH (:Employee)-[:REPORTS_TO]->(manager: Employee)
RETURN manager.name
-- Union will distinct for us, so we can remove it from the two RETURNs
UNION MATCH (manager:Manager)
RETURN manager.name

-- Alternate phrasing of the above without the UNION
MATCH (manager:Employee)
WHERE (:Employee)-[:REPORTS_TO]->(manager: Employee)
   OR (manager:Manager)
RETURN DISTINCT manager.name
-- Assumes we went with a node property 'isManager' and that we've indexed it
MATCH (:Employee)-[:REPORTS_TO]->(manager: Employee)
RETURN manager.name
UNION MATCH (manager:Employee { isManager: true })
RETURN manager.name

Dirty third option

There is a fairly dirty third solution here, which is to have a dummy Employee node that represents a placeholder employee – a dummy entry, from which we can create a :REPORTS_TO relationship to Sandra. Now our original inference is correct again (if you have inbound :REPORTS_TO relationships then you’re a manager), but our data model no longer matches the business definition because we may have multiple managers listed for that dummy node (breaking the ‘exactly one manager’ rule). Again, workable around by creating a dummy employee node for each manager who lacks reports.

This option is also problematic because we would need to detach and reattach the dummy node when :REPORTS_TO relationships are created or destroyed, and still have to store the explicit ‘isManager’ flag for that process to reliably work.

It also has shifted the problem somewhere else – we’ve made it easy to get a list of managers, but how do we now get a list of employees while excluding the dummy ones?

Impacts of explicit categorisation

There are several big impacts to explicitly categorising nodes, though ultimately if your business requirement doesn’t allow you to reliably infer categories from relationships you’ve not many choices.

Query complexity and performance

The above, really straightforward query is hard to express succinctly because:

  • Every OR in a query expands our search space and slows us down – we’re no longer just hitting indexes to look things up, and we need to combine result sets to get us to the right answer
  • Neo4j doesn’t have an efficient mechanism to query with a disjunction of node labels – we can’t for example say ‘MATCH (:Employee | :Manager)’. Our ‘alternate’ query above basically does a label disjunction, but relies on every Manager also being an Employee (which we can specify for this case but not in general). For other cases, label disjunctions essentially scan every node in the graph

However, you may find that some queries become quicker because you’re doing less work for the ‘flag’ cases where you just want to know if someone’s a manager, and not who they manage. If you could maintain a :Manager label semi-automatically, those queries become trivial, but that maintenance itself isn’t free and is extra logic for your application to contain.

Cognitive complexity

Our logical definition of ‘Manager’ is now:

  • Has any inbound ‘reports to’ relationships
  • OR: is explicitly flagged as a Manager (via label or property)

This means in any piece of code where we need a list of managers we have to embed that logic into the search. If our definition changes we’d need to update a lot of places at once, and because the logic is probably hard to make performant in general it might be expressed in very different ways in different queries depending on the situation.

We can’t get around the cognitive complexity of what the business defines as a manager, but if we’re automating the maintenance of a :Manager label and then in our schema only ever use :Manager to determine the list of managers, while still using the :REPORTS_TO relationship to find out who reports to each manager then we have a clearer delineation and an easier rule to lint for.

Automatically maintaining the category has plenty of failure modes

Settings labels based on relationships is pretty perilous and really only viable if you can infer the intended label from one or two hops (plus whatever manual flag is being set). We now also need to assess the impact if we screw up or the automated label maintenance fails/is delayed:

  • What happens if we forget to set or remove a label?
  • How do we document that we need to fix up the :Manager category each time we amend :REPORTS_TO relationships?

Automating the category tightly couples parts of your application that needn’t be

Back to our family tree example: let’s say that we want to find all the Uncles in the family tree. The maths for being an uncle is pretty easy – X is an uncle of Z if X is the :SIBLING_OF Y and Y is the :PARENT_OF Z.

That transitive relationship causes us a headache though – now when my brother has a child, the corresponding ‘Add Child’ code has to:

  • Create the child node (we had to do this before too)
  • Create a :PARENT_OF relationship between my brother and the new child (same as before)
  • Find all siblings of my brother and label them as :Uncle

The code that adds children to parents shouldn’t care at all about uncles, aunts, nieces and nephews but now it has to (or has to at least know that there might be downstream impacts) to fix up the graph so the categories match the data.

Upshot

I think from having played around with this both in relational and graph databases I’ve roughly come down on:

  • If the category represents a fundamental classification of an entity that broadly doesn’t change and doesn’t depend on the entity’s place in the wider graph, use a node label/explicit category
  • If the category is defined mostly or entirely by its place in the graph, keep its definition to be relative to the graph – i.e. follow the relationships to answer questions, perhaps with disjunctions for explicit classification flags on the node (the ‘isManager’ example above)
  • If performance becomes an issue then consider automating the maintenance of indexable fields or labels based on whatever logic dictates the classification

Graphs aren’t magic

Ultimately, if you have complicated logic in your business domain to classify entities then that complexity has to go somewhere and graph databases aren’t exempt from that. You can cover it off in your application, or you can make your data model more complex/less representative of the real world but chances are you’ll have the same sorts of hurdles to overcome whether using a graph or SQL.

SonarTsPlugin retired and archived

Some time late 2014 I wrote SonarTsPlugin, which for a few years was one of the only ways to get Typescript analysis into SonarQube. It was:

  • The first time I’d used tslint
  • The first time (and last, though for purely coincidental reasons) I’d written an analyser for SonarQube
  • The most popular thing I’ve committed to GitHub

Over 180 stars, over 100 forks and 8.5k downloads of the most recent version. I’m pretty pleased.

Fortunately, the SonarQube guys wrote their own official Typescript plugin a while back and it’s both stable, well-supported and covers off most everything mine did or can be supplemented by other existing plugins to cover the remaining functionality. Not only does this make my plugin redundant, it also makes it a source of confusion – people come raising issues thinking it’s the official plugin, and I’m not equipped time-wise to give much help (especially since Elspeth was born).

I’ve not been able to make updates to it at all for at least 18 months, which means it’s drifted behind SonarQube upgrades and tslint rule changes.

So as of today the repository has been archived with a note, and it’ll be left up as a reference only.

Taking a career break

You love developing software – the hands-on sketching, writing, debugging and deploying of it, prototyping out ideas and proving out new approaches to problems in code.

The industry has a well-known problem for people like you though – especially in large companies, progression tends to take you away from the hands-on day-to-day development work and towards management, architecture and governance roles.

What’s the career equivalent of feature creep?

For a lot of folk this is perfect. You take on different responsibilities, you’re forced to stretch into roles you’ve never held before and develop skills that are broadly very useful but probably not part of a university CS degree. There’s a catch though – making the step back to deep technical work becomes harder the longer you’re away from it, and it’s easy to find yourself in a position where your CV no longer looks like that of a developer but looks rather more like that of a manager.

So you read Hacker News on the train, you spend evenings and weekends hacking about with Angular or Azure Event Grid or AWS Lambda and you get to keep up to speed with what’s happening around you a bit. Flex your coding muscles while still getting to do all the other management and less hands-on technical bits your work requires. You mightn’t even notice how much professional development you’re doing off the clock.

Enter a new challenger

This is where I was at until our daughter arrived in 2013, I just didn’t know it. She’s amazing – shouty and stompy and with a grin that’d turn the sky blue in a storm but looking after your children is all-consuming and you throw yourself into it entirely.

But there’s no time to tinker with TinkerPop when your little girl’s yelling “SLIDE!” at you and bouncing up and down, there’s sliding to be done! Reading that book on domain-driven design or microservices comes much further down the list than making her favourite dinner. And forget about that Azure Friday backlog you’ve got, you’ve got your own dinner to make once she’s in bed.

All of this has forced me to account for my time a bit more, and realise that I was compensating for spending less time being a developer at work by spending more and more of my spare time doing technical things. That’s fine, obviously – software development’s my hobby as well as my career, but when that time is no longer available to you it’s easy to get resentful or sad about how quickly things are whipping past you. It’s easy to lose your love for what you’re doing.

Taking a step back and some time off

So I decided to take a break, hand my notice in and take three or so months off dedicated to the things I want to do and learn. My last day at work was the end of May, so I’ve been able to focus my time since then on two main areas.

First off – I get to spend a lot more time with my daughter. No commute means I’m there to greet her from nursery, do drop-offs when my wife’s working from home, spend a full extra day a week with her and my wife for the day she’s not at nursery when I’d usually be in the office. If nothing else, seeing that massive grinning loon barrelling toward you from the nursery garden yelling “daddy!” – and with scant regard for her friends as she barges past – has emotionally paid for this time off entirely.

Secondly I get to do focused learning on topics that I have had bookmarked for ages but not gotten around to, where I can take on board a new framework or technology or try to better understand the underpinnings of stuff I’ve already been doing. It’s learning for its own sake, but with a rough direction I’m heading towards.

Learning targets

The rough plan, then is to learn or improve my knowledge in a few specific areas:

React and Redux

I’ve been a full-stack web developer for the past eight or so years (so well over half my career now), but the front-end work has almost always been in Knockout for reasonably good reasons. It’s stable, it works well enough and I’ve been extremely productive with it but new projects rarely start in it. I chose to learn React for a couple of reasons:

  • So that I’ve got a mental model of how a different front-end framework operates
  • To get more exposure to newer client-side build pipeline standards

I’ve chosen Udemy as the basis for this learning, an online course which I’ll be backing up with practical explorations using some of the other tech on my list.

Neo4j Graph Database

At my previous employer I worked on and was responsible for our bespoke institutional investor CRM and marketing platform, which was all built on Oracle and ASP.NET. When you get down to it though, you’re modelling graph-like structures in a relational database – so I spent some time exploring Neo4j as an alternative back-end but in a fairly unstructured fashion.

In my break I’ve already spent a long while doing the Neo4j self-learning courses, brushed up on my Cypher syntax and built out multi-node clusters for testing various scenarios as part of their operations training. I’ve also become a Neo4j Certified Professional to back that up. I plan to explore what the implications would be of having a large-scale CRM built in Neo4j in some of my remaining learning time.

GraphQL and the GRAND Stack

GraphQL’s having a good run at the minute but it’s fairly early doors in terms of tooling and support, especially on the .NET front (when compared with something like Apollo). That’s good for me – I’ll get to learn it while it’s still a bit of a moving target, but it’ll also force me to look broader than my .NET background for back-ends to do that learning.

More interesting for me is the so-called GRAND stack of GraphQL, React, Apollo and Neo4j for building graph-based applications. A lot of CRM data is natively hierarchical or structured as connections between related entities – perfect for a graph database, ideal for GraphQL. I’m not going to have time to deep dive the whole stack, but a working knowledge of all the moving parts is the aim and that Neo4j CRM prototype is intended to be built against some of this stack.

Azure and Serverless

Having worked almost exclusively on on-premise applications for the past handful of years I’ve watched Azure mature and expand as a platform but not managed to keep as current with it as I’d like. Sure, App Service and Azure SQL and Table Storage and Azure Virtual Machines are all essentially unchanged but I’ve not played with Event Grid, nor tried to break CosmosDB, or really gotten into Azure Functions at all.

This’ll be a mix of Azure online learning resources and just trying things out. Ideally I’d be in a position to do a certification at the end of this but that’s a nice-to-have and not the goal.

Life after the career break

I’ve not yet decided what comes after my break. Obviously just finding another job is high up the list, but I’m already more willing to consider things like remote working, or part-time working, or contracting given how vast the benefits to my home life have been just for spending more time with my family.

What’ll be important though is that I get to keep learning as I go, and that there are technical challenges to overcome. While learning and prototyping for its own sake is a wonderful indulgence, for me nothing beats getting an idea out of someone’s head and into production.

SonarTsPlugin 1.0.0 released

In something of a milestone for the project SonarTsPlugin 1.0.0 has been released. While the last blog post that mentioned the plugin had it at v0.3, there have been a great many changes since then to the point that I might as well outline the total feature set:

  • Analyses TypeScript code using tslint, or consumes existing tslint output and reports issues to the SonarQube interface
  • Analyses code coverage information in LCOV format
    • Also supports Angular-CLI output
  • Derives lines-of-code in your TypeScript project
  • Supports user-defined rule breach reporting
  • Supports custom tslint rule specification
  • Compatible with Windows and Linux, supports various CI environments including VSTS
  • Compatible with SonarQube 5.6 LTS and above
  • A demo site exists
  • Sample projects demonstrating setup of the plugin are available

The project readme has fairly detailed information on how to configure the plugin, which I’m shortly to turn into a wiki on GitHub with a little more structure.

The plugin has been downloaded over a thousand times now, and appears to be getting increasing use given the recent trend of issues and activity on the project. Hopefully it’s now in a good place to build upon, with the core functionality done.

The next big milestone is to get the plugin listed on the SonarQube Update Centre, which will require fixing a few code issues before going through a review process and addressing anything that comes out of that. Being on the Update Centre is the easiest way as a developer to consume the plugin and to receive updates, so is a real priority for the next few months.

SonarQube TypeScript plugin 0.3 released and demo site available

I’ve recently made some changes to my SonarQube TypeScript plugin pithily named ‘SonarTsPlugin’ that:

  • Make it easier to keep up to date with changes to TsLint
  • Fix minor bugs
  • Support custom TsLint rules

Download links

Breaking change

In a breaking change, the plugin no longer generates a configuration file for TsLint based on your configured project settings, but instead requires that you specify the location of a tslint.json file to use for analysis via the sonar.ts.tslintconfigpath project-level setting.

There were several reasons for the change as detailed on the initial GitHub issue:

  • The options for any given TsLint rule are somewhat fluid and change over time as the language evolves – either we model that with constant plugin changes, or we push the onus onto the developer
  • Decouples the TsLint version from the plugin somewhat – so long as rules with the same names remain supported, a TsLint upgrade shouldn’t break anything
  • Means your local build process and SonarQube analysis can use literally the same tslint.json configuration

Custom rule support

New to 0.3 is support for specifying a custom rule directory. TsLint supports user-created rules, and several large open-source projects have good examples – in fact, there’s a whole repository of them. You can now specify a path to find your custom rules via the sonar.ts.tslintrulesdir project property.

NCLOC accuracy improvements

A minor defect in NCLOC was fixed, where the inside of block comments longer than 3 lines were considered code.

Demo site

To test the plugin against some larger and more interesting code-bases, there’s a SonarQube 5.4 demo installation with the plugin installed available for viewing. Sadly so far none of the projects I’ve analysed have any major issues versus their custom rule setup…

Future

There remains minor work to do on the plugin, and I’ll keep it up to date with TsLint changes where possible.

Failure to start Xamarin Android Player simulator

On Windows 10, you might find that after installing Xamarin tools and the Xamarin Android Player you can’t launch the simulator from Visual Studio nor the Xamarin Device Manager with the error ‘VBoxManage command failed. See log for further details‘.

xap failure

XAP is automating VirtualBox in the background, but you’ll probably find that you can’t manually start the VM image from there either, but with a more helpful error ‘Failed to open/create the internal network‘ and ‘Failed to attach the network LUN (VERR_INTNET_FLT_IF_NOT_FOUND)‘:

xap failure - vbox

The fix is to edit the connection properties of the VirtualBox Host-Only network adapter (as named in the error message) and make sure that VirtualBox NDIS6 Bridged Networking Driver is ticked. In the example below, even though the installation appeared to go swimmingly it didn’t enable the bridge driver.

xap failure - fix

Tick the box and off you go!

Chutzpah and source maps – more complete TypeScript/CoffeeScript coverage

I spent a lot of time over Christmas contributing to open-source JavaScript unit test runner Chutzpah, and the recent Chutzpah 3.3.0 release includes source-map support as a result.

The new UseSourceMaps setting causes Chutzpah to translate generated source (i.e. JavaScript) code coverage data into original source (i.e. TypeScript/CoffeeScript/whatever) coverage data for more accurate metrics. It also plays well with LCOV support, which I added a while back but only got released as part of 3.3.0.

Chutzpah before sourcemaps

Chutzpah handles recording code coverage using Blanket.js. However, code coverage was always expressed in terms of covered lines of generated JavaScript, and not covered lines of the original language.

This makes code coverage stats inaccurate:

  • There’re likely to be more generated JavaScript lines than source TypeScript/CoffeeScript (skewing percentages for some constructs)
  • The original language might output boilerplate for things like inheritance in each file, which if not used is essentially uncoverable in the generated JavaScript – TypeScript suffers especially from this

UseSourceMaps setting

The new UseSourceMaps setting tells Chutzpah to, when faced with a file called File.js, look for a source map file called File.js.map containing mapping information between File.js and its original source code – likely a TypeScript or CoffeeScript file.

{
 "Compile": {
   "Extensions": [".ts"],
   "ExtensionsWithNoOutput": [".d.ts"],
   "Mode": "External",
   "UseSourceMaps": true
  },
 "References": [
   {"Include": "**/src/**.ts", "Exclude": "**/src/**.d.ts" }
 ],
 "Tests": [
   { "Include": "**/test/**.ts", "Exclude": "**/test/**.d.ts" }
 ]
}

This will only be of use when Chutzpah has been told of original source files using the Compile setting, asked to perform code coverage and source maps exist.

Parsing source maps in .NET

When we minify JavaScript source, or write code in TypeScript or CoffeeScript and compile it down to JavaScript our debugging experience would be difficult without tools that support source maps.

I’m currently modifying Chutzpah to address a tiny gap in its handling of code coverage for generated source files like those output by the TypeScript compiler, and needed exactly that – a way for .NET code to parse a source map file, then query it to find out which original source line numbers map to a generated source line that’s been covered or not by a unit test.

SourceMapDotNet is my initial, bare-bones attempt at a partial port of the the excellent Mozilla source-map library, but intended only to handle that one type of query – not full parsing, definitely not generation.

It’s also up on NuGet.