I answered a Stack Overflow question the other day about why two queries performed very differently. One involved a chain of OPTIONAL MATCHes, the other a single OPTIONAL MATCH with portions of a pattern separated by commas.
There’s a difference between the two that will almost certainly give you different results.
Patterns in Neo4j
A pattern is a description of a structure in a graph we’re trying to match. Patterns can assign nodes and relationships to variables we use in subsequent processing. They express both the nodes we’re looking for and how they must be related. They’re fundamental to a graph query – and the docs aren’t super-clear on how patterns with commas work.
A pattern might be something like ‘Actors connected to Movies via an ‘ACTED_IN’ relationship’.
Patterns with commas
A MATCH statement where the pattern contains a comma is still one MATCH statement. Every component of the pattern must be satisfied for the MATCH to return anything.
Let’s say we’re using the Movies database, and we want to find the directors of movies Tom Cruise has starred in.
One way is to assign a variable to the Movie node, and use commas:
MATCH (tom: Person { name: 'Tom Hanks' })-[:ACTED_IN]->(m: Movie), (director: Person)-[:DIRECTED]->(m)
RETURN DISTINCT director.name
Even though it might look as though we’re specifying two patterns we’re not: it’s all part of the same pattern. If Neo4j can’t satisfy any portion of the MATCH, it’ll return nothing.
Alternatively we could also use two MATCH statements one after the other:
MATCH (tom: Person { name: 'Tom Hanks' })-[:ACTED_IN]->(m: Movie)
MATCH (m)<-[:DIRECTED]-(director: Person)
RETURN DISTINCT director.name
In this example there’s no difference between the two queries – they’re functionally identical. Both return the same data and have identical EXPLAIN and PROFILE results. For this specific query there’s no difference in the two approaches.
But one situation where there is a big difference between the two is when we’re using OPTIONAL MATCH instead of MATCH.
OPTIONAL MATCH chains
Recall that an OPTIONAL MATCH returns a row if our pattern is satisfied, and NULL if not – it’s basically an OUTER JOIN but expressed in a graph query.
Let’s look at a fairly contrived example using the Movies database. Here, we want every Movie and the list of people who produced it, if any are known. We use OPTIONAL MATCH because not all Movies have producers in our database, but we still want to return the Movie title.
MATCH (m: Movie)
OPTIONAL MATCH (producer: Person)-[:PRODUCED]->(m)
RETURN m.title, collect(producer.name)
// 38 records returned
38 records are returned, one for each Movie in the database, of which 10 have a non-empty collection of producer names.
What if we now also wanted to list the names of anyone who reviewed those films, alongside the producers. We can chain two OPTIONAL MATCH statements for this:
MATCH (m: Movie)
OPTIONAL MATCH (producer: Person)-[:PRODUCED]->(m)
OPTIONAL MATCH (reviewer: Person)-[:REVIEWED]->(m)
RETURN m.title, collect(producer.name), collect(reviewer.name)
We still get 38 records, some of which have a producer (like The Matrix and Jerry Maguire) and some of which have a reviewer (like Jerry Maguire and The Replacements). Most records only have one or the other – a few have both.
You might look at that query and wonder, “can’t we simplify that by doing all the OPTIONAL MATCHes in one go?”. It might look like this:
MATCH (m: Movie)
OPTIONAL MATCH (producer: Person)-[:PRODUCED]->(m),
(reviewer: Person)-[:REVIEWED]->(m)
RETURN m.title, collect(producer.name), collect(reviewer.name)
Well – no, you can’t. The two queries are expressing different things. The commas in the second query’s OPTIONAL MATCH are expressing a single pattern – we need to find a Person who :PRODUCED the movie, and a Person who :REVIEWED the movie to return anything from the optional match at all. If we can’t match any part of the comma-separated portions of the pattern, we’ll return nothing at all.
We no longer return any producer or reviewer information for The Matrix, nor The Replacements. The Matrix has a producer but no reviewer, and the Replacements has a reviewer but no producer.