Day 52 of #100daysofnetworks
Cypher Queries: Getting Started with Neo4j Queries
Hello everyone! Today is day 52, and as promised last week, we are going to dive into writing queries for Neo4j! Neo4j’s query language is called Cypher, and you can learn more about it here:
Book: Graph Data Processing with Cypher (Excellent book! I am reading it!)
There is plenty for you to read, so let’s jump right in. Cypher is a query language for graph databases, and we’re going to use it to explore the knowledge graph we built.
Today’s article is going to be a combination of show and tell and discovery. As mentioned previously, I come from relational databases, so I am learning along with you all. But as I am very experienced in graph analysis and in working with databases, I know what I am looking for.
Today, we are going to start at the beginning. We used Cognee to populate a knowledge graph, and now we’re going to learn what is in it. We’re going to answer some questions about our Artificial Life Knowledge Graph, such as:
What are the node types?
What are the relationship types that exist between nodes?
How do we inspect the “Entity” nodes?
How do we inspect the “Document Chunk” nodes?
What do Entity and Document Chunk nodes look like together?
Can we see the author collaboration networks?
I will use those questions as the sections of this tour. Think of it as Knowledge Graph inspection: I am doing this for GraphRAG purposes, so I am looking for things to improve, but I am also generally curious about graph databases.
The rest of this article will be a show and tell, with screenshots from our Neo4j Aura “Artificial Life” instance. Thank you to all paying readers for making this possible.
I promised to keep this simple, but I need to lean forward a bit and include certain filters. I will explain the code.
What are the node types?
First, before querying nodes, I want to know what types of nodes we even have.
MATCH (n)
RETURN labels(n) AS label, count(*) AS count
ORDER BY count DESC;
This first query counts the nodes for each set of labels. labels(n) returns a list, because a node can carry more than one label.
You can see that we have 18,258 Entity nodes and 2,000 DocumentChunk nodes. The 2,000 chunks make sense, as this is 10% of a dataset of 20,000 Artificial Life reference documents.
MATCH (n)
RETURN DISTINCT labels(n) AS label
This shows the same result but without the counts.
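MATCH (n)
UNWIND labels(n) AS label
RETURN DISTINCT label
And this is even cleaner, showing just the node labels. UNWIND turns each node’s list of labels into one row per label, so DISTINCT returns plain label names instead of lists.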
This last query was really what I wanted, but the first one showed me useful numbers.
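Here is a minimal Python sketch of what the three label queries return. It runs over a hypothetical in-memory list of nodes, and each node is represented only by its list of labels, which is what labels(n) gives you. One made-up node carries a second label (“Person”, invented for illustration) to show why UNWIND helps. The sketch models the shape of the results, not how Neo4j executes the queries.

```python
from collections import Counter

# Hypothetical toy data: each node is just its list of labels, as returned by labels(n).
# The "Person" label on the last node is invented to show a multi-label node.
nodes = [["Entity"], ["Entity"], ["DocumentChunk"], ["Entity", "Person"]]


def count_label_sets(nodes):
    """Mimics RETURN labels(n) AS label, count(*) AS count ORDER BY count DESC:
    one row per distinct label *list*, with the number of nodes carrying it."""
    counts = Counter(tuple(labels) for labels in nodes)
    return sorted(counts.items(), key=lambda row: row[1], reverse=True)


def distinct_label_lists(nodes):
    """Mimics RETURN DISTINCT labels(n): each row is still a list of labels."""
    rows = []
    for labels in nodes:
        if labels not in rows:
            rows.append(labels)
    return rows


def distinct_labels(nodes):
    """Mimics UNWIND labels(n) AS label RETURN DISTINCT label:
    each list is flattened into one row per label, then duplicates are dropped."""
    rows = []
    for labels in nodes:
        for label in labels:
            if label not in rows:
                rows.append(label)
    return rows


print(count_label_sets(nodes))
print(distinct_label_lists(nodes))
print(distinct_labels(nodes))
```

The last function returns ['Entity', 'DocumentChunk', 'Person']. The second one still returns ['Entity', 'Person'] as a row of its own. That difference is what UNWIND does.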
What are the relationship types that exist between nodes?
Like the first node query, this first relationship query gives us counts:
MATCH ()-[r]-()
RETURN type(r) AS edgeType, count(*) AS count
ORDER BY count DESC;
First, observe this bit:
MATCH ()-[r]-()
This is very interesting. We don’t care about the nodes, so we have not given them variables. We only name the relationship, r, so that type(r) can read its type.
The pattern can also be made directed, with either of these two options:
MATCH ()<-[r]-()
RETURN type(r) AS edgeType, count(*) AS count
ORDER BY count DESC;
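MATCH ()-[r]->()
RETURN type(r) AS edgeType, count(*) AS count
ORDER BY count DESC;
It’s a little confusing at first, but not once you get used to it. () is the Cypher way of drawing a node, because it resembles a circle. Relationships happen BETWEEN nodes, and the arrow shows their direction. Both directed versions return the same counts, because each relationship is matched exactly once, whichever way the arrow is drawn.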
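Why do the undirected and directed patterns give different counts? Here is a small Python sketch over a hypothetical list of directed relationships. The node names and the “contains” type are made up.

Neo4j stores every relationship with a direction, and an undirected pattern can be read from either end. As a result, each relationship between two distinct nodes produces two rows. The toy data deliberately contains no self-loops.

```python
from collections import Counter

# Hypothetical directed relationships: (start node, relationship type, end node).
# No self-loops here, so every relationship joins two distinct nodes.
relationships = [
    ("chunk1", "contains", "entityA"),
    ("chunk1", "contains", "entityB"),
    ("paper1", "authored_by", "alice"),
]


def match_directed(rels):
    """Mimics MATCH ()-[r]->() RETURN type(r): one row per relationship."""
    return [rel_type for _start, rel_type, _end in rels]


def match_undirected(rels):
    """Mimics MATCH ()-[r]-() RETURN type(r): the left node can bind to either
    end of the relationship, so each relationship yields two rows."""
    rows = []
    for _start, rel_type, _end in rels:
        rows.append(rel_type)  # left node bound to the start node
        rows.append(rel_type)  # left node bound to the end node
    return rows


def distinct_types(rels):
    """Mimics RETURN DISTINCT type(r) AS edgeType ORDER BY edgeType."""
    return sorted(set(match_directed(rels)))


print(Counter(match_directed(relationships)))    # contains: 2, authored_by: 1
print(Counter(match_undirected(relationships)))  # contains: 4, authored_by: 2
print(distinct_types(relationships))             # ['authored_by', 'contains']
```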
If I run this:
MATCH ()-[r]-()
RETURN type(r) AS edgeType, count(*) AS count
ORDER BY count DESC;
I see this:
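Those counts will be useful, with one caveat. This pattern has no arrow, so each relationship between two distinct nodes is matched twice, once from each end. That inflates these counts, and the directed queries give the actual number of relationships of each type. And here is how I pull only the distinct relationship types: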
MATCH ()-[r]->()
RETURN DISTINCT type(r) AS edgeType
ORDER BY edgeType
Nice. No counts! So, now we know about our nodes and their relationships!
How do we inspect the “Entity” nodes?
Here is a simple query to match entities:
MATCH (n:Entity)
RETURN n
That’s too much to be of use.
I would rather see what they are about than see a bunch of dots.
MATCH (n:Entity)
RETURN properties(n)
LIMIT 50
If you use properties() with the node variable, you can look inside: properties(n) returns all of the node’s properties as a key-value map.
Nice. This is exactly what I wanted. Now I can see the structure of an Entity node.
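The properties(n) query shows one node at a time. A natural next step in KG inspection is to check which properties every node actually has. For example, Entity nodes with no description provide weak context for GraphRAG. Here is a Python sketch over hypothetical rows shaped like the output of the query above; the property names and values are made up.

Inside the database, Cypher’s keys(n) function returns a node’s property names. Something like MATCH (n:Entity) UNWIND keys(n) AS key RETURN key, count(*) should produce the same summary, though that query is untested against this instance.

```python
from collections import Counter

# Hypothetical rows, shaped like the output of
#   MATCH (n:Entity) RETURN properties(n) LIMIT 50
# The keys and values are made up for illustration.
entity_rows = [
    {"name": "cellular automata", "description": "A grid of cells updated by local rules."},
    {"name": "genetic algorithm", "description": "A search method inspired by evolution."},
    {"name": "tierra"},
]


def property_key_counts(rows):
    """Count how many rows carry each property key."""
    counts = Counter()
    for props in rows:
        counts.update(props.keys())
    return dict(counts)


def rows_missing(rows, key):
    """Return the rows that do not have the given property key."""
    return [props for props in rows if key not in props]


print(property_key_counts(entity_rows))          # {'name': 3, 'description': 2}
print(rows_missing(entity_rows, "description"))  # [{'name': 'tierra'}]
```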
How do we inspect the “Document Chunk” nodes?
What do DocumentChunks look like? This is useful for inspecting your KG if you are working on GraphRAG.
MATCH (n:DocumentChunk)
RETURN properties(n)
LIMIT 50
If I run that, I can see into the document chunks. Here is ONE output, out of 2,000.
Neat! We can learn about this particular paper, and we have an arXiv link!
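The chunk’s text seems to carry the paper’s metadata, including that arXiv link. This sketch assumes the text property is a JSON object with a url key, which is the same assumption the apoc.convert.fromJsonMap(dc.text).url call later in this post makes. Under that assumption, here is a minimal Python sketch of pulling the link out. The sample text and the placeholder arXiv ID are hypothetical.

```python
import json


def chunk_url(chunk_text):
    """Parse a chunk's text as a JSON object and return its "url" field.

    Returns None when the text is not valid JSON, is not a JSON object,
    or has no "url" key.
    """
    try:
        data = json.loads(chunk_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    return data.get("url")


# Hypothetical chunk text; the arXiv ID is a placeholder, not a real paper.
sample = '{"title": "An Artificial Life paper", "url": "https://arxiv.org/abs/0000.00000"}'
print(chunk_url(sample))                  # https://arxiv.org/abs/0000.00000
print(chunk_url("plain text, not JSON"))  # None
```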
What do Entity and Document Chunk nodes look like together?
Can I join Entity and Document Chunk together, and start to stitch together a tabular layout? Yes, I can! This query is complicated, so don’t worry about it. I just wanted to lean forward and try it!
MATCH (e:Entity)
MATCH (dc:DocumentChunk)
RETURN e.description, apoc.convert.fromJsonMap(dc.text).url AS url, properties(dc)
This will show me:
The entity’s description
The arXiv URL
The full context from within DocumentChunk
How does it look?
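The above screenshot shows the first record. As I did not add a LIMIT, it fetched everything, and that is a lot. The two MATCH clauses are not connected by a relationship, so Cypher pairs every Entity with every DocumentChunk. That is a Cartesian product of 18,258 × 2,000 = 36,516,000 rows, and the description and URL in a row are not necessarily related. To pair entities only with chunks they are linked to, both nodes need to be in one pattern, such as MATCH (e:Entity)--(dc:DocumentChunk), assuming Cognee connects them directly. Adding a LIMIT would also help.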
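In Cypher, two MATCH clauses that share no variable or relationship return every combination of their rows. A single connected pattern returns only the pairs that are actually linked. Here is a toy Python sketch of the difference in row counts. The entity names, chunk names, and chunk-to-entity links are hypothetical.

```python
from itertools import product

# Hypothetical nodes and chunk -> entity links.
entities = ["entityA", "entityB", "entityC"]
chunks = ["chunk1", "chunk2"]
links = [("chunk1", "entityA"), ("chunk2", "entityB"), ("chunk2", "entityC")]


def unconnected_matches(entities, chunks):
    """Mimics MATCH (e:Entity) MATCH (dc:DocumentChunk): every entity paired
    with every chunk, i.e. a Cartesian product."""
    return list(product(entities, chunks))


def connected_matches(links):
    """Mimics one connected pattern such as MATCH (e:Entity)--(dc:DocumentChunk):
    only the (entity, chunk) pairs that share a relationship."""
    return [(entity, chunk) for chunk, entity in links]


print(len(unconnected_matches(entities, chunks)))  # 6
print(len(connected_matches(links)))               # 3

# At the scale of this knowledge graph, the unconnected version yields:
print(18_258 * 2_000)  # 36516000
```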
Pause for a Second
This is really great. A few weeks ago, I did not know how to do a simple graph query, and now I do. I can quickly determine the node and edge types, and I can use that information to explore the database and knowledge graph.
If you are coming from SQL, the syntax is a bit strange, but it becomes easy very quickly once you start typing. If you are a paying reader and would like access to the database, please let me know. I am happy to give paying readers access so that they can explore and learn.
Anyway, we now have real information about what is in the database! I can write queries and see what is what.
But what I REALLY want to see are these collaboration networks. Authors write with other authors. Let’s extract collaboration networks! Let’s get ambitious!
Can we see the author collaboration networks?
I noticed that one of the relationship types is “authored_by”, so let’s try to use that.
MATCH (a)-[r:authored_by]-(b)
RETURN a, r, b
This looks for an “authored_by” relationship between any two nodes. If I run it, I see this:
Neat! But oh no! I realized that Cognee created a bunch of similar relationship types. This is something to standardize! See, here is another query:
MATCH (a)-[r:written_by]-(b)
RETURN a, r, b
Likewise, it also found collaboration networks.
But see, that’s a weakness: each of these queries only gets one relationship type, such as “written_by”. I found many other similar types, so I wrote a query to catch all of them.
MATCH (a)-[r:authored_by|written_by|authored|author_of|author|is_author_of|is_author|authors|_authored_by|co_authored|co_authored_with|co_author|co_authored_by|co_author_of_study|co_authored_paper|co_authored_study_on|co_authored_work|co_authored_study|co_author_of|authored_study|authored_on|authored_work_on|authored_study_about|authored_research_on|authored_on_date|authored_results|wrote|wrote_about|wrote_on|is_written_by|co_written_by]-(b)
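RETURN a, r, b
The | between relationship types means “any of these types”, so this one pattern matches every variant. And that gives me what I want.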
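Listing every variant works, but the list has to be updated whenever Cognee invents a new one. An alternative is to filter relationship types by pattern. Here is a Python sketch that classifies the type names from the query above with a regular expression. The regex is an assumption about what counts as authorship, and a loose pattern like this can catch false positives, so check what it matches.

In Cypher, the same idea would look something like MATCH (a)-[r]-(b) WHERE type(r) =~ '.*(author|written|wrote).*' RETURN a, r, b. This is untested here. Note that =~ must match the whole string, hence the .* on each side.

```python
import re

# Relationship types from the Cypher query above (31 of them).
AUTHORSHIP_TYPES = (
    "authored_by|written_by|authored|author_of|author|is_author_of|is_author|authors|"
    "_authored_by|co_authored|co_authored_with|co_author|co_authored_by|co_author_of_study|"
    "co_authored_paper|co_authored_study_on|co_authored_work|co_authored_study|co_author_of|"
    "authored_study|authored_on|authored_work_on|authored_study_about|authored_research_on|"
    "authored_on_date|authored_results|wrote|wrote_about|wrote_on|is_written_by|co_written_by"
).split("|")

# Assumed definition of "authorship-like": the type name mentions author, written, or wrote.
AUTHORSHIP_PATTERN = re.compile(r"author|written|wrote")


def is_authorship(rel_type):
    """True if the relationship type name looks like an authorship relationship."""
    return AUTHORSHIP_PATTERN.search(rel_type) is not None


# Every type in the hand-written list matches the pattern...
print(all(is_authorship(t) for t in AUTHORSHIP_TYPES))  # True
# ...unrelated types do not...
print(is_authorship("contains"))                        # False
# ...but a hypothetical type like this one would be a false positive.
print(is_authorship("has_authority_over"))              # True
```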
I can look closer at the communities!
Excellent! I can see both “written_by” and “authored_by” captured in the graph now. I think it would be ideal for these to be standardized.
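Many of these relationships connect a paper to an author. A collaboration network in the strict sense links authors to each other when they share a paper. Once the authorship relationships are standardized, that projection is straightforward.

Here is a Python sketch that assumes the relationships have already been reduced to hypothetical (paper, author) pairs. The names are made up, and the edge weight is simply the number of shared papers.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already-standardized authorship data: (paper, author) pairs.
paper_author_pairs = [
    ("paper1", "alice"),
    ("paper1", "bob"),
    ("paper2", "alice"),
    ("paper2", "bob"),
    ("paper2", "carol"),
]


def coauthor_edges(pairs):
    """Project paper-author pairs onto an author-author collaboration network.

    Two authors are linked once per paper they share; the result maps each
    (author, author) pair, in alphabetical order, to that count.
    """
    authors_by_paper = {}
    for paper, author in pairs:
        authors_by_paper.setdefault(paper, set()).add(author)

    weights = Counter()
    for authors in authors_by_paper.values():
        # sorted() fixes the order within each pair, so (a, b) and (b, a) are one edge.
        for a, b in combinations(sorted(authors), 2):
            weights[(a, b)] += 1
    return dict(weights)


print(coauthor_edges(paper_author_pairs))
# {('alice', 'bob'): 2, ('alice', 'carol'): 1, ('bob', 'carol'): 1}
```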
Ok, we are out of space! Let’s wrap up for today! We will continue to explore Cypher over the next several days!
Please Support this Work!
I have written over 50 articles for this series. Each one takes roughly four hours of research, plus several pages of writing and editing. Here are some ways you can support the blog!
Please subscribe, if you have not.
BIGGEST HELP: Please consider upgrading if you are a subscriber. Thank you to all current paying subscribers for making this research and development possible!
Please buy my book to understand how I think about Natural Language Processing and Network Science combined.
Feel free to hang out in the comments and have a good time!
We have come so far since the very first day of the very first #100daysofnetworks. I love writing for this series. Thank you for being a part of it!