Day 54 of #100daysofnetworks
Cypher Queries: Filtering! Getting Signal from Noise!
Hi everybody! Today feels good. I have a more traditional #100daysofnetworks post for today, because the last one was a wild one. In the last article, I mentioned that my own company Verdant Intelligence launched and that its first brand GrooveSeeker is now online.
So, that was a bit chaotic, as I had a lot to say. A lot of really important stuff happening in my life right now, and I need to take care of it. So, please do read that previous article if you are curious about what is going on. I am in business, and I would love to do business!
But today, we are going to pick up where day 52 left off, before life threw me a curveball. On day 52, we delved into writing Cypher queries to explore our Artificial Life knowledge graph.
So, that is a good one to read if you need to catch up.
Filtering Datasets
Very often in life, to get to the value, you need to do some kind of filtering. I can’t even imagine how nasty coffee would taste if it were just a mouthful of grit. With data, the signal is almost always hidden within noise, and you need to get to it.
In relational databases, you’ll use a WHERE clause. You might have written a query like this:
select
<fields>
from
<table>
where
<criteria for inclusion>
Or, to be imaginative and concise:
select * from planets where planet = 'neptune'
Well, good news: WHERE is used with Cypher as well.
Looking above, in the SQL example, what am I really doing? I am saying:
Show me <fields> (columns, fields, etc)
From <table> (think dataset)
Where <some criteria for inclusion>
Or more specifically:
Show me everything
From table “planets”
Where there is a row or rows that have the ‘planet’ field listed as neptune.
Or more simply: show me everything in this planets table about Neptune.
In that particular hypothetical situation, I am only interested in Neptune. Everything else is noise to me. Neptune is signal. Jupiter, Earth, and everything else in that table is noise. I need the signal.
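To make that hypothetical concrete, here is a minimal sketch using Python's built-in sqlite3 module. The planets table, its two columns, and its rows are invented for illustration; they are not part of any dataset in this series.

```python
import sqlite3

# Hypothetical in-memory table; column names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE planets (planet TEXT, kind TEXT)")
conn.executemany(
    "INSERT INTO planets VALUES (?, ?)",
    [("earth", "terrestrial"), ("jupiter", "gas giant"), ("neptune", "ice giant")],
)

# The WHERE clause keeps only the rows we care about (the signal).
signal = conn.execute(
    "SELECT * FROM planets WHERE planet = ?", ("neptune",)
).fetchall()
print(signal)  # [('neptune', 'ice giant')]
```

Earth and Jupiter are still in the table, but they never reach the result. That is all filtering is.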
Cypher Challenge
When I try to learn something, I use interesting data so that the work is interesting to me. I don’t use toy datasets that come with databases. I use datasets that I build, that I have shown how to build throughout this series. That is how I roll. I am curious about what I am curious about.
So, today, we’re going to look at the Artificial Life database again, because it is such a neat dataset to explore.
Likewise, when I am figuring out how to write queries, I don’t write the code from books. I read it to understand how it works, and then I give myself challenges, so that I can use it with datasets that I care about.
Here are the query challenges for today’s article:
1. Nodes containing a word
2. Nodes containing the exact match of multiple words together
3. Edges containing a particular edge type
4. Ecosystems: contains word (ecosystem + #1)
5. Ecosystems: exact match of word (ecosystem + #2)
6. Ecosystem of edge groups
I did this in advance so that I could have a plan before sitting in front of a database that I am still learning to use. It helps me with focus and not getting distracted. That is another useful trick for learning: plan in advance with specific goals.
I used this book to help with my learning. Please check it out!
Let’s Go!
I’m just going to jump in and be fast so we don’t run out of space. The rest of this article will be show and tell and less description.
Challenge: Nodes containing a word
MATCH (n:Entity)
WHERE n.description CONTAINS 'planet'
RETURN n
Easy. It’s not a lot to look at, but I can click on any node and expand.
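One thing worth keeping in mind: CONTAINS in Cypher is a case-sensitive substring test, and a node with no description simply does not match. Here is a rough Python sketch of that behavior. The node dictionaries and the contains helper are made up for illustration and only mimic the semantics; this is not how the database evaluates the query.

```python
# Hypothetical nodes standing in for (n:Entity); names and descriptions are invented.
nodes = [
    {"name": "neptune", "description": "an outer planet"},
    {"name": "kepler", "description": "Planet-hunting telescope"},
    {"name": "tierra", "description": None},  # no description at all
]

def contains(value, word):
    """Mimic Cypher's CONTAINS: case-sensitive substring test; a missing value never matches."""
    return value is not None and word in value

# Same idea as: WHERE n.description CONTAINS 'planet'
matches = [n["name"] for n in nodes if contains(n["description"], "planet")]
print(matches)  # ['neptune']

# Lowercasing first, like toLower(n.description) in Cypher, also catches "Planet".
loose = [n["name"] for n in nodes if contains((n["description"] or "").lower(), "planet")]
print(loose)  # ['neptune', 'kepler']
```

If a CONTAINS query returns fewer nodes than expected, capitalization is a good first thing to check.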
Challenge: Nodes containing the exact match of multiple words together
MATCH (n:Entity {name: 'terrestrial planet finder'})
RETURN n
Success, but not a lot to look at. I can expand and investigate. Moving on!
Challenge: Edges containing a particular edge type
I used this query on day 52 to see unique edge types:
MATCH ()-[r]->()
RETURN DISTINCT type(r) AS edgeType
ORDER BY edgeType
And that shows many, many different types. For this challenge, I’ll use just one simple edge type.
MATCH (a)-[r:authored_by]-(b)
RETURN a, r, b
There are actually many synonyms for this edge type, so I will show a way to pull them all in a bit later.
If I run this r:authored_by query, I see an author collaboration network! I can zoom in and look around!
But what if I want to see a particular type of collaboration network? What if I just want to see people who write about planets and outer space?
Challenge: Ecosystems: contains word (ecosystem + #1)
MATCH (a)-[r]-(b)
WHERE a.description CONTAINS 'planet'
RETURN a, r, b
This query looks for any relationship, in either direction, attached to a node whose description mentions the word planet.
Nice! That has a much more interesting shape than the previous ecosystem. That big cluster at the bottom is interesting. It looks like a lot of people worked on one paper.
Yep, that’s exactly what happened. Look at all the mentions of ‘author’. That’s interesting, making me want to investigate. But let’s keep moving.
Challenge: Ecosystems: exact match of word (ecosystem + #2)
MATCH (a)-[r]-(b)
WHERE a.name = 'neptune'
RETURN a, r, b
This time, we are looking for a node named neptune, and we want to see what it is related to. We are doing an exact match search, not a CONTAINS search.
Cool! This is a much smaller graph, and these nodes can be explored! If we click on ‘planet’ and expand it, I bet something interesting will happen.
Nice. Yep, we can explore these other nodes, the papers behind them, the authors behind them, and so on. We can click on ‘venus’ and follow that path, for instance.
Which didn’t lead too far!
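The two ecosystem queries differ only in their WHERE test, which is why one graph is big and the other is small. Here is a rough Python sketch of the idea on a tiny invented graph. The node names, descriptions, edges, and the ecosystem helper are all hypothetical. One simplification to note: the sketch returns each edge once, while the undirected Cypher pattern can return a row for each orientation in which the node a passes the filter.

```python
# Hypothetical mini-graph: node descriptions plus (source, edge_type, target) triples.
descriptions = {
    "neptune": "an outer planet",
    "venus": "an inner planet",
    "planet": "a body orbiting a star",
    "j. doe": "an author",
}
edges = [
    ("neptune", "is_a", "planet"),
    ("venus", "is_a", "planet"),
    ("paper 1", "authored_by", "j. doe"),
    ("paper 1", "mentions", "neptune"),
]

def ecosystem(keep):
    """Return edges touching any node that passes keep(name), ignoring direction,
    similar to the undirected pattern (a)-[r]-(b) with a WHERE test on a."""
    return [(src, etype, dst) for src, etype, dst in edges if keep(src) or keep(dst)]

# Like: WHERE a.description CONTAINS 'planet'
by_word = ecosystem(lambda name: "planet" in descriptions.get(name, ""))
# Like: WHERE a.name = 'neptune'
by_name = ecosystem(lambda name: name == "neptune")

print(len(by_word), len(by_name))  # 3 2
```

The substring test pulls in every node that mentions the word, so its ecosystem is wider. The exact match anchors on a single node, so its ecosystem is only that node's direct neighborhood.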
Challenge: Ecosystem of edge groups
We’re about out of space, so I will just show how to do this. If you have a lot of synonym edges, you can pull them together like this:
MATCH (a)-[r:authored_by|written_by|authored|author_of|author|is_author_of|is_author|authors|_authored_by|co_authored|co_authored_with|co_author|co_authored_by|co_author_of_study|co_authored_paper|co_authored_study_on|co_authored_work|co_authored_study|co_author_of|authored_study|authored_on|authored_work_on|authored_study_about|authored_research_on|authored_on_date|authored_results|wrote|wrote_about|wrote_on|is_written_by|co_written_by]-(b)
RETURN a, r, b
Ideally, we should fix this. These can be standardized, collapsed. I think that could lead to performance gains, too. We will attempt to do that after we get through this series on Cypher queries. We will eventually create our own KGs from scratch, and optimize them. Please subscribe! The more financial support we have, the more we can do and the more we will all learn.
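As a rough idea of what standardizing these edge types could look like, here is a small Python sketch that maps a handful of the synonyms onto one canonical label. The grouping, the choice of authored_by as the canonical label, and the assumption about which way each synonym points are mine, made for illustration; the real cleanup may look quite different.

```python
# Assumed grouping of a few synonyms from the query above, split by which way the edge points.
PAPER_TO_AUTHOR = {"authored_by", "written_by", "is_written_by", "co_authored_by"}
AUTHOR_TO_PAPER = {"authored", "author_of", "is_author_of", "wrote"}

def canonicalize(source, edge_type, target):
    """Map a synonym edge onto (paper, 'authored_by', author); leave other edges unchanged."""
    if edge_type in PAPER_TO_AUTHOR:
        return (source, "authored_by", target)
    if edge_type in AUTHOR_TO_PAPER:
        return (target, "authored_by", source)  # flip so the direction stays consistent
    return (source, edge_type, target)

# Hypothetical triples for illustration.
raw = [
    ("paper 1", "written_by", "j. doe"),
    ("j. doe", "wrote", "paper 1"),
    ("paper 1", "mentions", "neptune"),
]
collapsed = sorted(set(canonicalize(*edge) for edge in raw))
print(collapsed)
# [('paper 1', 'authored_by', 'j. doe'), ('paper 1', 'mentions', 'neptune')]
```

Direction is the part that needs care: some synonyms point from paper to author and others from author to paper, so collapsing the labels without flipping would leave the graph inconsistent.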
Challenges Completed Successfully!
Today’s post was more like an old-school 100daysofnetworks post, where I showed my own learning. I hope it was informative! I definitely feel more comfortable and learned a bit!
Please Support this Work!
I have written over 50 articles for this series. Each one takes about four hours of research, and several pages of writing and editing. Here are some ways you can support the blog!
Please subscribe, if you have not.
LET’S DO BUSINESS. Reach out to me if you need data or AI help! Happy to help!
BIGGEST HELP to BLOG: Please consider upgrading if you are a subscriber. Thank you to all current paying subscribers for making this research and development possible!
Please buy my book to understand how I think about Natural Language Processing and Network Science combined.
Feel free to hang out in the comments and have a good time!
We have come so far since the very first day of the very first #100daysofnetworks. I love writing for this series. Thank you for being a part of it!











