Day 44 of #100daysofnetworks
GenAI: "Text to Edgelist" Tool
Hi everybody. I am so glad that this blog is back in action, as I have been working on some exciting things, and now I get to talk about it. Related to graph analysis and AI, I have two things going on, simultaneously:
I’ve built a prototype “Text to Edgelist” tool, and in this article I’m going to describe it and how it can be useful.
I’ve built a prototype “Agentic Graph” tool, for exploring and investigating graph networks in your own native language.
These two things are using two different approaches. The first is using a pretty vanilla prompt-based approach with AI, and the second is relying on AI agents. So, if you want to learn AI engineering, you are in the right place, and you should subscribe to this blog. Because I will show how to use both approaches.
Deepnote is Awesome!
In the last year, I have made one very significant change to how I do analysis. In the past, I’ve used Jupyter, and then if I needed to collaborate, I’d use Google Colab. However, Google Colab is annoying and simplistic, in my opinion. It bothers me to have to reupload files every time a notebook shuts down, and the collaboration capabilities are inadequate.
Last year, I found out about Deepnote, and it is my favorite Data Science “notebook” tool, right now. They are not a sponsor of this blog, but I sure wish they were. :)
Their plans are simple and affordable.
Even though I am currently unemployed, I pay for the Team Plan. Because with this, I have what I need to do freelance work, or to share notebooks with collaborators, and I can use their more powerful machines.
This article isn’t for advertising them, though. So, that’s enough praise. But do check them out. Deepnote is a very useful Data Science tool, and it has replaced Jupyter, for me.
So, today’s demo has screenshots from my Deepnote. You can see the kinds of things I work on.
What is this? Text to Edgelist?
So, what is this text to edgelist thing? It’s simple. Humans use language to communicate things.
“Mary had a little lamb, its fleece was white as snow. And everywhere that Mary went, that lamb was sure to go.”
What lifeforms and relationships do you read from the text?
Mary is in this setting
A lamb is in this setting
Mary possesses the lamb in some way
The lamb follows Mary around
You can learn about relationships from language, and Artificial Intelligence can, as well.
Previously, if you wanted to extract entities (people, places, organizations, etc) from text, you would download and use a NER model, such as one of spaCy’s.
With this approach, you do not need to download and use a spaCy model. You can use any LLM, and because many LLMs are multilingual, that also simplifies things.
Instead of downloading one of these models for every language you need to support:
You could just use this one single workflow. So, this is much simpler, and also less bloated. Those spaCy models are large, and if you are using them in expensive workflows, then they affect the scaling of those workflows. So, this simpler workflow could save a company thousands of dollars, if they used it.
The idea is simple:
The input is text, any text written in any human language. This is multilingual, so the text could be English, Japanese, Chinese, Russian, Slovenian, whatever. The input is text, any text. That is powerful by itself.
The output is an edgelist that contains nodes, reasoning for why the edge is included, and example text from the source text.
What that edgelist, you can build and analyze a graph. You can learn about the relationships that exist in the text, who knows who, how they know each other, etc.
Eventually, I may do one or more of these things:
Put it behind an API and make the API available to paying subscribers
Put it behind a UI and make the UI available to paying subscribers
Build a company and make this into a simple product
But you can do this too, so learn from me. What I am showing you is much more valuable than the subscription fee for this blog costs, so please subscribe.
Let’s See It
Alright, let’s do some show and tell. I’ll keep it simple, and we will look closer on upcoming days. Sorry, this is not in Jupyter. This is in Deepnote. So, I will share screenshots, today.
The setup is simple. Use these libraries:
Connect to a GPT model of your choice:
Create a simple function for making your prompt calls. Call it whatever you want. I didn’t put much thought into this name.
My current prompt is simple, and it is not final. This is rough form, non-optimized, non-validated.
setup = """
Extract an edgelist of relationships from any text.
Rules:
Include only named human, human-like entities, and named animals (exclude objects, places, unnamed groups).
Add an undirected edge if two individuals interact, appear in the same scene/sentence, or one references/thinks of the other.
Resolve vague mentions if possible; otherwise keep as written (e.g. “Her Sister”).
Include all valid entities.
Use one consistent name per entity, capitalize Mixed Casing.
Each row should have a one sentence explanation of why it was included.
Also include a small snippet from the text supporting the reasoning.
Output:
[("Name1", "Name2", "Reasoning", "Snippet"),
("Name3", "Name4", "Reasoning", "Snippet")]
Return only the edgelist. You must return valid output in the format specified.
Text follows:
"""And for now, I use this code to use the above:
import re
import string
for chapter in list(chapters.keys()):
print('Running chapter: {}'.format(chapter))
# keep letters, digits, punctuation, and whitespace
text = chapters[chapter]['text']
text = re.sub(f"[^{re.escape(string.ascii_letters + string.digits + string.punctuation)}]", " ", text)
text = " ".join(text.split())
#print(text)
edgelist = ask_chatgpt(setup, text)
edgelist = ast.literal_eval(edgelist)
if len(edgelist) > 0:
edge_df = pd.DataFrame(edgelist)
edge_df.columns = ['source', 'target', 'reasoning', 'support_text']
print(edgelist)
G = nx.from_pandas_edgelist(edge_df)
chapters[chapter]['graph'] = G
chapters[chapter]['edgelist'] = edgelist # empty or loaded is fineThis is the line to pay attention to:
edgelist = ask_chatgpt(setup, text)In that line, the input is the chapter text from Alice in Wonderland or Through the Looking Glass, and the output is an edgelist.
You can see that I’m doing other useful stuff with it:
I’ve created a graph using the text of every chapter
I’ve captured the graph and the edgelist in a Python dictionary
More on this stuff later. Ask questions if you have any!
Let’s Inspect
I’ve made several datasets available as of today. Go to https://github.com/itsgorain/100daysofnetworks/tree/main/data/genai.
Let me describe these:
Combined: Alice in Wonderland + Through the Looking Glass
Alice: Alice in Wonderland
Looking Glass: Through the Looking Glass
GPT: The model that was used in creating the edgelist
So, that is six edgelist files created by AI using Alice in Wonderland and Through the Looking Glass. These are killer graphs to explore, and you can also learn about AI reasoning, and you can also do a project to learn about the completeness that AI provides in terms of reading comprehension. These are useful datasets for NLP and AI research. PLEASE SUBSCRIBE TO THIS BLOG. This stuff is valuable and I have done the heavy lifting, making it available for you to do actual AI research and have a successful career.
Let’s inspect a little bit of the data. Here is an example chapter:
[('Alice',
'Guard',
'Alice and Guard interact on the train when he demands her ticket and looks at her angrily.',
'Tickets, please!... Show your ticket, child!... looking angrily at Alice.'),
('Alice',
'Gentleman In White Paper',
'Alice and the Gentleman In White Paper speak directly when he advises her about return-tickets and she refuses.',
"'Never mind what they all say, my dear, but take a return-ticket...' 'Indeed I shan't!' said Alice."),
('Alice',
'Goat',
'Alice shares the carriage with the Goat, who comments on her and whose beard she grabs in fright.',
"'A Goat... said...' and 'she... caught at... the Goat's beard.'"),
('Alice',
'Beetle',
'Alice is in the same carriage as the Beetle, who speaks about sending her back as luggage.',
"'There was a Beetle... 'She'll have to go back from here as luggage!'"),
('Alice',
'Horse',
'Alice and the Horse are in the same carriage when it reassures the passengers about jumping the brook.',
"The Horse... said, 'It's only a brook we have to jump over.'"),
('Goat',
'Gentleman In White Paper',
'The Goat is seated next to the Gentleman In White Paper, placing them together in the same scene.',
'A Goat, that was sitting next to the gentleman in white...'),
('Goat',
'Beetle',
'The Beetle is sitting next to the Goat, indicating co-presence in the carriage.',
'There was a Beetle sitting next to the Goat'),
('Alice',
'Gnat',
'Alice and the Gnat converse at length under the tree about insects and names.',
"the Gnat... 'then you don't like all insects?' 'I like them when they can talk,' Alice said."),
('Alice',
'Rocking-Horse-Fly',
'Alice observes the Rocking-Horse-Fly that the Gnat points out, placing them in the same scene.',
"you'll see a Rocking-horse-fly... Alice looked up at the Rocking-horse-fly with great interest"),
('Gnat',
'Rocking-Horse-Fly',
'The Gnat identifies and describes the Rocking-Horse-Fly to Alice.',
"'half way up that bush, you'll see a Rocking-horse-fly' ... 'Sap and sawdust,' said the Gnat."),
('Alice',
'Snap-Dragon-Fly',
'Alice looks at the Snap-Dragon-Fly above her head after the Gnat directs her to it.',
"Look on the branch above your head... there you'll find a snap-dragon-fly."),
('Gnat',
'Snap-Dragon-Fly',
'The Gnat points out and describes the Snap-Dragon-Fly to Alice.',
"there you'll find a snap-dragon-fly... its body is made of plum-pudding..."),
('Alice',
'Bread-And-Butterfly',
'Alice observes the Bread-And-Butterfly crawling at her feet and asks about its food.',
'Crawling at your feet... you may observe a Bread-and-Butterfly.'),
('Gnat',
'Bread-And-Butterfly',
"The Gnat describes the Bread-And-Butterfly and answers Alice's questions about it.",
"Its wings are thin slices of Bread-and-butter... 'Weak tea with cream in it.'"),
('Alice',
'Fawn',
'Alice and the Fawn speak and walk together through the wood until it recognizes her as a human child.',
"'What do you call yourself?' the Fawn said... 'So they walked on together...'"),
('Alice',
'Tweedledum',
'Alice thinks about visiting Tweedledum and follows finger-posts to his house.',
"TO TWEEDLEDUM'S HOUSE... 'I’ll just call and say how d'you do?'"),
('Alice',
'Tweedledee',
'Alice thinks about visiting Tweedledee and follows finger-posts to his house.',
"TO THE HOUSE OF TWEEDLEDEE... 'I’ll just call and say how d'you do?'"),
('Alice',
'Dash',
'Alice references a named dog Dash while imagining advertisements for lost animals.',
'answers to the name of Dash: had on a brass collar')]That is plenty to get a preview for what the AI is doing, how it is evaluating, and what is in the example text.
I am really excited to play with this on languages other than English. This is going to be very useful.
What Are the Uses?
There are several uses for this kind of workflow, but it boils down to converting unstructured data into useful data. Depending on what your business does, you will have different uses, and you may want an alternative to this approach, to build different kinds of graphs than flow graphs.
For instance, if the prompt were altered, then this could be useful for converting source code into dataflow maps. That’s not much of a stretch, and I know how.
And I wish so much that computational humanities and the social sciences would pay attention to this research. This stuff could supercharge your efforts into understanding literature/art, and in understanding society.
Do some research. Ask GPT.
“What are the business applications of converting unstructured text into an edgelist or graph representation.”
And then read this book.
And then read mine and this blog, learn, practice, and get good. Use this and provide value to yourself, your organizations, etc.
While I am building this out using literature, this has very serious uses in general. This is positively useful in cybersecurity, risk, software engineering, data operations, and definitely the social sciences. Computational humanities should learn from me.
Need Any Help?
Finally… there’s no fun way to say this. I am out of a job and need to find work. If you work in Data Operations, Data Engineering, Cybersecurity, Data Science, or do anything with Artificial Intelligence or Machine Learning and you think I can be of use to your company, please reach out.
I need full-time work with benefits. I need a good problem to help solve. I enjoy collaborations, but I need to pay mortgage and expenses. I am very good at what I do, and I am very good at making companies more effective. I’m a very friendly person and a good teammate, so let me know if you think of anything. Or pitch in and buy a coffee to support this writing.
Please Support this Blog
I would like to make a special request in this article. This blog has over 600 subscribers. I have written over 40 articles. Each article typically involves about four hours of research and development, so that’s about 200 hours of valuable work and writing that I’ve provided for free, because most important is that I want people to learn this. I am not doing this to make money.
However, these days, there are things that I would like to do. For instance, to play with GraphRAG for AI, it is useful to have access to a Graph Database. The cheapest tier Neo4j instance is about $800/year. I would like to work on GraphRAG and write about it so that you all learn, but I cannot do that without support.
So, I have opened up a few ways for you to support this blog:
If you are a subscriber, please consider converting to a paid subscriber. I provide code, data files, and coding explanations that are absolutely worth more than $8 per month. But I understand that not everyone can afford to pay, and that’s fine. Free is absolutely fine, for those who need free.
If you are a paid or unpaid subscriber and you want more flexibility in your contributions, I set up a ko-fi account. CLICK HERE. You can use this this to buy me a coffee ($5 donation) or even to pitch in for a Neo4j Aura instance, which will enable more writing and learning.
And no matter what, if you are here, please buy and read my book. I am working on more book projects, as mentioned on Day 43.
Oh, and please participate in comments. I am a friendly guy. It is lonely in the comments and always weird to me that people don’t seem to want to talk about this stuff. Why not? What is on your mind? Have any cool ideas you want to brainstorm? Don’t be intimidated, for sure. Be creative instead.
Chappie (ChatGPT) drew us a picture for this post. I let it proofread. I do not use it for writing. My words are my own.
In the next article, we will be analyzing these edgelists, and getting started with Graph Machine Learning. Read this if you want to read along.
Have a great day, everybody! We are getting to the fun stuff, now! Enjoy the data! Play with it! Learn from me! Experiment. And then add your new skills to your resume. Do projects you care about. Try using this, or let me know if you want a custom edgelist.











Great read David - cheers.