In previous work, I built a Graph Neural Network, and we used an academic paper author collaboration network as our dataset. That work has inspired other network analyses I’ve set up and done, such as:
Science Fiction Author Collaboration Networks (I will redo, soon)
Today, I’m adding arXiv Author Collaboration Networks to the list. I’ve built a way to pull datasets from arXiv on any topic, using the ‘arxiv’ Python library and some minimal code. This means that you can programmatically pull data to research whatever interests you. If someone has posted a paper about a topic on arXiv, you can find it.
arXiv is a platform for scientific papers. That’s the quick and dirty version. The tool searches it for academic papers on any topic.
Use this for yourselves and for your companies. It can give you a more focused learning path than everything everywhere all at once.
How To Use It?
The code is available here, and I’ve created a simple analysis notebook here.
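As a rough idea of what such a pull can look like, here is a minimal sketch using the ‘arxiv’ Python library. It is not the code linked above. The function names `result_to_row` and `fetch_papers` and the shape of the output rows are assumptions made for illustration.

```python
def result_to_row(result):
    """Flatten one arXiv search result into a plain dict."""
    return {
        "title": result.title.strip(),
        "authors": [author.name for author in result.authors],
        "published": result.published,
        "summary": result.summary.strip(),
        "url": result.entry_id,
    }


def fetch_papers(query, max_results=100):
    """Query arXiv, newest submissions first.

    Needs `pip install arxiv` and network access.
    """
    import arxiv  # imported here so result_to_row works without the package

    search = arxiv.Search(
        query=query,
        max_results=max_results,
        sort_by=arxiv.SortCriterion.SubmittedDate,
    )
    client = arxiv.Client()
    return [result_to_row(result) for result in client.results(search)]


# Example call (hits the live arXiv API, so it is left commented out):
# rows = fetch_papers('"network science"', max_results=200)
```

Each row keeps the fields used later in this post: title, authors, publication date, and summary.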
How I Used It?
I wanted to use this to research Network Science researchers, to understand who is collaborating the most.
With the first dataset I pulled, I built an author collaboration network.
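The idea behind a collaboration network is simple: every pair of authors who share a paper gets an edge. Here is a minimal sketch of that step using only the standard library. The paper records and author names are made up, and `coauthor_edges` is a hypothetical helper, not the code behind this post.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(papers):
    """Count how many papers each pair of authors shares."""
    weights = Counter()
    for paper in papers:
        # Deduplicate and sort so each pair is always stored in the same order.
        authors = sorted(set(paper["authors"]))
        for a, b in combinations(authors, 2):
            weights[(a, b)] += 1
    return weights


# Made-up records in the same shape as a pulled dataset.
papers = [
    {"title": "Paper 1", "authors": ["Ann", "Bob", "Cy"]},
    {"title": "Paper 2", "authors": ["Ann", "Bob"]},
    {"title": "Paper 3", "authors": ["Dee"]},
]
edges = coauthor_edges(papers)
```

If you work in NetworkX, an edge list like this can be loaded with `G.add_weighted_edges_from((a, b, w) for (a, b), w in edges.items())`. Note that a solo author such as "Dee" produces no edges, so you would need to add such nodes separately if you want them in the graph.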
With that, I can see which authors have the highest PageRank, a rough indicator of overall importance in the network.
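To show what that score measures, here is a small standard-library sketch of PageRank by power iteration on an unweighted, undirected graph. In practice a library call such as `networkx.pagerank` does this for you. The graph below is made up, and the function is an illustration under those assumptions rather than the method used for this post.

```python
def pagerank(adjacency, damping=0.85, iterations=100):
    """Power-iteration PageRank for an undirected graph.

    `adjacency` maps each node to the set of its neighbors and is
    assumed to be symmetric (if A lists B, then B lists A).
    """
    nodes = list(adjacency)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by isolated nodes is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not adjacency[v])
        new_rank = {}
        for node in nodes:
            # Each neighbor passes on an equal share of its own rank.
            incoming = sum(rank[nb] / len(adjacency[nb]) for nb in adjacency[node])
            new_rank[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


# Made-up network: Ann has co-authored with everyone else.
graph = {
    "Ann": {"Bob", "Cy", "Dee"},
    "Bob": {"Ann"},
    "Cy": {"Ann"},
    "Dee": {"Ann"},
}
scores = pagerank(graph)
top_author = max(scores, key=scores.get)
```

In this toy graph the author connected to everyone else ends up with the highest score, which is the pattern to look for in a real collaboration network.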
This is what I see in the ego network of the most collaborative author.
If I drop the center node, Danielle herself (one of my favorite techniques), I can see the subgroups that are part of Danielle’s ego network.
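Here is a minimal sketch of the drop-the-center technique using plain dictionaries and sets. The node names are invented and the helper functions are hypothetical. NetworkX offers equivalents in `nx.ego_graph`, `Graph.remove_node`, and `nx.connected_components`.

```python
def ego_network(adjacency, center):
    """Return the subgraph made of `center` and its direct neighbors."""
    keep = {center} | adjacency[center]
    return {node: adjacency[node] & keep for node in keep}


def drop_node(adjacency, node):
    """Return a copy of the graph without `node`."""
    return {n: nbrs - {node} for n, nbrs in adjacency.items() if n != node}


def components(adjacency):
    """Group nodes into connected components with a depth-first search."""
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Made-up network: "Hub" collaborates with two separate pairs of authors.
graph = {
    "Hub": {"A", "B", "C", "D"},
    "A": {"Hub", "B"},
    "B": {"Hub", "A"},
    "C": {"Hub", "D"},
    "D": {"Hub", "C", "E"},
    "E": {"D"},
}
ego = ego_network(graph, "Hub")
subgroups = components(drop_node(ego, "Hub"))
```

With the hub in place, everything looks like one cluster. Once it is removed, the two pairs fall apart into separate subgroups, which is exactly what makes the technique useful.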
And I can see which articles Danielle and her collaborators worked on together.
The data that this crawler pulls is also very recent, with the latest article not even 24 hours old. This will be a useful way to keep up with any topic.
Follow Your Curiosity and Creativity
The best advice I have for any learner is to follow your curiosity and creativity. Yes, sometimes you will have to buckle down and knock out tedious work, but you can also have fun learning.
That means, use things like this to pull datasets that excite you or stimulate your creativity. Chase what makes your heart sing. That’s what you are meant to research, in my opinion.
Manageable Knowledge Management
That’s quite a phrase, but hear me out. I often hear data professionals say that they can’t keep up with the deluge of information about Machine Learning. Well, with this, you can. You don’t have to read everything. Everything everywhere all at once is a terrible approach to learning anything. Using this tool, you have articles, authors, dates, and summaries. With a few NLP tricks for similarity, you could easily build a simple search to find what is most relevant to you, and get rid of that overwhelming feeling of drowning in information. You could also filter by date to only read the recent stuff.
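As one possible version of those NLP tricks, here is a small sketch that ranks papers by bag-of-words cosine similarity to a query and optionally filters by date. It uses only the standard library. The records are made up, and the function names and the plain `date` values are assumptions for illustration (a real pull would likely give you timezone-aware datetimes).

```python
import math
import re
from collections import Counter
from datetime import date


def tokens(text):
    """Lowercase word counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(papers, query, since=None, top_n=5):
    """Return titles of the papers most similar to `query`, best first."""
    query_vec = tokens(query)
    scored = []
    for paper in papers:
        if since is not None and paper["published"] < since:
            continue  # skip anything older than the cutoff
        score = cosine(query_vec, tokens(paper["title"] + " " + paper["summary"]))
        if score > 0:
            scored.append((score, paper["title"]))
    return [title for _, title in sorted(scored, reverse=True)[:top_n]]


# Made-up records with the fields mentioned above.
papers = [
    {"title": "Community detection in networks",
     "summary": "We study community structure in large graphs.",
     "published": date(2024, 5, 1)},
    {"title": "Protein folding with transformers",
     "summary": "A deep learning model for protein structure.",
     "published": date(2024, 5, 2)},
    {"title": "Old community detection survey",
     "summary": "A survey of community detection methods.",
     "published": date(2019, 1, 1)},
]
recent_hits = search(papers, "community detection", since=date(2024, 1, 1))
all_hits = search(papers, "community detection")
```

Word counts are about the simplest similarity measure there is. TF-IDF or embeddings would likely rank better, but even this is enough to cut a large pull down to a short reading list.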
That’s All, Folks!
I’m going to keep today’s post light and stop here. There’s much more we can do with these networks. We will use these throughout this adventure.
That’s all for today! Thanks for reading! If you would like to learn more about networks and network analysis, please buy a copy of my book!