From Machine Learning

A map of the latest 11 million papers split by semantic similarity and time slices [P]


I am building alternative ways to explore scientific literature. The goal was to make the large number of papers published daily easier to keep up with by visualising macroscopic trends.

It is free to use at The Global Research Space for anyone interested in giving it a try!

How I built it

I sourced the latest 11M papers from OpenAlex and arXiv and encoded them by running SPECTER 2 on titles and abstracts. I then projected the embeddings down to 2D using UMAP and created labels within Voronoi cells around high-density peaks, at increasing levels of depth.
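The labelling step can be sketched roughly as follows. This is a minimal illustration in plain Python, not the actual implementation: it assumes the 2D coordinates already exist (in the real pipeline they would come from UMAP over SPECTER 2 embeddings), finds density peaks with a coarse grid histogram, and assigns each paper to its nearest peak, which is the same as a Voronoi partition around the peaks. The function names, the grid-based peak detection, and the "most common title word" labelling rule are all assumptions made for the example.

```python
from collections import Counter, defaultdict
import math

# Hypothetical stopword list, only for this toy example.
STOPWORDS = frozenset({"a", "an", "and", "for", "in", "of", "on", "the", "with"})

def density_peaks(points, cell=1.0, min_count=2):
    """Return centres of grid cells that are strict local density maxima."""
    counts = Counter((math.floor(x / cell), math.floor(y / cell)) for x, y in points)
    peaks = []
    for (i, j), n in counts.items():
        if n < min_count:
            continue
        neighbours = [counts.get((i + di, j + dj), 0)
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)
                      if (di, dj) != (0, 0)]
        # Strict comparison: two adjacent cells with equal counts yield no peak.
        if all(n > m for m in neighbours):
            peaks.append(((i + 0.5) * cell, (j + 0.5) * cell))
    return sorted(peaks)

def assign_to_peaks(points, peaks):
    """Nearest-peak assignment, i.e. Voronoi cells around the peaks."""
    return [min(range(len(peaks)), key=lambda k: math.dist(p, peaks[k]))
            for p in points]

def label_cells(titles, assignment):
    """Label each cell with the most common non-stopword in its titles."""
    words = defaultdict(Counter)
    for title, k in zip(titles, assignment):
        words[k].update(w for w in title.lower().split() if w not in STOPWORDS)
    return {k: c.most_common(1)[0][0] for k, c in words.items() if c}

# Toy data standing in for UMAP output and paper titles.
POINTS = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.1), (5.2, 5.1), (5.4, 5.3), (5.6, 5.8)]
TITLES = ["Graph neural networks", "Graph attention", "Graph pooling",
          "Protein folding", "Protein design", "Protein docking"]

if __name__ == "__main__":
    peaks = density_peaks(POINTS, cell=1.0)
    cells = assign_to_peaks(POINTS, peaks)
    print(label_cells(TITLES, cells))
```

Re-running the same functions with a smaller cell size gives a finer partition. That is one possible way to produce labels at deeper zoom levels, though it is a guess at the approach rather than a description of it.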

There is also support for both keyword and semantic queries, plus an analytics layer for ranking institutions, authors, topics, and so on.
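To show the difference between the two query types, here is a small sketch in plain Python. It is only an illustration under assumptions: the three-dimensional vectors are made-up stand-ins for SPECTER 2 embeddings, the keyword search is a simple substring match on titles, and the function names are hypothetical. A real index over 11M papers would need an approximate nearest-neighbour structure rather than a full sort.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def keyword_search(papers, term):
    """Case-insensitive substring match on the title."""
    term = term.lower()
    return [p for p in papers if term in p["title"].lower()]

def semantic_search(papers, query_vec, top_k=2):
    """Rank papers by cosine similarity to the query embedding."""
    ranked = sorted(papers, key=lambda p: cosine(p["vec"], query_vec), reverse=True)
    return ranked[:top_k]

# Toy corpus; the vectors are invented for the example.
PAPERS = [
    {"title": "Graph neural networks for molecules", "vec": [0.9, 0.1, 0.0]},
    {"title": "Protein structure prediction", "vec": [0.1, 0.9, 0.1]},
    {"title": "Message passing on graphs", "vec": [0.8, 0.2, 0.1]},
]

if __name__ == "__main__":
    print([p["title"] for p in keyword_search(PAPERS, "protein")])
    print([p["title"] for p in semantic_search(PAPERS, [1.0, 0.0, 0.0])])
```

The keyword query only finds papers that contain the literal term, while the semantic query returns the nearest papers in embedding space whether or not they share any words with the query.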

More recently, I have also added the ability to slide back and forth in time, along with a daily auto-ingestion script to ensure the map stays up to date.
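A time slice can be thought of as a date-window filter over the papers. The sketch below is an assumption about how such a filter could look, not the actual code: it keeps papers sorted by publication date so that each slider position becomes a binary search, and the field name `published` and the function name are hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import date

def time_slice(papers, start, end):
    """Return papers with start <= published <= end.

    Assumes `papers` is already sorted by its 'published' date.
    """
    dates = [p["published"] for p in papers]
    return papers[bisect_left(dates, start):bisect_right(dates, end)]

# Toy records, sorted by date; a daily ingestion job would append to the end.
SORTED_PAPERS = [
    {"title": "Paper A", "published": date(2024, 1, 5)},
    {"title": "Paper B", "published": date(2024, 3, 12)},
    {"title": "Paper C", "published": date(2024, 3, 30)},
    {"title": "Paper D", "published": date(2024, 6, 1)},
]

if __name__ == "__main__":
    march = time_slice(SORTED_PAPERS, date(2024, 3, 1), date(2024, 3, 31))
    print([p["title"] for p in march])
```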

Feedback and suggestions are very welcome!

submitted by /u/icannotchangethename
