
My goal is to produce a graph showing the linkages (and lack thereof) between several fields that share a common subproblem, by showing who cites whom. There are many databases that show citations between papers: ISI Web of Knowledge, Microsoft Academic Research, and Google Scholar. However, none of them allows me to download even part of its database. Is there some database I have overlooked? Has someone written a scraper for one of these websites?

For reference, the fields I'm considering are Mechanics, Magnetic Resonance Imaging, Signal Processing, Geophysics, and several others.

1 Answer

Our WikiCite project uses Wikidata to store bibliographic information, including citation information. Data from Wikidata is freely available under the CC0 license, and you can access it via XML dumps, RDF dumps, the web API, and the Wikidata Query Service, which provides a SPARQL endpoint.
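As a rough sketch of what the SPARQL route can look like, the Python below builds a query for citations among works whose main subject (P921) is one of a list of topics, using Wikidata's "cites work" property (P2860), sends it to the public endpoint at https://query.wikidata.org/sparql, and flattens the JSON result into (citing work, citing topic, cited work, cited topic) tuples. The function names and the User-Agent string are illustrative, the `wd:`/`wdt:` prefixes are the ones predefined on the Wikidata Query Service, and you would have to look up the Wikidata item IDs for your own fields yourself. Relying on P921 also assumes the articles have been tagged with a main subject, which will not be true for all of them.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
ENTITY_PREFIX = "http://www.wikidata.org/entity/"
COLUMNS = ("citing", "citingTopic", "cited", "citedTopic")


def build_citation_query(topic_qids):
    """SPARQL for citations between works whose main subject (P921) is one of
    topic_qids; P2860 is Wikidata's "cites work" property."""
    values = " ".join(f"wd:{qid}" for qid in topic_qids)
    return f"""
SELECT ?citing ?citingTopic ?cited ?citedTopic WHERE {{
  VALUES ?citingTopic {{ {values} }}
  VALUES ?citedTopic {{ {values} }}
  ?citing wdt:P921 ?citingTopic ;
          wdt:P2860 ?cited .
  ?cited wdt:P921 ?citedTopic .
}}
"""


def run_query(query, user_agent="citation-graph-sketch/0.1 (replace with your contact)"):
    """Send the query to the Wikidata Query Service and return the decoded JSON."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={
        "User-Agent": user_agent,  # Wikimedia asks clients to identify themselves
        "Accept": "application/sparql-results+json",
    })
    with urllib.request.urlopen(request, timeout=90) as response:
        return json.load(response)


def parse_bindings(result):
    """Flatten SPARQL JSON bindings into (citing, citing_topic, cited, cited_topic)
    tuples of bare QIDs such as "Q21090025"."""
    return [
        tuple(b[col]["value"].removeprefix(ENTITY_PREFIX) for col in COLUMNS)
        for b in result["results"]["bindings"]
    ]
```

A call would look like `parse_bindings(run_query(build_citation_query(topic_ids)))`, with `topic_ids` being your own list of item IDs. Queries over large topics can run into the service's 60-second timeout; querying one pair of topics at a time is one way around that.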

While the bibliographic data in Wikidata is certainly not complete as of October 2017, Wikidata editors have done considerable work, so Wikidata now contains over 8 million scientific articles and over 36 million citations. Coverage may be reasonable as a starting point for Magnetic Resonance Imaging, but it is quite poor in fields such as Mechanics, Signal Processing and Geophysics.
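To check coverage for your own fields before committing to this route, a query along the following lines counts, per topic, the works tagged with that main subject (P921) and the "cites work" (P2860) statements they carry. This is a sketch: the helper names are hypothetical, the query text can be pasted into the web interface at https://query.wikidata.org, and the parser expects the standard SPARQL JSON results format that the endpoint returns when asked for JSON. Q29043227 ("embedding") is used only as a stand-in topic.

```python
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def build_coverage_query(topic_qids):
    """Per topic: number of works with that main subject (P921) and the
    number of "cites work" (P2860) statements on those works."""
    values = " ".join(f"wd:{qid}" for qid in topic_qids)
    return f"""
SELECT ?topic (COUNT(DISTINCT ?work) AS ?works) (COUNT(?cited) AS ?citations) WHERE {{
  VALUES ?topic {{ {values} }}
  ?work wdt:P921 ?topic .
  OPTIONAL {{ ?work wdt:P2860 ?cited . }}
}}
GROUP BY ?topic
"""


def parse_coverage(result):
    """Map each topic QID to (works, citations) from SPARQL JSON results."""
    coverage = {}
    for b in result["results"]["bindings"]:
        qid = b["topic"]["value"].removeprefix(ENTITY_PREFIX)
        coverage[qid] = (int(b["works"]["value"]), int(b["citations"]["value"]))
    return coverage


if __name__ == "__main__":
    # Placeholder topic: substitute the Wikidata items for your own fields.
    print(build_coverage_query(["Q29043227"]))
```

Because `COUNT(?cited)` skips the unbound values produced by `OPTIONAL`, works without any citation statements add to `works` but not to `citations`. A topic with no tagged works at all simply does not appear in the result.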

The Wikidata Query Service is used by our Scholia web service to aggregate and display scholarly information in tables, plots and graphs, including citation graphs. Through the Wikidata Query Service, Scholia lets you zoom in on a topic and create co-author graphs and co-occurring topic graphs for that topic. For instance, for the machine learning concept of "embedding" you can see the graphs here: https://tools.wmflabs.org/scholia/topic/Q29043227. Small partial citation graphs are displayed on the 'work' pages; see, e.g., https://tools.wmflabs.org/scholia/work/Q21090025.

It is possible to extract the bibliographic data with the Wikidata Query Service. For a co-author graph of the largest connected component, I ran a SPARQL query against the Wikidata Query Service and loaded the downloaded results into Gephi. The resulting image, along with the SPARQL queries, is available here: https://commons.wikimedia.org/wiki/File:Scholarly_co-author_graph_via_Wikidata,_2017-05-26.png
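A graph of who cites whom between fields, rather than between individual papers, needs one more aggregation step before it goes into a tool like Gephi. A minimal sketch, assuming you already have (citing work, citing topic, cited work, cited topic) tuples of Wikidata IDs from a SPARQL query against the Wikidata Query Service: it collapses them into weighted topic-to-topic edges and writes a CSV edge table. The function names are hypothetical, and the column headers follow the convention Gephi's spreadsheet importer recognizes (Source, Target, Type, Weight); it is worth checking that against your Gephi version.

```python
import csv
from collections import Counter


def field_edges(rows):
    """Count citations from one topic to another.

    rows: (citing, citing_topic, cited, cited_topic) tuples of QIDs.
    Exact duplicate rows are dropped, so each citation counts once per topic pair.
    """
    counts = Counter()
    for _citing, citing_topic, _cited, cited_topic in set(rows):
        counts[(citing_topic, cited_topic)] += 1
    return counts


def write_gephi_edges(counts, path):
    """Write a weighted, directed edge table for Gephi's spreadsheet import."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Type", "Weight"])
        for (source, target), weight in sorted(counts.items()):
            writer.writerow([source, target, "Directed", weight])
```

Edges from a topic to itself count within-field citations and show up as self-loops; the remaining weights are the cross-field linkages, and a pair that never appears is the "lack thereof". A work tagged with several of the topics is counted once per tag, which is a deliberate simplification.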

You can read more about the approach of using Wikidata for scientometrics work in this paper: "Scholia and scientometrics with Wikidata", https://arxiv.org/abs/1703.04222