Citing Data

Posted by Eva Amsen, on 17 January 2011

I just got back from attending two meetings about academia and the internet – one in person and the second, in true internet style, virtually. Both meetings at one point or another discussed the growing trend toward archiving and citing data itself (on top of citing the papers written based on analysis of the data).

The first meeting was the HighWire publishers meeting. HighWire takes care of the online version of many journals, including Development, and their meeting was mostly about practical things for journals and not directly relevant to most of you yet. (You can still find some of the discussion on Twitter, although Twitter’s search only reaches back about 10 days.)

The second meeting, which I followed over the web, was Science Online in North Carolina. That meeting is now in its fifth year, and started out as a meeting solely about science blogging, but has expanded to cover other aspects of science and the internet.

It was rather interesting to follow both meetings back to back, since the first was very practical and aimed at things that publishers can do and are doing right now, while the second was full of thoughts and ideas and the audience was very varied. There is still a lot of science blogging being discussed at the Science Online meeting, and I myself Skyped into a panel with 15 community managers of different science blog networks. (“Different” is the keyword here, because the Node has very little in common with, say, the Guardian or Discover blog networks, but it was interesting to hear some comparisons.)

A few topics, however, came up at both the HighWire meeting and Science Online, and one of these was the new problem of citing data. As Benoit mentioned here back in August, the Journal of Neuroscience no longer publishes supplementary data. The journal was at the HighWire meeting to share how that is going so far, and one of the more surprising reactions they received came from librarians: if journals don’t publish supplementary data, it has to go somewhere, and libraries are stepping up to claim the niche of data archiving and curating. The J. Neurosci. talk was followed by one from a Stanford librarian, who mentioned that they had just hired a “data librarian” to look into things like this.

And they’re not the only ones: last fall at the Science Online London meeting, the British Library showed that they were very involved in data archiving as well. And as I mentioned, this weekend’s Science Online North Carolina meeting also included a few talks about data archiving and whether having your data cited will one day be valuable for your career (just like having your papers cited is now).

What do you think of this movement toward archiving and citing data? Are you happy that you’ll be able to find other people’s data? Annoyed that it’s yet another thing to keep on top of? And where are all your data at the moment? Can someone easily find them if they want to build upon your work? Have you ever cited a database in a paper?

(Disclaimer: I’m asking out of personal interest, because I’m intrigued by the whole data archiving/citation issue. My personal interest in the topic is entirely separate from the work I do at Development, which has existing policies concerning archiving of microarray and sequence data and supplementary materials. I deal with the front page of the journal website and sometimes the newsy bits and spotlight pieces, but not at all with the papers and their data.)

Card catalogue photo by SpecialKRB on Flickr

Tags: archiving, citations, data, libraries
Categories: Discussion, Events

One thought on “Citing Data”

Brian Hole says:

January 17, 2011 at 3:33 PM

The issue of finding an appropriate place for supplementary files and research data is being tackled directly by the DryadUK project (http://datadryad.org/dryaduk), funded by JISC and run by Oxford University and the British Library. We aim to establish a sustainable framework for integrating data archiving and access into the workflows of scientific publishers.

With Dryad, an author submitting an article to a participating journal is asked to also submit the underlying data to the Dryad repository. The data is then curated, and a DataCite DOI is sent to the journal for inclusion in the article. The data is thus securely stored, preserved and made publicly available for validation, repurposing and other uses, with very little overhead for the journal itself. We are now also working on implementations in which the data can be made available for the peer review process if desired.

The Dryad repository focuses specifically on the fields of evolution, ecology, and biomedicine, but the project's results, in terms of identifying a sustainable framework for data repositories in the publishing process, will be of much broader applicability.

If anyone is interested in more information, or in participating in DryadUK, please get in touch (brian.hole@bl.uk).