Want a permanent DOI assigned to your data and code? Follow this simple recipe

2 11 2021

These days, with data and code often required to be open-source, licensed, and fully trackable for most manuscript submissions to peer-reviewed journals, it’s easy to get lost in the multitude of platforms and options available. In most cases, we no longer have much choice but to share, even if we are reluctant (although the benefits of immediately posting your data and code online far outweigh any potential disadvantages).

But do you post your data and code on the Open Science Framework (free), GitHub (free), Figshare (free), Zenodo (free, but donations encouraged), Dryad ($), or Harvard Dataverse (free) (and so on, and so on, …)? Pick your favourite. Even once you have solved that first dilemma, another issue arises: how do you obtain a digital object identifier (DOI) for your data and/or code?

Again, there are many ways to do this, and some methods are more automated than others. That said, I do have a preference that is rather easy to implement, and I thought I’d share it with you here.

The first requirement is getting yourself a (free) GitHub account. What’s GitHub? GitHub is one of the world’s largest communities of developers, where code of all types and for all manner of uses can be developed, shared, updated, collaborated on, shipped, and maintained. It might seem a bit overwhelming for non-developers, but if you strip it down to its basics, it’s straightforward to use as a simple repository for your code and data. Of course, GitHub is designed for so much more than just this (collaborative software development being one of its main uses), but you don’t need to worry about that for now.

Step 1

Once you create an account, you can start creating ‘repositories’, which are essentially just sections of your account dedicated to specific code (and data). I mostly code in R, so I upload my R code text files and associated datasets to these repositories, and spend a good deal of effort on making the Readme.md file highly explanatory and easy to follow. You can check out some of mine here.

Ok. So, you have a repository with some code and data, you’ve explained what’s going on and how the code works in the Readme file, and now you want a permanent DOI that will point to the repository (and any updates) for all time.

GitHub doesn’t do this by itself, but it integrates seamlessly with another platform, Zenodo, that does. Oh no! Not another platform! Yes, I’m afraid so, but it’s not as painful as you might expect.
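
If you have never handled a DOI directly, it helps to see how one becomes a clickable, permanent link. A DOI is a string that starts with '10.', followed by a registrant code, a slash, and a suffix, and it resolves through https://doi.org/. Here is a minimal Python sketch of that conversion. The function name doi_to_url, the deliberately loose pattern, and the example DOI are my own illustrative assumptions (the DOI shown is a made-up placeholder in the Zenodo style, not a real record).

```python
import re

# Loose plausibility check (an assumption for illustration, not the official
# DOI grammar): "10.", a 4-9 digit registrant code, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI string."""
    cleaned = doi.strip()
    # Accept the common "doi:" prefix that many journals print.
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"not a plausible DOI: {doi!r}")
    return f"https://doi.org/{cleaned}"


if __name__ == "__main__":
    # Hypothetical placeholder DOI, used only to show the shape of the output.
    print(doi_to_url("doi:10.5281/zenodo.1234567"))
```

The resulting URL is what you would paste into a manuscript's data-availability statement, because the doi.org resolver redirects to wherever the archived record currently lives.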


To share, or not to share, is no longer the question

7 01 2018

An edited version of a snippet from my upcoming book, The Effective Scientist (due out in March 2018).

I tend to assume tacitly that my collaborators are entirely fine with the idea of having their hard-won data spread across the internet for anyone to access and use. In reality, many are probably not comfortable with that concept at all, and for them the very notion of ‘sharing’ data with anyone but their closest and most-trusted colleagues is the stuff of nightmares.

I too was once far too concerned about the privacy of the data for which I had literally sweated and bled, for I feared that some nefarious and amoral scientist would steal, analyse, and publish them before I had the chance, thus usurping my unique contributions to the body of human knowledge. Perhaps I was just paranoid, although I still encounter such attitudes today. While data theft can occur, in reality it is unlikely that anyone would bother trying to out-do you in this regard, mainly for the simple reason that in most cases, data availability is not the limiting factor for scientific advancement. Another reason why this should not worry you is that far too few of us have the time to publish all of our own data, let alone someone else’s.