Be wise about what you put online

21 03 2022

You have little choice these days about posting your data and code online when you publish, but here are some things to consider before you make potentially sensitive data public (modified excerpt from The Effective Scientist).

One aspect of making your data publicly available is the prickly issue of whether your data contain sensitive information.

Of course, there are many different types of ‘sensitive’ information that might accompany the more basic quantitative measurements of your datasets, with perhaps the most common being personal details of any human subjects. For example, if you are a medical researcher and your data are derived primarily from living human beings undergoing some procedure, trial, or intervention, then clearly you are bound by your human ethics approvals not to publish information like names, addresses, or anything that could be used to identify the subjects in your sample. In fact, human ethics approvals generally prohibit any sort of public access to medical data that include personal information; thus, the scientists concerned are pulled in two different directions: keeping their subjects’ personal information out of the hands of the public, while still making the data available to other scientists.

There are ways around this, such as publishing only generic information online (i.e., by excluding personal identifiers) that could then be linked to the more sensitive data via unique identifiers. In these cases, any other researcher requiring the additional information would have to seek specific permission from the primary researchers, pending additional human-ethics approvals.
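As a rough illustration of that approach, here is a minimal Python sketch that splits each record into a public row and a private linkage entry joined only by a random identifier. The field names (name, address), the subject_id column, and the split_records function are hypothetical and chosen for this example; what counts as an identifier in your own data is for you and your ethics committee to decide.

```python
import secrets

# Hypothetical identifying fields; adjust to whatever your ethics approval covers.
IDENTIFYING_FIELDS = frozenset({"name", "address"})


def split_records(records, identifying_fields=IDENTIFYING_FIELDS):
    """Split records into public rows and a private linkage key.

    Each public row keeps only the non-identifying fields plus a random
    subject_id. The private key maps that subject_id back to the
    identifying fields and should never be posted online.
    """
    public_rows = []
    private_key = {}
    for record in records:
        # Random identifier, not derived from any personal detail.
        subject_id = secrets.token_hex(8)
        while subject_id in private_key:  # guard against an unlikely collision
            subject_id = secrets.token_hex(8)

        private_key[subject_id] = {
            field: value
            for field, value in record.items()
            if field in identifying_fields
        }
        public_row = {"subject_id": subject_id}
        public_row.update(
            (field, value)
            for field, value in record.items()
            if field not in identifying_fields
        )
        public_rows.append(public_row)
    return public_rows, private_key
```

Only the first returned list would go online; the second stays with the primary researchers, who can use it to answer approved requests for the additional information.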

Survey data, where individual people are asked questions on anything from their personal habits to their voting preferences, can also come under this umbrella of data sensitivity. Even data mined from social-media platforms can be restricted on privacy grounds. Commercially sensitive data, such as the locations of mining leases, fishing grounds, and hunting sites, often fall into the same category.

It is therefore the responsibility of both the researcher and the committees granting ethics or permitting approvals to decide on an optimal trade-off between data dissemination and the protection of individual privacy or commercial privilege.

Many other types of data sensitivities abound, such as those that must be considered given the propensity of certain nefarious types to exploit scientific information for their own personal gain.

photo: John Long, Flinders University

One rather galling, generic example of this comes from the fields of archaeology and palaeontology, whereby scientists who have published the location of deposit-rich sites have been horrified to discover that curio and fossil hunters have pilfered precious specimens after reading the relevant scientific articles.

As a result, most online archaeological or palaeontological datasets today either do not publish the site locations at all, or they deliberately add a location error so that only specialists will know where to look.
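One way to add a deliberate location error is to shift each published coordinate by a random offset of up to a chosen distance. The Python sketch below is only an illustration under stated assumptions: the obscure_location function and the default 10 km maximum error are invented for this example, the conversion of about 111.32 km per degree of latitude is an approximation, and the longitude correction breaks down close to the poles.

```python
import math
import random

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude


def obscure_location(lat, lon, max_error_km=10.0, rng=random):
    """Return (lat, lon) shifted by a random offset of at most max_error_km.

    Hypothetical helper for illustration. Uses a flat-Earth approximation,
    which is reasonable for small offsets away from the poles.
    """
    # The square root makes points uniform over the disc rather than
    # clustered near the true location.
    distance_km = max_error_km * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)

    north_km = distance_km * math.cos(bearing)
    east_km = distance_km * math.sin(bearing)

    # A degree of longitude shrinks with the cosine of the latitude.
    km_per_degree_lon = KM_PER_DEGREE_LAT * math.cos(math.radians(lat))
    return lat + north_km / KM_PER_DEGREE_LAT, lon + east_km / km_per_degree_lon
```

For example, obscure_location(-34.9, 138.6, max_error_km=25.0) returns a point somewhere within roughly 25 km of the true coordinates. The true coordinates would be kept offline and shared only with specialists who have a legitimate need for them.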

An even more disturbing behaviour is becoming increasingly frequent as would-be poachers and pet-traders use the scientific literature in ecology and biodiversity conservation to discover the locations of rare and endangered wildlife and plants. By virtue of being rare, many species are considered highly valuable in the trade of parts or pets — just think of rhino horn, elephant ivory, rare orchids, and tropical aquarium fish.

If you happen to research any rare species, fossils, human remains, or other potentially valuable specimens, do think carefully about what data you make publicly available online. At the very least, do not tell people where you found them.

CJA Bradshaw



One response

23 03 2022
Salvador Herrando-Perez

Not reporting lat/long because a site is sensitive should not be confused with not reporting lat/long due to poor data-reporting habits. Replicability is a cornerstone of modern science, and it depends on, for example, having access to the same site, specimen, or fossil material for additional analysis.
