Error-free genetic repositories: case of amphibians

18 08 2020

In our new study, we curated > 39,000 amphibian mitochondrial DNA (mtDNA) sequences from GenBank, identified > 2,000 sequencing and taxonomic errors, and published the quality-checked records as a curated dataset with an automated workflow in R. High-quality genetic data should help quantify and protect the diversity of the most threatened vertebrate group on Earth.


Upper left: species of Boophis from Andasibe, Madagascar. Upper right: Dendropsophus anceps from State of Rio de Janeiro, Brazil. Lower left: Dendropsophus bipunctatus from State of Rio de Janeiro, Brazil. Lower right: Bufo bufo from Gelderland, The Netherlands. All images by the author.

Scientists from a broad range of biological disciplines use genetic information such as DNA sequences to test ecological and evolutionary hypotheses. Critically, genetic data are now essential for naming species, and therefore for quantifying biodiversity, as well as for determining where species live and how many individuals of each species occur in the wild.

Scientific journals routinely ask researchers, and more recently often require them, to submit their DNA sequences to GenBank (among other public repositories of genetic data) as a condition of publication. Although GenBank applies some quality controls (e.g., filtering out sequences with bacterial contaminants and those from other kingdoms), authors remain responsible for the quality of their genetic data and are free to assign their sequences to any species in GenBank's taxonomy database. Notably, once sequences have been deposited in GenBank, records are rarely updated in light of errors identified later, which often arise from taxonomic progress.

