Smart genetic analysis made fast and easy

29 07 2021

If you use genetics to differentiate populations, the new package smartsnp might become your best friend. Written in the R language and available from GitHub and CRAN, this package does principal component analysis with control for genetic drift, projects ancient samples onto modern genetic space, and tests for population differences in genotypes. The package has been built to load big datasets and run complex stats in the blink of an eye, and is fully described in a paper published in Methods in Ecology and Evolution (1).


In the bioinformatics era, sequencing a genome has never been so straightforward. No surprise that > 20 petabytes of genomic data are expected to be generated every year by the end of this decade (2). If 1 byte of information were 1 mm long, we could make 29,000 round trips to the moon with 20 petabytes. Data size in genetics keeps outpacing the computing power available to handle it at any given time (3). Many will be familiar with a computer freezing when it cannot load or analyse a huge dataset, and with how many coffees or teas we might have drunk, or computer screens we might have broken, during the wait. The bottom line is that software advances that speed up data processing and genetic analysis are always good news.
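The moon comparison can be checked with a few lines of arithmetic. The sketch below is in Python and rests on two assumptions that are not stated in the post: a petabyte is counted in binary units (2^50 bytes), and the Earth-Moon distance is taken as the mean value of 384,400 km. The function and constant names are purely illustrative.

```python
# Assumptions (not from the post): binary petabytes and the mean Earth-Moon distance.
BYTES_PER_BINARY_PB = 2 ** 50      # 1 petabyte counted in binary units
BYTES_PER_DECIMAL_PB = 10 ** 15    # 1 petabyte counted in decimal units
EARTH_MOON_KM = 384_400            # mean Earth-Moon distance in km
MM_PER_KM = 1_000_000


def round_trips_to_moon(petabytes, bytes_per_pb=BYTES_PER_BINARY_PB):
    """Number of Earth-Moon round trips if every byte were 1 mm long."""
    total_mm = petabytes * bytes_per_pb      # 1 byte = 1 mm
    total_km = total_mm / MM_PER_KM
    return total_km / (2 * EARTH_MOON_KM)    # a round trip covers the distance twice


binary_trips = round_trips_to_moon(20)
decimal_trips = round_trips_to_moon(20, bytes_per_pb=BYTES_PER_DECIMAL_PB)

# Roughly 29,300 trips with binary petabytes, roughly 26,000 with decimal ones
print(round(binary_trips), round(decimal_trips))
```

Under the binary convention the result lands close to the 29,000 round trips quoted in the text; with decimal petabytes it drops to about 26,000, so the figure depends on which convention is used.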

With that idea in mind, I have just published a paper presenting the new R package smartsnp (1) to run multivariate analysis of big genotype data, with applications to studies of ancestry, evolution, forensics, lineages, and overall population genetics. I am proud to say that the development of the package has been one of the most gratifying short-term collaborations in my entire career, with my colleagues Christian Huber and Ray Tobler: a true team effort!

The package is available on GitHub and the Comprehensive R Archive Network (CRAN). See downloading options here, and vignettes here with step-by-step instructions to run different functionalities of our package (summarised below).

In this blog, I use “genotype” to mean the combination of gene variants (alleles) across a predefined set of positions (loci) in the genome of a given individual animal, human, microbe, or plant. One type of those variants is the single nucleotide polymorphism (SNP), a DNA locus at which two or more alternative nucleotides occur, sometimes conditioning protein translation or gene expression. SNPs are relatively stable over time and are routinely used to identify individuals and ancestors in humans and wildlife.

What the package does

The package smartsnp is partly based on the field-standard software EIGENSOFT (4, 5), which is only available for Unix command-line environments. In fact, our driving motivation was (i) to broaden the use of EIGENSOFT tools by making them available to the rocketing community of professionals, not only academics, who employ R for their work (6), and (ii) to optimise our package to handle big datasets and complex stats efficiently. Our package mimics EIGENSOFT’s principal component analysis (SMARTPCA) (5), and also runs multivariate tests for population differences in genotypes as follows:

Read the rest of this entry »




Population of First Australians grew to millions, much more than previous estimates

30 04 2021

Shutterstock/Jason Benz Bennee


We know it is more than 60,000 years since the first people entered the continent of Sahul — the giant landmass that connected New Guinea, Australia and Tasmania when sea levels were lower than today.

But where the earliest people moved across the landscape, how fast they moved, and how many were involved, have been shrouded in mystery.

Our latest research, published today, shows the establishment of populations in every part of this giant continent could have occurred in as little as 5,000 years. And the entire population of Sahul could have been as high as 6.4 million people.

This translates to more than 3 million people in the area that is now modern-day Australia, far more than any previous estimate.


Read more: We mapped the ‘super-highways’ the First Australians used to cross the ancient land


The first people could have entered through what is now western New Guinea or from the now-submerged Sahul Shelf off the modern-day Kimberley (or both).

But whichever the route, entire communities of people arrived, adapted to and established deep cultural connections with Country over 11 million square kilometres of land, from northwestern Sahul to Tasmania.

Map of what Australia looked like for most of the human history of the continent when sea levels were lower than today. Author provided


This equals a rate of population establishment of about 1km per year (based on a maximum straight-line distance of about 5,000km from the introduction point to the farthest point).

That’s doubly impressive when you consider the harshness of the Australian landscape in which people both survived and thrived.

Previous estimates of Indigenous population

Various attempts have been made to calculate the number of people living in Australia before European invasion. Estimates vary from 300,000 to more than 1,200,000 people. Read the rest of this entry »





Climate change and humans together pushed Australia’s biggest beasts to extinction

25 11 2019

Over the last 60,000 years, many of the world’s largest species disappeared forever. Some of the largest that we generally call ‘megafauna’ were first lost in Sahul, the super-continent formed by the connection of Australia and New Guinea during periods of low sea level. The causes of these extinctions have been heavily debated for decades within the scientific community.

Three potential drivers of these extinctions have been suggested. The first is climate change, which assumes that an increase in arid conditions eventually became lethal to megafauna. The second proposed mechanism is that the early ancestors of Aboriginal people either hunted megafauna species to extinction, or modified ecosystems to put the largest species at a disadvantage. The third and most nuanced proposed driver of extinction is the combination of the first two.

The primary scientific tools we scientists use to determine which of these proposed causes of extinction have the most support are dated fossil records from the extinct species themselves, as well as archaeological evidence from early Aboriginal people. Traditionally, the main way we use these data is to construct a timeline of when the last fossil of a species was preserved, and compare this to evidence indicating when people arrived. We can also reconstruct climate patterns back tens of thousands of years using models similar to the ones used today to predict future climates. Based on the comparison of all of these different timelines, we conclude that abrupt climate changes in the past were influential if they occurred at or immediately before a recorded extinction event. On the other hand, if megafauna extinctions occur immediately after humans are thought to have arrived, we attribute more weight to human arrival as a driver.

Read the rest of this entry »





Legacy of human migration on the diversity of languages in the Americas

12 09 2018

This might seem a little left-of-centre for CB.com subject matter, but hang in there, this does have some pretty important conservation implications.

In our quest to be as transdisciplinary as possible, I’ve teamed up with a few people outside my discipline to put together a PhD modelling project that could really help us understand how human colonisation shaped not only ancient ecosystems, but also our own ancient cultures.

Thanks largely to the efforts of Dr Frédérik Saltré here in the Global Ecology Laboratory, at Flinders University, and in collaboration with Dr Bastien Llamas (Australian Centre for Ancient DNA), Joshua Birchall (Museu Paraense Emílio Goeldi, Brazil), and Lars Fehren-Schmitz (University of California at Santa Cruz, USA), I think the student could break down a few disciplinary boundaries here and provide real insights into the causes and consequences of human expansion into novel environments.

Interested? See below for more details.

Languages are ‘documents of history’ and historical linguists have developed comparative methods to infer patterns of human prehistory and cultural evolution. The Americas present a more substantive diversity of indigenous language stock than any other continent; however, whether such a diversity arose from initial human migration pathways across the continent is still unknown, because the primary proxy used to study modern human migration (i.e., archaeological evidence) is both too incomplete and too biased to inform any regional inference of colonisation trajectories. Read the rest of this entry »





Prioritising your academic tasks

18 04 2018

The following is an abridged version of one of the chapters in my recent book, The Effective Scientist, regarding how to prioritise your tasks in academia. For a more complete treatise of the issue, access the full book here.

Splitting tasks. © René Campbell renecampbellart.com

How the hell do you balance all the requirements of an academic life in science? From actually doing the science, analysing the data, writing papers, reviewing, writing grants, to mentoring students (not to mention trying to have a modicum of a life outside of the lab), you can quickly end up feeling a little daunted. While there is no empirical formula that makes you run your academic life efficiently all the time, I can offer a few suggestions that might make your life just a little less chaotic.

Priority 1: Revise articles submitted to high-ranked journals

Barring a family emergency, my top priority is always revising an article that has been sent back to me from a high-ranking journal for revisions. Spend the necessary time to complete the necessary revisions.

Priority 2: Revise articles submitted to lower-ranked journals

I could have lumped this priority with the previous, but I think it is necessary to distinguish the two should you find yourself in the fortunate position of having to do more than one revision at a time.

Priority 3: Experimentation and field work

Most of us need data before we can write papers, so this is high on my personal priority list. If field work is required, then obviously this will be your dominant preoccupation for sometimes extended periods. Many experiments can also be highly time-consuming, while others can be done in stages or run in the background while you complete other tasks.

Priority 4: Databasing

This one could be easily forgotten, but it is a task that can take up a disproportionate amount of your time if you do not deliberately fit it into your schedule. Well-organised, abundantly meta-tagged, intuitive, and backed-up databases are essential for effective scientific analysis; good data are useless if you cannot find them or understand to what they refer. Read the rest of this entry »





My interview with Conservation Careers

10 04 2018

The online job-search engine and careers magazine for conservation professionals — Conservation Careers — recently published an interview with me written by Mark Thomas. Mark said that he didn’t mind if I republished the article here.

As we walk through life we sometimes don’t know where our current path will take us. Will it be meaningful, and what steps could we take? Seeking out and talking to people who have walked far ahead of us in a line of work that we are interested in could help shape the next steps we take, and help us not make the same mistakes that could have cost us precious time.

A phrase that I love is “standing on the shoulders of giants” and this conversation has really inspired me. I hope it will do the same for you.

Corey Bradshaw is the Matthew Flinders Fellow in Global Ecology at Flinders University, and author of over 260 peer-reviewed articles. His research is mainly in the area of global-change ecology, and his blog ConservationBytes critiques the science of conservation and has over 11,000 followers. He has written books, and his most recent one ‘The Effective Scientist’ will be published in March (more on this later).

What got you interested in ecology and conservation?

I grew up in British Columbia, Canada. My father was a fur trapper, and we hunted everything we ate (we ate a lot of black bear). My father had lots of dead things around the house and he prepared the skins for the fur market. It was a very consumptive and decidedly non-conservation upbringing.

Ironically, I learnt early in life that one of the biggest impediments to deforestation through logging was the trapping industry, because when you cut down trees nothing that is furry likes to live there. In their own consumptive ways, the hunters were vocal and acted to protect possibly more species than some dedicated NGOs were able to.

So, at the time, I never fully appreciated it, but not having much exposure to all things urban and the great wide world, and by spending a lot of time out in the bush, I ended up appreciating the conservation of wild things even within that consumptive mind-set. Read the rest of this entry »





The Effective Scientist

22 03 2018

What is an effective scientist?

The more I have tried to answer this question, the more it has eluded me. Before I even venture an attempt, it is necessary to distinguish the more esoteric term ‘effective’ from the more pedestrian term ‘success’. Even ‘success’ can be defined and quantified in many different ways. Is the most successful scientist the one who publishes the most papers, gains the most citations, earns the most grant money, gives the most keynote addresses, lectures the most undergraduate students, supervises the most PhD students, appears on the most television shows, or the one whose results improve the most lives? The unfortunate and wholly unsatisfying answer to each of those components is ‘yes’, but neither is the answer restricted to the superlative of any one of those. What I mean here is that you need to do reasonably well (i.e., relative to your peers, at any rate) in most of these things if you want to be considered ‘successful’. The relative contribution of your performance in these components will vary from person to person, and from discipline to discipline, but most undeniably ‘successful’ scientists do well in many or most of these areas.

That’s the opening paragraph for my new book that has finally been released for sale today in the United Kingdom and Europe (the Australasian release is scheduled for 7 April, and 30 April for North America). Published by Cambridge University Press, The Effective Scientist: A Handy Guide to a Successful Academic Career is the culmination of many years of work on all the things an academic scientist today needs to know, but was never taught formally.

Several people have asked me why I decided to write this book, so a little history of its genesis is in order. I suppose my over-arching drive was to create something that I sincerely wish had existed when I was a young scientist just starting out on the academic career path. I was focussed on learning my science, and didn’t necessarily have any formal instruction in all the other varied duties I’d eventually be expected to do well, from how to write papers efficiently, to how to review properly, how to manage my grant money, how to organise and store my data, how to run a lab smoothly, how to get the most out of a conference, how to deal with the media, to how to engage in social media effectively (even though the latter didn’t really exist yet at the time) — all of these so-called ‘extra-curricular’ activities associated with an academic career were things I would eventually just have to learn as I went along. I’m sure you’ll agree, there has to be a better way than just muddling through one’s career picking up haphazard experience. Read the rest of this entry »





Dangers of forcing regressions through the origin

17 10 2017

I had an interesting ‘discussion’ on Twitter yesterday that convinced me the topic would make a useful post. The specific example has nothing whatsoever to do with conservation, but it serves as a valuable statistical lesson for all concerned about demonstrating adequate evidence before jumping to conclusions.

The data in question were used in a correlation between national gun ownership (guns per capita) and gun-related deaths and injuries (total deaths and injuries from guns per 100,000 people) (the third figure in the article). As you might intuitively expect, the author concluded that there was a positive correlation between gun-related deaths and injuries, and gun ownership:

[Figure: national gun ownership plotted against gun-related deaths and injuries, with the author’s fitted trend line]

Now, if you’re an empirical skeptic like me, there was something fishy about that fitted trend line. So, I replotted the data (available here) using Plot Digitizer (if you haven’t yet discovered this wonderful tool for lifting data out of figures, you would be wise to get it now), and ran a little analysis of my own in R:

[Figure: the replotted data]

Just doing a little 2-parameter linear model (y ~ α + βx) in R on these log-log data (which means it’s assumed to be a power relationship) shows that there’s no relationship at all: the intercept is 1.3565 (± 0.3814) in log space (i.e., 10^1.3565 = 22.72), and there’s no evidence for a non-zero slope (in fact, the estimated slope is negative at -0.1411, but it has no support). See R code here.

Now, the author pointed out what appears to be a rather intuitive requirement for this analysis: you should not have a positive number of gun-related deaths/injuries if there are no guns in the population; in other words, the relationship should be forced to go through the origin (x, y = 0, 0). You can easily do this in R by using the lm function and setting the relationship to y ~ 0 + x (see code here). Read the rest of this entry »
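To make the contrast between the two model forms concrete, here is a minimal sketch in plain Python (not the R code linked in the post). It fits the same points once with an intercept (y ~ α + βx) and once forced through the origin (y ~ 0 + x). The data are made up for illustration and are not the gun data; the function names are hypothetical helpers, and only the intercept value 1.3565 is taken from the text.

```python
def fit_with_intercept(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def fit_through_origin(x, y):
    """Least squares for y = b*x with the intercept fixed at 0; returns b."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Made-up illustrative data (NOT the data from the post):
# y hovers around 1.35 with no real trend in x.
x = [0.5, 1.0, 1.5, 2.0, 2.5]
y = [1.4, 1.3, 1.5, 1.2, 1.35]

a, b = fit_with_intercept(x, y)      # free intercept: slope comes out slightly negative
b_origin = fit_through_origin(x, y)  # forced through origin: slope comes out clearly positive

# Back-transforming a log10 intercept, using the value reported in the text
intercept_original_scale = 10 ** 1.3565

print(a, b, b_origin, intercept_original_scale)
```

With these made-up points, the free-intercept fit gives a slope near zero, while the through-origin fit returns a strongly positive slope from the very same data, simply because the line is obliged to climb from (0, 0) to the cloud of points.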