The cost of sequencing is still going down

There’s been a bit of chatter on Twitter about how the cost of sequencing is not going down anymore, with reference to that genome.gov graph.  I realise that sensationalist statements get more retweets, but I am sorry, this one is complete crap – the cost of sequencing is still on a long-term downward trend.

By way of qualification, I have been helping to run our University’s next-generation genomics facility, Edinburgh Genomics, since 2010.  We are an academic facility running an FEC model – which means we recover all of our costs (reagents, staff, lab space etc) but do not make a profit.  If you are interested in Illumina sequencing, especially after reading below, you should contact us.

You can read some relatively up-to-date references here, here and here.

What I am going to talk about below are the medium- to long- term trends in sequencing prices on the Illumina platform.  There are fluctuations in the price in any given period (for example Illumina put prices up across the board a few weeks ago), but these fluctuations are very small in the context of the wider, global trend of cost reduction.

HiSeq V3

Back in 2013/14, the most cost-effective method of sequencing genomes was the HiSeq 2000/2500 using V3 chemistry.  At its best, a lane of sequencing would produce 180 million paired 100bp reads, or 36 gigabases of sequence data.  I am going to use this as our baseline.

HiSeq V4

After HiSeq V3 came HiSeq V4, which was introduced last year.  All new 2500s could offer this, and many newer 2500s could be upgraded.  At its best, V4 produces 250 million paired 125bp reads, or 62.5 gigabases of sequence data.

Of course, Illumina charge more for V4 reagents than they do for V3 reagents, but crucially, the price increase is proportionally smaller than the increase in throughput.  So, at Edinburgh Genomics, the cost of a V4 lane was approx. 1.2 times the cost of a V3 lane, but you get 1.7 times the data.  Therefore, the cost of sequencing decreased.

HiSeq X

This is rather trivial, I think!  By my calculations, a lane of the HiSeq X will produce around 110 gigabases of sequence data – 3 times as much data as HiSeq V3 – at about 0.4 times the cost.  Therefore, the cost of sequencing decreased.

HiSeq 4000

The HiSeq 4000 is a bit of an unknown quantity at present as we haven’t seen one in the wild (yet) and nor have we come up with a detailed costing model.  However, my back-of-a-post-it calculations tell me the HiSeq 4000 lanes will produce around 93 gigabases of data (about 2.6 times as much data as HiSeq V3) at about 1.4 times the cost.   Therefore, the cost of sequencing decreased.
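For the avoidance of doubt, the arithmetic behind all three of those "the cost of sequencing decreased" statements can be written as a quick Python sketch.  The lane costs (relative to a V3 lane) and per-lane yields are the approximate figures quoted above, with V3 as the baseline:

```python
# Relative cost per gigabase for each platform, using HiSeq V3 as the
# baseline. Lane costs are expressed relative to a V3 lane, yields in
# gigabases per lane, both taken from the approximate figures in the text.
platforms = {
    # name: (relative lane cost, Gb per lane)
    "HiSeq V3":   (1.0, 36.0),
    "HiSeq V4":   (1.2, 62.5),
    "HiSeq X":    (0.4, 110.0),
    "HiSeq 4000": (1.4, 93.0),
}

baseline_cost, baseline_gb = platforms["HiSeq V3"]
baseline_per_gb = baseline_cost / baseline_gb

for name, (cost, gb) in platforms.items():
    rel = (cost / gb) / baseline_per_gb  # cost per Gb relative to V3
    print(f"{name:10s}  cost/Gb relative to V3: {rel:.2f}")
```

Every platform comes in below 1.0 – i.e. cheaper per gigabase than V3 – which is the whole point of this post.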

Drawing a new graph

Here it is.  You’re welcome.

[graph: the new sequencing cost trend]

My numbers are accurate, despite what you may hear

It came to my attention last year that some facilities might be offering HiSeq V3 lanes with over 200 million read-pairs/clusters.  It is possible to go that high, but often only through a process known as over-clustering.  This is a bad thing.  It not only reduces sequence quality, but also produces lots of PCR and optical duplicates which can negatively affect downstream analysis.

Other platforms

I haven’t spoken about Ion Torrent or Ion Proton for obvious reasons.  I also haven’t included the NextSeq 500 or MiSeq – to be honest, though these are very popular, they are not cost-effective ways of producing sequence data (even Illumina will admit that), and if you told your director that they were, well, shame on you ;-)

PacBio?  Well, it seems the throughput has increased 3 times in the last year, despite the need for an expensive concrete block.  So I can’t really see the cost of PacBio sequencing doing anything other than come down either.

MinION and PromethION – costs currently unknown, but very promising platforms and likely to bring the cost of sequencing down further.

Complete Genomics – well, as I said in 2013, they claimed to be able to increase throughput by 30 times.

There is also the BGISEQ-1000, which apparently can do 1 million human genomes per year. Apparently.

All of which means – the cost of sequencing is going to keep coming down.

So why is the genome.gov graph incorrect?

I don’t know for sure, but I have an idea.  Firstly, the data only go to July 2014; and secondly, the cost per genome is listed as $4905, which is obviously crap in the era of HiSeq X.

Can we stop this now?

Defunding research excellence in Scotland

Earlier this year, the results of the Research Excellence Framework (REF) were published – a 5-yearly attempt to grade and rank (from best to worst) universities throughout the UK in terms of their research excellence.  It is a far from perfect process, estimated to cost over £1 billion, and is highly criticised throughout the academic world.  However, it’s important, because the rankings are used to inform funding decisions – the implicit promise being that the higher up the rankings you are, the more funding you get.

You can download the full REF 2014 results here.

You will see that the results are split by University, then by department, and then finally into Outputs, Impact and Environment.  The definition of these three can be seen here, but for the purposes of this blog post, we will focus only on Outputs – that is the traditional outputs of academic research (e.g. in scientific areas, these would be peer-reviewed papers).

I’ve cleaned up the Scottish data, summarised over all departments, and you can see it here.

Now, Times Higher Education describe two metrics which can be calculated on the data:

  • GPA (grade point average). Loosely speaking this can be thought of as the “rate” of high quality research.  The higher the value of the GPA, the more likely research at that institution will be of high quality.
  • Research Power.  This can be thought of as the volume of high quality research, and is simply the GPA multiplied by the FTE count.

The reason we have both of these is to separate institutions that submit only a very small amount of high quality research to the REF from those that submit a larger amount of mixed-quality research.
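If I understand the THE method correctly, GPA is just the weighted mean star rating of an institution’s output profile, and Research Power scales that by staff volume.  Here is a quick Python sketch – the two institutions and all the numbers are entirely made up for illustration:

```python
# Sketch of the two THE metrics. REF output profiles give the percentage of
# work rated 4*, 3*, 2* and 1*; GPA is the weighted mean star rating, and
# Research Power is GPA multiplied by the FTE count submitted.
# The institutions and numbers below are hypothetical.

def gpa(profile):
    """profile: {star_rating: percentage}, percentages summing to ~100."""
    return sum(star * pct for star, pct in profile.items()) / 100.0

def research_power(profile, fte):
    return gpa(profile) * fte

small_but_excellent = {4: 60, 3: 30, 2: 10, 1: 0}   # high quality, few staff
large_mixed         = {4: 25, 3: 40, 2: 25, 1: 10}  # mixed quality, many staff

print(gpa(small_but_excellent))                 # 3.5
print(gpa(large_mixed))                         # 2.8
print(research_power(small_but_excellent, 50))  # 175.0
print(research_power(large_mixed, 500))         # 1400.0
```

Note how the small institution wins on GPA while the large one wins on Research Power – exactly the distinction the two metrics are there to make.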

Remember here we are only looking at research outputs.  So how do the Scottish Universities compare to one another?  Here are the Scottish institutions arranged by GPA:

[table: Scottish institutions ranked by GPA]

And here are the Scottish institutions arranged by Research Power:

[table: Scottish institutions ranked by Research Power]

I don’t think there are any real surprises in these tables, if I’m honest.

Now, it is the Scottish Funding Council (SFC) that distributes Government funds to universities in Scotland, and we can see the “Research Excellence” settlement for the next 4 years here.  You can download a sanitised version here.

If we take the 14/15 research excellence settlement as the baseline, then we can plot it against both GPA and Research Power:

[graphs: funding vs GPA and vs Research Power]

Whilst universities with a higher GPA generally receive more funding, this is normalised by research volume, and we can see clearly that research funding correlates with research power.

Those of you who have looked at the SFC funding figures will see that they change over the next few years.  So let’s take a look at those changes, using this year as a baseline.  Here are the raw figures:

[graph: raw funding changes]

Does it look like there’s an outlier to you?  That data point in the bottom right is the University of Edinburgh – ranked first for research power in Scotland and second for GPA in Scotland; also ranked first for loss of funding.  However, Edinburgh receives the most funding from the SFC, so let’s look at the percentage drop in funding:

[graph: percentage funding losses]

This changes the picture a bit: the biggest loss in funding in percentage terms is Robert Gordon University, losing 59% of its 2014 budget over the next 3 years.  Second worst off is the University of Edinburgh, losing 20%; then Dundee (13%); Aberdeen (10%); and St Andrews (4%).
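For clarity, the percentage figures are simply the drop relative to the 2014 baseline.  A trivial Python sketch, with an invented institution rather than the real SFC settlements:

```python
# Percentage drop in funding relative to a baseline year.
# The figures below are hypothetical, purely to show the calculation.
def percentage_drop(baseline, final):
    return 100.0 * (baseline - final) / baseline

# e.g. a hypothetical institution going from £10m to £8m over the period
print(percentage_drop(10_000_000, 8_000_000))  # 20.0
```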

In fact, 4 of the top 6 Scottish universities (ranked by GPA or by Research Power) are receiving a funding cut.  In contrast, 7 of the bottom 8 Scottish Universities (ranked by GPA or by Research Power) are receiving a funding increase.

It is hard to see this as anything other than the defunding of research excellence in Scotland.  The argument is that by taking money from “the rich” and giving it to “the poor”, the SFC will raise the quality of research at some of the lower ranked universities.  Well it might, but there is no guarantee.  At the same time, the opposite will occur – the quality of research at the higher ranked universities will drop.  By my calculations, the University of Edinburgh will lose £17m over the next 3 years.  That can only have a negative effect on research excellence.

I am not against Robin Hood politics – there absolutely should be a continuous re-distribution of wealth from the rich to the poor.  But does this work in a research setting?  The REF ranks universities on “research excellence”.  Universities strive to be as close to the top of those rankings as they possibly can be, and they spend a fortune to do it.  What is the point of the ranking if the result is that the best are worse off than they were before?  Put another way, could Edinburgh, Dundee, Aberdeen and St Andrews have done anything differently?  They all improved since 2008; they all came in the top 6 in Scotland.  Yet the reward is a cut in funding.  Perversely, had their research quality plummeted and had they been ranked in the bottom 8, would they have received a funding increase?

More likely is that the SFC intended to redistribute funding no matter what the outcome of the REF (note the SFC funding above is called the “research excellence grant”).  If the SFC’s longer-term policy is to gradually move funding from the higher-ranked universities to the lower-ranked ones, then they immediately remove the incentive for the higher-ranked universities to improve, or even maintain, their ranking.  It could be argued that there is also little incentive for the lower-ranked universities to improve, as they will get increased funding anyway.

#sigh

Overall, I have to say, I am left completely bemused by the whole REF process.  What the **** was the point?

How to not sack your professors

I have to admit to being a bit shaken by two recent events in UK academia – the tragic death of Stefan Grimm, a Professor at Imperial College, who had recently been told he was “struggling” to “fulfil the metrics” (grant income of £200k per annum); and the sacking of Dr Alison Hayman, a lecturer at Bristol University, for failing to generate enough income from grants.

Let me be clear – this post is not an attack on Imperial nor Bristol, and I am in no way suggesting either has done anything wrong.  I don’t know enough about either case to pass judgement.

Nor do I have a problem with institutions taking steps to remove people who are not performing satisfactorily from their posts.  We are all paid to do a job, and we need to do it.  (Note I am not suggesting that there was a problem with either Stefan’s or Alison’s performance; I don’t know enough to pass judgement.)

However, I do have an issue with judging an academic’s performance by a single metric – how much money they have “won” in grants.  Firstly, because the grants system can be hugely unfair; secondly, and more importantly, because winning grants is not something we do alone – it is a collaboration with other scientists, and also a collaboration with the institution where we work.

In this collaboration, the role of the PI is to come up with fundable ideas; and the role of the institution is to provide an environment conducive to getting those ideas funded.  I don’t think it is fair to sack someone for not winning grants if you have failed to provide an environment within which it is easy to do so.

I am very lucky, because the institution where I work, The Roslin Institute, is exceptional at this.  So I am going to pass on a few tips, simple things that academic institutions can implement, which will mean that no-one has to sack any more academics.

1. Provide funding

This will probably have the biggest impact, but comes at the highest cost.  In my experience, nothing makes a grant more fundable than some promising preliminary data.  Generating data costs money, but it takes data to generate money.  So fund up front.  Give every PI a budget to spend every year with the explicit intention that it should be spent to generate preliminary data for grant applications.  This single, simple step will likely have a greater impact on your grant success than any other.  And make sure it is a decent sum of money.  I recall speaking to a PI from a UK University who told me that each PI gets £150 per year, and out of that they need to pay for printing.  Printing.  3 pence a sheet.  That was over 10 years ago and I’m still shocked today.

2. Cut the admin

Every minute your PIs spend on compulsory training courses, filling in forms, filling in reports, dredging your awful intranet for information that should be easy to find, filling in spreadsheets, monitoring budgets, calculating costs, dealing with pointless emails and attending meetings is a minute they are not spending writing grants and papers.  Cut. The. Admin.  In fact, employ administrators to do the admin.  It’s what they’re good at.

3. Perform independent QC

So one of your PI’s grant proposals is repeatedly rejected.  Does that make them bad proposals?  Or bad ideas?  Perhaps they are excellent proposals, but they don’t hit the right priorities (which means they didn’t go to the right funder, and that might be your fault).  Read the grants yourself and form your own opinion.  Collect the review reports.  Collect the feedback from the funders.  Were they bad proposals?  Or were they good proposals that didn’t get funded?  I really don’t think it’s tenable to sack people if they repeatedly submit grant proposals that are rated as fundable by the committee concerned.  At that point the PI has done their job.

You might also think about putting in place an internal group of senior academics to review proposals before they are submitted.  This will give you the opportunity to provide feedback on proposals and perhaps make them more fundable before they even reach the grant committee.  Proposals which are really not ready can be held over until the next submission date, giving time for more improvements.

4. Provide support

Do I even need to say this?  For the PIs who have their grants rejected, give them some support.  Give them a mentor, someone who gets a lot of proposals funded.  Provide training and workshops.  Share tips for success.  Sit with them and discuss their ideas, try and influence their future direction.  Do everything you possibly can to help them.

5. Pay it forwards

Every institution has their superstars, the guys who only need to wink at a committee and they’ll get funded.  But those guys, those professors with 10 post-docs and 15 students, they’re black holes that can suck in all the funding around them, making it difficult for others to get a fair share of the pot.  As an institution, you don’t want them to stop, because you want the funding, of course; but there is a compromise, where these superstars share their knowledge and expertise, where they co-author proposals with less successful (perhaps more junior) PIs, lending their name and their weight, their reputation and gravitas, to a proposal.  When the proposal is funded, it is the junior PI who runs the grant and gets the last author publications.  It doesn’t matter to the more senior PI as they probably already have tenure and an H index north of 50.  So pass it on.  Pay it forwards. Transfer that wonderful grantsmanship to the next generation, and coach your next round of superstars.

6. Be excellent

Yes, you.  The institution.  Be excellent at something.  I don’t care whether it’s medicine or ecology, worms or plants or cats or humans, evolution or botany, I couldn’t care less, but choose something and be excellent at it.  Invest in it.  Create a global reputation for that thing so that when reviewers see a proposal they immediately think “these guys know what they’re doing”.  Make sure you have the equipment and the facilities to do it (see 8).

7. Make it easy to work with industry

As PIs we are increasingly being pushed to generate “impact”, and one way of doing this is to collaborate with industry.  But this is a skill and some institutions are very good at it, others very bad.  Be one of the good ones.  Create strategic partnerships with industry, pump-prime some work (again to generate preliminary data), run workshops and industry days and have a legal team that are set up to make it work, rather than to make it difficult.  There are lots of funding streams available only to academic-industrial partnerships, and you’d be insane to ignore them.

8. Invest in infrastructure

Make sure you have the right equipment, the right type of lab, and the right computing to ensure PIs can actually do science.  It seems obvious, but you’d be surprised how many institutions are out there that simply don’t provide adequate facilities and infrastructure.

——————————————

So, there it is.  I’ve run out of suggestions.  As I said above, I am very lucky, I work at The Roslin Institute, which does as much as it possibly can to create an environment where PIs win grants.

Here’s the thing – if you think your staff are failing, for whatever reason, the very first thing you should do is ask yourself this question “Is it our fault?  Could we have done anything more to help them?”.  I’d argue that, in most cases, the answer to both would be “Yes”.  We all share in the success of published papers and grants won; so don’t forget that there is a shared responsibility in failure, and if some of your PIs are not winning grants, at least some of that is the institution’s fault.

How to recruit a good bioinformatician

I may have said this before (when you get to my age, you begin to forget things), but I’ve been in bioinformatics for around 17 years now, and for that entire time, bioinformatics skills and people have been in high demand.

Today, my friend and colleague Mike Cox asked how to recruit a good bioinformatician.

So I thought I would write a blog post about how to recruit bioinformaticians.  Hint: it’s not necessarily about where you advertise.

1. Make sure they have something interesting to do

This is vital.  Do you have a really cool research project?  Do you have ideas, testable hypotheses, potential new discoveries?  Is bioinformatics key to this process and do you recognise that only by integrating bioinformatics into your group will it be possible to realise your scientific vision, to answer those amazing questions?

Or do you have a bunch of data you don’t know what to do with, and need someone to come along and analyse whatever it is you throw at them?

Which is it?  Hmm?

2. Make sure they have a good environment to work in

Bioinformatics is unique, I think, in that you can start the day not knowing how to do something, and by the end of the day, be able to do that thing competently.  Most bioinformaticians are collaborative and open and willing to help one another.  This is fantastic.  So a new bioinformatician will want to know: what other bioinformatics groups are around? Is there a journal club?  Is there a monthly regional bioinformatics meeting?  Are there peers I can talk to, to gain and give help and support?

Or will I be alone in the basement with the servers?

Speaking of servers, the *other* type of environment bioinformaticians need is access to good compute resources.  Does your institution have HPC?  Is there a cluster with enough grunt to get most tasks done?  Is there a sysadmin who understands Linux?

Or were you hoping to give them the laptop your student just handed back after using it throughout their 4-year PhD?  The one with Windows 2000 on it?

3. Give them a career path

Look around.  Does your institution value computational biology?  Are there computational PIs and group leaders?  Do you have professors in computational biology?  Do computational scientists run any of your research programmes?  Could you ever envisage your director being a computational biologist?

Or is bioinformatics just another tool, just another skill to acquire on your way to the top?

4. Give them a development path

Bioinformaticians love opportunities to learn, both new technical skills and new scientific skills.  They work best when they are embedded fully in the research process, are able to have input into study design, are involved throughout data generation and (of course) the data analysis.  They want to be allowed to make the discoveries and write the papers.  Is this going to be possible? Could you imagine, in your group, a bioinformatician writing a first author paper?

Or do you see them sitting at the end of the process, responsible merely for turning your data into p-values and graphs, before you and others write the paper?

5. Pay them what they’re worth

This is perhaps the most controversial, but the laws of supply and demand are at play here.  Whenever something is in short supply, the cost of that something goes up.  Pay it.  If you don’t, someone else will.

6. Drop your standards

Especially true in academia.  Does the job description/pay grade demand a PhD?  You know what?  I don’t have a PhD, and I’m doing OK (group leader for 11 years, over 60 publications, several million in grants won).  Take a chance.  A PhD isn’t everything.

7. Promote them

Got funds for an RA?  Try and push it up to post-doc level and emphasize the possibility of being involved in research. Got funds for a post-doc?  Try and push it up to a fellowship and offer semi-independence and a small research budget.  Got money for a fellowship?  Try and push it up to group leader level, and co-supervise a PhD student with them.


If none of the above is possible, at least make sure you have access to good beer.

I’m not sure how this post is going to go down, to be honest.  I know a lot of lab people who might think “Why should bioinformaticians be treated any differently?”.  I understand, I do. I get annoyed at inequalities too.  However, the simple fact is that “supply and demand” is in play here.  I think it was Chris Fields who said that many people try and recruit bioinformatics unicorns, mythical creatures capable of solving all of their data problems for them.  Well, it’s possible that they just might, but if you want to find a unicorn, you’re going to have to do something magical.

Putting the HiSeq 4000 in context

Illumina have done it again – disrupted their own market despite having no competition, and produced some wonderful new machines with higher throughput and lower run times.  Below is a brief summary of what I have learned so far.

HiSeq X 5

Pretty basic – this is half of an X Ten, but the reagents etc are going to be more expensive.  $6 million capital for an X5, and the headline figure appears to be $1400 per 30X human genome.  The headline figure for the X Ten is $1000 per genome, so the X5 may be 40% more expensive.

HiSeq 3000/4000

The 3000 is to the 4000 as the 1000 was to the 2000 and the 1500 to the 2500 – it’s a 4000 that can only run one flowcell instead of two.  I expect it to be as popular as the 1000/1500s were – i.e. not very.  No-one goes to a funder for capital investment and says “Give me millions of dollars so I can buy the second best machine”.

Details are scarce, but the 4000 (judging by the stats) will have 2 flowcells with 8 lanes each, will do 2x150bp sequencing, and will produce around 312 million clusters per lane in 3.5 days.

Here is how it stacks up against the other HiSeq systems:

                 Clusters per lane  Read length  Lanes  Days  Gb per lane  Gb total  Gb per day
V1 rapid               150,000,000        2×150      4     2           45       180          90
V2 rapid               150,000,000        2×250      4   2.5           75       300         120
V3 high output         180,000,000        2×100     16    11           36       576          52
V4 high output         250,000,000        2×125     16     6         62.5      1000         167
HiSeq 4000             312,000,000        2×150     16   3.5         93.6      1500         428
HiSeq X                375,000,000        2×150     16     3        112.5      1800         600

These are headline figures and contain some guesses.  How the machines behave in reality might differ.

If any of my figures are wrong, please leave a comment!
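If you want to check my arithmetic, the derived columns all follow from clusters per lane, read length, lane count and run time.  Here is a quick Python sketch – the input figures are the same headline numbers from the table, guesses included:

```python
# Sanity check of the table above: derive Gb per lane, Gb total and
# Gb per day from clusters per lane, read length, lane count and run time.
configs = [
    # (name, clusters per lane, read length (bp), lanes, days)
    ("V1 rapid",       150e6, 150, 4,  2.0),
    ("V2 rapid",       150e6, 250, 4,  2.5),
    ("V3 high output", 180e6, 100, 16, 11.0),
    ("V4 high output", 250e6, 125, 16, 6.0),
    ("HiSeq 4000",     312e6, 150, 16, 3.5),
    ("HiSeq X",        375e6, 150, 16, 3.0),
]

for name, clusters, readlen, lanes, days in configs:
    gb_per_lane = clusters * readlen * 2 / 1e9  # paired-end: 2 reads/cluster
    gb_total = gb_per_lane * lanes
    gb_per_day = gb_total / days
    print(f"{name:15s} {gb_per_lane:6.1f} Gb/lane "
          f"{gb_total:6.0f} Gb total {gb_per_day:4.0f} Gb/day")
```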

UPDATE: there appears to be some confusion over the exact configuration of the HiSeq 4000.  The spec sheet says that 5 billion reads per run pass filter.  The RNA-Seq dataset has 378 million reads “from one lane”.  5 billion / 378 million is ~13 (lanes).  My contact at Illumina says there are 8 lanes per flowcell, and 5 billion clusters / 16 lanes would give us 312 million reads per lane.  Possibly the RNA-Seq dataset is overclustered!

A 378 million read-pair RNA-Seq dataset is here.

Edinburgh and Glasgow invest in revolutionary HiSeq X clinical genomics technology

I don’t think I’m exaggerating when I say that the ability to sequence a person’s genome for less than £750 will turn out to be one of the biggest breakthroughs in human medicine this century.  This one technology gives us an unprecedented ability to understand and diagnose genetic disease, and to enable personalised medicine and pharmacogenomics.

Today, the Universities of Edinburgh and Glasgow announce the purchase of 15 HiSeq X instruments, one of the largest installs in the world.  This, tied together with Scotland’s and the UK’s world-leading supercomputing infrastructure, much of which is based at the University of Edinburgh, will make Scotland a global leader in clinical genomics.  In Edinburgh, the installation is backed by three world-leading institutions: The Roslin Institute (my employer), the Institute for Genetics and Molecular Medicine and Edinburgh’s School of Biological Sciences.  The Edinburgh node will be placed within Edinburgh Genomics.

One nuance of this announcement that should not be underestimated is the fact that Scotland has excellent electronic national health records, representing a huge advantage.  As anyone working in genomics knows – in the land of cheap sequencing, the person with good metadata is king.

This announcement represents both the beginning and the end of two separate but related stories.  The first began at PAG XXII last year (it doesn’t escape me that I am writing this at PAG XXIII this year), when Illumina announced the HiSeq X system.  This morning I re-read a 3-hour e-mail conversation from January last year between Mark Blaxter (director of Edinburgh Genomics), Karim Gharbi (head of genomics), Richard Talbot (head of laboratory) and myself (head of bioinformatics).  As Illumina announced the HiSeq X system, we started off in confusion, moved swiftly to wonder and amazement, and ended with Mark asking “Do you think we need to invest in this technology?”.  He received 3 resounding answers: “Yes!”.  Three hours to decide we needed to push for a clinical genomics revolution in Scotland, driven by a HiSeq X Ten install.  The last year has seen some incredible hard work, not least from Mark Blaxter, Tim Aitman and David Hume.  A huge amount of respect has to go to all three, especially Mark, who really began to drive these discussions at the start of last year.  No doubt countless others worked incredibly hard too, and everyone involved deserves credit.

That’s the end of one story, and the beginning of another – sequencing 1000s of human genomes from Scotland and beyond.  I am incredibly excited by what that represents.  One of my first jobs was at GlaxoWellcome in the 90s, and around 1996-98 I recall the buzz around pharmacogenomics, about personalised medicine, about delivering the right drug to the right patient based on genetic information.  That’s nearly 20 years ago now, which demonstrates that this is not a new idea, nor one that has been quick to realise.

Within Edinburgh, the HiSeq X systems will be run by a new clinical genomics division of Edinburgh Genomics, directed by Tim Aitman.  All other sequencing, including agrigenomics, environmental and non-genomic human sequencing will be run by the existing Edinburgh Genomics infrastructure, Edinburgh Genomics Genome Sciences, directed by Mark Blaxter.  Existing customers, and those who work in fields other than human genomics should rest assured that we remain committed to our existing facility and will continue to deliver high quality data and projects across the biological sciences using our current install of HiSeq 2500 and MiSeq sequencers.

These are very, very exciting times!

Easy and powerful graph editing using R and PowerPoint

A recent conversation on Twitter reminded me of a powerful way to edit graphs created in R inside PowerPoint.

I realise that R is incredibly powerful anyway, and that much of this can be done within R, but I am also painfully well aware that R is hard, and that many users prefer “point-and-click”.

This example uses Windows.

1. In R, let’s create a graph:

# scatterplot of sepal length vs sepal width from the built-in iris data
data(iris)
plot(iris$Sepal.Length, iris$Sepal.Width, main="My Graph",
     xlab="Sepal length", ylab="Sepal width")

2. You should see something like the graph below.  In the top left hand corner, choose File -> Save As -> Metafile, and save it somewhere convenient

[screenshot: the saved graph]

3. Now, fire up PowerPoint and start with a blank slide.  Choose Insert -> Picture.

[screenshot: the Insert Picture dialog]

Navigate to the .emf file you just saved and choose it.  You should see:

[screenshot: the inserted image]

4. Now, right-click the image and choose Group -> Ungroup:

[screenshot: the Ungroup menu]

You will get a message asking if you want to convert the image to a Microsoft Drawing object.  Choose yes:

[screenshot: the conversion prompt]

5. Now repeat step 4. Right click the image, choose group -> ungroup:

[screenshot: the Ungroup menu]

You should now see something like this:

[screenshot: the fully ungrouped graph]

Every single part of the graph is now selectable, moveable and editable!  Click outside of the chart area to de-select everything, and then click on individual components to edit them.  In the graph below I have changed the title, and given one of the data points a red background:

[screenshot: the edited graph]

Enjoy!

(I realise there are other ways to do this, perhaps better ways, e.g. export as PDF then edit in Illustrator – thanks!  This post is really aimed at those familiar and comfortable with PowerPoint :-))