Celebrate the people who make your science possible

I want to start this post with a heartfelt congratulations to the team from Roslin and their paper in BMC Biology “A gene expression atlas of the domestic pig”, which has some really cool science in it, and which describes a fantastic resource that will be used by scientists throughout the world for years to come.  Well done guys!

What you probably don’t know is that the paper almost won an award in the annual BMC Research Awards – you can see that here, where the paper came runner up in the “Computational and high-throughput studies in genomics and systems biology” category.

I’m particularly proud of Alison Downing, who works in ARK-Genomics, the facility which I am director of – well done Alison!  Alison, like everyone else in the facility, works very hard to enable large scale genetics and genomics projects to deliver; she doesn’t work 9-5, Mon-Fri, she works the hours that the projects demand.  Often here early in the morning, late at night and at the weekend, Alison is the embodiment of a facility scientist – dedicated, skilled, hard working – and completely and utterly under-appreciated by those who use the facility!

Acknowledge and celebrate the people who do the work

As you can see from the Pig Atlas paper, Alison was co-author on the paper.  This is how it should be, in my opinion.  She processed 100s of samples and microarrays; without her, the paper would not have been possible.

Now, this is a Roslin paper and Alison is a Roslin scientist, but what you may not know is that no-one at Roslin gets a special deal from ARK-Genomics.  They pay the same fees as everyone else.  They wait in the same queue.  There is no price reduction and no queue jumping.

More often than not, when people pay us to process NGS, genotyping or microarray samples, our scientists don’t end up on the resulting paper; quite often, we don’t even get an acknowledgement – and the question has to be why the hell not?

The way I see it is this:  if the work the facility carries out had actually been done in your own lab, would those lab members get on your author list?  If not on the author list, would they get into the acknowledgements?  If the answer is yes, then you should be putting the facility scientist on your publication – full stop.  No argument.

“But I paid for the work!” I hear you say – sure, and you also pay the people in your own lab!  What’s the difference?  Plus, the last time I checked, the author list for a paper is supposed to represent a list of people who did the research and produced the paper – there is no caveat about whether you paid for it or not.

So come on.  Let’s celebrate the people who make the genomics revolution possible, the hard working scientists who take your samples and turn them into data, the people who make your science happen – hurray for the facility scientists!

A pedantic look at the cost of sequencing

A quick post to take a deeper look at the cost of sequencing, after Neil Hall wrote an excellent commentary on this (and other issues) in Genome Biology (“After the gold rush”).

We’ve all seen the graph, published by the NHGRI, on the historical cost of sequencing; in fact, we’ve seen it too much, in presentations, posters, grant applications and on the side of buses.  It’s even a major part of genomics bingo.

Here it is for your loving perusal:

The raw data can be found here.

My first major gripe with this graph is that the y-axis starts at $10k.  The biggest value in the spreadsheet is $5292.39.  The graph above is misleading, and makes the casual observer think the price used to be close to $10k per Mb (this is of course an illusion of the log scale).  Whilst I appreciate that many graphs with log scales do increment in powers of 10, this is needless in this instance.  Here is the graph with a slightly more informative y-axis:

figure1

The above graph of course tells the same story.

The second issue I have is with Neil’s assertion that the price went up.  Of course, it did – from 6.5 cents per Mb to 7.3 cents per Mb, an increase of 0.8 cents per Mb.  This was indeed an increase of around 12% in relative terms, but not a huge increase in real terms.
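
To make the absolute versus relative distinction concrete, here is a minimal sketch of the arithmetic, using only the two per-Mb figures quoted above.  The function name is purely illustrative:

```python
def price_change(old_cents_per_mb: float, new_cents_per_mb: float) -> tuple[float, float]:
    """Return (absolute change in cents per Mb, relative change as a percentage)."""
    absolute = new_cents_per_mb - old_cents_per_mb
    relative_pct = 100.0 * absolute / old_cents_per_mb
    return absolute, relative_pct


# The two values quoted in the text: 6.5 and 7.3 cents per Mb.
absolute, relative_pct = price_change(6.5, 7.3)
print(f"absolute: {absolute:.1f} cents per Mb, relative: {relative_pct:.1f}%")
```

The same 0.8 cents looks large as a percentage only because the starting price is already so small.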

What the NHGRI spreadsheet doesn’t tell you is how they convert the “per Mb” cost into the “per genome” cost.  Of course, they use a multiplier: but from 2001 – 2007, that multiplier is 18000 (equivalent to 6X of a human genome); for one data point, in January 2008, the multiplier is 30000 (equivalent to 10X of a human genome); and then for the rest of the data, the “next-gen” period, the multiplier is 90000 (equivalent to 30X of a human genome).

These values aren’t in the spreadsheet, but they are mentioned in the web-page that refers to the data.  This is not Maths, this is Biomaths; that is “Maths done in a spreadsheet with estimates from the real world that we put somewhere other than the data we’re calculating on”.
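
As a rough sketch of what keeping the multiplier next to the data might look like, the conversion can be written down explicitly.  The 3,000 Mb genome size is my assumption (it is what reproduces the 18000, 30000 and 90000 multipliers), the function names are hypothetical, and pairing the largest per-Mb value with 2001 is also an assumption:

```python
HUMAN_GENOME_MB = 3000  # assumption: ~3,000 Mb, which reproduces the multipliers quoted above


def multiplier_mb(year: int, month: int) -> int:
    """Mb sequenced per genome for a given data point, following the three periods described."""
    if year <= 2007:
        coverage = 6    # 2001 - 2007: 6X
    elif (year, month) == (2008, 1):
        coverage = 10   # the single January 2008 data point: 10X
    else:
        coverage = 30   # the "next-gen" period: 30X
    return HUMAN_GENOME_MB * coverage


def cost_per_genome(cost_per_mb_dollars: float, year: int, month: int) -> float:
    """Convert a per-Mb cost into a per-genome cost using the period's multiplier."""
    return cost_per_mb_dollars * multiplier_mb(year, month)


# Illustration only: the largest per-Mb value in the spreadsheet, assumed to be from 2001.
print(f"${cost_per_genome(5292.39, 2001, 9):,.0f} per genome")
```

With the multiplier in the code rather than on a separate web page, anyone re-plotting the data can see exactly where the per-genome step changes happen.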

Looking at the rate of change

I want to finish by looking at the rate of change, that is, the change in price expressed as a proportion of the original price.  This is what we see:

figure2

This graph may or may not tell a different story.  The story is that yes, sequencing costs are coming down; but since late 2007, early 2008 the rate of change of that reduction has been following an upwards trend i.e. over time, the reduction in cost from one period to the next has been increasing.

This could be an interesting pattern; or arguably, we should see 2007/2008 as an outlier and this is just the “proportional reduction” returning to the value range it was at before that outlier.
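
For anyone wanting to reproduce this kind of plot, here is a minimal sketch of the calculation as I have described it: the drop in price from one period to the next, divided by the earlier price.  The prices below are made up for illustration and are not NHGRI values:

```python
def proportional_reductions(prices: list[float]) -> list[float]:
    """Reduction from each period to the next, as a proportion of the earlier price."""
    return [(prev - cur) / prev for prev, cur in zip(prices, prices[1:])]


# Hypothetical prices for four consecutive periods (not real data).
example_prices = [100.0, 80.0, 40.0, 10.0]
reductions = proportional_reductions(example_prices)
print(reductions)
```

In this toy series the price falls in every period, and the proportional reduction itself also grows from one period to the next, which is the “upwards trend” pattern described above.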

GA vs HiSeq

It is mildly interesting that introducing the Genome Analyzer in 2008 had a greater impact than introducing the HiSeq 2000 in 2010 and the V3 upgrades in 2011; that is until you realise the comparator for the former was “Sanger” and the comparator for the latter was “GA”, and relative to one another, the GA was more of a revolution compared to Sanger, than HiSeq was to GA.

A guide for the lonely bioinformatician

This may be a uniquely UK centric blog post but I suspect not.  Let me start with a brief story.  Sat with a coffee in our canteen a few weeks ago, I overheard a conversation between a few PIs about a grant application.  “Don’t worry”, the lead PI said,  “we’ve put money on the application to fund a bioinformatician”.  Good planning I hear you say, and I agree;  however,  note that none of the PIs in that discussion were themselves bioinformaticians; none of them can code;  put them in front of a Linux terminal and they wouldn’t know what to do.

Yes – we were witnessing the birth of yet another “pet bioinformatician”.  What I mean by this term is a single bioinformatician employed within a laboratory-based group. These guys are becoming more and more common in UK academic groups,  and it concerns me because it is possible they will become isolated and pick up bad practices as they don’t have a senior bioinformatician to guide them. It also concerns me that their career and professional development might suffer.

Consider the opposite situation – how many bioinformatician PIs manage lab staff?  How could we possibly guide a young post doc on how to run gels, PCRs etc., never mind more complicated laboratory SOPs? We couldn’t – so why do lab-based PIs assume they can guide bioinformaticians?

Consideration has to be given to how you can develop and nurture a young or inexperienced member of staff when you have NONE of the skills they will need to develop to survive in their chosen field. How can you help them when you don’t know yourself?

This guide is aimed at pet bioinformaticians, and is meant to guide them towards better career development.

1. Make friends with local bioinformatics groups
You may not be in the local bioinformatics group, but if there is one, seek them out, introduce yourself and make friends. Ask if you can attend their lab meetings and journal club. Tell the group leader you want to make sure you learn good practice in bioinformatics, and would like their help. If there isn’t one in your institute, where is the nearest one geographically? Can you travel to meet them? If so, do it; if not, attempt to skype into meetings etc. Develop electronic relationships with people and groups on the Internet. Develop a support group who will be able to help you with the kind of problems your lab-based group cannot.

2. Talk to your computing group
Find them, tell them what your work is about, what resources you will need and ask them how best to get access to those resources in the environment you exist in. If you don’t know what resources you will need, see point (1) above. Your local IT team will be essential to your success. Befriend the Linux sys admin, they might save your life.

3. Obtain clear expectations
Speak to your manager and get them to outline exactly what their expectations are of you. If you are funded on a grant, get the grant application and read exactly what your manager has promised you will deliver. Prepare your manager for the possibility that their expectations may need to be altered, especially if they are unrealistic.

4. Rewrite your job description
Armed with (3) and with the use of (1) and (2), rewrite your job description, make your manager fully aware what you can deliver and what you can’t. Make them aware of how long things take, as they may not know. Do this as early as you can. If they disagree with your estimates of what you can deliver, ask for the support from (1). Give realistic estimations. Ask your manager to prioritise the objectives.

5. Papers
You need first author papers, just like any other scientist. Middle author papers will only get you so far. Be up front about this, ask your manager where your next first author paper is coming from. Explain that, for certain projects you will have done more work than the lab guys, so deserve first authorship (only do this if it’s true). If no possibility exists, ask to be allowed to develop your own ideas and publish those. Talk to the guys in (1).

6. Attend bioinformatics meetings
Your manager will want you to go to meetings relevant to the group’s research. Go to these, but also ask to attend bioinformatics meetings and workshops.

7. Try first, ask later
This is a delicate balancing act. Nothing will teach you better than just getting hold of some data or code and just giving it a go. Try. Sit and read and try as hard as you can to solve whatever problems you encounter. But be aware the solutions you come up with may be suboptimal. After a certain time period, ask for help. Show what you’ve done to those with more experience and ask for feedback. Take the best of what you have learned and any feedback you gained, and leave the rest behind. No one likes the person who asks for help too early and expects someone else to tell them what to do; we all love a trier. But don’t take it too far; try your own way first, but at some point take a break and ask for feedback and assistance.


I’ve seen more than one project where the results were almost 100% crap because a bioinformatician acted in isolation and didn’t ask for help. Don’t be that person. Don’t let yourself be. Being around other bioinformaticians should mitigate this risk, as they will be able to spot where you are going wrong before it is too late.

Make sure you are valued as highly as other members of your group, that your career is nurtured, that your skills are developed in the same way that other members of your group enjoy. Make sure you work to clear and realistic expectations, and take an active role in (re)defining your role and job description.

If you’re in academia, publish. Please publish. Mid author is ok, but first author is essential if you want to be seen as anything other than a facilitator. Aim for two papers a year. If your manager can’t deliver that for you, do it yourself. Publish a review. Publish a comparison of software tools. Take the problem you encounter most, solve it and publish the solution. It will seem daunting at first, but ask for help. You can do it.

Look after yourselves, pet bioinformaticians :)

Update – 24/4/2013

Wow, did this post touch a nerve – definitely my most popular post in terms of retweets and first day visitors.  There are clearly a lot of pet bioinformaticians out there!

A few things.  Firstly, if you are the PI of a pet bioinformatician, there is no explicit or implied criticism of you here.  There is nothing wrong with you employing a bioinformatician in your lab.  Just look after them, and recognise you can’t give them everything that they need.  You can give them a lot, just not everything.

Secondly, there is nothing wrong with being a pet bioinformatician – it can be a really stimulating role, and opens your eyes to lab-based science.  I am not criticizing the pets either, I just urge you to look after yourselves :)

Finally, this great comment from the tweetome:

I completely agree, if you can, you should be doing this.  Don’t be a passenger in lab meetings, suggest things that can be done, never forget you are a scientist too, you can propose hypotheses and how these may be tested :)   Your ideas are valid and you bring something to the party that no-one else in the room can.

Update – 30/4/2013

There is a possibility we could host a workshop in Edinburgh to look at best practices in “embedded bioinformatics”, involving some presentations, break out groups, and ultimately authoring a document/paper that highlights the pros and cons of the system.  This would either be free to attend, or would incur a very small cost to cover expenses (tea/coffee/lunch).  Would you come?

Are you ready for the 20% cut in your sequencing budget?

Well, I’m sorry, I wanted to start a debate on VAT, but even if you think this is boring, please, please read on.

Basically, the rules on VAT for research are changing in the UK.  If you want to read the consultation document, it is here.

However, I summarise the key paragraph here:

Examples of how VAT will now apply
1). A charity grant funds University A to carry out some research. The supply 
University A makes to the charity is outside the scope of VAT. However, 
University A needs to subcontract part of the research to University B. At 
present, the supply University B makes to University A is exempt from VAT but 
after the exemption is withdrawn it will be taxable (20% VAT). This will not 
affect the supply University A makes to the charity which will remain outside 
the scope of VAT.

In this scenario, University A’s costs increase by the VAT charged by 
University B.

All of the UK academic sequencing facilities are in the same boat, and the consensus opinion appears to be that, after August 1st, we will need to start charging VAT on sequencing for academic researchers outside of our own universities.

The VAT of course goes straight back to the treasury.

Let me spell this out to you.  Let’s say you have a grant, with a £100k sequencing budget, and you were planning on using an external sequencing provider.  On July 31st, you will have £100k to spend.  The next day, you will effectively have significantly less, because the price just went up by 20%.
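
To put a number on that, here is a quick sketch of the sum, assuming the grant stays fixed at £100k and that the 20% VAT has to come out of that same pot (the function name is just for illustration):

```python
VAT_RATE = 0.20  # the 20% rate quoted in the consultation example


def purchasing_power(budget: float, vat_rate: float = VAT_RATE) -> float:
    """How much sequencing (at pre-VAT prices) a fixed budget buys once VAT is added."""
    return budget / (1 + vat_rate)


budget = 100_000
net = purchasing_power(budget)
vat_paid = budget - net
print(f"Sequencing bought: £{net:,.0f}; VAT to the treasury: £{vat_paid:,.0f}")
```

Under those assumptions, a 20% price rise means you get roughly £83k of sequencing for your £100k, with the remaining £17k or so going to the treasury.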

It doesn’t matter if you have a VAT exemption certificate.  It doesn’t matter if you have a collaboration agreement.

You might think “Oh, well we’ll just use a provider outside of the UK”.  If you do that, then you are liable to pay the VAT, direct to the treasury.  Which is actually more complicated than paying it direct to the sequencing provider.

Let me say this again – the VAT goes directly back to the treasury.  This is effectively a stealth cut in the research budget by the UK government.

Is anyone happy about this?! 

Write to your MP: http://www.writetothem.com/

Don’t tell me this isn’t how I’m supposed to be

Do you know what “public engagement” is?  No, really – do you?

It’s not writing press releases; nor is it writing articles in the Guardian, or doing an interview on Radio 4.  The key is in the second word: engagement.  What is meant by public engagement is actually engaging, in a dialogue, with members of the public.  That’s not you telling them something and then thinking the job is done; it’s actually listening to what they have to say and talking to them about their beliefs and opinions and how your research might interact with those beliefs and opinions in some way.

The great thing about public engagement is that it works.  It’s the right thing to do.  Its predecessor, known as the deficit model, has failed.  The deficit model relies on an assumption that if we, as scientists, provide enough information to the public, enough scientific facts, then they will see the world our way and agree with what we’re doing.  They don’t and they won’t.  This model has failed.  The model that works is not the provision of facts and information, it’s the engagement model.

As you all probably know, I’m on Twitter (I have 1400 followers) and I have this blog, which has been read approx. 15,000 times since January.  I’m quite proud of these numbers, I really am.  But I’m under no illusions – this is not a major “public engagement” activity because the people I engage with are generally scientists.  I’d love it if a couple of plumbers, brick layers, nurses, doctors, lawyers and accountants came on here every once in a while and provided a comment, but somehow I doubt it.

Despite this, my tweets and this blog are publicly available and therefore could be discovered and read by members of the public.  I’m happy about this, very happy.  I want to engage.

What I am not happy about is the existence of large, unwieldy institutional policies and regulations on the use of “social media”.

Why does this bother me?  I’ll tell you – because I’m a person as well as a scientist.  I do this because I care.  OK, I do it because I get paid too, but mostly because I care. I love my job, I love my employers, and I love scientific research.  I care that it is done well, and openly, and I get angry when it is done badly, or hidden behind unnecessary pay-walls.   I get emotional, I can be elated and happy or angry and frustrated.   I’m a person, and a scientist and I have emotions and reactions, and sometimes I vent those on this blog or on Twitter.  None of my blog posts have taken more than 30 minutes to write, and some of them have been written through a red mist.  So what?  Impulse is human nature. I don’t think any of that should be filtered through 22 pages of rules and regulations, the IP department and the legal department; I don’t want anyone to shave off the rough edges and change my use of language.  The public deserve to see real scientists, to see real reactions, to see that we care.  If you asked them, I’m sure they’d tell you that they’d prefer to see real people with real emotions, rather than a trimmed down, well-dressed avatar spouting pre-processed opinion that’s been written by a team of legal experts.  I care, and I love and I hate and I get angry and frustrated and that’s how I want to be, that’s how I want to communicate with you.  Don’t tell me that’s not how I’m supposed to be.

Watson’s law of bioinformatics ontologies

Modeled very much on Godwin’s law, “Watson’s law of bioinformatics ontologies” was first coined at TGAC during an ELIXIR/GOBLET workshop.

The law simply states:

"As the discussion regarding a particular bioinformatics topic gets longer, 
the probability that someone will suggest the group develops an ontology for 
that topic approaches 1"

Bioinformatics is awash with ontologies, including the sequence ontology and the gene ontology.  There is at least anecdotal evidence that neither is actually a true ontology.  There is also the suggestion that the gene ontology should only have two terms: “encodes for a protein”; and “encodes for an RNA”.  Function would then be defined by a Protein Ontology (PO) and RNA ontology (RO).

The particular discussion that inspired the “Watson’s law of bioinformatics ontologies” was around SASI, an ontology to describe events/announcements in the life sciences.

I would also like to suggest the very beginnings of a controlled vocabulary, possibly to be developed into an ontology, to describe bioinformaticians themselves.  The first two terms may be:

  • “has an interest in ontologies”
  • “has no interest in ontologies”

Comments welcome :)

The alternative “what it takes to be a bioinformatician”

Inspired by this beautifully written piece over at the NY Genome Centre Blog, I thought I’d quickly write down the alternative version, according to yours truly :)

I mean, the NY genome piece has some lovely soundbites:

  • bioinformaticians are “rewriting biology”
  • a postdoc in bioinformatics can expect to earn about 50 percent more than a postdoc in biology
  • etc

but I can’t help thinking the whole piece is a little too “nice”, a little too “perfect” and ignores some of the deficiencies of the real world.

So, here is the alternative version of what it takes to be a bioinformatician:

  1. Patience.  You won’t be spending the majority of your time running beautifully crafted machine learning algorithms to find that perfect, but hidden, signal that reflects the true biology.  No.  I’d say that’s about 1% of your job.  The vast majority of your time, upwards of 90%, will be spent getting data into the correct format, dealing with the fact that no two databases use the same identifiers, or the same format, trying to figure out why your cluster jobs didn’t run, and removing errors and systematic bias from your data.  This is the true art of bioinformatics.  Try and get this done quickly and efficiently, so you can spend more time on the biology.
  2. Suspicion. If it looks too good to be true, it probably is.  A large majority of your “Eureka!” moments will just be errors and systematic bias.  Whenever you find an answer, treat it with huge suspicion until you are absolutely sure it’s not an error.  Don’t trust quality values, of any kind.
  3. Biological knowledge.  Your job does not finish when the alignment jobs do.  Nor does it finish when GATK does.  If your day-to-day job is simply running algorithms, the results of which you then give to a biologist to interpret, then you are not a bioinformatician, you are just an informatician.  No one can expect you to know everything about every problem, but you should have enough biological knowledge to be able to add some interpretation to the data.  If not, see (4).  You need the biological knowledge to figure out the errors.
  4. Social skills.  Whatever it is you’re working on, go talk to a biologist who knows a lot about it.  In fact, talk to lots of them.  They won’t need to be encouraged to talk about their science, in fact you’ll probably have to put a time limit on the conversation.  Learn about the biology.  Learn about the system.  Not only will it help you interpret the data, it will also help you realise which results are errors, which are bias and which are real.
  5. Big cojones.  I’m sorry for swearing, even if it is in Spanish, but I’m trying to make a point.  You’ve just been given 100s of millions, if not billions, of data points.  You need to find the answer, the story, within that data.  It’s in there, somewhere.  Finding it will not be easy.  Do you have what it takes to confidently disregard what you suspect are errors, and engage in a dialogue with biologists about what you think the data is telling you?  Or do you just send off an Excel sheet with all the data in it and expect someone else to do it?
  6. The mind of a super sleuth.  You are basically a detective, and all the clues to the murder are in your data.  The murderer is not going to make it easy and hand themselves in with a full confession.  Work the clues.  Work the data.  Figure it out.  Be a detective.
  7. Delivery.  This is related to (5).  Deliver an end product.  Often this will be a paper.  If I had to divide up all scientists (not just bioinformaticians) into two groups, it would be (i) those who can write papers; and (ii) those who can’t.  As a bioinformatician, you can write papers.  Not always, and not with every project, but with some projects you can.  I’m not talking about writing the “bioinformatics method” section and providing a few figures, I am talking about designing and executing an in silico experiment, interpreting the results and writing the paper.  Or creating some software, releasing it, supporting it and writing the paper.  The guys who do this are the guys who get promotions, and the guys who get that extra 50% Purvesh Khatri is talking about in the NY Genome Piece.  And yes, even if you’re in a “support role” – I was running a bioinformatics support group when I wrote my first bioinformatics papers.
  8. The ability to code.  Perl, Python, Ruby, R, whatever.  Some kind of coding ability is essential.  Using GUIs and web-tools will only get you so far.  If you need to do 1000 things, do you sit and open 1000 browser tabs and laboriously start every job?  Or do you write a few lines of code and submit 1000 jobs to the cluster?  The latter is what a bioinformatician does;  the former is what an **** does (insert your own word here).
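
To show what “a few lines of code” in (8) might look like, here is a minimal sketch in Python.  Everything specific in it is an assumption: `qsub` stands in for whatever submit command your scheduler uses, `run_analysis.sh` is a hypothetical job script that takes a sample name, and the sample names are invented.  It defaults to a dry run that only prints the commands:

```python
import subprocess


def build_job_commands(sample_ids, submit_cmd=("qsub",), script="run_analysis.sh"):
    """Build one submit command (as an argument list) per sample.

    submit_cmd and script are placeholders: swap in your own scheduler and job script.
    """
    return [[*submit_cmd, script, sample_id] for sample_id in sample_ids]


def submit_all(commands, dry_run=True):
    """Print each command (dry run) or actually run it."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# 1000 hypothetical samples: sample_0001 ... sample_1000
sample_ids = [f"sample_{i:04d}" for i in range(1, 1001)]
commands = build_job_commands(sample_ids)
submit_all(commands[:3])  # dry run: show the first three only
```

Check how your own cluster expects script arguments to be passed before switching off the dry run, as this varies between schedulers.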

The idea that you can get ahead more quickly, and get paid more, because you have skills that are in demand is true – but this will only happen to the best, and to the lucky.   Ultimately, if you are an academic, then (7) is the most important and that is what you will be judged on.  You need first and last author papers if you want to get promoted, and if you want to be a PI.  Being able to produce those is tough, very tough – harder even than installing QIIME.  And to produce the papers, you’ll need 1-8 above.

Good luck :)

UPDATE (19/03/2013): Sorry, this post isn’t meant to be a criticism of people who perhaps feel they don’t have some of the above, and I am sorry if you feel I am trying to tell you you’re not a bioinformatician.  There is nothing wrong with being an informatician.  There is nothing wrong with being the support guy who doesn’t publish papers.  This post was in response to the NY Genome post which paints a beautiful, romanticised version of what it means to be a bioinformatician.  However, only a few of us will ever realise that vision – steps 1-8 above are what you’ll need.