You’re not allowed bioinformatics anymore

“Ah welcome! Come in, come in!” said the institute director as Professor Smith appeared for their scheduled 2pm meeting. “I want to talk to you about your latest proposal”, the director continued.

“Oh?” replied Smith.

“Yes. Now, let’s see. It’s an amazing, visionary proposal, a great collaboration, and congratulations on pulling it together. I just have one question”, said the director. “This proposal will generate a huge amount of data – how do you plan to deal with it all?”

“Oh that’s easy!” answered Smith. “It’s all on page 6. We’ve requested funds to employ a bioinformatician for the lifetime of the project. They’ll deal with all of the data” he stated, triumphantly.

The director frowned.

“I see. Do you yourself have any experience of bioinformatics?”

Smith seemed uncertain.

“Well, no…”

“Then how will you be able to guide the bioinformatician, to ensure they are using appropriate tools? How will you train them?” the director pressed.

Smith appeared perplexed by the question.

“We’ll employ someone who has already been trained, with at least a Masters in bioinformatics! They should already know what they’re doing…” Smith trailed off.

The director sighed.

“And what papers will the bioinformatician publish?”

Smith regained some confidence.

“They’ll get co-authorship on all of the papers coming out of the project. The post-docs who do the work will be first author, I will be last author and the bioinformatician will be in the middle”

The director drummed his fingers on his desk.

“What about a data management plan?”

“A what?”

“A data management plan. A plan, to manage the data. Where will it be stored? How will it be backed up? When will it be released?” the director asked.

“Same as always, I guess” said Smith. “We’ll release supporting data as supplementary PDFs, and we’ll make sure we get every last publication we possibly can before releasing the full data set”

The director shifted uneasily in his seat. “And data storage?”

“Don’t IT deal with that kind of stuff?” Smith answered.

An awkward silence settled over the office. The director stared at Professor Smith. Finally he broke the silence.

“OK, so you have this bioinformatician, you give them the data, and they analyse it and they give you the results. How will you ensure that they’ve carried out reproducible science?”

“Reproducible what? What the hell are you talking about?” Smith answered angrily.

The director slammed his hand down on the desk.

“At least tell me you have a plan for dealing with the sequence data!”

“Of course!” said Smith “We’ve been doing this for years. We’ll keep the sequences in Word documents….”

An amber light started flashing on the director’s desk.

“… annotate genes by highlighting the sequence in blue…”

The flashing light turned red.

“… annotate promoters by highlighting the sequence in orange…”

Smith’s sentence was interrupted by a noisy klaxon suddenly going off, accompanied by a bright blue flashing light that had popped up behind the director’s chair.  Smith looked wide-eyed, terrified.

The director pressed a few buttons on his desk and the noisy alarm ceased, the blue light disappeared.

Smith, removing his hands from his ears, asked “What the hell was that?”

The director stood, walked over to the window and sighed heavily. “I’m sorry, Smith. I had a feeling this might happen. Look… this may appear harsh, but… you’re not allowed bioinformatics anymore”

“What?”

“As I said. You’ve crossed the threshold. You’re not allowed bioinformatics anymore”

Smith’s mouth flapped open and shut as he tried to take in the news.

“You mean no-one will analyse my data?”

The director turned to face Smith.

“Quite the contrary, Smith. Good data will always be welcome, and yours will be treated no differently. It’s just that you won’t be in charge of the storage and analysis of it anymore. You can generate the data, but that will be the end of your involvement. The data will be passed to a bioinformatics group who know what to do with it.”

Smith was furious.

“Are you insane? That’s my data! I can do whatever I like with it! Bioinformaticians won’t know what to do with it anyway!”

“On the contrary”, replied the director. “It’s not your data. Your research is funded by the government, which is in turn funded by the taxpayer. The data belong in the public domain. As for bioinformaticians, they’re scientists too and they’ll be able to analyse your data just as well as you can, probably better.”

“I’ve never heard anything so ridiculous! Who decided that I’m not allowed bioinformatics anymore?”

“The Universe.”

“The Universe? Why should the Universe say I’m not allowed bioinformatics anymore?”

“Because you haven’t paid bioinformatics enough attention. It’s not a support service, at your beck and call. It’s a science. Bioinformaticians are scientists too. Young bioinformaticians need support, guidance and training; something you’re clearly not qualified to provide. They also need first-author papers to advance their careers”

“I don’t understand. What do you mean, they’re not support?!” spluttered Smith.

The director continued regardless of the interruption.

“You’ve had the opportunity to learn about bioinformatics. We’ve had a bioinformatics research group at the institute for over ten years, yet you only ever speak to them at the end of a project when you’ve already generated the data and need their help!”

“The bioinformatics group?! They’re just a bunch of computer junkies!”

The director was beginning to get angry.

“Quite the opposite. They publish multiple research papers every year, and consistently bring in funding. More than your group, actually”.

Smith looked stunned.

“But, but, but… how can this be possible? You’ll never get away with this!”

“I’m afraid I can and I will” said the director. “Science has changed, Smith. It’s a brave new world out there. Bioinformatics is key to the success of many major research programmes and bioinformaticians are now driving those programmes. Those researchers who embrace bioinformatics as a new and exciting science will be successful and those that don’t will be left behind.”

The director stared pointedly at Professor Smith. Smith was defeated, but still defiant.

“It doesn’t matter. We have tons of data we haven’t published yet. I’ll be able to work on that for decades! I don’t need new data, I have plenty of existing data”.

A smile flittered at the corners of the director’s mouth.

“Here’s the thing, Smith. As soon as that alarm went off, all of your data were zipped into a .tar.gz archive and uploaded to the cloud. They’re no longer in your possession”.

Smith looked horrified.

“What’s the cloud? How do I access it? What is a .tar.gz file and how do I open it?”

“You know” said the director “keep asking questions like that, and you might get bioinformatics back”


If you are leading a project that creates huge amounts of data, instead of employing a bioinformatician in your own group, why not collaborate with an existing bioinformatics group and fund a post there? The bioinformatician will benefit hugely from being around more knowledgeable computational biologists, and will still be dedicated to your project.


The above was hugely inspired by “Ballantyne T (2012) If only … Nature 489(7414):170-170”.  I hope Tony doesn’t mind.

 

How not to make your papers replicable

Titus Brown has written a somewhat idealistic post on replicable bioinformatics papers, so I thought I would write some of my ideas down too :-)

1. Create a new folder to hold all of the results/analysis.  Probably the best thing to do is name it after yourself, so try “Dave” or “Kelly”.  If you already have folders with those names, just add an index number e.g. “Dave378” or “Kelly5142”

2. Put all of your analysis in a Perl script.  Call this analysis.pl.  If you need to update this script, don’t add a version number, simply call these new scripts “newanalysis.pl”, “latestanalysis.pl”, “newnewanalysis.pl”, “newestanalysis.pl” etc etc

3. Place a README in the directory.  Don’t put anything in the README.  Your PI will simply check that the README is there and be satisfied you are doing reproducible research.  They won’t ever read the README.

4. Write the paper in Word, called “paper.docx”.  Send it round to all co-authors, asking them to turn on track changes.  Watch in horror as 500 different versions come back to you, called things like “paper_edited.docx”, “paper_mw.docx”, “paper_new.docx” etc etc.  Open each one to see that it now looks like Salvador Dalí had an epileptic fit in a paint factory.

5. When reviewer comments come back 6 months later asking for some small detail to be changed, have a massive panic attack as you realise you have no idea how you did any of it.  Start the whole analysis again, in a new folder (“Dave379” or “Kelly5143”) and pray to God that you somehow miraculously come up with the same results and figures.

6. After the paper has been accepted, and the copy editor insists that all figures are 1200 dpi, first look up dpi so you know what it means, and then wrestle with R’s png() and jpeg() functions.  Watch as your PC grinds away for 300 hours to produce a scatterplot that, in area, is roughly the size of Russia and comes in at 30 TB.  Attempts to open it in an image viewer crash your entire network.

7. Weep silently with joy when someone tells you about ImageMagick, or that the journal will accept PDF images.

8. Upon publication, forget any of this ever happened.
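For anyone who would rather skip step 6, the dpi arithmetic is simple enough to sketch.  The snippet below is only an illustration: the function names and the 7 x 7 inch figure size are my own assumptions, and the memory estimate assumes an uncompressed 24-bit RGB raster.  Dots per inch times inches gives pixels, and that is all a 1200 dpi requirement really means.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int):
    """Return (width_px, height_px) needed to print at the given physical size and dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px: int, height_px: int, bytes_per_pixel: int = 3) -> int:
    """Rough in-memory size of the raster, assuming 3 bytes per pixel (24-bit RGB)."""
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    # Hypothetical 7 x 7 inch figure at the copy editor's 1200 dpi.
    w, h = pixel_dimensions(7, 7, 1200)
    size_mb = uncompressed_bytes(w, h) / 1e6
    print(f"{w} x {h} pixels, about {size_mb:.0f} MB uncompressed")
```

The same arithmetic applies in R: png() accepts width, height, units and res arguments, so asking for a size in inches together with a resolution avoids guessing at pixel counts.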

The lonely bioinformatician revisited: clinical labs

Some time ago, I published a post called “A guide for the lonely bioinformatician” – this turned out to be one of my most popular posts, and has over 10,000 views to date.  Whilst I wrote that post to try and help those that find themselves as lone bioinformaticians in wet labs, that wasn’t initially my main motivation; at first, my main motivation had been panic – panic at the amount of bad science that lone bioinformaticians, without support, might produce.

Let me be clear, this isn’t the fault of the lone bioinformaticians themselves – any young scientist working in isolation will make mistakes – it is the fault of the PIs and heads of labs who employ said lone bioinformaticians with no cogent plan on how to support them.

You may get a sense of my motivation from the post itself:

I’ve seen more than one project where the results were almost 100% crap because a bioinformatician acted in isolation and didn’t ask for help

Then yesterday I had this conversation on Twitter:

Bioinformatics Unicorns

To summarise, we started with a clinical lab saying, quite rightly, that it is hard to recruit bioinformaticians; there were then many comments about how labs often want to employ people with rare and in-demand skills (so-called “bioinformatics unicorns”) on poor salaries or boring projects, and that’s why it is difficult to recruit.

I agree with this, but that’s not the point I want to make here.

Many of you will be ahead of me at this point, but let me spell it out.  Lone bioinformaticians will make mistakes, often elementary mistakes, because they don’t have peer support or access to an expert in bioinformatics who can help them.  This matters less in research labs investigating e.g. the evolution of sea squirts, but clinical labs deal with data that can actually affect a patient’s health and/or treatment.

I am aware of a few lone bioinformaticians working in clinical labs.  I want to make this clear – this is a bad idea.  In fact, it’s a terrible idea.  Through no fault of their own, these guys will make mistakes.  Those mistakes may have dire consequences if the data are then used to inform a treatment plan or diagnosis.

So what’s the solution?  I can think of two:

  1. Pay more and employ more senior bioinformaticians who know what they’re doing, and build a team around those experienced bioinformaticians
  2. Collaborate with a bioinformatics group experienced in human genomics/genetics

To any lone bioinformaticians working in clinical labs, I would say this: find support; find help; make sure the things you are doing are the right things; have your pipelines reviewed by an independent external expert.  Don’t be alone.   This isn’t a personal attack on you – every young scientist makes mistakes, I certainly did – you need support and it’s important you get it.

To the clinical labs: I understand there are funding issues.  This isn’t a personal attack on you either.  But employing a lone (and inexperienced) bioinformatician will almost certainly result in mistakes being made that would have been avoided by someone more experienced.  Please consider options 1 and 2 above.

 

Comment on Piero Anversa controversy

I learned of the issues surrounding Piero Anversa, who has had a paper in Circulation retracted and an Expression of Concern from The Lancet, on Twitter earlier this week.  However, the blog post which I read made me quite uncomfortable, and left me wondering whether we have lost the concept of “Innocent until proven guilty” from science?

Allow me to explain.  The retraction and expression of concern are deeply worrying, and call into question some of the research methods used (by the entire group of scientists, by the way, not just Piero Anversa).  However, the blog post I read goes much further than that, with accusations of fear and threats, of ridicule and of careers ended for asking questions.  These accusations come from an anonymous author, yet we have Piero Anversa’s name and image right there in the post for all to see.

I’m not saying that the post is incorrect, I’m just uncomfortable that we can publish these accusations (and at the moment, that’s all they are) from an anonymous account yet a scientist’s full name and picture are included.  Is that right? (for the sarcastic amongst you, yes I am aware of the internet and what happens on it ;-))

I am not affiliated with Piero Anversa, I have never met him or communicated with him in any way.  I’m not here to defend him.  But I guess that’s my point – no one is here to defend him, whilst unfounded accusations about him are read by thousands.

Some comments on the issues brought up by the post

If we can just set aside the “unfounded accusations” issue for now, the blog post brings up several important issues:

  • I have certainly been involved with projects and scientists where the theory dictates the data, i.e. the theory is stated first and the data are made to fit the theory (Don’t try and figure out who, no one I currently work with does this).  So I am not surprised by this accusation and I would not be surprised if it is true.  It’s bad science.  I have no idea how common it is.  However, my approach has always been to quietly remove myself from the project, and I suggest anyone who is involved in such work, at whatever level you are, do the same.  I realise this may seem like career suicide, but being associated with a compromised paper is also career suicide.  Just get out.
  • The post also mentions “Machiavellian Principles”, and actually I think it is scarily accurate.  I’d say Machiavellian politics are the dominant form within academic scientific research, with a “divide and rule” approach to the competition, and anonymous peer review forming a perfect weapon to “destroy” the opposition.  We should remove this weapon.  However, I see these most often between groups, not within a group.
  • I have never seen the kind of behaviour that the blog post mentions; the naked threats, the fear, the reward of simple obedience.  Maybe I have just been lucky?   Does this actually happen?

Summary

We need to be very wary of making unproven accusations from an anonymous account about named scientists.  This seems very unfair and actually very unscientific.

For me, the most important issue the blog post raises is the point that some scientists put the theory before the data, and make the data fit the theory.  This is clearly wrong and needs to stop.  Whether Piero Anversa is guilty of this, we do not yet know – however, I’d say that some scientists are guilty of it, and that’s what we need to address.

We have the $1000 genome; what’s next?

Well, we got there, didn’t we?  And when I say “we”, I mean “Illumina”.  The $1000 genome is clearly here.  This has been a goal of genomics for so long, that we are left asking – what’s next?  If any of you are thinking “The $100 genome!” then please leave this blog now – you’re not welcome.  Obvious changes like this are intellectually bankrupt, and annoy the hell out of me.

The next step is pretty obvious, and I won’t be the first to say this: “Genome at home”.  That should be the next challenge of genomics, the equivalent of “The $1000 genome”.  And when I say “Genome at home”, I mean everything at home – sequencing and analysis.  What we need is technology that can take a sample from a person sitting in their own home, sequence the genome, and upload the data to software sitting on a laptop that can analyse the data and tell the person what it means.

I can already anticipate the comments/emails from companies telling me they can already do this (at least from a software perspective).  Save your “ink” – you can’t.  Keep trying though.

Some people may point towards the MinION USB sequencer, and I think this is the closest device to being able to generate a “Genome at home”, but there are three barriers still to be overcome: 1) I don’t think the MinION throughput is human-genome ready yet; 2) sample prep still needs to be done, and you need molecular biology skills to do it; 3) we don’t know how good the data are yet.

Of course, as is true of every technology, the “raw data to clinical interpretation” software doesn’t exist yet, though many are trying.

So there we are – the challenge that I think should replace “The $1000 genome” is “Genome at home”.

An improved model for direct to consumer genome sequencing

I read with great interest this story about how a DTC genetics company sent details to a customer about a genetic variant they thought he had, which if true, would have altered his treatment for Crohn’s disease.  As it turns out, the variant could not be confirmed independently and the correct course of treatment was administered.

There will be much hand-wringing about this, and rightly so.  However, for me, it reveals a fault in the model of how DTC genetics and genomics is done.  What happens is that customers provide DNA, and the genetics company provides SNPs and an interpretation.  Of course there are many steps in between those two points, with uncertainty at each stage, and I think separating the “pipeline” is a better model.

So instead of having an all-in-one package, wouldn’t it be better if we had:

  1. Customer buys (and owns) the raw sequence data for their genome (they may end up owning many, from different times in their life or from different tissues).  As it is relatively unprocessed, this would represent the least biased view of the customer’s genome.
  2. The customer can then purchase analysis of their genome separately; including alignment, variant calling and interpretation.  This step would involve education of the customer that different aligners, variant callers and interpretation services have different strengths and weaknesses, and all come with a level of uncertainty.
  3. Of course, as the software and databases improve, step 2 can be repeated on the same data from step 1 over and over again.  If sequencing improves, step 1 can be repeated.

I think owning your own genome data, and buying services to run on that data, is the best model for DTC genetics/genomics.  How would this have helped Dr J?  Well, owning his own data, and having found out that one interpretation of that data suggested a variant which would seriously affect his Crohn’s treatment, Dr J could simply have bought a second opinion on the same data from a different vendor, who would use slightly different algorithms to determine Dr J’s variants.
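To make the “second opinion” idea a little more concrete, here is a minimal sketch of comparing two sets of variant calls made from the same raw data.  Everything in it is hypothetical: the Variant record, the vendor labels and the example calls are made up for illustration, and real services would exchange something like VCF files rather than Python objects.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """A hypothetical, stripped-down variant call: position plus alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def compare_calls(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> dict:
    """Split two vendors' calls into those both agree on and those only one reports."""
    a, b = set(calls_a), set(calls_b)
    return {
        "concordant": a & b,  # reported by both vendors
        "only_a": a - b,      # reported by vendor A alone
        "only_b": b - a,      # reported by vendor B alone
    }


if __name__ == "__main__":
    # Made-up calls from two made-up vendors analysing the same raw data.
    vendor_a = [Variant("chr1", 100, "A", "G"), Variant("chr2", 200, "C", "T")]
    vendor_b = [Variant("chr1", 100, "A", "G")]
    result = compare_calls(vendor_a, vendor_b)
    for label, variants in result.items():
        print(label, sorted(variants))
```

A variant that only one vendor reports, like the chr2 call in this toy example, is exactly the kind of result that should be confirmed independently before it influences treatment.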

Thoughts and comments :-)

How to stand out in academic scientific research

Edit 28/04/2014: alternate views here and here  (though of course I didn’t state 80 hours – I said 37.5 + 10-20 hours)

This post is aimed at young academic scientists, particularly post-docs.  Please note, I am a biologist, so some of my recommendations may be specific to this field.

A huge number of tweets appear in my feed about the lack of opportunities/jobs for young students in academic science, and there is an excellent piece in Nature Biotechnology that addresses the problem.  The paper contains this scary graph:

[Figure: nbt.2706-F1]

The scary graph is accompanied by the scary text:

Since 1982, almost 800,000 PhDs were awarded in science and engineering (S&E) fields, whereas only about 100,000 academic faculty positions were created in those fields within the same time frame

This looks terrible.  This is clearly competition in academic science, and with way more PhDs being awarded than faculty positions being created, there is a problem.  Rather than add to the noise of people calling for change, I thought I’d try and produce some recommendations for what to do in the current system – and I’m certainly not alone in doing this.

Disclaimers

DISCLAIMER 1: these are my views and in no way do they represent the views of my current employer, any of my past employers, or any of my future employers

DISCLAIMER 2: this post does not mean that I endorse the current “system”.

DISCLAIMER 3: I work in bioinformatics, and I know many bioinformaticians work in “support” or “service” based bioinformatics.  The recommendations for you guys are different, and the post below is not for you; read it, by all means, there is some useful advice for you in there; but the piece below is aimed at those who want a career in research.

How to stand out as a post-doc

These are recommendations on how to get noticed based on my 11 years’ experience as a PI in academic science in the UK.

1. Learn to write papers

You may be suffering under the illusion that something else in your career matters, but you’d be wrong.  Nothing else matters, other than publications.  Not your skills, not your experience, nothing.  Having your name on peer-reviewed publications is the major, if not the only, aspect of your career that you will be judged on.  It’s what will get you your next post-doc; it’s what will get you promoted; and a lack of it is what will get you ignored.

When I say “write papers”, I mean taking the results, once they are in and no other work needs to be done, and turning them into a first draft that other authors can comment on and change.  I’ll break this down into sections:

Introduction: as a post-doc, you may not have the encyclopaedic knowledge of the subject area that your PI does, so the intro will be tough, but at the very least you should be able to identify relevant related papers from the literature, write a few sentences about each, and explain why you did the research

Methods: If you are the senior post-doc author on the paper, this should be simple and quick.  It’s what you did.  Write it up.  If you’re a great post-doc, you already have a perfect record of it in your lab notebook.

Results: Again, if you are the person who produced the results, you should be able to write these up too, without much difficulty.  If you need inspiration on presentation style, look at how other papers have presented similar results (it’s not OK to plagiarise, but it is OK to be inspired by).

Discussion: This might be the toughest section of all, but you should be able to have a stab at a few paragraphs.  Your PI may rip these to shreds, but you should have enough knowledge about the subject area to say how your results enhance knowledge in the area, what the implications are and how they relate to previously published work

References: Possibly the biggest source of frustration.  You cannot make any factual statement in a paper without backing it up with a reference (or results).  As a post-doc, you should know about referencing; and you should know about referencing the correct paper (e.g. reference the original work, not a review)

2. Write fast

OK, so you know how to write papers (see point 1 above).  Now, here’s the thing – to really stand out, you need to learn how to do it quickly.  Most areas of science move very quickly, and therefore your PI probably spends most of their time in fear of being scooped.  You are no use to your PI if you take 6 months to go from results to a first draft of a paper.

I’ll try and give some benchmarks: last year, I wrote a grant application in 3 days, totalling 4808 words.  So that’s roughly 1600 words of properly referenced and formatted, structured and edited scientific text per day.  At that rate, a 35000 word thesis would have taken me about 3 weeks to write.  I’d say most PIs can write a first draft paper in 2-3 days.  You’re not a PI, so you can take longer, but we’re talking 1-2 weeks maximum here.  Here are some rough ideas of what I would expect from a good post-doc:

  • Time from results to a first draft paper: 1-2 weeks
  • Time taken to write 30 minute PowerPoint presentation: 1-2 hours
  • Time taken to write conference poster: 1 hour

There are always going to be cases where the above expectations are too high – but remember, the point of this post is that I’m telling you how to stand out from everyone else.
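If you want to set your own benchmark, the writing-rate arithmetic is easy to reproduce.  The sketch below takes the 4808 words and 3 days from the grant example at face value; the function names are mine, and it assumes you write at a constant rate every day, which nobody does.

```python
def words_per_day(total_words: int, days: float) -> float:
    """Average writing rate over a period of the given length."""
    if days <= 0:
        raise ValueError("days must be positive")
    return total_words / days


def days_to_write(target_words: int, rate: float) -> float:
    """Days needed to reach target_words at a constant rate (words per day)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return target_words / rate


if __name__ == "__main__":
    rate = words_per_day(4808, 3)        # the grant application example
    thesis = days_to_write(35000, rate)  # a 35000 word thesis at the same rate
    print(f"{rate:.0f} words per day, {thesis:.1f} days for the thesis")
```

Plug in your own word counts and deadlines to see how far you are from the 1-2 week target.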

3. Know more than your PI 

This can be split into i) knowledge, and ii) skills.  Let’s tackle the first one, knowledge:

Picture the scene.  You open up your e-mail software, and sat in your inbox is an e-mail from your PI.  They’ve found a paper, very recently published, it’s relevant to your project, they’re excited about it and they think you should read it.  Good.  This happens in labs all the time throughout the world.

Now reverse the scenario – your PI is opening their e-mail software, and the e-mail is from you, the post-doc, about a hot-off-the-press paper that you think is relevant.  You’re excited about it and you think they should read it.  That.  That’s what you need to do to stand out.

Now, onto skills.  The very, very best way to ensure you are employed constantly is to have skills that are in demand; to be able to do something that people need.  I can’t really say much more than that – just figure out the one skill that your PI needs above anything else and learn it; become irreplaceable.  You’d be amazed how quickly PIs will find funding to keep you on if your skills are i) rare, and ii) essential for their future research.  (HINT: from my own experience, bioinformatics skills are a good bet)

4. Finish stuff

I’m going to invoke “Watson’s 90:10” rule here (others may have coined this already; please tell me if so!).  Basically, the rule states that:

90% of the work will take 10% of the time; the remaining 10% of the work will take 90% of the time

I invoke this in many different scenarios, so don’t get confused, but in this scenario I mean that, often, the published paper is the 10%.  Don’t underestimate how much work it is, or how long it will take.  Be the person who can do the 90% and the 10%.

You may not be aware of it, but there are many theories of roles within teams, and you can read about some of those roles here: http://en.wikipedia.org/wiki/Team_Role_Inventories. In particular, look at the “Finisher” role, which I replicate here:

The Finisher is a perfectionist and will often go the extra mile to make sure everything is “just right,” and the things he or she delivers can be trusted to have been double-checked and then checked again. The Completer Finisher has a strong inward sense of the need for accuracy, and sets his or her own high standards rather than working on the encouragement of others. They may frustrate their teammates by worrying excessively about minor details and by refusing to delegate tasks that they do not trust anyone else to perform.

Everything but the last sentence is very good advice for how to stand out as a post-doc.  Be a finisher.  I’m going to go out on a limb here and say that ideas are easy – many people have them, and it’s rare that you find an original idea anyway.  Don’t rely on being the “ideas person”; far more valuable is the person who takes ideas and turns them into a finished product (in academic circles: the paper).

And just to add to that, point 2 is relevant here.  Yes, double-check and check again – but make sure that this doesn’t take months to do; time is hugely important in scientific research.  Do it, but do it quickly!


So in summary, my recommendations are: learn to turn your research outputs into papers, and learn how to produce research outputs in a short space of time; learn skills that are rare and in high demand; and learn how to take projects to completion.

Now a few opinions on the next step – becoming a PI

Making the transition from post-doc to PI

Your opportunities to do this will be limited by only one thing: publications. What counts is the number you have, whether or not you are first author, and the perceived impact of the research.  There it is.  That’s the current system.  At the very least, you will need lots of first author papers before anyone will consider you for a PI position; and at worst, you will need some of those publications to be in high impact journals.

Another thing to consider is that there is attrition in every area.  Not every estate agent will become branch manager; not every salesperson will become area manager; not every post-doc will become a PI.  Some people inevitably will never go beyond post-doc level in academic science.  That’s expected.

Before you decide you want to become a PI, let’s look at the species in all their glory (?!).  For some people, science is a job, for others it is a way of life;  for some, reading and writing about scientific research is a chore, for others it is a pleasure.  Are you the person who will spend their evenings and weekends reading papers, not because they are paid to do it, but because that’s what they enjoy doing?  In all cases, I’d say that PIs are enriched for the second set of characteristics.  They live and breathe their science.  They do what they do for love, not money.

Ask yourself: are you that person?

Forget for a moment about whether it is right, or moral, or ethical.  We are where we are, the system might change, but that’s not what this post is about.  The majority of PIs I know work long hours, and they do it willingly; their employer doesn’t even have to ask.  It’s just what they do.

Are you like this?  Do you want to be like this?  Do you have what it takes to be like this?

Sure, it should be possible for someone to work 37.5 hours per week and still get senior positions in academic science;  it should be the case.  However, I fear it is not.  And what incentive does academic science have to change?  What universities have is a workforce who happily work their 37.5 hours per week and then stick an extra 10-20 hours per week on top voluntarily.  A workforce who don’t take their entire annual leave allowance.  A workforce who work when they are sick and rarely take time off.

If that’s to change, then there is a lot of work to be done; until then, that’s what you have to do to go from being a post-doc to a PI – be willing to work, live and breathe your scientific discipline.

Ask yourself: Is it really what you want?  If so, good luck to you; if not, there is no shame in that, and good luck in your alternative science career!