Why do we do things?

Take a minute, jot down all the projects you are currently working on, and then ask yourself this question: “Why am I doing these projects?”  Specifically, why those projects and not others?  I’ve been thinking a lot about how we come to do the science we do – possibly inspired by my quarterly assessment of funded grants, leading to exclamations such as “How the hell did you get funding to do that?”, “That’s a dumb idea!”, and “What the ****?!”

So that got me thinking: can we sum up the reasons why we do certain projects?  I came up with the following.   We do things for one or more of the following reasons:

  1. Because we want to
  2. Because it needs to be done
  3. Because no-one has done it before
  4. Because we can get funding to do it
  5. Because we can

There may be others, and please comment below if you think there are.

It becomes fun to combine these, and if we assume the absence of the other terms means that the opposite is true, then we can start thinking of interesting combinations.  For example, if you are doing something only because you want to, then the absence of the other terms means it doesn’t need to be done, someone has done it before, you can’t get funding for it and you can’t actually do it anyway.   This would mean that doing something just because you want to is a really dumb idea.

Some of them may seem to clash: e.g. if it needs to be done, then doesn’t that mean that no-one can have done it before?  Well, not really.  For example, many genome assembly tools exist, but as there is room for improvement, a new attempt might be welcome.  Equally, doing something that no-one has done before is not necessarily a good thing – e.g. trying to assemble a genome from 2bp reads has never been done before, but that doesn’t make it OK for you to try!

Ideally, all 5 would be true for any project to go ahead, but I suspect this is rare.  So what is the minimum set of requirements that should be matched before a project is green-lighted?  In academic science, I suspect the minimum is (2), (4) and (5).  But actually, how often is (4) the only reason that someone does a project?  There are certainly people who “chase the funding” and make a very good career out of it.  But is that enough?  If (4) is true in the absence of (2) then this is a waste of valuable research funding.  The same is true if we have (4) in the absence of (5) – getting funding to do something you can’t actually do is, again, a waste of valuable research funds.

In bioinformatics in particular, I think it is important to look at number 3 – as Nick and I pointed out here:

No matter how gnarly a problem or how cutting-edge a method, there is a pretty good chance someone out there has tried to tackle it already

So e.g. whilst the presence of an existing aligner might not preclude funding for another one, the additional benefits of creating a new one should be quite high.

I wonder if we could start thinking of a list of questions that every grant should be able to answer, possibly based on the above, before they could get funded?

Genomics: prepare for the quacks!

Today is April fool’s day, a day when we traditionally put up joke stories in an attempt to fool other people.  I thought I’d turn this round and report on things that should be April fool’s stories but aren’t.

In keeping with the subject of this blog, I wanted to look at what will happen when everyone, when all of you, own your own genome data.  And what will happen is that there will be a new economy, 100s of new companies just waiting to con the stupid out of their money:

  1. “Love is no coincidence” – give these guys your genome data and they will find your perfect partner
  2. Too fat or unhealthy?  You’re eating the wrong food and the answer is encoded in your genome
  3. Hate the gym, but can’t play sports?  Don’t worry, we can provide your personalised genomic fitness plan
  4. Found the perfect genomic partner using genepartners.com?  Want to know how long they’ll live?  Easy!
  5. Have you always wanted to know how Yoga affects your genome?  No?  Well, if you do, “mind-body genomics” is a thing now.
  6. Do you want to improve your health?  Who doesn’t?  Well, why not just eat some DNA?
  7. Struggling to understand homeopathy?  It’s epigenetic, you clown!

To genomicists, scientists, rationalists and anyone else who uses an evidence-based philosophy to understand the World, I say this: prepare for the quacks.

4-year PhD funding for the development of exome sequencing in pigs

An exciting PhD opportunity has become available in my lab, due to begin in October 2014. I am looking for a candidate to join my group for a fully funded BioSciences KTN CASE 4-year PhD studentship. Only UK/EU nationals are eligible to apply.

A prestigious Biosciences KTN funded industrial CASE studentship is available to develop exome sequencing in pig species and apply the results to selective breeding programmes and the development of biomedical models.  We are now recruiting an enthusiastic and hardworking PhD student to work alongside a multidisciplinary team with expertise in bioinformatics, next-generation sequencing, breeding, animal health and food security.  The studentship is funded for 4 years.

You will learn many skills during the PhD, and we anticipate that bioinformatics will be at the forefront of our approaches.  You will receive expert training and advice from me in this area.

You will be based at The Roslin Institute, University of Edinburgh, a world-leading research institute in the field of farm animal genetics and genomics.  Your supervisory team will consist of internationally renowned researchers, including Professor David Hume, FRSE, Director of The Roslin Institute and Professor of Mammalian Functional Genomics at the University of Edinburgh; and Mick Watson, who has over 15 years’ experience in bioinformatics and next-generation sequencing, including work in both industry and academia.   Over the lifetime of the project, you will gain over 6 months’ experience within industry.  The industrial partner, Genus Plc, is a world leader in applying science to animal breeding, creating advances through biotechnology and selling added value products for livestock farming and food producers.  The successful applicant will also benefit from working alongside scientists within Edinburgh Genomics, the University of Edinburgh’s next-generation genomics facility and one of the largest producers of Illumina next-generation sequencing data in Europe.

For more information, click here.

Biologists: this is why bioinformaticians hate you…

The Wellcome Trust have released a data set showing article processing charges paid in 2012-2013, and you can download the results here.  I’d like to join with everyone else in congratulating the Wellcome Trust on collecting and releasing these figures, and I think MRC, BBSRC, NERC and other funding bodies should follow.

Having said that, this dataset represents everything that’s bad about science – just not in the way you might think.  Biologists: this is why bioinformaticians hate you:

1) the text in the spreadsheet is not enclosed by quotation marks, except one of the paper titles, on line 18.  Why?!!!  Why must you do this?!!

2) To add insult to injury, the title of the paper on line 1127 only contains one set of quotes!  ONE!  Do you realise how many problems that creates?!!! Gah!

3) Spelling: oh my…. where do we start on this point….!

There are 4 entries for the American Society for Biochemistry and Molecular Biology:

17 American Soc for Biochemistry and Molecular Biology 1100.00
18 American Society for Biochemistry and Molecular Biolgy 2259.64
19 American Society for Biochemistry and Molecular Biology 24404.45
20 American Society for Biochemistry and Molecular Biology 1166.60

(yes the last two look identical, but the bottom entry has an inexplicable (and inexcusable) space at the end of it)

There are 6 entries for BioMed Central!

44 Biomed Central 10645.11
45 BioMed central 4348.80
46 BioMed Central 56561.05
47 BioMed Central 10891.04
48 BioMed Central Limited 8650.00
49 BioMed Central Ltd 2529.30
54 BMC 33933.62

You have no idea how to classify the British Medical Journal :-(

55 BMJ 44872.80
56 BMJ 3540.00
57 BMJ group 4080.00
58 BMJ Group 28230.00
59 BMJ Group 1700.00
60 BMJ Journals 2040.00
61 BMJ Publishing Group 16215.00
63 BMJ Publishing Group Ltd 12000.00
64 BMJ Publishing Group Ltd & British Thoracic Society 2340.00

Or Elsevier (I’m crying now)

98 Elsevier 936601.48
99 ELSEVIER 12083.68
100 Elsevier 18629.66
101 Elsevier (Cell Press) 7830.62
102 Elsevier / Cell Science 3895.64
103 Elsevier B.V. 5322.79
104 Elsevier Ltd 1428.68
105 Elsevier/Cell Press 4226.04

There’s no need to shout (or spell correctly)

263 The company of Biolgists 3444.00
264 The company of Biologists 1044.00
265 The Company of Biologists 4020.00
267 The Company of Biologists Ltd 1620.00

And don’t get me started on Wiley old Wiley!

139 John Wiley 9329.79
140 John Wiley & Sons 13394.41
141 JOHN WILEY & SONS 4555.12
142 John Wiley & Sons Inc 4581.55
143 John Wiley & Sons Ltd 14591.34
144 John Wiley & Sons, Inc. 1870.74
145 John Wiley and Sons 1852.01
146 John Wiley and Sons Ltd 2868.08
280 Wiley 264427.26
281 Wiley-Blackwell 121650.59
282 Wiley-Blackwell, John Wiley & Sons 1900.70
283 Wiley-VCH 12084.92
284 Wiley 22170.21
285 Wiley & Son 1800.00
286 Wiley Blackwell 9308.41
287 Wiley Online Library 1896.64
288 Wiley Subscription Services 17572.08
289 Wiley Subscription Services Inc. 23880.24
290 Wiley Subscription Services Inc 3724.65
291 Wiley Subscription Serviices Inc 1533.71
292 Wiley VCH 1635.00
293 Wiley/Blackwell 1513.73
294 Wliey-Blackwell 2400.00

I now have no hair left; I’ve torn it all out.  My teeth are just stumps from excessive gnashing.  My faith in humanity has been destroyed!
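
To make points 1 and 2 concrete, here is a minimal Python sketch of what a single unclosed quote can do to a parser.  It assumes the spreadsheet has been exported as CSV; the three-line sample is made up for illustration and is not taken from the Wellcome Trust file.

```python
import csv
import io

# Hypothetical three-line export: a header and two papers, the first of
# which has an opening quote that is never closed.
RAW = 'title,cost\n"Unclosed title,100\nNext paper,200\n'


def read_rows(text, **kwargs):
    """Parse CSV text into a list of rows using the standard csv module."""
    return list(csv.reader(io.StringIO(text), **kwargs))


# Default dialect: the lone quote opens a quoted field that only ends at
# the end of the input, so later lines are swallowed into that one field.
default_rows = read_rows(RAW)

# QUOTE_NONE: quote characters are treated as ordinary text, so each
# physical line stays a separate row (and the stray quote is kept).
literal_rows = read_rows(RAW, quoting=csv.QUOTE_NONE)

print(len(default_rows), "rows with default quoting")
print(len(literal_rows), "rows with QUOTE_NONE")
```

Neither setting is a real fix: turning quoting off would break any title that legitimately contains a comma inside quotes, which is exactly why inconsistent quoting is so painful.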

Dear Science

Every time you annotate things using free text and without using structured vocabularies, a kitten bioinformatician dies.  Every time you make a spelling mistake, computers explode in silent fury.  Your brain is amazing, it can recognise patterns in words and associate them together; alas there is no computer in existence that can match that ability.

Please – for the love of all that is good in the world – stop using free text and start using structured vocabularies, drop-down lists and ontologies.

Many thanks
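
For what it’s worth, here is a rough Python sketch of the clean-up that free text forces on the rest of us.  The rows are copied from the listings above; the controlled vocabulary, the alias table, the suffix list and the similarity cutoff are all my own assumptions for illustration, not anything supplied with the dataset.

```python
import difflib
import re
from collections import defaultdict

# Hypothetical controlled vocabulary (the "drop-down list").
VOCABULARY = ["BioMed Central", "The Company of Biologists"]
# Hypothetical hand-curated aliases that string matching would not catch.
ALIASES = {"bmc": "BioMed Central"}
# Trailing company suffixes to ignore when comparing names (an assumption).
SUFFIXES = {"ltd", "limited", "inc"}


def normalise(name):
    """Lower-case, strip punctuation and whitespace, drop trailing suffixes."""
    words = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)


KEYS = {normalise(term): term for term in VOCABULARY}


def canonical(name, cutoff=0.9):
    """Map a free-text publisher name to a vocabulary term, or None."""
    key = normalise(name)
    if key in ALIASES:
        return ALIASES[key]
    # Fuzzy matching absorbs typos such as "Biolgists".
    match = difflib.get_close_matches(key, list(KEYS), n=1, cutoff=cutoff)
    return KEYS[match[0]] if match else None


def totals(rows):
    """Sum charges per canonical publisher; unmatched names are kept as-is."""
    out = defaultdict(float)
    for name, cost in rows:
        out[canonical(name) or name] += cost
    return dict(out)


ROWS = [
    ("Biomed Central", 10645.11),
    ("BioMed central", 4348.80),
    ("BioMed Central", 56561.05),
    ("BioMed Central", 10891.04),
    ("BioMed Central Limited", 8650.00),
    ("BioMed Central Ltd", 2529.30),
    ("BMC", 33933.62),
    ("The company of Biolgists", 3444.00),
    ("The company of Biologists", 1044.00),
    ("The Company of Biologists", 4020.00),
    ("The Company of Biologists Ltd", 1620.00),
]

for publisher, total in sorted(totals(ROWS).items()):
    print(f"{publisher}: {total:.2f}")
```

Note that the fuzzy cutoff is a guess and could merge genuinely different publishers, and every alias like “BMC” has to be curated by hand.  A drop-down list at data entry would make all of this unnecessary.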



Is this a realistic portrait of a modern student/post-doc in biology?

I was at a training workshop in Portugal recently, and whilst there I entered into a rather fruitless conversation about the usefulness of a certain software tool in bioinformatics training.  I won’t bore you with the details.  I came to the conclusion that there are many different types of trainee, and I wanted to paint a portrait of a certain type of trainee that I have come across fairly frequently, and then see how realistic readers of this blog think that portrait is.

DISCLAIMER: I encounter many trainees, in person and through my blog/twitter feed; they come from many different institutes and universities around the world.  The portrait below by no means reflects any specific individual or project, and it certainly does not reflect experiences at any particular institute or institution, including my own.

Inspired by Welch et al, I thought I would create a persona!  Here goes:

Ian has just started a 4-year PhD project with Professor Lollipop, an internationally renowned A. bacterium expert.  A. bacterium is pathogenic to both humans and animals, and is found in soil and water.  Ian’s project involves the sequencing and analysis of 1000 environmental and clinical samples of A. bacterium, including genome assembly, genome annotation, SNP calling, phylogenetic comparison and biological interpretation.  The project therefore requires significant bioinformatics expertise.  Prof Lollipop does not have any of the bioinformatics skills necessary to do this himself, and nor do any of Ian’s other co-supervisors.  Ian graduated from the University of Westeros with a 2:1 in Microbiology, and accepted this PhD project immediately after graduation.  Ian’s experience of bioinformatics is a week-long course on how to use Galaxy during his second year.  Ian has heard of Linux/Unix, but has never worked on that platform.  Ian is bright and enthusiastic, and has a good knowledge of how to use Windows and Microsoft Office.  The institute where Ian is doing his PhD has some Linux/Unix servers, but Ian needs to demonstrate competence before he will be allowed to use them.  The institute offers Linux/Unix training, but this is only run every 6 months and there is a waiting list.  The servers themselves use a slightly obscure version of Linux that is 1 major version out-of-date.  Ian would not be allowed to have admin rights on these servers and couldn’t install any software.  He would be allowed to install software to his home directory, but any software that depends on up-to-date system libraries or software would not work.  The institute does not allow Ian to use external clouds such as Amazon EC2 as they consider them insecure.
Ian’s project comes with £5,000 per year to spend on consumables (a total of £20,000) but only £500 from the entire budget has been allocated to training, and that has already been given to the institute to fund their “core skills” training programme – which includes “paper writing”, “how to use powerpoint” etc workshops. Ian has been tasked with learning the skills he needs to do the data analysis himself, and is very keen; however, he is also extremely worried that he may not be able to complete his PhD as he doesn’t have those skills and nor does he feel he has the support to acquire them.

So, question – do you think this is common?  Have you come across this type of person?  Are you this type of person?  Please comment below!

Now let’s extend the imagined scenario above to include the trainer:

Ian has managed to persuade Prof. Lollipop to fund a week-long training course on NGS analysis at the University of Middle Earth.  You are the tutor on this training course.  Ian’s entire PhD depends on your ability to teach him everything he needs to know about bioinformatics.  Go!

Are you a trainer who has experienced this?  Thoughts?  Comments?

The only core competency you’re ever going to need

In case you haven’t read it already, some colleagues of mine, who I know mostly through GOBLET, have written a paper titled “Bioinformatics Curriculum Guidelines: Toward a Definition of Core Competencies”.  You should go read it, it’s a nice paper.

I actually think this is a decent stab at defining core competencies for a profession which really struggles to define itself – how on Earth do you define the skill set needed when it’s impossible to define the role itself?  Sure, the paper itself has its idiosyncrasies (Oracle, PostgreSQL, and MySQL are defined as “database management languages”); and by surveying bioinformatics core facility directors they are limiting responses to a certain type of bioinformatician;  but overall, I think it’s easy to tell that the authors have actually sat down and thought long and hard about the content, and there’s a refreshing honesty to their approach to a difficult problem.

Readers of this blog will know that I have written about related issues at great length: see So you want to be a computational biologist?, A guide for the lonely bioinformatician, Bioinformatics is not something you are taught, it’s a way of life, and The alternative “what it takes to be a bioinformatician”

However, even before that, before the blog was even started, I presented at Eagle Genomics’ annual symposium titled “Provisioning Bioinformatics For The Next Decade – Are We Prepared?”.  My slides are here, and of course in order to answer the question, one has to define what bioinformaticians actually do, which I start on slide 18.  I’ll expand on this later.

First, back to Welch et al.  In their competencies paper, they define three roles: bioinformatics user, bioinformatics scientist and bioinformatics engineer:

[Table: journal.pcbi.1003496.t002]

What immediately struck me is that all of the skills for the first role, the role of “bioinformatics user”, are skills that any biological scientist will need; on reflection, this makes sense.  Welch et al are trying to define those competencies required by anyone who needs bioinformatics training.  What else is a “bioinformatics user” but a scientist? (alternatively, a “bench” or “wet lab” scientist)

Therefore, I think what Welch et al are saying is that there are two types of bioinformatician (if we ignore “bioinformatics user”, who is essentially any kind of researcher in biology) – that of “bioinformatics scientist”, someone who takes existing tools/databases and produces pipelines to answer specific biological questions; and that of “bioinformatics engineer”, someone who develops the tools/databases themselves.

I have different ideas.  When I spoke about this back in 2011, I defined 4 roles for bioinformaticians:

  1. The software developer
  2. The statistician
  3. The data miner/analyst
  4. The database developer

Crucially, it’s important to note that these are roles that exist outside of the domain of biology too, and therefore to be a bioinformatician, one has to carry out one of the four roles and possess and use a knowledge of biology.  I’ve emphasized the “use” deliberately – just because you have a degree in biology, it doesn’t make you a bioinformatician.  Do you actually use your knowledge of biology?  If not then you may not actually be a bioinformatician.

At face value, my “software developer” and “database developer” roles are “bioinformatics engineers”, and my “statistician” and “data miner/analyst” roles are “bioinformatics scientists”.  I’ve split these out specifically because they require quite different skills – a software developer will have different skills to someone who models and creates databases; and a statistician requires in-depth knowledge of statistics that perhaps a data miner does not.  The guys who write Galaxy don’t need to know about statistics;  but the guys who wrote edgeR do.  The roles are related and overlap, but they are definitely different roles.  Therefore I’d prefer to stick with my four roles over Welch et al’s two.

For those of you bursting at the gut to say “what about modelling?” or “what about systems biology?”, then (i) modelling is just a subset of statistics, and (ii) systems biology isn’t a separate science – a wise colleague of mine once said “we all do systems biology – who doesn’t?”

The “bio” bit; the important bit

As I mentioned above, the roles 1-4 I name above also exist outside of the domain of biology; for example all of those roles exist within finance, and within social sciences etc.  So what makes a bioinformatician different?  Clearly it is the knowledge of biology, but I consistently question and challenge bioinformaticians on how much of their biology they actually use.

For example, software developers can enter into a contract to develop a computer system to be used by a bank; it doesn’t make them bankers. The software developers creating a system for managing NHS data in the UK don’t automatically become doctors or nurses.  They’re just software developers, working towards a specification created by domain experts.

So I challenge bioinformaticians again – do you use your biological knowledge?  Or are you working to the specification of a “domain expert”?  Does someone else define the what and you simply define the how?

I may appear as if I’m being mean, but actually biological knowledge, and knowing how to apply it, is the most important “competency” (aka skill) that a bioinformatician can possess.  In a field full of techies, the thing that will make you stand out is your biological knowledge, not your impressive array of awk one-liners.

When we were writing “How to be a computational biologist?”, Nick turned to me and said “I’m just not sure what we’re actually trying to say”.  This is a good question to ask oneself!  I guess with many of these posts, what I’m trying to do is to get bioinformaticians to be scientists.  To be researchers.  To hypothesize, to test, to use their biological knowledge to pose questions and their bioinformatics skills to answer them.  We’ve all seen the rise of the pet bioinformatician, and I guess what I’m trying to say is that you don’t have to be the dude that analyses someone else’s data, you don’t have to be the dude that writes the pipeline that enables every single one of your PI’s papers yet you remain middle author, you don’t have to be the only dude in the room capable of dealing with the data yet made to feel like a second-class citizen.  What I’m trying to say is that your core competency, the only one you will ever actually need, is the skill of being a scientist.  Develop that skill and you won’t ever look back.

Note: I specifically define “dude” as encompassing all genders

Agreeing and disagreeing with a scientific legend

There’s a tendency for people on social media to treat anything certain scientists say as if it were gospel, to speak about those scientists in awe-struck hushed tones.  I don’t think this is particularly constructive, and we don’t have to look far to see exceptional scientists saying things which are ill-advised and incorrect (to say the least).

I’m also very comfortable with the fact that any scientific legend worth their salt couldn’t give a hoot what I think about them.

So onwards to the point…

Sydney Brenner

Elizabeth Dzeng wrote a piece recently around her conversation with Sydney Brenner.  Nick Loman recommended this on twitter and I agree with him, it’s a good read.  I want to raise a few of Sydney’s points and give some of my own opinions on them.

One of the points on which I agree with him is his assertion that we should invest in the young; that we should give science to young people and let them run with it, that their youthful enthusiasm and naivety will bring benefits that older, established scientists would not.  I agree whole-heartedly, and this is something I have tried to do during my own career, to recognise when young people are ready to take the next step, to take responsibility.  It is a huge shame that we rarely reward good young scientists with grants, to see what they can do with the money (and a little guidance).  I’d love to see this change.

However, there is one huge caveat: perhaps Sydney Brenner (being who he is) experiences a different type of “young scientist” to those I have encountered.  For sure, I have encountered young scientists full of ideas, drive and enthusiasm, who work hard and are just chomping at the bit to get ahead and do amazing science.  They’re a joy to work with and soak up any advice you can give them.  I wish there were more.  I’m not naming names because I want to keep them!

However, there is also a fair share of the opposite: lazy, witless kids, full of self-importance yet devoid of ideas who think the World owes them a living. (this is a general observation, and does not reflect on anyone I have worked with, past or present :-)).

I’d have to say, based on my totally subjective observations, the split is about 50:50.

If I can turn this into any kind of positive, then it’s this: if you are reading this, and you are worried about which category you fit into, then you probably fit into the former category.  In my experience, those who worry and question their own performance are generally those who are driven to succeed, who go the extra mile to achieve what others might not.  If you’re not worried about which category you fall into, if you’re just reading this to fill time before your bus comes, dreaming of which pizza you’re going to order later, then you probably fall into the latter, and to those people I’d say that scientific research is probably not for you, and there are tons of other jobs out there that you will be more successful in :-)

Brenner went on to say that Fred Sanger would not survive in today’s world of scientific research, and of course, he is talking complete crap.  I’ll make an analogy here: take a look at Fred Perry winning Wimbledon in 1934 on YouTube, and then tell me he’d even win a single point against Novak Djokovic.  He wouldn’t, he’d get creamed.  But of course, should Fred Perry have existed in the modern world, he wouldn’t have played tennis like it was played in 1934, he’d play it like it’s played now; and he wouldn’t use a heavy, wooden racket, he’d use one of the custom-designed carbon-fibre ones we see today – then who knows what would happen?

The point is that, just like tennis, the “game” of science has changed, it’s different now to the way it was during Sanger’s time and I have no doubt Sanger would have adapted.  He’d have been successful, but in a different way – for example, the technology he developed may have been spun out into a company (just as Solexa was spun out of Cambridge), and bought by a larger Biotech – where the technology would be developed free of the pressures of academic research.

The final point I wanted to bring up is this: Brenner states that he thinks that peer review has become a completely corrupt system.  Again, this is just not true.  It is not a perfect system and I would change it if I could, but the vast, vast majority of editors and reviewers working in the peer-review system today are honest and truthful; they work hard and with integrity.  To say the system is “completely corrupt” is really not true.


Perhaps I am being picky, and perhaps this post amounts to an ad hominem attack against Brenner’s points, many of which I largely agree with.  Opposing opinions are important in science though, and I’m sure Sydney wouldn’t mind :-)