How to not sack your professors

I have to admit to being a bit shaken by two recent events in UK academia – the tragic death of Stefan Grimm, a Professor at Imperial College, who had recently been told he was “struggling” to “fulfil the metrics” (grant income of £200k per annum); and the sacking of Dr Alison Hayman, a lecturer at Bristol University, for failing to generate enough income from grants.

Let me be clear – this post is not an attack on either Imperial or Bristol, and I am in no way suggesting that either has done anything wrong.  I don’t know enough about either case to pass judgement.

Nor do I have a problem with institutions taking steps to remove people from their posts who are not performing satisfactorily.  We are all paid to do a job, and we need to do it.  (Note that I am not suggesting there was a problem with either Stefan’s or Alison’s performance; I don’t know enough to pass judgement.)

However, I do have an issue with judging an academic’s performance by only a single metric – how much money they have “won” in grants.  Firstly, because the grants system can be hugely unfair; secondly, and more importantly, because winning grants is not something we do alone; it is a collaboration with other scientists, and also a collaboration with the institution where we work.

In this collaboration, the role of the PI is to come up with fundable ideas; and the role of the institution is to provide an environment conducive to getting those ideas funded.  I don’t think it is fair to sack someone for not winning grants if you have failed to provide an environment within which it is easy to do so.

I am very lucky, because the institution where I work, The Roslin Institute, is exceptional at this.  So I am going to pass on a few tips, simple things that academic institutions can implement, which will mean that no-one has to sack any more academics.

1. Provide funding

This will probably have the biggest impact, but comes at the highest cost.  In my experience, nothing makes a grant more fundable than some promising preliminary data.  Generating data costs money, but it takes data to generate money.  So fund up front.  Give every PI a budget to spend every year with the explicit intention that it should be spent to generate preliminary data for grant applications.  This single, simple step will likely have a greater impact on your grant success than any other.  And make sure it is a decent sum of money.  I recall speaking to a PI from a UK University who told me that each PI gets £150 per year, and out of that they need to pay for printing.  Printing.  3 pence a sheet.  That was over 10 years ago and I’m still shocked today.

2. Cut the admin

Every minute your PIs spend on compulsory training courses, filling in forms, filling in reports, dredging your awful intranet for information that should be easy to find, filling in spreadsheets, monitoring budgets, calculating costs, dealing with pointless emails and attending meetings is a minute they are not spending writing grants and papers.  Cut. The. Admin.  In fact, employ administrators to do the admin.  It’s what they’re good at.

3. Perform independent QC

So one of your PIs has had their grant proposals repeatedly rejected.  Does that make them bad proposals?  Or bad ideas?  Perhaps they are excellent proposals, but they don’t hit the right priorities (which means they didn’t go to the right funder, and that might be your fault).  Read the grants yourself and form your own opinion.  Collect the review reports.  Collect the feedback from the funders.  Were they bad proposals?  Or were they good proposals that didn’t get funded?  I really don’t think it’s tenable to sack people if they repeatedly submit grant proposals that are rated as fundable by the committee concerned.  At that point the PI has done their job.

You might also think about putting in place an internal group of senior academics to review proposals before they are submitted.  This will give you the opportunity to provide feedback on proposals and perhaps make them more fundable before they even reach the grant committee.  Proposals which are really not ready can be held over until the next submission date, giving time for further improvements.

4. Provide support

Do I even need to say this?  For the PIs who have their grants rejected, give them some support.  Give them a mentor, someone who gets a lot of proposals funded.  Provide training and workshops.  Share tips for success.  Sit with them and discuss their ideas, try and influence their future direction.  Do everything you possibly can to help them.

5. Pay it forwards

Every institution has their superstars, the guys who only need to wink at a committee and they’ll get funded.  But those guys, those professors with 10 post-docs and 15 students, they’re black holes that can suck in all the funding around them, making it difficult for others to get a fair share of the pot.  As an institution, you don’t want them to stop, because you want the funding, of course; but there is a compromise, where these superstars share their knowledge and expertise, where they co-author proposals with less successful (perhaps more junior) PIs, lending their name and their weight, their reputation and gravitas, to a proposal.  When the proposal is funded, it is the junior PI who runs the grant and gets the last author publications.  It doesn’t matter to the more senior PI as they probably already have tenure and an H index north of 50.  So pass it on.  Pay it forwards. Transfer that wonderful grantsmanship to the next generation, and coach your next round of superstars.

6. Be excellent

Yes, you.  The institution.  Be excellent at something.  I don’t care whether it’s medicine or ecology, worms or plants or cats or humans, evolution or botany, I couldn’t care less, but choose something and be excellent at it.  Invest in it.  Create a global reputation for that thing so that when reviewers see a proposal they immediately think “these guys know what they’re doing”.  Make sure you have the equipment and the facilities to do it (see 8).

7. Make it easy to work with industry

As PIs we are increasingly being pushed to generate “impact”, and one way of doing this is to collaborate with industry.  But this is a skill and some institutions are very good at it, others very bad.  Be one of the good ones.  Create strategic partnerships with industry, pump-prime some work (again to generate preliminary data), run workshops and industry days and have a legal team that are set up to make it work, rather than to make it difficult.  There are lots of funding streams available only to academic-industrial partnerships, and you’d be insane to ignore them.

8. Invest in infrastructure

Make sure you have the right equipment, the right type of lab, and the right computing to ensure PIs can actually do science.  It seems obvious, but you’d be surprised how many institutions are out there that simply don’t provide adequate facilities and infrastructure.

——————————————

So, there it is.  I’ve run out of suggestions.  As I said above, I am very lucky, I work at The Roslin Institute, which does as much as it possibly can to create an environment where PIs win grants.

Here’s the thing – if you think your staff are failing, for whatever reason, the very first thing you should do is ask yourself two questions: “Is it our fault?  Could we have done anything more to help them?”  I’d argue that, in most cases, the answer to both would be “Yes”.  We all share in the success of published papers and grants won; so don’t forget that there is a shared responsibility in failure, and if some of your PIs are not winning grants, at least some of that is the institution’s fault.

How to recruit a good bioinformatician

I may have said this before (when you get to my age, you begin to forget things), but I’ve been in bioinformatics for around 17 years now, and for that entire time, bioinformatics skills and people have been in high demand.

Today, my friend and colleague Mike Cox asked me how he should go about recruiting a good bioinformatician.

So I thought I would write a blog post about how to recruit bioinformaticians.  Hint: it’s not necessarily about where you advertise.

1. Make sure they have something interesting to do

This is vital.  Do you have a really cool research project?  Do you have ideas, testable hypotheses, potential new discoveries?  Is bioinformatics key to this process and do you recognise that only by integrating bioinformatics into your group will it be possible to realise your scientific vision, to answer those amazing questions?

Or do you have a bunch of data you don’t know what to do with, and need someone to come along and analyse whatever it is you throw at them?

Which is it?  Hmm?

2. Make sure they have a good environment to work in

Bioinformatics is unique, I think, in that you can start the day not knowing how to do something, and by the end of the day, be able to do that thing competently.  Most bioinformaticians are collaborative and open and willing to help one another.  This is fantastic.  So a new bioinformatician will want to know: what other bioinformatics groups are around? Is there a journal club?  Is there a monthly regional bioinformatics meeting?  Are there peers I can talk to, to gain and give help and support?

Or will I be alone in the basement with the servers?

Speaking of servers, the *other* type of environment bioinformaticians need is access to good compute resources.  Does your institution have HPC?  Is there a cluster with enough grunt to get most tasks done?  Is there a sysadmin who understands Linux?

Or were you hoping to give them the laptop your student just handed back after using it throughout their 4-year PhD?  The one with Windows 2000 on it?

3. Give them a career path

Look around.  Does your institution value computational biology?  Are there computational PIs and group leaders?  Do you have professors in computational biology?  Do computational scientists run any of your research programmes?  Could you ever envisage your director being a computational biologist?

Or is bioinformatics just another tool, just another skill to acquire on your way to the top?

4. Give them a development path

Bioinformaticians love opportunities to learn, both new technical skills and new scientific skills.  They work best when they are embedded fully in the research process, are able to have input into study design, are involved throughout data generation and (of course) the data analysis.  They want to be allowed to make the discoveries and write the papers.  Is this going to be possible? Could you imagine, in your group, a bioinformatician writing a first author paper?

Or do you see them sitting at the end of the process, responsible merely for turning your data into p-values and graphs, before you and others write the paper?

5. Pay them what they’re worth

This is perhaps the most controversial, but the laws of supply and demand are at play here.  Whenever something is in short supply, the cost of that something goes up.  Pay it.  If you don’t, someone else will.

6. Drop your standards

This is especially true in academia.  Does the job description/pay grade demand a PhD?  You know what?  I don’t have a PhD, and I’m doing OK (group leader for 11 years, over 60 publications, several million in grants won).  Take a chance.  A PhD isn’t everything.

7. Promote them

Got funds for an RA?  Try and push it up to post-doc level and emphasize the possibility of being involved in research. Got funds for a post-doc?  Try and push it up to a fellowship and offer semi-independence and a small research budget.  Got money for a fellowship?  Try and push it up to group leader level, and co-supervise a PhD student with them.


If none of the above is possible, at least make sure you have access to good beer.



I’m not sure how this post is going to go down, to be honest.  I know a lot of lab people who might think “Why should bioinformaticians be treated any differently?”.  I understand, I do. I get annoyed at inequalities too.  However, the simple fact is that “supply and demand” is in play here.  I think it was Chris Fields who said that many people try and recruit bioinformatics unicorns, mythical creatures capable of solving all of their data problems for them.  Well, it’s possible that they just might, but if you want to find a unicorn, you’re going to have to do something magical.

Putting the HiSeq 4000 in context

Illumina have done it again: disrupted their own market despite having no competition, and produced some wonderful new machines with higher throughput and shorter run times.  Below is a brief summary of what I have learned so far.

HiSeq X 5

Pretty basic: this is half of an X Ten, but the reagents etc. are going to be more expensive.  It is $6 million capital for an X5, and the headline figure appears to be $1400 per 30X human genome.  The headline figure for the X Ten is $1000 per genome, so the X5 may be around 40% more expensive to run.
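
A quick back-of-the-envelope check of that percentage, in R, using only the headline per-genome prices quoted above (these are not confirmed list prices):

x10_per_genome <- 1000   # headline $ per 30X genome on the HiSeq X Ten
x5_per_genome  <- 1400   # headline $ per 30X genome on the HiSeq X Five
(x5_per_genome - x10_per_genome) / x10_per_genome * 100   # 40 (per cent more expensive)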

HiSeq 3000/4000

The 3000 is to the 4000 as the 1000 was to the 2000 and the 1500 to the 2500 – it’s a 4000 that can only run one flowcell instead of two.  I expect it to be as popular as the 1000/1500s were – i.e. not very.  No-one goes to a funder for capital investment and says “Give me millions of dollars so I can buy the second best machine”.

Details are scarce, but the 4000 (judging by the stats) will have 2 flowcells with 8 lanes each, do 2x150bp sequencing, and deliver around 312 million clusters per lane in 3.5 days.

Here is how it stacks up against the other HiSeq systems:

System           Clusters per lane   Read length   Lanes   Days   Gb per lane   Gb total   Gb per day
V1 rapid         150,000,000         2×150         4       2      45            180        90
V2 rapid         150,000,000         2×250         4       2.5    75            300        120
V3 high output   180,000,000         2×100         16      11     36            576        52
V4 high output   250,000,000         2×125         16      6      62.5          1000       167
HiSeq 4000       312,000,000         2×150         16      3.5    93.6          1500       428
HiSeq X          375,000,000         2×150         16      3      112.5         1800       600

These are headline figures and contain some guesses; how the machines behave in reality might differ.
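
If you want to check or tweak the numbers yourself, here is a short R snippet that recomputes the per-lane, per-run and per-day figures from the assumptions in the table (the cluster counts, read lengths, lane counts and run times are all the guessed headline values above):

# recompute the headline throughput figures from the assumed specs
specs <- data.frame(
  system   = c("V1 rapid", "V2 rapid", "V3 high output", "V4 high output", "HiSeq 4000", "HiSeq X"),
  clusters = c(150e6, 150e6, 180e6, 250e6, 312e6, 375e6),   # clusters per lane
  readlen  = c(150, 250, 100, 125, 150, 150),               # length of each read in the pair
  lanes    = c(4, 4, 16, 16, 16, 16),
  days     = c(2, 2.5, 11, 6, 3.5, 3)
)
specs$gb_per_lane <- specs$clusters * 2 * specs$readlen / 1e9   # paired-end: 2 reads per cluster
specs$gb_total    <- specs$gb_per_lane * specs$lanes
specs$gb_per_day  <- round(specs$gb_total / specs$days)
specs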

If any of my figures are wrong, please leave a comment!

UPDATE: there appears to be some confusion over the exact configuration of the HiSeq 4000.  The spec sheet says that 5 billion reads per run pass filter.  The RNA-Seq dataset has 378 million reads “from one lane”.  5 billion / 378 million is ~13 lanes, which doesn’t quite add up.  My contact at Illumina says there are 8 lanes per flowcell, i.e. 16 lanes per run, and 5 billion clusters / 16 lanes would give us ~312 million reads per lane.  Possibly the RNA-Seq dataset is overclustered!
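
For anyone who wants to play with the lane arithmetic, the calculation behind that paragraph is simply:

reads_per_run <- 5e9      # reads passing filter per run, from the spec sheet
rnaseq_lane   <- 378e6    # reads in the demo RNA-Seq dataset, described as one lane
reads_per_run / rnaseq_lane   # ~13.2, which fits neither 8 nor 16 lanes cleanly
reads_per_run / 16            # 312.5 million reads per lane, if there really are 16 lanes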

A 387 million read paired-end RNA-Seq data set is here.

Edinburgh and Glasgow invest in revolutionary HiSeq X clinical genomics technology

I don’t think I’m exaggerating when I say that the ability to sequence a person’s genome for less than £750 will turn out to be one of the biggest breakthroughs in human medicine this century.  This one technology gives us an unprecedented ability to understand and diagnose genetic disease, and to enable personalised medicine and pharmacogenomics.

Today, the Universities of Edinburgh and Glasgow announce the purchase of 15 HiSeq X instruments, one of the largest installs in the world.  This, tied together with Scotland’s and the UK’s world-leading supercomputing infrastructure, much of which is based at the University of Edinburgh, will make Scotland a global leader in clinical genomics.  In Edinburgh, the installation is backed by three world-leading institutions: The Roslin Institute (my employer), the Institute of Genetics and Molecular Medicine and Edinburgh’s School of Biological Sciences.  The Edinburgh node will be placed within Edinburgh Genomics.

One nuance of this announcement that should not be underestimated is the fact that Scotland has excellent electronic national health records, representing a huge advantage.  As anyone working in genomics knows – in the land of cheap sequencing, the person with good metadata is king.

This announcement represents both the beginning and the end of two separate but related stories.  The first began at PAG XXII last year (it doesn’t escape me that I am writing this at PAG XXIII this year), when Illumina announced the HiSeq X system.  This morning I re-read a 3-hour e-mail conversation from January last year between Mark Blaxter (director of Edinburgh Genomics), Karim Gharbi (head of genomics), Richard Talbot (head of laboratory) and myself (head of bioinformatics).  As Illumina announced the HiSeq X system, we started off in confusion, moved swiftly to wonder and amazement, and ended with Mark asking “Do you think we need to invest in this technology?”.  He received three resounding answers: “Yes!”.  Three hours to decide we needed to push for a clinical genomics revolution in Scotland, driven by a HiSeq X Ten install.  The last year has seen some incredibly hard work, not least from Mark Blaxter, Tim Aitman and David Hume.  A huge amount of respect has to go to all three, especially Mark, who really began to drive these discussions at the start of last year; no doubt countless others also worked incredibly hard, and everyone involved deserves credit.

That’s the end of one story, and the beginning of another – sequencing thousands of human genomes from Scotland and beyond.  I am incredibly excited by what that represents.  One of my first jobs was at GlaxoWellcome in the 90s, and around 1996-98 I recall the buzz around pharmacogenomics, about personalised medicine, about delivering the right drug to the right patient based on genetic information.  That’s nearly 20 years ago now, which demonstrates that this is not a new idea, nor one that has been quick to realise.

Within Edinburgh, the HiSeq X systems will be run by a new clinical genomics division of Edinburgh Genomics, directed by Tim Aitman.  All other sequencing, including agrigenomics, environmental and non-clinical human sequencing, will be run by the existing Edinburgh Genomics infrastructure, Edinburgh Genomics Genome Sciences, directed by Mark Blaxter.  Existing customers, and those who work in fields other than human genomics, should rest assured that we remain committed to our existing facility and will continue to deliver high-quality data and projects across the biological sciences using our current install of HiSeq 2500 and MiSeq sequencers.

These are very, very exciting times!

Easy and powerful graph editing using R and PowerPoint

A recent conversation on Twitter reminded me of a powerful way to edit graphs created in R inside PowerPoint.

I realise that R is incredibly powerful anyway, and that much of this can be done within R, but I am also painfully well aware that R is hard, and that many users prefer “point-and-click”.

This example uses Windows.

1. In R, let’s create a graph:

data(iris)   # load the built-in iris data set
plot(iris$Sepal.Length, iris$Sepal.Width,   # sepal length (x) against sepal width (y)
     main="My Graph", xlab="Sepal length", ylab="Sepal width")

2. A plot window should appear showing your graph.  In the top left-hand corner of that window, choose File -> Save As -> Metafile, and save the file somewhere convenient.
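
(As an aside, if you prefer to script this step, R on Windows can write the metafile directly.  This is a minimal sketch assuming a Windows build of R, where the win.metafile() graphics device is available; “mygraph.emf” is just an example filename.)

win.metafile("mygraph.emf")   # open a metafile graphics device (Windows only)
plot(iris$Sepal.Length, iris$Sepal.Width,
     main="My Graph", xlab="Sepal length", ylab="Sepal width")
dev.off()                     # close the device and write the file to disk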

3. Now, fire up PowerPoint and start with a blank slide.  Choose Insert -> Picture.

Navigate to the .emf file you just saved and choose it.  The graph should appear on the slide as a picture.

inserted4. Now, right click the image, choose group -> ungroup:

You will get a message asking if you want to convert the image to a Microsoft Drawing Object.  Choose Yes.

5. Now repeat step 4: right-click the image and choose Group -> Ungroup.

You should now see that the graph has been broken apart into its individual elements.

Every single part of the graph is now selectable, moveable and editable!  Click outside the chart area to de-select everything, and then click on individual components to edit them.  In my example I changed the title and gave one of the data points a red background.

Enjoy!

(I realise there are other ways to do this, perhaps better ways, e.g. export as PDF and then edit in Illustrator; thanks to those who suggested alternatives!  This post is really aimed at those familiar and comfortable with PowerPoint :-))

Ripping apart that terrible Atlantic piece on Open Access

It’s genuinely not often that I read an article and disagree with every single point, but The Atlantic managed it with their terrible piece on Open Access.  However, when one of your chief sources is the executive publisher of Science, a closed-access glamour journal, it’s perhaps not so surprising that the article is so bad.  A bit like asking turkeys whether they think Christmas is a good idea.

The author, Rose Eveleth, mentions the moral side of open access a few times, but states that it’s not as simple as just looking at morals and ethics.  For me, the moral and ethical argument is the strongest of all: that those who created the research, and those who paid for the research, should have free and open access to the outputs.  Surely this is self-evident.  Add a third group, those who benefit most from the research, i.e. students.  All three groups should have free access to research outputs.  Forget money, forget business, forget everything else – open access is the right thing to do.

The article staggers from one blunder to the next:

“Making something open isn’t a simple check box or button—it takes work, money, and time.”

This ignores the argument that making something closed also takes work, money and time.  Compare, if you will, the cost of running arXiv or bioRxiv with the cost of running Nature or Science.  It’s not the “making open” that costs money.  How about the cost of PLOS?  PLOS ONE publishes more papers than any other journal, is open access, and is cheap to publish in.  The argument above is nonsensical.  Open and closed journals both cost money, and I would argue that closed journals actually cost more to run.  Perhaps the sentence above should be rewritten:

“Making money from science publishing isn’t easy – it takes work, money and time”

We then move on to a bizarre section, where Dan Gezelter, who will probably regret this for the rest of his career, seems to suggest that people don’t release their code because they would have to document it if they did:

“It’s only scientists who are relatively secure who can spend the time [to document], and it does take extra time to make sure that their stuff is released correctly”

First of all, coders document.  They have to.  If they don’t, even they don’t understand what on earth their code does.  Also, if you don’t write user documentation, then only you will ever be able to use your code, and you are not doing your job properly.  Documenting code, including creating user documentation, is part of being a coder.  If you’re not doing it, you’re an idiot, and a bad scientist.  This should never be a barrier to releasing code.

Not to mention that released, open-access but undocumented code is still a million times better than no code at all.  I’m seriously embarrassed that some of these arguments made it into print.

After that, we move on to a section with quotes from Melissa Bates, who apparently argues:

it’s not fair to ask graduate students and early career scientists to bear the brunt of the responsibility

But is anyone actually doing that?  This is a massive straw man.  Of course it would be unfair to ask young scientists to bear responsibility, and it’s for that exact reason that no-one is asking them to!  Melissa then goes on to state:

“But there’s also a business model to how science is done.”

Sure.  There’s a business model behind drug cartels too – that doesn’t mean they’re right, and it doesn’t mean we shouldn’t try and abolish them.

The article staggers on to a section where Alan Leshner, executive publisher of Science, talks about the cost of putting out a journal.  Given that open access threatens the business model of journals such as Science, it’s no surprise Leshner seems against it:

“The problem is it costs $50 million a year to publish Science. Somebody has to foot that bill”

That is incorrect.  No-one needs to foot the bill.  We don’t need Science.  Nothing bad would happen if Science didn’t exist.   Life, and research, would go on.

Moving on to some more quotes from Gezelter, we get to a stage where the article compares apples with oranges and comes up with 5.  Apparently, the additional cost of making a closed access paper open access is too high and funders won’t pay it.  This ignores the fact that open access journals, such as the PLOS journals, PeerJ and FrontiersIn are all far cheaper to publish in than Nature, Science and Cell.  It costs money, many thousands of dollars, to publish in closed access journals.  Then one has to pay additional thousands to make the article OA.  Publishing in a straight open access journal in the first place saves money.  How can anyone get it this wrong?

Next we move onto the impact factor argument, and here I have some sympathy – the article states:

At a time when the job market in science is extremely competitive, the institutions combing over resumes aren’t looking for someone’s commitment to the open-access cause, they’re looking at their potential for big research.

And I think that is correct.  In the current climate, if you are a young scientist and you have a chance to publish in Nature, you should grab it with both hands.  It will make your career.  But let’s not overstate this – I’m an open scientist and I have a good career.  My first paper was in BMC Bioinformatics, and I haven’t stopped publishing in Open Access journals since.  Titus Brown is an open scientist and has tenure.  There are others.

Moving on to impact factor itself, the most atrocious piece of cherry-picking occurs – comparing Nature‘s IF of 42 with PLOS ONE‘s impact factor of 3.5.  Of course, PLOS’s flagship journal is PLOS Biology, with an IF hovering around 11 (Genome Biology, another OA journal, has a similar IF).  Comparing Nature‘s flagship IF with PLOS’s lowest is a very low, deceitful thing to do – shame on you, Rose Eveleth.  You would never make it as a scientist; we don’t cherry-pick data to suit our arguments.

Obviously Nature‘s IF is still higher than the best PLOS can produce, but let’s not forget, Nature has been around since 1869, whereas PLOS Biology was created in 2003.  PLOS has a lot of catching up to do.  Conspicuous by its absence in the article is the fact that impact factor correlates highly with the number of retractions, and that there have been calls for a retraction index to be published alongside the impact factor.  I wonder if Rose Eveleth knew about this and chose to ignore it?

We begin to get some sense when Virginia Barbour enters the article, arguing that impact factor is a bad way of measuring how good science is, and that open access is about culture change, which takes time.  She says culture change is hard, and I agree.  We can all think of sins of the past that required culture change which was hard – but we did it anyway, because it was the right thing to do.  It’s what people do, it’s what each new generation does – we fix the mistakes of the past.  Open access fixes mistakes in scientific publishing.  Melissa Bates somehow manages to frame this natural, essential process as a bad thing – I don’t know how, and I cannot agree.  Some may try and make the new generation “cannon fodder” (as Melissa says), but quite simply, we should not and will not let them.

Finally, Leshner writes his journal’s own epitaph:

“I don’t see Science becoming open access in the near future”

I liken this to high street stores in the 90s, looking at the internet and saying “Meh, that will never catch on”.  10 years later they were dead, crushed by Amazon and eBay.  Adapt or die, that’s where publishing is right now.

What strikes me about this whole thing is how little value closed-access publishers actually add.  Leshner is right when he says that the “circulation model” will die – the last time I went to the library and read a paper journal must be over 10 years ago.  This way of publishing science is dying, soon to be dead.  So what other role do publishers play?  In the era of the internet, they actually add nothing.  But here, perhaps, is their salvation.  Could they somehow add value?  I’ll draw an analogy with the cinema – why do I pay to go to the cinema when I can watch all sorts of great movies for free at home?  Well, because the cinema has a bigger screen, 3D surround sound, popcorn, ice cream, hot dogs etc.  I enjoy the cinema.  It is a better experience than watching movies at home.  The cinema adds value.

So what value could publishers add?  Could they release all papers as open access, and then provide additional tools (under a subscription model) that allow people to explore the data?  Maybe.  I don’t know what the “added value” is, but publishers will have to find it somewhere, because the current closed model of scientific publishing is dying; conversely, open access is alive, is growing, and is the future.

As I said above – adapt or die.  Your choice.