PhD life, plant genomes && computational biology

How can we measure quality, not privilege? (01 Mar 2014)

Scientists are evaluated based on publication citations and impact factor. We know this is a bad idea. People raised the issue at least 15 years ago, and many times since then. But nothing has changed. As a young scientist collaborating on a variety of projects, I've got a bunch of papers in preparation with colleagues. This has made me aware of the poisonous effect the metrics situation has on science.


Knowledge sets us free. Let's return the favour. (11 Feb 2014)

Last night at Open Research Cambridge, Jelena Aleksic gave a great talk about Open Access. In her closing comments, she floated the idea of an iTunes for scientific papers. Imagine being able to get any scientific paper for 79p. That's a reasonable price to cover costs of creating, archiving and distributing knowledge (given the research is already funded). Most people can afford it.

Current prices - $32 for one-off access to a Nature paper - are disgusting. Scientists created that knowledge, probably with public funding, then a team of other scientists peer reviewed it without getting paid, and Nature wants to make $32 from imprisoning it on their website? Fuck you, Nature.


Installing Transrate (19 Oct 2013)

NOTE... Transrate is now much easier to install! Just follow the instructions here.

Transrate is a program for analysing the quality of transcriptome assemblies. It's designed for anyone who is doing de novo assembly of transcriptomes from RNA-Seq data. Recently I've had several requests from non-expert users for help installing Transrate. This set of instructions is aimed at helping users new to the Linux/Unix environment get up and running.

Transrate is written in the Ruby programming language. This makes it fairly easy to install. It also depends on some external software, which can be more complicated for new users to install. Here we'll go through the whole process step-by-step.


I hate usearch for being so good (03 Sep 2013)

Aligning biological sequences to other biological sequences is the bread and butter of bioinformatics.

By far the most popular software for this purpose is BLAST, the Basic Local Alignment Search Tool. BLAST was published in 1990, and at the time its heuristic algorithm was a huge advance (in computational speed) over the Smith-Waterman algorithm. People have since used it for all manner of sequence alignment tasks.

More recently, alternative aligners have risen to prominence for some specific alignment tasks. For example, for aligning short next-generation sequencing reads to a set of longer sequences, bowtie, bwa, SOAP, and friends are all orders of magnitude faster than BLAST, and this is reflected in how widely they are used.
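For anyone who hasn't met the Smith-Waterman algorithm mentioned above, here's a minimal Python sketch of the scoring step. It is only an illustration: the function name and the match, mismatch and gap values are my own assumptions, it uses a simple linear gap penalty, and it has nothing to do with how BLAST or the short-read aligners work internally. It fills in a dynamic-programming matrix one row at a time and reports the best local alignment score.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    The scoring values are illustrative assumptions, not standard defaults.
    """
    # Only the previous row of the matrix is needed to compute the next one.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or mismatch...
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # ...or open a gap in one sequence or the other.
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # Flooring at zero is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACGT"))  # 4 matches x 2 = 8
```

Every cell of the matrix gets computed, so the work grows with the product of the two sequence lengths. That exhaustive search is the cost that heuristic tools avoid.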


The data structures challenge (18 Aug 2013)

To kick-start this new blog, I'm setting myself a challenge: implement 42 data structures in 42 days.

Update: After about 2 weeks of this, my PhD exploded and I found myself totally overwhelmed. I've postponed (or failed!) the data structures challenge, having completed 10 structures, and will resume work on the project when my PhD workload calms down again.

The reason

I'm nearing the end of the first year of my PhD. I started 10 months ago as a biologist with a bit of programming experience. A fair description of my current role in the lab is somewhere between a computational biologist and a bioinformatician. I'm writing software every day to handle huge amounts of data with short turnaround times, and I have no formal CS training.