Some things run in families: eye colour, musical ability, language, nasty diseases and so on. Some of these, like table manners or language, are purely cultural. Others, such as thalassaemia and eye colour, are clearly genetic: if you inherit certain variants in a particular gene or set of genes, you get the trait. Since life tends to be complicated, it turns out that most traits are partially genetic, reflecting the combined input of many genes and the environment. Genetics therefore usually runs like this: sort out which traits are genetic and how genetic they are, measure them, correlate them with genetic variation to find the underlying genes, and work out what the causative variants are doing to change the gene(s).
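To make "many genes plus environment" concrete, here is a minimal sketch in Python (the SNP count, causal positions, effect sizes and noise level are all made-up illustrative numbers, not from any real study) of a polygenic trait and the naive correlation scan that recovers its causal variants:

```python
import numpy as np

rng = np.random.default_rng(0)
n_individuals, n_snps = 2000, 100

# Genotypes: 0, 1 or 2 copies of the alternate allele at each SNP.
genotypes = rng.binomial(2, 0.3, size=(n_individuals, n_snps))

# A handful of SNPs have real (small) effects; the rest are neutral.
effects = np.zeros(n_snps)
causal = [3, 41, 77]          # hypothetical causal SNP indices
effects[causal] = 0.5

# Trait = summed genetic input + environmental noise.
trait = genotypes @ effects + rng.normal(0, 1, n_individuals)

# Correlate each SNP with the trait to find the underlying genes.
corrs = np.array([np.corrcoef(genotypes[:, j], trait)[0, 1]
                  for j in range(n_snps)])
top_hits = np.argsort(-np.abs(corrs))[:3]
```

With a couple of thousand individuals, the three planted SNPs come out on top of the scan; shrink the sample or the effect sizes and they start drowning in the noise, which is the whole game.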
As you’d expect, we therefore need to measure two things: genetic variation and the trait, or phenotype, we’re interested in. Assessing genetic variation has always been the limiting factor: it requires collecting individuals (whether they’re yeast cells or human patients is irrelevant; you still need lots of them) and running as many DNA variation assays on them as possible, in order to cover as many genes as possible. You also have to know where these variations sit in the genome, otherwise even if you find a correlation, you won’t know where the underlying gene is. Happily, thanks to concerted efforts over the last ten or fifteen years, we’ve become very good at both discovering variations and measuring them. In fact, we’re running out of variation to discover: the HapMap project, for instance, has collected the vast majority of common human variations (mostly single nucleotide polymorphisms, or SNPs: single-letter DNA changes that are by far the most common type of variation).
Now the other shoe is starting to drop: we’re discovering that our trait measurements are noisy and messy. People measure phenotypes every which way, then try to compare results across studies. As you might expect, the results look less than spectacular. It’s not that any individual study is hopelessly flawed (although some are) and can therefore be discarded; it’s that slightly different things are being measured at slightly different efficiencies. The cumulative effect is to degrade the signal in the trait measurements that correlates with genetic variation, a loss of statistical power that few studies can afford.
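The power hit from heterogeneous measurement is easy to simulate. In this hypothetical sketch (the per-study scale and noise figures are invented purely for illustration), two studies measure the same underlying trait with slightly different assay efficiencies and error levels, and the pooled SNP-trait correlation comes out weaker than it would under one clean protocol:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# One SNP with a genuine effect on the underlying trait.
genotype = rng.binomial(2, 0.3, n)
true_trait = 0.5 * genotype + rng.normal(0, 1, n)

# Studies A and B measure "the same" phenotype with slightly
# different protocols: different scales and different noise levels
# (hypothetical numbers, for illustration only).
study = rng.integers(0, 2, n)
scale = np.where(study == 0, 1.0, 0.5)   # differing assay efficiency
noise = np.where(study == 0, 0.3, 1.0)   # differing measurement error
measured = scale * true_trait + rng.normal(0, noise)

# The protocol mismatch attenuates the genotype-trait correlation.
r_clean = np.corrcoef(genotype, true_trait)[0, 1]
r_pooled = np.corrcoef(genotype, measured)[0, 1]
```

Since statistical power in an association test scales with the square of that correlation, even a modest attenuation translates into needing substantially more individuals to see the same effect.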
Enter a concept familiar to bioinformaticians, ITers, astronomers, internet geeks and particle physicists: standard operating procedures (SOPs), or just standards. A recent paper in PLoS Genetics discusses the requirements for developing and implementing phenotyping standards, and why we need them, in the context of laboratory mouse biology. The gist of the argument is common sense: to compare results, you have to be doing the same experiment. Obvious as that is, the realities of experimental science and the low genetic resolution of past experiments have made such efforts seem beside the point until now.
The EMPReSS project is a pioneering effort to provide a common standard for first-line mouse mutant characterisation assays. An EU initiative, it has, rather predictably, not yet been adopted by the North American research community, who seem content merely to make protocols available. The jury is still out on whether that is good enough; in my opinion, it probably isn’t. The prevailing attitude echoes the community’s resistance to extensive sequence annotation standards in the 1990s, and the proliferation of competing gene expression markup standards earlier this decade. There does, however, appear to be a shift in the air as large-scale collections of strains (mouse and otherwise) are assembled and characterised.
Although I suspect we’re going to see a standards war analogous to previous ones in biology, I hope that we’ve learnt enough over the last ten or fifteen years to make it easier this time. Geneticists, are you listening?