Monday, February 23, 2009

Lost dog genes found

Dogs can learn new tricks, or at least we can find the ones they already had. In a recently published paper, Derrien et al. (including Elaine Ostrander) look at the genes missing from the dog genome annotation compared to the other high-quality published genomes.

When the dog genome was published, the existing methods of orthology detection (i.e., detecting one-to-one orthologs from other genomes) and other gene prediction methods produced an annotation with 412 fewer genes than the rat, mouse, chimp, and human genomes (the high-quality genomes). Three general hypotheses could explain this disparity: 1) the genes could be Euarchontoglires-specific gains, 2) they could be dog-specific (or Laurasiatheria-specific) gene losses, or 3) they could reflect an inadequacy of the annotation algorithm. These explanations are obviously not mutually exclusive, but Derrien et al. set out with new methods in hand to test them for these specific genes.

Their method: they used a clever synteny approach built on the genes that do have annotated orthologs and that are found in the same orientation in all five species. As an example of how they used this information, let's say genes A, B, and C are all on the same chromosome and in the same order in the high-quality genomes (i.e., no inversions between these genes). Genes A and C are also next to each other in the dog genome, but B is one of the missing genes. The method uses A and C as boundaries and searches the space between them for B's homolog in dog. This gives more statistical power because the search space is much smaller, and the a priori expectation of B's location increases the chance of finding a true positive, or at least the remnants of the missing gene.
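To make the A/B/C example concrete, here is a minimal Python sketch of the anchoring step. It is not Derrien et al.'s actual pipeline: the input layout (one ordered gene list per high-quality genome, dog loci as chromosome/start/end tuples) and the names `flanking_anchors` and `search_window` are hypothetical. It only works out which stretch of the dog genome to search; the homology search inside that window is left out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input formats, for illustration only:
#   ref_orders: gene names in chromosomal order, one list per high-quality genome
#   dog_loci:   annotated dog orthologs -> (chromosome, start, end)
RefOrders = Dict[str, List[str]]
DogLoci = Dict[str, Tuple[str, int, int]]


def flanking_anchors(missing: str, ref_orders: RefOrders,
                     dog_loci: DogLoci) -> Optional[Tuple[str, str]]:
    """Return the (left, right) neighbours of `missing` if every reference
    genome agrees on them and both are annotated in dog, else None."""
    pairs = set()
    for order in ref_orders.values():
        if missing not in order:
            return None
        i = order.index(missing)
        if i == 0 or i == len(order) - 1:
            return None  # no flanking gene on one side
        pairs.add((order[i - 1], order[i + 1]))
    if len(pairs) != 1:
        return None  # references disagree on the neighbours: order not conserved
    left, right = pairs.pop()
    if left not in dog_loci or right not in dog_loci:
        return None  # an anchor is itself missing from the dog annotation
    return left, right


def search_window(missing: str, ref_orders: RefOrders,
                  dog_loci: DogLoci) -> Optional[Tuple[str, int, int]]:
    """Return (chromosome, start, end) of the dog interval between the anchors."""
    anchors = flanking_anchors(missing, ref_orders, dog_loci)
    if anchors is None:
        return None
    lchrom, _, lend = dog_loci[anchors[0]]
    rchrom, rstart, _ = dog_loci[anchors[1]]
    if lchrom != rchrom or lend >= rstart:
        return None  # anchors split or reversed in dog: synteny broken
    # The anchors must be adjacent in dog: no other annotated gene in between.
    for name, (chrom, start, end) in dog_loci.items():
        if name not in anchors and chrom == lchrom and start < rstart and end > lend:
            return None
    return lchrom, lend, rstart


# Toy data (made up): B is annotated in all references but not in dog.
REF = {sp: ["A", "B", "C", "D"] for sp in ("human", "chimp", "mouse", "rat")}
DOG = {"A": ("chr5", 1_000, 5_000),
       "C": ("chr5", 42_000, 47_000),
       "D": ("chr5", 60_000, 65_000)}

if __name__ == "__main__":
    print(search_window("B", REF, DOG))  # ('chr5', 5000, 42000)
```

Instead of scanning the whole dog genome for B, a sketch like this narrows the search to a 37 kb window bounded by A and C. That smaller window is where the gain in statistical power described above comes from.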

The results were very encouraging for future genome annotation efforts. The method identified or annotated 268 dog genes that had been missing (36 of these were already in Ensembl, but their orthology had not been established) (Hypothesis 3). In addition, they found evidence for pseudogenized genes (34 with low support and 21 with high support) (Hypothesis 2) and 37 undetected genes (possibly Hypothesis 1). 29 were not identified using their synteny methods (possibly Hypothesis 1). In the end, a majority of the previously unclassified genes are now classified.

The shortfall of such an approach lies in its specific, targeted nature. Most annotation procedures are highly automated (we are dealing with about 20,000 genes), but following up after the primary run is still useful. I am personally excited about this method because it improves the dog genome annotation.

The other great thing about this finding is that applying it to other genomes, including a second look at the high-quality genomes themselves, may reveal more genes and a better understanding of homology and orthology for comparative genomics.
