Wednesday, October 26, 2011

Hacking the JPEG/PDF tree format

Just back from giving a presentation at the Scottish Phylogenetic Discussion group in Edinburgh. Nice mix of talks. I think I got a lot of people thinking and talking about the way we currently do things and how we could improve on it. The slides are available on Nature Precedings but they are a bit slow to load there, so I have added them to SlideShare as well.

Wednesday, June 08, 2011

The phyloscape changes quickly, we need to build a better way to keep track of it


Since Hennig's major 1969 publication on the phylogeny of hexapod orders (Insecta + Entognatha), I have found more than 60 published phylogenies of the ordinal relationships within the group. Some may be re-analyses of the same data but it is still quite a large number of studies. Thirty-seven of these have been published in the last decade, and this rapid change in the phylogenetic landscape of the group (which is probably the case for many other lineages too) is becoming increasingly hard to keep track of. Sure, you could do a regular PubMed or WoS search for phylogen* + insecta, but you then need to extract the phylogeny and put it in the context of previously published studies. There are also databases like TreeBASE and PhyLoTA that provide ready-made phylogenetic reconstructions, but the former has limited content and the latter has limited resolution at many nodes of interest.
It is important to have an up-to-date and complete image of the phylogenetic landscape of the groups we work on, even if the overall picture is blurry. This would give a better idea of the areas that require further taxonomic sampling and/or a larger number of characters to resolve the relationships of interest. It would also provide a valuable resource for comparative studies. For this to work, information needs to be integrated between different databases like PhyLoTA, TreeBASE, GenBank, TreeFam etc. in an automated fashion, and phylogenetic reconstructions from previously published studies need to be defrosted (see my previous post). Perhaps a simple repository of third-party phylogenetic reconstructions would help: submitter - publication reference - figure number - phylogeny (newick, nexus, phyloxml, nexml ....). But would anybody submit data? Perhaps I need to think of a way to reward those that do.
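To make the repository idea a little more concrete, here is a minimal Python sketch of what one submission record could look like. It is only an illustration: the class name TreeSubmission, the field names and the helper looks_like_newick are my own assumptions, the example values are placeholders, and the Newick check is a crude sanity test (balanced parentheses and a closing semicolon), not a real parser.

```python
from dataclasses import dataclass

# Formats named in the post; the set itself is an assumption of this sketch.
ALLOWED_FORMATS = {"newick", "nexus", "phyloxml", "nexml"}


def looks_like_newick(text: str) -> bool:
    """Crude sanity check: balanced parentheses and a trailing semicolon."""
    text = text.strip()
    if not text.endswith(";"):
        return False
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared before its "("
                return False
    return depth == 0


@dataclass(frozen=True)
class TreeSubmission:
    """One third-party tree: who typed it in, where it came from, and the tree."""
    submitter: str
    reference: str    # publication the figure was taken from
    figure: str       # e.g. "Fig. 2"
    tree_format: str  # one of ALLOWED_FORMATS
    tree: str         # the tree itself, as text

    def __post_init__(self):
        if self.tree_format not in ALLOWED_FORMATS:
            raise ValueError(f"unknown tree format: {self.tree_format!r}")
        if self.tree_format == "newick" and not looks_like_newick(self.tree):
            raise ValueError("tree does not look like a Newick string")


# Placeholder values, not a real submission.
record = TreeSubmission(
    submitter="A. Submitter",
    reference="Example et al. (2011)",
    figure="Fig. 2",
    tree_format="newick",
    tree="((A,B),(C,D));",
)
```

Keeping the record this flat means a submission is cheap to make, and rejecting obviously broken trees at submission time avoids having to defrost them a second time.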


Reference
Hennig, W. 1969. Die Stammesgeschichte der Insekten. Frankfurt am Main, Germany: Kramer.

Sunday, May 22, 2011

Recognition of tree images



I have just published a program for the automated recognition of phylogenies from tree images. First of all, I would like to apologise for the use of the word 'towards' in the title; I know that it is increasingly being used and is irritating to some. I just wanted to be honest: this program does not succeed on all tree images, but it is a step in the right direction.

I thought I would take the opportunity to post a few links to software that deals with the same problem. We have all been rather unoriginal with names!

To my knowledge, TreeThief was the first program that dealt with the problem of converting a phylogenetic image back to a more useful bracket format such as NEXUS or newick. It requires the user to click on tips and nodes in a specific order and type in the labels at the tips. Unfortunately, this program only works on Mac OS 9.

TreeSnatcher was a conceptual advance on TreeThief; it relies heavily on Java libraries and is cross-platform. It requires a limited amount of input from the user, such as selecting the foreground and background, and this interactivity lets the user improve the quality of the extraction.

TreeSnatcher Plus is an improvement on TreeSnatcher as it lets the user convert almost anything to a newick file; for example, it works on radial tree images.

TreeRogue is essentially the same concept as TreeThief. I have only just come across it, so unfortunately I do not reference it in my paper (sorry). It uses an R script that converts coordinates to a tree file. These coordinates can be detected from an image by using GraphClick, which costs $8.

TreeRipper is written in C++ and there is a version running on the website; the code is available under the GNU GPL v3. It relies heavily on Magick++, the C++ API to the ImageMagick image-processing library, and it uses Tesseract-ocr to convert the leaf labels to text. This is a fully automated approach that unfortunately only works on a proportion of tree images. You could, for example, use TreeRipper to batch process a large number of trees and then use a semi-automated approach for the trees that weren't converted.

There is still a lot of room for improvement and I am hoping that someone out there will make further progress on this interesting challenge.

Of course, none of these programs would be necessary if we all shared our trees and this would only be possible if we had a useful phylogenetic standard <= this statement should please the TDWG Interest Group on Phylogenetic Standards ;)

Reference:
Hughes, J. (2011). TreeRipper web application: towards a fully automated optical tree recognition software. BMC Bioinformatics 12: 178 doi: 10.1186/1471-2105-12-178

Friday, May 20, 2011

Insect systematics: you've got to laugh, if you didn't you'd cry



In continuation from my previous post, I have now assembled 43 order level phylogenies of insects, i.e. they are based on more or less independent sources of data. The oldest study included is from 1993, so I still have my work cut out to find trees published before then especially as it becomes increasingly hard to get your hands on the articles as you go further back in time.
As more phylogenies are included, it also becomes hard to visualize this increasingly complex network on a 2D screen and I wish I could explore it in a more intuitive way.


Wednesday, May 04, 2011

Many outstanding questions in the phylogenetic relationships of insect orders

I am trying to get my head around the multiplicity of hypotheses for the phylogenetic relationships of insects, in continuation from my previous post. I have been gathering a number of insect phylogenies from the literature (these include both morphological and molecular phylogenies). I wanted to illustrate where the hypotheses conflict, so I used a SuperNetwork with no edge weights in SplitsTree. This gives an idea of how much conflicting evidence there still is at the base of the Pterygota, and also of the large number of studies that have focused on the Endopterygota, in particular the relationship of the Strepsiptera to the other orders. Many of the orders have only been included in one study, in particular the basal orders. What I would like to do at some point is show how the insect phylogeny has changed over time by layering the phylogenies chronologically onto one another to form the above SuperNetwork.
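For anyone wondering what "conflicting" means at the level of the trees themselves, here is a small Python sketch of the underlying idea. It is not the SuperNetwork method and not what SplitsTree does internally: it only pulls the splits (bipartitions of the tips) out of each tree and flags pairs of splits that cannot sit on the same tree. It assumes that all trees contain exactly the same tips and that the Newick strings are simple (no quoted labels or comments); the function names and the tip names A to D are made up for the example.

```python
from itertools import combinations


def clades(newick):
    """Return the set of tip names below every internal node.

    Branch lengths and internal node labels are ignored.
    """
    stack, found, token = [], [], ""
    after_close = False  # text right after ")" is an internal label, not a tip
    for ch in newick.strip().rstrip(";"):
        if ch in "(),":
            name = token.split(":")[0].strip()
            if name and not after_close and stack:
                stack[-1].add(name)
            token, after_close = "", False
            if ch == "(":
                stack.append(set())
            elif ch == ")":
                done = stack.pop()
                found.append(frozenset(done))
                if stack:
                    stack[-1] |= done  # tips also belong to the parent clade
                after_close = True
        else:
            token += ch
    return found


def splits(newick):
    """Non-trivial splits of a tree, each stored as a pair of tip sets."""
    found = clades(newick)
    taxa = max(found, key=len)  # the root clade holds every tip
    return {
        frozenset([c, taxa - c])
        for c in found
        if 1 < len(c) < len(taxa) - 1
    }


def compatible(s1, s2):
    """Two splits fit on one tree if some pair of their sides does not overlap."""
    return any(not (a & b) for a in s1 for b in s2)


def conflicts(trees):
    """List (i, j, split_i, split_j) for every incompatible pair of splits."""
    per_tree = [splits(t) for t in trees]
    out = []
    for (i, si), (j, sj) in combinations(enumerate(per_tree), 2):
        for a in si:
            for b in sj:
                if not compatible(a, b):
                    out.append((i, j, a, b))
    return out


# Two made-up four-tip trees that disagree about who groups with A.
example = conflicts(["((A,B),(C,D));", "((A,C),(B,D));"])
```

Counting how often each split turns up in a conflict would be one rough way to rank the nodes that most need more data, although trees with only partly overlapping taxa would need something smarter than this.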

