I have just published a program for the automated recognition of phylogenies from tree images. First of all, I would like to apologise for the use of the word 'towards' in the title; I know that it is increasingly used and that some find it irritating. I just wanted to be honest: this program does not succeed on all tree images, but it is a step in the right direction.
I thought I would take the opportunity to post a few links to software that deals with the same problem. We have all been rather unoriginal with names!
To my knowledge, TreeThief was the first program to deal with the problem of converting a phylogenetic image back into a more useful bracket format such as NEXUS or Newick. It requires the user to click on tips and nodes in a specific order and to type in the label at each tip. Unfortunately, this program only works on Mac OS 9.
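For readers who have not met the bracket format before, here is a minimal Python sketch of what a Newick string looks like and how it maps onto a nested tree. The function name `to_newick` and the three example taxa are my own illustration, not part of any of the programs discussed here, and the sketch ignores branch lengths and internal node labels.

```python
def to_newick(node):
    """Serialise a nested tree into a Newick string (without the final ';').

    A tip is a plain string; an internal node is a tuple of child nodes.
    This representation is an assumption made for the example only.
    """
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


# Hypothetical three-taxon tree: human and chimp are sister tips.
tree = (("human", "chimp"), "gorilla")
newick = to_newick(tree) + ";"
print(newick)  # ((human,chimp),gorilla);
```

Each pair of brackets is one internal node, which is why the text form is so much easier to reuse than a picture of the same tree.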
TreeSnatcher was a conceptual advance on TreeThief; it relies heavily on Java libraries and is cross-platform. It requires a limited amount of input from the user, such as selecting the foreground and background, and this interactivity lets the user improve the quality of the extraction.
TreeSnatcher Plus is an improvement on TreeSnatcher, as it lets the user convert almost anything to a Newick file; for example, it works on radial tree images.
TreeRogue is essentially the same concept as TreeThief. I have only just come across it, so unfortunately I do not refer to it in my paper (sorry). It uses an R script that converts coordinates to a tree file. These coordinates can be extracted from an image using GraphClick, which costs $8.
TreeRipper is written in C++; there is a version running on the website, and the code is available under the GNU GPL v3. It makes heavy use of Magick++, the C++ API to the ImageMagick image-processing library, and it uses Tesseract-OCR to convert the leaf labels to text. This is a fully automated approach that unfortunately only works on a proportion of tree images. You could, for example, use TreeRipper to batch-process a large number of trees and then use a semi-automated approach for the trees that weren't converted.
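The batch-then-fallback workflow could be sketched roughly as follows in Python. Everything here is an assumption for illustration: `triage` and `fake_convert` are hypothetical names, and the converter is a stand-in for however you actually call the automated tool (I am not describing TreeRipper's real interface). The converter is assumed to return a Newick string on success and `None`, or raise an error, on failure.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def triage(
    images: Iterable[str],
    convert: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Run an automated converter over many images and split the results.

    Returns the trees that were converted (image name -> Newick string)
    and the list of images left over for a semi-automated tool.
    """
    converted: Dict[str, str] = {}
    needs_manual: List[str] = []
    for image in images:
        try:
            newick = convert(image)
        except Exception:
            # Any failure of the automated step sends the image to the manual pile.
            newick = None
        if newick:
            converted[image] = newick
        else:
            needs_manual.append(image)
    return converted, needs_manual


def fake_convert(image: str) -> Optional[str]:
    """Hypothetical stand-in converter: only 'succeeds' on PNG file names."""
    return "((A,B),C);" if image.endswith(".png") else None


done, todo = triage(["fig1.png", "fig2.jpg", "fig3.png"], fake_convert)
print(len(done), "converted;", "still to do by hand:", todo)
```

The point of the split is simply that the semi-automated tools only need to be opened for the images in the second list.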
There is still a lot of room for improvement and I am hoping that someone out there will make further progress on this interesting challenge.
Reference:
Hughes, J. (2011). TreeRipper web application: towards a fully automated optical tree recognition software. BMC Bioinformatics 12: 178. doi:10.1186/1471-2105-12-178