The idea behind crowdsourcing is that the answer to a question is often more likely to be correct if you average the answers from a large number of non-experts rather than relying on a single expert in the field. The term "crowdsourcing" has also been used for projects that outsource repetitive or challenging work to a crowd via the internet.
I have been thinking of outsourcing the problem of converting phylogenies embedded in PDFs back to Newick/NEXUS format, and have been looking at various science projects that have used crowdsourcing.
The most impressive from my point of view is Galaxy Zoo, which has already resulted in a number of publications and impressive discoveries. Astrophysicists use the crowd to categorise thousands of galaxies and have expanded the crowd tasks to include matching images of galaxies with randomly simulated images.
Stardust@Home is another astrophysics project, which asks the crowd to look through images for dust particles brought back to Earth by a spacecraft in 2006.
Another cool project is the Open Dinosaur Project, which asks the crowd to aggregate measurements of dinosaur limb bones for many different taxa, both published in the literature and taken directly from specimens, to study the evolutionary transitions from bipedality to quadrupedality.
Foldit is a computer game enabling the crowd to contribute to our understanding of how proteins fold. Figuring out which of the many, many possible structures is the best one is regarded as one of the hardest problems in biology today, and current methods take a lot of money and time, even for computers. The idea of using people's spare time to get further insight is genius!
Another game that might not be directly relevant to science is Google Image Labeler, which I found rather addictive. Google gets users to label/tag images as a side effect of playing a game, and this is probably used to improve image searches on the web. I list it here because I came across a few images of animals that in some cases were labelled down to the Latin binomial.
UPDATE: There is an interesting new crowdsourcing project at http://www.oldweather.org/ to help gather information about past climates from handwritten nautical records.
All about the good, the bad and the ugly things in life but mainly stuff about evolution, diversity of life, life forms and morphogenesis, phylogenies, trees and insects. Lots of biological news and comments. Cool discoveries of new species etc...
Thursday, April 19, 2012
Friday, April 13, 2012
Share your trees and reduce your carbon footprint
More efficient algorithms and programming, videoconferencing! This all got me thinking about the three Rs: REDUCE, REUSE and RECYCLE in the context of phylogenetics. We can all do our bit to REDUCE our carbon footprint when doing phylogenetics. For starters, is the analysis I want to do really necessary? Does it have to run as long? Can we use a better, more efficient algorithm? Secondly, we can REUSE the trees that others have already built, but this means that we need to get much better at sharing our trees. TreeBASE and DataDryad are undoubtedly playing an important role in enabling us to share phylogenies and thus reduce our carbon footprint. However, as discussed in "Towards a taxonomically intelligent phylogenetic database" by Rod Page, the pace at which we are publishing phylogenies is not being matched by the submissions to TreeBASE. This leaves us with the last option: to RECYCLE our trees. This should only be a last resort but ends up happening most of the time. For this we need to get back to our raw materials, the sequences, which fortunately are more consistently shared in GenBank, and redo the analyses.
Hopefully, this time round the algorithm will produce less carbon and the data will be submitted to TreeBASE!
Tuesday, April 10, 2012
Phylogeny digitisation
I was hunting around for further research on phylogeny image digitisation to see whether any advances had been made since I last published on the topic and to keep my previous post up to date. The main reason behind all of this is to see whether there is a faster way to digitise a bunch of images that are accumulating on my hard drive. I thought it would be cool to do something with the ripped phylogenies for the iEvoBio Challenge, but my current set of trees only has a total of 2,000 leaves and I need 10,000.
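A quick way to keep track of how far a collection of digitised trees is from a target like that is to count leaves straight from the Newick strings. The sketch below is only an illustration and not part of any of the tools mentioned in this post: the function name `count_leaves` and the example trees are made up. It assumes well-formed Newick without nested comments, and relies on the fact that a tree with n leaves contains n - 1 commas once quoted labels and bracketed comments are set aside.

```python
import re

# Hypothetical helper for illustration; assumes well-formed Newick input.
# Single-quoted labels may contain commas ('' is an escaped quote inside them).
_QUOTED_LABEL = re.compile(r"'(?:[^']|'')*'")
# Bracketed comments may also contain commas (nested comments are not handled).
_COMMENT = re.compile(r"\[[^\]]*\]")


def count_leaves(newick: str) -> int:
    """Return the number of leaves in a single Newick tree string."""
    text = newick.strip()
    if not text:
        return 0
    # Replace quoted labels with a placeholder so their commas are ignored.
    text = _QUOTED_LABEL.sub("Q", text)
    # Drop comments for the same reason.
    text = _COMMENT.sub("", text)
    # Every internal node with k children contributes k - 1 commas,
    # so the whole tree has (number of leaves - 1) commas.
    return text.count(",") + 1


# Made-up example trees standing in for a digitised collection.
trees = [
    "((A,B),(C,D));",
    "(A,B,(C,D)E)F;",
    "('Homo, sapiens',B);",
]
total_leaves = sum(count_leaves(tree) for tree in trees)
print(total_leaves)
```

Summing the counts over every tree file in a folder would give the running total to compare against the number of leaves needed.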
Anyway, in my searches I came across PHYLODIGM, which looks promising. Thomas Laubach has also done some further work on TreeSnatcher Plus, including testing it with the benchmarking dataset from TreeRipper and a number of tree files found via Google searches. Additionally, he has released the source code under the GNU General Public License.
This all looks promising!!!