Wednesday, December 09, 2009
The last decade has seen a huge increase in sequence data in public databases, along with an impressive increase in species coverage. Here, I have used Google's Motion Chart API and a Google spreadsheet to illustrate the changes in the numbers of sequences and numbers of species for different taxonomic groups. Whilst the increase in nucleotide sequences against the number of species sequenced has been exponential for all taxonomic groups, the rate of increase in nucleotide sequences per species appears to have accelerated since 2007 for Fungi and Bacteria. The gap in the number of nucleotides per species also seems to be widening between the Metazoa and the Viridiplantae. There appear to be no signs of a plateau and, with next-generation sequencers, we are likely to soon see an even sharper increase in the number of nucleotides per species. However, I suspect the rate at which additional species are added to NCBI might begin to slow as we find it harder to collect and sample novel species.
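
For anyone wanting to check the per-species trend outside the Motion Chart, the following is a minimal Python sketch of the calculation, not the method used to build the chart above. The function name `per_species_growth` and the input layout (a year mapped to a pair of totals) are assumptions made for illustration; the real counts would have to come from the spreadsheet.

```python
from typing import Dict, List, Optional, Tuple


def per_species_growth(
    counts: Dict[int, Tuple[int, int]]
) -> List[Tuple[int, float, Optional[float]]]:
    """Return (year, nucleotides per species, growth factor) for each year.

    `counts` maps a year to (total_nucleotides, total_species) for one
    taxonomic group. This layout is hypothetical, chosen for the sketch.
    The growth factor is the ratio of this year's nucleotides-per-species
    value to the previous year's; it is None for the first year.
    """
    rows = []
    prev_ratio = None
    for year in sorted(counts):
        nucleotides, species = counts[year]
        ratio = nucleotides / species
        growth = ratio / prev_ratio if prev_ratio is not None else None
        rows.append((year, ratio, growth))
        prev_ratio = ratio
    return rows


if __name__ == "__main__":
    # Placeholder numbers only, NOT real NCBI counts.
    example = {2000: (100, 10), 2001: (400, 20)}
    for year, ratio, growth in per_species_growth(example):
        print(year, ratio, growth)
```

A growth factor that keeps rising from one year to the next would correspond to the kind of acceleration described above, while a constant factor would indicate steady exponential growth.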