Well, it has markedly improved. It is easier to install and it no longer uses Skype, so you do not need to leave your computer running Skype.
But now you have to buy credit from yet another company instead of using Skype credit. The other downside is that you can no longer IM for free with Skype users sitting in front of their computers. You can only communicate for free with people who have installed EQO mobile on their phones, and even this is not entirely free, as you have to pay your network provider for data transfer (unless you have an all-inclusive deal). I think EQO mobile has lost interesting functionality. OK, it was a pain to have your computer on all the time, but at least you could communicate with the millions of Skype users. Additionally, EQO calls to Mexico, for example, cost more than they used to with the Skype set-up.
All about the good, the bad and the ugly things in life but mainly stuff about evolution, diversity of life, life forms and morphogenesis, phylogenies, trees and insects. Lots of biological news and comments. Cool discoveries of new species etc...
Monday, July 09, 2007
Wednesday, July 04, 2007
Tuesday, July 03, 2007
Installing tesseract command line OCR on MacOS X
Installing libpng from source:
http://kenno.wordpress.com/2006/04/20/compiling-libpng-for-mac-os-x/
Installing libjpeg and aspell with fink:
fink install libjpeg aspell aspell-en
I will want to create my own aspell dictionary using taxonomic names:
http://www.mail-archive.com/code4lib@listserv.nd.edu/msg01545.html
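One simple route to such a dictionary is an aspell personal word list, which is a plain text file with a one-line header followed by one word per line. The sketch below is only an illustration: the function name make_taxa_wordlist and the input file of names are hypothetical, and it assumes the names are plain ASCII binomials, one per line.

```shell
#!/bin/sh
# Hypothetical helper: turn a list of taxonomic names (one per line, e.g.
# "Marchantia polymorpha") into an aspell personal word list on stdout.
make_taxa_wordlist() {
    # Split names into single words, keep purely alphabetic ones, de-duplicate.
    words=$(tr -s '[:space:]' '\n' < "$1" | grep -E '^[A-Za-z]+$' | sort -u)
    # Header line: format version, language, and a word-count hint.
    printf 'personal_ws-1.1 en %d\n' "$(printf '%s\n' "$words" | grep -c .)"
    printf '%s\n' "$words"
}
```

Something like `make_taxa_wordlist names.txt > taxa.pws` followed by `aspell --lang=en --personal="$PWD/taxa.pws" list < out.txt` should then print only the words aspell still does not recognise. The file names here are placeholders.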
Downloading and installing tesseract, following the install instructions:
http://code.google.com/p/tesseract-ocr/downloads/list
Installing xpdf with fink to get pdfimages, which extracts the images from a PDF:
pdfimages -j LandPlants_paper.pdf LandPlantImg
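To convert an extracted image to an uncompressed TIFF for tesseract with ImageMagick (pdfimages numbers its output files, so the first image is LandPlantImg-000.jpg):
convert LandPlantImg-000.jpg -compress None test.tif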
Using tesseract (the second argument is an output base name, to which tesseract appends .txt, so this writes out.txt):
tesseract test.tif out
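The three steps above can be chained in a small shell function. This is a rough sketch rather than the script I actually use: the name ocr_pdf is made up for illustration, and it assumes pdfimages, convert and tesseract are all on the PATH and that the figures are stored as JPEGs inside the PDF.

```shell
#!/bin/sh
# Sketch: PDF -> embedded images -> uncompressed TIFF -> OCR text.
# Assumes pdfimages (xpdf), convert (ImageMagick) and tesseract are installed.
ocr_pdf() {
    pdf=$1      # input PDF
    root=$2     # prefix for every file this function writes
    # Extract embedded images; -j saves JPEG-encoded images as root-NNN.jpg.
    pdfimages -j "$pdf" "$root" || return 1
    for img in "$root"-*.jpg; do
        [ -e "$img" ] || continue      # the PDF held no JPEG images
        base=${img%.jpg}
        # tesseract wants an uncompressed TIFF.
        convert "$img" -compress None "$base.tif" || return 1
        # Second argument is the output base name; the text lands in $base.txt.
        tesseract "$base.tif" "$base" || return 1
    done
}
```

Calling `ocr_pdf LandPlants_paper.pdf LandPlantImg` would leave one .tif and one .txt file next to each extracted image.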
I now have a script that extracts the names from the OCR output and checks them against a dictionary of taxonomic names from spira.
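For illustration, the idea can be sketched in a few lines of shell. This is not the actual script: the function name check_names, the file layout (one accepted name per line in the dictionary) and the simple "Genus epithet" pattern are all assumptions made for the example.

```shell
#!/bin/sh
# Sketch: pull candidate binomial names out of OCR text and keep only those
# that appear in a reference list of accepted names.
check_names() {
    ocr_text=$1     # text file produced by tesseract
    dictionary=$2   # one accepted name per line, e.g. "Marchantia polymorpha"
    # Candidate binomial: a capitalised word followed by a lowercase word.
    grep -oE '[A-Z][a-z]+ [a-z]+' "$ocr_text" | sort -u |
        # Keep candidates that match a dictionary line exactly.
        grep -Fx -f "$dictionary"
}
```

Exact matching like this silently drops any name the OCR has misread (for example an "i" read as "l"), so some form of approximate matching would be the obvious next step.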
I am thinking that using information from the article itself might provide even better results. When tesseract 2.0 comes out, there will also be a way of training the program to improve the character recognition. OCRopus also looks like an interesting program for layout detection, but it doesn't work on Mac OS X yet.
The line extraction is proving to be much more difficult than I first thought, mainly because of the lack of a consistent format and because the labelling at the nodes gets in the way of edge detection. I have tried a number of methods for cleaning up the image and, bit by bit, I will get there, I hope.