Wednesday, March 14, 2007

OCR taxa names from phylogenies

I have been trying out different open source OCR software to recognise the taxon names used in phylogenies. Tesseract (newly released by Google) did not fare as well as GOCR. If you train GOCR using a database of character images, it does even better.
________________________________
Tesseract
________________________________
M mii
is A
' 53 V S
i
3 7 @1
M `W gj i :[
T * 5?$fA
`Gy i i 8
V ~ 'E S` v~v :
;A bg fh , [
A `i $ > ggt wd
E V 1 Jvh xg E 4 i
A 3 ^ Awbt ) 4 V M
Tvlrjg A
W [ M Y 2 At A] [ vp
> 4 ii ; h i1
% V b a i % i i S
4-E A ) * k i VZ-WVYA S 4`
4 W AQAA A [A~Jw Q QT
VV 42 V V i mvi Q. A A4@ $
@ N [ 2 h Jii@ ' S
,1 > ) ~~%# A -` X * @8
A T V S E -3 A~I: W i
A ` J V V g - M ? A AE
3 `@ J g t. ,4
1 jji WE jzi Q E >
$ A ag H L 1 @ Q w$
vi V BE * V 3 V ` * miji ,)
I V `V as ` 7 ,! I i
5 ) i AVVM *4.3 ~# 4 p u ` 1 ~ V *
i J .41 " gm ^ ` V` U N9 7* 6) *`
2 =
` ~ " ,4 ` ` ` L.;i A>?, ps > ,` i V. g>v..Ve,g $?i[ X ` I j:)4;w . ( 'X A " K 7
4 1 qvb~: JJ S t *%%`~
T - V (A %%^) I As
3 I jwi @ @ A W J [
?g M 6* i Pg`( h b I S W
1 i ; v ^ T [f L AA A
v A H 2 3 Y44@ M V V [ A M { Vi
5 ? El t 52 Wi 1* AL i
* 1^ 4@=b> i-sm ^ 44J
Q *i b g N %A J
4 ; V A C A i ]i T J L ^ A$v J
1 ,V;gt t izm dmgg 4 A 7
T ~ "as A ` . , @ J AVE V7;z`* I?
i i O gm ] is Aig
`- ` V An 7 F x ``^ w ` `An pb _ A ~ e h 7 U ^,
W L - N 4 A #14$A i
M A @ * i
V Z V gi A E T v?)
` pig [A$[ X A 11 44 i L V
~$ Ivd? A ^~hr4 V mg
i $ ihb 4 4 A A p LLAE
& V L Ay M #
N A if '4jv[ 1 x
` V.>gg( A QA VIV \ v4Ji`
J7* **nif if @;ibA Mb 2 W A i & $@^@ Q
% & L% J 4i @ @w@ 4%
1 A x ][A Q ii ] ijN i&i A [
V A " F
i L t WAA! ia G iLAi*
!^=p~ .4 x vvgv V A
I - G 24 @1 V `'A. p m
VAF A i
1 @A[f A 2 X X 4A i @
W gv@ I gpa 'A @ ' E
,>gn I At M i-44 ` BE 4
m m M I V OF J; { V A V qil
)~'; E` > FF a ;; M `* ga ~
LA v 6 @ YAVZ4 A V 3 ) M VY J 8 (
A! i h A 1`3A& X *
HE % At _ ` 7 @ 2 " ' 5 4 "
S 9 4,p; 5
k A V j , i v [ MY >yAA SS 4
V ` ^%jV 4wJv` J 4 4
` Q? W ppgawi V 1 . 1
\ 1 _` 7 2* I-
74 [ ^*^- Av pw
2 ~T$ 'W`` V 1 i i i
Q) H pi In as i
3 4 1 'AY @4 ~ 4, Y
1 V Er A [ M $95 V 8 i `
V ( @4 V * ) jg.: A
V Q 4 s 1 ( vv ;
44 g E 3 iii A] >;1A 9;
C; 4%%3 2 P 4`4A%yV;` 4s
` v4'Y H4:tA? K ; i4 VA 1 VAV A`=$ g `
=?h V) V
4 s iqvv i @M,ib J .l@ A S V?
wbi` i4 $; ~ Q? V t
Aii 6 4 2; EYE V )AGv
A W g 1 ) S HE Jig
1 % AM fA4LN ii O Egg?
V; V @3 ;#
` V I iv V7>:v2 V VA N
* 'id; E ' X? V %.`i:h; 'M( @6i . 7 _ ` t V} .5 '
`V V4 gvgv E1`

M Q`;`A
' 'Apq
~ $$i
___________________________________
GOCR
___________________________________
c(PICTURE)100 C. _asicus (14)
C. nasicus (_8)
C. jo_e_sjs (30)
C. co_fusoy (32)
C. coMfusoy (12)
C. su_cafu_us (_3)
C. su_cafu_us (68)
C. humey__is(_4)
C. vicoyiensis (_5)
C. p_y__lis (26)
C. vicfoyie_sÌs (54)
C. _ongi_e_s (16)
C. longi_e_s (19)
C. ve_osus (82)
C. veMos_s (166)
C. ve_osus (160)
C. ve_osus (149)
C. s__icivoyus
C.pe_lifus (131)
C. pellifus (151)
C. pellifus (192)
C. eleph_s (_18)
C. eleph_s (_13)
C. eleph_s (_16)
C. c___e (204)
C. cR_Re (205)
C. ca__e (208)
C. pyobosci_eus (58)
C. humey__is (56)
C. p_o_osci_eus (22)
C. scufel1_pjs (5)
C. nucu_ (117)
C. nucum (_43)
C. g____ium (_OO)
C. g__n_ium (gg)
Pakjsfan _p. (_1)
C. c__elli_e (49)
C. c__el_i_e (20)
__iica_ sp. (8)
C. p__yhoce_as (1)
C. p_y_hoceyas (16)
_______________________
GOCR using database option
_______________________
c(PICTURE)100 C. nasicus (14)
C. nasicus (28)
C. iowensis (30)
C. confusor (32)
C. coMfusor (12)
C. sulcatulus (23)
C. sulcatulus (68)
C. humeyalis(24)
C. vicoyiensis (25)
C. paydalis (26)
C. vicforiensÌs (54)
C. longidens (16)
C. longidens (19)
C. venosus (82)
C. veMosus (166)
C. venosus (160)
C. venosus (149)
C. salicivoyus
C.pellifus (131)
C. pellitus (151)
C. pellitus (192)
C. elephas (218)
C. elephas (213)
C. elephas (216)
C. cawae (204)
C. cawae (205)
C. cawae (208)
C. pyoboscideus (58)
C. humeyalis (56)
C. proboscideus (22)
C. scufel1aris (5)
C. nucum (117)
C. nucum (243)
C. glandium (200)
C. glandium (99)
Pakistan sp. (21)
C. camelliae (49)
C. camelliae (20)
Afiican sp. (8)
C. pyrrhoceras (1)
C. pyyrhoceras (16)
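
One rough way to put a number on the improvement is to count the `_` placeholders, which GOCR prints by default for characters it cannot identify. The sketch below does that for a handful of lines copied from the two GOCR outputs above; the function name `unrecognised` is made up for illustration, and the count ignores characters that were recognised wrongly (such as the `y` in `humeyalis`), so treat it as an indication only.

```python
# A few lines copied from the plain GOCR output above.
plain = [
    "C. nasicus (_8)",
    "C. jo_e_sjs (30)",
    "C. su_cafu_us (_3)",
    "C. eleph_s (_18)",
    "C. g____ium (_OO)",
]

# The same taxa from the GOCR output produced with the database option.
with_database = [
    "C. nasicus (28)",
    "C. iowensis (30)",
    "C. sulcatulus (23)",
    "C. elephas (218)",
    "C. glandium (200)",
]


def unrecognised(lines):
    """Count '_' placeholders (characters GOCR gave up on) across all lines."""
    return sum(line.count("_") for line in lines)


print("plain GOCR:   ", unrecognised(plain))
print("with database:", unrecognised(with_database))
```

Counting misread characters as well would need the correct names to compare against, which this sketch does not assume.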

Installing GOCR using fink

sudo nano /sw/etc/fink.conf
fink selfupdate
fink index
fink scanpackages
fink install gocr

Thursday, March 08, 2007

Juicing apples

Using Braeburns at £1.28/kg, I got 375 mL from 0.740 kg, which is about 0.5 mL per gram. That means Braeburn apple juice costs about £2.53 per litre. That is quite a bit! Actually more than some pear juices. The juice was tasty and clear, tasting slightly like cider (farm-yardy/earthy).
Royal Gala, my personal favorite, costs £1.18/kg. From 0.48 kg, I got 325 mL, which works out at about £1.74 per litre. That's roughly two-thirds the price of Braeburn, and it is a much sweeter juice. It looks slightly pink, but maybe does not have such a strong apple flavour.
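
The per-litre figures can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the prices, weights and volumes quoted above; the function name `price_per_litre` is just for illustration.

```python
def price_per_litre(price_per_kg, apples_kg, juice_ml):
    """Cost in pounds of one litre of juice, given what the apples cost and yielded."""
    cost_of_apples = price_per_kg * apples_kg   # pounds spent on this batch
    juice_litres = juice_ml / 1000              # convert mL to litres
    return cost_of_apples / juice_litres


braeburn = price_per_litre(1.28, 0.740, 375)
royal_gala = price_per_litre(1.18, 0.48, 325)

print(f"Braeburn:   £{braeburn:.2f} per litre")
print(f"Royal Gala: £{royal_gala:.2f} per litre")
print(f"Royal Gala / Braeburn: {royal_gala / braeburn:.2f}")
```

The Royal Gala comes out cheaper both because the apples cost less per kilogram and because they gave more juice per gram.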
