Wednesday, March 14, 2007

OCR taxa names from phylogenies

I have been trying out different open source OCR programs to recognise the taxon names in images of phylogenies. Tesseract (recently released as open source by Google) did not fare as well as GOCR, and if you train GOCR using a database of character images, it does even better. The raw output from each program is shown below.
________________________________
Tesseract
________________________________
M mii
is A
' 53 V S
i
3 7 @1
M `W gj i :[
T * 5?$fA
`Gy i i 8
V ~ 'E S` v~v :
;A bg fh , [
A `i $ > ggt wd
E V 1 Jvh xg E 4 i
A 3 ^ Awbt ) 4 V M
Tvlrjg A
W [ M Y 2 At A] [ vp
> 4 ii ; h i1
% V b a i % i i S
4-E A ) * k i VZ-WVYA S 4`
4 W AQAA A [A~Jw Q QT
VV 42 V V i mvi Q. A A4@ $
@ N [ 2 h Jii@ ' S
,1 > ) ~~%# A -` X * @8
A T V S E -3 A~I: W i
A ` J V V g - M ? A AE
3 `@ J g t. ,4
1 jji WE jzi Q E >
$ A ag H L 1 @ Q w$
vi V BE * V 3 V ` * miji ,)
I V `V as ` 7 ,! I i
5 ) i AVVM *4.3 ~# 4 p u ` 1 ~ V *
i J .41 " gm ^ ` V` U N9 7* 6) *`
2 =
` ~ " ,4 ` ` ` L.;i A>?, ps > ,` i V. g>v..Ve,g $?i[ X ` I j:)4;w . ( 'X A " K 7
4 1 qvb~: JJ S t *%%`~
T - V (A %%^) I As
3 I jwi @ @ A W J [
?g M 6* i Pg`( h b I S W
1 i ; v ^ T [f L AA A
v A H 2 3 Y44@ M V V [ A M { Vi
5 ? El t 52 Wi 1* AL i
* 1^ 4@=b> i-sm ^ 44J
Q *i b g N %A J
4 ; V A C A i ]i T J L ^ A$v J
1 ,V;gt t izm dmgg 4 A 7
T ~ "as A ` . , @ J AVE V7;z`* I?
i i O gm ] is Aig
`- ` V An 7 F x ``^ w ` `An pb _ A ~ e h 7 U ^,
W L - N 4 A #14$A i
M A @ * i
V Z V gi A E T v?)
` pig [A$[ X A 11 44 i L V
~$ Ivd? A ^~hr4 V mg
i $ ihb 4 4 A A p LLAE
& V L Ay M #
N A if '4jv[ 1 x
` V.>gg( A QA VIV \ v4Ji`
J7* **nif if @;ibA Mb 2 W A i & $@^@ Q
% & L% J 4i @ @w@ 4%
1 A x ][A Q ii ] ijN i&i A [
V A " F
i L t WAA! ia G iLAi*
!^=p~ .4 x vvgv V A
I - G 24 @1 V `'A. p m
VAF A i
1 @A[f A 2 X X 4A i @
W gv@ I gpa 'A @ ' E
,>gn I At M i-44 ` BE 4
m m M I V OF J; { V A V qil
)~'; E` > FF a ;; M `* ga ~
LA v 6 @ YAVZ4 A V 3 ) M VY J 8 (
A! i h A 1`3A& X *
HE % At _ ` 7 @ 2 " ' 5 4 "
S 9 4,p; 5
k A V j , i v [ MY >yAA SS 4
V ` ^%jV 4wJv` J 4 4
` Q? W ppgawi V 1 . 1
\ 1 _` 7 2* I-
74 [ ^*^- Av pw
2 ~T$ 'W`` V 1 i i i
Q) H pi In as i
3 4 1 'AY @4 ~ 4, Y
1 V Er A [ M $95 V 8 i `
V ( @4 V * ) jg.: A
V Q 4 s 1 ( vv ;
44 g E 3 iii A] >;1A 9;
C; 4%%3 2 P 4`4A%yV;` 4s
` v4'Y H4:tA? K ; i4 VA 1 VAV A`=$ g `
=?h V) V
4 s iqvv i @M,ib J .l@ A S V?
wbi` i4 $; ~ Q? V t
Aii 6 4 2; EYE V )AGv
A W g 1 ) S HE Jig
1 % AM fA4LN ii O Egg?
V; V @3 ;#
` V I iv V7>:v2 V VA N
* 'id; E ' X? V %.`i:h; 'M( @6i . 7 _ ` t V} .5 '
`V V4 gvgv E1`

M Q`;`A
' 'Apq
~ $$i
___________________________________
GOCR
___________________________________
c(PICTURE)100 C. _asicus (14)
C. nasicus (_8)
C. jo_e_sjs (30)
C. co_fusoy (32)
C. coMfusoy (12)
C. su_cafu_us (_3)
C. su_cafu_us (68)
C. humey__is(_4)
C. vicoyiensis (_5)
C. p_y__lis (26)
C. vicfoyie_sÌs (54)
C. _ongi_e_s (16)
C. longi_e_s (19)
C. ve_osus (82)
C. veMos_s (166)
C. ve_osus (160)
C. ve_osus (149)
C. s__icivoyus
C.pe_lifus (131)
C. pellifus (151)
C. pellifus (192)
C. eleph_s (_18)
C. eleph_s (_13)
C. eleph_s (_16)
C. c___e (204)
C. cR_Re (205)
C. ca__e (208)
C. pyobosci_eus (58)
C. humey__is (56)
C. p_o_osci_eus (22)
C. scufel1_pjs (5)
C. nucu_ (117)
C. nucum (_43)
C. g____ium (_OO)
C. g__n_ium (gg)
Pakjsfan _p. (_1)
C. c__elli_e (49)
C. c__el_i_e (20)
__iica_ sp. (8)
C. p__yhoce_as (1)
C. p_y_hoceyas (16)
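GOCR marks each character it cannot identify with a placeholder, by default an underscore. One crude way to put a number on the listing above is therefore to count those placeholders. The Python sketch below does this for six lines copied verbatim from the plain GOCR output. The helper name `unknown_rate` is illustrative, and the sketch assumes the default `_` placeholder was not changed. It only counts characters GOCR gave up on, not characters it read wrongly.

```python
"""Rough measure of how much of the plain GOCR output was left unrecognised."""

# Assumes GOCR's default placeholder for an unidentified character, "_".
UNKNOWN = "_"

# Six lines copied verbatim from the plain GOCR listing above.
PLAIN_SAMPLE = [
    "C. nasicus (_8)",
    "C. jo_e_sjs (30)",
    "C. su_cafu_us (_3)",
    "C. humey__is(_4)",
    "C. g____ium (_OO)",
    "Pakjsfan _p. (_1)",
]


def unknown_rate(lines: list[str]) -> float:
    """Return the fraction of non-space characters that are UNKNOWN placeholders."""
    chars = [c for line in lines for c in line if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c == UNKNOWN for c in chars) / len(chars)


if __name__ == "__main__":
    rate = unknown_rate(PLAIN_SAMPLE)
    print(f"{rate:.1%} of characters unrecognised")  # 18.2% for this sample
```

Sixteen of the 88 non-space characters in this sample are placeholders. The same function returns 0.0 for any part of the database listing below, because that run contains no underscores at all. However, that run still has misreadings such as "coMfusor", so this is only a rough proxy for quality.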
_______________________
GOCR using database option
_______________________
c(PICTURE)100 C. nasicus (14)
C. nasicus (28)
C. iowensis (30)
C. confusor (32)
C. coMfusor (12)
C. sulcatulus (23)
C. sulcatulus (68)
C. humeyalis(24)
C. vicoyiensis (25)
C. paydalis (26)
C. vicforiensÌs (54)
C. longidens (16)
C. longidens (19)
C. venosus (82)
C. veMosus (166)
C. venosus (160)
C. venosus (149)
C. salicivoyus
C.pellifus (131)
C. pellitus (151)
C. pellitus (192)
C. elephas (218)
C. elephas (213)
C. elephas (216)
C. cawae (204)
C. cawae (205)
C. cawae (208)
C. pyoboscideus (58)
C. humeyalis (56)
C. proboscideus (22)
C. scufel1aris (5)
C. nucum (117)
C. nucum (243)
C. glandium (200)
C. glandium (99)
Pakistan sp. (21)
C. camelliae (49)
C. camelliae (20)
Afiican sp. (8)
C. pyrrhoceras (1)
C. pyyrhoceras (16)
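The two GOCR listings have one line per taxon in the same order, so lines can be paired up to see what the database run changed. The sketch below uses Python's `difflib.SequenceMatcher` to pull out one-for-one character substitutions from three such pairs. The choice of pairs and the `substitutions`/`tally` helpers are illustrative assumptions. This is a way of comparing the outputs, not a description of how GOCR itself works.

```python
"""Tally character substitutions between the plain and database GOCR runs."""
from collections import Counter
from difflib import SequenceMatcher

# (plain GOCR line, database GOCR line) pairs copied from the listings above.
PAIRS = [
    ("C. su_cafu_us (_3)", "C. sulcatulus (23)"),
    ("C. g____ium (_OO)", "C. glandium (200)"),
    ("C. pellifus (151)", "C. pellitus (151)"),
]


def substitutions(before: str, after: str):
    """Yield (old, new) characters for each equal-length replaced run."""
    matcher = SequenceMatcher(None, before, after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only one-for-one swaps are counted; insertions/deletions are skipped.
        if tag == "replace" and i2 - i1 == j2 - j1:
            yield from zip(before[i1:i2], after[j1:j2])


def tally(pairs: list[tuple[str, str]]) -> Counter:
    """Count every (old, new) substitution across all line pairs."""
    counts: Counter = Counter()
    for before, after in pairs:
        counts.update(substitutions(before, after))
    return counts


if __name__ == "__main__":
    for (old, new), n in tally(PAIRS).most_common():
        print(f"{old!r} -> {new!r}: {n}")
```

For these three pairs, `'_' -> 'l'` occurs three times. The database run also turns the letter O into the digit 0 in "(200)" and "f" into "t" in "pellitus". A tally like this shows only what changed between the runs, not whether the new reading is right.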
