Appen Butler Hill
A global leader in linguistic technology solutions 
RECENT CATALOG ADDITIONS—MARCH 2012 
1. Speech Databases 
1.1 Telephony 
Language | Database Type | Catalogue Code | Speakers | Status
Bahasa Indonesia | Conversational | BAH_ASR001 | 1,002 | Available
Bengali | Conversational | BEN_ASR001 | 1,000 | Available
Bulgarian | Conversational | BUL_ASR001 | 217 | Available shortly
Croatian | Conversational | CRO_ASR001 | 200 | Available shortly
Dari | Conversational | DAR_ASR001 | 500 | Available
Dutch | Conversational | NLD_ASR001 | 200 | Available
Eastern Algerian Arabic | Conversational | EAR_ASR001 | 496 | Available
English (UK) | Conversational | UKE_ASR001 | 1,150 | Available
Farsi/Persian | Scripted | FAR_ASR001 | 789 | Available
Farsi/Persian | Conversational | FAR_ASR002 | 1,000 | Available
French (EU) | Conversational | FRF_ASR001 | 563 | Available
French (EU) | Voicemail | FRF_ASR002 | 550 | Available
German | Voicemail | DEU_ASR002 | 890 | Available
Hebrew | Conversational | HEB_ASR001 | 200 | Available shortly
Italian | Conversational | ITA_ASR003 | 200 | Available shortly
Italian | Voicemail | ITA_ASR004 | 550 | Available
Kannada | Conversational | KAN_ASR001 | 1,000 | In development
Pashto | Conversational | PAS_ASR001 | 967 | Available
Portuguese (EU) | Conversational | PTP_ASR001 | 200 | Available shortly
Romanian | Conversational | ROM_ASR001 | 200 | Available shortly
Russian | Conversational | RUS_ASR001 | 200 | Available
Somali | Conversational | SOM_ASR001 | 1,000 | Available
Spanish (EU) | Voicemail | ESO_ASR002 | 500 | Available
Turkish | Conversational | TUR_ASR001 | 200 | Available
Urdu | Conversational | URD_ASR001 | 1,000 | Available
 
1.2 Wideband
Language | Database Type | Catalogue Code | Speakers | Status
English (US) | Studio | USE_ASR001 | 200 | Available
French (Canadian) | Home/Office | FRC_ASR002 | 120 | Available
German | Studio | DEU_ASR001 | 127 | Available
Thai | Home/Office | THA_ASR001 | 100 | Available
Korean | Home/Office | KOR_ASR001 | 100 | Available
 
2. Pronunciation Lexica 
Appen Butler Hill has considerable experience in providing a variety of lexicon types. These include:
• Pronunciation Lexica providing phonemic representation, syllabification, and stress (primary and secondary as appropriate)
• Part-of-speech tagged Lexica providing grammatical and semantic labels
• Other reference text-based materials, including spelling/misspelling lists, spell-check dictionaries, mappings of colloquial language to standard forms, and orthographic normalization lists
Over a period of 15 years, Appen Butler Hill has generated a significant volume of licensable material for a wide range of languages. For holdings information in a given language or to discuss any customized development efforts, please contact: sales@appenbutlerhill.com 
3. Named Entity Corpora
These NER Corpora contain text material from a variety of sources and are tagged for the following Named Entities: Person, Organization, Location, Nationality, Religion, Facility, Geo-Political Entity, Titles, Quantities.
Language | Catalogue Code | Words
Arabic | ARB_NER001 | 500,000
English | ENI_NER001 | 500,000
Farsi/Persian | FAR_NER001 | 500,000
Korean | KOR_NER001 | 500,000
Japanese | JPY_NER001 | 500,000
Russian | RUS_NER001 | 500,000
Mandarin | MAN_NER001 | 500,000
Urdu | URD_NER001 | 500,000
 
4. Other Language Resources
• Morphological Analyzers – Farsi/Persian & Urdu
• Arabic Thesaurus
• Language Analysis Documentation – multiple languages
  
For additional information on these resources, please contact: sales@appenbutlerhill.com 
5. Customized Requests and Package Configurations 
Appen Butler Hill is committed to providing a low-risk, high-quality, reliable solution and has worked in 130+ languages to date, supporting both large global corporations and government organizations.
We would be glad to discuss any customized requests or package configurations and prepare a customized proposal to meet your needs.
6. Contact Information
Prithivi Pradeep
Business Development Manager
ppradeep@appenbutlerhill.com
+61 2 9468 6370

Tom Dibert
Vice President, Business Development, North America
tdibert@appenbutlerhill.com
+1-315-339-6165

www.appenbutlerhill.com