Appen Butler Hill
A global leader in linguistic technology solutions
RECENT CATALOG ADDITIONS—MARCH 2012
1. Speech Databases
1.1 Telephony
Language | Database Type | Catalogue Code | Speakers | Status
Bahasa Indonesia | Conversational | BAH_ASR001 | 1,002 | Available
Bengali | Conversational | BEN_ASR001 | 1,000 | Available
Bulgarian | Conversational | BUL_ASR001 | 217 | Available shortly
Croatian | Conversational | CRO_ASR001 | 200 | Available shortly
Dari | Conversational | DAR_ASR001 | 500 | Available
Dutch | Conversational | NLD_ASR001 | 200 | Available
Eastern Algerian Arabic | Conversational | EAR_ASR001 | 496 | Available
English (UK) | Conversational | UKE_ASR001 | 1,150 | Available
Farsi/Persian | Scripted | FAR_ASR001 | 789 | Available
Farsi/Persian | Conversational | FAR_ASR002 | 1,000 | Available
French (EU) | Conversational | FRF_ASR001 | 563 | Available
French (EU) | Voicemail | FRF_ASR002 | 550 | Available
German | Voicemail | DEU_ASR002 | 890 | Available
Hebrew | Conversational | HEB_ASR001 | 200 | Available shortly
Italian | Conversational | ITA_ASR003 | 200 | Available shortly
Italian | Voicemail | ITA_ASR004 | 550 | Available
Kannada | Conversational | KAN_ASR001 | 1,000 | In development
Pashto | Conversational | PAS_ASR001 | 967 | Available
Portuguese (EU) | Conversational | PTP_ASR001 | 200 | Available shortly
Romanian | Conversational | ROM_ASR001 | 200 | Available shortly
Russian | Conversational | RUS_ASR001 | 200 | Available
Somali | Conversational | SOM_ASR001 | 1,000 | Available
Spanish (EU) | Voicemail | ESO_ASR002 | 500 | Available
Turkish | Conversational | TUR_ASR001 | 200 | Available
Urdu | Conversational | URD_ASR001 | 1,000 | Available
1.2 Wideband
Language | Database Type | Catalogue Code | Speakers | Status
English (US) | Studio | USE_ASR001 | 200 | Available
French (Canadian) | Home/Office | FRC_ASR002 | 120 | Available
German | Studio | DEU_ASR001 | 127 | Available
Thai | Home/Office | THA_ASR001 | 100 | Available
Korean | Home/Office | KOR_ASR001 | 100 | Available
2. Pronunciation Lexica
Appen Butler Hill has considerable experience in providing a variety of lexicon types. These include:
Pronunciation Lexica providing phonemic representation, syllabification, and stress (primary and secondary as appropriate)
Part-of-speech tagged Lexica providing grammatical and semantic labels
Other reference text-based materials, including spelling/misspelling lists, spell-check dictionaries, mappings of colloquial language to standard forms, and orthographic normalization lists
Over a period of 15 years, Appen Butler Hill has generated a significant volume of licensable material for a wide range of languages. For holdings information in a given language or to discuss any customized development efforts, please contact: sales@appenbutlerhill.com
3. Named Entity Corpora
These NER corpora contain text material from a variety of sources and are tagged for the following Named Entities: Person, Organization, Location, Nationality, Religion, Facility, Geo-Political Entity, Titles, Quantities.
Language | Catalogue Code | Words
Arabic | ARB_NER001 | 500,000
English | ENI_NER001 | 500,000
Farsi/Persian | FAR_NER001 | 500,000
Korean | KOR_NER001 | 500,000
Japanese | JPY_NER001 | 500,000
Russian | RUS_NER001 | 500,000
Mandarin | MAN_NER001 | 500,000
Urdu | URD_NER001 | 500,000
4. Other Language Resources
Morphological Analyzers – Farsi/Persian & Urdu
Arabic Thesaurus
Language Analysis Documentation – multiple languages
For additional information on these resources, please contact: sales@appenbutlerhill.com
5. Customized Requests and Package Configurations
Appen Butler Hill is committed to providing low-risk, high-quality, reliable solutions and has worked in 130+ languages to date, supporting both large global corporations and government organizations.
We would be glad to discuss any customized requests or package configurations and prepare a customized proposal to meet your needs.
6. Contact Information
Prithivi Pradeep
Business Development Manager
ppradeep@appenbutlerhill.com
+61 2 9468 6370

Tom Dibert
Vice President, Business Development, North America
tdibert@appenbutlerhill.com
+1-315-339-6165

www.appenbutlerhill.com