ISCApad #188
Sunday, February 09, 2014 by Chris Wellekens
In this newsletter:
LDC Membership Discounts for MY 2014 Still Available
New publications:
CALLFRIEND Farsi Second Edition Speech
CALLFRIEND Farsi Second Edition Transcripts
LDC Membership Discounts for MY 2014 Still Available
If you are considering joining LDC for Membership Year 2014 (MY2014), there is still time to save on membership fees. Any organization that joins or renews membership for 2014 through Monday, March 3, 2014, is entitled to a 5% discount on membership fees. Organizations that held membership for MY2013 can receive a 10% discount on fees provided they renew prior to March 3, 2014. For further information on pricing, please view our Invitation to Join for Membership Year 2014 announcement or contact LDC.
New Publications
(1) CALLFRIEND Farsi Second Edition Speech was developed by LDC and consists of approximately 42 hours of telephone conversation (100 recordings) among native Farsi speakers. The calls were recorded in 1995 and 1996 as part of the CALLFRIEND collection, a project designed primarily to support research in automatic language identification. One hundred native Farsi speakers living in the continental United States each made a single telephone call, lasting up to 30 minutes, to a family member or friend living in the United States.
This release represents all calls from the collection. LDC released recordings from 60 calls without transcripts in 1996 as CALLFRIEND Farsi (LDC96S50) after 20 of those calls were used as evaluation data in the first NIST Language Recognition Evaluation (LRE).
Corresponding transcripts are available in CALLFRIEND Farsi Second Edition Transcripts (LDC2014T01).
All recordings involved domestic calls routed through LDC’s automated telephone collection platform and were stored as 2-channel (4-wire), 8-kHz mu-law samples taken directly from the public telephone network via a T-1 circuit. Each audio file is a FLAC-compressed MS-WAV (RIFF) file containing 2-channel, 8-kHz, 16-bit PCM sample data.
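The calls were captured as 8-bit mu-law samples but are delivered as 16-bit linear PCM. To illustrate what that conversion involves, here is a minimal Python sketch of standard ITU-T G.711 mu-law expansion. It is not LDC's conversion tool, and the function names are hypothetical.

```python
def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code to a signed linear PCM sample."""
    code = ~code & 0xFF              # mu-law codes are stored bit-inverted
    sign = code & 0x80               # top bit: sign
    exponent = (code >> 4) & 0x07    # 3-bit segment number
    mantissa = code & 0x0F           # 4-bit step within the segment
    # Rebuild the magnitude with the G.711 bias (0x84 = 132), then remove it.
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_ulaw(data: bytes) -> list[int]:
    """Decode a buffer of mu-law bytes into linear PCM sample values."""
    return [ulaw_to_linear(b) for b in data]


if __name__ == "__main__":
    print(decode_ulaw(bytes([0xFF, 0x80, 0x00])))  # [0, 32124, -32124]
```

Both 0xFF and 0x7F decode to 0, because mu-law has a positive and a negative code for zero. The largest magnitude it can represent is 32124, which fits within the 16-bit PCM range.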
This release also includes metadata for each call, including speaker gender, the number of speakers on each channel and call duration.
CALLFRIEND Farsi Second Edition Speech is distributed on one DVD-ROM.
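The stated audio parameters can be used to estimate the amount of raw sample data involved. The following Python sketch does that arithmetic for the approximate 42-hour total, both for 16-bit PCM and for the 8-bit mu-law originally captured. It ignores WAV headers and FLAC compression, and the constant and function names are illustrative only.

```python
HOURS = 42              # approximate total duration stated above
SAMPLE_RATE_HZ = 8_000  # 8-kHz sampling
CHANNELS = 2            # 2-channel (4-wire) recordings
PCM_BYTES = 2           # 16-bit linear PCM, as distributed
ULAW_BYTES = 1          # 8-bit mu-law, as originally captured


def raw_size_bytes(hours: float, bytes_per_sample: int) -> int:
    """Size of uncompressed audio samples, excluding any file headers."""
    seconds = round(hours * 3600)
    return seconds * SAMPLE_RATE_HZ * CHANNELS * bytes_per_sample


if __name__ == "__main__":
    pcm = raw_size_bytes(HOURS, PCM_BYTES)
    ulaw = raw_size_bytes(HOURS, ULAW_BYTES)
    print(f"16-bit PCM: {pcm:,} bytes (~{pcm / 1e9:.2f} GB)")
    print(f"8-bit mu-law: {ulaw:,} bytes (~{ulaw / 1e9:.2f} GB)")
```

With 42 hours this gives 4,838,400,000 bytes (about 4.84 GB) of 16-bit PCM, or half that for 8-bit mu-law. These are rough figures, since the 42-hour total is itself approximate.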
*
(2) CALLFRIEND Farsi Second Edition Transcripts was developed by LDC and consists of transcripts for approximately 42 hours of telephone conversation (100 recordings) among native Farsi speakers. The calls were recorded in 1995 and 1996 as part of the CALLFRIEND collection, a project designed primarily to support research in automatic language identification. One hundred native Farsi speakers living in the continental United States each made a single telephone call, lasting up to 30 minutes, to a family member or friend living in the United States.
Corresponding speech data is available as CALLFRIEND Farsi Second Edition Speech (LDC2014S01).
Transcripts are presented in three formats: romanized transcripts (*asc.txt), Arabic-script transcripts (*ntv.txt) and a simple XML format (*.xml) that contains both the romanized and the Arabic-script forms of Farsi words. In the *.txt files, the four main fields on each line (start-offset, end-offset, speaker-label, transcript-text) are separated by tabs. Each file begins with a single comment line containing the file_id string. This line is followed immediately by the time-stamped segments, ordered by start-offset, with no blank lines.
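The *.txt layout can be read with a short parser. What follows is a minimal Python sketch under several assumptions not stated in the release notes. The comment line is assumed to start with "#". The offsets are assumed to be decimal seconds. The files are assumed to be UTF-8. The names Segment, parse_transcript and load_transcript, and the sample file_id and speaker labels in the demo, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # start-offset (assumed to be in seconds)
    end: float    # end-offset (assumed to be in seconds)
    speaker: str  # speaker-label
    text: str     # transcript-text


def parse_transcript(content: str) -> tuple[str, list[Segment]]:
    """Parse one *asc.txt or *ntv.txt transcript into (file_id, segments)."""
    lines = content.splitlines()
    if not lines:
        raise ValueError("empty transcript")
    # First line: a single comment holding the file_id. The leading '#'
    # marker is an assumption; adjust it if the files use another marker.
    file_id = lines[0].lstrip("#").strip()
    segments = []
    for lineno, line in enumerate(lines[1:], start=2):
        # Split on the first three tabs only, so the text field stays whole.
        fields = line.split("\t", 3)
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 tab-separated fields")
        start, end, speaker, text = fields
        segments.append(Segment(float(start), float(end), speaker, text))
    # The format lists segments in start-offset order; check that it holds.
    if any(a.start > b.start for a, b in zip(segments, segments[1:])):
        raise ValueError("segments are not ordered by start-offset")
    return file_id, segments


def load_transcript(path: str) -> tuple[str, list[Segment]]:
    # UTF-8 is an assumption, especially for the Arabic-script *ntv.txt files.
    with open(path, encoding="utf-8") as f:
        return parse_transcript(f.read())


if __name__ == "__main__":
    # Hypothetical two-segment transcript in the layout described above.
    sample = "# fa_0001\n0.00\t2.35\tA\tsalam\n2.40\t4.10\tB\tsalam, chetori?\n"
    file_id, segments = parse_transcript(sample)
    print(file_id, len(segments), segments[1].speaker)
```

The parser rejects malformed lines rather than skipping them. A problem in a file therefore shows up immediately instead of producing a silently shortened transcript.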
CALLFRIEND Farsi Second Edition Transcripts is distributed via web download. 2014 Subscription Members will automatically receive two copies of this data on disc. 2014 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$1000.