ISCA - International Speech
Communication Association



ISCApad #186

Tuesday, December 10, 2013 by Chris Wellekens

5-2-3 LDC Newsletter (November 2013)
  

 

     
In this newsletter:

  - Invitation to Join for Membership Year (MY) 2014
  - Spring 2014 LDC Data Scholarship Program
  - LDC to Close for Thanksgiving Break

New publications:

  - Chinese Treebank 8.0
  - CSC Deceptive Speech
     

                       



       
   

Invitation to Join for Membership Year (MY) 2014

       
Membership Year (MY) 2014 is open for joining! We invite all current and previous members of LDC to renew their membership, and we welcome new organizations to join the Consortium. For MY2014, LDC is pleased to maintain membership fees at last year's rates: membership fees will not increase. Additionally, LDC will extend discounts on membership fees to members who keep their membership current and who join early in the year.

The details of our early renewal discounts for MY2014 are as follows:

· Organizations that joined for MY2013 will receive a 5% discount when renewing. This discount will apply throughout 2014, regardless of time of renewal. MY2013 members renewing before Monday, March 3, 2014 will receive an additional 5% discount, for a total 10% discount off the membership fee.

· New members, as well as organizations that did not join for MY2013 but held membership in any of the previous MYs (1993-2012), will also be eligible for a 5% discount provided that they join/renew before March 3, 2014.
       
The following table provides exact pricing information.

                                 MY2014 Fee    MY2014 Fee with    MY2014 Fee with
                                               5% Discount*       10% Discount**
Not-for-Profit / US Government
  Standard                       US$2400       US$2280            US$2160
  Subscription                   US$3850       US$3658            US$3465
For-Profit
  Standard                       US$24000      US$22800           US$21600
  Subscription                   US$27500      US$26125           US$24750

*  For new members, MY2013 Members renewing for MY2014, and any previous year Member who renews before March 3, 2014

** For MY2013 Members renewing before March 3, 2014
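The discount rules and the published fees can be cross-checked with a little arithmetic. The Python sketch below is illustrative only and is not an LDC tool: the function names and the round-half-up rule are assumptions, the latter chosen because it reproduces the US$3658 figure (95% of US$3850 is US$3657.50).

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

# Early-renewal cutoff stated in the newsletter: "before Monday, March 3, 2014".
EARLY_DEADLINE = date(2014, 3, 3)


def discount_percent(was_my2013_member: bool, join_date: date) -> int:
    """Return the MY2014 discount in percent under the rules described above."""
    early = join_date < EARLY_DEADLINE
    if was_my2013_member:
        # 5% at any time during 2014, plus 5% more for renewing early.
        return 10 if early else 5
    # New members and members from MY1993-MY2012 get 5% only when early.
    return 5 if early else 0


def discounted_fee(base_fee: int, percent: int) -> int:
    """Apply a percentage discount, rounding half up to whole dollars (assumed rule)."""
    fee = Decimal(base_fee) * (Decimal(100) - Decimal(percent)) / Decimal(100)
    return int(fee.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    for base in (2400, 3850, 24000, 27500):
        print(base, discounted_fee(base, 5), discounted_fee(base, 10))
```

Using `Decimal` rather than floating point keeps the half-dollar case exact, so the rounding choice is explicit instead of depending on binary representation.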
       
       
Publications for MY2014 are still being planned; here are the working titles of data sets we intend to provide:

· 2009 NIST Language Recognition Evaluation
· MADCAT Phase 4 Training
· Callfriend Farsi Speech and Transcripts
· MALACH Czech ASR
· GALE data – all phases and tasks
· NIST OpenMT Five Language Progress Set
· Hispanic-English Speech

                 
           


               
               

         

   


     
In addition to receiving new publications, current year members of LDC also enjoy the benefit of licensing older data at reduced costs; current year for-profit members may use most data for commercial applications.

   


     
Spring 2014 LDC Data Scholarship Program

         
Applications are now being accepted through Wednesday, January 15, 2014, 11:59PM EST for the Spring 2014 LDC Data Scholarship program! The LDC Data Scholarship program provides university students with access to LDC data at no cost. During previous program cycles, LDC has awarded no-cost copies of LDC data to over 35 individual students and student research groups.
         
This program is open to students pursuing both undergraduate and graduate studies in an accredited college or university. LDC Data Scholarships are not restricted to any particular field of study; however, students must demonstrate a well-developed research agenda and a bona fide inability to pay. The selection process is highly competitive.

The application consists of two parts:

(1) Data Use Proposal. Applicants must submit a proposal describing their intended use of the data. The proposal should state which data the student plans to use and how the data will benefit their research project, and should include information on the proposed methodology or algorithm.

Applicants should consult the LDC Catalog for a complete list of data distributed by LDC. Due to certain restrictions, a handful of LDC corpora are available only to members of the Consortium. Applicants are advised to select a maximum of one to two datasets; students may apply for additional datasets during the following cycle once they have completed processing of the initial datasets and published or presented work in some juried venue.

(2) Letter of Support. Applicants must submit one letter of support from their thesis adviser or department chair. The letter must verify the student's need for data and confirm that the department or university lacks the funding to pay the full Non-member Fee for the data or to join the Consortium.

For further information on application materials and program rules, please visit the LDC Data Scholarship page.

Students can email their applications to the LDC Data Scholarship program. Decisions will be sent by email from the same address.

The deadline for the Spring 2014 program cycle is January 15, 2014, 11:59PM EST.
           
         
LDC to Close for Thanksgiving Break

LDC will be closed on Thursday, November 28, 2013 and Friday, November 29, 2013 in observance of the US Thanksgiving Holiday. Our offices will reopen on Monday, December 2, 2013.
       
       

   


        New publications
       
(1) Chinese Treebank 8.0 consists of approximately 1.5 million words of annotated and parsed text from Chinese newswire, government documents, magazine articles, various broadcast news and broadcast conversation programs, web newsgroups and weblogs.

   

The Chinese Treebank project began at the University of Pennsylvania in 1998, continued at the University of Colorado and then moved to Brandeis University. The project's goal is to provide a large, part-of-speech tagged and fully bracketed Chinese language corpus. The first delivery, Chinese Treebank 1.0, contained 100,000 syntactically annotated words from Xinhua News Agency newswire. It was later corrected and released in 2001 as Chinese Treebank 2.0 (LDC2001T11) and consisted of approximately 100,000 words. The LDC released Chinese Treebank 4.0 (LDC2004T05), an updated version containing roughly 400,000 words, in 2004. A year later, LDC published the 500,000-word Chinese Treebank 5.0 (LDC2005T01). Chinese Treebank 6.0 (LDC2007T36), released in 2007, consisted of 780,000 words. Chinese Treebank 7.0 (LDC2010T08), released in 2010, added new annotated newswire data, broadcast material and web text, bringing the total to approximately one million words. Chinese Treebank 8.0 adds new annotated data from newswire, magazine articles and government documents.

   

There are 3,007 text files in this release, containing 71,369 sentences, 1,620,561 words and 2,589,848 characters (hanzi or foreign). The data is provided in UTF-8 encoding, and the annotation has Penn Treebank-style labeled brackets. Details of the annotation standard can be found in the segmentation, POS-tagging and bracketing guidelines included in the release. The data is provided in four different formats: raw text, word segmented, POS-tagged, and syntactically bracketed. All files were automatically verified and manually checked.
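For readers unfamiliar with Penn Treebank-style labeled brackets, the Python sketch below shows one way such a tree could be read and reduced to word/POS pairs. It is a minimal illustration under stated assumptions: the example sentence is invented rather than taken from the corpus, the function names are hypothetical, and the actual release files may carry additional markup that this sketch does not handle.

```python
import re

# A token is an opening bracket, a closing bracket, or a run of other characters.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a word under a POS tag
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the first tree")
    return tree


def tagged_words(tree):
    """Collect (word, POS tag) pairs from the leaves, left to right."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(tagged_words(child))
    return pairs


# Invented example in the bracketed style; not taken from Chinese Treebank 8.0.
EXAMPLE = "(IP (NP (NN 经济)) (VP (VV 发展)))"

if __name__ == "__main__":
    print(tagged_words(parse_tree(EXAMPLE)))
```

The same tree therefore yields the word-segmented and POS-tagged views by reading off its leaves, which is why the bracketed format is the richest of the four.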

   

Chinese Treebank 8.0 is distributed via web download.

2013 Subscription Members will automatically receive two copies of this data on disc. 2013 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$300.
       
       
       

   

*
       
       

   


(2) CSC Deceptive Speech was developed by Columbia University, SRI International and University of Colorado Boulder. It consists of 32 hours of audio interviews from 32 native speakers of Standard American English (16 male, 16 female) recruited from the Columbia University student population and the community. The purpose of the study was to distinguish deceptive speech from non-deceptive speech using machine learning techniques on features extracted from the corpus.

   

The participants were told that they were participating in a communication experiment which sought to identify people who fit the profile of the top entrepreneurs in America. To this end, the participants performed tasks and answered questions in six areas. They were later told that they had received low scores in some of those areas and did not fit the profile. The subjects then participated in an interview where they were told to convince the interviewer that they had actually achieved high scores in all areas and that they did indeed fit the profile. The task of the interviewer was to determine how he thought the subjects had actually performed, and he was allowed to ask them any questions other than those that were part of the performed tasks. For each question from the interviewer, subjects were asked to indicate whether the reply was true or contained any false information by pressing one of two pedals hidden from the interviewer under a table.

   

Interviews were conducted in a double-walled sound booth and recorded to digital audio tape on two channels using Crown CM311A Differoid headworn close-talking microphones, then downsampled to 16 kHz before processing.

   

The interviews were orthographically transcribed by hand using the NIST EARS transcription guidelines. Labels for local lies were obtained automatically from the pedal-press data and hand-corrected for alignment, and labels for global lies were annotated during transcription based on the known scores of the subjects versus their reported scores. The orthographic transcription was force-aligned using the SRI telephone speech recognizer adapted for full-bandwidth recordings. There are several segmentations associated with the corpus: the implicit segmentation of the pedal presses; semi-automatically derived sentence-like units (EARS SLASH-UNITS or SUs), which were hand labeled; intonational phrase units; and the units corresponding to each topic of the interview.

   

CSC Deceptive Speech is distributed on 1 DVD-ROM.

2013 Subscription Members will automatically receive two copies of this data provided they have completed and returned the User License Agreement for CSC Deceptive Speech (LDC2013S09). 2013 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$1000.

   

 

  

 


   

      

