We are happy to announce that one new Written Corpus and one new Terminological Resource are now available in our catalogue.
ELRA-W0081 Khresmoi manually annotated reference corpus
ISLRN: 764-036-829-417-7
 This corpus is a collection of English web documents from the Khresmoi project, annotated with key entities such as diseases and drugs. The corpus is divided into two parts:
 1. The initial corpus: 625 documents from the Genetics Home Reference data set, automatically annotated with anatomical locations and diseases, then manually corrected by 3 to 4 annotators. Document size: 26 to 8,306 tokens.
 2. The main corpus: 6,950 English documents from the Khresmoi crawl and 5,518 English Wikipedia pages, automatically annotated with the GATE platform for four entity types: Anatomy, Disease, Drug and Investigation. Document size: 200 to 2,000 tokens.
 The corpus is distributed in GATE XML format.
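 For readers unfamiliar with GATE XML, the following is a minimal sketch, using only Python's standard library, of how entity annotations could be extracted from such a file. It assumes the usual GATE XML layout: a `TextWithNodes` element whose `Node` elements mark character offsets in the text, followed by `AnnotationSet` elements containing `Annotation` elements with `Type`, `StartNode` and `EndNode` attributes. The function `read_gate_xml`, the sample sentence and the annotation set name `Khresmoi` are hypothetical and not taken from the corpus itself, so please check the distributed files before relying on this.

```python
import xml.etree.ElementTree as ET


def read_gate_xml(xml_string):
    """Return (text, annotations) from a GATE XML string.

    Hypothetical helper: assumes the standard GATE XML layout described above.
    Each annotation is returned as (annotation_set_name, type, covered_text).
    """
    root = ET.fromstring(xml_string)
    text_el = root.find("TextWithNodes")

    # Rebuild the plain text and record the character offset of every Node.
    parts = [text_el.text or ""]
    pos = len(parts[0])
    offsets = {}
    for node in text_el.findall("Node"):
        offsets[node.get("id")] = pos  # offset where this node sits
        tail = node.tail or ""         # text following the node
        parts.append(tail)
        pos += len(tail)
    text = "".join(parts)

    # Collect annotations from every annotation set (the default set has no Name).
    annotations = []
    for aset in root.findall("AnnotationSet"):
        set_name = aset.get("Name", "")
        for ann in aset.findall("Annotation"):
            start = offsets[ann.get("StartNode")]
            end = offsets[ann.get("EndNode")]
            annotations.append((set_name, ann.get("Type"), text[start:end]))
    return text, annotations


# Hypothetical sample document, not taken from the corpus.
SAMPLE = """<GateDocument version="3">
<GateDocumentFeatures/>
<TextWithNodes><Node id="0"/>Aspirin<Node id="7"/> relieves <Node id="17"/>headache<Node id="25"/>.</TextWithNodes>
<AnnotationSet Name="Khresmoi">
<Annotation Id="1" Type="Drug" StartNode="0" EndNode="7"/>
<Annotation Id="2" Type="Disease" StartNode="17" EndNode="25"/>
</AnnotationSet>
</GateDocument>"""

if __name__ == "__main__":
    text, anns = read_gate_xml(SAMPLE)
    print(text)
    for set_name, ann_type, covered in anns:
        print(set_name, ann_type, covered)
```

 Offsets are recomputed from the text rather than read directly from the `Node` ids, which keeps the sketch working even if a file's node ids do not match character positions.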
 For more information, see: http://catalog.elra.info/product_info.php?products_id=1237
 
   
ELRA-T0375 ACL RD-TEC: A Reference Dataset for Terminology Extraction and Classification Research in Computational Linguistics
ISLRN: 699-305-362-089-6
 This is a reference dataset for terminology extraction and classification research in computational linguistics. It consists of manually annotated English terms extracted from the ACL Anthology Reference Corpus (ACL ARC). The dataset, called ACL RD-TEC, comprises more than 69,000 candidate terms, each manually annotated as a valid or invalid term. Valid terms are further classified as technology or non-technology terms.
 For more information, see: http://catalog.elra.info/product_info.php?products_id=1236
 