| We are happy to announce that 1 new Written Corpus and 1 new Terminological Resource are now available in our catalogue.
ELRA-W0081 Khresmoi manually annotated reference corpus
ISLRN: 764-036-829-417-7
This corpus is a collection of Khresmoi English web documents annotated with key entities (such as diseases and drugs). The corpus is divided into two parts:
1. The initial corpus: 625 documents from the Genetics Home Reference data set, automatically annotated with anatomical locations and diseases, and manually corrected by 3-4 annotators. Size of documents: between 26 and 8,306 tokens each.
2. The main corpus: 6,950 English documents from the Khresmoi crawl and 5,518 English Wikipedia pages, automatically annotated for Anatomy, Disease, Drug and Investigation using the GATE platform. Size of documents: between 200 and 2,000 tokens each.
The corpus uses the GATE XML format.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1237
ELRA-T0375 ACL RD-TEC: A Reference Dataset for Terminology Extraction and Classification Research in Computational Linguistics
ISLRN: 699-305-362-089-6
This is a reference dataset for terminology extraction and classification research in computational linguistics. It is a set of manually annotated English terms extracted from the ACL Anthology Reference Corpus (ACL ARC). The dataset, called ACL RD-TEC, comprises more than 69,000 candidate terms, each manually annotated as valid or invalid. Valid terms are further classified as technology or non-technology terms.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1236
|