MultiVec: a Multilingual and MultiLevel Representation Learning Toolkit for NLP
ISCApad #247
Friday, January 18, 2019 by Chris Wellekens
We are happy to announce the release of our new toolkit “MultiVec” for computing continuous representations of text at different levels of granularity (single words or sequences of words). MultiVec includes the word2vec features of Mikolov et al. [2013b], the paragraph vector of Le and Mikolov [2014] (batch and online), and the model of Luong et al. [2015] for bilingual distributed representations. MultiVec also includes several distance measures between words and between sequences of words. The toolkit is written in C++ and aims to be fast (of the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: analogical reasoning, sentiment analysis, and crosslingual document classification. The toolkit also includes C++ and Python libraries that you can use to query bilingual and monolingual models.
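To make the notions of "distance between words or sequences of words" and "analogical reasoning" concrete, here is a minimal sketch in plain Python. It does not use the MultiVec library or its API: the toy vectors and the helper names (`cosine_similarity`, `sequence_vector`, `solve_analogy`) are assumptions made up for illustration, and averaging word vectors is only one simple way to represent a sequence, not necessarily the measure MultiVec implements.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# Real models learn vectors with hundreds of dimensions from a corpus.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def sequence_vector(words, vectors):
    """Represent a word sequence as the average of its known word vectors."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise ValueError("no known words in sequence")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def solve_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with the word closest to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates,
               key=lambda w: cosine_similarity(target, vectors[w]))


if __name__ == "__main__":
    # Word-level similarity.
    print(cosine_similarity(TOY_VECTORS["king"], TOY_VECTORS["queen"]))
    # Sequence-level similarity, using averaged vectors.
    royal = sequence_vector(["king", "queen"], TOY_VECTORS)
    people = sequence_vector(["man", "woman"], TOY_VECTORS)
    print(cosine_similarity(royal, people))
    # Analogical reasoning: man is to king as woman is to ?
    print(solve_analogy("man", "king", "woman", TOY_VECTORS))
```

With trained vectors loaded from an actual model, the same three helpers would apply unchanged; only the toy dictionary would be replaced.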
The project is fully open to future contributions. The code is provided on the project webpage (https://github.com/eske/multivec) with installation instructions and command-line usage examples.
When you use this toolkit, please cite:
@InProceedings{MultiVecLREC2016,
  Title     = {{MultiVec: a Multilingual and MultiLevel Representation Learning Toolkit for NLP}},
  Author    = {Alexandre Bérard and Christophe Servan and Olivier Pietquin and Laurent Besacier},
  Booktitle = {The 10th edition of the Language Resources and Evaluation Conference (LREC 2016)},
  Year      = {2016},
  Month     = {May}
}
The paper is available here: https://github.com/eske/multivec/raw/master/docs/Berard_and_al-MultiVec_a_Multilingual_and_Multilevel_Representation_Learning_Toolkit_for_NLP-LREC2016.pdf
Best regards,
Alexandre Bérard, Christophe Servan, Olivier Pietquin and Laurent Besacier