ISCA Services

Impact LUE Open Language and Knowledge for Citizens - OLKi
Application for a PhD grant 2018 co-supervised by the Crem and the Loria
"Online hate speech against migrants"

Deadline to apply: May 1st, 2018

According to the 2017 International Migration Report (United Nations, 2018), the number of international
migrants worldwide has continued to grow rapidly in recent years, reaching 258 million in 2017, up from 220
million in 2010 and 173 million in 2000. In 2017, 64 per cent of all international migrants worldwide
(equal to 165 million international migrants) lived in high-income countries; 78 million of them were
residing in Europe. Since 2000, Germany and France have figured among the countries hosting the largest
number of international migrants. A key reason why EU leaders have found it difficult to take a decisive and
coherent approach to the refugee crisis has been the high level of public anxiety about immigration
and asylum across Europe. Indeed, across the EU, attitudes towards asylum and immigration have
hardened in recent years because of (Berri et al., 2015): (i) the increase in the number and visibility of
migrants, (ii) the economic crisis and austerity policies enacted since the 2008 Global
Financial Crisis, and (iii) the role of the mass media in influencing public and elite political attitudes towards
asylum and migration. Refugees and migrants tend to be framed negatively as a problem, potentially
nourishing.

Indeed, the BRICkS (Building Respect on the Internet by Combating Hate Speech) EU project [1]
has revealed a significant increase in the use of hate speech towards immigrants and minorities, who
are often blamed for current economic and social problems. The participatory web and
social media seem to accelerate this tendency, which is accentuated by the rapid online spread of fake news
that often lends support to online violence towards migrants. Based on existing research, Carla Schieb and
Mike Preuss (2016) highlight that hate speech deepens prejudice and stereotypes in a society (Citron &
Norton, 2011). It also has a detrimental effect on the mental health and emotional well-being of targeted
groups, and especially of targeted individuals (Festl & Quandt, 2013), and is a source of harm in general for
those under attack (Waldron, 2012), particularly when it culminates in violent acts incited by hateful speech. Such
violent hate crimes may erupt in the aftermath of certain key events, e.g. anti-Muslim hate crimes in
response to the 9/11 terrorist attacks (King & Sutton, 2013).

Hate speech and fake news are not, of course, just problems of our times. Hate speech has always
been part of antisocial behavior such as bullying or stalking (Delgado & Stefancic, 2014); "trapped",
emotional, unverified and/or biased contents have always existed (Dauphin, 2002; Froissart, 2002, 2004;
Lebre, 2014) and need to be understood on an anthropological level as reflections of people's fears,
anxieties or fantasies. They reveal what Marc Angenot calls a certain "state of society" (Angenot, 1978;
1989; 2006). Indeed, according to this author, the analysis of specific situated discourses sheds light on some
of the topoi (common premises and patterns) that characterize public doxa. This "gnoseological"
perspective reveals the ways in which visions of the "world" can be systematically schematized in linguistic
materials at a given moment.

Within this context, the PhD project jointly proposed by the Crem and the Loria
aims to analyse hate speech towards migrants in social media and more particularly on Twitter.
It seeks to provide answers to the following questions:
- What are the representations of migrants as they emerge in hate speech on Twitter?
- What themes are they associated with?
- What can the latter tell us about the "state" of our society, in the sense previously given to this
  term by Marc Angenot?

[1] http://www.bricks-project.eu/wp/about-the-project/

Secondary questions will also be addressed in order to refine the main results:
- What is the origin of these messages? (individual accounts, political party accounts, bots, etc.)
- What is the circulation of these messages? (reactions, retweets, interactions, etc.)
- Can we measure the emotional dimension of these messages? Based on which indicators?
- Can a scale be established to measure the intensity of hate in speech?

More and more audio, video and text content appears on the Internet each day. About 300 hours of multimedia are
uploaded per minute. For such multimedia sources, manual content retrieval is difficult or impossible.
The classical approach to spoken content retrieval from multimedia documents is automatic text
retrieval, and automatic text classification is one of the technologies widely used for this purpose.
In text classification, text documents are usually represented in a so-called vector space and then
assigned to predefined classes through supervised machine learning. Each document is represented as a
numerical vector, which is computed from the words of the document. How to represent
the terms numerically in an appropriate way is a basic problem in text classification tasks and directly affects
classification accuracy. Sometimes, in text classification, the classes cannot be defined in advance. In
this case, unsupervised machine learning is used, and the challenge consists in finding underlying
structures in unlabeled data. We will use these methodologies to perform one important text
classification task: automatic hate speech detection.
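
As an illustration of the vector space representation and supervised classification described above, the following minimal Python sketch turns each document into a TF-IDF weighted vector and assigns a new message to the class whose centroid is closest in cosine similarity. It is only a sketch under stated assumptions, not the method of the PhD project: the tokenizer, the TF-IDF weighting, the nearest-centroid rule, the toy sentences and the label names "hostile" and "neutral" are all hypothetical choices made for this example.

```python
import math
from collections import Counter

# Toy training data; sentences and label names are hypothetical illustrations.
TRAIN_DOCS = [
    "they should all go away we hate them",
    "we hate them they ruin everything",
    "the museum opens a new exhibition today",
    "the train to the city leaves today",
]
TRAIN_LABELS = ["hostile", "hostile", "neutral", "neutral"]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf_vector(text, vocabulary, idf):
    """Represent one document as a numerical vector of TF-IDF weights."""
    counts = Counter(w for w in tokenize(text) if w in vocabulary)
    vector = [0.0] * len(vocabulary)
    for word, count in counts.items():
        vector[vocabulary[word]] = count * idf[word]
    return vector


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train_centroids(documents, labels):
    """Supervised step: compute one mean TF-IDF vector per class."""
    words = sorted({w for doc in documents for w in tokenize(doc)})
    vocabulary = {w: i for i, w in enumerate(words)}
    n_docs = len(documents)
    doc_freq = Counter(w for doc in documents for w in set(tokenize(doc)))
    # Smoothed inverse document frequency (one common variant among several).
    idf = {w: math.log((1 + n_docs) / (1 + doc_freq[w])) + 1.0 for w in vocabulary}
    sums, sizes = {}, Counter(labels)
    for doc, label in zip(documents, labels):
        acc = sums.setdefault(label, [0.0] * len(vocabulary))
        for i, x in enumerate(tf_idf_vector(doc, vocabulary, idf)):
            acc[i] += x
    centroids = {lab: [x / sizes[lab] for x in acc] for lab, acc in sums.items()}
    return vocabulary, idf, centroids


def classify(text, vocabulary, idf, centroids):
    """Assign the class whose centroid is most similar to the document vector."""
    vector = tf_idf_vector(text, vocabulary, idf)
    return max(centroids, key=lambda lab: cosine(vector, centroids[lab]))


MODEL = train_centroids(TRAIN_DOCS, TRAIN_LABELS)
```

Calling classify("we hate them", *MODEL) assigns the message to the "hostile" class, because it shares words only with the training documents of that class. A realistic system would need a much larger annotated corpus and a stronger classifier.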

Developments in neural networks (Mikolov et al., 2013a) led to a renewed interest in the field of
distributional semantics, and more specifically in learning word embeddings (representations of words in a
continuous space). Computational efficiency was a major factor in popularizing word embeddings.
Word embeddings capture syntactic as well as semantic properties of words (Mikolov et al.,
2013b). As a result, they have outperformed several other word vector representations on different tasks
(Baroni et al., 2014).
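
To make the idea of words represented in a continuous space concrete, here is a minimal Python sketch in which each word is a small vector and semantic proximity is measured by cosine similarity. The three-dimensional vectors below are invented by hand for illustration; real embeddings are learned from large corpora and typically have hundreds of dimensions. The names TOY_EMBEDDINGS and most_similar are hypothetical.

```python
import math

# Hand-made toy vectors (NOT learned embeddings): words used in similar
# contexts are deliberately given nearby coordinates.
TOY_EMBEDDINGS = {
    "migrant": [0.9, 0.1, 0.0],
    "refugee": [0.8, 0.2, 0.1],
    "train": [0.0, 0.9, 0.3],
    "bus": [0.1, 0.8, 0.4],
}


def cosine(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, embeddings):
    """Return the other word whose vector is closest to that of `word`."""
    target = embeddings[word]
    candidates = (w for w in embeddings if w != word)
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))
```

With these toy values, most_similar("migrant", TOY_EMBEDDINGS) returns "refugee" and most_similar("train", TOY_EMBEDDINGS) returns "bus", which mirrors, in miniature, how learned embeddings place related words close together.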

Our methodology for hate speech classification will build on recent approaches to text
classification with neural networks and word embeddings. In this context, fully connected feed-forward
networks (Iyyer et al., 2015; Nam et al., 2014), Convolutional Neural Networks (CNN) (Kim, 2014;
Johnson and Zhang, 2015) and also Recurrent/Recursive Neural Networks (RNN) (Dong et al., 2014)
have been applied. On the one hand, the approaches based on CNN and RNN capture rich compositional
information and have outperformed previous state-of-the-art results in text classification; on the other hand,
they are computationally intensive and require careful hyperparameter selection and/or regularization
(Dai and Le, 2015).
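
As a rough illustration of how word embeddings can feed a fully connected feed-forward classifier, the sketch below averages the embeddings of a message, passes the result through one hidden layer, and outputs class probabilities with a softmax. It is loosely in the spirit of averaging-based feed-forward models, but it is not the implementation of any cited paper: the layer sizes are arbitrary, the weights are random and untrained, and all function names are assumptions made for this example.

```python
import math
import random


def average_embeddings(tokens, embeddings, dim):
    """Average the vectors of known tokens; zero vector if none is known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def dense(x, weights, bias):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in x]


def softmax(x):
    """Turn raw scores into probabilities (shifted for numerical stability)."""
    shift = max(x)
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols):
    """Small random weights; a trained model would learn these from data."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def build_model(vocabulary, dim=4, hidden=3, classes=2, seed=0):
    """Create an UNTRAINED toy model with random embeddings and weights."""
    rng = random.Random(seed)
    return {
        "dim": dim,
        "embeddings": {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)]
                       for w in vocabulary},
        "w1": random_matrix(rng, hidden, dim), "b1": [0.0] * hidden,
        "w2": random_matrix(rng, classes, hidden), "b2": [0.0] * classes,
    }


def forward(tokens, model):
    """Forward pass: average embeddings, hidden layer, softmax output."""
    pooled = average_embeddings(tokens, model["embeddings"], model["dim"])
    hidden = relu(dense(pooled, model["w1"], model["b1"]))
    return softmax(dense(hidden, model["w2"], model["b2"]))
```

Because the input is an average, the output does not depend on word order; this is the kind of compositional information that CNN and RNN models can capture, at a higher computational cost. Training such a network (for example by gradient descent on annotated messages) is outside the scope of this sketch.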

This thesis aims at proposing concepts, analyses and software components (Hate Speech Domain
Specific Analysis and related software tools in connection with migrants in social media) to bridge the
gap between conceptual requirements and multi-source information from social media. Automatic hate
speech detection software will be tested on the modeling of various hate speech phenomena, and
its domain relevance will be assessed with both partners.
The language of the analysed messages will be primarily French, although links with other languages
(including messages written in English) may appear throughout the analysis.
This PhD project complies with the Impact OLKi (Open Language and Knowledge for Citizens)
framework because:
- It is centred on language.
- It aims to implement new methods to study and extract knowledge from linguistic data
  (indicators, scales of measurement).
- It opens perspectives to produce technical solutions (applications, etc.) for citizens and digital
  platforms, to better control the potential negative use of language data.
Scientific challenges:
- to study and extract knowledge from linguistic data that concern hate speech towards migrants in
  social media;
- to better understand hate speech as a social phenomenon, based on the data extracted and analysed;
- to propose and assess new methods based on Deep Learning for automatic detection of documents
  containing hate speech. This will make it possible to set up an online hate speech management protocol.

Keywords: hate speech, migrants, social media, natural language processing.
Doctoral school: Computer Science (IAEM)
Principal supervisor: Irina Illina, Assistant Professor in Computer Science, irina.illina@loria.fr
Co-supervisors:
Crem: Angeliki Monnier, Professor of Information-Communication, angeliki.monnier@univ-lorraine.fr
Loria: Dominique Fohr, Research scientist CNRS, dominique.fohr@loria.fr

References
Angenot M (1978) Fonctions narratives et maximes idéologiques. Orbis Litterarum 33: 95-100.
Angenot M (1989) 1889 : un état du discours social. Montréal : Préambule.
Angenot M (2006) Théorie du discours social. Notions de topographie des discours et de coupures cognitives,
COnTEXTES. https://contextes.revues.org/51.
Baroni, M., Dinu, G., and Kruszewski, G. (2014). "Don't count, predict! A systematic comparison of context-counting
vs. context-predicting semantic vectors". In Proceedings of the 52nd Annual Meeting of the
Association for Computational Linguistics, Volume 1, pages 238-247.
Berri M, Garcia-Blanco I, Moore K (2015), Press coverage of the Refugee and Migrant Crisis in the EU: A Content
Analysis of five European Countries, Report prepared for the United Nations High Commission for Refugees,
Cardiff School of Journalism, Media and Cultural Studies.
Chouliaraki L, Georgiou M and Zaborowski R (2017), The European "migration crisis" and the media: A
cross-European press content analysis. The London School of Economics and Political Science, London, UK.
Citron, D. K., Norton, H. L. (2011), "Intermediaries and hate speech: Fostering digital citizenship for our
information age", Boston University Law Review, 91, 1435.
Dai, A. M. and Le, Q. V. (2015). "Semi-supervised sequence learning". In Cortes, C., Lawrence, N. D., Lee, D.
D., Sugiyama, M., and Garnett, R., editors, Advances in Neural Information Processing Systems 28, pages
3061-3069. Curran Associates, Inc.
Dauphin F (2002), Rumeurs électroniques : synergie entre technologie et archaïsme. Sociétés 76 : 71-87.
Delgado R., Stefancic J. (2014), "Hate speech in cyberspace", Wake Forest Law Review, 49.
Dong, L., Wei, F., Tan, C., Tang, D., Zhou, M., and Xu, K. (2014). "Adaptive recursive neural network for
target-dependent Twitter sentiment classification". In Proceedings of the 52nd Annual Meeting of the Association for
Computational Linguistics, ACL, Baltimore, MD, USA, Volume 2, pages 49-54.
Festl R., Quandt T (2013), Social relations and cyberbullying: The influence of individual and structural attributes
on victimization and perpetration via the internet, Human Communication Research, 39(1), 101-126.
Froissart P (2002) Les images rumorales, une nouvelle imagerie populaire sur Internet. Bry-Sur-Marne : INA.
Froissart P (2004) Des images rumorales en captivité : émergence d'une nouvelle catégorie de rumeur sur les sites
de référence sur Internet. Protée 32(3) : 47-55.
Johnson, R. and Zhang, T. (2015). "Effective use of word order for text categorization with convolutional neural
networks". In Proceedings of the 2015 Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, pages 103-112.
Iyyer, M., Manjunatha, V., Boyd-Graber, J., and Daumé, H. (2015). "Deep unordered composition rivals syntactic
methods for text classification". In Proceedings of the 53rd Annual Meeting of the Association for
Computational Linguistics, volume 1, pages 1681-1691.
Kim, Y. (2014). "Convolutional neural networks for sentence classification". In Proceedings of the Conference on
Empirical Methods in Natural Language Processing (EMNLP), pages 1746-1751.
King R. D., Sutton G. M. (2013). High times for hate crimes: Explaining the temporal clustering of hate-motivated
offending. Criminology, 51 (4), 871-894.
Lebre J (2014) Des idées partout : à propos du partage des hoaxes entre droite et extrême droite. Lignes 45: 153-162.
Mikolov, T., Yih, W.-t., and Zweig, G. (2013a). "Linguistic regularities in continuous space word representations".
In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, pages 746-751.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013b). "Distributed representations of words
and phrases and their compositionality". In Advances in Neural Information Processing Systems, 26, pages
3111-3119. Curran Associates, Inc.
Nam, J., Kim, J., Loza Mencía, E., Gurevych, I., and Fürnkranz, J. (2014). "Large-scale multi-label text
classification - revisiting neural networks". In Proceedings of the European Conference on Machine Learning
and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD-14), Part 2, volume 8725,
pages 437-452.
Schieb C, Preuss M (2016), Governing Hate Speech by Means of Counter Speech on Facebook, 66th ICA Annual
Conference, Fukuoka, Japan.
United Nations (2018), International Migration Report 2017. Highlights, New York, Department of Economic
and Social Affairs.
Waldron J. (2012), The harm in hate speech, Harvard University Press.

ISCApad #241