Keynote lectures
Societal Challenges for Information Retrieval
Benno Stein
Benno Stein is chair of the Web-Technology and Information Systems Group at the Bauhaus-Universität Weimar. His research focuses on modeling and solving data- and knowledge-intensive information processing tasks. The common ground of his research is the principles and methods of symbolic Artificial Intelligence. Benno Stein has developed theories, algorithms, and tools for information retrieval, data mining, computational linguistics, knowledge processing, as well as for engineering design and simulation (patents granted). He has received scientific and commercial prizes for several of his research achievements.

Professional background: Study at the University of Karlsruhe (1984-1989). Dissertation (1995) and Habilitation (2002) in computer science at the University of Paderborn. Appointment as a full professor for Web Technology and Information Systems at the Bauhaus-Universität Weimar (2005). Research stays at IBM, Germany, and the International Computer Science Institute, Berkeley. Benno Stein serves on scientific boards and program committees and as a reviewer for various relevant conferences and journals. He is the initiator and a co-chair of PAN, an excellence network and evaluation lab on digital text forensics with a focus on authorship analysis, profiling, and reuse detection. He is cofounder and spokesperson of the Digital Bauhaus Lab Weimar, a visionary and interdisciplinary research center for Computer Science, Arts, and Engineering. Not least, he is a cofounder (1996) and scientific director of Art Systems Software Ltd, a world-leading company for simulation technology in fluidic engineering.
Automatic Text Simplification
Dr. Sanja Štajner
Lexically and syntactically complex texts pose difficulties both for humans and for various NLP tasks (machine translation, information extraction, semantic role labelling, etc.). The task of automatic text simplification appeared some 30 years ago, but attracted wide attention from the NLP community only in the last few years. In this tutorial, I will present the main types of automatic text simplification systems, the most successful approaches proposed so far, and the existing resources for text simplification. The main focus will be on discussing the strengths and weaknesses of each of the approaches, the limitations of the existing resources and text simplification systems, and the complexity of the task.

Sanja Štajner is currently a postdoctoral research fellow at the University of Mannheim, Germany. She holds a multiple Master's degree in Natural Language Processing and Human Language Technologies (Autonomous University of Barcelona, Spain and University of Wolverhampton, UK) and a PhD degree in Computer Science from the University of Wolverhampton on the topic of "Data-driven Text Simplification". She participated in the Simplext and FIRST projects on automatic text simplification, and is the lead author of four ACL papers on text simplification (including the first neural text simplification system) and numerous other papers on text simplification and readability assessment at various leading international conferences and journals.

Sanja regularly teaches NLP at Master's and PhD levels, and delivers invited talks and seminars at various universities and companies. She held a tutorial on "Deep Learning for Text Simplification" at RANLP 2017, and a tutorial on "Data-Driven Text Simplification" at COLING 2018. She is an area chair for COLING 2018, and a regular program committee member of ACL, EMNLP, LREC, IJCAI, IAAA and other international conferences and journals. She was a lead organizer of the first international workshop and shared task on Quality Assessment of Text Simplification (QATS) in 2016, and of the Complex Word Identification shared task in 2018.
How to create a virtual assistant skill in Just AI DSL.
Darya Serdyuk & Svetlana Volskaya, Just AI
We will talk about computational linguistics problems in chatbot development, show how our platform can solve these problems and demonstrate how to create a chatbot quickly and easily with our Just AI DSL.
Graph Clustering for Natural Language Processing
Dr. Dmitry Ustalov
Graph-based representations are proven to be an effective approach for a variety of Natural Language Processing (NLP) tasks. Graph clustering makes it possible to extract useful knowledge by exploiting the implicit structure of the data. In this tutorial, we will present several efficient graph clustering algorithms, show their strengths and weaknesses as well as their implementations and applications. Then, the evaluation methodology in unsupervised NLP tasks will be discussed.
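The abstract does not name the algorithms covered, so the following is only a minimal sketch of one well-known graph clustering method, Chinese Whispers, chosen here as an assumption for illustration. The function name, the toy word graph, and its edge weights are hypothetical. Each node starts in its own cluster and repeatedly adopts the label with the highest total edge weight among its neighbours.

```python
import random
from collections import defaultdict


def chinese_whispers(edges, iterations=20, seed=0):
    """Cluster an undirected weighted graph; returns {node: cluster label}."""
    rng = random.Random(seed)

    # Build a symmetric adjacency map: node -> {neighbour: weight}.
    neighbours = defaultdict(dict)
    for u, v, w in edges:
        neighbours[u][v] = w
        neighbours[v][u] = w

    # Every node starts in its own cluster, labelled by the node itself.
    labels = {node: node for node in neighbours}
    nodes = sorted(neighbours)

    for _ in range(iterations):
        rng.shuffle(nodes)  # visit nodes in random order
        changed = False
        for node in nodes:
            # Sum the edge weights per label among the neighbours.
            scores = defaultdict(float)
            for other, weight in neighbours[node].items():
                scores[labels[other]] += weight
            # Ties are broken deterministically by label order.
            best = max(sorted(scores), key=scores.get)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:  # stop once no label changes
            break
    return labels


# Hypothetical toy graph: two dense word groups joined by one weak edge.
edges = [
    ("river", "shore", 1.0), ("shore", "water", 1.0), ("river", "water", 1.0),
    ("money", "loan", 1.0), ("loan", "credit", 1.0), ("money", "credit", 1.0),
    ("water", "money", 0.1),
]
labels = chinese_whispers(edges)
print(labels)
```

On this toy graph the weak edge between "water" and "money" is outweighed by the edges inside each group, so the two groups end up with different labels.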

Dmitry Ustalov is a post-doctoral research fellow at the University of Mannheim, Germany. His research is focused on Computational Lexical Semantics and Crowdsourcing. In 2018 he defended his Kandidat Nauk (PhD) thesis, which he worked on at the Krasovskii Institute of Mathematics and Mechanics, Russia. Dmitry's research has been published at premier international scientific conferences, such as ACL, EACL, and EMNLP. He serves as a reviewer for ACL, EMNLP, *SEM, EKAW, TextGraphs, and other high-level events.

In 2012 Dmitry founded NLPub, the leading Russian wiki on Computational Linguistics. Since 2014 he has been co-organizing the workshop on Russian Semantic Evaluation (RUSSE). Dmitry also teaches Text Analytics and Web Mining classes to master's students.
Paper Presentations
Denis Fedorenko, Nikita Smetanin and Artem Rodichev
Benyamin Ahmadnia, Javier Serrano, Gholamreza Haffari and Nik-Mohammad Balouchzahi
Oksana Antropova, Elena Arslanova, Maxim Shaposhnikov, Pavel Braslavski and Mihail Mukhin
Tatiana Litvinova, Alexandr Sboev and Polina Panicheva
Evgeny Kotelnikov, Tatiana Peskisheva, Anastasia Kotelnikova and Elena Razova
Anton Ermilov, Natasha Murashkina, Valeria Goryacheva and Pavel Braslavski
Anna Kriukova, Aliia Erofeeva, Olga Mitrofanova and Kirill Sukharev
Svetlana Toldova, Dina Pisarevskaya and Maria Koboseva
Andrey Mavrin, Andrey Filchenkov and Sergei Koltcov
Poster and demo session
Denotation graph as knowledge storage structure in goal-oriented dialogue systems
Aleksandr Perevalov
Legal compliance with EU and Russian data protection regulation: voice and speech technology case
Ilya Ilin
Using Ensemble of Binary Classifiers for Voice Spoofing Detection
Andrey Lependin
Antidictionary: an annotated database of out-of-dictionary words
Irina Krotova
An approach for solving the unbalanced hierarchical multiclass classification problem for large textual datasets
Petr Pogorelov
Component-based approach to automatic poetry generation
Anna Mosolova
Deriving Cognitive Map Concepts on the Basis of Social Media Data Clustering
Vasiliy Kireev
RusVectōrēs: word embeddings for Russian
Elizaveta Kuzmenko
Russian Learner Translator Corpus 2.0: newer and better
Maria Kunilovskaya
RusNLP: Semantic search engine for Russian NLP conference papers
Andrey Kutuzov, Irina Nikishina, Amir Bakarov
Vecto: A Framework for Word, Sentence, Character Embeddings and Beyond!
Amir Bakarov
Industrial session
ML problems in the crowdsourcing platform Yandex.Toloka.
Misha Slabodkin, Yandex

A crowdsourcing platform is a two-sided market where customers and workers look for each other. Customers place their tasks (usually related to machine learning purposes, such as collecting ground-truth labels for their datasets), and workers complete these tasks for pay. As a platform, we should regulate their relationships effectively, increasing the satisfaction of both sides. There are many examples of two-sided markets: Uber, AirBnB, etc. But the crowdsourcing market has its own specific features: online work, low payments, a low entry barrier for workers, a high entry barrier for customers, and many others. This talk is devoted to the machine learning problems we face every day.
Specific features of conversations with chatbots: what people don't tell them.
Roshchina Nataliia & Ganyukova Maria, Just AI
When developing chatbots, computational linguists aim to simulate the most natural and "human" dialog. Nevertheless, it seems that ordinary people are not yet ready to accept chatbots as full-fledged interlocutors.
While analysing dialogues with our pilot voice virtual consultant, we noticed that a client's communication with a human operator differs significantly from communication with a bot. These differences include the length of queries and their prosodic, grammatical and syntactic characteristics. However, the main purpose of the study was to identify content differences in queries and to describe what the client usually says when addressing the operator but omits when communicating with the bot.
DeepPavlov: Open-Source Library for Dialogue Systems
Valentin Malykh, IPavlov
In this talk I'll briefly describe the theory of dialog systems and shed some light on the internals of the DeepPavlov library. Despite the library's title, it can be used for a wide variety of natural language processing tasks, such as named entity recognition and many others.
Machine learning solutions for chatbot interfaces
Sergey Verentsov, CTO EORA ZenSolutions
Currently, most chatbots use machine learning only for fun purposes, like text generation based on chat content. Due to the low interpretability and possibly unpredictable results of ML methods, production chatbots tend to use strict scenarios with navigation based on buttons and regular expressions. However, in most cases, ML solutions can provide a much cleaner UX if set up correctly. This talk will cover various applications of advanced NLP methods for chatbots based on our production experience. We will describe their possible drawbacks and pitfalls and give guidelines on when to choose ML over classic chatbot practices.
Digitalization of strategic planning
Ekaterina Chernyak, Sberbank
In this talk we are going to present an ongoing project of the Sberbank Chief Data Science Office. At Sberbank we develop frameworks, methods and instruments for AI transformation and digitalization of the economy and strategic planning. Our work is based on the analysis of a huge corpus of strategic planning documents devoted to various aspects of the development of Russian regions. The main purposes of the project are: 1) to extract various aspects of goal setting and planning, 2) to form an ontology of goals and indicators of achieving these goals. Unsupervised NLP methods such as phrase chunking, word embeddings and topic modeling are used for information extraction and ontology construction. In the short term, the resulting ontology should serve as a helper tool for writing strategic planning documents. In the long term, it should remove the need to compose such documents at all: users would navigate through the ontology and select relevant goals and indicators.
How to analyse datasets larger than 5 GB on your laptop
Anna Voevodskaya, Jet Infosystems
Data analysis follows a simple rule: the more experiments you run, the more successful results you get. If a Jupyter notebook takes 3 GB of RAM instead of 10, you can run three of them in parallel, each optimizing a different model (such as catboost, xgboost, lightgbm). The sooner the better. Isn't it better to spend 10 minutes instead of 5 hours on data preprocessing?
Key points:
How to use less memory? How a pandas DataFrame is stored in memory; preprocessing of text variables; preprocessing of numeric variables; saving with joblib.dump vs pd.to_csv.
How to speed up work with a DataFrame? How to parallelise pd.groupby in the most efficient way; join vs merge; functions on pd.Series, and which way is the fastest: 1. iloc, 2. iterrows, 3. apply, 4. vectorization, 5. vectorization with .values, 6. cython, 7. numba.
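As a rough illustration of the memory-related points, here is a minimal sketch (not the speaker's own material) of shrinking a DataFrame by choosing narrower dtypes. The column names and the synthetic data are assumptions made up for the example; it relies only on standard pandas calls (`memory_usage`, `astype`, `pd.to_numeric`).

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: one text, one integer and one float column.
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "city": rng.choice(["Moscow", "Kazan", "Perm"], size=n),
    "age": rng.integers(18, 90, size=n),
    "score": rng.random(n),
})

# deep=True also counts the Python string objects behind text columns.
before = df.memory_usage(deep=True).sum()

small = df.copy()
# Text with few distinct values: store integer codes plus a small lookup table.
small["city"] = small["city"].astype("category")
# Integers: pick the smallest unsigned type that can hold every value.
small["age"] = pd.to_numeric(small["age"], downcast="unsigned")
# Floats: halve the width when reduced precision is acceptable.
small["score"] = small["score"].astype("float32")

after = small.memory_usage(deep=True).sum()
print(f"{before / 1e6:.1f} MB -> {after / 1e6:.1f} MB")
```

Whether `float32` is acceptable depends on the precision the downstream model needs, so that step in particular is a judgment call rather than a general recommendation.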
Why is automatic processing of legal documents so complicated? Specific features of legal applications of NLP algorithms.
Dmitry Konyrev, PAO Sberbank
Automating the processing of legal texts and the decisions based on them is an important problem for many business tasks. However, this problem has a number of features related to the complexity of legal formulations. Understanding a legal text is often difficult even for a human. How can we make a machine understand legal language and make decisions based on it? Many problems in this area are about extracting information from legal documents. However, this is complicated by the fact that legal knowledge is very subtle and often difficult to formalize. The purpose of our talk is to describe real tasks related to legal text processing, the various approaches to solving them, and the problems that arise along the way.
Methods of Evaluation of Word Embeddings
Amir Bakarov, Huawei
Distributional semantic models like Word2Vec are probably the most ubiquitous tools in the natural language processing community. The strange thing is that nobody really knows how to evaluate and interpret them. Moreover, they are considered representations of lexical meaning, but no formal definition of meaning exists in linguistics. In this talk I will try to bridge computational models that exploit the distributional hypothesis with linguistic theories of lexical semantics, overviewing current directions in the interpretation of word embeddings and existing attempts at evaluation. I will describe both widely used and experimental approaches, systematize information about evaluation datasets, and discuss some key challenges.
Intelligent Assistants and Conversational Interfaces
Marina Ashurkina, Huawei
Conversational interfaces are becoming more and more popular. Despite many unfortunate use cases, voice interfaces are finally finding their way to the customer: intelligent assistants for mobile phones, conversational platforms, AI chatbots for messengers and, of course, smart speakers. Big players like Google, Amazon, Microsoft, Facebook and others are investing in the development of voice devices. In my presentation I will discuss the current state of voice technologies and their popular and successful use cases.
Business Intelligence Workshop
Within the framework of the AINL conference, the Program Committee holds a Business Intelligence section organized by the Russian Presidential Academy of National Economy and Public Administration (RANEPA), a partner of AINL-2018. Here the term Business Intelligence (BI) means the application of artificial intelligence and natural language processing to the economy, sociology, and administrative activity.

The objective of this section is to demonstrate the possibilities of various text processing technologies in these applications. To this end, we offer two tutorials and several reports.

Workshop organizers:
Sergey Maruev, RANEPA, chief of the Workshop
Mikhail Alexandrov, RANEPA, secretary of the Workshop
Application of Internet signals to problems of analysis and forecast (economy, sociology, administrative activity…
Anna Boldyreva
For almost a decade, Web pages have been a popular source of information for analyzing and forecasting various events and phenomena in the socio-economic and socio-political state of countries and their regions. With the development of social networks, these data have become a significant indicator of people's mood and expectations. In recent years, queries to Internet search engines have been actively used for the same purposes. In this tutorial we consider both Internet mentions and Internet queries, which together we call Internet signals. We show: a) where and how to measure the intensity of Internet signals; b) how to use the statics and dynamics of these signals to analyze and forecast various economic, social, demographic and other parameters. Examples include the analysis and forecast of the criminal situation in Russia and its regions, the analysis of the attractiveness of various oil companies, and the forecast of indicators reflecting the oil market.


Anna Boldyreva received her B.Sc. (2015) and M.Sc. (2017) degrees in System Analysis and Control at the Moscow Institute of Physics and Technology (State University) and in Economics at the Russian Presidential Academy of National Economy and Public Administration (RANEPA). She is currently a PhD student at the Moscow Institute of Physics and Technology (State University) and is simultaneously involved in a project for the General Prosecutor's Office. Anna has held a Potanin Foundation scholarship and a scholarship of the Russian Government. She was an invited lecturer at the international conference AINL-2015, the Winter School in Minsk in 2016, and the RANEPA & Sberbank master's program FinTech in 2017. This year she received a grant from the Russian Government to take part in the Summer School "Island-2018" held in Vladivostok. She has published 13 papers in her areas of interest: data mining, Internet sociology, opinion mining, and inductive modeling.
Assessment of the prospectivity of an innovative project in digital economy
Vadim Surin
Automated crowdfunding projects are among the riskiest of all types of investment. The difficulty in assessing the prospects of an innovative project in the digital economy lies in the lack of traditional market metrics for valuing assets. In the tutorial we show one possible way to separate high-quality projects from the least promising ones. For this we use visual data mining tools from Orange.


Vadim Surin received his BA degree in financial markets and financial engineering from the Financial University under the Government of Russia in 2017. He is currently enrolled in the master's program 'Systems of Big data in economics'. Vadim gained experience as a blockchain analyst working at an investment boutique in 2017. During this period he took part in the preparation of more than 20 analytical reports on risk assessment for such projects.