AINL 2018 - Agenda

Program

Keynote lectures

Paper Presentations

Poster and Demo Session

Industrial Session

Tutorials

BI Workshop

Keynote lectures

Societal Challenges for Information Retrieval

Benno Stein

Biography
Benno Stein is chair of the Web-Technology and Information Systems Group at the Bauhaus-Universität Weimar. His research focuses on modeling and solving data- and knowledge-intensive information processing tasks. The common ground of his research is the principles and methods of symbolic Artificial Intelligence. Benno Stein has developed theories, algorithms, and tools for information retrieval, data mining, computational linguistics, knowledge processing, as well as for engineering design and simulation (patents granted). He has received scientific and commercial prizes for several of his research achievements.

Professional background: Studies at the University of Karlsruhe (1984-1989). Dissertation (1995) and Habilitation (2002) in computer science at the University of Paderborn. Appointment as a full professor for Web Technology and Information Systems at the Bauhaus-Universität Weimar (2005). Research stays at IBM, Germany, and the International Computer Science Institute, Berkeley. Benno Stein serves on scientific boards and program committees and as a reviewer for various relevant conferences and journals, and he is the initiator and a co-chair of PAN, an excellence network and evaluation lab on digital text forensics with a focus on authorship analysis, profiling, and reuse detection. He is a cofounder and spokesperson of the Digital Bauhaus Lab Weimar, a visionary and interdisciplinary research center for Computer Science, Arts, and Engineering. Finally, he is a cofounder (1996) and scientific director of Art Systems Software Ltd, a world-leading company for simulation technology in fluidic engineering.

Tutorials

Automatic Text Simplification

Dr. Sanja Štajner

Lexically and syntactically complex texts pose difficulties both for humans and for various NLP tasks (machine translation, information extraction, semantic role labelling, etc.). The task of automatic text simplification was introduced some 30 years ago but has attracted wide attention from the NLP community only in the last few years. In this tutorial, I will present the main types of automatic text simplification systems, the most successful approaches proposed so far, and the existing resources for text simplification. The main focus will be on the strengths and weaknesses of each approach, the limitations of the existing resources and text simplification systems, and the complexity of the task.

Biography
Sanja Štajner is currently a postdoctoral research fellow at the University of Mannheim, Germany. She holds a multiple Master's degree in Natural Language Processing and Human Language Technologies (Autonomous University of Barcelona, Spain and University of Wolverhampton, UK) and a PhD in Computer Science from the University of Wolverhampton on the topic of "Data-driven Text Simplification". She participated in the Simplext and FIRST projects on automatic text simplification, and she is the lead author of four ACL papers on text simplification (including the first neural text simplification system) and of numerous other papers on text simplification and readability assessment in various leading international conferences and journals.

Sanja regularly teaches NLP at Master's and PhD level and delivers invited talks and seminars at various universities and companies. She gave a tutorial on "Deep Learning for Text Simplification" at RANLP 2017 and a tutorial on "Data-Driven Text Simplification" at COLING 2018. She is an area chair for COLING 2018 and a regular program committee member of ACL, EMNLP, LREC, IJCAI, IAAA and other international conferences and journals. She was a lead organizer of the first international workshop and shared task on Quality Assessment of Text Simplification (QATS) in 2016, and of the Complex Word Identification shared task in 2018.

How to create a virtual assistant skill in Just AI DSL

Darya Serdyuk & Svetlana Volskaya, Just AI

We will talk about computational linguistics problems in chatbot development, show how our platform can solve these problems and demonstrate how to create a chatbot quickly and easily with our Just AI DSL.

Graph Clustering for Natural Language Processing

Dr. Dmitry Ustalov

Graph-based representations have proven to be an effective approach for a variety of Natural Language Processing (NLP) tasks. Graph clustering makes it possible to extract useful knowledge by exploiting the implicit structure of the data. In this tutorial, we will present several efficient graph clustering algorithms and show their strengths and weaknesses as well as their implementations and applications. Finally, we will discuss the evaluation methodology for unsupervised NLP tasks.
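
As an illustration of the kind of algorithm such a tutorial deals with, here is a minimal Python sketch of Chinese Whispers, a well-known label-propagation graph clustering algorithm used in NLP. We do not claim it is among the algorithms presented in the tutorial. The function name `chinese_whispers`, the `(u, v, weight)` edge-list input, the iteration cap and the deterministic tie-breaking rule (the original algorithm breaks ties randomly) are our own assumptions.

```python
import random
from collections import defaultdict


def chinese_whispers(edges, iterations=20, seed=0):
    """Cluster an undirected weighted graph with Chinese Whispers.

    edges: iterable of (u, v, weight) tuples; node names must be comparable.
    Returns a dict mapping every node to its cluster label.
    """
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w

    # Every node starts in its own cluster, labelled by its own name.
    labels = {node: node for node in graph}
    rng = random.Random(seed)
    nodes = list(graph)

    for _ in range(iterations):
        rng.shuffle(nodes)  # visit nodes in random order on each pass
        changed = False
        for node in nodes:
            # Sum the edge weights per label over the node's neighbours.
            scores = defaultdict(float)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            # Adopt the heaviest label; ties go to the smallest label (our choice).
            best = min(scores, key=lambda label: (-scores[label], label))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:  # a full pass without changes means convergence
            break
    return labels
```

For example, on a toy graph made of two disconnected triangles (say, `cat`/`dog`/`mouse` and `python`/`java`/`rust`), each triangle ends up with a single label of its own. Because labels only spread along edges, nodes from different connected components can never share a cluster.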

Biography
Dmitry Ustalov is a post-doctoral research fellow at the University of Mannheim, Germany. His research is focused on Computational Lexical Semantics and Crowdsourcing. In 2018 he defended his Kandidat Nauk (PhD) thesis, prepared at the Krasovskii Institute of Mathematics and Mechanics, Russia. Dmitry's research has been published at premier international scientific conferences such as ACL, EACL, and EMNLP. He serves as a reviewer for ACL, EMNLP, *SEM, EKAW, TextGraphs, and other high-level events.

In 2012 Dmitry founded NLPub, the leading Russian wiki on Computational Linguistics. Since 2014 he has been co-organizing the workshop on Russian Semantic Evaluation (RUSSE). Dmitry also teaches Text Analytics and Web Mining classes to master's students.

Paper Presentations

Avoiding Echo-Responses in a Retrieval-Based Conversation System

Denis Fedorenko, Nikita Smetanin and Artem Rodichev

Supervised Mover's Distance: A simple model for sentence comparison

Muktabh Mayank Srivastava

Direct-Bridge Combination Scenario for Persian-Spanish Minimal Parallel-Resource Statistical Machine Translation

Benyamin Ahmadnia, Javier Serrano, Gholamreza Haffari and Nik-Mohammad Balouchzahi

Deep Convolutional Networks for Supervised Morpheme Segmentation of Russian Language

Alexey Sorokin and Anastasia Kravtsova

Cleaning up after a Party: Post-processing Thesaurus Crowdsourced Data

Oksana Antropova, Elena Arslanova, Maxim Shaposhnikov, Pavel Braslavski and Mihail Mukhin

A model-free comorbidities-based events prediction in ICU unit

Tatiana Malygina and Ivan Drokin

Profiling the age of Russian bloggers

Tatiana Litvinova, Alexandr Sboev and Polina Panicheva

A comparative study of publicly available Russian sentiment lexicons

Evgeny Kotelnikov, Tatiana Peskisheva, Anastasia Kotelnikova and Elena Razova

Smart Context Generation for Disambiguation to Wikipedia

Andrey Sysoev and Irina Nikishina

Acoustic Features of Speech of Typically Developing Children Aged 5-16 Years

Aleksey Grigorev, Olga Frolova and Elena Lyakso

Stierlitz Meets SVM: Humor Detection in Russian

Anton Ermilov, Natasha Murashkina, Valeria Goryacheva and Pavel Braslavski

Interactive Attention Network for Adverse Drug Reaction Classification

Ilseyar Alimova and Valery Solovyev

Explicit Semantic Analysis as a Means for Topic Labelling

Anna Kriukova, Aliia Erofeeva, Olga Mitrofanova and Kirill Sukharev

A Multi-Feature Classifier for Verbal Metaphor Identification in Russian Texts

Yulia Badryzlova and Polina Panicheva

Lemmatization for ancient languages: rules or neural networks?

Oksana Dereza

Automatic mining of discourse connectives for Russian

Svetlana Toldova, Dina Pisarevskaya and Maria Koboseva

Four Keys to Topic Interpretability in Topic Modeling

Andrey Mavrin, Andrey Filchenkov and Sergei Koltcov

Named Entity Recognition in Russian with Word Representation Learned by a Bidirectional Language Model

Georgy Konoplich, Evgeny Putin, Andrey Filchenkov and Roman Rybka

Modeling Propaganda Battle: Decision-Making, Homophily and Echo Chambers

Alexander Petrov and Olga Proncheva

Poster and Demo Session

POSTERS

Denotation graph as knowledge storage structure in goal-oriented dialogue systems

Aleksandr Perevalov

Legal compliance with EU and Russian data protection regulation: voice and speech technology case

Ilya Ilin

Using Ensemble of Binary Classifiers for Voice Spoofing Detection

Andrey Lependin

Antidictionary: an annotated database of out-of-dictionary words

Irina Krotova

An approach for solving the unbalanced hierarchical multiclass classification problem for large textual datasets

Petr Pogorelov

Component-based approach to automatic poetry generation

Anna Mosolova

Deriving Cognitive Map Concepts on the Basis of Social Media Data Clustering

Vasiliy Kireev

DEMOS

RusVectōrēs: word embeddings for Russian

Elizaveta Kuzmenko

Russian Learner Translator Corpus 2.0: newer and better

Maria Kunilovskaya

RusNLP: Semantic search engine for Russian NLP conference papers

Andrey Kutuzov, Irina Nikishina, Amir Bakarov

Vecto: A Framework for Word, Sentence, Character Embeddings and Beyond! (http://vecto.space)

Amir Bakarov

Industrial Session

ML problems in the crowdsourcing platform Yandex.Toloka

Misha Slabodkin, Yandex

A crowdsourcing platform is a two-sided market where customers and workers look for each other. Customers post their tasks (usually related to machine learning, such as collecting ground-truth labels for their datasets), and workers complete these tasks for money. As a platform, we should regulate their relationship effectively, increasing the satisfaction of both sides. There are many examples of two-sided markets: Uber, booking.com, AirBnB, etc. But the crowdsourcing market has its own specific features: online work, low payments, a low entry barrier for users and a high entry barrier for customers, and many others. The talk is devoted to the machine learning problems we face every day.

Specific features of conversations with chatbots: what people don't tell them

Roshchina Nataliia & Ganyukova Maria, Just AI

When developing chatbots, computational linguists aim to simulate the most natural, "human" dialogue. Nevertheless, ordinary people do not yet seem ready to accept chatbots as full-fledged interlocutors.
While analysing dialogues with our pilot voice virtual consultant, we noticed that clients communicate with a human operator and with a bot in significantly different ways. These differences include the length of queries and their prosodic, grammatical and syntactic characteristics. However, the main purpose of the study was to identify content differences in queries and to describe what clients usually say to the operator but omit when they communicate with the bot.

DeepPavlov: Open-Source Library for Dialogue Systems

Valentin Malykh, IPavlov

In this talk I will briefly describe the theory of dialogue systems and shed some light on the internals of the DeepPavlov library. Despite its name, the library can be used for a wide variety of natural language processing tasks, such as named entity recognition and many others.

Machine learning solutions for chatbot interfaces

Sergey Verentsov, CTO EORA ZenSolutions

Currently, most chatbots use machine learning only for entertainment, such as text generation based on chat content. Due to the low interpretability and potentially unpredictable results of ML methods, production chatbots tend to use strict scenarios with navigation based on buttons and regular expressions. However, in most cases ML solutions can provide a much cleaner UX if set up correctly. This talk will cover various applications of advanced NLP methods to chatbots based on our production experience. We will describe their possible drawbacks and pitfalls and give guidelines on when to choose ML over classic chatbot practices.

Digitalization of strategic planning

Ekaterina Chernyak, Sberbank

In this talk we are going to present an ongoing project of the Sberbank Chief Data Science Office. At Sberbank we develop frameworks, methods and instruments for AI transformation and for the digitalization of the economy and of strategic planning. Our work is based on the analysis of a huge corpus of strategic planning documents devoted to various aspects of the development of Russian regions. The main purposes of the project are: 1) to extract various aspects of goal setting and planning, 2) to form an ontology of goals and of indicators of achieving these goals. Unsupervised NLP methods such as phrase chunking, word embeddings and topic modeling are used for information extraction and ontology construction. In the short term, the resulting ontology should serve as a helper tool for writing strategic planning documents; in the long term, it should remove the need to compose strategic planning documents at all, replacing them with navigating through the ontology and selecting relevant goals and indicators.

How to analyse datasets larger than 5 GB on your laptop

Anna Voevodskaya, Jet Infosystems

Data analysis follows a simple rule: the more experiments you run, the more successful results you get. If a Jupyter notebook takes 3 GB of RAM instead of 10, you can run three of them in parallel, each optimizing a different model (e.g. catboost, xgboost, lightgbm). The sooner, the better: isn't it better to spend 10 minutes on data preprocessing instead of 5 hours?
Key points:
How to use less memory?
- how pandas stores a DataFrame in memory
- preprocessing of text variables
- preprocessing of numeric variables
- saving: joblib.dump vs pd.to_csv
How to speed up work with a DataFrame?
- how to parallelise pd.groupby in the most efficient way
- join vs merge
- functions on pd.Series: which way is the fastest?
  1. .iloc
  2. .iterrows
  3. .apply
  4. vectorization
  5. vectorization with .values
  6. cython
  7. numba
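
Two of these points, dtype-based memory reduction and vectorization versus `.apply`, can be sketched in pandas roughly as follows. This is a minimal illustration under our own assumptions, not the speaker's code: the helper names (`shrink_memory`, `revenue_apply`, `revenue_vectorized`, `revenue_values`), the `price`/`qty` columns and the 50% cardinality threshold for turning text into `category` are hypothetical. Parallel `groupby`, saving, Cython and Numba are not shown.

```python
import numpy as np
import pandas as pd


def shrink_memory(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes where that is safe (hypothetical helper)."""
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pd.api.types.is_integer_dtype(series):
            # Smallest signed integer type that holds every value (e.g. int64 -> int16).
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_float_dtype(series):
            # float64 -> float32 when pandas judges the values representable.
            out[col] = pd.to_numeric(series, downcast="float")
        elif (
            pd.api.types.is_object_dtype(series) or pd.api.types.is_string_dtype(series)
        ) and series.nunique() <= max_unique_ratio * len(series):
            # Low-cardinality text: each distinct string is stored once, plus integer codes.
            out[col] = series.astype("category")
    return out


def revenue_apply(df: pd.DataFrame) -> pd.Series:
    # Row-wise .apply: a Python lambda is called once per row, which is slow on big frames.
    return df[["price", "qty"]].apply(lambda row: row["price"] * row["qty"], axis=1)


def revenue_vectorized(df: pd.DataFrame) -> pd.Series:
    # Vectorized: one column-wise multiplication executed in compiled code.
    return df["price"] * df["qty"]


def revenue_values(df: pd.DataFrame) -> np.ndarray:
    # The same on the underlying NumPy arrays, skipping pandas index alignment.
    return df["price"].values * df["qty"].values
```

Comparing `df.memory_usage(deep=True).sum()` before and after `shrink_memory(df)` shows the saving on a given frame; the exact numbers depend on the data and the pandas version. The three `revenue_*` functions return the same values, so they can be timed against each other directly.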

Why is automatic processing of legal documents so complicated? Specific features of legal applications of NLP algorithms

Dmitry Konyrev, PAO Sberbank

Automating the processing of legal texts and decision-making based on them is an important problem for many business tasks. However, this problem has a number of peculiarities related to the complexity of legal formulations. Understanding a legal text is often difficult even for a human. How can a machine be made to understand legal language and make decisions based on it? Many problems in this area involve extracting information from legal documents, which is complicated by the fact that legal knowledge is very subtle and often difficult to formalize. The purpose of our talk is to describe real tasks related to legal text processing, the various approaches to solving them, and the problems that arise along the way.

Methods of Evaluation of Word Embeddings

Amir Bakarov, Huawei

Distributional semantic models such as Word2Vec are probably the most ubiquitous tools in the natural language processing community. Oddly, nobody really knows how to evaluate and interpret them. Moreover, they are regarded as representations of lexical meaning, yet no formal definition of meaning exists in linguistics. In this talk I will try to bridge computational models that exploit the distributional hypothesis and linguistic theories of lexical semantics, giving an overview of current directions in the interpretation of word embeddings and of existing attempts at their evaluation. I will describe both widely used and experimental approaches, systematize information about evaluation datasets, and discuss some key challenges.
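
One widely used intrinsic evaluation scheme compares cosine similarities from a model with human similarity ratings for word pairs (the approach behind datasets such as WordSim-353 and SimLex-999) using Spearman's rank correlation. The stdlib-only sketch below illustrates that scheme only; it is not taken from the talk, and the function names, the four-word vocabulary, its 3-dimensional vectors and the ratings are made-up toy values, not a real dataset.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate_similarity(embeddings, pairs):
    """Correlate model cosine similarities with human ratings.

    embeddings: dict word -> vector; pairs: list of (word1, word2, human_score).
    Pairs containing an out-of-vocabulary word are skipped.
    """
    model_scores, human_scores = [], []
    for w1, w2, score in pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(score)
    return spearman(model_scores, human_scores)


# Toy data for illustration only (hypothetical vectors and ratings).
embeddings = {
    "cat": [1.0, 0.9, 0.0],
    "dog": [0.9, 1.0, 0.1],
    "car": [0.0, 0.1, 1.0],
    "truck": [0.1, 0.0, 0.9],
}
pairs = [("cat", "dog", 9.0), ("car", "truck", 8.5), ("cat", "car", 1.0), ("dog", "truck", 1.5)]
```

Here the toy model orders the four pairs exactly as the toy ratings do, so `evaluate_similarity(embeddings, pairs)` returns 1.0 up to floating-point rounding. Rank correlation is used rather than Pearson on raw scores because cosine values and human ratings live on different scales; only their ordering is compared.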

Intelligent Assistants and Conversational Interfaces

Marina Ashurkina, Huawei

Conversational interfaces are becoming more and more popular. Despite many unsuccessful use cases, voice interfaces are finally finding their way to the customer: intelligent assistants for mobile phones, conversational platforms, AI chatbots for messengers and, of course, smart speakers. Big players such as Google, Amazon, Microsoft, Facebook and others are investing in the development of voice devices. In my presentation I will discuss the current state of voice technologies and popular, successful use cases.

Business Intelligence Workshop

Within the AINL Conference, the Program Committee holds a Business Intelligence section organized by the Russian Presidential Academy of National Economy and Public Administration (RANEPA), a partner of AINL-2018. Here the term Business Intelligence (BI) means the application of artificial intelligence and natural language processing to the economy, sociology, and administrative activity.

The objective of this section is to demonstrate the possibilities of various text processing technologies in these applications. To this end, we offer two tutorials and several talks.

Workshop organizers:
Sergey Maruev, RANEPA, chief of the Workshop
Mikhail Alexandrov, RANEPA, secretary of the Workshop

Application of Internet signals to problems of analysis and forecast (economy, sociology, administrative activity…

Anna Boldyreva

For almost a decade, Internet web pages have been a popular source of information for analyzing and forecasting various events and phenomena of the socio-economic and socio-political state of countries and their regions. With the development of social networks, these data have become a significant indicator of people's mood and expectations. In recent years, queries to Internet search engines have been actively used for the same purposes; in this tutorial we consider Internet mentions and Internet queries together, calling both Internet signals. We show: a) where and how to measure the intensity of Internet signals; b) how to use the static and dynamic characteristics of these signals to analyze and forecast various economic, social, demographic and other parameters. Examples include the analysis and forecast of the criminal situation in Russia and its regions, the analysis of the attractiveness of various oil companies, and the forecast of indicators reflecting the oil market.

Biography

Anna Boldyreva received her B.Sc. (2015) and M.Sc. (2017) degrees in System Analysis and Control from the Moscow Institute of Physics and Technology (State University) and in Economics from the Russian Presidential Academy of National Economy and Public Administration (RANEPA). She is currently a PhD student at the Moscow Institute of Physics and Technology (State University) and is simultaneously involved in a project for the General Prosecutor's Office. Anna has held a Potanin Foundation scholarship and a scholarship of the Russian Government. She was an invited lecturer at the International Conference AINL-2015, the Winter School in Minsk in 2016, and the RANEPA & Sberbank master's program FinTech in 2017. This year she received a grant from the Russian Government to take part in the Summer School "Island-2018", held in Vladivostok. She has published 13 papers in her areas of interest: data mining, Internet sociology, opinion mining, and inductive modeling.

Assessment of the prospects of an innovative project in the digital economy

Vadim Surin

Automated crowdfunding projects are the riskiest of all risky types of investment. The difficulty in assessing the prospects of an innovative project in the digital economy is the lack of traditional market metrics for valuing assets. In the tutorial we show one possible way to separate high-quality projects from the least promising ones. For this we use visual data mining tools from Orange.

Biography

Vadim Surin received his BA degree in financial markets and financial engineering from the Financial University under the Government of Russia in 2017. He is currently enrolled in the master's program 'Systems of Big Data in Economics'. In 2017 Vadim gained experience as a blockchain analyst at an investment boutique, where he took part in preparing more than 20 analytical reports on risk assessment for such projects.

Feel free to contact us at ainlevent@gmail.com

Cover photo by Stanislav Zaburdaev