AINL 2022: Conference Schedule

(Moscow Time)

Academia Day
Thursday, April 14
10:00 – 10:10
Conference Opening
10:10 – 11:00
Towards Few-shot Learning in Task-oriented Dialogue Systems
Mi Fei, Huawei Noah's Ark Lab

Because labeling data for the different modules of task-oriented dialog (ToD) systems is expensive, a major challenge is to train these modules with the least amount of labeled data. Recently, large-scale pre-trained language models (PLMs) have shown promising results for few-shot learning in ToD. This presentation will introduce two recent works on this task from the Speech & Semantics Lab at Huawei Noah's Ark Lab. The first paper [1] devises a self-training approach that utilizes the abundant unlabeled dialog data, together with a new text augmentation technique (GradAug) that replaces non-crucial tokens using a masked language model. The second paper [2] proposes Comprehensive Instruction (CINS), which better exploits PLMs by adding task-specific instructions for few-shot learning on different ToD downstream tasks. Empirical results on multiple ToD downstream tasks show that both approaches consistently and notably outperform PLMs with standard fine-tuning.

Paper references:
[1] Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialog Systems, EMNLP 2021
[2] CINS: Comprehensive Instruction for Few-shot Learning in Task-oriented Dialog Systems, AAAI 2022
11:00 – 11:40
Rethinking Crowd Sourcing for Semantic Similarity
Shaul Solomon, Adam Cohn, Hernan Rosenblum, Hezi Hershkovitz and Ivan Yamshchikov
Estimation of semantic similarity is crucial for various natural language processing (NLP) tasks. In the absence of a general theory of semantic information, many papers rely on human annotators as the source of ground truth for semantic similarity estimation. This paper investigates the ambiguities inherent in crowdsourced semantic labeling. It shows that annotators who treat semantic similarity as a binary category (two sentences are either similar or not similar, with no middle ground) play the most critical role in the labeling. The paper offers heuristics to filter out unreliable annotators and stimulates further discussion of the human perception of semantics as a key to further developing human-centered artificial intelligence.
11:40 – 12:20
Interplay of Visual and Acoustic Cues of Irony Perception: a Case Study of Actor's Speech
Ulyana Kochetkova, Vera Evdokimova, Pavel Skrelin, Rada German and Daria Novoselova
This paper deals with the interaction of various cues, visual and acoustic, of ironic meaning observed in the speech of Russian professional actors in films and series. We selected ironic and non-ironic utterances from modern films and series based on the context, lexical and grammatical markers, or the whole storyline. We extracted video parts containing the target ironic and non-ironic utterances. Lexical and semantic markers of irony were excluded from the material. The video parts were then presented without sound to the participants of the irony recognition experiment. Another perceptual experiment was conducted with the audio material of the same utterances. Various acoustic and visual cues were considered in the target utterances in order to compare the perception and the production of irony. Segment duration, pitch movement, gestures and facial expressions, as well as the synchrony of those cues, were analyzed. The results of the experiments demonstrated that the visual cues were more important for irony perception than the audio signal. At the same time, some video stimuli that had low recognition of irony were better recognized in the experiment with audio. This led us to suppose that actors use visual and acoustic cues in varying proportions to express irony in speech.
12:20 – 13:00
WikiMulti: a Corpus for Cross-Lingual Summarization
Pavel Tikhonov
Cross-lingual summarization (CLS) is the task of producing a summary in one language for a source document in a different language. We introduce WikiMulti, a new dataset for cross-lingual summarization based on Wikipedia articles in 15 languages. As a set of baselines for further studies, we evaluate the performance of existing cross-lingual abstractive summarization methods on our dataset. We make our dataset publicly available.
13:00 – 14:00
Lunch
14:00 – 14:40
Inferring image background from text description
Dmitry Chizhikov, Valeria Efimova, Viacheslav Shalamov and Andrey Filchenkov
Over the past five years, text-to-image synthesis has achieved impressive results. However, the generated images are still produced in low resolution and suffer from indeterminacy. To improve this situation, we split the image synthesis task into separate parts, one of which is predicting the layout of the scene.
We believe that this can start with the background, which is usually determined by the location of the scene. Given a text, we can try to infer the location described or implied by it. In this paper, we propose two methods for obtaining this information from a given text. The first method, called LET (Location Extraction Transformer), is intended to extract the words in a text that directly mention the scene location. The second method, which we call LIT (Location Inference Transformer), is intended to infer a scene location that is implied by the text but not mentioned directly. We have also collected two datasets to train and test the corresponding models. These datasets are publicly available at kaggle.com. We have compared the performance of our algorithms with several existing approaches that might be used for extracting location information from text. The results obtained by both LET and LIT proved superior to those of the other algorithms.
14:40 – 15:20
Development of folklore motif classifier using limited data
Maria Matveeva, Novosibirsk State University
The existence of mythological universals (common or similar folklore images and motifs in different cultures) makes it possible to catalog them and present them in the form of classifications. Attributing folklore texts to certain motifs is part of the work of folklorists, but at the moment only manual annotation is possible. This paper proposes methods for developing a classifier of folklore motifs using the zero-shot approach, which makes it possible to train the classifier on a limited data set and also allows predicting the motif of any text, even if no text with that motif was present in the training set. Various ways of vectorizing texts and various models were tested. Evaluation of the classifiers' results allows us to assert that the developed classifier can match texts to motifs with sufficient accuracy.
15:20 – 16:00
Morphological and emotional features of the speech in children with typical development, autism spectrum disorders and Down syndrome
Olesia Makhnytkina, Olga Frolova and Elena Lyakso
The paper presents the results of automatic classification of the dialogues of children with typical and atypical development using machine learning methods. The study proposes an approach to developing an automatic system for classifying the state of children (typical development, autism spectrum disorders, and Down syndrome) based on the linguistic characteristics of speech. A total of 62 children aged 8-11 years were interviewed, including 20 children with typical development (TD), 28 with autism spectrum disorders (ASD), and 14 with Down syndrome (DS). The dataset contains 69 files with dialogues between an adult and a child. Only the children's responses were used for further analysis. Morphological, graphematic, and emotional characteristics of speech were extracted from the text of the dialogues. A total of 62 linguistic features were extracted from each dialogue: the number of utterances, sentences, tokens, pauses, and unfinished words; the relative frequencies of parts of speech and some grammatical categories (animacy, number, aspect, involvement, mood, person, tense); and statistics on the use of positive and negative words. The Mann-Whitney U test was used to assess differences in the linguistic features of the speech. Differences between boys with TD, ASD, and DS were found in 40 linguistic features of their speech. These features were used to develop classification models using machine learning methods: Gradient Boosting, Random Forest, and AdaBoost. The identified features showed good differentiating ability. The classification accuracy for the dialogues of boys with TD, ASD, and DS was 88%.
16:00 – 16:40
Russian Paraphrase Generation with Deep Reinforcement Learning
Nikita Martynov, Irina Krotova

We present a deep reinforcement learning approach to paraphrase generation for the Russian language. Our framework consists of a generator and an evaluator. We used a fine-tuned mT5-base model for paraphrase generation and, for evaluation, a RuBERT-based model trained on our RuPAWS dataset with adversarial paraphrase examples. Compared to other models for Russian paraphrase generation, our model achieves the best results in terms of LaBSE cosine distance and perplexity.
16:40 – 17:20
The Semantic Shifts of the Topical Structure in the Corpus of Lentach News Posts
Daria Axenova, Ivan Mamaev and Alena Mamaeva
Linguists are increasingly interested in analyzing text collections with the help of automatic procedures. They pay special attention to texts on social networks, as their language features differ from those of fiction texts or scientific articles. The paper is dedicated to the creation of dynamic topic models of Lentach news posts on the VK social network. We use a mixture of NLP libraries to identify the semantic shifts of the main topical sets since the end of 2018. The corpus contains more than 26,000 posts on various topics. In contemporary Russian linguistic papers, the collection of Lentach posts is rarely analyzed in terms of NLP. The results show that the main topics widely discussed in this community cover sports events, health issues, and protests.
17:20 – 18:20
Open Panel
Irina Pionkovskaya, Valentin Malykh, Tatiana Shavrina, et al.
Open discussion on the recent advances in NLP, especially on large language models.
Industrial Day
Friday, April 15
10:00 – 10:10
MTS Sponsor Greeting
Nikita Zelinsky, DS RnD Director, MTS Digital
Talks
10:00 – 10:50
Generating Music Using NLP
Ivan Bazanov, ML-developer, MTS Digital
10:50 – 11:20
Ontology-Based Question Answering over Corporate Structured Data
Konstantin Kondratiev
11:20 – 12:00
Salute, Turing: Turing test for Russian chat bots, 62 years after
Tatiana Shavrina
12:00 – 12:40
Creating the multipurpose Russian chit-chat for the Turing test
Alena Fenogenova
12:40 – 14:00
Lunch / Demo-session
TextToFace: neural network technologies for the speech and face of the speaker on video
Dmitry Botov
14:00 – 14:40
Medical Machine Translation Challenge
Maria Fjodorova, Elizaveta Ezhergina
14:40 – 15:20
Multilingual GPT-3: downstream task evaluation with seq2seq setup, few-shot and zero-shot
Oleh Shliazhko, Maria Tikhonova
15:20 – 16:00
Continuous prompt tuning for Russian: efficient solution for a variety of NLP tasks
Nikita Konodyuk, Maria Tikhonova
16:00 – 16:40
Named Entity Sentiment Analysis in Russian language
Margarita Tsobenko, Daria Rodionova
16:40 – 17:20
Topical Extractive Summarization
Kristina Zheltova, Anastasia Ianina and Valentin Malykh
17:20 – 18:00
Massively multilingual T5 for low-resource languages
Vitaly Protasov, Oleg Serikov
18:00 – 18:40
A thorny path to the creation of a summarizer for Russian
Albina Akhmetgareeva
18:40 – 19:00
Closing remarks
Valentin Malykh
Feel free to contact us at ainlevent@gmail.com