We are pleased to announce a shared task on sentence paraphrase detection for the Russian language. The core dataset for the task is ParaPhraser, a freely available corpus of Russian sentence pairs manually annotated as paraphrases, near-paraphrases and non-paraphrases. The corpus is being developed at St. Petersburg State University as part of a project led by E. Yagunova. The corpus currently contains 7000 pairs, which will be used as the training set. The test set is currently being collected using the same crowdsourcing procedure and will become part of the general corpus after the shared task ends; we expect the test set to contain about 1000 pairs.
Description
The shared task will follow the standard procedure: each participating system will take a pair of sentences as input and return a similarity class as output. The shared task will consist of two sub-tasks: binary classification (paraphrase or non-paraphrase) and three-class classification (precise paraphrase, near-paraphrase or non-paraphrase). Participants may submit "standard" runs, which use only the ParaPhraser corpus as training data, and "non-standard" runs, which may use any other data. "Standard" and "non-standard" runs will be evaluated separately.
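To make the input and output of the two sub-tasks concrete, here is a minimal sketch of what a participating system's interface might look like. It is not an official baseline: the label names, the word-overlap (Jaccard) scoring, the thresholds, and the rule that both precise and near-paraphrases count as "paraphrase" in the binary sub-task are all assumptions made for illustration. The actual label encoding and data format are defined in the task details.

```python
import re

# Hypothetical label names; the real encoding is given in the task details.
PRECISE, NEAR, NON = "precise", "near", "non"


def tokens(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two sentences, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty sentences are treated as identical
    return len(ta & tb) / len(ta | tb)


def classify_three(a, b, hi=0.8, lo=0.4):
    """Three-class sub-task: thresholds hi and lo are illustrative only."""
    score = jaccard(a, b)
    if score >= hi:
        return PRECISE
    if score >= lo:
        return NEAR
    return NON


def classify_binary(a, b, lo=0.4):
    """Binary sub-task. Assumption: precise and near-paraphrases both
    count as 'paraphrase'; the source does not specify this mapping."""
    return classify_three(a, b, lo=lo) != NON


if __name__ == "__main__":
    pair = ("Кошка сидит на окне", "Кошка сидит на диване")
    print(classify_three(*pair), classify_binary(*pair))
```

A real submission would replace the overlap score with a trained model. A "standard" run would train that model on the ParaPhraser corpus only, while a "non-standard" run could use additional resources.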
More details on the tasks, evaluation and data formats are available here: click