AINL-Eval-2025

AINL: Artificial Intelligence and Natural Language Conference

Introduction

In recent years, the development of large language models (LLMs) has revolutionized NLP. They are now used almost everywhere, which is undoubtedly great progress. Generated texts are close to human-level writing, making it increasingly difficult to distinguish AI-generated from human-written content. However, there are domains where LLM usage is undesirable, such as generating fake news or writing scientific papers. The latter problem is the main focus of this challenge. We aim to detect AI-generated abstracts of scientific papers in Russian, in order to facilitate research and the development of tools for this language.

Useful Links
Telegram chat: t.me/AINL_Eval_2025
Competition Platform: AINL-Eval 2025
Repo with baselines: github.com/iis-research-team/AINL-Eval-2025

Task Description

Determine whether a text is human-written or AI-generated; among AI-generated texts, also identify those produced by a model that is not included in the training data.

Dataset
The generated texts were obtained using GPT4-Turbo, Gemma2-27B, Llama3.3-70B and two other models. The models were prompted with the title and keywords from the paper. Additionally, we performed some post-processing by removing empty outputs and model-specific text beginnings, while preserving the main content of the abstracts as is.

The training set consists of ~35 000 texts from 10 different domains (around 4 000 texts per domain), where each text is labeled as one of the following: 'human', 'llama-3.3-70b', 'gemma-2-27b', 'gpt-4-turbo'.

The public test set includes ~11 000 texts from the same 10 domains with the same labels, and an additional ~2 000 texts generated by an unseen model (unknown to the participants).

The private test set contains ~6 000 texts from 10 different domains (around 600 texts per domain). Among these, only 8 domains overlap with the training set, while 2 domains are not present in the training data. This set includes texts written by humans, generated by the aforementioned models, and by another model unknown to the participants.

Thus, we invite participants to propose solutions to the following key challenges:

Handling data that extends beyond the training set (generalization to new domains).
Detecting texts generated by a model not included in the training data (generalization to new models).
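
Both challenges can be approximated locally by holding out one domain and one generator label from the training set and validating only on the held-out part. The following is a minimal sketch of that idea, not the organizers' protocol. The field names `domain` and `label`, the toy domain names, and the helper `holdout_split` are assumptions made for illustration; the label strings are the ones listed for the training set.

```python
def holdout_split(rows, held_out_domain, held_out_label):
    """Split rows so that one domain and one label never appear in train.

    Each row is a dict with hypothetical keys "text", "domain" and "label".
    """
    train, val = [], []
    for row in rows:
        unseen = (row["domain"] == held_out_domain
                  or row["label"] == held_out_label)
        # Anything touching the held-out domain or label goes to validation.
        (val if unseen else train).append(row)
    return train, val


# Toy rows; the domain names here are invented for the example.
rows = [
    {"text": "a", "domain": "physics", "label": "human"},
    {"text": "b", "domain": "physics", "label": "gpt-4-turbo"},
    {"text": "c", "domain": "history", "label": "human"},
    {"text": "d", "domain": "biology", "label": "gemma-2-27b"},
    {"text": "e", "domain": "biology", "label": "human"},
]

train, val = holdout_split(rows, "history", "gemma-2-27b")
print([r["text"] for r in train])  # ['a', 'b', 'e']
print([r["text"] for r in val])    # ['c', 'd']
```

Note that the actual test sets mix seen and unseen domains and models, so a validation set built this way is harder than the real one in that respect; it is only a rough proxy for generalization.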

Submission Format
The file submission.csv provides the submission format.

Baselines
We provide two baselines (see the repository):

TF-IDF + logistic regression (LogReg)
Fine-tuned BERT

Evaluation
Accuracy will be used to evaluate the solutions.
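
For reference, accuracy is the fraction of texts whose predicted label exactly matches the gold label. The sketch below shows that computation on toy data; the function name `accuracy` is illustrative, and the official scoring script on the competition platform may differ in details such as how the unseen model is labeled.

```python
def accuracy(gold, predicted):
    """Return the fraction of positions where predicted equals gold."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Toy example using the label names from the training set.
gold = ["human", "gpt-4-turbo", "gemma-2-27b", "llama-3.3-70b"]
predicted = ["human", "gpt-4-turbo", "human", "llama-3.3-70b"]
score = accuracy(gold, predicted)
print(score)  # 0.75
```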

Participation Rules
Final results will be computed on the private test set and announced during the AINL 2025 conference. Each team will have 3 attempts to submit final results. After the competition is over, we will publish the ground-truth labels for the test set so that participants can perform ablation studies.
The competition does not involve the use of external data; participants are encouraged to use only the provided training set.

Important Dates

March 3: Competition starts
March 5: Dev Phase is open
March 25: Test Phase is open
April 1: Test Phase is closed
April 18: Result announcement
May 5: Paper submission

Shared Task Organizers

Tatiana Batura, Ershov Institute of Informatics Systems SB RAS
Elena Bruches, Ershov Institute of Informatics Systems SB RAS and Novosibirsk State University
Milana Shvenk, Novosibirsk State University
Valentin Malykh, Moscow Institute of Physics and Technology
