Determine whether a text is human-written or AI-generated; if it is AI-generated, also identify texts generated by a model that is not included in the training data.
Dataset
The generated texts were obtained using GPT4-Turbo, Gemma2-27B, Llama3.3-70B and two other models. The models were prompted with the title and keywords of the corresponding paper. We also post-processed the outputs by removing empty generations and model-specific text beginnings, while preserving the main content of the abstracts as is.
The training set consists of ~35 000 texts from 10 different domains (around 4 000 texts per domain), where each text is labeled as one of the following: 'human', 'llama-3.3-70b', 'gemma-2-27b', 'gpt-4-turbo'.
The public test set includes ~11 000 texts from the same 10 domains with the same labels, and an additional ~2 000 texts generated by an unseen model (unknown to the participants).
The private test set contains ~6 000 texts from 10 different domains (around 600 texts per domain). Among these, only 8 domains overlap with the training set, while 2 domains are not present in the training data. This set includes texts written by humans, generated by the aforementioned models, and by another model unknown to the participants.
Thus, we invite participants to propose solutions to the following key challenges:
- Handling data that extends beyond the training set (generalization to new domains).
- Detecting texts generated by a model not included in the training data (generalization to new models).
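One common way to approach the second challenge is to reject low-confidence predictions instead of forcing every text into a known class. The sketch below is only an illustration, not the organizers' method: the label name `unknown`, the function `predict_with_rejection`, and the threshold value 0.6 are all assumptions, and the threshold would need to be tuned on held-out data.

```python
# Known labels come from the training set description. UNKNOWN_LABEL is a
# hypothetical placeholder: the task description does not name the label
# used for texts produced by an unseen model.
KNOWN_LABELS = ["human", "llama-3.3-70b", "gemma-2-27b", "gpt-4-turbo"]
UNKNOWN_LABEL = "unknown"


def predict_with_rejection(probabilities, threshold=0.6):
    """Return the most probable known label, or UNKNOWN_LABEL if confidence is low.

    probabilities: dict mapping each known label to a predicted probability.
    threshold: assumed cut-off below which the prediction is rejected.
    """
    best_label = max(KNOWN_LABELS, key=lambda label: probabilities.get(label, 0.0))
    if probabilities.get(best_label, 0.0) < threshold:
        return UNKNOWN_LABEL
    return best_label


# Invented probability vectors, for illustration only.
confident = {"human": 0.9, "llama-3.3-70b": 0.05, "gemma-2-27b": 0.03, "gpt-4-turbo": 0.02}
uncertain = {"human": 0.1, "llama-3.3-70b": 0.35, "gemma-2-27b": 0.3, "gpt-4-turbo": 0.25}

print(predict_with_rejection(confident))
print(predict_with_rejection(uncertain))
```

Maximum-probability thresholding is the simplest rejection rule; it may also reject hard examples from known classes, so the trade-off should be checked on validation data.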
Submission Format
The file submission.csv provides the submission format.
Baselines
We provide two baselines (see the repository):
- TF-IDF + LogReg
- fine-tuning BERT
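For orientation, a TF-IDF + logistic regression classifier can be sketched with scikit-learn as follows. This is not the code from the repository: the toy texts, the variable names, and the hyperparameters (`ngram_range`, `sublinear_tf`, `max_iter`) are assumptions chosen for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy examples; the real training set has ~35 000 labeled texts.
train_texts = [
    "We study the spectral properties of random graphs.",
    "This paper presents a novel framework that leverages deep learning.",
    "In this work, we propose a comprehensive approach to data analysis.",
    "This study explores the multifaceted implications of the method.",
]
train_labels = ["human", "llama-3.3-70b", "gemma-2-27b", "gpt-4-turbo"]

# Word uni- and bigram TF-IDF features fed into a multiclass logistic regression.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

test_texts = [
    "We propose a novel framework for graph analysis.",
    "The spectral gap of random graphs is studied.",
]
predictions = model.predict(test_texts)
print(list(predictions))
```

A pipeline keeps the vectorizer and the classifier together, so the same fitted vocabulary is applied to the test texts.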
Evaluation
Accuracy will be used to evaluate the solutions.
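Accuracy is the share of texts whose predicted label matches the true label. A minimal sketch of the computation, with invented labels and a hypothetical helper name `accuracy`:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Invented labels: three of the four predictions are correct.
y_true = ["human", "gpt-4-turbo", "gemma-2-27b", "human"]
y_pred = ["human", "gpt-4-turbo", "llama-3.3-70b", "human"]
print(accuracy(y_true, y_pred))  # 0.75
```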
Participation Rules
Final results will be obtained on the private test set and announced during the AINL 2025 conference. Each team will have 3 attempts to submit final results. After the competition is over, we will publish the ground-truth labels for the test set so that participants can perform ablation studies.
The competition does not involve the use of external data, so the participants are encouraged to use only the given training set.