Evaluation metrics.
Systems participating in all subtasks will be ranked using the macro-F1 score. For the multilabel task, additional evaluation metrics include micro-F1, example-based F1, and Hamming loss. Macro-averaged metrics are computed by evaluating each label independently and averaging across labels, while micro-averaged metrics aggregate contributions across all labels. Hamming loss is reported to quantify the proportion of incorrectly predicted labels.
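As an illustration only (not the official scoring script), the sketch below shows how these metrics can be computed with scikit-learn for the multilabel task, assuming gold labels and predictions are given as binary indicator matrices y_true and y_pred of shape (n_examples, n_labels).

import numpy as np
from sklearn.metrics import f1_score, hamming_loss

# Toy indicator matrices: rows are examples, columns are labels.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0]])

macro_f1 = f1_score(y_true, y_pred, average="macro")      # F1 per label, averaged over labels
micro_f1 = f1_score(y_true, y_pred, average="micro")      # F1 over all pooled label decisions
example_f1 = f1_score(y_true, y_pred, average="samples")  # example-based F1, averaged over instances
h_loss = hamming_loss(y_true, y_pred)                     # fraction of incorrectly predicted labels

print(macro_f1, micro_f1, example_f1, h_loss)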
Final submission format
All submissions must be in ZIP format. Inside the ZIP, the predictions must be in CSV format, with the IDs in the same order as in the downloaded data.
For both tasks 1 and 2 the format is the same. For example:
ID,label
0000,low
0001,physical
The name of the CSV for task 1 must be task1_predictions.csv.
The name of the CSV for task 2 must be task2_predictions.csv.
Submit a single ZIP file with a generic name, for example predictions.zip. If you participate in only one task, the ZIP file will contain a single CSV file; if you participate in both tasks, include both prediction CSV files in the same ZIP file (see the sketch below).
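As a minimal sketch of packaging a submission: the prediction values and the single-task assumption below are placeholders, while the file names follow the instructions above.

import zipfile
import pandas as pd

# Placeholder predictions; keep the IDs in the same order as the downloaded data.
preds = pd.DataFrame({"ID": ["0000", "0001"], "label": ["low", "physical"]})

# One CSV per task you participate in, named as required.
preds.to_csv("task1_predictions.csv", index=False)

# Pack the CSV file(s) into a single ZIP with a generic name.
with zipfile.ZipFile("predictions.zip", "w") as zf:
    zf.write("task1_predictions.csv")
    # zf.write("task2_predictions.csv")  # add this line if you also participate in task 2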