WomenHelp 2025: Academic Database

All data for both tasks will be available on Codabench.

Dataset

The data are divided into two main subsets: a training set (70%) and a test set (30%). The training set is further divided into training (80%) and development (20%) subsets.

https://www.codabench.org/competitions/14614/

The dataset contains 21K samples in total. The subsets were randomly selected while preserving the proportional distribution across classes.
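
As an illustration of the split described above, the following is a minimal sketch of a two-step stratified split in plain Python. It is not the organizers' procedure: the helper name stratified_split, the seeds, and the toy class counts are all assumptions made for the example. Applied to roughly 21K samples, the same percentages would give about 6,300 test samples and about 14,700 training samples, of which about 11,760 go to training and about 2,940 to development.

```python
import random
from collections import defaultdict


def stratified_split(labels, held_out_fraction, seed=0):
    """Split indices so each class contributes held_out_fraction of its items.

    Hypothetical helper: returns (kept_indices, held_out_indices).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept, held_out = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_held = round(len(indices) * held_out_fraction)
        held_out.extend(indices[:n_held])
        kept.extend(indices[n_held:])
    return sorted(kept), sorted(held_out)


# Toy data (assumed, not the real distribution): 1,000 samples, 4 classes.
toy_labels = [0] * 400 + [1] * 300 + [2] * 200 + [3] * 100

# Step 1: 70% training pool, 30% test.
pool_idx, test_idx = stratified_split(toy_labels, held_out_fraction=0.30)

# Step 2: split the pool into 80% training, 20% development.
pool_labels = [toy_labels[i] for i in pool_idx]
train_pos, dev_pos = stratified_split(pool_labels, held_out_fraction=0.20, seed=1)
train_idx = [pool_idx[i] for i in train_pos]
dev_idx = [pool_idx[i] for i in dev_pos]

print(len(train_idx), len(dev_idx), len(test_idx))
```

Splitting class by class is what keeps the class proportions the same in every subset. A library routine such as scikit-learn's train_test_split with its stratify argument would serve the same purpose.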

The classes are codified as follows:

Mild (0)
Medium (1)
High (2)
Severe (3)
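
One possible way to represent this coding in Python is an integer enumeration, sketched below. The class name Severity and the helper decode are hypothetical. Treating the codes as ordered from least to most severe is an assumption based on the class names.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity classes with the integer codes listed above."""
    MILD = 0
    MEDIUM = 1
    HIGH = 2
    SEVERE = 3


def decode(code: int) -> str:
    """Map an integer code back to its class name (hypothetical helper)."""
    return Severity(code).name.capitalize()


print(decode(2))                          # High
print(int(Severity.SEVERE))               # 3
print(Severity.SEVERE > Severity.MEDIUM)  # True, if the codes are read as ordinal
```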

Task 1: Classification problem focused on the degree or severity of gender-based violence

Due to the characteristics of the multi-class problem, a random sample of 10K instances was selected. In this case, the task addresses a multi-label classification problem, where a single observation may include more than one type of violence.

The labels are defined as follows:

L0: Economic
L1: Physical
L2: Property-related
L3: Psychological
L4: Sexual
L5: Vicarious
L6: N/A
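
Because one observation may carry several of these labels, a common representation is a binary vector with one position per label, L0 to L6. The sketch below illustrates that encoding. The function names are hypothetical, and the actual file format on Codabench may differ.

```python
# Label names in index order L0..L6, as listed above.
LABELS = [
    "Economic",
    "Physical",
    "Property-related",
    "Psychological",
    "Sexual",
    "Vicarious",
    "N/A",
]


def to_multi_hot(types):
    """Encode a collection of label names as a 7-element 0/1 vector."""
    unknown = set(types) - set(LABELS)
    if unknown:
        raise ValueError(f"Unknown label(s): {sorted(unknown)}")
    return [1 if name in types else 0 for name in LABELS]


def from_multi_hot(vector):
    """Decode a 0/1 vector back to the list of label names it marks."""
    return [name for name, flag in zip(LABELS, vector) if flag]


vec = to_multi_hot({"Physical", "Psychological"})
print(vec)                  # [0, 1, 0, 1, 0, 0, 0]
print(from_multi_hot(vec))  # ['Physical', 'Psychological']
```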

Task 2: Multiclass classification problem, where narratives are analyzed to identify the different types of violence present in each scenario

Email

info@womenhelp.com.mx
