All data for both the first and the second tasks will be available on Codabench.
Dataset
The dataset contains 21K samples in total, divided into two main subsets: a training set (70%) and a testing set (30%). The training set is further divided into training (80%) and development (20%) subsets.
All subsets were randomly selected while preserving the proportional distribution across classes.
The severity classes are coded as follows:
Mild (0)
Medium (1)
High (2)
Severe (3)
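The split described above can be sketched as follows. This is only an illustration in plain Python, not the organizers' procedure: the helper `stratified_split`, the seeds, and the toy class balance are assumptions made up for the example. It applies a 70/30 split and then an 80/20 split, each performed per class so that class proportions are preserved.

```python
import random
from collections import defaultdict

# Class coding taken from the list above.
SEVERITY = {0: "Mild", 1: "Medium", 2: "High", 3: "Severe"}


def stratified_split(labels, held_fraction, seed=0):
    """Split indices class by class; return (kept_idx, held_idx)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    kept, held = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_held = round(len(members) * held_fraction)
        held.extend(members[:n_held])
        kept.extend(members[n_held:])
    return sorted(kept), sorted(held)


# Hypothetical labels: 21,000 samples with an invented class balance.
labels = [0] * 8400 + [1] * 6300 + [2] * 4200 + [3] * 2100

# Step 1: 70% training / 30% testing.
train_full_idx, test_idx = stratified_split(labels, held_fraction=0.30)

# Step 2: within the training set, 80% training / 20% development.
train_full_labels = [labels[i] for i in train_full_idx]
train_pos, dev_pos = stratified_split(train_full_labels, held_fraction=0.20, seed=1)
train_idx = [train_full_idx[i] for i in train_pos]
dev_idx = [train_full_idx[i] for i in dev_pos]

print(len(train_idx), len(dev_idx), len(test_idx))  # 11760 2940 6300

# Per-class counts in the test set, using the class names above.
for code, name in SEVERITY.items():
    print(name, sum(1 for i in test_idx if labels[i] == code))
```

With 21K samples, the nested percentages correspond to roughly 56% training, 14% development, and 30% testing of the whole dataset.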
Task 1: Classification problem focused on the degree or severity of gender-based violence
Due to the characteristics of the multi-class problem, a random sample of 10K instances was selected. Here the task is a multi-label classification problem: a single observation may include more than one type of violence.
The labels are defined as follows:
L0: Economic
L1: Physical
L2: Property-related
L3: Psychological
L4: Sexual
L5: Vicarious
L6: N/A
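Because an observation may carry several of these labels at once, one common way to represent it is a multi-hot vector with one position per label. The sketch below assumes that representation; the actual file and submission format on Codabench is not stated here, and the helper names `to_multi_hot` and `from_multi_hot` are hypothetical.

```python
# Label names in the order L0..L6 given above.
LABELS = [
    "Economic",
    "Physical",
    "Property-related",
    "Psychological",
    "Sexual",
    "Vicarious",
    "N/A",
]


def to_multi_hot(active):
    """Encode a collection of label indices (0..6) as a 7-element 0/1 vector."""
    vector = [0] * len(LABELS)
    for idx in active:
        if not 0 <= idx < len(LABELS):
            raise ValueError(f"unknown label index: {idx}")
        vector[idx] = 1
    return vector


def from_multi_hot(vector):
    """Return the names of the labels whose position is set to 1."""
    return [LABELS[i] for i, flag in enumerate(vector) if flag]


# An observation annotated with both L1 (Physical) and L3 (Psychological).
example = to_multi_hot([1, 3])
print(example)                  # [0, 1, 0, 1, 0, 0, 0]
print(from_multi_hot(example))  # ['Physical', 'Psychological']
```

Whether L6 (N/A) can co-occur with other labels is not specified above, so the sketch does not enforce any exclusivity rule.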


