All data for both the first and the second tasks will be available on Codabench.

Dataset

The data are divided into two main subsets: a training set (70%) and a testing set (30%). The training set is further divided into training (80%) and development (20%) subsets.

The dataset contains 21K samples in total. The subsets were selected at random while preserving the class proportions (stratified sampling).
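The split sizes implied by these percentages can be worked out with a short sketch. It assumes the percentages apply to the full 21K samples, which gives 14,700 training and 6,300 testing samples, and then 11,760 training and 2,940 development samples. The organizers' actual splitting procedure is not described; the function `stratified_split`, the seeds, and the evenly balanced toy labels below are hypothetical and serve only to illustrate a proportion-preserving random split.

```python
import random
from collections import defaultdict


def stratified_split(labels, fraction, seed=0):
    """Split positions of `labels` into (selected, rest).

    Roughly `fraction` of each class goes to `selected`, so both parts
    keep the class proportions of the input.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    selected, rest = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * fraction)
        selected.extend(indices[:cut])
        rest.extend(indices[cut:])
    return selected, rest


# Toy stand-in for the real data: 21,000 samples, four balanced classes.
# The real class distribution is NOT known from the task description.
labels = [i % 4 for i in range(21_000)]

# Step 1: 70% training / 30% testing.
train_full, test = stratified_split(labels, 0.70, seed=0)

# Step 2: split the training part again, 80% training / 20% development.
train_full_labels = [labels[i] for i in train_full]
train_pos, dev_pos = stratified_split(train_full_labels, 0.80, seed=1)
train = [train_full[p] for p in train_pos]
dev = [train_full[p] for p in dev_pos]

print(len(train), len(dev), len(test))
```

Relative to the whole dataset, this corresponds to 56% training, 14% development, and 30% testing.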

The classes are codified as follows:

  • Mild (0)

  • Medium (1)

  • High (2)

  • Severe (3)
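As a small illustration, the class codes listed above could be held in a lookup table like the one below. The names `SEVERITY_CODES`, `CODE_TO_SEVERITY`, and `encode_severity` are hypothetical and are not part of any official starter kit.

```python
# Class codes as listed in the task description.
SEVERITY_CODES = {"Mild": 0, "Medium": 1, "High": 2, "Severe": 3}

# Reverse lookup, for turning predicted codes back into class names.
CODE_TO_SEVERITY = {code: name for name, code in SEVERITY_CODES.items()}


def encode_severity(name):
    """Return the integer code for a class name, or raise on unknown names."""
    if name not in SEVERITY_CODES:
        raise ValueError(f"unknown severity class: {name!r}")
    return SEVERITY_CODES[name]


print(encode_severity("High"))
print(CODE_TO_SEVERITY[3])
```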

Task 1: Classification problem focused on the degree or severity of gender-based violence

Due to the characteristics of the multi-class problem, a random sample of 10K instances was selected. This task is a multi-label classification problem: a single observation may include more than one type of violence.

The labels are defined as follows:

  • L0: Economic

  • L1: Physical

  • L2: Property-related

  • L3: Psychological

  • L4: Sexual

  • L5: Vicarious

  • L6: N/A
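Because one observation may carry several of the labels above, a common way to represent the target is a multi-hot vector with one position per label (L0 to L6). The sketch below is only an illustration under that assumption: the submission format is not specified here, the helper names `to_multi_hot` and `from_multi_hot` are hypothetical, and how L6 (N/A) interacts with the other labels is not stated in the description.

```python
# Label order follows the list above: position i corresponds to label Li.
LABELS = [
    "Economic",          # L0
    "Physical",          # L1
    "Property-related",  # L2
    "Psychological",     # L3
    "Sexual",            # L4
    "Vicarious",         # L5
    "N/A",               # L6
]


def to_multi_hot(types):
    """Encode a collection of label names as a 0/1 vector of length 7."""
    types = set(types)
    unknown = types - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in types else 0 for name in LABELS]


def from_multi_hot(vector):
    """Decode a 0/1 vector back into the list of label names it marks."""
    if len(vector) != len(LABELS):
        raise ValueError("vector length must match the number of labels")
    return [name for name, flag in zip(LABELS, vector) if flag]


# One narrative that involves both physical and psychological violence.
vector = to_multi_hot({"Physical", "Psychological"})
print(vector)                  # [0, 1, 0, 1, 0, 0, 0]
print(from_multi_hot(vector))  # ['Physical', 'Psychological']
```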

Task 2: Multi-label classification problem, where narratives are analyzed to identify the different types of violence present in each scenario