
Enhancing Rhetorical Role Labeling with Training-Time Neighborhood Learning

by Instancing
April 2nd, 2025
Too Long; Didn't Read

This section shows that contrastive learning, discourse-aware loss, and multi-prototype methods improve rhetorical role labeling by enhancing embeddings and addressing class imbalance, particularly in low-data settings.

STORY’S CREDIBILITY

Academic Research Paper

Part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.

Abstract and 1. Introduction

  2. Related Work

  3. Task, Datasets, Baseline

  4. RQ 1: Leveraging the Neighbourhood at Inference

    4.1. Methods

    4.2. Experiments

  5. RQ 2: Leveraging the Neighbourhood at Training

    5.1. Methods

    5.2. Experiments

  6. RQ 3: Cross-Domain Generalizability

  7. Conclusion

  8. Limitations

  9. Ethics Statement

  10. Bibliographical References

5.2. Experiments

5.2.1. Implementation Details


We use the same training setup as described in Sec. 4.2.1. We conduct a grid search over the memory bank size per label and the number of prototypes in multi-prototypical learning, in powers of 2 within [32, 512] and [4, 256] respectively, selecting the best configuration by validation set performance.
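The grid search above can be sketched as follows. The `evaluate` function is a hypothetical stand-in for training a model with a given configuration and returning its validation macro-F1; only the power-of-2 sweep ranges come from the text.

```python
from itertools import product

# Sweep memory-bank size per label over [32, 512] and the number of
# prototypes over [4, 256], both in powers of 2, as described above.
mem_bank_sizes = [2 ** k for k in range(5, 10)]   # 32, 64, 128, 256, 512
num_prototypes = [2 ** k for k in range(2, 9)]    # 4, 8, 16, 32, 64, 128, 256

def evaluate(mem_size: int, n_proto: int) -> float:
    # Hypothetical placeholder: train with this configuration and
    # return the validation macro-F1 score.
    return 0.0

# Pick the configuration with the best validation score.
best_mem, best_proto = max(product(mem_bank_sizes, num_prototypes),
                           key=lambda cfg: evaluate(*cfg))
```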


5.2.2. Results


Table 2 shows that incorporating contrastive loss improves performance across all datasets. Furthermore, the discourse-aware contrastive loss, which leverages relative position to organize embeddings, enhances performance, supporting our hypothesis that sentences with the same label and in close proximity in the document should be closer in the embedding space. Augmenting the contrastive loss with a memory bank further enhances performance, particularly in macro-F1, benefiting sparse classes. However, the improvement is smaller or negative in the discourse-aware variant. This may be due to the positional factor: additional sentences retrieved from the memory bank are treated as if placed at the end of the document, leading to smaller penalization factors and contributing only marginally to the loss. Overall, the discourse-aware contrastive model emerges as the most effective among the contrastive variants.


Table 2: Results on four datasets for methods leveraging the neighbourhood during training (RQ2). Contr., Disc., MB, and Proto. indicate Contrastive, Discourse-aware, Memory Bank, and Prototypical, respectively.


Figure 2: t-SNE visualizations of different models on the M-CL dataset. Disc.: Discourse, Contr.: Contrastive. Head, torso, and tail in the Disc.-aware Contr. plot indicate the relative position of the sentence in a document.
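As an illustration only (not the authors' exact formulation), a discourse-aware supervised contrastive loss of this kind can be sketched in NumPy: same-label pairs act as positives, and each positive pair's contribution is scaled by a penalization factor that shrinks with positional distance in the document.

```python
import numpy as np

def discourse_aware_contrastive_loss(emb, labels, positions, tau=0.1):
    """Hypothetical sketch: same-label pairs are positives; pairs that
    are close in the document receive a larger penalization factor, so
    the loss pulls them closer than distant same-label pairs."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T / tau                        # cosine similarity / temperature
    n = len(labels)
    eye = np.eye(n, dtype=bool)
    pos_mask = (labels[:, None] == labels[None, :]) & ~eye
    dist = np.abs(positions[:, None] - positions[None, :]).astype(float)
    weight = 1.0 / (1.0 + dist)                    # closer pairs weigh more
    sim_no_self = np.where(eye, -np.inf, sim)      # exclude self from denominator
    log_prob = sim - np.log(np.exp(sim_no_self).sum(axis=1, keepdims=True))
    w = weight * pos_mask
    per_anchor = -(w * log_prob).sum(axis=1) / np.maximum(w.sum(axis=1), 1e-8)
    return per_anchor.mean()
```

A memory bank could extend this by concatenating stored embeddings from earlier batches; as noted in the running text, such retrieved sentences need a positional convention (e.g. end-of-document), which dampens their weight under this scheme.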


The single prototypical variant performs comparably to the best contrastive variant and outperforms the baseline. This demonstrates that specific guiding points through prototypes can effectively aggregate knowledge from neighboring instances. Moreover, multiple prototypes further improve performance, highlighting the need to capture multifaceted nuances. These results suggest that the addition of respective losses can eliminate the need to design specific memory banks to expose the model to large batches for effective guidance from neighbors in contrastive learning.
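A minimal sketch of multi-prototypical learning, under assumptions of our own (this is not the paper's exact loss): each class owns K prototype vectors, and each sentence maximizes the softmax probability of its nearest same-class prototype against all prototypes.

```python
import numpy as np

def multi_prototype_loss(emb, labels, prototypes, tau=1.0):
    """Hypothetical sketch. `prototypes` maps each class id to a (K, d)
    array of prototype vectors; each sample is attracted to its most
    similar own-class prototype and repelled from the rest."""
    all_protos = np.concatenate(list(prototypes.values()))          # (C*K, d)
    proto_labels = np.concatenate(
        [np.full(len(p), c) for c, p in prototypes.items()])
    losses = []
    for x, y in zip(emb, labels):
        sims = all_protos @ x / tau
        log_denom = np.log(np.exp(sims).sum())
        own_best = sims[proto_labels == y].max()   # nearest own-class prototype
        losses.append(log_denom - own_best)        # negative log softmax prob
    return float(np.mean(losses))
```

Single prototypical learning is the K = 1 special case, where each class is summarized by one guiding point.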


Finally, combining the discourse-aware contrastive variant with both single and multiple prototype variants yields further improvement, highlighting the complementarity between these approaches. These results suggest that deriving supervisory signals from interactions among training instances can be an effective strategy for addressing the class imbalance problem, particularly in low-data settings.


Qualitative Analysis: To examine the impact of our auxiliary loss functions on the learned representations, we employ t-SNE (Hinton and Roweis, 2002) to project the model's high-dimensional hidden states, shown in Fig. 2. With contrastive learning, we observe that sentences with the same label form distinct clusters. With the addition of the discourse-aware contrastive loss, samples sharing a label within a document also adhere to the positional constraint, aligning with our hypothesis that samples sharing a label and closer in the discourse sequence should be positioned closer in the embedding space than those farther apart. In single prototypical learning, prototypes occupy the centers of the corresponding sentences, forming distinctive manifolds. Similarly, multi-prototypical learning captures multifaceted aspects with prototypes dispersed across the embedding space, each prototype serving as the center for its respective samples. These visualizations affirm the effectiveness of our learning methods.


Table 3: Macro-F1 scores of our methods across three datasets. The 'train' column indicates the source dataset on which the model is trained, and each dataset column indicates the target test dataset. Scores in grey indicate in-domain performance (trained and tested on the same dataset). {DC, Disc. Contr.}: Discourse-aware contrastive; {Pr., Proto.}: Prototypical.
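The projection itself follows the standard scikit-learn t-SNE API; the sentence embeddings and labels below are synthetic stand-ins for the model's hidden states, used only to illustrate the procedure.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for sentence hidden states: three label clusters in 16-d space.
hidden = np.concatenate(
    [rng.normal(loc=c, size=(20, 16)) for c in (0.0, 3.0, 6.0)])
labels = np.repeat([0, 1, 2], 20)

# Project to 2-D; perplexity must be smaller than the number of samples.
coords = TSNE(n_components=2, perplexity=10, init="random",
              random_state=0).fit_transform(hidden)
# `coords` can then be scatter-plotted, colored by rhetorical role label,
# to produce plots like those in Fig. 2.
```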


Authors:

(1) Santosh T.Y.S.S, School of Computation, Information, and Technology; Technical University of Munich, Germany (santosh.tokala@tum.de);

(2) Hassan Sarwat, School of Computation, Information, and Technology; Technical University of Munich, Germany (hassan.sarwat@tum.de);

(3) Ahmed Abdou, School of Computation, Information, and Technology; Technical University of Munich, Germany (ahmed.abdou@tum.de);

(4) Matthias Grabmair, School of Computation, Information, and Technology; Technical University of Munich, Germany (matthias.grabmair@tum.de).


This paper is available on arXiv under a CC BY 4.0 Deed (Attribution 4.0 International) license.

