ParaBank 2: pushing the limit of diversity with sampling and clustering

Should paraphrasing use the same generation pipeline as machine translation? The answer to that question has long been taken for granted. We suggest that paraphrasing is about the flexibility of human communication, while translation is about finding the one best elicitation. This inspired us to replace beam search with a novel “sample, and then cluster” pipeline when generating paraphrases.
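To make the “sample, and then cluster” idea concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: `sample_candidates` is a hypothetical stand-in for sampling from a translation model, and the word-overlap (Jaccard) similarity, the greedy clustering, and the 0.5 threshold are all illustrative assumptions. The point is only the shape of the pipeline: draw many candidates, group the near-duplicates, and keep one representative per group.

```python
import random


def sample_candidates(slot_options, n_samples, seed=0):
    """Hypothetical stand-in for sampling from a model: pick one option per slot."""
    rng = random.Random(seed)
    return [
        " ".join(rng.choice(options) for options in slot_options)
        for _ in range(n_samples)
    ]


def jaccard(a, b):
    """Word-set overlap between two sentences (an assumed similarity measure)."""
    words_a, words_b = set(a.split()), set(b.split())
    union = words_a | words_b
    if not union:
        return 1.0  # two empty sentences count as identical
    return len(words_a & words_b) / len(union)


def cluster(candidates, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []
    for candidate in candidates:
        for group in clusters:
            if jaccard(candidate, group[0]) >= threshold:
                group.append(candidate)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([candidate])
    return clusters


def diverse_paraphrases(candidates, threshold=0.5):
    """Keep one representative (the first member) from each cluster."""
    return [group[0] for group in cluster(candidates, threshold)]


if __name__ == "__main__":
    slots = [
        ["the cat", "a feline", "the kitty"],
        ["sat", "rested", "was sitting"],
        ["on the mat", "on the rug", "upon the carpet"],
    ]
    samples = sample_candidates(slots, n_samples=20)
    for paraphrase in diverse_paraphrases(samples):
        print(paraphrase)
```

Beam search tends to return candidates that differ in only a word or two; sampling widely and then collapsing near-duplicates is one way to trade a single best output for a spread of distinct ones.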

The paper is titled “Large-Scale, Diverse, Paraphrastic Bitexts via Sampling and Clustering,” and I presented it at CoNLL 2019.

Check it out on ACL Anthology: or here.