Learning Syntax from Naturally-Occurring Bracketings. (arXiv:2104.13933v1 [cs.CL])

Naturally-occurring bracketings, such as answer fragments to natural language
questions and hyperlinks on webpages, can reflect human syntactic intuition
regarding phrasal boundaries. Their availability and approximate correspondence
to syntax make them appealing as distant information sources to incorporate
into unsupervised constituency parsing. However, they are noisy and incomplete;
to address this challenge, we develop a partial-brackets-aware structured ramp
loss for learning. Experiments demonstrate that our distantly-supervised models
trained on naturally-occurring bracketing data are more accurate in inducing
syntactic structures than competing unsupervised systems. On the English WSJ
corpus, our models achieve an unlabeled F1 score of 68.9 for constituency
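
For readers unfamiliar with the metric, the following is a minimal sketch of how an unlabeled F1 score over bracketings can be computed, along with a check for whether two brackets cross (the usual way a predicted constituent conflicts with a partial bracketing). It is an illustration only, not the authors' evaluation code: the function names, the end-exclusive span convention, and the single-sentence scope are assumptions, and real evaluation setups differ in details such as how trivial spans and punctuation are handled.

```python
from typing import Iterable, Set, Tuple

# Assumption: a bracket is a (start, end) pair of token offsets, end exclusive.
Span = Tuple[int, int]


def unlabeled_f1(predicted: Iterable[Span], gold: Iterable[Span]) -> float:
    """Harmonic mean of bracket precision and recall, ignoring labels."""
    pred_set: Set[Span] = set(predicted)
    gold_set: Set[Span] = set(gold)
    matched = len(pred_set & gold_set)
    if matched == 0:
        # Covers empty inputs and the case of no shared brackets.
        return 0.0
    precision = matched / len(pred_set)
    recall = matched / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def crosses(a: Span, b: Span) -> bool:
    """True if the spans overlap without one containing the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]
```

For example, with predicted brackets {(0, 5), (0, 2), (2, 5)} and gold brackets {(0, 5), (0, 2), (3, 5)}, two of three brackets match on each side, so precision, recall, and F1 are all 2/3. A bracket such as (0, 3) crosses (2, 5), whereas nested or adjacent brackets do not cross.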



