Collocation is a linguistic phenomenon that is difficult to define and harder to explain; it has been largely overlooked in the field of computational linguistics due to its difficulty. Although standard techniques exist for finding collocations, they tend to be rather noisy and suffer from sparse data problems. In this paper, we demonstrate that by utilising parsed input to concentrate on one very specific type of collocation---in this case, verbs with particles, a subset of the so-called ``multi-word'' verbs---and applying an algorithm to promote those collocations in which we have more confidence, the problems with statistically learning collocations can be overcome.
@inproceedings{blah01,
  author    = {Don Blaheta and Mark Johnson},
  year      = 2001,
  title     = {Unsupervised learning of multi-word verbs},
  booktitle = {{ACL} Workshop on Collocation},
  pages     = {54--60}
}