In the field of empirical natural language processing, researchers constantly deal with large amounts of marked-up data; whether the markup is done by the researcher or someone else, human nature dictates that it will contain errors. This paper more fully characterises the problem and discusses whether, when, and how to correct the errors. The discussion is illustrated with specific examples involving function tagging in the Penn Treebank.
@inproceedings{blah02,
  author    = {Don Blaheta},
  title     = {Handling noisy training and testing data},
  booktitle = {Proceedings of the 7th conference on {E}mpirical {M}ethods in {N}atural {L}anguage {P}rocessing},
  month     = {July},
  year      = {2002},
  pages     = {111--116}
}