Don Blaheta's PhD thesis, ``Function tagging''.


Function tags are a context-sensitive annotation applied to words and phrases of natural language text, marking their syntactic or semantic role within a larger utterance. As researchers improve results on various other problems in ``pure'' natural language processing (e.g. part-of-speech tagging, parsing), those who work in the more ``applied'' NLP fields (e.g. question-answering, temporal analysis) are seeking more powerful sorts of linguistic annotation as input for their own systems. Hence, function tags.

In the first part of the thesis, I present the problem of function tagging: why it is an interesting problem, who has worked on similar problems, and what exactly I intend to do. I briefly review the function tags of the Penn Treebank, and explain the specific metrics by which I will evaluate my work.
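As a concrete illustration of what a function tag looks like: in the Penn Treebank, function tags are written as hyphenated suffixes on a constituent label, so that NP-SBJ marks a noun phrase acting as subject and PP-TMP marks a temporal prepositional phrase. The following is a minimal sketch of separating the syntactic category from its function tags. It is not code from the thesis; the helper name `split_label` is hypothetical, and the handling of numeric indices is a simplifying assumption.

```python
def split_label(label):
    """Split a Penn Treebank-style node label into (category, function_tags).

    Hypothetical helper for illustration only.
    """
    # Labels such as -NONE- or -LRB- begin with a hyphen and carry no function tags.
    if label.startswith("-"):
        return label, []
    # Drop any "=N" suffix, then split the rest on hyphens.
    parts = label.split("=")[0].split("-")
    category = parts[0]
    # Purely numeric pieces are coindexation indices, not function tags.
    tags = [p for p in parts[1:] if not p.isdigit()]
    return category, tags


if __name__ == "__main__":
    for label in ["NP-SBJ", "PP-LOC-CLR", "NP-SBJ-1", "VP", "-NONE-"]:
        print(label, split_label(label))
```

A function tagger, in this framing, is a system that starts from the bare category (and its context in the parse tree) and predicts the list of tags.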

In the second part of the thesis, I introduce the many features that I will use to train a function tagging system, and then I present some systems that make use of them: one using feature trees, one using decision trees (briefly), and one using perceptron models. For each system, I give a brief historical perspective: an overview of where it has been used before and why I think it will be useful in this task. I then try a number of feature combinations with interesting properties, and finally present the best-performing, tuned version of that system.

Finally, in the third part of the thesis, I bring these systems together and discuss the advantages and disadvantages of each in various situations. More interestingly, I present an analysis of which features prove to be the most helpful for the different function tagging subtasks. Lastly, I present a comparison to other systems performing related tasks, and speculate on some interesting future work.

Full text

BibTeX entry

  author = {Don Blaheta},
  title = {Function Tagging},
  school = {Brown University},
  year = 2004}

Talk slides
