VERB SENSE ANNOTATION PROJECT
University of Colorado at Boulder
Disambiguating Word Meanings
Word sense ambiguity is a continuing major obstacle to accurate information extraction, summarization and machine translation. Subtle fine-grained sense distinctions in WordNet have not lent themselves to high agreement between human annotators or to high automatic tagging performance. Building on results in grouping fine-grained WordNet senses into more coarse-grained senses, which led to improved inter-annotator agreement (ITA) and system performance (Palmer, Babko-Malaya, Dang, 2004; Palmer, Dang, Fellbaum, 2006), we have developed a process for rapid sense inventory creation and annotation that includes critical links between the grouped word senses and the Omega ontology. As part of OntoNotes we are annotating the most frequent noun and verb senses in a 300K subset of the PropBank, and will have this data available for release in the early fall of 2006.
The senses are grouped by linguists using explicit syntactic and semantic criteria, which are included in the presentation of the sense groupings for the annotators. A 50-sentence sample of instances is annotated and immediately checked for inter-annotator agreement. ITA scores below 90% lead to a revision and clarification of the groupings by the linguist. Only after the groupings have passed the ITA hurdle is each individual group linked to a conceptual node in the ontology. In addition to higher accuracy, we find at least a three-fold increase in annotator productivity. The grouping for the 5 WordNet 2.1 senses of adjust is given below.
We are also finding a corresponding improvement in system performance. Preliminary results on a subset of the newly annotated data achieve 80.7% accuracy for verbs, using the smoothed maximum entropy (MaxEnt) model from Mallet (McCallum, 2002).
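The agreement check on the 50-sentence sample can be sketched as follows. This is a minimal illustration only: it assumes ITA is measured as simple percent agreement between two annotators, which the text does not specify, and the function names and sense-group labels (g1, g2, g3) are hypothetical. Only the 90% threshold and the sample size come from the description above.

```python
from typing import Sequence

# Threshold stated in the project description: below 90% triggers revision.
ITA_THRESHOLD = 0.90


def percent_agreement(tags_a: Sequence[str], tags_b: Sequence[str]) -> float:
    """Fraction of instances on which two annotators chose the same sense group.

    Assumption: simple percent agreement; the project may use another measure.
    """
    if len(tags_a) != len(tags_b):
        raise ValueError("both annotators must tag the same instances")
    if not tags_a:
        raise ValueError("sample is empty")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


def needs_revision(tags_a: Sequence[str], tags_b: Sequence[str],
                   threshold: float = ITA_THRESHOLD) -> bool:
    """True if the groupings should go back to the linguist for clarification."""
    return percent_agreement(tags_a, tags_b) < threshold


# Hypothetical 50-instance sample: the annotators disagree on 6 instances.
annotator_a = ["g1"] * 30 + ["g2"] * 20
annotator_b = ["g1"] * 30 + ["g2"] * 14 + ["g3"] * 6

ita = percent_agreement(annotator_a, annotator_b)
print(f"ITA = {ita:.0%}, revise groupings: {needs_revision(annotator_a, annotator_b)}")
```

In this made-up sample the annotators agree on 44 of 50 instances (88%), so the groupings would be returned for revision before any link to the ontology is made.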
The features used for disambiguating the verb senses included topical features, collocation features, syntactic features (e.g., the subject, object, and prepositional phrases taken by a target verb), and semantic features (e.g., the WordNet synsets and hypernyms of the head nouns of a verb’s NP arguments). This system achieves state-of-the-art performance on fine-grained senses as well, but the results are more than 10% lower (Chen & Palmer, 2005).
AGILE Team
Martha Palmer, Linguistics and Computer Science, University of Colorado
Mitch Marcus, Computer Science, University of Pennsylvania
Eduard Hovy, ISI, University of Southern California
Acknowledgements
We gratefully acknowledge the support of the National Science Foundation Grant NSF-0415923, Word Sense Disambiguation; the DTO-AQUAINT NBCHC-040036 grant under the University of Illinois subcontract to the University of Pennsylvania 2003-07911-01; and the GALE program of the Defense Advanced Research Projects Agency, Contract No. HR0011-06-C-0022, a subcontract on the BBN-AGILE Team.
Disclaimer
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation, the DTO, or DARPA.