The Hindi-Urdu PropBank Project
The development of computational resources will provide a major boost to
Natural Language Processing applications for Hindi-Urdu. The Hindi-Urdu
PropBank project is part of an effort to create a large-scale resource that
can be used for training machine learning algorithms and for carrying out
corpus linguistics studies. It belongs to a collaborative effort to build a
'Multi-representational and multi-layered Treebank for Hindi-Urdu', a
pioneering project for a language that has been resource-poor from the
computational point of view.
The PropBank project will result in an annotated corpus of 400K words for
Hindi and 200K words for Urdu. The predicate-argument structure will be
annotated for each verb in the corpus, and a detailed lexicon of Hindi verb
frames will also be created. The annotation of predicate-argument
structure will be preceded by a thorough analysis of linguistic phenomena
such as complex predicates. The guidelines we develop will combine
a theoretical understanding of Hindi grammar with computational
requirements.
Description of the project:
- Task: Building a PropBanked Hindi-Urdu corpus. The data consists of
Hindi newswire articles that have been treebanked using dependency
relations. The Hindi-Urdu PropBank project is part of a larger effort to
build a multi-layered and multi-representational resource for Hindi.
- Project status: The project is currently creating PropBank
annotation for 150K words from the Hindi Dependency Treebank.
Project Wiki:
The Hindi PropBank project shares a wiki page with the partner research teams
at the University of Washington, Columbia University, the University of
Massachusetts Amherst and IIT Hyderabad, India.
Frameset Files:
The Hindi PropBank frameset files can be found here.
PropBank Team
Principal Investigators:
Martha Palmer, University of Colorado, Boulder
Bhuvana Narasimhan, University of Colorado, Boulder
Graduate Students:
Ashwini Vaidya, University of Colorado, Boulder
Consulting Postdoctoral Fellow:
Archna Bhatia, PhD, University of Illinois at Urbana-Champaign
Programming Support:
Jinho D. Choi, University of Colorado, Boulder
Jena Hwang, University of Colorado, Boulder
Annotators:
Nithin Singh Mohan
Sanmati Suresh
Omkar Rao
Smitha Kamat
Mrigendra Singh
Neeraj Sharma
Computational Language and Education Research, University of Colorado Boulder