The Hindi-Urdu PropBank Project
The development of computational resources will provide a major boost to
Natural Language Processing applications for Hindi-Urdu. The Hindi-Urdu
PropBank project is part of an effort to create a large-scale resource that
can be used for training machine learning algorithms and for carrying out
corpus linguistics studies. It is a collaborative effort to build a
'Multi-representational and multi-layered Treebank for Hindi-Urdu', and a
pioneering project for a language that has been resource-poor from the
computational point of view.
The PropBank project will result in an annotated corpus of 400K words for
Hindi and 200K words for Urdu. Predicate-argument structure will be
annotated for each verb in the corpus, and a detailed lexicon of Hindi verb
frames will also be created. The annotation of predicate-argument
structure will be preceded by a thorough analysis of linguistic phenomena
such as complex predicates. The guidelines that we will develop will merge
a theoretical understanding of Hindi grammar with computational
Description of the project:
- Task: Building a PropBanked Hindi-Urdu corpus. The data consists of
Hindi newswire articles that have been treebanked using dependency
relations. The Hindi-Urdu PropBank project is part of a larger effort to
build a multi-layered and multi-representational resource for Hindi.
- Project status: The project is currently working on creating PropBank
annotation for 150K words from the Hindi Dependency Treebank.
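As a rough illustration of the task described above, the following Python sketch shows one way predicate-argument labels could be layered on top of a dependency-parsed sentence. It is not the project's annotation format or tooling: the class names, the dependency relation names ("agent-like", "theme-like"), the roleset id "EXAMPLE.01", and the relation-to-label mapping are all hypothetical placeholders. Only the general idea, attaching numbered argument labels such as ARG0 and ARG1 to the dependents of a verb, follows the PropBank approach.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Token:
    index: int    # 1-based position in the sentence
    form: str     # romanized word form
    head: int     # index of the dependency head (0 = root)
    deprel: str   # dependency relation (placeholder names, not the treebank's)


@dataclass
class Predicate:
    token_index: int                                     # position of the verb
    roleset: str                                         # hypothetical frameset id
    args: Dict[int, str] = field(default_factory=dict)   # dependent index -> label


def label_arguments(tokens: List[Token], predicate: Predicate,
                    mapping: Dict[str, str]) -> Predicate:
    """Label each direct dependent of the predicate whose dependency
    relation appears in `mapping`; other dependents are left unlabeled."""
    for tok in tokens:
        if tok.head == predicate.token_index and tok.deprel in mapping:
            predicate.args[tok.index] = mapping[tok.deprel]
    return predicate


# Toy sentence, roughly "Ram bought a book" (romanized, simplified).
sentence = [
    Token(1, "raam", 3, "agent-like"),
    Token(2, "kitaab", 3, "theme-like"),
    Token(3, "khariidii", 0, "root"),
]

# Hypothetical mapping from dependency relations to PropBank-style labels.
relation_to_label = {"agent-like": "ARG0", "theme-like": "ARG1"}

pred = label_arguments(sentence, Predicate(3, "EXAMPLE.01"), relation_to_label)
for idx, label in sorted(pred.args.items()):
    print(sentence[idx - 1].form, label)
```

In practice, a fixed mapping like this would only be a starting point. Which label a dependent receives depends on the verb's frameset, which is why the project pairs the annotation with a lexicon of verb frames.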
The Hindi PropBank project shares a wiki page with the other research teams
from the University of Washington, Columbia University, the University of
Massachusetts at Amherst, and IIT-Hyderabad, India.
The Hindi PropBank frameset files can be found here.
Martha Palmer, University of Colorado, Boulder
Bhuvana Narasimhan, University of Colorado, Boulder
Ashwini Vaidya, University of Colorado, Boulder
Consulting Postdoctoral Fellow:
PhD, University of Illinois, Urbana-Champaign
Jinho D. Choi, University of Colorado, Boulder
Jena Hwang, University of Colorado, Boulder
Nithin Singh Mohan
and Education Research, University of Colorado Boulder