The Hindi-Urdu PropBank Project

Digital photo taken by Marc Averette. Retrieved from Wikimedia Commons.

The development of computational resources will provide a major boost to Natural Language Processing applications for Hindi-Urdu. The Hindi-Urdu PropBank project is part of an effort to create a large-scale resource that can be used for training machine learning algorithms and for carrying out corpus linguistics studies. It belongs to a collaborative effort to build a 'Multi-representational and multi-layered Treebank for Hindi-Urdu', a pioneering project for a language that has been resource-poor from the computational point of view.

The PropBank project will result in an annotated corpus of 400K words for Hindi and 200K words for Urdu. Predicate-argument structure will be annotated for each verb in the corpus, and a detailed lexicon of Hindi verb frames will also be created. The annotation will be preceded by a thorough analysis of linguistic phenomena such as complex predicates. The guidelines we develop will combine a theoretical understanding of Hindi grammar with computational requirements.
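To make the idea of predicate-argument annotation concrete, here is a minimal sketch in Python. It is only an illustration: the class names, the roleset identifier "de.01", its role descriptions, and the romanized example sentence are all assumptions made for this sketch and are not taken from the project's frameset files or annotation guidelines.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Roleset:
    """One verb sense with its numbered arguments (hypothetical content)."""
    roleset_id: str
    roles: dict[str, str]  # argument label -> short description


@dataclass
class Proposition:
    """One verb instance in a sentence plus its labeled argument spans."""
    tokens: list[str]
    predicate_index: int
    roleset: Roleset
    # argument label -> (start, end) token offsets, end exclusive
    arguments: dict[str, tuple[int, int]] = field(default_factory=dict)

    def label(self, role: str, start: int, end: int) -> None:
        # Numbered arguments must be licensed by the roleset;
        # modifier labels (ARGM-*) are accepted for any verb.
        if role not in self.roleset.roles and not role.startswith("ARGM-"):
            raise ValueError(f"{role} is not defined for {self.roleset.roleset_id}")
        self.arguments[role] = (start, end)

    def argument_text(self, role: str) -> str:
        start, end = self.arguments[role]
        return " ".join(self.tokens[start:end])


# Hypothetical roleset for a verb meaning "give"; not from the real lexicon.
give = Roleset(
    roleset_id="de.01",
    roles={"ARG0": "giver", "ARG1": "thing given", "ARG2": "recipient"},
)

# Romanized example sentence, roughly "Ram gave Sita a book".
prop = Proposition(
    tokens=["raam", "ne", "siitaa", "ko", "kitaab", "dii"],
    predicate_index=5,
    roleset=give,
)
prop.label("ARG0", 0, 2)
prop.label("ARG2", 2, 4)
prop.label("ARG1", 4, 5)
```

In this sketch the lexicon entry (the roleset) constrains which numbered arguments an annotator may assign to a verb instance, which is the general relationship between a frame lexicon and corpus annotation. The actual project annotates arguments on dependency trees rather than flat token spans, so the span representation here is a simplification.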

Description of the project:

Task: Building a PropBanked Hindi-Urdu corpus. The data consists of Hindi newswire articles that have been treebanked using dependency relations. The Hindi-Urdu PropBank project is part of a larger effort to build a multi-layered and multi-representational resource for Hindi.
Project status: The project is currently working on creating PropBank annotation for 150K words from the Hindi Dependency Treebank.

Project Wiki:

The Hindi PropBank project shares a wiki page with the other research teams, from the University of Washington, Columbia University, the University of Massachusetts at Amherst and IIT-Hyderabad, India.

Frameset Files:

The Hindi PropBank frameset files can be found here.

PropBank Team

Principal Investigators:
Martha Palmer, University of Colorado, Boulder
Bhuvana Narasimhan, University of Colorado, Boulder

Graduate Students:
Ashwini Vaidya, University of Colorado, Boulder

Consulting Postdoctoral Fellow:
Archna Bhatia, PhD, University of Illinois, Urbana-Champaign

Programming Support:
Jinho D. Choi, University of Colorado, Boulder
Jena Hwang, University of Colorado, Boulder

Annotators:
Nithin Singh Mohan
Sanmati Suresh
Omkar Rao
Smitha Kamat
Mrigendra Singh
Neeraj Sharma

Computational Language and Education Research, University of Colorado Boulder