The Hindi-Urdu PropBank Project

The development of computational resources will provide a major boost to Natural Language Processing applications for Hindi-Urdu. The Hindi-Urdu PropBank project is part of a collaborative effort to build a 'Multi-representational and multi-layered Treebank for Hindi-Urdu', a large-scale resource that can be used for training machine learning algorithms and for carrying out corpus linguistics studies. This is a pioneering project for a language that has been resource-poor from the computational point of view.

The PropBank project will result in an annotated corpus of 400K words for Hindi and 200K words for Urdu. Predicate-argument structure will be annotated for each verb in the corpus, and a detailed lexicon of Hindi verb frames will also be created. The annotation of predicate-argument structure will be preceded by a thorough analysis of linguistic phenomena such as complex predicates. The guidelines that we develop will combine a theoretical understanding of Hindi grammar with computational requirements.
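
As a rough illustration of what a predicate-argument annotation for a single verb might look like, here is a minimal Python sketch. It assumes the usual PropBank convention of numbered core arguments (ARG0, ARG1, ...) and ARGM modifier labels. The class names, the romanized sentence, and the roleset id "khaa.01" are hypothetical and chosen only for illustration; they are not taken from the project's frameset files or annotation format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    label: str  # PropBank-style label, e.g. "ARG0", "ARG1", "ARGM-TMP"
    text: str   # the words that fill this role


@dataclass
class PredicateInstance:
    lemma: str    # verb lemma (romanized here for readability)
    roleset: str  # hypothetical roleset id, not from the real frame files
    arguments: List[Argument] = field(default_factory=list)

    def core_arguments(self) -> List[Argument]:
        """Return the numbered arguments, leaving out ARGM modifiers."""
        return [a for a in self.arguments if not a.label.startswith("ARGM")]


# Illustrative sentence: "Ram ne kal seb khaya" ("Ram ate an apple yesterday").
example = PredicateInstance(
    lemma="khaa",
    roleset="khaa.01",
    arguments=[
        Argument("ARG0", "Ram ne"),   # the eater
        Argument("ARGM-TMP", "kal"),  # temporal modifier
        Argument("ARG1", "seb"),      # the thing eaten
    ],
)

for arg in example.core_arguments():
    print(arg.label, arg.text)
```

A frame lexicon entry would, under the same assumptions, list the numbered roles that each roleset of a verb can take, so that annotators label every instance of that verb consistently.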

Description of the project:

Project Wiki:

The Hindi PropBank project shares a wiki page with the other research teams from the University of Washington, Columbia University, the University of Massachusetts at Amherst and IIT-Hyderabad, India.

Frameset Files:

The Hindi PropBank frameset files can be found here.

PropBank Team

Principal Investigators:
Martha Palmer, University of Colorado, Boulder
Bhuvana Narasimhan, University of Colorado, Boulder

Graduate Students:
Ashwini Vaidya, University of Colorado, Boulder

Consulting Postdoctoral Fellow:
Archna Bhatia, PhD, University of Illinois, Urbana-Champaign

Programming Support:
Jinho D. Choi, University of Colorado, Boulder
Jena Hwang, University of Colorado, Boulder

Nithin Singh Mohan
Sanmati Suresh
Omkar Rao
Smitha Kamat
Mrigendra Singh
Neeraj Sharma

Computational Language and Education Research, University of Colorado Boulder