The Hindi-Urdu Treebank Project

This work is supported by NSF grants CNS-0751089, CNS-0751171, CNS-0751202, and CNS-0751213.

The goal of the Hindi-Urdu Treebank (HUTB) project is to build a multi-representational and multi-layered treebank for Hindi and Urdu.

The project is a collaborative effort of five universities in two countries:

University of Colorado Boulder
Columbia University
University of Massachusetts at Amherst (UMass)
University of Washington (UW)
International Institute of Information Technology (IIIT) in Hyderabad, India.

We aim to build a multi-layered treebank that provides both syntactic and semantic annotation. The syntactic annotation, which uses a dependency framework, is being carried out at IIIT Hyderabad. The semantic annotation (PropBank annotation) is being done at the University of Colorado Boulder.

In addition, the treebank will be available in two representations: a dependency version as well as a phrase structure version. The conversion from dependency to phrase structure is being carried out at the University of Washington.

The Hindi Treebank Pre-Release version is now available for download from the IIIT download site.

Details about the COLING 2012 Tutorial may be found here

Project Wiki:

The Hindi PropBank project shares a wiki page with the other research teams from the University of Washington, Columbia University, the University of Massachusetts at Amherst, and IIIT Hyderabad, India.


Workshop on South Asian Syntax and Semantics, March 19-20, 2011, University of Massachusetts, Amherst

South Asian Languages: Formal Approaches and Computational Resources, July 23, 2011, University of Colorado, Boulder, CO
Computational Language and Education Research, University of Colorado Boulder