The Hindi-Urdu Treebank Project
This work is supported by NSF grants CNS-0751089, CNS-0751171,
CNS-0751202, and CNS-0751213.
The goal of the Hindi-Urdu Treebank (HUTB) project is to build a
multi-representational and multi-layered treebank for Hindi and Urdu.
The project is a collaborative effort of five universities in two countries:
University of Colorado Boulder
University of Massachusetts at Amherst (UMass)
University of Washington (UW)
International Institute of Information Technology (IIIT) in Hyderabad, India.
The project is supported by multiple NSF grants.
We aim to build a multi-layered treebank that will provide both syntactic
and semantic annotation. The syntactic annotation using a dependency
framework is being carried out at IIIT, Hyderabad. Semantic annotation
(PropBank annotation) is being done at the University of Colorado Boulder.
In addition, the treebank will be available in two representations: a
dependency version as well as a phrase structure version. The conversion
from dependency to phrase structure is being carried out at the University
The Hindi Treebank Pre-Release version is now available for download from the IIIT download site.
Details about the COLING 2012 Tutorial may be found here
The Hindi PropBank project shares a wiki page with the other research teams
from the University of Washington, Columbia University, the University of Massachusetts at Amherst, and IIIT-Hyderabad, India.
Workshop on South Asian Syntax and Semantics, University of Massachusetts,
Amherst, 19th to 20th March 2011. http://dl.dropbox.com/u/1329068/Treebank-Hyderabad/sass-program.html
South Asian Languages: Formal Approaches and Computational Resources, July
23, 2011, University of Colorado, Boulder, CO
and Education Research, University of Colorado Boulder