The 2nd Linguistic Annotation Workshop (The LAW II)


ACL Special Interest Group on Annotation (SIGANN)
Sharable Corpus and Best Practice Guidelines Working Group Sessions
1-6PM, May 27, 2008

The Second Linguistic Annotation Workshop
Held in conjunction with LREC 2008
Marrakech, Morocco

The SIGANN Sharable Corpus and Best Practice Guidelines Working Groups will hold a joint session at the Second Linguistic Annotation Workshop on the afternoon of May 27, 2008, in Marrakech, Morocco. The session will be devoted to issues surrounding the merging and harmonization of linguistic annotations of various phenomena, which may have been produced by different groups using different formats and may be based on different theoretical approaches. The discussions will take as their point of departure linguistic annotations of a portion of the SIGANN Sharable Corpus contributed by members of the computational linguistics community.

We solicit contributions of manually or automatically produced annotations of the SIGANN Sharable Corpus (download data here) for any linguistic phenomenon, including but not limited to morpho-syntax, syntax, semantic roles, word senses, named entities, temporal elements, events, co-reference, and other discourse-level phenomena. The annotations will be collected in early April, after which the session organizers will coordinate an effort to merge and compare the contributed annotations. Based on the experience of this exercise, discussion points, with examples, will be drawn up for consideration in the joint session. Issues to be considered will include:

(1) What are the issues/problems of merging diverse annotations of different phenomena into a single multi-layer annotation, in terms of harmonizing different physical formats?

(2) What are the issues/problems of merging diverse annotations of different phenomena into a single multi-layer annotation, in terms of enabling a coherent and comprehensive linguistic description?

(3) Are there phenomena for which an attempt at compatibility/harmonization is not desirable?

(4) What are the implications and/or suggestions of this exercise for the development of best practice guidelines for linguistic annotation?

(5) Are there certain phenomena (e.g. segmentation into tokens, phrases, etc.) that lend themselves more readily to the specification of standard practices, and for which the existence of a common method would enhance annotation interoperability?

(6) What are the good and bad consequences of introducing a theoretical bias into the merging process? A theoretically biased merging procedure essentially creates a new annotation that uses previously created annotations as input in a destructive manner, so that the input annotations cannot be read directly from the merged output. Can the creation of a merged annotation that is consistent with a theory justify making these changes? Can errors in the input annotations be detected in this way?

Those who wish to contribute annotations and/or be involved in discussions at the session should consult the LAW II website for details, or contact the session organizers.

Session organizers:

Best Practices Working Group

Nancy Ide, Vassar College (ide [at]

Sharable Corpus Working Group

Adam Meyers, New York University (meyers [at]