================================
SemLink VerbNet/PropBank Mapping
================================

Version: 1.1
URL: http://verbs.colorado.edu/~edloper/semlink/
Citation:
  Edward Loper, Szu-ting Yi, and Martha Palmer.  2007.  Combining
  lexical resources: Mapping between PropBank and VerbNet.  In
  Proceedings of the 7th International Workshop on Computational
  Linguistics, Tilburg, the Netherlands.

Overview
--------
The SemLink mapping between VerbNet and PropBank consists of two
parts: a lexical mapping and a token mapping.  The lexical mapping
(also referred to as the type mapping) specifies the potential
mappings between PropBank and VerbNet for a given word, but it does
not specify which of those mappings should be used for any given
occurrence of the word.  The token mapping provides the correct
mapping between arguments for every predicate in the PropBank corpus.

In some cases, a predicate from PropBank will not exist in VerbNet;
will not exist in the correct sense; or will have arguments without
corresponding roles in VerbNet.  In these cases, the VerbNet role is
listed as 'None' and the argument is left in its unmapped (ARGn)
form.

Type Mapping
------------
The type mapping is provided as a single XML file, containing entries
of the form:

Each top-level entry describes a single verb lemma, and contains one
or more mapping entries.  Each mapping entry describes the mapping
between arguments for a specific (PropBank roleset, VerbNet class)
pair, using one or more role entries.  Each role entry describes the
mapping between PropBank ARGn labels and VerbNet thematic roles for a
single argument role.

In the above example, when the "muzzle" verb is used in the sense
described by VerbNet class 9.9, the PropBank and VerbNet roles map as
follows:

  ================================
  PropBank          VerbNet
  --------------------------------
  ARG0      <->     Agent
  ARG1      <->     Destination
  ARG2      <->     Theme
  ================================

Token Mapping
-------------
The token mapping is available in two forms:

  - vnprop.txt -- contains just VerbNet role labels
  - vnpbprop.txt -- contains both VerbNet and PropBank role labels

Both files use the same format as PropBank's prop.txt file.  In
particular, each line describes a single predicate and its arguments.
The columns are as follows:

  wsj-filename sentence terminal tagger verb inflection arguments...

Where:

  - 'wsj-filename' is the name of the file in the merged Penn
    Treebank, WSJ section.

  - 'sentence' is the number of the sentence in the file (starting
    with 0).

  - 'terminal' is the number of the terminal in the sentence that is
    the location of the verb.  Note that the terminal number counts
    empty constituents as terminals and starts with 0.  This holds
    for all references to terminal numbers in this description.

  - 'tagger' is the name of the annotator who performed the mapping.

  - 'verb' is a token identifying the verb's PropBank roleset and
    VerbNet class.  It has the form <roleset>;VN=<class>, where
    <roleset> is a PropBank roleset and <class> is a VerbNet class
    number.

  - 'inflection' consists of 5 characters representing person, tense,
    aspect, voice, and form of the verb, respectively.  See the
    PropBank documentation for details.

  - 'arguments...' is a sequence of strings, each representing the
    annotation associated with a particular argument or adjunct of
    the proposition.  Each proplabel is dash '-' delimited and has
    the following columns:

      1) column for the 'syntactic relation'.  See the PropBank
         documentation for details.

      2) column for the 'label'.
         In vnprop.txt, this will consist of a VerbNet thematic role
         label (Agent, Patient, etc); or a PropBank role (ARG0, ARG1,
         etc) if the role does not have an appropriate mapping target
         in VerbNet.  In vnpbprop.txt, this will have the form
         "ARG<n>[<role>]", where <n> is a PropBank role number and
         <role> is a VerbNet thematic role; or simply "ARG<n>" if
         there is no appropriate mapping target.  The label "rel" is
         used to mark the position of the relation word (i.e., the
         verb).

      3) column for feature.  See the PropBank documentation for
         details.

History
-------
Release 1.0 contained a bug that caused some verbs that do not appear
in VerbNet at all to receive improper annotations in the vnpbprop.txt
file.  This bug has now been fixed.
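
Example: reading a token mapping line
-------------------------------------
As an illustration of the column layout described under "Token
Mapping", the following Python sketch splits one line of vnpbprop.txt
into its parts.  It is not part of the SemLink distribution.  The
names parse_line, parse_argument, Predicate, Argument and EXAMPLE_LINE
are hypothetical, and the example line is written by hand to follow
the format above rather than copied from the corpus (the filename,
roleset id and inflection string in it are placeholders).  The sketch
assumes that columns are separated by whitespace and that labels in
vnpbprop.txt look like ARG0[Agent] or plain ARG0.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One argument column: <relation>-<label>[-<feature>].  The label may carry
# a bracketed VerbNet role; dashes inside the brackets are kept in the label.
_ARG_RE = re.compile(
    r"(?P<relation>[^-]+)-(?P<label>[^-\[]+(?:\[[^\]]*\])?)(?:-(?P<feature>.+))?"
)
# Assumed vnpbprop.txt label shape: ARG<n>[<role>], or a bare ARG<n>.
_LABEL_RE = re.compile(r"ARG(?P<num>\w+)(?:\[(?P<role>[^\]]*)\])?")


@dataclass
class Argument:
    relation: str            # syntactic relation, e.g. "0:1"
    label: str               # raw label, e.g. "ARG0[Agent]" or "rel"
    feature: Optional[str]   # third dash-delimited column, if present
    pb_arg: Optional[str]    # PropBank role number, e.g. "0"
    vn_role: Optional[str]   # VerbNet thematic role, e.g. "Agent"


@dataclass
class Predicate:
    filename: str
    sentence: int
    terminal: int
    tagger: str
    roleset: str
    vn_class: Optional[str]
    inflection: str
    arguments: List[Argument]


def parse_argument(text: str) -> Argument:
    """Split one argument column into relation, label and feature."""
    m = _ARG_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unrecognised argument column: {text!r}")
    label = m.group("label")
    lm = _LABEL_RE.fullmatch(label)  # no match for "rel" or VerbNet-only labels
    return Argument(
        relation=m.group("relation"),
        label=label,
        feature=m.group("feature"),
        pb_arg=lm.group("num") if lm else None,
        vn_role=(lm.group("role") or None) if lm else None,
    )


def parse_line(line: str) -> Predicate:
    """Parse one whitespace-separated line of the token mapping."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected at least 6 columns")
    filename, sentence, terminal, tagger, verb, inflection = fields[:6]
    # The verb column is assumed to look like <roleset>;VN=<class>.
    roleset, sep, vn_class = verb.partition(";VN=")
    return Predicate(
        filename=filename,
        sentence=int(sentence),
        terminal=int(terminal),
        tagger=tagger,
        roleset=roleset,
        vn_class=vn_class if sep else None,
        inflection=inflection,
        arguments=[parse_argument(a) for a in fields[6:]],
    )


# Hand-written, hypothetical line (not taken from the corpus).
EXAMPLE_LINE = (
    "wsj_0000.mrg 3 5 gold muzzle.01;VN=9.9 vp--a "
    "0:1-ARG0[Agent] 5:0-rel 6:1-ARG1[Destination] 9:1-ARG2[Theme] 12:1-ARGM-TMP"
)
```

Calling parse_line(EXAMPLE_LINE) returns a Predicate whose arguments
list holds one Argument per column after 'inflection'.  For lines
from vnprop.txt, where labels are bare VerbNet roles, pb_arg and
vn_role are left as None and the role can be read from the label
field directly.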