University of Colorado


Maintenance Notes
for the UVI and VN

These notes were created to pass on knowledge concerning the management and configuration of the Unified Verb Index web site and the VerbNet XML files.

  1. How to update the UVI
    a. Overview of the 'generator' directory contents
    b. How to update the data sources
      i. How to update VerbNet
      ii. How to update the supplemental data sources
    c. How to execute the UVIG
  2. How to update the 'release-newest' symbolic link
  3. How to update the 'verbnet-X.Y.tar.gz' archive
  4. How to update the 'propbank-X.Y.tar.gz' archive
  5. Overview of the 'verb-index' directory contents
  6. Summary of scripts
  7. How to modify the UVIG's code
  8. How to use the WordNetUpdater/VerbNet (vn_wnu)
  9. How to use the GroupingUpdater/VerbNet (vn_gu)
  10. How to use the XML Validator (xml_val)
  11. How to install a new version of VerbNet



1. How to update the UVI

The UVI is just a representation, or view, of several underlying data sources. It is generated by a piece of software called the UVIG, or Unified Verb Index Generator. The generator reads this raw data and formats it into a series of appealing web pages. These underlying data sources include VerbNet, FrameNet, PropBank, OntoNotes Sense Groupings, WordNet, and the VerbNet-FrameNet mapping. When an underlying data source changes, the UVI is not automatically affected. It must be regenerated to reflect the changes.

The following paragraphs explain how to use the UVIG to regenerate the UVI. This section is divided into three subsections.

a. Overview of the 'generator' directory contents

All source code, compiled code, and supporting files for the UVIG are located in:

/home/verbs/shared/public_html/verb-index/generator

Here is a visual listing of the contents of this directory (a trailing * denotes a script and a trailing / denotes a directory):

generator/
|-- bin/
|-- compile*
|-- index.html
|-- javadoc/
|-- javadoc-gen*
|-- README
|-- run*
|-- run-public*
|-- src/
|-- supplemental/
|-- suppl-gen-scripts/

  • The 'bin' directory contains the compiled code of the UVIG. You should never need to touch it, but it is required to run the generator and is referenced by the run scripts.
  • The 'compile' script recompiles the source code in 'src' into Java class files in 'bin', which the run scripts then execute. You only need to run it after changing the UVIG's source code.
  • The 'index.html' file provides a public description of the UVIG project.
  • The 'javadoc' directory houses documentation for the UVIG source code. This might come in handy when trying to understand or modify the UVIG.
  • The 'javadoc-gen' script conveniently regenerates the Javadoc documentation. One would execute this after having modified the UVIG source code.
  • The 'README' file will eventually contain further UVIG documentation. Right now it only contains notes on what should go in it and is far from complete.
  • The 'run' and 'run-public' scripts make it easy to execute the UVIG.
  • The 'src' directory holds all the source code. Modify one or more of these files if you want to make a change to the UVIG. Because these files are not yet under version control, either back up each file before you change it or thoroughly comment the changes you make.
  • The 'supplemental' directory contains all the data sources (except for VerbNet) and any other files that support the UVI web site (e.g. stylesheets).
  • The 'suppl-gen-scripts' directory contains two scripts that automate the generation of two of the UVI's data sources - 'propbank.s' and 'grouping.s'.

Every effort has been made to ensure that the files and subdirectories in this directory belong to the 'www' (web) group, that regular files have group-write permission (-rw-rw-r--), that scripts have group-write and group-execute permission (-rwxrwxr--), and that subdirectories have group-write, group-execute, and other-execute permission (drwxrwxr-x). This allows anyone in the web group to modify and execute these files.
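If permissions drift (for example after new files are copied in), a sketch like the following could restore the scheme described above. It is not one of the existing scripts: the function name 'apply_uvi_perms' is hypothetical, and it assumes that any file already executable by its owner is a script. Changing the group with 'chgrp' only works if you are a member of that group.

```shell
# Hypothetical helper: apply the generator's permission scheme to a tree.
# Assumption: a file that is already executable by its owner is a script.
apply_uvi_perms() {
    local root="$1" group="${2:-}"

    # Optionally put everything in the web group (e.g. 'www').
    if [ -n "$group" ]; then
        chgrp -R "$group" "$root" || return 1
    fi

    # Directories: drwxrwxr-x
    find "$root" -type d -exec chmod 775 {} + || return 1
    # Scripts (owner-executable files): -rwxrwxr--
    find "$root" -type f -perm -u+x -exec chmod 774 {} + || return 1
    # All other regular files: -rw-rw-r--
    find "$root" -type f ! -perm -u+x -exec chmod 664 {} +
}

# Example:
#   apply_uvi_perms /home/verbs/shared/public_html/verb-index/generator www
```

The script/regular-file distinction is a heuristic; a file that should be a script but has lost its execute bit will be treated as a regular file.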

b. How to update the data sources

In the UVI there are 6 principal data sources. Each data source is either used to construct the verb index, supplement the VerbNet class pages, or both. It is also important to remember that only the VerbNet data source is supplied on the command line to the UVIG (shown later). All other data sources are stored within the generator's 'supplemental' directory under a specific file name. The UVIG will always look in the 'supplemental' directory to find these data sources. Here is a quick table enumerating these details:

Data Source     Used In Index?  Used In Class Pages?  In 'supplemental'?  File Name
VerbNet         Yes             Yes                   No                  N/A
FrameNet        Yes             No                    Yes                 framenet.s
PropBank        Yes             No                    Yes                 propbank.s
Groupings       Yes             Yes*                  Yes                 grouping.s
WordNet         No              Yes                   Yes                 wordnet.s
VN-FN Mapping   No              Yes                   Yes                 vn-fn.s

* The 'grouping.s' file doesn't contribute to the grouping links in the VerbNet class pages. These are created entirely from the 'grouping=' attribute for each member.
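A quick pre-flight check along these lines could catch a missing supplemental file before the UVIG runs. The function name is hypothetical, and it only tests that each file named in the table exists and is non-empty, not that its contents are in the format the UVIG expects.

```shell
# Hypothetical pre-flight check for the 'supplemental' directory.
# Returns non-zero (and lists the problems on stderr) if any file is
# missing or empty.
check_supplemental() {
    local dir="$1" missing=0 f
    for f in framenet.s propbank.s grouping.s wordnet.s vn-fn.s; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example, from the 'generator' directory:
#   check_supplemental supplemental && run-public
```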

i. How to update VerbNet

The updating of VerbNet is done independently of the UVIG. The VerbNet files are edited in the standard way and kept under configuration management in SVN (using the standard "update" and "check in" operations). When it is decided that the most recent files in SVN are ready to be pushed to the public UVI, follow the process in section 11 (How to install a new version of VerbNet) to prepare the new files to be shown in the UVI.

ii. How to update the supplemental data sources

Updating the supplemental data sources for the UVI is a little more involved. Supplemental data sources are copied into the generator's 'supplemental' directory under specific file names, and the generator looks for these file names (an alternative implementation would have been to supply all these file names on the command line, but then the usage of the UVIG would have become too cumbersome).

The tricky part is knowing what files to copy into the 'supplemental' directory. Each data source has its own format, and the UVIG 1) is expecting a certain format for each file, and 2) knows only how to read that one format for each file. This section will show which files are valid to copy into the 'supplemental' directory for each source.

  • Let's start with WordNet, as it's possibly the most straightforward example of how this process works. The UVI relies on WordNet data to display the sense number that corresponds to each sense key inside VerbNet. This sense-key-to-sense-number mapping is stored in the 'index.sense' file located in the 'dict' directory of any WordNet distribution. Just copy this file to the 'supplemental' directory under the name 'wordnet.s'. For example:

    cd /home/verbs/shared/public_html/verb-index
    cp /usr/local/WordNet-3.0/dict/index.sense \
       generator/supplemental/wordnet.s

    And there you go. Just remember: when you generate the UVI, verify that the WordNet version of 'wordnet.s' (1.7.1, 2.0, 2.1, 3.0, etc.) in the 'supplemental' directory matches the WordNet version of the sense keys in the VerbNet files you use to generate the UVI. For example, if you copied a WordNet 2.1 'index.sense' file to the 'supplemental' directory, you don't want to regenerate the UVI with VerbNet XML files that have WN 3.0 sense keys.

  • Next let's consider the VerbNet-FrameNet mapping. The mapping has been made public via two files placed on the web server, 'vn-fn.xml' and 'vn-fn-roles.xml'. These are in the 'verb-index/fn' directory and thus are available externally here:

    http://verbs.colorado.edu/verb-index/fn/vn-fn.xml
    http://verbs.colorado.edu/verb-index/fn/vn-fn-roles.xml

    The UVI only uses one of these, the first one, 'vn-fn.xml'. So let's copy this to the right spot under the right name:

    cd /home/verbs/shared/public_html/verb-index/fn
    cp vn-fn.xml ../generator/supplemental/vn-fn.s

    A word of caution comes with this file. The two mapping files shown above originally live here:

    /home/verbs/shared/mappings/vn-fn

    Here you can find all the different versions of this mapping. Each mapping was made with a specific version of VerbNet in mind. The files in the 'verb-index/fn' directory are given simpler names than those used in this directory. Be sure to use a mapping file ('vn-fn.xml' or 'VNclass-FNframeMappings.xml') which corresponds to the version of VerbNet against which you are generating the UVI.

    Now, at some point you will get a new mapping file. This file goes through iterations to keep up with changes to VerbNet and FrameNet. You may also get a new roles file. When you get new files, copy them to the '.../mappings/vn-fn' directory above and validate both files with the XML Validator (see section 10, How to use the XML Validator). Newly received XML files sometimes contain small syntax errors that make them unreadable by the UVI Generator; correct any errors the validator reports.

    Then, you can copy the files to the 'verb-index/fn' directory. Make sure they are given the names 'vn-fn.xml' and 'vn-fn-roles.xml'. Then follow the above procedure to copy the one file to the 'supplemental' directory.

  • Now let's talk about FrameNet. This data source is used not to supplement any VerbNet class pages, but rather just to add to the power of the centralized verb index. This is one of the four sources used in the verb index.

    This file is actually supplied to us by the FrameNet folks. Our copy of this file is located here:

    /home/verbs/shared/framenet/verbLexEntries.xml

    Although we rarely get updated versions of this file, this would be how you copy it into the generator's supplemental files:

    cd /home/verbs/shared/public_html/verb-index
    cp /home/verbs/shared/framenet/verbLexEntries.xml \
       generator/supplemental/framenet.s

    In other words, you won't need to copy this file to the generator until you get a new version. Make sure if you do get a new version, the FrameNet people have used the same XML format that they did in the previous version, or the UVIG will have to be modified slightly.

    Make sure you validate any new XML file you get before feeding it to the UVIG. You need to correct any errors in the XML before using it to generate the UVI.

  • Next we arrive at the PropBank data source. This data source is also involved in the construction of the verb index. In PropBank there is one file per verb in the lexicon. Unfortunately, there is no file that lists all of these files, so we must create this list ourselves from a directory listing. Even though the following process may seem complicated (and a better way would be welcome), remember that you only need to perform it if a new version of PropBank has been created and there is a desire to get the new PropBank links into the UVI. The existing 'propbank.s' file works fine for now.

    The relevant HTML PB files are located in:

    /home/verbs/shared/public_html/framesets

    We are going to run a series of commands to take the directory contents and create a simple list of (verb, URL) pairs that the UVIG can parse and display. Here is a snippet of the desired file:

    abandon,http://verbs.colorado.edu/framesets/abandon-v.html
    abate,http://verbs.colorado.edu/framesets/abate-v.html
    abdicate,http://verbs.colorado.edu/framesets/abdicate-v.html
    ...

    We will perform these instructions using a handful of steps instead of chaining them all together with pipes, which you can also do. This is for clarification purposes. Also, this is only one of MANY different ways to produce the above format. As long as you get it into the above format, the UVIG will correctly add the verbs in the file with the link next to them.

    1. cd /home/verbs/shared/public_html/framesets
    2. ls -l > tmp1
    3. grep -vE '(^total| 0 )' tmp1 > tmp2
    4. grep -o ' [^ ]*-v.html' tmp2 > tmp3
    5. sed 's_^ \(.*\)-v.html_\1,http://verbs.colorado.edu/framesets/\1-v.html_' tmp3 > tmp4

    • STEP 1: Change to PropBank directory.
    • STEP 2: Get a long listing of the directory contents (-l = dash ell).
    • STEP 3: Filter out invalid lines (those that don't represent a frameset):
      • The 'total' line (very first line)
      • Any files with size zero (this currently eliminates most of the irrelevant files)
    • STEP 4: Simplify each line to just a space followed by the file name.
    • STEP 5: Modify each line to be a verb followed by a comma followed by a URL. The 'sed' command takes a line that looks like this:

       abandon-v.html

      (notice preceding space) and turns it into this:

      abandon,http://verbs.colorado.edu/framesets/abandon-v.html

    Voilà! You have massaged the directory listing into a simple format that the UVIG understands. If you use the above commands, copy the syntax exactly; every space matters. If the 'sed' command appears wrapped onto two lines in your browser, type it as a single line.

    Now all you need to do is put the file into the 'supplemental' directory:

    cp tmp4 \
       ../verb-index/generator/supplemental/propbank.s

    That was a long process, I know. You shouldn't have to do this often, as the PropBank directory does not change radically; the steps are included so that you can update 'propbank.s' if need be. To make this easier, a script automates the whole process. You can have the 'propbank.s' file created for you by executing these two commands:

    cd verb-index/generator/suppl-gen-scripts
    make-pb-data-source

    (assuming you are in the 'public_html' directory). The 'propbank.s' file will be created for you and placed in the 'suppl-gen-scripts' directory. Copy it to the 'supplemental' directory.

  • Lastly, the OntoNotes Sense Groupings data source is constructed and utilized almost exactly like the PropBank data source. The content that comprises this UVI data source comes directly from the listing of a directory, since each file in that directory will get an entry in the UVI. That listing is massaged using 'grep' and 'sed'. Remember that you only need to perform this if updated grouping files exist and there is a desire to get the new links into the UVI. The existing 'grouping.s' file works fine for now. The following process will mirror the PropBank process above.

    The relevant grouping HTML files are located in:

    /home/verbs/shared/public_html/html_groupings

    We are going to run a series of commands to take the directory contents and create a simple list of (verb, URL) pairs that the UVIG can parse and display. Here is a snippet of the desired file:

    abandon,http://verbs.colorado.edu/html_groupings/abandon-v.html
    abate,http://verbs.colorado.edu/html_groupings/abate-v.html
    abolish,http://verbs.colorado.edu/html_groupings/abolish-v.html
    ...

    We will perform these instructions using a handful of steps instead of chaining them all together with pipes, which you can do. This is for clarification purposes. Also, this is only one of MANY different ways to produce the above format. As long as you get it into the above format, the UVIG will add the verbs in the file with the link next to them.

    1. cd /home/verbs/shared/public_html/html_groupings
    2. ls -l > tmp1
    3. grep -e '-v\.html' tmp1 > tmp2
    4. grep -o ' [^ ]*-v.html' tmp2 > tmp3
    5. sed 's_^ \(.*\)-v.html_\1,http://verbs.colorado.edu/html\_groupings/\1-v.html_' tmp3 > tmp4

    • STEP 1: Change to groupings directory.
    • STEP 2: Get a long listing of the directory contents (-l = dash ell).
    • STEP 3: Keep only the lines that mention a '-v.html' file, filtering out invalid lines (those that don't represent a grouping) such as:
      • The 'total' line (very first line)
    • STEP 4: Simplify each line to just a space followed by the file name.
    • STEP 5: Modify each line to be a verb followed by a comma followed by a URL. The 'sed' command takes a line that looks like this:

       abandon-v.html

      (notice preceding space) and turns it into this:

      abandon,http://verbs.colorado.edu/html_groupings/abandon-v.html

    Now put the file into the 'supplemental' directory:

    cp tmp4 \
       ../verb-index/generator/supplemental/grouping.s

    And now you're done. To make this easier, a script automates the whole process. You can have the 'grouping.s' file created for you by executing these two commands:

    cd verb-index/generator/suppl-gen-scripts
    make-grp-data-source

    (assuming you are in the 'public_html' directory). The file will be created for you and placed in the 'suppl-gen-scripts' directory. Copy it to the 'supplemental' directory.
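The PropBank and grouping recipes above differ only in the directory and the base URL, so both could be expressed as one function. The sketch below is an alternative to the five-step 'ls'/'grep'/'sed' method and is not necessarily what 'make-pb-data-source' or 'make-grp-data-source' do: it loops over '*-v.html' directly and skips zero-size files (the PropBank STEP 3 filter, here applied to both sources). A second helper reports which verbs a freshly generated file adds or drops compared with the one currently in 'supplemental'. Both function names are hypothetical.

```shell
# Hypothetical helpers, not the contents of the suppl-gen-scripts.

# Print "verb,URL" for every non-empty *-v.html file in a directory.
make_link_list() {
    local dir="$1" base_url="$2" path name
    for path in "$dir"/*-v.html; do
        [ -s "$path" ] || continue      # skip zero-size files (and an unmatched glob)
        name=${path##*/}                # e.g. abandon-v.html
        printf '%s,%s/%s\n' "${name%-v.html}" "$base_url" "$name"
    done
}

# Report verbs that appear only in the old list or only in the new one.
diff_link_lists() {
    local old_verbs new_verbs
    old_verbs=$(mktemp) && new_verbs=$(mktemp) || return 1
    cut -d, -f1 "$1" | LC_ALL=C sort -u > "$old_verbs"
    cut -d, -f1 "$2" | LC_ALL=C sort -u > "$new_verbs"
    # comm -3: column 1 = only in old; column 2 (tab-prefixed) = only in new
    LC_ALL=C comm -3 "$old_verbs" "$new_verbs" | awk '
        /^\t/ { print "added:   " substr($0, 2); next }
              { print "removed: " $0 }'
    rm -f "$old_verbs" "$new_verbs"
}

# Example, from the 'public_html' directory:
#   make_link_list html_groupings http://verbs.colorado.edu/html_groupings > grouping.s.new
#   diff_link_lists verb-index/generator/supplemental/grouping.s grouping.s.new
```

Looping over the glob avoids parsing 'ls -l' output, which can shift columns depending on file sizes, dates, and owner names.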

NOTE: If the format of any of these 5 files changes, the UVIG code that reads these files may need to be modified. So hopefully the data sources we get externally (WordNet, FrameNet) will not change. Similarly we should attempt to keep the data sources that we create (VN-FN, PropBank, Groupings) in the same format unless there is some pertinent need to change them.
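Since the UVIG expects these formats to stay fixed, it may be worth checking a regenerated 'propbank.s' or 'grouping.s' before copying it into 'supplemental'. This hypothetical helper only checks the "verb,URL" shape shown in the samples above (a URL ending in '-v.html'); the exact rules enforced by the UVIG's parser are not documented in these notes.

```shell
# Hypothetical check: every line must be "verb,URL" with a URL ending in -v.html.
# Bad lines are printed to stderr with their line numbers.
check_link_list() {
    local file="$1"
    [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
    # -v prints the lines that do NOT match; any output means a bad line.
    if grep -nvE '^[^ ,]+,https?://[^ ,]+-v\.html$' "$file" >&2; then
        return 1
    fi
    return 0
}

# Example, from the 'generator' directory:
#   check_link_list suppl-gen-scripts/propbank.s &&
#       cp suppl-gen-scripts/propbank.s supplemental/propbank.s
```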

c. How to execute the UVIG

Now that you have your data sources ready to be published, follow these instructions to execute the UVIG (i.e. regenerate the UVI) via the simplest method:
  1. Change to 'generator' directory
  2. Execute command 'run-public'
  3. Pat self on back - you're done!
Why was this so easy? There are three ways to regenerate the UVI using the generator:
  1. java uvi.Generator [flags] <xml-input-dir> <html-output-dir>
  2. run [flags] <xml-input-dir> <html-output-dir>
  3. run-public
The 'run' and 'run-public' scripts just make things a little simpler for you. They hide the need to remember 'java uvi.Generator'. You will always want to use one of these two scripts (the command 'java' as used in these maintenance notes is shorthand for the path to the actual 'java' command).

The 'run-public' script makes things REALLY simple for you. To execute it you just type:

run-public

from the 'generator' directory. It is essentially equivalent to typing this:

java uvi.Generator -vos /home/verbs/shared/verbnet/release-newest ..

The most common arguments are already supplied in this script, and thus it's the one you'll most likely want to use to update the public UVI. The XML input directory supplied is the symbolic link that points to the most recent version of the VerbNet files. Thus, the use of this script assumes that the 'release-newest' symbolic link is maintained and updated; for details on maintaining this link, see section 2 (How to update the 'release-newest' symbolic link). The .. stands for the 'verb-index' directory, which is the parent directory of 'generator' and is where all the UVI files are created. The -v flag prints verbose output, the -o flag means overwrite (you'll almost always want to use this), and the -s flag means to sort the members within each class or subclass.

The 'run' script does not have such defaults. You would only use 'run' over 'run-public' if you do not want to use the default flags, input directory, and output directory supplied within. Using the 'run' script allows you to specify your own XML input directory if you don't want to use the directory currently pointed to by the symbolic link. Likewise, you can specify your own output directory (yes you can create a completely independent UVI site on the server with this feature).

For example, let's say you wanted to create another, separate UVI, but using an old version of VerbNet for testing and consistency checking (e.g. to see what it was like a year ago). Also, let's say you have an empty target directory, so you don't need the -o flag (but you can still use it), and you don't want to see tons of output, so you don't want the -v flag. You would execute something like this:

run -s /home/verbs/shared/verbnet/history/release2.0 ~/public_html/vn2.0-test-uvi

Now you would be able to access your new test UVI via a URL similar to:

http://verbs.colorado.edu/~trumbo/vn2.0-test-uvi

This is made possible because everything the UVIG needs to build a complete site is contained within the 'supplemental' directory (images, stylesheets, etc.).

However, there is one caveat to creating UVIs from previous versions of VerbNet. You must make sure that the other data sources in the 'supplemental' directory at the time of generation are consistent with the version of VerbNet against which you are generating. More specifically, VerbNet is dependent on a certain version of WordNet, and the VN-FN mapping is dependent on a certain version of VerbNet. For example, you wouldn't want to generate against VerbNet 2.0 while leaving a 'wordnet.s' file in the 'supplemental' directory that corresponds to WN 3.0 - because VN 2.0 doesn't have WN 3.0 sense keys! Doing so would create an inconsistent UVI. Similarly if you had placed an old version of 'vn-fn.s' in the 'supplemental' directory for testing purposes, make sure you swap it out for the most recent one before executing 'run-public' (which generates against the most recent VN version).
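One way to catch the WordNet/VerbNet mismatch described above is to look for VerbNet sense keys that have no entry in the current 'wordnet.s'. The sketch below assumes that sense keys appear in 'wn="..."' attributes separated by spaces (as in the 'wn="K1 K2"' example in section 9), and it strips trailing ':' characters from both sides in case VerbNet stores keys without the final '::' of the WordNet form. The function name is hypothetical. A long list of missing keys suggests the two files come from different WordNet versions.

```shell
# Hypothetical consistency check: print VerbNet sense keys missing from wordnet.s.
missing_sense_keys() {
    local vn_dir="$1" wn_file="$2" vn_keys wn_keys
    vn_keys=$(mktemp) && wn_keys=$(mktemp) || return 1

    # Every wn="..." value -> one key per line, trailing ':' stripped, de-duplicated.
    grep -ho 'wn="[^"]*"' "$vn_dir"/*.xml |
        sed -e 's/^wn="//' -e 's/"$//' | tr ' ' '\n' |
        sed -e 's/:*$//' -e '/^$/d' | LC_ALL=C sort -u > "$vn_keys"

    # First column of index.sense is the sense key; normalize it the same way.
    awk '{ print $1 }' "$wn_file" | sed 's/:*$//' | LC_ALL=C sort -u > "$wn_keys"

    # Lines only in the VerbNet list.
    LC_ALL=C comm -23 "$vn_keys" "$wn_keys"
    rm -f "$vn_keys" "$wn_keys"
}

# Example, from the 'generator' directory:
#   missing_sense_keys /home/verbs/shared/verbnet/release-newest supplemental/wordnet.s
```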

More than likely, though, you will just need to use the 'run-public' script to generate the UVI. But ALWAYS keep your 'supplemental' data sources consistent with the version of VerbNet when you generate. Also, the older the VerbNet version you generate a UVI from, the more error messages you are likely to see, because errors were steadily eliminated as VerbNet matured.

If you want to learn more about what you can do with the UVIG, type:

run -?

in the 'generator' directory. Or read the documentation inside the 'run' and 'run-public' scripts.


2. How to update the 'release-newest' symbolic link

This symbolic link points to the directory of VerbNet files thought of as the most recently released version. The symbolic link has two primary purposes. The first purpose is to allow users not to have to remember which directory holds the most recent version of VerbNet. In fact, all server users need to remember is that all the newest VerbNet XML files are located in

/home/verbs/shared/verbnet/release-newest

They never have to concern themselves with which directory the link actually points to. The second purpose of the link is to facilitate the operation of the 'run-public' script in the 'generator' directory. This script relies directly on this symbolic link; in fact, the link is what allows the 'run-public' command to be as simple as it is. The idea behind 'run-public' is that it regenerates the UVI from the most recent version of VerbNet, and it does this by simply using the files located in the 'release-newest' directory (i.e. link).

If another directory is added to the VerbNet directory, and it is to become the "newest" version of VerbNet, then these simple commands will update the symbolic link to the new directory.

cd /home/verbs/shared/verbnet
rm release-newest
ln -s current/release-7.0 release-newest

This will repoint the 'release-newest' symbolic link at the directory 'current/release-7.0'. Do not put a forward slash after 'release-newest' in the 'rm' command, nor after the new directory in the 'ln' command. In other words, both of these are wrong:

rm release-newest/

ln -s current/release-7.0/ release-newest

Maintaining the link is simple and can save a lot of hassle when it comes to UVI/VN management. Most likely you will grab the entire VerbNet file base from Subversion when it is ready to be published. Put a copy into '/home/verbs/shared/verbnet/current' and redirect the link to it.
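The 'rm' plus 'ln -s' sequence above can also be done in one step with 'ln -sfn'. In GNU ln, '-f' removes an existing destination and '-n' (--no-dereference) treats a destination that is a symbolic link to a directory as a plain file, so the new link replaces the old one instead of being created inside the old target directory. The helper name and the directory check are illustrative additions, not part of the original procedure.

```shell
# Hypothetical helper: repoint 'release-newest' in one step.
# -f replaces the existing link; -n keeps ln from following the old link
# into its target directory.
repoint_release_newest() {
    local verbnet_dir="$1" target="$2"   # target is relative, no trailing slash
    (
        cd "$verbnet_dir" || exit 1
        [ -d "$target" ] || { echo "not a directory: $target" >&2; exit 1; }
        ln -sfn "$target" release-newest
    )
}

# Example:
#   repoint_release_newest /home/verbs/shared/verbnet current/release-7.0
```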


3. How to update the 'verbnet-X.Y.tar.gz' archive

This file contains the official set of files of the most recent version of VerbNet available to the public. From time to time it will need to be refreshed with the newest files. Let's assume you are in a directory containing another directory which holds the newest VerbNet files that you wish to publish to the public web. Make sure this directory is in publishable format (no extraneous files, all the XML is valid, etc.). In particular, the directory must not contain a '.svn' directory if it is an SVN working copy. You can check by executing 'ls -d .svn' in the directory. If it does exist, you will need to use the subsequent 'copy' method instead of the 'move' method. Here are some example directory structures:

.../dir1/dir2/
.../dir1/dir2/new-vn-files/
.../dir1/dir2/new-vn-files/accompany-51.7.xml
.../dir1/dir2/new-vn-files/admire-31.2.xml
.../dir1/dir2/new-vn-files/...etc...

.../dir1/verbnet/
.../dir1/verbnet/trunk/
.../dir1/verbnet/trunk/accompany-51.7.xml
.../dir1/verbnet/trunk/admire-31.2.xml
.../dir1/verbnet/trunk/...etc...

You can execute these commands once you are in the parent directory of the directory with the VerbNet files (i.e. '.../dir1/dir2' or '.../dir1/verbnet'). Replace "X.Y" with the actual version of the VerbNet release (e.g. "2.5").

cp -r trunk verbnet-X.Y
rm -rf verbnet-X.Y/.svn    # remove any other extra files too, if necessary
tar cvf verbnet-X.Y.tar verbnet-X.Y
gzip verbnet-X.Y.tar
rm -r verbnet-X.Y

The first step copies the whole directory to a new directory with a standard name. Then that directory is tar'ed into a single file (e.g. verbnet-X.Y.tar). Then the tar file is compressed and the new directory is removed. This is just one way to accomplish this task. You could also just rename your directory for the tar/gzip steps and name it back afterwards. It's up to you. Make sure there are no superfluous files in the directory first, though (e.g. '.svn/').

mv new-vn-files verbnet-X.Y
tar czvf verbnet-X.Y.tar.gz verbnet-X.Y
mv verbnet-X.Y new-vn-files

Either way you will be left with a 'verbnet-X.Y.tar.gz' file (with the actual version number of course). Now all we need to do is place this file in the correct location and update the public web with the new link. Move your file to the following location:

/home/verbs/shared/public_html/verb-index/vn

Then update this web page:

http://verbs.colorado.edu/~mpalmer/projects/verbnet/downloads.html

to reflect the new name (URL) of the file:

http://verbs.colorado.edu/verb-index/vn/verbnet-X.Y.tar.gz

You can move the previous VerbNet archive to:

/home/verbs/shared/verbnet/history

Now it's ready to go. We did it this way so that when someone downloads and expands the archive file they see a standard-named directory that includes the version of the VerbNet files.
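The copy-based method above could be wrapped in a single function that never modifies the working copy. This is a sketch under stated assumptions, not an existing script: the name 'make_vn_archive' is hypothetical, it stages the copy in a temporary directory, removes every '.svn' directory from the copy (older SVN clients put one in each subdirectory), and produces an archive that unpacks to a single 'verbnet-X.Y' directory.

```shell
# Hypothetical helper: build verbnet-X.Y.tar.gz from a working copy.
make_vn_archive() {
    local src="$1" version="$2" out_dir="${3:-.}" name stage status
    name="verbnet-$version"
    out_dir=$(cd "$out_dir" && pwd) || return 1   # absolute path for 'tar -f'
    stage=$(mktemp -d) || return 1

    # Copy, strip any .svn directories from the copy, then archive the copy.
    cp -R "$src" "$stage/$name" &&
        find "$stage/$name" -type d -name .svn -prune -exec rm -rf {} + &&
        tar -C "$stage" -czf "$out_dir/$name.tar.gz" "$name"
    status=$?

    rm -rf "$stage"   # the working copy in $src is never modified
    return "$status"
}

# Example, from '.../dir1/verbnet':
#   make_vn_archive trunk 2.5 && mv verbnet-2.5.tar.gz /home/verbs/shared/public_html/verb-index/vn
```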


4. How to update the 'propbank-X.Y.tar.gz' archive

When you create a new PropBank archive, move it to the

/home/verbs/shared/public_html/verb-index/pb

directory. The previous PropBank archives can remain in this directory for posterity if you want. There is one more change that needs to be made. The UVI has a link to this archive on its home page. You need to update the

/home/verbs/shared/public_html/verb-index/generator/supplemental/index.s

file to refer to the new file. After you edit this file, regenerate the UVI:

cd /home/verbs/shared/public_html/verb-index/generator
run-public


5. Overview of the 'verb-index' directory contents

Below is a listing of the 'verb-index' directory. The majority of the files and folders were created by the UVIG. However, some files exist just for the sake of being made available in a logical place. These include the VN-FN mapping in 'fn', the most recent PropBank archive in 'pb', and the most recent VerbNet archive in 'vn'. The 'inspector' and 'vxc' directories and the 'maint-notes*' files were also not created by the generator.

verb-index/
|-- comments/               (UVI Comments)
|-- comments.php            (UVI Comments)
|-- contact.php             (UVI Contact Page)
|-- fn/                     (VN-FN Mapping Files)
|-- generator/              (UVI Generator Code)
|-- .htaccess               (Site Configuration File)
|-- images/                 (UVI Images)
|-- include.php             (UVI Scripts)
|-- index/                  (UVI A-Z Index Pages)
|-- index.php               (UVI Home Page)
|-- inspector/              (VerbNet Inspector Code)
|-- login.php               (UVI Login)
|-- maint-notes.html        (This Page)
|-- maint-notes-suppl1.txt  (UVIG Developer Notes)
|-- pb/                     (PropBank Archives)
|-- postcomment.php         (UVI Comments)
|-- scripts.js              (UVI Scripts)
|-- search/                 (UVI Search Indices)
|-- search.php              (UVI Search)
|-- styles.css              (UVI Styles)
|-- users/                  (UVI Users)
|-- vn/                     (VerbNet Pages and Archive)
|-- vxc/                    (VerbNet-Cyc Mapper Code)
|-- wn/                     (UVI WordNet Page)


6. Summary of scripts

In this listing, a trailing * denotes a script and a trailing / denotes a directory.

/home/verbs/shared/public_html/verb-index/
|-- generator/
   |-- compile*
   |-- javadoc-gen*
   |-- run*
   |-- run-public*
   |-- suppl-gen-scripts/
      |-- make-grp-data-source*
      |-- make-pb-data-source*
|-- inspector/
   |-- compile*
   |-- download-prepare*
   |-- javadoc-gen*
   |-- mock-app/
      |-- javadoc-gen*
   |-- run*
   |-- run-all*
   |-- scripts/
      |-- compile*
      |-- run*
|-- vxc/
   |-- compile*
   |-- download-prepare*
   |-- javadoc-gen*
   |-- run*
   |-- run-public*
   |-- scripts/
      |-- compile*
      |-- run*
/home/verbs/shared/verbnet/
|-- GroupingUpdater/
   |-- compile*
   |-- run*
|-- WordNetUpdater/
   |-- compile*
   |-- run*
|-- XMLValidator/
   |-- compile*
   |-- run*


7. How to modify the UVIG's code

There are various reasons why you may need to modify the UVI Generator's code. Mainly these fall under some basic categories: addition of new features, changes made to the format of existing data sources, or addition of new data sources. Some of these changes will require modification of the Java files in the 'generator/src' directory, and others will require modification of the '.s' supplemental files in the 'generator/supplemental' directory.

How to add a data source that just adds verbs to the index

If the format of any of these supplemental data sources has changed:
  • propbank.s
  • framenet.s
  • grouping.s
  • wordnet.s
  • vn-fn.s
Then you will need to change how the file is parsed. The method 'Generator.addOthers' is a good place to start.


8. How to use the WordNetUpdater/VerbNet (vn_wnu)

The WordNetUpdater for VerbNet is meant to update the 'wn=' attributes in an entire set of VerbNet XML files, based on the mapping information WordNet provides each time it releases a new version. In concept, the WordNetUpdater works very much like the GroupingUpdater.

The WordNetUpdater command is:

/home/verbs/shared/bin/vn_wnu

You can always see the usage of the command by executing this on the command line:

vn_wnu -?

Finally, for completeness, the 'vn_wnu' command is actually a symbolic link to a Java program in the 'verbnet' directory:

/home/verbs/shared/verbnet/WordNetUpdater

In this directory you will find the program's source and some scripts that help with compiling and running. But in general you just need to use the 'vn_wnu' command.


9. How to use the GroupingUpdater/VerbNet (vn_gu)

The GroupingUpdater for VerbNet is meant to update the 'grouping=' attributes in an entire set of VerbNet XML files, based on the mapping information in a set of OntoNotes Sense Grouping XML files.

This is how it works. The GroupingUpdater is given 4 things:
  • VerbNet XML directory
  • Sense Groupings XML directory
  • WordNet 'index.sense' file
  • VerbNet XML output directory (optional)
If the output directory is not supplied, the GroupingUpdater simply prints a summary report about the update process without generating any updated VerbNet files. The WordNet 'index.sense' file is required to translate VerbNet's WordNet sense keys into WordNet sense numbers.
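The key-to-number translation relies on the layout of WordNet's 'index.sense', where each line has the fields 'sense_key synset_offset sense_number tag_cnt'. A minimal sketch of that lookup, not taken from the GroupingUpdater's source (the function name is hypothetical):

```shell
# Hypothetical lookup: print the WordNet sense number for one sense key.
# Exits non-zero if the key is not in the file.
sense_number() {
    local key="$1" file="$2"
    awk -v k="$key" '
        $1 == k { print $3; found = 1; exit }   # field 3 is the sense number
        END     { exit !found }' "$file"
}

# Example:
#   sense_number 'abandon%2:31:00::' /usr/local/WordNet-3.0/dict/index.sense
```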

The following example illustrates what the GroupingUpdater does. Suppose that, while scanning the VerbNet XML files, it encounters a VerbNet class C with a member M that has a pre-existing 'grouping=' attribute:

C.xml:
   ...
   <MEMBER name="M" wn="K1 K2" grouping="M.01 M.03" />
   ...

When the GroupingUpdater reaches this member, it looks for a sense grouping file named 'M-v.xml'. If the file does not exist, the grouping attribute is left blank (grouping=""). If the file does exist, it contains some number of sense groups; call that number G. The GroupingUpdater must then decide which sense groups in the file (M.01, M.02, ..., M.G) belong in the 'grouping=' attribute. It considers two methods:
  1. Method 1: If a sense group in 'M-v.xml' defines a link to the VerbNet class (e.g. <vn>accompany-51.7,captain-29.8-1</vn>) in which the member M resides (class C in this case), then the sense group is added to the member's 'grouping=' attribute.
  2. Method 2: If a sense group in 'M-v.xml' defines a link to any WordNet sense number (e.g. <wn version="3.0">5,6,7</wn>) that appears among the WordNet sense numbers defined for the member, then the sense group is added to the member's 'grouping=' attribute. The member's sense keys K1 and K2 must first be translated into sense numbers using the 'index.sense' file. In other words, the sense group is added to the member's 'grouping=' attribute if its WordNet sense numbers and the VerbNet member's WordNet sense numbers have a non-empty intersection.
But which method is actually used to update the files? When an output directory is supplied, Method 2 is currently used to update the 'grouping=' attributes.
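The two methods can be sketched in Java as follows. This is a hypothetical illustration, not the GroupingUpdater's source. SenseGroup stands in for one sense group parsed from 'M-v.xml', with its <vn> and <wn> lists already split on commas. The member's sense keys are assumed to have been translated into sense numbers with 'index.sense' beforehand.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class GroupingMethodsSketch {

    /** Hypothetical in-memory form of one sense group from an 'M-v.xml' file. */
    public static class SenseGroup {
        final String id;               // e.g. "M.01"
        final Set<String> vnClasses;   // contents of <vn>, split on commas
        final Set<Integer> wnSenses;   // contents of <wn>, split on commas

        public SenseGroup(String id, Set<String> vnClasses, Set<Integer> wnSenses) {
            this.id = id;
            this.vnClasses = vnClasses;
            this.wnSenses = wnSenses;
        }
    }

    /** Method 1: keep the groups that link to the member's VerbNet class. */
    public static List<String> method1(List<SenseGroup> groups, String vnClass) {
        List<String> ids = new ArrayList<>();
        for (SenseGroup g : groups) {
            if (g.vnClasses.contains(vnClass)) {
                ids.add(g.id);
            }
        }
        return ids;
    }

    /** Method 2: keep the groups whose WordNet sense numbers intersect the member's. */
    public static List<String> method2(List<SenseGroup> groups, Set<Integer> memberSenses) {
        List<String> ids = new ArrayList<>();
        for (SenseGroup g : groups) {
            if (!Collections.disjoint(g.wnSenses, memberSenses)) {
                ids.add(g.id);
            }
        }
        return ids;
    }

    /** Builds the attribute value, e.g. "M.01 M.03"; empty when no group matches. */
    public static String attribute(List<String> ids) {
        return String.join(" ", ids);
    }
}
```

To mirror the disagreement check in the summary report described below, you could run both methods on the same member and compare the results.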

Besides updating the files, printing a comprehensive summary report is the other main capability of the GroupingUpdater. The report is supposed to help analysts maintain the consistency and accuracy of the VerbNet and Grouping data sources. It contains these sections:
  1. VerbNet Statistics Only
  2. WordNet Statistics Only
  3. Grouping Statistics Only
  4. Update Statistics
  5. Exception Lists
Among many other things, the report lists every instance in which the two methods disagree about how to update a given member line. In theory, the two methods should always agree if the mappings are consistent.

The GroupingUpdater command is:

/home/verbs/shared/bin/vn_gu

This command should be available to you automatically if your path is set up correctly. If it isn't, follow the directions at the end of section 10 for adding '/home/verbs/shared/bin' to your $PATH. You can always see the usage of the command by executing this on the command line:

vn_gu -?

An example usage of this command is:

vn_gu -n ~/svn/verbnet/trunk ~/svn/sense-inventories \
      /usr/local/WordNet/dict/index.sense ~/updated-vn-files

Finally, for completeness, the 'vn_gu' command is actually a symbolic link to a Java program in the 'verbnet' directory:

/home/verbs/shared/verbnet/GroupingUpdater

In this directory you will find the program's source and some scripts that help with compiling and running. But in general you just need to use the 'vn_gu' command.


10. How to use the XML Validator (xml_val)

The XML Validator is a generic tool used to check whether an XML document is 1) well-formed and 2) valid. Well-formed is an XML term meaning that the XML syntax is correct (e.g. there's a closing > for every <). Valid is an XML term meaning that, if the XML document specifies a DTD, the document does not violate any of the rules in that DTD.
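The following minimal Java sketch makes the two checks concrete. It uses the JDK's built-in DOM parser (javax.xml.parsers) and is not the XMLValidator's source. It first parses without validation to test well-formedness. Only when the document declares a DOCTYPE does it parse again with DTD validation turned on. The class name XmlCheckSketch and the 'ERROR:'/'WARNING:' message prefixes are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class XmlCheckSketch {

    /** Collects parser messages instead of printing them. */
    static class Collector implements ErrorHandler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void warning(SAXParseException e) {
            messages.add("WARNING: " + e.getMessage());
        }

        @Override
        public void error(SAXParseException e) {          // validity problems
            messages.add("ERROR: " + e.getMessage());
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            messages.add("ERROR: " + e.getMessage());     // well-formedness problems
            throw e;                                      // the parse cannot continue
        }
    }

    private static Document parse(String xml, boolean validate, Collector c) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setValidating(validate);                        // DTD validation on or off
        DocumentBuilder b = f.newDocumentBuilder();
        b.setErrorHandler(c);
        return b.parse(new InputSource(new StringReader(xml)));
    }

    /** Returns no messages if the XML is well-formed and, when it declares a DTD, valid. */
    public static List<String> check(String xml) {
        Collector c = new Collector();
        try {
            Document doc = parse(xml, false, c);          // pass 1: well-formedness only
            if (doc.getDoctype() != null) {
                parse(xml, true, c);                      // pass 2: validity against the DTD
            }
        } catch (SAXParseException e) {
            // already recorded by fatalError()
        } catch (Exception e) {
            c.messages.add("ERROR: " + e.getMessage());
        }
        return c.messages;
    }
}
```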

The XML Validator command is:

/home/verbs/shared/bin/xml_val

You can always see the usage of the command by executing this on the command line:

xml_val -?

The 'xml_val' command takes any number of XML Input Objects. An XML Input Object is either an individual file (which does not have to end in '.xml') or a directory containing XML files (only the '.xml' files will be scanned). This tool does not recurse into subdirectories; it only scans the '.xml' files directly inside any directories you list. Here are some examples:

xml_val some_file.q
xml_val my_file1.xml my_file2.xml
xml_val ~/xml_files ~/more_xml_files
xml_val my_file1.xml ~/xml_files some_file.q
xml_val *

The 'some_file.q' file is a single file in XML format that just has a non-standard extension (.q). The '~/xml_files' and '~/more_xml_files' arguments are directories, and the '.xml' files inside them will be scanned. The last example passes all files and directories in the current directory to the command.
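The rules for XML Input Objects can be sketched in Java as follows. The class name InputObjectsSketch is hypothetical, and the real tool may order or report files differently. A directory contributes only the regular files directly inside it whose names end in '.xml'. Any other argument is taken as a file, whatever its extension.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InputObjectsSketch {

    /** Expands command-line arguments into the list of files to scan. */
    public static List<File> expand(String... args) {
        List<File> out = new ArrayList<>();
        for (String arg : args) {
            File f = new File(arg);
            if (f.isDirectory()) {
                // Non-recursive: look only at the directory's own '.xml' entries.
                File[] kids = f.listFiles((dir, name) -> name.endsWith(".xml"));
                if (kids != null) {
                    Arrays.sort(kids);
                    for (File k : kids) {
                        if (k.isFile()) {        // skip subdirectories named '*.xml'
                            out.add(k);
                        }
                    }
                }
            } else {
                out.add(f);                      // any extension is accepted, e.g. '.q'
            }
        }
        return out;
    }
}
```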

There is currently only one non-trivial command-line option, -q. In its normal operating mode, the command prints the name of each file or directory as it scans it, followed by the phrase 'XML OK' if everything was fine. With many files, this produces a lot of output. If you only want to see the errors and warnings, use -q or --quiet. In this example, only errors and warnings will be displayed:

xml_val -q *.xml

Note on output: All error and warning messages are written to stderr, and all other output is written to stdout. This means that if you want to count errors by piping the output of 'xml_val' through grep, you need to redirect stderr to stdout as part of the pipe, like so:

bash:
   xml_val *.xml 2>&1 | grep -c ERROR
csh:
   xml_val *.xml |& grep -c ERROR
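The stream split and the -q option can be sketched in Java like this. It is a hypothetical illustration of the behavior described above, not the tool's code. Progress lines go to stdout, problems go to stderr, and quiet mode drops the progress lines. The class name ReportSketch is an assumption.

```java
import java.io.PrintStream;
import java.util.List;

public class ReportSketch {

    /**
     * Prints the result for one XML input: the name and "XML OK" go to 'out'
     * (stdout), problems go to 'err' (stderr), and quiet mode suppresses
     * everything except the problems.
     */
    public static void report(String name, List<String> problems, boolean quiet,
                              PrintStream out, PrintStream err) {
        if (!quiet) {
            out.println(name);
        }
        for (String p : problems) {
            err.println(p);
        }
        if (problems.isEmpty() && !quiet) {
            out.println("XML OK");
        }
    }
}
```

A real caller would pass System.out and System.err. Because problems are printed only to stderr, a plain pipe into grep sees none of them.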

Note on DTDs: The 'xml_val' command does not receive any DTD information on the command line. A DTD is specified inside of each individual XML file. If an XML file specifies a DTD, then that XML file's validity will be additionally checked against that DTD. Here are the top two lines of an example XML file which specifies a DTD:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE BOOKS SYSTEM "books_format.dtd">
   ...

The path of the '.dtd' file should be relative to the '.xml' file in question. In the above example, the '.dtd' file would have to reside in the same directory as the '.xml' file.
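Relative DTD paths are resolved the same way a relative URI is resolved against a base URI, so they are interpreted relative to the directory that contains the XML file. A minimal Java sketch using java.net.URI follows. The class name DtdPathSketch is hypothetical. The sketch only handles relative paths and 'file:' identifiers, not remote (http) DTDs.

```java
import java.io.File;
import java.net.URI;

public class DtdPathSketch {

    /**
     * Resolves a DOCTYPE SYSTEM identifier against the location of the XML
     * file that declares it. URI.resolve drops the file name from the base,
     * appends the identifier, and normalizes any "." or ".." segments.
     */
    public static File resolve(File xmlFile, String systemId) {
        URI base = xmlFile.getAbsoluteFile().toURI();
        return new File(base.resolve(systemId));
    }
}
```

For a file '/data/x/books.xml' (a made-up path), the identifier 'books_format.dtd' resolves to '/data/x/books_format.dtd', and '../dtd/vn.dtd' resolves to '/data/dtd/vn.dtd'.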

Finally, for completeness, the 'xml_val' command is actually a symbolic link to a Java program in the 'verbnet' directory:

/home/verbs/shared/verbnet/XMLValidator

In this directory you will find the program's source and some scripts that help with compiling and running. But in general you just need to use the 'xml_val' command.

If, when you execute the 'xml_val' command, you get an error message saying that the command cannot be found (e.g. -bash: xml_val: command not found), you need to add '/home/verbs/shared/bin' to your $PATH variable. The same applies to 'vn_wnu' and 'vn_gu', which live in the same directory. To do so, add the matching line below to your '~/.bashrc' file if you use bash, or to your '~/.cshrc' file if you use csh.

~/.bashrc:
   PATH=$PATH:/home/verbs/shared/bin
~/.cshrc:
   setenv PATH $PATH":/home/verbs/shared/bin"


11. How to install a new version of VerbNet

When it is time to release a new version of VerbNet, various steps should be followed to maintain the consistency and understandability of our files. Some of these steps are already detailed in these maintenance notes.
  1. Verify that everyone is indeed ready for a new version to be released. In other words, make sure that everyone contributing to the next version of VerbNet has checked in all their files and that no further changes need to be made. This way, the process does not have to be repeated because someone forgot something.
  2. Find out what the next version number will be. Will it be 2.3, 2.5, 9.6? This version number will be represented as X.Y in the coming steps.
  3. Make a new directory in the shared VerbNet directory for the new version.
    • cd /home/verbs/shared/verbnet/current
    • mkdir releaseX.Y
    • chgrp shared releaseX.Y
    • chmod g+w releaseX.Y
  4. Update your own Subversion "working copy" with the most recent VerbNet files. In this example, assume that your Subversion working copy is located at '~/svn/verbnet' (i.e. the '.svn/' directory is in '~/svn/verbnet').
    • cd ~/svn/verbnet
    • svn update
  5. Validate all the XML in your working copy with 'xml_val'. The './trunk' argument is the 'trunk' subdirectory, given relative to the current directory. If there are errors, correct them and commit the changes to Subversion before continuing to the next step.
    • cd ~/svn/verbnet
    • xml_val ./trunk
  6. Copy the Subversion VerbNet files to the new directory.
    • cd ~/svn/verbnet/trunk
    • cp *.xml *.xsd *.dtd /home/verbs/shared/verbnet/current/releaseX.Y
  7. Move any previous VerbNet versions in the 'current' directory to the 'history' directory, and then make the new release read-only. You'll have to make each old release directory writable for the move. We try to keep the directories as read-only as possible to prevent accidental modification. The token A.B represents the previous VerbNet version.
    • cd /home/verbs/shared/verbnet/current
    • chmod a+w releaseA.B
    • mv releaseA.B ../history
    • chmod a-w ../history/releaseA.B
    • chmod -R a-w releaseX.Y
  8. Change the directory at which the 'release-newest' symbolic link points.
    • cd /home/verbs/shared/verbnet
    • rm release-newest
    • ln -s current/releaseX.Y release-newest
    See section 2, "How to update the 'release-newest' symbolic link", for more information.
  9. Regenerate the UVI.
    • cd /home/verbs/shared/public_html/verb-index/generator
    • run-public
    See section 1, "How to update the UVI", for more information.
  10. Check the output of the UVIG for warnings and errors. You may find that the UVIG reports additional errors in the VerbNet XML files, because it looks beyond simple XML errors and also verifies some VerbNet consistency rules. If there are more errors, you will need to fix the VerbNet files in your Subversion working copy, commit them, and start this process over again. If there are no major errors, you may consider the UVI generated and continue to the next step.
  11. Update the 'verbnet-X.Y.tar.gz' downloadable archive file.
    • Follow the procedure in section 3, "How to update the 'verbnet-X.Y.tar.gz' archive". In our case, the 'new-vn-files' directory in that procedure is the new 'releaseX.Y' directory you just created, and the '.../dir1/dir2' path is '/home/verbs/shared/verbnet/current'.




This page last updated on 2010.3.31. Derek Trumbo.