These notes were created to pass on knowledge concerning the
management and configuration of the Unified Verb Index web site and
the VerbNet XML files.
- How to update the UVI
- Overview of the 'generator' directory contents
- How to update the data sources
- How to update VerbNet
- How to update the supplemental data sources
- How to execute the UVIG
- How to update the 'release-newest' symbolic link
- How to update the 'verbnet-X.Y.tar.gz' archive
- How to update the 'propbank-X.Y.tar.gz' archive
- Overview of the 'verb-index' directory contents
- Summary of scripts
- How to modify the UVIG's code
- How to use the WordNetUpdater/VerbNet (vn_wnu)
- How to use the GroupingUpdater/VerbNet (vn_gu)
- How to use the XML Validator (xml_val)
- How to install a new version of VerbNet
1. How to update the UVI
The UVI is just a representation, or view, of several
underlying data sources. It is generated by a piece of software
called the UVIG, or Unified Verb Index Generator. The generator
reads this raw data and formats it into a series of appealing web
pages. These underlying data sources include VerbNet, FrameNet,
PropBank, OntoNotes Sense Groupings, WordNet, and the
VerbNet-FrameNet mapping. When an underlying data source changes,
the UVI is not automatically affected. It must be
regenerated to reflect the changes.
The following paragraphs explain how to use the UVIG to regenerate
the UVI. This section is divided into three subsections.
a. Overview of the 'generator' directory contents
All source code, compiled code, and supporting files for the UVIG
are located in:
/home/verbs/shared/public_html/verb-index/generator
Here is a visual listing of the contents of this directory (the *
denote scripts and the / denote directories):
generator/
|-- bin/
|-- compile*
|-- index.html
|-- javadoc/
|-- javadoc-gen*
|-- README
|-- run*
|-- run-public*
|-- src/
|-- supplemental/
|-- suppl-gen-scripts/
- The 'bin' directory contains the compiled code of the UVIG. You should never
  need to touch it, but it is required to run the generator and is referenced by
  the run scripts.
- The 'compile' script recompiles the source code in 'src' into object files
  located in 'bin' that can then be executed. You only need to run this if
  you have made a change to the UVIG's source code.
- The 'index.html' file provides a public description of the UVIG project.
- The 'javadoc' directory houses documentation for the UVIG source code. This might
  come in handy when trying to understand or modify the UVIG.
- The 'javadoc-gen' script regenerates the Javadoc documentation. Run it
  after you have modified the UVIG source code.
- The 'README' file will eventually contain further UVIG documentation. Right now it
  just contains notes on what should be in there and is wholly incomplete.
- The 'run' and 'run-public' scripts make it easy to execute the UVIG.
- The 'src' directory holds all the source code. Modify one or more of these files
  if you want to make a change to the UVIG. Either back up the files before you
  change them or comment your changes thoroughly, because these
  files are not yet under version control.
- The 'supplemental' directory contains all the data sources (except for VerbNet) and
  any other files that support the UVI web site (e.g. stylesheets).
- The 'suppl-gen-scripts' directory contains two scripts that automate the generation
  of two of the UVI's data sources - 'propbank.s' and 'grouping.s'.
Every effort has been made to ensure that files and subdirectories in
this directory are in the 'www' (web) group and that regular files
have group-write permissions (i.e. -rw-rw-r--), scripts have
group-write and group-execute permissions (i.e. -rwxrwxr--), and
subdirectories have group-write, group-execute, and other-execute
permissions (i.e. drwxrwxr-x). This allows anyone in the web
group to modify and execute these files.
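If you ever want to double-check these permissions, something along the following lines may help. It is only a sketch and not part of the UVIG: the function name 'list_permission_problems' is made up for illustration, and it only reports entries that are in the wrong group or lack group-write (it does not look at the execute bits).

```shell
# list_permission_problems DIR GROUP   (hypothetical helper, not part of the UVIG)
# Prints every entry under DIR that is not in GROUP or lacks group-write.
list_permission_problems() {
    find "$1" \( ! -group "$2" -o ! -perm -g+w \) -print
}

# Example (prints nothing when everything is in order):
#   list_permission_problems /home/verbs/shared/public_html/verb-index/generator www
```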
b. How to update the data sources
In the UVI there are 6 principal data sources. Each data source is
either used to construct the verb index, supplement the VerbNet
class pages, or both. It is also important to remember that only
the VerbNet data source is supplied on the command line to the UVIG
(shown later). All other data sources are stored within the
generator's 'supplemental' directory under a specific file name.
The UVIG will always look in the 'supplemental' directory to find
these data sources. Here is a quick table enumerating these
details:
Data Source   | Used In Index? | Used In Class Pages? | In 'supplemental'? | File Name
VerbNet       | Yes            | Yes                  | No                 | N/A
FrameNet      | Yes            | No                   | Yes                | framenet.s
PropBank      | Yes            | No                   | Yes                | propbank.s
Groupings     | Yes            | Yes*                 | Yes                | grouping.s
WordNet       | No             | Yes                  | Yes                | wordnet.s
VN-FN Mapping | No             | Yes                  | Yes                | vn-fn.s
* The 'grouping.s' file doesn't contribute to the grouping links in
the VerbNet class pages. These are created entirely from the
'grouping=' attribute for each member.
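Since the UVIG always looks for these five file names in 'supplemental', a quick check before generating can catch a missing file early. The following is a sketch under that assumption; the function name 'check_supplemental' is hypothetical, and the check only tests that each file exists and is non-empty, not that its format or version is right.

```shell
# check_supplemental DIR   (hypothetical helper, not part of the UVIG)
# Returns 0 if all five supplemental data sources exist and are non-empty.
check_supplemental() {
    dir=$1
    status=0
    for name in framenet.s propbank.s grouping.s wordnet.s vn-fn.s; do
        if [ ! -s "$dir/$name" ]; then
            echo "missing or empty: $dir/$name" >&2
            status=1
        fi
    done
    return "$status"
}

# Example, run from the 'generator' directory:
#   check_supplemental supplemental && echo "data sources present"
```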
i. How to update VerbNet
VerbNet is updated independently of the UVIG. The
VerbNet files are edited in the standard way and kept under
configuration management in SVN (using the standard "update" and "check in"
operations). When the most recent files in SVN
are ready to be pushed to the public UVI, follow the process described under
"How to install a new version of VerbNet" to prepare the new files to
be shown in the UVI.
ii. How to update the supplemental data sources
Updating the supplemental data sources for the UVI is a little more
involved. Supplemental data sources are copied into the generator's
'supplemental' directory under specific file names, and the
generator looks for these file names (an alternative implementation
would have been to supply all these file names on the command line,
but then the usage of the UVIG would have become too cumbersome).
The tricky part is knowing what files to copy into the
'supplemental' directory. Each data source has its own format, and
the UVIG 1) is expecting a certain format for each file, and 2)
knows only how to read that one format for each file. This section
will show which files are valid to copy into the 'supplemental'
directory for each source.
- Let's start with WordNet, as it's possibly the most
straight-forward example of how this process works. The UVI
relies on WordNet data to display the sense number which
corresponds to each sense key inside VerbNet. This
sense-key-to-sense-number mapping is stored in the 'index.sense'
file located in the 'dict' directory of any WordNet
distribution. Just copy this file to the 'supplemental'
directory under the name 'wordnet.s'. Here is an example of
this, run from the 'verb-index' directory:
cd /home/verbs/shared/public_html/verb-index
cp /usr/local/WordNet-3.0/dict/index.sense \
   generator/supplemental/wordnet.s
That is all there is to it. Just remember, when you
generate the UVI, to verify that the version of 'wordnet.s' (1.7.1,
2.0, 2.1, 3.0, etc.) in the 'supplemental' directory matches the
version of WordNet sense keys in the version of VerbNet that you
use to generate the UVI. For example, if you copied the above
3.0 version of the 'index.sense' file to the 'supplemental'
directory, you don't want to regenerate the UVI with VerbNet XML
files that have WN 2.1 sense keys.
- Next let's consider the VerbNet-FrameNet mapping.
The mapping has been made public via two files placed on the web
server, 'vn-fn.xml' and 'vn-fn-roles.xml'. These are in the
'verb-index/fn' directory and thus are available externally
here:
http://verbs.colorado.edu/verb-index/fn/vn-fn.xml
http://verbs.colorado.edu/verb-index/fn/vn-fn-roles.xml
The UVI only uses one of these, the first one, 'vn-fn.xml'. So
let's copy this to the right spot under the right name:
cd /home/verbs/shared/public_html/verb-index/fn
cp vn-fn.xml ../generator/supplemental/vn-fn.s
|
This file also carries with it a warning. The two mapping files
shown above have an original home. It is here:
/home/verbs/shared/mappings/vn-fn
|
Here you can find all the different versions of this mapping.
Each mapping was made with a specific version of VerbNet in
mind. The files in the 'verb-index/fn' directory are given
simpler names than those used in this directory. Be sure to use a mapping file ('vn-fn.xml' or
'VNclass-FNframeMappings.xml') which corresponds to the version
of VerbNet against which you are generating the UVI.
Now, at some point you will get a new mapping file. This file
goes through iterations to keep up with changes to VerbNet and
FrameNet. You may also get a new roles file. When you get new
files, copy them to the '.../mappings/vn-fn' directory above.
Validate the two files using the XML Validator (see section 10,
"How to use the XML Validator (xml_val)"). New XML files sometimes
arrive with small syntax errors that make them unreadable by the
UVI Generator. If the files contain errors, correct them.
Then, you can copy the files to the 'verb-index/fn' directory.
Make sure they are given the names 'vn-fn.xml' and
'vn-fn-roles.xml'. Then follow the above procedure to copy the
one file to the 'supplemental' directory.
- Now let's talk about FrameNet. This data source is
used not to supplement any VerbNet class pages, but rather just
to add to the power of the centralized verb index. This is one
of the four sources used in the verb index.
This file is actually supplied to us by the FrameNet folks. Our
copy of this file is located here:
/home/verbs/shared/framenet/verbLexEntries.xml
|
Although we rarely get updated versions of this file, this would
be how you copy it into the generator's supplemental files:
cd /home/verbs/shared/public_html/verb-index
cp /home/verbs/shared/framenet/verbLexEntries.xml \
   generator/supplemental/framenet.s
|
In other words, you won't need to copy this file to the
generator until you get a new version. Make sure if you do get
a new version, the FrameNet people have used the same XML format
that they did in the previous version, or the UVIG will have to
be modified slightly.
Make sure you validate any new XML
file you get before feeding it to the UVIG. You need to correct
any errors in the XML before using it to generate the UVI.
- Next we arrive at the PropBank data source. This
data source is also involved in the construction of the verb
index. In PropBank there is one file per verb in the lexicon.
Unfortunately, there is no file which contains a list of all
these files. Therefore, we must create this list ourselves via
a directory listing. Even though the following process may seem
complicated (and it would be great if there arises another,
better way), remember that you only need to perform this if a
new version of PropBank has been created and there is a desire
to get the new PropBank links into the UVI. The existing
'propbank.s' file works fine for now.
The relevant HTML PB files are located in:
/home/verbs/shared/public_html/framesets
|
We are going to run a series of commands to take the directory
contents and create a simple list of (verb, URL) pairs that the
UVIG can parse and display. Here is a snippet of the desired
file:
abandon,http://verbs.colorado.edu/framesets/abandon-v.html
abate,http://verbs.colorado.edu/framesets/abate-v.html
abdicate,http://verbs.colorado.edu/framesets/abdicate-v.html
...
|
We will perform these instructions using a handful of steps
instead of chaining them all together with pipes, which you can
also do. This is for clarification purposes. Also, this is
only one of MANY different ways to produce the above format. As
long as you get it into the above format, the UVIG will
correctly add the verbs in the file with the link next to
them.
1. cd /home/verbs/shared/public_html/framesets
2. ls -l > tmp1
3. grep -vE '(^total| 0 )' tmp1 > tmp2
4. grep -o ' [^ ]*-v.html' tmp2 > tmp3
5. sed 's_^ \(.*\)-v.html_\1,http://verbs.colorado.edu/framesets/\1-v.html_' tmp3 > tmp4
|
- STEP 1: Change to PropBank directory.
- STEP 2: Get a long listing of the directory contents (-l = dash ell).
- STEP 3: Filter out invalid lines (those that don't represent a frameset):
- The 'total' line (very first line)
- Any files with size zero (this currently eliminates most of the irrelevant files)
- STEP 4: Simplify each line to just a space followed by the file name.
- STEP 5: Modify each line to be a verb followed by a comma followed by a URL. The
'sed' command takes a line that looks like this:
 abandon-v.html
(notice preceding space) and turns it into this:
abandon,http://verbs.colorado.edu/framesets/abandon-v.html
|
Voilà! You have massaged the directory listing into a simple
format that the UVIG understands. If you use the above
commands, remember to use the exact syntax included above.
Every little space has meaning. If there is a line break on the
'sed' command in your browser, ignore it completely.
Now all you need to do is put the file into the 'supplemental'
directory:
cp tmp4 \
   ../verb-index/generator/supplemental/propbank.s
|
That was a long process, I know. You shouldn't have to do this
that often, as the PropBank directory is not changing radically.
This was supplied just in case however, to make sure you could
update 'propbank.s' if need be.
By the way, to make this easier, a script has been
created to automate the whole process. You can have the 'propbank.s'
file created automatically for you by executing these two
commands:
cd verb-index/generator/suppl-gen-scripts
make-pb-data-source
|
(assuming you are in the 'public_html' directory). The
'propbank.s' file will be created for you and placed in the
'suppl-gen-scripts' directory. Copy it to the 'supplemental'
directory.
- Lastly, the OntoNotes Sense Groupings data source is
constructed and utilized almost exactly like the PropBank data
source. The content that comprises this UVI data source comes
directly from the listing of a directory, since each file in
that directory will get an entry in the UVI. That listing is
massaged using 'grep' and 'sed'. Remember that you only need to
perform this if updated grouping files exist and there is a
desire to get the new links into the UVI. The existing
'grouping.s' file works fine for now. The following process
will mirror the PropBank process above.
The relevant grouping HTML files are located in:
/home/verbs/shared/public_html/html_groupings
|
We are going to run a series of commands to take the directory
contents and create a simple list of (verb, URL) pairs that the
UVIG can parse and display. Here is a snippet of the desired
file:
abandon,http://verbs.colorado.edu/html_groupings/abandon-v.html
abate,http://verbs.colorado.edu/html_groupings/abate-v.html
abolish,http://verbs.colorado.edu/html_groupings/abolish-v.html
...
|
We will perform these instructions using a handful of steps
instead of chaining them all together with pipes, which you can
do. This is for clarification purposes. Also, this is only
one of MANY different ways to produce the above format. As
long as you get it into the above format, the UVIG will add
the verbs in the file with the link next to them.
1. cd /home/verbs/shared/public_html/html_groupings
2. ls -l > tmp1
3. grep -e '-v\.html' tmp1 > tmp2
4. grep -o ' [^ ]*-v.html' tmp2 > tmp3
5. sed 's_^ \(.*\)-v.html_\1,http://verbs.colorado.edu/html\_groupings/\1-v.html_' tmp3 > tmp4
|
- STEP 1: Change to groupings directory.
- STEP 2: Get a long listing of the directory contents (-l = dash ell).
- STEP 3: Keep only the lines that represent a grouping (those containing '-v.html'). This filters out:
- The 'total' line (very first line)
- STEP 4: Simplify each line to just a space followed by the file name.
- STEP 5: Modify each line to be a verb followed by a comma followed by a URL. The
'sed' command takes a line that looks like this:
 abandon-v.html
(notice preceding space) and turns it into this:
abandon,http://verbs.colorado.edu/html_groupings/abandon-v.html
|
Now put the file into the 'supplemental' directory:
cp tmp4 \
   ../verb-index/generator/supplemental/grouping.s
|
And now you're done. By the way, to make this process easier, a
script has been created to automate this process. You can have
the 'grouping.s' file created automatically for you by executing
these two commands:
cd verb-index/generator/suppl-gen-scripts
make-grp-data-source
|
(assuming you are in the 'public_html' directory). The
'grouping.s' file will be created for you and placed in the
'suppl-gen-scripts' directory. Copy it to the 'supplemental'
directory.
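Because the PropBank and grouping lists are built the same way, the five steps can also be folded into one small shell function. The following is a sketch of one of the many alternatives mentioned above; it is not the contents of 'make-pb-data-source' or 'make-grp-data-source', which are not reproduced in these notes. The function name 'make_verb_url_list' is hypothetical. Instead of parsing 'ls -l' output it loops over the '*-v.html' files directly, and (unlike the grouping steps above) it always skips zero-size files.

```shell
# make_verb_url_list DIR URL_BASE   (hypothetical helper)
# Prints one "verb,URL" line for every non-empty *-v.html file in DIR.
make_verb_url_list() {
    dir=$1
    url_base=$2
    for path in "$dir"/*-v.html; do
        # Skip zero-size files (and the literal pattern when nothing matches).
        [ -s "$path" ] || continue
        file=${path##*/}        # e.g. abandon-v.html
        verb=${file%-v.html}    # e.g. abandon
        printf '%s,%s/%s\n' "$verb" "$url_base" "$file"
    done
}

# Example, writing to a scratch file that you then copy into 'supplemental':
#   make_verb_url_list /home/verbs/shared/public_html/framesets \
#       http://verbs.colorado.edu/framesets > propbank.s
```

Writing to a scratch file first keeps the existing 'propbank.s' or 'grouping.s' intact until you have looked over the result.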
NOTE: If the format of any of these 5 files
changes, the UVIG code that reads these files may need to be
modified. So hopefully the data sources we get externally
(WordNet, FrameNet) will not change. Similarly we should attempt
to keep the data sources that we create (VN-FN, PropBank,
Groupings) in the same format unless there is some pertinent need
to change them.
c. How to execute the UVIG
Now that you have your data sources ready to be published, follow
these instructions to execute the UVIG (i.e. regenerate the
UVI) via the simplest method:
- Change to 'generator' directory
- Execute command 'run-public'
- Pat self on back - you're done!
Why was this so easy? There are three ways to regenerate the UVI
using the generator:
- java uvi.Generator [flags] <xml-input-dir> <html-output-dir>
- run [flags] <xml-input-dir> <html-output-dir>
- run-public
The 'run' and 'run-public' scripts just make things a little
simpler for you. They hide the need to remember 'java
uvi.Generator'. You will always want to use one of these
two scripts (the command 'java' as used in these maintenance notes
is shorthand for the path to the actual 'java' command).
The 'run-public' script makes things REALLY simple for you. To
execute it you just type:
run-public
from the 'generator' directory. It is essentially equivalent to
typing this:
java uvi.Generator -vos \
   /home/verbs/shared/verbnet/release-newest ..
|
The most common arguments are already supplied in this script, and
thus it's the one you'll most likely want to use to update the
public UVI. The XML input directory supplied is the symbolic link
that points to the most recent version of the VerbNet files. Thus,
the use of this script assumes that the
'release-newest' symbolic link is maintained and
updated. For more information on maintaining this symbolic
link, see section 2, "How to update the 'release-newest' symbolic link". The .. stands for
the 'verb-index' directory, which is the parent directory of
'generator' and is where all the UVI files are created. The -v
flag prints verbose output, the -o flag means overwrite (you'll
almost always want to use this) and the -s flag means to sort the
members within each class or subclass.
The 'run' script does not have such defaults. You would only use
'run' over 'run-public' if you do not want to use the
default flags, input directory, and output directory supplied
within. Using the 'run' script allows you to specify your own XML
input directory if you don't want to use the directory currently
pointed to by the symbolic link. Likewise, you can specify your
own output directory (yes you can create a completely independent
UVI site on the server with this feature).
For example, let's say you wanted to create another, separate UVI,
but using an old version of VerbNet for testing and consistency
checking (e.g. to see what it looked like a year ago). Also,
let's say you have an empty target directory, so you don't need the
-o flag (but you can still use it), and you don't want to see tons
of output, so you don't want the -v flag. You would execute
something like this:
run -s /home/verbs/shared/verbnet/history/release2.0 \
   ~/public_html/vn2.0-test-uvi
|
Now you would be able to access your new test UVI via a URL similar
to:
http://verbs.colorado.edu/~trumbo/vn2.0-test-uvi
This is made possible because everything the UVIG needs to build a
complete site is contained within the 'supplemental' directory
(images, stylesheets, etc.).
However, there is one caveat to creating UVIs
from previous versions of VerbNet. You must make sure that the
other data sources in the 'supplemental' directory at the time
of generation are consistent with the version of VerbNet
against which you are generating. More specifically, VerbNet is
dependent on a certain version of WordNet, and the VN-FN mapping is
dependent on a certain version of VerbNet. For example, you
wouldn't want to generate against VerbNet 2.0 while leaving a
'wordnet.s' file in the 'supplemental' directory that corresponds
to WN 3.0 - because VN 2.0 doesn't have WN 3.0 sense keys! Doing
so would create an inconsistent UVI. Similarly if you had placed an
old version of 'vn-fn.s' in the 'supplemental' directory for
testing purposes, make sure you swap it out for the most recent one
before executing 'run-public' (which generates against the most
recent VN version).
More than likely, though, you will just need to use the
'run-public' script to generate the UVI. But ALWAYS keep your
'supplemental' data sources consistent with the version of VerbNet
when you generate. Also, the older the version of VerbNet
you use to generate a UVI, the more error messages you
are likely to see. This is because errors were steadily
eliminated as VerbNet matured.
If you want to learn more about what you can do with the UVIG,
type:
in the 'generator' directory. Or read the documentation inside the
'run' and 'run-public' scripts.
2. How to update the 'release-newest' symbolic link
This symbolic link points to the directory of VerbNet files thought
of as the most recently released version. The symbolic link has
two primary purposes. The first purpose is to allow users not to
have to remember which directory holds the most recent version of
VerbNet. In fact, all server users need to remember is that all
the newest VerbNet XML files are located in
/home/verbs/shared/verbnet/release-newest
They never have to concern themselves with which directory the link
actually points to. The second purpose of the link is to
facilitate the operation of the 'run-public' script in the
'generator' directory. This script directly relies on this
symbolic link. In fact, the symbolic link allows the
'run-public' command to be as simple as it is. The idea behind
the 'run-public' script is that it simply regenerates the UVI
with the most recent version of VerbNet, and it does this by
using the files located in the 'release-newest' directory (i.e.
the link).
If another directory is added to the VerbNet directory, and it is
to become the "newest" version of VerbNet, then these simple
commands will update the symbolic link to the new directory.
cd /home/verbs/shared/verbnet
rm release-newest
ln -s current/release-7.0 release-newest
|
This will repoint the 'release-newest' symbolic link at the
directory 'current/release-7.0'. Do not put a forward slash after
'release-newest' in the 'rm' command, nor after the new directory
in the 'ln' command. In other words, both of these are bad:
rm release-newest/
ln -s current/release-7.0/ release-newest
Maintaining the link is simple and can save a lot of hassle when it
comes to UVI/VN management. Most likely you will grab the entire
VerbNet file base from Subversion when it is ready to be published.
Put a copy into '/home/verbs/shared/verbnet/current' and redirect the link
to it.
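If you repoint the link often, the 'rm' and 'ln' steps above can be wrapped in a small function that also guards against the trailing-slash mistake. This is only a sketch: the name 'repoint_link' is hypothetical, and the refusal to replace anything that is not a symbolic link is an extra safety check that these notes do not call for.

```shell
# repoint_link LINK TARGET   (hypothetical helper)
# Replaces the symbolic link LINK so that it points at TARGET.
repoint_link() {
    link=${1%/}      # drop a trailing slash, if any
    target=${2%/}
    # Never delete a real file or directory by accident.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "refusing to replace non-link: $link" >&2
        return 1
    fi
    rm -f "$link" && ln -s "$target" "$link"
}

# Example:
#   cd /home/verbs/shared/verbnet
#   repoint_link release-newest current/release-7.0
```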
3. How to update the 'verbnet-X.Y.tar.gz' archive
This file contains the official set of files of the most recent
version of VerbNet available to the public. From time to time it
will need to be refreshed with the newest files. Let's assume you
are in a directory containing another directory which has the
newest VerbNet files that you wish to publish to the public web.
Make sure this directory is in publishable format (no extraneous
files, all the XML is valid, etc.). This also means that the directory
must not have a '.svn' directory in it, if you are using
SVN. You can check by executing 'ls -d .svn' in the directory. If
it does exist, you will need to use the 'copy' method below
instead of the 'move' method. Here are some example directory
structures:
.../dir1/dir2/
.../dir1/dir2/new-vn-files/
.../dir1/dir2/new-vn-files/accompany-51.7.xml
.../dir1/dir2/new-vn-files/admire-31.2.xml
.../dir1/dir2/new-vn-files/...etc...
|
.../dir1/verbnet/
.../dir1/verbnet/trunk/
.../dir1/verbnet/trunk/accompany-51.7.xml
.../dir1/verbnet/trunk/admire-31.2.xml
.../dir1/verbnet/trunk/...etc...
|
You can execute these commands once you are in the parent directory
of the directory with the VerbNet files (i.e. '.../dir1/dir2' or
'.../dir1/verbnet'). Replace "X.Y" with the actual version of the
VerbNet release (e.g. "2.5").
cp -r trunk verbnet-X.Y
rm -r verbnet-X.Y/.svn    # remove extra files if necessary
tar cvf verbnet-X.Y.tar verbnet-X.Y
gzip verbnet-X.Y.tar
rm -r verbnet-X.Y
The first step copies the whole directory to a new directory with a
standard name. Then that directory is tar'ed into a single
file (e.g. verbnet-X.Y.tar). Then the tar file is compressed and
the new directory is removed. This is just one way to accomplish
this task. You could also just rename your directory for the
tar/gzip steps and rename it back afterwards. It's up to you. Make
sure there are no superfluous files in the directory first, though
(e.g. '.svn/').
mv new-vn-files verbnet-X.Y
tar czvf verbnet-X.Y.tar.gz verbnet-X.Y
mv verbnet-X.Y new-vn-files
|
Either way you will be left with a 'verbnet-X.Y.tar.gz' file (with
the actual version number of course). Now all we need to do is
place this file in the correct location and update the public web
with the new link. Move your file to the following location:
/home/verbs/shared/public_html/verb-index/vn
|
Then update this web page:
http://verbs.colorado.edu/~mpalmer/projects/verbnet/downloads.html
|
to reflect the new name (URL) of the file:
http://verbs.colorado.edu/verb-index/vn/verbnet-X.Y.tar.gz
|
You can move the previous VerbNet archive to:
/home/verbs/shared/verbnet/history
|
Now it's ready to go. We did it this way so that when someone
downloads and expands the archive file they see a standard-named
directory that includes the version of the VerbNet files.
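The 'copy' method can be collected into one function so that the scratch copy never lands next to your working files. Treat this as a sketch under a few assumptions: the name 'make_vn_archive' is hypothetical, the scratch copy is made in a temporary directory from 'mktemp -d', and '.svn' directories are removed at every level of the copy rather than just the top one.

```shell
# make_vn_archive SRC_DIR VERSION OUT_DIR   (hypothetical helper)
# Creates OUT_DIR/verbnet-VERSION.tar.gz whose top-level directory is
# 'verbnet-VERSION', leaving SRC_DIR untouched.
make_vn_archive() {
    src=$1
    name="verbnet-$2"
    out_dir=$3
    work=$(mktemp -d) || return 1
    cp -r "$src" "$work/$name" || return 1
    # Remove Subversion metadata from the scratch copy only.
    find "$work/$name" -name .svn -type d -prune -exec rm -rf {} +
    tar czf "$out_dir/$name.tar.gz" -C "$work" "$name"
    rc=$?
    rm -rf "$work"
    return "$rc"
}

# Example (replace X.Y with the real version):
#   make_vn_archive trunk X.Y /home/verbs/shared/public_html/verb-index/vn
```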
4. How to update the 'propbank-X.Y.tar.gz' archive
When you create a new PropBank archive, move it to the
/home/verbs/shared/public_html/verb-index/pb
|
directory. The previous PropBank archives can remain in this
directory for posterity if you want. There is one more change
that needs to be made. The UVI has a link to this archive on its
home page. You need to update the
/home/verbs/shared/public_html/verb-index/generator/supplemental/index.s
|
file to refer to the new file. After you edit this file, regenerate the UVI:
cd /home/verbs/shared/public_html/verb-index/generator
run-public
|
5. Overview of the 'verb-index' directory contents
Below is a listing of the 'verb-index' directory. The majority of
the files and folders were created by the UVIG. However, some
files exist just for the sake of being made available in a logical
place. These include the VN-FN mapping in 'fn', the most recent
PropBank archive in 'pb', and the most recent VerbNet archive in
'vn'. The 'inspector' and 'vxc' directories and the 'maint-notes*'
files were also not created by the generator.
verb-index/
|-- comments/ (UVI Comments)
|-- comments.php (UVI Comments)
|-- contact.php (UVI Contact Page)
|-- fn/ (VN-FN Mapping Files)
|-- generator/ (UVI Generator Code)
|-- .htaccess (Site Configuration File)
|-- images/ (UVI Images)
|-- include.php (UVI Scripts)
|-- index/ (UVI A-Z Index Pages)
|-- index.php (UVI Home Page)
|-- inspector/ (VerbNet Inspector Code)
|-- login.php (UVI Login)
|-- maint-notes.html (This Page)
|-- maint-notes-suppl1.txt (UVIG Developer Notes)
|-- pb/ (PropBank Archives)
|-- postcomment.php (UVI Comments)
|-- scripts.js (UVI Scripts)
|-- search/ (UVI Search Indices)
|-- search.php (UVI Search)
|-- styles.css (UVI Styles)
|-- users/ (UVI Users)
|-- vn/ (VerbNet Pages and Archive)
|-- vxc/ (VerbNet-Cyc Mapper Code)
|-- wn/ (UVI WordNet Page)
|
6. Summary of scripts
In this table the * denote scripts and the / denote directories.
/home/verbs/shared/public_html/verb-index/
|-- generator/
|-- compile*
|-- javadoc-gen*
|-- run*
|-- run-public*
|-- suppl-gen-scripts/
|-- make-grp-data-source*
|-- make-pb-data-source*
|-- inspector/
|-- compile*
|-- download-prepare*
|-- javadoc-gen*
|-- mock-app/
|-- javadoc-gen*
|-- run*
|-- run-all*
|-- scripts/
|-- compile*
|-- run*
|-- vxc/
|-- compile*
|-- download-prepare*
|-- javadoc-gen*
|-- run*
|-- run-public*
|-- scripts/
|-- compile*
|-- run*
/home/verbs/shared/verbnet/
|-- GroupingUpdater/
|-- compile*
|-- run*
|-- WordNetUpdater/
|-- compile*
|-- run*
|-- XMLValidator/
|-- compile*
|-- run*
|
7. How to modify the UVIG's code
There are various reasons why you may need to modify the UVI
Generator's code. Mainly these fall under some basic categories:
addition of new features, changes made to the format of existing
data sources, or addition of new data sources. Some of these
changes will require modification of the Java files in the
'generator/src' directory, and others will require modification of
the '.s' supplemental files in the 'generator/supplemental'
directory.
How to add a data source that just adds verbs to the index
If the format of any of these supplemental data sources has changed:
- propbank.s
- framenet.s
- grouping.s
- wordnet.s
- vn-fn.s
Then you will need to change how the file is parsed. The method
'Generator.addOthers' is a good place to start.
8. How to use the WordNetUpdater/VerbNet (vn_wnu)
The WordNetUpdater for VerbNet updates the
'wn=' attributes in an entire set of VerbNet XML files based on the
mapping info that WordNet provides each time a new
version of WordNet is released. The WordNetUpdater operates very similarly in
concept to the GroupingUpdater.
The WordNetUpdater command is:
/home/verbs/shared/bin/vn_wnu
|
You can always see the usage of the command by executing this on
the command line:
Finally, for completeness, the 'vn_wnu' command is actually a
symbolic link to a Java program in the 'verbnet' directory:
/home/verbs/shared/verbnet/WordNetUpdater
|
In this directory you will find the program's source and some
scripts that help with compiling and running. But in general
you just need to use the 'vn_wnu' command.
9. How to use the GroupingUpdater/VerbNet (vn_gu)
The GroupingUpdater for VerbNet updates the
'grouping=' attributes in an entire set of VerbNet XML files based
on the mapping info within a set of OntoNotes Sense Grouping XML
files.
This is how it works. The GroupingUpdater is given 4 things:
- VerbNet XML directory
- Sense Groupings XML directory
- WordNet 'index.sense' file
- VerbNet XML output directory (optional)
If the output directory is not supplied, the GroupingUpdater simply
prints a summary report about the update process without generating
any updated VerbNet files. The WordNet 'index.sense' file is
required to translate VerbNet's WordNet sense keys into WordNet
sense numbers.
Let's assume that the GroupingUpdater has been initiated and is
scanning through the VerbNet XML files. It encounters a VerbNet
class C and a member M with some pre-existing 'grouping=' attribute
(this example will describe what the GroupingUpdater does):
C.xml:
...
<MEMBER name="M" wn="K1 K2" grouping="M.01 M.03" />
...
|
When the GroupingUpdater reaches this member to update it, it looks
for a sense grouping file named 'M-v.xml'. If the file does not
exist then the grouping attribute is left blank (grouping="").
If it does exist, then the file has some number of sense groups
within. Let's say there are G sense groups. The
GroupingUpdater must decide which sense groups in the file
(M.01, M.02, ... M.G) belong in the 'grouping=' attribute.
There are two methods examined:
- Method 1: If a sense group in 'M-v.xml' defines a
link to the VerbNet class (e.g.
<vn>accompany-51.7,captain-29.8-1</vn>) in which
the member M resides (class C in this case), then the sense
group is added to the member's 'grouping=' attribute.
- Method 2: If a sense group in 'M-v.xml' defines a
link to any WordNet sense number (e.g. <wn
version="3.0">5,6,7</wn>) that appears in the WordNet
sense numbers defined for the member, then the sense group is
added to the member's 'grouping=' attribute. The member's
sense keys K1 and K2 must be translated into sense numbers
using the 'index.sense' file first. In other words, if the
intersection between the sense group's WordNet sense numbers
and the VerbNet member's WordNet sense numbers is non-empty,
then the sense group is added to the member's 'grouping='
attribute.
But which method is actually used to update the files? When an
output directory is supplied and updated files are desired,
Method 2 is currently used to update the 'grouping=' attributes.
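The two methods can be sketched as small shell tests. This is only an
illustration of the decision rules described above, not the
GroupingUpdater's Java code. The function names are hypothetical, and
the sketch assumes that the class links and sense numbers arrive as
comma-separated lists without whitespace.

```shell
# in_list ITEM LIST
# Succeeds if ITEM is one of the comma-separated entries of LIST.
in_list() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# method1 CLASS_ID GROUP_VN_LINKS
# Succeeds if the sense group links to the class in which the member resides.
method1() {
    in_list "$1" "$2"
}

# method2 MEMBER_SENSE_NUMBERS GROUP_SENSE_NUMBERS
# Succeeds if the two lists share at least one sense number.
# The body runs in a subshell so the IFS change does not leak out.
method2() (
    IFS=,
    for number in $1; do
        in_list "$number" "$2" && exit 0
    done
    exit 1
)

# Usage: a member in class accompany-51.7 with sense numbers 1 and 5,
# checked against a sense group with <vn>accompany-51.7,captain-29.8-1</vn>
# and <wn version="3.0">5,6,7</wn>:
# method1 "accompany-51.7" "accompany-51.7,captain-29.8-1" && echo "Method 1: add group"
# method2 "1,5" "5,6,7" && echo "Method 2: add group"
```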
Besides updating the files, the other main capability of the
GroupingUpdater is printing a comprehensive summary report.
The report is intended to help analysts maintain the consistency
and accuracy of the VerbNet and Grouping data sources. It contains
these sections:
- VerbNet Statistics Only
- WordNet Statistics Only
- Grouping Statistics Only
- Update Statistics
- Exception Lists
Among many other things, the report lists all the instances when
the two methods disagree about how to update a given member line.
Theoretically the two methods should always agree if the mappings
are consistent.
The GroupingUpdater command is:
/home/verbs/shared/bin/vn_gu
|
and should be available to you automatically if your path is set
up correctly. If it isn't, follow the directions at the end of
section 10 for adding '/home/verbs/shared/bin' to your $PATH.
You can always see the usage of the command by executing this on
the command line:
An example usage of this command is:
vn_gu -n ~/svn/verbnet/trunk ~/svn/sense-inventories \
/usr/local/WordNet/dict/index.sense ~/updated-vn-files
|
Finally, for completeness, the 'vn_gu' command is actually a
symbolic link to a Java program in the 'verbnet' directory:
/home/verbs/shared/verbnet/GroupingUpdater
|
In this directory you will find the program's source and some
scripts that help with compiling and running. But in general
you just need to use the 'vn_gu' command.
10. How to use the XML Validator (xml_val)
The XML Validator is a generic tool used to check whether an
XML document is 1) well-formed and 2) valid.
Well-formed is an XML term meaning that the XML syntax is
correct (e.g. there's a closing > for every <). Valid
is an XML term meaning that, if the XML document specifies a DTD,
the document does not violate any of the rules in that DTD.
The XML Validator command is:
/home/verbs/shared/bin/xml_val
|
You can always see the usage of the command by executing this on
the command line:
The 'xml_val' command takes any number of XML Input Objects.
An XML Input Object is either an individual file (which does
not have to end in '.xml') or a directory with XML files in it
(only the '.xml' files will be scanned). This tool does not
recurse into subdirectories; it scans only the '.xml' files directly
inside the directories you list. Here are some examples:
xml_val some_file.q
xml_val my_file1.xml my_file2.xml
xml_val ~/xml_files ~/more_xml_files
xml_val my_file1.xml ~/xml_files some_file.q
xml_val *
|
'some_file.q' is a single file in XML format that simply has a
non-standard extension (.q). '~/xml_files' and '~/more_xml_files'
are directories whose '.xml' files will be scanned. The last
example passes every file and directory in the
current directory to the command.
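The input rules above can be previewed before running the validator.
The sketch below is a hypothetical helper ('list_scanned' is not part of
'xml_val') that prints which files would be scanned under those rules:
a file argument is taken as-is, and a directory argument contributes
only the '.xml' files directly inside it.

```shell
# list_scanned INPUT...
# Prints the files that would be scanned under the rules described above.
list_scanned() {
    for input in "$@"; do
        if [ -d "$input" ]; then
            # Directories are not recursed into; only their own '.xml' files count.
            for file in "$input"/*.xml; do
                # Skip the unexpanded pattern when no '.xml' file exists.
                if [ -f "$file" ]; then
                    printf '%s\n' "$file"
                fi
            done
        elif [ -f "$input" ]; then
            # Individual files are accepted whatever their extension.
            printf '%s\n' "$input"
        fi
    done
}

# Usage:
# list_scanned my_file1.xml ~/xml_files some_file.q
```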
There is currently only one non-trivial command line option,
-q (or --quiet). In normal operation the 'xml_val' command prints
the name of each file or directory it is scanning, followed by the
phrase 'XML OK' if everything was fine, so a large set of files
produces a lot of output. Use -q or --quiet when you only want to
see the errors and warnings. For example:
xml_val -q ~/xml_files
Note on output: All error and warning messages are written to
stderr. All other output is written to stdout. This means that
if you want to pipe the output of 'xml_val' through grep to count
the errors, you'll need to redirect stderr to stdout as part of
the piping operation, like so:
bash:
xml_val *.xml 2>&1 | grep -c ERROR
csh:
xml_val *.xml |& grep -c ERROR
|
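If stdout should be left out of the count entirely, a Bourne-style
shell such as bash can send only stderr into the pipe. The sketch below
uses a stand-in function, 'fake_xml_val', so that it can be tried
anywhere. The message wording is hypothetical; only the split between
stdout and stderr follows the note above.

```shell
# Stand-in for 'xml_val' so the redirection can be tried anywhere.
# The message wording is hypothetical; only the stdout/stderr split matters.
fake_xml_val() {
    echo "good.xml: XML OK"
    echo "ERROR: bad.xml is not well-formed" >&2
    echo "also_good.xml: XML OK"
}

# '2>&1' first points stderr at the pipe, then '>/dev/null' discards stdout,
# so grep sees only the error and warning messages.
error_count=$(fake_xml_val 2>&1 >/dev/null | grep -c ERROR)
echo "$error_count"
```

With the real command, the same redirection would read
'xml_val *.xml 2>&1 >/dev/null | grep -c ERROR'.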
Note on DTDs: The 'xml_val' command does not receive any DTD
information on the command line. A DTD is specified inside of each
individual XML file. If an XML file specifies a DTD, then that XML
file's validity will be additionally checked against that
DTD. Here are the top two lines of an example XML file which
specifies a DTD:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE BOOKS SYSTEM "books_format.dtd">
...
|
The path of the '.dtd' file should be relative to the '.xml' file in
question. In the above example, the '.dtd' file would have to
reside in the same directory as the '.xml' file.
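To check where a DTD is expected to live, the SYSTEM identifier can be
pulled out of the DOCTYPE line and resolved against the directory of
the XML file. This is a minimal sketch, assuming the DOCTYPE declaration
sits on one line and uses double quotes as in the example above. The
function name 'dtd_path' is hypothetical and is not part of 'xml_val'.

```shell
# dtd_path XML_FILE
# Prints the path of the DTD named by a SYSTEM identifier in XML_FILE,
# resolved against the directory that holds XML_FILE. Prints nothing if
# no single-line DOCTYPE declaration with a SYSTEM identifier is found.
dtd_path() {
    system_id=$(sed -n 's/.*<!DOCTYPE[^>]*SYSTEM[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1)
    case "$system_id" in
        "") ;;                                    # no DTD specified
        /*) printf '%s\n' "$system_id" ;;         # absolute path, use as-is
        *)  printf '%s/%s\n' "$(dirname "$1")" "$system_id" ;;
    esac
}

# Usage: report a missing DTD before validating.
# dtd=$(dtd_path books.xml)
# if [ -n "$dtd" ] && [ ! -f "$dtd" ]; then echo "missing DTD: $dtd"; fi
```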
Finally, for completeness, the 'xml_val' command is actually a
symbolic link to a Java program in the 'verbnet' directory:
/home/verbs/shared/verbnet/XMLValidator
|
In this directory you will find the program's source and some
scripts that help with compiling and running. But in general
you just need to use the 'xml_val' command.
If you get an error message indicating that the command cannot be
found when you execute 'xml_val' (e.g. -bash: xml_val: command not
found), you will need to add the '/home/verbs/shared/bin' directory
to your $PATH variable. To do so, add one of the lines below to
either your '~/.bashrc' or '~/.cshrc' file, depending on whether you
use bash or csh.
~/.bashrc:
PATH=$PATH:/home/verbs/shared/bin
~/.cshrc:
setenv PATH $PATH":/home/verbs/shared/bin"
|
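For bash users, a slightly more defensive variant of the '~/.bashrc'
line appends the directory only when it is not already listed, so that
re-reading the file does not grow $PATH. This is an optional sketch; the
helper name 'path_append' is hypothetical, and csh would need a
different form.

```shell
# path_append DIR
# Appends DIR to $PATH unless it is already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH=$PATH:$1 ;;
    esac
}

path_append /home/verbs/shared/bin
export PATH
```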
11. How to install a new version of VerbNet
When it is time to release a new version of VerbNet,
various steps should be followed to maintain the consistency
and understandability of our files. Some of these steps
are already detailed in these maintenance notes.
- Verify that everyone is indeed ready for a new version to be
released. In other words make sure everyone contributing to the
next version of VerbNet has all their files in and no further
changes need to be made, so that this process need not be
repeated if someone forgot something.
- Find out what the next version number will be. Will it be
2.3, 2.5, 9.6? This version number will be represented as
X.Y in the coming steps.
- Make a new directory in the shared VerbNet directory for the
new version.
- cd /home/verbs/shared/verbnet/current
- mkdir releaseX.Y
- chgrp shared releaseX.Y
- chmod g+w releaseX.Y
- Update your own Subversion "working copy" with the most
recent VerbNet files. Assume that for this example my
Subversion working copy is located at '~/svn/verbnet' (i.e. the
'.svn/' directory is in '~/svn/verbnet').
- cd ~/svn/verbnet
- svn update
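- Validate all the XML in your working copy with 'xml_val'. The
'./trunk' argument names the 'trunk' directory inside the current
directory; 'xml_val' does not recurse into subdirectories, so the
directory that holds the XML files must be named directly. If there
are errors, correct them and commit the changes to Subversion
before continuing to the next step.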
- cd ~/svn/verbnet
- xml_val ./trunk
- Copy the Subversion VerbNet files to the new directory.
- cd ~/svn/verbnet/trunk
- cp *.xml *.xsd *.dtd /home/verbs/shared/verbnet/current/releaseX.Y
- Move any previous VerbNet versions in the 'current'
directory to the 'history' directory. You'll have to make the
directory writable for the move. We try to keep the directories
as read-only as possible to prevent accidental modification.
The token A.B represents the previous VerbNet version.
- cd /home/verbs/shared/verbnet/current
- chmod a+w releaseA.B
- mv releaseA.B ../history
- chmod a-w ../history/releaseA.B
- chmod -R a-w releaseX.Y
- Change the directory at which the 'release-newest' symbolic link points.
- cd /home/verbs/shared/verbnet
- rm release-newest
- ln -s current/releaseX.Y release-newest
See the subsection "How to update the 'release-newest' symbolic link" for more information.
- Regenerate the UVI.
- cd /home/verbs/shared/public_html/verb-index/generator
- run-public
See the subsection "How to execute the UVIG" for more information.
- Review the output of the UVIG for warnings
and errors. You may find that the UVIG found more errors with
the VerbNet XML files. This is because the UVIG looks beyond
simple XML errors and verifies some VerbNet consistency rules as
well. If there are more errors, you will need to edit the
VerbNet files inside SVN and start this process over again. If
there are no major errors you may consider the UVI generated and
continue to the next step.
- Update the 'verbnet-X.Y.tar.gz' downloadable archive file.
- Follow the procedure in the subsection "How to update the
'verbnet-X.Y.tar.gz' archive". In our case the
'new-vn-files' directory in the procedure is the new
'releaseX.Y' directory you just created, and the
'.../dir1/dir2' path is
'/home/verbs/shared/verbnet/current'.
|
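The 'release-newest' step in the list above is easy to get wrong by
mistyping the version. The sketch below wraps the same 'rm' and 'ln -s'
commands in a hypothetical helper, 'point_release_newest', that refuses
to touch the link unless the version looks like X.Y and the matching
'current/releaseX.Y' directory exists. Both checks are assumptions
about what is worth guarding; adjust them if version numbers ever take
another form.

```shell
# point_release_newest BASE_DIR VERSION
# Repoints BASE_DIR/release-newest at current/releaseVERSION.
point_release_newest() {
    base=$1
    version=$2
    if ! printf '%s\n' "$version" | grep -Eq '^[0-9]+\.[0-9]+$'; then
        echo "version must look like X.Y: $version" >&2
        return 1
    fi
    if [ ! -d "$base/current/release$version" ]; then
        echo "missing directory: $base/current/release$version" >&2
        return 1
    fi
    # Same commands as in the step above, run from BASE_DIR in a subshell
    # so the caller's working directory is left alone.
    (
        cd "$base" || exit 1
        rm -f release-newest
        ln -s "current/release$version" release-newest
    )
}

# Usage (X.Y stands for the new version number):
# point_release_newest /home/verbs/shared/verbnet X.Y
```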