uvi
Class WordNet

java.lang.Object
  extended by uvi.WordNet

public class WordNet
extends Object

This class manages the extraction of WordNet data so that sense numbers can be shown next to the verbs in the members section of each class or subclass. The file given to the UVIG is the 'index.sense' file from the WordNet system; per the naming standards for supplemental files, however, it is named 'wordnet.s'. The index.sense file is very large, so the data structures in this class are designed to avoid a large amount of processing time on each regeneration of the UVI.

Version:
1.0, 2006.11.24
Author:
Derek Trumbo
See Also:
Sweeper.printMembers(), Generator.addOthers(int)

Nested Class Summary
private static class WordNet.WordNetSense
          Represents a WordNet sense by holding both the sense key and the sense number of a given sense.
 
Field Summary
private static ArrayList[] allSenseLists
          Holds all the sense keys in VerbNet.
private static int WN_SENSE_LISTS
          The number of lists the sense keys are divided into.
 
Constructor Summary
private WordNet()
          This constructor is private because the class is not intended to ever be instantiated.
 
Method Summary
(package private) static String getSenseNumber(String key)
          Returns the WordNet sense number associated with a given WordNet sense key.
(package private) static void loadSenseNumbers(File senseIndexFile)
          Searches WordNet's sense file for sense numbers that correspond to the sense keys found during the VerbNet pre-scan.
(package private) static void preScan(File[] xmlFiles)
          Scans through all the VerbNet XML files and extracts just the WordNet sense keys.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

WN_SENSE_LISTS

private static final int WN_SENSE_LISTS
The number of lists the sense keys are divided into. There is one for each letter of the alphabet (this is to speed sorting and access times).

See Also:
Constant Field Values

allSenseLists

private static ArrayList[] allSenseLists
Holds all the sense keys in VerbNet. This array of lists contains one list per letter of the English alphabet. Each sense key is added to the list that corresponds to the letter with which it begins, and each list is kept in sorted order. Here is an example of what this data structure might hold:
A                  B                   C                  ...
abandon%2:38:00    baa%2:32:00         cabbage%2:40:00
abandon%2:40:00    babble%2:32:00      cable%2:32:00
abash%2:37:00      babble%2:32:02      cackle%2:29:00
abate%2:30:00      babble%2:39:00      cackle%2:32:00
abate%2:30:01      backbite%2:32:00    cackle%2:32:01
abduct%2:35:02     backpack%2:38:00    caddy%2:33:00
...                ...                 ...
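
As a rough illustration of the per-letter layout described above, the following sketch keeps one sorted list per letter and inserts each key at its sorted position. The class and method names (SenseBuckets, indexFor, add) are hypothetical and not taken from the UVIG source, and the sketch uses generics rather than the raw ArrayList[] shown in the field declaration. It assumes every key begins with an ASCII letter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; names are illustrative, not from the UVIG source.
class SenseBuckets {
    // One list per letter of the alphabet, mirroring WN_SENSE_LISTS.
    static final int LISTS = 26;

    final List<List<String>> lists = new ArrayList<>();

    SenseBuckets() {
        for (int i = 0; i < LISTS; i++) {
            lists.add(new ArrayList<>());
        }
    }

    // Maps a key's first letter to a list index: 'a' -> 0, ..., 'z' -> 25.
    // Assumption: the key starts with an ASCII letter.
    static int indexFor(String key) {
        return Character.toLowerCase(key.charAt(0)) - 'a';
    }

    // Inserts the key at its sorted position; duplicate keys are ignored.
    void add(String key) {
        List<String> list = lists.get(indexFor(key));
        int pos = Collections.binarySearch(list, key);
        if (pos < 0) {
            list.add(-pos - 1, key); // -pos - 1 is the insertion point
        }
    }
}
```

Keeping each list sorted at insertion time is only one option; collecting all keys first and sorting each list once would give the same final state.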
 

Constructor Detail

WordNet

private WordNet()
This constructor is private because the class is not intended to ever be instantiated. The UVI generation is a very procedural process and thus all the members are static.

Method Detail

preScan

static void preScan(File[] xmlFiles)
Scans through all the VerbNet XML files and extracts just the WordNet sense keys. These are added to the per-letter lists in allSenseLists, and the list for each letter is kept in sorted order. The reason for this is that the index.sense file is itself in sorted order, which allows a single pass over this file of more than 200,000 lines.

Parameters:
xmlFiles - an array of all the XML files in the XML input directory. This is received from the Generator class.
See Also:
Generator.addOthers(int), allSenseLists
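
The extraction step might look roughly like the sketch below. This page does not describe how sense keys appear in the XML, so the sketch assumes they are stored space-separated in a wn="..." attribute; that attribute name, the regular expression, and the class name SenseKeyExtractor are all assumptions made for illustration. It works on XML text already read into a string and does not use a real XML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the wn="..." attribute layout is an assumption.
class SenseKeyExtractor {
    private static final Pattern WN_ATTR = Pattern.compile("\\bwn=\"([^\"]*)\"");

    // Returns every sense key found in the given XML text, in document order.
    static List<String> extract(String xml) {
        List<String> keys = new ArrayList<>();
        Matcher m = WN_ATTR.matcher(xml);
        while (m.find()) {
            // An attribute may hold several keys separated by whitespace.
            for (String key : m.group(1).trim().split("\\s+")) {
                if (!key.isEmpty()) {
                    keys.add(key);
                }
            }
        }
        return keys;
    }
}
```

The keys returned here are in document order; they would still need to be placed into the per-letter lists in sorted order before the single pass over the sense file.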

loadSenseNumbers

static void loadSenseNumbers(File senseIndexFile)
Searches WordNet's sense file for sense numbers that correspond to the sense keys found during the VerbNet pre-scan. Each WordNet.WordNetSense object in the allSenseLists data structure will have its sense number filled in if the object's sense key exists in the file. Because both the allSenseLists data structure and the index.sense file are in sorted order, this method needs only one pass over this file of more than 200,000 lines.

Parameters:
senseIndexFile - the File object corresponding to the WordNet index.sense file (named wordnet.s in the UVIG system)
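
The single pass described above can be sketched as a merge over two sorted sequences. The names below (SensePassSketch, Sense, fill) are hypothetical, and Sense is only a stand-in for WordNet.WordNetSense. The sketch assumes each line holds whitespace-separated fields with the sense key first and the sense number third, that both sequences are sorted under String.compareTo, and that file keys are directly comparable with the VerbNet keys; any normalization needed to make them comparable is omitted. It takes lines already read into memory rather than a File.

```java
import java.util.List;

// Hypothetical sketch of a one-pass merge; not the UVIG implementation.
class SensePassSketch {
    // Stand-in for WordNet.WordNetSense.
    static class Sense {
        final String key;
        String number = "?"; // placeholder until a match is found

        Sense(String key) {
            this.key = key;
        }
    }

    // Walks both sorted sequences once, filling in numbers where keys match.
    static void fill(List<Sense> sortedSenses, List<String> sortedIndexLines) {
        int i = 0;
        for (String line : sortedIndexLines) {
            if (i >= sortedSenses.size()) {
                break; // every wanted key has been handled
            }
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 3) {
                continue; // skip lines that do not match the assumed layout
            }
            String fileKey = fields[0];
            // Wanted keys that sort before this line are absent from the file.
            while (i < sortedSenses.size()
                    && sortedSenses.get(i).key.compareTo(fileKey) < 0) {
                i++;
            }
            if (i < sortedSenses.size() && sortedSenses.get(i).key.equals(fileKey)) {
                sortedSenses.get(i).number = fields[2];
                i++;
            }
        }
    }
}
```

Each index advances only forward, so the work is proportional to the number of file lines plus the number of wanted keys, with no per-key search of the file.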

getSenseNumber

static String getSenseNumber(String key)
Returns the WordNet sense number associated with a given WordNet sense key. Searches the allSenseLists data structure looking for the appropriate WordNet.WordNetSense object. Once found, the sense number it contains is returned.

Parameters:
key - the sense key for which a sense number should be retrieved
Returns:
the sense number corresponding to the given sense key, or "?" if the sense key does not exist in the lists
See Also:
Sweeper.printMembers()
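
The lookup could be sketched as follows. This page does not say how the search is performed; because each per-letter list is sorted, the sketch uses a binary search as one plausible choice. All names here (SenseLookupSketch, Sense, BY_KEY) are hypothetical, and the lists are passed in as a parameter instead of being read from the static allSenseLists field.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; the real method may search differently.
class SenseLookupSketch {
    // Stand-in for WordNet.WordNetSense.
    static class Sense {
        final String key;
        final String number;

        Sense(String key, String number) {
            this.key = key;
            this.number = number;
        }
    }

    static final Comparator<Sense> BY_KEY = Comparator.comparing((Sense s) -> s.key);

    // lists.get(n) holds the senses whose key starts with the n-th letter,
    // sorted by key. Returns "?" when the key is not present.
    static String getSenseNumber(List<List<Sense>> lists, String key) {
        if (key == null || key.isEmpty()) {
            return "?";
        }
        int index = Character.toLowerCase(key.charAt(0)) - 'a';
        if (index < 0 || index >= lists.size()) {
            return "?"; // first character is not a letter covered by the lists
        }
        List<Sense> list = lists.get(index);
        int pos = Collections.binarySearch(list, new Sense(key, null), BY_KEY);
        return pos >= 0 ? list.get(pos).number : "?";
    }
}
```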