Steps to add a new data source to the UVI (i.e. how to add more verbs with corresponding links to the index of verbs). Within a given source file, the steps are ordered top to bottom. This tutorial assumes that your data source will *not* affect the VerbNet class pages (like the WordNet and VN-FN Mapping data sources do).

------------------------------------
FILE: Generator.java (generator/src)
------------------------------------

STEP 1
======
Add a new 'static final int' variable to uniquely identify the data source. This acts as a kind of 'numeric ID' for the data source in the code. Examples of other such variables are:

    DS_PROPBANK
    DS_FRAMENET
    DS_WORDNET
    ...

Add this variable under the other such variables in the same section. Also add the appropriate documentation above your new variable. In fact, you can just copy and paste an existing ID variable and its documentation, then modify it for your needs. An example of what you might add:

    /**
     * Used to identify the _______ as a data source.
     *
     * @see uvi.Generator#generateHTMLFiles()
     * @see uvi.Generator#addOthers(int)
     */
    static final int DS_MY_SOURCE = 7;

Remember to modify the documentation to describe your data source, and make sure the value you assign is not already used by another DS_* variable.

STEP 2
======
Add a string literal (i.e. "xxxxxxxx") to the "sNames" array. The variable name "sNames" stands for "supplemental file names". This is an array of strings which holds the names of all the files that must be present in the "generator/supplemental" directory when the generator is executed. If a file whose name is in this array is not present in the "supplemental" directory, the generator will not run. Here is what the array currently contains:

    private static String[] sNames = {
        "propbank.s",    // CSV (verb,url)
        "framenet.s",    // XML
        "grouping.s",    // CSV (verb,url)
        "vn-fn.s",       // XML
        "vn-cyc.s",      // XML (optional to have in directory)
        "wordnet.s",     // WordNet format (index.sense)
        ...

Put a file name of your choosing into this list:

    private static String[] sNames = {
        "propbank.s",    // CSV (verb,url)
        "framenet.s",    // XML
        "grouping.s",    // CSV (verb,url)
        "my_source.s",   // MY NEW DATA SOURCE (A NOTE HERE)
        "vn-fn.s",       // XML
        "vn-cyc.s",      // XML (optional to have in directory)
        "wordnet.s",     // WordNet format (index.sense)
        ...

Make sure the name ends with ".s", and don't forget the comma afterwards (unless you add the string as the last one in the array).

STEP 3
======
Add some code that will 1) announce during generation that the data source is being added and 2) extract the data from your file. This code belongs in the "generateHTMLFiles" method, after the line "VN_FN_Map.printUnused();". Here you will see existing sets of "if/println/addOthers" lines. Copy these lines from another such data source and modify them appropriately. For example, you might add the following lines after a different data source has been announced and extracted (just remember to put your lines before the "Index.sort();" line):

    if( flVerbose && !flQuiet )
        println( "Additional Source: Extracting _______ for index from \"my_source.s\"..." );

    // Add _______ links to index.
    addOthers( DS_MY_SOURCE );

Remember to change 4 things:
    1) the name of your data source to be printed to the screen
    2) the name of your supplemental file at the end of the output text
    3) the comment above the "addOthers" line
    4) the numeric ID argument that is passed to the "addOthers" method

STEP 4
======
Add a mapping between the numeric ID and the supplemental file name inside the "addOthers" method.
In other words, modify the "switch" statement near the top like so:

    switch( type )
    {
        case DS_PROPBANK:  fileName = "propbank.s";  break;
        case DS_FRAMENET:  fileName = "framenet.s";  break;
        case DS_GROUPING:  fileName = "grouping.s";  break;
        case DS_WORDNET:   fileName = "wordnet.s";   break;
        case DS_VN_FN:     fileName = "vn-fn.s";     break;
        case DS_VN_CYC:    fileName = "vn-cyc.s";    break;
        case DS_MY_SOURCE: fileName = "my_source.s"; break;   // NEW LINE
        default:
            eprintln( "ERROR: Invalid Generator.DS_* constant." );
            return;
    }

STEP 5
======
Next is the heart of the changes required to extract the information from your data source. These changes also occur in "addOthers". The goal now is to write some code that will read your file and add the correct information to the generator's internal data structure for representing the verb index. This data structure is essentially the "Index" class. Here are some examples of adding a verb to the index with a corresponding URL:

    Index.addLink( "jump", DS_PROPBANK, "jump-link", "http://verbs.colorado.edu/framesets/jump-v.html" );
    Index.addLink( "accumulate", DS_FRAMENET, "click me!", "http://framenet.com/accumulate" );
    Index.addLink( "stab", DS_GROUPING, "(Grouping)", "http://verbs.colorado.edu/groupings/stab-v.html" );

The first argument is the verb to be placed into the left column of the index. The second argument is your numeric data source ID. The third and fourth arguments define the link that you want to add for the verb specified in the first argument: the third argument is the *text* of the link, and the fourth argument is the *URL* of the link. Of course, you won't be hard-coding strings like these in the code; you'll be extracting them from a data source and adding them iteratively.
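
To make the argument order concrete, here is a small stand-alone sketch. It does not use the real "Index" class: "IndexSketch", its "Link" type, the instance methods, and the "example.org" URLs are hypothetical stand-ins, and the DS_* values are illustrative only. It simply mimics an index that maps each verb to a list of links, plus a per-source verb count like the one STEP 6 relies on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the generator's Index class; the real class
// (which exposes static methods) may be organized quite differently.
class IndexSketch {
    // Illustrative ID values only; the real DS_* constants live in Generator.java.
    static final int DS_PROPBANK = 1;
    static final int DS_MY_SOURCE = 7;

    // One link in the right-hand column of the index.
    static final class Link {
        final int type;     // which data source the link came from
        final String text;  // visible text of the link
        final String url;   // target of the link

        Link(int type, String text, String url) {
            this.type = type;
            this.text = text;
            this.url = url;
        }
    }

    // verb -> links for that verb, kept sorted by verb.
    private final Map<String, List<Link>> entries = new TreeMap<>();

    // Same argument order as in the tutorial: verb, source ID, link text, link URL.
    void addLink(String verb, int type, String text, String url) {
        entries.computeIfAbsent(verb, k -> new ArrayList<>())
               .add(new Link(type, text, url));
    }

    // Counts the distinct verbs that have at least one link from the given source.
    int getNumVerbs(int type) {
        int count = 0;
        for (List<Link> links : entries.values()) {
            for (Link link : links) {
                if (link.type == type) {
                    count++;
                    break;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        IndexSketch index = new IndexSketch();
        index.addLink("jump", DS_PROPBANK, "(PropBank)",
                      "http://verbs.colorado.edu/framesets/jump-v.html");
        index.addLink("jump", DS_MY_SOURCE, "sense 1", "http://example.org/jump/1");
        index.addLink("jump", DS_MY_SOURCE, "sense 2", "http://example.org/jump/2");
        index.addLink("stab", DS_MY_SOURCE, "sense 1", "http://example.org/stab/1");

        // Two distinct verbs ("jump" and "stab") carry links from DS_MY_SOURCE.
        System.out.println(index.getNumVerbs(DS_MY_SOURCE)); // prints 2
    }
}
```

Note that a verb with several links from the same source is counted once, which matches the "unique verbs" wording used for the home-page totals in STEP 6.
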

Here's the "addLink" line from the PropBank data source:

    Index.addLink( parts[ 0 ], DS_PROPBANK, "(PropBank)", parts[ 1 ] );

The code that extracts the data for the PropBank data source has created a two-element array like {"verb", "http://where"}. Thus, the first argument is the verb, the second is the numeric ID for PropBank, and the fourth is the URL to which the link should point. The third argument is just the phrase "(PropBank)": since there's only one URL per PropBank verb, we can get away with a constant label.

Now let's talk about where and how you add your data extraction code. As you'll notice, the bulk of the "addOthers" method is a series of "if/else if" statements. These statements define which code goes with which numeric ID passed into the method (DS_PROPBANK, DS_FRAMENET, etc.). Essentially you'll want to add another "else if" statement that compares the "type" variable to your new numeric ID (DS_MY_SOURCE).

The first "if" statement actually groups all the XML data sources together so their extraction code can share some common XML parsing code. This section currently handles FrameNet, the VN-FN Mapping, and VN-CYC. If your data file is XML, you can have your code fall into this block as well by modifying its condition:

    // If the data source is an XML file...
    if( type == DS_FRAMENET || type == DS_VN_FN ||
        type == DS_VN_CYC || type == DS_MY_SOURCE ) {

And adding your code block under the data-source-specific section:

        // Scan the FrameNet XML tree.
        if( type == DS_FRAMENET ) {
            ... (code not shown)
        }

        // Scan the VN-FN XML tree.
        else if( type == DS_VN_FN ) {
            ...
        }

        // Scan the VN-CYC XML tree.
        else if( type == DS_VN_CYC ) {
            ...
        }

        // Scan the _______ XML tree. (this is the new section)
        else if( type == DS_MY_SOURCE ) {
            // Your "extraction code" here.
            // Will use the line Index.addLink( x1, x2, x3, x4 );
        }

But if your data source is not XML, just regular text (like "propbank.s" or "grouping.s"), then you can place your "else if" block like so:

    // If the data source is an XML file...
    if( type == DS_FRAMENET || type == DS_VN_FN || type == DS_VN_CYC ) {
        ... (code not shown)
    }

    // Scan the PropBank file.
    else if( type == DS_PROPBANK ) {
        ...
    }

    // Scan the Grouping file.
    else if( type == DS_GROUPING ) {
        ...
    }

    // Scan the WordNet file.
    else if( type == DS_WORDNET ) {
        ...
    }

    // Scan the ______ file.
    else if( type == DS_MY_SOURCE ) {
        // Your "extraction code" here.
        // Will use the line Index.addLink( x1, x2, x3, x4 );
    }

Most likely, you will need to know how to code in Java in order to complete the "extraction code". But if your data source file is similar to one of the existing data sources, you can just copy and paste the existing extraction code and make some changes. For example, if your file has just one verb and link per line, you might be able to write code similar to that of the PropBank or Grouping data sources. If your file is formatted like this:

    verb1,http://verblink1
    verb2,http://verblink2
    verb3,http://verblink3
    ...

then you could use code like this, as those two sources do:

    BufferedReader in = new BufferedReader(
        new FileReader( ( File ) sFiles.get( fileName ) ) );
    String line;

    // Read the desired file line-by-line.
    while( ( line = in.readLine() ) != null ) {

        // Split the 'verb,url' pair into its constituent parts.
        String[] parts = line.split( "," );

        // Add a My Source entry to the index.
        Index.addLink( parts[ 0 ], DS_MY_SOURCE, "(My Source)", parts[ 1 ] );
    }

    in.close();

Or, if you wanted the link text to be specific to each verb instead of the same "(My Source)" label, you could format your file like this:

    verb1,link text 1,http://verblink1
    verb2,link text 2,http://verblink2
    verb3,link text 3,http://verblink3
    ...

You would use this code:

    BufferedReader in = new BufferedReader(
        new FileReader( ( File ) sFiles.get( fileName ) ) );
    String line;

    // Read the desired file line-by-line.
    while( ( line = in.readLine() ) != null ) {

        // Split the 'verb,text,url' line into its constituent parts.
        String[] parts = line.split( "," );

        // Add a My Source entry to the index.
        Index.addLink( parts[ 0 ], DS_MY_SOURCE, parts[ 1 ], parts[ 2 ] );
    }

    in.close();

The main idea is that you use Java's IO constructs to read your file, and then add the appropriate data into the index using the "Index.addLink" method.

STEP 6
======
Add your data source's "verb total" to the UVI's home page. This is accomplished in the "generateIndexFiles" method. Add a line of code like so to the correct section:

    ...
    Q.oh( 5, "" + Index.getNumVerbs( DS_PROPBANK ) + " total PropBank links" );
    Q.oh( 5, "" + Index.getNumVerbs( DS_FRAMENET ) + " total FrameNet links" );
    Q.oh( 5, "" + Index.getNumVerbs( DS_GROUPING ) + " total Grouping links" );
    Q.oh( 5, "" + Index.getNumVerbs( DS_MY_SOURCE ) + " total _______ links" );   // NEW LINE
    Q.oh( 4, "" );
    ...

The "Index" class will take care of counting the number of unique verbs that you added from your data source.

STEP 7
======
The last code change in the "Generator.java" file is to make your new data source searchable! This change also takes place in the "generateIndexFiles" method. Update the following block of code to include your data source:

    // Choose type token and label.
    switch( il.type )
    {
        case DS_VERBNET:   src = "V"; label = il.text; break;
        case DS_PROPBANK:  src = "P"; label = ie.verb + ".v"; break;
        case DS_FRAMENET:  src = "F"; label = il.link.substring( fnURL.length() ); break;
        case DS_GROUPING:  src = "G"; label = ie.verb + ".v"; break;
        case DS_MY_SOURCE: src = "M"; label = _______; break;   // NEW LINE
    }

The letter "M" is arbitrary; choose any letter you like. Just make sure it is unique among the other such letters in this switch block (don't choose "C" either, as that is taken for VerbNet classes). The blank is the text of the link that you want to be shown when the result is returned. There are two objects from which you can pull information: "ie" (index entry) and "il" (index link). Remember how the index in the UVI is structured:

    ...
    verb    link, link, link
    verb    link, link, link
    ...

Each verb and its associated links is an "ie", and each link is an "il":

    [verb    [link=il], [link=il], [link=il]]=ie

Here is an example of the data you will find in these objects:

    Each entry object has:
        ie.verb  = "jump"
        ie.links = {il, il, il}

    Each link object has:
        il.type = 6   (the value of DS_GROUPING, what data source the link is from)
        il.text = "(Grouping)"
        il.link = "http://verbs.colorado.edu/grouping/jump-v.html"

So if you know your data source only provides one link per verb, maybe you just want the search results to show the verb name, in which case you would type:

    label = ie.verb;

Or maybe your data source adds multiple links to a given verb:

    verbMy    linkMy1, linkMy2

In which case you might want to have the search results print out something related to the specific link text, not just the verb name, like this:

    label = ie.verb + " (" + il.text + ")";

One thing you can do is decide what existing data source your data source is most similar to, and try using its pattern above. VerbNet and FrameNet can provide multiple links for a given verb, and the PropBank and Grouping data sources provide at most a single link for a given verb.

--------------------------------------------------
FILE: search.s (generator/supplemental) [PHP Code]
--------------------------------------------------

STEP 8
======
Add a new variable that will act as a counter for the matches found when a search is made against your data source:

    // Initialize match counters (one for each of the sources).
    $vc = 0;
    $pc = 0;
    $fc = 0;
    $cc = 0;
    $gc = 0;
    $mc = 0;   // NEW "MY SOURCE COUNT" VARIABLE

STEP 9
======
Add a line that will place a match from the search files into the appropriate PHP array. In this step just copy and paste an existing line and modify it correctly.

    // If the verb or class begins with the user's input,
    // then we have a match. Store the line into the appropriate array.
    if( substr( $verbOrClass, 0, strlen( $searchWord ) ) == $searchWord ) {
        if( $type == "V" ) $vmatches[ $vc++ ] = $line;
        elseif( $type == "P" ) $pmatches[ $pc++ ] = $line;
        elseif( $type == "F" ) $fmatches[ $fc++ ] = $line;
        elseif( $type == "G" ) $gmatches[ $gc++ ] = $line;
        elseif( $type == "C" ) $cmatches[ $cc++ ] = $line;
        elseif( $type == "M" ) $mmatches[ $mc++ ] = $line;   // NEW LINE
    }

So add a line similar to the others with 1) the letter you chose in STEP 7 (i.e. "M"), 2) a new array variable name *different* from the others above it (i.e. "$mmatches"), and 3) the counter you initialized in STEP 8 (i.e. "$mc").

STEP 10
=======
Modify the condition that shows a warning table when there were no matches, so that it also checks your new counter:

    // If there were no matches made, show a warning table.
    if( $vc == 0 && $pc == 0 && $fc == 0 && $gc == 0 && $cc == 0 && $mc == 0 ) {
        ...

STEP 11
=======
The last step in modifying the "search.s" file is to modify the HTML sent to the browser so that your new data source has its own "result box". The search results page is divided into two columns. Each column has a handful of variable-height result boxes. You can add your result box to the end of either of these columns (or you could add it before another data source's result box if you wished). In order to do this you'll need to add some HTML/PHP in the following section like so:
Notice that an entire HTML table row was copied from a different data source's row and pasted at the end of one of the two table columns. There were 4 things that needed to be changed in the call to "showResultTable":
    1) the first argument is the name that will be displayed in the title bar of the result box
    2) the second argument is the text T to be used in the phrase "no T matches" when there were no matches in your data source
    3) the third argument is your data source's counter variable as specified in STEP 8 (i.e. "$mc")
    4) the fourth argument is your data source's match array as specified in STEP 9 (i.e. "$mmatches")

Make sure you copy and paste the table row correctly and check the syntax carefully, or you will end up with malformed HTML or a broken PHP script.

STEP 12
=======
All of the source code changes are done! Now compile the new Java code by going into this directory:

    /home/verbs/shared/public_html/verb-index/generator

and typing

    compile

This will compile your new code. Now add your data source file to the other supplemental files by copying the file into the appropriate directory:

    cp ~/my_source.s /home/verbs/shared/public_html/verb-index/generator/supplemental

(replace the first argument with the actual location and name of your supplemental data file)

Now, from the "generator" directory above, you can regenerate the UVI with this command:

    run-specific

Review any error output shown on the console, and check the generated UVI.
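
Before regenerating, it can help to sanity-check a "verb,url" supplemental file, because the extraction loop shown in STEP 5 reads parts[ 1 ] unconditionally and would throw an ArrayIndexOutOfBoundsException on a blank line or a line without a comma. The following is only a sketch of such a check: the class "SupplementalCheck" and its method "findProblems" are hypothetical helpers, not part of the generator, and the built-in sample data uses made-up "example.org" URLs. It splits with a limit of 2 so that commas inside a URL stay in the second field.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check for a 'verb,url' supplemental file.
class SupplementalCheck {

    // Returns one message per suspicious line; an empty list means
    // every line has a non-empty verb and a non-empty URL.
    static List<String> findProblems(BufferedReader in) throws IOException {
        List<String> problems = new ArrayList<>();
        String line;
        int lineNo = 0;

        while ((line = in.readLine()) != null) {
            lineNo++;

            if (line.trim().isEmpty()) {
                problems.add("line " + lineNo + ": blank line");
                continue;
            }

            // Limit of 2: only the first comma separates verb from URL.
            String[] parts = line.split(",", 2);
            if (parts.length < 2
                    || parts[0].trim().isEmpty()
                    || parts[1].trim().isEmpty()) {
                problems.add("line " + lineNo + ": expected 'verb,url' but found: " + line);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        // With a file name argument, check that file; otherwise check a small
        // built-in sample so the sketch runs on its own.
        Reader source = (args.length > 0)
                ? new FileReader(args[0])
                : new StringReader("jump,http://example.org/jump\n\nstab\n");

        try (BufferedReader in = new BufferedReader(source)) {
            for (String problem : findProblems(in)) {
                System.out.println(problem);
            }
        }
    }
}
```

Run against the real file by passing its path as the only argument, for example the copy placed in the "supplemental" directory. A file using the three-field 'verb,link text,url' layout would need the check adapted (a limit of 3 and a test on all three parts).
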