VerbNet Sense and Class Embeddings
Our current work on VerbNet has led to the development of embeddings based
on specific verb senses, as well as generic embeddings that represent
VerbNet classes.
- The VerbNet sense embeddings are learned from corpora. We ran
automatic word sense tagging (for VerbNet 3.2) on 450 million sentences of
Wikipedia, generating a word-sense-tagged corpus. The embeddings can then be
learned with any word embedding algorithm; we trained 100-dimensional
embeddings using both the GloVe and Word2Vec models. This process yields
normal word vectors for all other parts of speech, with word sense vectors
for verbs in the same space (see the training sketch after this list).
- Class embeddings were generated in two ways. First, we clipped the word
sense information, replacing each verb in the corpus with just its class
number. This allows the class embeddings to be trained on new data, just as
the word sense vectors were. As a more flexible alternative, we generated
class embeddings from the pretrained GoogleNews Word2Vec vectors by taking
the average embedding over all words in each class (see the centroid sketch
after this list). Preliminary studies show the two methods to be similarly
effective, and the second allows class embeddings to be used in the same
space as the GoogleNews vectors.
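A minimal sketch of the sense embedding training step, using GenSim's Word2Vec as described above. The corpus file name and the sense tag format (e.g. a verb token rewritten as "run-51.3.2") are assumptions for illustration; the actual tagger output may differ.

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Stream sentences from the sense-tagged corpus, one sentence per line.
# Verb tokens are assumed to have been rewritten as sense tags (hypothetical
# format, e.g. "run-51.3.2"), while all other tokens are plain words.
sentences = LineSentence("tagged_wikipedia.txt")  # assumed file name

model = Word2Vec(
    sentences,
    vector_size=100,  # 100-dimensional vectors, as described above
    window=5,
    min_count=5,
    workers=4,
)
model.save("verbnet_sense_w2v.model")
```

Because the sense tags simply replace verb tokens in the running text, the resulting model places sense vectors and ordinary word vectors in one shared space.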
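A minimal sketch of the centroid method for class embeddings, assuming the pretrained GoogleNews vectors and a hypothetical mapping from VerbNet class IDs to their member verbs (in practice this would come from the VerbNet class files).

```python
import numpy as np
from gensim.models import KeyedVectors

# Load the pretrained GoogleNews Word2Vec vectors (300-dimensional).
googlenews = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# Hypothetical mapping: VerbNet class ID -> member verb lemmas.
verbnet_classes = {
    "run-51.3.2": ["run", "jog", "sprint", "gallop"],
}

class_vectors = {}
for class_id, members in verbnet_classes.items():
    # Skip members missing from the pretrained vocabulary.
    vecs = [googlenews[w] for w in members if w in googlenews]
    if vecs:
        # The class embedding is the average over all member-word vectors,
        # so it lives in the same space as the GoogleNews vectors.
        class_vectors[class_id] = np.mean(vecs, axis=0)
```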
The learned models are hosted here on verbs, but note that they are fairly
large. The GoogleNews-based class vectors contain only the VerbNet classes,
so they are much smaller. The Word2Vec models were trained using GenSim; the
documentation here is helpful for loading either type in Python (see the
loading sketch below).
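A minimal sketch of loading the models with GenSim; the file names and the query token are assumptions for illustration.

```python
from gensim.models import Word2Vec, KeyedVectors

# A full Word2Vec model saved with model.save(...) can be reloaded directly:
sense_model = Word2Vec.load("verbnet_sense_w2v.model")  # assumed file name

# Plain vectors in word2vec text/binary format load as KeyedVectors:
class_vectors = KeyedVectors.load_word2vec_format(
    "verbnet_class_vectors.txt"  # assumed file name
)

# Nearest-neighbor queries work the same way for either; the exact token
# format (plain word, sense tag, or class number) depends on the model.
print(sense_model.wv.most_similar("dog", topn=5))
```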
Sense Embeddings
GloVe 100 dimension model with Verb Sense embeddings
Word2Vec 100 dimension model with Verb Sense embeddings
Verb Class Embeddings
GloVe 100 dimension model with Verb Class embeddings
Word2Vec 100 dimension model with Verb Class embeddings
GoogleNews Centroid Verb Class Embeddings (Coming soon)