VerbNet Sense and Class Embeddings
Our current work on VerbNet has led to the development of embeddings based
on specific verb senses, as well as generic embeddings that represent
VerbNet classes.
- The VerbNet sense embeddings are learned from corpora. We ran
automatic word sense tagging (for VerbNet 3.2) on 450 million sentences of
Wikipedia, generating a word-sense-tagged corpus. The embeddings can then be
learned with any word embedding algorithm; we trained 100-dimensional
embeddings using both the GloVe and Word2Vec models. This process yields
normal word vectors for all other parts of speech, with word sense vectors
for verbs in the same space (see the training sketch after this list).
- Class embeddings were generated in two ways. First, we clipped the word
sense information, replacing each verb in the corpus with just its class
number. This allows the class embeddings to be trained on new data, just as
the word sense vectors were. As a more flexible alternative, we generated
class embeddings from the pretrained GoogleNews Word2Vec vectors by taking
the average embedding over all words in each class (see the centroid sketch
after this list). Preliminary studies show the two methods to be similarly
effective, and the second allows class embeddings to be used in the same
space as the GoogleNews vectors.
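A minimal sketch of the sense embedding training step, using GenSim's Word2Vec as described above. The corpus file name and the sense tag format (e.g. a verb token rewritten as "run-51.3.2") are assumptions for illustration; the actual tagger output may differ.

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Stream sentences from the sense-tagged corpus, one sentence per line.
# Verb tokens are assumed to have been rewritten as sense tags (hypothetical
# format, e.g. "run-51.3.2"), while all other tokens are plain words.
sentences = LineSentence("tagged_wikipedia.txt")  # assumed file name

model = Word2Vec(
    sentences,
    vector_size=100,  # 100-dimensional vectors, as described above
    window=5,
    min_count=5,
    workers=4,
)
model.save("verbnet_sense_w2v.model")
```

Because the sense tags simply replace verb tokens in the running text, the resulting model places sense vectors and ordinary word vectors in one shared space.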
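A minimal sketch of the centroid method for class embeddings, assuming the pretrained GoogleNews vectors and a hypothetical mapping from VerbNet class IDs to their member verbs (in practice this would come from the VerbNet class files).

```python
import numpy as np
from gensim.models import KeyedVectors

# Load the pretrained GoogleNews Word2Vec vectors (300-dimensional).
googlenews = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# Hypothetical mapping: VerbNet class ID -> member verb lemmas.
verbnet_classes = {
    "run-51.3.2": ["run", "jog", "sprint", "gallop"],
}

class_vectors = {}
for class_id, members in verbnet_classes.items():
    # Skip members missing from the pretrained vocabulary.
    vecs = [googlenews[w] for w in members if w in googlenews]
    if vecs:
        # The class embedding is the average over all member-word vectors,
        # so it lives in the same space as the GoogleNews vectors.
        class_vectors[class_id] = np.mean(vecs, axis=0)
```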
The learned models are hosted here on verbs, but note that they are fairly
large. The GoogleNews-based class vectors contain only the VerbNet classes,
so they are much smaller. The Word2Vec models were trained using GenSim; the
documentation here is helpful for loading either type in Python (see the
loading sketch below).
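A minimal sketch of loading the models with GenSim; the file names and the query token are assumptions for illustration.

```python
from gensim.models import Word2Vec, KeyedVectors

# A full Word2Vec model saved with model.save(...) can be reloaded directly:
sense_model = Word2Vec.load("verbnet_sense_w2v.model")  # assumed file name

# Plain vectors in word2vec text/binary format load as KeyedVectors:
class_vectors = KeyedVectors.load_word2vec_format(
    "verbnet_class_vectors.txt"  # assumed file name
)

# Nearest-neighbor queries work the same way for either; the exact token
# format (plain word, sense tag, or class number) depends on the model.
print(sense_model.wv.most_similar("dog", topn=5))
```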
Sense Embeddings
GloVe 100 dimension model with Verb Sense embeddings
Word2Vec 100 dimension model with Verb Sense embeddings
Verb Class Embeddings
GloVe 100 dimension model with Verb Class embeddings
Word2Vec 100 dimension model with Verb Class embeddings
GoogleNews Centroid Verb Class Embeddings (Coming soon)