Term Project Ideas - Intention
The goal behind this assignment is to have you become familiar with
a specific research area and take a stab at moving the state of the
art forward. The project could be primarily linguistic analysis or
primarily programming or could combine the two. You are encouraged to
work in teams of 2, 3 or even 4 persons. If you do work in a team, it
should be interdisciplinary: it should include students from at least
two different departments. You should assume that you will have to
read something like 3 or 4 papers over and above the class required
readings to ground yourself in the research area. You will then
define an experiment or a set of analyses or a system that you will
run, perform or implement, respectively, to explore some aspect of
your research area. You are expected to turn in a 5-page, single-spaced
paper describing your project, and give a 10-15 minute
presentation on it. You could also do a serious comparison of two or
three approaches at a detailed level, in which case you would turn in
a longer paper of at least 10 pages.
In addition to your own Term Project, you are expected to be a
discussant on another project. The Discussant assignments are on the
class web page. That will involve reading the project background
paper(s), asking constructive questions during the project
presentation, and turning in this questionnaire.
- Look at the web
pages for previous versions of this course to see what other students
have done.
A synopsis of IBM's Watson Q/A system - KA
An Empirically Validated Thematic Role Hierarchy - LG & JI
In Trento, Sara Tonelli and Irina Sergienya (a Ph.D. student) are
working on automatically extracting hierarchical relations between
PropBank arguments, VerbNet thematic roles and FrameNet Frame Elements
from the annotations in PropBank I that are available from Semlink. This project would
take the output of their automatic extraction and evaluate it, so that
the results can be published. It would therefore require at least
some computing skills, but primarily linguistic judgements. A useful
reference for this evaluation, in addition to the PB, VN and FN papers
already posted, would be
- Bonial, C., Corvey, W., Palmer, M., Petukhova, V., and Bunt, H.,
A Hierarchical Unification of LIRICS and VerbNet Semantic Roles,
Proceedings of the ICSC Workshop on Semantic Annotation for
Computational Linguistic Resources (SACL-ICSC 2011), Sep, 2011.
- A FrameNet hierarchy paper
A Critical Analysis of BabelNet - LE, JE, GR
See this link to be introduced to
an "encyclopedic dictionary" and a multilingual ontology created by
mapping the largest multilingual Web encyclopedia - i.e., Wikipedia -
to the most popular computational lexicon of English - i.e., WordNet.
The integration is performed via an automatic linking algorithm and by
filling in lexical gaps with the aid of Machine Translation. The
result is an "encyclopedic dictionary" that provides babel synsets,
i.e., concepts and named entities lexicalized in many languages and
connected with large amounts of semantic relations. For the paper
presentations, choose one or two papers from the website, plus one or two
that explain the technical approaches in more detail.
This project could include a detailed critique of the strengths and limitations of BabelNet as currently implemented, and a pilot implementation of an alternative approach.
CPA and FrameNet - DP, KS, TO, JP (see induction of SR as well)
How well does CPA correspond to distinct FrameNet Frames? In concert
with Octavian Popescu and Sara Tonelli (via Skype), investigate the
relationships that exist between the corpus patterns and
FrameNet. The topic may be investigated from both a theoretical and an
empirical point of view.
Trento will make available corpora with examples of corpus patterns,
both manually annotated (CPA - extracted from BNC), and automatically
extracted (from BNC, but we could also use the OntoNotes/SemLink text
resources) in Italian and English, in order to see if a sufficiently
accurate mapping could be generated between corpus patterns and
frames, maybe via ontological attributes (SUMO). Tools involving
corpus patterns (preprocessing, extraction, learning, recognition)
could also be provided. The IWCS 2013 paper describes the tool for
learning and recognition.
How does Octavian's approach contrast with the approach outlined in
the Lexical Substitutability paper?
- Popescu, O., 2012,
Building a Resource of Patterns Using Semantic Types, In the
Proceedings of LREC-2012, Istanbul, Turkey.
- Popescu, O., 2013, Learning Corpus Patterns Using Finite State
Automata, In the Proceedings of the International Workshop on
Computational Semantics, Potsdam, Germany. Ask Dr. Palmer for the paper.
Event Detection - MB (see also WSD)
There are several possible projects on this topic. For instance,
could the ideas on inducing semantic relations (see below) be extended
to the notion of event detection? How would that differ from the
approach outlined in the following papers?
How would any of these differ from the following:
James Allen, Jansen Orfan, Will de Beaumont, Choh Man Teng, Lucian
Galescu and Mary Swift, Automatically Deriving Event Ontologies for a
Commonsense Knowledge Base, In the Proceedings of the
International Workshop on Computational Semantics, Potsdam,
Germany. Ask Dr. Palmer for the paper.
Wei Lu; Dan Roth (2012),
Automatic Event Extraction with Structured Preference Modeling, In the Proceedings of ACL-2012, South Korea.
Michael Roth; Anette Frank (2012),
Aligning Predicates across Monolingual Comparable Texts using
Graph-based Clustering, In the Proceedings of the Conference on
Empirical Methods in Natural Language Processing and Natural Language
Learning, EMNLP-CoNLL held in conjunction with ACL 2012, South Korea
- Extract a list of eventive types - verbs and nominalizations from
PropBank OR from WordNet OR from FrameNet.
- Do any of these resources provide useful event/subevent relations?
Are there recognizable syntactic/semantic patterns in how these
relationships are expressed?
- Can these patterns provide a bootstrapping mechanism for
discovering more such relations? (see Inducing semantic relations again).
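One way to start on the bootstrapping question is a small pattern-learning loop. The Python sketch below is only an illustration under stated assumptions: the sentences, the seed pair, and the function names learn_patterns and apply_patterns are all invented here, and a real project would work over annotated PropBank, WordNet or FrameNet data rather than raw strings.

```python
import re
from collections import Counter

# Toy corpus and seed pair: both are invented for illustration only.
SENTENCES = [
    "The wedding included a reception .",
    "The trial included a hearing .",
    "The festival included a parade .",
    "The siege ended with a surrender .",
]
SEEDS = {("wedding", "reception")}


def learn_patterns(sentences, seeds):
    """Collect the token string found between a known event and its subevent."""
    patterns = Counter()
    for sent in sentences:
        for event, sub in seeds:
            m = re.search(rf"\b{re.escape(event)}\b (.+?) \b{re.escape(sub)}\b", sent)
            if m:
                patterns[m.group(1)] += 1
    return patterns


def apply_patterns(sentences, patterns):
    """Propose new (event, subevent) pairs wherever a learned pattern recurs."""
    found = set()
    for sent in sentences:
        for pat in patterns:
            for m in re.finditer(rf"(\w+) {re.escape(pat)} (\w+)", sent):
                found.add((m.group(1), m.group(2)))
    return found


patterns = learn_patterns(SENTENCES, SEEDS)
candidates = apply_patterns(SENTENCES, patterns) - SEEDS
print(sorted(candidates))
```

On this toy input the single learned pattern "included a" proposes (festival, parade) and (trial, hearing). A real system would need to score and filter noisy patterns before feeding the new pairs back in as seeds.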
We have a syntactic parser that has been trained on DARPA data. We
have new fragmentary medical data it has been retrained on, and we
would like to know if it is handling the sentence fragments correctly.
This requires identifying the fragmentary sentences in the
corpus, comparing the performance before and after retraining on the
fragmentary training data, and doing error analysis on the sentence
fragments that are not parsed correctly. This would be an excellent
interdisciplinary project, with linguists and CS students.
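The three steps above could be prototyped along the following lines. This is a rough Python sketch, not a description of our parser or its output format: it assumes dependency-style head indices, uses a crude "no finite verb tag" heuristic to flag fragments, and runs on three invented sentences. The Sentence class, is_fragment and attachment_score are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass

# Penn Treebank tags treated as signs of a finite clause (a heuristic assumption).
FINITE_VERB_TAGS = {"VBD", "VBP", "VBZ", "MD"}


@dataclass
class Sentence:
    tokens: list
    pos: list
    gold_heads: list     # 1-based head index per token, 0 = root
    before_heads: list   # parser output before retraining
    after_heads: list    # parser output after retraining


def is_fragment(sent):
    """Flag a sentence as fragmentary if it has no finite verb tag."""
    return not any(tag in FINITE_VERB_TAGS for tag in sent.pos)


def attachment_score(sentences, field):
    """Fraction of tokens whose predicted head matches the gold head."""
    correct = total = 0
    for s in sentences:
        pred = getattr(s, field)
        correct += sum(p == g for p, g in zip(pred, s.gold_heads))
        total += len(s.gold_heads)
    return correct / total if total else 0.0


# Invented toy data, for illustration only.
corpus = [
    Sentence(["Patient", "denies", "pain"], ["NN", "VBZ", "NN"],
             [2, 0, 2], [2, 0, 2], [2, 0, 2]),
    Sentence(["No", "acute", "distress"], ["DT", "JJ", "NN"],
             [3, 3, 0], [2, 0, 2], [3, 3, 0]),
    Sentence(["Lungs", "clear", "bilaterally"], ["NNS", "JJ", "RB"],
             [2, 0, 2], [0, 1, 1], [2, 0, 1]),
]

fragments = [s for s in corpus if is_fragment(s)]
uas_before = attachment_score(fragments, "before_heads")
uas_after = attachment_score(fragments, "after_heads")
# Fragments that are still misparsed after retraining go to error analysis.
still_wrong = [" ".join(s.tokens) for s in fragments if s.after_heads != s.gold_heads]
print(uas_before, uas_after, still_wrong)
```

The fragment heuristic is the part most in need of linguistic input: deciding what counts as a fragment in the medical data is itself part of the project.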
Project Ideas based on Papers such as the following: - MP, Sentiment
- Humphreys et al., 1997,
Event Coreference for Information Extraction,
Fifth Workshop on Very Large Corpora, held in conjunction with ACL 1997.
- Oliver Culo, Katrin Erk, Sebastian Pado and Sabine Schulte im Walde.
Comparing and Combining Semantic Verb Classifications.
Language Resources and Evaluation 42(3), 2008.
- Stephan Greene and Philip Resnik, 2009,
More Than Words: Syntactic Packaging and Implicit Sentiment,
In the Proceedings of NAACL 2009, Boulder, CO.
Acquiring Semantic Class Preferences from Corpora - MG, MO
We have just parsed and done SRL on Gigaword. We can add VN class
tags, so we have the data to investigate how well some of these
techniques work, and what their strengths and weaknesses are. Again,
something that would really benefit from a team of linguists and
computer scientists. It would also be interesting to compare these
approaches with the Lexical Substitutability paper.
- Benat Zapirain; Eneko Agirre; Lluis Marquez,
Generalizing over Lexical Features: Selectional Preferences for Semantic Role Classification, ACL-09
- Mihai Surdeanu, Lluis Marquez, Xavier Carreras, and Pere R. Comas.
Combination Strategies for Semantic Role Labeling.
Journal of Artificial Intelligence Research 29 (2007).
- Benat Zapirain, Eneko Agirre, Lluis Marquez and Mihai Surdeanu,
Selectional Preferences for Semantic Role Classification. Computational Linguistics 39(3), 2013.
- Pavel Rychly and Adam Kilgarriff, 2007,
An efficient algorithm for building a distributional thesaurus (and other Sketch Engine developments), In the
Proceedings of the 45th Annual Meeting of the ACL: Interactive Poster and Demonstration Sessions
Pages 41-44, Prague, the Czech Republic.
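To get a feel for what a class preference looks like once the data carries SRL and class tags, here is a minimal Python sketch. It is not the method of any paper listed above: it simply compares how often a semantic class fills a given predicate-role slot with how often that class occurs overall (a log ratio). The triples, the class labels and the function name preference_scores are invented for illustration.

```python
import math
from collections import Counter

# (predicate, role, semantic class of the argument head): invented toy triples.
TRIPLES = [
    ("eat", "ARG1", "food"), ("eat", "ARG1", "food"), ("eat", "ARG1", "artifact"),
    ("eat", "ARG0", "person"), ("build", "ARG1", "artifact"),
    ("build", "ARG1", "artifact"), ("build", "ARG0", "person"),
    ("cook", "ARG1", "food"),
]


def preference_scores(triples, predicate, role):
    """log2 of P(class | predicate, role) / P(class) for each class seen in the slot."""
    class_counts = Counter(c for _, _, c in triples)
    slot_counts = Counter(c for p, r, c in triples if (p, r) == (predicate, role))
    n_all = sum(class_counts.values())
    n_slot = sum(slot_counts.values())
    return {
        c: math.log2((k / n_slot) / (class_counts[c] / n_all))
        for c, k in slot_counts.items()
    }


scores = preference_scores(TRIPLES, "eat", "ARG1")
for cls, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(cls, round(score, 3))
```

A positive score means the class is over-represented in the slot and a negative score means it is under-represented. On Gigaword-scale data, smoothing and a principled choice of class inventory would matter a great deal.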
Exploring the Argument mismatches between PropBank and VerbNet - YA, Arabic PB and VN
We have a database that maps between PropBank Frame File entries and VerbNet thematic grids. Sometimes an argument on one side will have no mapping on the other side. Are these usually adjuncts, or is there another explanation?
This could be for a language other than English, such as Arabic, if there is already an Arabic PropBank and an Arabic VerbNet.
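A first pass at this question could simply list the unmapped arguments on each side and count them by label. The Python sketch below assumes a made-up record layout (pb_args, vn_roles, links) and placeholder rolesets; it does not reflect the actual format or contents of our mapping database, and find_mismatches is a hypothetical helper.

```python
from collections import Counter

# Hypothetical mapping records, invented for illustration. The real database
# has its own format, and these rolesets are placeholders, not actual entries.
MAPPINGS = {
    "pred_a.01": {
        "pb_args": ["ARG0", "ARG1", "ARG2"],
        "vn_roles": ["Agent", "Theme", "Recipient"],
        "links": {"ARG0": "Agent", "ARG1": "Theme", "ARG2": "Recipient"},
    },
    "pred_b.01": {
        "pb_args": ["ARG0", "ARG1", "ARG3"],
        "vn_roles": ["Agent", "Patient", "Source"],
        "links": {"ARG0": "Agent", "ARG1": "Patient"},
    },
}


def find_mismatches(mappings):
    """Return (roleset, side, label) for every argument with no counterpart."""
    mismatches = []
    for roleset, entry in mappings.items():
        linked_vn = set(entry["links"].values())
        for arg in entry["pb_args"]:
            if arg not in entry["links"]:
                mismatches.append((roleset, "PropBank", arg))
        for role in entry["vn_roles"]:
            if role not in linked_vn:
                mismatches.append((roleset, "VerbNet", role))
    return mismatches


mismatches = find_mismatches(MAPPINGS)
by_label = Counter(label for _, _, label in mismatches)
print(mismatches)
print(by_label)
```

Tallying the unmapped labels this way gives a quick view of whether the gaps cluster on adjunct-like arguments or point to some other systematic cause.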
Evaluating the contribution SRL makes to an application/Extensions to SRL - TL, Chinese VerbNet
This could include evaluating the contribution SRL makes to any NLP application, or developing PropBanks or VerbNets or FrameNets for other languages such as Chinese or Arabic.
Word Sense Disambiguation, extended to include NER - JG & AJ (Clinical NER), MB
A topic in this area could range from an implementation to an in-depth analysis of different approaches and their strengths and weaknesses.
- Zhi Zhong; Hwee Tou Ng; Yee Seng Chan, 2008,
Word Sense Disambiguation Using OntoNotes: An Empirical Study, EMNLP-08
- Navigli, R., 2006,
Meaningful Clustering of Senses Helps Boost Word Sense Disambiguation Performance, In the Proceedings of ACL2006, Sydney, Australia.
- Navigli, R., 2009,
Word Sense Disambiguation: A Survey, ACM Computing Surveys,
41(2), ACM Press, pp. 1-69.
- Daniel M. Bikel, Richard Schwartz and Ralph M. Weischedel. 1999.
An Algorithm that Learns What's in a Name, In the Special
Issue on Natural Language Learning, Machine Learning, 34,
- L. Ratinov and D. Roth, 2009,
Design Challenges and Misconceptions in Named Entity Recognition,
CoNLL-2009
Topic Modeling - TS w/ Jim Martin and EPIC group
Gloria Mark, Mossaab Bagdouri, Leysia Palen, James Martin, Ban Al-Ani, Kenneth Anderson, 2012, Blogs as a Collective War Diary,
In the Proceedings of CSCW 2012, Seattle, WA. Ask me for the paper.
Induction of Semantic Relations - DB, KS, TO, JP
Just about anything related to one of these papers or to
- Rion Snow, Daniel Jurafsky and Andrew Y. Ng,
Semantic Taxonomy Induction from Heterogenous Evidence,
pp 801-808, ACL-06
- Dmitry Davidov; Ari Rappoport,
Classification of Semantic Relationships between Nominals Using Pattern Clusters
- Lili Kotlerman; Ido Dagan; Idan Szpektor; Maayan Zhitomirsky-Geffet, 2009,
Directional Distributional Similarity for Lexical Expansion, In the Proceedings of ACL-09
- P. D. Turney and P. Pantel (2010),
"From Frequency to Meaning: Vector Space Models of Semantics",
Journal of Artificial Intelligence Research, Volume 37, pages 141-188.
Topic of Your Choice - SL, rich dependency labels for Korean Dependency Structure parses
I'm open to suggestions. Find a couple of papers you are interested in and we can talk about how to turn them into a project.