Term Project Ideas for LING 7800/CSCI 7000

Term Project Ideas - Intention

The goal behind this assignment is to have you become familiar with a specific research area and take a stab at moving the state of the art forward. The project could be primarily linguistic analysis or primarily programming, or could combine the two. You are encouraged to work in teams of 2, 3 or even 4 persons. If you do work in a team, it should be interdisciplinary: it should include students from at least two different departments. You should assume that you will have to read roughly 3 or 4 papers over and above the class required readings to ground yourself in the research area. You will then define an experiment, a set of analyses, or a system that you will run, perform or implement, respectively, to explore some aspect of your research area. You are expected to turn in a 5-page, single-spaced paper describing your project, and give a 10-15 minute presentation on it. You could also do a serious comparison of two or three approaches at a detailed level, in which case you would turn in a longer paper of at least 10 pages.

In addition to your own Term Project, you are expected to be a discussant on another project. The Discussant assignments are on the class web page. That will involve reading the project background paper(s), asking constructive questions during the project presentation, and turning in this questionnaire.

Past Projects

Look at the web pages for previous versions of this course to see what other students have done.

A synopsis of IBM's Watson Q/A system - KA

An Empirically Validated Thematic Role Hierarchy - LG & JI

In Trento, Sara Tonelli and Irina Sergienya (a Ph.D. student) are working on automatically extracting hierarchical relations between PropBank arguments, VerbNet thematic roles and FrameNet Frame Elements from the annotations in PropBank I that are available from Semlink. This project would take the output of their automatic extraction and evaluate it, so that the results can be published. It would therefore require at least some computing skills, but primarily linguistic judgements. A useful reference for this evaluation, in addition to the PB, VN and FN papers already posted, would be

Bonial, C., Corvey, W., Palmer, M., Petukhova, V., and Bunt, H. 2011. A Hierarchical Unification of LIRICS and VerbNet Semantic Roles. Proceedings of the ICSC Workshop on Semantic Annotation for Computational Linguistic Resources (SACL-ICSC 2011), Sep, 2011.
A FrameNet hierarchy paper

A Critical Analysis of BabelNet - LE, JE, GR

See this link to be introduced to an "encyclopedic dictionary" and a multilingual ontology created by mapping the largest multilingual Web encyclopedia - i.e., Wikipedia - to the most popular computational lexicon of English - i.e., WordNet. The integration is performed via an automatic linking algorithm and by filling in lexical gaps with the aid of Machine Translation. The result is an "encyclopedic dictionary" that provides babel synsets, i.e., concepts and named entities lexicalized in many languages and connected with large amounts of semantic relations. For the paper presentations, choose one or two papers from the website, plus one or two that explain the technical approaches in more detail.

This project could include a detailed critique of the strengths and limitations of BabelNet as currently implemented, and a pilot implementation of an alternative approach.

CPA and FrameNet - DP, KS, TO, JP (see induction of SR as well)

How well does CPA correspond to distinct FrameNet Frames? In concert with Octavian Popescu and Sara Tonelli (via Skype), investigate the relationships that exist between the corpus patterns and FrameNet. The topic may be investigated from both a theoretical and an empirical point of view. Trento will make available corpora with examples of corpus patterns, both manually annotated (CPA - extracted from BNC) and automatically extracted (from BNC, but we could also use the OntoNotes/SemLink text resources), in Italian and English, in order to see if a sufficiently accurate mapping could be generated between corpus patterns and frames, maybe via ontological attributes (SUMO). Tools involving corpus patterns (preprocessing, extraction, learning, recognition) could also be provided. The tool for learning and recognition is described in the IWCS 2013 paper listed below. How does Octavian's approach contrast with the approach outlined in the Lexical Substitutability paper?

Popescu, O., 2012, Building a Resource of Patterns Using Semantic Types, In the Proceedings of LREC-2012, Istanbul, Turkey.
Popescu, O., 2013, Learning Corpus Patterns Using Finite State Automata, In the Proceedings of the International Workshop on Computational Semantics, Potsdam, Germany. Ask Dr. Palmer for the paper.

Event Detection - MB (see also WSD)

There are several possible projects on this topic. For instance, could the ideas on inducing semantic relations (see below) be extended to the notion of event detection? How would that differ from the approach outlined in the following papers?

James Allen, Jansen Orfan, Will de Beaumont, Choh Man Teng, Lucian Galescu and Mary Swift, Automatically Deriving Event Ontologies for a Commonsense Knowledge Base, In the Proceedings of the International Workshop on Computational Semantics, Potsdam, Germany. Ask Dr. Palmer for the paper.
Wei Lu; Dan Roth (2012), Automatic Event Extraction with Structured Preference Modeling, In the Proceedings of ACL-2012, South Korea.
Michael Roth; Anette Frank (2012), Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering, In the Proceedings of the Conference on Empirical Methods in Natural Language Processing and Natural Language Learning, EMNLP-CoNLL held in conjunction with ACL 2012, South Korea

How would any of these differ from the following:

Extract a list of eventive types - verbs and nominalizations from PropBank OR from WordNet OR from FrameNet.
Do any of these resources provide useful event/subevent relations? Are there recognizable syntactic/semantic patterns in how these relationships are expressed?
Can these patterns provide a bootstrapping mechanism for discovering more such relations? (See Induction of Semantic Relations again.)

Syntactic Parsing

We have a syntactic parser that has been trained on DARPA data. We have new fragmentary medical data it has been retrained on, and we would like to know if it is handling the sentence fragments correctly. This requires identifying the fragmentary sentences in the corpus, comparing the performance before and after retraining on the fragmentary training data, and doing error analysis on the sentence fragments that are not parsed correctly. This would be an excellent interdisciplinary project for linguists and CS students.
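As a rough illustration of the before/after comparison, the sketch below scores labeled constituent brackets for a single fragment. The span format, the function name bracket_prf, and the toy spans are assumptions made for illustration; they are not output from the actual parser, and a real evaluation would normally use a standard bracket scorer over the whole set of fragments.

```python
from collections import Counter

def bracket_prf(gold, predicted):
    """Labeled bracket precision, recall, and F1 for one sentence.

    Each argument is a list of (label, start, end) constituent spans.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection counts brackets that match in label and span.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Toy fragment "No acute distress" (hypothetical spans, not real parser output).
gold = [("FRAG", 0, 3), ("NP", 0, 3), ("ADJP", 1, 2)]
before_retraining = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)]
after_retraining = [("FRAG", 0, 3), ("NP", 0, 3)]

before_scores = bracket_prf(gold, before_retraining)
after_scores = bracket_prf(gold, after_retraining)
print("before:", before_scores)
print("after: ", after_scores)
```

Fragments whose scores stay low after retraining are natural candidates for the manual error analysis.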

Project Ideas based on Papers such as the following: - MP, Sentiment

Humphreys et al., 1997, Event Coreference for Information Extraction, Fifth Workshop on Very Large Corpora, held in conjunction with ACL 1997.
Oliver Culo, Katrin Erk, Sebastian Pado and Sabine Schulte im Walde. Comparing and Combining Semantic Verb Classifications. Language Resources and Evaluation 42(3), 2008.
Stephan Greene and Philip Resnik, 2009, More Than Words: Syntactic Packaging and Implicit Sentiment, In the Proceedings of NAACL 2009, Boulder, CO.

Acquiring Semantic Class Preferences from Corpora - MG, MO

We have just parsed and done SRL on Gigaword. We can add VN class tags, so we have the data to investigate how well some of these techniques work, and what their strengths and weaknesses are. Again, something that would really benefit from a team of linguists and computer scientists. It would also be interesting to compare these approaches with the Lexical Substitutability paper.

Beñat Zapirain; Eneko Agirre; Lluis Marquez,
Generalizing over Lexical Features: Selectional Preferences for Semantic Role Classification ACL-09
Mihai Surdeanu, Lluis Marquez, Xavier Carreras, and Pere R. Comas.
Combination Strategies for Semantic Role Labeling. Journal of Artificial Intelligence Research 29 (2007).
Beñat Zapirain, Eneko Agirre, Lluis Marquez and Mihai Surdeanu, Selectional Preferences for Semantic Role Classification. Computational Linguistics 39(3), 2013.
Pavel Rychly and Adam Kilgarriff, 2007, An efficient algorithm for building a distributional thesaurus (and other Sketch Engine developments), In the Proceedings of the 45th Annual Meeting of the ACL: Interactive Poster and Demonstration Sessions Pages 41-44, Prague, the Czech Republic.

Exploring the Argument mismatches between PropBank and VerbNet - YA, Arabic PB and VN

We have a database that maps between PropBank Frame File entries and VerbNet thematic grids. Sometimes an argument on one side will have no mapping on the other side. Are these usually adjuncts, or is there another explanation? This could be for a language other than English, such as Arabic, if there is already an Arabic PropBank and an Arabic VerbNet.
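A first pass at this question could simply list the arguments that lack a counterpart and tally them by label. The sketch below assumes the mapping has been exported as a dictionary from roleset to PropBank argument to VerbNet role, with None marking a missing mapping. That layout, the function name unmapped_arguments, and the roleset toy.01 are hypothetical and do not reflect the real database schema or contents.

```python
from collections import Counter

# Toy records in an assumed layout: roleset -> {PropBank arg: VerbNet role}.
# None marks an argument with no VerbNet counterpart. "toy.01" is invented.
mappings = {
    "give.01": {"ARG0": "Agent", "ARG1": "Theme", "ARG2": "Recipient"},
    "toy.01": {"ARG0": "Agent", "ARG1": "Patient", "ARG3": None},
}

def unmapped_arguments(mappings):
    """Return {roleset: [PropBank args with no VerbNet role]} for mismatches."""
    result = {}
    for roleset, args in mappings.items():
        missing = sorted(arg for arg, role in args.items() if role is None)
        if missing:
            result[roleset] = missing
    return result

unmapped = unmapped_arguments(mappings)
# Tally by argument label to see whether higher-numbered (often more
# adjunct-like) arguments account for most of the mismatches.
label_counts = Counter(arg for args in unmapped.values() for arg in args)
print(unmapped)
print(label_counts)
```

The same tally run in the other direction (VerbNet roles with no PropBank argument) would cover the mismatches on the other side.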

Evaluating the contribution SRL makes to an application/Extensions to SRL - TL, Chinese VerbNet

This could include evaluating the contribution SRL makes to any NLP application, or developing PropBanks or VerbNets or FrameNets for other languages such as Chinese or Arabic.

Dan Shen; Mirella Lapata, Using Semantic Roles to Improve Question Answering EMNLP-07
M. Gerber, J. Chai, and A. Meyers (2009). The Role of Implicit Argumentation in Nominal SRL, NAACL HLT 2009.
Anders Björkelund; Love Hafdell; Pierre Nugues, Multilingual Semantic Role Labeling, CoNLL-09
Wanxiang Che; Zhenghua Li; Yongqiang Li; Yuhang Guo; Bing Qin; Ting Liu, 2009, Multilingual Dependency-based Syntactic and Semantic Parsing In the Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, held in conjunction with EACL 2009.

Word Sense Disambiguation, extended to include NER - JG & AJ (Clinical NER), MB

A topic in this area could range from an implementation to an in-depth analysis of different approaches and their strengths and weaknesses.

Zhi Zhong; Hwee Tou Ng; Yee Seng Chan, 2008, Word Sense Disambiguation Using OntoNotes: An Empirical Study EMNLP-08
Navigli, R., 2006, Meaningful Clustering of Senses Helps Boost Word Sense Disambiguation Performance, In the Proceedings of ACL2006, Sydney, Australia.
Navigli, R., 2009, Word Sense Disambiguation: A Survey ACM Computing Surveys, 41(2), ACM Press, pp. 1-69.
Daniel M. Bikel, Richard Schwartz and Ralph M. Weischedel. 1999. An Algorithm that Learns What's in a Name, In the Special Issue on Natural Language Learning, Machine Learning, 34, 1-3. AJ
Design Challenges and Misconceptions in Named Entity Recognition L. Ratinov and D. Roth CoNLL - 2009

Topic Modeling - TS w/ Jim Martin and EPIC group

Extensions to

Gloria Mark, Mossaab Bagdouri, Leysia Palen, James Martin, Ban Al-Ani, Kenneth Anderson, 2012, Blogs as a Collective War Diary, In the Proceedings of CSCW 2012, Seattle, WA. Ask me for the paper.

Induction of Semantic Relations - DB, KS, TO, JP

Just about anything related to one of these papers or to NELL.

Rion Snow, Daniel Jurafsky and Andrew Y. Ng, Semantic Taxonomy Induction from Heterogenous Evidence, pp 801-808, ACL-06
Dmitry Davidov; Ari Rappoport,
Classification of Semantic Relationships between Nominals Using Pattern Clusters
Lili Kotlerman; Ido Dagan; Idan Szpektor; Maayan Zhitomirsky-Geffet, 2009, Directional Distributional Similarity for Lexical Expansion In the Proceedings of ACL-09
P. D. Turney and P. Pantel (2010) "From Frequency to Meaning: Vector Space Models of Semantics", Journal of Artificial Intelligence Research, Volume 37, pages 141-188.

Topic of Your Choice - SL, rich dependency labels for Korean Dependency Structure parses

I'm open to suggestion. Find a couple of papers you are interested in and we can talk about how to turn that into a project.