The J.D. Power and Associates Sentiment Corpus

The JDPA Corpus consists of user-generated content (blog posts) containing opinions about automobiles and digital cameras. The posts have been manually annotated for named, nominal, and pronominal mentions of entities. Entities are marked with the aggregate sentiment expressed toward them in the document. Mentions of each entity are marked as co-referential. Mentions are assigned semantic types consisting of the Automatic Content Extraction (ACE) mention types and additional domain-specific types. Meronymy (part-of and feature-of) and instance relations are also annotated. Expressions which convey sentiment toward an entity are annotated with the polarity of their prior and contextual sentiments, as well as with the mentions they target. The following modifiers are also annotated; each may target a sentiment expression or another modifier:

negators (expressions which invert the polarity of a sentiment expression or modifier)
neutralizers (expressions that do not commit the speaker to the truth of the target sentiment expression or modifier)
committers (expressions which shift the commitment of the speaker toward the truth of a sentiment expression or modifier)
intensifiers (expressions which shift the intensity of a sentiment expression or modifier)
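To make the modifier scheme concrete, here is a minimal sketch of how a sentiment expression and the modifiers targeting it might be represented, with negators inverting the prior polarity. The data model (class names, field names, the +1/-1 encoding) is hypothetical: it is not the corpus file format or the API of the reader library, it ignores modifiers that target other modifiers, and it does not model neutralizers, committers, or intensifiers beyond storing them. In the corpus itself, contextual polarity is annotated by hand rather than computed.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical data model for illustration only; these names are assumptions,
# not the JDPA annotation schema or the jdpacorpus-lib API.

@dataclass
class Modifier:
    kind: str  # "negator", "neutralizer", "committer", or "intensifier"
    text: str

@dataclass
class SentimentExpression:
    text: str
    prior_polarity: int  # +1 for positive, -1 for negative
    modifiers: List[Modifier] = field(default_factory=list)  # modifiers targeting this expression

def contextual_polarity(expr: SentimentExpression) -> int:
    """Flip the prior polarity once per negator that targets the expression directly.

    Other modifier kinds are ignored in this simplified sketch.
    """
    polarity = expr.prior_polarity
    for mod in expr.modifiers:
        if mod.kind == "negator":
            polarity = -polarity
    return polarity

# "not reliable": a single negator inverts a positive prior polarity.
reliable = SentimentExpression("reliable", +1, [Modifier("negator", "not")])
print(contextual_polarity(reliable))  # -1
```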

Additionally, we have annotated when the opinion holder of a sentiment expression is someone other than the author of the blog by linking the expression to the holder. We also annotate when two entities are compared on a particular dimension.

The data, organized into training and testing sets, consists of 515 documents (blog posts) covering 330,762 tokens which make up 19,322 sentences. 87,532 mentions, 15,637 sentiment expressions, and 22,662 relations between entities (co-reference groups) are annotated.
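Dividing these totals gives a rough sense of document and sentence length. The figures below are simple arithmetic on the reported counts, treating all 515 documents as one pool; they are not statistics published with the corpus.

```python
# Corpus totals as reported above.
documents = 515
tokens = 330_762
sentences = 19_322
mentions = 87_532
sentiment_expressions = 15_637

# Derived averages (our own arithmetic, rounded to one decimal place).
tokens_per_doc = tokens / documents               # about 642.3
tokens_per_sentence = tokens / sentences          # about 17.1
mentions_per_doc = mentions / documents           # about 170.0
expressions_per_doc = sentiment_expressions / documents  # about 30.4

print(f"tokens per document: {tokens_per_doc:.1f}")
print(f"tokens per sentence: {tokens_per_sentence:.1f}")
print(f"mentions per document: {mentions_per_doc:.1f}")
print(f"sentiment expressions per document: {expressions_per_doc:.1f}")
```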

Please see the included README file for more information about this data. For a more detailed explanation of the preparation of the corpus, please read The JDPA Sentiment Corpus Annotation Guidelines or The ICWSM 2010 JDPA Sentiment Corpus for the Automotive Domain.

Licensing the Corpus

The corpus is free to use for academic research once you agree to the license agreement. Please send a signed copy of the agreement to:

Attention: Alan Dale
The Center for Computational Language and Education Research
Campus Box 594
Boulder, Colorado 80309-0594

Citing the J.D. Power and Associates Sentiment Corpus

Jason S. Kessler, Miriam Eckert, Lyndsie Clark, and Nicolas Nicolov. The ICWSM 2010 JDPA Sentiment Corpus for the Automotive Domain. In the 4th International AAAI Conference on Weblogs and Social Media Data Challenge Workshop (ICWSM-DCW 2010), 2010. Washington, D.C.

@inproceedings{KesslerEtAl2010,
  author = {Jason S. Kessler and Miriam Eckert and Lyndsie Clark and Nicolas Nicolov},
  title = {The ICWSM 2010 JDPA Sentiment Corpus for the Automotive Domain},
  booktitle = {4th International AAAI Conference on Weblogs and Social Media Data Challenge Workshop (ICWSM-DCW 2010)},
  year = {2010},
  url = {http://www.cs.indiana.edu/\~{}jaskessl/icwsm10.pdf}
}

Download the J.D. Power and Associates Sentiment Corpus:

The JDPA Sentiment Corpus (June 2011) - 16 MB .tar.gz file
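Once downloaded, the archive can be unpacked with any tar tool. As one option, the sketch below uses Python's standard `tarfile` module. The archive and output paths are placeholders (the actual download filename is not given here), and the sketch makes no assumption about the directory layout inside the archive; it just lists the first few member names.

```python
import tarfile
from pathlib import Path
from typing import List

# Placeholder paths: substitute the name of the archive you downloaded.
ARCHIVE = Path("jdpa_corpus.tar.gz")
DEST = Path("jdpa_corpus")

def extract_corpus(archive: Path, dest: Path) -> List[str]:
    """Unpack a gzip-compressed tar archive into dest and return its member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # Use the "data" extraction filter where available (Python 3.12+ and
        # recent 3.8-3.11 patch releases) to reject unsafe member paths.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names

# Only runs if the placeholder archive actually exists in the working directory.
if ARCHIVE.exists():
    for name in extract_corpus(ARCHIVE, DEST)[:10]:
        print(name)
```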

Software for Reading the Corpus

A Scala/Java library for reading the corpus can be obtained from: git@github.com:gibrown/jdpacorpus-lib.git

For questions, send email to Greg Brown (gregory.p.brown@colorado.edu).

Random Document Splits

Greg Brown's Entity Relations research randomly divided the car and camera documents each into 10 separate datasets. Splits 0-6 were used for training, split 7 for development, and splits 8 and 9 were used for testing.
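The split scheme above amounts to a mapping from split index to role, applied separately to the car documents and the camera documents. The sketch below is a hypothetical Python illustration: the seed, the sort-then-shuffle ordering, the round-robin assignment, and the document IDs are all assumptions, so it does not reproduce the actual splits used in that research. It only shows the shape of a reproducible 10-way split.

```python
import random
from typing import Dict, List

def split_role(split: int) -> str:
    """Role of a split index: 0-6 train, 7 development, 8-9 test."""
    if 0 <= split <= 6:
        return "train"
    if split == 7:
        return "dev"
    if split in (8, 9):
        return "test"
    raise ValueError(f"split index must be 0-9, got {split}")

def random_splits(doc_ids: List[str], n_splits: int = 10, seed: int = 0) -> Dict[int, List[str]]:
    """Shuffle documents with a fixed seed and deal them round-robin into n_splits groups.

    Illustrative only: the seed and assignment strategy are assumptions.
    """
    ids = sorted(doc_ids)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(ids)
    splits: Dict[int, List[str]] = {i: [] for i in range(n_splits)}
    for i, doc in enumerate(ids):
        splits[i % n_splits].append(doc)
    return splits

# Hypothetical IDs for one domain (e.g. cars); camera documents would be split separately.
car_docs = [f"car_{i:03d}" for i in range(40)]
by_role: Dict[str, List[str]] = {"train": [], "dev": [], "test": []}
for split, docs in random_splits(car_docs).items():
    by_role[split_role(split)].extend(docs)
print({role: len(docs) for role, docs in by_role.items()})  # {'train': 28, 'dev': 4, 'test': 8}
```

Splitting each domain on its own keeps the car/camera proportions the same in the training, development, and test portions.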

Research Using the JDPA Sentiment Corpus

Jason S. Kessler and Nicolas Nicolov. Targeting Sentiment Expressions through Supervised Ranking of Linguistic Configurations. In the 3rd International AAAI Conference on Weblogs and Social Media (ICWSM 2009), 2009. San Jose, California.
Jason S. Kessler, Miriam Eckert, Lyndsie Clark, and Nicolas Nicolov. The ICWSM 2010 JDPA Sentiment Corpus for the Automotive Domain. In the 4th International AAAI Conference on Weblogs and Social Media Data Challenge Workshop (ICWSM-DCW 2010), 2010. Washington, D.C.
Gregory Ichenuemon Brown. An Error Analysis of Relation Detection in Social Media Documents. In Proceedings of the Student Session of the 49th Annual Meeting of the Association for Computational Linguistics (ACL-2011), 2011. Portland, Oregon.
Gregory Ichenuemon Brown. Relation Extraction on the JD Power and Associates Sentiment Corpus. Master's Thesis. Department of Computer Science, University of Colorado at Boulder, 2011.

If you perform additional research with the corpus that we can cite, please send us an email.

Questions, comments or concerns?

Contact ICWSM.JDPA.Corpus@gmail.com and/or Greg Brown at the University of Colorado (gregory.p.brown@colorado.edu).