The JDPA Corpus consists of user-generated content (blog posts) containing opinions about automobiles and digital cameras. The posts have been manually annotated for named, nominal, and pronominal mentions of entities. Each entity is marked with the aggregate sentiment expressed toward it in the document, and the mentions of each entity are marked as co-referential. Mentions are assigned semantic types drawn from the Automatic Content Extraction (ACE) mention types plus additional domain-specific types. Meronymy (part-of and feature-of) and instance relations are also annotated. Expressions that convey sentiment toward an entity are annotated with the polarity of their prior and contextual sentiment, as well as the mentions they target. Modifiers are also annotated; these may target other modifiers or sentiment expressions.
Additionally, we have annotated when the opinion holder of a sentiment expression is someone other than the author of the blog by linking the expression to the holder. We also annotate when two entities are compared on a particular dimension.
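To make the annotation layers above easier to picture, here is a minimal sketch of how they might be held in memory. Every class, field, and enum name is a hypothetical illustration, not the API of the jdpacorpus-lib library or the corpus file format, and the set of polarity labels is an assumption; consult the annotation guidelines for the actual label inventory.

```java
import java.util.List;

// Hypothetical data model for the JDPA annotation layers.
// Names are illustrative assumptions, NOT the jdpacorpus-lib API.
class JdpaModel {
    // Assumed polarity labels; check the annotation guidelines for the real set.
    enum Polarity { POSITIVE, NEGATIVE, NEUTRAL, MIXED }

    // The three mention forms that are annotated.
    enum MentionForm { NAMED, NOMINAL, PRONOMINAL }

    // Meronymy (part-of, feature-of) and instance relations.
    enum RelationType { PART_OF, FEATURE_OF, INSTANCE_OF }

    // A text span [start, end) referring to an entity, with its semantic type
    // (an ACE mention type or a domain-specific type).
    record Mention(int start, int end, MentionForm form, String semanticType) {}

    // An entity groups its co-referential mentions and carries the aggregate
    // sentiment expressed toward it in the document.
    record Entity(String id, List<Mention> mentions, Polarity aggregateSentiment) {}

    // A relation between two mentions, e.g. "engine" PART_OF "car".
    record Relation(Mention from, Mention to, RelationType type) {}

    // A sentiment expression with prior and contextual polarity and the
    // mentions it targets; holder is null when the blog author holds the opinion.
    record SentimentExpression(int start, int end, Polarity prior, Polarity contextual,
                               List<Mention> targets, Mention holder) {
        boolean heldByAuthor() {
            return holder == null;
        }
    }

    // Two entities compared on a particular dimension (e.g. fuel economy).
    record Comparison(Entity first, Entity second, String dimension) {}
}
```

Keeping prior and contextual polarity as separate fields mirrors the distinction in the annotation scheme: a word such as "cheap" may carry one polarity out of context and another in a given sentence.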
The data, organized into training and test sets, consists of 515 documents (blog posts) totaling 330,762 tokens in 19,322 sentences. In total, 87,532 mentions, 15,637 sentiment expressions, and 22,662 relations between entities (co-reference groups) are annotated.
Please see the included README file for more information about this data. For a more detailed explanation of how the corpus was prepared, please read "The JDPA Sentiment Corpus Annotation Guidelines" or "The ICWSM 2010 JDPA Sentiment Corpus for the Automotive Domain".
Licensing the Corpus
The corpus is free to use for academic research once you have agreed to the license agreement. Please send a signed copy of the agreement to:
Attention: Alan Dale
The Center for Computational Language and Education Research
Campus Box 594
Boulder, Colorado 80309-0594
Citing the J.D. Power and Associates Sentiment Corpus
Jason S. Kessler, Miriam Eckert, Lyndsie Clark, and Nicolas Nicolov. The ICWSM 2010 JDPA Sentiment Corpus for the Automotive Domain. In the 4th International AAAI Conference on Weblogs and Social Media Data Challenge Workshop (ICWSM-DCW 2010), 2010. Washington, D.C.
@inproceedings{KesslerEtAl2010,
  author    = {Jason S. Kessler and Miriam Eckert and Lyndsie Clark and Nicolas Nicolov},
  title     = {The ICWSM 2010 JDPA Sentiment Corpus for the Automotive Domain},
  booktitle = {4th International AAAI Conference on Weblogs and Social Media Data Challenge Workshop (ICWSM-DCW 2010)},
  year      = {2010},
  url       = {http://www.cs.indiana.edu/\~{}jaskessl/icwsm10.pdf}
}
Download the J.D. Power and Associates Sentiment Corpus:
The JDPA Sentiment Corpus (June 2011): 16 MB .tar.gz file
Software for Reading the Corpus
A Scala/Java library for reading the corpus is available at: git@github.com:gibrown/jdpacorpus-lib.git
For questions, send email to Greg Brown (gregory.p.brown@colorado.edu).
Random Document Splits
For his entity relations research, Greg Brown randomly divided the car documents and the camera documents into 10 splits each. Splits 0-6 were used for training, split 7 for development, and splits 8 and 9 for testing.
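The split scheme above can be sketched as follows. Only the role assignment (0-6 train, 7 dev, 8-9 test) comes from this page; the shuffling method, the seed, the round-robin dealing, and the class name `RandomSplits` are illustrative assumptions, so this will not reproduce the published splits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a 10-way random split of one domain's documents.
// The shuffle and seed are assumptions; only the split roles come from the text.
class RandomSplits {
    static final int NUM_SPLITS = 10;

    // Role of each split: 0-6 training, 7 development, 8-9 testing.
    static String roleOf(int split) {
        if (split < 0 || split >= NUM_SPLITS) {
            throw new IllegalArgumentException("no such split: " + split);
        }
        if (split <= 6) return "train";
        if (split == 7) return "dev";
        return "test";
    }

    // Shuffle the document IDs with a fixed seed, then deal them round-robin
    // so that split sizes differ by at most one document.
    static List<List<String>> split(List<String> docIds, long seed) {
        List<String> shuffled = new ArrayList<>(docIds);
        Collections.shuffle(shuffled, new Random(seed));
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < NUM_SPLITS; i++) {
            splits.add(new ArrayList<>());
        }
        for (int i = 0; i < shuffled.size(); i++) {
            splits.get(i % NUM_SPLITS).add(shuffled.get(i));
        }
        return splits;
    }
}
```

Because the car and camera documents were divided separately, `split` would be called once per domain, and split k of each domain would then share the role given by `roleOf(k)`.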
Research Using the JDPA Sentiment Corpus
If you publish research that uses the corpus, please send us an email so that we can cite it.
Questions, comments or concerns?
Contact ICWSM.JDPA.Corpus@gmail.com and/or Greg Brown at the University of Colorado (gregory.p.brown@colorado.edu).