To access the discs in the LDC library, contact Ghazaleh Kazeminejad.

Note: You need to have a "verbs" or "babel" account to access the corpora that are on the verbs server.



List of all corpora:

1996 English Broadcast News Speech (HUB4)

Catalog ID: LDC97S44

1996 English Broadcast News Transcripts (HUB4)

Catalog ID: LDC97T22

1996-2008 NIST Speaker Recognition Evaluation Data Collection

Catalog ID: LDC2009E100

1997 English Broadcast News Transcripts (HUB4)

Catalog ID: LDC98T28

1997 HUB4 Broadcast News Evaluation Non-English Test Material

Catalog ID: LDC2001S91

1997 HUB4 English Evaluation Speech and Transcripts

Catalog ID: LDC2002S11

1997 HUB5 Arabic Evaluation

Catalog ID: LDC2002S22

1997 HUB5 Arabic Transcripts

Catalog ID: LDC2002T39

1997 HUB5 English Evaluation

Catalog ID: LDC2002S23

1997 HUB5 German Evaluation

Catalog ID: LDC2002S24

1997 HUB5 German Transcripts

Catalog ID: LDC2003T03

1997 HUB5 Spanish Evaluation

Catalog ID: LDC2002S25

1997 HUB5 Spanish Transcripts

Catalog ID: LDC2003T04

1997 Mandarin Broadcast News Speech (HUB4-NE)

Catalog ID: LDC98S73

1997 Spanish Broadcast News Transcripts (HUB4-NE)

Catalog ID: LDC98T29

1998 HUB4 Broadcast News Evaluation English Test Material

Catalog ID: LDC2000S86

1998 HUB5 English Evaluation

Catalog ID: LDC2002S10

1998 HUB5 English Transcripts

Catalog ID: LDC2003T02

1999 HUB4 Broadcast News Evaluation English Test Material

Catalog ID: LDC2000S88

2000 Communicator Evaluation

Catalog ID: LDC2002S56

2000 HUB5 English Evaluation Speech

Catalog ID: LDC2002S09

2000 HUB5 English Evaluation Transcripts

Catalog ID: LDC2002T43

2000 NIST Speaker Recognition Evaluation

Catalog ID: LDC2001S97

2001 Communicator Evaluation

Catalog ID: LDC2003S01

2001 HUB5 English Evaluation

Catalog ID: LDC2002S13

2001 HUB5 Mandarin Evaluation

Catalog ID: LDC2002S12

2001 HUB5 Mandarin Transcripts

Catalog ID: LDC2003T01

2001 NIST Speaker Recognition Evaluation Corpus

Catalog ID: LDC2002S34

2002 Rich Transcription Broadcast News and Conversational Telephone Speech

Catalog ID: LDC2004S11

2009 CoNLL Shared Task Part 1

Catalog ID: LDC2012T03

2009 CoNLL Shared Task Part 2

Catalog ID: LDC2012T04

8 years' worth of summary/article sets collected via Newsblaster

Catalog ID: LDC2012E80

ACE 2004 Evaluation Corpus

Catalog ID: LDC2004E51

ACE 2004 Multilingual Training Corpus

Catalog ID: LDC2005T09

ACE 2004 Pilot Corpus V1.3

Catalog ID: LDC2004E03

ACE 2005 Multilingual Training Data V6.0

Catalog ID: LDC2005E18

ACE-2 Version 1.0

Catalog ID: LDC2003T11

ACL Multilingual Corpus 1

Catalog ID: 1006

AIDA 1.2: Automatic Identification of Dialectal Arabic

Catalog ID: LDC2012E56

AQUAINT CrossLingual QA Arabic Newswire Corpus

Catalog ID: LDC2004E49

ATIS3 Test Data

Catalog ID: LDC95S26

ATIS3 Training Data

Catalog ID: LDC94S19

Abstract Meaning Representation (AMR) Annotation Release 1.0

Catalog ID: LDC2014T12

American English Nickname Collection

Catalog ID: LDC2012T11

American English Spoken Lexicon

Catalog ID: LDC99L23

American National Corpus (ANC) Second Release

Catalog ID: LDC2005T35

Annotated English Gigaword

Catalog ID: LDC2012T21

Arabic Gigaword Third Edition

Catalog ID: LDC2007T40

Arabic Newswire Part 1

Catalog ID: LDC2001T55

Arabic Treebank - Broadcast News v1.0

Catalog ID: LDC2012T07

Arabic Treebank ARZ Part 1, V1.0

Catalog ID: LDC2012E28

Arabic Treebank Part 20 V1.0 - BOLT Pilot ARZ Email

Catalog ID: LDC2012E25

Arabic Treebank: Part 1 - 10K-word English Translation

Catalog ID: LDC2003T07

Arabic Treebank: Part 1 v 2.0

Catalog ID: LDC2003T06

Arabic Treebank: Part 3 v 3.2

Catalog ID: LDC2010T08

BBN Pronoun Coreference and Entity Type Corpus

Catalog ID: LDC2005T33

BBN/LDC WebForum Selections Arabic/English Parallel Corpus

Catalog ID: LDC2012E75

BBN/LDC WebForum Selections Chinese/English Parallel Corpus

Catalog ID: LDC2012E76

BBN/LDC/Sakhr Arabic-Dialect/English Parallel Corpus

Catalog ID: LDC2012E17

BLLIP 1987-89 WSJ Corpus Release 1

Catalog ID: LDC2000T43

BOLT - Phase 1 Discussion Forums Source Data R1 V2

Catalog ID: LDC2012E04

BOLT - Phase 1 Discussion Forums Source Data R2

Catalog ID: LDC2012E16

BOLT - Phase 1 Discussion Forums Source Data R3

Catalog ID: LDC2012E21

BOLT - Phase 1 Rejected Training Data Thread IDs

Catalog ID: LDC2012E62

BOLT - Phase 1 Translation Samples V2

Catalog ID: LDC2012E11

BOLT LRL Hausa Representative Language Pack V1.2

Catalog ID: LDC2015E70

BOLT LRL Turkish Representative Language Pack V2.2

Catalog ID: LDC2014E115

BOLT LRL Uzbek Representative Language Pack

Catalog ID: LDC2016E29

BOLT Phase 1 - Arabic Treebank ARZ Part 2, V1.0

Catalog ID: LDC2012E88

BOLT Phase 1 - Chinese Parallel Word Alignment and Tagging Part 3

Catalog ID: LDC2012E95

BOLT Phase 1 - English Treebank BOLT WB Part 2, V 1.0

Catalog ID: LDC2012E97

BOLT Phase 1 Chinese Parallel Word Alignment and Tagging DF Part 4

Catalog ID: LDC2013E02

BOLT Phase 1 Chinese Parallel Word Alignment and Tagging Part 1

Catalog ID: LDC2012E24

BOLT Phase 1 Chinese Parallel Word Alignment and Tagging Part 2

Catalog ID: LDC2012E72

BOLT Phase 1 Chinese Propbank DF Part 1

Catalog ID: LDC2012E121

BOLT Phase 1 Chinese Propbank DF Part 2

Catalog ID: LDC2012E131

BOLT Phase 1 Chinese Treebank DF Part 1

Catalog ID: LDC2012E109

BOLT Phase 1 Chinese Treebank DF Part 2

Catalog ID: LDC2012E120

BOLT Phase 1 Chinese Treebank DF Part 3

Catalog ID: LDC2012E130

BOLT Phase 1 DevTest Source and Translation V4

Catalog ID: LDC2012E30

BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF

Catalog ID: LDC2013E01

BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF Part 2 v2

Catalog ID: LDC2012E94

BOLT Phase 1 Egyptian Arabic Parallel Word Alignment Part 1 V2

Catalog ID: LDC2012E51

BOLT Phase 1 Egyptian Arabic Propbank DF Part 1

Catalog ID: LDC2012E122

BOLT Phase 1 Egyptian Arabic Propbank DF Part 2

Catalog ID: LDC2012E129

BOLT Phase 1 Egyptian Arabic Treebank DF Part 1 V2.0

Catalog ID: LDC2012E93

BOLT Phase 1 Egyptian Arabic Treebank DF Part 2 V2.0

Catalog ID: LDC2012E98

BOLT Phase 1 Egyptian Arabic Treebank DF Part 3 V2.0

Catalog ID: LDC2012E89

BOLT Phase 1 Egyptian Arabic Treebank DF Part 4 V2.0

Catalog ID: LDC2012E99

BOLT Phase 1 Egyptian Arabic Treebank DF Part 5 V2.0

Catalog ID: LDC2012E107

BOLT Phase 1 Egyptian Arabic Treebank DF Part 6 V2.0

Catalog ID: LDC2012E125

BOLT Phase 1 Egyptian Arabic Treebank DF Part 7 V1.0

Catalog ID: LDC2013E12

BOLT Phase 1 English Propbank DF Part 1

Catalog ID: LDC2012E123

BOLT Phase 1 English Propbank DF Part 2

Catalog ID: LDC2012E128

BOLT Phase 1 English Propbank DF Part 3

Catalog ID: LDC2013E05

BOLT Phase 1 English Treebank DF Part 1 V1.0

Catalog ID: LDC2012E92

BOLT Phase 1 English Treebank DF Part 3 V1.0

Catalog ID: LDC2012E114

BOLT Phase 1 English Treebank DF Part 4 V1.0

Catalog ID: LDC2013E17

BOLT Phase 1 HTER Experiment Source and Reference Translation

Catalog ID: LDC2012E18

BOLT Phase 1 IR Eval Assessment Results V1.1

Catalog ID: LDC2012E118

BOLT Phase 1 IR Eval Source Data Document List

Catalog ID: LDC2012E82

BOLT Phase 1 Translation Training Data R1

Catalog ID: LDC2012E15

BOLT Phase 1 Translation Training Data R2

Catalog ID: LDC2012E19

BOLT Phase 1 Translation Training Data R3

Catalog ID: LDC2012E55

BOLT Phase 1 Translation Training Data R4

Catalog ID: LDC2012E81

BOLT Phase 1 Translation Training Data R5

Catalog ID: LDC2012E96

BOLT Phase 1 Translation Training Data R6

Catalog ID: LDC2012E124

BOLT Phase 2 English Treebank SMS/Chat Part 1

Catalog ID: LDC2013E127

BOLT Phase 2 IR Source Data Document List and Sample Query

Catalog ID: LDC2013E08

BOLT Phase 2 SMS and Chat Sample Source Data

Catalog ID: LDC2013E10

Boston University Radio Speech Corpus

Catalog ID: LDC96S36

Boulder Coercion Corpus

Catalog ID: Other_8

British National Corpus Parses and BNC

Catalog ID: 1000

Brown Corpus (treebanked)

Catalog ID: Other_7

Buckwalter Arabic Morphological Analyzer

Catalog ID: LDC2004L02

CALIMA 0.3: Columbia Arabic Language Morphological Analyzer -- Egyptian Arabic

Catalog ID: LDC2012E57

CALLFRIEND American English-Non-Southern Dialect

Catalog ID: LDC96S46

CALLFRIEND American English-Southern Dialect

Catalog ID: LDC96S47

CALLFRIEND Canadian French

Catalog ID: LDC96S48

CALLFRIEND Farsi

Catalog ID: LDC96S50

CALLFRIEND German

Catalog ID: LDC96S51

CALLFRIEND Hindi

Catalog ID: LDC96S52

CALLFRIEND Japanese

Catalog ID: LDC96S53

CALLFRIEND Korean

Catalog ID: LDC96S54

CALLFRIEND Mandarin Chinese-Mainland Dialect

Catalog ID: LDC96S55

CALLFRIEND Mandarin Chinese-Taiwan Dialect

Catalog ID: LDC96S56

CALLFRIEND Spanish-Caribbean Dialect

Catalog ID: LDC96S57

CALLFRIEND Tamil

Catalog ID: LDC96S59

CALLFRIEND Vietnamese

Catalog ID: LDC96S60

CALLHOME American English Lexicon (PRONLEX)

Catalog ID: LDC97L20

CALLHOME American English Speech

Catalog ID: LDC97S42

CALLHOME American English Transcripts

Catalog ID: LDC97T14

CALLHOME Egyptian Arabic Speech Supplement

Catalog ID: LDC2002S37

CALLHOME Egyptian Arabic Transcripts

Catalog ID: LDC97T19

CALLHOME Egyptian Arabic Transcripts Supplement

Catalog ID: LDC2002T38

CALLHOME German Lexicon

Catalog ID: LDC97L18

CALLHOME German Speech

Catalog ID: LDC97S43

CALLHOME German Transcripts

Catalog ID: LDC97T15

CALLHOME Japanese Lexicon

Catalog ID: LDC96L17

CALLHOME Japanese Speech

Catalog ID: LDC96S37

CALLHOME Japanese Transcripts

Catalog ID: LDC96T18

CALLHOME Mandarin Chinese Lexicon

Catalog ID: LDC96L15

CALLHOME Mandarin Chinese Speech

Catalog ID: LDC96S34

CALLHOME Mandarin Chinese Transcripts

Catalog ID: LDC96T16

CALLHOME Spanish Dialogue Act Annotation

Catalog ID: LDC2001T61

CALLHOME Spanish Lexicon

Catalog ID: LDC96L16

CALLHOME Spanish Speech

Catalog ID: LDC96S35

CALLHOME Spanish Transcripts

Catalog ID: LDC96T17

CELEX2

Catalog ID: LDC96L14

CETEMpublico

Catalog ID: LDC2001T62

CODAFY 0.1: Automatic mapper into the Conventional Orthography of Dialectal Arabic

Catalog ID: LDC2012E58

COMLEX English Syntax Lexicon

Catalog ID: LDC96L6

COMLEX Pronouncing Dictionary

Catalog ID: LDC96L7

COMLEX Syntax Text Corpus Version 2.0

Catalog ID: LDC96T11

CSLU: Kids' Speech Version 1.1

Catalog ID: LDC2007S18

CSLU: Spelled and Spoken Words

Catalog ID: LDC2006S15

CSLU: Spoltech Brazilian Portuguese Version 1.0

Catalog ID: LDC2006S16

CSLU: Stories v 1.2

Catalog ID: LDC2006S14

CSR-I (WSJ0) Complete

Catalog ID: LDC93S6A

CSR-IV HUB4

Catalog ID: LDC96S31

Childes Corpus 1996

Catalog ID: 1001

Childes Corpus 1998

Catalog ID: 1002

Chinese <-> English Name Entity Lists v 1.0

Catalog ID: LDC2005T34

Chinese English News Magazine Parallel Text

Catalog ID: LDC2005T10

Chinese Gigaword

Catalog ID: LDC2003T09

Chinese Gigaword Fifth Edition

Catalog ID: LDC2011T13

Chinese Gigaword Second Edition

Catalog ID: LDC2005T14

Chinese Proposition Bank 2.0

Catalog ID: LDC2008T07

Chinese Treebank 2.0

Catalog ID: LDC2001T11

Chinese Treebank 4.0

Catalog ID: LDC2004T05

Chinese Treebank 5.0

Catalog ID: LDC2005T01

Chinese Treebank 5.1

Catalog ID: LDC2005T01U01

Chinese Treebank 6.0

Catalog ID: LDC2007T36

Chinese Treebank 7.0

Catalog ID: LDC2010T07

Chinese Treebank 8.0

Catalog ID: LDC2013T21

Chinese Treebank Final Release

Catalog ID: LDC2000T48

Chinese idiom translation dictionary + word segmenter dictionary - web resources

Catalog ID: LDC2012E78

Chinese-English Translation Lexicon Version 3.0

Catalog ID: LDC2002L27

CoNLL 2008 Shared Task Development Set

Catalog ID: LDC2008E33

CoNLL 2008 Shared Task Test Set

Catalog ID: LDC2008E34

CoNLL 2008 Shared Task Training Set

Catalog ID: LDC2008E32

CoNLL 2008 Shared Task Trial Data Set

Catalog ID: LDC2008E31

CoNLL 2009 Shared Task Chinese Test Set

Catalog ID: LDC2009E37

CoNLL 2009 Shared Task Chinese Training Set

Catalog ID: LDC2009E38

CoNLL 2009 Shared Task Chinese Trial Data Set

Catalog ID: LDC2009E36D

CoRD: The London-Lund Corpus of Spoken English

Catalog ID: other_1234

Corpus Search

Catalog ID: 1003

DEFT ERE Cross-Doc Event Coreference Training Data Annotation

Catalog ID: LDC2017E24

DEFT ERE English Discussion Forum Annotation V3

Catalog ID: LDC2014E31

DEFT English Belief and Sentiment Annotation

Catalog ID: LDC2016E27

DEFT Event Sequencing After-Link And Parent-Child Annotation Training Data

Catalog ID: LDC2016E130

DEFT Event Sequencing Pilot Evaluation Source Data

Catalog ID: LDC2017E08

DEFT Phase 1 AMR Annotation R4

Catalog ID: LDC2014E41

DEFT Phase 1 ERE Annotation R3 V2

Catalog ID: LDC2013E64

DEFT Phase 1 Narrative Text Source Data R1

Catalog ID: LDC2013E19

DEFT Phase 2 AMR Annotation R1

Catalog ID: LDC2015E86

DEFT Phase 2 AMR Annotation R2

Catalog ID: LDC2016E25

DEFT Phase 2 AMR Exploratory Source Data

Catalog ID: LDC2014R46

DEFT Phase 2 AMR Selected Segmented DF Source Data V2.0

Catalog ID: LDC2015R11

DEFT Rich ERE English Training Annotation R2 V2

Catalog ID: LDC2015E68

DSO Corpus of Sense-Tagged English

Catalog ID: LDC97T12

ECI Multilingual Text

Catalog ID: LDC94T5

Emotional Prosody Speech and Transcripts

Catalog ID: LDC2002S28

English Gigaword

Catalog ID: LDC2003T05

English Gigaword Fifth Edition

Catalog ID: LDC2011T07

English Gigaword Second Edition

Catalog ID: LDC2005T12

English News Text Treebank: Penn Treebank Revised

Catalog ID: LDC2015T13

English Translation Treebank: An-Nahar Newswire

Catalog ID: LDC2012T02

English Web Treebank

Catalog ID: LDC2012T13

Entropic Speech Technology

Catalog ID: 1005

European Language Newspaper Text

Catalog ID: LDC95T11

FactBank 1.0

Catalog ID: LDC2009T23

Fisher English Training Part 2, Speech

Catalog ID: LDC2005S13

Fisher English Training Part 2, Transcripts

Catalog ID: LDC2005T19

Fisher English Training Speech Part 1 Speech

Catalog ID: LDC2004S13

Fisher English Training Speech Part 1 Transcripts

Catalog ID: LDC2004T19

GALE Arabic-English Parallel Aligned Treebank -- Newswire

Catalog ID: LDC2013T10

GALE Kickoff Release - Arabic Names Extracted from ACE V1.0

Catalog ID: LDC2005E66

GALE Kickoff Release - Arabic Names Extracted from ATB V1.0

Catalog ID: LDC2005E68

GALE Kickoff Release - Broadcast Conversation Audio V1.0

Catalog ID: LDC2005E61

GALE Kickoff Release - Broadcast Conversation Transcripts V1.0

Catalog ID: LDC2005E63

GALE Kickoff Release - Broadcast News Audio V1.0

Catalog ID: LDC2005E62

GALE Kickoff Release - English-Arabic Parallel Treebank V1.0

Catalog ID: LDC2005E69

GALE Kickoff Release - VOA Arabic Broadcast News Audio

Catalog ID: LDC2005E60

GALE Kickoff Release - VOA Arabic Broadcast News Transcripts

Catalog ID: LDC2005E71

GALE Kickoff Release 2 - English CTS Treebank with Structural Metadata

Catalog ID: LDC2005E79

GALE Kickoff Release 2 -- Levantine Arabic CTS Audio

Catalog ID: LDC2005E76

GALE Kickoff Release 2 -- Levantine Arabic CTS Transcripts

Catalog ID: LDC2005E77

GALE Kickoff Release 2 -- Levantine Arabic CTS Treebank

Catalog ID: LDC2005E78

GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 1

Catalog ID: LDC2009T02

GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 2

Catalog ID: LDC2009T06

GALE Phase 1 Chinese Broadcast News Parallel Text - Part 1

Catalog ID: LDC2007T23

GALE Phase 1 Chinese Broadcast News Parallel Text - Part 2

Catalog ID: LDC2008T08

GALE Phase 1 Chinese Broadcast News Parallel Text - Part 3

Catalog ID: LDC2008T18

GALE Phase 1 Chinese Newsgroup Parallel Text - Part 1

Catalog ID: LDC2009T15

GALE Phase 1 Chinese Newsgroup Parallel Text - Part 2

Catalog ID: LDC2010T03

GALE Phase 2 Distillation - Training V5.0

Catalog ID: LDC2007E13

GALE Phase 2 Release 1 - Transcripts

Catalog ID: LDC2007E05

GALE Phase 2 Release 1 - Translations

Catalog ID: LDC2007E06

GALE Phase 2 Release 1 - Web Text

Catalog ID: LDC2007E04

GALE Phase 2 Release 2 - Transcripts

Catalog ID: LDC2007E45

GALE Phase 2 Release 2 - Translations

Catalog ID: LDC2007E46

GALE Phase 2 Release 3 - Transcripts

Catalog ID: LDC2007E86

GALE Phase 2 Release 3 - Translations

Catalog ID: LDC2007E87

GALE Phase 3 - MTPlus Pilot

Catalog ID: LDC2008E42

GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 1

Catalog ID: LDC2014T28

GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 2

Catalog ID: LDC2015T09

GALE Phase 3 DevTest - Broadcast Audio

Catalog ID: LDC2007E60

GALE Phase 3 Release 1 - Distillation V1.1

Catalog ID: LDC2007E104

GALE Phase 3 Release 1 - English Translation Treebank

Catalog ID: LDC2007E105

GALE Phase 3 Release 1 - Found Parallel Text

Catalog ID: LDC2007E103

GALE Phase 3 Release 1 - Transcripts

Catalog ID: LDC2007E100

GALE Phase 3 Release 1 - Translations

Catalog ID: LDC2007E101

GALE Phase 3 Release 1 - Web Text V 1.0

Catalog ID: LDC2007E102

GALE Phase 3 Release 2 - Broadcast Audio

Catalog ID: LDC2008E38

GALE Phase 3 Release 2 - Transcripts

Catalog ID: LDC2008E39

GALE Phase 3 Release 2 - Translations

Catalog ID: LDC2008E40

GALE Phase 3 Release 2 - Web Text

Catalog ID: LDC2008E41

GALE Phase 3 and 4 Eval Superset

Catalog ID: LDC2011E50

GALE Phase 4 Arabic Parallel Aligned Treebank Part 1 V1.2

Catalog ID: LDC2009E82

GALE Phase 4 Chinese Parallel Word Alignment and Tagging Part 1 V1.1

Catalog ID: LDC2009E83

GALE Phase 4 Release 1 - Transcripts V1.0

Catalog ID: LDC2008E55

GALE Phase 4 Release 1 - Translations V2.0

Catalog ID: LDC2008E56

GALE Phase 4 Release 1 - Web Text V1.0

Catalog ID: LDC2008E53

GALE Phase 4 Release 2 - Transcripts

Catalog ID: LDC2009E15

GALE Phase 4 Release 2 - Translations

Catalog ID: LDC2009E16

GALE Phase 4 Release 2 - Web Text

Catalog ID: LDC2009E14

GALE Phase 4 Release 3 - Found Parallel Text

Catalog ID: LDC2009E105

GALE Phase 4 Release 3 - Transcripts

Catalog ID: LDC2009E94

GALE Phase 4 Release 3 - Translations V1.2

Catalog ID: LDC2009E95

GALE Phase 4 Release 3 - Web Text

Catalog ID: LDC2009E93

GALE Phase 5 Eval Source Transcripts and Translation

Catalog ID: LDC2011E21

GALE Phase 5 Eval Superset Source Transcripts and Translation

Catalog ID: LDC2011E25

GALE Phase 5 Levantine Arabic Dialect Judgments and Translations

Catalog ID: LDC2010E79

GALE Y1 - Arabic English Parallel News Text

Catalog ID: LDC2006E25

GALE Y1 - BBN Iraqi Broadcast Conversation Corpus

Catalog ID: LDC2006G07

GALE Y1 - Distillation Blind Evaluation Audio Part A

Catalog ID: LDC2006E46_A

GALE Y1 - Distillation Blind Evaluation Audio Part B

Catalog ID: LDC2006E46_B

GALE Y1 - Distillation Blind Evaluation Audio Part C

Catalog ID: LDC2006E46_C

GALE Y1 - Distillation Blind Evaluation Audio Part D

Catalog ID: LDC2006E46_D

GALE Y1 - Distillation Blind Evaluation Audio Part E

Catalog ID: LDC2006E46_E

GALE Y1 - Distillation Blind Evaluation Newswire

Catalog ID: LDC2006E45

GALE Y1 - Distillation Evaluation Audio

Catalog ID: LDC2006E21

GALE Y1 - Distillation Evaluation Newswire

Catalog ID: LDC2006E22

GALE Y1 - English Chinese Parallel Financial News

Catalog ID: LDC2006E26

GALE Y1 - Interim Release: Transcripts

Catalog ID: LDC2006E23

GALE Y1 - Interim Release: Translations

Catalog ID: LDC2006E24

GALE Y1 - Web 1T 5-gram Version 1

Catalog ID: LDC2006E88

GALE Y1 Q1 Release - Arabic Treebank v 1.0

Catalog ID: LDC2005E84

GALE Y1 Q1 Release - English Translation Treebank v 1.0

Catalog ID: LDC2005E85

GALE Y1 Q1 Release - Transcripts V1.0

Catalog ID: LDC2005E82

GALE Y1 Q1 Release - Translations V1.0

Catalog ID: LDC2005E83

GALE Y1 Q1 Release - Web Text Collection V1.0

Catalog ID: LDC2005E81

GALE Y1 Q2 Release - Arabic Treebank v 1.0

Catalog ID: LDC2006E35

GALE Y1 Q2 Release - English Translation Treebank v 1.0

Catalog ID: LDC2006E36

GALE Y1 Q2 Release - Transcripts V1.0

Catalog ID: LDC2006E33

GALE Y1 Q2 Release - Translations V2.0

Catalog ID: LDC2006E34

GALE Y1 Q2 Release - Web Text Collection V1.0

Catalog ID: LDC2006E32

GALE Y1 Q3 Release - Arabic Treebank

Catalog ID: LDC2006E87

GALE Y1 Q3 Release - English Translation Treebank

Catalog ID: LDC2006E82

GALE Y1 Q3 Release - Transcripts

Catalog ID: LDC2006E84

GALE Y1 Q3 Release - Translations

Catalog ID: LDC2006E85

GALE Y1 Q3 Release - Web Text Collection

Catalog ID: LDC2006E77

GALE Y1 Q3 Release - Word Alignment

Catalog ID: LDC2006E86

GALE Y1 Q4 Release - Arabic Treebank

Catalog ID: LDC2006E94

GALE Y1 Q4 Release - English Translation Treebank

Catalog ID: LDC2006E95

GALE Y1 Q4 Release - Transcripts

Catalog ID: LDC2006E91

GALE Y1 Q4 Release - Translations

Catalog ID: LDC2006E92

GALE Y1 Q4 Release - Web Text Collection

Catalog ID: LDC2006E90

GALE Y1 Q4 Release - Word Alignment

Catalog ID: LDC2006E93

Gigaword English Automatic Parses

Catalog ID: Other_9

Google Question Bank Update-v1.0

Catalog ID: LDC2012R121

Google Treebank Weblog Subcorpus V2.0

Catalog ID: LDC2011E71

Grassfields Bantu Fieldwork: Ngomba Tone Paradigms

Catalog ID: LDC2001S16

HUB4 Radio Broadcast News

Catalog ID: 1014

HUB5 Spanish Telephone Speech Corpus

Catalog ID: LDC98S70

Hansard French/English

Catalog ID: LDC95T20

Hong Kong Hansards Parallel Text

Catalog ID: LDC2000T50

Hong Kong Laws Parallel Text

Catalog ID: LDC2000T47

Hong Kong News Parallel Text

Catalog ID: LDC2000T46

Hong Kong Parallel Text

Catalog ID: LDC2004T08

ICSI Meeting Speech

Catalog ID: LDC2004S02

ICSI Meeting Transcripts

Catalog ID: LDC2004T04

ISCA 1 and 3

Catalog ID: 1007

ISCA Tutorial

Catalog ID: 1008

ISL Meeting Speech Part 1

Catalog ID: LDC2004S05

ISL Meeting Transcripts Part 1

Catalog ID: LDC2004T10

JURIS

Catalog ID: LDC98T32

Japanese Business News Text

Catalog ID: LDC95T8

Japanese Business News Text Supplement

Catalog ID: LDC99T34

Korean English Treebank Annotations

Catalog ID: LDC2002T26

Korean Newswire

Catalog ID: LDC2000T45

Korean Propbank

Catalog ID: LDC2006T03

Korean Telephone Conversations Lexicon

Catalog ID: LDC2003L02

Korean Telephone Conversations Speech

Catalog ID: LDC2003S03

Korean Telephone Conversations Transcripts

Catalog ID: LDC2003T08

Korean Treebank Annotations Version 2.0

Catalog ID: LDC2006T09

LCTL Urdu

Catalog ID: LDC2006E110

LLHDB

Catalog ID: LDC98S68

LORELEI Akan Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0

Catalog ID: LDC2018E07

LORELEI Amharic Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0

Catalog ID: LDC2016E87

LORELEI Arabic Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0

Catalog ID: LDC2016E89

LORELEI Bengali Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0

Catalog ID: LDC2017E60

LORELEI Farsi Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0

Catalog ID: LDC2016E93

LORELEI Hindi Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0

Catalog ID: LDC2017E62

LORELEI Hungarian Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0

Catalog ID: LDC2016E98

LORELEI Indonesian Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1

Catalog ID: LDC2017E66

LORELEI Language Independent NLP Tools

Catalog ID: LDC2016E53

LORELEI Mandarin Incident Language Pack V2

Catalog ID: LDC2016E30

LORELEI Mandarin Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0

Catalog ID: LDC2016E101

LORELEI Russian Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0

Catalog ID: LDC2016E95

LORELEI Situation Frame Exercise Annotation

Catalog ID: LDC2017E07

LORELEI Somali Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0

Catalog ID: LDC2016E91

LORELEI Spanish Representative Language Pack Translation, Annotation, Grammar, Lexicon and Tools V1.

Catalog ID: LDC2016E97

LORELEI Swahili Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0

Catalog ID: LDC2017E64

LORELEI Tagalog Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0

Catalog ID: LDC2017E68

LORELEI Tamil Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0

Catalog ID: LDC2017E70

LORELEI Thai Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0

Catalog ID: LDC2018E03

LORELEI Vietnamese Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1

Catalog ID: LDC2016E103

LORELEI Wolof Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0

Catalog ID: LDC2018E09

LORELEI Year 1 Dry Run Evaluation IL2 V1.1

Catalog ID: LDC2016E56

LORELEI Yoruba Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0

Catalog ID: LDC2016E105

LORELEI Zulu Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0

Catalog ID: LDC2018E05

Levantine Arabic QT Training Data Set 4 (Speech + Transcripts)

Catalog ID: LDC2005S14

MADA-ARZ 0.1: Morphological Analysis and Disambiguation for Arabic (Egyptian version)

Catalog ID: LDC2012E60

MRC Psycholinguistic Database Machine Usable Dictionary

Catalog ID: other_4

Mandarin Chinese News Text

Catalog ID: LDC95T13

Matlab

Catalog ID: 1009

Message Understanding Conference (MUC) 6

Catalog ID: LDC2003T13

Message Understanding Conference (MUC) 7

Catalog ID: LDC2001T02

Multiple-Translation Arabic (MTA) Part 1

Catalog ID: LDC2003T18

Multiple-Translation Arabic (MTA) Part 2

Catalog ID: LDC2005T05

Multiple-Translation Chinese (MTC) Part 2

Catalog ID: LDC2003T17

Multiple-Translation Chinese (MTC) Part 3

Catalog ID: LDC2004T07

Multiple-Translation Chinese (MTC) Part 4

Catalog ID: LDC2006T04

Multiple-Translation Chinese Corpus

Catalog ID: LDC2002T01

NIST 2009 Open Machine Translation (OpenMT) Evaluation

Catalog ID: LDC2010T23

NIST 2012 Open Machine Translation (OpenMT) Progress Test Five Language Source

Catalog ID: LDC2014T02

NIST Meeting Pilot Corpus Speech

Catalog ID: LDC2004S09

NIST Meeting Pilot Corpus Transcripts and Metadata

Catalog ID: LDC2004T13

NIST Open MT 2008 Evaluation (MT08) Selected References and System Translations

Catalog ID: LDC2010T01

NLTK

Catalog ID: 1010

NTIMIT

Catalog ID: 1011

NomBank v 1.0

Catalog ID: LDC2008T23

North American News Text Corpus

Catalog ID: LDC95T21

OntoNotes Release 5.0

Catalog ID: LDC2013T19

OntoNotes V3.0 - GALE Pre-Release

Catalog ID: LDC2009E60

Original Penn Treebank release 2

Catalog ID: 1012

Penn Discourse Treebank Version 2.0

Catalog ID: LDC2008T05

Penn Treebank release 3

Catalog ID: 1013

Portuguese Newswire Text

Catalog ID: LDC99T40

Prague Dependency Treebank 1.0

Catalog ID: LDC2001T10

PropBank frameset files (v1.7)

Catalog ID: other_10

PropBank on the Brown corpus

Catalog ID: other_11

Proposition Bank I

Catalog ID: LDC2004T14

REFLEX Bengali

Catalog ID: LDC2015E13

REFLEX Hungarian

Catalog ID: LDC2015E82

REFLEX Tagalog

Catalog ID: LDC2015E90

REFLEX Tamil

Catalog ID: LDC2015E83

REFLEX Thai

Catalog ID: LDC2015E84

REFLEX Urdu

Catalog ID: LDC2015E14

REFLEX Yoruba

Catalog ID: LDC2015E91

RST Discourse Treebank

Catalog ID: LDC2002T07

Reuters vol. 1

Catalog ID: 1015

Reuters vol. 2

Catalog ID: 1016

SAID

Catalog ID: LDC2003T10

SANCL 2012 Shared Task Release 1

Catalog ID: LDC2012E43

SIGHAN Bakeoff

Catalog ID: LDC2003E16

SUSAS

Catalog ID: LDC99S78

SUSAS Transcripts

Catalog ID: LDC99T33

Santa Barbara Corpus of Spoken American English Part I

Catalog ID: LDC2000S85

Santa Barbara Corpus of Spoken American English Part II

Catalog ID: LDC2003S06

Santa Barbara Corpus of Spoken American English Part III

Catalog ID: LDC2004S10

Santa Barbara Corpus of Spoken American English Part IV

Catalog ID: LDC2005S25

SemEval-2016 Task 8 - Meaning Representation Parsing - Gold Standard AMRs

Catalog ID: LDC2016E33

Spanish Discussion Forum Source Data R1

Catalog ID: LDC2014E14

Spanish Language News Corpus

Catalog ID: 1017

Spanish Newswire Text, Volume 2

Catalog ID: LDC99T41

Speech in Noisy Environments (SPINE) Evaluation Audio

Catalog ID: LDC2000S96

Speech in Noisy Environments (SPINE) Evaluation Transcripts

Catalog ID: LDC2000T54

Speech in Noisy Environments (SPINE) Training Audio

Catalog ID: LDC2000S87

Speech in Noisy Environments (SPINE) Training Transcripts

Catalog ID: LDC2000T49

Speech in Noisy Environments (SPINE2) Part 1 Audio

Catalog ID: LDC2001S04

Speech in Noisy Environments (SPINE2) Part 1 Transcripts

Catalog ID: LDC2001T05

Speech in Noisy Environments (SPINE2) Part 2 Audio

Catalog ID: LDC2001S06

Speech in Noisy Environments (SPINE2) Part 2 Transcripts

Catalog ID: LDC2001T07

Speech in Noisy Environments (SPINE2) Part 3 Audio

Catalog ID: LDC2001S08

Speech in Noisy Environments (SPINE2) Part 3 Transcripts

Catalog ID: LDC2001T09

Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio

Catalog ID: LDC2001S99

Switchboard Cellular Part 1 Transcription

Catalog ID: LDC2001T14

Switchboard-1 Release 2

Catalog ID: LDC97S62

Switchboard-2 Phase I

Catalog ID: LDC98S75

Switchboard-2 Phase II

Catalog ID: LDC99S79

Switchboard-2 Phase III Audio

Catalog ID: LDC2002S06

Syllable-Final /s/ Lenition

Catalog ID: LDC2001T60

TAC 2009 KBP Assessment Results

Catalog ID: LDC2009E90

TAC 2009 KBP Evaluation Generic Infoboxes V2.0

Catalog ID: LDC2009E56

TAC 2009 KBP Evaluation NIL Link Assessment

Catalog ID: LDC2009E110

TAC 2009 KBP Evaluation Reference Knowledge Base

Catalog ID: LDC2009E58A

TAC 2009 KBP Evaluation Reference Knowledge Base

Catalog ID: LDC2009E58C

TAC 2009 KBP Evaluation Reference Knowledge Base

Catalog ID: LDC2009E58B

TAC 2009 KBP Evaluation Slot Filling List

Catalog ID: LDC2009E65

TAC 2010 KBP Assessment Results

Catalog ID: LDC2010E61

TAC 2010 KBP Entity Linking IAA Study Results

Catalog ID: LDC2012E31

TAC 2010 KBP Evaluation Entity Linking Gold Standard V1.0

Catalog ID: LDC2010E82

TAC 2010 KBP Evaluation Slot Filling Annotation

Catalog ID: LDC2012E32

TAC 2010 KBP Evaluation Surprise Slot Filling Annotation

Catalog ID: LDC2012E33

TAC 2010 KBP Generic Infoboxes

Catalog ID: LDC2010E24

TAC 2010 KBP Source Data

Catalog ID: LDC2010E12

TAC 2010 KBP Training Entity Linking V2.0

Catalog ID: LDC2010E31

TAC 2010 KBP Training Slot Filling Annotation V2.1

Catalog ID: LDC2010E18

TAC 2010 RTE-6 KBP Validation Pilot Development Data

Catalog ID: LDC2010E32

TAC 2011 Guided Summarization Test Data

Catalog ID: LDC2011E28

TAC 2011 Guided Summarization Test Data V1.1

Catalog ID: LDC2011E62

TAC 2011 KBP English Evaluation Diagnostic Temporal Slot Filling Queries

Catalog ID: LDC2011E85

TAC 2011 KBP English Evaluation Entity Linking Annotation

Catalog ID: LDC2012E29

TAC 2011 KBP English Evaluation Entity Linking Queries

Catalog ID: LDC2012E36

TAC 2011 KBP English Evaluation Regular Slot Filling Annotation V1.2

Catalog ID: LDC2011E89

TAC 2011 KBP English Evaluation Regular Slot Filling Queries

Catalog ID: LDC2012E37

TAC 2011 KBP English Evaluation Temporal Slot Filling Annotation

Catalog ID: LDC2012E38

TAC 2011 KBP English Evaluation Temporal Slot Filling Queries

Catalog ID: LDC2012E39

TAC 2011 KBP English Regular Slot Filling Assessment Results

Catalog ID: LDC2011E88

TAC 2011 KBP English Sample Temporal Slot Filling Annotation V1.2

Catalog ID: LDC2011E47

TAC 2011 KBP English Temporal Slot Filling Assessment Results

Catalog ID: LDC2013E65

TAC 2011 KBP English Training Regular Slot Filling Annotation

Catalog ID: LDC2011E48

TAC 2011 KBP English Training Temporal Slot Filling Annotation

Catalog ID: LDC2011E49

TAC 2011 RTE-7 KBP Validation Development Data

Catalog ID: LDC2011E29

TAC 2011 RTE-7 KBP Validation Test Data

Catalog ID: LDC2011E30

TAC 2012 KBP English Regular Slot Filling Evaluation Annotations

Catalog ID: LDC2012E91

TAC 2013 KBP English Entity Linking Evaluation Queries and Knowledge Base Links V1.1

Catalog ID: LDC2013E90

TAC 2013 KBP English Regular Slot Filling Assessment Results

Catalog ID: LDC2013E91

TAC 2013 KBP English Regular Slot Filling Evaluation Queries and Annotations V1.1

Catalog ID: LDC2013E77

TAC 2013 KBP English Regular Slot Filling per:title Training Data

Catalog ID: LDC2013E60

TAC 2013 KBP English Temporal Slot Filling Assessment Results

Catalog ID: LDC2013E99

TAC 2013 KBP English Temporal Slot Filling Evaluation Queries and Annotations V1.1

Catalog ID: LDC2013E86

TAC 2013 KBP English Temporal Slot Filling Training Queries and Annotations

Catalog ID: LDC2013E82

TAC 2013 KBP Source Corpus

Catalog ID: LDC2013E45

TAC 2014 KBP English Entity Linking Training AMR Queries and KB Links V1.1

Catalog ID: LDC2014E15

TAC 2014 KBP English Event Argument Extraction Evaluation Assessment Results V2.0

Catalog ID: LDC2014E88

TAC 2014 KBP English Event Argument Extraction Evaluation Source Corpus V1.1

Catalog ID: LDC2014R43

TAC 2014 KBP English Source Corpus

Catalog ID: LDC2014E13

TAC 2014 KBP Event Argument Extraction Pilot Assessment Results V1.1

Catalog ID: LDC2014E40

TAC 2014 KBP Event Argument Extraction Pilot Source Corpus V1.1

Catalog ID: LDC2014E20

TAC KBP 2009 Evaluation Entity Linking List

Catalog ID: LDC2009E64

TAC KBP 2016 Belief and Sentiment Evaluation Gold Standard Annotation (Versions 1 and 2)

Catalog ID: LDC2016E114

TAC KBP Evaluation Surprise Slot Filling Queries

Catalog ID: LDC2010E53

TAC KBP Gold Standard Entity Linking Entity Type List

Catalog ID: LDC2009E86

TAC KBP Training Surprise Slot Filling Annotation

Catalog ID: LDC2010E52

TDT2 Careful Transcription Audio

Catalog ID: LDC2000S92

TDT2 Careful Transcription Text

Catalog ID: LDC2000T44

TDT2 English Text

Catalog ID: LDC99T35

TDT2 Mandarin Audio Corpus

Catalog ID: LDC2001S93

TDT2 Multilanguage Text Version 4.0

Catalog ID: LDC2001T57

TDT2 Text Data and Tables

Catalog ID: 1019

TDT3 Multilanguage Text Version 2.0

Catalog ID: LDC2001T58

TERN 2004 Training Data V1.3

Catalog ID: LDC2004E23

TI 46-Word

Catalog ID: LDC93S9

TI 46-word

Catalog ID: 1004

TIDES Extraction ACE 2004 Training Data V1.4

Catalog ID: LDC2004E17

TIMIT Acoustic-Phonetic Continuous Speech Corpus

Catalog ID: LDC93S1

TIPSTER Complete

Catalog ID: LDC93T3A

TREC Mandarin

Catalog ID: LDC2000T52

TREC Spanish

Catalog ID: LDC2000T51

Tactical Speaker Identification Speech Corpus (TSID)

Catalog ID: LDC99S83

Taiwanese Putonghua

Catalog ID: LDC98S72

Talkbank Switchboard corpus

Catalog ID: 1018

The 2012 IBM Egyptian Arabic Corpus

Catalog ID: LDC2012E77

The AQUAINT Corpus of English News Text

Catalog ID: LDC2002T31

The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English

Catalog ID: other_1

The Enron Sent Corpus v1.0

Catalog ID: other_2

The George Bushotter Lakhota Text collection

Catalog ID: Other_5

The IViE corpus

Catalog ID: other_3

The New York Times Annotated Corpus

Catalog ID: LDC2008T19

TimeBank 1.2

Catalog ID: LDC2006T08

Tipster

Catalog ID: 1020

Translanguage English Database (TED) Speech

Catalog ID: LDC2002S04

Translanguage English Database (TED) Transcripts

Catalog ID: LDC2002T03

Treebank-2

Catalog ID: LDC95T7

Treebank-3

Catalog ID: LDC99T42

USC Marketplace Broadcast News Speech

Catalog ID: LDC99S82

USC Marketplace Broadcast News Transcripts

Catalog ID: LDC99T36

Uzbek Incident Language Pack

Catalog ID: LDC2015E89

VAHA (POLYPHONE II)

Catalog ID: LDC96S41

Voice of America (VOA) Czech Broadcast News Audio

Catalog ID: LDC2000S89

Voice of America (VOA) Czech Broadcast News Transcripts

Catalog ID: LDC2000T53

Voicemail Corpus Part II

Catalog ID: LDC2002S35

WordNet 1.5

Catalog ID: Other_6

Zurich BNC web

Catalog ID: 1000.5

Bilingual data extracted from three Creative Commons (CC BY-SA) sources

Catalog ID: LDC2012E79