All Corpora
To access the discs in the LDC library, contact Michael Ginn.
You need a verbs account to access the corpora stored on the verbs server.
| Corpus Name | Language | Catalog ID |
|---|---|---|
| 1996 English Broadcast News Speech (HUB4) | English | LDC97S44 |
| 1996 English Broadcast News Transcripts (HUB4) | English | LDC97T22 |
| 1996-2008 NIST Speaker Recognition Evaluation Data Collection | English | LDC2009E100 |
| 1997 English Broadcast News Transcripts (HUB4) | English | LDC98T28 |
| 1997 HUB4 Broadcast News Evaluation Non-English Test Material | Spanish, Mandarin Chinese | LDC2001S91 |
| 1997 HUB4 English Evaluation Speech and Transcripts | English | LDC2002S11 |
| 1997 HUB5 Arabic Evaluation | Egyptian Arabic | LDC2002S22 |
| 1997 HUB5 Arabic Transcripts | Egyptian Arabic | LDC2002T39 |
| 1997 HUB5 English Evaluation | English | LDC2002S23 |
| 1997 HUB5 German Evaluation | German | LDC2002S24 |
| 1997 HUB5 German Transcripts | German | LDC2003T03 |
| 1997 HUB5 Spanish Evaluation | Spanish | LDC2002S25 |
| 1997 HUB5 Spanish Transcripts | Spanish | LDC2003T04 |
| 1997 Mandarin Broadcast News Speech (HUB4-NE) | Mandarin Chinese | LDC98S73 |
| 1997 Spanish Broadcast News Transcripts (HUB4-NE) | Spanish | LDC98T29 |
| 1998 HUB4 Broadcast News Evaluation English Test Material | English | LDC2000S86 |
| 1998 HUB5 English Evaluation | English | LDC2002S10 |
| 1998 HUB5 English Transcripts | English | LDC2003T02 |
| 1999 HUB4 Broadcast News Evaluation English Test Material | English | LDC2000S88 |
| 2000 Communicator Evaluation | English | LDC2002S56 |
| 2000 HUB5 English Evaluation Speech | English | LDC2002S09 |
| 2000 HUB5 English Evaluation Transcripts | English | LDC2002T43 |
| 2000 NIST Speaker Recognition Evaluation | English | LDC2001S97 |
| 2001 Communicator Evaluation | English | LDC2003S01 |
| 2001 HUB5 English Evaluation | English | LDC2002S13 |
| 2001 HUB5 Mandarin Evaluation | Mandarin Chinese | LDC2002S12 |
| 2001 HUB5 Mandarin Transcripts | Mandarin Chinese | LDC2003T01 |
| 2001 NIST Speaker Recognition Evaluation Corpus | English | LDC2002S34 |
| 2002 Rich Transcription Broadcast News and Conversational Telephone Speech | English | LDC2004S11 |
| 2009 CoNLL Shared Task Part 1 | Catalan, Czech, German, Spanish | LDC2012T03 |
| 2009 CoNLL Shared Task Part 2 | English, Mandarin Chinese, Chinese | LDC2012T04 |
| 8 years worth of summary/article sets collected via Newsblaster | | LDC2012E80 |
| ACE 2004 Evaluation Corpus | English, Chinook jargon, Baharna Arabic, Chinese, Arabic | LDC2004E51 |
| ACE 2004 Multilingual Training Corpus | English, Standard Arabic, Mandarin Chinese | LDC2005T09 |
| ACE 2004 Pilot Corpus V1.3 | Baharna Arabic, Chinook jargon, English, Arabic, Chinese | LDC2004E03 |
| ACE 2005 Multilingual Training Data V6.0 | English, Chinook jargon, Baharna Arabic, Chinese, Arabic | LDC2005E18 |
| ACE-2 Version 1.0 | English | LDC2003T11 |
| ACL Multilingual Corpus 1 | | 1006 |
| AIDA 1.2 : Automatic Identification of Dialectal Arabic | Arabic | LDC2012E56 |
| AQUAINT CrossLingual QA Arabic Newswire Corpus | Baharna Arabic, English, Arabic | LDC2004E49 |
| ATIS3 Test Data | English | LDC95S26 |
| ATIS3 Training Data | English | LDC94S19 |
| Abstract Meaning Representation (AMR) Annotation Release 1.0 | English | LDC2014T12 |
| American English Nickname Collection | English | LDC2012T11 |
| American English Spoken Lexicon | English | LDC99L23 |
| American National Corpus (ANC) Second Release | English | LDC2005T35 |
| Annotated English Gigaword | English | LDC2012T21 |
| Arabic Gigaword Third Edition | Standard Arabic | LDC2007T40 |
| Arabic Newswire Part 1 | Standard Arabic | LDC2001T55 |
| Arabic Treebank - Broadcast News v1.0 | Standard Arabic, Arabic | LDC2012T07 |
| Arabic Treebank ARZ Part 1, V1.0 | Egyptian Arabic | LDC2012E28 |
| Arabic Treebank Part 20 V1.0 - BOLT Pilot ARZ Email | Arabic | LDC2012E25 |
| Arabic Treebank: Part 1 - 10K-word English Translation | Standard Arabic | LDC2003T07 |
| Arabic Treebank: Part 1 v 2.0 | Standard Arabic | LDC2003T06 |
| Arabic Treebank: Part 3 v 3.2 | Standard Arabic, Arabic | LDC2010T08 |
| BBN Pronoun Coreference and Entity Type Corpus | English | LDC2005T33 |
| BBN/LDC WebForum Selections Arabic/English Parallel Corpus | Arabic/English (Parallel) | LDC2012E75 |
| BBN/LDC WebForum Selections Chinese/English Parallel Corpus | Chinese/English (Parallel) | LDC2012E76 |
| BBN/LDC/Sakhr Arabic-Dialect/English Parallel Corpus | Sakhr Arabic-Dialect/English (Parallel) | LDC2012E17 |
| BLLIP 1987-89 WSJ Corpus Release 1 | English | LDC2000T43 |
| BOLT - Phase 1 Discussion Forums Source Data R1 V2 | English, Egyptian Arabic, Chinese | LDC2012E04 |
| BOLT - Phase 1 Discussion Forums Source Data R2 | Chinese, Egyptian Arabic, English | LDC2012E16 |
| BOLT - Phase 1 Discussion Forums Source Data R3 | Chinese, Egyptian Arabic, English | LDC2012E21 |
| BOLT - Phase 1 Rejected Training Data Thread IDs | | LDC2012E62 |
| BOLT - Phase 1 Translation Samples V2 | | LDC2012E11 |
| BOLT LRL Hausa Representative Language Pack V1.2 | Hausa | LDC2015E70 |
| BOLT LRL Turkish Representative Language Pack V2.2 | Turkish | LDC2014E115 |
| BOLT LRL Uzbek Representative Language Pack | Uzbek | LDC2016E29 |
| BOLT Phase 1 - Arabic Treebank ARZ Part 2, V1.0 | Egyptian Arabic | LDC2012E88 |
| BOLT Phase 1 - Chinese Parallel Word Alignment and Tagging Part 3 | Chinese | LDC2012E95 |
| BOLT Phase 1 - English Treebank BOLT WB Part 2, V 1.0 | English | LDC2012E97 |
| BOLT Phase 1 Chinese Parallel Word Alignment and Tagging DF Part 4 | Chinese | LDC2013E02 |
| BOLT Phase 1 Chinese Parallel Word Alignment and Tagging Part 1 | Chinese | LDC2012E24 |
| BOLT Phase 1 Chinese Parallel Word Alignment and Tagging Part 2 | Chinese | LDC2012E72 |
| BOLT Phase 1 Chinese Propbank DF Part 1 | Chinese | LDC2012E121 |
| BOLT Phase 1 Chinese Propbank DF Part 2 | Chinese | LDC2012E131 |
| BOLT Phase 1 Chinese Treebank DF Part 1 | Chinese | LDC2012E109 |
| BOLT Phase 1 Chinese Treebank DF Part 2 | Chinese | LDC2012E120 |
| BOLT Phase 1 Chinese Treebank DF Part 3 | Chinese | LDC2012E130 |
| BOLT Phase 1 DevTest Source and Translation V4 | Arabic/Chinese/English | LDC2012E30 |
| BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF | Egyptian Arabic | LDC2013E01 |
| BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF Part 2 v2 | Egyptian Arabic | LDC2012E94 |
| BOLT Phase 1 Egyptian Arabic Parallel Word Alignment Part 1 V2 | Egyptian Arabic | LDC2012E51 |
| BOLT Phase 1 Egyptian Arabic Propbank DF Part 1 | Egyptian Arabic | LDC2012E122 |
| BOLT Phase 1 Egyptian Arabic Propbank DF Part 2 | Egyptian Arabic | LDC2012E129 |
| BOLT Phase 1 Egyptian Arabic Treebank DF Part 1 V2.0 | Egyptian Arabic | LDC2012E93 |
| BOLT Phase 1 Egyptian Arabic Treebank DF Part 2 V2.0 | Egyptian Arabic | LDC2012E98 |
| BOLT Phase 1 Egyptian Arabic Treebank DF Part 3 V2.0 | Egyptian Arabic | LDC2012E89 |
| BOLT Phase 1 Egyptian Arabic Treebank DF Part 4 V2.0 | Egyptian Arabic | LDC2012E99 |
| BOLT Phase 1 Egyptian Arabic Treebank DF Part 5 V2.0 | Egyptian Arabic | LDC2012E107 |
| BOLT Phase 1 Egyptian Arabic Treebank DF Part 6 V2.0 | Egyptian Arabic | LDC2012E125 |
| BOLT Phase 1 Egyptian Arabic Treebank DF Part 7 V1.0 | Egyptian Arabic | LDC2013E12 |
| BOLT Phase 1 English Propbank DF Part 1 | English | LDC2012E123 |
| BOLT Phase 1 English Propbank DF Part 2 | English | LDC2012E128 |
| BOLT Phase 1 English Propbank DF Part 3 | English | LDC2013E05 |
| BOLT Phase 1 English Treebank DF Part 1 V1.0 | English | LDC2012E92 |
| BOLT Phase 1 English Treebank DF Part 3 V1.0 | English | LDC2012E114 |
| BOLT Phase 1 English Treebank DF Part 4 V1.0 | English | LDC2013E17 |
| BOLT Phase 1 HTER Experiment Source and Reference Translation | Chinese-English, Arabic-English | LDC2012E18 |
| BOLT Phase 1 IR Eval Assessment Results V1.1 | | LDC2012E118 |
| BOLT Phase 1 IR Eval Source Data Document List | | LDC2012E82 |
| BOLT Phase 1 Translation Training Data R1 | Chinese-English, Arabic-English | LDC2012E15 |
| BOLT Phase 1 Translation Training Data R2 | Chinese-English, Arabic-English | LDC2012E19 |
| BOLT Phase 1 Translation Training Data R3 | Chinese-English, Arabic-English | LDC2012E55 |
| BOLT Phase 1 Translation Training Data R4 | Chinese-English, Arabic-English | LDC2012E81 |
| BOLT Phase 1 Translation Training Data R5 | Chinese-English, Arabic-English | LDC2012E96 |
| BOLT Phase 1 Translation Training Data R6 | Chinese-English, Arabic-English | LDC2012E124 |
| BOLT Phase 2 English Treebank SMS/Chat Part 1 | English | LDC2013E127 |
| BOLT Phase 2 IR Source Data Document List and Sample Query | English | LDC2013E08 |
| BOLT Phase 2 SMS and Chat Sample Source Data | Chinese, English, Egyptian Arabic | LDC2013E10 |
| Boston University Radio Speech Corpus | English | LDC96S36 |
| Boulder Coercion Corpus | | Other_8 |
| British National Corpus Parses and BNC | British English | 1000 |
| Brown Corpus (treebanked) | Standard American English | Other_7 |
| Buckwalter Arabic Morphological Analyzer | Standard Arabic, English | LDC2004L02 |
| CALIMA 0.3: Columbia Arabic Language Morphological Analyzer -- Egyptian Arabic | Egyptian Arabic | LDC2012E57 |
| CALLFRIEND American English-Non-Southern Dialect | English | LDC96S46 |
| CALLFRIEND American English-Southern Dialect | Southern American English | LDC96S47 |
| CALLFRIEND Canadian French | Canadian French | LDC96S48 |
| CALLFRIEND Farsi | Farsi, Persian | LDC96S50 |
| CALLFRIEND German | German | LDC96S51 |
| CALLFRIEND Hindi | Hindi | LDC96S52 |
| CALLFRIEND Japanese | Japanese | LDC96S53 |
| CALLFRIEND Korean | Korean | LDC96S54 |
| CALLFRIEND Mandarin Chinese-Mainland Dialect | Mandarin Chinese-Mainland Dialect | LDC96S55 |
| CALLFRIEND Mandarin Chinese-Taiwan Dialect | Mandarin Chinese-Taiwan Dialect | LDC96S56 |
| CALLFRIEND Spanish-Caribbean Dialect | Spanish | LDC96S57 |
| CALLFRIEND Tamil | Tamil | LDC96S59 |
| CALLFRIEND Vietnamese | Vietnamese | LDC96S60 |
| CALLHOME American English Lexicon (PRONLEX) | American English | LDC97L20 |
| CALLHOME American English Speech | American English | LDC97S42 |
| CALLHOME American English Transcripts | American English | LDC97T14 |
| CALLHOME Egyptian Arabic Speech Supplement | Egyptian Arabic | LDC2002S37 |
| CALLHOME Egyptian Arabic Transcripts | Egyptian Arabic | LDC97T19 |
| CALLHOME Egyptian Arabic Transcripts Supplement | Egyptian Arabic | LDC2002T38 |
| CALLHOME German Lexicon | German | LDC97L18 |
| CALLHOME German Speech | German | LDC97S43 |
| CALLHOME German Transcripts | German | LDC97T15 |
| CALLHOME Japanese Lexicon | Japanese | LDC96L17 |
| CALLHOME Japanese Speech | Japanese | LDC96S37 |
| CALLHOME Japanese Transcripts | Japanese | LDC96T18 |
| CALLHOME Mandarin Chinese Lexicon | Mandarin Chinese | LDC96L15 |
| CALLHOME Mandarin Chinese Speech | Mandarin Chinese | LDC96S34 |
| CALLHOME Mandarin Chinese Transcripts | Mandarin Chinese | LDC96T16 |
| CALLHOME Spanish Dialogue Act Annotation | Spanish | LDC2001T61 |
| CALLHOME Spanish Lexicon | Spanish | LDC96L16 |
| CALLHOME Spanish Speech | Spanish | LDC96S35 |
| CALLHOME Spanish Transcripts | Spanish | LDC96T17 |
| CELEX2 | English, German, Dutch | LDC96L14 |
| CETEMpublico | Portuguese | LDC2001T62 |
| CODAFY 0.1: Automatic mapper into the Conventional Orthography of Dialectal Arabic | Dialectal Arabic | LDC2012E58 |
| COMLEX English Syntax Lexicon | English | LDC96L6 |
| COMLEX Pronouncing Dictionary | English | LDC96L7 |
| COMLEX Syntax Text Corpus Version 2.0 | English | LDC96T11 |
| CSLU: Kids' Speech Version 1.1 | English | LDC2007S18 |
| CSLU: Spelled and Spoken Words | English | LDC2006S15 |
| CSLU: Spoltech Brazilian Portuguese Version 1.0 | Brazilian Portuguese | LDC2006S16 |
| CSLU: Stories v 1.2 | English | LDC2006S14 |
| CSR-I (WSJ0) Complete | English | LDC93S6A |
| CSR-IV HUB4 | English | LDC96S31 |
| Childes Corpus 1996 | | 1001 |
| Childes Corpus 1998 | | 1002 |
| Chinese <-> English Named Entity Lists v 1.0 | Mandarin Chinese-English | LDC2005T34 |
| Chinese English News Magazine Parallel Text | Chinese-English (Parallel) | LDC2005T10 |
| Chinese Gigaword | Mandarin Chinese | LDC2003T09 |
| Chinese Gigaword Fifth Edition | Mandarin Chinese | LDC2011T13 |
| Chinese Gigaword Second Edition | Mandarin Chinese | LDC2005T14 |
| Chinese Proposition Bank 2.0 | Mandarin Chinese | LDC2008T07 |
| Chinese Treebank 2.0 | Mandarin Chinese | LDC2001T11 |
| Chinese Treebank 4.0 | Mandarin Chinese | LDC2004T05 |
| Chinese Treebank 5.0 | Mandarin Chinese | LDC2005T01 |
| Chinese Treebank 5.1 | Mandarin Chinese | LDC2005T01U01 |
| Chinese Treebank 6.0 | Mandarin Chinese | LDC2007T36 |
| Chinese Treebank 7.0 | Mandarin Chinese | LDC2010T07 |
| Chinese Treebank 8.0 | Mandarin Chinese, Chinese | LDC2013T21 |
| Chinese Treebank Final Release | Mandarin Chinese | LDC2000T48 |
| Chinese idiom translation dictionary + word segmenter dictionary - web resources | Chinese | LDC2012E78 |
| Chinese-English Translation Lexicon Version 3.0 | English-Mandarin Chinese | LDC2002L27 |
| CoNLL 2008 Shared Task Development Set | English | LDC2008E33 |
| CoNLL 2008 Shared Task Test Set | English | LDC2008E34 |
| CoNLL 2008 Shared Task Training Set | English | LDC2008E32 |
| CoNLL 2008 Shared Task Trial Data Set | English | LDC2008E31 |
| CoNLL 2009 Shared Task Chinese Test Set | Chinese | LDC2009E37 |
| CoNLL 2009 Shared Task Chinese Training Set | Chinese | LDC2009E38 |
| CoNLL 2009 Shared Task Chinese Trial Data Set | Chinese | LDC2009E36D |
| CoRD \| The London-Lund Corpus of Spoken English | English | other_1234 |
| Corpus Search | | 1003 |
| DEFT ERE Cross-Doc Event Coreference Training Data Annotation | | LDC2017E24 |
| DEFT ERE English Discussion Forum Annotation V3 | English | LDC2014E31 |
| DEFT English Belief and Sentiment Annotation | English | LDC2016E27 |
| DEFT Event Sequencing After-Link And Parent-Child Annotation Training Data | English | LDC2016E130 |
| DEFT Event Sequencing Pilot Evaluation Source Data | English | LDC2017E08 |
| DEFT Phase 1 AMR Annotation R4 | English | LDC2014E41 |
| DEFT Phase 1 ERE Annotation R3 V2 | English | LDC2013E64 |
| DEFT Phase 1 Narrative Text Source Data R1 | English | LDC2013E19 |
| DEFT Phase 2 AMR Annotation R1 | English | LDC2015E86 |
| DEFT Phase 2 AMR Annotation R2 | English | LDC2016E25 |
| DEFT Phase 2 AMR Exploratory Source Data | English | LDC2014R46 |
| DEFT Phase 2 AMR Selected Segmented DF Source Data V2.0 | English | LDC2015R11 |
| DEFT Rich ERE English Training Annotation R2 V2 | English | LDC2015E68 |
| DSO Corpus of Sense-Tagged English | English | LDC97T12 |
| ECI Multilingual Text | Turkish, Swedish, Slovenian, Russian, Portuguese, Norwegian, Norwegian Bokmål, Norwegian Nynorsk, Lithuanian, Latin, Japanese, Scottish Gaelic, French, Estonian, English, Modern Greek (1453-), German, Danish, Bulgarian, Tosk Albanian, Standard Malay, Spanish, Serbian, Northern Uzbek, Mandarin Chinese, Italian, Dutch, Czech, Croatian, Albanian | LDC94T5 |
| Emotional Prosody Speech and Transcripts | English | LDC2002S28 |
| English Gigaword | English | LDC2003T05 |
| English Gigaword Fifth Edition | English | LDC2011T07 |
| English Gigaword Second Edition | English | LDC2005T12 |
| English News Text Treebank: Penn Treebank Revised | English | LDC2015T13 |
| English Translation Treebank: An-Nahar Newswire | English | LDC2012T02 |
| English Web Treebank | English | LDC2012T13 |
| Entropic Speech Technology | | 1005 |
| European Language Newspaper Text | Portuguese, French, German | LDC95T11 |
| FactBank 1.0 | English | LDC2009T23 |
| Fisher English Training Part 2, Speech | English | LDC2005S13 |
| Fisher English Training Part 2, Transcripts | English | LDC2005T19 |
| Fisher English Training Speech Part 1 Speech | English | LDC2004S13 |
| Fisher English Training Speech Part 1 Transcripts | English | LDC2004T19 |
| GALE Arabic-English Parallel Aligned Treebank -- Newswire | Arabic-English (Parallel) | LDC2013T10 |
| GALE Kickoff Release - Arabic Names Extracted from ACE V1.0 | Arabic | LDC2005E66 |
| GALE Kickoff Release - Arabic Names Extracted from ATB V1.0 | Arabic | LDC2005E68 |
| GALE Kickoff Release - Broadcast Conversation Audio V1.0 | Baharna Arabic, Chinese, Arabic | LDC2005E61 |
| GALE Kickoff Release - Broadcast Conversation Transcripts V1.0 | Baharna Arabic, Chinook jargon, Chinese, Arabic | LDC2005E63 |
| GALE Kickoff Release - Broadcast News Audio V1.0 | Arabic, Chinese | LDC2005E62 |
| GALE Kickoff Release - English-Arabic Parallel Treebank V1.0 | English-Arabic (Parallel) | LDC2005E69 |
| GALE Kickoff Release - VOA Arabic Broadcast News Audio | Arabic | LDC2005E60 |
| GALE Kickoff Release - VOA Arabic Broadcast News Transcripts | Arabic | LDC2005E71 |
| GALE Kickoff Release 2 - English CTS Treebank with Structural Metadata | English | LDC2005E79 |
| GALE Kickoff Release 2 -- Levantine Arabic CTS Audio | South Levantine Arabic, North Levantine Arabic | LDC2005E76 |
| GALE Kickoff Release 2 -- Levantine Arabic CTS Transcripts | South Levantine Arabic, North Levantine Arabic | LDC2005E77 |
| GALE Kickoff Release 2 -- Levantine Arabic CTS Treebank | South Levantine Arabic, North Levantine Arabic | LDC2005E78 |
| GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 1 | English, Mandarin Chinese (Parallel) | LDC2009T02 |
| GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 2 | English, Mandarin Chinese (Parallel) | LDC2009T06 |
| GALE Phase 1 Chinese Broadcast News Parallel Text - Part 1 | English, Mandarin Chinese (Parallel) | LDC2007T23 |
| GALE Phase 1 Chinese Broadcast News Parallel Text - Part 2 | English, Mandarin Chinese (Parallel) | LDC2008T08 |
| GALE Phase 1 Chinese Broadcast News Parallel Text - Part 3 | English, Mandarin Chinese (Parallel) | LDC2008T18 |
| GALE Phase 1 Chinese Newsgroup Parallel Text - Part 1 | English-Mandarin Chinese (Parallel) | LDC2009T15 |
| GALE Phase 1 Chinese Newsgroup Parallel Text - Part 2 | English-Mandarin Chinese (Parallel) | LDC2010T03 |
| GALE Phase 2 Distillation - Training V5.0 | Baharna Arabic, Chinook jargon, English, Arabic, Chinese | LDC2007E13 |
| GALE Phase 2 Release 1 - Transcripts | Chinook jargon, Baharna Arabic, English, Chinese, Arabic | LDC2007E05 |
| GALE Phase 2 Release 1 - Translations | English, Chinook jargon, Baharna Arabic, Chinese, Arabic | LDC2007E06 |
| GALE Phase 2 Release 1 - Web Text | Arabic, Chinese, English | LDC2007E04 |
| GALE Phase 2 Release 2 - Transcripts | Baharna Arabic, Chinook jargon, English, Chinese, Arabic | LDC2007E45 |
| GALE Phase 2 Release 2 - Translations | Chinook jargon, Baharna Arabic, Chinese, Arabic | LDC2007E46 |
| GALE Phase 2 Release 3 - Transcripts | Baharna Arabic, Chinook jargon, Chinese, Arabic | LDC2007E86 |
| GALE Phase 2 Release 3 - Translations | Chinook jargon, Baharna Arabic, Chinese, Arabic | LDC2007E87 |
| GALE Phase 3 - MTPlus Pilot | | LDC2008E42 |
| GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 1 | Mandarin Chinese, Chinese | LDC2014T28 |
| GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 2 | Mandarin Chinese, Chinese | LDC2015T09 |
| GALE Phase 3 DevTest - Broadcast Audio | | LDC2007E60 |
| GALE Phase 3 Release 1 - Distillation V1.1 | English, Chinese, Arabic | LDC2007E104 |
| GALE Phase 3 Release 1 - English Translation Treebank | English, Baharna Arabic, Arabic | LDC2007E105 |
| GALE Phase 3 Release 1 - Found Parallel Text | English, Chinese, Arabic (Parallel) | LDC2007E103 |
| GALE Phase 3 Release 1 - Transcripts | English, Chinese, Arabic | LDC2007E100 |
| GALE Phase 3 Release 1 - Translations | Arabic, Chinese, English | LDC2007E101 |
| GALE Phase 3 Release 1 - Web Text V 1.0 | English, Chinook jargon, Baharna Arabic, Arabic, Chinese | LDC2007E102 |
| GALE Phase 3 Release 2 - Broadcast Audio | English, Chinese, Arabic | LDC2008E38 |
| GALE Phase 3 Release 2 - Transcripts | | LDC2008E39 |
| GALE Phase 3 Release 2 - Translations | | LDC2008E40 |
| GALE Phase 3 Release 2 - Web Text | | LDC2008E41 |
| GALE Phase 3 and 4 Eval Superset | Arabic, Chinese | LDC2011E50 |
| GALE Phase 4 Arabic Parallel Aligned Treebank Part 1 V1.2 | Arabic-English (Parallel) | LDC2009E82 |
| GALE Phase 4 Chinese Parallel Word Alignment and Tagging Part 1 V1.1 | Chinese-English (Parallel) | LDC2009E83 |
| GALE Phase 4 Release 1 - Transcripts V1.0 | English | LDC2008E55 |
| GALE Phase 4 Release 1 - Translations V2.0 | Arabic and Chinese - English (Parallel) | LDC2008E56 |
| GALE Phase 4 Release 1 - Web Text V1.0 | | LDC2008E53 |
| GALE Phase 4 Release 2 - Transcripts | Arabic, Chinese, English | LDC2009E15 |
| GALE Phase 4 Release 2 - Translations | Arabic and Chinese - English (Parallel) | LDC2009E16 |
| GALE Phase 4 Release 2 - Web Text | Arabic, Chinese, English | LDC2009E14 |
| GALE Phase 4 Release 3 - Found Parallel Text | Arabic-English, Chinese-English (Parallel) | LDC2009E105 |
| GALE Phase 4 Release 3 - Transcripts | Arabic, Chinese, English | LDC2009E94 |
| GALE Phase 4 Release 3 - Translations V1.2 | Arabic and Chinese - English (Parallel) | LDC2009E95 |
| GALE Phase 4 Release 3 - Web Text | Arabic, Chinese, English | LDC2009E93 |
| GALE Phase 5 Eval Source Transcripts and Translation | Arabic, Chinese | LDC2011E21 |
| GALE Phase 5 Eval Superset Source Transcripts and Translation | Arabic, Chinese | LDC2011E25 |
| GALE Phase 5 Levantine Arabic Dialect Judgments and Translations | Levantine Arabic-English (Parallel) | LDC2010E79 |
| GALE Y1 - Arabic English Parallel News Text | English, Baharna Arabic, Arabic (Parallel) | LDC2006E25 |
| GALE Y1 - BBN Iraqi Broadcast Conversation Corpus | Iraqi Arabic | LDC2006G07 |
| GALE Y1 - Distillation Blind Evaluation Audio Part A | English | LDC2006E46_A |
| GALE Y1 - Distillation Blind Evaluation Audio Part B | English | LDC2006E46_B |
| GALE Y1 - Distillation Blind Evaluation Audio Part C | English | LDC2006E46_C |
| GALE Y1 - Distillation Blind Evaluation Audio Part D | English | LDC2006E46_D |
| GALE Y1 - Distillation Blind Evaluation Audio Part E | English | LDC2006E46_E |
| GALE Y1 - Distillation Blind Evaluation Newswire | English | LDC2006E45 |
| GALE Y1 - Distillation Evaluation Audio | English | LDC2006E21 |
| GALE Y1 - Distillation Evaluation Newswire | Baharna Arabic, Chinook jargon, English, Chinese, Arabic | LDC2006E22 |
| GALE Y1 - English Chinese Parallel Financial News | Chinook jargon, English, Chinese (Parallel) | LDC2006E26 |
| GALE Y1 - Interim Release: Transcripts | Baharna Arabic, Chinook jargon, English, Arabic, Chinese | LDC2006E23 |
| GALE Y1 - Interim Release: Translations | Chinook jargon, Baharna Arabic, Chinese, Arabic - English (Parallel) | LDC2006E24 |
| GALE Y1 - Web 1T 5-gram Version 1 | English | LDC2006E88 |
| GALE Y1 Q1 Release - Arabic Treebank v 1.0 | Arabic | LDC2005E84 |
| GALE Y1 Q1 Release - English Translation Treebank v 1.0 | Arabic-English (Parallel) | LDC2005E85 |
| GALE Y1 Q1 Release - Transcripts V1.0 | Baharna Arabic, Chinook jargon, English, Arabic, Chinese | LDC2005E82 |
| GALE Y1 Q1 Release - Translations V1.0 | Arabic and Chinese - English (Parallel) | LDC2005E83 |
| GALE Y1 Q1 Release - Web Text Collection V1.0 | Chinese, Arabic, English | LDC2005E81 |
| GALE Y1 Q2 Release - Arabic Treebank v 1.0 | Arabic | LDC2006E35 |
| GALE Y1 Q2 Release - English Translation Treebank v 1.0 | Arabic-English (Parallel) | LDC2006E36 |
| GALE Y1 Q2 Release - Transcripts V1.0 | Baharna Arabic, Chinook jargon, English, Arabic, Chinese | LDC2006E33 |
| GALE Y1 Q2 Release - Translations V2.0 | Baharna Arabic, Chinook jargon, Arabic, Chinese; into English | LDC2006E34 |
| GALE Y1 Q2 Release - Web Text Collection V1.0 | Arabic, Chinese, English | LDC2006E32 |
| GALE Y1 Q3 Release - Arabic Treebank | Arabic | LDC2006E87 |
| GALE Y1 Q3 Release - English Translation Treebank | Arabic-English (Parallel) | LDC2006E82 |
| GALE Y1 Q3 Release - Transcripts | English, Chinook jargon, Baharna Arabic, Arabic, Chinese | LDC2006E84 |
| GALE Y1 Q3 Release - Translations | Baharna Arabic, Chinook jargon, Arabic, Chinese; into English | LDC2006E85 |
| GALE Y1 Q3 Release - Web Text Collection | | LDC2006E77 |
| GALE Y1 Q3 Release - Word Alignment | Baharna Arabic, Chinook jargon, Arabic, Chinese; into English | LDC2006E86 |
| GALE Y1 Q4 Release - Arabic Treebank | Arabic | LDC2006E94 |
| GALE Y1 Q4 Release - English Translation Treebank | Arabic-English (Parallel) | LDC2006E95 |
| GALE Y1 Q4 Release - Transcripts | English, Chinook jargon, Baharna Arabic, Chinese, Arabic | LDC2006E91 |
| GALE Y1 Q4 Release - Translations | Arabic and Chinese - English (Parallel) | LDC2006E92 |
| GALE Y1 Q4 Release - Web Text Collection | | LDC2006E90 |
| GALE Y1 Q4 Release - Word Alignment | Arabic, Chinese, English (Parallel) | LDC2006E93 |
| Gigaword English Automatic Parses | | Other_9 |
| Google Question Bank Update-v1.0 | English | LDC2012R121 |
| Google Treebank Weblog Subcorpus V2.0 | English | LDC2011E71 |
| Grassfields Bantu Fieldwork: Ngomba Tone Paradigms | Ngomba | LDC2001S16 |
| HUB4 Radio Broadcast News | | 1014 |
| HUB5 Spanish Telephone Speech Corpus | Spanish | LDC98S70 |
| Hansard French/English | English - Canadian French (Parallel) | LDC95T20 |
| Hong Kong Hansards Parallel Text | English, Chinese (Parallel) | LDC2000T50 |
| Hong Kong Laws Parallel Text | English, Chinese | LDC2000T47 |
| Hong Kong News Parallel Text | English, Chinese (Parallel) | LDC2000T46 |
| Hong Kong Parallel Text | English, Chinese (Parallel) | LDC2004T08 |
| ICSI Meeting Speech | English | LDC2004S02 |
| ICSI Meeting Transcripts | English | LDC2004T04 |
| ISCA 1 and 3 | | 1007 |
| ISCA Tutorial | | 1008 |
| ISL Meeting Speech Part 1 | English | LDC2004S05 |
| ISL Meeting Transcripts Part 1 | English | LDC2004T10 |
| JURIS | English | LDC98T32 |
| Japanese Business News Text | Japanese | LDC95T8 |
| Japanese Business News Text Supplement | Japanese | LDC99T34 |
| Korean English Treebank Annotations | Korean, English (Parallel) | LDC2002T26 |
| Korean Newswire | Korean | LDC2000T45 |
| Korean Propbank | Korean | LDC2006T03 |
| Korean Telephone Conversations Lexicon | Korean | LDC2003L02 |
| Korean Telephone Conversations Speech | Korean | LDC2003S03 |
| Korean Telephone Conversations Transcripts | Korean | LDC2003T08 |
| Korean Treebank Annotations Version 2.0 | Korean | LDC2006T09 |
| LCTL Urdu | Urdu | LDC2006E110 |
| LLHDB | English | LDC98S68 |
| LORELEI Akan Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Akan | LDC2018E07 |
| LORELEI Amharic Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Amharic | LDC2016E87 |
| LORELEI Arabic Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Arabic | LDC2016E89 |
| LORELEI Bengali Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Bengali | LDC2017E60 |
| LORELEI Farsi Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Farsi | LDC2016E93 |
| LORELEI Hindi Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Hindi | LDC2017E62 |
| LORELEI Hungarian Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Hungarian | LDC2016E98 |
| LORELEI Indonesian Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1 | Indonesian | LDC2017E66 |
| LORELEI Language Independent NLP Tools | | LDC2016E53 |
| LORELEI Mandarin Incident Language Pack V2 | Mandarin Chinese | LDC2016E30 |
| LORELEI Mandarin Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Mandarin Chinese | LDC2016E101 |
| LORELEI Russian Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Russian | LDC2016E95 |
| LORELEI Situation Frame Exercise Annotation | English | LDC2017E07 |
| LORELEI Somali Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Somali | LDC2016E91 |
| LORELEI Spanish Representative Language Pack Translation, Annotation, Grammar, Lexicon and Tools V1. | Spanish | LDC2016E97 |
| LORELEI Swahili Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Swahili | LDC2017E64 |
| LORELEI Tagalog Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Tagalog | LDC2017E68 |
| LORELEI Tamil Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Tamil | LDC2017E70 |
| LORELEI Thai Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Thai | LDC2018E03 |
| LORELEI Vietnamese Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1 | Vietnamese | LDC2016E103 |
| LORELEI Wolof Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Wolof | LDC2018E09 |
| LORELEI Year 1 Dry Run Evaluation IL2 V1.1 | English | LDC2016E56 |
| LORELEI Yoruba Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Yoruba | LDC2016E105 |
| LORELEI Zulu Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Zulu | LDC2018E05 |
| Levantine Arabic QT Training Data Set 4 (Speech + Transcripts) | North Levantine Arabic, South Levantine Arabic | LDC2005S14 |
| MADA-ARZ 0.1: Morphological Analysis and Disambiguation for Arabic (Egyptian version) | Egyptian Arabic | LDC2012E60 |
| MRC Psycholinguistic Database Machine Usable Dictionary | | other_4 |
| Mandarin Chinese News Text | Mandarin Chinese | LDC95T13 |
| Matlab | | 1009 |
| Message Understanding Conference (MUC) 6 | English | LDC2003T13 |
| Message Understanding Conference (MUC) 7 | English | LDC2001T02 |
| Multiple-Translation Arabic (MTA) Part 1 | English, Standard Arabic (Parallel) | LDC2003T18 |
| Multiple-Translation Arabic (MTA) Part 2 | English, Standard Arabic (Parallel) | LDC2005T05 |
| Multiple-Translation Chinese (MTC) Part 2 | English, Mandarin Chinese (Parallel) | LDC2003T17 |
| Multiple-Translation Chinese (MTC) Part 3 | English, Mandarin Chinese (Parallel) | LDC2004T07 |
| Multiple-Translation Chinese (MTC) Part 4 | English, Mandarin Chinese (Parallel) | LDC2006T04 |
| Multiple-Translation Chinese Corpus | English, Mandarin Chinese (Parallel) | LDC2002T01 |
| NIST 2009 Open Machine Translation (OpenMT) Evaluation | Urdu and Arabic - English (Parallel) | LDC2010T23 |
| NIST 2012 Open Machine Translation (OpenMT) Progress Test Five Language Source | Dari, Korean, Persian, Farsi, English, Mandarin Chinese, Arabic, Iranian Persian, Chinese (Parallel) | LDC2014T02 |
| NIST Meeting Pilot Corpus Speech | English | LDC2004S09 |
| NIST Meeting Pilot Corpus Transcripts and Metadata | English | LDC2004T13 |
| NIST Open MT 2008 Evaluation (MT08) Selected References and System Translations | Urdu, Mandarin Chinese, Standard Arabic, English, Chinese, Arabic (Parallel) | LDC2010T01 |
| NLTK | | 1010 |
| NTIMIT | | 1011 |
| NomBank v 1.0 | English | LDC2008T23 |
| North American News Text Corpus | English | LDC95T21 |
| OntoNotes Release 5.0 | English, Mandarin Chinese, Arabic, Chinese | LDC2013T19 |
| OntoNotes V3.0 - GALE Pre-Release | English | LDC2009E60 |
| Original Penn Treebank release 2 | | 1012 |
| Penn Discourse Treebank Version 2.0 | English | LDC2008T05 |
| Penn Treebank release 3 | | 1013 |
| Portuguese Newswire Text | Portuguese | LDC99T40 |
| Prague Dependency Treebank 1.0 | Czech, English (Parallel) | LDC2001T10 |
| PropBank frameset files (v1.7) | | other_10 |
| PropBank on the Brown corpus | | other_11 |
| Proposition Bank I | English | LDC2004T14 |
| REFLEX Bengali | | LDC2015E13 |
| REFLEX Hungarian | | LDC2015E82 |
| REFLEX Tagalog | | LDC2015E90 |
| REFLEX Tamil | | LDC2015E83 |
| REFLEX Thai | | LDC2015E84 |
| REFLEX Urdu | | LDC2015E14 |
| REFLEX Yoruba | | LDC2015E91 |
| RST Discourse Treebank | English | LDC2002T07 |
| Reuters vol 1 | English | 1015 |
| Reuters vol. 2 | English | 1016 |
| SAID | English | LDC2003T10 |
| SANCL 2012 Shared Task Release 1 | English | LDC2012E43 |
| SIGHAN Bakeoff | | LDC2003E16 |
| SUSAS | English | LDC99S78 |
| SUSAS Transcripts | English | LDC99T33 |
| Santa Barbara Corpus of Spoken American English Part I | American English | LDC2000S85 |
| Santa Barbara Corpus of Spoken American English Part II | American English | LDC2003S06 |
| Santa Barbara Corpus of Spoken American English Part III | American English | LDC2004S10 |
| Santa Barbara Corpus of Spoken American English Part IV | American English | LDC2005S25 |
| SemEval-2016 Task 8 - Meaning Representation Parsing - Gold Standard AMRs | English | LDC2016E33 |
| Spanish Discussion Forum Source Data R1 | Spanish | LDC2014E14 |
| Spanish Language News Corpus | Spanish | 1017 |
| Spanish Newswire Text, Volume 2 | Spanish | LDC99T41 |
| Speech in Noisy Environments (SPINE) Evaluation Audio | English | LDC2000S96 |
| Speech in Noisy Environments (SPINE) Evaluation Transcripts | English | LDC2000T54 |
| Speech in Noisy Environments (SPINE) Training Audio | English | LDC2000S87 |
| Speech in Noisy Environments (SPINE) Training Transcripts | English | LDC2000T49 |
| Speech in Noisy Environments (SPINE2) Part 1 Audio | English | LDC2001S04 |
| Speech in Noisy Environments (SPINE2) Part 1 Transcripts | English | LDC2001T05 |
| Speech in Noisy Environments (SPINE2) Part 2 Audio | English | LDC2001S06 |
| Speech in Noisy Environments (SPINE2) Part 2 Transcripts | English | LDC2001T07 |
| Speech in Noisy Environments (SPINE2) Part 3 Audio | English | LDC2001S08 |
| Speech in Noisy Environments (SPINE2) Part 3 Transcripts | English | LDC2001T09 |
| Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio | English | LDC2001S99 |
| Switchboard Cellular Part 1 Transcription | English | LDC2001T14 |
| Switchboard-1 Release 2 | English | LDC97S62 |
| Switchboard-2 Phase I | English | LDC98S75 |
| Switchboard-2 Phase II | English | LDC99S79 |
| Switchboard-2 Phase III Audio | English | LDC2002S06 |
| Syllable-Final /s/ Lenition | Spanish | LDC2001T60 |
| TAC 2009 KBP Assessment Results | English | LDC2009E90 |
| TAC 2009 KBP Evaluation Generic Infoboxes V2.0 | English | LDC2009E56 |
| TAC 2009 KBP Evaluation NIL Link Assessment | English | LDC2009E110 |
| TAC 2009 KBP Evaluation Reference Knowledge Base | English | LDC2009E58A |
| TAC 2009 KBP Evaluation Reference Knowledge Base | English | LDC2009E58C |
| TAC 2009 KBP Evaluation Reference Knowledge Base | English | LDC2009E58B |
| TAC 2009 KBP Evaluation Slot Filling List | English | LDC2009E65 |
| TAC 2010 KBP Assessment Results | English | LDC2010E61 |
| TAC 2010 KBP Entity Linking IAA Study Results | English | LDC2012E31 |
| TAC 2010 KBP Evaluation Entity Linking Gold Standard V1.0 | English | LDC2010E82 |
| TAC 2010 KBP Evaluation Slot Filling Annotation | English | LDC2012E32 |
| TAC 2010 KBP Evaluation Surprise Slot Filling Annotation | English | LDC2012E33 |
| TAC 2010 KBP Generic Infoboxes | English | LDC2010E24 |
| TAC 2010 KBP Source Data | | LDC2010E12 |
| TAC 2010 KBP Training Entity Linking V2.0 | English | LDC2010E31 |
| TAC 2010 KBP Training Slot Filling Annotation V2.1 | English | LDC2010E18 |
| TAC 2010 RTE-6 KBP Validation Pilot Development Data | English | LDC2010E32 |
| TAC 2011 Guided Summarization Test Data | English | LDC2011E28 |
| TAC 2011 Guided Summarization Test Data V1.1 | English | LDC2011E62 |
| TAC 2011 KBP English Evaluation Diagnostic Temporal Slot Filling Queries | English | LDC2011E85 |
| TAC 2011 KBP English Evaluation Entity Linking Annotation | English | LDC2012E29 |
| TAC 2011 KBP English Evaluation Entity Linking Queries | English | LDC2012E36 |
| TAC 2011 KBP English Evaluation Regular Slot Filling Annotation V1.2 | English | LDC2011E89 |
| TAC 2011 KBP English Evaluation Regular Slot Filling Queries | English | LDC2012E37 |
| TAC 2011 KBP English Evaluation Temporal Slot Filling Annotation | English | LDC2012E38 |
| TAC 2011 KBP English Evaluation Temporal Slot Filling Queries | English | LDC2012E39 |
| TAC 2011 KBP English Regular Slot Filling Assessment Results | English | LDC2011E88 |
| TAC 2011 KBP English Sample Temporal Slot Filling Annotation V1.2 | English | LDC2011E47 |
| TAC 2011 KBP English Temporal Slot Filling Assessment Results | English | LDC2013E65 |
| TAC 2011 KBP English Training Regular Slot Filling Annotation | English | LDC2011E48 |
| TAC 2011 KBP English Training Temporal Slot Filling Annotation | English | LDC2011E49 |
| TAC 2011 RTE-7 KBP Validation Development Data | English | LDC2011E29 |
| TAC 2011 RTE-7 KBP Validation Test Data | English | LDC2011E30 |
| TAC 2012 KBP English Regular Slot Filling Evaluation Annotations | English | LDC2012E91 |
| TAC 2013 KBP English Entity Linking Evaluation Queries and Knowledge Base Links V1.1 | English | LDC2013E90 |
| TAC 2013 KBP English Regular Slot Filling Assessment Results | English | LDC2013E91 |
| TAC 2013 KBP English Regular Slot Filling Evaluation Queries and Annotations V1.1 | English | LDC2013E77 |
| TAC 2013 KBP English Regular Slot Filling per:title Training Data | English | LDC2013E60 |
| TAC 2013 KBP English Temporal Slot Filling Assessment Results | English | LDC2013E99 |
| TAC 2013 KBP English Temporal Slot Filling Evaluation Queries and Annotations V1.1 | English | LDC2013E86 |
| TAC 2013 KBP English Temporal Slot Filling Training Queries and Annotations | English | LDC2013E82 |
| TAC 2013 KBP Source Corpus | | LDC2013E45 |
| TAC 2014 KBP English Entity Linking Training AMR Queries and KB Links V1.1 | English | LDC2014E15 |
| TAC 2014 KBP English Event Argument Extraction Evaluation Assessment Results V2.0 | English | LDC2014E88 |
| TAC 2014 KBP English Event Argument Extraction Evaluation Source Corpus V1.1 | English | LDC2014R43 |
| TAC 2014 KBP English Source Corpus | English | LDC2014E13 |
| TAC 2014 KBP Event Argument Extraction Pilot Assessment Results V1.1 | English | LDC2014E40 |
| TAC 2014 KBP Event Argument Extraction Pilot Source Corpus V1.1 | English | LDC2014E20 |
| TAC KBP 2009 Evaluation Entity Linking List | English | LDC2009E64 |
| TAC KBP 2016 Belief and Sentiment Evaluation Gold Standard Annotation (Versions 1 and 2) | English | LDC2016E114 |
| TAC KBP Evaluation Surprise Slot Filling Queries | English | LDC2010E53 |
| TAC KBP Gold Standard Entity Linking Entity Type List | English | LDC2009E86 |
| TAC KBP Training Surprise Slot Filling Annotation | English | LDC2010E52 |
| TDT2 Careful Transcription Audio | English | LDC2000S92 |
| TDT2 Careful Transcription Text | English | LDC2000T44 |
| TDT2 English Text | English | LDC99T35 |
| TDT2 Mandarin Audio Corpus | Mandarin Chinese | LDC2001S93 |
| TDT2 Multilanguage Text Version 4.0 | English, Mandarin Chinese | LDC2001T57 |
| TDT2 Text Data and Tables | | 1019 |
| TDT3 Multilanguage Text Version 2.0 | English, Mandarin Chinese | LDC2001T58 |
| TERN 2004 Training Data V1.3 | | LDC2004E23 |
| TI 46-Word | English | LDC93S9 |
| TI 46-word | | 1004 |
| TIDES Extraction ACE 2004 Training Data V1.4 | | LDC2004E17 |
| TIMIT Acoustic-Phonetic Continuous Speech Corpus | English | LDC93S1 |
| TIPSTER Complete | English | LDC93T3A |
| TREC Mandarin | Mandarin Chinese | LDC2000T52 |
| TREC Spanish | Spanish | LDC2000T51 |
| Tactical Speaker Identification Speech Corpus (TSID) | English | LDC99S83 |
| Taiwanese Putonghua | Taiwanese Mandarin | LDC98S72 |
| Talkbank Switchboard corpus | | 1018 |
| The 2012 IBM Egyptian Arabic Corpus | Egyptian Arabic | LDC2012E77 |
| The AQUAINT Corpus of English News Text | English | LDC2002T31 |
| The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English | Old English | other_1 |
| The Enron Sent Corpus v1.0 | | other_2 |
| The George Bushotter Lakhota Text collection | | Other_5 |
| The IViE corpus | | other_3 |
| The New York Times Annotated Corpus | English | LDC2008T19 |
| TimeBank 1.2 | English | LDC2006T08 |
| Tipster | | 1020 |
| Translanguage English Database (TED) Speech | English | LDC2002S04 |
| Translanguage English Database (TED) Transcripts | English | LDC2002T03 |
| Treebank-2 | English | LDC95T7 |
| Treebank-3 | English | LDC99T42 |
| USC Marketplace Broadcast News Speech | English | LDC99S82 |
| USC Marketplace Broadcast News Transcripts | English | LDC99T36 |
| Uzbek Incident Language Pack | | LDC2015E89 |
| VAHA (POLYPHONE II) | Spanish | LDC96S41 |
| Voice of America (VOA) Czech Broadcast News Audio | Czech | LDC2000S89 |
| Voice of America (VOA) Czech Broadcast News Transcripts | Czech | LDC2000T53 |
| Voicemail Corpus Part II | English | LDC2002S35 |
| WordNet 1.5 | | Other_6 |
| Zurich BNC web | | 1000.5 |
| bilingual data extracted from three Creative Commons (CC BY-SA) sources | | LDC2012E79 |
530 total corpora