All Corpora
To access the discs in the LDC library, contact Michael Ginn.
You need a verbs account to access the corpora hosted on the verbs server.
Corpus Name | Language | Catalog ID |
---|---|---|
1996 English Broadcast News Speech (HUB4) | English | LDC97S44 |
1996 English Broadcast News Transcripts (HUB4) | English | LDC97T22 |
1996-2008 NIST Speaker Recognition Evaluation Data Collection | English | LDC2009E100 |
1997 English Broadcast News Transcripts (HUB4) | English | LDC98T28 |
1997 HUB4 Broadcast News Evaluation Non-English Test Material | Spanish, Mandarin Chinese | LDC2001S91 |
1997 HUB4 English Evaluation Speech and Transcripts | English | LDC2002S11 |
1997 HUB5 Arabic Evaluation | Egyptian Arabic | LDC2002S22 |
1997 HUB5 Arabic Transcripts | Egyptian Arabic | LDC2002T39 |
1997 HUB5 English Evaluation | English | LDC2002S23 |
1997 HUB5 German Evaluation | German | LDC2002S24 |
1997 HUB5 German Transcripts | German | LDC2003T03 |
1997 HUB5 Spanish Evaluation | Spanish | LDC2002S25 |
1997 HUB5 Spanish Transcripts | Spanish | LDC2003T04 |
1997 Mandarin Broadcast News Speech (HUB4-NE) | Mandarin Chinese | LDC98S73 |
1997 Spanish Broadcast News Transcripts (HUB4-NE) | Spanish | LDC98T29 |
1998 HUB4 Broadcast News Evaluation English Test Material | English | LDC2000S86 |
1998 HUB5 English Evaluation | English | LDC2002S10 |
1998 HUB5 English Transcripts | English | LDC2003T02 |
1999 HUB4 Broadcast News Evaluation English Test Material | English | LDC2000S88 |
2000 Communicator Evaluation | English | LDC2002S56 |
2000 HUB5 English Evaluation Speech | English | LDC2002S09 |
2000 HUB5 English Evaluation Transcripts | English | LDC2002T43 |
2000 NIST Speaker Recognition Evaluation | English | LDC2001S97 |
2001 Communicator Evaluation | English | LDC2003S01 |
2001 HUB5 English Evaluation | English | LDC2002S13 |
2001 HUB5 Mandarin Evaluation | Mandarin Chinese | LDC2002S12 |
2001 HUB5 Mandarin Transcripts | Mandarin Chinese | LDC2003T01 |
2001 NIST Speaker Recognition Evaluation Corpus | English | LDC2002S34 |
2002 Rich Transcription Broadcast News and Conversational Telephone Speech | English | LDC2004S11 |
2009 CoNLL Shared Task Part 1 | Catalan, Czech, German, Spanish | LDC2012T03 |
2009 CoNLL Shared Task Part 2 | English, Mandarin Chinese, Chinese | LDC2012T04 |
8 years worth of summary/article sets collected via Newsblaster | | LDC2012E80 |
ACE 2004 Evaluation Corpus | English, Chinook jargon, Baharna Arabic, Chinese, Arabic | LDC2004E51 |
ACE 2004 Multilingual Training Corpus | English, Standard Arabic, Mandarin Chinese | LDC2005T09 |
ACE 2004 Pilot Corpus V1.3 | Baharna Arabic, Chinook jargon, English, Arabic, Chinese | LDC2004E03 |
ACE 2005 Multilingual Training Data V6.0 | English, Chinook jargon, Baharna Arabic, Chinese, Arabic | LDC2005E18 |
ACE-2 Version 1.0 | English | LDC2003T11 |
ACL Multilingual Corpus 1 | | 1006 |
AIDA 1.2 : Automatic Identification of Dialectal Arabic | Arabic | LDC2012E56 |
AQUAINT CrossLingual QA Arabic Newswire Corpus | Baharna Arabic, English, Arabic | LDC2004E49 |
ATIS3 Test Data | English | LDC95S26 |
ATIS3 Training Data | English | LDC94S19 |
Abstract Meaning Representation (AMR) Annotation Release 1.0 | English | LDC2014T12 |
American English Nickname Collection | English | LDC2012T11 |
American English Spoken Lexicon | English | LDC99L23 |
American National Corpus (ANC) Second Release | English | LDC2005T35 |
Annotated English Gigaword | English | LDC2012T21 |
Arabic Gigaword Third Edition | Standard Arabic | LDC2007T40 |
Arabic Newswire Part 1 | Standard Arabic | LDC2001T55 |
Arabic Treebank - Broadcast News v1.0 | Standard Arabic, Arabic | LDC2012T07 |
Arabic Treebank ARZ Part 1, V1.0 | Egyptian Arabic | LDC2012E28 |
Arabic Treebank Part 20 V1.0 - BOLT Pilot ARZ Email | Arabic | LDC2012E25 |
Arabic Treebank: Part 1 - 10K-word English Translation | Standard Arabic | LDC2003T07 |
Arabic Treebank: Part 1 v 2.0 | Standard Arabic | LDC2003T06 |
Arabic Treebank: Part 3 v 3.2 | Standard Arabic, Arabic | LDC2010T08 |
BBN Pronoun Coreference and Entity Type Corpus | English | LDC2005T33 |
BBN/LDC WebForum Selections Arabic/English Parallel Corpus | Arabic/English (Parallel) | LDC2012E75 |
BBN/LDC WebForum Selections Chinese/English Parallel Corpus | Chinese/English (Parallel) | LDC2012E76 |
BBN/LDC/Sakhr Arabic-Dialect/English Parallel Corpus | Sakhr Arabic-Dialect/English (Parallel) | LDC2012E17 |
BLLIP 1987-89 WSJ Corpus Release 1 | English | LDC2000T43 |
BOLT - Phase 1 Discussion Forums Source Data R1 V2 | English, Egyptian Arabic, Chinese | LDC2012E04 |
BOLT - Phase 1 Discussion Forums Source Data R2 | Chinese, Egyptian Arabic, English | LDC2012E16 |
BOLT - Phase 1 Discussion Forums Source Data R3 | Chinese, Egyptian Arabic, English | LDC2012E21 |
BOLT - Phase 1 Rejected Training Data Thread IDs | | LDC2012E62 |
BOLT - Phase 1 Translation Samples V2 | | LDC2012E11 |
BOLT LRL Hausa Representative Language Pack V1.2 | Hausa | LDC2015E70 |
BOLT LRL Turkish Representative Language Pack V2.2 | Turkish | LDC2014E115 |
BOLT LRL Uzbek Representative Language Pack | Uzbek | LDC2016E29 |
BOLT Phase 1 - Arabic Treebank ARZ Part 2, V1.0 | Egyptian Arabic | LDC2012E88 |
BOLT Phase 1 - Chinese Parallel Word Alignment and Tagging Part 3 | Chinese | LDC2012E95 |
BOLT Phase 1 - English Treebank BOLT WB Part 2, V 1.0 | English | LDC2012E97 |
BOLT Phase 1 Chinese Parallel Word Alignment and Tagging DF Part 4 | Chinese | LDC2013E02 |
BOLT Phase 1 Chinese Parallel Word Alignment and Tagging Part 1 | Chinese | LDC2012E24 |
BOLT Phase 1 Chinese Parallel Word Alignment and Tagging Part 2 | Chinese | LDC2012E72 |
BOLT Phase 1 Chinese Propbank DF Part 1 | Chinese | LDC2012E121 |
BOLT Phase 1 Chinese Propbank DF Part 2 | Chinese | LDC2012E131 |
BOLT Phase 1 Chinese Treebank DF Part 1 | Chinese | LDC2012E109 |
BOLT Phase 1 Chinese Treebank DF Part 2 | Chinese | LDC2012E120 |
BOLT Phase 1 Chinese Treebank DF Part 3 | Chinese | LDC2012E130 |
BOLT Phase 1 DevTest Source and Translation V4 | Arabic/Chinese/English | LDC2012E30 |
BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF | Egyptian Arabic | LDC2013E01 |
BOLT Phase 1 Egyptian Arabic Parallel Word Alignment DF Part 2 v2 | Egyptian Arabic | LDC2012E94 |
BOLT Phase 1 Egyptian Arabic Parallel Word Alignment Part 1 V2 | Egyptian Arabic | LDC2012E51 |
BOLT Phase 1 Egyptian Arabic Propbank DF Part 1 | Egyptian Arabic | LDC2012E122 |
BOLT Phase 1 Egyptian Arabic Propbank DF Part 2 | Egyptian Arabic | LDC2012E129 |
BOLT Phase 1 Egyptian Arabic Treebank DF Part 1 V2.0 | Egyptian Arabic | LDC2012E93 |
BOLT Phase 1 Egyptian Arabic Treebank DF Part 2 V2.0 | Egyptian Arabic | LDC2012E98 |
BOLT Phase 1 Egyptian Arabic Treebank DF Part 3 V2.0 | Egyptian Arabic | LDC2012E89 |
BOLT Phase 1 Egyptian Arabic Treebank DF Part 4 V2.0 | Egyptian Arabic | LDC2012E99 |
BOLT Phase 1 Egyptian Arabic Treebank DF Part 5 V2.0 | Egyptian Arabic | LDC2012E107 |
BOLT Phase 1 Egyptian Arabic Treebank DF Part 6 V2.0 | Egyptian Arabic | LDC2012E125 |
BOLT Phase 1 Egyptian Arabic Treebank DF Part 7 V1.0 | Egyptian Arabic | LDC2013E12 |
BOLT Phase 1 English Propbank DF Part 1 | English | LDC2012E123 |
BOLT Phase 1 English Propbank DF Part 2 | English | LDC2012E128 |
BOLT Phase 1 English Propbank DF Part 3 | English | LDC2013E05 |
BOLT Phase 1 English Treebank DF Part 1 V1.0 | English | LDC2012E92 |
BOLT Phase 1 English Treebank DF Part 3 V1.0 | English | LDC2012E114 |
BOLT Phase 1 English Treebank DF Part 4 V1.0 | English | LDC2013E17 |
BOLT Phase 1 HTER Experiment Source and Reference Translation | Chinese-English, Arabic-English | LDC2012E18 |
BOLT Phase 1 IR Eval Assessment Results V1.1 | | LDC2012E118 |
BOLT Phase 1 IR Eval Source Data Document List | | LDC2012E82 |
BOLT Phase 1 Translation Training Data R1 | Chinese-English, Arabic-English | LDC2012E15 |
BOLT Phase 1 Translation Training Data R2 | Chinese-English, Arabic-English | LDC2012E19 |
BOLT Phase 1 Translation Training Data R3 | Chinese-English, Arabic-English | LDC2012E55 |
BOLT Phase 1 Translation Training Data R4 | Chinese-English, Arabic-English | LDC2012E81 |
BOLT Phase 1 Translation Training Data R5 | Chinese-English, Arabic-English | LDC2012E96 |
BOLT Phase 1 Translation Training Data R6 | Chinese-English, Arabic-English | LDC2012E124 |
BOLT Phase 2 English Treebank SMS/Chat Part 1 | English | LDC2013E127 |
BOLT Phase 2 IR Source Data Document List and Sample Query | English | LDC2013E08 |
BOLT Phase 2 SMS and Chat Sample Source Data | Chinese, English, Egyptian Arabic | LDC2013E10 |
Boston University Radio Speech Corpus | English | LDC96S36 |
Boulder Coercion Corpus | | Other_8 |
British National Corpus Parses and BNC | British English | 1000 |
Brown Corpus (treebanked) | Standard American English | Other_7 |
Buckwalter Arabic Morphological Analyzer | Standard Arabic, English | LDC2004L02 |
CALIMA 0.3: Columbia Arabic Language Morphological Analyzer -- Egyptian Arabic | Egyptian Arabic | LDC2012E57 |
CALLFRIEND American English-Non-Southern Dialect | English | LDC96S46 |
CALLFRIEND American English-Southern Dialect | Southern American English | LDC96S47 |
CALLFRIEND Canadian French | Canadian French | LDC96S48 |
CALLFRIEND Farsi | Farsi, Persian | LDC96S50 |
CALLFRIEND German | German | LDC96S51 |
CALLFRIEND Hindi | Hindi | LDC96S52 |
CALLFRIEND Japanese | Japanese | LDC96S53 |
CALLFRIEND Korean | Korean | LDC96S54 |
CALLFRIEND Mandarin Chinese-Mainland Dialect | Mandarin Chinese-Mainland Dialect | LDC96S55 |
CALLFRIEND Mandarin Chinese-Taiwan Dialect | Mandarin Chinese-Taiwan Dialect | LDC96S56 |
CALLFRIEND Spanish-Caribbean Dialect | Spanish | LDC96S57 |
CALLFRIEND Tamil | Tamil | LDC96S59 |
CALLFRIEND Vietnamese | Vietnamese | LDC96S60 |
CALLHOME American English Lexicon (PRONLEX) | American English | LDC97L20 |
CALLHOME American English Speech | American English | LDC97S42 |
CALLHOME American English Transcripts | American English | LDC97T14 |
CALLHOME Egyptian Arabic Speech Supplement | Egyptian Arabic | LDC2002S37 |
CALLHOME Egyptian Arabic Transcripts | Egyptian Arabic | LDC97T19 |
CALLHOME Egyptian Arabic Transcripts Supplement | Egyptian Arabic | LDC2002T38 |
CALLHOME German Lexicon | German | LDC97L18 |
CALLHOME German Speech | German | LDC97S43 |
CALLHOME German Transcripts | German | LDC97T15 |
CALLHOME Japanese Lexicon | Japanese | LDC96L17 |
CALLHOME Japanese Speech | Japanese | LDC96S37 |
CALLHOME Japanese Transcripts | Japanese | LDC96T18 |
CALLHOME Mandarin Chinese Lexicon | Mandarin Chinese | LDC96L15 |
CALLHOME Mandarin Chinese Speech | Mandarin Chinese | LDC96S34 |
CALLHOME Mandarin Chinese Transcripts | Mandarin Chinese | LDC96T16 |
CALLHOME Spanish Dialogue Act Annotation | Spanish | LDC2001T61 |
CALLHOME Spanish Lexicon | Spanish | LDC96L16 |
CALLHOME Spanish Speech | Spanish | LDC96S35 |
CALLHOME Spanish Transcripts | Spanish | LDC96T17 |
CELEX2 | English, German, Dutch | LDC96L14 |
CETEMpublico | Portuguese | LDC2001T62 |
CODAFY 0.1: Automatic mapper into the Conventional Orthography of Dialectal Arabic | Dialectal Arabic | LDC2012E58 |
COMLEX English Syntax Lexicon | English | LDC96L6 |
COMLEX Pronouncing Dictionary | English | LDC96L7 |
COMLEX Syntax Text Corpus Version 2.0 | English | LDC96T11 |
CSLU: Kids' Speech Version 1.1 | English | LDC2007S18 |
CSLU: Spelled and Spoken Words | English | LDC2006S15 |
CSLU: Spoltech Brazilian Portuguese Version 1.0 | Brazilian Portuguese | LDC2006S16 |
CSLU: Stories v 1.2 | English | LDC2006S14 |
CSR-I (WSJ0) Complete | English | LDC93S6A |
CSR-IV HUB4 | English | LDC96S31 |
Childes Corpus 1996 | | 1001 |
Childes Corpus 1998 | | 1002 |
Chinese <-> English Named Entity Lists v 1.0 | Mandarin Chinese-English | LDC2005T34 |
Chinese English News Magazine Parallel Text | Chinese-English (Parallel) | LDC2005T10 |
Chinese Gigaword | Mandarin Chinese | LDC2003T09 |
Chinese Gigaword Fifth Edition | Mandarin Chinese | LDC2011T13 |
Chinese Gigaword Second Edition | Mandarin Chinese | LDC2005T14 |
Chinese Proposition Bank 2.0 | Mandarin Chinese | LDC2008T07 |
Chinese Treebank 2.0 | Mandarin Chinese | LDC2001T11 |
Chinese Treebank 4.0 | Mandarin Chinese | LDC2004T05 |
Chinese Treebank 5.0 | Mandarin Chinese | LDC2005T01 |
Chinese Treebank 5.1 | Mandarin Chinese | LDC2005T01U01 |
Chinese Treebank 6.0 | Mandarin Chinese | LDC2007T36 |
Chinese Treebank 7.0 | Mandarin Chinese | LDC2010T07 |
Chinese Treebank 8.0 | Mandarin Chinese, Chinese | LDC2013T21 |
Chinese Treebank Final Release | Mandarin Chinese | LDC2000T48 |
Chinese idiom translation dictionary + word segmenter dictionary - web resources | Chinese | LDC2012E78 |
Chinese-English Translation Lexicon Version 3.0 | English-Mandarin Chinese | LDC2002L27 |
CoNLL 2008 Shared Task Development Set | English | LDC2008E33 |
CoNLL 2008 Shared Task Test Set | English | LDC2008E34 |
CoNLL 2008 Shared Task Training Set | English | LDC2008E32 |
CoNLL 2008 Shared Task Trial Data Set | English | LDC2008E31 |
CoNLL 2009 Shared Task Chinese Test Set | Chinese | LDC2009E37 |
CoNLL 2009 Shared Task Chinese Training Set | Chinese | LDC2009E38 |
CoNLL 2009 Shared Task Chinese Trial Data Set | Chinese | LDC2009E36D |
CoRD \| The London-Lund Corpus of Spoken English | English | other_1234 |
Corpus Search | | 1003 |
DEFT ERE Cross-Doc Event Coreference Training Data Annotation | | LDC2017E24 |
DEFT ERE English Discussion Forum Annotation V3 | English | LDC2014E31 |
DEFT English Belief and Sentiment Annotation | English | LDC2016E27 |
DEFT Event Sequencing After-Link And Parent-Child Annotation Training Data | English | LDC2016E130 |
DEFT Event Sequencing Pilot Evaluation Source Data | English | LDC2017E08 |
DEFT Phase 1 AMR Annotation R4 | English | LDC2014E41 |
DEFT Phase 1 ERE Annotation R3 V2 | English | LDC2013E64 |
DEFT Phase 1 Narrative Text Source Data R1 | English | LDC2013E19 |
DEFT Phase 2 AMR Annotation R1 | English | LDC2015E86 |
DEFT Phase 2 AMR Annotation R2 | English | LDC2016E25 |
DEFT Phase 2 AMR Exploratory Source Data | English | LDC2014R46 |
DEFT Phase 2 AMR Selected Segmented DF Source Data V2.0 | English | LDC2015R11 |
DEFT Rich ERE English Training Annotation R2 V2 | English | LDC2015E68 |
DSO Corpus of Sense-Tagged English | English | LDC97T12 |
ECI Multilingual Text | Turkish, Swedish, Slovenian, Russian, Portuguese, Norwegian, Norwegian Bokmål, Norwegian Nynorsk, Lithuanian, Latin, Japanese, Scottish Gaelic, French, Estonian, English, Modern Greek (1453-), German, Danish, Bulgarian, Tosk Albanian, Standard Malay, Spanish, Serbian, Northern Uzbek, Mandarin Chinese, Italian, Dutch, Czech, Croatian, Albanian | LDC94T5 |
Emotional Prosody Speech and Transcripts | English | LDC2002S28 |
English Gigaword | English | LDC2003T05 |
English Gigaword Fifth Edition | English | LDC2011T07 |
English Gigaword Second Edition | English | LDC2005T12 |
English News Text Treebank: Penn Treebank Revised | English | LDC2015T13 |
English Translation Treebank: An-Nahar Newswire | English | LDC2012T02 |
English Web Treebank | English | LDC2012T13 |
Entropic Speech Technology | | 1005 |
European Language Newspaper Text | Portuguese, French, German | LDC95T11 |
FactBank 1.0 | English | LDC2009T23 |
Fisher English Training Part 2, Speech | English | LDC2005S13 |
Fisher English Training Part 2, Transcripts | English | LDC2005T19 |
Fisher English Training Speech Part 1 Speech | English | LDC2004S13 |
Fisher English Training Speech Part 1 Transcripts | English | LDC2004T19 |
GALE Arabic-English Parallel Aligned Treebank -- Newswire | Arabic-English (Parallel) | LDC2013T10 |
GALE Kickoff Release - Arabic Names Extracted from ACE V1.0 | Arabic | LDC2005E66 |
GALE Kickoff Release - Arabic Names Extracted from ATB V1.0 | Arabic | LDC2005E68 |
GALE Kickoff Release - Broadcast Conversation Audio V1.0 | Baharna Arabic, Chinese, Arabic | LDC2005E61 |
GALE Kickoff Release - Broadcast Conversation Transcripts V1.0 | Baharna Arabic, Chinook jargon, Chinese, Arabic | LDC2005E63 |
GALE Kickoff Release - Broadcast News Audio V1.0 | Arabic, Chinese | LDC2005E62 |
GALE Kickoff Release - English-Arabic Parallel Treebank V1.0 | English-Arabic (Parallel) | LDC2005E69 |
GALE Kickoff Release - VOA Arabic Broadcast News Audio | Arabic | LDC2005E60 |
GALE Kickoff Release - VOA Arabic Broadcast News Transcripts | Arabic | LDC2005E71 |
GALE Kickoff Release 2 - English CTS Treebank with Structural Metadata | English | LDC2005E79 |
GALE Kickoff Release 2 -- Levantine Arabic CTS Audio | South Levantine Arabic, North Levantine Arabic | LDC2005E76 |
GALE Kickoff Release 2 -- Levantine Arabic CTS Transcripts | South Levantine Arabic, North Levantine Arabic | LDC2005E77 |
GALE Kickoff Release 2 -- Levantine Arabic CTS Treebank | South Levantine Arabic, North Levantine Arabic | LDC2005E78 |
GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 1 | English, Mandarin Chinese (Parallel) | LDC2009T02 |
GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 2 | English, Mandarin Chinese (Parallel) | LDC2009T06 |
GALE Phase 1 Chinese Broadcast News Parallel Text - Part 1 | English, Mandarin Chinese (Parallel) | LDC2007T23 |
GALE Phase 1 Chinese Broadcast News Parallel Text - Part 2 | English, Mandarin Chinese (Parallel) | LDC2008T08 |
GALE Phase 1 Chinese Broadcast News Parallel Text - Part 3 | English, Mandarin Chinese (Parallel) | LDC2008T18 |
GALE Phase 1 Chinese Newsgroup Parallel Text - Part 1 | English-Mandarin Chinese (Parallel) | LDC2009T15 |
GALE Phase 1 Chinese Newsgroup Parallel Text - Part 2 | English-Mandarin Chinese (Parallel) | LDC2010T03 |
GALE Phase 2 Distillation - Training V5.0 | Baharna Arabic, Chinook jargon, English, Arabic, Chinese | LDC2007E13 |
GALE Phase 2 Release 1 - Transcripts | Chinook jargon, Baharna Arabic, English, Chinese, Arabic | LDC2007E05 |
GALE Phase 2 Release 1 - Translations | English, Chinook jargon, Baharna Arabic, Chinese, Arabic | LDC2007E06 |
GALE Phase 2 Release 1 - Web Text | Arabic, Chinese, English | LDC2007E04 |
GALE Phase 2 Release 2 - Transcripts | Baharna Arabic, Chinook jargon, English, Chinese, Arabic | LDC2007E45 |
GALE Phase 2 Release 2 - Translations | Chinook jargon, Baharna Arabic, Chinese, Arabic | LDC2007E46 |
GALE Phase 2 Release 3 - Transcripts | Baharna Arabic, Chinook jargon, Chinese, Arabic | LDC2007E86 |
GALE Phase 2 Release 3 - Translations | Chinook jargon, Baharna Arabic, Chinese, Arabic | LDC2007E87 |
GALE Phase 3 - MTPlus Pilot | | LDC2008E42 |
GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 1 | Mandarin Chinese, Chinese | LDC2014T28 |
GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 2 | Mandarin Chinese, Chinese | LDC2015T09 |
GALE Phase 3 DevTest - Broadcast Audio | | LDC2007E60 |
GALE Phase 3 Release 1 - Distillation V1.1 | English, Chinese, Arabic | LDC2007E104 |
GALE Phase 3 Release 1 - English Translation Treebank | English, Baharna Arabic, Arabic | LDC2007E105 |
GALE Phase 3 Release 1 - Found Parallel Text | English, Chinese, Arabic (Parallel) | LDC2007E103 |
GALE Phase 3 Release 1 - Transcripts | English, Chinese, Arabic | LDC2007E100 |
GALE Phase 3 Release 1 - Translations | Arabic, Chinese, English | LDC2007E101 |
GALE Phase 3 Release 1 - Web Text V 1.0 | English, Chinook jargon, Baharna Arabic, Arabic, Chinese | LDC2007E102 |
GALE Phase 3 Release 2 - Broadcast Audio | English, Chinese, Arabic | LDC2008E38 |
GALE Phase 3 Release 2 - Transcripts | | LDC2008E39 |
GALE Phase 3 Release 2 - Translations | | LDC2008E40 |
GALE Phase 3 Release 2 - Web Text | | LDC2008E41 |
GALE Phase 3 and 4 Eval Superset | Arabic, Chinese | LDC2011E50 |
GALE Phase 4 Arabic Parallel Aligned Treebank Part 1 V1.2 | Arabic-English (Parallel) | LDC2009E82 |
GALE Phase 4 Chinese Parallel Word Alignment and Tagging Part 1 V1.1 | Chinese-English (Parallel) | LDC2009E83 |
GALE Phase 4 Release 1 - Transcripts V1.0 | English | LDC2008E55 |
GALE Phase 4 Release 1 - Translations V2.0 | Arabic and Chinese - English (Parallel) | LDC2008E56 |
GALE Phase 4 Release 1 - Web Text V1.0 | | LDC2008E53 |
GALE Phase 4 Release 2 - Transcripts | Arabic, Chinese, English | LDC2009E15 |
GALE Phase 4 Release 2 - Translations | Arabic and Chinese - English (Parallel) | LDC2009E16 |
GALE Phase 4 Release 2 - Web Text | Arabic, Chinese, English | LDC2009E14 |
GALE Phase 4 Release 3 - Found Parallel Text | Arabic-English, Chinese-English (Parallel) | LDC2009E105 |
GALE Phase 4 Release 3 - Transcripts | Arabic, Chinese, English | LDC2009E94 |
GALE Phase 4 Release 3 - Translations V1.2 | Arabic and Chinese - English (Parallel) | LDC2009E95 |
GALE Phase 4 Release 3 - Web Text | Arabic, Chinese, English | LDC2009E93 |
GALE Phase 5 Eval Source Transcripts and Translation | Arabic, Chinese | LDC2011E21 |
GALE Phase 5 Eval Superset Source Transcripts and Translation | Arabic, Chinese | LDC2011E25 |
GALE Phase 5 Levantine Arabic Dialect Judgments and Translations | Levantine Arabic-English (Parallel) | LDC2010E79 |
GALE Y1 - Arabic English Parallel News Text | English, Baharna Arabic, Arabic (Parallel) | LDC2006E25 |
GALE Y1 - BBN Iraqi Broadcast Conversation Corpus | Iraqi Arabic | LDC2006G07 |
GALE Y1 - Distillation Blind Evaluation Audio Part A | English | LDC2006E46_A |
GALE Y1 - Distillation Blind Evaluation Audio Part B | English | LDC2006E46_B |
GALE Y1 - Distillation Blind Evaluation Audio Part C | English | LDC2006E46_C |
GALE Y1 - Distillation Blind Evaluation Audio Part D | English | LDC2006E46_D |
GALE Y1 - Distillation Blind Evaluation Audio Part E | English | LDC2006E46_E |
GALE Y1 - Distillation Blind Evaluation Newswire | English | LDC2006E45 |
GALE Y1 - Distillation Evaluation Audio | English | LDC2006E21 |
GALE Y1 - Distillation Evaluation Newswire | Baharna Arabic, Chinook jargon, English, Chinese, Arabic | LDC2006E22 |
GALE Y1 - English Chinese Parallel Financial News | Chinook jargon, English, Chinese (Parallel) | LDC2006E26 |
GALE Y1 - Interim Release: Transcripts | Baharna Arabic, Chinook jargon, English, Arabic, Chinese | LDC2006E23 |
GALE Y1 - Interim Release: Translations | Chinook jargon, Baharna Arabic, Chinese, Arabic - English (Parallel) | LDC2006E24 |
GALE Y1 - Web 1T 5-gram Version 1 | English | LDC2006E88 |
GALE Y1 Q1 Release - Arabic Treebank v 1.0 | Arabic | LDC2005E84 |
GALE Y1 Q1 Release - English Translation Treebank v 1.0 | Arabic-English (Parallel) | LDC2005E85 |
GALE Y1 Q1 Release - Transcripts V1.0 | Baharna Arabic, Chinook jargon, English, Arabic, Chinese | LDC2005E82 |
GALE Y1 Q1 Release - Translations V1.0 | Arabic and Chinese - English (Parallel) | LDC2005E83 |
GALE Y1 Q1 Release - Web Text Collection V1.0 | Chinese, Arabic, English | LDC2005E81 |
GALE Y1 Q2 Release - Arabic Treebank v 1.0 | Arabic | LDC2006E35 |
GALE Y1 Q2 Release - English Translation Treebank v 1.0 | Arabic-English (Parallel) | LDC2006E36 |
GALE Y1 Q2 Release - Transcripts V1.0 | Baharna Arabic, Chinook jargon, English, Arabic, Chinese | LDC2006E33 |
GALE Y1 Q2 Release - Translations V2.0 | Baharna Arabic, Chinook jargon, Arabic, Chinese; into English | LDC2006E34 |
GALE Y1 Q2 Release - Web Text Collection V1.0 | Arabic, Chinese, English | LDC2006E32 |
GALE Y1 Q3 Release - Arabic Treebank | Arabic | LDC2006E87 |
GALE Y1 Q3 Release - English Translation Treebank | Arabic-English (Parallel) | LDC2006E82 |
GALE Y1 Q3 Release - Transcripts | English, Chinook jargon, Baharna Arabic, Arabic, Chinese | LDC2006E84 |
GALE Y1 Q3 Release - Translations | Baharna Arabic, Chinook jargon, Arabic, Chinese; into English | LDC2006E85 |
GALE Y1 Q3 Release - Web Text Collection | | LDC2006E77 |
GALE Y1 Q3 Release - Word Alignment | Baharna Arabic, Chinook jargon, Arabic, Chinese; into English | LDC2006E86 |
GALE Y1 Q4 Release - Arabic Treebank | Arabic | LDC2006E94 |
GALE Y1 Q4 Release - English Translation Treebank | Arabic-English (Parallel) | LDC2006E95 |
GALE Y1 Q4 Release - Transcripts | English, Chinook jargon, Baharna Arabic, Chinese, Arabic | LDC2006E91 |
GALE Y1 Q4 Release - Translations | Arabic and Chinese - English (Parallel) | LDC2006E92 |
GALE Y1 Q4 Release - Web Text Collection | | LDC2006E90 |
GALE Y1 Q4 Release - Word Alignment | Arabic, Chinese, English (Parallel) | LDC2006E93 |
Gigaword English Automatic Parses | | Other_9 |
Google Question Bank Update-v1.0 | English | LDC2012R121 |
Google Treebank Weblog Subcorpus V2.0 | English | LDC2011E71 |
Grassfields Bantu Fieldwork: Ngomba Tone Paradigms | Ngomba | LDC2001S16 |
HUB4 Radio Broadcast News | | 1014 |
HUB5 Spanish Telephone Speech Corpus | Spanish | LDC98S70 |
Hansard French/English | English - Canadian French (Parallel) | LDC95T20 |
Hong Kong Hansards Parallel Text | English, Chinese (Parallel) | LDC2000T50 |
Hong Kong Laws Parallel Text | English, Chinese | LDC2000T47 |
Hong Kong News Parallel Text | English, Chinese (Parallel) | LDC2000T46 |
Hong Kong Parallel Text | English, Chinese (Parallel) | LDC2004T08 |
ICSI Meeting Speech | English | LDC2004S02 |
ICSI Meeting Transcripts | English | LDC2004T04 |
ISCA 1 and 3 | | 1007 |
ISCA Tutorial | | 1008 |
ISL Meeting Speech Part 1 | English | LDC2004S05 |
ISL Meeting Transcripts Part 1 | English | LDC2004T10 |
JURIS | English | LDC98T32 |
Japanese Business News Text | Japanese | LDC95T8 |
Japanese Business News Text Supplement | Japanese | LDC99T34 |
Korean English Treebank Annotations | Korean, English (Parallel) | LDC2002T26 |
Korean Newswire | Korean | LDC2000T45 |
Korean Propbank | Korean | LDC2006T03 |
Korean Telephone Conversations Lexicon | Korean | LDC2003L02 |
Korean Telephone Conversations Speech | Korean | LDC2003S03 |
Korean Telephone Conversations Transcripts | Korean | LDC2003T08 |
Korean Treebank Annotations Version 2.0 | Korean | LDC2006T09 |
LCTL Urdu | Urdu | LDC2006E110 |
LLHDB | English | LDC98S68 |
LORELEI Akan Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Akan | LDC2018E07 |
LORELEI Amharic Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Amharic | LDC2016E87 |
LORELEI Arabic Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Arabic | LDC2016E89 |
LORELEI Bengali Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Bengali | LDC2017E60 |
LORELEI Farsi Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Farsi | LDC2016E93 |
LORELEI Hindi Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Hindi | LDC2017E62 |
LORELEI Hungarian Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Hungarian | LDC2016E98 |
LORELEI Indonesian Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1 | Indonesian | LDC2017E66 |
LORELEI Language Independent NLP Tools | | LDC2016E53 |
LORELEI Mandarin Incident Language Pack V2 | Mandarin Chinese | LDC2016E30 |
LORELEI Mandarin Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Mandarin Chinese | LDC2016E101 |
LORELEI Russian Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Russian | LDC2016E95 |
LORELEI Situation Frame Exercise Annotation | English | LDC2017E07 |
LORELEI Somali Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Somali | LDC2016E91 |
LORELEI Spanish Representative Language Pack Translation, Annotation, Grammar, Lexicon and Tools V1. | Spanish | LDC2016E97 |
LORELEI Swahili Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Swahili | LDC2017E64 |
LORELEI Tagalog Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Tagalog | LDC2017E68 |
LORELEI Tamil Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Tamil | LDC2017E70 |
LORELEI Thai Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Thai | LDC2018E03 |
LORELEI Vietnamese Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1 | Vietnamese | LDC2016E103 |
LORELEI Wolof Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Wolof | LDC2018E09 |
LORELEI Year 1 Dry Run Evaluation IL2 V1.1 | English | LDC2016E56 |
LORELEI Yoruba Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Yoruba | LDC2016E105 |
LORELEI Zulu Representative Language Pack Translation Annotation Grammar Lexicon and Tools V1.0 | Zulu | LDC2018E05 |
Levantine Arabic QT Training Data Set 4 (Speech + Transcripts) | North Levantine Arabic, South Levantine Arabic | LDC2005S14 |
MADA-ARZ 0.1: Morphological Analysis and Disambiguation for Arabic (Egyptian version) | Egyptian Arabic | LDC2012E60 |
MRC Psycholinguistic Database Machine Usable Dictionary | | other_4 |
Mandarin Chinese News Text | Mandarin Chinese | LDC95T13 |
Matlab | | 1009 |
Message Understanding Conference (MUC) 6 | English | LDC2003T13 |
Message Understanding Conference (MUC) 7 | English | LDC2001T02 |
Multiple-Translation Arabic (MTA) Part 1 | English, Standard Arabic (Parallel) | LDC2003T18 |
Multiple-Translation Arabic (MTA) Part 2 | English, Standard Arabic (Parallel) | LDC2005T05 |
Multiple-Translation Chinese (MTC) Part 2 | English, Mandarin Chinese (Parallel) | LDC2003T17 |
Multiple-Translation Chinese (MTC) Part 3 | English, Mandarin Chinese (Parallel) | LDC2004T07 |
Multiple-Translation Chinese (MTC) Part 4 | English, Mandarin Chinese (Parallel) | LDC2006T04 |
Multiple-Translation Chinese Corpus | English, Mandarin Chinese (Parallel) | LDC2002T01 |
NIST 2009 Open Machine Translation (OpenMT) Evaluation | Urdu and Arabic - English (Parallel) | LDC2010T23 |
NIST 2012 Open Machine Translation (OpenMT) Progress Test Five Language Source | Dari, Korean, Persian, Farsi, English, Mandarin Chinese, Arabic, Iranian Persian, Chinese (Parallel) | LDC2014T02 |
NIST Meeting Pilot Corpus Speech | English | LDC2004S09 |
NIST Meeting Pilot Corpus Transcripts and Metadata | English | LDC2004T13 |
NIST Open MT 2008 Evaluation (MT08) Selected References and System Translations | Urdu, Mandarin Chinese, Standard Arabic, English, Chinese, Arabic (Parallel) | LDC2010T01 |
NLTK | | 1010 |
NTIMIT | | 1011 |
NomBank v 1.0 | English | LDC2008T23 |
North American News Text Corpus | English | LDC95T21 |
OntoNotes Release 5.0 | English, Mandarin Chinese, Arabic, Chinese | LDC2013T19 |
OntoNotes V3.0 - GALE Pre-Release | English | LDC2009E60 |
Original Penn Treebank release 2 | | 1012 |
Penn Discourse Treebank Version 2.0 | English | LDC2008T05 |
Penn Treebank release 3 | | 1013 |
Portuguese Newswire Text | Portuguese | LDC99T40 |
Prague Dependency Treebank 1.0 | Czech, English (Parallel) | LDC2001T10 |
PropBank frameset files (v1.7) | | other_10 |
PropBank on the Brown corpus | | other_11 |
Proposition Bank I | English | LDC2004T14 |
REFLEX Bengali | | LDC2015E13 |
REFLEX Hungarian | | LDC2015E82 |
REFLEX Tagalog | | LDC2015E90 |
REFLEX Tamil | | LDC2015E83 |
REFLEX Thai | | LDC2015E84 |
REFLEX Urdu | | LDC2015E14 |
REFLEX Yoruba | | LDC2015E91 |
RST Discourse Treebank | English | LDC2002T07 |
Reuters vol 1 | English | 1015 |
Reuters vol. 2 | English | 1016 |
SAID | English | LDC2003T10 |
SANCL 2012 Shared Task Release 1 | English | LDC2012E43 |
SIGHAN Bakeoff | | LDC2003E16 |
SUSAS | English | LDC99S78 |
SUSAS Transcripts | English | LDC99T33 |
Santa Barbara Corpus of Spoken American English Part I | American English | LDC2000S85 |
Santa Barbara Corpus of Spoken American English Part II | American English | LDC2003S06 |
Santa Barbara Corpus of Spoken American English Part III | American English | LDC2004S10 |
Santa Barbara Corpus of Spoken American English Part IV | American English | LDC2005S25 |
SemEval-2016 Task 8 - Meaning Representation Parsing - Gold Standard AMRs | English | LDC2016E33 |
Spanish Discussion Forum Source Data R1 | Spanish | LDC2014E14 |
Spanish Language News Corpus | Spanish | 1017 |
Spanish Newswire Text, Volume 2 | Spanish | LDC99T41 |
Speech in Noisy Environments (SPINE) Evaluation Audio | English | LDC2000S96 |
Speech in Noisy Environments (SPINE) Evaluation Transcripts | English | LDC2000T54 |
Speech in Noisy Environments (SPINE) Training Audio | English | LDC2000S87 |
Speech in Noisy Environments (SPINE) Training Transcripts | English | LDC2000T49 |
Speech in Noisy Environments (SPINE2) Part 1 Audio | English | LDC2001S04 |
Speech in Noisy Environments (SPINE2) Part 1 Transcripts | English | LDC2001T05 |
Speech in Noisy Environments (SPINE2) Part 2 Audio | English | LDC2001S06 |
Speech in Noisy Environments (SPINE2) Part 2 Transcripts | English | LDC2001T07 |
Speech in Noisy Environments (SPINE2) Part 3 Audio | English | LDC2001S08 |
Speech in Noisy Environments (SPINE2) Part 3 Transcripts | English | LDC2001T09 |
Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio | English | LDC2001S99 |
Switchboard Cellular Part 1 Transcription | English | LDC2001T14 |
Switchboard-1 Release 2 | English | LDC97S62 |
Switchboard-2 Phase I | English | LDC98S75 |
Switchboard-2 Phase II | English | LDC99S79 |
Switchboard-2 Phase III Audio | English | LDC2002S06 |
Syllable-Final /s/ Lenition | Spanish | LDC2001T60 |
TAC 2009 KBP Assessment Results | English | LDC2009E90 |
TAC 2009 KBP Evaluation Generic Infoboxes V2.0 | English | LDC2009E56 |
TAC 2009 KBP Evaluation NIL Link Assessment | English | LDC2009E110 |
TAC 2009 KBP Evaluation Reference Knowledge Base | English | LDC2009E58A |
TAC 2009 KBP Evaluation Reference Knowledge Base | English | LDC2009E58C |
TAC 2009 KBP Evaluation Reference Knowledge Base | English | LDC2009E58B |
TAC 2009 KBP Evaluation Slot Filling List | English | LDC2009E65 |
TAC 2010 KBP Assessment Results | English | LDC2010E61 |
TAC 2010 KBP Entity Linking IAA Study Results | English | LDC2012E31 |
TAC 2010 KBP Evaluation Entity Linking Gold Standard V1.0 | English | LDC2010E82 |
TAC 2010 KBP Evaluation Slot Filling Annotation | English | LDC2012E32 |
TAC 2010 KBP Evaluation Surprise Slot Filling Annotation | English | LDC2012E33 |
TAC 2010 KBP Generic Infoboxes | English | LDC2010E24 |
TAC 2010 KBP Source Data | | LDC2010E12 |
TAC 2010 KBP Training Entity Linking V2.0 | English | LDC2010E31 |
TAC 2010 KBP Training Slot Filling Annotation V2.1 | English | LDC2010E18 |
TAC 2010 RTE-6 KBP Validation Pilot Development Data | English | LDC2010E32 |
TAC 2011 Guided Summarization Test Data | English | LDC2011E28 |
TAC 2011 Guided Summarization Test Data V1.1 | English | LDC2011E62 |
TAC 2011 KBP English Evaluation Diagnostic Temporal Slot Filling Queries | English | LDC2011E85 |
TAC 2011 KBP English Evaluation Entity Linking Annotation | English | LDC2012E29 |
TAC 2011 KBP English Evaluation Entity Linking Queries | English | LDC2012E36 |
TAC 2011 KBP English Evaluation Regular Slot Filling Annotation V1.2 | English | LDC2011E89 |
TAC 2011 KBP English Evaluation Regular Slot Filling Queries | English | LDC2012E37 |
TAC 2011 KBP English Evaluation Temporal Slot Filling Annotation | English | LDC2012E38 |
TAC 2011 KBP English Evaluation Temporal Slot Filling Queries | English | LDC2012E39 |
TAC 2011 KBP English Regular Slot Filling Assessment Results | English | LDC2011E88 |
TAC 2011 KBP English Sample Temporal Slot Filling Annotation V1.2 | English | LDC2011E47 |
TAC 2011 KBP English Temporal Slot Filling Assessment Results | English | LDC2013E65 |
TAC 2011 KBP English Training Regular Slot Filling Annotation | English | LDC2011E48 |
TAC 2011 KBP English Training Temporal Slot Filling Annotation | English | LDC2011E49 |
TAC 2011 RTE-7 KBP Validation Development Data | English | LDC2011E29 |
TAC 2011 RTE-7 KBP Validation Test Data | English | LDC2011E30 |
TAC 2012 KBP English Regular Slot Filling Evaluation Annotations | English | LDC2012E91 |
TAC 2013 KBP English Entity Linking Evaluation Queries and Knowledge Base Links V1.1 | English | LDC2013E90 |
TAC 2013 KBP English Regular Slot Filling Assessment Results | English | LDC2013E91 |
TAC 2013 KBP English Regular Slot Filling Evaluation Queries and Annotations V1.1 | English | LDC2013E77 |
TAC 2013 KBP English Regular Slot Filling per:title Training Data | English | LDC2013E60 |
TAC 2013 KBP English Temporal Slot Filling Assessment Results | English | LDC2013E99 |
TAC 2013 KBP English Temporal Slot Filling Evaluation Queries and Annotations V1.1 | English | LDC2013E86 |
TAC 2013 KBP English Temporal Slot Filling Training Queries and Annotations | English | LDC2013E82 |
TAC 2013 KBP Source Corpus | | LDC2013E45 |
TAC 2014 KBP English Entity Linking Training AMR Queries and KB Links V1.1 | English | LDC2014E15 |
TAC 2014 KBP English Event Argument Extraction Evaluation Assessment Results V2.0 | English | LDC2014E88 |
TAC 2014 KBP English Event Argument Extraction Evaluation Source Corpus V1.1 | English | LDC2014R43 |
TAC 2014 KBP English Source Corpus | English | LDC2014E13 |
TAC 2014 KBP Event Argument Extraction Pilot Assessment Results V1.1 | English | LDC2014E40 |
TAC 2014 KBP Event Argument Extraction Pilot Source Corpus V1.1 | English | LDC2014E20 |
TAC KBP 2009 Evaluation Entity Linking List | English | LDC2009E64 |
TAC KBP 2016 Belief and Sentiment Evaluation Gold Standard Annotation (Versions 1 and 2) | English | LDC2016E114 |
TAC KBP Evaluation Surprise Slot Filling Queries | English | LDC2010E53 |
TAC KBP Gold Standard Entity Linking Entity Type List | English | LDC2009E86 |
TAC KBP Training Surprise Slot Filling Annotation | English | LDC2010E52 |
TDT2 Careful Transcription Audio | English | LDC2000S92 |
TDT2 Careful Transcription Text | English | LDC2000T44 |
TDT2 English Text | English | LDC99T35 |
TDT2 Mandarin Audio Corpus | Mandarin Chinese | LDC2001S93 |
TDT2 Multilanguage Text Version 4.0 | English, Mandarin Chinese | LDC2001T57 |
TDT2 Text Data and Tables | | 1019 |
TDT3 Multilanguage Text Version 2.0 | English, Mandarin Chinese | LDC2001T58 |
TERN 2004 Training Data V1.3 | | LDC2004E23 |
TI 46-Word | English | LDC93S9 |
TI 46-word | | 1004 |
TIDES Extraction ACE 2004 Training Data V1.4 | | LDC2004E17 |
TIMIT Acoustic-Phonetic Continuous Speech Corpus | English | LDC93S1 |
TIPSTER Complete | English | LDC93T3A |
TREC Mandarin | Mandarin Chinese | LDC2000T52 |
TREC Spanish | Spanish | LDC2000T51 |
Tactical Speaker Identification Speech Corpus (TSID) | English | LDC99S83 |
Taiwanese Putonghua | Taiwanese Mandarin | LDC98S72 |
Talkbank Switchboard corpus | | 1018 |
The 2012 IBM Egyptian Arabic Corpus | Egyptian Arabic | LDC2012E77 |
The AQUAINT Corpus of English News Text | English | LDC2002T31 |
The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English | Old English | other_1 |
The Enron Sent Corpus v1.0 | | other_2 |
The George Bushotter Lakhota Text collection | | Other_5 |
The IViE corpus | | other_3 |
The New York Times Annotated Corpus | English | LDC2008T19 |
TimeBank 1.2 | English | LDC2006T08 |
Tipster | | 1020 |
Translanguage English Database (TED) Speech | English | LDC2002S04 |
Translanguage English Database (TED) Transcripts | English | LDC2002T03 |
Treebank-2 | English | LDC95T7 |
Treebank-3 | English | LDC99T42 |
USC Marketplace Broadcast News Speech | English | LDC99S82 |
USC Marketplace Broadcast News Transcripts | English | LDC99T36 |
Uzbek Incident Language Pack | | LDC2015E89 |
VAHA (POLYPHONE II) | Spanish | LDC96S41 |
Voice of America (VOA) Czech Broadcast News Audio | Czech | LDC2000S89 |
Voice of America (VOA) Czech Broadcast News Transcripts | Czech | LDC2000T53 |
Voicemail Corpus Part II | English | LDC2002S35 |
WordNet 1.5 | | Other_6 |
Zurich BNC web | | 1000.5 |
bilingual data extracted from three Creative Commons (CC BY-SA) sources | | LDC2012E79 |
530 total corpora