CHINESE CLASSICS CONCORDANCES

CHINESE CORPORA
(Only corpora that are potentially available online are included in the list.)
Modern Chinese Corpora
| # | Corpus | Size | Description |
|---|--------|------|-------------|
| 1 | Academia Sinica Balanced Corpus of Modern Chinese | 10,000,000 W | Sinica 5.0 (projection for 2006). ONLINE |
| 2 | Academia Sinica Treebank | 361,834 W; 61,087 trees | Sinica TreeBank 3.0. ONLINE |
| 3 | Beijing Language and Culture University Chinese language heritage corpus (LCIC) | - | Status UNCLEAR |
| 4 | Lancaster Corpus of Mandarin Chinese | 1,000,000 W | ONLINE |
| 5 | LDC Chinese Gigaword | 831,748,000 units | Version 2.0 (2009). ONLINE |
| 6 | Leeds Chinese Internet Corpus | 280,000,000 W | ONLINE |
| 7 | Peking University Modern Chinese Corpus | 307,000,000 C | UNCLEAR |
| 8 | PH Corpus (see /pub/chinese/) | 2,447,000 W | Xinhua newswire data, 1990-1991. ONLINE |
| 9 | Sheffield Corpus of Chinese for Diachronic Linguistic Study | 430,000 C | ONLINE. Diachronic corpus consisting of a wide range of fully marked-up Chinese historical texts. |
| 10 | UCLA Chinese corpus | 687,000 units | ONLINE |
| 11 | Xiamen University corpora | | UNCLEAR |
| 12 | The Thesaurus Linguae Sericae (TLS): collaborative forum for discussion on the close reading of Chinese texts | | Very advanced corpora site, similar to the CTexts project and much larger, though some functionality differs. ONLINE |
Early Chinese Corpora
| # | Corpus | Size | Description |
|---|--------|------|-------------|
| 1 | CHANT (CHinese ANcient Texts) | 40,000,000 C | ONLINE |
| 2 | Peking University Ancient Chinese Corpus | 170,000,000 C | ONLINE |
| 3 | Scripta Sinica Ancient Chinese | 402,000,000 C | ONLINE. Academia Sinica Tagged Corpus of Old Chinese, Academia Sinica Ancient Chinese Corpus, more than 460 titles |