| CHINESE CORPORA |
| (Only corpora that are potentially available online are included in this list.) |
| Modern Chinese Corpora |
| # | Corpus | Size (W = words, C = characters) | Description |
| 1 | Academia Sinica Balanced Corpus of Modern Chinese | 10,000,000 W | Sinica 5.0 (projection for 2006) ONLINE |
| 2 | Academia Sinica Treebank | 361,834 W 61,087 trees | Sinica TreeBank 3.0 ONLINE |
| 3 | Beijing Language and Culture University Chinese language heritage corpus (LCIC) | - | UNCLEAR |
| 4 | Lancaster Corpus of Mandarin Chinese | 1,000,000 W | ONLINE |
| 5 | LDC Chinese Gigaword | 831,748,000 units | version 2.0 (2009) ONLINE |
| 6 | Leeds Chinese Internet Corpus | 280,000,000 W | ONLINE |
| 7 | Peking University Modern Chinese Corpus | 307,000,000 C | UNCLEAR |
| 8 | PH Corpus (see /pub/chinese/) | 2,447,000 W | Xinhua newswire data 1990-1991 ONLINE |
| 9 | Sheffield Corpus of Chinese for Diachronic Linguistic Study | 430,000 C | Diachronic corpus consisting of a wide range of fully marked-up Chinese historical texts. ONLINE |
| 10 | UCLA Chinese corpus | 687,000 units | ONLINE |
| 11 | Xiamen University corpora | | UNCLEAR |
| 12 | The Thesaurus Linguae Sericae (TLS): collaborative forum for discussion on the close reading of Chinese texts | | A highly developed corpus site, similar to but much larger than the CTexts project, with somewhat different functionality. ONLINE |
| Early Chinese Corpora |
| 1 | CHANT (CHinese ANcient Texts) | 40,000,000 C | ONLINE |
| 2 | Peking University Ancient Chinese Corpus | 170,000,000 C | ONLINE |
| 3 | Scripta Sinica Ancient Chinese | 402,000,000 C | Academia Sinica Tagged Corpus of Old Chinese, Academia Sinica Ancient Chinese Corpus; more than 460 titles. ONLINE |