CHINESE CLASSICS

Project Origins
This project was inspired in many ways by discussions with E. Bruce Brooks (of the Warring States Project) and Chris Beckwith. I also used the Zuozhuan Digital Concordance project (led by John Page and María Isabel García Hidalgo) as a model.
(The CHANT database became available only after the prototype had been released.)
The project's functionality is now similar to that of the Zuozhuan Digital Concordance, with the addition of the Chunqiu text.
However, the original goal of the project was not a concordance system per se. It was designed to provide comparative statistical analysis of the Chunqiu and Zuozhuan texts, and to study the relationships between the two and their narrative characteristics.
Further development will continue in this direction.
 
Text Sources
The texts of the Chunqiu and Zuozhuan were obtained from the Chinese Wikisource (available under a Creative Commons license).
The Chinese text of the Shi-jing was obtained from the Chinese Wikisource (available under a Creative Commons license).
Legge's translation of the Shi-jing was compiled from several sources, notably Sacred Texts and Wengu.
The genre structure was taken from the Wikimedia source. A few phrases missing from Legge's text will be added soon.
The verse lines of the Chinese text have been aligned to Legge's translation. Because these sources had to be combined, the original (Wikimedia) verse paragraph structure was partly disrupted; it may be restored in the future.
The table of rulers' names and dates was borrowed from Ulrich Theobald's site A Universal Guide for China Studies.
 
Project Technology
The project was developed entirely with Python, MySQL, and PHP (using the Eclipse and Wing IDEs), and is easily portable.
Development has proceeded in several stages since June 2009:
  1. Merging the Chunqiu and Zuozhuan texts (including the rulers' data) into a single UTF-8 file;
  2. Manually marking up 'temporal' paragraphs with simple XML-like tokens;
  3. Creation of a Python text parser;
  4. Creation of a MySQL database;
  5. Parsing the text and populating the MySQL database (Evan Jones' guide to Python/UTF-8 development was helpful here);
  6. Creation of a text index;
  7. Matching unique characters with their pinyin (romanized) equivalents. The Chinese Characters Dictionary Web site, maintained by Rick Harbaugh, was very useful for extracting pinyin in bulk; the CC-CEDICT Chinese-English Dictionary was used for the remaining unrecognized characters, and the Unihan Database was consulted in most hard cases;
  8. Development of an online interface (in PHP 5) for data retrieval. Dr. Herong Yang's tutorial on working with Chinese text in UTF-8 was very helpful for developing web pages that display Chinese characters with PHP and MySQL.
    The menu tree is a modified PHP snippet from ionix Limited.
  9. Adding reconstructed preclassic, classic, and Middle Chinese pronunciations of characters, together with some English meanings, kindly provided by George Starostin, who continues to develop the database of reconstructions begun by Sergei Starostin (see the Tower of Babel website, specifically its Chinese database).
    (Missing English senses are taken from CC-CEDICT.)
  10. The numeric icons were taken from the Portland State University Mash-ups tools.
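Steps 2, 3 and 6 (marking up temporal paragraphs, parsing, indexing) can be illustrated with a minimal Python sketch. The project's actual tokens are not documented here, so the `<year ruler="..." n="...">` markup below is a hypothetical stand-in, as are the helper names `parse_temporal` and `build_index`. The project stores its records in MySQL; this sketch keeps everything in memory.

```python
import re
from collections import Counter, defaultdict

# Hypothetical markup: the project's real XML-like tokens may differ.
OPEN_TAG = re.compile(r'<year ruler="([^"]+)" n="(\d+)">')
CLOSE_TAG = "</year>"


def parse_temporal(text):
    """Split marked-up text into (ruler, year, paragraph) records."""
    records = []
    current = None  # (ruler, year) of the block currently open, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = OPEN_TAG.fullmatch(line)
        if match:
            current = (match.group(1), int(match.group(2)))
        elif line == CLOSE_TAG:
            current = None
        elif current is not None:
            records.append((current[0], current[1], line))
    return records


def is_han(ch):
    # Basic CJK Unified Ideographs block only; extension blocks are ignored.
    return "\u4e00" <= ch <= "\u9fff"


def build_index(records):
    """Map each Han character to (record number, offset) positions and count it."""
    index = defaultdict(list)
    freq = Counter()
    for rec_id, (_ruler, _year, para) in enumerate(records):
        for offset, ch in enumerate(para):
            if is_han(ch):  # punctuation such as 。 and ， is skipped
                index[ch].append((rec_id, offset))
                freq[ch] += 1
    return index, freq


sample = """<year ruler="隱公" n="1">
元年春王正月。
三月，公及邾儀父盟于蔑。
</year>"""

records = parse_temporal(sample)
index, freq = build_index(records)
print(freq.most_common(2))  # [('月', 2), ('元', 1)]
print(index["月"])           # [(0, 5), (1, 1)]
```

Per-character counts of this kind, computed separately for each text, are the sort of raw material a comparative frequency table of the Chunqiu and Zuozhuan could start from.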
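The fallback order described in step 7 (bulk list first, then CC-CEDICT, then manual checking against Unihan) can be sketched as follows. The regular expression follows CC-CEDICT's published line format, `Traditional Simplified [pin1 yin1] /gloss/gloss/`; the bulk list is modeled as a plain dictionary because the format of the data extracted from Rick Harbaugh's site is not described. `parse_cedict` and `assign_pinyin` are hypothetical names, and taking the first listed reading for a polyphonic character is a simplification.

```python
import re

# One CC-CEDICT entry per line: Traditional Simplified [pin1 yin1] /gloss/gloss/
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]+)\] /(.+)/$")


def parse_cedict(lines):
    """Return {traditional_char: [(pinyin, glosses), ...]} for single-character entries."""
    table = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        match = CEDICT_LINE.match(line)
        if not match:
            continue
        trad, _simp, pinyin, glosses = match.groups()
        if len(trad) == 1:  # multi-character words are not needed here
            table.setdefault(trad, []).append((pinyin, glosses.split("/")))
    return table


def assign_pinyin(chars, primary, cedict):
    """Resolve each character: bulk list first, then CC-CEDICT, else leave it for review."""
    resolved, unresolved = {}, []
    for ch in chars:
        if ch in primary:
            resolved[ch] = primary[ch]
        elif ch in cedict:
            resolved[ch] = cedict[ch][0][0]  # first listed reading only
        else:
            unresolved.append(ch)  # to be checked by hand, e.g. in Unihan
    return resolved, unresolved


# Illustrative, abbreviated lines in CC-CEDICT format.
cedict = parse_cedict([
    "# CC-CEDICT sample",
    "月 月 [yue4] /moon/month/",
    "邾 邾 [Zhu1] /surname Zhu/",
    "中國 中国 [Zhong1 guo2] /China/",
])
primary = {"元": "yuan2"}  # hypothetical bulk-extracted readings
resolved, unresolved = assign_pinyin(["元", "月", "邾", "蔑"], primary, cedict)
print(resolved)    # {'元': 'yuan2', '月': 'yue4', '邾': 'Zhu1'}
print(unresolved)  # ['蔑']
```

Keeping unresolved characters in a separate list, rather than guessing a reading, mirrors the manual consultation of Unihan for the hard cases.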
 
Other credits
I am also indebted to the CS Department of the University of Toronto (Profs. G. Hirst, N. Koudas, G. Penn, G. Wilson, et al.) for introducing me to the latest computational linguistics approaches, which were indispensable for this project.