About the ECCE Corpus 1.0 (ECCE英汉社论平行语料库 1.0)
Download the corpus here.
The ECCE corpus 1.0 (ECCE, pronounced /'eki/, is short for the English Chinese Corpus of Editorials) was created by Linwei Yang and his MA students at Yantai University before Linwei joined the PhD programme at the National Research Centre for Foreign Language Education of Beijing Foreign Studies University.
The bilingual texts of ECCE were originally extracted from the Financial Times website and sentence-aligned by Linwei's team. The earlier online version of the ECCE corpus 1.0 (known as 'Bilingual FT Editorial Corpus') is hosted at http://www.icorpus.net/application/ft/. The corpus was post-edited before it was uploaded to http://corpus.bfsu.edu.cn/channels/corpus by Jiajin Xu.
The publication dates of the texts span from 16 September 2009 to 21 March 2014.
The ECCE 1.0 corpus is composed of 238,363 English words and 424,921 Chinese characters. (The token definition for English words is '[a-zA-Z0-9-]+', and '[\u4e00-\u9fa5]|[a-zA-Zａ-ｚＡ-Ｚ0-9０-９\.%％]+' for Chinese characters.)
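As a rough illustration of how these two token definitions behave, the sketch below applies them as regular expressions in Python. The function names and sample sentences are hypothetical and are not part of the corpus distribution; the patterns are copied from the definitions above, and the script is not claimed to be the one used to produce the published counts.

```python
import re

# Patterns copied from the token definitions quoted above.
EN_TOKEN = re.compile(r"[a-zA-Z0-9-]+")
ZH_TOKEN = re.compile(r"[\u4e00-\u9fa5]|[a-zA-Zａ-ｚＡ-Ｚ0-9０-９\.%％]+")


def count_en_tokens(text: str) -> int:
    """Count English words: runs of ASCII letters, digits and hyphens."""
    return len(EN_TOKEN.findall(text))


def count_zh_tokens(text: str) -> int:
    """Count Chinese tokens: each Han character in the range is one token;
    a run of (half- or full-width) letters, digits, '.', '%' is one token."""
    return len(ZH_TOKEN.findall(text))


if __name__ == "__main__":
    # Hypothetical sample sentences, not taken from the corpus.
    print(count_en_tokens("The state-owned bank cut rates by 3 points."))
    print(count_zh_tokens("银行降息3.5%。"))
```

Under these definitions a hyphenated form such as "state-owned" counts as one English word, and a figure such as "3.5%" embedded in Chinese text counts as a single token, while punctuation such as "。" is not counted.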
The texts are provided both as plain text, encoded in UTF-8 and in ANSI (GB2312, code page 936), and as an SQL database.
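For readers whose tools expect only one of the two plain-text encodings, the following minimal sketch converts ANSI (code page 936) bytes to UTF-8 in Python. The function name is hypothetical, and it assumes the ANSI files decode cleanly with Python's built-in "cp936" codec, which covers GB2312 text.

```python
def ansi_to_utf8(data: bytes) -> bytes:
    """Decode bytes stored in code page 936 and re-encode them as UTF-8.

    Assumption: the input is valid cp936 (a superset of GB2312);
    invalid bytes raise UnicodeDecodeError rather than being dropped.
    """
    return data.decode("cp936").encode("utf-8")


if __name__ == "__main__":
    # Hypothetical sample: a GB2312-encoded string round-tripped to UTF-8.
    sample = "社论".encode("gb2312")
    print(ansi_to_utf8(sample).decode("utf-8"))
```

In practice the same codec name can be passed directly when opening a file, for example `open(path, encoding="cp936")`, where `path` stands for whichever ANSI file is being read.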
The ECCE_1.0_EN_ZH_ANSI version of the ECCE corpus 1.0 can be searched with SDAU-ParaConc, available at http://gexiaoshuai.top/software/SDAU-ParaConc.zip.
Yang, Linwei. 2016. ECCE 1.0: The Bilingual FT Editorial Corpus.（杨林伟，2016，ECCE英汉社论平行语料库。）