Published: 2026-06-09

Release of the CONE Corpus of Contemporary Chinese

The launch event for the CONE Corpus of Contemporary Chinese (The CONE Corpus: A Corpus of Oral, Network, and Edited Chinese) was held from 14:00 to 15:00 on June 9, 2026. Professor Jiajin Xu presided over the release of the corpus. More than 20 people attended, including members of the Beijing Foreign Studies University (BFSU) Corpus Research Group and faculty and student representatives from several other universities in Beijing.

With a total size of 50 million words, the CONE Corpus of Contemporary Chinese is a major piece of infrastructure for research on contemporary Chinese. It is built on the tripartite concept of "Oral Chinese, Network Chinese, Edited Chinese" and aims to reflect systematically how modern Chinese is actually used across different media and registers.

The CONE Corpus consists of three sub-corpora of roughly equal size, each containing approximately 17 million tokenized words: OralCONE (Oral Chinese Sub-corpus), NetworkCONE (Network Chinese Sub-corpus), and EditedCONE (Written/Edited Chinese Sub-corpus).

The three sub-corpora draw respectively on the sampling frameworks of established international corpora: the International Corpus of English (ICE), the Corpus of Online Registers of English (CORE), and the Brown Corpus. The project aims for a high standard of balance, representativeness, and comparability.

In the next phase, the BFSU Corpus Research Group will launch MiniCONE, a streamlined version of the corpus totaling about 3 million words. It will comprise MiniOralCONE (1 million words), MiniNetworkCONE (1 million words), and MiniEditedCONE (1 million words). This version is designed to meet the need for rapid application in teaching and academic research.
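As a rough illustration of the composition described above, the following Python sketch tabulates the announced sizes. All figures are the rounded values given in this announcement, and the dictionary and function names are hypothetical; they are not part of any CONE tooling.

```python
# Approximate sizes in words, as stated in the announcement (rounded figures).
# The names FULL_CONE, MINI_CONE and the helpers below are illustrative only.
FULL_CONE = {
    "OralCONE": 17_000_000,
    "NetworkCONE": 17_000_000,
    "EditedCONE": 17_000_000,
}
MINI_CONE = {
    "MiniOralCONE": 1_000_000,
    "MiniNetworkCONE": 1_000_000,
    "MiniEditedCONE": 1_000_000,
}


def total_words(subcorpora):
    """Sum the word counts of all sub-corpora."""
    return sum(subcorpora.values())


def shares(subcorpora):
    """Return each sub-corpus's proportion of the total."""
    total = total_words(subcorpora)
    return {name: size / total for name, size in subcorpora.items()}


def mini_to_full_ratio():
    """Proportion of the full corpus that the planned MiniCONE would represent."""
    return total_words(MINI_CONE) / total_words(FULL_CONE)


if __name__ == "__main__":
    print("Full corpus (sum of rounded sub-corpus sizes):", total_words(FULL_CONE))
    print("MiniCONE total:", total_words(MINI_CONE))
    for name, share in shares(FULL_CONE).items():
        print(f"{name}: {share:.1%} of the full corpus")
    print(f"MiniCONE / full corpus: {mini_to_full_ratio():.1%}")
```

The rounded sub-corpus sizes sum to about 51 million words, which is consistent with the stated total of roughly 50 million. On these figures, MiniCONE would amount to about 6% of the full corpus, with the three-way balance preserved.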

Several Master's and Ph.D. students gave strong support to the construction of the CONE Corpus. We extend special thanks to Mingchen Sun for core contributions to building the corpus, and to Zhuoxuan Ren, Yingming Song, Yuhang Yang, and Likai Yin for their important support in collecting and organizing the data.

Citation

When using the CONE Corpus for research, please cite the following reference:

Xu, Jiajin & Mingchen Sun (forthcoming). A Frequency Dictionary of Mandarin Chinese: Core vocabulary for learners (2nd Edition). Routledge.

Corpus Access URL: http://114.251.154.212/cqp/

Credentials: Username: test; Password: test

CONE Corpus Introduction Page: https://corpus.bfsu.edu.cn/CONE.html