DEAP Baby (V1.0)
The DEAP Baby (V1.0) Corpus is a balanced, multi-discipline English for Academic Purposes corpus resampled from the 125-million-word DEAP (Database of English for Academic Purposes) Corpus. Fifty thousand words were sampled from each of the twenty-five disciplines. The sentences of the randomly extracted texts were shuffled to deliberately disrupt the original text organisation for the sake of copyright protection.
All texts were gathered from high-impact international scholarly journals published between 2011 and 2021.
The DEAP Baby Corpus can be used to compile academic word lists and lists of formulaic expressions, and to support other types of intra-sentential lexico-grammatical research as well as EAP pedagogy.
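As an illustration of the word-list use case, the following is a minimal sketch of a frequency-and-range list, not the project's own tooling. It assumes the texts have already been loaded into a dictionary keyed by discipline; the function name `word_list`, the `min_range` threshold and the lowercasing step are hypothetical choices made for the example. The token pattern is the one given in the corpus's key information.

```python
import re
from collections import Counter

# Token pattern taken from the corpus documentation: [a-zA-Z0-9-]+
TOKEN_RE = re.compile(r"[a-zA-Z0-9-]+")


def word_list(texts_by_discipline, min_range=2):
    """Return (word, frequency, range) rows, most frequent first.

    `range` is the number of disciplines in which the word occurs.
    `min_range` is an illustrative threshold, not a value from the corpus.
    """
    freq = Counter()    # total occurrences across all disciplines
    spread = Counter()  # number of disciplines containing the word
    for text in texts_by_discipline.values():
        tokens = [t.lower() for t in TOKEN_RE.findall(text)]
        freq.update(tokens)
        spread.update(set(tokens))
    rows = [(w, freq[w], spread[w]) for w in freq if spread[w] >= min_range]
    # Sort by descending frequency, then alphabetically for stable output
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


if __name__ == "__main__":
    # Toy input; real use would load the downloaded corpus files instead
    sample = {
        "biology": "The cell divides. The data show growth.",
        "physics": "The data show decay.",
    }
    for word, f, r in word_list(sample):
        print(word, f, r)
```

Because sentences in the corpus are shuffled, counts of this kind remain meaningful at the word and sentence level, while anything that depends on discourse order across sentences does not.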
Key information on DEAP Baby (V1.0)
Corpus size: 1.25 million words (N.B. Word token definition: [a-zA-Z0-9-]+)
Word count of each discipline: 50,000 words
Number of disciplines: 25
Text encoding: UTF-8 (with BOM)
Date of release: 28 July 2022
Project leader: Jiajin Xu
Text preparation: Mingchen Sun
Proofreaders: Liangping Wu & Jiajin Xu
How to cite: Sun, Mingchen & Jiajin Xu. (2022). DEAP Baby Corpus V1.0. National Research Centre for Foreign Language Education, Beijing Foreign Studies University
Online concordancing: http://114.251.154.212/cqp/ (User ID: test; Passcode: test)
Full-text download URL: https://corpus.bfsu.edu.cn/DEAPBabyV1.zip
Information on the parent project 'The DEAP Corpus': https://corpus.bfsu.edu.cn/info/1082/1561.htm
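The stated token definition and text encoding can be combined into a short word-count check. The sketch below is an assumption-laden example rather than the script used to build the corpus: the helper names `count_tokens` and `count_file_tokens` are hypothetical, and the layout of the downloaded archive is not assumed. Python's `utf-8-sig` codec is used because it strips a leading byte order mark when one is present and still reads plain UTF-8.

```python
import re
from pathlib import Path

# Word token definition from the key information above: [a-zA-Z0-9-]+
TOKEN_RE = re.compile(r"[a-zA-Z0-9-]+")


def count_tokens(text):
    """Count tokens in a string using the documented pattern."""
    return len(TOKEN_RE.findall(text))


def count_file_tokens(path):
    """Count tokens in one text file (hypothetical helper).

    "utf-8-sig" removes a leading BOM if there is one, so the BOM never
    ends up attached to the first line of the text.
    """
    text = Path(path).read_text(encoding="utf-8-sig")
    return count_tokens(text)
```

Under this pattern a hyphenated form such as "state-of-the-art" counts as one token, whereas "don't" is split at the apostrophe into two, and a free-standing hyphen also matches. Counts from tokenisers with different rules should therefore not be expected to reproduce the 50,000-word figure per discipline exactly.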
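Basic information on the DEAP Baby (V1.0) Corpus
Chinese name: DEAP学术英语萃取库1.0版
English name: DEAP Baby Corpus V1.0
Total corpus size: 1.25 million words
Size per discipline: 50,000 words
Number of disciplines: 25
Text encoding: UTF-8
Date of completion: 28 July 2022
Design: Jiajin Xu (许家金)
Compilation: Mingchen Sun (孙铭辰)
Proofreading: Liangping Wu (吴良平), Jiajin Xu (许家金)
Citation: Sun, Mingchen (孙铭辰) & Jiajin Xu (许家金). 2022. DEAP Baby Corpus V1.0 (DEAP学术英语萃取库1.0版).
Online search: http://114.251.154.212/cqp/ (corpus: DEAP Baby Corpus V1.0; account: test; password: test)
Full-text download: https://corpus.bfsu.edu.cn/DEAPBabyV1.zip
Introduction to the DEAP Corpus: https://corpus.bfsu.edu.cn/info/1082/1561.htm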
National Research Centre for Foreign Language Education, Beijing Foreign Studies University
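Corpus Team, National Research Centre for Foreign Language Education, Beijing Foreign Studies University (北京外国语大学中国外语与教育研究中心语料库团队)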
https://corpus.bfsu.edu.cn