
A New Tale of Two C's: CROWN2021 and Upcoming CLOB2021

Published: 2022-11-09

CROWN2021 is a balanced, one-million-word Brown family corpus of American English containing texts published in 2021. It was developed under the leadership of Prof. Jiajin Xu, and the texts were collected by Mingchen Sun and 12 other graduate students at Beijing Foreign Studies University (BFSU). CROWN2021 serves as an updated resource for present-day written American English and as a reference corpus for contrastive studies of diachronic variation (with Brown, Frown and Crown), regional variation (with LOB, FLOB and CLOB) and cross-linguistic comparison (with LCMC and the ToRCH and GLOBE family corpora). The online version of CROWN2021 and the other BFSU-made Brown family corpora can be accessed through the BFSU CQPweb Corpus Portal (http://114.251.154.212/cqp/). Both the user ID and the password are "test".
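The "one million words" figure follows from the Brown sampling frame: 500 samples of about 2,000 words each, spread over 15 genre categories. The sketch below encodes the original Brown Corpus category quotas and checks that arithmetic. It assumes, without confirmation from this announcement, that CROWN2021 reuses the Brown quotas unchanged; the announcement only states that it follows "the Brown Corpus model". The names `BROWN_SAMPLING_FRAME`, `total_samples` and `nominal_size` are illustrative, not part of any corpus distribution.

```python
# Brown Corpus sampling frame: category code -> (genre label, number of samples).
# Assumption: CROWN2021 uses these same quotas (not stated explicitly in the source).
BROWN_SAMPLING_FRAME = {
    "A": ("Press: reportage", 44),
    "B": ("Press: editorial", 27),
    "C": ("Press: reviews", 17),
    "D": ("Religion", 17),
    "E": ("Skills and hobbies", 36),
    "F": ("Popular lore", 48),
    "G": ("Belles lettres, biography, memoirs", 75),
    "H": ("Miscellaneous (government documents, reports)", 30),
    "J": ("Learned and scientific writing", 80),
    "K": ("General fiction", 29),
    "L": ("Mystery and detective fiction", 24),
    "M": ("Science fiction", 6),
    "N": ("Adventure and western fiction", 29),
    "P": ("Romance and love story", 29),
    "R": ("Humor", 9),
}

# Nominal sample length; real samples are "2000+" words, so totals are approximate.
WORDS_PER_SAMPLE = 2000


def total_samples(frame):
    """Sum the sample quotas over all genre categories."""
    return sum(n for _, n in frame.values())


def nominal_size(frame, words_per_sample=WORDS_PER_SAMPLE):
    """Approximate corpus size in words: samples x words per sample."""
    return total_samples(frame) * words_per_sample


def informative_imaginative_split(frame):
    """Brown convention: categories A-J are informative prose, K-R imaginative."""
    informative = sum(n for code, (_, n) in frame.items() if code <= "J")
    return informative, total_samples(frame) - informative


if __name__ == "__main__":
    print(total_samples(BROWN_SAMPLING_FRAME))  # 500
    print(nominal_size(BROWN_SAMPLING_FRAME))   # 1000000
    print(informative_imaginative_split(BROWN_SAMPLING_FRAME))  # (374, 126)
```

Keeping the quotas in one table makes it easy to compare Brown family corpora category by category, for example when normalising frequencies per genre across Brown, Frown, Crown and CROWN2021.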

KEY INFORMATION

Project leader: Jiajin Xu of the National Research Centre for Foreign Language Education (NRCFLE), BFSU

Text collectors: Mingchen Sun (359 texts), Yagang Chen (47 texts), Shujuan Deng (21 texts), Tingyan Zhangchen (19 texts), Meijia Hao (15 texts), Xingke Lv (13 texts), Jiaxi Shen (5 texts), Yuanyuan Lin (4 texts), Junyu Mao (4 texts), Xinzhi Yang (4 texts), Zinuo Zuo (4 texts), Xinkai Deng (3 texts), Ruotong Zha (2 texts)

Time of compilation: April 2022 - October 2022

Size: Approximately one million words

Language: Contemporary American English

Number of texts/samples: 500 samples of 2,000+ words each (short texts are combined to make up one 2,000-word sample, but each is saved as a separate file marked A, B, C, etc. in the filename)

Sampling strategy: The Brown Corpus model (see: http://korpus.uib.no/icame/manuals/BROWN/INDEX.HTM)

Period: The texts were published in 2021.

Released in: November 2022

POS tagset: The BNC Basic (C5) Tagset

POS Tagger: TreeTagger

Lemmatiser: TreeTagger

Sentence Segmenter: spaCy
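TreeTagger writes its output one token per line as tab-separated word, tag and lemma, printing `<unknown>` as the lemma when it cannot lemmatise a token. The sketch below reads that format and counts lemmas by C5 tag prefix (for instance `NN` for common nouns, `VV` for lexical verbs). It assumes the tagged files are in plain TreeTagger output format, which this announcement does not specify; the function names and the sample sentence are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Token = Tuple[str, str, str]  # (word, C5 tag, lemma)


def parse_treetagger(lines: Iterable[str]) -> Iterator[Token]:
    """Yield (word, tag, lemma) triples from one-token-per-line TreeTagger output."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank lines and structural markup such as <s>
        word, tag, lemma = parts
        if lemma == "<unknown>":
            lemma = word  # fall back to the word form when no lemma was found
        yield word, tag, lemma


def lemma_freq(tokens: Iterable[Token], tag_prefix: str) -> Counter:
    """Count lowercased lemmas whose C5 tag starts with tag_prefix."""
    return Counter(lemma.lower() for _, tag, lemma in tokens if tag.startswith(tag_prefix))


# Hypothetical tagged sentence in TreeTagger's tab-separated format with C5 tags.
SAMPLE = """<s>
The\tAT0\tthe
corpora\tNN2\tcorpus
were\tVBD\tbe
compiled\tVVN\tcompile
in\tPRP\tin
2022\tCRD\t2022
.\tPUN\t.
</s>""".splitlines()

if __name__ == "__main__":
    tokens = list(parse_treetagger(SAMPLE))
    print(len(tokens))                # 7
    print(lemma_freq(tokens, "NN"))   # Counter({'corpus': 1})
    print(lemma_freq(tokens, "VV"))   # Counter({'compile': 1})
```

Matching on tag prefixes rather than exact tags groups related C5 tags (NN0, NN1, NN2) without listing each one; for the online version, equivalent counts can be obtained through CQPweb's own query and frequency tools.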

How to cite:

Mingchen Sun, Jiajin Xu et al. 2022. The CROWN2021 Corpus. National Research Centre for Foreign Language Education, Beijing Foreign Studies University.

Related work:

Xu, Jiajin & Maocheng Liang. 2013. A tale of two C's: Comparing English varieties with Crown and CLOB (The 2009 Brown family corpora). ICAME Journal 37: 175-183.