A chronology of corpus linguistics

A chronology of corpus linguistics since the pre-electronic age (collated by Jiajin Xu of Beijing Foreign Studies University)

1583/1828

The Latin word ‘corpus’ was used to refer to a collection of documents in addition to its etymological sense of human body. (Source: Gothofredi, D. 1583/1828. Corpus Juris Civilis Romani (Tomus Primus). Neapoli: Apud Januarium Mirelli Bibliopolam.)

1820 or earlier

John Freeman compiled a list of frequent words to teach adults to read. (Source: ‘A method of teaching adult persons to read ; which is designed to obviate their objections and accelerate their progress.’ Reprinted as ‘On grammalogues: To the Editor of the Phonotypic Journal. The Phonotypic Journal 2(24): 170-171.’)

1838/1843

Sir Isaac Pitman produced alphabetic and numerical arrangements of frequent words based on 10,000 words taken from 20 books, 500 from each. (Source: Pitman, I. 1843. List of words from which grammalogues may be selected. The Phonotypic Journal 2(23): 161-163.)

1897/1898

Kaeding, F. Häufigkeitswörterbuch der Deutschen Sprache. Berlin: Self-published.

1922

According to Malinowski (1922: 18-19), [he] was thus acquiring…an abundant linguistic material, and a series of ethnographic documents…. This corpus inscriptionum Kiriwiniensium…[a] collection of…characteristic narratives, typical utterances,…, as documents of native mentality. (Source: Malinowski, B. (1922). Argonauts of the Western Pacific. London: Routledge & Kegan Paul Ltd.)

1935

Zipf, George Kingsley. (1935). The Psycho-biology of Language: An introduction to dynamic philology. Boston: Houghton Mifflin Company. There is a 1936 version by George Routledge and Sons, Ltd and a 1968 version published by the MIT Press.

1956

The analysis here presented is based on the speech of a single informant…and in particular upon a corpus of material, of which a large proportion was narrative, derived from approximately 100 hours of listening. (Source: Allen, W. (1956). Structure and system in the Abaza verbal complex. Transactions of the Philological Society 55(1): 127-176, at p. 128.)

Whatmough, Joshua. (1956). Poetic, Scientific and Other Forms of Discourse: A new approach to Greek and Latin literature. Berkeley: University of California Press.

1964

The Brown Corpus (A Standard Corpus of Present-Day Edited American English) project was completed. (Source: Francis, N. & H. Kučera. (1967). Computational Analysis of Present-day American English. Providence: Brown University Press.)

1966

Herdan, Gustav. (1966). The Advanced Theory of Language as Choice and Chance. Berlin: Springer.

1982

Aarts, J. & T. van den Heuvel. (1982). Grammars and intuitions in Corpus Linguistics. In S. Johansson (ed.). Computer Corpora in English Language Research. Bergen: Norwegian Computing Centre for the Humanities. 66-84.

(to be updated)

Corpora compiled by CLSC members

JDEST (Jiao Da English for Science and Technology):

New JDEST:

CLEC:

SWECCL:

Crown: A Brown family corpus of American English comprising one million words of texts published largely in 2009, developed under the leadership of Jiajin Xu and Maocheng Liang. An article describing the corpus was published in the 2013 issue of ICAME Journal. Download Crown (18.2MB). Publications based on the Crown and CLOB corpora can be found here. A detailed description of the Crown corpus is available at the CoRD corpus resource database of Helsinki University.

CLOB: A Brown family corpus of British English comprising one million words of texts published largely in 2009, developed under the leadership of Jiajin Xu and Maocheng Liang. An article describing the corpus was published in the 2013 issue of ICAME Journal. Download CLOB (18.2MB). Publications based on the Crown and CLOB corpora can be found here. A detailed description of the CLOB corpus is available at the CoRD corpus resource database of Helsinki University.

The TECCL corpus: Ten-thousand English Compositions of Chinese Learners
