A chronology of corpus linguistics since the pre-electronic age (collated by Jiajin Xu of Beijing Foreign Studies University)
The Latin word ‘corpus’ was used to refer to a collection of documents in addition to its etymological sense of human body. (Source: Gothofredi, D. 1583/1828. Corpus Juris Civilis Romani (Tomus Primus). Neapoli: Apud Januarium Mirelli Bibliopolam.)
1820 or earlier
John Freeman compiled a frequency list to teach adults to read. (Source: ‘A method of teaching adult persons to read; which is designed to obviate their objections and accelerate their progress.’ Reprinted as ‘On grammalogues: To the Editor of the Phonotypic Journal.’ The Phonotypic Journal 2(24): 170-171.)
Sir Isaac Pitman produced alphabetical and numerical arrangements of frequent words based on 10,000 words taken from 20 books, 500 from each. (Source: Pitman, I. 1843. List of words from which grammalogues may be selected. The Phonotypic Journal 2(23): 161-163.)
Kaeding, F. Häufigkeitswörterbuch der Deutschen Sprache. Berlin: Self-published.
According to Malinowski (1922: 18-19), [he] was thus acquiring…an abundant linguistic material, and a series of ethnographic documents…. This corpus inscriptionum Kiriwiniensium…[a] collection of…characteristic narratives, typical utterances,…, as documents of native mentality. (Source: Malinowski, B. (1922). Argonauts of the Western Pacific. London: Routledge & Kegan Paul Ltd.)
Zipf, George Kingsley. (1935). The Psycho-biology of Language: An introduction to dynamic philology. Boston: Houghton Mifflin Company. There is a 1936 version by George Routledge and Sons, Ltd and a 1968 version published by the MIT Press.
The analysis here presented is based on the speech of a single informant…and in particular upon a corpus of material, of which a large proportion was narrative, derived from approximately 100 hours of listening. (Source: At page 128 of Allen, W. (1956). Structure and system in the Abaza verbal complex. Transactions of the Philological Society 55(1): 127-176.)
Whatmough, Joshua. (1956). Poetic, Scientific and Other Forms of Discourse: A new approach to Greek and Latin literature. Berkeley: University of California Press.
The completion of the Brown Corpus (A Standard Corpus of Present-Day Edited American English) project. (Source: Francis, N. & H. Kučera. (1967). Computational Analysis of Present-day American English. Providence: Brown University Press.)
Herdan, Gustav. (1966). The Advanced Theory of Language as Choice and Chance. Berlin: Springer.
Aarts, J. & T. van den Heuvel. (1982). Grammars and intuitions in Corpus Linguistics. In S. Johansson (ed.). Computer Corpora in English Language Research. Bergen: Norwegian Computing Centre for the Humanities. 66-84.
(to be updated)
Corpora compiled by CLSC members
JDEST (Jiao Da English for Science and Technology):
Crown: A one-million-word Brown family corpus of American English, with texts published largely in 2009, developed under the leadership of Jiajin Xu and Maocheng Liang. An article describing the corpus was published in the 2013 issue of ICAME Journal. Download Crown (18.2MB). Publications based on the Crown and CLOB corpora can be found here. A detailed description of the Crown corpus is available in the CoRD corpus resource database of the University of Helsinki.
CLOB: A one-million-word Brown family corpus of British English, with texts published largely in 2009, developed under the leadership of Jiajin Xu and Maocheng Liang. An article describing the corpus was published in the 2013 issue of ICAME Journal. Download CLOB (18.2MB). Publications based on the Crown and CLOB corpora can be found here. A detailed description of the CLOB corpus is available in the CoRD corpus resource database of the University of Helsinki.
The TECCL corpus: Ten-thousand English Compositions of Chinese Learners