Dear all,
The Sketch Engine now supports term extraction for many languages, and we want to evaluate it.
To do that, we need domain corpora in which someone has gone through and identified all the 'true' terms. We can then compute our system's precision and recall against them.
We are aware of GENIA for English and are already using it (key citation: Z. Zhang, J. Iria, C. A. Brewster and F. Ciravegna, 2008, "A comparative evaluation of term recognition algorithms").
Any corpus that comes with a conscientiously produced list of the terms it contains will help us.
Pointers please!
Adam Kilgarriff
http://www.sketchengine.co.uk/