QA@CLEF-2004
Resources
| Judged Submissions of the CLEF-2004 QA Track |
Eighteen groups participated in the CLEF-2004 QA evaluation exercise, submitting 48 runs in 19 different tasks.
Submissions have been judged by human assessors and grouped according to the target language of the tasks.
Here you can download them (zip file).
| Test Sets at CLEF-2003: |
Three monolingual tasks (with Dutch, Italian and Spanish questions) and five
bilingual tasks (where Dutch, French, German, Italian and Spanish
queries searched for an answer in an English target corpus) were proposed at CLEF-2003.
Here are the original test sets that were distributed to participants. Each test collection is a plain text file.
Please visit
last year's web site
for further information about the format.
Correct answers were manually retrieved and are included in the "DISEQuA" and "Multisix"
corpora (see below).
| DISEQuA corpus: |
The Dutch, Italian, Spanish and English collection of Questions and
Answers was developed by three research groups: ITC-irst (Centro per la Ricerca Scientifica e
Tecnologica, Trento - Italy), UNED (Spanish Distance Learning University, Madrid - Spain) and ILLC
(Language and Inference Technology Group, University of Amsterdam - The Netherlands).
It is composed of 450 questions formulated in four languages. The answers have been manually searched for
in three document collections, which makes it possible to test and train cross-language QA systems in twelve different
combinations. The corpora in which the answers were retrieved are those licensed by the CLEF consortium
in 2002: La Stampa and SDA newspaper/wire articles (year 1994) for Italian, EFE
(year 1994) for Spanish and Algemeen Dagblad and NRC Handelsblad (years 1994 and 1995) for
Dutch. The questions also appear in English, but they were not verified against an English document collection.
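The count of twelve combinations can be reproduced with a short sketch. It assumes that a "combination" is a pair of question language and document-collection language, which is one plausible reading of the description above; under that reading three of the twelve pairs are same-language pairs. The names `QUESTION_LANGUAGES`, `DOCUMENT_LANGUAGES` and `language_pairs` are illustrative and are not part of the corpus distribution.

```python
from itertools import product

# Languages in which the 450 questions are formulated (from the description above).
QUESTION_LANGUAGES = ["Dutch", "Italian", "Spanish", "English"]

# Languages of the three document collections in which answers were searched.
# English is absent: the English questions were not verified in a collection.
DOCUMENT_LANGUAGES = ["Dutch", "Italian", "Spanish"]


def language_pairs():
    """Return every (question language, collection language) pair."""
    return list(product(QUESTION_LANGUAGES, DOCUMENT_LANGUAGES))


pairs = language_pairs()
# Pairs where the question and the collection are in different languages.
cross_language = [(q, d) for q, d in pairs if q != d]

print(len(pairs))           # 12
print(len(cross_language))  # 9
```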
The reference publication (to be acknowledged whenever you use DISEQuA) is
B. Magnini, S. Romagnoli, A. Vallin, J. Herrera, A. Peñas, V. Peinado, F. Verdejo, M. de Rijke,
Creating the DISEQuA Corpus: a Test Set for Multilingual Question Answering, in
Carol Peters, editor, Working Notes for the CLEF 2003 Workshop, 21-22 August, Trondheim, Norway, 2003.
For further information, read a short
description
of the corpus.
Here
you can download version 1.0 of DISEQuA (zip file).
| Multisix corpus: |
The test sets we used for the cross-language tasks at CLEF QA-2003 are collected in the Multisix corpus,
which is a collection of 200 English questions whose answers have been manually searched for in the Los Angeles Times
corpus (year 1994) licensed last year by CLEF. Each question has been translated into five languages: Dutch,
French, German, Italian and Spanish, but no manual processing was conducted in other document collections.
Some typos were recently found and corrected in German questions, so some entries in the "Multisix corpus" are slightly
different from those in the original test sets (which can be downloaded above).
The reference publication (to be acknowledged whenever you use Multisix) is
B. Magnini, S. Romagnoli, A. Vallin, J. Herrera, A. Peñas, V. Peinado, F. Verdejo, M. de Rijke,
The Multiple Language Question Answering Track at CLEF 2003
(see chapter "Gold Standard for the Cross-Language Tasks"), in Carol Peters, editor, Working Notes for the CLEF
2003 Workshop, 21-22 August, Trondheim, Norway, 2003.
For further information, read a short
description.
Here
you can download the revised version (v2) of the Multisix corpus (zip file).
| Check input utilities: |
Before submitting their results, participants should run this checking routine in order to detect format inconsistencies
(invalid document numbers, missing data, etc.) in their runs. Submissions that are not compliant with the required
format will not be assessed. For a detailed description of the answer format, please refer to the
track guidelines.
Download the checking routine for
CLEF-2003 QA track.
Download the checking routine for
CLEF-2004 QA track.
| Italian Translation of the TREC Questions: |
ITC-irst has translated into Italian 1000 questions released for the QA tracks at TREC-2002 and TREC-2003.
They represent a good example of what CLEF questions for this year's tasks may look like, and they can be
used for training.
As with the DISEQuA corpus (see above), the translation of the two TREC question
sets is given in two XML files, where queries are numbered and described according to the category
they belong to (either FACTOID, LIST or DEFINITION) and their answer type, i.e. the instance they refer to.
Several kinds of answer types have been taken into account: LOCATION (a place), PERSON (someone's name or role),
TIME (the date of an event), MEASURE (the amount of something), MATERIAL (a particular substance), HOW
(questions like "How did something happen?"), TITLE (the title of a song, movie, book, etc.), ACRONYM
(the meaning of an abbreviation) and OTHER (plants, animals, inanimate objects, etc.). In most cases,
the right answer is provided.
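As a rough illustration of how the category and answer-type labels might be used, the sketch below tallies questions per label pair and rejects labels outside the two sets listed above. The XML layout is an assumption made for this example only: the element name `q`, the attributes `id`, `category` and `type`, and the sample entries are all hypothetical and do not reflect the schema or content of the distributed files, which should be checked before adapting the code.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Labels taken from the description above.
CATEGORIES = {"FACTOID", "LIST", "DEFINITION"}
ANSWER_TYPES = {
    "LOCATION", "PERSON", "TIME", "MEASURE", "MATERIAL",
    "HOW", "TITLE", "ACRONYM", "OTHER",
}

# Hypothetical layout: element and attribute names are assumptions,
# not the schema of the distributed files. The entries are placeholders.
SAMPLE = """\
<questions>
  <q id="1" category="FACTOID" type="LOCATION">(question text)</q>
  <q id="2" category="FACTOID" type="PERSON">(question text)</q>
  <q id="3" category="DEFINITION" type="OTHER">(question text)</q>
</questions>
"""


def tally(xml_text):
    """Count questions per (category, answer type), rejecting unknown labels."""
    counts = Counter()
    for q in ET.fromstring(xml_text).iter("q"):
        category, answer_type = q.get("category"), q.get("type")
        if category not in CATEGORIES or answer_type not in ANSWER_TYPES:
            raise ValueError(f"question {q.get('id')}: unexpected label")
        counts[(category, answer_type)] += 1
    return counts


if __name__ == "__main__":
    for (category, answer_type), n in sorted(tally(SAMPLE).items()):
        print(category, answer_type, n)
```

A tally of this kind is a quick way to see how a training set is distributed across answer types before using it to tune a system.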
This translation represents a growing resource, and you are all encouraged to add other languages and
other useful descriptive tags.
Download the translation of the
TREC-2002
questions. (zip file)
Download the translation of the
TREC-2003
questions. (zip file)
| Test Set for Italian Named-Entities Recognition: |
Annotated text represents another useful resource you may use to test and improve your system. ITC-irst
provides the transcribed text of Italian broadcasts, in which the entities LOCATION, PERSON and ORGANIZATION
have been marked with tags, according to the NIST guidelines.
Download the
test set.
(tar.gz file)
| French Translation of the TREC Questions: |
The RALI group (Laboratoire de Recherche Appliquée en Linguistique Informatique) at the University of
Montreal, Canada, has translated into French 1893 questions drawn from the TREC QA evaluation exercises.
The file is available at the
RALI website.
| Spanish Resources: |
QA resources for Spanish (including the translation of the TREC questions) are available on the website
of the NLP and IR Group at UNED (Madrid, Spain).
URL:
http://terral.lsi.uned.es/QA/resources/
| Finnish Resources: |
The DOREMI research group at the University of Helsinki has posted some QA resources for Finnish, including translations of
the CLEF 2003 and 2004 test sets.
URL:
http://www.cs.helsinki.fi/research/doremi/interests/QAResources.shtml