ISCA - International Speech Communication Association

ISCApad #296

Tuesday, February 07, 2023 by Chris Wellekens

3-3-28 (2023-07-XX) Track 4: Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems - Eleventh Dialog System Technology Challenge (DSTC11.T4)

Call for Participation

TRACK GOALS AND DETAILS: Two main goals and tasks:
• Task 1: Propose and develop effective Automatic Metrics for evaluation of open-domain multilingual dialogs.
• Task 2: Propose and develop Robust Metrics for dialogue systems trained with back-translated and paraphrased dialogs in English.

EXPECTED PROPERTIES OF THE PROPOSED METRICS:
• High correlation with human-annotated assessments.
• Explainable metrics in terms of the quality of the model-generated responses.
• Participants can propose their own metric or optionally improve the baseline evaluation metric, Deep AM-FM (Zhang et al., 2020).
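
As an illustration of the first property, agreement between a metric and human judgments is often summarized with a rank correlation. The sketch below assumes Spearman correlation as the statistic; this call does not state which coefficient the organizers use, and the scores shown are invented for illustration only.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (std_x * std_y)


def spearman(metric_scores, human_scores):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(metric_scores) != len(human_scores):
        raise ValueError("both score lists must have the same length")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


# Hypothetical per-turn scores, not taken from the challenge data.
metric_scores = [0.82, 0.40, 0.65, 0.15, 0.90]
human_scores = [4, 2, 3, 1, 5]
rho = spearman(metric_scores, human_scores)
print(f"Spearman correlation: {rho:.3f}")
```

A value close to 1 means the metric orders the responses in nearly the same way as the human annotators; a value near 0 means the two orderings are unrelated.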

DATASETS:
For training: Up to 18 curated multilingual Human-Human datasets (more than 3M turns), with automatic turn- and dialogue-level annotations such as toxicity or sentiment analysis, among others.
Dev/Test: Up to 10 curated multilingual Human-Chatbot datasets (more than 150k turns), with human turn- and dialogue-level annotations, including QE metrics or cosine similarity.
The data are translated and back-translated into several languages (English, Spanish, and Chinese). Several paraphrases with annotations are also provided for each dataset.
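
For readers unfamiliar with the cosine similarity mentioned among the annotations, the sketch below shows the quantity itself. It assumes the similarity is taken between two numeric vectors, such as sentence embeddings; this call does not say which representations are used, and the vectors here are invented.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical embeddings of an original sentence and its back-translation.
original = [0.2, 0.7, 0.1]
back_translated = [0.25, 0.65, 0.05]
print(f"cosine similarity: {cosine_similarity(original, back_translated):.3f}")
```

Values range from -1 to 1, and vectors pointing in the same direction score 1 regardless of their length.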

BASELINE MODEL:

The default choice is Deep AM-FM (Zhang et al., 2020). This model has been adapted to evaluate multilingual datasets and to work with paraphrased and back-translated sentences.

GitHub: https://github.com/karthik19967829/DSTC11-Benchmark

REGISTRATION AND FURTHER INFORMATION:
ChatEval: https://chateval.org/dstc11
GitHub: https://github.com/Mario-RC/dstc11_track4_robust_multilingual_metrics

PROPOSED SCHEDULE:
Training/Validation data release: November to December 2022
Test data release: Mid-March 2023
Entry submission deadline: Mid-March 2023
Submission of final results: End of March 2023
Final result announcement: Early April 2023
Paper submission: March to May 2023
Workshop: July to September 2023, at a venue to be announced with DSTC11

ORGANIZATIONS:
Universidad Politécnica de Madrid (Spain)
National University of Singapore (Singapore)
Tencent AI Lab (China)
New York University (USA)
Carnegie Mellon University (USA)
