ISCA - International Speech
Communication Association



ISCApad #296

Tuesday, February 07, 2023 by Chris Wellekens

3-3-28 (2023-07-XX) Track 4: Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems - Eleventh Dialog System Technology Challenge (DSTC11.T4)
  

Call for Participation

TRACK GOALS AND DETAILS: The track has two main goals, each with its own task:
•    Task 1: Propose and develop effective automatic metrics for the evaluation of open-domain multilingual dialogues.
•    Task 2: Propose and develop robust metrics for dialogue systems trained with back-translated and paraphrased dialogues in English.

EXPECTED PROPERTIES OF THE PROPOSED METRICS:
•    High correlation with human-annotated assessments.
•    Explainable metrics in terms of the quality of the model-generated responses.
•    Participants can propose their own metric or, optionally, improve the baseline evaluation metric, Deep AM-FM (Zhang et al., 2020).
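
As an illustration of the first property, agreement between a metric and human judgments is commonly summarized with a rank correlation. The sketch below computes Spearman's rank correlation in plain Python. The choice of Spearman, the function names, and the toy scores are assumptions made for illustration only; this call does not specify the official scoring procedure or coefficient.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: one metric score and one human rating per response.
metric_scores = [0.91, 0.35, 0.60, 0.72, 0.15]
human_ratings = [5, 2, 3, 4, 1]

rho = spearman(metric_scores, human_ratings)
print(f"Spearman correlation: {rho:.3f}")
```

In this toy case the metric orders the five responses exactly as the human ratings do, so the correlation is 1. A real evaluation would use the human annotations shipped with the Dev/Test datasets.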

DATASETS:
Training: Up to 18 curated Human-Human multilingual datasets (+3M turns), with turn- and dialogue-level automatic annotations such as toxicity or sentiment analysis, among others.
Dev/Test: Up to 10 curated Human-Chatbot multilingual datasets (+150k turns), with turn- and dialogue-level human annotations, including QE metrics or cosine similarity.
The data have been translated and back-translated into several languages (English, Spanish and Chinese). Each dataset also includes several annotated paraphrases.

BASELINE MODEL:
The default baseline is Deep AM-FM (Zhang et al., 2020). The model has been adapted to evaluate multilingual datasets and to work with paraphrased and back-translated sentences.

REGISTRATION AND FURTHER INFORMATION:
ChatEval: https://chateval.org/dstc11
GitHub: https://github.com/Mario-RC/dstc11_track4_robust_multilingual_metrics

PROPOSED SCHEDULE:
Training/Validation data release: November to December 2022
Test data release: Mid-March 2023
Entry submission deadline: Mid-March 2023
Submission of final results: End of March 2023
Final result announcement: Early April 2023
Paper submission: March to May 2023
Workshop: July to September 2023, at a venue to be announced with DSTC11

ORGANIZATIONS:
Universidad Politécnica de Madrid (Spain)
National University of Singapore (Singapore)
Tencent AI Lab (China)
New York University (USA)
Carnegie Mellon University (USA)

