Towards a Detailed and Unbiased Automatic Evaluation of Multilingual Large Language Models

Unbabel, Lisbon

Closed

Description

Despite the broad adoption of Large Language Models (LLMs), their evaluation still relies largely on superficial numerical scores. In this thesis, the candidate will address this gap by developing detailed and unbiased automatic evaluation metrics for multilingual language models. The proposed methods will align closely with human judgments, mitigate bias, offer multi-dimensional feedback, allow for customization, and prioritize efficiency.