Towards a Detailed and Unbiased Automatic Evaluation of Multilingual Large Language Models
Unbabel, Lisbon
Closed
Description
Despite the broad adoption of Large Language Models (LLMs), their evaluation still relies primarily on superficial numerical scores. In this thesis, the candidate will address this gap by developing detailed and unbiased automatic evaluation metrics for multilingual language models. The proposed methods will align closely with human judgments, mitigate bias, provide multi-dimensional feedback, allow for customization, and prioritize efficiency.