Description

Despite the broad adoption of Large Language Models (LLMs), their evaluation still relies primarily on superficial numerical scores. In this thesis, the candidate will address this gap by developing detailed and unbiased automatic evaluation metrics for multilingual language models. The proposed methods will align closely with human judgments, tackle bias, offer multi-dimensional feedback, allow for […]
Ana Oliveira