Debasmita Das
Manager, Data Science at Mastercard
Debasmita is a data scientist with over nine years of experience in finance and artificial intelligence, focusing on generative AI, fraud detection, anti-money laundering, and HR analytics. She currently leads a Data Science team at Mastercard that builds AI-driven solutions. She has authored multiple research papers and holds two granted patents in this domain, and she was recognized as one of the ‘Women to Watch Out for in AI & Analytics’ at the 3AI PINNACLE AWARDS 2023. She holds an MBA from the Indian Institute of Management, Lucknow, and prior to her tenure at Mastercard, she was on the Innovation Team at JPMorgan Chase.
Watch live: May 8, 2024 @ 4:25 – 4:55 pm ET
How to Evaluate a Large Language Model
Evaluating large language models (LLMs) presents unique challenges due to their generative nature and the lack of ground-truth data. Traditional evaluation metrics used for discriminative models are often insufficient for assessing the quality, coherence, diversity, and usefulness of LLM-generated text. In this session, we will discuss several key considerations for evaluating LLMs, including qualitative analysis by human assessors, quantitative metrics such as perplexity and diversity scores, and domain-specific evaluation through downstream tasks. We will also cover the importance of benchmark datasets, reproducibility, and the need for standardized evaluation protocols to facilitate fair comparison and advancement in LLM research.
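As a concrete illustration of one of the quantitative metrics named above, here is a minimal sketch of computing perplexity for a causal language model. It assumes the Hugging Face transformers library and GPT-2 as a stand-in model; the session itself does not prescribe any particular toolkit or model.

```python
# Minimal perplexity sketch, assuming Hugging Face transformers
# and GPT-2 as a placeholder model (not specified by the session).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical choice for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Evaluating large language models presents unique challenges."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy
    # loss over the token sequence.
    outputs = model(**inputs, labels=inputs["input_ids"])

# Perplexity is the exponential of the average negative log-likelihood.
perplexity = torch.exp(outputs.loss).item()
print(f"Perplexity: {perplexity:.2f}")
```

Lower perplexity means the model assigns higher probability to the reference text, but as the abstract notes, such scores alone do not capture coherence or usefulness, which is why human assessment and downstream-task evaluation remain essential complements.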