[CCoE Notice] Thesis Announcement: Pham, Quoc Huy, "Agentic Framework for Domain-specific RAG Evaluation"

Greenwell, Stephen J sjgreen2 at Central.UH.EDU
Wed Apr 9 10:16:17 CDT 2025


[Thesis Defense Announcement at the Cullen College of Engineering]
Agentic Framework for Domain-specific RAG Evaluation
Pham, Quoc Huy
April 14, 2025, 12:30 p.m. to 2:00 p.m. (CDT)
Location: AERB #222 and Teams link<https://urldefense.com/v3/__https://teams.microsoft.com/l/meetup-join/19*3ameeting_YjA1MDRhOWUtYjM5ZC00MjA2LWFkMGEtOTgyMjc1OGI1ZjI0*40thread.v2/0?context=*7b*22Tid*22*3a*22170bbabd-a2f0-4c90-ad4b-0e8f0f0c4259*22*2c*22Oid*22*3a*225dd5bb4d-564d-4df0-bdc3-e572e5ebc98a*22*7d__;JSUlJSUlJSUlJSUlJSUl!!LkSTlj0I!FUshXraW4rXH8r7P92a3pU17kwkj2ODrqGFrUHmTPD-wkbUJo3jEk33Qb2GuMXM7MzchAbwAKb1yX9IhDBrn8KdHHvU$ >
Meeting ID: 217 618 049 368
Passcode: B7BH3j5s

Committee Chair:
Vedhus Hoskere, Ph.D.
Committee Members:
Craig Glennie, Ph.D. | Nima Ekhtari, Ph.D. | Todd Bradford, LTC
Abstract
Large Language Models (LLMs) have revolutionized the generation and understanding of textual information and have wide-ranging applications. While the capabilities of LLMs are immediately impressive to any user, extended use quickly reveals these models' tendency toward inaccurate text generation. Retrieval-Augmented Generation (RAG) approaches enhance LLM reliability by grounding responses in knowledge bases, significantly reducing hallucinations.
However, RAG performance is highly domain sensitive, necessitating careful tuning of components before deployment in specialized applications. This challenge underscores the critical need for robust evaluation frameworks tailored to domain-specific RAG systems. Existing methods often rely on heuristic-based metrics such as exact match or BLEU scores, which fail to capture deeper semantic reasoning and nuanced understanding. Additionally, these frameworks typically depend on manually curated Question-Answer (QA) datasets, which are often unavailable or insufficient in specialized domains.
To address these limitations, we propose an Agentic Framework for Domain-Specific RAG Evaluation. Our approach introduces a synthetic data generation pipeline that simplifies the adaptation of RAG systems to new domains. We incorporate LLM-as-a-judge metrics to enable a more holistic and versatile evaluation of both synthetic datasets and RAG performance. We present MiliQA, a synthetic dataset derived from military documents, and compare its quality against public QA datasets such as Aurelio Mixtral, HuggingFace QA, and WikiEval. To validate our metrics, we benchmark them against human-annotated datasets, including STS-B and SQuAD 2.0. Finally, we demonstrate the applicability of our framework by evaluating key components of RAG systems, including embedding models, LLMs, and multiple RAG methodologies, using MiliQA. The results demonstrate the practical value of our proposed framework in guiding the design and optimization of RAG systems for domain-specific applications.


