This research presents a comprehensive evaluation of Retrieval-Augmented Generation (RAG) based chatbots for educational support in higher education. Our system leverages multiple Large Language Models (LLMs) including GPT-3.5, Gemini, LLaMA, Mistral, and DeepSeek to provide personalized academic assistance to students across various disciplines. The project addresses the limitations of existing research by conducting a comparative study to evaluate the effectiveness of different LLMs in educational contexts, specifically focusing on developing a chatbot to assist students with course materials across various subjects offered at NTNU.
Through systematic experiments across three distinct university courses, we demonstrate that GPT-RAG consistently outperforms other models in answer correctness and relevancy, while RAG-Gemini shows superior faithfulness scores. Our pilot study with real students validates the practical effectiveness of the system in educational settings.
RQ1: To what extent does the use of a RAG-based chatbot improve the accuracy and relevance of responses compared to traditional intent-based chatbots in a higher education setting?
RQ2: In the context of student queries, how effectively can a RAG-based chatbot retrieve relevant information from course materials and generate contextually appropriate responses?
RQ3: Does the integration of RAG with generative AI models enhance the chatbot's ability to handle complex or ambiguous queries from students, offering more personalized and insightful answers?
RAG-based Chatbot Framework: Our system implements a RAG architecture that supports multiple file formats, vector storage, and multi-LLM integration.
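To make the retrieve-then-generate flow concrete, here is a minimal sketch in plain Python. It is not the project's implementation: the bag-of-words `embed` function stands in for a learned embedding model, the in-memory list stands in for a vector store such as FAISS or Chroma, and the names `retrieve`, `build_prompt`, and the sample chunks are hypothetical. The resulting prompt would be sent to whichever LLM backend is selected.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy bag-of-words vector (assumption: a real system uses a learned embedding)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Assemble a grounded prompt from the retrieved chunks."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the course material below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical course-material chunks, for illustration only.
course_chunks = [
    "A primary key uniquely identifies each row in a table.",
    "A foreign key references the primary key of another table.",
    "A for loop repeats a block of code a fixed number of times.",
]

question = "What does a primary key do in a table?"
top_chunks = retrieve(question, course_chunks, k=2)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

Keeping retrieval and prompt assembly separate from the model call is what lets one pipeline be compared across several LLMs: only the final generation step changes.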
Data Modeling and Database Systems: NTNU Gjøvik, Norway (Computer Science Department)
Introduction to Data Analytics: Linnaeus University (LNU), Sweden
Programming Fundamentals: Sukkur IBA University (SIBAU), Pakistan
Pilot Study Setup: Deployment to Bachelor students in Computer Science, NTNU Gjøvik
Key Finding: GPT-RAG achieved the highest answer correctness (0.66 to 0.71) and answer relevancy (0.88 to 0.98) on all three datasets. RAG-Gemini had the highest faithfulness on all three (0.72 to 0.84), and RAG-Mistral had the highest context precision on two of them (0.88 on Data Modeling & Database Systems, 0.73 on Programming Fundamentals).
| Dataset | Model | Answer Correctness | Answer Relevancy | Faithfulness | Context Precision | Context Recall |
|---|---|---|---|---|---|---|
| Data Modeling & Database Systems | GPT-RAG | 0.71 | 0.98 | 0.68 | 0.73 | 0.81 |
| Data Modeling & Database Systems | RAG-Gemini | 0.63 | 0.87 | 0.82 | 0.73 | 0.80 |
| Data Modeling & Database Systems | RAG-Llama | 0.50 | 0.90 | 0.61 | 0.73 | 0.80 |
| Data Modeling & Database Systems | RAG-Mistral | 0.53 | 0.94 | 0.66 | 0.88 | 0.79 |
| Data Modeling & Database Systems | RAG-Deepseek | 0.57 | 0.73 | 0.56 | 0.77 | 0.78 |
| Data Analytics | GPT-RAG | 0.66 | 0.95 | 0.73 | 0.60 | 0.72 |
| Data Analytics | RAG-Gemini | 0.53 | 0.78 | 0.84 | 0.60 | 0.70 |
| Data Analytics | RAG-Llama | 0.47 | 0.94 | 0.66 | 0.71 | 0.75 |
| Data Analytics | RAG-Mistral | 0.45 | 0.93 | 0.74 | 0.65 | 0.67 |
| Data Analytics | RAG-Deepseek | 0.53 | 0.75 | 0.69 | 0.66 | 0.74 |
| Programming Fundamentals | GPT-RAG | 0.70 | 0.88 | 0.61 | 0.67 | 0.69 |
| Programming Fundamentals | RAG-Gemini | 0.65 | 0.78 | 0.72 | 0.64 | 0.68 |
| Programming Fundamentals | RAG-Llama | 0.58 | 0.76 | 0.54 | 0.68 | 0.68 |
| Programming Fundamentals | RAG-Mistral | 0.65 | 0.84 | 0.56 | 0.73 | 0.68 |
| Programming Fundamentals | RAG-Deepseek | 0.63 | 0.64 | 0.56 | 0.67 | 0.71 |
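The per-metric leaders in the table above can be checked mechanically. The sketch below copies the reported scores into a dictionary and picks the top-scoring model for each dataset and metric; the names `RESULTS`, `METRICS`, and `best_model` are illustrative and not part of the project's code.

```python
# Column order of each score tuple, matching the table above.
METRICS = (
    "answer_correctness",
    "answer_relevancy",
    "faithfulness",
    "context_precision",
    "context_recall",
)

# Scores copied from the results table.
RESULTS = {
    "Data Modeling & Database Systems": {
        "GPT-RAG": (0.71, 0.98, 0.68, 0.73, 0.81),
        "RAG-Gemini": (0.63, 0.87, 0.82, 0.73, 0.80),
        "RAG-Llama": (0.50, 0.90, 0.61, 0.73, 0.80),
        "RAG-Mistral": (0.53, 0.94, 0.66, 0.88, 0.79),
        "RAG-Deepseek": (0.57, 0.73, 0.56, 0.77, 0.78),
    },
    "Data Analytics": {
        "GPT-RAG": (0.66, 0.95, 0.73, 0.60, 0.72),
        "RAG-Gemini": (0.53, 0.78, 0.84, 0.60, 0.70),
        "RAG-Llama": (0.47, 0.94, 0.66, 0.71, 0.75),
        "RAG-Mistral": (0.45, 0.93, 0.74, 0.65, 0.67),
        "RAG-Deepseek": (0.53, 0.75, 0.69, 0.66, 0.74),
    },
    "Programming Fundamentals": {
        "GPT-RAG": (0.70, 0.88, 0.61, 0.67, 0.69),
        "RAG-Gemini": (0.65, 0.78, 0.72, 0.64, 0.68),
        "RAG-Llama": (0.58, 0.76, 0.54, 0.68, 0.68),
        "RAG-Mistral": (0.65, 0.84, 0.56, 0.73, 0.68),
        "RAG-Deepseek": (0.63, 0.64, 0.56, 0.67, 0.71),
    },
}


def best_model(dataset, metric):
    """Return the model with the highest score for one dataset and metric."""
    idx = METRICS.index(metric)
    scores = RESULTS[dataset]
    return max(scores, key=lambda model: scores[model][idx])


# winners[metric][dataset] -> name of the top-scoring model
winners = {
    metric: {dataset: best_model(dataset, metric) for dataset in RESULTS}
    for metric in METRICS
}

for metric, by_dataset in winners.items():
    print(metric, by_dataset)
```

Context precision and context recall have different leaders depending on the dataset, so a per-dataset view like this is more informative than a single overall ranking.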
Real-world Validation: In our pilot study with Bachelor students at NTNU Gjøvik, the system handled both content inquiry (93 records) and exam preparation (38 records) queries. GPT-RAG had the highest answer relevancy for both query types (0.90 and 0.79), while Gemini-RAG had the highest faithfulness (0.79 and 0.66).
| Query Type | Records | Model | Answer Relevancy | Faithfulness |
|---|---|---|---|---|
| Content Inquiry | 93 | GPT-RAG | 0.90 | 0.62 |
| Content Inquiry | 93 | Gemini-RAG | 0.78 | 0.79 |
| Content Inquiry | 93 | Mistral-RAG | 0.87 | 0.63 |
| Content Inquiry | 93 | Llama-RAG | 0.80 | 0.55 |
| Exam Preparation | 38 | GPT-RAG | 0.79 | 0.62 |
| Exam Preparation | 38 | Gemini-RAG | 0.62 | 0.66 |
| Exam Preparation | 38 | Mistral-RAG | 0.78 | 0.57 |
| Exam Preparation | 38 | Llama-RAG | 0.61 | 0.60 |
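Because the two query types have different record counts (93 and 38), one way to summarize the pilot table is a record-weighted average per model. The sketch below is illustrative only: it assumes each reported score is a mean over that query type's records, and the combined figures it produces are not results reported by the study. The names `PILOT` and `weighted_scores` are hypothetical.

```python
# (records, {model: (answer_relevancy, faithfulness)}) copied from the pilot table.
PILOT = {
    "Content Inquiry": (93, {
        "GPT-RAG": (0.90, 0.62),
        "Gemini-RAG": (0.78, 0.79),
        "Mistral-RAG": (0.87, 0.63),
        "Llama-RAG": (0.80, 0.55),
    }),
    "Exam Preparation": (38, {
        "GPT-RAG": (0.79, 0.62),
        "Gemini-RAG": (0.62, 0.66),
        "Mistral-RAG": (0.78, 0.57),
        "Llama-RAG": (0.61, 0.60),
    }),
}


def weighted_scores(pilot):
    """Record-weighted mean of each metric per model across query types."""
    total_records = sum(records for records, _ in pilot.values())
    combined = {}
    for records, scores in pilot.values():
        for model, (relevancy, faithfulness) in scores.items():
            acc = combined.setdefault(model, [0.0, 0.0])
            acc[0] += records * relevancy
            acc[1] += records * faithfulness
    return {
        model: {
            "answer_relevancy": acc[0] / total_records,
            "faithfulness": acc[1] / total_records,
        }
        for model, acc in combined.items()
    }


weighted = weighted_scores(PILOT)
for model, scores in weighted.items():
    print(model, {name: round(value, 3) for name, value in scores.items()})
```

Weighting by record count keeps the larger content inquiry set from being treated as equal to the smaller exam preparation set, which a plain average of the two rows would do.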
Development of a RAG-based chatbot specifically designed to provide personalized educational support to students in higher education.
Assessment of the effectiveness of RAG-based chatbots in comparison to traditional intent-based systems, focusing on accuracy, relevance, and context-awareness.
Introduction of a novel methodology for integrating course materials into a RAG-based chatbot, demonstrating potential for personalized student support.
Evaluation across three different universities and academic systems, ensuring generalizability of findings.
Try the System: Our RAG-based educational chatbot is available for demonstration. The system features a modern web interface with real-time chat functionality and supports multiple LLM backends.
Supported file formats: PDF, TXT, PPTX, CSV, XLSX, DOCX
Supported LLMs: GPT-3.5, Gemini, LLaMA, Mistral, DeepSeek
Vector storage: FAISS and Chroma integration
Evaluation: Comprehensive metrics framework
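For intuition about the retrieval-side metrics, the sketch below computes simplified, set-based precision and recall over retrieved chunk IDs. This is an assumption-laden stand-in, not the project's metrics framework: the context precision and context recall scores reported on this page may be computed differently (for example, with an LLM judge), and the chunk IDs and function names here are hypothetical.

```python
def retrieval_precision(retrieved_ids, relevant_ids):
    """Fraction of retrieved chunks that are relevant."""
    if not retrieved_ids:
        return 0.0
    hits = sum(1 for chunk_id in retrieved_ids if chunk_id in relevant_ids)
    return hits / len(retrieved_ids)


def retrieval_recall(retrieved_ids, relevant_ids):
    """Fraction of relevant chunks that were retrieved."""
    if not relevant_ids:
        return 0.0
    found = set(retrieved_ids) & set(relevant_ids)
    return len(found) / len(relevant_ids)


# Hypothetical example: four chunks retrieved, three chunks actually relevant.
retrieved = ["c1", "c4", "c2", "c9"]
relevant = {"c1", "c2", "c3"}

precision = retrieval_precision(retrieved, relevant)
recall = retrieval_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision drops when the retriever returns off-topic chunks, while recall drops when it misses material needed to answer, so the two are worth tracking separately.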
This research establishes a foundation for advanced educational AI systems that bridge the gap between course content and personalized student support. Future work will focus on:
Lead Researcher: Ali Shariq Imran (NTNU)
Researchers: Abdul Manaf (NTNU), Nimra Mughal (Sukkur IBA), Zenun Kastrati (Linnaeus), Sher Muhammad Daudpota (Sukkur IBA)
Institutions: NTNU Gjøvik (Norway), Sukkur IBA University (Pakistan), Linnaeus University (Sweden)
Project Website: NORPART Connect
Live Demo: Student AI Navigator
Code Repository: GitHub
Status: Under Review
This research is part of ongoing work in educational technology and AI-assisted learning at NTNU Gjøvik, Norway.