
Tejas Muniswamy
Bangalore, India
Expert Generative AI Engineer: Agentic RAG, LLM Fi
Category: Artificial intelligence (AI)
Generative AI is only as good as the architecture behind it. I specialize in building "agentic" workflows: AI systems that can think, use tools, and retrieve information autonomously to provide accurate, hallucination-free answers.
My technical toolkit includes advanced RAG techniques, PEFT/LoRA for model fine-tuning, and custom memory-augmentation networks. My goal is to move your project beyond a simple chatbot and into a robust, intelligent assistant that integrates seamlessly with your existing data. I bring a researcher’s eye for detail and an engineer’s focus on performance to every project I take on.
Working hours
- Monday: 08h00 to 18h00
- Tuesday: 08h00 to 18h00
- Wednesday: 08h00 to 18h00
- Thursday: 08h00 to 18h00
- Friday: 08h00 to 18h00
- Saturday: Not available
- Sunday: Not available
◦ Designed and implemented an end-to-end NLP classification system to detect rare critical events (animal attacks on humans) from millions
of global news articles in the CC-News dataset (2017–2024), addressing extreme class imbalance, high noise, and the need for continuous
improvement without large-scale manual labeling.
◦ Built a chunked data ingestion and distributed processing pipeline using PySpark to process large Parquet files at scale, applying lightweight
regex-based pre-filtering to remove obvious noise early, significantly reducing compute overhead while preserving recall.
◦ Developed and deployed a DistilBERT-based transformer classifier trained across 13 merged news categories, generating per-class
probabilities and confidence scores to reason about both predictions and model uncertainty.
◦ Engineered a feedback-driven active learning loop: low-confidence predictions were flagged for human review and fed back into retraining,
allowing the model to improve specifically on edge cases; trained with focal loss, it reached 95% macro-F1 on a heavily imbalanced dataset.
◦ Integrated a RAG and LLM reasoning layer for high-impact cases: retrieved semantically similar historical articles via FAISS-based vector
search and passed retrieved context to a locally hosted LLM (via Ollama) to generate natural language explanations and enable researcher
queries on trends such as location or animal type.
◦ Ran all LLM inference locally with Ollama, using Azure Blob Storage only for dataset and artifact management and Hugging
Face Hub for downloading pretrained model weights, maintaining full data privacy without reliance on external cloud inference.
◦ Packaged and deployed the complete pipeline as a containerised FastAPI service (Docker), exposing classification, confidence scoring, and
RAG-augmented explanation endpoints for downstream research consumption.
◦ Implemented MLOps practices including experiment tracking and real-time monitoring with MLflow (logging metrics, loss curves, confidence
score distributions, and model drift indicators across retraining runs), alongside automated retraining on feedback batches, model versioning,
and containerised deployment, reducing manual operational effort by 80%.
◦ Designed multi-stage prompt engineering strategies for the LLM reasoning layer to reduce hallucinations, improve factual grounding, and
enhance output reliability for domain-specific queries.
◦ Built and maintained robust data pipelines for large-scale text datasets, covering ingestion, preprocessing, tokenisation, training, validation,
and inference workflows end-to-end.
◦ Followed responsible AI development principles, ensuring fairness, transparency, and reliability across all deployed AI systems; delivered the
full system independently from research design through to production deployment.
◦ Tech: Python, PyTorch, Transformers (DistilBERT), Hugging Face, Ollama, LangChain, FAISS, PySpark, FastAPI, Docker, MLflow, Azure
Blob Storage, Linux
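As a rough illustration of the confidence scoring, review flagging, and focal loss mentioned in the bullets above, here is a minimal sketch in plain Python. The function names, the 0.6 review threshold, and the gamma value of 2.0 are assumptions chosen for illustration; they are not details of the actual system, which was built with PyTorch and Transformers.

```python
import math


def softmax(logits):
    """Convert raw class scores into per-class probabilities."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(probs):
    """Use the highest class probability as the confidence score."""
    return max(probs)


def needs_review(probs, threshold=0.6):
    """Flag a prediction for human review when confidence is low.

    The 0.6 threshold is an assumed value, not one from the project.
    """
    return confidence(probs) < threshold


def cross_entropy(probs, true_class):
    """Standard cross-entropy for a single example."""
    return -math.log(probs[true_class])


def focal_loss(probs, true_class, gamma=2.0):
    """Focal loss for a single example: -(1 - p_t)^gamma * log(p_t).

    The (1 - p_t)^gamma factor shrinks the loss on examples the model
    already classifies confidently, so rare, hard examples dominate.
    gamma=2.0 is an assumed, commonly used default.
    """
    p_t = probs[true_class]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)
```

In this sketch, a prediction such as `softmax([0.1, 0.0, 0.2])` would be flagged by `needs_review`, while a confident one such as `softmax([5.0, 0.0, 0.0])` would not; flagged items are the ones that would go to a human reviewer and back into retraining.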
◦ Scaled automated testing coverage from 30% to 70% using CI/CD-driven frameworks in Java and Python.
◦ Reduced regression cycles by 30% through parallel execution and test optimization.
◦ Built SQL-based data validation pipelines ensuring AI-ready and analytics-ready datasets.
◦ Supported enterprise systems acting as data backbones for ML and GenAI pipelines.
◦ Received the INSTA IRISE Award for technical excellence.
◦ Tech: Python, Java, Selenium, SQL, Jenkins, Git, Jira, Agile
Computer Science with AI



