MS CSE @ UCSC, NeuroSymbolic AI, LLM Systems, Knowledge Graphs, Evaluation

Interpretable AI, backed by real systems.

I build AI systems that pair research rigor with deployable engineering. My recent work spans transformer compiler/runtime systems, failure-to-eval compilers, human-AI reliance and observability, neuro-symbolic reasoning, multimodal safety, knowledge graph generation, anomaly detection, and time-series modeling. I also build the backend, data, and cloud tooling needed to make those systems useful outside a notebook.

About

I am most energized by work that connects model behavior, system design, and clear evaluation.

Research themes

What I focus on

Interpretable AI, transformer inference systems, failure-to-eval compilers, human-AI reliance, structured knowledge extraction, and evaluation that reveals where models actually break.

Interpretability, Safety, Knowledge Graphs, Evaluation

Engineering stack

How I build

Python and Node.js across ML pipelines, backend services, experiment tracking, and cloud infrastructure for production-minded AI systems.

Python, Node.js, FastAPI, PostgreSQL, AWS

Skills

Research engineering, backend systems, and production ML.

Languages

Python, Java, C/C++, SQL, JavaScript, HTML/CSS, R

Machine Learning

PyTorch, TensorFlow, Keras, Transformers, scikit-learn, NumPy, Pandas, Matplotlib

LLM & Data Tools

LangChain, ChromaDB, RAG, vector databases, PyPDF

Backend & Deployment

Node.js, FastAPI, Flask, REST APIs, Docker, Kubernetes, Linux, Git, CI/CD, Nginx

Cloud & Data Engineering

AWS EC2, AWS S3, AWS IAM, AWS Lambda, SageMaker, CloudWatch, PostgreSQL, Redis, ETL, SQL optimization, schema design, data cleaning, time-series preprocessing

Software Engineering

TDD, API integration, version control, DSA, System Design, Performance Optimization, Code Reviews, Debugging

Experience & Education

Researcher - AI Explainability and Accountability (AIEA) Lab @ UCSC, Sep 2025 – Jun 2026

Built ReliaGuard Studio and BreakPoint for human-AI reliance detection and adversarial LLM evaluation.

Used Next.js, TypeScript, FastAPI, PostgreSQL, Docker, and MLflow across observability and evaluation workflows.
Combined neuro-symbolic risk modeling, conformal gating, multi-judge validation, and explainable dashboards.
Compiled real LLM failure modes into adversarial regression suites with versioned traps and rubrics.

Data Scientist - AI Institute of South Carolina (remote), Jun 2024 – May 2025

Researched multimodal toxicity mitigation and anomaly detection with reproducible experiment workflows.

Used PostgreSQL-backed experiment tracking to manage dataset metadata, model outputs, evaluation metrics, and benchmarking results.
Contributed to projects such as DE-HATE and Time Series Foundation Model evaluation.

Data Scientist - Space Applications Centre (ISRO), Jan 2024 – May 2024

Built an end-to-end LSTM satellite clock-bias prediction platform for more reliable navigation timing forecasts.

Added Apache Kafka live streaming and a Next.js, React, and TypeScript UI around the prediction workflow.
Automated preprocessing and anomaly filtering to improve signal quality and downstream forecasts.

Software Engineer (AI) - Indus Institute of Technology and Engineering, May 2022 – Jan 2024

Developed Node.js-backed LLM pipelines for chat, knowledge graphs, and retrieval-augmented medical QA.

Built a university website chatbot, automated knowledge graph generation, and medical QA workflows.
Evaluated GPT-4, LLaMA-2, and BERT for structured semantic extraction and grounded reasoning.

University of California, Santa Cruz, Aug 2025 – Jun 2026

Master of Science in Computer Science and Engineering - Santa Cruz, California

GPA: 3.82

Indus Institute of Technology and Engineering, Aug 2020 – Jul 2024

Bachelor of Technology in Computer Engineering - Ahmedabad, India

GPA: 3.94

Projects

Projects highlighted in the current resume.

CacheIR

Compiler runtime

Inspectable compiler and runtime for decoder-only transformer inference: imports Llama, Mistral, and Qwen-style models into a custom IR, specializes prefill/decode graphs, plans KV-cache-aware execution, selects backend kernels, and runs through a reference runtime.

Custom IR, KV cache, Runtime

BreakPoint

LLM evaluation

Failure-to-eval compiler for LLM systems that mines real failure modes into versioned adversarial regression suites with DSL-defined traps and rubrics, multi-judge calibration, RAG/tool simulators, CI gates, and FastAPI/Next.js dashboards.

Eval suites, CI gates, FastAPI

ReliaGuard Studio

AI observability

Next.js, TypeScript, and FastAPI observability platform with SDKs, guardrail APIs, review queues, and conformal neuro-symbolic models to detect and audit overreliance and underreliance in human-AI workflows.

Next.js, FastAPI, MLflow

Research Forge - A Self-Improving Research Agent

Research agent

Universal self-improving AI research agent that discovers papers, generates hypotheses, runs experiments in a sandbox Python environment, and updates its reasoning strategy using long-term graph memory.

LangGraph, Sandbox, Experiments

Meeting AI Assistant

Desktop copilot

Real-time AI meeting copilot desktop app that captures system audio, transcribes conversations, and streams context-aware responses using the OpenAI API with screenshot and document context support.

Electron, OpenAI, Realtime

Time Series Foundation Model Evaluation for Anomaly Detection

Benchmarking

Benchmarked TimeGPT, Chronos, and Time-MoE across multiple datasets, showing where traditional statistical and deep learning models can still outperform TSFMs.

Time series, Anomaly, Evaluation

Knowledge Graph Generation from Large Language Models

Structured extraction

Automated pipeline for generating structured knowledge graphs from unstructured text using large language models, with evaluation across GPT-4, LLaMA-2, and BERT.

Knowledge graphs, LLMs, GraphRAG

Med-Bot: Retrieval Augmented Medical Assistant

RAG

Retrieval-augmented medical assistant built to provide accurate and reliable answers grounded in medical PDFs.

LangChain, ChromaDB, LLMs

Papers

Conference papers, workshop papers, and preprints.

Leveraging LSTM for Predictive Modeling of Satellite Clock Bias

IEEE Xplore

LSTM-based clock-bias forecasting with a practical preprocessing pipeline for navigation reliability.

CDMA 2025, Time series, Navigation

Generating Knowledge Graphs from Large Language Models: A Comparative Study of GPT-4, LLaMA 2, and BERT

arXiv

Structured knowledge graph generation from unstructured text, evaluated for semantic quality and GraphRAG readiness.

Knowledge graphs, LLMs, GraphRAG

Med-Bot: An AI-Powered Assistant to Provide Accurate and Reliable Medical Information

arXiv

Retrieval-augmented medical assistant for grounded answers from medical literature PDFs.

RAG, Healthcare, LLMs

Time Series Foundational Models: Their Role in Anomaly Detection and Prediction

AAAI 2025 Workshop

Evaluation of time series foundation models for anomaly detection and prediction under realistic benchmarking settings.

Anomaly detection, TSFM, Evaluation

Contact

For research collaborations, internships, or ML engineering conversations, send a short note with context and one link.

Email: bhattahan@gmail.com
