BDI-Kit

BDI-Kit is a Python toolkit designed to streamline biomedical data integration. It provides a suite of methods, intuitive APIs, and support for an AI-powered chat agent to help harmonize diverse datasets efficiently. By simplifying integration through both programmatic and conversational interfaces, BDI-Kit enables practitioners and researchers to build unified datasets, clearing the path for more effective exploration and discovery in fields such as genomics and clinical research.
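The sketch below illustrates one very simple schema-matching baseline: it pairs source column names with target schema attributes by normalized string similarity, using Python's standard `difflib`. It does not use BDI-Kit's API. `match_columns`, the `threshold` value, and the example column names are hypothetical, and the methods BDI-Kit actually ships may work quite differently.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    # Lowercase and treat underscores/hyphens as spaces so "Tumor_Grade" ~ "tumor grade"
    return name.lower().replace("_", " ").replace("-", " ").strip()


def match_columns(source_cols, target_attrs, threshold=0.6):
    """Hypothetical lexical matcher: {source_col: (best_target_attr, score)} for scores >= threshold."""
    matches = {}
    for col in source_cols:
        best_attr, best_score = None, 0.0
        for attr in target_attrs:
            # ratio() is 2*M/T, where M = matched characters and T = total length of both strings
            score = SequenceMatcher(None, normalize(col), normalize(attr)).ratio()
            if score > best_score:
                best_attr, best_score = attr, score
        if best_attr is not None and best_score >= threshold:
            matches[col] = (best_attr, round(best_score, 2))
    return matches


source = ["Tumor_Grade", "Gender", "Age_Diagnosis"]  # hypothetical source columns
target = ["tumor_grade", "gender", "age_at_diagnosis", "primary_diagnosis"]  # hypothetical target schema
matches = match_columns(source, target)
print(matches)
```

Name similarity alone misses purely semantic correspondences, such as a source column `Gender` and a target attribute named `sex`. For that reason, a single lexical baseline like this one is rarely enough on its own.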



BDI-Viz

BDI-Viz is an interactive extension of BDI-Kit designed to help biomedical researchers and domain experts with schema matching and value mapping. Following a visualization-driven, expert-in-the-loop approach, BDI-Viz supports the integration of complex biomedical datasets through interactive heatmaps, value comparison views, and explanations generated by LLM agents. It also provides a decision timeline with undo/redo support, which keeps the matching process transparent and iterative. Built specifically for biomedical applications, BDI-Viz supports major data commons such as the Genomic Data Commons (GDC) and the Proteomic Data Commons (PDC).
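A timeline with undo/redo can be modeled as two stacks of mapping decisions. The sketch below illustrates that pattern only. It is not BDI-Viz's implementation: `MappingTimeline` and its methods are hypothetical names, and the example attributes are made up.

```python
from dataclasses import dataclass, field


@dataclass
class MappingTimeline:
    """Hypothetical undo/redo history of schema-matching decisions."""
    mapping: dict = field(default_factory=dict)
    _undo: list = field(default_factory=list, init=False, repr=False)
    _redo: list = field(default_factory=list, init=False, repr=False)

    def accept(self, source_col, target_attr):
        # Record (column, previous value or None, new value) so the decision can be reverted
        self._undo.append((source_col, self.mapping.get(source_col), target_attr))
        self.mapping[source_col] = target_attr
        self._redo.clear()  # a new decision invalidates the redo branch

    def undo(self):
        if not self._undo:
            return False
        col, before, after = self._undo.pop()
        if before is None:
            self.mapping.pop(col, None)  # the column was unmapped before this decision
        else:
            self.mapping[col] = before
        self._redo.append((col, before, after))
        return True

    def redo(self):
        if not self._redo:
            return False
        col, before, after = self._redo.pop()
        self.mapping[col] = after
        self._undo.append((col, before, after))
        return True


timeline = MappingTimeline()
timeline.accept("Gender", "gender")
timeline.accept("Gender", "sex")
timeline.undo()
print(timeline.mapping)  # {'Gender': 'gender'}
timeline.redo()
print(timeline.mapping)  # {'Gender': 'sex'}
```

Storing the previous value with each decision lets every step be reversed independently. Clearing the redo stack on a new decision follows the usual editor convention.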



Harmonia

Harmonia is an LLM-based, interactive data harmonization agent built on the BDI-Kit library that keeps the user in the loop during data integration. It combines large language models with specialized data integration primitives to construct and refine harmonization pipelines, generate code when needed, and incorporate user feedback. The agent can also evaluate intermediate outputs, correct errors, or request user input, which makes harmonization workflows flexible and adaptive.
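One way to picture the "evaluate intermediate outputs or request user input" step is a check with three parts. It applies a value mapping, validates each result against a target vocabulary, and collects whatever it cannot map for the user to review. This is a hypothetical sketch, not Harmonia's API. `apply_value_mapping`, the vocabulary, and the mapping table are assumptions.

```python
def apply_value_mapping(values, mapping, target_vocab):
    """Hypothetical step: map raw values to a target vocabulary and flag failures for review."""
    harmonized, needs_review = [], []
    for v in values:
        mapped = mapping.get(v.strip().lower())
        if mapped in target_vocab:
            harmonized.append(mapped)
        else:
            harmonized.append(None)   # leave a gap rather than guess
            needs_review.append(v)    # surface to the user for a decision
    return harmonized, needs_review


target_vocab = {"male", "female", "unknown"}  # hypothetical target vocabulary
mapping = {"m": "male", "f": "female", "male": "male", "female": "female"}
values = ["M", "F", "female", "n/a"]
harmonized, needs_review = apply_value_mapping(values, mapping, target_vocab)
print(harmonized)    # ['male', 'female', 'female', None]
print(needs_review)  # ['n/a']
```

In an agent loop, a non-empty `needs_review` list would be the point where control passes back to the user, for example to decide whether `n/a` should become `unknown`.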



Data-Gatherer

Data Gatherer is a Python library for automatically discovering and extracting dataset references from scientific publications. By combining rule-based methods with large language models, it identifies dataset mentions in full-text articles and produces structured, machine-readable output. The system helps researchers accelerate dataset discovery, promote data reuse, and integrate biomedical knowledge across the scientific literature.
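The rule-based side of such a system can be illustrated with regular expressions that recognize repository accession formats. This sketch covers only that rule-based part and does not model the LLM component. The pattern set, `find_dataset_mentions`, and the made-up sample text are hypothetical and are not taken from Data Gatherer.

```python
import re

# Hypothetical rule set covering a few common repository accession formats
ACCESSION_PATTERNS = {
    "GEO": re.compile(r"\bGSE\d+\b"),        # GEO series
    "SRA": re.compile(r"\b[SED]RR\d+\b"),    # SRA/ENA/DDBJ run accessions
    "PRIDE": re.compile(r"\bPXD\d{6}\b"),    # ProteomeXchange/PRIDE datasets
    "dbGaP": re.compile(r"\bphs\d{6}\b"),    # dbGaP studies
}


def find_dataset_mentions(text):
    """Return one {repository, accession} record per accession, in order of first appearance."""
    hits = []
    for repo, pattern in ACCESSION_PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((m.start(), repo, m.group()))
    seen, mentions = set(), []
    for _, repo, acc in sorted(hits):  # sort by character offset
        if acc not in seen:
            seen.add(acc)
            mentions.append({"repository": repo, "accession": acc})
    return mentions


text = ("Raw reads were deposited in GEO under GSE12345 (runs SRR0000001 and SRR0000004), "
        "and proteomics data are available from PRIDE (PXD012345). "
        "See GSE12345 for sample metadata.")
mentions = find_dataset_mentions(text)
for m in mentions:
    print(m)
```

Rules like these are precise for well-formed accessions but miss informal mentions such as "data are available on request" or a repository named without an identifier. Cases like that motivate pairing rules with a language model.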



Discovera

Despite rapid advances in high-throughput molecular profiling technologies, translating molecular signatures into mechanistic insights remains a major bottleneck in biomedical discovery. Discovera is a workflow-aligned AI agent for signature-to-mechanisms analysis that turns molecular signatures into evidence-grounded mechanistic reports. It combines LLM-based reasoning with a structured orchestration framework that mirrors expert analytical workflows, guiding the system through signature characterization, enrichment analysis, core gene prioritization, pathway synthesis, and literature-supported interpretation. Unlike general-purpose biomedical agents, Discovera is designed explicitly around the signature interpretation process and operates under evidence-use constraints: every mechanistic claim must be supported by intermediate analytical outputs and retrieved literature.
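Enrichment analysis is often framed as an over-representation test. Take a universe of N genes, a pathway of K genes, and a signature of n genes, of which k also belong to the pathway. The one-sided p-value is then P(X ≥ k) for a hypergeometric variable X, that is, the sum over i from k to min(K, n) of C(K, i)·C(N−K, n−i) / C(N, n). The sketch below computes this with exact integer arithmetic. It is a generic illustration, not necessarily the enrichment method Discovera uses, and `enrichment_pvalue` is a hypothetical name.

```python
from math import comb


def enrichment_pvalue(n_universe, n_pathway, n_signature, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap) for pathway over-representation."""
    total = comb(n_universe, n_signature)   # ways to draw the signature from the universe
    upper = min(n_pathway, n_signature)     # the overlap cannot exceed either set size
    tail = sum(
        comb(n_pathway, i) * comb(n_universe - n_pathway, n_signature - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total  # exact big-int ratio, rounded once to float


# Hypothetical numbers: 20,000-gene universe, 100-gene pathway, 200-gene signature, 10 shared genes
print(enrichment_pvalue(20000, 100, 200, 10))
```

Here the expected overlap under random sampling is n·K/N = 1 gene, so observing 10 shared genes yields a small p-value. In practice the p-values for many pathways would then be corrected for multiple testing.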
