BDI-Kit

BDI-Kit is a Python toolkit designed to streamline biomedical data integration. It provides a suite of methods, intuitive APIs, and an AI-powered chat agent to support efficient harmonization of diverse datasets. By simplifying integration through both programmatic and conversational interfaces, BDI-Kit helps practitioners and researchers create unified datasets, clearing the path for more effective exploration and discovery in fields such as genomics and clinical research.

GitHub    Docs    Video    Demo


BDI-Viz

BDI-Viz is an interactive extension of BDI-Kit that assists biomedical researchers and domain experts with schema matching and value mapping tasks. Using a visualization-driven, expert-in-the-loop approach, BDI-Viz simplifies the integration of complex biomedical datasets through interactive heatmaps, value comparison views, and explanations generated by LLM agents. The platform also provides timeline tracking with undo/redo support, which keeps decision-making transparent and iterative. Designed specifically for biomedical applications, BDI-Viz supports major data commons such as the Genomic Data Commons (GDC) and the Proteomic Data Commons (PDC).

GitHub    Docs    Video    Demo


Harmonia

Harmonia is an LLM-based interactive data harmonization agent, built on top of the BDI-Kit library, that supports user-in-the-loop data integration. It combines large language models with specialized data integration primitives to construct and refine harmonization pipelines, generate code when needed, and incorporate user feedback. The agent can also evaluate intermediate outputs, correct errors, and request user input, enabling flexible and adaptive data harmonization workflows.

GitHub    Video   


Data-Gatherer

Data Gatherer is a Python library for automatically discovering and extracting dataset references from scientific publications. By combining rule-based methods with large language models, it identifies dataset mentions in full-text articles and produces structured, machine-readable outputs. The system helps researchers accelerate dataset discovery, data reuse, and biomedical knowledge integration across the scientific literature.

GitHub    Docs   


Discovera

Discovera is an interactive agent-based system for hypothesis generation and mechanistic discovery in functional genomics. It combines bioinformatics tools, large language models, and retrieval-augmented generation to help researchers explore gene sets, perform functional enrichment analyses, and generate literature-grounded biological insights through a conversational interface.

GitHub