Technology

Linux Foundation's LF AI & Data Foundation welcomes three new IBM open source projects

Wednesday, 30 April 2025 6:40AM UTC

The Linux Foundation’s LF AI & Data Foundation expands its AI and data technology portfolio with three new IBM-contributed open source projects: Docling, Data Prep Kit, and BeeAI, enhancing capabilities in semantic document understanding, data preparation, and multi-agent AI workflows.

The Linux Foundation’s LF AI & Data Foundation has announced the addition of three new open source projects contributed by IBM, broadening its portfolio in key AI and data technology areas. The projects (Docling, Data Prep Kit, and BeeAI) address critical needs in semantic document understanding, enterprise-grade data preparation, and interoperable multi-agent systems, respectively.

LF AI & Data serves as an umbrella organisation under the Linux Foundation that supports open source innovation in artificial intelligence and data technologies. The three IBM-contributed projects were officially inducted by the LF AI & Data Technical Advisory Committee, a step that integrates them into the foundation’s broader governance and technical ecosystem.

BeeAI is presented as the first open-source platform for agent-to-agent communication, enabling developers to build, discover, run, and compose autonomous AI agents into multi-agent workflows. It uses the open Agent Communication Protocol (ACP) to provide interoperability across frameworks and technology stacks, and it supports languages including JavaScript and Python.

Docling offers a comprehensive open-source ecosystem of Python tools for document conversion, generation, and manipulation. It focuses on extracting structured information from complex documents, a capability central to semantic document understanding. The project has gained significant traction, with over 27,000 stars on GitHub, which suggests it is becoming a de facto standard in the space.

Data Prep Kit is designed as a modular suite to clean, transform, and trace unstructured data, specifically tailored for large language models (LLMs). It emphasises data quality, transparency, and scalability while supporting both batch and streaming data scenarios. This toolkit integrates seamlessly with modern AI workflows, aiming to enhance the rigour of enterprise data processes.

Todd Moore, Senior Vice President of Community Operations at the Linux Foundation and interim Executive Director of LF AI & Data, expressed enthusiasm for the new projects. Speaking to AiThority, he said, “We are excited to welcome Docling, Data Prep Kit, and BeeAI into the LF AI & Data family. These contributions from IBM reflect a strong commitment to open collaboration and responsible AI. I love BeeAI’s commitment to both Javascript and Python for aggregated learning.”

Brad Topol, Distinguished Engineer and Director of Open Source at IBM, highlighted the projects’ significance for accelerating innovation in generative AI. In remarks to AiThority, he noted, “Docling, Data Prep Kit, and BeeAI were born from a need to fill critical gaps in AI development tooling and accelerate innovation in the Generative AI space. We’re proud to see them as a catalyst enabling the broader open-source community to build AI applications and agentic workflows. We’re excited to collaborate with the open-source community to evolve these technologies and solve real-world challenges together.”

The governance of these projects will follow LF AI & Data’s model, benefiting from structured community-driven technical steering committees and ecosystem engagement. The projects are now publicly accessible, inviting participation from developers, data scientists, and researchers to contribute towards the future development of these impactful tools.

This expansion marks a notable enhancement in the tools available for AI workflows, with emphasis on open collaboration and scalable solutions for enterprise and developer communities.

Source: Noah Wire Services

More on this

https://www.ibm.com/new/announcements/ibm-adds-open-source-projects-docling-beeaI-and-data-prep-kit-added-to-the-linux-foundation - IBM's official announcement detailing the contribution of Docling, Data Prep Kit, and BeeAI to the Linux Foundation, highlighting their roles in semantic document understanding, data preparation, and federated learning.
https://fossforce.com/2025/03/at-ato-ai-ibm-announces-its-handing-three-ai-projects-to-linux-foundation/ - Coverage of IBM's announcement at the All Things Open AI conference, discussing the donation of three AI projects to the Linux Foundation and their significance in the AI development stack.
https://www.automationinside.com/article/ibm-contributes-key-open-source-projects-to-linux-foundation-to-advance-ai-community-participation - An article highlighting IBM's contribution of Docling, Data Prep Kit, and BeeAI to the Linux Foundation, emphasizing their impact on AI community participation and open-source innovation.
https://ossaidevjapan24.sched.com/event/1jKBm/data-prep-kit-a-comprehensive-cloud-native-toolkit-for-scalable-data-preparation-in-genai-app-daiki-tsuzuku-takuya-goto-ibm - A session at the Open Source Summit + AI_dev Japan 2024 discussing Data Prep Kit as a cloud-native toolkit for scalable data preparation in generative AI applications.
https://docs.beeai.dev/introduction/welcome - The official BeeAI documentation introducing the platform as an open-source solution for discovering, running, and composing AI agents from any framework.
https://github.com/i-am-bee/beeai-framework - The GitHub repository for BeeAI Framework, providing resources for building production-ready AI agents in Python and TypeScript.
https://news.google.com/rss/articles/CBMiugJBVV95cUxPVTlneWJBbjEtUjFvSXNJcWx3WG5sU0F4SFY3ZzRrZVZzSDNrUWlCYTNoVF9MelpPM2lrM2RuMGprR05NNlRkeVg3QzlPZi15SUlkaGZ4ZkFrV3ZJQWJxc0E1Ynk1VEMtNGhVdUowVHpicS1NNTBSQkJYYzA2RG9faW1BZVpacGxaclJoc1Y3VmR3MUVWX1F1SmNvZm9mV3hLajBhV1RxbDNtVUEtNHdON1ZaeUxKTmZieWFGU2hLRUhvZ0Yyek5jSnFDQ3BkTU9QdlhvQnNYVGFOWEMtVkkxV0xRVmNqVndvUjFJQWZKanlNczBiSk5VZnNlZ3BHQ0pTc1JwenVuNVV0REtzZktSVTdwajZUVDljWml5dWk3VlE2dWNpbV9nM0lJbEhyMHZOMFhaeUhMeDdxUQ?oc=5&hl=en-US&gl=US&ceid=US:en - Please view link - unable to access data

Noah Fact Check Pro

The draft above was created using the information available at the time the story first emerged. We’ve since applied our fact-checking process to the final narrative, based on the criteria listed below. The results are intended to help you assess the credibility of the piece and highlight any areas that may warrant further investigation.

Freshness check

Score: 8

Notes: The narrative appears to be current, as it discusses recent additions to the LF AI & Data Foundation. However, it does not specify an exact date, which might affect its timeliness score slightly.

Quotes check

Score: 6

Notes: While there are quotes from Todd Moore and Brad Topol, initial online sources only reference the narrative provided. The original source or date for these quotes could not be verified beyond the current article.

Source reliability

Score: 8

Notes: The narrative originates from a well-known news aggregator and references reputable institutions like the Linux Foundation and IBM, though the direct source of the article is unclear.

Plausibility check

Score: 9

Notes: The projects mentioned align with ongoing developments in AI and open-source technology. The involvement of IBM and the Linux Foundation lends credibility to the claims.

Overall assessment

Verdict (FAIL, OPEN, PASS): PASS

Confidence (LOW, MEDIUM, HIGH): HIGH

Summary: The narrative is likely recent, involves reliable institutions, and aligns with plausible developments in the AI sector. However, the freshness score is slightly reduced due to the lack of a specific date, and the quotes' original source could not be independently verified.

Linux Foundation
AI
Open Source
IBM
LF AI & Data