A Pilot for AI-Ready & Biosurveillance-Ready Health Data Infrastructure

Technical AI safety · Biosecurity · Global health & development

Berfi Amba

Proposal · Grant

Closes July 23rd, 2026

$0 raised

$5,000 minimum funding

$10,000 funding goal

22 days left to contribute

Project summary

One plausible failure mode for increasingly capable AI systems is that they will be trained, evaluated and deployed using datasets that systematically exclude large portions of humanity. If future AI systems are used to inform health, governance, resource allocation, forecasting or other high-impact decisions, populations that are absent from underlying data ecosystems may become effectively invisible to those systems.

Despite growing concern about AI bias and alignment, relatively little attention has been paid to the upstream processes that determine which populations become represented in AI-relevant datasets in the first place. This project investigates why populations in low- and middle-income countries are often missing from the health datasets that increasingly support current and future AI systems, biosurveillance platforms and epidemiological forecasting models.

Through an 8–12 week field pilot in healthcare facilities in Kinshasa (Democratic Republic of the Congo), we will map clinical data workflows and identify where routine clinical information becomes lost, unstructured or invisible before it can enter AI-relevant data ecosystems.

Beyond documenting the problem, the project's primary goal is an implementation-specific structured data capture configuration for participating facilities. By testing a practical intervention in real healthcare settings, we aim to determine the minimum infrastructure required for routine clinical activity to become AI-ready and biosurveillance-ready data.

The project will produce a written report on implementation barriers and observed workflow constraints. Expected outputs include workflow maps, documented data invisibility points, an operational structured data capture configuration for participating facilities, and operational recommendations for strengthening the participation of underrepresented populations in AI-enabled health intelligence systems.

What are this project's goals? How will you achieve them?

This project investigates why populations in low- and middle-income countries (LMICs) are often absent from the health datasets used by AI systems, biosurveillance platforms and epidemiological forecasting models. The goal is to identify where routine clinical information becomes lost, inaccessible or unusable before it can contribute to AI-relevant data ecosystems.

To achieve this, we will conduct an 8–12 week field pilot in healthcare facilities in Kinshasa. The project will map clinical data workflows, identify points of data loss and invisibility and implement a structured data capture workflow in an operational setting. The pilot will generate evidence on how data exclusion occurs in practice and evaluate practical interventions that may improve representation in future health datasets.
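As a rough illustration of what "identifying points of data loss" could look like once a workflow has been mapped, the sketch below models a workflow as a sequence of stages and reports where the largest share of records disappears. The stage names, the `Stage` class, the helper functions and all counts are hypothetical placeholders chosen for illustration; they are not pilot data and not the project's actual mapping tools.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One hand-off in a clinical data workflow (hypothetical model)."""
    name: str
    records_in: int   # records entering this hand-off
    records_out: int  # records still usable after it

    @property
    def loss_rate(self) -> float:
        # Share of incoming records that do not survive this hand-off.
        if self.records_in == 0:
            return 0.0
        return 1 - self.records_out / self.records_in


def worst_stage(stages):
    """Return the stage with the highest loss rate."""
    return max(stages, key=lambda s: s.loss_rate)


def end_to_end_retention(stages):
    """Share of records entering the first stage that leave the last one."""
    if not stages or stages[0].records_in == 0:
        return 0.0
    return stages[-1].records_out / stages[0].records_in


# Placeholder counts for illustration only, NOT observations from the pilot.
stages = [
    Stage("consultation_to_register", records_in=100, records_out=80),
    Stage("register_to_summary", records_in=80, records_out=40),
    Stage("summary_to_digital_system", records_in=40, records_out=30),
]

for s in stages:
    print(f"{s.name}: {s.loss_rate:.0%} lost")
print("largest loss at:", worst_stage(stages).name)
print(f"end-to-end retention: {end_to_end_retention(stages):.0%}")
```

In a real assessment the counts would come from facility records reviewed under the applicable ethical approvals, and a single loss rate per stage would likely be too coarse; the sketch only shows the shape of the comparison.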

How will this funding be used?

The requested $10,000 will support field implementation of the pilot project. Funding will be allocated to:

Field implementation and clinical workflow assessment ($3,000)
Adaptation, configuration, and deployment of existing structured data capture tools ($2,500)
Healthcare worker training and implementation support ($1,500)
Pilot implementation support and operational support activities ($1,000)
Stakeholder engagement and partnership development ($1,000)
Preparation of the implementation report and contingency costs ($1,000)
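The allocation above can be checked with a few lines of arithmetic. The sketch below is only a consistency check on the listed amounts: it sums the six budget lines, confirms they match the requested $10,000, and prints each line's share. The variable and function names are illustrative and the line labels are lightly abbreviated.

```python
# Budget lines as listed in the proposal (USD).
budget = {
    "Field implementation and clinical workflow assessment": 3000,
    "Adaptation, configuration, and deployment of data capture tools": 2500,
    "Healthcare worker training and implementation support": 1500,
    "Pilot implementation and operational support": 1000,
    "Stakeholder engagement and partnership development": 1000,
    "Implementation report and contingency costs": 1000,
}

REQUESTED_TOTAL = 10_000  # funding goal stated in the proposal


def budget_shares(lines):
    """Return each line's share of the total as a fraction between 0 and 1."""
    total = sum(lines.values())
    return {name: amount / total for name, amount in lines.items()}


total = sum(budget.values())
shares = budget_shares(budget)

for name, amount in budget.items():
    print(f"{amount:>6,d}  {shares[name]:6.1%}  {name}")
print(f"{total:>6,d}  total (requested: {REQUESTED_TOTAL:,d})")
```

The lines sum to the requested amount, with field implementation and tool deployment together accounting for a little over half of the budget.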

The proposed budget supports field implementation, training and operational deployment activities only. The Grant is not intended to support the creation of new intellectual property, datasets, or data infrastructures beyond the scope of implementation activities.

All methodologies, software, data models, workflow mapping frameworks, structured data capture architectures and related intellectual property used in the project are considered pre-existing background intellectual property of the Recipient, unless otherwise agreed in writing.

All clinical data and underlying datasets remain subject to applicable ethical approvals, institutional agreements and data governance requirements, and are not intended for public release or transfer under this Grant. Any dissemination will be limited to a written report on barriers and causes of clinical data invisibility, in accordance with ethical and legal obligations.

Who is on your team? What's your track record on similar projects?

I serve as the principal investigator and lead all aspects of the project, including research design, stakeholder engagement, implementation planning and evaluation.

I have training in public health and health economics. My professional experience includes data management, data quality assurance, health economic evaluation, health data analysis and healthcare information systems.

I have worked as a data analyst on international development projects, including contributions to the United Nations Industrial Development Organization (UNIDO) Annual Report for Madagascar. My technical background includes data governance, data integrity frameworks, large-scale dataset management, predictive modeling, cost-effectiveness analysis, dashboard development and design of monitoring systems for healthcare and laboratory environments.

Over the past four months, I have independently developed the full implementation framework for this project, including the pilot protocol, workflow mapping tools, interview guides, ethics and governance framework, data quality assessment methodology and reporting templates. These materials constitute pre-existing background work supporting the implementation phase of the project. An overview of the work is available on GitHub: https://github.com/Beeotics/Health-Data-Pilot.

Preliminary stakeholder engagement was completed during project development. Discussions were held with healthcare professionals and facility leadership in Kinshasa, and multiple facilities have expressed interest in participating in the pilot, subject to implementation planning and required approvals.

The project builds directly on my experience in health data quality, healthcare analytics and the practical challenges of transforming routine clinical information into structured, decision-ready data systems.

What are the most likely causes and outcomes if this project fails?

The most likely risks are operational rather than technical. Potential challenges include limited participation from healthcare facilities, competing demands on healthcare workers' time, difficulties maintaining adoption of structured data capture workflows and delays in obtaining institutional approvals.

If the project fails, the primary consequence would be insufficient evidence to validate the proposed intervention or to draw operational lessons from it.

However, even a partially successful implementation would likely generate useful insights into workflow constraints, data quality challenges and barriers to participation in AI-relevant health data systems. The project does not depend on achieving large-scale deployment to produce valuable findings.

How much money have you raised in the last 12 months, and from where?

To date, the project has not received external funding. Development of the implementation framework, pilot protocol, stakeholder engagement activities and implementation planning has been conducted through the principal investigator’s independent efforts and unpaid work.

The requested grant would represent the first external funding used to transition the project from pre-implementation design into field implementation activities.

Aashka Patel

7 days ago

A really interesting and timely project. Most of the genetic data available for clinical research comes from White Western Europeans; it doesn't include DNA from other parts of the world. Hence, I agree with your point that populations in low- and middle-income countries are often missing from the health datasets that increasingly support current and future AI systems. Hope this project gets funded soon and brings about a real, positive change in the world :)

Berfi Amba

7 days ago

Thank you @aashkapatel! I really appreciate your input on this 😊

Angel B

16 days ago

Really interesting project. I like that you're focusing on the point where data is actually created rather than only discussing bias at the AI model level. A lot of people talk about underrepresentation in health datasets, but you're trying to identify the operational reasons why the data never makes it into those datasets in the first place.

I also think this touches on something that is often overlooked in the current AI ecosystem.

A lot of investors and even parts of the tech community are obsessed with increasingly sophisticated AI models, but good AI ultimately depends on good data. Without representative, structured, and high-quality data, even the most advanced models will produce biased or unreliable predictions. Projects that improve the foundations of data collection may not sound as glamorous as the latest AI breakthrough, but they're arguably just as important.

What also caught my attention is that this isn't just a research concept: you already have a working implementation and a GitHub repository showing concrete development work. That makes it feel much more actionable than many proposals that stop at problem identification.

One question I'd be curious about: if the pilot proves successful, how transferable do you think the structured data capture workflow will be across different healthcare systems and countries? Is the goal to create a framework that can be adapted broadly across LMICs?

Also, do you think the biggest barrier today is really the lack of technology, or is it more about workflow adoption and incentives within healthcare facilities? It seems like the answer to that question could have major implications for where future investments should be directed.

Looking forward to seeing how the pilot evolves. Good luck with your project!

Berfi Amba

16 days ago

Thank you @Angel_b!

About scalability across countries and healthcare systems:

That's one of the key questions we're trying to answer. Our goal is not to create a Kinshasa-specific tool, but to identify the minimum set of structured data-capture practices that can be integrated into existing clinical workflows in resource-constrained settings. While healthcare systems differ, many facilities face similar challenges: paper-based records, limited connectivity, fragmented reporting requirements, and high clinical workloads. If we can demonstrate that a lightweight approach works in Kinshasa without requiring major infrastructure investments, it could provide a framework that can be adapted to other LMIC contexts rather than a one-size-fits-all solution.

On whether the main barrier is technology or workflow adoption:

Our hypothesis is that the primary challenge is not technology itself, but workflow integration. Many digital health initiatives introduce new tools that require additional effort from already overburdened healthcare staff, which often limits adoption. Clinical data is frequently generated but remains trapped in paper records, free-text notes, or disconnected systems.

We believe that improving representation in AI-relevant datasets depends on making structured data capture a natural by-product of routine care rather than an additional task. If that hypothesis is correct, relatively modest investments in workflow design and data infrastructure could have a greater long-term impact than investing solely in downstream AI models trained on incomplete datasets.