Project summary
One plausible failure mode for increasingly capable AI systems is that they will be trained, evaluated and deployed using datasets that systematically exclude large portions of humanity. If future AI systems are used to inform health, governance, resource allocation, forecasting or other high-impact decisions, populations that are absent from underlying data ecosystems may become effectively invisible to those systems.
Despite growing concern about AI bias and alignment, relatively little attention has been paid to the upstream processes that determine which populations become represented in AI-relevant datasets in the first place. This project investigates why populations in low- and middle-income countries are often missing from the health datasets that increasingly support current and future AI systems, biosurveillance platforms and epidemiological forecasting models.
Through an 8–12 week field pilot in healthcare facilities in Kinshasa (Democratic Republic of the Congo), we will map clinical data workflows and identify where routine clinical information becomes lost, unstructured or invisible before it can enter AI-relevant data ecosystems.
Beyond documenting the problem, the project's primary goal is to deliver an implementation-specific structured data capture configuration for participating facilities. By testing a practical intervention in real healthcare settings, we aim to determine the minimum infrastructure required for routine clinical activity to become AI-ready and biosurveillance-ready data.
The project will produce a written report on implementation barriers and observed workflow constraints. Expected outputs include workflow maps, documentation of data invisibility points, an operational structured data capture configuration for participating facilities, and operational recommendations for strengthening the participation of underrepresented populations in AI-enabled health intelligence systems.
What are this project's goals? How will you achieve them?
This project investigates why populations in low- and middle-income countries (LMICs) are often absent from the health datasets used by AI systems, biosurveillance platforms and epidemiological forecasting models. The goal is to identify where routine clinical information becomes lost, inaccessible or unusable before it can contribute to AI-relevant data ecosystems.
To achieve this, we will conduct an 8–12 week field pilot in healthcare facilities in Kinshasa. The project will map clinical data workflows, identify points of data loss and invisibility, and implement a structured data capture workflow in an operational setting. The pilot will generate evidence on how data exclusion occurs in practice and evaluate practical interventions that may improve representation in future health datasets.
How will this funding be used?
The requested $10,000 will support field implementation of the pilot project. Funding will be allocated to:
Field implementation and clinical workflow assessment ($3,000)
Adaptation, configuration, and deployment of existing structured data capture tools ($2,500)
Healthcare worker training and implementation support ($1,500)
Pilot implementation and operational support activities ($1,000)
Stakeholder engagement and partnership development ($1,000)
Preparation of the implementation report and contingency costs ($1,000)
The proposed budget supports field implementation, training and operational deployment activities only. The Grant is not intended to support the creation of new intellectual property, datasets, or data infrastructures beyond the scope of implementation activities.
All methodologies, software, data models, workflow mapping frameworks, structured data capture architectures and related intellectual property used in the project are considered pre-existing background intellectual property of the Recipient, unless otherwise agreed in writing.
All clinical data and underlying datasets remain subject to applicable ethical approvals, institutional agreements and data governance requirements, and are not intended for public release or transfer under this Grant. Any dissemination will be limited to a written report on barriers and causes of clinical data invisibility, in accordance with ethical and legal obligations.
Who is on your team? What's your track record on similar projects?
I serve as the principal investigator and lead all aspects of the project, including research design, stakeholder engagement, implementation planning and evaluation.
I have training in public health and health economics. My professional experience includes data management, data quality assurance, health economic evaluation, health data analysis and healthcare information systems.
I have worked as a data analyst on international development projects, including contributions to the United Nations Industrial Development Organization (UNIDO) Annual Report for Madagascar. My technical background includes data governance, data integrity frameworks, large-scale dataset management, predictive modeling, cost-effectiveness analysis, dashboard development and design of monitoring systems for healthcare and laboratory environments.
Over the past four months, I have independently developed the full implementation framework for this project, including the pilot protocol, workflow mapping tools, interview guides, ethics and governance framework, data quality assessment methodology and reporting templates. These materials constitute pre-existing background work supporting the implementation phase of the project. An overview of the work is available on GitHub: https://github.com/Beeotics/Health-Data-Pilot.
Preliminary stakeholder engagement was completed during project development. Discussions were held with healthcare professionals and facility leadership in Kinshasa, and multiple facilities have expressed interest in participating in the pilot, subject to implementation planning and required approvals.
The project builds directly on my experience in health data quality, healthcare analytics and the practical challenges of transforming routine clinical information into structured, decision-ready data systems.
What are the most likely causes and outcomes if this project fails?
The most likely risks are operational rather than technical. Potential challenges include limited participation from healthcare facilities, competing demands on healthcare workers' time, difficulties maintaining adoption of structured data capture workflows and delays in obtaining institutional approvals.
If the project fails, the primary consequence would be insufficient evidence to validate the proposed intervention and too few operational observations to support lessons learned.
However, even a partially successful implementation would likely generate useful observations on workflow constraints, data quality challenges and barriers to participation in AI-relevant health data systems. The project does not depend on achieving large-scale deployment to produce valuable findings.
How much money have you raised in the last 12 months, and from where?
To date, the project has not received external funding. Development of the implementation framework, pilot protocol, stakeholder engagement activities and implementation planning has been conducted through the principal investigator’s independent efforts and unpaid work.
The requested grant would represent the first external funding used to transition the project from pre-implementation design into field implementation activities.