Really interesting project. I like that you're focusing on the point where data is actually created rather than only discussing bias at the AI model level. A lot of people talk about underrepresentation in health datasets, but you're trying to identify the operational reasons why the data never makes it into those datasets in the first place.
I also think this touches on something that is often overlooked in the current AI ecosystem.
A lot of investors and even parts of the tech community are obsessed with increasingly sophisticated AI models, but good AI ultimately depends on good data. Without representative, structured, and high-quality data, even the most advanced models will produce biased or unreliable predictions. Projects that improve the foundations of data collection may not sound as glamorous as the latest AI breakthrough, but they're arguably just as important.
What also caught my attention is that this isn't just a research concept: you already have a working implementation and a GitHub repository showing concrete development work. That makes it feel much more actionable than many proposals that stop at problem identification.
One question I'd be curious about: if the pilot proves successful, how transferable do you think the structured data capture workflow will be across different healthcare systems and countries? Is the goal to create a framework that can be adapted broadly across LMICs?
Also, do you think the biggest barrier today is really the lack of technology, or is it more about workflow adoption and incentives within healthcare facilities? It seems like the answer to that question could have major implications for where future investments should be directed.
Looking forward to seeing how the pilot evolves. Good luck with your project!