AI for Health: Landscape review to identify promising areas for impact

Editorial note

This report was commissioned by Coefficient Giving (formerly Open Philanthropy) and produced by Rethink Priorities from October to November 2025. We revised the report for publication. Coefficient Giving, our expert informants, and their affiliated organizations do not necessarily endorse our conclusions.

In this report, we conducted a rapid landscape scan of AI for health applications to identify areas that appear most promising for further investigation and potential philanthropic support. This work was informed by desk research, selective literature review, and interviews with three experts, two of whom agreed to be named.

We tried to flag major sources of uncertainty in the report and are open to revising our views based on new information or further research.

Executive summary

What we did

The goal of this project was to identify promising AI for health interventions and assess their potential to generate meaningful health impact, with a focus on organizations deploying frontier AI models in real-world settings. Organizations focused mainly on R&D were out of scope. Our work combined a broad landscape scan, a rough cost-effectiveness assessment, and qualitative investigation of selected organizations. The process involved three main steps.

1. Landscape sourcing and prioritization

Over roughly one week, we built a longlist of 258 organizations using AI to improve clinical support, patient support, health operations, or population health. Each organization was screened to verify that AI was a core component of the intervention, since many groups market themselves as AI-enabled while in practice using limited AI. We then classified organizations by likely technical sophistication. Based on public information, just under half appeared to be using frontier AI models as of late 2025.

2. Assessment of intervention pathways and cost-effectiveness potential

We identified eight impact pathways through which AI could generate health value. From these, we prioritized six for deeper analysis, focusing on those with the most direct and measurable links to near-term health outcomes: diagnostic assistance, disease surveillance, clinical skills and decision support, product safety and quality, service delivery efficiency, and patient behavior. Data-driven planning and administrative efficiency were also considered but not prioritized for cost-effectiveness analysis, given their more indirect links to health outcomes and greater measurement challenges.

For selected organizations operating within the six prioritized pathways, we developed simplified, intentionally rough cost-effectiveness models designed to provide directional insight rather than precise estimates. These models aimed to surface key drivers of value, explore where more advanced AI capabilities might plausibly shift outcomes, and assess the strength and limitations of the existing evidence. Across the landscape, robust outcome measurement paired with frontier AI applications was uncommon, limiting the precision and certainty of these assessments.

3. Illustrative profiles of selected organizations

We include qualitative profiles that illustrate how selected organizations apply frontier AI within impact pathways that emerged as potentially promising.

Key takeaways

General findings:

  • Frontier AI organizations are most concentrated in high-income countries. Only about one third of the frontier-AI organizations that we identified are active in low- and middle-income countries (LMICs), while simpler AI applications remain more common in resource-constrained settings.
  • Frontier AI is most prominent in clinical support. Diagnostic assistance and clinical decision support tools show the highest adoption of frontier models. Patient support and population health tools more often rely on conventional machine learning.
  • Estimated cost-effectiveness varies enormously across organizations. Our exploratory cost-effectiveness estimates spanned more than three orders of magnitude, which points to substantial dispersion in expected impact across the landscape.
  • Organizations operating in LMICs tend to show the strongest promise. Higher disease burden, larger access gaps, and greater marginal returns to improved service quality contribute to stronger apparent cost-effectiveness.
  • Organizations focusing on diagnostic assistance, disease surveillance, clinical skills and decision support, and product safety and quality show the strongest potential. Based on the initial outputs from our analysis, tools that can improve both service quality and efficiency, stop disease outbreaks earlier, or reduce exposure to substandard care can have outsized impact, particularly in low-income contexts.
  • Organizations focusing on service delivery efficiency and patient behavior tools appear weaker. Efficiency-focused tools often struggle to translate time savings into meaningful health gains unless deployed in strongly capacity-constrained settings. Behavior change tools show measurable effects but tend to be relatively costly per user and have limited persistence of impact.
  • Improvements in throughput, not accuracy, often drive impact. For many diagnostic and decision support interventions, modeled value arises primarily from increased patient volume, not higher diagnostic sensitivity.
  • Gains in diagnostic accuracy come from reducing both false negatives and false positives. In some contexts, reducing false positives can significantly lower costs and resource strain, an underrecognized pathway to system efficiency for diagnostic and decision support interventions.

Limitations and uncertainties:

  • Frontier AI tools rarely align with strong evidence. Tools near the AI frontier are often early-stage and supported by limited data, while better-studied interventions tend to rely on older AI methods.
  • Unclear how much AI contributes to observed effects. In most interventions, AI is part of a broader package involving hardware, workflow redesign, and training. We did not attempt to attribute impacts specifically to AI.
  • Findings stem from exploratory modeling and limited evidence. Because the underlying data and assumptions vary in quality, even modest changes can alter the expected impact of an intervention. These assessments are best understood as preliminary guidance for future investigation.

Table 1: Summary of key findings on AI for health impact pathways

| Impact pathway | Primary mechanism | Likely need for philanthropic support | Cost-effectiveness potential | Key risks/uncertainties |
|---|---|---|---|---|
| Disease surveillance | Earlier outbreak detection enabling quicker response and reduced transmission | High—weak commercial incentives and public-good nature | High—potential is large, but depends on real-world response chains | Data quality varies by country; alerts may not translate into timely action |
| Diagnostic assistance | Expanded diagnostic access and improved accuracy | Low to medium—strong commercial markets, philanthropy mainly helps LMIC deployment | High—highest in LMICs via access expansion | Accuracy and performance vary outside trial settings |
| Service delivery efficiency | Workflow and documentation improvements that free clinician time | Medium—commercial interest in HICs but limited demand for LMIC-focused tools | Low to medium—highest when clinician time is binding constraint | Depends on training quality and successful integration with local health systems |
| Patient behavior | Improved adherence, self-management, and preventive behaviors | Low—private commercial models dominate this space | Low—modest effect sizes, weak persistence | Often requires sustained use; reach depends on device access and engagement |
| Clinical skills and decision support | Improved clinical decision quality, triage, and referral accuracy | Medium to high—commercial viability is uncertain for CHW-focused tools | Medium to high—especially in LMICs, where referral accuracy is low | Limited evidence on real-world adoption and clinical impact |
| Product safety and quality | Detection of substandard or falsified medicines before reaching patients | Medium—commercially viable for some buyers, though affordability constraints may limit scale | Moderate—driven by throughput and prevalence of poor-quality medicines | Independent validation remains limited; unclear population-level impact |
| Data-driven planning | Better planning, resource allocation, and program management based on improved analytics | High—government analytics are public goods with weak market demand | Not assessed—poor evidence for downstream health improvements | Unclear whether improved analytics change decisions or outcomes |
| Administrative efficiency | Reducing administrative burden | Low—appears commercially served with limited public-health externalities | Not assessed, but likely low—expect limited effect on health outcomes | Weak or no causal link to health outcomes |

Mapping the AI for health landscape

Search process and categorization

Over a period of ~3.5 days, we conducted a rapid landscape scan to identify as many “AI for health” organizations as possible. The goal of this initial stage was to capture a broad picture of the current ecosystem before applying any filters or prioritization criteria. We cast a wide net, including organizations of varying geographies, stages of development, and degrees of AI use.

To compile the longlist, we drew from multiple sources.

We entered all identified organizations in this spreadsheet, along with basic descriptive information, such as primary function, geographic focus, target user, and medical field.

To organize the longlist, we developed a two-level categorization framework that groups organizations by their core function, and then by specific intervention type. The categorization framework can be seen here and is explained below.

1. Categories

Four broad functional domains reflecting how AI interventions can improve health outcomes:

  • Clinical support
  • Patient support
  • Health operations
  • Population health

2. Sub-categories

Specific intervention types within each category that describe what the AI system does in practice. Examples:

  • Diagnostic image reading
  • Patient adherence tools
  • Disease surveillance and risk prediction
  • Supply chain and logistics optimization

To ensure the categorization framework was comprehensive and aligned with existing literature, we cross-checked our categories against those used in Broadband Commission (2020) and Stanford Center for Digital Health (2025).

Landscaping findings

We found a total of 258 health organizations across four categories and 21 subcategories. The number of organizations varies widely across sub-categories, from only three in “workforce training and supervision” to 28 in “remote monitoring/diagnostics.” Some categories, such as “patient adherence tools,” generated large lists of potential organizations that turned out not to claim much use of AI at all, whereas other categories, such as “diagnostic image reading,” mostly surfaced organizations that lean heavily on AI claims for their public image, even if some later turned out to rely on older machine learning (ML) technologies.

Most organizations in our sample focus on diagnostics, clinical decision support, and patient support, while system-level applications such as “administrative efficiency” and “disease surveillance” are less common. Although our sample is not necessarily representative of all AI activity in health, it is plausible that these patterns reflect broader trends, with frontier AI currently concentrated in patient- and clinician-facing tools rather than system-level uses.

Activity across low- and middle-income countries (LMICs) is mixed. For example, our impression is that all of the “disease surveillance” organizations we found are active in LMICs, whereas more than two-thirds of organizations working on “clinical workflow improvement” or “medical record analysis” are only active in high-income countries.

More than 200 of the organizations we identified in this space are for-profit enterprises. This may imply that interested donors would need to pursue alternative routes to impact than grantmaking, such as direct investment, joint operations, or partnership support with nonprofits. We have not looked deeply into the specific forms of engagement that would be most promising.

Scope and limitations

The landscaping scan was neither comprehensive nor representative; many relevant organizations are likely not captured, and we have not identified a viable process to close these gaps without gaining more direct expertise and connections within the space. The search was conducted only in English, likely leading to an overrepresentation of English-speaking and internationally active organizations. Furthermore, R&D-focused organizations were deemed out of scope.

While we aimed to capture the full range of AI health interventions currently in implementation, we expect that some organization types may be missing or underrepresented, e.g.:

  • Within the “patient support” category, most identified organizations focus on behavior change and adherence. We did not find organizations centered on access or affordability (e.g., those helping patients navigate benefits, locate care, or manage payments), although such applications could theoretically be valuable.
  • We can also imagine AI tools designed to improve cultural or contextual accessibility, for example, voice-based apps for low-literacy users or systems that adapt care guidance to different cultural or educational backgrounds, but we did not specifically search for such organizations.

Assessing frontier AI likelihood

To identify which organizations are likely to be working with frontier-level AI, we used a two-step process combining LLM-assisted classification and manual validation. This process took ~1.5 days.

Before the classification, we defined what we mean by “frontier AI” to guide the LLM and manual review. In this analysis, we distinguish between different levels of AI technical advancement to assess how “frontier” an organization’s AI capabilities are. This classification reflects the broad evolution of AI capabilities over the past two decades (see Bommasani et al., 2022):

  • Non-frontier: Traditional machine learning methods such as logistic regression, decision trees, or random forests. These techniques have been standard since the 2000s and are now well established and widely used for narrow, structured prediction tasks.
  • Borderline frontier: Modern deep learning methods, such as convolutional or recurrent neural networks. These approaches became mainstream after 2012 and are now common in applied AI systems, particularly in areas like image and speech recognition. They represent sophisticated but mature forms of AI.
  • Frontier: Building or integrating frontier-scale AI systems, including LLMs, generative AI, and multimodal foundation models that can process text, images, and other data types together. This class of systems has emerged over roughly the past seven years, beginning with the introduction of the transformer architecture in 2017 (Vaswani et al., 2017), and represents the current AI frontier, marked by broad generalization and rapid innovation.

Step 1: LLM-based initial classification

We used LLMs (Claude and ChatGPT) to assign each organization a preliminary rating across five categories of “frontier AI likelihood”: low, low-medium, medium, medium-high, and high.

  • If an organization appeared to verifiably and intensively use technologies in the “frontier” tier above, it qualified as “high.” We rated cases “medium-high” if an organization was unclear about its specific use of frontier AI, or if it appeared to use advanced versions of the “borderline frontier” methods. “Medium” typically indicated that an organization’s claims vaguely suggested frontier tech, or that it relied heavily on methods in the “borderline” list. “Low-medium” and “low” indicate a judgment that an organization is likely not using advanced AI technology in a meaningful way.
  • We began with an initial sample of about 20 organizations to calibrate the LLM’s understanding of what we consider “frontier AI.”
  • The LLMs were asked to assign a category and briefly justify their reasoning.
  • After briefly reviewing these first outputs, we adjusted the prompting to tighten or relax the classification threshold as needed until the assignments aligned reasonably with our sense of what qualifies as frontier AI.
  • We then applied this process to all organizations in the dataset to generate an initial rating.
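As a rough sketch, the rating logic described above can be written as a decision rule. This is only our illustrative reconstruction, assuming four hypothetical boolean evidence flags; the actual process relied on free-text LLM judgments and prompt iteration rather than hard-coded rules.

```python
# Illustrative reconstruction of the five-tier rating rubric described above.
# The boolean evidence flags are hypothetical simplifications; the actual
# classification used free-text LLM judgments, not a hard-coded rule.

def rate_frontier_likelihood(uses_foundation_model: bool,
                             frontier_claims_specific: bool,
                             uses_deep_learning: bool,
                             vague_ai_claims_only: bool) -> str:
    """Map coarse evidence flags to a frontier-AI likelihood tier."""
    if uses_foundation_model and frontier_claims_specific:
        return "high"         # verifiable, intensive use of frontier tech
    if uses_foundation_model:
        return "medium-high"  # frontier tech claimed, but usage unclear
    if uses_deep_learning:
        return "medium"       # relies on "borderline frontier" methods
    if vague_ai_claims_only:
        return "low-medium"   # "AI-powered" with no further specifics
    return "low"              # likely no meaningful advanced AI use
```

Writing the rubric down this way also clarifies why borderline cases cluster in the middle tiers: the decisive signals are whether frontier technology is claimed at all, and how specifically.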

Step 2: Manual validation

  • We conducted several spot-checks on lower-rated organizations to verify that the initial LLM ratings were broadly accurate. After this quick scan, organizations rated low or low-medium by the LLM scoring were excluded from further validation, as we considered the risk of false negatives low.
  • We then focused on organizations rated at least medium or above. For each, we conducted manual validation by reviewing websites, press releases, technical documents, and other public information about their AI systems.
  • In many cases, the information was sparse or high-level. Thus, we think that some remaining degree of misclassification is likely due to the limited public detail on the underlying AI technologies.
  • It appears that many organizations claim their work is “AI-powered” or “AI-enabled” without further detail, so we looked for more specific claims about the use of particular frontier technologies.

Findings: Patterns of frontier AI use in health organizations

After this validation process, we identified 23 organizations that, despite initially seeming promising for some form of cutting-edge technology, did not actively advertise the use of AI. A further 94 organizations touted AI-related capabilities but seemed quite unlikely to be using frontier technologies. We scored 30 organizations as having a “medium” likelihood of using frontier AI, 38 as “medium-high,” and 73 as “high,” indicating strong evidence that they employ such technologies. The initial list was thus spread across all five levels of frontier AI likelihood, with somewhat more representation in the “high” likelihood category, and just under half of the longlist was ultimately rated as having at least a medium-high chance of frontier AI usage.

The distribution of frontier AI capabilities varies notably across categories. Clinical support organizations show the strongest concentration of high-frontier AI, with 49 organizations (~45% of the clinical support category) scoring in the highest tier. This mostly reflects the recent surge in LLM-based decision support and clinical workflow tools. In contrast, health operations organizations show a more distributed pattern, with meaningful numbers across all AI sophistication levels, suggesting this category encompasses both cutting-edge optimization systems and more traditional tools. Patient support and population health categories show relatively fewer high-frontier AI organizations, which may indicate that these application areas are either still developing frontier AI use cases or that effective solutions don’t necessarily require the most advanced AI techniques. For patient support in particular, the high concentration at “low-medium” or lower levels of AI sophistication (42 organizations) reflects our assessment that many apps and digital health tools use standard machine learning rather than foundation models.

Among the 73 high-scoring frontier AI organizations, only 25 are confirmed active in LMICs, 34 appear not to operate in LMICs at all, and the rest are uncertain. The pattern is similar across medium-high and medium frontier AI organizations, where roughly 30–50% are confirmed active in LMICs. Organizations with lower AI sophistication show slightly higher LMIC engagement, with 39 out of 72 “non-frontier” organizations confirmed active in LMICs. This is consistent with our prior that the most technically advanced AI organizations tend to focus on high-income country markets, likely due to factors such as regulatory barriers, payment systems, and market size. In addition, such a dynamic might indicate that simpler AI approaches are more readily deployable in resource-constrained settings.

Grouping organizations by impact pathways

After identifying the longlist of organizations and mapping them to categories and sub-categories, we found that a more informative way to compare them was to group organizations by the mechanism through which they generate health impact. Rather than focusing on product type or use case, we looked at the underlying causal pathway. For example, organizations working on diagnostic image reading and those supporting disease screening both influence health outcomes through improved identification of health conditions, so they fall under the same impact pathway, which we call “diagnostic assistance.”

These pathways cut across categories and technologies and reflect the underlying mechanisms through which AI can generate health outcomes. Grouping organizations in this way allowed us to reason about shared mechanisms and compare opportunities across different types of tools, rather than focusing on individual use cases.

We identified eight impact pathways:

  1. Disease surveillance: earlier outbreak detection enabling faster response and reduced transmission
  2. Diagnostic assistance: improving the accuracy and timeliness of diagnosis
  3. Service delivery efficiency: increasing the efficiency of clinical workflows and coordination, leading to greater effective coverage
  4. Patient behavior: supporting adherence, lifestyle, and self-management resulting in better health outcomes
  5. Clinical skills and decision support: improving clinical experience in terms of efficiency, standard of care, and referral quality
  6. Product safety and quality: detecting counterfeit or substandard medicines and equipment
  7. Data-driven planning: enabling better planning, resource allocation, and program management through improved analytics
  8. Administrative efficiency: strengthening financial and administrative systems, such as insurance claims and reimbursements, reducing delays, errors, and leakage.

These impact pathways represent the main ways AI can create health value across the landscape. They provide a common structure for comparing opportunities, assessing the potential scale of impact, and identifying where rough cost-effectiveness modeling is most decision-relevant.

Exploring impact pathways through cost-effectiveness models

Overview of our modeling approach

To understand where AI for health might have the greatest impact, we developed a set of simple, rough models. These were not intended to serve as full cost-effectiveness analyses, but rather as order-of-magnitude comparisons to support early prioritization. Their purpose was to compare different impact pathways and identify those that seem most likely to offer highly cost-effective opportunities. We built these models at the organization level for multiple examples identified within each pathway. Given the exploratory nature of this work and the sensitivity of some assumptions, we do not present organization-level model outputs here. Instead, the next section focuses on the higher-level takeaways that emerged from the modeling exercise.

Our goal was to create models that were both grounded in available evidence and broadly representative of the main ways AI can generate health value.

We focused on six impact pathways that together cover much of the landscape of frontier AI applications in health:

  • Disease surveillance
  • Diagnostic assistance
  • Service delivery efficiency
  • Patient behavior
  • Clinical skills and decision support
  • Product safety and quality

These six pathways were chosen because they have at least some supporting evidence, and represent distinct causal mechanisms. The remaining two pathways, “data-driven planning” and “administrative efficiency,” were deprioritized for the remainder of the project due to limited data or less direct links to measurable health outcomes.

High-level findings by impact pathway

Disease surveillance

AI-enabled disease surveillance aims to detect outbreaks earlier by analyzing diverse data sources such as case reports, online news, social media, and environmental data. Earlier detection can enable faster response and reduce transmission. We modeled impact by linking earlier outbreak detection to potential reductions in disease burden.

Findings:

  • Earlier detection could be highly cost-effective, particularly for fast-spreading, high-burden diseases, or in regions with weak existing surveillance systems.
  • However, health impact depends heavily on whether early alerts translate into faster public health action, which is rarely documented.
  • While the potential impact is large, the evidence base is comparatively thin and results are highly sensitive to assumptions about how much earlier detection reduces transmission.

Key limitations and uncertainties: Evidence on response speed, disease-specific transmission dynamics, and real-world implementation is limited.
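The core mechanism here is that outbreak size grows roughly exponentially with the delay to response, so even a few days of earlier detection can avert a disproportionate share of cases. The sketch below is a stylized illustration, not one of the models used in this report; all parameter values are hypothetical, and the assumption that transmission halts at detection is an optimistic simplification.

```python
import math

# Stylized illustration (not one of the report's models) of how earlier
# detection maps to averted burden under unchecked exponential growth.
# All parameters are hypothetical placeholders.

def cases_at_detection(initial_cases: float, daily_growth_rate: float,
                       days_to_detection: float) -> float:
    """Cumulative cases under exponential growth until detection."""
    return initial_cases * math.exp(daily_growth_rate * days_to_detection)

def cases_averted_by_earlier_detection(initial_cases: float,
                                       daily_growth_rate: float,
                                       baseline_days: float,
                                       days_earlier: float) -> float:
    """Cases averted if response starts `days_earlier` sooner, assuming
    (optimistically) that transmission is halted at detection."""
    baseline = cases_at_detection(initial_cases, daily_growth_rate,
                                  baseline_days)
    earlier = cases_at_detection(initial_cases, daily_growth_rate,
                                 baseline_days - days_earlier)
    return baseline - earlier
```

Because the averted burden depends exponentially on the assumed growth rate and detection delay, small changes in either assumption swing the results widely, which is exactly the sensitivity noted above.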

Diagnostic assistance

AI diagnostic tools aim to improve health outcomes by enhancing the accuracy and accessibility of medical screening and diagnosis through image analysis (e.g., X-rays), audio interpretation (e.g., stethoscopes), or lab result interpretation. We modeled two distinct mechanisms: increased diagnostic volume from workflow efficiency and improved diagnostic quality from better detection performance. These mechanisms were evaluated separately to understand their relative contribution to impact.

Findings:

  • Cost-effectiveness varies widely across settings, diseases, and the relative contribution of volume versus quality improvements.
  • It appears to be highest in low-resource settings with high disease burdens and substantial access gaps. In these contexts, most of the potential impact comes from expanding diagnostic access.
  • In higher-resource settings, where access is less constrained, value depends mainly on whether AI meaningfully improves diagnostic accuracy, which remains uncertain outside controlled studies.

Key limitations and uncertainties: Key uncertainties include how costs should be attributed to increased access, how diagnostic tools perform in real-world settings, and the degree to which baseline capacity constraints limit volume gains.
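The volume-versus-quality decomposition described above can be sketched in a few lines. This is a minimal illustration with hypothetical parameter names and placeholder values, not the actual model structure or inputs used in the report.

```python
# Minimal sketch of the two benefit streams modeled for diagnostic tools:
# expanded access (volume) and improved accuracy (quality). All parameter
# names and example values are hypothetical, not taken from the report.

def diagnostic_value(baseline_volume: float,
                     volume_uplift: float,        # e.g. 0.3 = 30% more patients screened
                     baseline_sensitivity: float,
                     ai_sensitivity: float,
                     prevalence: float,
                     dalys_averted_per_true_positive: float) -> dict:
    """Decompose modeled health value (in DALYs averted) into an access
    component and an accuracy component."""
    # Access: extra patients screened, detected at the baseline rate
    access_value = (baseline_volume * volume_uplift * prevalence
                    * baseline_sensitivity * dalys_averted_per_true_positive)
    # Quality: additional true positives among the baseline patient volume
    quality_value = (baseline_volume * prevalence
                     * (ai_sensitivity - baseline_sensitivity)
                     * dalys_averted_per_true_positive)
    return {"access": access_value, "quality": quality_value}
```

With plausible illustrative inputs (a 30% volume uplift versus a 5-point sensitivity gain), the access term dominates, which mirrors the pattern noted in the findings: in access-constrained settings, throughput gains drive most of the modeled value.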

Service delivery efficiency

AI-enabled service delivery efficiency improvements aim to increase healthcare productivity by automating administrative tasks, optimizing clinical workflows, and improving resource allocation. We modeled value from two distinct mechanisms: increased service volume from efficiency gains and improved service quality from better workflows or documentation. Impact depends on which constraints are binding in a given health system.

Findings:

  • Cost-effectiveness varies widely and depends on the workflow targeted and the specific constraints of the health system.
  • Efficiency gains create the most value when clinician time is the main bottleneck and demand exceeds capacity; in other settings, improvements may not increase care delivered.
  • Quality improvements from better coordination or documentation are plausible but difficult to quantify and appear modest given current evidence.

Key limitations and uncertainties: It is often unclear which constraints are actually binding in real-world settings; evidence from pilot studies may not generalize to routine practice; and estimates of quality improvements have weak empirical grounding.

Patient behavior

AI-enabled patient support tools aim to improve adherence, lifestyle behaviors, and self-management, thereby reducing the risk of chronic disease or adverse health events. We developed simple models that link behavioral changes to reductions in disease risk. For practicality, each case was modeled through its main disease pathway.

Findings:

  • Many tools have modest effect sizes, limited persistence of behavior change, and relatively high per-user costs, which together constrain cost-effectiveness.
  • Even when behavior improves, the resulting reductions in disease risk are often small relative to program costs.
  • Digital-only interventions may become more cost-effective if marginal costs fall substantially with scale.

Key limitations and uncertainties: Large uncertainties around effect persistence, adherence patterns, and true program costs make results particularly uncertain.

Clinical skills and decision support

Clinical decision support systems aim to improve treatment decisions, documentation, triage, and referrals through evidence-based recommendations and structured workflows. These tools work by analyzing patient data to suggest evidence-based treatments, identify overlooked clinical factors, reduce documentation burden, and optimize care. We modeled three distinct benefit streams that AI clinical decision support can provide: increased volume of services, improved standard treatment quality, and improved referrals.

Findings:

  • Cost-effectiveness varies widely and depends on setting, disease burden, and the mechanism through which AI creates value.
  • In low-resource environments, improved referral accuracy can generate large health gains by reducing preventable mortality.
  • In high-resource settings, impact hinges on whether AI meaningfully improves clinical decision quality beyond current standards.

Key limitations and uncertainties: Limited evidence on real-world changes in clinical decisions, referral follow-through, and downstream health outcomes makes us uncertain about how well modeled improvements translate into actual practice.

Product safety and quality

Pathway overview

AI-enabled tools can detect substandard or falsified medicines before they reach patients, reducing treatment failure and preventable mortality, especially in settings with weak regulatory systems. We model potential health gains from detecting poor quality medicines. Impact is driven by the number of doses tested and the underlying prevalence of substandard products.

Findings:

  • Cost-effectiveness depends heavily on testing throughput, with greater value achieved when tools are deployed at higher volume points in the supply chain.
  • Tools that analyze the chemical composition of medicines appear to be more impactful than packaging-based approaches, since they can detect both falsified and substandard products.
  • The strongest cost-effectiveness potential arises in low-resource settings with significant problems of substandard and falsified medicines.

Key limitations and uncertainties: Uncertainty around the prevalence of substandard and falsified medicines and achievable testing throughput makes estimates highly sensitive.
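The throughput-driven arithmetic behind this pathway is simple enough to write down directly. The function below is a back-of-envelope sketch with hypothetical parameters, not the report's model.

```python
# Back-of-envelope sketch of the product-safety pathway: expected value
# scales with testing throughput, the prevalence of substandard or
# falsified products, and test sensitivity. All inputs are illustrative
# assumptions, not estimates from this report.

def bad_doses_intercepted(doses_tested_per_year: float,
                          prevalence_substandard: float,
                          test_sensitivity: float) -> float:
    """Expected number of substandard/falsified doses caught per year."""
    return doses_tested_per_year * prevalence_substandard * test_sensitivity
```

Because the relationship is multiplicative, deploying at high-volume points in the supply chain (e.g., central warehouses rather than individual pharmacies) multiplies the first term directly, which is why throughput dominates the modeled cost-effectiveness.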

Modeling challenges and limitations

Frontier AI systems often lack robust monitoring and evaluation data. Programs using the most advanced AI approaches tend to be newer and usually have limited published evidence on health outcomes, particularly in low-resource settings. By contrast, organizations with strong evaluation records typically use more established AI techniques that have had time to build an evidence base. This inverse relationship required us to balance frontier relevance with evidentiary strength when modeling cost-effectiveness.

Some impact pathways required using non-frontier examples to anchor assumptions. Because rigorous evidence is sparse for many frontier AI interventions, several pathways relied on organizations that were not at the AI frontier but had sufficient data to support plausible parameter ranges. These selections were made to allow comparisons across pathways despite uneven data availability.

AI is typically only one component of broader interventions. In many real-world deployments, AI tools are bundled with new hardware, workflow changes, training, or expanded service delivery models. Available evidence rarely isolates the effect of AI alone, so our modeling reflects the combined impact of interventions as implemented.

Our models are sensitive to uncertain parameters. Small changes in factors such as disease burden, diagnostic performance, or treatment effectiveness could meaningfully shift the cost-effectiveness estimates. Given these uncertainties, our assessments should be interpreted as directional rather than precise.

Cross-cutting insights from our analysis

Our analysis surfaced several themes that cut across organizations and impact pathways. These patterns help explain where AI for health appears most promising, where structural barriers limit cost-effectiveness, and where philanthropic capital is likely to be most valuable.

  • LMIC deployment offers greater potential for cost-effective impact. High-income settings consistently appear significantly less cost-effective, while LMIC-focused interventions benefit from higher disease burdens, lower treatment costs, and larger access gaps. The main caveat is that improved detection or diagnosis only matters if downstream systems can act on it.
  • Throughput gains often create more value than accuracy gains. Across impact pathways, increases in service volume frequently drove more impact than improvements in diagnostic precision. Efficiency gains matter most where clinician time or diagnostic capacity is the binding constraint.
  • Specificity improvements can deliver large benefits in high-volume contexts. Reducing false positives can prevent unnecessary treatment and testing, sometimes generating more value than sensitivity improvements. This pattern depends heavily on realistic estimates of baseline human performance.
  • Service delivery and patient-facing tools face structural barriers. Efficiency tools only translate into health impact when clinician time is the true bottleneck. If budgets, infrastructure, or demand are the limiting factors, efficiency gains do not increase the number of patients treated. Patient-facing digital tools tend to have modest effects, limited persistence, and high per-user costs, which makes high cost-effectiveness unlikely.
  • Public good functions remain underserved by markets. Disease surveillance and frontline decision support face weak commercial incentives and are more likely to require philanthropic support, while diagnostic AI markets appear comparatively well-funded.
  • High-impact opportunities depend on real-world system readiness. AI tools only deliver health gains if clinicians, patients, and public systems can act on improved detection or decision support, which we expect to vary widely across settings.
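
As a simple illustration of the throughput and specificity points above, consider a hypothetical high-volume screening program; all numbers are invented for the example.

```python
def extra_true_positives(volume, prevalence, delta_sensitivity):
    """Additional cases caught per year from a sensitivity gain."""
    return volume * prevalence * delta_sensitivity

def false_positives_averted(volume, prevalence, delta_specificity):
    """Unnecessary follow-ups avoided per year from a specificity gain."""
    return volume * (1 - prevalence) * delta_specificity

# Illustrative: 1M screens/year at 1% prevalence. A 2-point specificity gain
# averts far more unnecessary follow-ups (~19,800) than a 2-point sensitivity
# gain adds detected cases (~200).
print(round(extra_true_positives(1_000_000, 0.01, 0.02)))
print(round(false_positives_averted(1_000_000, 0.01, 0.02)))
```

The asymmetry arises because, at low prevalence, almost every screen is of an unaffected person, so specificity improvements act on a much larger base than sensitivity improvements. Whether the averted follow-ups outweigh the missed or gained cases depends on the relative harms, which is why realistic baselines matter.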

Illustrative organization profiles

This section presents five organizations that exemplify how AI-enabled approaches are being applied across impact pathways that we consider promising. The goal is not to provide a ranked list or definitive funding recommendations, but to showcase how these pathways manifest in practice and illustrate their potential.

To provide breadth, we selected example organizations spanning five key impact pathways that we have identified as promising. We exclude the patient behavior pathway, as our rough cost-effectiveness model suggests it is likely to be less cost-effective than other AI-enabled approaches.

  • Disease surveillance
    • EPIWATCH: AI-powered epidemic intelligence system that scans global open-source data to detect infectious disease outbreaks earlier.
  • Product safety and quality
    • RxAll: AI-enabled RxScanner for detecting substandard and falsified medicines by analyzing chemical composition.
  • Diagnostic assistance
    • Delft Imaging: AI-enabled diagnostic imaging solutions for detecting tuberculosis, silicosis, and other diseases using X-rays.
  • Service delivery efficiency
    • Medic: Community Health Toolkit (CHT) supports community health systems with mobile tools for service provision, data collection, messaging, and more.
  • Clinical skills and decision support
    • HEP Assist: AI-powered clinical decision support for community health workers, assisting in diagnosis and referral for child illnesses in rural Ethiopia.

EPIWATCH

Overview

EPIWATCH[4] is an AI-enabled epidemic intelligence system developed by the Kirby Institute at the University of New South Wales (UNSW) Sydney.[5] It was founded by Prof. Raina MacIntyre and her research group in 2016, after the delayed and chaotic response to the West Africa Ebola epidemic, with the aim of detecting infectious disease outbreaks earlier by scanning global open-source data (e.g., news reports, social media, clinician forums, and official alerts) and flagging unusual disease events in real time. The platform collects, filters, and classifies signals of emerging health threats and displays them in a global dashboard for public health agencies, governments, and frontline responders. In addition to infectious diseases, EPIWATCH is designed as an all-hazards system with emerging capabilities for detecting chemical, biological, radiological, nuclear, and explosive (CBRNE) events, as well as health disinformation.

The system aims to address a key bottleneck in outbreak surveillance: delays in detection and reporting, particularly in low-resource settings where formal case notification systems can be slow, fragmented, or absent. EPIWATCH is intended to support early warning and situational awareness so governments and global health actors can respond before outbreaks become widespread. It is framed as a global public good, designed to provide early warning capabilities that benefit all countries equally and to overcome information gaps caused by weak surveillance, censorship, or conflict.

Key product: AI-enabled outbreak detection system

EPIWATCH’s core product is its early-warning platform that aims to detect outbreaks earlier than traditional surveillance and prompt faster public-health response. MacIntyre suggested that operating the platform at its current scale costs roughly $2 million per year, including cloud computing and access to social-media data, in addition to staff salaries for both technical and epidemiology staff.

Health problem it addresses

Emerging infectious diseases continue to impose large mortality and economic losses, and the frequency of outbreaks appears to be increasing (Liu et al., 2025; Marani et al., 2021). Traditional notification systems depend on clinicians diagnosing and reporting cases, labs confirming results, and national surveillance teams aggregating and sharing data—processes that are often slow and fragmented across LMICs (WHO, 2008). As seen with Ebola, SARS-CoV-2, mpox, and chikungunya, delays of weeks to months can allow undetected spread (WHO, 2015; Bedford et al., 2020; Paredes et al., 2024; ECDC, 2013). These delays are especially common in low-resource or crisis-affected settings, where formal surveillance systems may be disrupted or unable to detect early signals.

How the technology works

In our interview, MacIntyre explained that EPIWATCH uses open-source intelligence (OSINT) to collect digital signals about unusual disease events from news outlets, official alerts, clinician forums, blogs, and social media. The system ingests data in 53 languages, including 12 Indian languages and several African languages, and automatically back-translates everything into English for analysis and display.

EPIWATCH uses a two-stage AI pipeline. First, a language model-based prioritization algorithm removes approximately 85-90% of irrelevant content. The remaining signals are classified by pathogen, syndrome, location, and severity using natural language processing, covering more than 250 diseases as well as undiagnosed “mystery illness” categories (Honeyman et al., 2025). Second, LLMs generate summaries and link related reports, which feed automated digests that are human-reviewed before release. The dashboard also provides global analytics such as time trends, avian flyways, animal and human health data, socioeconomic data, hospital, lab, and transport-network overlays, and a risk-scoring tool.
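
The two-stage structure described above can be sketched as follows. This is not EPIWATCH's actual code: the scoring and classification functions are keyword stubs standing in for the language models, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Signal:
    text: str
    relevance: float = 0.0          # set by stage-1 prioritization
    pathogen: Optional[str] = None  # set by stage-2 classification

def prioritize(signal: Signal) -> Signal:
    # Stage 1 (stub): a lightweight model scores relevance so the bulk of
    # irrelevant content (~85-90% in EPIWATCH's case) can be dropped early.
    signal.relevance = 0.9 if "outbreak" in signal.text.lower() else 0.1
    return signal

def classify(signal: Signal) -> Signal:
    # Stage 2 (stub): in the real system an LLM tags pathogen, syndrome,
    # location, and severity; a keyword check stands in here.
    signal.pathogen = "cholera" if "cholera" in signal.text.lower() else "unknown"
    return signal

def pipeline(raw_texts: List[str], threshold: float = 0.5) -> List[Signal]:
    scored = [prioritize(Signal(t)) for t in raw_texts]
    kept = [s for s in scored if s.relevance >= threshold]  # drop irrelevant bulk
    return [classify(s) for s in kept]

reports = ["Cholera outbreak reported near port city", "Celebrity wellness tips"]
print([s.pathogen for s in pipeline(reports)])  # ['cholera']
```

The design point is that cheap filtering runs on everything, while the more expensive classification and summarization models only see the small surviving fraction.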

MacIntyre said the project moved from earlier models to GPT-4 and now GPT-5 after receiving a Microsoft Accelerate grant, and that keeping up with new LLMs had significantly improved system performance. Based on our classification, this places EPIWATCH in the “high” and “frontier” category, as the platform verifiably integrates modern LLMs for classification, filtering, and summarization, even though the team does not train foundation models itself. In addition, EPIWATCH draws on a network of epidemiologists and clinicians who provide context and ground-truth information when unusual events are detected.

Evidence and success stories

In our interview, MacIntyre highlighted several peer-reviewed studies that have used EPIWATCH data, for example:

  • Honeyman et al. (2025) found that EPIWATCH detected early signals in several documented outbreaks, e.g., identifying a Legionella outbreak in Argentina six days before WHO, and that only ~14% of the 310 unknown-cause outbreaks it detected were ever laboratory-confirmed, meaning ~260 events (~86%) never appeared in formal surveillance systems.
  • Kannan et al. (2024) found that during the 2022 war in Ukraine, formal disease surveillance ceased for most pathogens, yet EPIWATCH still detected 805 outbreak signals, a 447% increase over 2021, including major cholera and tuberculosis clusters that were invisible to official reporting channels.
  • Stone et al. (2023) showed that EPIWATCH is able to detect non-infectious public health threats as well: During the Russian occupation of Chernobyl, it identified early signals of potential radiological events, including 16 reports within the Chernobyl region and symptom clusters around Kyiv, at a time when formal radiation monitoring systems were disrupted.
  • Hutchison et al. (2023) showed that EPIWATCH detected an 8.7-fold rise in rash-and-fever illness signals during the early 2022 monkeypox outbreak (656 vs. 75 the previous year), identifying syndromic clusters in multiple countries ahead of formal case confirmations.

MacIntyre said the system regularly detects clusters of “mystery illness” before laboratory confirmation, and reported that EPIWATCH surfaced signals consistent with COVID-19 by mid-November 2019 in China.

Scale and implementation

Based on our interview with MacIntyre, the EPIWATCH team currently includes 20 staff, with 7 software/AI engineers and 10 epidemiologists and analytics specialists, supported by administrative personnel. MacIntyre said the system has been used by government, military, and industry partners, including a pharmaceutical company and the US military’s Indo-Pacific Command, which incorporated the platform into pandemic simulation exercises. She also mentioned discussions with the German Ministry of Health. EPIWATCH is also capable of gathering health intelligence in settings where formal surveillance has broken down, such as during the war in Ukraine (Kannan et al., 2024).

Broader product portfolio

Beyond early detection, EPIWATCH offers several supporting tools:

  • Weekly Digest: LLM-assisted outbreak summaries for rapid situational awareness.
  • EPIRISK: Risk scoring and analytics with flight-network and environmental overlays.
  • Episcope: Simple outbreak modeling for scenario analysis (e.g., impact of vaccination or delayed response).
  • Mobile/translated dashboards: Localized interfaces for frontline health workers, initially in Hindi and several other languages.
  • Global Biosecurity (open-access journal): Rapid publication of outbreak analyses, especially from LMICs.

Room for more funding

EPIWATCH appears to have substantial additional room for funding. Its main philanthropic grant from Vitalik Buterin’s Balvi Fund (~$6M, 2022–2025) is ending this year, and an Australian government grant will conclude mid-2026. According to MacIntyre, the team has not secured long-term operational support, and as a university-based initiative, it relies entirely on grants and philanthropic contributions. At about $1–1.5 million per year, EPIWATCH could continue operating in its current form, covering the ongoing costs of the platform and its existing AI pipeline. Funding above this level would support major growth and upgrades, including enhanced AI capability, new multilingual tools for field use, and expanded training initiatives.

Organization leadership

EPIWATCH was founded by Prof. Raina MacIntyre, Head of the Biosecurity Program at the Kirby Institute, University of New South Wales (UNSW) Sydney. MacIntyre is an infectious disease epidemiologist with a long publishing record in outbreak analytics, respiratory viruses, and epidemic preparedness. The EPIWATCH core team includes software/AI engineers and epidemiology/analytics staff, alongside administrative support. We did not find publicly available information on the individual team members listed on the organization’s website (EPIWATCH, 2023).

Similar organizations

The closest related organization appears to be BlueDot, which also uses AI to detect infectious disease threats but functions as a commercial predictive analytics platform rather than an open-access system. Other somewhat comparable tools include, e.g., WHO’s EIOS (WHO, 2025), HealthMap, and AIME (Nesta, 2022). See also MacIntyre et al. (2023) for a comparison of several related organizations.

RxAll

Overview

RxAll is a Nigeria-founded for-profit social enterprise that develops AI-enabled tools to detect substandard and falsified (S&F) medicines at the point of dispensing. The company was founded in 2016 and reports operations in at least 15 countries.

Key product: RxScanner

RxAll’s primary product, the RxScanner (RxAll, 2022; see Figure 1), is a handheld AI-enabled spectrometer that identifies poor-quality medicines by analyzing their chemical composition and comparing the result against a reference database of more than 300 drug types, which is continually expanding. The device is mainly intended for pharmacies and other retail drug sellers, as well as hospitals, and produces results within seconds. RxAll claims that the RxScanner is 20x cheaper and easier to operate than existing spectrometers, which are typically too expensive for routine use in low-resource settings (Alonge, 2019). The device is provided on a subscription model, where users purchase the device for $6,000 and pay $300/month for access to the cloud-based reference library. The aim is to offer real-time drug authentication in settings where laboratory testing is slow, costly, or inaccessible.

Health problem it addresses

S&F drugs are a major contributor to avoidable mortality in LMICs. An estimated 10–15% of medicines worldwide are S&F, meaning they contain too little (or none) of the correct active ingredient, are improperly manufactured or degraded, or are entirely fake and may contain harmful substances (El-Dahiyat et al., 2021). In a previous report, we estimated that S&F antimalarials may be responsible for ~100,000 deaths per year in Sub-Saharan Africa, and found that many of the common strategies to address S&F medicines rely on strong regulatory capacity (Leow et al., 2024). These include improving national laboratories, strengthening supply chains, or reducing manufacturing of falsified drugs at the source. Such approaches can be effective but often move slowly and remain difficult to implement in many sub-Saharan African settings.

Figure 1: RxScanner

The RxScanner also addresses limitations of earlier point-of-dispensing tools. Mobile authentication systems, such as scratch-and-SMS codes, have seen low uptake, depend on mobile networks, and can be faked by counterfeiters. To our knowledge, RxAll currently has no close competitor offering a similar portable scanner. Other tools, such as TrueMed, analyze packaging rather than chemical content, which makes them more suitable for detecting outright counterfeits, even though substandard products are estimated to be ~6–10x more common (Leow et al., 2024, p. 10). The RxScanner avoids these issues by testing the chemical composition directly, which makes it harder to fool and allows detection of both substandard and falsified medicines, without relying on action from patients.

How the technology works

The RxScanner (RxAll, 2022) pairs near-infrared spectroscopy with deep-learning classification models to authenticate medicines in real time. Each scan produces a chemical fingerprint that the model compares against a database of verified samples, returning a result within seconds (see instruction video; Youtube, 2019). Models update as new reference data are added, allowing the system to adapt to emerging counterfeit products.
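
A highly simplified sketch of this fingerprint-matching step follows, with plain cosine similarity standing in for the proprietary deep-learning classifier; the spectra, drug name, and threshold are all invented for illustration.

```python
import math

def cosine_similarity(a, b):
    # Similarity between two spectral vectors (1.0 = identical direction).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def authenticate(scan, reference_library, threshold=0.98):
    """Compare a scanned fingerprint against verified reference spectra.

    Nearest-reference similarity is a stand-in for the trained classifier;
    real spectra have hundreds of wavelength channels, not four.
    """
    best_drug, best_score = None, -1.0
    for drug, ref_spectrum in reference_library.items():
        score = cosine_similarity(scan, ref_spectrum)
        if score > best_score:
            best_drug, best_score = drug, score
    verdict = "authentic" if best_score >= threshold else "suspect"
    return best_drug, verdict

library = {"artemether": [0.9, 0.3, 0.1, 0.7]}
print(authenticate([0.88, 0.31, 0.12, 0.69], library))  # close match -> "authentic"
print(authenticate([0.2, 0.9, 0.8, 0.1], library))      # poor match -> "suspect"
```

The continually expanding reference library mentioned above corresponds to adding entries to `reference_library`, which is why cloud access to the library is part of the subscription.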

Using our classification framework, this places RxAll in the “medium-high” category. The device relies on modern deep-learning methods (e.g., convolutional neural networks) and on-device inference, but does not involve frontier-scale AI systems such as multimodal foundation models or LLMs. The innovation lies primarily in applying mature deep-learning methods to a last-mile global health problem rather than advancing the frontier of AI model development.

Evidence and success stories

The RxScanner has reportedly helped remove >1.3 million counterfeit drugs from circulation (RxPay, 2025) and achieves ~99.9% classification accuracy (Once Daily, 2015), based on the organization’s data. Case studies on the RxAll website (RxAll, 2022) describe several deployments: for example, the Ibadan Drug Fulfillment Center reports that patients accessing safe medicines increased after deployment, and E-Health Africa clinics used the RxScanner to test antimalarials and “protect” patients. Participating pharmacies also report increased sales and improved reputation. However, these figures are self-reported, and we did not identify independent validation studies of RxScanner accuracy, field performance, or impact.

Scale and implementation

RxAll reports that the RxScanner is used in 15 countries around the world. Moreover, the company states that >100,000 drugs are authenticated monthly in partnership with the National Agency for Food and Drug Administration and Control (NAFDAC), >150,000 drugs have been tested at the Ibadan Drug Fulfillment Center, and >30,000 antimalarials were tested in E-Health Africa clinics in one year (RxAll, 2022).

Broader product portfolio

In addition to the RxScanner, RxAll also operates a verified-drug marketplace (“RxDelivered”) and pharmacy management tools (e.g., inventory, invoicing, and point-of-sale systems) intended to help pharmacies source and dispense authenticated medicines, though our analysis here focuses on the RxScanner as their core AI-based tool.

Room for more funding

RxAll reports that it is currently seeking to raise $10M. Stated priorities for additional funding include expanding deployment within Nigeria and scaling operations in Kenya and Uganda, as well as preparing entry into Ghana and Tanzania. The company also aims to deepen partnerships with national regulatory authorities for post-marketing surveillance, including co-developing dashboards, traceability pilots, and periodic quality reports based on RxScanner data. Additional funds would support spectral-library growth and product upgrades to the RxScanner and its underlying AI models, as well as working-capital financing for authenticated medicines through RxPay. Around half of the funding appears aimed at improving or deploying the RxScanner, with the remainder supporting the company’s broader pharmaceutical marketplace and financing tools.[6]

Organization leadership

RxAll is led by co-founders Adebayo Alonge (CEO) and Amy Kao (CMO), both graduates of Yale. Alonge is a trained pharmacist with experience in pharmaceutical supply chains and emerging-market commercialization. Kao has a background in go-to-market strategy and previously held roles in technology and consumer-health companies. The broader leadership team includes specialists in spectroscopy, AI, and hardware engineering. The organization has demonstrated the technical capacity to build and deploy the RxScanner across multiple African markets.

Similar organizations

We have not identified any direct competitors to RxAll offering AI-based chemical verification of medicines. The closest organization appears to be TrueMed, which uses AI-driven image recognition to authenticate pharmaceutical packaging. However, its approach detects only falsified products rather than substandard ones and appears to be more geared towards brand protection in high-income markets rather than medicine quality assurance in LMICs.

Delft Imaging

Overview

Delft Imaging is a Dutch social enterprise founded in 2002 that develops AI-enabled diagnostic imaging solutions for resource-constrained settings. The company operates as a certified B Corp (Delft Imaging, 2024a) and reports operations across 85+ countries, primarily in Africa, Latin America, and Asia, with over 2,500 installations and 32 million screenings conducted to date.

Delft Imaging’s flagship product, CAD4TB (Delft Imaging, 2017), is AI-powered software that analyzes chest X-rays to detect tuberculosis-related abnormalities. Beyond tuberculosis (TB) screening, Delft Imaging has expanded into maternal health with BabyChecker (Delft Imaging, 2021), which uses AI to analyze obstetric ultrasound scans; CAD4Silicosis (Delft Imaging, 2024b), which detects signs of silicosis, a lung disease caused by long-term inhalation of silica dust; and RetCAD (Delft Imaging, 2024c), which scans for retinal diseases. The company also produces portable X-ray systems, including the Delft Light ultra-portable device, as well as mobile OneStopTB clinic units.

Key product: CAD4TB

Developed in partnership with Thirona, a spin-off of Radboud University Medical Center, CAD4TB was the first commercially available computer-aided detection solution for TB (Rahman et al., 2017). The most recent version of the software, Version 7 (Nzimande et al., 2025), is available in both online and offline configurations and quickly assesses TB risk. CAD4TB is available through the Global Drug Facility (Stop TB Partnership, 2021), enabling national TB programs to access pre-negotiated pricing with quality assurance and technical support. One license costs between $10,000 and $13,000, depending on volume purchased.

Health problem it addresses

Traditional TB diagnosis relies on sputum testing (microscopy or molecular tests like GeneXpert) that requires symptomatic patients to self-identify and seek care (Ayalew et al., 2024). Chest X-ray screening, particularly with portable systems like Delft Light, can identify cases earlier and among asymptomatic populations, but large-scale screening programs have been constrained by the need for trained radiologists. Computer-aided detection can address these difficulties, including the lack of trained personnel and the substantial variation across human readers in detecting TB-related abnormalities. The gap is substantial: according to WHO estimates, ~30% of new TB cases go undiagnosed each year (Lemaignen, 2024), underscoring the need for scalable screening approaches that do not rely on scarce radiologists. In high-burden settings, AI-assisted screening allows rapid triaging to identify who needs confirmatory testing, potentially reducing molecular test consumption while maintaining or improving case detection.

How the technology works

Since at least 2018, CAD4TB has used deep learning techniques trained on chest X-rays from multiple countries and equipment types (Philipsen & Meijers, 2018). The software takes a single chest X-ray as input and produces several outputs: image quality assessment, a TB probability score (0-100) based on bacteriological reference standards, an abnormality score based on radiological reference, and a heatmap highlighting concerning areas (Health AI Register, 2024).
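
In deployment, outputs like these typically feed a simple triage rule. A minimal sketch follows; the threshold of 60 is an invented example, since programs calibrate thresholds locally to trade sensitivity against confirmatory-test consumption.

```python
def triage(tb_score, quality_ok, threshold=60):
    """Turn a CAD output into a triage decision.

    The 0-100 score and image-quality flag mirror the outputs described
    above; the threshold value is a made-up placeholder.
    """
    if not quality_ok:
        return "retake X-ray"
    if tb_score >= threshold:
        return "refer for confirmatory test"
    return "no further testing"

print(triage(82, True))   # refer for confirmatory test
print(triage(35, True))   # no further testing
print(triage(90, False))  # retake X-ray
```

Lowering the threshold catches more true cases but sends more people to scarce molecular tests; raising it does the reverse, which is the sensitivity-specificity trade-off discussed in the validation studies below.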

Using our classification framework, CAD4TB falls into the “medium-high” frontier AI category. While it relies on deep convolutional neural networks rather than large language models or multimodal foundation models, the system represents sophisticated application of modern deep learning based on expanding training datasets from diverse geographic settings.

Evidence and success stories

CAD4TB is among the most extensively validated AI diagnostic tools we have seen, as Delft claims that there are over 100 publications documenting its technical performance and operational implementation (Delft Imaging, 2025a). Key validation studies include:

  • An evaluation testing Version 6 on 5,565 independent chest X-rays in Pakistan found 76% specificity against a GeneXpert reference when set to 90% sensitivity, performing comparably to expert radiologists (Murphy et al., 2020).
  • A 2021 independent evaluation of 12 AI solutions found that, when matched to the Expert Reader’s[7] sensitivity of 95.5%, CAD4TB version 7 achieved specificity point estimates marginally higher than the reader’s 42.2% (Codlin et al., 2021).

Scale and implementation

Delft Imaging reports significant operational scale with 300+ active projects, and maintains partnerships with WHO, Stop TB Partnership, United Nations Office for Project Services (UNOPS), United Nations Development Programme (UNDP), Global Fund, and World Bank (Global Health Hub, 2025). CAD4TB has been deployed in diverse contexts, including community outreach, health facility triage, prison screening, and national prevalence surveys. At the ISR-WHO World TB Day Symposium 2021, the WHO recommended computer-aided detection, such as CAD4TB, as an alternative to human interpretation for TB screening (WHO, 2021).

Broader product portfolio

Beyond TB screening (and the associated hardware), Delft Imaging has developed the following AI-assisted tools. We have not assessed these in depth:

  • BabyChecker (Delft Imaging, 2024d): Launched in 2021, this smartphone-based ultrasound solution uses AI to analyze obstetric scans and identify pregnancy risks, enabling frontline health workers with minimal training (as little as five minutes) to conduct scans and receive AI analysis of gestational age, fetal presentation, multiple gestations, and placenta location (Delft Imaging, 2025b). As of 2024, BabyChecker had been deployed in six countries (Sierra Leone, Zambia, Honduras, Kenya, Malawi, Tanzania; Singh, 2024), with over 8,000 scans completed and more than 200 health workers trained (Delft Imaging, 2025b).
  • CAD4COVID: Developed in April 2020 using the same technical core as CAD4TB to identify COVID-19 characteristics on chest X-rays (Smolaks, 2020). A 2020 study found comparable performance to six independent radiologists when tested on 454 patients (Murphy et al., 2020).
  • RetCAD (Delft Imaging, 2024c): Retinal image analysis to detect glaucoma, diabetic retinopathy, and age-related macular degeneration.
  • CAD4Silicosis (Delft Imaging, 2024b): A module for detecting silicosis among miners.

Room for more funding

Delft Imaging operates as a self-sustaining social enterprise without relying on external donor funding (Global Health Hub, 2025). Although Delft Imaging generally appears financially stable, some products—such as BabyChecker—are still in an early commercialization phase and do not appear to have fully sustainable revenue models yet, based on publicly available information (Sinclair, 2024). The business model likely combines software licensing; hardware sales (portable X-ray systems, OneStopTB clinics, stationary equipment); service contracts and training programs; and cross-subsidization between products.

Organization leadership

Delft reports 70+ employees across 4+ offices, though the core team may be smaller (Delft Imaging, 2024e). The company is led by CEO and founder Guido Geerts, who has driven the company since the early 2000s and also serves as Co-CEO and partner at Thirona, the Radboud University Medical Center spin-off that co-developed CAD4TB.

As a certified B Corporation, Delft Imaging achieved a B Impact score of 120.7 (median is 50.9) and was named a 2022 Best for the World B Corp (Delft Imaging, 2022) in the “Customers Impact Area.” As a B Corp, Delft publishes Annual Impact Reports (Delft Imaging, 2024f), indicating continued successful operational activity and commitment to transparency. In January 2025, Delft Imaging won the Oryx Prize (Delft Imaging, 2025c), the grand award of the FD Gazellen Awards, recognizing it as “the greatest company” among fast-growing Dutch enterprises. The jury commended the company for its international perspective, social commitment, and contributions to global health equity. The company also won Health Technology Company of the Year at the HTW Awards in May 2024 (Sinclair, 2024).

Similar organizations

Chest X-ray analysis and diagnostic assistance for respiratory conditions were among the more common applications of AI in global health that we came across. Potentially similar organizations include Qure.ai, JF Healthcare, Xvision, InferVision, DeepTek, Lunit, Oxipit, and Thirona.

Medic

Overview

Medic is a San Francisco-based nonprofit organization founded around 2010 by Josh Nesbit and Isaac Holeman to improve health in the hardest-to-reach communities through open-source digital health software (Ashoka, 2011). The organization serves as the technical steward and primary contributor to the Community Health Toolkit (CHT), a recognized Digital Public Good that supports community health workers (CHWs) delivering care at patients’ doorsteps (Digital Public Goods, 2024). As of late 2024, CHT-based apps supported over 165,000 health workers across 18 countries in Africa and Asia, with health workers having conducted over 172 million caring activities since the toolkit’s inception (OpenHIE, 2022).

Medic operates with offices in San Francisco, Nairobi, and Kathmandu, employing several dozen staff members, including developers, designers, researchers, implementation specialists, and program managers (Medic, 2021a; Medic, 2021b). The organization’s 2023 revenue was $12.2 million, with 89.6% derived from contributions and the remainder from program services (ProPublica, 2025).

Key product: Community Health Toolkit

The CHT is a collection of open-source software frameworks and applications released under the AGPL-3.0 license, designed specifically for community health systems (Community Health Toolkit; GitHub, 2014). The toolkit supports applications on basic feature phones (via SMS), Android apps on smartphones, and web-based interfaces for tablets and computers (Engineering For Change, 2024; Community Health Toolkit, 2025).

The CHT is free and open-source, without licensing or per-user fees (UNDP, 2022). Apps built with the CHT include features such as “smart messaging, decision support, easy data gathering and management, and health system analytics” (Engineering For Change, 2024). Implementation costs would typically include: basic feature phones at ~$25 each or smartphones at higher price points, data connectivity (when available), capacity building for CHWs and supervisors, and technical support for deployment and customization (Engineering For Change, 2024).

Health problem it addresses

While Medic supports community health programs across multiple countries, we use maternal and newborn health challenges in Kenya as a representative scenario. In Kenya, the maternal mortality ratio in 2023 was 530 deaths per 100,000 live births, exceeding the global average (MedicMobile, 2023). Rural and underserved communities face significant barriers to accessing antenatal and postnatal care, including geographic distance to health facilities, shortage of healthcare providers, and limited health system capacity for systematic follow-up (Vargas et al., 2025). Traditional paper-based tracking systems for CHWs are inefficient, making it difficult to ensure pregnant women receive the WHO-recommended eight antenatal care (ANC) visits and deliver in facilities with skilled birth attendants (WHO, 2016).

Only 58% of women in rural Kenya complete all recommended ANC visits, and many deliveries occur at home without skilled attendance (Igumbor et al., 2024). CHWs operating without digital tools struggle to track pregnancies systematically, coordinate referrals, and provide timely reminders for care visits—challenges that contribute to preventable maternal and newborn deaths (Haroun et al., 2024).

How the technology works

The CHT-based apps for antenatal care support pregnancy registration, automatically schedule care visit reminders, enable danger sign reporting, and coordinate with health facilities for delivery planning (Guidestar, 2025). In our scenario, when a CHW registers a pregnant woman using their mobile device, the app automatically generates a schedule of recommended ANC visits based on gestational age and sends automated SMS reminders to both the CHW and the pregnant woman about upcoming appointments (Engineering For Change, 2024). The app also provides decision support workflows that guide CHWs through standardized assessment protocols, flagging danger signs that require immediate referral to health facilities (Engineering For Change, 2024). The CHT’s architecture is designed for offline-first functionality, enabling health workers to operate effectively without consistent internet connectivity, and devices can function for weeks between synchronization opportunities (Community Health Toolkit, 2025). The platform supports multiple languages, including English, Spanish, French, Nepali, and Swahili, with capabilities to add additional local languages as needed (Community Health Toolkit, 2025).
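The visit-scheduling logic described above can be sketched in a few lines. The following is an illustrative sketch only, not Medic's implementation: we assume the WHO (2016) eight-contact ANC model, with contacts at roughly gestational weeks 12, 20, 26, 30, 34, 36, 38, and 40, computed from the last menstrual period (LMP) date.

```python
from datetime import date, timedelta

# WHO (2016) recommends eight ANC contacts, at roughly these gestational weeks.
WHO_ANC_CONTACT_WEEKS = [12, 20, 26, 30, 34, 36, 38, 40]

def anc_schedule(lmp: date, today: date) -> list[dict]:
    """Given the last menstrual period (LMP) date, return the ANC contact
    schedule, flagging which visits are still upcoming as of `today`."""
    schedule = []
    for week in WHO_ANC_CONTACT_WEEKS:
        visit_date = lmp + timedelta(weeks=week)
        schedule.append({
            "week": week,
            "date": visit_date,
            "upcoming": visit_date >= today,
        })
    return schedule

# Example: a woman registered at roughly 14 weeks gestation; reminders
# would be generated for the remaining contacts (weeks 20 through 40).
lmp = date(2025, 1, 6)
today = lmp + timedelta(weeks=14)
upcoming = [v for v in anc_schedule(lmp, today) if v["upcoming"]]
```

A real deployment would layer SMS delivery, offline synchronization, and danger-sign escalation on top of this core schedule.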

Using our classification framework, the CHT represents a relatively low level of AI usage, but we have optimistically assigned it to “medium” for our purposes. Medic’s core platform appears focused on mobile applications and workflow management for CHWs rather than frontier AI. The organization mentions plans for “AI integration” in their future roadmap (Settle, 2024), suggesting they are not currently heavily using advanced AI models but may be moving in that direction.

Evidence and success stories

The CHT platform has been validated through randomized controlled trials and repeated cross-sectional surveys. As early as 2015, a study in Kenya’s Millennium Villages found that an SMS-based system increased ANC attendance and improved coordination between CHWs and health facilities for HIV-positive pregnant women (Mushamiri et al., 2015). Similarly, research in Uganda showed that Living Goods’ CHW program using the CHT-based Smart Health app significantly increased childhood vaccine coverage in participating communities (Brydon, 2023). A study in rural Kenya evaluating the Smart Health app found it improved healthcare-seeking behavior and referral completion rates compared to paper-based CHW systems (Karlyn et al., 2020).

Scale and implementation

Medic’s CHT has achieved substantial operational scale, reaching over 160,000 health workers across 18 countries. As of 2022, the largest deployments were in Kenya, Nepal, and Uganda, each with approximately 10,000 active users (UNDP, 2022).

Six governments (Kenya, Mali, Nepal, Niger, Uganda, and Zanzibar) have selected the CHT as their digital community health platform, with roadmaps to scale to their collective 350,000 CHWs (Medic, 2021). In September 2023, Kenya officially launched the CHT-based electronic Community Health Information System (eCHIS) for national scale, representing a significant milestone in government adoption and sustainability, and capping off more than a decade of engagement in the country (Jacobs, 2023).

Room for more funding

The organization’s revenue model (Global Goods Guidebook, 2025) consists of philanthropic support for core CHT development (which allows them to avoid per-user licensing fees) plus contract revenue for partner services—but these partner services are exclusively CHT implementation and customization work (Medic, 2024). Based on IRS Form 990 filings, Medic has demonstrated financial growth and stability (ProPublica, 2025). Revenue increased from $4.8 million in 2021 to $12.2 million in 2023, with total assets of $6.6 million in December 2023 and $5.3 million in December 2024 (ProPublica, 2025; Medic, 2024). The organization maintains more than 90% of revenue from grants and contributions, indicating continued dependence on philanthropic funding (Medic, 2024).

Given ongoing national-scale deployments, continued funding is likely needed to support activities such as expanding technical team capacity, developing additional workflows, strengthening interoperability, and conducting large-scale evaluations. These points are based on public documents describing their current activities and organizational plans, not on our own assessment of funding gaps.

Organization leadership

The organization was co-founded by Josh Nesbit, who served as CEO from 2010 until stepping down in January 2021 while remaining on the Board of Directors (Nesbit, 2021). Under Nesbit’s leadership, the organization earned numerous recognitions, including the 2014 Skoll Award for Social Entrepreneurship (Skoll, 2014). Medic is currently led by Dykki Settle as Interim CEO following Dr. Krishna Jafa’s departure in June 2024 (Graham, 2024). Dr. Jafa served as CEO from March 2022 to June 2024, during which time the CHT user base grew from 41,000 to over 130,000 health workers (Ennis, 2022; Graham, 2024). Dr. Jafa brought 25 years of expertise in health system strengthening, with previous leadership roles at the Bill & Melinda Gates Foundation, Population Services International, and the CDC (Ennis, 2022).

Similar organizations

Medic’s Toolkit comprises a broad range of tools, and many organizations provide services similar to at least one of them. For our antenatal care support scenario, our impression is that Dimagi and D-Tree do similar work in some respects.

HEP Assist

Overview

HEP Assist is an AI-powered clinical decision-support system (CDSS) for community health workers (CHWs), developed by Last Mile Health and IDinsight in collaboration with the Ethiopian Ministry of Health (Megentta, 2025). The tool entered pilot deployment in 2024 across 10 ministry-run call centers. In the current pilot, CHWs who encounter difficult cases call into these centers, where trained agents use an LLM built on national clinical guidelines to provide real-time advice on assessment, treatment, and referral. The system is designed to strengthen frontline care in rural settings where CHWs often work without direct clinical supervision. The longer-term vision is to give CHWs direct access to the tool on mobile devices and to expand the platform to additional countries in sub-Saharan Africa.

Because HEP Assist was only piloted in 2024, there is no published empirical evaluation yet and no public data on clinical impact, accuracy, or changes in referral rates. Public information also does not yet describe the technical features, performance, or cost of the planned mobile version.

Key product: AI-assisted clinical decision support for rural child illness care

Although the tool is currently accessed through call-center intermediaries, the intended future version would function as a smartphone- or text-based assistant used directly by frontline workers. In our envisioned deployment, a CHW at a rural health post would enter a child’s symptoms and history, and the LLM would return standardized triage, treatment, and referral guidance aligned with national protocols.

Health problem it addresses

In Ethiopia, where HEP Assist is being implemented, CHWs are the primary source of healthcare for roughly three-quarters of the population. More than 40,000 workers each serve 2,500 to 3,000 people, often in remote rural areas without on-site clinical supervision. CHWs are responsible for diagnosing and treating a wide range of conditions, from child diarrhea and pneumonia to maternal complications, but it is unrealistic to expect them to memorize hundreds of pages of national guidelines and diagnostic protocols, especially for unusual or severe cases. As a result, clinical decision-making is often inconsistent, and errors in triage and referral are common, contributing to preventable morbidity and mortality (Sieger, 2025).[8] This challenge is not unique to Ethiopia: countries such as Malawi, Uganda, and Kenya might also benefit from CHW support (Sieger, 2025).

How the technology works

Details on the technical stack are very sparse, but HEP Assist is described as an LLM-based clinical decision-support tool trained on Ethiopian Ministry of Health guidelines. In the current rollout, CHWs call ministry-run support centers, and agents use the AI to interpret symptoms, retrieve relevant protocol guidance, and provide standardized treatment or referral advice. The LLM allows natural-language queries and returns structured recommendations aligned with national protocols. The system was co-designed with 19 ministry clinical experts to ensure safety and relevance, and the longer-term plan is to give CHWs direct mobile access so they can consult the AI without an intermediary (Sieger, 2025).
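Since public descriptions mention only an LLM grounded in Ministry of Health guidelines, the sketch below shows one plausible retrieval-then-prompt pattern for such a system. Everything here is our assumption, not HEP Assist's actual design: the guideline snippets are toy stand-ins, the keyword-overlap retrieval is a placeholder for a real embedding-based search over the full protocols, and the assembled prompt would be sent to an LLM rather than returned directly.

```python
# Hypothetical guideline snippets; a real system would index the full
# Ministry of Health protocols.
GUIDELINES = {
    "diarrhea": "Assess dehydration; give ORS and zinc; refer if blood in stool.",
    "pneumonia": "Count breaths per minute; give amoxicillin if fast breathing; "
                 "refer if chest indrawing or danger signs.",
    "malnutrition": "Measure MUAC; refer to stabilization centre if MUAC < 11.5 cm.",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank guideline snippets by keyword overlap with the CHW's question
    (a toy stand-in for embedding-based retrieval)."""
    words = set(query.lower().split())
    scored = sorted(
        GUIDELINES.items(),
        key=lambda kv: -len(words & set((kv[0] + " " + kv[1]).lower().split())),
    )
    return [text for _, text in scored[:k]]

def build_prompt(query: str) -> str:
    """Assemble an LLM prompt that grounds the answer in retrieved protocol
    text and instructs the model to default to referral when uncertain."""
    context = "\n".join(f"- {snippet}" for snippet in retrieve(query))
    return (
        "Answer using ONLY the national guidelines below. If unsure, advise referral.\n"
        f"Guidelines:\n{context}\n"
        f"CHW question: {query}\n"
    )

prompt = build_prompt("child with fast breathing and cough, what treatment?")
```

Grounding the model in retrieved protocol text, rather than relying on its general knowledge, is a common way to keep recommendations aligned with national guidelines; the co-design with 19 ministry clinical experts suggests a similar emphasis on protocol fidelity.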

Scale and implementation

HEP Assist is currently being piloted across 10 Ministry of Health-run call centers in Ethiopia, where agents use the AI tool to support CHWs when they encounter difficult cases. The long-term plan is to give ~40,000 CHWs direct mobile access to the LLM-based assistant and eventually expand the platform to other countries that rely heavily on CHWs, such as Malawi, Uganda, and Kenya (Sieger, 2025).

Room for more funding

There is no public information on the long-term funding model for HEP Assist or whether the program is actively seeking philanthropic support. The only disclosed funding to date we are aware of is a $200,000 award from the Mastercard “AI to Accelerate Inclusion” Challenge (Sieger, 2025), which supports pilot development and scale-up within Ethiopia’s call-center system. We have not found evidence of other external funders, projected operating costs, or stated financing gaps for full deployment to frontline workers.

Organization leadership

HEP Assist is led jointly by Last Mile Health, IDinsight, and the Ethiopia Ministry of Health. On the implementation side, the only publicly named lead is Abraham Zerihun Megentta, Last Mile Health’s Ethiopia Country Director, who seems to oversee the rollout across the ministry-run call centers and integration with the CHW system. Technical development appears to be led by IDinsight, with Chief Data Scientist Sid Ravinutala publicly identified as directing the LLM-based guidance engine. Beyond these named individuals, we found no public information on other involved personnel or on the technical backgrounds of contributing staff.

Similar organizations

We investigated HEP Assist as a potential example of LLM-assisted CDSS, but acknowledge that information on this particular program is scarce. We found two initiatives (the PATH 2025 AI CHW pilot in Rwanda and ASHABot in India; PATH, 2025; Ramjee et al., 2024) that similarly use LLM-based chatbots to support CHWs in LMICs. Our impression is that both are also in the research or early-stage pilot phase.

Existing CDSS programs built on rule-based logic or classical machine learning may soon switch to similar LLM-based approaches, so it’s possible that Reach52 and D-Tree, for example, may soon incorporate a similar approach. Some organizations offer AI-assisted clinical decision support as part of a suite of tools, so this impact pathway could also include organizations such as Medic and Dimagi.

 

Contributions and acknowledgements

Rethink Priorities logo

Ruby Emerson and Jenny Kudymowa jointly researched and wrote this report. Emerson also served as project lead. Aisling Leow and John Firth supervised the project.

Special thanks to Deena Mousa, Oliver Kim, and Abbie Clare for helpful comments on drafts. Thanks also to Shane Coburn and Thais Jacomassi for copyediting, and to Elisa Autric for assistance with publishing the report online. Further thanks to Pius Alabi and Raina MacIntyre for taking the time to speak with us.

Coefficient Giving (formerly Open Philanthropy) provided funding for this report, but it does not necessarily endorse our conclusions.

  1. Searches covered broad terms like “AI for health” and “artificial intelligence for health,” along with more targeted searches such as “AI for diagnostic imaging” and “AI for disease surveillance.” We also searched for different types of AI systems such as “generative AI for health,” “large language models for health,” and “multimodal AI for medical imaging,” though these additional searches yielded few distinct results, so we primarily relied on broader AI terms. In some cases where LMIC representation was otherwise limited, we also searched the aforementioned terms alongside phrases like “LMICs” and “low-income context” in order to specifically identify organizations that work in such contexts.
  2. We validated all organizations suggested by LLMs or found in third-party references, such as news reports and academic articles, in order to confirm that each organization actually exists and operates in the health sector. This involved checking each organization’s website or public presence (e.g., press coverage) to filter out non-existent, duplicate, or defunct entries.
  3. Deep learning started gaining traction after 2012, when convolutional neural networks achieved major breakthroughs in image recognition (Krizhevsky et al., 2012), sparking wider use across AI applications.
  4. EPIWATCH should not be confused with EpiWatch, a seizure detection app for the Apple Watch.
  5. See this slide deck by EPIWATCH, which provides a useful overview of the organization.
  6. RxAll provided a high-level breakdown of intended use of funds: ~40% for go-to-market activities and pharmacy onboarding, ~25% for credit operations and risk automation (RxPay), ~20% for product and data (including spectral-library expansion and RxAI v2), ~10% for quality and regulatory work (e.g., pharmacovigilance tooling and laboratory quality assurance), and ~5% for administration and contingency.
  7. A qualified radiologist with over 30 years of experience.
  8. For example, Getachew et al. (2019) found large gaps in diagnostic accuracy among CHWs in Ethiopia: while diarrhea was usually identified correctly (89% sensitivity), only 52% of fevers, 59% of respiratory infections, and 39% of malnutrition cases were detected, suggesting that many sick children went undiagnosed or received incorrect management.