Last week, I came across news announcing IDx’s acquisition of 3Derm, two automated diagnostics startups that I have been following closely for the past few years. While they operate within different specialties - ophthalmology for the former and dermatology for the latter - they both belong to a pre-2012 group of “first movers” into healthcare AI. These startups had already made some progress, with products on the market, before deep learning came to prominence and anything “AI/ML” became a hot topic. They were able to capitalize on their existing data infrastructure to implement deep learning solutions at scale and develop clinical AI tools. In this article, I explore how this head start has helped them, how ideas for AI-centric products can be validated without AI, and how AI offerings across multiple medical specialities fit into the larger context.
A tale of two startups
First, we have the acquirer IDx with diagnostic products for diabetic retinopathy - a diabetes complication that damages light-sensitive tissue in the retina (the back of the eye). If left untreated, it may cause mild vision problems and blindness in extreme cases↗. Retinal imaging using a fundus camera (a low-powered microscope with a camera) and manual interpretation by an ophthalmologist is a widely accepted screening method for this disease↗. IDx is developing AI systems for the grading and detection of diabetic retinopathy in fundus photographs. In 2017, IDx ran the first clinical trial for an autonomous medical AI system↗↗, and a year later its product received FDA clearance making it the first device authorized for fundus image screening without the need for an ophthalmologist - essentially making it usable by healthcare providers who may not be involved in eye care↗.
The acquiree, 3Derm, is developing diagnostic products for skin diseases ranging from common rashes to severe infections and skin cancer. Diagnosis of these types of diseases is often carried out initially through visual inspection with potential follow-up biopsy and pathological examination. Given that the average wait time to see a dermatologist in the US is 28 days↗, teledermatology has grown in popularity as a cost-effective and reliable means for dermatology care delivery↗. This involves photographic imaging of suspicious skin findings for almost instantaneous interpretation by remote experts. 3Derm provides teledermatology services through skin imaging hardware and automated diagnostic products. They’ve participated in multiple pilot programs↗ and clinical efficacy studies↗↗, and earlier this year, a 3Derm product for autonomously detecting different types of skin cancer became the first AI device in dermatology to receive the FDA “Breakthrough Device” designation. This designation is a fast-track regulatory pathway for devices that demonstrate more effective diagnosis for life-threatening or irreversibly debilitating diseases↗.
These two startups have a lot in common. Both are in the diagnostic space where interpretation by a specialist is required. From a machine learning perspective, both problems may be formulated as a classification problem where a fundus photograph or an image of a suspicious mole is classified as either negative or positive for a given disease. At their core, both AI applications are likely to utilize similar convolutional neural networks (CNNs) - a class of deep learning algorithms↗ - to perform this classification. Additionally, both products are primarily targeted for use in primary care where they may help triage patients and identify those who would benefit the most from a referral to specialists. Finally, the data used by both are also similar: two-dimensional images captured by devices that come in handheld mobile variants and require minimal operator skill - making them ideal for non-specialist primary care settings. Perhaps the most interesting commonality between them: both were founded prior to the 2012 resurgence of research in neural networks and the popularization of deep learning. To understand how this head start has helped them capitalize on the technology, let’s explore what changed back then.
Pre-2012, much of computer vision research was based on feature engineering, or explicit hand-crafted features designed by experts↗, and had struggled to reach performance levels that would make it clinically useful. In 2012, deep learning - where learning happens directly from labelled data - made substantial performance gains in the ImageNet image classification competition↗. It very quickly became the de facto method for analyzing various data types↗ and doing machine learning in general. It wasn’t until 2014 that the first studies applying deep learning to medical imaging started to appear↗. Only a handful of studies helped bring these applications to light, and for the ophthalmology and dermatology specialities, these studies happened to come out of Google Research and its massive PR machine. The 2016 ophthalmology study in JAMA showed that deep learning algorithms had a high sensitivity and specificity for detecting diabetic retinopathy in retinal fundus photographs↗. The 2017 dermatology study in Nature demonstrated the ability of a CNN to classify skin lesions into over 700 diseases↗.
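For readers less familiar with the two metrics mentioned above, here is a minimal sketch of how sensitivity and specificity are computed for a binary (disease / no disease) classifier. The labels and predictions are made up for illustration and are not taken from either study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = disease present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diseased, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diseased, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of diseased cases that were caught
    specificity = tn / (tn + fp)  # share of healthy cases that were cleared
    return sensitivity, specificity


# Hypothetical toy data: 4 diseased and 6 healthy images.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a screening setting, sensitivity tends to be the metric to watch most closely, since a missed case is usually costlier than an unnecessary referral.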
While deep learning came with performance improvements, it also shifted the ML bottleneck from the methods to the data. The tedious methods that previously required PhDs can now be applied using over a dozen open-source tools↗. On the flip side, deep learning requires more data than prior methods, with more and higher-quality data often meaning better performance. As a result, data engineering became a core component of any machine learning product. While some startups were working on gaining access to data through licensing agreements (either with providers or pharma) and establishing data pipelines, others already had a data infrastructure in place and were able to capitalize on it to train and develop deep learning solutions. IDx and 3Derm are two examples of the latter.
A head start
For IDx, deep learning was simply a new method. The first IDx patents go back to 2006, when hand-crafted features were used↗. By 2013, IDx already had autonomous diagnostic products on the European market based on these methods↗. Introducing deep learning to their existing machine learning stack↗ brought along improved performance and likely played a major role in the FDA clearance they received five years later. The increased demand for quality and quantity of training data as a result of this transition was perhaps easily satisfied given IDx’s existing machine learning prediction engine, data infrastructure, and existing customers for continuous feedback and development.
3Derm started by developing the hardware: a stereoscopic digital 3D dermatoscope for capturing standard photos of skin diseases↗. This was bundled with a web interface for cataloging and monitoring these abnormalities. As the product matured, a teledermatology component was added, allowing remote experts to diagnose images captured and processed through their system. Introducing some level of AI into this system seems quite plausible, as diagnostic models can capitalize on the existing teledermatology service. Given appropriate permissions, diagnostic models can be developed using data processed by the system (and its corresponding labels). This data is likely relatively clean: it has been collected and catalogued in a standard way, making it more “ML ready” than your average clinical data. The web component 3Derm developed for viewing images can now serve as the data labelling tool. 3Derm’s existing network of dermatologists can now double as data-labellers, helping fuel the models with more data while providing valuable feedback.
It’s all about data engineering
Where is the best spot to build an AI product? On top of a data stream.
For both IDx and 3Derm, this head start allowed them to really understand the nuances of the data early on - everything from edge cases and imaging artifacts to more general data issues such as class imbalance. For those working on developing clinical AI products, there is perhaps a lesson here: a solid data infrastructure is a prerequisite for successful implementation. This is the core of what being “AI-first” is: collecting data from day one. In some instances, being AI-first might mean that you start with no AI at all. In fact, it may make sense to start by implementing an idea in its analog form, validating it, then digitizing parts of it over time. In other words, you can build an infrastructure for images to be captured and analyzed by remote experts, test whether the solution really addresses an unmet need and whether providers are willing to pay for it, then ultimately introduce AI. By not forcing expensive digital solutions before validating them, you allow yourself to “fail fast”.
If the analog form is indeed a good place to start, one might consider any telemedicine product that relies purely on image interpretation (teleradiology, telepathology, etc.)↗ to be the best proving ground for AI interventions: common cases can be automated while experts can be consulted for the more complex ones. In fact, any business model that deals with medical data logistics is well positioned to implement AI solutions, at least theoretically. For instance, you will find that big tech offerings that provide medical data storage, retrieval, and archival - such as Google Cloud↗ and Microsoft Azure↗ - also offer peripheral add-on AI products as well as data analysis and labelling services.
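To make the idea of automating common cases while deferring the harder ones more concrete, here is a minimal sketch of confidence-based routing. The function name, the two thresholds, and the route labels are all hypothetical, and nothing here reflects how any particular vendor implements triage.

```python
def route_case(disease_probability, low=0.05, high=0.95):
    """Decide who handles a case based on the model's predicted probability.

    `low` and `high` are illustrative cut-offs, not clinically validated values.
    """
    if not 0.0 <= disease_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if disease_probability <= low:
        return "auto_negative"    # model is confident: no disease
    if disease_probability >= high:
        return "auto_positive"    # model is confident: refer the patient
    return "expert_review"        # uncertain: send to a remote specialist


# Hypothetical model outputs for five incoming images.
for prob in [0.01, 0.40, 0.97, 0.05, 0.80]:
    print(prob, route_case(prob))
```

Widening the gap between the two thresholds sends more cases to experts, which is roughly the dial a telemedicine service could turn as trust in the model grows.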
AI across medical specialities
A few years ago (~2017), we witnessed the consolidation of products from different vendors through the emergence of AI marketplaces - essentially “app stores”. These marketplaces largely operate within a given speciality (e.g. Nuance↗ and Blackford↗ in radiology, Visiopharm↗ in pathology, etc.). While it is still too early to gauge their success, they have multiple selling points. For vendors, they provide additional monetization channels as well as reach. Providers, on the other hand, get single-point access to a wide range of AI models, in addition to tracking of algorithm usage and performance among other metrics↗. The AI marketplace concept clearly needs a dedicated article. I chose to highlight it here as I see the evolution of stand-alone products into AI marketplaces and now into cross-speciality AI offerings - demonstrated by this acquisition - as an encouraging sign of both technology and market maturity.
IDx and 3Derm will now operate under Digital Diagnostics, a new “AI vendor” brand↗. It will be interesting to see how AI products for ophthalmology and dermatology may be bundled, marketed, and branded together. Offerings across specialities are not a new concept. Many incumbents in medical image analysis have long been active across a wide range of specialities (e.g. Philips for radiology, cardiology, and pathology↗). However, departments within these larger organizations tend to be heavily siloed, with solutions developed in isolation from one another. Instead of disparate efforts to provide add-on AI solutions within each of these departments, Digital Diagnostics now has the opportunity to start with AI as a common denominator and demonstrate how the technology can extend horizontally to bridge different medical specialties. For instance, R&D efforts can be shared across the board. Pre-2012, explicitly defined algorithms for analyzing fundus photographs differed greatly from those used on skin images. Today, the data-agnostic nature of deep learning allows almost identical models to be trained separately on different data for different tasks. Additionally, data preparation and labelling tools may also benefit multiple data types. From a clinical user perspective, it may even be the case that a single modular product serves multiple specialities, allowing for a “single interface experience”. The ability of this horizontal AI product to expand to other specialties beyond ophthalmology and dermatology will ultimately signal its scalability and success.
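As a rough illustration of what “data-agnostic” means in practice, the sketch below reuses one training routine, unchanged, on two unrelated toy datasets. A tiny logistic regression stands in for a CNN to keep the example dependency-free, and the feature vectors and dataset names are invented; none of this represents either company’s actual models.

```python
import math


def train_logistic(features, labels, epochs=500, lr=0.5):
    """Fit a logistic regression with batch gradient descent; returns (weights, bias)."""
    n_samples, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict(model, x):
    """Return 1 (positive) or 0 (negative) for a single feature vector."""
    weights, bias = model
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0


# Hypothetical, already-standardized features for two different specialities.
fundus_x = [[-0.8, -0.9], [-0.7, -0.6], [0.8, 0.9], [0.6, 0.7]]
fundus_y = [0, 0, 1, 1]
skin_x = [[-0.9, 0.1, -0.5], [-0.7, -0.2, -0.6], [0.8, 0.1, 0.5], [0.9, -0.1, 0.7]]
skin_y = [0, 0, 1, 1]

# Same code path, different data, two independent models.
fundus_model = train_logistic(fundus_x, fundus_y)
skin_model = train_logistic(skin_x, skin_y)
print([predict(fundus_model, x) for x in fundus_x])
print([predict(skin_model, x) for x in skin_x])
```

The point is the shared code path: only the data changes between specialities, which is what makes R&D effort reusable across them.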
Headwinds & tailwinds
As with all AI-powered diagnostic tools, the issue of efficacy vs effectiveness often comes into play. Despite having some methodological and clinical limitations, IDx’s prospective observational trial is essential in understanding the efficacy of these tools↗. For effectiveness, however, whether patients directly benefit from them remains unanswered. The evidence from the real world has not been entirely positive so far. The performance of Google’s diabetic retinopathy diagnostic product deployed in Thailand was recently reported. More than one-fifth of images were rejected by the system as it was designed with a relatively high rejection threshold for quality. Poor internet connection often stood in the way of uploading images to the prediction servers↗↗. This highlights the influence of socio-environmental factors and clinical workflows in general on the real-world performance of such systems. Also related to real-world performance are the limitations of autonomous AI devices by design. IDx’s device, for instance, requires images to be captured only by one fundus camera of a specific make and model, is only approved to detect “more than mild” diabetic retinopathy, and is not cleared for use on patients with pre-existing diabetic retinopathy↗.
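The effect of a strict quality threshold can be sketched in a few lines. The quality scores and thresholds below are invented for illustration, and how any deployed system actually scores image quality is an assumption on my part.

```python
def rejection_rate(quality_scores, min_quality):
    """Fraction of images whose quality score falls below the acceptance threshold."""
    if not quality_scores:
        raise ValueError("no images to evaluate")
    rejected = sum(1 for score in quality_scores if score < min_quality)
    return rejected / len(quality_scores)


# Hypothetical quality scores (0 = unusable, 1 = perfect) for ten captured images.
scores = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5]

# A lenient gate rejects few images; a strict one rejects many more.
print(rejection_rate(scores, min_quality=0.6))
print(rejection_rate(scores, min_quality=0.8))
```

A stricter gate protects diagnostic accuracy but pushes more patients back into the manual workflow, which is the trade-off the Thailand deployment ran into.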
For providers, AI offerings across specialties may be an attractive option where otherwise a different vendor per speciality is needed. We see a similar pattern with electronic health records (EHR) where providers must often choose between single- and multi-speciality systems, each with its own pros and cons↗. There might be a time in the future where providers have to decide the same for AI systems.
From a regulatory perspective, both IDx and 3Derm have crossed some major milestones as mentioned in the introduction - ultimately lowering the regulatory barriers for incoming players. Specifically with IDx, their FDA clearance was obtained through the De Novo pathway, allowing competitors to use IDx’s marketed device as a predicate for their own submissions and go through another regulatory pathway - the 510(k). For context, the De Novo pathway is for new technologies that present low to moderate risk to patients and requires an in-depth risk-benefit analysis. In 510(k) submissions, devices are only required to show “substantial equivalence” to a previously cleared device↗↗.
Who is paying for all this? In August 2020, the Centers for Medicare & Medicaid Services (CMS) introduced billing codes for fully autonomous AI systems that detect certain eye diseases - the first reimbursement of its kind without specialist intervention↗. While this marks a critical first step, it will likely be an uphill battle over the coming years to get coverage through other commercial and private payers. Reimbursement for teledermatology services, however, is still relatively new and varies from state to state↗.
A very small step towards AGI
Writing this article made me think about the concept of artificial general intelligence. I admittedly had to consult Wikipedia for a plain definition:
Artificial general intelligence (AGI) is the hypothetical intelligence of a machine that has the capacity to understand or learn any intellectual task that a human being can↗.
While we may still be quite far from achieving this today, the aggregation of knowledge - one that would normally exist across multiple human professionals - is definitely a step in that direction. There aren’t many physicians around the world who have specialized in both ophthalmology and dermatology, or in any two or more medical specialities that would otherwise require multiple lifetimes of training and practice. While the underlying mechanics of this aggregated knowledge today consist of separate models performing single limited tasks, serving them from the same source is perhaps a very early embodiment of AGI.