Validation is an essential part of the Infermedica medical content creation process, which allows us to measure the accuracy and correctness of our medical database. See what testing methods we use and how we can determine whether our new medical content is safe for patients to use.
Every day, our team of over twenty doctors works on the development of the Medical Knowledge Base. They expand it by creating new medical concepts corresponding to various conditions and symptoms. These medical concepts later become part of Metabase, the database that underlies all pre-diagnosis and triage solutions offered by Infermedica. Before that happens, however, each medical concept undergoes a series of rigorous tests that verify its quality, accuracy, and alignment with other concepts.
The medical content development process at Infermedica includes five major steps, in which testing plays an important role:
- defining the scope of the desired changes
- eliciting expert knowledge based on the literature and other available sources
- specifying test criteria
- performing manual and automated acceptance tests
- deploying the verified model to the API
Why do we care about validation of our medical content?
We consider validation a critical task because the reliability and accuracy of our medical content are closely linked to patients’ trust, correct pre-diagnosis, routing patients to the appropriate level of care, and effective support for healthcare professionals.
Millions of people search for their symptoms online, with some studies estimating 6.75 million health-related searches daily. Thousands of them reach out to chatbots or call centers associated with their healthcare provider. We also see a growing number of medical facilities using pre-diagnosis apps for triage and to improve patient management. We aim to provide all of them with the most accurate pre-diagnosis suggestions possible.
Our Medical Knowledge Base (Metabase) consists of 680 conditions, and we are continually expanding it to meet the very specific needs of patients and physicians. Our medical team usually works on several medical concepts in parallel, often across different medical specializations and in several languages. We always take the utmost care to use reliable sources of medical knowledge, and we develop concepts together with recognized experts. Because this system is large and complex, we continually look for ways to improve it.
The testing methods we use to validate medical content
Each time we created a new medical concept, it was analyzed and improved in peer review. When it was deemed flawless, it was added to Metabase. Since the database relies heavily on conditions and symptoms (medical concepts), we soon began using medical test cases, based on real patient cases, to test new concepts as well as the overall performance of the database. Test cases also helped us introduce automated processes.
Our testing processes have matured over time, and today we use rigorous yet efficient testing methods that help us spot and correct any inconsistencies.
We apply the following methods to every content release in Metabase.
The first is peer review, which takes place once a new medical concept is ready. During peer review, new concepts are challenged by fellow medical content editors, who together propose, discuss, and introduce possible improvements. At the same time, they validate all data and sources related to the new medical concept.
Metabase simplifies peer review with a system of comments, flags, and statuses that helps doctors communicate and focus on specific elements of each new medical concept.
Once the new concept is approved, our doctors start working on clinical test cases. They search the literature for published clinical cases involving a given symptom or condition, which they can use to check the performance of the newly added medical concept.
Every test case follows the same structure: basic information about a real-life patient, detailed clinical features, and the source of the case. For example, one test case is titled “33-Year-Old Woman With Epigastric Pain and Hematemesis”, published in Mayo Clinic Proceedings (Mayo Clin Proc. 2012 Feb; 87(2): 194–197). Its description presents carefully analyzed clinical features, such as age, sex, and any confirmed or excluded findings (e.g., symptoms, risk factors, and lab test results). Each test case is complemented by the resulting condition (diagnosis) and an acceptance criterion (differential diagnosis ranking), which defines how high that condition must appear among the top results of the ranked differential diagnosis.
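To make that structure concrete, here is a minimal sketch of how such a test case and its acceptance check could be represented. It is an illustration under stated assumptions, not Metabase's actual schema: the class `ClinicalTestCase`, its fields (`present`, `absent`, `expected_condition`, `max_rank`), the `passes` helper, and the finding and condition identifiers are all hypothetical. The ranking is assumed to be a list of condition ids ordered from most to least probable.

```python
from dataclasses import dataclass, field


@dataclass
class ClinicalTestCase:
    """Hypothetical representation of a literature-based clinical test case."""

    title: str
    source: str
    age: int
    sex: str                                         # e.g. "female" or "male"
    present: set[str] = field(default_factory=set)   # confirmed findings (symptoms, risk factors, lab results)
    absent: set[str] = field(default_factory=set)    # findings explicitly excluded
    expected_condition: str = ""                     # the confirmed diagnosis
    max_rank: int = 3                                # hypothetical acceptance criterion: required top-N position

    def __post_init__(self) -> None:
        # A finding cannot be both confirmed and excluded in the same case.
        overlap = self.present & self.absent
        if overlap:
            raise ValueError(f"findings both present and absent: {sorted(overlap)}")


def passes(case: ClinicalTestCase, ranking: list[str]) -> bool:
    """Check the acceptance criterion against a ranked differential diagnosis."""
    return case.expected_condition in ranking[: case.max_rank]


case = ClinicalTestCase(
    title="33-Year-Old Woman With Epigastric Pain and Hematemesis",
    source="Mayo Clin Proc. 2012 Feb; 87(2): 194-197",
    age=33,
    sex="female",
    present={"epigastric_pain", "hematemesis"},
    expected_condition="condition_x",  # placeholder, not the published diagnosis
)
print(passes(case, ["condition_y", "condition_x", "condition_z"]))  # True
```

Keeping confirmed and excluded findings in separate sets mirrors the description above and lets the sketch reject a contradictory case as soon as it is built.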
The clinical cases we use are derived primarily from well-established medical sources, such as BMJ, NEJM, the American Journal of Medicine, and Mayo Clinic Proceedings.
Once a new test case has been added, Metabase validates it against Infermedica’s diagnostic engine and checks its consistency with all other symptoms and conditions in the database. Currently, test cases have a 93% success rate for common conditions.
After positive verification, the new test case becomes part of Metabase and is used in regression tests (described later in this article).
At regular intervals, all our medical content undergoes an additional expert review, conducted by physicians from the Infermedica expert panel who specialize in various fields. They carefully inspect both new and existing concepts to identify any potential incompatibilities.
Next, the new concepts undergo technical review, in which our most experienced physicians check whether each new concept complies with our internal guidelines. Because every new concept must fit the existing structure of the medical content, they look for structural issues, ranging from duplicated symptoms to a broken symptom hierarchy or problems with newly introduced numerical parameters.
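Some of these structural checks lend themselves to automation. The sketch below is a hypothetical illustration, not Infermedica's tooling: it assumes symptoms are stored as simple id-to-name and id-to-parent mappings and that numerical parameters have a `(min, max)` range. All function names and data shapes are invented for the example. It flags duplicated symptom names, parents that do not exist, cycles in the symptom hierarchy, and inverted parameter ranges.

```python
from collections import Counter
from typing import Optional


def find_duplicate_names(names: dict[str, str]) -> list[str]:
    """names maps symptom id -> display name; report names shared by several ids."""
    counts = Counter(name.strip().lower() for name in names.values())
    return sorted(name for name, n in counts.items() if n > 1)


def find_hierarchy_problems(parent_of: dict[str, Optional[str]]) -> list[str]:
    """parent_of maps symptom id -> parent id (None for a root symptom)."""
    problems = []
    for sid, parent in parent_of.items():
        # Every referenced parent must itself be a known symptom.
        if parent is not None and parent not in parent_of:
            problems.append(f"{sid}: unknown parent {parent}")
        # Walk up the ancestry; revisiting a node means the hierarchy has a cycle.
        seen = set()
        node: Optional[str] = sid
        while node is not None and node in parent_of:
            if node in seen:
                problems.append(f"{sid}: cycle in ancestry")
                break
            seen.add(node)
            node = parent_of[node]
    return problems


def find_invalid_ranges(ranges: dict[str, tuple[float, float]]) -> list[str]:
    """ranges maps numerical parameter id -> (min, max); report inverted ranges."""
    return sorted(pid for pid, (low, high) in ranges.items() if low > high)
```

Checks like these can only catch mechanical inconsistencies; judging whether a concept follows the internal guidelines still requires the physicians' review described above.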
As soon as the new medical concepts are confirmed, we start building an updated diagnostic model in Metabase. At this point, we use regression testing to judge how the newly introduced content changes performance compared with the previous model. Comparing versions of diagnostic models with regression testing helps us ensure the stability of this complex system and allows us to continuously measure its behavior.
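The core of such a comparison can be sketched in a few lines. This sketch rests on an assumption: each model version's test results are stored as a hypothetical mapping from test case id to pass or fail. The function `compare_versions` and its output keys are illustrative names. It separates cases that regressed from cases the update fixed and from cases added since the previous model was built.

```python
def compare_versions(old: dict[str, bool], new: dict[str, bool]) -> dict[str, list[str]]:
    """old and new map test case id -> passed? for the previous and updated model."""
    shared = old.keys() & new.keys()
    return {
        # passed with the previous model, fails with the updated one
        "regressions": sorted(c for c in shared if old[c] and not new[c]),
        # failed before, passes now
        "fixes": sorted(c for c in shared if not old[c] and new[c]),
        # test cases added since the previous model was built
        "new_cases": sorted(new.keys() - old.keys()),
    }


report = compare_versions(
    old={"tc1": True, "tc2": False, "tc3": True},
    new={"tc1": True, "tc2": True, "tc3": False, "tc4": True},
)
print(report)  # {'regressions': ['tc3'], 'fixes': ['tc2'], 'new_cases': ['tc4']}
```

Reporting regressions separately from the overall pass count matters because an unchanged total can hide cases that broke while others were fixed.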
Our clinical test cases vary in complexity and in how rare the represented conditions are, so the model's performance in these tests ranges from 93% for cases of common conditions to 85% across all cases.
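The two figures differ because they are computed over different sets of cases: rarer and more complex cases are included only in the all-cases rate. The following is a minimal sketch with made-up results, assuming each case carries a hypothetical `common` flag. The `CaseResult` class and `pass_rate` function are illustrative names, not part of any Infermedica API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    case_id: str
    common: bool   # hypothetical flag: does the case describe a common condition?
    passed: bool   # did the case meet its acceptance criterion?


def pass_rate(results: list[CaseResult], only_common: bool = False) -> float:
    """Fraction of selected test cases that passed."""
    selected = [r for r in results if r.common or not only_common]
    if not selected:
        raise ValueError("no test cases selected")
    return sum(r.passed for r in selected) / len(selected)


results = [
    CaseResult("tc1", common=True, passed=True),
    CaseResult("tc2", common=True, passed=True),
    CaseResult("tc3", common=False, passed=False),
    CaseResult("tc4", common=False, passed=True),
]
print(pass_rate(results, only_common=True))  # 1.0
print(pass_rate(results))                    # 0.75
```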
The final step of our content creation cycle is manual testing. Here, our doctors test the newly introduced conditions and symptoms in a way that reflects how doctors or patients would use them in real life. Although this method is subjective, it helps us spot elements that would hurt the user experience and correct them before release.
Before our newly developed and tested medical content reaches patients and healthcare professionals, there is one more step: adding translations. This allows users in multiple countries to make full use of applications based on the Infermedica medical knowledge base.
When the translations are completed, our new medical content is deployed to the cloud-based API and becomes accessible to end users through various applications developed by Infermedica or by external companies that use the Infermedica API.
Whatever the interface of these healthcare apps or the location of our end users, we know that thanks to our multi-step testing processes, the medical content we are delivering to them is of the highest quality.