7 Highlights From the New FHIR Server Performance Benchmark

A small but useful piece of FHIR news landed this week. Health Samurai published an open-source performance benchmark that pits four FHIR servers against each other on the same hardware; Marat Surmashev, VP of Engineering, led the work. Outpatient teams shopping for back-end tooling now have a daily-updated data point to consult instead of a stack of vendor PDFs.

Below are seven highlights worth a quick read. For broader background on FHIR back-end work for outpatient teams, start with the FHIR primer hub on this site.

1. Four Servers Are in the Run

The benchmark covers Aidbox, HAPI FHIR, Medplum, and the Microsoft FHIR Server. Each one runs in the same container shape, with identical resource limits. That is the part outpatient buyers usually do not get to see: every vendor measured under the same conditions.

2. The Hardware Is Pinned

Everything runs on a single bare-metal box with 64 CPU cores and 500 GB of RAM. Each server is allocated 8 vCPU and 24 GB of memory. Medplum runs as eight smaller replicas to match its architecture. The point of pinning hardware is to remove the most common confounder in vendor-published numbers.
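The allocation figures above can be checked with a little arithmetic. This is a minimal sketch, not the benchmark's own tooling. It assumes two things the article does not state: that all four servers might run side by side on the host, and that Medplum's eight replicas split one server's 8 vCPU / 24 GB budget evenly. It also treats a vCPU as one core, which is a simplification (a vCPU often maps to a hardware thread).

```python
# Figures from the article: one 64-core, 500 GB bare-metal host; each FHIR
# server capped at 8 vCPU and 24 GB of memory.
HOST_CORES = 64
HOST_RAM_GB = 500
VCPU_PER_SERVER = 8
RAM_GB_PER_SERVER = 24
SERVERS = ["Aidbox", "HAPI FHIR", "Medplum", "Microsoft FHIR Server"]


def combined_allocation(n_servers: int) -> tuple[int, int]:
    """Total vCPU and GB of RAM if n_servers ran side by side (an assumption)."""
    return n_servers * VCPU_PER_SERVER, n_servers * RAM_GB_PER_SERVER


def per_replica_share(replicas: int) -> tuple[float, float]:
    """Even split of one server's budget across replicas (assumed, not confirmed)."""
    return VCPU_PER_SERVER / replicas, RAM_GB_PER_SERVER / replicas


vcpu, ram = combined_allocation(len(SERVERS))
print(f"All four at once: {vcpu}/{HOST_CORES} cores, {ram}/{HOST_RAM_GB} GB")

# Medplum runs as eight replicas; this assumes they share one server's budget.
cpu_share, ram_share = per_replica_share(8)
print(f"Per Medplum replica: {cpu_share} vCPU, {ram_share} GB")
```

Even in the worst case of all four running concurrently, the pinned limits use half the cores and under a fifth of the RAM, so the host itself should not be the bottleneck.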

3. CRUD Throughput Has a Wide Spread

CRUD requests per second on the 2026-06-29 snapshot range from about 5,200 at the top down to 440 at the bottom, with HAPI and Medplum in between. For an outpatient practice the absolute numbers are less interesting than the spread itself, which is wider than most procurement decks would lead you to expect.
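To put a number on that spread, divide the top figure by the bottom one. This is a back-of-the-envelope sketch using only the approximate snapshot values quoted above; the function name is illustrative, and the "seconds" interpretation in the comment assumes a purely throughput-bound workload.

```python
def throughput_spread(top_rps: float, bottom_rps: float) -> float:
    """How many times faster the top server is than the bottom one."""
    if bottom_rps <= 0:
        raise ValueError("bottom_rps must be positive")
    return top_rps / bottom_rps


# Approximate CRUD requests per second from the 2026-06-29 snapshot.
spread = throughput_spread(5_200, 440)

# If a workload is purely throughput-bound, a burst the top server clears in
# one second would take the bottom server about `spread` seconds.
print(f"CRUD spread: {spread:.1f}x")
```

That works out to roughly a twelvefold gap between the fastest and slowest server on the same hardware.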

4. Bundle Import Is the Migration Story

Bundle import throughput, measured in resources per second, shows what a one-time data migration would feel like. Aidbox lands near 2,678 resources per second and HAPI near 2,214, with Medplum and Microsoft sitting lower. Outpatient practices moving from a legacy EHR to a FHIR-native back end care about this row more than the CRUD row.
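A rough migration estimate follows directly from those rates. The sketch below is illustrative only: the 2,000,000-resource export size is a hypothetical placeholder, not a benchmark figure, and it assumes the import rate measured on a 1,000-patient dataset holds constant across a much larger load, which the benchmark has not yet tested.

```python
# Bundle import rates quoted in the article (resources per second).
IMPORT_RATES = {"Aidbox": 2_678, "HAPI FHIR": 2_214}

# Hypothetical legacy-EHR export size; not a figure from the benchmark.
HYPOTHETICAL_RESOURCES = 2_000_000


def import_minutes(resource_count: int, resources_per_second: float) -> float:
    """Naive migration estimate assuming the rate holds for the whole load."""
    return resource_count / resources_per_second / 60


estimates = {name: import_minutes(HYPOTHETICAL_RESOURCES, rate)
             for name, rate in IMPORT_RATES.items()}
for name, minutes in estimates.items():
    print(f"{name}: ~{minutes:.1f} min")
```

Under those assumptions the two leaders land at roughly 12 and 15 minutes; swap in your own export size to get a first-order feel before running a real trial load.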

5. Search Throughput Has Its Own Shape

Search throughput does not track CRUD throughput perfectly. The benchmark reports search results separately and flags two weak spots: composite search is a known gap for one of the four servers, and another is very slow on quantity and composite searches. Check these results against the real query mix your outpatient EHR generates.
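One way to size your own query mix is to tally a sample of search URLs from an access log. This is a heuristic sketch, not part of the benchmark. It relies on the FHIR search convention that composite parameter values join their components with `$`, and it assumes a hypothetical, deployment-specific set of quantity-type parameter names (`QUANTITY_PARAMS`). The sample log lines are made up for illustration.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

# Hypothetical: parameter names your deployment treats as quantity-type.
QUANTITY_PARAMS = {"value-quantity", "component-value-quantity"}


def classify(query_url: str) -> set[str]:
    """Label a search URL as composite, quantity, or other (heuristic)."""
    kinds = set()
    for name, value in parse_qsl(urlsplit(query_url).query):
        base = name.split(":", 1)[0]  # drop modifiers such as :missing
        if "$" in value:  # composite values separate components with $
            kinds.add("composite")
        elif base in QUANTITY_PARAMS:
            kinds.add("quantity")
    return kinds or {"other"}


def query_mix(urls: list[str]) -> dict[str, float]:
    """Share of queries per kind; a query with both kinds counts in each."""
    counts = Counter()
    for url in urls:
        counts.update(classify(url))
    return {kind: n / len(urls) for kind, n in counts.items()}


# Illustrative log sample, not real traffic.
sample_log = [
    "/Observation?code=http://loinc.org|8480-6&value-quantity=gt140",
    "/Observation?component-code-value-quantity=http://loinc.org|8480-6$gt140",
    "/Patient?name=smith",
    "/Appointment?date=ge2026-01-01",
]
print(query_mix(sample_log))
```

If composite or quantity searches make up a meaningful share of your traffic, the servers flagged on those search types deserve a closer look.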

6. Storage Footprint Varies a Lot

After loading the same Synthea data, storage sizes range from about 4.24 GB on the smallest footprint to 22.6 GB on the largest. A note in the report attributes much of the gap to index-build strategy: some servers pre-build search indexes on write, while one ships without default indexes and treats indexing as a deliberate operator decision. The guide to choosing a FHIR terminology server for outpatient behavioral health in 2026 walks through how storage and indexing show up in the adjacent terminology layer.
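Normalizing those footprints per patient makes them easier to reason about. This sketch uses the article's 1,000-patient dataset and the two quoted sizes; the 50,000-patient projection is a hypothetical panel size, and the linear extrapolation is an assumption that ignores how index strategy and data mix change growth at scale.

```python
# Storage after loading the same 1,000-patient Synthea dataset (article figures).
PATIENTS = 1_000
FOOTPRINTS_GB = {"smallest": 4.24, "largest": 22.6}


def per_patient_mb(total_gb: float, patients: int) -> float:
    """Average footprint per patient, using decimal units (1 GB = 1,000 MB)."""
    return total_gb * 1_000 / patients


def naive_projection_gb(total_gb: float, patients: int, target_patients: int) -> float:
    """Linear extrapolation; real growth depends on indexes and data mix."""
    return total_gb / patients * target_patients


ratio = FOOTPRINTS_GB["largest"] / FOOTPRINTS_GB["smallest"]
print(f"Largest vs smallest: {ratio:.1f}x")
for label, gb in FOOTPRINTS_GB.items():
    mb = per_patient_mb(gb, PATIENTS)
    projected = naive_projection_gb(gb, PATIENTS, 50_000)  # hypothetical panel size
    print(f"{label}: {mb:.2f} MB/patient, ~{projected:,.0f} GB at 50,000 patients")
```

The largest footprint is a bit over five times the smallest, a gap that matters more for disk and backup budgets than the absolute sizes at 1,000 patients.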

7. The Repo Reruns Daily and Anyone Can Fork It

The benchmark code is open source, the dashboard refreshes daily, and the Medplum CTO has already forked the repository. That last detail is exactly how open benchmarks are supposed to work. Outpatient buyers can read the numbers, then watch how they shift as vendors push optimizations into the open.

One honest caveat is worth naming. The publisher is Health Samurai, which makes Aidbox, so this is a vendor-run benchmark even with an open methodology. The dataset is also small at 1,000 patients; the next post in the series is expected to test at scale. For practices comparing search and code-lookup behavior on real outpatient terminology workloads, the write-up on the top 5 FHIR terminology servers for behavioral-health ICD-10 coding covers parallel ground.

christiancounsel