Continuous Glucose Monitors: Does Better Accuracy Mean Better Glycemic Control?
Accuracy is good, but precision is essential.
Introduction
When Dexcom released its G7 continuous glucose monitor (CGM), it was received with great fanfare, given its best (lowest) MARD value yet, which translates to higher accuracy. Naturally, one would assume that greater accuracy translates to better performance, and hence, tighter glucose control, right?
Well, let’s not get ahead of ourselves just yet. It’s not that simple. This article clarifies how to think about glucose levels as they pertain to diabetes, in four parts. First, I cover how glucose behaves within the body. Then I cover how CGMs try to measure glucose. Part three is my experiment illustrating how I use CGM data to manage T1D, comparing the G6 and G7. I end by analyzing Dexcom’s clinical trial that it used to get FDA approval for the G7.
Let’s start with the basics.
Glucose’s erratic and seemingly random behavior
When you take your pulse, whatever your reading is, well, that’s your pulse. It is what it is. Same with your blood pressure, temperature, cholesterol levels, and almost everything else … except blood glucose. That’s right, glucose is not evenly distributed throughout the body. It’s concentrated differently in different body parts, and as such, it flows at different concentrations in fluids as it travels from one part to another.
For example, in my article, “Why Controlling Glucose is so Tricky,” I explain how the brain holds 20% of the body’s total glucose volume at any given time and burns ~78.4 mg of glucose per minute. That’s about 1g of glucose every 12 minutes. And yet, insulin is not involved in the brain’s metabolization of glucose. As the article goes on to explain, glucose is rapidly moving around the body, in and out of organs and muscles, at different rates and concentrations.
Managing diabetes requires knowing your systemic glucose level, which is the total amount of glucose throughout your entire body, regardless of where it is. But, it’s not possible to determine that number outside of a specially designed imaging system in a laboratory test that injects you with something called “labeled glucose” that can be tracked visually. Since this isn’t possible, the next best thing is to infer what those levels might be based on glucose levels extracted from blood, and here’s where it becomes apparent how glucose resides in different concentrations in different parts of the body.
You can test this yourself by using a finger-stick glucometer and testing fingertips, arms, toes, legs, etc. When your “systemic glucose level” is relatively stable, you may get more similar results from different parts, but when you eat, exercise, or engage in anything that causes glucose levels to change, the distribution of glucose throughout the body gets more volatile. So, different areas of the body have different concentrations of glucose.
Furthermore, different mechanisms besides insulin (hormones, glucose receptors, exercise) clear glucose from the bloodstream at different times, rates, and concentrations. For these and many other reasons, people think of the differences in these glucose levels as a delay in glucose movement—where it just takes more time to travel from one part to another. This is the “lag” effect you may have heard about between a finger-prick test and a CGM that reads from interstitial tissues.
Yes, there can be a lagging effect, but it’s actually less common in real-world settings than people think. The premise for this effect comes from highly controlled lab studies, where clinicians infuse glucose into people’s veins and then measure how long those glucose levels take to reach interstitial tissue. During this time, they don’t change the glucose levels or impose any other disturbances on the body. They just want to see how long glucose takes to get from blood to interstitial tissue. (While the average is 15 minutes, there’s great variability among individuals, times of day, etc.) Indeed, it’s this lagging effect that CGM algorithms rely on to infer systemic glucose levels from interstitial readings.
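To picture what those lab studies measure, the literature often models blood-to-interstitial transport as a simple first-order diffusion process, where interstitial glucose chases blood glucose with a time constant. Here is a minimal sketch of that textbook model, assuming the ~15-minute average time constant cited above (this is not Dexcom's algorithm, just the standard idealization):

```python
# Minimal sketch of the first-order diffusion model often used in the
# literature to describe blood-to-interstitial glucose transport:
#   dG_isf/dt = (G_blood - G_isf) / tau
# where tau is the equilibration time constant (~15 min on average,
# per the lab studies cited above, with wide individual variability).

def simulate_isf_lag(blood_glucose, tau_min=15.0, dt_min=1.0):
    """Return interstitial glucose values lagging the blood series."""
    isf = [blood_glucose[0]]          # assume equilibrium at the start
    for g_blood in blood_glucose[1:]:
        g_isf = isf[-1]
        isf.append(g_isf + (g_blood - g_isf) * dt_min / tau_min)
    return isf

# A step change in blood glucose (100 -> 180 mg/dL) takes roughly
# 3*tau (~45 min) for the interstitial value to mostly catch up.
blood = [100.0] * 10 + [180.0] * 60
isf = simulate_isf_lag(blood)
print(f"ISF at +15 min: {isf[25]:.0f} mg/dL")   # ~2/3 of the way up
print(f"ISF at +45 min: {isf[55]:.0f} mg/dL")   # ~95% of the way up
```

Even in this idealized model, a step change takes roughly three time constants (~45 minutes) to fully propagate, and the real-world disturbances described next make things far messier.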
But lab conditions are not the real world. Exercise, hypoglycemia, eating certain kinds of foods, and cortisol levels are each conditions where glucose may be absorbed from the blood and travel to other areas of the body before seeping into interstitial tissue. And different conditions can alter the rate and concentration of glucose that arrives there. Sometimes, these glucose spikes could rise precipitously, and never get into interstitial tissue at all.
This is why Dexcom doesn’t recommend attempting to “calibrate” a CGM while your glucose levels are not stable.
In the real world, this presents a very difficult challenge for T1Ds making treatment decisions. A finger-prick blood glucose test might yield a value like 180 mg/dL while the CGM shows only 120. Determining which value is “correct” is not straightforward, nor will it always be consistent. If you were to treat the 180 value as “correct” and administer 2u of insulin, that may work… or it might be overkill if the 180 was an anomaly due to the erratic nature of glucose concentration levels. Here, the CGM value of 120 might have been the wiser guide.
These are only a few of many factors that can vary the speed and manner in which glucose travels, making it very difficult to truly establish systemic glucose levels, and that’s the challenge facing CGM technology.
To read and learn more about this, see the article, “Differences in venous, capillary and interstitial glucose concentrations,” where the authors explain how and why there’s great variability in glucose readings using different types of measuring methods, and how that variability isn’t just a matter of diffusion time (delay).
Despite this variability, it’s not as though glucose levels are entirely random. They’re not. Systemic glucose levels—which is the total amount of glucose in the body—can be reasonably inferred from the chaos, largely because scientists have studied this volatility and can estimate systemic levels to some degree of accuracy, with a margin of error built in. Yes, there are error ranges, but they fall within acceptable boundaries, which is why CGMs have made a remarkable difference in how T1Ds manage their glucose levels.
On the other hand, the very same volatility imposes an upper limit to how “accurate” these readings can ever be. As will be clarified later, CGM technology is about as good as it can get, so long as interstitial tissue is the sole source of measurement, especially using only one sensor. Trying to make them “more accurate” may be a fool’s errand, as this article will illustrate.
CGMs and T1D management
Before CGMs, T1D was never managed at a level of granularity beyond a finger-stick test, which is good enough to make macro-level decisions. A reading <70 mg/dL means you eat; >200 mg/dL means you take insulin. Between those extremes, sure, you might take some action, but if you’re only performing finger-prick tests 5-6 times a day (which is more than most T1Ds did before CGMs), the value of these test kits was limited.
When CGMs came on the market, it completely changed how T1D was managed. And yet, here’s where glucose’s natural volatility sneakily presented itself to be a more difficult matter than anyone expected, not just for CGMs, but for the entire practice of T1D management.
To optimally manage T1D, you need to know your systemic glucose levels, which is the only way to know if and how your body needs intervention. If you rely on individual readings, regardless of how “accurate” each may be relative to a reference measuring device (which is how MARD is calculated), you’re left with far too much noise to make sound dosing decisions.
Below is an illustration of this. The following graph shows data from Dexcom’s “more accurate” G7 and its previous version, the G6, which I wore at the same time during a 30-day experiment I ran on myself. The plots from the G7 reveal the natural and erratic glucose patterns just described.
Now we come to the essential question: Are the G7’s more “accurate” readings truly representative of the total amount of glucose in one’s body? Do they make glucose management decisions better? Worse? Or no different?
Before getting into that, notice the BGM readings: in the three instances where I used a blood glucose monitor, the reading was far from both the G6 and G7 values, and it didn’t “predict” where those values would end up.
Obviously, both the G6 and G7 are able to track macro-level trends, from which you can reasonably infer overall glucose control. If a doctor wished to use this data to gather long-term patterns, or even to determine glycemic variability in non-diabetics to see if a patient may be an emerging type 2 diabetic, either sensor will do. For these purposes, any sensor on the market will do the job, even those with poor levels of accuracy, because macro-trends don’t require read-by-read numbers.
Similarly, if a T1D only looks at their CGM data 5-6 times a day, then again, the G7 is good enough, as is any other CGM on the market.
But managing T1D to achieve healthy outcomes (time-in-range (TIR) targets of >70%), well, that requires glucose patterns that more closely resemble systemic glucose levels, not intermittent readings that may come from a single fluidic source. Under these conditions, one needs to look at—and react to—shifts in glucose movements far more frequently than a few times a day. In fact, the more often, the better.
With that in mind, look at the G7 readings again. Sugars that rise or fall rapidly would require immediate action, and a tightly-controlled T1D would react quickly on such rapid movements. Look at any section of that chart and imagine yourself in any given moment: If levels appear to be rising or falling, do you trust the data? Or do you wait for the next reading? Or ten readings? Or more? The more readings you need before you can make a decision, the more time has gone by that you didn’t make a decision. And that’s a big deal. Time is critical when it comes to glycemic management, because things spin out of control very fast.
Now, let’s be honest. Most T1Ds don’t really do that. But more and more often, people are beginning to rely on automated insulin pumps to do that analysis, and those systems are looking at each and every reading (or they should). No algorithm can figure out G7 data any better than you can, given how wild these readings are. In fact, well-controlled T1Ds achieve that control precisely because they can combine glucose levels and trends with their latest actions (food, insulin, exercise) and make better guesses about where glucose is heading than an algorithm can. But none of this works if the data being observed isn’t reliable enough. And that’s where the risk lies.
The G7’s individual readings, if taken literally in each moment, could easily be interpreted as a very rapid change in glucose levels, even though that is highly unlikely to actually be happening in the body, and that can cause a person or an algorithm to make a very bad decision.
This is not just conjecture—researchers have studied this phenomenon and published their results in the article, “Limits to the Evaluation of the Accuracy of Continuous Glucose Monitoring Systems by Clinical Trials,” where the authors describe the erratic and random patterns of glucose fluctuations, and call into question the appropriateness of how clinical trials for CGMs are conducted in the first place.
Read that last phrase again—“call into question the appropriateness of how clinical trials for CGMs are conducted in the first place”—but this time, say it out loud, and really loudly. In fact, scream it.
If you want to manage your T1D—or rely on a pump to do it—you must have a CGM that properly estimates systemic glucose levels and movements, NOT individual readings that happen to pair with a blood glucose analyzer, which is how MARD accuracy is determined. (“MARD” refers to the “Mean Absolute Relative Difference” from a reference glucose measuring device.)
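To make that parenthetical concrete, here is a minimal sketch of how a MARD figure is computed from matched CGM/reference pairs. The sample values are invented for illustration; real trials have more involved pairing and exclusion rules:

```python
# Minimal sketch of a MARD ("Mean Absolute Relative Difference")
# calculation from matched CGM and reference analyzer readings.
# The sample values below are illustrative, not trial data.

def mard(cgm_readings, reference_readings):
    """Mean absolute relative difference, as a percentage."""
    errors = [
        abs(cgm - ref) / ref
        for cgm, ref in zip(cgm_readings, reference_readings)
    ]
    return 100.0 * sum(errors) / len(errors)

cgm = [112, 150, 95, 210, 68]      # sensor values (mg/dL)
ref = [120, 140, 100, 190, 75]     # reference analyzer values (mg/dL)
print(f"MARD: {mard(cgm, ref):.1f}%")
```

Notice what MARD does not capture: a sensor can score well on these paired snapshots while still producing a jumpy, unusable trace. That is exactly the accuracy-versus-precision distinction.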
And that explains the difference between the G6 and the G7. The G7 may be more accurate at reading how much glucose resides in any given sample, but that kind of information is nearly useless because glucose moves through fluids erratically and quickly. By contrast, the G6’s data is more precise because the algorithms Dexcom used in that sensor accounted for glucose’s volatility using statistical probabilities and the physical properties of glucose molecules in fluids. Hence, each of its readings was more representative of the body’s true systemic glucose level, making decision-making far more reliable for the person (or algorithm).
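Dexcom hasn’t published the G6’s internal filtering, so the following is only an illustration of the general principle: even a basic exponential moving average (with an alpha value I chose arbitrarily) damps anomalous readings at the cost of slightly lagging the raw signal:

```python
# Illustrative only: a basic exponential moving average shows how
# statistical smoothing trades raw per-reading fidelity for a more
# stable trend estimate. Dexcom's actual filtering is proprietary;
# the alpha value here is an arbitrary assumption for the sketch.

def smooth(readings, alpha=0.3):
    """Exponentially weighted moving average of raw sensor readings."""
    smoothed = [readings[0]]
    for raw in readings[1:]:
        smoothed.append(alpha * raw + (1 - alpha) * smoothed[-1])
    return smoothed

# A noisy raw trace (mg/dL): the single jump to 175 is an anomaly.
raw = [118, 122, 119, 175, 121, 124, 120]
for r, s in zip(raw, smooth(raw)):
    print(f"raw {r:3d}  ->  smoothed {s:6.1f}")
```

The smoothed trace substantially damps the single anomalous spike, which is the property that matters for dosing decisions, even though its point-by-point agreement with a reference device would score “less accurate.”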
To put this to the test, I wore a G6 and G7 at the same time for a month to see which sensor gives better data to make better clinical decisions. Read on.
Does the G7 yield greater glucose control?
Before I explain how I tested the G6 vs the G7, I need to make clear that Dexcom’s clinical trial demonstrating the G7’s MARD was never intended to claim that the G7 results in healthier outcomes. That’s a very different goal. The company only conducted an “efficacy trial,” which is designed to show that the sensor performs well enough to be approved by the FDA.
What Dexcom did not do is perform an “effectiveness trial,” which is when test subjects would wear each of the two sensors and make real-time management decisions under real-world conditions. I explain this in much greater detail in my article on how to evaluate clinical trials.
Since no trials have been done like this for the G7, I tried to do it on myself. As it happens, my T1D is under very tight control, where my time in range is 95%, with <2% below range (70 mg/dL) and <4% above range (180 mg/dL). Since I attribute this to watching my G6 data very closely, I was curious to see how I’d perform with the G7, whose data differs greatly.
As the data from my experiments will show below, I was only able to achieve a TIR of 75-80% using the G7. What’s more, I also experienced considerably more hypo events and greater variability, both of which can be harmful.
There’s a lot of detail here, so let’s start with my experiment.
During March, 2023, I wore both the G6 and G7 at the same time, but would only observe data from one sensor’s app at a time to make real-time management decisions. The goal was to determine which data made it easier or better to make in-the-moment decisions. After a period of a few days, I switched to the other sensor’s app, and repeated this pattern several times.
Upon completion of the experiment, I downloaded all my data to Excel and analyzed it to see how my TIR varied between the two. (I also collected data for insulin (InPen bluetooth enabled insulin pen), carbohydrates, exercise, sleep, and glucose levels from my Contour Next One blood glucose meter (BGM), which I included in my analysis report.)
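(For readers who’d rather script this than use Excel, here is a minimal sketch of the summary statistics involved: time-in-range, mean, and SD over an exported series of readings. The sample values are placeholders, and the 70-180 mg/dL thresholds match the ranges used throughout this article.)

```python
# Minimal sketch of the summary statistics computed from exported CGM
# readings: time-in-range (TIR), mean, and standard deviation.
# The 70-180 mg/dL range matches the targets used in this article.
from statistics import mean, stdev

def summarize(readings, low=70, high=180):
    """Return (TIR %, below-range %, above-range %, mean, SD)."""
    n = len(readings)
    tir   = 100 * sum(low <= g <= high for g in readings) / n
    below = 100 * sum(g < low for g in readings) / n
    above = 100 * sum(g > high for g in readings) / n
    return tir, below, above, mean(readings), stdev(readings)

# Illustrative values only -- substitute an exported G6 or G7 series.
g7_readings = [95, 110, 130, 185, 160, 66, 140, 120, 175, 190]
tir, below, above, avg, sd = summarize(g7_readings)
print(f"TIR {tir:.0f}%  below {below:.0f}%  above {above:.0f}%  "
      f"mean {avg:.0f} mg/dL  SD {sd:.0f}")
```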
The graphic below is the topline dashboard from my month wearing both the G6 and G7:
The first thing that pops out is that the G7 reported glucose values ~5% lower than the G6 (consistent with what others have reported online). Aside from that, the two sensors appear roughly equivalent: The G6 averaged 121 mg/dL, versus the G7’s 116, and the standard deviations (SD) were 33 vs. 34, respectively.
But the real difference between the two sensors is shown by the time-in-range (TIR) stats on a day-by-day basis, as shown in the following graph:
When I used the G6 to make decisions, I achieved a TIR of >90%. When I used the G7, my TIR dropped to the ~70% range. To understand why the G7 made it harder for me to maintain glycemic control, let’s look more closely at the earlier chart, which shows a day where my decisions were governed by the G7’s data.
Now, let’s zoom into the two-hour window between 4-6pm, which is highly representative of the kind of volatility there is in G7 data versus the G6, and why it’s hard to make real-time decisions.
Remember, I couldn’t see the G6 data (the smoother blue graph), so at 5:30pm, with only the G7 data in view, I saw a very rapid rise from 88 to 155 over 30 minutes. Granted, the data leading up to that was highly erratic, but these successive readings were not: they were decisively rising, and fast. Without any idea where these levels might top out, I knew I needed to start bolusing.
As I always do, I began with small, incremental boluses, keeping a close eye on glucose levels as they rose, waiting to see when they would level off or begin to fall. The goal is to avoid taking too much, or too little. I’m aiming for the Goldilocks effect.
Turns out, the G7’s data shot up to 270. If this really was my real glucose level, the stacked boluses I’d taken would have perfectly corrected these readings, and I would have had a soft landing. But, as the insulin started to kick in, my glucose levels plummeted to 49, making it clear to me that the G7 readings were not giving me reliable information. Individual readings may have been “accurate,” but they were not representative of my actual systemic glucose levels.
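For concreteness, here is a hypothetical sketch of the “small, incremental boluses” approach described above. It is not dosing advice: the insulin sensitivity factor (ISF), target, and step fraction are invented parameters for illustration.

```python
# Illustrative sketch of the "small, incremental boluses" approach
# described above -- NOT dosing advice. The insulin sensitivity
# factor (ISF), target, and fraction are made-up example parameters.

def incremental_correction(current, target=110, isf=50, fraction=0.33):
    """Dose a fraction of the standard correction bolus, then reassess.

    Standard correction: (current - target) / ISF units of insulin.
    Taking only a fraction at a time limits the damage if the rise
    turns out to be a sensor anomaly rather than real glucose.
    """
    if current <= target:
        return 0.0
    full_correction = (current - target) / isf
    return round(fraction * full_correction, 1)

# At a reading of 270 mg/dL, a full correction would be 3.2u;
# an incremental step takes ~1.1u and waits to see the response.
print(incremental_correction(270))   # 1.1
```

The whole strategy depends on the readings between steps being trustworthy; feed it anomalous data, as happened here, and the increments stack into an overdose.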
To achieve tight glucose control, one must be able to look at short time windows and react as quickly as possible to glucose movements, even with finely tuned adjustments. (Most people aren’t in tight control and typically work on bigger time windows, so they won’t be as affected by these erratic readings.) The conundrum with the G7 is that sugars may look like they’re starting to move up or down, but then the data suddenly reverses 30 minutes later because those earlier readings were anomalous.
Over time, these anomalous readings will create more errors in decisions (or pump algorithms) than successes, which will impose an upper limit on how well one can actually do.
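To see why that upper limit exists, consider that any human or algorithmic rule must decide how many consecutive confirming readings to wait for before acting. The sketch below is a hypothetical confirmation rule (not any vendor’s actual logic); each extra confirming reading filters more noise but costs another five-minute sample interval:

```python
# Hypothetical confirmation rule (no vendor's actual logic): only
# treat a rise as real after `confirm` consecutive rising readings.
# Each extra confirming reading filters more noise but costs another
# 5-minute sample interval before any action is taken.

def rising_trend(readings, confirm=3, min_step=2.0):
    """True if the last `confirm` deltas all rise by >= min_step mg/dL."""
    if len(readings) < confirm + 1:
        return False
    recent = readings[-confirm - 1:]
    deltas = [b - a for a, b in zip(recent, recent[1:])]
    return all(d >= min_step for d in deltas)

noisy  = [120, 148, 117, 151, 122]   # jumpy, G7-style trace
steady = [120, 128, 137, 147, 158]   # decisively rising trace
print(rising_trend(noisy))    # False: reversals cancel the "trend"
print(rising_trend(steady))   # True: three consecutive confirmed rises
```

With a noisy trace, a rule strict enough to reject the anomalies is also slow enough to miss real movements. That trade-off is the upper limit.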
Below are more daily charts to consider (without additional commentary). You can zoom in on your own and guess how/why I was able–or unable–to see trends in time to make decisions proactively.
The G7 generally reports lower BG values
While both the G6 and G7 were tighter (SD=29 and 28), the G7’s volatility is apparent.
The G7 appears to behave better this day, but real-time decisions were based on G6 data
The day was 100% in range, but the G7’s data was all over the map. (Thanks, G6!)
The Dexcom G7 trial: Exploring the futility of “accuracy.”
In Dexcom’s published report, “Accuracy and Safety of Dexcom G7 Continuous Glucose Monitoring in Adults with Diabetes,” 318 diabetic subjects wore three G7 sensors simultaneously over the course of ten days. For three of these days, subjects underwent clinically induced hyperglycemia and hypoglycemia under controlled conditions, where blood samples were taken and measured using a reference blood glucose sensor, the YSI 2300 Stat Plus glucose analyzer. The analysis showed that the “mean absolute relative difference” (MARD) between the two was ~8.8% for the G7, versus ~10% for the G6. The lower the percentage, the smaller the difference to the reference analyzer. Hence, greater accuracy.
Let me remind the reader that the G7 trial had subjects wear THREE G7s simultaneously during the testing period. When blood samples were taken and measured on the reference analyzer, which of the three G7s was each sample measured against? Were all three averaged together? Was it only one? Which one? Did they choose whichever of the three happened to be closest to the reference? The company doesn’t reveal this in the trial data, and that alone raises eyebrows.
For this reason alone (though it’s hardly the only one), MARD claims should not be taken at face value. Moreover, MARD isn’t just one number—MARD values vary considerably under different conditions, such as glucose levels and rates of change, as shown in this figure from their report.
The mean and median per-sensor MARDs were 8.8% and 7.8%, respectively; 442 sensors (71.4%) had MARD values <10%, and 12 (1.9%) had MARD values >20%.
In short, the accuracy was best when glucose values were in the sweet spot of glycemic ranges, but accuracy diminished at more extreme glucose levels. This bar graph suggests the best MARD happened most often at ideal glucose ranges, but most T1Ds only spend about 30% of their day in those ranges. The rest of their day is spent far outside, usually well above 180 mg/dL, where the G7’s MARD rating is well above 14%.
What is also not revealed in Dexcom’s report is the rate of change (ROC), which can also greatly affect MARD. Once again, if you visualize glucose being highly volatile in fluid, imagine how much greater that volatility is when glucose is rushing in or out of that fluid rapidly. It’s like injecting dye into a vat of water: you’ll see dense color in some places more than others before the dye diffuses evenly throughout the water.
In the case of glucose levels rising or falling, Dexcom limited its testing to only 1 mg/dL of change per minute, which still showed some of the worst-performing MARD values. In the real world, once a T1D eats a meal, glycemic levels quite often change at 2-4 mg/dL per minute. Relying on CGMs to capture that data comes with significant error bars. (The G6’s algorithm is far superior in this regard, smoothing out these errors and giving the user or algorithm more reliable data to work with.)
To gauge how this variability in MARD plays out under real-world conditions, we can look at this meta-analysis of multiple studies on overall glucose levels for T1Ds who wear CGMs. It shows that only 30% of T1Ds keep their glucose between 70-180 mg/dL 70% of the time, which is where the G7 is most accurate. By contrast, 80% of T1Ds spend more than 70% of their time above 180 mg/dL, where the G7’s error exceeds 30%. (For context, 44.5% have an A1c between 7–9%, 32.5% exceed 9%, and only 23% of T1Ds have an A1c <7%.)
Despite the fact that the G7 is most accurate at glucose levels between 70-180 mg/dL, T1Ds spend far more time above 180. Hence, T1Ds are experiencing accuracy error rates of >30% most of the time. This means the dosing decisions that humans or algorithms make, whether for insulin or carbs, are based on highly imperfect information (especially compared to the G6, which was more reliable).
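One way to gauge the error rate a typical wearer actually lives with is to weight per-range MARD by the time spent in each range. The fractions and per-range MARDs below are rough values assembled from the discussion above (I used the conservative 14% above-range figure from the bar-graph discussion, though the meta-analysis numbers suggest it can run far higher), so treat the result as an order-of-magnitude sketch:

```python
# Order-of-magnitude sketch: the error rate a wearer actually
# experiences is the per-range MARD weighted by time spent in each
# range. The per-range MARDs and time fractions below are rough
# values assembled from this article's discussion, not published
# statistics -- adjust them to taste.

ranges = {
    #  range             (time fraction, approx. MARD %)
    "below 70 mg/dL":   (0.05, 14.0),
    "70-180 mg/dL":     (0.30, 8.8),
    "above 180 mg/dL":  (0.65, 14.0),
}

effective_mard = sum(frac * m for frac, m in ranges.values())
print(f"Time-weighted effective MARD: {effective_mard:.1f}%")
```

Even with charitable inputs, the effective error rate a typical T1D experiences sits well above the headline 8.8%.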
Summary
I personally suspect that few people will find the G7 helps T1Ds improve their glycemic control. This will also be a problem for automated insulin pumps for the same reasons.
Nevertheless, I suspect Dexcom is primarily focused on the marketing value of the improved MARD rating. It’s invaluable to be able to claim that your MARD beats every other CGM’s, however dubious MARD may be as a measure of CGM quality.
It also helps that Dexcom’s target market is moving well beyond T1Ds. There are nearly 40 million type 2 diabetics (with another 98 million presumed to be undiagnosed), compared to roughly 1.5 million T1Ds. Add to that a rapidly emerging market of non-diabetic “life-hackers”: athletes, health enthusiasts, and everyday consumers. In fact, Dexcom is releasing a non-prescription version of the G7 called Stelo for these enthusiasts, for whom volatility in glucose readings just isn’t that important.
Of course, the downside for T1Ds is that some could actually see worse outcomes and not even know it. The G7’s propensity to report lower glucose averages (than what is actually in the bloodstream) may give people the false impression that their glycemic control has improved with the G7.
I hope the G6 never goes away, or better yet, that Dexcom brings the G6’s algorithms to the G7 sensor. Here’s a marketing plan: sell the G7 with the G6 algorithm as the less expensive over-the-counter product for the comparatively small number of T1Ds who actually need higher-quality data to manage glucose levels. We’re already paying too much for all the other stuff we have to buy, and we’re a tiny market compared to the rest of the world. This way, everyone’s a winner!
Reader comments
I suspect the Freestyle Libre 3 is even “worse” than the G7, with its MARD of 7.9% and one-minute sampling rate. The G7 on steroids :D I switched recently from the G6, and what a ride I had yesterday when trying to fix one hypo: I got 3 consecutive nasty ones instead (I tried to bolus after each climb out of a low so that my blood sugar wouldn’t skyrocket later). The readings just aren’t as predictable; they jump around, and I ended up reacting too soon… But I will keep using it. Who knows, I might see some patterns in time and get a handle on it. I just like the every-minute readings too much.
Thanks for the article. I recently tried the G7 and found similar results. My Tandem pump using Control IQ seemed to struggle with the anomalies and it was frequently shutting off basal and issuing corrections.