Cool devices like the Apple Watch, Fitbit, and the nearly ubiquitous Garmin Forerunners (at my local running group, anyway) aim to track your biometrics, like heart rate and calories burned, along with various other fun variables like pace, vertical oscillation, and elevation. There are even a few that try to predict your recovery time, race paces, and even VO2 max, which would be very interesting (especially given the vomit-inducingness of a real VO2 max test), if they could be trusted. And that is a big “if.”
When I got on board with these devices back in 2009, they weren’t all that accurate. Even the cardiologists who study wearables at Stanford University went as far as to say that they thought of them a little bit like random number generators. They really didn’t seem to be providing anything that was even close to your actual heart rate.
The folks at Stanford have recently tested seven newer fitness bands and now say those heart rate stats have gotten much, much better. For most of the devices, the error rate was less than five percent, which is considered medical grade, and good enough for your doctor!
Where all the devices fell apart was in estimating the calories burned when compared to gold-standard lab measurements of energy expenditure.
The reason for the discrepancy could be that we all burn energy at different rates, and that’s hard to guess from simple parameters like weight and height, which are generally the only things we tell these devices when we first buy them. I mean, if I just go out and watch the runners in my neighbourhood, some are incredibly efficient and look simply elegant when they run (watch a video of triathlete Mirinda Carfrae for an example of this) while others are laboured, awkward and even look like they’re burning a lot more calories to cover the same amount of ground (watch a video of me for an example of this).
At Stanford University they did an evaluation of seven devices using a group of 60 volunteers. They evaluated the Apple Watch, Basis Peak, Fitbit Surge, Microsoft Band, Mio Alpha 2, PulseOn, and the Samsung Gear S2.
The sixty volunteers, made up of 31 women and 29 men, wore the seven devices while walking or running on treadmills or using stationary bikes. Each volunteer’s heart rate was also measured with a medical-grade electrocardiograph (ECG). Then their metabolic rate was estimated with an instrument that measures the oxygen and carbon dioxide in their breath, which is a pretty good proxy for metabolism and energy expenditure.
Once they had that data, the results from the wearable devices were compared to the measurements from the two medical instruments.
In the end, they concluded, not all that shockingly, that some devices were more accurate than others. They also concluded that factors such as skin colour and body mass index (BMI) definitely affected the measurements. I don’t know about you but I have never had any device ask me about my skin tone… have you?
The biggest takeaway was that none of the seven devices measured energy expenditure accurately at all. Even the most accurate device was off by an average of 27 percent and the least accurate was off by 93 percent. That is basically me throwing darts at some graph paper that I stapled to the wall—and I am not a good dart player!
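To make those error figures concrete, here’s a quick back-of-the-envelope sketch. The 400-kcal lab value is made up purely for illustration; only the 27 and 93 percent average errors come from the study.

```python
# Percent error of a device reading against a lab "gold standard".
# The calorie figures are hypothetical; they just show what an
# average error of 27% or 93% looks like in practice.

def percent_error(device, reference):
    """Absolute percent error of a device reading vs a reference value."""
    return abs(device - reference) * 100 / reference

lab_kcal = 400  # hypothetical lab-measured energy expenditure

best_device = lab_kcal * 1.27   # best device, off by 27% on average
worst_device = lab_kcal * 1.93  # worst device, off by 93%

print(percent_error(best_device, lab_kcal))   # 27.0
print(percent_error(worst_device, lab_kcal))  # 93.0
```

In other words, a workout that actually cost you 400 calories could show up as anywhere from roughly 500 to nearly 800 on your wrist.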
The researchers said that manufacturers may or may not test the accuracy of these devices thoroughly, and it’s hard for consumers to know how accurate such information is or what process the manufacturers used to test the devices.
The heart rate measurements performed far better than expected, based on the previous versions of the devices but again, what was truly surprising was just how crappy the energy expenditure measures were.
The take-home message appears to be that a user can pretty much rely on a fitness tracker’s heart rate measurements but basing the number of doughnuts you can eat on how many calories your watch says that you burned is really not a great idea… any way you dip it.
10 Trackers at Once
Back in May 2016, a reporter for the Big Crunch did a fun experiment (fun for us nerds that is). He wore 10 different trackers at once and took them through their paces. He did this in response to people actually suing Fitbit, saying that its devices didn’t track the data accurately enough.
The reporter went to an electronics store and purchased a bunch of devices like Fitbit, Garmin, Polar, Misfit, Jawbone, and Withings. He also borrowed another Fitbit and two Apple Watches from his very trusting friends.
He then ran the devices through three specific tests: step counting, heart rate measurement and total distance travelled.
Now he was the first to admit that this wouldn’t qualify as a scientific study but he still tried to be more thorough than most of us would – I have to give him that. If it had been me, I would have likely given up the minute the clerk at Best Buy showed me the bill.
The results were, as he put it, “depending on your perspective, either quite variable or pretty close”… whatever that means.
First Test – Step Counting
He did two different tests of step counting. In the first, he wore all the devices for a couple of hours, doing a variety of tasks. The step totals varied widely, by more than 20 percent. That’s pretty significant when you consider that people often have a goal of getting in 10,000 steps per day. With a 20 percent inaccuracy rate, one device might show 10,000 steps while another shows around 8,000. Not cool, man. Not cool.
In the second test he counted 500 steps out loud (I can’t help but picture someone following a treasure map). Even then, the readings ranged from 446 to 513 steps.
The most accurate device was the Fitbit Charge HR, which showed 505 steps. Interestingly, another Fitbit (a cheaper model) showed only 486 steps. So you get what you pay for, even within the same company.
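Ranking the devices against the 500 manually counted steps is simple arithmetic; here’s a sketch using the three readings the test actually reported (I’ve labelled the 446 reading “least accurate device” since its name wasn’t given):

```python
# Rank step counters by absolute error against 500 manually counted steps.
# Readings come from the reporter's test; the 446 device is unnamed.

TRUE_STEPS = 500

readings = {
    "Fitbit Charge HR": 505,
    "cheaper Fitbit": 486,
    "least accurate device": 446,  # low end of the 446-513 range
}

for name, steps in sorted(readings.items(),
                          key=lambda kv: abs(kv[1] - TRUE_STEPS)):
    err = abs(steps - TRUE_STEPS)
    print(f"{name}: {steps} steps, off by {err} ({err / TRUE_STEPS:.1%})")
```

Even the worst reading here is off by only about 11 percent over 500 steps, which is why step counting fares so much better than calorie counting.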
Second Test – Heart Rate Measurement
To test heart rate, he pedalled on a stationary bike to get his heart rate up to a consistent 140 beats per minute (bpm). He measured this by actually putting his hand up to his carotid artery and taking his pulse – old school. He did this two days in a row, again being more scientific than I would have been.
The different heart rate monitors all appeared to have their own issues. Some needed time to catch up, even though his heart was already rocking at 140 bpm. Others would flip between a low reading like 90 and something closer to his manual count, like 130.
As was expected at the time, the heart rate tracking was the least reliable of the three tests he did. The closest devices were the two Apple Watches, which read 137 and 134 bpm, both pretty darn close to the manually counted number.
Third Test – Total Distance Travelled
To test distance, he got on a treadmill and walked half a mile, which doesn’t seem like much, but even such a minimal distance was enough to see big deviations in the data.
What’s kind of cool is that since he was wearing multiple watches at the same time, he actually could see the distance totals diverge right in front of his eyes. The more he walked, the more the watches showed their inaccuracies.
In the end, the Withings Pulse O2 apparently nearly matched what the treadmill said. Though I would say that the treadmill itself was likely also wildly inaccurate (ones in public gyms usually are) and if that’s the case, then who knows which device was actually “right.”
In the end, what I take away from all of this is that maybe the best thing to do is to simply use these devices for relative comparisons. You need to commit to one device and stick with it; it’s the direction of the trend that matters more than the number itself. Just like a daily weigh-in on your bathroom scale, the number you see each day isn’t as important as the overall trend.
I don’t know about you but when I weigh myself, I use the same scale that I have had for 7 years. I get out of bed, go to the bathroom, empty my bladder, and before I have my coffee or breakfast, I strip down to my birthday suit and get on the scale. I aim to be as consistent as possible and don’t concern myself with the actual pounds, kilos or percentages but rather I wait until the data is uploaded to my fancy charting software to see which way I am heading.
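That “watch the trend, not the number” habit is easy to automate. Here’s a minimal sketch: smooth daily weigh-ins with a seven-day moving average and check which way the average is heading. The weights are made-up sample data, not mine.

```python
# Smooth noisy daily weigh-ins with a 7-day moving average and
# report the direction of the trend. Sample data is invented.

weights = [82.4, 82.9, 82.1, 82.6, 82.0, 82.3, 81.8,
           82.1, 81.6, 82.0, 81.4, 81.7, 81.2, 81.5]

def moving_average(values, window=7):
    """Trailing moving average; one value per day once the window fills."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

smoothed = moving_average(weights)
trend = "down" if smoothed[-1] < smoothed[0] else "up or flat"
print(f"first week avg {smoothed[0]:.1f}, "
      f"latest {smoothed[-1]:.1f}, trending {trend}")
```

Any one morning’s reading bounces around, but the smoothed line tells you whether you’re actually moving in your desired direction, whichever scale you own.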
To further illustrate this, I recently had two separate DEXA scans done, and both times the technician at the lab said they had never, in all their years of doing these scans, seen someone with a single-digit body fat percentage. Meanwhile, a friend of mine had magnetic resonance imaging (MRI) done that showed he was somewhere close to 3 percent body fat. And to drive the point home even more, my own measurements from my Tanita electrical impedance scale and the DEXA scan differed by 3 percent within a three-hour time frame.
What the Makers Say About This
Even a spokesperson from Garmin said that it’s about an individual’s relative gain and that Garmin activity trackers are designed to help users develop healthy habits and motivate them to beat yesterday.
A representative of Fitbit stood by the company’s research and product testing but cautioned against putting too much stock in the exact figures.
Polar pointed out that physical activity comes in many forms, each of which can provide benefits, and that what’s important to remember is that we’re not tracking steps just for the sake of tracking them, but ultimately to achieve a better fitness result.
Another company (which declined to be named) suggested that there’s always going to be a variation for each statistic. For steps, some devices treat hand movements differently, so if people wave their hands a lot during the day, they might get credited differently depending on the device.
So, I guess it kind of goes without saying, but these tests are pretty big strikes against the wearables, at least for energy burn tracking. So if you’re doing your best to count every last calorie (which I honestly don’t advise), definitely don’t take your wearable’s word for it.
Whether or not my Garmin, Apple Watch, or my bathroom scale is truly giving me accurate data, I am still more interested in seeing my numbers improve than I am in putting a lot of stock in whether I burned precisely 400 calories, ran exactly 3.10686 miles (5k) or that fat truly makes up 12.23 percent of my body. When I see my scale display “11 percent,” I will celebrate – not because of the actual number itself but because I am moving in my desired direction.