We recently noticed a difference in how the steps data are stored and downloaded from the server. When downloading the passive data, we get multiple “step counts” per day for each participant. Could anyone provide insight into how the steps data are recorded (i.e., how often per day) and whether these values should be summed to get a daily total?
Absolutely. Currently, mindLAMP gets step count data in two different ways: one method is by using the phone’s pedometer, and the other method is pulling step data directly from Apple Health (if it’s an iPhone) or Google Fit (if it’s an Android). These values are both of type ‘step_count’ and unit ‘count’, but they have different source values. For data from the pedometer, source will be null. Data from Apple Health will have source as a string beginning with ‘com.apple.health’.
Apple Health and Google Fit step count values are not cumulative, but the pedometer values are. So, to get total steps in a day, you can either sum all of the Apple Health or Google Fit values, or take the maximum pedometer value.
We recognize this is a bit confusing, and have actually been working with our app developer to make the raw data more intuitive by adding ‘pedometer’ as source for pedometer values instead of ‘null’. This change will be made when the updated version of the app is released, but I hope this helps for now.
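As a rough illustration of the rule above (sum the Apple Health / Google Fit values, take the maximum of the pedometer values), here is a minimal Python sketch. The record layout is an assumption: each record is a dict with a `timestamp` in milliseconds since the epoch, a `source`, and a numeric `value`. The function name `daily_step_totals` is hypothetical, and days are bucketed in UTC for simplicity, which may not match a participant's local day.

```python
from datetime import datetime, timezone


def daily_step_totals(records):
    """Return {day: {"pedometer": max_value, "health": summed_value}}.

    Assumed record layout (not confirmed by the thread):
    {"timestamp": <ms since epoch>, "source": <str or None>, "value": <number>}
    """
    totals = {}
    for rec in records:
        # Bucket by UTC calendar day (assumption; local time may be preferable).
        day = datetime.fromtimestamp(
            rec["timestamp"] / 1000, tz=timezone.utc
        ).date().isoformat()
        entry = totals.setdefault(day, {"pedometer": None, "health": None})
        value = rec["value"]
        # Pedometer rows have a null source today; 'pedometer' after the update.
        if rec.get("source") in (None, "pedometer"):
            prev = entry["pedometer"]
            # Cumulative counter: the daily total is the largest value seen.
            entry["pedometer"] = value if prev is None else max(prev, value)
        else:
            prev = entry["health"]
            # Apple Health / Google Fit rows are increments: add them up.
            entry["health"] = value if prev is None else prev + value
    return totals
```

Keeping the two sources in separate fields, rather than merging them, makes it easy to compare them day by day before deciding which one to analyze.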
We expect the two sources to be similar, but because the values are calculated by the devices and by Apple’s/Android’s code, it’s hard to say whether one is more accurate than the other.
We recently looked at the steps data for one participant across both the pedometer and Apple Health, and noticed that the values are quite different from one another. I’ve attached a sample of what this data looks like below.
Would anyone have a recommendation as to which data source we should move forward with?
It may be useful to test on yourself and use a third device as the ground truth, as it is hard to know which source is correct, or whether both are. Apple may also update its code/results without announcement.
Thanks,
John
Following up on the above, we have been analyzing this data and found that there have been many changes to how the steps data are stored. As a result, there are cases where participants don’t have a column for “unit” (e.g. step count, floors descended/ascended).
Additionally, since the update indicating whether the source is the pedometer or Apple Health, we’ve noticed discrepancies in the data collected across these two sources.
This leads us to the following questions:
If there is no “unit” for the steps sensor, can we assume it is step count data rather than floors descended/ascended?
Given the discrepancy in data collected on a given day across the pedometer and Apple Health sources, is there a recommendation for which source is more reliable, or for how we might use both in an analysis?
Please find attached a file illustrating cases where we see these discrepancies in the data. Appreciate any insights. passive_data_issues.pdf (367.5 KB)
Thank you for your response! Regarding the discrepancy between the two sources: if we focus on Apple Health as the source, we run into missing data, because some participants have only pedometer data on days where Apple Health data is missing, and only Apple Health data on days where pedometer data is missing.
Would you have a suggestion for how we might handle that in our statistical analyses?
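Before choosing a strategy, it may help to quantify how often each source is present. The sketch below only describes coverage; it does not recommend a statistical approach. It assumes the daily totals have already been computed into a dict keyed by day, with hypothetical `pedometer` and `health` fields that are `None` when that source has no data for the day.

```python
from collections import Counter


def source_coverage(daily_totals):
    """Label each day by which step sources have data, and count the labels.

    Assumed input (hypothetical layout):
    {day: {"pedometer": <number or None>, "health": <number or None>}}
    """
    labels = {}
    for day, entry in daily_totals.items():
        has_pedometer = entry.get("pedometer") is not None
        has_health = entry.get("health") is not None
        if has_pedometer and has_health:
            labels[day] = "both"
        elif has_pedometer:
            labels[day] = "pedometer_only"
        elif has_health:
            labels[day] = "health_only"
        else:
            labels[day] = "neither"
    # Counter gives the number of days falling under each label.
    return labels, Counter(labels.values())
```

Days labeled "both" are the ones where the two sources can be compared directly, which is useful for judging whether one could reasonably stand in for the other.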
I’ve been working with the steps data and have run into an issue where, for some subjects, the “type” column does not specify “step_count”. To identify step counts, I have relied on rows where the “unit” column is “count” and where there are no values under the floors ascended/distance columns.
These step values come from Apple Health, and I was summing them to determine the total steps in a day. I’ve pulled an example of what the data looks like for one subject on a given day; you’ll notice that row 22, timestamped 15:38:52, has 569 steps, and row 23, timestamped 15:38:55, has 604 steps.
I’m hesitant to sum these values given how close together these two time points are, and I’m unsure whether it’s an issue with the sensor.
Any suggestion as to how to best process these data is kindly appreciated.
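One way to inspect cases like the 15:38:52 / 15:38:55 pair before summing is to flag consecutive readings that fall within a few seconds of each other. This is only a sketch: the function name `flag_close_readings` and the 5-second threshold are assumptions, and flagging a pair does not establish that either reading is a duplicate or a sensor error.

```python
from datetime import datetime, timedelta


def flag_close_readings(readings, max_gap_seconds=5):
    """Return pairs of consecutive readings no more than max_gap_seconds apart.

    readings: list of (datetime, steps) tuples (hypothetical layout).
    The threshold is arbitrary and should be tuned to the data.
    """
    ordered = sorted(readings, key=lambda r: r[0])
    max_gap = timedelta(seconds=max_gap_seconds)
    flagged = []
    # Compare each reading with the one that immediately follows it in time.
    for earlier, later in zip(ordered, ordered[1:]):
        if later[0] - earlier[0] <= max_gap:
            flagged.append((earlier, later))
    return flagged


# Example using the two readings from the post (the date is made up).
example = [
    (datetime(2021, 1, 1, 15, 38, 52), 569),
    (datetime(2021, 1, 1, 15, 38, 55), 604),
]
suspicious = flag_close_readings(example)
```

Reviewing the flagged pairs across several participants may show whether this pattern is rare or systematic, which would inform whether to sum them, keep one, or exclude them.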