Trial for MindLAMP

Hi Luke,

Thanks so much for your help and patience with all these emails; I am not the best with Python, but I'm trying my best!

I had one quick question: I was able to save the data and convert the UNIX timestamps, but only 1,000 rows are showing up. I wanted to clarify whether there are indeed only 1,000 rows total in the dataset, or whether for some reason it is not displaying/saving the total number of rows. I increased the time window from Nov 1 to late December, but that doesn't seem to add any more rows to the dataset. This is what it looks like on my end:

Thanks again!


Hi Tess,

That’s an excellent question and one I probably should have clarified.

By default, the LAMP API returns the most recent 1,000 data points within a given window; this is to prevent unnecessary load on the database and the server handling these requests. By adding a "_limit" parameter you can request more data.

If you need to query an entire dataset for a time period that exceeds 1,000 points, the best way is to do so recursively. In LAMP, after you make your call, take the timestamp of the last point returned (data is returned newest-first) and make a new call, replacing the "to" parameter with that final timestamp. This returns the next 1,000 points, and you can repeat the process in a while loop until you reach your starting timestamp (or receive 0 points of data).
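A minimal sketch of that loop, assuming data comes back newest-first as described above. Here `fetch` stands in for the LAMP call, e.g. `lambda to_ts: LAMP.SensorEvent.all_by_participant(participant_id, origin=origin, _from=start_ts, to=to_ts)['data']`; these names follow this thread rather than official documentation, and the sketch assumes "to" is an exclusive bound.

```python
def fetch_all_events(fetch, start_ts, end_ts, page_size=1000):
    """Collect every event between start_ts and end_ts, one page at a time.

    `fetch(to_ts)` should return at most `page_size` events newest-first,
    each a dict with a 'timestamp' key, all strictly older than to_ts.
    """
    all_events = []
    to_ts = end_ts
    while True:
        chunk = fetch(to_ts)
        all_events.extend(chunk)
        if len(chunk) < page_size:   # short page: the window is exhausted
            break
        oldest = chunk[-1]['timestamp']
        if oldest <= start_ts:       # reached the start of the window
            break
        to_ts = oldest               # next page ends where this one stopped
    return all_events
```

If the server treats "to" as inclusive instead, subtract 1 from `oldest` before reusing it so the boundary point is not fetched twice.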

I hope this helps. I may have some code somewhere demonstrating this, so let me know if you would like me to send it.

Best,

Luke


Hi Luke,

Perfect, thanks so much for clarifying! I tested out _limit=0, and for one participant it came back with 258k rows, which is great. I'm sure I can figure out some (very ugly) code to get the rest if needed, but thanks again for your help!


A collaborator reported getting unlimited data from a LAMP.SensorEvent Python API call when using the _limit=0 parameter.

This is expected behavior! _limit=0 should return unlimited data: +0 returns it in the default sort order and -0 in the reverse sort order, while null resets sorting and limiting to their default values. Note that using _limit=0 can be dangerous for the server at this time.

Hello, may I ask whether this behavior has been changed? I couldn't find anything about the _limit parameter in the LAMP documentation, and if I call my server with _limit=0, I get 0 rows back. Changing the limit to a higher number works as intended, however.

Hello,

Thank you for bringing this to our attention. Our development team is currently investigating this issue. We will update you when we have completed the investigation.

I do not appear to be getting zero data when I am running _limit=0. Would you mind sharing a snippet of your code for confirmation?

Hello,

Thank you for your quick response. This is a code snippet I use to download device_state sensor data for each study participant and save it to a .csv file:

import sys
from datetime import datetime

import pandas as pd
import LAMP

data = []
events = LAMP.SensorEvent.all_by_participant(sys.argv[1], origin='lamp.device_state', _limit=0)['data']
print("device_state " + sys.argv[1] + " " + str(len(events)))
for event in events:
    timestamp = int(event['timestamp'])
    if 'state' in event['data']:
        data.append({
            'timestamp': timestamp,
            'UTC time': datetime.utcfromtimestamp(timestamp / 1000),
            'screen_state': event['data']['state']
        })
    else:
        data.append({
            'timestamp': timestamp,
            'UTC time': datetime.utcfromtimestamp(timestamp / 1000),
            'screen_state': event['data']['value'],
            # 'screen_state_info': event['data']['valueString']
        })
if len(data) > 0:
    pd.DataFrame.from_dict(data, orient='columns').to_csv(f"screen_state_data/{sys.argv[1]}.csv", index=False)

Are you certain this participant has data? Is the result the same for other sensors (origins)?

Yes, I've tried it for every participant and for every sensor; if I set _limit=0, I get this output (the last number is the number of rows received):

gps U6782144162 0
gps U0712271508 0
gps U8057709568 0
gps U6971786273 0
gps U6769391717 0
gps U3748598636 0
accelerometer U6782144162 0
accelerometer U0712271508 0
accelerometer U8057709568 0
accelerometer U6971786273 0
accelerometer U6769391717 0
accelerometer U3748598636 0
device_state U6782144162 0
device_state U0712271508 0
device_state U8057709568 0
device_state U6971786273 0
device_state U6769391717 0
device_state U3748598636 0
nearby_device U6782144162 0
nearby_device U0712271508 0
nearby_device U8057709568 0
nearby_device U6971786273 0
nearby_device U6769391717 0
nearby_device U3748598636 0
telephony U6782144162 0
telephony U0712271508 0
telephony U8057709568 0
telephony U6971786273 0
telephony U6769391717 0
telephony U3748598636 0
sleep U6782144162 0
sleep U0712271508 0
sleep U8057709568 0
sleep U6971786273 0
sleep U6769391717 0
sleep U3748598636 0
device_motion U6782144162 0
device_motion U0712271508 0
device_motion U8057709568 0
device_motion U6971786273 0
device_motion U6769391717 0
device_motion U3748598636 0

If I change the limit variable to _limit=1000, I get the expected output:

gps U6782144162 1000
gps U0712271508 1000
gps U8057709568 1000
gps U6971786273 1000
gps U6769391717 1000
gps U3748598636 1000
accelerometer U6782144162 1000
accelerometer U0712271508 1000
accelerometer U8057709568 1000
accelerometer U6971786273 1000
accelerometer U6769391717 1000
accelerometer U3748598636 1000
device_state U6782144162 1000
device_state U0712271508 897
device_state U8057709568 1000
device_state U6971786273 1000
device_state U6769391717 1000
device_state U3748598636 1000
nearby_device U6782144162 0
nearby_device U0712271508 0
nearby_device U8057709568 0
nearby_device U6971786273 0
nearby_device U6769391717 0
nearby_device U3748598636 0
telephony U6782144162 97
telephony U0712271508 9
telephony U8057709568 211
telephony U6971786273 512
telephony U6769391717 820
telephony U3748598636 775
sleep U6782144162 0
sleep U0712271508 0
sleep U8057709568 0
sleep U6971786273 0
sleep U6769391717 0
sleep U3748598636 0
device_motion U6782144162 1000
device_motion U0712271508 496
device_motion U8057709568 1000
device_motion U6971786273 1000
device_motion U6769391717 1000
device_motion U3748598636 1000

Have you tried checking the output after limiting the available data using the "_from" and "to" parameters? Additionally, is the result the same if you check the output for individual participants outside the for loop? For example, what is the output of LAMP.SensorEvent.all_by_participant('U6782144162', origin='lamp.telephony', _limit=0)? What you're describing is not expected behavior, and I'm wondering if it may be memory related.

I’ve modified the script using the “_from” and “to” parameters:

events = LAMP.SensorEvent.all_by_participant("U6782144162", origin='lamp.device_state', _limit=0, _from=0, to=1665244976)['data']
print(events)

and I’ve called this outside the for loop using the participant U6782144162 for the “device_state” sensor and the output was an empty array.

Next I’ve tried removing the “_limit” parameter altogether and running it just with the “_from” and “to” parameters:

events = LAMP.SensorEvent.all_by_participant("U6782144162", origin='lamp.device_state',_from=0, to=1665244976)['data']
print(events)

and the result was again an empty array. This was unexpected. If I understand correctly, this should’ve returned the latest 1000 rows of data. Did I misunderstand the “_from” and “to” parameters?

Just to be sure, I removed the "_from" and "to" parameters too and ran it again with:

events = LAMP.SensorEvent.all_by_participant("U6782144162", origin='lamp.device_state')['data']
print(events)

and the output was very long, as expected. I won't copy it all here, just a few lines to illustrate:

[{'timestamp': 1662509226582, 'sensor': 'lamp.device_state', 'data': {'battery_level': 0.2, 'representation': 'locked', 'value': 2}}, {'timestamp': 1662509226577, 'sensor': 'lamp.device_state', 'data': {'battery_level': 0.2, 'representation': 'screen_off', 'value': 1}}, ...

The fact that _from=0 returns an empty array isn't surprising; it asks the API to query all data from the last 50 years. What I meant was: could you modify "_from" and "to" so that the time frame is very short (say, from the timestamp of the last data point n back to n minus a quarter of a day)? If you run that with _limit=0, we can check whether it's a server error by ensuring the amount of data requested is not too high.
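For concreteness, the quarter-day window suggested above can be computed like this (a sketch; it assumes LAMP timestamps are Unix milliseconds, as the outputs in this thread suggest):

```python
QUARTER_DAY_MS = 24 * 60 * 60 * 1000 // 4  # a quarter of a day, in milliseconds

def quarter_day_window(newest_ts_ms):
    """Return (_from, to) bounds covering the quarter day before newest_ts_ms."""
    return newest_ts_ms - QUARTER_DAY_MS, newest_ts_ms
```

The two bounds would then be passed as the "_from" and "to" arguments of the LAMP.SensorEvent.all_by_participant call together with _limit=0.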

Shouldn't it return all of the data then? I know it is a bit excessive to ask for 50 years of data when I want data from the last 3 months, but shouldn't it still return everything? Or is there a lower bound for the "_from" parameter? What I thought was that I could use this to download all of the data from our study, for example by setting "_from" to the first of June and "to" to today. Is my assumption correct?
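If the parameters work that way, converting calendar dates to the millisecond timestamps passed as "_from" and "to" might look like the following sketch (it assumes the timestamps are UTC Unix milliseconds; the dates are illustrative):

```python
from datetime import datetime, timezone

def to_lamp_ms(dt):
    """Convert a naive UTC datetime to a Unix timestamp in milliseconds."""
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)

_from = to_lamp_ms(datetime(2022, 6, 1))   # first of June
to = to_lamp_ms(datetime(2022, 10, 8))     # "today" in this example
```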

As for the testing part, I set the “_from” parameter to 1662494408799 and the “to” parameter to 1662505397553 and I downloaded the gps data for participant U6782144162:

events = LAMP.SensorEvent.all_by_participant("U6782144162", origin='lamp.gps', _from=1662494408799, to=1662505397553, _limit=0)['data']
print(events)

with the result being an empty array. I then removed the _limit=0 constraint and ran it again:

events = LAMP.SensorEvent.all_by_participant("U6782144162", origin='lamp.gps', _from=1662494408799, to=1662505397553)['data']
print(events)

this time I got plenty of data:

[{'timestamp': 1662505392895, 'sensor': 'lamp.gps', 'data': {'accuracy': 16.449, 'altitude': 236.89999389648438, 'latitude': 50.0821581, 'longitude': 14.4183016}}, {'timestamp': 1662505388575, 'sensor': 'lamp.gps', 'data': {'accuracy': 16.813, 'altitude': 236.89999389648438, 'latitude': 50.0821471, 'longitude': 14.4182956}},...

So it seems that the issue really is the _limit=0 constraint.

I certainly agree that _limit=0 is causing the issues. However, given that on our end we only encounter this issue for very large requests, I doubt that _limit=0 itself is coded in a semantically incorrect way. Perhaps our machines differ in memory or computational capacity, which is why the behavior is inconsistent. This is why we generally do not recommend using this feature and instead propose the alternative described by @LukeS above.