Cortex performance best practices/guidelines

Hi,
We have a small project that has collected 200 million sensor records in CouchDB (80 GB). Running Cortex on a local machine (with a very fast network connection) takes a long time (hours, even days) to compute SensorEvent-related features (such as significant_locations or sleep_periods). I would really appreciate any suggestions for improving the performance. I suspect the bottleneck might be the network. Is there a way to run Cortex on AWS ML, replicate the DB to a local machine, or create a Python notebook Docker container on the AWS host machine?

Thanks,
Chunlei

I suspect the bottleneck might be the network. Is there a way to run Cortex on AWS ML, replicate the DB to a local machine, or create a Python notebook Docker container on the AWS host machine?

Yes! This is what our team does: we use the LAMP-ide (a customized JupyterLab instance) and deploy it on the same node as the database, or on a node in the same availability zone. Besides improving bandwidth and latency, this reduces cost significantly, since AWS charges for data transfer.
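
If you would rather go the other route from the question and replicate the database to a local machine, CouchDB's built-in replication can be started through its `_replicate` endpoint. Below is a minimal sketch, not a tested recipe for a LAMP deployment: the host names, the `sensor_event` database name, and the credentials are all placeholders I made up for illustration, so substitute your own.

```python
import base64
import json
import urllib.request


def build_replication_request(source_url, target_url, continuous=False):
    """Build the JSON body for CouchDB's POST /_replicate endpoint."""
    return {
        "source": source_url,
        "target": target_url,
        "create_target": True,  # create the target database if it is missing
        "continuous": continuous,
    }


def start_replication(couch_url, body, user, password):
    """POST the replication body to the CouchDB node at couch_url."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"{couch_url.rstrip('/')}/_replicate",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical URLs and database name; the remote source will normally
# also need credentials, which are left out here.
body = build_replication_request(
    "https://remote.example.com:5984/sensor_event",
    "http://localhost:5984/sensor_event",
)

# Uncomment to run against a real local CouchDB node:
# print(start_replication("http://localhost:5984", body, "admin", "password"))
```

With 80 GB of data the initial copy will still take a while, but once it is done the feature computation reads from the local copy instead of going over the network.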
