Solution for Multimodal Naturalistic Data Analysis

Our team contributed to building a first-of-its-kind repository of multimodal naturalistic data tailored for transformative scientific research in the social, behavioral, and cognitive sciences.
Customer:
Hasson Lab at Princeton University
A first-of-its-kind repository of multimodal naturalistic data
  • Business Need

    The Hasson Lab had made significant progress in its studies of naturalistic data but ran into a technical issue: the data volumes were massive, and the existing system couldn’t process them at scale, which was unacceptable for the kind of science the Client wanted to pursue.

  • Result

    Sigma Software’s data engineers helped the Client overcome this technology challenge. The resulting system not only addresses Hasson Lab’s needs but also serves as a blueprint for building more datasets in the social, behavioral, and cognitive sciences.

Sigma Software has provided us with the machinery to access the 1kD dataset and unlock its value.
Liat Hasenfratz
Co-director
Hasson Lab

Collaboration overview

Key Facts

Hasson Lab, in collaboration with Psychology Professor Casey Lew-Williams and Princeton’s Baby Lab, has launched the First 1000 Days project, uniting specialists in the life sciences, engineering, and computer science. The project studies what affects children’s brain development in their first three years. The Hasson Lab team created a method that optimizes the research and streamlines the analysis of 100 years’ worth of video data from the first thousand days of newborns in 15 different families.

One of the Client’s challenges was creating a cloud infrastructure capable of processing the naturalistic data at scale and supporting further studies. It was decided to re-architect the existing algorithms to:

  • Facilitate integration of diverse datasets for faster data ingestion
  • Streamline data collection and ensure efficient management and secure storage of third-party service data
  • Optimize data processing and analysis with a new framework for parsing and labeling the collected data
  • Elevate the analytical capabilities of the system to extract valuable insights from the data

The first version of the Data Platform was originally built using the AWS technology stack. We made sure to keep the advantages of the existing framework and added a few important updates to ensure the system meets the Client’s current and future needs while remaining optimal in costs.

The improvements yielded strong results: historical processing speed for one algorithm increased by 300%. Moreover, all the data is now stored in a single source of truth, which both enhances analytical accuracy and reduces storage costs by nearly 50%.

The key updates were as follows:

  • Implemented AWS Glue to improve the efficiency of the core processing framework responsible for handling large volumes of video data
  • Used CloudWatch, CloudTrail, and EventBridge to strengthen the system’s monitoring capabilities with advanced alert mechanisms and reduce error response time within the data flow
  • Leveraged AWS services to configure data security, integrity, and consistency in line with data best practices and data security standards
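
As one illustration of the alerting described above, an EventBridge rule can react to failed AWS Glue job runs. The sketch below is an assumption about how such a rule might look, not the project's actual configuration. The constant `GLUE_FAILURE_PATTERN` and the helpers `matches` and `pattern_json` are hypothetical names, and `matches` implements only the exact-value subset of event pattern matching so the pattern can be checked locally without an AWS account.

```python
import json

# Event pattern for AWS Glue job runs that failed or timed out.
# Which states to alert on is an assumption made for this sketch.
GLUE_FAILURE_PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Job State Change"],
    "detail": {"state": ["FAILED", "TIMEOUT"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified local check: every pattern key must be present in the
    event, and the event's scalar value must be one of the listed values.
    Nested dicts are matched recursively."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def pattern_json() -> str:
    """Serialize the pattern to the JSON string form that rules expect."""
    return json.dumps(GLUE_FAILURE_PATTERN)
```

The JSON string returned by `pattern_json()` is in the form accepted by the `EventPattern` parameter of the EventBridge `put_rule` call; the rule's target (for example, a notification topic) would be configured separately.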

Since the Hasson Lab team operates in the sensitive healthcare sector and works with vulnerable naturalistic data, the main priority was to protect the user data they work with. This meant ensuring that their data storage complies with data security standards, including the CCPA (California Consumer Privacy Act) and the U.S. Privacy Act of 1974.

Apart from ensuring compliance and mitigating data security risks, we also worked on building the three main pillars of sound and efficient data operations: data observability, reliability, and durability. To achieve those, we performed the following steps:

  • Extended the data monitoring framework with additional metrics, notifications, and alerts to prevent data loss and minimize latency
  • Implemented retry policies for components that encounter failures or errors and set up a rollback process for critical components to eliminate data inconsistencies and duplication
  • Configured cross-region data replication to protect data integrity in cases of regional data deletion or unavailability
  • Applied Object Lock to every file to prevent accidental or deliberate file deletions
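
The retry and rollback step in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the function name `run_with_retries`, the default attempt count, and the delays are hypothetical, and the optional `rollback` callback stands in for whatever cleanup a critical component would need.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retries(
    step: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    rollback: Optional[Callable[[], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `step`, retrying with exponential backoff on failure.

    If every attempt fails, call `rollback` (when given) so that partial
    writes can be undone, then re-raise the last error.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                if rollback is not None:
                    rollback()
                raise
            # Wait 1x, 2x, 4x ... the base delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the sketch easy to test, since a test can record the requested delays instead of actually waiting. Rolling back only after the final failed attempt is one possible design; a component whose partial writes would make a retry unsafe might instead roll back before each retry.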

Testimonials

Sigma has provided us with the machinery to access the 1kD dataset and unlock its value. We expect it will serve us and generations of researchers - from various fields such as computer science, psychology, linguistics, sociology, economics, and neuroscience. We also believe that the batch system solutions provided by Sigma Software will serve as roadmaps for the creation of more naturalistic datasets in the social behavioral and cognitive sciences.
Liat Hasenfratz, Co-director at Hasson Lab

This work is of exceptional scientific importance, addressing challenges of a magnitude yet to be resolved. It's a privilege to collaborate with such amazing people.
Stanislav Samko, Account and Program Manager at Sigma Software
