Princeton's new research data environment offers security, collaboration

Wednesday, Jul 14, 2021
by Eoin O'Carroll, Princeton Research Computing

Led by a research team at Princeton University, the New Jersey Families Study examines the lives of young children using an innovative methodology: The researchers put video cameras inside families’ homes. 

The researchers recruited about 20 households who agreed to host unobtrusive cameras for two weeks. The project, which studies how families prepare their preschool-aged children for school, promises to yield unprecedented insight into the lives of families with young children.

The hundreds of hours of footage also pose a formidable privacy challenge. "We have promised the families that these videos will not get leaked anywhere under any circumstances," says Boriana Pratt, a statistical programmer at the Office of Population Research and the study's Data Manager. "We take that very seriously."

The New Jersey Families Study is one of many research projects at Princeton handling sensitive data. Other projects hold information about people's finances, political attitudes, health, and genetics. With these datasets come tough questions, and tightening regulations, about how to store and manage them safely.

Enter Citadel, Princeton's new secure and compliant research data environment, introduced this year by Princeton Research Computing, a consortium spearheaded by the Princeton Institute for Computational Science and Engineering and OIT Research Computing. Citadel enables researchers anywhere in the world to work with sensitive data while taking strict measures to prevent unauthorized access. Six projects at Princeton currently use the Citadel environment.

"This resource is available for researchers," says Irene Kopaliani, a Research Computing Cloud Architect at Princeton and Citadel's project lead. "Researchers can now propose research topics that were previously impossible to handle at the university."

In the past, researchers tasked with hosting restricted data would often implement a one-off solution. 

"We ended up with a lot of video files stored on hard drives in my office," says Pratt, adding that the files came in a proprietary format that couldn't be viewed without specialized software.

Members of the New Jersey Families Study team who wanted to see the footage had to visit Pratt's office in person and watch it on a standalone computer.

This arrangement went from unwieldy to unworkable when the pandemic hit, says Pratt. A recent graduate had joined the project to help with the data, but the two of them needed to maintain social distancing. "My office isn't very big and we couldn't both be in it at the same time," she says.

Migrating to Citadel solved that problem by allowing secure remote access. Connections to Citadel, whose servers are locked in a Faraday cage at Princeton's High-Performance Computing Research Center, are tightly controlled. Authorized researchers can reach their datasets from their desktop or laptop computers, but the files are never downloaded to those devices. Instead, the data are accessed and manipulated inside virtual machines that are kept isolated by an application called tiCrypt.

Kopaliani, who works closely with a team of systems administrators and data storage experts, likens the system to a laboratory glovebox that allows scientists to manipulate radioactive materials or infectious agents. Researchers can handle the data, but they can't take it out of the box. "It allows research from anywhere," she says. "It doesn't restrict people to being on campus."

"Providing a computing environment that is both scalable and secure is one important way that Princeton continues to support cutting-edge research for its faculty," says Jay Dominick, Princeton's Vice President for Information Technology and Chief Information Officer.

The system has been designed to meet national standards for data security, including NIST SP 800-171, NIST SP 800-53, CMMC Level 3, HIPAA regulations, and standard Data Use Agreement requirements.

"We can meet the needs of any faculty wishing to use a computing environment designed for exacting federal IT Security standards with minimal impact to the researcher," says Dominick. "Data security is already an issue of national importance and providing resources like Citadel is one way we rise to meet the challenge."

Elizabeth Adams, the Director of the Office of Research and Project Administration, notes that more than 80 percent of the research at Princeton is funded by the U.S. federal government, including a range of agencies such as the National Science Foundation, the National Institutes of Health, the Department of Defense and NASA. "With that investment from the federal government comes expectations, particularly regarding the generation, transfer and storage of certain data sets necessary to the research," she says.

Adams emphasizes that Princeton does not undertake any research involving classified information, but that the federal regulations concerning how researchers handle sensitive and controlled (though unclassified) data have become steadily more rigorous over recent years. She also expects that the cybersecurity focus will continue among federal sponsors. "Princeton’s investment in Citadel will support the increasing diversity of our federal research programs as more rigorous data security requirements emerge," she says. "We want to maintain capacity and competitiveness."

Adams also hopes that Citadel will help Princeton diversify its nonfederal research portfolio, including collaborations with industry, hospitals and foreign sponsors. "[Citadel] opens up avenues for important research at the University with different sectors of society that wouldn't be open otherwise," she says.

MD/PhD student Chloe Cavanaugh works in the Notterman Lab. The lab's main focus is the Fragile Families and Child Wellbeing Study, a longitudinal study of urban families now in its 23rd year. Credit: Alexandre Mason-Sharma.

Citadel allows scientists to secure their data without hoarding it. "Not only do we have a responsibility to protect the data, we also have a responsibility to share the data," says Dan Notterman, a Professor of the Practice in Princeton's Department of Molecular Biology. "You don't have to make it so difficult that nobody will use it."

A pediatrician, Notterman is the chair of Princeton's Institutional Review Board for Human Subjects Research and serves as one of the principal investigators of the Fragile Families and Child Wellbeing Study. That project, a joint effort by Princeton and Columbia universities, has amassed a longitudinal dataset of nearly 5,000 urban children and is currently migrating to Citadel.

Notterman sees the need for secure environments like Citadel as another sign of the extent to which data analysis has transformed science. 

"The systematic computational analysis of high-dimensional or large datasets," he says, "permeates everything that we scientists do. Everything."

"As a physician-scientist, I'm acutely aware of the need to protect sensitive data," says Notterman. "I certainly feel a sense of great responsibility." 

Of course, most research data does not require an environment as secure as Citadel's. But with the system in place, researchers at Princeton now have the option to take on projects that they might have otherwise had to pass up.