
CI Compass leads archiving, long-term data preservation conversation at 2024 NSF Research Infrastructure Workshop

As scientific data at U.S. National Science Foundation (NSF)-funded mid-scale and major facilities continues to grow exponentially with the help of advanced instrumentation and increased computing power, the challenge of preserving, archiving, and keeping that data accessible grows with it. In March, NSF CI Compass continued the NSF Major Facilities (MF) cyberinfrastructure conversation on efforts and best practices for archiving and long-term preservation of scientific data at the 2024 NSF Research Infrastructure Workshop (RIW). The focus topic was developed in direct response to feedback that CI Compass received during its 2024 Cyberinfrastructure for NSF Major Facilities (CI4MF 2024) workshop.

During CI4MF, the FAIR Data (Findable, Accessible, Interoperable, and Reusable) Topical Working Group organized a panel titled “Major Facilities’ Approach to Open Science,” where questions about the greater need to focus on archiving and preservation were posed.

Members of the CI Compass team in attendance at the 2024 NSF RIW stand in a courtyard, with plants around them and a yellow stucco building behind them, smiling.
The CI Compass team in attendance at the 2024 NSF RIW included: Angela Murillo, Ewa Deelman, Don Brower, Christina Clark, Anirban Mandal, and Nicole Virdone. (Photo/Nicholas Deelman)

CI Compass participated in the NSF Research Infrastructure Workshop in 2022 and 2023, as well. CI Compass leadership welcomes the continued opportunity to work with the NSF Research Infrastructure Office, the NSF Office of Advanced Cyberinfrastructure, and the research facilities on-site, as well as the NSF center dedicated to cybersecurity, Trusted CI, to continue facilitating progress.

“Continuing to connect with cyberinfrastructure practitioners across the wide spectrum of research infrastructures in the NSF’s science and engineering ecosystem is core to CI Compass’s mission,” said Ewa Deelman, Director of CI Compass, research professor of computer science and principal scientist at the University of Southern California Information Sciences Institute. “The data lifecycle in each research facility continues to change and challenge technologies and cyberinfrastructure professionals alike. At RIW, we focused particularly on data archiving. We want to continue seeking best community practices and collaborate on solutions to these challenges.”

Archiving and Long-term Preservation of Scientific Data Panel

Since August 2022, when the White House Office of Science and Technology Policy (OSTP) released “Ensuring Free, Immediate, and Equitable Access to Federally Funded Research,” commonly known as the “Nelson Memo,” cyberinfrastructure practitioners and data managers have been working on plans to comply with the memo’s December 31, 2025, deadline. The memo was issued by Alondra Nelson, Deputy Assistant to the President and Deputy Director for Science and Society, Performing the Duties of Director.

Don Brower, CI Compass FAIR data expert, research assistant professor, and computational scientist at the University of Notre Dame, moderated a panel as a part of the NSF RIW. The panel, “Archiving and Long Term Preservation of Scientific Data: Considerations, Approaches, Challenges, and Best Practices,” was hosted on the first day of the NSF RIW. Angela Murillo, CI Compass co-principal investigator and director of the student fellowship program, worked on putting the panel and questions together.

“From CI Compass’s past work and the discussions we have at the FAIR Data Topical Working Group each month, we know these complex topics are on the mind of researchers at major facilities,” said Murillo. “By fostering a discussion specifically on long-term data preservation, we hope to inspire facilities to begin addressing this big challenge by sharing experiences, resources, and potential solutions with each other.”

Joining the panel was a group of cyberinfrastructure practitioners concerned with data preservation, including Bruce Berriman, senior astronomer and data scientist, Infrared Processing and Analysis Center (IPAC), California Institute of Technology; Adam Bolton, astrophysicist and data scientist, director, Community Science and Data Center (CSDC), NSF NOIRLab; and Jeannette Jackson, managing director, Research Data Ecosystem, University of Michigan.

Jackson spoke about the importance of data preservation, which allows scientists to keep making discoveries in existing datasets, and of keeping abreast of what is needed to keep data available for the long term, especially as instruments and systems continually evolve.

Adam Bolton, Jeannette Jackson, and Bruce Berriman sit at a panelist table speaking into microphones at 2024 NSF RIW.
Adam Bolton, Jeannette Jackson, and Bruce Berriman gave presentations and participated in a discussion that sparked conversations past the session’s window during the NSF CI Compass-led “Archiving and Long-term Preservation of Scientific Data” panel at the 2024 NSF RIW. (Photo/Christina Clark)

“The data being produced in the infrastructure world is going to be important for the ongoing ability to connect it to different data types. Scientists can and will come up with new questions making possible new scientific discoveries in ways that were not possible before,” Jackson said.

“Make no small plans” was one of Berriman’s leading thoughts on data and archive management, especially in responding to continual changes in science requirements from agencies like NASA and NSF and in continuing to support and advance the technology that keeps data safe and accessible.

“The NSF Rubin Observatory will report around 10 million transient alerts per night, and those alerts require rapid follow-ups,” Berriman said.

Bolton brought another astronomy-focused perspective to the panel. As human knowledge, instrumentation, and software continue to advance, new revelations appear in old datasets.

“Scientists can make big discoveries within our data archives,” Bolton said.

Bolton referenced the “killer asteroids hiding in large catalogs” discoveries announced in May 2022 and publicized in the New York Times. The B612 Foundation, a nonprofit research group, developed a computational program and applied it to the multi-petabyte NOIRLab archive to discover previously unknown celestial bodies and asteroids hiding in existing catalogs and images.

After the formal panel discussion, audience questions addressed data usage models and evolving trends, data consent and usage in the health and social science spaces, and the current limitations of existing data management software.

The discussion for the panel continued past the official end of the session, prompting more connections to be made between centers, facilities, and educators.

“Managing metadata is crucial to a successful archive for self-documentation and data discovery, though it is often difficult and time consuming,” said Berriman.

“Facilities are facing challenges with legacy systems and their unique instrumentation made to collect and store very specific types of data,” said Brower. “We want to ensure that best practices are considered as the next versions of those systems are created. The next versions need to support continued scientific research, comply with federal data mandates, and help to continue making discoveries and pushing innovations.”

Cyberinfrastructure Workshop

As the NSF Research Infrastructure Workshop’s Cyberinfrastructure Track kicked off, three more sessions were hosted by organizers outside of the Archiving and Long-term Preservation of Scientific Data panel.

Three people (Nicole Virdone, Christina Clark, and Anirban Mandal) stand in front of posters at NSF RIW.
Nicole Virdone, Christina Clark, and Anirban Mandal stand with a CI Compass poster at the 2024 NSF RIW in Tucson, Arizona. (Photo/Angela Murillo)

Two sessions focused on cyberinfrastructure and ecosystems that exist outside of the NSF research infrastructure. These sessions presented perspectives and approaches from agencies other than NSF, both to challenge existing processes and to highlight similar situations across cyberinfrastructure facilities.

Debbie Bard, data department head at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory, presented a talk titled “Department of Energy (DOE) Integrated Research Infrastructure (IRI) and the Advanced Scientific Computing Research.”

The second talk was titled “Overview of Canada Foundation for Innovation Research Infrastructure and Some of the Shared Data Challenges,” with both Mark Legace, director of programs, and Claudia Fall, associate director for research facilities, representing the Canada Foundation for Innovation.

The final session of the Cyberinfrastructure Track was titled “Digital Backbone: Navigating the Research Infrastructure Guide (RIG) Revisions to Cyberinfrastructure and Cybersecurity.” The session was led by Bill Miller, senior advisor for cyberinfrastructure, Office of Advanced Cyberinfrastructure, NSF; Michael Corn, cybersecurity advisor for research, Office of the Director, NSF; and Alison Rockwell, research infrastructure advisor, Research Infrastructure Office, NSF.

Poster Sessions

Two men are pictured at the 2024 NSF RIW poster session. One is looking away from the camera, while the other, Don Brower, is smiling mid-conversation.
CI Compass’s Don Brower discusses the FAIR TWG with an attendee at the 2024 NSF RIW poster session. (Photo/Christina Clark)

After the Cyberinfrastructure Track sessions, CI Compass took part in an in-person poster session where CI Compass leadership was on-site to discuss CI Compass’s objectives, the FAIR Data Topical Working Group, and the CI Compass Fellowship Program (CICF).

CI Compass had further discussions and meetings during the poster session to bring new NSF mid-scale and MF partners into collaborations with the center, and to understand their needs and concerns about future workforce development and data lifecycle management.

The 2025 NSF Research Infrastructure Workshop has not yet been announced. More information and announcements about events like this workshop can be found at researchinfrastructureoutreach.org.

Update on June 24, 2024: The videos from the 2024 NSF Research Infrastructure Workshop are now available to review on the NSF Research Infrastructure Knowledge Sharing Gateway.


About CI Compass

CI Compass is funded by the NSF Office of Advanced Cyberinfrastructure in the Directorate for Computer and Information Science and Engineering under grant number 2127548. Its participating research institutions include the University of Southern California, Indiana University, Texas Tech University, the University of North Carolina at Chapel Hill, the University of Notre Dame, and the University of Utah.

To learn more about CI Compass, please visit ci-compass.org.

Contact: Christina Clark, Research Communications Specialist
CI Compass / Notre Dame Research / University of Notre Dame / cclark26@nd.edu / 574.631.2665
ci-compass.org / @cicompass

Originally published by Christina Clark at ci-compass.org on June 21, 2024.
