Timing is Right for SDSC Cloud
New Storage System Supports NSF Data Policy
October 5, 2011
Successfully managing, preserving, and sharing large amounts of digitally based data has become more of an economic challenge than a technical one, as researchers must meet a new National Science Foundation (NSF) policy requiring them to submit a data management plan as part of their funding requests, said Michael Norman, director of the San Diego Supercomputer Center (SDSC) at the University of California, San Diego.
“Data management has become an even more challenging discipline than high-performance computing,” Norman said in remarks delivered at this week’s 50th anniversary meeting of the Association of Independent Research Institutes (AIRI) in La Jolla, California. “The question used to be ‘what’s the essential technology?’ but is now ‘what’s the sustainable cost model?’”
The revised NSF policy, which went into effect early this year, asks researchers to submit a two-page data management plan on how they will archive and share their data. According to the policy, “investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants.”
Norman said this revised policy was one of the key drivers that shaped SDSC’s planning for a new Web-based, 100 percent disk data storage system called the SDSC Cloud, which was announced late last month. Believed to be the largest academic-based cloud storage system in the U.S. to date, the SDSC Cloud is primarily designed for researchers, students, and other academics requiring stable, secure, and cost-effective storage and sharing of digital information, including extremely large data sets. While SDSC’s primary motivation for creating its own data cloud was to provide an affordable resource for UC San Diego researchers to preserve and share their data, the resource is being made available to all academic researchers.
Over the last few years, the research infrastructure for data-enabled science has been widely discussed at the NSF, leading to the new data management and sharing policy. The document that is charting the course is called Cyberinfrastructure Framework for 21st Century Science and Engineering (CIF21).
Still, Norman warned that researchers will likely never be able to afford to save all their data, and should focus on saving and sharing only what is intellectually valuable, while creating a sustainable business model. He referenced a Keeping Research Data Safe (KRDS) report that said the cost of long-term data stewardship is as much as six times the cost of bit preservation. “So it’s not the costs of storing the bits – it’s the cost of hosting the hardware, all of the administration costs, and the costs of migrating the data.”
In late 2007, the Blue Ribbon Task Force on Sustainable Digital Preservation and Access was commissioned by the NSF and The Andrew W. Mellon Foundation to study the economic sustainability challenge of digital preservation and access. The Task Force, which worked in partnership with the Library of Congress, the Joint Information Systems Committee of the United Kingdom, the Council on Library and Information Resources, and the National Archives and Records Administration, published both an interim and a final report, which can be found at http://brtf.sdsc.edu/.
“SDSC, like other data resource centers, has a long-term obligation to steward that data, and maintenance costs are needed to keep that data persistent,” said Norman. “It’s like real estate. You can either rent out your rooms or sell your condos, but if you’re not recovering costs as a landlord, you go out of business.”