This project addresses a pressing need in the area of dependable computing systems: the lack of publicly available failure data from real (production) modern computing systems. The project will collect, curate, and present public failure data of large-scale computing systems in a repository called FRESCO. The data sets will include static information, dynamic information about the workloads, and failure information for both planned and unplanned outages. Data collection from production machines must obey several practical constraints: no changes to the workload, little performance perturbation, and minimal changes to the operating system. Further, the data must be sanitized to remove sensitive information and processed so that they are interpretable by a broad group of researchers. The project will also provide analysis tools to answer commonly occurring questions, such as the correlation between workload and failure and the performance implications of using one library over another, as well as an intuitive graphical front-end that allows people to explore the data sets and download the relevant ones.
This is a joint project between researchers at Purdue University and the University of Illinois at Urbana-Champaign, funded by the National Science Foundation's CRI program.
The NSF award page is here.
The FRESCO Data Repository
The project has recently released a new dataset containing event and performance data for scientific code execution jobs submitted to one of Purdue University's production computing clusters between March 2015 and June 2017.
The data repository can be accessed via HTTP transfer or Globus. More detailed information about this dataset and how to access it can be found in the documentation.
Collaborators and team members
Purdue University:
Saurabh Bagchi (PI)
Rajesh Kalyanam (Systems developer)
Rakesh Kumar (Graduate Research Assistant)
Carol Song (Co-PI)
Stephen Harrell (Systems Consultant)
University of Illinois at Urbana-Champaign:
Ravishankar Iyer (Co-PI)
Zbigniew Kalbarczyk (Co-PI)
Saurabh Jha (Graduate Research Assistant)