FRESCO: Open Source Data Repository for Computational Usage and Failures

By Saurabh Bagchi¹, Rakesh Kumar, Rajesh Kalyanam¹, Stephen Harrell, Carolyn A Ellis¹, Carol Song¹

¹ Purdue University

Abstract

FRESCO is a repository of performance data for scientific computing jobs submitted to Conte, Purdue University's central computing cluster, between March 2015 and June 2017. The data can be used to identify failed jobs and to analyze the reasons for failure by studying performance parameters recorded on individual cluster nodes during each job's execution. The repository comprises job submission and exit status data, resource usage data from each cluster node, records of node outages, and the libraries used by jobs executing on each node.
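As a rough illustration of the failure analysis described above, the following minimal sketch filters job accounting records by exit status and counts failures per node. The column names (`job_id`, `node`, `exit_status`), the node names, and the sample rows are all hypothetical and are not taken from the FRESCO schema; the repository documentation should be consulted for the actual field names. Treating any nonzero exit status as a failure is also an assumption.

```python
import csv
import io
from collections import Counter

# Hypothetical sample of job accounting records. The columns and values
# are assumptions for illustration only, not the actual FRESCO schema.
SAMPLE = """job_id,node,exit_status
1001,node-a001,0
1002,node-a002,1
1003,node-a001,137
1004,node-a003,0
"""


def failed_jobs(rows):
    """Return the rows whose exit status is nonzero (assumed to mean failure)."""
    return [row for row in rows if int(row["exit_status"]) != 0]


def failures_per_node(rows):
    """Count failed jobs on each node."""
    return Counter(row["node"] for row in failed_jobs(rows))


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
failed_ids = [row["job_id"] for row in failed_jobs(rows)]
node_counts = failures_per_node(rows)

print(failed_ids)
print(dict(node_counts))
```

Failed jobs found this way could then be matched against the per-node resource usage records for the same time window, which is the kind of analysis the repository is intended to support.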

The Conte cluster comprises 580 nodes totaling 9280 cores, connected by 40 Gbps InfiniBand interconnects. Each node has 64 GB of RAM and two additional 60-core Xeon Phi accelerators. The repository contains data for 10.8M jobs run on Conte over the 28-month period between March 2015 and June 2017.

FRESCO has recently been expanded to include job accounting and resource usage data from Stampede 1, a cluster at the University of Texas at Austin. These data cover the period between 2013 and 2016 and comprise 8.7M jobs. At the time of its decommissioning, Stampede 1 consisted of 6400 nodes with a total of 522,080 processing cores.
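To put the quoted cluster sizes in perspective, the short calculation below derives per-node and per-month averages from the figures stated above. The variable names are illustrative, and the job count is rounded in the source ("10.8M"), so the monthly figure is approximate.

```python
# Figures quoted in the abstract.
CONTE_NODES = 580
CONTE_CORES = 9280
CONTE_JOBS = 10_800_000      # "10.8M" in the source, so approximate
CONTE_MONTHS = 28

STAMPEDE_NODES = 6400
STAMPEDE_CORES = 522_080

# Average cores per node on each cluster.
conte_cores_per_node = CONTE_CORES / CONTE_NODES
stampede_cores_per_node = STAMPEDE_CORES / STAMPEDE_NODES

# Average number of Conte jobs recorded per month.
conte_jobs_per_month = CONTE_JOBS / CONTE_MONTHS

print(conte_cores_per_node)          # 16.0
print(round(conte_jobs_per_month))   # 385714
print(stampede_cores_per_node)       # 81.575
```

The Conte total works out to exactly 16 cores per node, which suggests the 9280-core figure does not count the Xeon Phi accelerators. The Stampede 1 average is not a whole number, so its total presumably reflects non-uniform nodes or includes accelerator cores; the source does not say which.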

The dataset can be accessed at the following URL:

https://www.rcac.purdue.edu/fresco/index.html

Documentation describing the dataset is available at the following URL:

https://diagrid.org/resources/1099/download/FRESCO_Repository_Description.pdf

Sponsored by

NSF Grant Nos. CNS-1548114 and CNS-1405906.

Cite this work

Researchers should cite this work as follows:

  • Saurabh Bagchi; Rakesh Kumar; Rajesh Kalyanam; Stephen Harrell; Carolyn A Ellis; Carol Song (2019), "FRESCO: Open Source Data Repository for Computational Usage and Failures," https://diagrid.org/resources/1093.
