PEARC21
Wednesday, July 21 • 9:20am - 9:30am
Practice Guideline for Heavy I/O Workloads with Lustre File Systems on TACC Supercomputers

While the computational power of supercomputers has risen tremendously in recent years, users' increasingly intensive I/O workloads can easily overwhelm the file systems on these machines. Generating large volumes of data and a high rate of I/O operations (IOPS) in a short period can significantly slow down a file system and, in some cases, crash it. A crash wastes users' compute time, places a heavy burden on administrators and user services, and damages the perceived reliability of the system. Nearly a decade of close observation and study of file systems has led us to formulate new guidelines and develop several tools to alleviate the I/O issues faced in the current supercomputing environment. In this manuscript, we focus on I/O work on the Lustre parallel file systems of Frontera and Stampede2, but we also investigate other types of file systems employed on other TACC supercomputers. We also discuss common I/O issues reported by supercomputer users, including a high frequency of metadata server (MDS) requests, overloaded object storage servers (OSSs), and large unstriped files. To address these problems, we offer guidelines on how to choose the optimal file system for a given workload. Furthermore, we introduce novel tools and workflows, such as CDTool, Python_Cacher, OOOPS, and stripe_scratch, to facilitate users' I/O work. We believe these tools will greatly benefit users who need to manage heavy I/O workloads on parallel file systems.
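The abstract names large unstriped files as a common problem on Lustre. As a rough illustration only, the Python sketch below picks a stripe count in proportion to a file's expected size and builds the matching `lfs setstripe -c` command for an output directory. The 10 GiB-per-stripe ratio, the cap of 16 stripes, and the helper names `suggest_stripe_count` and `setstripe_command` are hypothetical choices made for this example. They are not TACC's published guideline, and the sketch does not describe how stripe_scratch works.

```python
import math
import shlex

GIB = 1024 ** 3
# Hypothetical cap for illustration; the real useful maximum depends on how
# many object storage targets (OSTs) the file system actually has.
MAX_STRIPES = 16


def suggest_stripe_count(expected_bytes: int, bytes_per_stripe: int = 10 * GIB) -> int:
    """Suggest one stripe per `bytes_per_stripe` of expected file size.

    The 10 GiB default is an assumed ratio, not an official recommendation.
    The result is clamped to the range [1, MAX_STRIPES].
    """
    if expected_bytes <= 0:
        return 1
    stripes = math.ceil(expected_bytes / bytes_per_stripe)
    return max(1, min(MAX_STRIPES, stripes))


def setstripe_command(directory: str, expected_bytes: int) -> str:
    """Build (but do not run) an `lfs setstripe -c` command for a directory.

    Files created in the directory afterwards inherit its default layout;
    files that already exist there are not restriped.
    """
    count = suggest_stripe_count(expected_bytes)
    return f"lfs setstripe -c {count} {shlex.quote(directory)}"


if __name__ == "__main__":
    # A job expected to write roughly 50 GiB per output file.
    print(setstripe_command("/scratch/output", 50 * GIB))
```

Setting the layout on the directory before the job writes its output, rather than on each file, keeps the number of extra metadata operations small, which matters given the MDS load the abstract describes.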


Wednesday July 21, 2021 9:20am - 9:30am PDT
Pathable Platform