The IEEE Computer Society Publications Board recently announced the winners of the 2022 Best Paper Award for IEEE Transactions on Parallel and Distributed Systems (TPDS). The paper "Detecting Performance Variance for Parallel Applications Without Source Code," written by faculty and students of the Department of Computer Science and Technology at Tsinghua University, was named runner-up. The paper's first student author is Zheng Liyan, a doctoral student at the Institute of High Performance Computing in the department. The co-authors from the same department are Professor Zhai Jidong, PhD candidate Tang Xiongchao, Assistant Researcher Wang Haojie, Postdoctoral Fellows Yu Teng and Jin Yuyang, and Professor Chen Wenguang. The other co-authors are Professor Zhang Feng of Renmin University of China and Associate Professor Shuaiwen Leon Song of the University of Sydney. The paper was supervised by Professor Zhai Jidong.
The paper addresses a critical issue in parallel applications: performance variance. Such variance degrades performance and makes application behavior difficult to explain. The authors propose Vapro, a framework for detecting and diagnosing performance variance in production-run parallel systems. Vapro uses a State Transition Graph (STG) to track program execution and performs lightweight workload analysis on the STG to locate performance variance. The analysis relies on code snippets that execute repeatedly with a fixed workload, and Vapro can identify these snippets at runtime even without program source code. To diagnose the variance it finds, Vapro uses a progressive diagnosis method based on a hybrid model that combines variance breakdown and statistical analysis. Vapro's performance overhead is only 1.38% on average, and it can identify performance variance in real applications caused by hardware problems, such as memory and I/O issues. Once the identified variance is fixed, the standard deviation of the execution time decreases by up to 73.5%. Vapro also achieves 30.0% larger detection coverage than the state-of-the-art variance detection approach based on source code analysis.
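The general idea of comparing repeated executions that do the same amount of work can be illustrated with a minimal Python sketch. This is not Vapro's implementation: the record layout `(edge, workload, seconds)`, the function name `detect_variance`, and the `threshold` parameter are all assumptions made for illustration. Executions are grouped by a hypothetical STG edge and workload value, and any execution that runs noticeably slower than the fastest one in its group is flagged.

```python
from collections import defaultdict


def detect_variance(records, threshold=1.2):
    """Flag executions that are slow relative to peers with the same workload.

    records: iterable of (edge, workload, seconds) tuples. All three fields
    are hypothetical: `edge` stands in for a transition between two program
    states, `workload` for a measured amount of work (for example an
    instruction count), and `seconds` for the elapsed time.
    Returns a list of (edge, workload, seconds, slowdown) tuples.
    """
    # Group elapsed times by (edge, workload) so that only executions
    # doing the same amount of work are compared with each other.
    groups = defaultdict(list)
    for edge, workload, seconds in records:
        groups[(edge, workload)].append(seconds)

    flagged = []
    for (edge, workload), times in groups.items():
        best = min(times)
        # A single sample has nothing to compare against; skip it.
        if len(times) < 2 or best <= 0:
            continue
        for t in times:
            slowdown = t / best
            if slowdown > threshold:
                flagged.append((edge, workload, t, slowdown))
    return flagged


sample = [
    (("A", "B"), 1000, 1.0),
    (("A", "B"), 1000, 1.0),
    (("A", "B"), 1000, 2.0),  # same work, twice the time
    (("B", "C"), 500, 0.5),   # only one sample, so it is skipped
]
result = detect_variance(sample)
```

In this sketch the fastest execution in each group serves as the baseline, so a slowdown of 2.0 means the snippet took twice as long as its best observed run. A real tool would also need to tolerate small differences in measured workload rather than require exact equality.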
IEEE TPDS is a top journal in the field of high-performance computing. It covers the latest research in parallel and distributed computing algorithms, computing applications, and performance analysis. In 2022, IEEE TPDS published a total of 349 papers and gave only two best paper awards, a selection rate of about 0.57%.
Editor: Li Han