Placement of I/O Servers to Improve Parallel I/O Performance on Switch-Based Clusters

Jan-Jan Wu, Da-Wei Wang, Yih-Fang Lin

(Paper #23)


Abstract

Switch-based clusters, networks of workstations or PCs connected by commodity switches, have been an appealing vehicle for high-performance computing. Despite their attractive features, cluster systems still have limitations compared with traditional massively parallel machines. First, cluster systems usually have a limited number of processing nodes, making full utilization of the computing power of each node a critical issue. Second, cluster systems are usually built with slower interconnects, making network speed, not disk speed, the limiting factor for parallel I/O performance. The notion of part-time I/O is commonly used for I/O in clusters: a subset of processing nodes become I/O nodes at I/O time and return to computation after finishing the I/O operation. Careful assignment of part-time I/O nodes is the key to overcoming these two limitations. In this paper, we show that load balance on the I/O nodes is the key optimization criterion for assigning part-time I/O nodes in switch-based clusters. We formulate the assignment problem as a weighted bipartite matching problem with the goal of balancing the workload on the I/O nodes. We then propose an efficient algorithm that finds an optimal solution to this problem. Experimental results on a 16-node PC cluster and simulation results for larger clusters are reported.
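
As a rough illustration of what a load-balancing bipartite matching can look like, the sketch below solves a generic bottleneck assignment problem: each I/O server is placed on a distinct node so that the largest per-server load is as small as possible. The abstract does not give the paper's algorithm, so this is not the authors' method. The weight matrix, its interpretation (the network load server s would carry if placed on node p), and the function names are all assumptions made for the example.

```python
def can_match(weights, limit):
    """Try to place every server on a distinct node using only
    (server, node) pairs whose weight is <= limit.
    Returns a list mapping server -> node, or None if impossible."""
    n_servers = len(weights)
    n_nodes = len(weights[0])
    node_owner = [-1] * n_nodes  # node_owner[p] = server placed on node p

    def try_place(s, seen):
        # Augmenting-path search (Kuhn's algorithm) for server s.
        for p in range(n_nodes):
            if weights[s][p] <= limit and not seen[p]:
                seen[p] = True
                if node_owner[p] == -1 or try_place(node_owner[p], seen):
                    node_owner[p] = s
                    return True
        return False

    for s in range(n_servers):
        if not try_place(s, [False] * n_nodes):
            return None

    assignment = [None] * n_servers
    for p, s in enumerate(node_owner):
        if s != -1:
            assignment[s] = p
    return assignment


def balanced_placement(weights):
    """Binary-search the smallest load limit that still admits a full
    matching. Assumes len(weights) <= len(weights[0]), i.e. no more
    servers than nodes. Returns (max_load, assignment)."""
    thresholds = sorted({w for row in weights for w in row})
    lo, hi = 0, len(thresholds) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        assignment = can_match(weights, thresholds[mid])
        if assignment is not None:
            best = (thresholds[mid], assignment)
            hi = mid - 1  # feasible: try a tighter limit
        else:
            lo = mid + 1  # infeasible: relax the limit
    return best


# Hypothetical loads: 2 I/O servers, 3 candidate nodes.
loads = [[4, 9, 7],
         [8, 3, 6]]
print(balanced_placement(loads))  # (4, [0, 1])
```

Here the sketch places server 0 on node 0 and server 1 on node 1, for a maximum load of 4. Minimizing the maximum load rather than the total is one plausible reading of "balancing the workload"; the paper's own objective and weights may differ.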

Keywords:

Scientific Computing
Scheduling
Non-numerical Algorithms