Conference Program


Conference at a glance


Monday: Workshops and tutorials; Lunch; Workshops and tutorials
Tuesday: Keynote I; Applications of the Cell Processor / Cache Enhancement Techniques; Lunch; Optimizing Parallel Applications / Transactional Memory I; Posters
Wednesday: Keynote II; Compilers / High-performance Communications I; Lunch; Accelerating Applications with GPUs I / Architectures for High-performance Computing; Excursion and Banquet
Thursday: Keynote III; High-performance Communications II / Storage Solutions for Supercomputing; Lunch; Accelerating Applications with GPUs II / Transactional Memory II; Novel Supercomputing Applications / Power Management
Friday: Workshops and tutorials; Lunch; Workshops and tutorials

Detailed conference program



Tuesday, 9 June 2009
08:00-08:30 Registration and breakfast
08:30-09:00 Welcome and opening remarks
09:00-10:00 Keynote I: A European Perspective on Supercomputing
Mateo Valero, Director, Barcelona Supercomputing Center and Professor, Universitat Politecnica de Catalunya
Session chair: Valentina Salapura (IBM Research)
10:00-10:30 Break
10:30-12:30 Session 1a: Applications of the Cell processor
Session chair: Franz Franchetti (CMU)
Implementation of a Wide-angle Lens Distortion Correction Algorithm on the Cell Broadband Engine
K. Daloukas, C. Antonopoulos, and N. Bellas
High-performance regular expression scanning on the Cell/B.E. processor
D. Scarpazza and G. Russell
Computer Generation of Fast Fourier Transforms for the Cell Broadband Engine
S. Chellappa, F. Franchetti, and M. Pueschel
DBDB: Optimizing DMA transfer for the Cell BE Architecture
T. Liu, H. Lin, T. Chen, J. K. O'Brien and L. Shao

10:30-12:30 Session 1b: Cache enhancement techniques
Session chair: Adolfy Hoisie (LANL)
Zero-Content Augmented Caches
J. Dusser, T. Piquet, and A. Seznec
Dynamic Cache Clustering for Chip Multiprocessors
M. Hammoud, S. Cho, and R. Melhem
Less Reused Filter: Improving L2 Cache Performance via Filtering Less Reused Lines
L. Xiang and T. Chen
Divide-and-Conquer: A Bubble Replacement for Low Level Caches
C. Zhang and B. Xue
12:30-14:00 Lunch
14:00-15:30 Session 2a: Optimizing parallel applications
Session chair: Alex Nicolau (UC Irvine)
OhHelp: A Scalable Domain-Decomposing Dynamic Load Balancing for Particle-in-Cell Simulations
H. Nakashima, Y. Miyake, H. Usui, and Y. Omura
A Pattern-based Sparse Matrix Representation for Memory-efficient SMVM Kernels
M. Belgin, G. Back, and C. Ribbens
Dynamic Topology Aware Load Balancing Algorithms for MD Applications
A. Bhatele, L. Kale, and S. Kumar

14:00-15:30 Session 2b: Transactional memory I
Session chair: Bronis de Supinski (LLNL)
Fast Memory Snapshot for Concurrent Programming with Synchronization
J. Chung and C. Kozyrakis
QuakeTM: Parallelizing a Complex Serial Application Using Transactional Memory
V. Gajinov, F. Zyulkyarov, A. Cristal, O. Unsal, E. Ayguade, T. Harris, and M. Valero
Refereeing Conflicts in Hardware Transactional Memory
A. Shriraman and S. Dwarkadas
15:30-17:30 Posters
MPI Collective Communications on The Blue Gene/P Supercomputer: Algorithms and Optimizations
A. Faraj, S. Kumar, B. Smith, A. Mamidala, P. Heidelberger, J. Gunnels
A Model-driven Optimization for FFTW
L. Gu, X. Li
TransMetric: Architecture Independent Workload Characterization for Transactional Memory Benchmarks
J. Poe, C. Hughes, T. Li
PARSEC: Hardware Profiling for CMP Design of Emerging Workloads
M. Bhadauria, V. Weaver, S. McKee
Cancellation of Loads that Return Zero Using Zero-Value Caches
M. Islam, S. McKee, P. Stenstrom
Approximate Kernel Matrix Computation on GPUs for Large Scale Learning Applications
M. Hussein, W. Abd-Almageed
Auto-Vectorization through Code Generation for Stream Processing Applications
H. Wang, H. Andrade, B. Gedik, K. Wu
Dynamic Task Set Partitioning Based on Balancing Resource Requirements and Utilization to Reduce Power Consumption
D. Bautista, J. Sahuquillo, H. Hassan, S. Petit, J. Duato
Subdomain Communication to Increase Scalability in Large-Scale Scientific Applications
A. Ovcharenko, M. Shephard, K. Jansen, C. Carothers, O. Sahni
FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs
A. Papakonstantinou, K. Gururaj, J. Stratton, D. Chen, J. Cong, W. Hwu
Access Map Pattern Matching for Data Cache Prefetch
Y. Ishii, M. Inaba, K. Hiraki
Load balancing using Work-stealing for Pipeline Parallelism in Emerging Applications
A. Navarro, R. Asenjo, S. Tabik, C. Cascaval
Prediction-based Power Estimation and Scheduling for CMPs
K. Singh, M. Bhadauria, S. McKee
Prefetch Optimization on Large-scale Applications via Parameter Value Prediction
S. Liao, T. Hung, H. Zhou, D. Nguyen, C. Chou, C. Tu
Design of a Novel SIMD Architecture by Fusing Operations and Registers
J. Chiu, Y. Chou, K. Yang, H. Tzeng, C. Shen
Advantages of Silicon Photonics for Multi-socket Systems
S. Beamer, C. Batten, A. Joshi, V. Stojanovic, K. Asanovic
Thrifty Interconnection Network for HPC Systems
J. Li, L. Zhang, C. Lefurgy, R. Treumann, W. E. Denzel
An Infrastructure for Scalable Parallel Programs for Computational Chemistry
B. Sanders, V. Lotrich, N. Flocke, M. Ponton, R. Bartlett, E. Deumens, A. Perera


Wednesday, 10 June 2009
09:00-10:00 Keynote II: The Roadrunner Project and the Importance of Energy Efficiency on the Road to Exascale Computing
Don Grice, Distinguished Engineer, IBM
Session chair: Michael Gschwind (IBM Research)
10:00-10:30 Break
10:30-12:30 Session 3a: Compilers
Session chair: Lakshmi Renganarayana (IBM Research)
Parametric Multi-Level Tiling of Imperfectly Nested Loops
A. Hartono, M. Baskaran, C. Bastoul, A. Cohen, S. Krishnamoorthy, B. Norris, J. Ramanujam, and P. Sadayappan
Dynamic parallelization of single-threaded binary programs using speculative slicing
C. Wang, Y. Wu, E. Borin, S. Hu, W. Liu, D. Sager, T. Ngai, and J. Fang
Synchronization Optimizations for Efficient Execution on Multi-Cores
A. Nicolau, G. Li, A. Veidenbaum and A. Kejariwal
Chunking Parallel Loops in the Presence of Synchronization
J. Shirako, J. Zhao, V. Nandivada, and V. Sarkar

10:30-12:30 Session 3b: High-performance communications I
Session chair: Gagan Agrawal (Ohio State University)
Efficient High Performance Collective Communication for the Cell Blade
Q. Ali, S. Midkiff, and V. Pai
Practice of Parallelizing Network Applications on Multi-core Architectures
J. Wang, H. Cheng, B. Hua, and X. Tang
Towards 100 Gbits/s Ethernet: Multicore-based Parallel Communication Protocol Design
S. Passas, K. Magoutis, and A. Bilas
Virtualization Polling Engine (VPE): Using Dedicated CPU Cores to Accelerate I/O Virtualization
J. Liu and B. Abali
12:30-13:45 Lunch
13:45-15:15 Session 4a: Accelerating applications with GPUs I
Session chair: Maria Eleftheriou (IBM Research)
Fast and Scalable List Ranking on the GPU
M. Rehman, K. Kothapalli, and P. Narayanan
Tuned and Wildly Asynchronous Stencil Kernels for Heterogeneous CPU/GPU Platforms
S. Venkatasubramanian and R. Vuduc
Performance Modeling and Automatic Ghost Zone Optimization for Iterative Stencil Loops on the Tesla Architecture
J. Meng and K. Skadron

13:45-15:15 Session 4b: Architectures for high-performance computing
Session chair: Sally McKee (Chalmers University)
Creating Artificial Global History to Improve Branch Prediction Accuracy
L. Porter and D. Tullsen
Exploring Pattern-Aware Routing in Generalized Fat Tree Networks
G. Herrera, R. Beivide, C. Minkenberg, J. Labarta and M. Valero
Understanding the Interconnection Network of a Massively Parallel Real-Time Neural Net Simulator
J. Navaridas, M. Lujan, J. Miguel-Alonso, L. Plana, and S. Furber
15:30-21:00 Excursion and banquet


Thursday, 11 June 2009
09:00-10:00 Keynote III: Computing Outside the Box
Ian Foster, Director, Argonne National Laboratory
Session chair: Jose Moreira (IBM Research)
10:00-10:30 Break
10:30-12:30 Session 5a: High-performance communications II
Session chair: Sam Midkiff (Purdue University)
A Graph Based Approach for MPI Deadlock Detection
T. Hilbrich, B. Supinski, M. Mueller, and M. Schulz
Maximizing MPI Point-to-Point Communication Performance on RDMA-enabled Clusters with Customized Protocols
M. Small and X. Yuan
MPI-aware compiler optimizations for improving communication-computation overlap
A. Danalis, L. Pollock, M. Swany and J. Cavazos
Evaluating High Performance Communication: a Power Perspective
J. Liu, D. Poff, and B. Abali

10:30-12:30 Session 5b: Storage solutions for supercomputing
Session chair: I-Hsin Chung (IBM Research)
FTL Design Exploration in High-Performance Reconfigurable SSD for Server Applications
J. Shin, Z. Xia, N. Xu, R. Gao, X. Cai, S. Maeng, and F. Hsu
/Scratch as a Cache: Rethinking HPC Center Scratch Storage
H. Monti, A. Butt, and S. Vazhkudai
P-Code: A New RAID-6 Code with Optimal Properties
C. Jin, H. Jiang, D. Feng, and L. Tian
R-ADMAD: High Reliability Provision for Large-Scale De-duplication Archival Storage Systems
C. Liu, Y. Gu, L. Sun, B. Yan, and D. Wang
12:30-14:00 Lunch
14:00-15:30 Session 6a: Accelerating applications with GPUs II
Session chair: Moinuddin Qureshi (IBM Research)
Single-particle 3D Reconstruction from Cryo-Electron Microscopy Images on GPU
G. Tan and Z. Guo
How GPUs can outperform ASICs for fast LDPC decoding
G. Falcao, V. Silva, and L. Sousa
A Compiler and Runtime System for Enabling Data Mining Applications on GPUs
W. Ma and G. Agrawal

14:00-15:30 Session 6b: Transactional memory II
Session chair: Marcelo Cintra (U. of Edinburgh)
Combining Thread Level Speculation, Helper Threads, and Runahead Execution
P. Xekalakis, N. Ioannou, and M. Cintra
Limited Early Value Communication to Improve Performance of Transactional Memory
S. Pant and D. Byrd
15:30-16:00 Break
16:00-17:30 Session 7a: Novel supercomputing applications
Session chair: Beverly Sanders (University of Florida)
EpiFast: A fast algorithm for large scale realistic epidemic simulations on distributed memory systems
K. Bisset, J. Chen, X. Feng, A. Kumar, and M. Marathe
Using Many-Core Hardware to Correlate Radio Astronomy Signals
R. van Nieuwpoort and J. Romein
A Parallel Levenberg-Marquardt Algorithm
J. Cao, K. Novstrup, A. Goyal, S. Midkiff, and J. Caruthers

16:00-17:30 Session 7b: Power management
Session chair: Vijay Naik (IBM Research)
Adagio: Making DVS Practical for Complex HPC Applications
B. Rountree, D. Lowenthal, B. Supinski, M. Schulz, V. Freeh, and T. Bletsch
A Comprehensive Power-Performance Model for NoCs with Multi-Flit Channel Buffers
M. Arjomand and H. Sarbazi-Azad
Rate-Based QoS Techniques for Cache/Memory in CMP Platforms
A. Herdrich, R. Illikkal, R. Iyer, and D. Newell
17:30-18:00 Closing remarks