Conference Program
Conference at a glance
| Monday | Tuesday | Wednesday | Thursday | Friday |
| Workshops and tutorials | Keynote I | Keynote II | Keynote III | Workshops and tutorials |
| | Applications of the Cell Processor / Cache Enhancement Techniques | Compilers / High-performance Communications I | High-performance Communications II / Storage Solutions for Supercomputing | |
| Lunch | Lunch | Lunch | Lunch | Lunch |
| Workshops and tutorials | Optimizing Parallel Applications / Transactional Memory I | Accelerating Applications with GPUs I / Architectures for High-performance Computing | Accelerating Applications with GPUs II / Transactional Memory II | Workshops and tutorials |
| | Posters | Excursion and Banquet | Novel Supercomputing Applications / Power Management | |
Detailed conference program
| Tuesday, 9 June 2009 | ||
| 08:00-08:30 | Registration and breakfast | |
| 08:30-09:00 | Welcome and opening remarks | |
| 09:00-10:00 |
Keynote I: A European Perspective on Supercomputing Mateo Valero, Director, Barcelona Supercomputing Center and Professor, Universitat Politecnica de Catalunya Session chair: Valentina Salapura (IBM Research) |
|
| 10:00-10:30 | Break | |
| 10:30-12:30 |
Session 1a: Applications of the Cell processor Session chair: Franz Franchetti (CMU) |
Session 1b: Cache enhancement techniques Session chair: Adolfy Hoisie (LANL) |
|
Implementation of a Wide-angle Lens Distortion Correction Algorithm on the Cell Broadband Engine K. Daloukas, C. Antonopoulos, and N. Bellas |
Zero-Content Augmented Caches J. Dusser, T. Piquet, and A. Seznec |
|
|
High-performance regular expression scanning on the Cell/B.E. processor D. Scarpazza and G. Russell |
Dynamic Cache Clustering for Chip Multiprocessors M. Hammoud, S. Cho, and R. Melhem |
|
|
Computer Generation of Fast Fourier Transforms for the Cell Broadband Engine S. Chellappa, F. Franchetti, and M. Pueschel |
Less Reused Filter: Improving L2 Cache Performance via Filtering Less Reused Lines L. Xiang and T. Chen |
|
|
DBDB: Optimizing DMA transfer for the Cell BE Architecture T. Liu, H. Lin, T. Chen, J. K. O'Brien and L. Shao |
Divide-and-Conquer: A Bubble Replacement for Low Level Caches C. Zhang and B. Xue |
|
| 12:30-14:00 | Lunch | |
| 14:00-15:30 |
Session 2a: Optimizing parallel applications Session chair: Alex Nicolau (UC Irvine) |
Session 2b: Transactional memory I Session chair: Bronis de Supinski (LLNL) |
|
OhHelp: A Scalable Domain-Decomposing Dynamic Load Balancing for Particle-in-Cell Simulations H. Nakashima, Y. Miyake, H. Usui, and Y. Omura |
Fast Memory Snapshot for Concurrent Programming with Synchronization J. Chung and C. Kozyrakis |
|
|
A Pattern-based Sparse Matrix Representation for Memory-efficient SMVM Kernels M. Belgin, G. Back, and C. Ribbens |
QuakeTM: Parallelizing a Complex Serial Application Using Transactional Memory V. Gajinov, F. Zyulkyarov, A. Cristal, O. Unsal, E. Ayguade, T. Harris, and M. Valero |
|
|
Dynamic Topology Aware Load Balancing Algorithms for MD Applications A. Bhatele, L. Kale, and S. Kumar |
Refereeing Conflicts in Hardware Transactional Memory A. Shriraman and S. Dwarkadas |
|
| 15:30-17:30 | Posters | |
|
MPI Collective Communications on The Blue Gene/P Supercomputer: Algorithms and Optimizations A. Faraj, S. Kumar, B. Smith, A. Mamidala, P. Heidelberger, J. Gunnels |
A Model-driven Optimization for FFTW L. Gu, X. Li |
|
|
TransMetric: Architecture Independent Workload Characterization for Transactional Memory Benchmarks J. Poe, C. Hughes, T. Li |
PARSEC: Hardware Profiling for CMP Design of Emerging Workloads M. Bhadauria, V. Weaver, S. McKee |
|
|
Cancellation of Loads that Return Zero Using Zero-Value Caches M. Islam, S. McKee, P. Stenstrom |
Approximate Kernel Matrix Computation on GPUs for Large Scale Learning Applications M. Hussein, W. Abd-Almageed |
|
|
Auto-Vectorization through Code Generation for Stream Processing Applications H. Wang, H. Andrade, B. Gedik, K. Wu |
Dynamic Task Set Partitioning Based on Balancing Resource Requirements and Utilization to Reduce Power Consumption D. Bautista, J. Sahuquillo, H. Hassan, S. Petit, J. Duato |
|
|
Subdomain Communication to Increase Scalability in Large-Scale Scientific Applications A. Ovcharenko, M. Shephard, K. Jansen, C. Carothers, O. Sahni |
FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs A. Papakonstantinou, K. Gururaj, J. Stratton, D. Chen, J. Cong, W. Hwu |
|
|
Access Map Pattern Matching for Data Cache Prefetch Y. Ishii, M. Inaba, K. Hiraki |
Load balancing using Work-stealing for Pipeline Parallelism in Emerging Applications A. Navarro, R. Asenjo, S. Tabik, C. Cascaval |
|
|
Prediction-based Power Estimation and Scheduling for CMPs K. Singh, M. Bhadauria, S. McKee |
Prefetch Optimization on Large-scale Applications via Parameter Value Prediction S. Liao, T. Hung, H. Zhou, D. Nguyen, C. Chou, C. Tu |
|
|
Design of a Novel SIMD Architecture by Fusing Operations and Registers J. Chiu, Y. Chou, K. Yang, H. Tzeng, C. Shen |
Advantages of Silicon Photonics for Multi-socket Systems S. Beamer, C. Batten, A. Joshi, V. Stojanovic, K. Asanovic |
|
|
Thrifty Interconnection Network for HPC Systems J. Li, L. Zhang, C. Lefurgy, R. Treumann, W. E. Denzel |
An Infrastructure for Scalable Parallel Programs for Computational Chemistry B. Sanders, V. Lotrich, N. Flocke, M. Ponton, R. Bartlett, E. Deumens, A. Perera |
|
| Wednesday, 10 June 2009 | ||
| 09:00-10:00 |
Keynote II: The Roadrunner Project and the Importance of Energy Efficiency on the Road to Exascale Computing Don Grice, Distinguished Engineer, IBM Session chair: Michael Gschwind (IBM Research) |
|
| 10:00-10:30 | Break | |
| 10:30-12:30 |
Session 3a: Compilers Session chair: Lakshmi Renganarayana (IBM Research) |
Session 3b: High-performance communications I Session chair: Gagan Agrawal (Ohio State University) |
|
Parametric Multi-Level Tiling of Imperfectly Nested Loops A. Hartono, M. Baskaran, C. Bastoul, A. Cohen, S. Krishnamoorthy, B. Norris, J. Ramanujam, and P. Sadayappan |
Efficient High Performance Collective Communication for the Cell Blade Q. Ali, S. Midkiff, and V. Pai |
|
|
Dynamic parallelization of single-threaded binary programs using speculative slicing C. Wang, Y. Wu, E. Borin, S. Hu, W. Liu, D. Sager, T. Ngai, and J. Fang |
Practice of Parallelizing Network Applications on Multi-core Architectures J. Wang, H. Cheng, B. Hua, and X. Tang |
|
|
Synchronization Optimizations for Efficient Execution on Multi-Cores A. Nicolau, G. Li, A. Veidenbaum and A. Kejariwal |
Towards 100 Gbits/s Ethernet: Multicore-based Parallel Communication Protocol Design S. Passas, K. Magoutis, and A. Bilas |
|
|
Chunking Parallel Loops in the Presence of Synchronization J. Shirako, J. Zhao, V. Nandivada, and V. Sarkar |
Virtualization Polling Engine (VPE): Using Dedicated CPU Cores to Accelerate I/O Virtualization J. Liu and B. Abali |
|
| 12:30-13:45 | Lunch | |
| 13:45-15:15 |
Session 4a: Accelerating applications with GPUs I Session chair: Maria Eleftheriou (IBM Research) |
Session 4b: Architectures for high-performance computing Session chair: Sally McKee (Chalmers University) |
|
Fast and Scalable List Ranking on the GPU M. Rehman, K. Kothapalli, and P. Narayanan |
Creating Artificial Global History to Improve Branch Prediction Accuracy L. Porter and D. Tullsen |
|
|
Tuned and Wildly Asynchronous Stencil Kernels for Heterogeneous CPU/GPU Platforms S. Venkatasubramanian and R. Vuduc |
Exploring Pattern-Aware Routing in Generalized Fat Tree Networks G. Herrera, R. Beivide, C. Minkenberg, J. Labarta and M. Valero |
|
|
Performance Modeling and Automatic Ghost Zone Optimization for Iterative Stencil Loops on the Tesla Architecture J. Meng and K. Skadron |
Understanding the Interconnection Network of a Massively Parallel Real-Time Neural Net Simulator J. Navaridas, M. Lujan, J. Miguel-Alonso, L. Plana, and S. Furber |
|
| 15:30-21:00 | Excursion and banquet | |
| Thursday, 11 June 2009 | ||
| 09:00-10:00 |
Keynote III: Computing Outside the Box Ian Foster, Director, Argonne National Laboratory Session chair: Jose Moreira (IBM Research) |
|
| 10:00-10:30 | Break | |
| 10:30-12:30 |
Session 5a: High-performance communications II Session chair: Sam Midkiff (Purdue University) |
Session 5b: Storage solutions for supercomputing Session chair: I-Hsin Chung (IBM Research) |
|
A Graph Based Approach for MPI Deadlock Detection T. Hilbrich, B. Supinski, M. Mueller, and M. Schulz |
FTL Design Exploration in High-Performance Reconfigurable SSD for Server Applications J. Shin, Z. Xia, N. Xu, R. Gao, X. Cai, S. Maeng, and F. Hsu |
|
|
Maximizing MPI Point-to-Point Communication Performance on RDMA-enabled Clusters with Customized Protocols M. Small and X. Yuan |
/Scratch as a Cache: Rethinking HPC Center Scratch Storage H. Monti, A. Butt, and S. Vazhkudai |
|
|
MPI-aware compiler optimizations for improving communication-computation overlap A. Danalis, L. Pollock, M. Swany and J. Cavazos |
P-Code: A New RAID-6 Code with Optimal Properties C. Jin, H. Jiang, D. Feng, and L. Tian |
|
|
Evaluating High Performance Communication: a Power Perspective J. Liu, D. Poff, and B. Abali |
R-ADMAD: High Reliability Provision for Large-Scale De-duplication Archival Storage Systems C. Liu, Y. Gu, L. Sun, B. Yan, and D. Wang |
|
| 12:30-14:00 | Lunch | |
| 14:00-15:30 |
Session 6a: Accelerating applications with GPUs II Session chair: Moinuddin Qureshi (IBM Research) |
Session 6b: Transactional memory II Session chair: Marcelo Cintra (U. of Edinburgh) |
|
Single-particle 3D Reconstruction from Cryo-Electron Microscopy Images on GPU G. Tan and Z. Guo |
Combining Thread Level Speculation, Helper Threads, and Runahead Execution P. Xekalakis, N. Ioannou, and M. Cintra |
|
|
How GPUs can outperform ASICs for fast LDPC decoding G. Falcao, V. Silva, and L. Sousa |
Limited Early Value Communication to Improve Performance of Transactional Memory S. Pant and D. Byrd |
|
|
A Compiler and Runtime System for Enabling Data Mining Applications on GPUs W. Ma and G. Agrawal |
  | |
| 15:30-16:00 | Break | |
| 16:00-17:30 |
Session 7a: Novel supercomputing applications Session chair: Beverly Sanders (University of Florida) |
Session 7b: Power management Session chair: Vijay Naik (IBM Research) |
|
EpiFast: A fast algorithm for large scale realistic epidemic simulations on distributed memory systems K. Bisset, J. Chen, X. Feng, A. Kumar, and M. Marathe |
Adagio: Making DVS Practical for Complex HPC Applications B. Rountree, D. Lowenthal, B. Supinski, M. Schulz, V. Freeh, and T. Bletsch |
|
|
Using Many-Core Hardware to Correlate Radio Astronomy Signals R. van Nieuwpoort and J. Romein |
A Comprehensive Power-Performance Model for NoCs with Multi-Flit Channel Buffers M. Arjomand and H. Sarbazi-Azad |
|
|
A Parallel Levenberg-Marquardt Algorithm J. Cao, K. Novstrup, A. Goyal, S. Midkiff, and J. Caruthers |
Rate-Based QoS Techniques for Cache/Memory in CMP Platforms A. Herdrich, R. Illikkal, R. Iyer, and D. Newell |
|
| 17:30-18:00 | Closing remarks | |


