Workshops and Tutorials


The 23rd International Conference on Supercomputing (ICS-2009) program will include workshops and tutorials on Monday, June 8 and Friday, June 12.


Monday, 8 June 2009
Morning: W1 (APGAS), W3 (WAHA), T4 (PetaPerf)
Afternoon: T2 (PAM-LS), T3 (Cetus)

Friday, 12 June 2009
Morning: W4 (FastOS), T5 (SM-Grid), T6 (StreamPPOT), T8 (FormalVerif)
Afternoon: T7 (PM-GPU)

The following workshops and tutorials will take place in conjunction with ICS'09.

For additional information, please contact the chairs at ics09proposals@gmail.com.


W1: 1st Workshop on Asynchrony in the PGAS Programming Model (APGAS)

June 8th, 9:00-17:30 in 26-014 & 26-024

Workshop organizers: George Almasi, Calin Cascaval and Vijay Saraswat, IBM Research.

The new golden era of concurrency offers a diversity of concurrent architectures to the application programmer: clusters of symmetric multiprocessors, heterogeneous accelerators (such as the Cell and GPGPUs), large core-count integrated machines (such as the Blue Gene), and multiple levels of parallelism (multithreaded multicores, such as the newer Power architectures, or distributed multiprocessors like the Cray MTA). The central programming challenge for these new architectures is to develop a suitable and robust programming model. The Partitioned Global Address Space (PGAS) model has attracted considerable attention as such a programming model, particularly in the High Performance Computing space. However, widespread acceptance and extension to newer paradigms (e.g. to heterogeneous accelerators and irregular application domains) have been hampered by a focus on homogeneous execution contexts and the Single Program Multiple Data (SPMD) threading model. Several research groups have recently started to investigate asynchronous execution in the PGAS model. Asynchronous execution is the foundation for active message programming and fine-grained concurrency (e.g. Cilk-style fork-join, OpenMP-style teams). It is also the basis for data transfer between non-coherent memories (DMA), remote atomic operations, and in general for techniques for overlapping computation and communication. Finally, asynchronous execution offers a simple model for spawning massive multi-threaded kernel computations on accelerators such as GPUs.

Link to workshop page

APGAS workshop program


W2: High Performance Embedded Storage Systems (HPESS) (CANCELLED)

Embedded systems are deployed in a wide range of applications including consumer products, car electronics and nuclear power plants. Storage solutions for embedded systems must meet high performance standards, defined by criteria such as reliability, speed, energy consumption, weight/space constraints and cost. These solutions are increasingly based on relatively recent technologies such as Flash and MEMS. This workshop aims to bring together researchers from both academia and industry to address, explore and exchange new ideas or results in both hardware design and software implementation of storage solutions for embedded high performance systems.

Link to workshop page


W3: 1st Workshop on Accelerators for High-performance Architectures (WAHA)

June 8th, 8:30-12:30 in 20-001

Workshop organizers: Tejas Karkhanis and Pradip Bose (IBM Research)

Application-specific accelerators provide an efficient method of offloading computationally intensive tasks from the general-purpose processor. They are emerging in high performance computer architectures such as IBM's mainframe, Cell-based and Roadrunner supercomputer systems, Sun's Niagara-based systems and others. The option of employing application-specific accelerators brings many opportunities as well as challenges to the chip designer. The designer must identify the appropriate parts of the applications to accelerate in hardware, design an efficient accelerator such that the hardware cost is amortized, adapt the OS and compilers to make efficient use of the hardware accelerator, and so on. This workshop brings together hardware and software researchers and practitioners to discuss the potential and limitations of application-specific accelerators in the specific context of high-end processors, servers and supercomputing systems.

Link to workshop page

WAHA workshop program


W4: Forum to Address Scalable Technology for runtime and Operating Systems (FastOS)

June 12th, 8:30-18:00 in 26-014 & 26-024

Workshop organizers: Ron Minnich (Sandia National Labs) and Eric Van Hensbergen (IBM Austin Research Lab)

This workshop will be our first FAST-OS meeting for the new research areas. We will have presentations from the funded FAST-OS projects but also leave lots of time for discussion. We will consider a number of areas, targeted to future systems with 100 million CPUs. Topic areas include resiliency, kernel structure, file systems, scalability, and I/O.

Link to workshop page

FastOS workshop program


T1: HPC with Microsoft Windows HPC Server 2008 - A Programmer's Perspective (WinHPC) (CANCELLED)

Microsoft entered the HPC scene in 2006 with Windows Compute Cluster Server. HPC Server 2008 represents version 2, offering a host of new features including SOA support, head node failover, improved manageability, faster networking, and OGF-compatible interfaces. For software developers, the options in v2 are compelling: continued support for OpenMP and MPI, distributed-memory programming in .NET with MPI.NET, shared-memory programming with the Task Parallel Library and Parallel LINQ, functional parallel programming with F#, interactive, SOA-based HPC programs via Windows Communication Foundation brokers, integration with Windows Event Tracing for a more complete profiling picture, and improved APIs for programmatic interaction. This tutorial will discuss each of these additions in detail, using code examples and live demos on a portable cluster. At the end of the tutorial, attendees will understand the motivations behind each option and how best to take advantage of it.

Tutorial organizer: Joe Hummel (hummel@lakeforest.edu). Department of Mathematics and Computer Science, Lake Forest College and Pluralsight, LLC.

Link to tutorial page


T2: A Practical Approach to Performance Analysis and Modeling of Large-scale Systems (PAM-LS)

June 8th, 14:00-17:30 in 26-004 (break 15:30-16:00)

This tutorial presents a practical approach to the performance modeling of large-scale scientific applications on high performance systems. The defining characteristic of our tutorial is its description of a proven modeling approach, developed at Los Alamos, for full-blown scientific codes ranging from a few thousand to over 100,000 lines, which has been validated on systems containing thousands of processors. The goal is to impart a detailed understanding of the factors contributing to the resulting performance of an application when mapped onto a given HPC platform. Performance modeling is the only technique that can quantitatively elucidate this understanding. We show how models are constructed and demonstrate how they are used to predict, explain, diagnose, and engineer application performance in existing or future codes and/or systems. Notably, our approach does not require the use of specific tools but rather is applicable across commonly used environments. Moreover, since our performance models are parametric in terms of machine and application characteristics, they give the user the ability to "experiment ahead" with different system configurations or algorithms/coding strategies. Both will be demonstrated in studies emphasizing the application of these modeling techniques, including verifying system performance, comparison of large-scale systems, and examination of possible future systems.
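To illustrate what "parametric in terms of machine and application characteristics" can mean, here is a minimal sketch of a generic compute-plus-communication model. It is not the Los Alamos model presented in the tutorial; the names `Machine` and `predict_iteration_time`, the latency/bandwidth cost formula, and all numeric values are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine parameters (illustrative, not measured values)."""
    flops_per_sec: float   # sustained per-processor compute rate
    latency_s: float       # per-message latency in seconds
    bandwidth_Bps: float   # per-link bandwidth in bytes per second


def predict_iteration_time(machine, cells_per_proc, flops_per_cell,
                           msgs_per_iter, bytes_per_msg):
    """Predict one iteration's time as compute time plus communication time.

    Assumes no overlap between computation and communication, which is a
    simplification made for this sketch.
    """
    compute = cells_per_proc * flops_per_cell / machine.flops_per_sec
    comm = msgs_per_iter * (machine.latency_s
                            + bytes_per_msg / machine.bandwidth_Bps)
    return compute + comm


# "Experiment ahead": same application parameters, two machine configurations.
base = Machine(flops_per_sec=1e9, latency_s=1e-6, bandwidth_Bps=1e9)
faster_net = Machine(flops_per_sec=1e9, latency_s=1e-6, bandwidth_Bps=4e9)

app = dict(cells_per_proc=1_000_000, flops_per_cell=100,
           msgs_per_iter=4, bytes_per_msg=1_000_000)

t_base = predict_iteration_time(base, **app)
t_fast = predict_iteration_time(faster_net, **app)
print(f"baseline: {t_base:.6f} s, faster network: {t_fast:.6f} s")
```

Because the machine and application characteristics are plain parameters, a hypothetical future system can be evaluated by changing a few numbers rather than by running the code on real hardware.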

Tutorial organizers: Adolfy Hoisie and Darren J. Kerbyson, Performance and Architecture Lab (PAL), Los Alamos National Laboratory.

Link to tutorial page


T3: A Source-to-Source Compiler Infrastructure for Multi-cores (Cetus)

June 8th, 14:00-17:30 in 20-001 (break 15:30-16:00)

This tutorial will introduce Cetus, a source-to-source restructuring compiler infrastructure for C programs. Cetus is a community resource developed with support from the National Science Foundation. The infrastructure is available at link. Cetus is already used by a number of research projects in the U.S. and in other countries. Its main distinction from related infrastructure efforts is its focus on high-level source-to-source translation for C programs and its abstract internal representation. These features have already been shown to enable highly efficient design and implementation of new compilation techniques. The tutorial aims to reach a wider audience and provide guidance for the use of the resource and its advanced optimization techniques. These techniques include new symbolic analysis methods, automatic parallelization for multi-cores, and optimizations for heterogeneous multi-cores.

Link to tutorial page


T4: Tools for Scalable Performance Analysis on Petascale Systems (PetaPerf)

June 8th, 9:00-12:30 in 26-004 (break 10:30-11:00)

Tools are becoming increasingly important for efficiently utilizing the computing power available in contemporary large-scale systems. The drastic increase in the size and complexity of systems requires tools to be scalable while producing meaningful and easily digestible information that helps the user pinpoint problems at scale. The goal of this tutorial is to introduce some state-of-the-art performance tools from three different organizations to a diverse audience. Together these tools provide a broad spectrum of capabilities necessary to analyze the performance of scientific and engineering applications on a variety of large- and small-scale systems.

Tutorial organizers: I-Hsin Chung (IBM T. J. Watson Research Center), Seetharami R. Seelam (IBM T. J. Watson Research Center), Bernd Mohr (Julich Supercomputing Centre) and Jesus Labarta (Barcelona Supercomputing Center).

Link to tutorial page


T5: Security and VO Management in Grids (SMGrid)

June 12th, 9:00-12:30 in 26-004 (break 10:30-11:00)

This tutorial provides an overview of security and Virtual Organization (VO) management in established and new Grid systems. We survey the security and VO management features provided by some major Grid middleware packages, and introduce the comparable functionality in XtreemOS, a Grid-based operating system. Concepts in Grid security are introduced, including their respective challenges and protection mechanisms. We describe the Globus, gLite and UNICORE middleware packages, showing the services they provide, their VO management functions, and their security capabilities. The tutorial then explores the features of the XtreemOS Grid operating system, demonstrating the advantages of close integration between Grid functionality and operating system facilities.

Tutorial organizers: Yvon Jegou and Christine Morin (INRIA).

Link to tutorial page


T6: Embedded Streaming: Parallel Programming, Optimizations and Tools (StreamPPOT)

June 12th, 9:00-12:30 in 20-001 (break 10:30-11:00)

Streaming applications are characterized by data-driven computations on unbounded data vectors, and have proliferated in many domains such as video processing, advanced radio communications and signal processing in general. To achieve efficient implementations on multicore architectures, especially in view of prevalent real-time requirements, a streaming application needs to be carefully partitioned into independent or pipelined computation tasks, taking into account the available processing, memory and communication resources. These challenges led to the development of supporting tool-chains, including that of the recent ACOTES (Advanced Compiler Technologies for Embedded Streaming) project. The ACOTES approach focuses on compiler-assisted mapping of streaming tasks onto highly parallel systems; automatic analysis and transformation techniques support the partitioning and mapping process, based on properties of the application domain, quantitative information about the target, and programmer directives. This tutorial presents the tool-chain and programming model designed in the ACOTES project, a 3-year collaborative work of industrial (NXP, ST, IBM, Silicon Hive, NOKIA) and academic (UPC, INRIA, MINES ParisTech) partners. This tool-chain consists of an open-source trunk with optional non-free add-ons. We will review the streaming domain, its main trends, typical applications and architectures, programming challenges, and available solutions. We will walk through a hands-on example of a streaming program, starting from programming with special pragmas, through multiple levels of compilation, down to actual execution on multicore architectures.

Tutorial organizers: Albert Cohen (INRIA), Xavier Martorell (UPC), Harm Munk (NXP), Dorit Nuzman (IBM), Andrea Ornstein (STMicroelectronics), Sebastian Pop (AMD), Uzi Shvadron (IBM) and Ayal Zaks (IBM).

Link to tutorial page


T7: Programming Models and Compiler Optimizations for GPUs and Multi-Core Processors (PM-GPU)

June 12th, 14:00-17:30 in 20-001 (break 15:30-16:00)

On-chip parallelism with multiple cores is now ubiquitous. Because of power and cooling constraints, recent performance improvements in both general-purpose and special-purpose processors have come primarily from increased on-chip parallelism rather than increased clock rates. Parallelism is therefore of considerable interest to a much broader group than developers of parallel applications for high-end supercomputers. Several programming environments have recently emerged in response to the need to develop applications for GPUs, the Cell processor, and multi-core processors from AMD, IBM, Intel, and others. As commodity computing platforms all go parallel, programming these platforms to attain high performance has become an extremely important issue. There has been considerable recent interest in two complementary approaches: 1) developing programming models that explicitly expose parallelism to the programmer; and 2) compiler optimization frameworks that automatically transform sequential programs for parallel execution.

This tutorial will provide an introductory survey covering both of these aspects. In contrast to conventional multicore architectures, GPUs and the Cell processor have to exploit parallelism while managing the physical memory on the processor (since there are no caches) by explicitly orchestrating the movement of data between large off-chip memories and the limited on-chip memory. This tutorial will address the issue of explicit memory management in detail.

Tutorial organizers: J. (Ram) Ramanujam (Department of Electrical and Computer Engineering and Center for Computation and Technology, Louisiana State University) and P. (Saday) Sadayappan (Department of Computer Science and Engineering, The Ohio State University, Columbus).

Link to tutorial page


T8: Practical Formal Verification of MPI and Thread Programs (FormalVerif)

June 12th, 9:00-12:30 in 20-051 (break 10:30-11:00)

This tutorial covers two tools, ISP and Inspect, developed under Microsoft and NSF funding. ISP ("In-situ Partial Order") is a tool for formal verification of MPI programs. Like conventional model checkers, ISP verifies the complete state space of a system and a test harness for a set of safety properties. However, unlike model checkers, ISP performs code-level verification. This means that the tool explores all relevant interleavings of a concurrent program by replaying the actual program code, without burdening users with building verification models. Relevant interleavings are computed through a customized dynamic partial order reduction algorithm. ISP has been used to successfully verify up to 14,000 lines of MPI/C code for deadlocks and assertion violations. ISP has been tested with the MPICH2, OpenMPI, and Microsoft MPI libraries, and is available for download for Linux and Mac OS X, and also as a Visual Studio plugin for running under Windows. The tool looks and feels like a debugger, which makes it suitable for use from beginning courses on MPI through to advanced development, where it saves valuable late-cycle debugging time otherwise lost to missed bugs. Inspect is a similar tool for formal verification of Pthread C programs. It performs automated program instrumentation, and verifies thread programs of up to several thousand lines of code for data races, deadlocks, and assertion violations. It uses a family of dynamic partial order reduction algorithms for interleaving reduction. Our half-day tutorial will provide attendees with a LiveCD to boot into, so that they can use ISP and Inspect and come to understand the usage of the tools, their algorithms, and their limitations.
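The idea of checking a property across interleavings can be sketched on a toy example. The code below is not ISP or Inspect and does not use their interfaces; it naively enumerates every interleaving of two hypothetical threads that each perform a non-atomic increment (a read followed by a write), with no partial order reduction. The helper names `interleavings` and `run` are made up for this illustration.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each one's order."""
    n, m = len(a), len(b)
    for positions in combinations(range(n + m), n):
        chosen = set(positions)          # slots in the schedule taken by a
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n + m)]


def run(schedule):
    """Execute one schedule against a shared counter and return its final value."""
    shared = 0
    local = {}                           # per-thread copy made by "read"
    for tid, op in schedule:
        if op == "read":
            local[tid] = shared
        else:                            # "write": store the stale copy plus one
            shared = local[tid] + 1
    return shared


# Two toy threads, each incrementing the shared counter non-atomically.
T1 = [(1, "read"), (1, "write")]
T2 = [(2, "read"), (2, "write")]

# Group schedules by the final counter value they produce.
results = {}
for schedule in interleavings(T1, T2):
    results.setdefault(run(schedule), []).append(schedule)

for value, schedules in sorted(results.items()):
    print(f"final value {value}: {len(schedules)} interleaving(s)")
```

Any schedule whose final value differs from 2 exposes the lost-update race. Exhaustive enumeration like this grows combinatorially with program size, which is why reduction techniques that skip equivalent interleavings matter in practice.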

Tutorial Organizer: Ganesh Gopalakrishnan, School of Computing, University of Utah

Link to tutorial page and notes