SC14 in New Orleans, LA (USA) is again moving at a pace that one would expect from a high-performance conference. It provides information at an incredible bandwidth and many of us will take weeks to process it all. It's hard to distill it down. So far, I have observed two major trends in the community: locality and analytics.
Locality is becoming one of the major topics in efficient parallel computing. Steadily growing parallelism in chips, systems, and applications forces data to move across ever larger distances in a system. This is problematic because data movement is the dominant cost factor in terms of both runtime and energy. One strategy to limit these costs is to arrange data such that it travels shorter distances during the computation. It's exciting to see many young scientists working in this field.
There were several sessions devoted to locality and its effects. For example, the Birds of a Feather session Programming Abstractions for Data Locality on Wednesday brought together experts for an open discussion about possible ways to combine the scattered research results into a consistent plan for the community. Such sessions are a key part of the conference and often the birthplace of major trends; for example, many new ideas related to MPI have been developed in similar settings.
But what if every arrangement of the application's data causes high communication volume? A communication pattern commonly found in big data analytics applications is that every process communicates with nearly every other process. Discussions of such workloads are quickly emerging at the conference. For example, the Graph500 list lets system builders compete for the top position in solving graph problems. This comes as no surprise, since SC has a long history in the large-scale computing that is often required to solve big data problems.
Attendees of Sunday's Irregular Applications: Architectures and Algorithms workshop had heated discussions about how to apply the vast knowledge from technical high-performance computing to these new applications. Besides usability and productivity, locality was one of the key topics. There are big opportunities for both the data-center computing and the high-performance computing communities.
In general, there is no panacea for computation mapping: the best strategy depends on the structure of the algorithm and its spatial and temporal locality of reference. The trend from highly structured traditional scientific computing to new unstructured analytics applications is scary and interesting at the same time.
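To make the mapping question concrete, here is a small toy sketch (not from the article, and not any production mapper): eight processes form two groups that communicate heavily within the group and only lightly across groups, and we compare the off-node traffic of a naive round-robin placement against a simple greedy heuristic that co-locates heavy communicators. All process counts, volumes, and names are made-up illustrations.

```python
# Toy sketch: why process-to-node mapping matters for locality.
# Eight processes in two communication groups, mapped onto two
# four-slot nodes. All numbers here are invented for illustration.
import itertools

P, NODES, SLOTS = 8, 2, 4

def volume(i, j):
    """Symmetric communication volume: 10 units within a group, 1 across."""
    return 10 if i // 4 == j // 4 else 1

def off_node_traffic(mapping):
    """Total volume between process pairs placed on different nodes."""
    return sum(volume(i, j)
               for i, j in itertools.combinations(range(P), 2)
               if mapping[i] != mapping[j])

# Naive round-robin: process i -> node i % NODES (splits both groups).
round_robin = {i: i % NODES for i in range(P)}

# Greedy heuristic: fill each node by repeatedly adding the unplaced
# process with the most traffic to the node's current occupants.
greedy, free = {}, {n: SLOTS for n in range(NODES)}
for node in range(NODES):
    while free[node]:
        unplaced = [p for p in range(P) if p not in greedy]
        placed_here = [p for p in greedy if greedy[p] == node]
        best = max(unplaced,
                   key=lambda p: sum(volume(p, q) for q in placed_here))
        greedy[best] = node
        free[node] -= 1

print(off_node_traffic(round_robin))  # round-robin cuts both groups apart
print(off_node_traffic(greedy))       # greedy keeps each group on one node
```

With this structure the greedy mapping sends only the light cross-group traffic over the network, while round-robin also pushes the heavy intra-group traffic off-node. Real mappers face irregular patterns where no such clean split exists, which is exactly the point of the paragraph above.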
One possible approach is simply to reduce the maximum distance in the overall system. Low-diameter network topologies are already a trend in data-centric large-scale systems. While several Dragonfly network installations are already operational, research does not stop there: our paper on the Slim Fly topology goes a step further, minimizing network diameter toward the theoretical limit. Novel network technologies such as optics and advanced routing enable these new directions, and as Burton Smith said at the end of the talk: "this is now becoming an exciting research direction." SC is the perfect conference for such discussions, as it brings hardware vendors and leaders from both industry and research into the same room.
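To see why topology choice alone matters, here is a toy comparison (this is not the Slim Fly construction, which uses a different graph family): computing the diameter of two classical 16-node topologies with BFS shows that a 4-dimensional hypercube reaches any node in far fewer worst-case hops than a ring of the same size.

```python
# Toy sketch: network diameter of a 16-node ring vs. a 4-D hypercube,
# computed by breadth-first search from every node.
from collections import deque

def diameter(adj):
    """Longest shortest path over all node pairs (BFS from each node)."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst

n = 16
# Ring: each node connects to its two neighbors.
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
# 4-D hypercube: neighbors differ in exactly one address bit.
hypercube = {i: [i ^ (1 << b) for b in range(4)] for i in range(n)}

print(diameter(ring))       # 8
print(diameter(hypercube))  # 4
```

Halving the diameter here already comes from wiring alone; low-diameter designs like Dragonfly and Slim Fly push the same idea much further at scale, trading higher node degree for fewer hops.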
Even programming models are shifting to make locality a key component of parallel programming. For example, MPI offers support for efficient process mapping, and the new MPI-3 remote memory access interface allows programmers to exploit modern hardware support for RDMA. More information can be found in the book Using Advanced MPI, released this week at SC14. New data-centric approaches to parallel programming are emerging at the conference.
SC is the place for high-performance community interaction (pun intended) for everyone from undergraduate students to members of the national academies. It is the perfect place to start moving into our data-centric future and to discuss locality as a first-class object. We need a major paradigm shift, ranging from new algorithmic approaches and programming models to computer architectures that support efficient execution. And the first step has to be to educate our community and teach input/output complexity as well as data-centric algorithms.
Torsten Hoefler is an assistant professor in the Scalable Parallel Computing Lab of the Computer Science Department of ETH Zurich.