LCPC 2017

The 30th International Workshop on Languages and Compilers for Parallel Computing

Co-located with CnC 2017 - The Ninth Annual Concurrent Collections Workshop

LCPC October 11-13, 2017 & CnC October 14, 2017
Texas A&M University, College Station, Texas


Keynotes, Invited Speakers, and Panelists


Keynotes



Making Sparse Fast
Saman Amarasinghe, MIT, USA
Achieving high performance is no easy task. When it comes to programs operating on sparse data, where there is very little hardware, programming language, or compiler support, getting high performance is nearly impossible. As important modern problems such as deep learning in big data analytics and physical simulations in computer graphics and scientific computing operate on sparse data, lack of performance is becoming a critical issue. Achieving high performance has been so important since the early days of computing that many researchers have spent their lifetimes trying to extract more FLOPS out of critical codes. Hardcore performance engineers try to reach this performance nirvana single-handedly, without any help from languages, compilers, or tools. In this talk, using two examples, I’ll argue that domain-specific languages and compiler technology can take on most of the performance optimization burden even in a very difficult domain such as sparse computation.
The first example I will describe is TACO, an optimizing code generator for linear and tensor algebra. TACO introduces new techniques for compiling compound tensor algebra expressions into efficient loops. TACO-generated code has performance competitive with best-in-class hand-written codes for sparse, dense, and mixed tensor and matrix expressions.
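To make this concrete, the loop nest below is a hand-written Python sketch of the kind of kernel a sparse code generator such as TACO produces for a sparse matrix-vector product, with the matrix stored in CSR (compressed sparse row) form. The function name and array layout here are illustrative only, not TACO's actual output or API.

```python
# Sketch of the loop structure a sparse tensor compiler emits for
# y = A*x with A in CSR form. Hand-written illustration, not TACO output.

def spmv_csr(pos, crd, vals, x):
    """y[i] = sum_j A[i][j] * x[j], with A given as CSR arrays:
    pos  -- row pointer array (length nrows+1)
    crd  -- column indices of the stored nonzeros
    vals -- the nonzero values themselves
    """
    y = [0.0] * (len(pos) - 1)
    for i in range(len(pos) - 1):            # iterate over rows
        for p in range(pos[i], pos[i + 1]):  # visit only stored nonzeros
            y[i] += vals[p] * x[crd[p]]
    return y

# A = [[2, 0, 0],
#      [0, 0, 3],
#      [4, 5, 0]]
pos, crd, vals = [0, 1, 2, 4], [0, 2, 0, 1], [2.0, 3.0, 4.0, 5.0]
print(spmv_csr(pos, crd, vals, [1.0, 1.0, 1.0]))  # -> [2.0, 3.0, 9.0]
```

The point of such generated code is that only stored nonzeros are ever touched; for a compound expression the compiler must derive a merged loop structure like this automatically.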
Next, I will introduce Simit, a new language for physical simulation. Simit lets the programmer seamlessly work on a physical system both through its individual geometric elements, as a graph, and through the behavior of the entire system, as a set of global tensors. We demonstrate that Simit is easy to use: a Simit program is typically shorter than a Matlab program; that it is high performance: a Simit program running sequentially on a CPU performs comparably to hand-optimized simulations; and that it is portable: Simit programs can be compiled for GPUs with no change to the program, delivering 4 to 20× speedups over our optimized CPU code.
Saman Amarasinghe is a Professor at MIT.

Vivek Sarkar Software Challenges for Extreme Heterogeneity
Vivek Sarkar, Georgia Institute of Technology, USA
It is widely recognized that a major disruption is under way in computer hardware as processors strive to extend, and go beyond, the end-game of Moore's Law. This disruption will include new forms of heterogeneous processor and memory hierarchies, near-memory computation structures, and, in some cases, non-von Neumann computing elements. In this talk, we summarize the software challenges for these levels of "extreme heterogeneity", with a focus on the role of programming systems, which encompass programming models, compilers, and runtime systems. These challenges anticipate a new vision for programming systems that goes beyond their traditional role of mapping a specific subcomputation to a specific hardware platform, to an expanded world view in which programming systems control the global selection of computation and data mappings of subcomputations on heterogeneous subsystems. We will discuss recent trends in programming models, compilers, and runtime systems that point the way towards addressing the challenges of extreme heterogeneity.
Vivek Sarkar has been a Professor in the School of Computer Science, and the Stephen Fleming Chair for Telecommunications in the College of Computing, at the Georgia Institute of Technology since August 2017. Prior to joining Georgia Tech, Sarkar was a Professor of Computer Science at Rice University, where he held the E.D. Butcher Chair in Engineering. During 2007 - 2017, Sarkar built Rice's Habanero Extreme Scale Software Research Group with the goal of unifying the parallelism and concurrency elements of high-end computing, multicore, and embedded software stacks (http://habanero.rice.edu). He also served as Chair of the Department of Computer Science at Rice during 2013 - 2016.
Prior to joining Rice in 2007, Sarkar was Senior Manager of Programming Technologies at IBM Research. His research projects at IBM included the X10 programming language, the Jikes Research Virtual Machine for the Java language, the ASTI optimizer used in IBM’s XL Fortran product compilers, and the PTRAN automatic parallelization system. Sarkar became a member of the IBM Academy of Technology in 1995, and was inducted as an ACM Fellow in 2008. He has been serving as a member of the US Department of Energy’s Advanced Scientific Computing Advisory Committee (ASCAC) since 2009, and on CRA’s Board of Directors since 2015.

Harry Wijshoff A New Framework for Expressing, Parallelizing and Optimizing Parallel Applications
Harry Wijshoff, University of Leiden, the Netherlands
The Forelem framework was initially introduced as a means to optimize database queries using optimization techniques developed for compilers. Since its introduction, Forelem has proven to be more versatile and applicable beyond database applications. In this talk we show that the original Forelem framework can be adapted to express general applications, and we demonstrate how it can be used to express and optimize them. More specifically, we will demonstrate the effectiveness of the framework by applying it to k-Means clustering and PageRank, resulting in automatically generated implementations of these applications. These implementations are more efficient than standard, hand-coded, state-of-the-art MPI C/C++ implementations of k-Means and PageRank, and significantly outperform state-of-the-art Hadoop implementations.
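For reference, the k-Means kernel mentioned above is small enough to state in full. The plain-Python sketch below (1-D points, Lloyd's algorithm) shows the computation whose generated implementations are being compared; it is a generic illustration, not Forelem output, and the function name is ours.

```python
# Minimal k-Means (Lloyd's algorithm) on 1-D points, for illustration.

def kmeans(points, centers, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in centers]
        for p in points:
            d = [(p - c) ** 2 for c in centers]
            clusters[d.index(min(d))].append(p)
        # Update step: recompute each center as its cluster's mean
        # (keeping the old center if a cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Two well-separated groups; centers converge near 1.0 and 9.0.
print(kmeans([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], [0.0, 10.0]))
```

The interest in the talk's framework is that an optimizer can restructure the assignment and update loops (fusing, reordering, parallelizing) from a single high-level specification, rather than relying on a hand-parallelized MPI version of exactly this kernel.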
H. Wijshoff has long experience in parallel computing. In addition to publishing in major technical journals (IEEE Computer, IEEE Trans. on Computers, IEEE Trans. on Software Engineering, IEEE Trans. on Parallel and Distributed Computing, IEEE Trans. on VLSI Systems, ACM Trans. on Computer Systems, ACM Trans. on Database Systems, ACM Trans. on Mathematical Software, SIAM Journal on Computing), he is the author of the book "Data Organization in Parallel Computers" (Kluwer Academic Publishers, Boston, USA). Over the last 27 years he has held positions at Utrecht University, the Netherlands; the Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, USA; and the Research Institute for Advanced Computer Science, NASA Ames, Mountain View, California, USA.
Since July 1, 1992, he has been a full professor at the Computer Science Department (LIACS) of the University of Leiden. He has been working on data organization in parallel computers, applications and parallel algorithms, performance evaluation, and optimizing compilers for irregular computations.

Katherine Yelick Languages and Compilers for Exascale Science
Katherine Yelick, UC Berkeley, Lawrence Berkeley National Laboratory, USA
In the next few years, exascale computing systems will become available to the scientific community. They will require new levels of parallelization, new models of memory and storage, and a variety of node architectures for processors and accelerators. They will enable simulations with unprecedented scale and complexity across many fields, from fundamental science to the environment, infrastructure design, and human health. These systems will also offer exascale data analysis capabilities, allowing genomes, images, and sensor data to be processed, analyzed, and modeled using machine learning and other analytics techniques. But several programming challenges remain as architectures diverge and data movement continues to dominate computation. In this talk, I will describe some of the latest “communication-avoiding” algorithms and open questions on automating the communication optimizations. Using examples from genomics, MRI image analysis, and machine learning, I will argue that we can take advantage of the characteristics of particular science domains to produce compilers, libraries, and runtime systems that are powerful and convenient, while still providing scalable, high-performance code.
Katherine Yelick is a Professor of Electrical Engineering and Computer Sciences at the University of California at Berkeley and the Associate Laboratory Director for Computing Sciences at Lawrence Berkeley National Laboratory. Her research is in parallel programming languages, compilers, algorithms, and automatic performance tuning. Yelick was Director of the National Energy Research Scientific Computing Center (NERSC) from 2008 to 2012 and currently leads the Computing Sciences Area at Berkeley Lab, which includes NERSC, the Energy Sciences Network (ESnet), and the Computational Research Division (CRD). Yelick was recently elected to the National Academy of Engineering (NAE) and the American Academy of Arts and Sciences, and is an ACM Fellow and a recipient of the ACM/IEEE Ken Kennedy and Athena awards.

Invited Speakers

James Brodman Programming in a Spatial World
James Brodman, Intel
Moore’s Law provided ever-increasing performance gains for decades. However, power has become a limiting factor for architectural improvements. The increasing success of accelerators like GPUs and FPGAs shows a willingness to trade off general-purpose flexibility for greater efficiency and performance.
This talk will examine generating high-performing programs for FPGAs. While existing data parallel programming models are capable of generating good results, a few simple extensions to these models can exploit the unique nature of these devices. Several architecture-specific optimizations will be discussed as well as difficulties that arise due to optimization tradeoffs that differ from those for CPUs or GPUs.
James Brodman is a Staff Engineer in the Technology Pathfinding and Innovation team at Intel. He has worked on tools such as Intel Concurrent Collections and the Intel SPMD Program Compiler. He earned his PhD in Computer Science from the University of Illinois under the direction of David Padua and Maria Garzaran.

James Browne, University of Texas, USA
James C. Browne is professor emeritus of computer science, research professor at ICES, and chief technology officer for the Ranger system at the Texas Advanced Computing Center. He earned his Ph.D. in chemical physics from The University of Texas at Austin. Browne's current research interests span parallel programming and computation, performance optimization, and fault/failure management for complex systems. Browne has attained fellow status in five different professional societies across several areas, including the Association for Computing Machinery, the American Physical Society, the American Association for the Advancement of Science, and the Institute for Constructive Capitalism. Browne received the 2004 University of Texas at Austin Career Research Excellence Award for maintaining a superior research program across multiple fields over a 45-year career, during which he supervised or co-supervised the Ph.D. research of 69 students in four different fields.

Henry Dietz, University of Kentucky, USA
Henry Dietz is a Professor in the Department of Electrical and Computer Engineering at the University of Kentucky, where he holds the James F. Hardymon Chair in Networking. He earned his B.S., M.S., and Ph.D. in Computer Science from the Polytechnic Institute of New York. His research interests include compilers, digital imaging, hardware architectures and networking, operating systems, and parallel processing.

Rudolf Eigenmann Is parallelization technology ready for prime time?
Rudolf Eigenmann, University of Delaware, USA
Our languages and compilers community has created a large body of work on parallelization techniques over the past three decades. Nevertheless, current practical compilers make little use of these contributions, and automatic parallelization tools have a mixed reputation at best. This situation contrasts with the expectations of the now over two-year-old National Strategic Computing Initiative (NSCI). The NSCI, in addition to pushing high-end compute capabilities, wants to make high-performance computing available to the large majority of non-experts in parallel computing. In this talk, after a brief review of the past, I will chart a path forward that aims to ensure that the technology our community has put so much energy into will be harnessed and will benefit a large number of HPC users. Elements of this path include highly interactive translators, options that can set the degree of automation versus user involvement, and tight engagement with the applications community, giving us constant feedback on how to improve the tools.
Rudolf (Rudi) Eigenmann has worked in the area of compilers for parallel computing since his Ph.D. at ETH Zürich, Switzerland, in 1988. Over this time, he has worked at the University of Illinois, Purdue University, and the National Science Foundation, and has just joined the University of Delaware. His research interests include optimizing compilers, programming methodologies, tools, and performance evaluation for high-performance computing, as well as the design of cyberinfrastructure.

Guang Gao Compilers for Program Execution Models Inspired by Dataflow - A Personal Reflection on 30 Years (1987-2017)
Guang Gao, University of Delaware, USA
Recently we have witnessed rapidly growing interest and activity in dataflow program execution models and systems, from both academia and industry. In this talk, I will present a personal reflection on the evolution of compiler technology for dataflow-inspired parallel architectures over the past 30 years. Remarks will be made on aspects that may be particularly useful in exploring future innovative system designs assisted by modern hardware/software technologies, especially when facing the challenges of applications in advanced data analytics and machine learning.
Guang R. Gao is a computer scientist and an Endowed Distinguished Professor of Electrical and Computer Engineering at the University of Delaware. Gao’s research domain is high-performance computer architecture and systems. He is a leader in parallel computation models and systems, with a focus on dataflow program execution models and systems. Gao’s work has also been associated with his entrepreneurial efforts, applying his dataflow R&D results to real-world applications. A unique achievement of Gao’s team is its critical role in the now-legendary supercomputing project, funded by DoD and IBM, known as the IBM Cyclops-64 supercomputer. Cyclops-64 received a disruptive technology award at SC2007 as one of the few large-scale real systems successfully designed, built, and delivered on innovative many-core chip technology a decade ago.
Gao is an ACM Fellow and IEEE Fellow, and a recipient of the Outstanding Achievement Award for Overseas Scholars from the CCF (China Computer Federation). He is a founder and chairman of the IEEE Computer Society Dataflow STC (the Special Technical Community on Parallel Model and System: Dataflow and Beyond). Gao is the recipient of the 2017 B. Ramakrishna Rau Award.

Hironori Kasahara
Multigrain Parallelization and Compiler/Architecture Co-design for 30 Years with LCPC
Hironori Kasahara, Waseda University, Japan
Multicores have been attracting much attention as a way to improve performance and reduce the power consumption of computing systems facing the end of Moore’s Law. To obtain high performance and low power on multicores, co-design of hardware and software, especially parallelizing and power-reducing compilers, is very important. The OSCAR (Optimally Scheduled Advanced Multiprocessor) compiler and the OSCAR multiprocessor/multicore architecture have been researched since 1985, and the compiler has progressed alongside LCPC. This talk covers:
  • the OSCAR multigrain parallelization compiler, which hierarchically exploits coarse-grain task parallelism, loop parallelism, and statement-level parallelism,
  • global data locality optimization over coarse-grain tasks for cache and local memory,
  • automatic power reduction via frequency and voltage control, clock gating, and power gating,
  • heterogeneous task scheduling with overlapping data transfers using DMA controllers,
  • software coherence control by the OSCAR compiler,
  • local memory automatic management with software-defined blocks and their replacement,
  • performance and power consumption of real applications, including automobile engine control, cancer treatment, and scientific applications, on various multicore systems such as Intel, ARM, IBM, Fujitsu, Renesas, and Tilera.
Hironori Kasahara is IEEE Computer Society (CS) 2018 President and 2017 President Elect and has served as a chair or member of 245 society and government committees, including the CS Board of Governors; Executive Committee; Planning Committee; chair of CS Multicore STC and CS Japan chapter; associate editor of IEEE Transactions on Computers; vice PC chair of the 1996 ENIAC 50th Anniversary International Conference on Supercomputing; general chair of LCPC 2012; PC member of SC, PACT, and ASPLOS; board member of IEEE Tokyo section; and member of the Earth Simulator committee.
He received a PhD in 1985 from Waseda University, Tokyo, joined its faculty in 1986, and has been a professor of computer science since 1997 and a director of the Advanced Multicore Research Institute since 2004. He was a visiting scholar at University of California, Berkeley, and the University of Illinois at Urbana–Champaign’s Center for Supercomputing R&D.
Kasahara received the CS Golden Core Member Award, IEEE Fellow, IFAC World Congress Young Author Prize, IPSJ Fellow and Sakai Special Research Award, and the Japanese Minister’s Science and Technology Prize. He led Japanese national projects on parallelizing compilers and embedded multicores, and has presented 214 papers, 139 invited talks, and 28 patents. His research on multicore architectures and software has appeared in 560 newspaper and Web articles.

David Kuck, Intel, USA
David J. Kuck, a graduate of the University of Michigan, was a professor in the Computer Science Department at the University of Illinois at Urbana-Champaign. While there, he developed the Parafrase compiler system, which was the first testbed for the development of automatic vectorization and related program transformations. In his role as Director of the Center for Supercomputing Research and Development (CSRD-UIUC), Kuck led the construction of the Cedar project, a hierarchical shared-memory 32-processor SMP supercomputer completed in 1988 at the University of Illinois. He founded Kuck and Associates (KAI) in 1979 to build a line of industry-standard optimizing compilers especially focused on exploiting parallelism. Kuck is a fellow of the American Association for the Advancement of Science, the Association for Computing Machinery (ACM), and the Institute of Electrical and Electronics Engineers (IEEE). He is also a member of the National Academy of Engineering. He has won the ACM/IEEE Eckert-Mauchly Award and the IEEE Computer Society Charles Babbage Award.

Monica Lam
ThingTalk: a Distributed and Synthesizable Programming Language for Virtual Assistants
Monica Lam, Stanford University, USA
Virtual assistants, such as Alexa, Siri, and Google Home, are emerging as the super app that intermediates between users and their IoT devices and online services. As an intermediary, the virtual assistant sees all our personal data and has control over the services and vendors we use. A monopolistic virtual assistant would pose a great threat to personal privacy as well as open competition and innovation.
This talk presents Almond, an open-source project to create an open virtual assistant platform that protects user privacy. At the core of the project is ThingTalk, a domain-specific language which lets end users use natural language to describe sophisticated tasks involving an open world of skills. It also protects privacy by letting users share data and devices while keeping their credentials and data on their personal devices.
Dr. Monica Lam has been a Professor of Computer Science at Stanford University since 1988. She is the Faculty Director of the Stanford MobiSocial Computing Laboratory. She has worked in the areas of architecture, compiler optimization, software analysis to improve security, and mobile and social computing. Her current research is to develop an open, end-user-programmable virtual assistant platform. She received a PhD in Computer Science from Carnegie Mellon University in 1987. Lam is an ACM Fellow, received an NSF Young Investigator award in 1992, and has won various best paper awards from the ACM. She is a co-author of the "dragon book", the most popular textbook on compilers.

Paul Petersen
When Small Things Cause Big Problems
Paul Petersen, Software and Services Group (SSG), Intel
Effective performance optimization requires knowledge of the target application’s dynamic behavior. Measuring this behavior without excessive effort or substantially perturbing the application’s execution has been a common request from developers. For many applications, we today have effective tools that can give you a good understanding of an application’s dynamic behavior at rather low cost, but only when certain assumptions are met: typically, that the application’s execution is sufficiently long relative to the sampling period, and that the behavior of its functions is relatively uniform, with no unexpected actions occurring on the system. Problems arise when you violate these assumptions. What if you care about short execution paths (<<100K instructions), but your sampling period is >100K instructions? What if you care about minimizing the cost of outliers more than reducing the average behavior of these short sequences? What if these outliers are caused not by the program itself, but by the interaction of the program with other things being managed by the OS? These problems can be solved with the help of hardware support for collecting fine-grain execution traces. In this talk we will walk through some simple examples illustrating these problems, and show what is possible with the instruction-tracing features available on today’s systems.
Paul Petersen is a Sr. Principal Engineer in the Software and Services Group (SSG) at Intel leading a team in defining next generation features for parallel runtimes and software analysis tools. He received a Ph.D. degree in Computer Science from the University of Illinois in 1993. After UIUC, he was employed at Kuck and Associates, Inc. (KAI) working on auto-parallelizing compiler (KAP), and was involved in the early definition and implementations of OpenMP. While at KAI, he developed the Assure line of parallelization/correctness products, for Fortran, C++ and Java. In 2000, Intel Corporation acquired KAI, and he joined the software tools group creating the Thread Checker products, which evolved into the Inspector and Advisor components of the Intel® Parallel Studio. Inspector uses dynamic binary instrumentation to detect memory and concurrency bugs, and Advisor uses similar techniques along with performance measurement and modeling to assist developers in transforming existing serial applications to be ready for parallel execution.

Keshav Pingali, University of Texas, USA
Keshav Pingali is a Professor in the Department of Computer Science at the University of Texas at Austin, and he holds the W.A. "Tex" Moncrief Chair of Computing in the Institute for Computational Engineering and Sciences (ICES) at UT Austin. He was on the faculty of the Department of Computer Science at Cornell University from 1986 to 2006, where he held the India Chair of Computer Science. Pingali is a Fellow of the ACM, IEEE, and AAAS. He served as co-Editor-in-Chief of ACM Transactions on Programming Languages and Systems (TOPLAS) from 2007 to 2010, and on the NSF CISE Advisory Committee from 2009 to 2012.

Sanjay Rajopadhye Thirty Years of the Polyhedral Model: Let's Return to the Origins
Sanjay Rajopadhye, Colorado State University, USA
Even after thirty years, the polyhedral model is far from being an unequivocal success, even on the restricted domain where it is applicable. Despite impressive recent progress, we do not (yet) have compilers for general-purpose processors that can generate code approaching the machine peak, the algorithmic peak of the source program, or even hand-tuned libraries developed by “heroes of high performance computing.” I will try to explain why this is so by arguing that although the theory is elegant and beautiful, we have been solving the wrong problems and targeting the wrong platforms. I will suggest a plan for how we can improve this state of affairs by targeting accelerators, building analytical models, and using discrete nonlinear optimization.
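As a concrete instance of the kind of transformation the polyhedral model reasons about, the sketch below tiles a matrix-multiply loop nest in plain Python. A polyhedral compiler would derive such a schedule automatically from the affine loop bounds; here the tiling is written by hand for illustration, and the function names are ours.

```python
# Loop tiling, a classic polyhedral-model transformation: the (i,j,k)
# iteration space is split into TxTxT tiles to improve cache locality.
# Both versions perform exactly the same iterations in a legal order.

def matmul_naive(A, B, n):
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_tiled(A, B, n, T=2):
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, T):                # tile (inter-tile) loops
        for jj in range(0, n, T):
            for kk in range(0, n, T):
                for i in range(ii, min(ii + T, n)):      # point loops
                    for j in range(jj, min(jj + T, n)):  # within a tile
                        for k in range(kk, min(kk + T, n)):
                            C[i][j] += A[i][k] * B[k][j]
    return C
```

The polyhedral framework's contribution is proving such reorderings legal (no dependence is violated) and choosing tile sizes and loop orders analytically, which is exactly where the talk argues better models are needed.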
Sanjay Rajopadhye is currently a professor in the Computer Science (CS) and Electrical and Computer Engineering (ECE) departments at Colorado State University. He was educated at IIT Kharagpur (B.Tech, 1980) and the University of Utah (PhD, 1986), and is one of the inventors of the polyhedral model, a mathematical formalism for reasoning about massively parallel, regular computations. His Ph.D. dissertation made three key contributions — scheduling, locality, and closure — to the foundations of the model. The model was developed to address the design of early-era "hardware accelerators" called systolic arrays, and is now used for automatic parallelization of affine control loops in many production compilers. His main research contributions revolve around the polyhedral model: code generation, tiling, memory reuse analyses, and recently energy optimization.

Saday Sadayappan On using data movement lower bounds to guide code optimization
P. (Saday) Sadayappan, Ohio State University, USA
The four-index integral transform is a computationally demanding calculation used in many quantum chemistry software suites such as NWChem. It requires a sequence of four tensor contractions, each contracting a four-dimensional tensor with a two-dimensional transformation matrix. Loop fusion and tiling can be used to reduce the total space requirement, as well as data movement within and across nodes of a parallel supercomputer. However, the large number of possible choices for loop fusion, tiling, and data/computation distribution across a parallel system makes it challenging to develop an optimized parallel implementation. Lower bounds on data movement, as a function of the available aggregate physical memory in a parallel computer system, are used to identify and prune ineffective fusion configurations. This enables a characterization of optimality criteria and the development of an improved parallel implementation of the four-index transform, with higher performance and the ability to model larger electronic systems than was feasible with previously available implementations in NWChem.
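To fix ideas, one of the four contraction steps has the form T'[a,q,r,s] = Σ_p C[p][a] · T[p][q][r][s]; the full transform applies such a step to each of the four tensor modes in turn, and the fusion and tiling choices across the four steps drive the data-movement costs the talk analyzes. The plain-Python sketch below shows a single step at tiny sizes; the function name and loop structure are illustrative, not NWChem code.

```python
# One step of the four-index transform: contract mode 0 of a 4-D
# tensor T (all dims of size n) with a transformation matrix C:
#   out[a][q][r][s] = sum_p C[p][a] * T[p][q][r][s]
# A real implementation tiles and fuses four such steps; this naive
# version exists only to make the computation's shape concrete.

def contract_mode0(T, C, n):
    out = [[[[0.0] * n for _ in range(n)] for _ in range(n)]
           for _ in range(n)]
    for a in range(n):
        for q in range(n):
            for r in range(n):
                for s in range(n):
                    for p in range(n):       # the contracted index
                        out[a][q][r][s] += C[p][a] * T[p][q][r][s]
    return out
```

Because each step reads an O(n^4) tensor and writes another, the choice of which loops to fuse between consecutive steps determines how much intermediate data must round-trip through memory, which is exactly what the data-movement lower bounds constrain.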
P. (Saday) Sadayappan is a Professor of Computer Science and Engineering at The Ohio State University. His research interests include compiler optimization for heterogeneous systems, domain/pattern-specific compiler optimization, characterization of data movement complexity of algorithms, and data-structure centric performance optimization.

Xipeng Shen Exerting the Hidden Power of Compilers for Modern Computing - Experiences on Generalizing Redundancy Removal
Xipeng Shen, North Carolina State University, USA
Born soon after the advent of the first computers, Compiler Technology is one of the oldest branches of Computer Science and is often regarded as a mature field. However, recent observations led Dr. Shen and his group to believe that some dramatic, hidden power of compilers has remained untapped, especially for modern computing. When that power is exerted, computing efficiency may improve by up to hundreds of times, and even automatic algorithm derivations become possible. In this talk, Dr. Shen will discuss these findings by drawing on his group's recent experiences in generalizing redundancy removal to a larger scope and a higher level. (The talk is based on his publications at PLDI'2017, OOPSLA'2017, ICDE'2017, VLDB'2015, and ICML'2015.)
Xipeng Shen is an associate professor in the Computer Science Department at North Carolina State University (NCSU). He is a recipient of the 2010 NSF CAREER Award, the 2011 DOE Early Career Award, and a 2015 Google Faculty Research Award. Before joining NCSU, he was an Adina Allen Term Distinguished Associate Professor at the College of William and Mary. He has served as a consultant for Intel and Cisco, an IBM Canada CAS Research Faculty Fellow, and a visiting scientist at Microsoft Research and MIT.

Armando Solar-Lezama The route to automation
Armando Solar-Lezama, MIT, USA
Traditionally, there has been a trade-off between the level of abstraction afforded by a language and the performance one can expect from the resulting code. In this talk, I will describe how a new class of techniques based on program synthesis could help introduce more automation into high-performance programming tasks. The goal is to help to reduce programmer effort without sacrificing performance.
Armando Solar-Lezama is an associate professor at MIT, where he leads the Computer Aided Programming Group. His research interests include software synthesis and its applications to areas ranging from end-user programming to high-performance computing. He has a PhD from UC Berkeley and a BS in Computer Science and Math from Texas A&M University.

Yan Solihin Hiding the High Overheads of Persistent Memory
Yan Solihin, NSF/NCSU, USA
Byte-addressable non-volatile memory technology is emerging as an alternative to DRAM for main memory. This new Non-Volatile Main Memory (NVMM) allows programmers to store important data in data structures in memory instead of serializing it to the file system, thereby providing a substantial performance boost. However, computer systems reorder memory operations and utilize volatile caches for better performance, making it difficult to ensure a consistent state in NVMM. Intel recently announced a new set of persistence instructions: clflushopt, clwb, and pcommit. These new instructions make it possible to implement fail-safe code on NVMM, but few workloads have been written or characterized using them.
In this talk, I will discuss a new logging approach for durable transactions that achieves the favorable characteristics of both prior software and hardware approaches. Like software, it has no hardware constraint limiting the number of transactions or logs available to it; like hardware, it has very low overhead. Our approach introduces two new instructions: one that indicates whether a load instruction should create a log entry, and a log-flush instruction that copies a cache line to the log. We add hardware support, primarily within the core, to manage the execution of these instructions and the critical ordering requirements between logging operations and updates to data. We also propose a novel optimization at the memory controller that is enabled by a battery-backed write pending queue. Our experiments show that our technique improves performance by 1.48× on average compared to a system without hardware logging, and is 10.5% faster than ATOM. A significant advantage of our approach is that it can drop writes to the log when they are not needed.
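As background for the logging discussion, the plain-Python sketch below illustrates undo logging for a transaction: the old value must be recorded (and, on real NVMM, flushed to persistence) before the in-place update, so that an abort can roll the data back. The class and method names are ours for illustration; the talk's contribution is moving log-entry creation and log flushing into hardware instructions, which this software analogy does not model.

```python
# Software analogy of undo logging for a durable transaction.
# The key ordering requirement: log the old value BEFORE writing
# the new one, or recovery after a crash/abort is impossible.

class UndoLog:
    def __init__(self, data):
        self.data = data   # stands in for NVMM-resident state
        self.log = []      # stands in for the persistent undo log

    def store(self, key, value):
        # Create the log entry first (old value, or None if absent)...
        self.log.append((key, self.data.get(key)))
        # ...then perform the in-place update.
        self.data[key] = value

    def commit(self):
        # Once committed, the undo entries are dead weight; hardware
        # that knows this can drop the corresponding log writes.
        self.log.clear()

    def abort(self):
        # Roll back in reverse order so earlier entries win.
        for key, old in reversed(self.log):
            if old is None:
                self.data.pop(key, None)
            else:
                self.data[key] = old
        self.log.clear()
```

In the hardware scheme described in the talk, the "append to log" and "flush log entry" steps become individual instructions with the ordering enforced by the core, rather than library code on the critical path.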
Yan Solihin is a Program Director in the Division of Computer and Network Systems (CNS) at the National Science Foundation. His responsibilities include managing the Computer Systems Research (CSR) cluster, Scalable Parallelism in the eXtreme (SPX), and Secure and Trustworthy Cyberspace (SaTC) programs, among others. He is also a Professor of Electrical and Computer Engineering at North Carolina State University.
He obtained a B.S. in computer science from Institut Teknologi Bandung in 1995, a B.S. in mathematics from Universitas Terbuka Indonesia in 1995, an M.A.Sc. in computer engineering from Nanyang Technological University in 1997, and M.S. and Ph.D. degrees in computer science from the University of Illinois at Urbana-Champaign in 1999 and 2002. He is a recipient of the 2010 and 2005 IBM Faculty Partnership Awards, the 2004 NSF Faculty Early Career Award, and the 1997 AT&T Leadership Award, and is listed in the HPCA Hall of Fame. He is a senior member of the IEEE. His research interests include computer architecture, memory hierarchy design, non-volatile memory architecture, programming models, and workload cloning. He has published more than 50 papers in computer architecture and performance modeling, and has released several software packages to the public: ACAPP, a cache performance model toolset; HeapServer, a secure heap management library; Scaltool, a parallel program scalability pinpointer; and Fodex, a forensic document examination toolset. He has written two graduate-level textbooks, including Fundamentals of Parallel Multicore Architecture (CRC Press, 2015).

2017 - Texas A&M University - Parasol Laboratory