Algorithmic skeletons are high-level representations for parallel programs that hide the underlying parallelism details from program specification. These skeletons are defined in terms of higher-order functions that can be composed to build larger programs. Many skeleton frameworks support efficient implementations for stand-alone skeletons such as
map, reduce, and zip for both shared-memory systems and small clusters. However, in these frameworks, expressing complex skeletons that are constructed through composition
of fundamental skeletons either requires complete reimplementation or suffers from limited scalability due to required global synchronization. In the stapl Skeleton Framework,
we represent skeletons as parametric data flow graphs and describe composition of skeletons by point-to-point dependencies of their data flow graph representations. As a result,
we eliminate the need for reimplementation and global synchronizations in composed skeletons. In this work, we describe the process of translating skeleton-based programs to
data flow graphs and define rules for skeleton composition. To show the expressivity and ease of use of our framework, we show skeleton-based representations of the NAS EP, IS,
and FT benchmarks. To show reusability and applicability of our framework on real-world applications we show an NBody application using the FMM (Fast Multipole Method) hierarchical algorithm. Our results show that expressivity can be achieved without loss of performance even in complex real-world applications.