ARMI is a communication library that provides a framework for expressing
fine-grain parallelism and mapping it to a particular machine using
shared-memory and message-passing library calls. The library is an advanced
implementation of the RMI protocol and handles low-level details such as
scheduling incoming communication and aggregating outgoing communication to
coarsen parallelism. These details can be tuned for different
platforms to allow user codes to achieve the highest performance possible
without manual modification. ARMI is used by STAPL, our generic parallel
library, to provide a portable, user-transparent communication layer. We
present the basic design as well as the mechanisms used in the current
Pthreads/OpenMP and MPI implementations and in a combination of the two.
Performance comparisons between ARMI and explicit use of Pthreads or MPI are
given on a variety of machines, including an HP-V2200, an Origin 3800, an IBM
Regatta, and an IBM RS/6000 SP cluster.
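
To make the idea of aggregating outgoing communication concrete, the following is a minimal C++ sketch, not ARMI's actual interface. All names (AggregatingChannel, async_call, flush, execute_batch) and the fixed-count threshold are assumptions made for illustration: fine-grain requests are buffered and shipped as one coarser message once a tunable number have accumulated, and the transport callback stands in for an MPI send or a shared-memory handoff.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is an illustration, not ARMI's API.
using Request = std::function<void()>;         // one buffered remote method call
using Batch = std::vector<Request>;            // several requests sent as one message
using Transport = std::function<void(Batch)>;  // stand-in for an MPI send or shared-memory handoff

class AggregatingChannel {
public:
    // threshold: number of requests to collect before sending (the tunable knob).
    AggregatingChannel(std::size_t threshold, Transport send)
        : threshold_(threshold), send_(std::move(send)) {
        assert(threshold_ > 0);
    }

    // Queue a request; ship the whole batch once the threshold is reached.
    void async_call(Request r) {
        pending_.push_back(std::move(r));
        if (pending_.size() >= threshold_) flush();
    }

    // Send whatever is currently buffered as a single message.
    void flush() {
        if (pending_.empty()) return;
        Batch out;
        out.swap(pending_);  // leave the buffer empty before handing off
        ++messages_sent_;
        send_(std::move(out));
    }

    std::size_t messages_sent() const { return messages_sent_; }

private:
    std::size_t threshold_;
    Transport send_;
    Batch pending_;
    std::size_t messages_sent_ = 0;
};

// Receiver side: execute every request of an incoming batch, in order.
inline void execute_batch(const Batch& batch) {
    for (const Request& r : batch) r();
}
```

With a threshold of 4, ten calls to async_call produce two full messages, and an explicit flush sends the remaining two requests as a third. Raising or lowering the threshold per platform is the kind of tuning the text describes, done here without touching the calling code.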