Micro-benchmark suite to characterize the overhead in Java message-passing libraries. Code available for MPICH, mpiJava, CCJ and JMPI.
In order to characterize Java message-passing performance, we have followed the same approach as in  and , where the performance of MPI C routines was modeled on a Fast Ethernet cluster (only MPI-I/O primitives) and on the Fujitsu AP3000 multicomputer, respectively.
Thus, in point-to-point communications, message latency $T$ can be modeled as an affine function of the message length $n$: $T(n) = t_s + t_b n$, where $t_s$ is the startup time and $t_b$ is the transfer time per data unit (one byte from now on). Communication bandwidth is then easily derived as $Bw(n) = n / T(n)$. A generalization of the point-to-point model is used to characterize collective communications: $T(n,p) = t_s(p) + t_b(p)\,n$, where $p$ is the number of processors involved in the communication.
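To make the model concrete, the following Java sketch evaluates $T(n)$ and $Bw(n)$ for given parameters, and the collective form $T(n,p)$ with $t_s(p)$ and $t_b(p)$ supplied as functions of $p$. The class names `LatencyModel` and `CollectiveModel` are hypothetical, and the units (microseconds and bytes) are an assumption. Two quantities follow directly from the affine form: bandwidth approaches $1/t_b$ as $n$ grows, and reaches half of that value at $n = t_s / t_b$.

```java
import java.util.function.IntToDoubleFunction;

/**
 * Point-to-point model T(n) = ts + tb * n and derived bandwidth Bw(n) = n / T(n).
 * Assumed units: ts in microseconds, tb in microseconds per byte, so bandwidth
 * is in bytes per microsecond (numerically MB/s with 1 MB = 10^6 bytes).
 */
final class LatencyModel {
    final double ts; // startup time
    final double tb; // transfer time per byte

    LatencyModel(double ts, double tb) {
        this.ts = ts;
        this.tb = tb;
    }

    /** Predicted latency of an n-byte message. */
    double latency(long n) {
        return ts + tb * n;
    }

    /** Bw(n) = n / T(n); equals 0 for an empty message. */
    double bandwidth(long n) {
        return n / latency(n);
    }

    /** Limit of n / (ts + tb * n) as n grows: 1 / tb. */
    double asymptoticBandwidth() {
        return 1.0 / tb;
    }

    /** Solving n / (ts + tb * n) = 1 / (2 * tb) gives n = ts / tb. */
    double halfPeakLength() {
        return ts / tb;
    }
}

/** Collective model T(n, p) = ts(p) + tb(p) * n, with ts(p), tb(p) given as functions. */
final class CollectiveModel {
    final IntToDoubleFunction ts; // startup time as a function of p
    final IntToDoubleFunction tb; // per-byte time as a function of p

    CollectiveModel(IntToDoubleFunction ts, IntToDoubleFunction tb) {
        this.ts = ts;
        this.tb = tb;
    }

    double latency(long n, int p) {
        return ts.applyAsDouble(p) + tb.applyAsDouble(p) * n;
    }
}
```

For example, with $t_s = 50$ and $t_b = 0.01$, a 5000-byte message sits exactly at half of the asymptotic bandwidth of 100, which is why short messages are dominated by the startup term.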
The Low Level Operations section of the Java Grande Forum Benchmark Suite is not appropriate for our modeling purposes (e.g., it considers only seven primitives, and timing outliers are not discarded). We have therefore developed our own micro-benchmark suite, consisting of a set of tests adapted to our specific needs.

For point-to-point primitives, a ping-pong test takes 150 measurements of the execution time for each message size, varying the size in powers of four from 0 bytes to 1 MB. We chose as the test time the sextile value of the measurements sorted in increasing order, to avoid distortions due to timing outliers; we also checked that this value is statistically better than the mean or the median for deriving our models. Because Java's millisecond timing precision is not enough to measure short-message latencies, in those cases we timed several consecutive executions together to achieve higher precision. The parameters $t_s$ and $t_b$ were derived from a linear regression of $T$ vs $n$.

Similar tests were applied to collective primitives, additionally varying the number of processors (from 2 up to the number of available processors in our clusters). A barrier was included to avoid pipelining effects and to prevent the network contention that might arise from overlapping collective communications executed in different iterations of the test. The model parameters were obtained from a regression of $T$ vs $n$ and $p$. Double-precision addition was the operation used in the experiments with reduction primitives.
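The point-to-point measurement procedure can be sketched in Java as follows, assuming a hypothetical `Runnable` that performs one ping-pong exchange of the message under test (the real send and receive calls differ between MPICH, mpiJava, CCJ and JMPI). Two details are assumptions rather than facts from the text: the "sextile value" is read as the first sextile, i.e. the value at rank ceil(N/6) of the sorted measurements (the 25th of 150), and the batch size `reps` used to overcome millisecond resolution is left to the caller. `SextileTimer` and its methods are hypothetical names.

```java
import java.util.Arrays;

/**
 * Sketch of the ping-pong measurement procedure. The Runnable passed to
 * measure() is a hypothetical stand-in for one ping-pong exchange.
 */
final class SextileTimer {
    /** Measurements taken per message size, as stated in the text. */
    static final int MEASUREMENTS = 150;

    /** 0 bytes plus powers of four from 4^0 = 1 byte up to 4^10 bytes = 1 MB. */
    static int[] messageSizes() {
        int[] sizes = new int[12];
        sizes[0] = 0;
        for (int k = 0; k <= 10; k++) {
            sizes[k + 1] = 1 << (2 * k); // 4^k
        }
        return sizes;
    }

    /**
     * Assumption: "sextile value" means the first sextile, i.e. the value at
     * 1-based rank ceil(N / 6) of the ascending samples (25th of 150).
     */
    static double sextile(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (sorted.length + 5) / 6; // ceil(N / 6) for N > 0
        return sorted[rank - 1];
    }

    /**
     * Runs op reps times back to back under the millisecond clock and returns
     * the mean time per run in milliseconds. Batching is what makes
     * sub-millisecond latencies resolvable with currentTimeMillis().
     */
    static double timeBatch(Runnable op, int reps) {
        if (reps <= 0) {
            throw new IllegalArgumentException("reps must be positive");
        }
        long start = System.currentTimeMillis();
        for (int i = 0; i < reps; i++) {
            op.run();
        }
        long elapsed = System.currentTimeMillis() - start;
        return (double) elapsed / reps;
    }

    /** Takes MEASUREMENTS batched samples of op and returns their sextile. */
    static double measure(Runnable op, int reps) {
        double[] samples = new double[MEASUREMENTS];
        for (int i = 0; i < MEASUREMENTS; i++) {
            samples[i] = timeBatch(op, reps);
        }
        return sextile(samples);
    }
}
```

With a real ping-pong, one would call `measure` once per entry of `messageSizes()`, using a large `reps` for short messages and `reps = 1` for large ones. Picking a low order statistic instead of the mean discards the slow outliers in the upper tail of the distribution. On JVMs from Java 5 onward, `System.nanoTime()` offers a finer-grained timer that reduces the need for batching.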
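The regression step that yields $t_s$ and $t_b$ can be sketched as an ordinary least-squares fit of the measured $(n, T)$ pairs. The class `AffineFit` is hypothetical, and the text does not say whether the points were weighted; this sketch weights them equally, so the largest messages carry the most leverage on the fitted line.

```java
/** Ordinary least-squares fit of T = ts + tb * n (hypothetical helper). */
final class AffineFit {
    /** Returns {ts, tb} fitted to the pairs (n[i], t[i]). */
    static double[] fit(double[] n, double[] t) {
        if (n.length != t.length || n.length < 2) {
            throw new IllegalArgumentException("need at least two (n, T) pairs");
        }
        int m = n.length;
        double meanN = 0.0, meanT = 0.0;
        for (int i = 0; i < m; i++) {
            meanN += n[i];
            meanT += t[i];
        }
        meanN /= m;
        meanT /= m;

        // Centered sums: slope = Sxy / Sxx, intercept = meanT - slope * meanN.
        double sxx = 0.0, sxy = 0.0;
        for (int i = 0; i < m; i++) {
            double dx = n[i] - meanN;
            sxx += dx * dx;
            sxy += dx * (t[i] - meanT);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("message sizes must not all be equal");
        }
        double tb = sxy / sxx;
        double ts = meanT - tb * meanN;
        return new double[] {ts, tb};
    }
}
```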
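For collective primitives, the text fits $T$ against both $n$ and $p$ but does not state the functional form of $t_s(p)$ and $t_b(p)$. The sketch below assumes, purely for illustration, that both are linear in $p$, so that $T(n,p) = a_0 + a_1 p + (b_0 + b_1 p)\,n$, and fits the four coefficients by least squares through the normal equations. `CollectiveFit` and the coefficient names are hypothetical; another form (for instance, logarithmic in $p$) would only require changing `row`.

```java
/**
 * Least-squares fit of an ASSUMED collective model
 * T(n, p) = a0 + a1 * p + (b0 + b1 * p) * n  (hypothetical helper).
 */
final class CollectiveFit {
    /** Regressors for one measurement: {1, p, n, n * p}. */
    static double[] row(double n, int p) {
        return new double[] {1.0, p, n, n * p};
    }

    /** Solves (X^T X) beta = X^T t; returns {a0, a1, b0, b1}. */
    static double[] fit(double[] n, int[] p, double[] t) {
        final int k = 4;
        double[][] a = new double[k][k + 1]; // augmented matrix [X^T X | X^T t]
        for (int i = 0; i < t.length; i++) {
            double[] x = row(n[i], p[i]);
            for (int r = 0; r < k; r++) {
                for (int c = 0; c < k; c++) {
                    a[r][c] += x[r] * x[c];
                }
                a[r][k] += x[r] * t[i];
            }
        }
        // Gaussian elimination with partial pivoting.
        for (int col = 0; col < k; col++) {
            int piv = col;
            for (int r = col + 1; r < k; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
            }
            double[] tmp = a[col];
            a[col] = a[piv];
            a[piv] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) { // crude singularity check
                throw new IllegalArgumentException("vary both n and p");
            }
            for (int r = col + 1; r < k; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= k; c++) {
                    a[r][c] -= f * a[col][c];
                }
            }
        }
        // Back substitution.
        double[] beta = new double[k];
        for (int r = k - 1; r >= 0; r--) {
            double s = a[r][k];
            for (int c = r + 1; c < k; c++) {
                s -= a[r][c] * beta[c];
            }
            beta[r] = s / a[r][r];
        }
        return beta;
    }

    /** Evaluates the fitted model at (n, p). */
    static double latency(double[] beta, double n, int p) {
        double[] x = row(n, p);
        double s = 0.0;
        for (int i = 0; i < x.length; i++) {
            s += beta[i] * x[i];
        }
        return s;
    }
}
```

Forming the normal equations squares the condition number of the problem, which is acceptable for a small sketch; a QR-based solver would be the more robust choice for real data spanning 0 bytes to 1 MB.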