Initial MASS Timing Tests with Portland Group Fortran Compilers on a Linux/Dual-Pentium System
A series of timing tests was made on a dual Pentium system as an initial test of performance using the family of Portland Group Fortran compilers. A set of 36 hr MASS simulations was performed on a 135x85x20 grid with 44 km horizontal resolution. The runs were initialized at 0000 UTC 17 December 1998 with an objective analysis of rawinsonde data, using NCEP Eta gridded data as a first guess. The same initial and boundary conditions were used for each simulation. Version 5.11.3 of MASS was used, with the Blackadar PBL and diagnostic microphysics schemes.
The system used was sirocco.meso.ncsu.edu, which has two 350 MHz Pentium II processors. It has Windows 98 installed on one partition and Red Hat Linux 5.2 on another. The Fortran 77 (pgf77) and Fortran 90 (pgf90) compilers from the Portland Group, Inc. were used. Tests with High Performance Fortran (pghpf) were also attempted; although the code compiled successfully, every pghpf simulation failed immediately for unknown reasons, so pghpf testing has been deferred.

Figure 1. Performance tests of MASS model on dual Pentium Linux system.
The compilers have a large set of optimization options, which are described in the documentation. Figure 1 shows the performance of the various MASS simulations. Performance is compared using the ratio of simulated time to the wall-clock time required for the simulation, so a higher ratio means a faster run. For instance, the fastest run took 1 hour and 56 minutes for the 36 hr simulation, giving a ratio of 2160 min / 116 min = 18.6. Using the compiler's automatic parallelization with the second processor improved performance by about 16%, well short of the factor of two that would be ideal for two processors, so either the code does not parallelize very well or the compiler is not very effective at parallelizing this code. The pgf90 vectorization run is only very slightly faster than a non-vectorized run, which probably just reflects the fact that the Pentium is not a vector processor. The pgf77 runs were slightly faster than the pgf90 runs. Below is a table of the compiler options used for each of the simulations shown in Figure 1.
Table 1. Compiler options used for Portland Group compiler tests.
| Description | Compiler Command | Notes |
|---|---|---|
| pgf77 parallelization on 2 processors | pgf77 -Mconcur | |
| pgf90 parallelization on 2 processors | pgf90 -Mconcur | Environment variable must be set: setenv NCPUS 2 |
| pgf77 recommended optimization | pgf77 -O2 -Munroll -tp p6 -Mnoframe | |
| pgf90 vectorization | pgf90 -Mvect | |
| pgf90 recommended optimization | pgf90 -O2 -Munroll -tp p6 -Mnoframe | |
| pgf90 no optimization | pgf90 | |
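The performance ratio used in Figure 1, and what the roughly 16% gain from the second processor implies, can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the Amdahl's-law estimate is an added interpretation (not part of the original tests) that assumes the 16% gain corresponds to a speedup of 1.16 and that only the parallelized portion of the code benefits from the second CPU.

```python
def performance_ratio(simulated_minutes: float, wall_clock_minutes: float) -> float:
    """Ratio of simulated time to wall-clock time; higher means a faster run."""
    return simulated_minutes / wall_clock_minutes


def amdahl_parallel_fraction(speedup: float, n_procs: int) -> float:
    """Invert Amdahl's law, S = 1 / ((1 - p) + p / N), to solve for p.

    Assumption: the observed speedup comes entirely from the fraction p of
    the run time that executes in parallel on n_procs processors.
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_procs)


if __name__ == "__main__":
    # Fastest run from the text: 36 h simulated in 1 h 56 min of wall-clock time.
    simulated = 36 * 60        # 2160 min
    wall_clock = 1 * 60 + 56   # 116 min
    ratio = performance_ratio(simulated, wall_clock)
    print(f"ratio = {ratio:.1f}")  # ratio = 18.6

    # Assumed: a ~16% improvement from the second processor means S = 1.16.
    p = amdahl_parallel_fraction(1.16, 2)
    print(f"implied parallel fraction = {p:.2f}")  # implied parallel fraction = 0.28
```

Under these assumptions, only about 28% of the run time would be effectively parallelized, which is consistent with the conclusion that either the code or the automatic parallelization is not very effective here.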
A debugger and a code profiler are included with the Portland Group compilers. Figure 2 shows a sample of the profiler output, a simple bar chart displayed by an X Windows application. To use the profiler, the code is compiled with -Mprof=func on the command line; running the program then produces a file named pgprof.out. The user then executes pgprof in the same directory to view the results. Optionally, the results can be written to a tabular text file. Subroutines are sorted so that those using the most time appear at the top.

Figure 2. Sample output of Portland Group code profiler.
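The ordering used in the profiler display (subroutines sorted by time, most expensive first, each shown with a bar) can be illustrated with a short Python sketch. It does not read pgprof.out, whose format is not described here; it assumes per-subroutine times have already been collected into a dictionary. The subroutine names and timings in the usage example are hypothetical placeholders, not MASS measurements.

```python
def flat_profile(times: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (name, seconds, percent of total) rows, most expensive first."""
    total = sum(times.values())
    rows = [(name, secs, 100.0 * secs / total) for name, secs in times.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def print_profile(rows: list[tuple[str, float, float]], width: int = 40) -> None:
    """Print a simple text bar chart, one line per subroutine."""
    for name, secs, pct in rows:
        bar = "#" * round(width * pct / 100.0)
        print(f"{name:<10} {secs:8.1f} s {pct:5.1f}% {bar}")


if __name__ == "__main__":
    # Hypothetical subroutine names and times, for illustration only.
    print_profile(flat_profile({"sub_a": 12.0, "sub_b": 30.0, "sub_c": 8.0}))
```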
Ken Waight
13 January, 1999