****************************** Solution E ******************************

I. Benchmark Results

Below are the results from comparing the performance of a Linux lab
computer, bill, and two servers, grendel and ozark:

    Tprime,lab     = 6.12        Tgauss,lab     = 63.60
    Tprime,grendel = 0.00        Tgauss,grendel = 376.37
    Tprime,ozark   = 0.00        Tgauss,ozark   = 305.41

    Tprime,grendel / Tprime,lab = 0.00
    Tprime,ozark   / Tprime,lab = 0.00
    Tgauss,grendel / Tgauss,lab = 5.92
    Tgauss,ozark   / Tgauss,lab = 4.80

II. Method and Analysis

All of these results were obtained with the Linux command time, and are
the user-time figures from running each program on the machine. Below is
an example of the verbose output for the Gaussian run on grendel
(time -v ./a.out):

    Command being timed: "./a.out"
    User time (seconds): 376.37
    System time (seconds): 0.34
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 6:17.56
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 0
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 139
    Voluntary context switches: 2
    Involuntary context switches: 9034
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

User time was chosen rather than wall-clock time because there were
significant differences between the real time and the user time. Since
each benchmark is computationally heavy, it is worth noting that this
difference could be due to memory allocations made to carry out the
computations, or to other kernel interaction. However, without examining
which system calls were made, it is hard to determine whether the
difference was due to memory allocations or to ordinary process
overhead.
To investigate this, and also to determine whether the high number of
involuntary context switches played a role, the Gaussian test was rerun
under strace on grendel to see what actions the kernel took on behalf of
the program. Below is that output:

    $ strace ./a.out
    execve("./a.out", ["./a.out"], [/* 22 vars */]) = 0
    uname({sys="Linux", node="grendel.cs.hendrix.edu", ...}) = 0
    brk(0) = 0x501000
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a95556000
    access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
    open("/etc/ld.so.cache", O_RDONLY) = 3
    fstat(3, {st_mode=S_IFREG|0644, st_size=128416, ...}) = 0
    mmap(NULL, 128416, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2a95557000
    close(3) = 0
    open("/lib64/tls/libm.so.6", O_RDONLY) = 3
    read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360=0\3757\0\0\0"..., 832) = 832
    fstat(3, {st_mode=S_IFREG|0755, st_size=643240, ...}) = 0
    mmap(0x37fd300000, 1593800, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x37fd300000
    mprotect(0x37fd385000, 1049032, PROT_NONE) = 0
    mmap(0x37fd484000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x84000) = 0x37fd484000
    close(3) = 0
    open("/lib64/tls/libc.so.6", O_RDONLY) = 3
    read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\304\341\3747\0\0\0"..., 832) = 832
    fstat(3, {st_mode=S_IFREG|0755, st_size=1635416, ...}) = 0
    mmap(0x37fce00000, 2326568, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x37fce00000
    mprotect(0x37fcf2f000, 1085480, PROT_NONE) = 0
    mmap(0x37fd02f000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x12f000) = 0x37fd02f000
    mmap(0x37fd034000, 16424, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x37fd034000
    close(3) = 0
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a95577000
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a95578000
    mprotect(0x37fd02f000, 12288, PROT_READ) = 0
    mprotect(0x37fd484000, 4096, PROT_READ) = 0
    mprotect(0x37fcd14000, 4096, PROT_READ) = 0
    arch_prctl(ARCH_SET_FS, 0x2a95577dc0) = 0
    munmap(0x2a95557000, 128416) = 0
    fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a95557000
    write(1, "Gaussian with n = 1920863359: 0."..., 39Gaussian with n = 1920863359: 0.000000
    ) = 39
    exit_group(0) = ?

This strace output indicates that much of the system time was spent
mapping memory (mmap, mprotect) in order to load and execute the
program. This further suggests that the user time, the time actually
spent computing the results, is the more accurate measurement. However,
while using the time command to measure the floating-point computation
was insightful, as the results above show, it shed little light on the
performance of the prime benchmark, other than indicating that integer
computation on the Linux lab machine is comparatively slow.

III. Appendix

Source code:

i. runBenchmarks.pl

#!/usr/bin/perl
# Very simple Perl script to compile and run the two benchmarks locally.
use Cwd;

my $path = cwd;
my $primesFile = "$path/primesBenchmark.c";
my $gaussFile  = "$path/gaussianBenchmark.c";

system "gcc $gaussFile -lm";
system "/usr/bin/time -v ./a.out";  # external time(1); the bash built-in does not accept -v
system "gcc $primesFile -lm";
system "/usr/bin/time -v ./a.out";

---------------------------------------

ii. primesBenchmark.c

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main()
{
    srand(time(0));
    int n = rand();
    int d = 2;

    /* Trial division up to sqrt(n); note d * d, not d ^ 2 (^ is XOR in C). */
    while (d * d <= n) {
        if (!(n % d)) {
            printf("Primes with n = %d: False\n", n);
            return 0;
        }
        d++;
    }
    printf("Primes with n = %d: True\n", n);
    return 0;
}

---------------------------------------

iii.
gaussianBenchmark.c

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define _USE_MATH_DEFINES
#include <math.h>

/* Integrand: e^(-x^2 / 2) */
double f(double arg)
{
    double power = -1 * pow(arg, 2) / 2.0;
    return pow(M_E, power);
}

int main()
{
    srand(time(0));
    int n = rand();
    double sum = f(-2) + f(2);

    /* 4.0 / n, not 4 / n: integer division truncates the step to 0 for n > 4. */
    double delta = 4.0 / n;
    for (int i = 1; i < n; i++) {
        sum += 3 * f(-2 + i * delta);
    }
    double ret = delta * sum / 3;
    printf("Gaussian with n = %d: %f\n", n, ret);
    return 0;
}