December 9, 2004

Benchmarking With FreeBSD

Filed under: Articles — @ 7:05 pm

Recently someone on the FreeBSD-Current mailing list (Poul-Henning Kamp) suggested some tips for precision benchmarking in FreeBSD. This was a veritable gold mine for me as I was gearing up to conduct a salvo of benchmarks on three different systems to compare performance in a variety of areas, but during my testing I found that most of these tips were useless or ill-advised for testing hardware. It occurred to me too late that the author of the post probably had software benchmarking in mind when he wrote it, and unfortunately that has little to do with performance testing of hardware. So, in the spirit of Bruce Lee, I took what was useful and worked from there; I read over the message thread, took into account all of the suggestions made by various people on the list and adapted them for my hardware benchmarking project.


Why FreeBSD?

Before I continue, I’d like to elaborate on why I chose FreeBSD as a benchmarking platform. The original reason was that it supports both the AMD64 and IA32 (i386) architectures, and the purpose of the benchmarking project was to compare the performance of an Athlon 64 machine in i386 and AMD64 modes. I also wanted to compare these two setups with a Pentium4 3.2E system to discover whether Hyper-Threading or 64-bit extensions mattered more to computing power. Microsoft operating systems available at the time of the project were not able to run in AMD64 mode, and even if they had been, there was no 64-bit capable benchmarking software to use on a Windows platform. So the first goal was to find an OS that could use these two machines in the required modes, and the second goal was to find relevant benchmarking methods that could show the performance difference between the configurations.

GNU/Linux was an option (specifically Gentoo Linux), but it wasn’t mature enough at the time of testing and it didn’t offer much to me in the way of benchmarking. NetBSD was also a consideration because it supports so many architectures and has been working with AMD64 longer than most other OSes. This was particularly attractive to me because I could also benchmark machines based on the SPARC, POWER, and MIPS architectures and compare them all. This would have worked except that NetBSD didn’t have an official release for AMD64 when I was ready to test, so I would have had to use experimental code. I would also have had trouble getting exactly the same code onto each machine because it changes so quickly.

FreeBSD already had an AMD64 release (two, actually) and it worked terrifically for my purposes. When I started testing I was using 5.2-RELEASE, but I switched to and retested with 5.2.1-RELEASE when it became available. FreeBSD was perfect because I could use the actual release (guaranteeing the same age and quality of the code for both AMD64 and i386), and the ports tree had a number of excellent benchmark tests to choose from.

The FreeBSD base system comes with OpenSSL, which offers an excellent benchmarking mode. It also includes the old Unix time command, which is essential for stopwatch tests. So, all things considered, FreeBSD was the best operating system for the project.

Take What’s Useful

Overall I found the stated benchmarking guidelines to be overzealous, warning the tester about possibilities based on theories. You can ignore many of those tips for hardware benchmarking, as they’re completely superfluous or just plain wrong. In hardware benchmarking, all the operating system needs to do is provide a stable (that is, uniform and consistent) platform to run your tests on. There must be no background noise, and everything has to be the same every time you run a test. That means you’ll want to start FreeBSD in single-user mode to avoid all of the background processes that start automatically through rc.conf and cron. Your network interface will not be up, and your operating system will remain idle until you tell it to do something.

One thing you do not want to do (contrary to the post referenced above) is disable ACPI support in BIOS. In FreeBSD 5.2 and 5.2.1, the ACPI code is tied to other subsystems like SMPng and ATAng. I tried several different times with two different systems (one AMD64, the other a Pentium4 Prescott) to disable ACPI in BIOS and/or in FreeBSD and the results were less than desirable. In most cases my system was unable to boot because of a kernel trap error or problems (as usual) with the ATA subsystem. In other cases the system would crash while in use, sometimes resulting in data loss (in one case, severe data loss). If your BIOS is set to deactivate or reduce power to devices that are idle, this should not really affect your testing procedure as you’ll be restarting after every test and running your benchmarks as soon as the prompt is up. Just in case, you can safely disable any wake or sleep functions in the BIOS, but you’ll want to leave ACPI and APIC support enabled if you have these options available. You don’t want APM, which is the older power management system; APM can be safely disabled without any adverse effects.

There is no harm in mounting all of the usual filesystems unless some are on other disks. You really want to use one hard drive for your testing (unless you’re testing hard drive performance, in which case you might have different kinds of configurations depending on your method of testing) and you want all of your filesystems to be on it. Spinning up multiple hard disks can cause power fluctuations and extra I/O traffic that can alter your results. When you install FreeBSD on the drive, use the default partition configuration and when you’re ready to boot into single-user mode, mount all filesystems before running your tests.
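The single-user preparation described above might look like the following sketch. It assumes the default single-user /bin/sh prompt, which you reach by typing boot -s at the loader prompt. The function name prepare_single_user is made up for illustration, and enabling swap is my own assumption rather than something the procedure above requires.

```sh
# Sketch: bring a freshly booted single-user system to the state used for testing.
# Reach single-user mode first by typing "boot -s" at the loader prompt.
prepare_single_user() {
    # Check the filesystems first; the root filesystem starts out read-only.
    fsck -p || return 1
    # Remount root read-write, then mount the remaining UFS filesystems.
    mount -u / || return 1
    mount -a -t ufs || return 1
    # Assumption: enable swap so that large builds behave as they normally would.
    swapon -a
}
```

Defining the function does nothing by itself; it only runs when you type prepare_single_user at the prompt. Typing the same four commands by hand works just as well.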

There is no need to disable, disconnect or remove unnecessary hardware as long as it does not interfere with your testing. If you have a CDRW drive in your test system but will not be using it for testing, leave it in the system. Just don’t push the eject button at any time before or during testing, as this causes the system to address the drive and search for a CD. Even if the presence of a CD drive did affect the results, it would affect them equally on all systems, as they’ll all have (ideally) the same CD drive. And if you leave the CD drive out, it could skew the results in a way that makes them irrelevant to a real-world scenario. A CD drive that is not in use is not a variable.

Don’t touch the keyboard or mouse, and don’t interfere with the system while the tests are running. Although this generally will not skew the results to a significant degree, there is certainly no reason to take the risk.

There is something to be said for keeping the room temperature consistent, but to a certain degree (no pun intended) that’s beyond your control. You don’t want it to be unusually hot or cold — just normal room temperature throughout. This is especially important with high-temperature CPUs like those based on Intel’s Prescott core.

Restart the machine after every test. This ensures that your hard drive and CPU caches aren’t going to interfere with your data. This is particularly important when running synthetic benchmarks.

Configuring FreeBSD For Testing

There are only two main files that you’ll need to edit: /etc/make.conf and your kernel configuration file. You’ll probably need to copy over a make.conf file from /usr/share/examples/etc/ and while you’re at it, you’ll probably need a supfile or two for cvsup. Those can be found in /usr/share/examples/cvsup/ and all you really need to change is the host name for the server — cvsup1 or cvsup2 should be fine.

In configuring your kernel, disable all debugging options (like WITNESS and INVARIANTS, for instance) and drivers for hardware you don’t have. If you’re testing multiple machines with the same installation of FreeBSD, there is no need to recompile the kernel as long as it includes driver support for all of the devices you’ll need. At first you may need networking to update your ports tree or source tree and install programs, so don’t compile it out. Remember that in single-user mode there are no background processes running and the network will not be enabled, so it doesn’t matter whether your network driver is compiled in or not. You do want to try to use the same kernel (or kernel config) for all machines, if possible.

There are two process schedulers in FreeBSD: the older SCHED_4BSD and the newer SCHED_ULE. Experienced FreeBSD hackers have told me that on fast computers with high-end hard drives (SATA or SCSI-320 RAID arrays) the ULE scheduler is a better performer. I’ve found in my testing that the 4BSD scheduler seems to do better when multitasking or otherwise running multiple concurrent processes, but you should do your own testing to determine which one to use for your project. Ideally you would do all of your testing with both schedulers to satisfy those who are partial to one or the other.

Recommended Tests

If you’re testing a certain piece of hardware like a video card or hard drive, performance testing isn’t all that difficult to figure out on your own. The /usr/ports/benchmarks/ directory has some synthetic tests that you can experiment with, and from there you can decide what is most useful for your project. You should also visit the SPEC website to see if their standardized tests fit your situation. If you’re testing overall system performance versus another system, the testing methods aren’t quite as straightforward. You’ll need to consider several different angles to show how they differ in terms of performance.

Most of your tests will involve timing certain operations, such as compiling programs with a standard compiler or encoding audio or video, or running synthetic benchmark utilities that measure how many operations of a certain type can be performed in a given timeframe. Other benchmark methods will show how much data can be transferred, processed or rendered using a given configuration.

To test how long an operation takes, put the time command in front of it. If your systems are on the speedy side, compiling the base system is an excellent way to show how long a lengthy compile will take on each of them. If they’re the same architecture they’ll use the same source code and the same compiler, and the test takes anywhere from 30 minutes to an hour depending on the speed of your test systems. Simply cd to the /usr/src/ directory and type time make buildworld; when it has finished compiling, it will show three numbers. The first is the real (wall-clock) time in seconds that the operation took to complete; the second is the CPU time spent in user mode, which here means the compiler and other build tools themselves; and the third is the CPU time spent in the kernel on behalf of the build. This is the order printed by /usr/bin/time, which is what the single-user /bin/sh runs; the time command built into csh prints its fields in a different order. The important number is the first one, which is what you’ll be comparing between systems. Buildworld tests between different architectures can be somewhat inaccurate, as you’re also testing the speed of the compiler, which will compile code differently for different processor architectures.
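If you would rather keep the timing in a file than copy it off the screen, something like the following sketch could work. It assumes /usr/bin/time and its POSIX -p flag, which prints three lines labelled real, user and sys. The log file names and the real_seconds helper are hypothetical.

```sh
# Typed at the single-user prompt (this snippet does not run the build itself):
#   cd /usr/src
#   /usr/bin/time -p sh -c 'make buildworld > /var/tmp/buildworld.log 2>&1' \
#       2> /var/tmp/buildworld.time
# The inner sh keeps the build output in its own log, so the .time file
# contains only the three lines written by time.

# Print the wall-clock ("real") seconds from a saved time -p report.
real_seconds() {
    awk '$1 == "real" { print $2 }' "$1"
}
```

After the build finishes, real_seconds /var/tmp/buildworld.time prints the one number you compare between systems.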

If you end up using the buildworld test, make sure you type make clean after each iteration to erase the programs generated by the buildworld process you just performed.

It may also be useful to perform the buildworld procedure with varying numbers of concurrent processes to test multitasking and SMP performance (and to stress the CPU a little more than usual). This is accomplished by using the -j switch. If you want to test with two concurrent processes, you’d type in time make buildworld -j2 and so on. In most single or dual-CPU systems you won’t need or want to go any higher than -j4 as the build can fail or cease to produce results that differ significantly from the previous numbers.

There are a lot of synthetic benchmarks in /usr/ports/benchmarks/ that are worth looking at, even if most of them may be either misleading or useless to your project. I used ubench and stream for my performance testing. Ubench is a very old and well-known test that produces numeric scores for the CPU and RAM, although I found that the memory test portion would exit on a signal 6 or 11 on more powerful x86 and AMD64 systems, and it provided obviously incorrect results for the AMD64 architecture. Stream is also an older test that measures memory bandwidth, an important and often overlooked facet of system performance. If you don’t configure it properly it will provide inconsistent or incorrect results, and the project website may be inaccessible to some people due to problems with the web server where it is located. I found the stream results to be off by about 1000 MB/sec for all test cases. It may be a good idea to leave out synthetic benchmarking entirely unless you can get stream to work correctly or discover other tests that you have a good reason to believe are effective. Relying entirely on synthetic tests is a very big mistake.

Stream required some adjustment in order to produce more accurate results for my project. Since the CPUs in question have large caches, the size of the test array had to be increased. This is done in /usr/ports/benchmarks/stream/files/Makefile, and the variable to edit is N. The above-referenced stream website, if it is accessible to you (for some reason it won’t accept a connection from my machine), may include documentation that you can follow in order to configure the program for your needs.
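No target value for N is given here. One rule of thumb, taken from the comments in current versions of stream.c rather than from my own testing, is that each array should be at least four times the size of the CPU cache. The sketch below turns that into a minimum element count, assuming 8-byte double-precision elements; the function name stream_min_n is made up.

```sh
# Smallest STREAM array length (in elements) that is four times the cache size.
# $1 = total cache size in KiB; each element is assumed to be an 8-byte double.
stream_min_n() {
    echo $(( $1 * 1024 * 4 / 8 ))
}
```

For a CPU with 1024 KiB of cache, stream_min_n 1024 prints 524288. Treat that as a floor rather than a recommendation, and check whether the value of N already in the port’s Makefile exceeds it before changing anything.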

You might also include some real-world benchmarks in your testing. OpenSSL is part of the base system and has a built-in benchmark command. To run it, type openssl speed although you may want to pipe the results to a text file because they’re quite extensive: openssl speed >results.txt is the command I used for benchmarking. OpenSSL is particularly affected by CPU-specific optimizations, so it’s an excellent test to use when comparing performance between separate architectures.

Some of the best real-world tests are encoding utilities for audio and video. Some good ones to use for audio are oggenc (part of the vorbis-tools port), lame and bladeenc. For video encoding you might consider using mencoder (which is part of the mplayer port), ffmpeg, and transcode.

Testing Procedure

It’s important to develop a standard procedure which ensures consistently fair results. As mentioned above, you’ll want to start in single-user mode and you should only do one test iteration per reboot.

Ideally you’ll want to run each test a minimum of three times to ensure consistent results. If you find that your results are varying greatly, you have a problem someplace and you should discard your results and cease testing until the problem is found and fixed. If all goes well, average your results and use that final number as your basis for comparison. If you’re only comparing two test cases, it might be helpful to create statistics with the numbers you’ve gathered. The ministat program, found in /usr/src/tools/tools/ministat/ is an outstanding program for this purpose. It requires at least three numbers and two test cases; you put the three numbers for the first case into one file and the three numbers from the second case into another file, and when you feed them to ministat it will give you some useful statistics based on your results.
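Averaging and the check for results that vary too much can be done with awk from the base system. This is a minimal sketch, assuming one result per line in a plain text file; the function name summarize_runs and the choice of (max - min) / mean as the measure of spread are mine, not part of ministat.

```sh
# Print the run count, the mean, and the relative spread ((max - min) / mean,
# as a percentage) of the numbers in a file that holds one result per line.
summarize_runs() {
    awk '
        NF == 0 { next }                       # skip blank lines
        {
            sum += $1
            if (n == 0 || $1 < min) min = $1
            if (n == 0 || $1 > max) max = $1
            n++
        }
        END {
            if (n == 0) { print "no data"; exit 1 }
            mean = sum / n
            printf "runs=%d mean=%.2f spread=%.1f%%\n", n, mean, (max - min) / mean * 100
        }
    ' "$1"
}
```

With the three results for each test case saved in their own files (the names amd64.txt and i386.txt are hypothetical), summarize_runs amd64.txt gives the average for one case, and ministat amd64.txt i386.txt compares the two. How large a spread counts as a problem is a judgment call that depends on the test.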

It’s not always a good idea to keep everything in text files. Whenever possible you should record your results in a notebook (not on loose paper, which has a tendency to get lost or damaged). While this will make the transfer to your article or report more time-consuming, your data will be safer. Some tests — like buildworld — take a long time to complete, and a whole data set can take hours or days to acquire. It would really be a shame to lose all of that data to carelessness or error.

I like to start my testing after the machine has been on for at least a half an hour so that it is up to normal operating temperature. Differences in temperature can show in your results and cause problems with inconsistencies in your data. You can also carefully preheat your CPU’s heatsink with a hair dryer (make sure the power is off first) if you like, but just leaving the system on or doing an initial test run to get it warmed up will probably work better (although you should remember to discard the results if you do decide to do a test run before your real testing).

Conclusion

All of the programs you use will have to be standardized and configured accordingly, and to that end you may have to run around to various websites and read all manner of documentation to gain an understanding of how these utilities are used. It’s important to have patience as you figure out the right ingredients for the mix. I had to spend days testing different methods and comparing results before I found a procedure that worked well for the results that I wanted to obtain — and then when I was done with the project, I found that I could have added more tests that would have made the project more relevant.

One of the most important steps in a benchmarking project is to make the results public for peer review. If you’re going to be publishing your project as an article, review or report, this last step is especially important. You should find a newsgroup, message forum or mailing list relevant to your project and publish your initial findings there, asking for comments or corrections. Even a well-researched and carefully constructed benchmarking project can have flaws that you don’t know about or can’t see, and others who have experience are invaluable resources for last-minute gotchas such as those.

After your project is published you should make your raw data available to others so that they can use it as a basis for comparison for their own projects, or for programming or debugging purposes.

A benchmarking project is a huge undertaking, and FreeBSD is well-equipped to help you with your work. If you know how to use it properly, it can be one of the best operating systems to use for hardware and software performance benchmarking.


Copyright 2004 Jem Matzan. Verbatim copying and redistribution of this entire article are permitted without royalty in any medium provided this notice is preserved.
