On Benchmarking of Embedded Systems Processors

Benchmarking is the process of comparing two or more systems to determine which is more efficient and/or provides better performance. Clearly, for a benchmark to have any value, you need to understand what is actually being measured and under what conditions.
There are valid reasons for benchmarking embedded systems, not the least of which is that benchmarking is in itself interesting. In computing systems, most benchmarking is done to measure CPU performance and memory consumption. In this post, I describe the process and some of the benchmarks used at A-WIT to choose which embedded processor platform is the best fit for our clients’ products.
We use 11 common tests, often referred to as “micro benchmarks”. They perform elementary operations that any microcontroller and programming language chosen for a project should support, and they employ algorithms and data structures found in most microcontroller projects today.
These benchmarks are straightforward, with just enough code to be meaningful and to support useful conclusions. Normally, we express the results in two graphs. The first shows program and data memory allocation for each benchmark, in bytes on a logarithmic Y axis. The second shows performance: how long each benchmark took to run, in seconds, also on a logarithmic Y axis. Run time is measured on actual hardware by generating a signal while the benchmark is running and then measuring that signal with an oscilloscope or an event counter. Note that, in contrast to benchmarks that report a score where higher is better, a lower value in these tests is always better, because each value is either the memory consumed or the time taken to perform an operation.
The 11 benchmarks are:
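To make the timing method concrete, here is a minimal sketch in C of how a marker signal can bracket a benchmark run. It is not A-WIT's actual harness: `marker_pin`, `marker_edges`, `marker_high`, `marker_low`, `run_benchmark`, and `bench_int16` are hypothetical names, the marker is a plain variable standing in for a memory-mapped GPIO register, and the loop count and operand values are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a GPIO output bit. On real hardware this would
   be a memory-mapped port register of the chosen microcontroller. */
volatile uint8_t marker_pin = 0;
/* Counts rising edges, mimicking what an external event counter would see. */
volatile uint32_t marker_edges = 0;

void marker_high(void) { marker_pin = 1; marker_edges++; }
void marker_low(void)  { marker_pin = 0; }

typedef void (*benchmark_fn)(void);

/* Raise the marker, run the benchmark, lower the marker. The pulse width
   observed on an oscilloscope corresponds to the benchmark's run time. */
void run_benchmark(benchmark_fn fn)
{
    assert(fn != NULL);
    marker_high();
    fn();
    marker_low();
}

/* Volatile sink so the compiler cannot discard the computed result. */
volatile uint16_t sink16;

/* Assumed workload: basic 16-bit integer arithmetic in a loop.
   The iteration count and constants are illustrative only. */
void bench_int16(void)
{
    uint16_t acc = 0;
    for (uint16_t i = 1; i <= 1000; i++) {
        acc = (uint16_t)(acc + i);         /* add */
        acc = (uint16_t)(acc - (i >> 1));  /* subtract */
        acc = (uint16_t)(acc * 3u);        /* multiply */
        acc = (uint16_t)(acc / 2u);        /* divide */
    }
    sink16 = acc;
}
```

Calling `run_benchmark(bench_int16)` produces one pulse on the marker per run. Keeping the marker toggles immediately around the call keeps the harness overhead small and constant across benchmarks, so it affects every measurement equally.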
- 16-bit integer arithmetic test that simply does the fundamental arithmetic operations with 16-bit integer variables in a loop
- 32-bit integer arithmetic test that is the same as the 16-bit arithmetic test, but uses 32-bit integer variables instead
- Trigonometric functions test that calls the sin, cos, tan, and sqrt functions in a loop, using floating-point arithmetic
- I/O test that performs a write/read of 18 bytes to/from a communication port
- Array test that measures the fundamental array write and access functionality
- Exception handling test that creates 2×16 interrupts, which are interleaved
- Sort test with a simple and well-known sorting algorithm
- Matrix test that uses three floating-point 8×8 matrices to perform a simple matrix multiply operation
- Nested-loop test in which six loops, nested inside one another, perform 16-bit integer add operations
- Test of string concatenation performance
- Test of performance of function calls with six parameters
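As an illustration of how small these micro benchmarks can be, the following C sketch shows one possible shape for two of the tests listed above: the 8×8 floating-point matrix multiply and the six nested loops with 16-bit adds. The function names `matrix_multiply` and `nested_loops_add16`, the use of single-precision `float`, and the per-loop iteration count `n` are assumptions, since the list does not specify them.

```c
#include <stdint.h>

#define DIM 8

/* c = a x b for 8x8 matrices: three matrices in total (two inputs, one result). */
void matrix_multiply(float a[DIM][DIM], float b[DIM][DIM], float c[DIM][DIM])
{
    for (int i = 0; i < DIM; i++) {
        for (int j = 0; j < DIM; j++) {
            float sum = 0.0f;
            for (int k = 0; k < DIM; k++) {
                sum += a[i][k] * b[k][j];
            }
            c[i][j] = sum;
        }
    }
}

/* Six loops nested inside one another, each running n times, with one
   16-bit add in the innermost body. The result wraps modulo 65536. */
uint16_t nested_loops_add16(uint8_t n)
{
    /* volatile keeps the compiler from collapsing the loops into a constant */
    volatile uint16_t sum = 0;
    for (uint8_t a = 0; a < n; a++)
        for (uint8_t b = 0; b < n; b++)
            for (uint8_t c = 0; c < n; c++)
                for (uint8_t d = 0; d < n; d++)
                    for (uint8_t e = 0; e < n; e++)
                        for (uint8_t f = 0; f < n; f++)
                            sum = (uint16_t)(sum + 1u);
    return sum;
}
```

On a target without a floating-point unit, the matrix test mostly measures the software floating-point library, while the nested-loop test mostly measures loop overhead and 16-bit addition. That difference is one reason to report the tests separately instead of folding them into a single score.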
Understanding how to apply benchmarks like these plays a central role in choosing the embedded computing platform that best fits a particular product.