SpikeFun v0.67 comes with a built-in benchmark tool (SpikeBench), which can be used to assess the CPU and (especially) memory subsystem performance of a desktop, workstation or server PC. Simulation of large-scale biologically-inspired spiking neural networks is extremely tough on system memory and CPU, so it can serve as a useful benchmarking tool.
To aid benchmarking and extend the results to memory I/O, SpikeFun now also supports access to the PMU hardware registers found in newer Intel(R) CPUs, such as those with the microarchitecture codenames 'Nehalem / Nehalem EP / Nehalem EX', 'Westmere / Westmere EP / Westmere EX' and 'Sandy Bridge / Sandy Bridge EP - Jaketown'.
By using the PMU registers it is possible to directly measure the read and write memory bandwidth, as reported by the integrated memory controller (IMC) located in the so-called 'uncore' part of the CPU package. Additional useful information can be obtained from the cores themselves, such as energy consumption, IPC, instructions retired, etc. If your CPU has a performance monitoring unit, and access to it is enabled, SpikeFun will also display and log this information during the benchmark. For NUMA systems, such as the test system I use, SpikeFun will display information for each CPU package. In the next versions I will also add the ability to log QPI traffic (I have issues testing this, as the ASUS Z9PE-D8 WS motherboard BIOS does not configure the QPI LL counter).
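As a rough illustration of what such a measurement boils down to, here is a minimal Python sketch that turns two counter snapshots into a bandwidth figure and an IPC value. This is not SpikeFun's or PCM's actual code: the function names are hypothetical, and the sketch assumes that the memory controller counter counts transfers of one 64-byte cache line each and that the counter does not wrap around between the two snapshots.

```python
# Assumption: each counted memory transaction moves one 64-byte cache line.
CACHE_LINE_BYTES = 64


def bandwidth_gib_per_s(count_before, count_after, seconds,
                        line_bytes=CACHE_LINE_BYTES):
    """Convert the difference between two counter snapshots into GiB/s.

    Hypothetical helper; counter wraparound is not handled.
    """
    if seconds <= 0:
        raise ValueError("sampling interval must be positive")
    transfers = count_after - count_before
    return transfers * line_bytes / seconds / 2**30


def ipc(instr_before, instr_after, cycles_before, cycles_after):
    """Instructions per cycle over the sampling interval."""
    cycles = cycles_after - cycles_before
    if cycles == 0:
        return 0.0
    return (instr_after - instr_before) / cycles


if __name__ == "__main__":
    # 2**24 cache-line reads in one second is exactly 1 GiB/s.
    print(bandwidth_gib_per_s(0, 2**24, 1.0))
    # 300 instructions retired over 200 cycles.
    print(ipc(0, 300, 0, 200))
```

On a NUMA system the same calculation would be done per CPU package, with the per-package results added up to get the system total.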
Please note that not all recent Intel CPUs support memory bandwidth measurements. Typically, this feature is supported by workstation and server-class processors (such as those that fit in LGA 1366, LGA 1567 and LGA 2011 sockets) and it is not present in desktop-class processors (such as those that fit in LGA 1155 / 1156 sockets).
Performance monitoring on Intel systems is done by using Intel's Performance Counter Monitor (PCM) library, which can be downloaded in source code form from here: http://software.intel.com/en-us/articles/intel-performance-counter-monitor/ (for licensing/copyright info please refer to the benchmark.pdf document in the SpikeFun download package).
So, let's see how the small benchmark looks on the reference Intel Xeon E5 2687 dual-CPU system with faster than officially supported DDR3 RAM speed (2133 MHz):
Looking at the picture above, some interesting observations can be made:
Interestingly, this exercise also shows how complex even basic biological simulations are - for a small network of 32768 neurons and 1.8 million synapses (with 2 receptors each - AMPA/NMDA or GABAa/GABAb) we need approximately 130 GiB/s of memory bandwidth in order to achieve real-time performance!
Think about it - a typical cat has approx. 300 million neurons in the cortex, with trillions of synapses! And to model those accurately we might (probably) need more complex models than the one currently implemented in SpikeFun... now this would be a very expensive hardware purchase, which is currently quite above my budget.
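To put a number on it, here is a back-of-the-envelope sketch in Python. It is only an illustration and assumes that the required bandwidth grows linearly with the neuron count, which is a simplification of my own and not a measured result. The benchmark network has only about 55 synapses per neuron (1.8 million / 32768), far fewer than a real cortex, so the result is most likely an underestimate.

```python
# Figures taken from the text above.
REF_NEURONS = 32_768            # neurons in the small benchmark network
REF_BANDWIDTH_GIB_S = 130.0     # approx. bandwidth needed for real time
CAT_CORTEX_NEURONS = 300_000_000


def scaled_bandwidth_gib_s(neurons,
                           ref_neurons=REF_NEURONS,
                           ref_bandwidth=REF_BANDWIDTH_GIB_S):
    """Naive estimate: bandwidth scales linearly with neuron count (assumption)."""
    return ref_bandwidth * neurons / ref_neurons


cat_bw = scaled_bandwidth_gib_s(CAT_CORTEX_NEURONS)

if __name__ == "__main__":
    # 1 PiB = 2**20 GiB
    print(f"{cat_bw:,.0f} GiB/s, or about {cat_bw / 2**20:.1f} PiB/s")
```

Under that assumption the estimate comes out at roughly 1.2 million GiB/s, a bit over 1 PiB/s, which is about 9000 times what the small network needs.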
I don't even want to mention the number of neurons and synapses in the human brain... hardware to simulate that in real time would probably drain the budgets of the G20 economies put together and you'd still be off by a large margin... It is pretty obvious that tackling this challenge with the von Neumann architecture (that is, practically all computers made today) would be a rather expensive option, due to the inefficiency caused by the bus bottleneck between the processor and memory.