Intel Parallel Studio XE 2017


In April 1965, Gordon E. Moore observed, in Electronics magazine, that the number of transistors on an integrated circuit would double roughly every year (a pace he later revised to every two years). This became known, colloquially, as "Moore's Law". For most of the next three and a half decades the gains held mostly true. However, with the physical limitations of shrinking transistor size on a chip, it's getting more difficult to realize the gains enjoyed in the past.

In an article published in the Independent, Intel CEO Brian Krzanich, when asked whether Moore's Law was dead, was quoted as saying "... '[he was] not sure' whether the delay [in producing new chips] would turn out to be a temporary phenomenon or heralded a new era of slower advances in computing technology."

So what's a performance hungry low-level programmer using C++ to do?

The natural solution is to scale across added cores or to adopt more efficient parallelization primitives. The C++11 standard, ISO/IEC 14882:2011, added native thread support, bringing cross-platform threading to the core standard library. This was a step toward the concurrent world that languages like Go, the D programming language, and later Rust would treat as a primary design concern. However, this provided only thread-level concurrency primitives; it did not provide true parallel algorithms for shared-memory multi-core architectures.

The C++ 2017 standard goes further by overloading many standard algorithms to provide parallelization, or concurrent scaling, using newly introduced execution policies, notably:

  • std::execution::seq – sequential execution
  • std::execution::par – parallel execution
  • std::execution::par_unseq – parallel and vectorized execution

The C++ language continues to recognize how important it is for the developer community to write programs that can take advantage of the many cores of today's hardware. But this too has limitations, because a language is designed as a higher-level abstraction and leaves knowledge of the underlying hardware to compiler vendors and the optimizers they create. A program can query the number of hardware threads available (std::thread::hardware_concurrency) and rely on the platform-dependent sizes to which the language's primitive types are mapped, but it cannot interact in any intimate way with the underlying CPU and its cores. Therefore, to get a much deeper and richer view of the underlying hardware, you require additional libraries.

This leaves gaps in developers' toolbelts. For instance, profiling, debugging, and troubleshooting applications written at such a low level requires special developer expertise and, possibly, the creation of extra tooling, usually written by the developer herself. This can be costly to an organization and error-prone: it relies on the developer's expertise to translate the tool's data into useful information, and that data is often interpreted inconsistently across differing contexts, given the varied assumptions developers can make.

What if you could actually use an IDE and compiler written by the chipset maker themselves? One designed for the parallel-processing demands of modern, highly performant applications running on the latest Intel CPUs? Because, let's face it, our commodity hardware runs Intel CPUs. I had the chance to talk with a couple of representatives from Intel about their latest Intel Parallel Studio XE while at ACCU 2017, and was pretty surprised to learn about some of the problems it was designed to solve. Specifically:

  • it's an optimizing C++ and Fortran compiler for Windows, Linux, and OS X
  • it has built-in vectorization and OpenMP support
  • it has a Math Kernel Library
  • it has a data acceleration library for improved data analytics
  • it includes the Threading Building Blocks (TBB) library
  • it can profile C, C++, Fortran, Python, Go, assembly, and Java, or a mixture of these languages
  • it speeds up Python application performance by linking with Intel® performance libraries

Yes, I did some research online because, to me, it sounded too good to be true. Besides, I was at a conference, and at conferences, where everyone is in high-energy mode, some things can be, well, fluffed up. So I downloaded a copy and I'm starting to play with it.

To me, what is really compelling is the Math Kernel Library. Any systems developer knows that floating-point operations are expensive, and that switching between kernel space and user space is expensive too. Now, not that I know this definitively, but it seems to me that Intel® has optimized both of these in its Math Kernel Library, so that you may avoid the kernel-to-user-space switch and perform an optimized floating-point operation. That alone would justify any licensing cost, in my opinion. But they've further optimized the data analytics pipeline with a performance library.

I hope to have more to say about it in the future, but for now, it looks pretty cool and does have some compelling features. I suggest having a look yourself; it may just save you many sleepless nights and a lot of unnecessary effort.

Until next time, Happy bug-free coding!