HPC? Really?

Geoffrey Bastien
6 min read · Mar 11, 2022


The other day I was reading a post on HPC where someone talked about using Hyperthreading in the context of high-performance computing. I was in disbelief and shock.

HPC (high-performance computing) is more and more an overused buzzword. Like all popular keywords, it helps increase sales. After all, HPC is a very cool and glamorous buzzword. The reality of HPC is far more grounded in the complexity of the hardware and the communication.

This article will talk about the essence of HPC and explain when you should consider using the keyword HPC. I know I will offend certain readers, but bear with me; I will justify all my statements. In my opinion, we should not use the keyword HPC if we:

  1. Use a high-level language like Python, .NET, or Java.
  2. Use virtualization, whether as a virtual server or via Kubernetes and Docker.
  3. Use a processor with Hyperthreading activated.
  4. Do not spend significant development time with a performance monitoring tool validating where the hardware bottlenecks are.
  5. Do not use vectorization or a hardware accelerator such as a GPU.
  6. Do not manage non-uniform memory access (NUMA).
  7. Do not manage multithreading, locks, or atomic access.
  8. Do not have a high-performance communication system.

And that is OK! Most programs running on servers do not need HPC. If you serve many simultaneous requests on a server, it would be ridiculous to use all the resources for one request and make all the other requests wait. You can still use high-performance code to optimize a single request.

Virtualization

The concept behind virtualization is to abstract the physical layer. It lets you run a specific suite of programs and assign only the resources they need. Virtualization optimizes the number of servers required and gives the ability to modify server configurations dynamically. It is also used for elasticity: replicating a server as needed to absorb a growing volume of requests, splitting the load of the same program between its virtual replicas, and partitioning resources into smaller virtual servers for a known need. Virtualization is an excellent tool for a website or job with usage peaks; it is a big advantage to adjust dynamically without setting up a new machine each time, and to let servers sleep when they are not needed. That is especially true in cloud computing, where you pay for what you use. So virtualization is beneficial when you split a server's resources and reallocate them dynamically.

In HPC, the goal is the opposite. We want many servers to act as one, resolving a substantial algorithm for a dedicated problem in a timely manner with as little overhead as possible. Virtualization, considering its overhead, does not make much sense in that context.

Material resources

It is not obvious what differs between a server room built for HPC and a typical data center. Of course, servers set up for HPC will be more oriented toward memory-intensive computing and high-performance communication. However, the technology still has limitations. HPC is about managing all those material limitations to bring the algorithm to maximum compute performance. Do not be duped: the only purposes are calculating at a reasonable price, at a large scale, or in a reduced execution time. We can always add servers to a cluster to minimize computing time, so it is easy to understand that everything comes down to money.

Considering that all that hardware is driven by software, the actual savings occur in this part of the process. Better memory or better compute units will always cost more to acquire and operate. The ratio between efficient components and cost changes with time, but why buy those components if you take advantage of only a tiny percentage of their capabilities? For example, to increase the number of cores available on one CPU, manufacturers generally add chiplets linked by an interconnect. That gives the consumer the impression of a single CPU with more cores, one big processing unit, and it will work out of the box, but not without a significant loss of performance. In reality, there are many small processing units, each connected to a portion of the memory. If a chiplet requests a part of memory assigned to another chiplet, the data has to pass through the interconnect, which is quite slow from a computing standpoint. The only way to constrain that disadvantage is for the software to keep the data as close as possible to the core that will use it, so the interconnect is crossed as rarely as possible (this is non-uniform memory access, or NUMA). You face the same compromise on a motherboard with several CPU sockets.

Many books have been written on all the concepts and applications of optimization, such as:

1) NUMA management and cache-miss management.

2) Pipelining management and register renaming.

3) Vectorization through the use of the vector registers.

4) Thread management.

The fact of the matter is that none of the high-level programming languages allow efficient control of the hardware. So yes, you will save a lot of programming time by not going down that route. But a dedicated algorithm with proper memory management and vectorization can achieve, on a single core, at least 16 times the performance of an algorithm optimized in a conventional way, and at least 32 times that of a managed high-level language, and this gain multiplies by the number of cores available. Resolving the same algorithm in the traditional way will therefore need many more servers, and those servers cost money not just in acquisition but also in operation. Imagine needing only 25 servers instead of 500 because you gained 25x in the software.

Hyperthreading

It makes no sense to use Hyperthreading in HPC, and here is why. Hyperthreading is a technology patented by Sun Microsystems in 1994. It brings two logical cores per physical core: it duplicates the components that hold a core's state while sharing the same processing unit. Hyperthreading takes advantage of stall time (branch mispredictions or memory stalls) to schedule work for the sibling logical core; it will also switch work according to the scheduler's priority. Still, from a compute standpoint, Hyperthreading is a switch: when the physical core works for one logical core, the processing unit cannot be used by the other. If your program has a lot of waiting or stall time, like most applications, Hyperthreading is an excellent optimization at the physical layer. But in HPC, the programmer does everything to reduce stall points, applying different techniques and spreading the work across all available cores, trying to leave no sleeping time. It gets to a point where Hyperthreading blocks one efficient task for the benefit of another efficient task: saving the current state, restoring the state of the other logical core, and managing the scheduler. All that management brings overhead and significantly increases the total execution time.

Communication

The attention given to communication is, in my opinion, as crucial as the attention given to computing. Why optimize a program to the extreme if that program has to wait for data? Your communication system should always deliver the data a little faster than it takes to compute it. That said, it is always easier said than done; it is a fine balance to use communication without taking too much system capacity. When we think about parallelization, multithreading comes to mind first, but other parts of the server also help. Think of DMA (direct memory access), where data transfers to the hardware happen almost without CPU intervention, as is the case when you use a buffered socket. Buffered sockets are not magic: you still need to manage synchronization and the data itself (encryption, routing, etc.). It is just one more tool that can help you balance computing and data management properly.

Conclusion

By analogy: installing a new spoiler on your sports car does not put your vehicle in the NASCAR category; there is much more to it. The spirit of HPC is all about hardware performance, including the communication between nodes. At its core, HPC is about resolving complex or large algorithms promptly with the material available. HPC has a very narrow purpose. Simply building a distributed system does not justify using the keyword HPC; doing so undervalues the tedious work of achieving ultimate performance. So please use the keyword HPC appropriately.

If you made it to the end, please like this article. Do not hesitate to contact me if you have questions. I am here to help!

geoffrey.bastien@a1soft4u.com.
