Make Super-Sonic Applications with Presumption Load

Geoffrey Bastien
Mar 3, 2021

There are many strategies for making an application faster. I wrote about the benefits of deep programming and benchmarking in my article “How the heck we forget performance?”.

I also covered the special case where your code generates an algorithm dynamically at run time, and the power of self-modifying code, in the article “Self-modifying code It is for you?”.

I dream of the day when we will see business applications as real mathematical marvels, built to perform at the top level like gaming applications. I believe the application of the future will anticipate each action and reason about the current interaction, always having the next likely piece of information ready before you even request it. The future application will be proactive rather than reactive: it will let users believe they control the information, while it always stays one step ahead of them and guides them through the process.

The most significant innovation on the way will soon change how we conceive and understand enterprise applications. The central business application will not only keep the data cohesive; it will also increase performance through presumption load, work in memory, trigger processes, and validate business rules. The client portal, the CRM application, and specialized applications will send data and instructions to, and receive them from, the central business application as they do with a database today, making the entire set of business rules cohesive and dynamic. By “central business application,” I mean a computer cluster acting as one logical unit on a distributed system, with no single point of failure and with redundancy applied as it should be. Sharing data and rules will become a matter of two clicks, not a long ETL process you have to maintain. We already have the technology to build applications of that quality, with instantaneous responses and predictions made ahead of time. What is missing is the vision, and the processes big enterprises need to manage data as a performance element.

This intelligent and efficient manipulation of data is precisely the vision behind Node DB, the product I have been working on for many years now. I envision data flowing from one side to another like a train, or like a perfect I/O dance. I believe we all understand that the real challenge of the next decade is not the cloud but the ability to work in real time with the enterprise’s data in a more coherent way.

Until we can implement those perfect buffering sockets as a high-performance means of communication, we still have to deal with the slow Web API from the other department, right? So this time, I would like to talk about strategies, especially presumption load.

First, be aware that “Presumption Load” and “Presumption System” are terms from my own dictionary. The definition goes like this: Presumption Load is a mechanism that can be exploited locally, or through a presumption system acting as an accelerator. The idea is to load or manipulate data before knowing for sure that the user will need it, with the intent of delivering data faster and more intelligently. The predictions are based on an algorithm that calculates the probability that a specific set of data will be requested. The goal of presumption is to build a management system that can load data into memory, filter and manipulate it, and deliver it extremely fast.
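To make the idea concrete, here is a minimal Python sketch of the decision step, assuming each candidate dataset already comes with an estimated request probability and a memory size. The `Candidate` type, the threshold, and the memory budget are hypothetical illustrations, not part of Node DB: the sketch simply preloads the most likely datasets that clear the threshold and fit in memory.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    key: str            # hypothetical identifier of a dataset we might preload
    probability: float  # estimated chance the user requests it next
    size_kb: int        # memory it would occupy once loaded


def choose_preloads(candidates, threshold=0.5, budget_kb=1024):
    """Pick the most likely datasets to preload, staying within a memory budget."""
    chosen = []
    used_kb = 0
    for cand in sorted(candidates, key=lambda cand: cand.probability, reverse=True):
        if cand.probability < threshold:
            break  # every remaining candidate is even less likely
        if used_kb + cand.size_kb <= budget_kb:
            chosen.append(cand.key)
            used_kb += cand.size_kb
    return chosen


candidates = [
    Candidate("weekly_schedule", 0.9, 300),
    Candidate("billing_history", 0.6, 900),
    Candidate("client_notes", 0.55, 200),
    Candidate("annual_report", 0.1, 50),
]
print(choose_preloads(candidates))  # ['weekly_schedule', 'client_notes']
```

Here `billing_history` is likely enough but does not fit next to the schedule, so the cheaper `client_notes` takes its place; how the probabilities are estimated is left open.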

The theory of preloading: “Nothing is lost, nothing is created, everything is transformed” (Antoine Laurent de Lavoisier, French chemist, born in 1743). Computing is no exception: preloading exchanges CPU cycles for memory. Let me explain with an example. One of the most efficient search algorithms is binary search. Binary search requires your data to be sorted first for the algorithm to work; it is efficient precisely because it works on organized data. The CPU time you spend organizing your data is converted into memory consumption that preserves the work accomplished, and that work benefits you for many searches to come. Obviously, you would not want to re-sort or reload the data for every search. So preloading can be seen as a strategic exchange of processing for the preservation of work already done. Other factors can contribute too, such as parallel work on the local node or on multiple nodes to prepare the information.
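The binary-search trade-off can be sketched in a few lines of Python with the standard `bisect` module: the sort is paid once and kept in memory, and every later lookup reuses it. `SortedIndex` is a hypothetical name used only for this illustration.

```python
import bisect


class SortedIndex:
    """Pay the sorting cost once, keep the result in memory, reuse it for every search."""

    def __init__(self, values):
        self._data = sorted(values)  # O(n log n) CPU time, spent a single time

    def contains(self, value):
        i = bisect.bisect_left(self._data, value)  # O(log n) per lookup
        return i < len(self._data) and self._data[i] == value


index = SortedIndex([42, 7, 19, 3, 88, 56])
print(index.contains(19))  # True
print(index.contains(20))  # False
```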

You also have to be aware of how the data behaves. The strategy can be drastically different if the data is read-only: then we have no concern about validity, because the data stays valid indefinitely. Conversely, if the data has a short life cycle before it expires, it will have to be extracted just in time, or you will have to plan a mechanism to check whether it changed in the meantime. Presumption load assumes your program has some knowledge of the data needed, or at least potentially required. The data can be loaded based on an assumption or on a prediction, for example from an AI model. The only concern is the trigger you use to fetch the data proactively, so that it is there when needed. Most likely, you will gain performance from this technique each time the prediction is right.
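One way to handle the two data behaviors, sketched in Python under the assumption that freshness can be judged by age alone: read-only data gets no expiry (`ttl=None`), while short-lived data is trusted for `ttl` seconds and reloaded just in time once stale. `ExpiringCache` and its loader are hypothetical; a real system might compare version numbers or modification dates instead of a timer.

```python
import time


class ExpiringCache:
    """Preloaded entries are trusted only for `ttl` seconds (None = read-only, never expires)."""

    def __init__(self, loader, ttl, clock=time.monotonic):
        self._loader = loader  # hypothetical slow source, e.g. a Web API call
        self._ttl = ttl
        self._clock = clock    # injectable so the behavior can be tested
        self._entries = {}     # key -> (value, loaded_at)

    def preload(self, key):
        self._entries[key] = (self._loader(key), self._clock())

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None:
            value, loaded_at = entry
            if self._ttl is None or self._clock() - loaded_at < self._ttl:
                return value  # still valid: served from memory
        self.preload(key)     # missing or stale: load just in time
        return self._entries[key][0]


now = [0.0]
calls = []

def slow_loader(key):
    calls.append(key)
    return f"data for {key}"

cache = ExpiringCache(slow_loader, ttl=60, clock=lambda: now[0])
cache.preload("client-7")
cache.get("client-7")  # fresh: no second call to the loader
now[0] = 120.0
cache.get("client-7")  # older than 60 s: reloaded
print(calls)           # ['client-7', 'client-7']
```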

To preload, you need a container of some sort, and that container can be local to the node or remote. As with any remote system where performance matters, going remote can be quite challenging and much more expensive to put in place, because you have to find a way to integrate it into the current infrastructure. A local preload is far less risky, but you will probably get a better gain ratio from a remote preload system. The strategy around the container depends directly on the scope of your trigger. This functionality exploits the concept of the application: users normally follow certain patterns when they request data. Those patterns can be detected, which is exactly what Node DB does: when the system detects a pattern, it automatically preloads the data. Try to apply preloading locally first. When that becomes impossible, evaluate whether going remote is worth it.
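The article does not describe how Node DB detects patterns, so the following is only an illustrative Python sketch of one simple approach: count which request follows which, and preload the most frequent follower once it has been seen often enough and with a high enough share. The class name and both thresholds are assumptions.

```python
from collections import Counter, defaultdict


class NextRequestPredictor:
    """Learns, from observed history, which request usually follows which."""

    def __init__(self, min_probability=0.6, min_observations=3):
        self._transitions = defaultdict(Counter)  # previous request -> Counter(next request)
        self._min_probability = min_probability
        self._min_observations = min_observations

    def observe(self, previous, current):
        self._transitions[previous][current] += 1

    def predict(self, current):
        """Return the request worth preloading after `current`, or None if unsure."""
        followers = self._transitions.get(current)  # .get avoids creating empty entries
        if not followers:
            return None
        total = sum(followers.values())
        candidate, count = followers.most_common(1)[0]
        if total >= self._min_observations and count / total >= self._min_probability:
            return candidate
        return None


predictor = NextRequestPredictor()
for next_screen in ["schedule", "schedule", "schedule", "clients"]:
    predictor.observe("login", next_screen)
print(predictor.predict("login"))     # schedule (3 of 4 logins)
print(predictor.predict("schedule"))  # None: no history yet
```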

Let’s take an example of a local preload. Consider a web CRM application. One of its important screens let users see their clients’ schedules. After logging in, users landed on an informational home page, and we realized that almost all of them went to their schedule right after logging in each morning. The algorithm generating the schedules took two to three seconds to process all the recurring appointments for the week and cross-check them with the client information, which made the loading time too long. So we decided that, each morning after the user logged in, we would launch an asynchronous process to load the week’s schedule into the server session before the user selected the schedule screen. That preload made the page open instantaneously when the user requested it. This counts as a local preload because the trigger was at the application level and the preserved information belonged only to that instance of the application. Was it a perfect solution? No. Because the schedule was stored in a session variable, the web servers needed sufficient memory capacity for all users. Still, the memory consumed per user was small, since we loaded only one week of schedule with minimal information. Because the schedule could change, we added a date check based on when the load completed, and we refreshed the data as needed. So the risk was minimal, and the gain was appreciated.
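The morning schedule preload can be sketched in Python as follows, assuming a plain dictionary stands in for the server session and a thread pool runs the background load. `build_week_schedule` and the `changed_since` callback are hypothetical placeholders for the real schedule algorithm and the date check described above; the sketch stamps the time when the load starts, which is the conservative choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)  # background workers shared by all sessions


def build_week_schedule(user_id):
    """Hypothetical stand-in for the two-to-three-second schedule algorithm."""
    time.sleep(0.1)  # simulate the expensive computation
    return {"user": user_id, "events": ["Mon 09:00 visit", "Wed 14:00 call"]}


def on_login(session, user_id):
    # Stamp the start time: any change after this moment makes the preload suspect.
    session["schedule_loaded_at"] = time.time()
    # Start the expensive work in the background; the home page renders immediately.
    session["schedule_future"] = executor.submit(build_week_schedule, user_id)


def open_schedule_page(session, user_id, changed_since):
    """Serve the preloaded schedule unless it is missing or the data changed since."""
    future = session.get("schedule_future")
    if future is None or changed_since(user_id, session["schedule_loaded_at"]):
        return build_week_schedule(user_id)  # no usable preload: pay the full cost now
    return future.result()  # usually already finished by the time the user clicks


session = {}
on_login(session, "user-1")
schedule = open_schedule_page(session, "user-1", changed_since=lambda uid, ts: False)
print(schedule["events"])  # ['Mon 09:00 visit', 'Wed 14:00 call']
```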

Now let’s take an example of a remote preload. The CRM of a doctor’s office loads each client’s billing information from a sub-vendor’s web service, which has an average response time of around three seconds. Every time customer care representatives look up billing information, that three-second delay hurts, considering that almost all their calls are about billing. We know all billing is updated overnight, so once the system has loaded the information, it does not have to worry about refreshing it during the day. Depending on the type of treatment received or the complexity of the question, the call can be transferred to a more specialized customer care representative, or to an administrator who can approve a payment if needed. In these specific circumstances, we have not only an opportunity for remote preload; we can also add value by deciding which items to place in the container. What can the trigger be? To decide, we first need to understand the topology and how we envision the system.

In the image above, you can see the presumption servers placed between the web servers, the IVR system, and the different data sources. In my design, the presumption servers act as a hub for the data, which makes intelligent data management possible. A trigger can be determined programmatically or, even better, by an AI system that bases its decisions on the requests, determines the action to take, and evolves over time. To be clear, I would set up an AI system only if the amount of data to manage were much larger than in the scenarios presented above; still, I wanted to show that these systems can become intelligent data hubs. In the current design, I keep the document system from passing through the presumption system, so it does not overload the communication. We could also decide to pass all requests through the presumption system, even those that are pure pass-through, for several reasons: for example, to collect statistics on data usage, or to be able to change our data strategy for documents later without changing the topology.

In this specific case, every system to the right of the presumption server can be a trigger: the simple fact of requesting data from the presumption system triggers the preload needed. As you can see, the system needs redundancy and no single point of failure, which implies that the presumption nodes must be synchronized for preloading to be efficient and to allow the work and the data to be distributed. If you build a presumption system, you can also take advantage of it to create a session accessible from any system. Memory storage lets you keep information on the current status: whether the client has been securely validated, whether a call has started, notes from the previous customer care representative, and so on. That adds value to the current interaction and saves even more time by avoiding inefficient exchanges with the client. A data hub like this can be a marvel of data logistics.
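A minimal Python sketch of the request-triggered hub, under stated assumptions: billing is rebuilt once a night at a fixed hour (2:00 here, an arbitrary choice), the sub-vendor call is passed in as a function, and the preload runs synchronously for brevity where a real hub would run it in the background and replicate it across nodes. `PresumptionHub` and `next_overnight_refresh` are hypothetical names.

```python
from datetime import datetime, timedelta


def next_overnight_refresh(now, refresh_hour=2):
    """Next time the billing batch is assumed to run (hypothetical fixed hour)."""
    refresh = now.replace(hour=refresh_hour, minute=0, second=0, microsecond=0)
    if refresh <= now:
        refresh += timedelta(days=1)
    return refresh


class PresumptionHub:
    """Any request about a client triggers a preload of that client's billing."""

    def __init__(self, fetch_billing):
        self._fetch_billing = fetch_billing  # slow sub-vendor call (about 3 s)
        self._billing = {}                   # client_id -> (data, valid_until)
        self.sessions = {}                   # client_id -> state shared by IVR, web, CRM

    def on_request(self, client_id, now):
        self.sessions.setdefault(client_id, {"verified": False, "notes": []})
        entry = self._billing.get(client_id)
        if entry is None or now >= entry[1]:
            # Synchronous here for brevity; a real hub would load in the background.
            data = self._fetch_billing(client_id)
            self._billing[client_id] = (data, next_overnight_refresh(now))

    def billing(self, client_id, now):
        self.on_request(client_id, now)  # no fetch while the preload is still valid
        return self._billing[client_id][0]


calls = []

def fetch_billing(client_id):
    calls.append(client_id)
    return {"client": client_id, "balance": 120.0}

hub = PresumptionHub(fetch_billing)
morning = datetime(2021, 3, 3, 9, 30)
hub.on_request("c-42", morning)                      # IVR identifies the caller
hub.billing("c-42", morning + timedelta(minutes=2))  # representative: served from memory
hub.billing("c-42", datetime(2021, 3, 4, 9, 0))      # after the overnight update: reload
print(len(calls))  # 2
```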

When you talk about a remote system, you have to talk about communication and security. Both subjects are broad, and many books have been written about them, so I will barely touch on them here. I would like to share a few observations based on my experience. If you decide to move forward with this kind of development, I recommend reading up on buffered socket programming. The performance of communication is often not taken very seriously, yet your communication system will become your bottleneck, so give it specific attention. For Node DB, we built three different versions of the communication module in C++ before reaching a level I consider performant, and I had already built many socket applications before that.
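As a starting point for buffered socket programming, here is a small Python sketch of length-prefixed message framing: the sender prefixes each message with its size, and the receiver buffers raw bytes until a full message is available, however the transport splits the stream. The 4-byte header and the class name are assumptions for this illustration, not Node DB’s wire format.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix


def send_message(sock, payload: bytes):
    sock.sendall(HEADER.pack(len(payload)) + payload)


class MessageReader:
    """Accumulates raw bytes and returns complete messages, however the stream was split."""

    def __init__(self, sock):
        self._sock = sock
        self._buffer = bytearray()

    def _fill(self, needed):
        while len(self._buffer) < needed:
            chunk = self._sock.recv(65536)
            if not chunk:
                raise ConnectionError("peer closed mid-message")
            self._buffer += chunk

    def read_message(self) -> bytes:
        self._fill(HEADER.size)
        (length,) = HEADER.unpack_from(self._buffer)
        self._fill(HEADER.size + length)
        message = bytes(self._buffer[HEADER.size:HEADER.size + length])
        del self._buffer[:HEADER.size + length]  # keep any bytes of the next message
        return message


a, b = socket.socketpair()
send_message(a, b"first")
send_message(a, b"second")
reader = MessageReader(b)
first, second = reader.read_message(), reader.read_message()
print(first, second)  # b'first' b'second'
a.close()
b.close()
```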

Socket communication has multiple modes, some of which were added specifically for high-performance communication. Those modes take advantage of your computer’s physical capabilities, using the memory controller to transfer memory to the network adapter without the CPU intervening after the request is sent. This frees the CPU to work on other tasks, such as preparing the next message, but adds complexity to synchronization and message management.

Most servers have two network adapters for redundancy. We wanted to take advantage of that physical capability and communicate over both cards simultaneously when it was worth doing so. We did not understand the level of complexity this adds on top of buffered sockets. The socket manager has to split your message into segments, send them, and rebuild the message on the receiver. The simple fact of using two different network adapters can send parts of your message along two different routes on the network, one potentially faster than the other. The receiver must be able to buffer segments and rebuild the message once enough of them have arrived. To distribute the traffic intelligently, you have to consider the network speed through the different switches, the speed of each card, and many other factors. Sometimes, under certain circumstances, you may choose to communicate over only one adapter. In Node DB, we implemented the ability for two nodes to communicate with each other, self-test the quality of the communication in terms of speed, and decide on the best logistical path. Even though the communication system is one of the most important parts of a distributed system, you usually build these servers to serve a purpose, so leave a vital part of the machine’s resources for that purpose. You cannot spend all the server’s resources on the logistics of communication.
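The segmentation and reassembly problem can be illustrated, very loosely, in Python: a message is split into numbered segments assigned round-robin to two paths, delivered in arbitrary order, and rebuilt only once every segment has arrived. This is a simplified sketch that assumes a single message and no loss; a real protocol would also carry a message identifier, handle retransmission, and weigh each path by its measured speed, as described above.

```python
import random

SEGMENT_SIZE = 4  # tiny for the demo; real segments would be kilobytes


def segment(message: bytes, paths: int):
    """Split a message into numbered segments and assign them round-robin to paths."""
    chunks = [message[i:i + SEGMENT_SIZE] for i in range(0, len(message), SEGMENT_SIZE)]
    total = len(chunks)
    # Each packet: (path index, sequence number, total segments, payload bytes)
    return [(seq % paths, seq, total, chunk) for seq, chunk in enumerate(chunks)]


class Reassembler:
    """Buffers segments that arrive out of order until the message is complete."""

    def __init__(self):
        self._parts = {}  # sequence number -> payload

    def receive(self, seq, total, data):
        self._parts[seq] = data
        if len(self._parts) < total:
            return None  # still waiting for segments from the slower path
        return b"".join(self._parts[i] for i in range(total))


packets = segment(b"billing record for client 42", paths=2)
random.shuffle(packets)  # the two routes deliver segments in arbitrary order
reassembler = Reassembler()
for path, seq, total, data in packets:
    result = reassembler.receive(seq, total, data)  # `path` shows which adapter carried it
print(result)  # b'billing record for client 42'
```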

On a 10 Gb/s network, with a server using two network adapters, we achieved a final transfer rate of 18 Gb/s. Those numbers are fantastic, but the investment needed to achieve them is significant: the development, your switch, your network adapters, and the ability to keep your components cool. Yes, component temperature plays a role in communication speed. With high-performance communication, balancing the workload that serves the server’s purpose against the communication itself becomes a real concern, one you would not even consider with a slower communication system.

Security has to be built into the packet handling of your communication. Please do not improvise by creating your own encryption algorithm. Many encryption libraries already exist, and there is plenty of documentation to help you implement encryption that is bulletproof. If you program at a low level and target a specific CPU, Intel provides many opcodes that make encryption more efficient. Whatever you choose, make sure your security and strategy are very tight.
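To illustrate relying on vetted primitives rather than a homemade algorithm, here is a Python sketch that authenticates each packet with HMAC-SHA256 from the standard library and checks the tag in constant time. Note that this only detects tampering; it does not hide the content, so confidentiality would still require encryption from an established library (for example, an AES-GCM implementation). The helper names and the in-process key are assumptions; real deployments need proper key management.

```python
import hashlib
import hmac
import secrets

KEY = secrets.token_bytes(32)            # shared secret; real systems need key management
TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes


def seal(payload: bytes, key: bytes = KEY) -> bytes:
    """Append an HMAC-SHA256 tag so the receiver can detect tampering."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def open_sealed(packet: bytes, key: bytes = KEY) -> bytes:
    """Verify the tag and return the payload, or raise if the packet was altered."""
    payload, tag = packet[:-TAG_SIZE], packet[-TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("message authentication failed")
    return payload


packet = seal(b"balance=120.00")
print(open_sealed(packet))  # b'balance=120.00'
tampered = b"balance=999.00" + packet[-TAG_SIZE:]  # same tag, altered payload
```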

On the other hand, you can use more conventional communication such as HTTP or HTTPS, which makes integration much faster thanks to the many existing libraries. In terms of performance, this solution is a turtle compared to the approach described earlier, which is more like a Formula One car. In Node DB, we implemented both communication methods. The idea is that all communication between nodes should use the fastest method, while we keep the ability to respond directly in JSON when needed, to avoid conversions at the program level. Again, the choice is strategic and depends on context.
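The idea of answering directly in JSON can be sketched with Python’s standard `http.server`: preloaded records are stored already serialized, so a request is answered by writing bytes without any conversion step. The route, the record, and the handler are hypothetical, and this is a demonstration server, not a production HTTP stack.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical preloaded records, serialized once at load time.
PRELOADED_JSON = {
    "/billing/42": json.dumps({"client": 42, "balance": 120.0}).encode("utf-8"),
}


class PresumptionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PRELOADED_JSON.get(self.path)
        if body is None:
            self.send_error(404, "not preloaded")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # bytes go out as-is: no conversion per request

    def log_message(self, format, *args):
        pass  # silence the default per-request logging for the demo


def start_server():
    server = ThreadingHTTPServer(("127.0.0.1", 0), PresumptionHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_server()
url = f"http://127.0.0.1:{server.server_address[1]}/billing/42"
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass proxies locally
with opener.open(url) as response:
    record = json.loads(response.read())
print(record)  # {'client': 42, 'balance': 120.0}
server.shutdown()
server.server_close()
```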

A remote presumption system can become an intelligent data hub and save a lot of money while increasing quality. A presumption system certainly requires careful consideration. Obviously, no one likes to wait, and time is money, so always keep your performance needs in mind.

Do not hesitate to contact me if you have questions. I am here to help! Please click like if you found this article helpful.

geoffrey.bastien@gmail.com

Originally published at https://www.linkedin.com.
