Multiple readers have told me that it is difficult for them to understand and/or visualize the effects of latency on their HPC applications, particularly in modern NUMA (non-uniform memory access) and NUNA (non-uniform network access) environments.
Let's break down the different levels of latency in typical modern server and networked computing environments.
Here's a familiar analogy: your home. You spend a lot of time at your home; it's where you live and play.
You have friends and neighbors who live right near you in the same subdivision. You interact with these neighbors frequently, if for no other reason than they're close by and easy and fast to get to.
But you also have friends in other subdivisions. They're a several-minute drive across surface streets from your home. You interact with these people, too, but you have to think and plan a little more vs. just walking next door.
And then you have friends who live far away - it takes a long time to get there, and the trip involves travel over long-distance highways.
The distance-from-home analogy maps pretty naturally onto modern HPC environments:

- Next door: processes on the same server, reachable via shared memory.
- The next subdivision: processes on nearby servers, a network hop or two away (e.g., on the same switch).
- The next city: processes on distant servers, reachable only over long-haul network links.
So when you send a message to a peer (e.g., MPI_SEND to another MPI process), consider with whom you're communicating: are they next door, in the next subdivision, or in the next city? That gives you an idea of the magnitude of the cost of communicating with them.
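To make that concrete, here's a minimal sketch (an illustration, assuming an MPI-3 implementation, not code from any particular application) that uses MPI_Comm_split_type with MPI_COMM_TYPE_SHARED to discover which peers are "next door" - i.e., running on the same server as you:

```c
/* Sketch: split MPI_COMM_WORLD into per-server groups.
 * Ranks that land in the same "node" communicator can share
 * memory with you; everyone else is at least a network hop away. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_rank, node_rank, node_size;
    MPI_Comm node_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* MPI_COMM_TYPE_SHARED groups ranks that can share memory,
       i.e., ranks running on the same server. */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED,
                        0, MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_size(node_comm, &node_size);

    printf("World rank %d is local rank %d of %d on this server\n",
           world_rank, node_rank, node_size);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```

Peers that end up in your node communicator are your next-door neighbors; everyone else is in another subdivision or another city, and costs correspondingly more to talk to.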
But let's add another dimension here: caches and RAM. Data locality is a major factor in performance, and is frequently under-appreciated.
Stay tuned; we'll talk about data locality by extending this analogy in a future entry...