This article explains what an NPU is, what NPUs are used for, and how their functions and purposes compare to those of other, similar processors.
"NPU" stands for "neural processing unit." It's a specially designed and manufactured piece of silicon that can accelerate machine learning calculations faster than a more generalized central processing unit (CPU) or graphics processing unit (GPU).
Although NPUs have historically been most common in datacenters and servers, they're now appearing in recent consumer processors to accelerate local calculations for artificial intelligence (AI) software, such as large language models like the one behind ChatGPT.
NPUs have been popping up in a range of devices for years, including Apple's iPhones and iPads, "Copilot+" PCs and laptops, and more. Apple's Bionic chips use their onboard NPUs for tasks like video stabilization and photo correction, but NPUs are now seeing use for generative AI functions such as image and text generation, including responding to natural-language questions with large language models.
Typically, these kinds of tasks happen in the cloud due to their extreme processing demands, but the latest generations of NPUs allow some of them to run directly on a device. Running locally lowers response latency and reduces the power and data requirements of these tasks, making them faster and less battery-intensive.
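To make that concrete, here's a minimal, hedged Python sketch of what on-device inference looks like from the software side, using ONNX Runtime, one common way applications route work to an NPU. The model filename, the input handling, and the choice of Qualcomm's "QNNExecutionProvider" are illustrative assumptions; other platforms expose their NPUs through different providers or frameworks entirely.

```python
import numpy as np
import onnxruntime as ort

# Pick an NPU-backed execution provider if one is available, otherwise
# fall back to the CPU. "QNNExecutionProvider" targets Qualcomm NPUs
# (like those in Copilot+ PCs); availability depends on your hardware,
# drivers, and ONNX Runtime build.
available = ort.get_available_providers()
preferred = [p for p in ("QNNExecutionProvider", "CPUExecutionProvider")
             if p in available]

# "model.onnx" is a placeholder for whatever model you want to run locally.
session = ort.InferenceSession("model.onnx", providers=preferred)
print("Running on:", session.get_providers())

# One inference, entirely on-device: no network round-trip, so no cloud
# latency and no data leaving the machine. Assumes a float32 input here.
inp = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # resolve dynamic dims
result = session.run(None, {inp.name: np.random.rand(*shape).astype(np.float32)})
```

If no NPU provider is installed, the same code simply runs on the CPU, which is exactly the fallback behavior most on-device AI apps rely on.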
One of the reasons the term "NPU" can be a little confusing is that it sounds so much like a range of other terms associated with processors and modern technology. An NPU is not a CPU, though it can be part of a system on a chip (SoC), which can also contain a CPU and a GPU. Let's break down these terms to clear things up.
A more recent concept is the GPNPU, a processor that combines a graphics processor and a neural processor on a single chip.
This consolidation can allow an NPU to use the general-purpose capabilities of a GPU to accelerate AI calculations beyond what the NPU can manage by itself. Some modern devices advertise a TFLOPS figure as a metric of how fast the device is at AI calculations, and that figure is often split between the NPU and GPU, since both can contribute.
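As a hypothetical illustration of how those advertised numbers add up (the figures below are invented, not drawn from any real spec sheet):

```python
# Hypothetical spec-sheet math: the single "AI performance" number a
# device advertises is often the sum of each processor's contribution.
npu_tflops = 45.0  # invented NPU throughput
gpu_tflops = 30.0  # invented GPU contribution
cpu_tflops = 5.0   # CPUs can chip in a little, too

total = npu_tflops + gpu_tflops + cpu_tflops
print(f"Advertised platform AI performance: {total:.0f} TFLOPS")
```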
GPNPUs may be the next evolution of that idea, helping push AI calculations further than before and potentially allowing the NPU to contribute to 3D rendering through AI-driven upscaling and frame generation.