Katsumata: My third place goes to AMD's Ryzen and Intel's Core i9, and the outbreak of the multi-core war. This year brought the 8-core "Ryzen 7 1800X" with the Zen microarchitecture, the "Ryzen Threadripper 1950X" that delivers a 16-core/32-thread environment, and the 18-core/36-thread "Core i9-7980XE".
Wakasugi: Consumer CPUs with double-digit thread counts arrived all at once this year. The basic storyline was "AMD releases an 8-core Ryzen," and Intel, apparently feeling the heat, answered with counter products, so multi-core CPUs became very conspicuous.
Goto: Ryzen's microarchitecture was overhauled, and per-clock performance improved dramatically over the Bulldozer family; you could say AMD got back on the right track. Its single-thread performance is now said to rival Intel's, but pushing single-thread performance much further is quite difficult. It's not that there is no way, but power constraints stand in the way of efficient growth.
CPUs used to ride a cycle: shrink the process, double the transistor count, raise the frequency by roughly 40%, and performance improved naturally while microarchitectures grew more complex and faster. That cycle can no longer be sustained, because supply voltage and leakage current are now hard to reduce any further. As a result, improving a CPU's single-thread performance is difficult.
Increasing the core count is how you improve performance from there. Servers have done it for a long time, and now it has come to consumer products. It's a straightforward approach: at this point, adding cores is the only way left to raise CPU performance.
However, Amdahl's law puts a limit on parallelization, and a performance bottleneck eventually appears. For now, performance scales with thread count for certain applications, but just as when process technology stopped scaling, another measure will be needed to keep improving performance beyond that.
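(As a rough illustration of this limit, here is a minimal sketch of Amdahl's law; the 95% parallel fraction and the core counts are assumptions chosen for the example, not figures from the discussion.)

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n),
# where p is the parallelizable fraction of the work and n is the core count.

def amdahl_speedup(p: float, n: int) -> float:
    """Ideal speedup on n cores when a fraction p of the work parallelizes."""
    return 1.0 / ((1.0 - p) + p / n)

# Even a 95%-parallel workload can never exceed 1 / (1 - 0.95) = 20x,
# no matter how many cores are added.
for cores in (8, 16, 32, 36, 1024):
    print(f"{cores:>4} cores: {amdahl_speedup(0.95, cores):.1f}x")
```

With 16 cores the ideal speedup is about 9x, and with 36 cores only about 13x; the serial 5% dominates, which is exactly the bottleneck Goto describes.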
That means letting tasks flow to other kinds of cores, or extending the core with units dedicated to particular workloads. For example, if deep learning workloads grow, you add a unit specialized for deep learning; that is the domain-specific approach.
GPUs have long improved performance this way for parallel workloads. Unless CPUs increasingly do the same, performance can't be improved any further.
In fact, the multi-core movement that began with Ryzen is only the midpoint of the larger trend in processor performance. What awaits us is the fate of adding all sorts of domain-specific functions to keep raising performance.
Kasahara: In our industry we used to call that kind of thing a heterogeneous processor, though maybe nobody calls it that anymore.
It's easiest to see with the SoC makers: as Mr. Goto says, there are CPUs, GPUs, and now NPUs (Neural Processing Units) and so on. The direction is to improve the processor's overall performance by adding various units to it.
Goto: With a general-purpose processor there's no longer much room to improve performance. It's not impossible to raise it, but the gains are too small.
Kasahara: The general-purpose processor itself has changed, too. What's interesting is that NVIDIA turned the GPU to general-purpose use with CUDA.
Goto: The "general-purpose" I'm talking about is the single-thread story. The main use of GPUs in data centers now is parallel computing: deep learning running on massively parallel machines with "Volta."
The challenge ahead is how to adapt the GPU to deep learning workloads. Processors dedicated to deep learning are going to flourish, so the GPU has to keep up.
At "Intel Shift 2017," held by Intel in October, Nervana's CTO, Amir Khosrowshahi, said that "GPUs are out of date for deep learning."
Goto: That's true if deep learning is all you do, but the GPU has the advantage of also serving as a general-purpose processor. NVIDIA can't keep evolving the GPU as a single design, so the next step is to split it in two, one line for deep learning and one for graphics. In fact, that differentiation has already begun.
Optimizing for deep learning is difficult, and if you optimize too far you lose versatility in turn. The algorithms change constantly, so you have to keep up with them as well.
Kasahara: It's the debate we have every time: make it programmable, or make it dedicated hardware.
Goto: That's the old debate; the new trend is to build the "dedicated" part as an extension of a general-purpose architecture. Client PCs will carry an inference chip. The question is how to package it as a set with recognition processing.
Kasahara: Inference here means, for example, taking a "cat image" as input and recognizing that "this is a cat." The industry is discussing building that kind of function into the processor, and many people call it an NPU.
Even among dedicated hardware the contents differ. What goes into an SoC, for example, is really something closer to a simple accelerator.
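(To make "inference" concrete, here is a minimal sketch of image classification with a pretrained model; the torchvision ResNet and the input file "cat.jpg" are assumptions for illustration, not what any particular NPU actually runs.)

```python
# A minimal sketch of inference: feed in an image, get back a class label.
import torch
from PIL import Image
from torchvision import models, transforms

# Standard ImageNet preprocessing for torchvision classifiers.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(pretrained=True)  # pretrained ImageNet classifier
model.eval()                              # inference mode, no training

image = Image.open("cat.jpg").convert("RGB")  # hypothetical input image
batch = preprocess(image).unsqueeze(0)        # add a batch dimension

with torch.no_grad():                    # forward pass only; no gradients
    logits = model(batch)
class_id = logits.argmax(dim=1).item()   # index into the 1,000 ImageNet classes
print(f"predicted ImageNet class id: {class_id}")
```

An NPU's job is essentially to run a forward pass like this with far better performance per watt than a CPU or GPU.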
Goto: The words are "neural" and "neuron," but implementing neurons themselves in hardware is "neuromorphic computing," which is a different category. A neural network is a software model of how neurons function, and what's going into processors is an accelerator for that. Mobile is far ahead here, and the PC is lagging behind.
Kasahara: It's not that the PC is behind; it just hasn't needed it until now. But that is changing too.
Applications used to live on the PC, locally, but they are moving toward the cloud. That's what UWP is aiming at, isn't it?
Goto: The algorithms themselves keep changing, and today's inference hardware is all built for specific algorithms and specific network models. If you only support CNNs (Convolutional Neural Networks), what do you do about other network types such as RNNs (Recurrent Neural Networks) and LSTMs (Long Short-Term Memory)? A GPU can handle all of them, but inefficiently, so there's demand for something in between, and everyone is building that now.
Yamada: When I saw "18 cores and 36 threads," I wondered how fast I could write a manuscript if I used "Hidemaru" (laughs).
Kasahara: No, it won't write any faster (laughs). Hidemaru is single-threaded.
Goto: But in a sense that's exactly right. With deep learning you could build a machine that writes manuscripts by itself.
Yamada: Give it the bullet points and the manuscript writes itself.
Kasahara: Getting back to Ryzen: thanks to AMD's hard work, a fire was lit under Intel, and we finally have competition again. The PC industry had stagnated these past few years because Intel is a company that coasts if nobody is watching. Unless someone like AMD keeps the pressure on, Intel won't deliver the good stuff.
In fact, the CPU hadn't changed much since Penryn. The U-series processors stayed dual-core with nothing really new for seven or eight years, and only this year did Kaby Lake Refresh finally arrive. That's because AMD pushed. The 18-core Core i9 probably came out because AMD released the 16-core Threadripper. They did it because they had no choice, didn't they? I think that's a very good thing.
Yamada: While I'm still alive, I want to touch a computer that is astonishingly fast. A stress-free computer.
Goto: If deep learning accounts for a growing share of the world's workloads, then a computer that speeds it up is what a "good computer" will mean.
Kasahara: But Shohei-san's "stress-free" could probably be achieved just by making flash memory faster.
Yamada: No, even putting everything in RAM doesn't make it astonishingly fast.