…resources the data flow needs. This approach can dramatically simplify the implementation of an inference engine. In one development for an automotive system, a team of six engineers spent six months optimising a custom neural network framework; the new approach took the same code and optimised it to the same level of efficiency in 10 minutes.

Another approach is to take the GPU blocks used for video processing and optimise them into an embedded neural network accelerator that can run efficiently alongside a CPU. For example, in a 360º surround-view design with cameras around the vehicle, the GPU blocks can be used for de-warping and stitching the video images, while a separate, dedicated DNN accelerator handles the inferencing for object identification and classification, sending an alert to the central control unit if necessary.

This requires a software layer, or API, that works efficiently across multiple CPU cores, GPU cores and neural network accelerator (NNA) cores. The goal is to run as much as possible on the NNA, as its performance is orders of magnitude better than the GPU's.

Around 30-40% of the NNA consists of MAC blocks, with other dedicated blocks accelerating functions in the framework such as quantisation, normalisation and pooling. The data in the framework is also optimised from the 32-bit floating-point format used in training down to a smaller format, which can be as small as 8-bit integers but may be 10- or 12-bit, whichever is the most efficient implementation. That gives a processing capability of 10 TOPS in a standard 16 nm chip-making technology.

The next generation of designs, built on a 7 nm process, will have 10 times the performance of current systems, depending on the amount of memory available, with up to 16 cores on a chip. However, using multiple cores does not necessarily accelerate performance, as the design of the model determines how it can be split up, and network designers will play a part in how multi-core architectures are used.

Simulation of the framework then also becomes a key part of chip design. A framework can be tested with simulated data, and that optimised framework is then mapped to a simulation of the hardware. The short code sketches at the end of this section illustrate some of these steps.

This modular compiler was developed using neural network inference engine hardware with automotive qualification designed in from day one (Courtesy of Blaize)

An API that can be used with CPUs, GPUs and dedicated neural network accelerators is a key step forward for the next generation of machine learning chip designs (Courtesy of Imagination Technologies)
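To make the heterogeneous API idea concrete, here is a minimal sketch in Python of how a dispatch layer might place operations on CPU, GPU and NNA backends. The class, the operation names and the backend interface are hypothetical illustrations, not any vendor's actual API.

```python
# Hypothetical dispatch layer: inference ops go to the NNA, video ops to
# the GPU, everything else to the CPU. All names here are illustrative.

NNA_OPS = {"conv2d", "matmul", "pool", "normalise", "quantise"}
GPU_OPS = {"dewarp", "stitch", "resize"}

class Backend:
    """Stand-in for a real device runtime."""
    def __init__(self, name):
        self.name = name
    def execute(self, op, data):
        print(f"{op} -> {self.name}")
        return data  # a real backend would launch the kernel here

class Dispatcher:
    def __init__(self):
        self.backends = {k: Backend(k) for k in ("nna", "gpu", "cpu")}

    def place(self, op):
        # Prefer the NNA: its inference throughput is orders of
        # magnitude higher than the GPU's.
        if op in NNA_OPS:
            return "nna"
        if op in GPU_OPS:
            return "gpu"
        return "cpu"

    def run(self, graph, data):
        for op in graph:
            data = self.backends[self.place(op)].execute(op, data)
        return data

# A 360º surround-view pipeline: de-warp and stitch on the GPU,
# then detect and classify objects on the NNA.
Dispatcher().run(["dewarp", "stitch", "conv2d", "matmul", "pool"], None)
```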
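The quantisation step can be illustrated with generic symmetric linear quantisation from 32-bit floats down to a configurable bit width. This is a sketch of the general technique only, not the specific scheme used in the chips described above.

```python
import numpy as np

def quantise(weights: np.ndarray, bits: int = 8):
    """Symmetric linear quantisation of float32 weights to signed
    integers of the given width (8, 10 or 12 bits, as in the article).
    Returns the integer tensor and the scale needed to dequantise."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit
    scale = np.abs(weights).max() / qmax  # map the largest weight to qmax
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def dequantise(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original float values."""
    return q.astype(np.float32) * scale

# Example: quantise random weights to 8-bit and measure the error
w = np.random.randn(1000).astype(np.float32)
q, s = quantise(w, bits=8)
print("max abs error:", np.abs(dequantise(q, s) - w).max())
```

Wider formats such as 10- or 12-bit trade a little silicon area for a smaller quantisation error, which is why they can be the most efficient choice for some networks.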
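The observation that multiple cores do not automatically bring a speed-up can be sketched with a toy partitioner that splits a sequential model's layers across cores by MAC count. The cost model is deliberately simplistic and hypothetical.

```python
# Toy partitioner: assign consecutive layers to cores so each core gets a
# roughly equal share of the total MAC count. A single layer cannot be
# split here, so one dominant layer caps the achievable speed-up no
# matter how many cores the chip offers.

def partition(layer_macs, n_cores):
    target = sum(layer_macs) / n_cores
    parts, current, load = [], [], 0
    for i, macs in enumerate(layer_macs):
        current.append(i)
        load += macs
        if load >= target and len(parts) < n_cores - 1:
            parts.append(current)
            current, load = [], 0
    parts.append(current)
    return parts

print(partition([100, 100, 100, 100], 2))  # [[0, 1], [2, 3]] - balanced
print(partition([900, 5, 5, 5], 2))        # [[0], [1, 2, 3]] - core 0
# still does 900 of 915 MACs, so the second core barely helps
```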
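Finally, the framework-to-hardware simulation step can be hinted at with a first-order performance model based on the 10 TOPS figure quoted above. Real hardware simulators model memory bandwidth and dataflow in far more detail, and the utilisation factor here is an assumed placeholder.

```python
# First-order performance model: estimate per-frame inference time from
# layer MAC counts and the 10 TOPS capability quoted for a 16 nm design.
# Memory traffic is ignored and 'utilisation' is an assumed efficiency
# factor, so this only illustrates the idea of checking an optimised
# framework against a model of the hardware before silicon exists.

TOPS = 10

def estimate_ms(layer_macs, utilisation=0.5):
    total_ops = 2 * sum(layer_macs)          # one MAC = multiply + add
    return total_ops / (TOPS * 1e12 * utilisation) * 1e3

# A hypothetical detection network with 3.5 billion MACs per frame:
print(f"{estimate_ms([2e9, 1e9, 0.5e9]):.2f} ms per frame")  # ~1.40 ms
```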