Geoff Tate

CEO of Flex Logix, Inc.

Advice on how to compare inferencing alternatives and the characteristics of an optimal inferencing engine.

In the last six months, we’ve seen an influx of specialized processors and IP for neural inferencing in AI applications, both at the edge and in the data center. Customers racing to evaluate these options are finding the process extremely confusing, because there is no agreed-upon way to measure them. Some vendors quote TOPS and TOPS/Watt without specifying the model, the batch size, or the process/voltage/temperature conditions. Others use the ResNet-50 benchmark, which is a much simpler model than most applications require, so its value in evaluating inference options is questionable.

There are almost a dozen vendors promoting inferencing IP, but none of them provides even a ResNet-50 benchmark.

Typically, the only information they state is TOPS (tera-operations per second) and TOPS/Watt.

Let’s discuss why these two indicators of performance and power efficiency are almost useless by themselves.
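A rough back-of-the-envelope calculation shows the problem. The sketch below assumes an illustrative workload of about 7.7 billion operations per ResNet-50 image (published operation counts vary with how MACs are tallied) and hypothetical utilization figures; it shows that the same TOPS rating yields wildly different real-world throughput depending on how much of the hardware the model, batch size, and memory system actually keep busy. None of that appears in a bare TOPS number.

```python
# Back-of-the-envelope: why a TOPS rating alone doesn't predict throughput.
# Assumed figures: ~7.7 GOPs per image for ResNet-50 (counting a MAC as 2 ops);
# the utilization values are hypothetical examples, not measured data.

OPS_PER_IMAGE = 7.7e9  # approximate operations per 224x224 ResNet-50 inference

def images_per_second(rated_tops: float, utilization: float) -> float:
    """Effective throughput given a peak TOPS rating and achieved utilization."""
    effective_ops_per_sec = rated_tops * 1e12 * utilization
    return effective_ops_per_sec / OPS_PER_IMAGE

for util in (0.9, 0.5, 0.2, 0.05):
    print(f"100 TOPS at {util:4.0%} utilization -> "
          f"{images_per_second(100, util):8.0f} images/sec")
```

With these assumed utilizations, a “100 TOPS” engine delivers anywhere from roughly 650 to roughly 11,700 images per second, a spread of more than 18x from a single spec-sheet number.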

In the 30 years since FPGAs were first developed, they have become bigger and faster, but their basic architecture has remained unchanged. One FPGA company executive once said they don’t really sell programmable logic, they sell programmable interconnect, because 70-80% of an FPGA’s fabric is the traditional mesh interconnect that programmably routes signals between all of the logic blocks. And as the FPGA gets bigger, the amount of interconnect typically needs to grow even faster to avoid routing problems (complexity scales with N²).
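A toy model makes that scaling concrete. In the sketch below, logic area grows linearly with the number of blocks N while routing area grows with N²; the constant K_ROUTE is a made-up illustrative value, not data from any real FPGA.

```python
# Toy model of FPGA area: logic grows linearly with block count N,
# while mesh interconnect grows roughly with N^2. K_ROUTE is a
# hypothetical constant chosen only to illustrate the trend.

K_ROUTE = 0.002  # illustrative routing-area cost factor

def interconnect_share(n_blocks: int) -> float:
    logic_area = n_blocks                 # one unit of area per logic block
    routing_area = K_ROUTE * n_blocks**2  # quadratic growth in routing
    return routing_area / (logic_area + routing_area)

for n in (500, 1000, 2000, 5000):
    print(f"{n:5d} blocks -> interconnect is {interconnect_share(n):5.1%} of fabric")
```

With these illustrative constants, the interconnect’s share of the fabric climbs from 50% at 500 blocks through the 70-80% range the executive described, and keeps rising as the array grows.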

Market leaders across a wide range of industries are quickly embracing embedded FPGA technology because it significantly improves the return on the investment required to bring complex chips to market. With readily available, high-density blocks of programmable RTL in any size and with the features a customer needs, designers now have the flexibility to customize a single chip for multiple markets and/or upgrade the chip in the system as standards such as networking protocols change. Customers can also easily update their chips with the latest deep learning algorithms and/or implement their own versions of protocols in data centers. In the past, the only way to accomplish these updates was through expensive and time-consuming new mask versions of the same chip.

All chip designers know that the deadline to “freeze RTL” comes all too soon. Now you don’t have to freeze the critical parts of your RTL that must continue to change to keep up with evolving standards and changing customer requirements. Thanks to a handful of embedded FPGA companies, including Flex Logix, that have introduced proven embedded FPGA platforms, chip designers can keep part of the chip flexible, enabling RTL to be updated at any time, even in customers’ systems.

In this article, we’ll discuss some of the key advantages of embedded FPGA, show how chip designers can use it, and correct some of the most common misconceptions about what embedded FPGA is and isn’t.