Cadence has had considerable success with its Tensilica Vision DSP family, whose members share a common single-instruction multiple-data/very-long-instruction-word (SIMD/VLIW) architecture (Fig. 1). The company’s latest announcement expands this family of processor IP cores.
1. Cadence’s Tensilica DSPs share a common architecture. Configurations such as SIMD width, and options like floating point support, can be combined to create unique solutions.
The new low-end solutions target applications like smart sensors, mobile devices, and augmented reality (AR), where low-power, always-on operation is becoming more common. Features like user authentication via voice commands, face detection, and fingerprint recognition require artificial-intelligence/machine-learning (AI/ML) support. High-end solutions analyze multiple video streams in real time, and they also use multiple AI/ML models.
The SIMD/VLIW architecture includes multiple, wide, 2048-bit, dual load/store memory interfaces and scatter-gather support. The cores also feature a 128-/256-bit AXI iDMA interface.
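Scatter-gather lets a single vector load or store touch non-contiguous memory locations. The following is only a minimal sketch of that access pattern in plain Python, not Cadence's API or instruction set; the helper names `gather` and `scatter` are hypothetical.

```python
def gather(memory, indices):
    """Read one element per index into a dense vector of lanes."""
    return [memory[i] for i in indices]


def scatter(memory, indices, values):
    """Write each value back to its own index; returns a new list."""
    out = list(memory)
    for i, v in zip(indices, values):
        out[i] = v
    return out


memory = [10, 11, 12, 13, 14, 15, 16, 17]
indices = [6, 0, 3]                       # non-contiguous locations

lanes = gather(memory, indices)           # dense vector built from scattered reads
doubled = [2 * x for x in lanes]          # stand-in for SIMD work on the lanes
updated = scatter(memory, indices, doubled)
```

On hardware with scatter-gather support, the per-index loop is handled by the load/store unit rather than by software, which is what makes irregular data layouts practical for a wide SIMD engine.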
The Vision DSP family adds the new Tensilica Vision P1 at the low end and the multicore Tensilica Vision Q8 at the high end (Fig. 2). The Tensilica Vision P1’s 128-bit SIMD engine can deliver over 0.256 TOPS of performance using one-third the area and power of its Tensilica Vision P6 sibling. This makes the P1 ideal for always-on applications that require minimal power.
2. The Vision DSP family ranges from the low-power Vision P1 to the multicore Vision Q8.
A single Tensilica Vision Q8 core supports a 1024-bit SIMD engine with 3.8 TOPS of performance and 129 GFLOPS of FP32 performance. That’s twice the performance of the Vision Q7 DSP.
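To put the SIMD widths in perspective, the number of elements processed per vector register is simply the register width divided by the element width. The sketch below works that out for the 128-bit Vision P1 and the 1024-bit Vision Q8; the element widths chosen (8, 16, and 32 bits) are assumptions for illustration, and the helper name `lanes_per_vector` is hypothetical.

```python
def lanes_per_vector(simd_bits, element_bits):
    """Number of elements of a given width that fit in one SIMD register."""
    if simd_bits % element_bits:
        raise ValueError("element width must divide the SIMD width")
    return simd_bits // element_bits


SIMD_WIDTHS = {"Vision P1": 128, "Vision Q8": 1024}  # bits, as quoted in the article
ELEMENT_WIDTHS = (8, 16, 32)                          # assumed element sizes

lane_table = {
    core: {bits: lanes_per_vector(width, bits) for bits in ELEMENT_WIDTHS}
    for core, width in SIMD_WIDTHS.items()
}

for core, row in lane_table.items():
    print(core, row)
```

Lane count alone does not determine the quoted TOPS figures, which also depend on the number of multiply-accumulate units and the clock rate, neither of which is given here.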
This family of DSP cores gives developers a range of options to meet different power and performance requirements, from always-on, low-power solutions to high-performance, multistream machine-learning platforms. The cores share a common architecture and software solution from Cadence that allows for easy migration from one solution to another.
The company can provide developers with core IP or complete subsystems (Fig. 3). This is especially handy for more complex systems, particularly safety-related applications such as automotive platforms that must be certified to standards like ISO 26262 at ASIL-D.
3. Cadence provides complete subsystem designs like this one based around the Tensilica Vision Q8 DSP core, which can deliver 800 GFLOPS of FP32 performance.
Cadence’s software support includes Halide, OpenCL, OpenVX graph, and C/C++ compiler support (Fig. 4). The runtime can work with Cadence’s own single-threaded XTOS or multithreaded XOS operating systems, or with third-party RTOSes. Software libraries are provided for features like TensorFlow Lite for Microcontrollers support.
4. All Tensilica DSPs are supported by shared Xtensa C/C++ and OpenCL compilers. Halide is a C/C++-based image and array enhanced compiler.
The DSPs are supported by the Tensilica Xtensa Neural Network Compiler (XNNC). XNNC also can target Cadence’s AI/ML DNA 150 processor. XNNC supports AI/ML models from TensorFlow, Caffe2, Keras, PyTorch, and Chainer.
Floating point is optional in these platforms because integer or fixed-point support is often sufficient for an application. The Tensilica DSP architecture handles all of the new compact numeric formats used by AI/ML applications, especially when scaling down to meet lower performance and power requirements. The high-end platforms include complex floating-point support for FP16, FP32, and FP64 data formats. There are ADDSUB FFT enhancements for FP16 and FP32 as well.
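To illustrate why integer support is often sufficient, the sketch below applies generic symmetric 8-bit quantization to a handful of floating-point weights and measures the round-trip error. This is a textbook technique shown under assumed values, not a description of Cadence's tools or of XNNC; the function names and sample weights are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers using one symmetric scale factor."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8-bit values
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0      # avoid dividing by zero
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integer representation."""
    return [x * scale for x in q]


weights = [0.50, -1.27, 0.02, 1.00]           # made-up sample weights
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

With round-to-nearest, the error per value stays within half of one quantization step, which is why many inference workloads tolerate 8-bit integer math and can skip the floating-point option.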
In addition, Cadence supports the Tensilica Instruction Extension (TIE) language. Designers can create new TIE instructions that are automatically handled by the optimizing compiler. Typically, these new instructions are hidden from high-level language programmers, with the compiler handling utilization and optimization of TIE instructions to deliver higher performance with less overhead. TIE instructions can be added while maintaining ISO 26262 certification.