Implementation
To take full advantage of the highly automated, fast turn-around implementation and verification offered by modern silicon integrated circuit design flows, we developed a small standard cell library. A standard cell library is a collection of small pre-verified building blocks from which much larger and more complex designs can be quickly and easily built using sophisticated electronic design automation tools such as synthesis, place and route.
Before the implementation of the standard cell library could begin, some preliminary investigations were done to determine the most suitable standard cell architecture for the library given the constraints of the target technology. The cell architecture is the set of features that are common to every cell in the library, such as cell height, power strap sizing, routeing grid and so on, which allow the cells to be snapped together in a standard way to form larger structures. These common features are largely governed by the design rules of the manufacturing process but are also influenced by the performance and area requirements of the final design.
Once the cell architecture was established, the next step was to determine the content of the cell library not only in terms of variety of logic functions but also the number of drive strength variants of each logic function. Because the effort involved to design, implement and characterize each standard cell is substantial, it was decided to run some trials with a small prototype library and then to expand the library as required. To evaluate the performance of this small prototype standard cell library some simple representative circuits (such as ring oscillators, counters and shift arrays) were implemented, manufactured and tested.
We migrated from 1.0-μm design rules to the new FlexIC 0.8-μm design rules to reduce area and, hence, increase yield. As this meant redrawing each cell in the library with smaller transistors, we took the opportunity also to change the standard cell architecture to include MT1 (metal-tracking 1) pins to make it easier for the router to hook up the cells. Improvements to the resistive material (higher sheet resistance, Rs) also enabled a 3× reduction in the size of the resistors.
This dramatic reduction in both transistor and resistor size reduced the area of most cells by about 50% (see Extended Data Fig. 1), which in turn improved the manufacturing yield by bringing down the overall size of the design. However, as there were still manufacturing yield issues that we could further mitigate by changes to the standard cell architecture, the library was redrawn again. This time we focused on things that would improve the overall yield of the final design, such as the inclusion of redundant vias and contacts, reducing the number of vertices in the source–drain polygons (where possible) and keeping the size of stacked transistors to a minimum. In addition, we reverted to a lower sheet resistance in order to improve the process spread but we were able to maintain the area savings by using narrower resistors. To improve the overall quality of the logic synthesis a number of complex AND-OR-INVERT and OR-AND-INVERT logic gates were added to the library as well as some high-drive-strength simple logic gates, such as NAND2_X2 and NOR2_X2.
The FlexLogIC process is an NMOS process and so relies on a resistive load to pull the cell output towards the power supply to drive a logic 1. As a consequence of this, the cell output rise times are much slower than the fall times and this asymmetry can affect performance, especially for heavily loaded nets. To improve the timing on critical nets, such as the clock, we added buffers with an active transistor pull-up. Although these active pull-ups increase the area by a small amount, they do have the added benefit of reducing the static power consumption. Layouts and simulated transfer characteristics of buffers with resistive pull-up and active transistor pull-up are shown in Extended Data Fig. 2.
This simple standard cell library was then successfully used as the target technology to implement the PlasticARM SoC using a typical silicon integrated circuit design flow based on industry standard electronic design automation tools. The standard cell library contents and cell usage information are shown in Extended Data Table 1.
As we do not yet have a dedicated static random access memory FlexIC, we created a simple register file by carefully placing some modified standard cells in a tiled array that connected by abutment to form a 32 × 32 bit memory (this block can be seen in the chip layout in Fig. 1c).
The FlexLogIC technology (see Extended Data Table 2) has four routable metal layers of which only the lower two were used inside the standard cells. This left the top two metal layers free to be used for the interconnect between the standard cells, which could then be routed over the top of any neighbouring cells leading to a much-improved overall gate density of about 300 gates per mm2.
Fabrication
Process parameters and statistical variations of TFT parameters are summarized in Extended Data Table 2. FlexLogIC is a proprietary 200-mm wafer semiconductor manufacturing process that creates patterned layers of metal-oxide thin-film transistors and resistors, with four routable (gold-free) metal layers deposited onto a flexible polyimide substrate according to the FlexIC design. Repeated instances of the FlexIC design are realized by running multiple sequences of thin-film material deposition, patterning and etching. For ease of handling and to allow industry standard process tools to be used and sub-micrometre patterned features to be achieved (down to 0.8 μm), the flexible polyimide substrate is spin-coated onto glass at the outset of production. The process has been optimized to ensure that the thickness variation is substantially less than 3% over a lateral distance of 20 mm. Thin-film material deposition is achieved through a combination of physical-vapour deposition, atomic-layer deposition and solution-processing (for example, spin-coating). Substrate processing conditions have been carefully optimized to minimise film stress and substrate bow. Feature patterning is achieved using a photolithographic 5× stepper tool, which images a shot that is repeated at multiple instances across the 200-mm-diameter wafer. Each shot is focused individually, which further compensates for any thickness variation within the spun-cast film. The technology measurements were carried out using process control monitoring structures.
Simulation, test and validation
We captured the timing characteristics of the functional PlasticARM FlexIC using a test measurement setup, and compared the measured results with the results of its register-transfer level (RTL) simulation in order to validate the functionality.
The RTL simulation is shown in Extended Data Fig. 3. It starts by resetting the PlasticARM to a known state by setting a RESET input to ‘0’. Then, RESET is set to ‘1’, the processor is released from its reset state and starts executing the code from ROM. At first, the GPIO[0] output pin is toggled once before the three tests described in Fig. 2 are executed. In the first test, data are read and added to an accumulator from the ROM, and the sum is compared against an expected value (see Fig. 2a). If values match, a short burst of two pulses is sent to GPIO[0] as shown in Extended Data Fig. 3a. If values are different, the period and duty cycle of pulses on GPIO[0] is increased in Extended Data Fig. 3b. In the second test (Fig. 2b), data are written to RAM, read back and compared. If data has not been corrupted while writing or reading from the RAM, a short burst of three pulses is sent to GPIO[0] as shown in Extended Data Fig. 3a. If data was corrupted, the period and duty cycle of pulses on GPIO[0] is increased as before. In the final test (Fig. 2c), the processor enters an infinite loop and measures the time a ‘1’ is applied on the GPIO[1] input pin. If GPIO[1] is held at ‘1’ without any glitches for long enough, GPIO[0] changes from ‘0’ to ‘1’. PlasticARM was implemented with a clock frequency of 20 kHz. Since it does not use any timers, a value was chosen in software to represent the GPIO[1] signal being held at ‘1’ for approximately 1 s when operating at 20 kHz. In our simulations in Extended Data Fig. 3a, that value corresponds to 20,459 clock cycles, which at 20 kHz yields 1.02295 s.
After fabrication, PlasticARM was tested on a wafer probe station while still attached to a glass carrier. The input signals including a clock signal were generated externally with a ZC702 FPGA Evaluation Board from Xilinx. Both input and output signals were captured using a Saleae Logic Pro 16 logic analyser. Measurements were carried out at 3 V and 4.5 V, with various clock frequencies. An experiment with power supply set to 3 V and clock frequency of 20 kHz is shown in Extended Data Fig. 4. The ZC702 I/O voltage caps the inputs and outputs to 2.5 V. The measured data waveform is shown in Extended Data Fig. 4a, and matches the waveform in the RTL simulation of all three tests in Extended Data Fig. 3a. PlasticARM is fully functional up to 29 kHz at 3 V and 40 kHz at 4.5 V.