Doctoral Thesis

On-Chip Switched Capacitor Voltage Regulators for Granular Microprocessor Power Delivery

Author(s):
Andersen, Toke M.

Publication Date:
2015

Permanent Link:
https://doi.org/10.3929/ethz-a-010510895

Rights / License:
In Copyright - Non-Commercial Use Permitted

DISS. ETH NO. 22553

ON-CHIP SWITCHED CAPACITOR VOLTAGE REGULATORS FOR GRANULAR MICROPROCESSOR POWER DELIVERY

A thesis submitted to attain the degree of DOCTOR OF SCIENCES of ETH ZURICH
(Dr. Sc. ETH Zurich)

presented by
TOKE MEYER ANDERSEN

M. Sc., Technical University of Denmark (DTU)
born on 02.02.1986
citizen of Copenhagen, Denmark

accepted on the recommendation of

Prof. Dr. Johann W. Kolar, examiner
Prof. Dr. Michael A. E. Andersen, co-examiner

2015
Acknowledgments

FIRST AND FOREMOST, I would like to thank Prof. Johann W. Kolar for the invitation to join the Power Electronic Systems Laboratory (PES) at ETH Zurich. At PES, I had the pleasure to be co-supervised by Dr. Florian Krismer, who never failed to improve my work, especially the scientific publications that this thesis is based on. The liters of red ink were well spent. A special thanks to Dr. Thomas Toifl for welcoming me to the high speed I/O link technology group at IBM Research – Zurich. Without the fruitful collaboration between IBM and ETH, this PhD project would not have been possible. I would like to thank Prof. Michael A. E. Andersen from the Technical University of Denmark (DTU) for being part of the examination committee.

A bunch of extraordinarily extraordinary people at PES made my stay there extraordinary. It was a particularly good choice to join Gabriel Ortiz, Roman Bosshard, and Jonas Huber on a road trip on the US West Coast after a conference. Furthermore, Jonas Huber and I had the pleasure of organizing two now-legendary Great Grand PES Hikes in the Swiss Alps. Other PES people worth mentioning are Arda Tüysüz, David Boillat, Christoph Gammeter, Patricio Cortes, Pedro Bezerra, Oliver Knecht, Matthias Kasper, Ralph Burkart, Dominik Bortis, Yanick Lobsiger, Michael Flankl, Mario Mauerer, Daniel Rothmund, Thomas Guillo, Lukas Fässler, Michael Leibl, Lukas Schrittwieser, Daniel Steinert, Ivana Kovacevic, Uwe Badstübner, Andrija Stupar, Hirofumi Uemura, Cheng-Wei Cheng, Claudius Zingerli, Thomas Baumbgartner, Ben Wrzecionko, Jonas Mühlethaler, Christoph Marxgut, Mario Schweizer, Bernardo Cougo, Thomas Reichert, and Thiago Soeiro. Finally, I had the pleasure to supervise three talented master students: Samuel Bögli, Christian Stocker, and Hans Sjökvist. Thank you all!

Likewise, a bunch of amazingly amazing people at IBM Research made the entire stay there amazing. A big thanks to my office colleague and friend Lukas Kull. With you, it was always downhill. Other people in the group at IBM worth mentioning are Pier Andrea Francese, Thomas Morf, Christian Menolfi, Marcel Kossel, Matthias Brändli, Hazar Yuksel, Danny Luu, Alessandro Cevrero, Joerg-Erik Sagmeister, Urs Bapst, Peter Buchmann, Bernhard Klein, Mareike Kühn, Winnie Tatiana, and Cosimo Aprile. Through my involvement in the CarrICool project, I was fortunate to work with highly inspiring people like Thomas Brunschwiler, Arvind Shridhar, and Bruno Michel, as well as Ningning Wang, Caroline Rabot, Zoran Pavlovic, and Cian Ó Mathúna from Tyndall National Research Center, Ireland, Sohie Gaboriau and Catherine Bunel from IPDIA, France, and Wolfram Steller from Fraunhofer Fhg, Germany. Also, a special thanks to my friend Hans Meyvaert from KU Leuven, Belgium. Besides the groups, I have met a lot of fellow students whom I today consider some of my dearest friends. I still laugh when thinking about the good old open office trio consisting of Rik Jongerius, Alexis Hafner, and myself. We surely had a great time at IBM, and we still have a great time today. I had a good time with Philip Mensch as IBM PreDoc President, organizing various social events for all PhD students at IBM. Additional friends worth mentioning are Adela Almasi, Robby Zippel, Tomas Tuma, Matteo Cossale, Twan Kampf, Djordje Zegarac, Sigrun Köster and Anil Kurmus.

Thank you all!

I very much appreciate the wonderful time spent with my WG flatmates Alexandra Hoffmann, Jarno Hartog, Aglaya Salvatierra, Alexandra Schwizer, Myriam Peter, Silvan Wick, Larissa Scherrer, and Nicole Vogler. I would like to thank Mette Lerche Sørensen for the good times we had along the way. A very special thanks goes to Dr. Carina Joly for her support and kindness during the end of my PhD. With her, I shared wonderful moments including very Bach music and holunder.

Thank you so much!

Last but definitely not least, I wish to thank my family in Denmark for their support and understanding during my stay abroad. Thanks to my parents Betty and Lasse and my sisters Thea and Tenna. I very much enjoyed your visits as well as my frequent visits back home.

With love!

Toke Meyer Andersen
Abstract

THE EVER INCREASING supply currents at decreasing supply voltages in microprocessor systems result in inefficient and unstable power delivery due to parasitic resistances and inductances in the power distribution network. By supplying the microprocessor system with a higher-than-nominal input voltage, the input current, which flows through the power distribution network, is decreased proportionally for the same power specification. To facilitate this scenario, an on-chip (or fully-integrated) voltage regulator is required to convert the higher-than-nominal input voltage down to the nominal supply voltage specified by the microprocessor. Furthermore, on-chip voltage regulators enable a granular power delivery that consolidates several voltage domains, e.g. voltage domains for cores, caches, graphic processors, I/O’s, etc., in the microprocessor system from a single input voltage on the motherboard. In addition, for a multi-core or many-core microprocessor system, on-chip voltage regulators enable per-core regulation where the supply voltage of each core is regulated independently of the others. This reduces the voltage overhead, which in turn reduces the energy consumption for a given computation. The adoption of granular power delivery and per-core regulation in future many-core microprocessor systems thus promises significant power and energy savings.

This thesis focuses on the electrical design and implementation of on-chip voltage regulators for granular microprocessor power delivery and per-core regulation. To achieve the power and energy savings discussed above, the on-chip voltage regulator must

► be designed and implemented using the same 32 nm SOI CMOS semiconductor technology as the microprocessor.

► achieve high conversion efficiency to improve the overall system efficiency.

► achieve high power density to fit onto the microprocessor chip.

► demonstrate high output power to supply all microprocessor voltage domains.

► achieve fast response times to transient load changes over a wide voltage range to enable dynamic voltage and frequency scaling capabilities.

Typically, an inductive buck converter is used for microprocessor power delivery. Hence, its suitability for on-chip integration is investigated by Pareto optimization of on-chip inductors. Pareto optimization procedures for both air core and cored on-chip inductors are developed to evaluate the efficiency and power density for a given converter specification and design space of the geometrical parameters. According to the Pareto optimization, inductors using the top metal layers of the 32 nm SOI CMOS technology achieve insufficient efficiency to be suited for this application, but both air core and cored inductors manufactured using additional post-processing steps are suited.

Due to the unavailability of the additional post-processing steps required to manufacture efficient on-chip inductors, switched capacitor converters, which are implemented using transistors and capacitors only, are considered. To analyze on-chip switched capacitor converters, a state space model framework is developed. The model framework takes the effects of the parasitic bottom plate capacitance present in on-chip capacitors into account, and it is used in a Pareto optimization procedure to select the optimal design for a given converter specification and design space. The first converter design consists of a single stage 2:1 voltage conversion ratio on-chip switched capacitor converter. The design utilizes the high-density deep trench capacitors available in the 32 nm SOI CMOS technology. Measurements of the first converter design result in 86% maximum efficiency at 4.6 W/mm² power density whilst converting from 1.8 V input voltage to 830 mV output voltage. Hence, on-chip switched capacitor converters prove to be suited for granular microprocessor power delivery.

Based on the promising measurement results of the first converter design, a complete on-chip switched capacitor voltage regulator is designed. A reconfigurable power stage, which features a 2:1 and a 3:2 voltage conversion ratio, is designed. The reconfigurable power stage supports a wide output voltage range of 0.7 V – 1.1 V with 1.8 V input voltage, thereby enabling dynamic voltage and frequency scaling for per-core regulation. Interleaving is employed to significantly reduce the input and output decoupling requirements. A single bound hysteretic control scheme with a digital clock interleaver is developed. Utilizing the fast transistors of the 32 nm SOI CMOS technology, the controller is clocked at 4 GHz to provide sub-nanosecond response time to a load change. Measurements of the second converter design result in a maximum efficiency of 86% at 2.2 W/mm² power density in the 2:1 configuration and 90% at 3.7 W/mm² power density in the 3:2 configuration. Furthermore, the sub-nanosecond response time of the controller is verified using an on-chip programmable load. Despite the sub-nanosecond response time, a 90 mV output voltage droop is observed. The output voltage droop is found to be caused by a significant input voltage droop, which is due to supply instability resulting from the parasitic inductance of the power distribution network.

As a final step, a novel feedforward control scheme for reconfigurable switched capacitor voltage regulators is developed. The feedforward control mitigates the output voltage droop by dynamically changing the configuration of the converter when an input voltage droop is detected. Measurement results of the third converter design confirm the transient response of the feedforward control scheme, and the output voltage droop is reduced from 90 mV to 30 mV. The minimum supply voltage required by the microprocessor cores can therefore be maintained with a 60 mV voltage overhead reduction, thereby reducing the compute energy of the system. Finally, the third converter design delivers 10 W maximum output power at 85% efficiency and 5 W/mm² power density. To facilitate the measurements for this design, a thermal model is developed to take temperature dependencies of the on-chip programmable load into account.

This thesis concludes that on-chip inductors using the top metal layers of the 32 nm SOI CMOS technology are unsuited for buck converter integration due to the high dc resistances, and thereby low efficiencies, achievable with the limited metal thicknesses. However, on-chip inductors with additional post-processing steps, e.g. thicker top metal layers and/or magnetic material deposition, as well as inductors integrated on a separate die or into the laminate, are suited. This thesis further concludes that on-chip switched capacitor voltage regulators, which historically have been perceived as being inefficient, low power, and difficult to regulate, are a viable candidate to enable granular microprocessor power delivery and per-core regulation. The measured performances of the presented converters rank among the highest efficiency, highest power density, highest output power, and fastest transient response time on-chip voltage regulators published to date.
Zusammenfassung


This thesis deals with the electrical design and implementation of fully integrated voltage regulators for granular power delivery and separate voltage regulation per core. To achieve the power and energy savings mentioned above, the fully integrated voltage regulator must

- be designed and implemented in the same 32 nm SOI CMOS semiconductor technology as the microprocessor itself,

- achieve high efficiency to improve the overall efficiency of the system,

- exhibit high power density to fit onto the microprocessor chip,

- provide high output power to supply all voltage domains of the microprocessor,

- offer fast response times to transient load changes over a wide voltage range to enable dynamic voltage and frequency scaling.

Typically, buck converters are used for microprocessor power delivery. Therefore, as a first step, their suitability for on-chip integration is investigated by means of Pareto optimization of on-chip inductors. Pareto optimization procedures are developed for fully integrated inductors, both air core and cored, to evaluate efficiencies and power densities for a given converter specification and for geometrical parameters from a given design space. According to this Pareto optimization, inductors that use the top metallization layer of the 32 nm SOI CMOS technology achieve only insufficient efficiencies, which makes them unsuited for the present application. In contrast, both air core and cored inductors are suited, although they can only be manufactured by means of additional post-processing steps.

Because the additional post-processing steps that would be required to manufacture efficient fully integrated inductors are unavailable, switched capacitor converters, which consist only of transistors and capacitors, are considered. To analyze such fully integrated switched capacitor converters, a state space model is developed. This model takes into account the effects of the parasitic bottom plate capacitance that occurs in fully integrated capacitors, and it is used in a Pareto optimization to select the optimal design for a given converter specification and a given design space. The first converter design consists of a single-stage, fully integrated switched capacitor converter with a voltage conversion ratio of 2:1. The design uses the high-density deep trench capacitors available in the 32 nm SOI CMOS technology. Measurements of this first converter design yield a maximum efficiency of 86% at a power density of 4.6 W/mm² while converting an input voltage of 1.8 V to an output voltage of 830 mV. Fully integrated switched capacitor converters thus prove to be suited for granular power delivery.

Based on the promising measurement results of the first converter design, a complete fully integrated switched capacitor converter is designed. A reconfigurable power stage, which can be set to a voltage conversion ratio of either 2:1 or 3:2, is developed. With an input voltage of 1.8 V, the reconfigurable power stage supports a wide output voltage range of 0.7 V – 1.1 V, which enables dynamic voltage and frequency scaling for separate voltage regulation per core. To significantly reduce the decoupling requirements at the input and at the output, interleaving is employed. A single bound hysteretic controller with a digital clock interleaver is developed. By using the fast transistors of the 32 nm SOI CMOS technology, the controller can be clocked at 4 GHz, which makes response times of less than one nanosecond to load changes possible. Measurements of this second converter design yield a maximum efficiency of 86% at a power density of 2.2 W/mm² in the 2:1 configuration and 90% at 3.7 W/mm² in the 3:2 configuration. In addition, the sub-nanosecond response time of the controller is verified by means of a programmable load integrated on the chip. Despite the sub-nanosecond response time, a deviation of 90 mV occurs in the output voltage. It can be traced back to a significant droop of the input voltage, which in turn stems from supply voltage instabilities caused by the parasitic inductances of the power distribution network.

In the final part of the thesis, a novel control scheme with feedforward for reconfigurable switched capacitor converters is therefore developed. This feedforward control reduces the deviation of the output voltage by dynamically changing the configuration of the converter as soon as a deviation of the input voltage is detected. Measurements of the third converter design confirm the intended transient behavior of the feedforward control, and the deviation in the output voltage can be reduced from 90 mV to 30 mV. The minimum supply voltage required by the microprocessor cores can therefore be maintained with a voltage overhead reduced by 60 mV, which reduces the energy consumption per computation of the system. Finally, the third converter design delivers a maximum output power of 10 W at an efficiency of 85% and a power density of 5 W/mm². To enable the measurements of this design, a thermal model is developed so that the temperature dependencies of the fully integrated programmable load can be taken into account.

<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>AC/ac</td>
<td>Alternating Current</td>
</tr>
<tr>
<td>BEOL</td>
<td>Back End Of the Line</td>
</tr>
<tr>
<td>C4</td>
<td>Controlled Collapse Chip Connection</td>
</tr>
<tr>
<td>CMOS</td>
<td>Complementary Metal Oxide Semiconductor</td>
</tr>
<tr>
<td>DC/dc</td>
<td>Direct Current</td>
</tr>
<tr>
<td>DUT</td>
<td>Device Under Test</td>
</tr>
<tr>
<td>DVFS</td>
<td>Dynamic Voltage and Frequency Scaling</td>
</tr>
<tr>
<td>FEM</td>
<td>Finite Element Method</td>
</tr>
<tr>
<td>FSL</td>
<td>Fast Switching Frequency Limit</td>
</tr>
<tr>
<td>FEOL</td>
<td>Front End Of the Line</td>
</tr>
<tr>
<td>gnd</td>
<td>Electrical Ground</td>
</tr>
<tr>
<td>I/O</td>
<td>Input/Output</td>
</tr>
<tr>
<td>ICT</td>
<td>Information and Communication Technologies</td>
</tr>
<tr>
<td>IR drop</td>
<td>Wire/Copper voltage drop</td>
</tr>
<tr>
<td>ITRS</td>
<td>International Technology Roadmap for Semiconductors</td>
</tr>
<tr>
<td>KCL</td>
<td>Kirchhoff’s Current Law</td>
</tr>
<tr>
<td>KVL</td>
<td>Kirchhoff’s Voltage Law</td>
</tr>
<tr>
<td>MIM</td>
<td>Metal Insulator Metal</td>
</tr>
<tr>
<td>MOM</td>
<td>Metal Oxide Metal</td>
</tr>
<tr>
<td>MOSFET</td>
<td>Metal Oxide Semiconductor Field Effect Transistor</td>
</tr>
<tr>
<td>MSB</td>
<td>Most Significant Bit</td>
</tr>
<tr>
<td>NMOS/Nmos</td>
<td>N-channel MOSFET</td>
</tr>
<tr>
<td>OCVR</td>
<td>On-Chip Voltage Regulator</td>
</tr>
<tr>
<td>Opamp</td>
<td>Operational amplifier</td>
</tr>
<tr>
<td>PAR</td>
<td>Peak to Average Ratio</td>
</tr>
<tr>
<td>PDN</td>
<td>Power Distribution Network</td>
</tr>
<tr>
<td>PMOS/Pmos</td>
<td>P-channel MOSFET</td>
</tr>
<tr>
<td>POL</td>
<td>Point Of Load</td>
</tr>
<tr>
<td>PSiP</td>
<td>Power Supply in Package</td>
</tr>
<tr>
<td>PwrSoC</td>
<td>Power Supply on Chip</td>
</tr>
<tr>
<td>RDL</td>
<td>Redistribution Layer</td>
</tr>
<tr>
<td>RMS/rms</td>
<td>Root Mean Square</td>
</tr>
<tr>
<td>SC</td>
<td>Switched Capacitor</td>
</tr>
<tr>
<td>SCVR</td>
<td>Switched Capacitor Voltage Regulator</td>
</tr>
<tr>
<td>SOI</td>
<td>Silicon On Insulator</td>
</tr>
<tr>
<td>SSL</td>
<td>Slow Switching Frequency Limit</td>
</tr>
<tr>
<td>TIM</td>
<td>Thermal Interface Material</td>
</tr>
<tr>
<td>VRM</td>
<td>Voltage Regulator Module</td>
</tr>
</tbody>
</table>
<table>
<thead>
<tr>
<th>Section</th>
<th>Title</th>
</tr>
</thead>
<tbody>
<tr>
<td>4.3</td>
<td>Single Bound Hysteretic Control</td>
</tr>
<tr>
<td>4.3.1</td>
<td>Digital Clock Interleaver</td>
</tr>
<tr>
<td>4.3.2</td>
<td>$f_{sw,\text{max}}$ and Loop Latency</td>
</tr>
<tr>
<td>4.4</td>
<td>Second SC Converter Design</td>
</tr>
<tr>
<td>4.4.1</td>
<td>System Overview</td>
</tr>
<tr>
<td>4.5</td>
<td>Second Hardware Results</td>
</tr>
<tr>
<td>4.5.1</td>
<td>Measured Efficiency and Power Density</td>
</tr>
<tr>
<td>4.5.2</td>
<td>Measured Transient Response</td>
</tr>
<tr>
<td>4.6</td>
<td>Summary</td>
</tr>
<tr>
<td>5</td>
<td>Feedforward Control for Reconfigurable SCVR</td>
</tr>
<tr>
<td>5.1</td>
<td>Novel Feedforward Control</td>
</tr>
<tr>
<td>5.1.1</td>
<td>Digital Gear Controller</td>
</tr>
<tr>
<td>5.2</td>
<td>Third SC Converter Design</td>
</tr>
<tr>
<td>5.2.1</td>
<td>System Overview</td>
</tr>
<tr>
<td>5.3</td>
<td>Third Hardware Results</td>
</tr>
<tr>
<td>5.3.1</td>
<td>Thermal Model</td>
</tr>
<tr>
<td>5.3.2</td>
<td>Measured Efficiency and Power Density</td>
</tr>
<tr>
<td>5.3.3</td>
<td>Measured Transient Response</td>
</tr>
<tr>
<td>5.4</td>
<td>Summary</td>
</tr>
<tr>
<td>6</td>
<td>Conclusions</td>
</tr>
<tr>
<td>6.1</td>
<td>State of the Art – Year 2015 Landscape</td>
</tr>
<tr>
<td>6.2</td>
<td>Outlook</td>
</tr>
<tr>
<td>Bibliography</td>
<td></td>
</tr>
</tbody>
</table>
Electricity consumption within information and communication technologies (ICT) now approaches 10% of the world’s total electricity consumption. This corresponds to 1500 TWh of annual electricity consumption, and forecasts predict as much as 6000 TWh of annual electricity consumption for ICT in 2035 [46]. This tremendous amount of electricity is used to produce, store, transport, process, and display the zettabytes of data produced and used by 1) data centers, 2) wired and wireless communication infrastructure, and 3) end-user devices such as personal computers, smartphones, and digital televisions. At the heart of all of these lie the data processing units such as microprocessor cores, caches, graphic processors, I/O circuits and networks, etc. From a power conversion perspective, these data processing units act as electronic loads powered by a point of load (POL) converter, which is the final power conversion stage in the entire power delivery chain. For high-performance multi-core and many-core microprocessors, which are the main target application considered in this thesis, the POL converter typically consists of an external voltage regulator module (VRM).

According to the 2013 international technology roadmap for semiconductors (ITRS) [47], supply voltages for high-performance microprocessors will continue to decrease from around 0.86 V today towards 0.75 V in 2020. However, the predicted power density is expected to remain close to constant, which implies an increase in supply current. According to the 2013 ITRS, 50% of the total number of package pins in high-performance microprocessors are utilized as supply pins (power and ground). The percentage is 66.7% for high-volume microprocessors. Hence, at most half of the total pins are available for signals. Furthermore, the maximum allowable current per pin determines the required number of supply pins based on the microprocessor’s power specification. It is therefore a major packaging challenge to efficiently deliver the high current from the VRM through the resistive and inductive package pins to the microprocessor load.

Figure 1.1: Projections of supply pin allocation over supply current from a survey of 30 years of published microprocessor architectures. Increasing supply currents stemming from the continuation of technology downscaling in combination with more cores added in multi-core and many-core microprocessor systems represent a bottleneck in signal pin availability. (This figure is adapted from [48] with permission from its authors.)

In Fig. 1.1, the trend of package pin distribution between supply and signal pins as a function of the supply current is shown [48]. It is concluded that if future microprocessors continue to follow historical trends by increasing the supply current, the number of supply pins will constitute an increasingly larger fraction of the total pins, thereby leaving few pins available for signals. Hence, the projections from the 2013 ITRS discussed above in combination with the trend in Fig. 1.1 highlight a major packaging concern to efficiently power future microprocessor systems.

This thesis treats on-chip voltage regulators that enable power and energy savings for future microprocessor systems. Section 1.1 motivates the use of on-chip voltage regulators for granular power delivery and per-core regulation. The various levels of integration considered are defined in Section 1.2, and suited power converter topologies are presented in Section 1.3. A state-of-the-art overview of on-chip voltage regulators published up until 2010 is presented in Section 1.4. Section 1.5 presents and discusses the targeted converter specifications. The thesis structure is outlined in Section 1.6 along with a list of publications that the remainder of this thesis is based upon.

### 1.1 Granular Microprocessor Power Delivery

On-chip voltage regulators (OCVRs) enable granular power delivery by providing independent voltage domains with various voltage and current specifications. These domains include microprocessor cores, caches, signal I/O’s, memory, graphic processors, etc. either all on the same die or on a separate die within the package [12, 49, 11]. The OCVR generates the desired supply voltage from a higher-than-nominal input voltage, thereby becoming an independent POL converter for each voltage domain.

Using an example of five independent and different voltage domains, Fig. 1.2 illustrates how OCVRs can be implemented to provide granular power delivery. The example voltages and currents shown are representative for a high-performance microprocessor system. The typical granular power delivery is shown in Fig. 1.2(a), where several external VRMs are used to supply each of the independent voltage domains. Each external VRM has different output voltage and current requirements. Around 360 A of current flows through the combined power distribution network (PDN), of which most current is for $V_1$ that supplies the microprocessor cores. These high currents are challenging to supply through the PDN due to 1) wire voltage drops (IR drops) in the parasitic wiring resistance, 2) \( L\frac{di}{dt} \) supply voltage variations caused by the parasitic wiring inductance, and 3) C4 solder bumps, which are typically limited to around 200 mA per C4. Furthermore, the VRMs typically take up a significant fraction of the total motherboard area.

Figure 1.2: Example microprocessor system consisting of five independent voltage domains with different current and voltage specifications: (a) typical power delivery where five external VRMs are designed to supply each independent voltage domain; (b) granular power delivery where a single external VRM supplies five OCVRs, each of which supplies an independent voltage domain.

The target granular power delivery shown in Fig. 1.2(b) utilizes OCVRs to supply each independent voltage domain from a single external VRM, which provides a higher-than-nominal supply voltage. Each OCVR is dimensioned and scaled to match the requirements of its respective voltage domain. With 1.8 V input voltage and the same total dissipated power, only around 210 A of current flows through the PDN, thereby directly reducing the issues with IR losses, \( L\frac{di}{dt} \) supply voltage variations, and limited C4 solder bumps. The design of the external VRM can be simplified when converting down to a fixed and higher voltage instead of a dynamic and lower voltage. Also, the efficiency of the VRM alone can be improved by 3% to 4% [50]. Finally, with only a single external VRM, the motherboard area used for power delivery can be significantly reduced, allowing for an overall smaller form factor of the entire microprocessor system.
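The current reduction described above follows from $I = P/V$ at constant delivered power. The following is a minimal sketch of that arithmetic, not the calculation behind Fig. 1.2: the domain voltages and currents in `DOMAINS`, the function names, and the optional `efficiency` parameter are assumptions made for illustration. Only the 1.8 V input voltage and the limit of around 200 mA per C4 are taken from the text.

```python
import math

# Hypothetical voltage domains as (supply voltage in V, load current in A).
# Illustrative assumptions only; these are NOT the values shown in Fig. 1.2.
DOMAINS = [(1.0, 200.0), (1.1, 50.0), (1.2, 40.0), (1.5, 20.0), (1.35, 20.0)]

C4_LIMIT_A = 0.2  # per-bump current limit of around 200 mA quoted in the text


def pdn_current_direct(domains):
    """PDN current when external VRMs supply every domain at its own voltage."""
    return sum(current for _, current in domains)


def pdn_current_ocvr(domains, v_in, efficiency=1.0):
    """PDN current when a single rail at v_in feeds one OCVR per domain.

    efficiency is an assumed, uniform OCVR conversion efficiency.
    """
    p_load = sum(voltage * current for voltage, current in domains)
    return p_load / (efficiency * v_in)


def min_c4_bumps(current):
    """Smallest number of C4 bumps that can carry the given current."""
    # round() removes floating-point noise before taking the ceiling.
    return math.ceil(round(current / C4_LIMIT_A, 9))


if __name__ == "__main__":
    i_direct = pdn_current_direct(DOMAINS)
    i_ocvr = pdn_current_ocvr(DOMAINS, v_in=1.8, efficiency=0.9)
    print(f"direct: {i_direct:.0f} A, {min_c4_bumps(i_direct)} bumps")
    print(f"OCVR:   {i_ocvr:.0f} A, {min_c4_bumps(i_ocvr)} bumps")
    # Resistive PDN loss scales with the square of the current.
    print(f"relative I^2 R loss: {(i_ocvr / i_direct) ** 2:.2f}")
```

Because the resistive loss in the PDN scales with the square of the current, even a moderate current reduction lowers the IR losses noticeably, and the number of C4 bumps needed for supply falls in proportion to the current.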

Although OCVRs in Fig. 1.2(b) are shown to supply all voltage domains, other target scenarios using e.g. OCVRs to supply some voltage domains and external VRMs to supply others are also feasible. However, this thesis details OCVR design techniques that can be adapted and applied to supply all voltage domains regardless of current and voltage ratings to fully exploit the benefits of granular power delivery.

#### 1.1.1 Per-Core Regulation

As detailed above, OCVRs can be considered for granular microprocessor power delivery supplying several independent voltage domains with various voltage and current specifications [23, 10, 51]. This subsection uses the voltage domain \( V_1 \) from Fig. 1.2 to highlight the benefits of per-core regulation for multi-core and many-core microprocessor systems. This voltage domain supplies several microprocessor cores, has a variable output voltage for dynamic voltage and frequency scaling, and has high current requirements. However, other voltage domains supplying several cores can be considered as well.

Dynamic voltage and frequency scaling (DVFS) is a popular technique to dynamically adjust the voltage and clock frequency of a microprocessor core to meet, but not exceed, supply voltage demands [50]. Changing the logic state of a digital circuit electrically translates into charging or discharging the total capacitance $C_{\text{tot}}$ of all logic gates that change state. With a clock frequency $f_{\text{clk}}$, the total power dissipation of the logic circuit, e.g. a microprocessor core, can be estimated using

$$P_{\text{logic}} = C_{\text{tot}}V_{\text{sup}}^2f_{\text{clk}}, \tag{1.1}$$

where $V_{\text{sup}}$ is the supply voltage of the logic circuit. DVFS ensures that the supply voltage and clock frequency follow the demand set by the microprocessor core for a given computational workload. From a power delivery point of view, only $V_{\text{sup}}$ can be regulated to save system power as both $C_{\text{tot}}$ and $f_{\text{clk}}$ are determined by the microprocessor core and the workload. From (1.1), the power dissipation depends quadratically on $V_{\text{sup}}$. Hence, a power management scheme is introduced to meet, but not exceed, the varying demands in $V_{\text{sup}}$.
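The quadratic dependency in (1.1) can be made concrete with a short numerical sketch. The function names and the example values for $C_{\text{tot}}$, $f_{\text{clk}}$, and the nominal supply voltage are assumptions chosen for illustration. The 60 mV figure is the overhead reduction quoted in the abstract.

```python
def logic_power(c_tot, v_sup, f_clk):
    """Estimated power dissipation of a logic circuit according to (1.1), in W."""
    return c_tot * v_sup ** 2 * f_clk


def relative_saving(v_sup, overhead_reduction):
    """Fraction of power saved when v_sup is lowered by overhead_reduction.

    c_tot and f_clk are assumed unchanged, so they cancel out of the ratio.
    """
    v_new = v_sup - overhead_reduction
    return 1.0 - (v_new / v_sup) ** 2


if __name__ == "__main__":
    # Hypothetical core: 1 nF switched capacitance, 0.9 V supply, 3 GHz clock.
    print(f"P_logic = {logic_power(1e-9, 0.9, 3e9):.2f} W")
    # Effect of removing 60 mV of voltage overhead at the assumed 0.9 V supply.
    print(f"saving  = {100 * relative_saving(0.9, 0.06):.1f} %")
```

Under these assumptions, removing 60 mV of overhead from a 0.9 V supply lowers the estimated logic power by roughly 13%, which illustrates why small reductions in voltage overhead matter.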

Three power management schemes for an example four-core microprocessor are shown in Fig. 1.3. The three example workloads performed by the microprocessor have specific core utilization profiles that dictate the supply voltage requirements for each core. Following a transient load step, the supply voltage typically experiences a droop. Therefore, extra voltage overhead is added on top of the core utilization profile to support the frequency scaling, which is equivalent to a transient load step. However, the voltage overhead directly leads to additional system energy loss. The function of the power management scheme is to reduce the voltage overhead as much as possible while still meeting the supply voltage demands at all times.

Most power management schemes today support the DVFS shown in Fig. 1.3(a), where a single supply voltage is delivered to all cores simultaneously, and the DVFS voltage is adjusted to the voltage requirement of the core having the highest utilization profile within a certain workload. As can be seen for workloads 1 and 3, cores with low utilization profiles experience a large voltage overhead, thereby leading to undesired system energy losses. For workload 2 having uniform core utilization, this power management scheme provides no system energy savings. For per-core DVFS shown in Fig. 1.3(b), the supply voltage of each core is independently regulated to match its utilization profile within each workload. This power management scheme significantly reduces the voltage overhead in workloads 1 and 3, thereby reducing the additional system energy losses. However, the voltage overhead that accounts for transient load steps is still required, and there are again no system energy savings for workload 2. In Fig. 1.3(c), the improved per-core DVFS with reduced voltage overhead is shown as the ultimate goal, where the overhead voltage for each core is minimized. To be feasible, the power management scheme must provide a solution to reduce the voltage droop following a transient event, thereby allowing for an overall lower supply voltage with per-core DVFS. This attractive power management scheme therefore leads to minimal system energy losses for all workloads.

Figure 1.3: Power management schemes for multi-core and many-core microprocessor systems. Any voltage overhead translates directly to additional system energy loss.
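The three power management schemes can be compared numerically using (1.1). The following is a minimal sketch under stated assumptions: the per-core voltage requirements, the per-core capacitance, and the clock frequency are hypothetical, and the clock frequency is kept equal for all cores for simplicity, although DVFS would scale it with the voltage. The overhead values of 90 mV and 30 mV are borrowed from the measurement results summarized for Chapter 5 in Section 1.6.

```python
def logic_power(c_tot, v_sup, f_clk):
    """Logic power following (1.1)."""
    return c_tot * v_sup**2 * f_clk


def shared_dvfs_power(v_required, overhead, c_core, f_clk):
    """Fig. 1.3(a): all cores share the voltage of the most demanding core."""
    v_shared = max(v_required) + overhead
    return sum(logic_power(c_core, v_shared, f_clk) for _ in v_required)


def per_core_dvfs_power(v_required, overhead, c_core, f_clk):
    """Fig. 1.3(b) and (c): each core receives its own voltage plus overhead."""
    return sum(logic_power(c_core, v + overhead, f_clk) for v in v_required)


# Hypothetical values for illustration only (not from the thesis).
C_CORE = 1e-9                          # switched capacitance per core in farad
F_CLK = 2e9                            # clock frequency in hertz
nonuniform = [1.1, 0.8, 0.7, 0.9]      # required core voltages in volt
uniform = [1.0, 1.0, 1.0, 1.0]         # uniform utilization, cf. workload 2

p_a = shared_dvfs_power(nonuniform, 0.09, C_CORE, F_CLK)
p_b = per_core_dvfs_power(nonuniform, 0.09, C_CORE, F_CLK)
p_c = per_core_dvfs_power(nonuniform, 0.03, C_CORE, F_CLK)
print(f"(a) shared: {p_a:.2f} W, (b) per-core: {p_b:.2f} W, (c) reduced overhead: {p_c:.2f} W")

# For uniform utilization, per-core regulation alone brings no saving.
p_uniform_a = shared_dvfs_power(uniform, 0.09, C_CORE, F_CLK)
p_uniform_b = per_core_dvfs_power(uniform, 0.09, C_CORE, F_CLK)
```

For the uniform workload, schemes (a) and (b) give the same power, so only the reduced overhead of scheme (c) yields a saving.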

In Fig. 1.4, a multi-core microprocessor power delivery implementation for a single voltage domain, e.g. $V_1$ from Fig. 1.2, is shown. In the typical power delivery shown in Fig. 1.4(a), the external VRM converts an input voltage of 12 V to a variable output voltage ranging from 0.7 V to 1.1 V to support DVFS. Since the voltage domain requires a high current, the issues described above regarding high currents through the PDN apply. The output voltage is supplied to all cores on the microprocessor die following the DVFS power management scheme in Fig. 1.3(a), which has excessive overhead voltages for certain workloads.

Figure 1.4: Multi-core microprocessor power delivery implementation: (a) DVFS applied to all cores simultaneously; (b) per-core DVFS enabled by the on-chip voltage regulators (OCVR).

Fig. 1.4(b) shows a power delivery implementation having multiple OCVRs on the microprocessor die. This implementation enables per-core DVFS to reduce the voltage overhead of each core and further save system energy as shown in Fig. 1.3(b). Per-core DVFS can, depending on workload, increase the overall system efficiency by up to 21% [50]. This efficiency improvement stems mainly from reduced power losses in the PDN, and it includes the power losses of the added OCVR. Furthermore, if the OCVR can reduce the voltage overhead while still maintaining the minimum voltage required by each core, the per-core DVFS with reduced overhead voltage from Fig. 1.3(c) can be implemented. These attractive benefits motivate further investigation and exploration of OCVRs to enable per-core DVFS with reduced overhead voltage.

1.2 Levels of Integration

We distinguish between three levels of OCVR integration: 3D, 2D, and 2.5D. These levels define to what extent the various converter components are integrated with the load, i.e. onto the microprocessor die. The three levels of integration considered are illustrated in Fig. 1.5. Common to all levels of integration is that no additional external components are implemented. The following three subsections treat each integration level in more detail.

Figure 1.5: The OCVR level of integration depends on the extent to which the converter components are integrated onto the microprocessor die.

1.2.1 3D Integration

3D integration is illustrated in Fig. 1.5(a). The converter components, i.e. transistors, capacitors, inductors, and control circuits, are external to the microprocessor die. There are several options for the component placement. For instance, the components can be monolithically integrated on a separate die forming a chip stack with the microprocessor die (as shown), embedded into the laminate in the chip shadow area, or placed within the package on the laminate next to the microprocessor in a multi chip module.

3D integration requires two active semiconductor dies, one for the power converter and one for the microprocessor load. However, the semiconductor technology need not be the same, i.e. a 3D integrated converter can be designed using a semiconductor technology that is cheaper or offers different process options than the microprocessor semiconductor technology. Furthermore, passive components can be manufactured using the available process options, e.g. magnetic materials for high-density inductors or trenches for high-density deep trench capacitors.

1.2.2 2D Integration

2D integration is illustrated in Fig. 1.5(b). All converter components, i.e. transistors, capacitors, inductors, and control circuits, are integrated onto the microprocessor die. This is therefore the highest level of integration possible. Compared to 3D integration, 2D integration only requires one active semiconductor die.

2D integrated converters have to be designed in the same semiconductor technology as the microprocessor and they therefore require some part of the microprocessor chip area to be allocated for the components. Furthermore, passive components have to be manufactured using the process options available in the semiconductor technology or additional processing steps that are compatible with the semiconductor technology’s BEOL. As will be justified further in Section 1.5, the work presented in this thesis focuses on 2D integration being the ultimate integration challenge for power electronics converters.

1.2.3 2.5D Integration

2.5D integration is illustrated in Fig. 1.5(c). This level of integration lies in between 3D and 2D, where some components, typically transistors and control circuits, are integrated onto the microprocessor die and other components, typically inductors and capacitors, are external to the microprocessor die. The passives can be either integrated on an interposer [33] or embedded into the laminate [12]. Hence, 2.5D integration only requires one active semiconductor die as in 2D integration.

For 2.5D integrated converters, the transistors and control circuits have to be designed using the same semiconductor technology as the microprocessor. The interposer die contains passive components only and therefore does not take up additional microprocessor chip area. The same holds for passive components embedded into the laminate. For this reason, the microprocessor chip area needed for 2.5D integrated converters is smaller than for 2D integrated converters. Furthermore, the interposer/laminate can be manufactured using process steps not readily available in the microprocessor semiconductor technology, e.g. magnetic materials, trenches, or thicker copper in the redistribution layers (RDLs).

1.3 Converter Topologies

Converter topologies suited for OCVRs are typically split into three main categories: linear regulators, (inductor-based) buck converters, and switched capacitor converters. Other topologies that are derivatives or combinations of these three main categories can be constructed, e.g. 3-level buck converters [43, 52], merged SC and buck converter topologies [53], resonant SC converters [54, 55, 56], or hybrid SC and linear regulator topologies [57]. However, these and other possible topologies are not considered further in this thesis.

In Fig. 1.6, an overview of the three main converter topologies and their equivalent circuits is shown. Each converter topology and its advantages and disadvantages with respect to granular microprocessor power delivery and per-core regulation are discussed in the following.

1.3.1 Linear Regulators

Figure 1.6: The three main converter topologies suited for on-chip integration: (a) the linear regulator; (b) the buck converter; (c) the switched capacitor (SC) converter. All converter topologies are shown with an output decoupling capacitor and a resistor as load. Equivalent circuit models of each topology are included to identify loss components and regulation capabilities. For the buck converter equivalent circuit model, $D$ represents the duty cycle, and for the SC converter equivalent circuit model, $M$ represents the voltage conversion ratio of the topology.

The linear regulator, which is shown in Fig. 1.6(a), consists of a PMOS transistor controlled by an operational amplifier (Opamp). The Opamp measures the difference between the output voltage and the reference voltage, and produces an error signal that is applied to the PMOS transistor. The error signal is used to regulate the on-state resistance of the PMOS, thereby controlling the voltage drop of the transistor to ensure the desired voltage at the output node. The capacitor decouples the output node. The equivalent circuit model of the linear regulator therefore is an equivalent output resistance $R_{eq}$ in series between the input supply and the load resistor. The ideal efficiency of the linear regulator is

$$\eta_{\text{ideal,lin}} = \frac{V_{\text{out}}}{V_{\text{in}}}. \quad (1.2)$$

Hence, the efficiency drops linearly with the output voltage owing to the resistive regulation.

Linear regulators benefit from a high power density since their power stages require neither capacitors nor inductors, which typically take up most of the converter area. They are easy to integrate monolithically using components readily available in the semiconductor technology. The main design challenge lies in the design of the Opamp, which must be stable and provide a high bandwidth for fast regulation. Linear regulators can therefore be used for fast per-core DVFS and are the most prominent topology for 2D integration [58]. However, the efficiency in (1.2) is limited by the voltage conversion ratio, which makes linear regulators impractical for higher-than-nominal input voltages. Hence, not all benefits of the targeted microprocessor power delivery in Fig. 1.2 can be exploited. For this reason, linear regulators are not treated further in this thesis.

1.3.2 Buck Converters

The buck converter, which is shown in Fig. 1.6(b), consists of two transistors operated as switches and an output filter consisting of an inductor and a capacitor. The switches are operated at a high switching frequency, and the duty cycle $D$ determines the fraction of the switching period that the input-referred switch conducts; the ground-referred switch thereafter conducts for $1 - D$ of the switching period. The output voltage of the buck converter is a function of the duty cycle following the well-known expression $V_{\text{out}} = DV_{\text{in}}$ [59]. Hence, regulation of the output voltage can be done by controlling the duty cycle. The equivalent circuit model consists of a dc transformer with the input node on the primary side and the output node on the secondary side. The transformer winding ratio is governed by the duty cycle. The ideal efficiency of the buck converter therefore is

$$\eta_{\text{ideal,buck}} = 1. \quad (1.3)$$
Hence, the efficiency of the buck converter is ideally 100% regardless of the voltage conversion ratio.

Buck converters are, compared to linear regulators, more challenging to integrate monolithically since they include inductors which typically take up the majority of the converter area. The main design challenge is to design and manufacture an integrated inductor with low losses. Typically, the design dimensions are limited by the manufacturing process and there are only a few suited magnetic materials for this level of inductor integration. Furthermore, efficient inductors are typically not readily available in common semiconductor technologies. However, the ideal efficiency of 100% for any output voltage from (1.3) makes buck converters attractive for all levels of integration. Integrated inductors for on-chip buck converters are treated further in Chapter 2.

1.3.3 Switched Capacitor Converters

The 2:1 switched capacitor (SC) converter shown in Fig. 1.6(c) consists of four transistors operated as switches and one capacitor. For SC converters, the topology determines the voltage conversion ratio, and other voltage conversion ratios can be implemented by increasing the number of transistors and capacitors [60, 61]. The switches are operated such that the capacitor is in series between the input and the output in the charging state and in parallel with the output in the discharging state. Typically, the switches are operated at a high switching frequency with 50% duty cycle.\(^1\) The equivalent circuit model of the SC converter consists of a dc transformer, where the winding ratio is governed by the topology specific voltage conversion ratio \(M\), and an equivalent output resistance \(R_{eq}\) in series between the secondary side of the transformer and the load resistor. \(R_{eq}\), which depends on the switching frequency, models the losses associated with charging and discharging the capacitor through the parasitic on-state resistances of the switches. The ideal efficiency of the SC converter therefore is

\[
\eta_{\text{ideal,sc}} = \frac{V_{\text{out}}}{MV_{\text{in}}}. \tag{1.4}
\]

\(^1\)Operating the SC converter at duty cycles below 50% is possible, and the effect on the equivalent output resistance \(R_{eq}\) is similar to frequency modulation [62].
Hence, the efficiency of the SC converter is 100% when $V_{\text{out}} = MV_{\text{in}}$, but it drops linearly with decreasing output voltage for $V_{\text{out}} < MV_{\text{in}}$. The linear decrease in efficiency at decreasing output voltage is similar to that of the linear regulator in Section 1.3.1, except for the voltage conversion ratio $M$ defined by the converter topology.
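The ideal efficiencies in (1.2), (1.3), and (1.4) can be compared side by side with a small sketch. It assumes the 1.8 V input and the 0.7 V to 1.1 V output range from the converter specifications, and it assumes that $M$ is expressed as the ideal ratio $V_{\text{out}}/V_{\text{in}}$, i.e. $M = 1/2$ for the 2:1 topology and $M = 2/3$ for a 3:2 topology. The function names are illustrative only.

```python
def eta_linear(v_out, v_in):
    """Ideal linear regulator efficiency following (1.2)."""
    return v_out / v_in


def eta_buck(v_out, v_in):
    """Ideal buck converter efficiency following (1.3)."""
    return 1.0


def eta_sc(v_out, v_in, m):
    """Ideal SC converter efficiency following (1.4); valid for V_out <= M * V_in."""
    if v_out > m * v_in:
        raise ValueError("output voltage exceeds M * V_in for this topology")
    return v_out / (m * v_in)


V_IN = 1.8                  # input voltage in volt
RATIOS = (1 / 2, 2 / 3)     # assumed conversion ratios: 2:1 and 3:2

for v_out in (0.7, 0.8, 0.9, 1.0, 1.1):
    # The smallest ratio that still reaches v_out gives the highest ideal efficiency.
    m = min(r for r in RATIOS if v_out <= r * V_IN)
    print(f"V_out = {v_out:.1f} V: linear {eta_linear(v_out, V_IN):.1%}, "
          f"buck {eta_buck(v_out, V_IN):.1%}, SC (M = {m:.3f}) {eta_sc(v_out, V_IN, m):.1%}")
```

Under these assumptions, the SC converter reaches its ideal 100% efficiency at 0.9 V with $M = 1/2$, whereas the linear regulator reaches only 50% at the same operating point.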

The main advantage of the SC converter is that no inductors are required. Therefore, SC converters can be implemented using transistors and capacitors that are readily available in most semiconductor technologies [63, 64]. Although the efficiency in (1.4) drops linearly with the output voltage as for the linear regulator, it does so from the voltage resulting from the conversion ratio, which makes SC converters compatible with higher-than-nominal input voltages. SC converters are typically considered well suited for 2D and 3D integration. However, 2.5D integration of SC converters might become challenging and complex to implement due to the increased number of components, especially for conversion ratios requiring more switches and capacitors. Chapters 3, 4, and 5 treat the design and implementation of 2D SC converters for granular microprocessor power delivery and per-core regulation.

1.4 State of the Art – Year 2010 Landscape

Having motivated OCVRs for granular microprocessor power delivery and per-core regulation, this section gives an overview of the state of the art of integrated power converters.

The state of the art overview as it looked at the end of 2010 (when this project began) is shown in Fig. 1.7. The label next to each point refers to the published OCVR design as given in the references. Furthermore, each point is distinguished by the level of integration and converter topology following the definitions given in Sections 1.2 and 1.3, respectively. In the state of the art overview, designs which may be intended for 2D integration but are presented with an external load are here labeled as 3D integrated converters.

The quoted efficiency versus power density is shown in Fig. 1.7(a). All buck converters, except for design [15], have power densities in the $0.005 \, \text{W/mm}^2 - 0.210 \, \text{W/mm}^2$ range at efficiencies between $52\% - 87\%$. 


Figure 1.7: State of the art overview of OCVR designs published up until the end of 2010: (a) quoted efficiency versus power density; (b) quoted efficiency versus maximum output power. In the conclusions in Chapter 6, Fig. 6.1 shows the updated overview, which includes OCVR designs published up until 2015.

Design [15] shows a significantly improved power density of 2.8 W/mm$^2$ at a peak efficiency of 76%. All SC converters, except for design [13], have power densities in the 0.001–0.005 W/mm$^2$ range at efficiencies between 62%–76%. Design [13] shows a significantly improved power density of 2.1 W/mm$^2$ at a peak efficiency of 90%. As observed, there is a clear separation between low power density SC converters and mid power density buck converters. However, designs [15, 13], both published in 2010, show promising efficiency and power density for both SC and buck converters.

The quoted efficiency versus maximum output power is shown in Fig. 1.7(b). The majority of the buck converter designs have more than 100 mW maximum output power, whereas all SC converters have below 10 mW maximum output power. Indeed, when examining the literature, SC converters are often considered suited for low power applications whereas buck converters are often considered suited for high power applications. As observed, these considerations seem appropriate from this state of the art overview.

In Section 6.1, the state of the art overview is updated to include designs published up until the beginning of 2015. The updated overview includes the SC converter designs presented in this thesis, revealing that these designs are among the best performing integrated power converters presented to date. Furthermore, the updated overview severely challenges the conclusion drawn from Fig. 1.7(b) about SC converters being limited to low power converter realizations.

As a closing remark regarding terminology describing the levels of integration, the terms ‘power supply in package’ (PSiP) and ‘power supply on chip’ (PwrSoC) [65, 38, 66, 67] are sometimes used in the literature. Using the definitions from Section 1.2, PSiP corresponds to 3D integration with all components integrated within the package but external to the load and PwrSoC corresponds to 2D integration with all components integrated with the load. Sometimes PwrSoC is considered 2.5D integration as well. Furthermore, the terms ‘on-chip’ [27, 28, 30, 31, 29] and ‘monolithic’ [26, 34, 36, 37, 11] are used for passive components integrated on the chip die for 3D and 2D integration.
1.5 Converter Specifications

Based on the motivation for granular microprocessor power delivery with per-core DVFS given in Section 1.1, the following defines and discusses the main OCVR design specifications targeted in this thesis:

- **High efficiency**
  The 85% – 90% efficiency range is targeted. Implementing an OCVR with low efficiency would severely limit or even negate the power and energy efficiency gains achieved by reduced IR losses in the PDN and reduced overhead voltages using per-core DVFS.

- **High power density**
  For integrated OCVRs, the power density of the converter must be high. Therefore, power densities above 1 W/mm² are targeted.

- **High output power**
  From the 2010 state of the art overview in Fig. 1.7, only design [15] delivers more than 1 W of output power. Although the SC converter in [13] delivers below 10 mW of output power, its high power density makes higher output power SC converter designs feasible. It is therefore a goal to demonstrate above 1 W maximum output power.

- **Wide output voltage range**
  To enable per-core DVFS as shown in Fig. 1.4(b), the output voltage range is 0.7 V – 1.1 V for a fixed 1.8 V input supply.

- **Fast transient response**
  Fast transient response times are required to enable the per-core DVFS that supports various workload supply voltage requirements of individual microprocessor cores as shown in Fig. 1.3(b). Therefore, a response time of the OCVR of 1 ns is targeted.

- **Reduced output voltage overhead**
  As shown in Fig. 1.3(c), reducing the output voltage overhead while still maintaining a certain $V_{\text{out, min}}$ at all times is an additional improvement to per-core DVFS. Therefore, reducing the voltage overhead serves as a design goal.

- **2D integration**
  As shown in Fig. 1.5(b), 2D integration requires the OCVR to be implemented onto the microprocessor die, i.e. to be designed using the same semiconductor technology as the microprocessor. The choice of 2D integration for this project is not equivalent to ruling out 3D and 2.5D integration, each of which according to Fig. 1.7 shows promising performance. However, 2D integration is the highest level of integration possible and it therefore resonates well with the ambitious goals of this thesis.

Table 1.1: On-chip voltage regulator specifications.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Symbol</th>
<th>Specification</th>
</tr>
</thead>
<tbody>
<tr>
<td>Efficiency</td>
<td>$\eta$</td>
<td>$&gt; 85\%$</td>
</tr>
<tr>
<td>Power density</td>
<td>$\rho$</td>
<td>$&gt; 1 \text{ W/mm}^2$</td>
</tr>
<tr>
<td>Output power</td>
<td>$P_{\text{out}}$</td>
<td>$&gt; 1 \text{ W}$</td>
</tr>
<tr>
<td>Input voltage</td>
<td>$V_{\text{in}}$</td>
<td>$1.8 \text{ V}$</td>
</tr>
<tr>
<td>Output voltage</td>
<td>$V_{\text{out}}$</td>
<td>$0.7 \text{ V} - 1.1 \text{ V}$</td>
</tr>
<tr>
<td>Transient response</td>
<td>$-$</td>
<td>$&lt; 1 \text{ ns}$</td>
</tr>
<tr>
<td>Voltage overhead</td>
<td>$-$</td>
<td>Minimal</td>
</tr>
<tr>
<td>Level of integration</td>
<td>$-$</td>
<td>2D</td>
</tr>
</tbody>
</table>

The specifications discussed above are summarized in Tab. 1.1. Note that the choice of converter topology is not specified, since any topology that meets the above specifications should be considered a viable OCVR candidate for granular microprocessor power delivery with per-core regulation. The switching frequency is likewise not specified, as it is considered a design parameter to be determined based on modeling and simulation optimization results.

1.5.1 Available Semiconductor Technology

As mentioned in Section 1.2, the on-chip components for 2D and 2.5D integration must be designed using the same semiconductor technology as the microprocessor. Hence, the semiconductor technology and its process options play an important role in choosing the most suited converter topology to implement the OCVR.

The semiconductor technology available in this thesis is a 32 nm SOI CMOS technology. The maximum voltage for any pair of terminals of the fast thin-oxide transistors should not exceed 1.2 V in this technology. If this voltage is exceeded, an overvoltage situation occurs, which degrades the performance and lifetime of the transistor or, in the worst case, destroys it.

This semiconductor technology features the deep trench capacitor, which is a front end of the line (FEOL) device that has higher capacitance density compared to other on-chip capacitor technologies [68]. The cross section of the deep trench capacitor is shown in Fig. 1.8. The buried oxide (BOX) layer provides the isolation for silicon on insulator (SOI), which decouples the bulk terminal of transistors and reduces the parasitic capacitances, thereby improving the switching times of digital circuits. The deep trench capacitor is constructed from round trenches that go through the BOX layer, which allow for a large plate area between the top and bottom electrodes. Combined with a high-$\kappa$ dielectric material, the deep trench capacitor provides a much larger capacitance density compared to planar, MIM, MOM, or MOSFET based on-chip capacitors. Although only the cross section of the deep trench is shown in Fig. 1.8, the trenches form a two-dimensional array to make up the whole capacitor. The trenches shown are not to scale, since the pitch to length ratio of the trenches is more than 10,000. Each trench provides a capacitance \( C_t \) between the top electrode in the trench and the bottom electrode in the surrounding substrate. Hence, the total capacitance becomes

\[
C = N_t C_t,
\]

where \( N_t \) denotes the number of trenches. Furthermore, the deep trench capacitor includes a parasitic bottom plate capacitor \( C_{bp} \), which stems from the junction capacitance between the P substrate and the \( N^+ \) doping region. The bottom plate capacitor is reduced compared to other on-chip capacitors since the \( N^+ \) doping regions between the trenches are electrically shorted. This results in a smaller plate area for the bottom plate capacitor.

Air core inductors can be formed using the top metal layers of the metal stack. However, this semiconductor technology does not feature additional post-processing steps, such as magnetic material deposition or thickened copper layers, for more flexibility in the inductor design.
1.6 Thesis Outline

Having introduced and motivated OCVRs for future granular microprocessor power delivery with per-core regulation, the remainder of this thesis treats the design, implementation, and experimental evaluation of OCVRs. The thesis is structured according to the learnings achieved as the project progressed. Hence, instead of directly presenting the final results and conclusions, the key learnings of each design step are presented and discussed. It is therefore the hope that the reader can follow the logic of the design decisions being made along the way, ultimately leading to the main results of this thesis.

Chapter 2 presents Pareto optimization procedures of inductors for on-chip buck converters. First, on-chip air core inductors using the top metal layers of the 32 nm semiconductor technology are considered. Results from the Pareto optimization reveal that the limited winding thickness of the top metal layers in the metal stack yields limited efficiencies at relatively low power densities and unattractively high switching frequencies. Second, microfabricated inductors, which are manufactured using additional post-processing steps at the BEOL, are considered. Pareto optimizations show that both microfabricated air core inductors and cored racetrack inductors can be designed for high efficiency at suitable power densities and switching frequencies. However, since the required post-processing steps are not readily available in the technology, on-chip buck converters are not considered further in this thesis.

Chapter 3 presents the first design considerations and experimental verification of on-chip SC converters. A state space model framework is developed to allow for a Pareto optimization of SC converters. The state space model takes the effect of the parasitic bottom plate capacitor on the steady state converter operation and efficiency into account. Using the high-density deep trench capacitors available in the 32 nm technology, the Pareto optimization results predict very attractive efficiency and power density designs. Measurement results of the first SC converter design achieve 86% maximum efficiency at 4.6 W/mm² power density for an unregulated 2:1 SC converter. These attractive results motivate further investigation of on-chip SC converters.

Chapter 4 presents a complete on-chip switched capacitor voltage regulator (SCVR). A 2:1 and 3:2 reconfigurable SC converter supports the 0.7 V – 1.1 V output voltage range required for DVFS from a 1.8 V input supply. A Pareto optimization using the state space model framework on the reconfigurable SC converter allows for selection of a suitable converter design. Using interleaving and a single bound hysteretic control scheme with a digital clock interleaver clocked at 4 GHz, the second SC converter is designed for low output voltage ripple and sub-nanosecond response time to transient load steps. Measurement results of the second design achieve 90% maximum efficiency at 3.7 W/mm² power density. Due to the 16-phase interleaving scheme employed, a peak-peak output voltage ripple of 30 mV is achieved without dedicated output decoupling capacitors. Furthermore, the transient response to a rapid load step of the on-chip programmable load is below 1 ns. Although the sub-nanosecond response time is verified, the output voltage experiences a droop, which is caused by a much larger droop on the input node of the converter.

Chapter 5 presents a novel feedforward control for reconfigurable SC converters. The feedforward control dynamically changes the configuration to a higher voltage conversion ratio when an input voltage droop is detected, thereby mitigating the output voltage droop during the transient event. Measurement results of the third SC converter design verify the feedforward control, and the overhead voltage to maintain $V_{\text{out, min}}$ required by the microprocessor is reduced from 90 mV to 30 mV. Furthermore, the third design delivers 10 W maximum output power at 85% efficiency and 5 W/mm² power density, thereby proving the feasibility of high-power on-chip SCVRs.

Chapter 6 concludes the thesis. The 2010 state of the art overview from Fig. 1.7 is updated with OCVR designs published up until the beginning of 2015. The experimental results achieved in this thesis place themselves among the highest efficiency and highest power density designs presented to date. The 10 W maximum output power busts the general misconception that SC converters are only suited for low-power applications.

1.6.1 List of Publications

This thesis is based on the following conference and transaction publications:


Inductors for On-Chip Buck Converters

BUCK CONVERTERS are typically used in external VRMs for microprocessor power delivery. Therefore, investigating the buck converter for on-chip integration is a natural first choice. The main challenge when integrating buck converters is to design and manufacture efficient on-chip inductors using much less volume than discretely built inductors. Therefore, the focus of this chapter is on the design of on-chip inductors, thereby being able to investigate and evaluate on-chip buck converters as a potential OCVR candidate for granular microprocessor power delivery with per-core regulation.

The basic buck converter and its operating modes are summarized in Section 2.1. In Section 2.2, the air core spiral inductor is investigated. A Pareto optimization procedure is developed for a) on-chip spiral inductors using the design rules of the top metal layers in the 32 nm semiconductor technology and b) post-processed (microfabricated) spiral inductors with relaxed design rule limitations. For a given buck converter specification, the Pareto fronts identify the inductor designs with the highest possible efficiency for a given power density within the design space of the geometrical parameters. In Section 2.3, microfabricated racetrack inductors using magnetic materials are investigated. The microfabricated racetrack inductor is modeled analytically and a Pareto optimization is performed.

This chapter is based on the publications [1] and [2].

2.1 Buck Converter

In the following, design and operation of the buck converter introduced in Section 1.3.2 are summarized. Furthermore, considerations for on-chip buck converter implementations are discussed.

The classical buck converter is depicted in Fig. 2.1. The output voltage is

\[ V_{\text{out}} = D V_{\text{in}}, \tag{2.1} \]

where \( D \) is the duty cycle and \( V_{\text{in}} \) the input voltage. The inductance \( L \) for a given peak-peak inductor current ripple \( \Delta I_{\text{Lpp}} \) is

\[ L = V_{\text{out}} \frac{1 - D}{f_{\text{sw}} \Delta I_{\text{Lpp}}}, \tag{2.2} \]

where \( f_{\text{sw}} \) is the switching frequency.

We define the peak to average ratio \( PAR \) of the inductor current as

\[ PAR = \frac{I_{\text{Lp}}}{I_{\text{out}}} = 1 + \frac{\Delta I_{\text{Lpp}}}{2I_{\text{out}}}, \tag{2.3} \]

where \( I_{\text{Lp}} \) is the peak inductor current and \( I_{\text{out}} \) is the dc output current. It is apparent that for \( 1 < PAR < 2 \), the buck converter operates in the continuous conduction mode with solely positive inductor currents, \( i_L(t) > 0 \), and for \( PAR > 2 \) in the continuous conduction mode with positive and negative inductor currents. Due to the bi-directional synchronous transistor \( Q_2 \), the discontinuous conduction mode, which clamps the inductor current to zero, does not occur. For \( PAR = 2 \), the converter is operated in the boundary conduction mode (BCM) with \( i_L(t) \geq 0 \).

![Figure 2.1: Buck converter with two transistors \( Q_1 \) and \( Q_2 \) controlled by the duty cycle \( D \). The output filter consists of the inductor \( L \) and the output capacitor \( C_{\text{out}} \), which is in parallel with the load.](image)

It follows from (2.2) and (2.3) that the inductance \( L \) is

\[
L = \frac{V_{\text{in}} - V_{\text{out}}}{2f_{\text{sw}}I_{\text{out}}(PAR - 1)} \frac{V_{\text{out}}}{V_{\text{in}}}. \tag{2.4}
\]

Hence for a given inductance and fixed \( V_{\text{in}} \), \( V_{\text{out}} \), and \( I_{\text{out}} \), the specified \( PAR \) can be used to determine the switching frequency of the converter. Alternatively, the switching frequency can be specified and the \( PAR \) can be determined, or the required inductance can be determined when specifying both \( f_{\text{sw}} \) and \( PAR \).
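
The three ways of using (2.4) can be illustrated with a short Python sketch. The function names are hypothetical. The numerical example assumes the post-processed specifications of Tab. 2.2 together with the 2.3 nH inductance listed in Tab. 2.3.

```python
def switching_frequency(l, v_in, v_out, i_out, par):
    """Switching frequency in Hz, obtained by solving (2.4) for f_sw."""
    return (v_in - v_out) * (v_out / v_in) / (2.0 * l * i_out * (par - 1.0))


def required_inductance(f_sw, v_in, v_out, i_out, par):
    """Inductance in H according to (2.4)."""
    return (v_in - v_out) * (v_out / v_in) / (2.0 * f_sw * i_out * (par - 1.0))


def resulting_par(l, f_sw, v_in, v_out, i_out):
    """Peak to average ratio, obtained by solving (2.4) for PAR."""
    return 1.0 + (v_in - v_out) * (v_out / v_in) / (2.0 * f_sw * l * i_out)


f_sw = switching_frequency(l=2.3e-9, v_in=1.6, v_out=0.8, i_out=0.5, par=2.0)
print(f"f_sw = {f_sw / 1e6:.1f} MHz")
```

With these inputs the sketch returns approximately 174 MHz, which agrees with the rounded 170 MHz listed in Tab. 2.3.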

The inductor current at the \( k \)'th switching frequency harmonic assuming triangular current waveform is

\[
I_k = \Delta I_{pp} \frac{\sin(D\pi k)}{(\pi k)^2 D(1 - D)}. \tag{2.5}
\]
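
As a minimal sketch, (2.5) can be evaluated numerically as follows. The function name `harmonic_amplitudes` is hypothetical. Note that (2.5) returns signed amplitudes; only their squares enter the loss calculations.

```python
import math


def harmonic_amplitudes(delta_i_pp, duty, k_max):
    """Amplitudes I_k of a triangular inductor current for k = 1..k_max, see (2.5)."""
    return [
        delta_i_pp * math.sin(duty * math.pi * k)
        / ((math.pi * k) ** 2 * duty * (1.0 - duty))
        for k in range(1, k_max + 1)
    ]


# Symmetric triangle (D = 0.5) with 1 A peak-peak ripple:
# the even harmonics vanish and the fundamental equals 4 / pi^2.
amps = harmonic_amplitudes(delta_i_pp=1.0, duty=0.5, k_max=3)
print([round(a, 4) for a in amps])
```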

### 2.1.1 On-Chip Implementation Considerations

The on-chip inductance that can be achieved is typically low compared to discrete inductors [69, 70, 71]. Using (2.4), when fixing the operating conditions, i.e. fixing \( V_{\text{in}} \), \( V_{\text{out}} \), \( I_{\text{out}} \), and \( PAR \), a low inductance can be accommodated by operating at a high switching frequency. All buck converters from the 2010 state of the art overview in Fig. 1.7 are operated at switching frequencies between 1 MHz and 200 MHz. The on-chip inductor has many design freedoms, e.g. geometry, size, operating modes, magnetic materials, etc. It is therefore the buck converter component with the most optimization potential, and it is the main focus of the remainder of this chapter.

Before conducting a deeper investigation of on-chip inductors, the following list highlights additional on-chip buck converter design considerations that are not treated further in this thesis:

- Multiphase buck converters, which can be implemented to reduce the output voltage ripple. The required output capacitance is simply determined from voltage ripple requirements, and it may be reduced using a multiphase design [9, 42, 72]. The deep trench capacitors available in the 32 nm technology provide a high capacitance density that is well suited for on-chip decoupling.

- Coupled inductors for multiphase buck converters can also be considered. Coupled inductors lead to a reduced inductor current ripple, which can be used to a) reduce the switching frequency, b) improve the transient response, or c) a combination of the two [73, 33, 24].

- To design an on-chip buck converter in the 32 nm semiconductor technology, the transistors $Q_1$ and $Q_2$ from Fig. 2.1 need to block the full 1.8 V input voltage. This input voltage is higher than the maximum allowable voltage $V_{\text{max}} = 1.2$ V for each device, so two transistors must be stacked such that they share the input voltage among them. Alternatively, multilevel buck converters can be considered. For a multilevel buck converter, each transistor only blocks the output voltage, but more capacitors for the intermediate voltage levels are required [52, 43].

- The optimum transistor size is usually determined as a trade-off between on-state resistance and gate charge, representing a certain distribution of conduction and switching losses, respectively, that depend on the chosen operating mode of the converter [9].

- Control schemes that are suited for on-chip buck converter implementations and that provide fast output voltage and current regulation capabilities.

### 2.2 Air Core Spiral Inductors

The air core spiral inductor is one of the simplest inductor geometries, and it is therefore a good starting point for an investigation of on-chip inductors. Requiring no magnetic materials, the air core spiral inductor can be directly implemented using the top metal layers of the metal stack of the 32 nm semiconductor technology. If feasible, additional post-processing steps could be adapted in the technology to

Figure 2.2: Air core spiral inductor intended for on-chip buck converters. Either an on-chip implementation using design rules of the 32 nm technology or an implementation using additional post-processing steps are considered. The windings are modeled as concentric circles and the geometrical parameters shown define the geometry of the inductor.

From [75], the inductance of an air core spiral inductor is approximated as

\[
L_{\text{spiral}} \approx \frac{\mu_0 N^2 d_{\text{avg}}}{2} \left[ \ln \left( \frac{c_1}{k} \right) + c_2 k^2 \right],
\]

(2.6)
Table 2.1: Geometrical parameters for the air core spiral inductor.

<table>
<thead>
<tr>
<th>Geometrical parameter</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>$N$</td>
<td>Number of windings</td>
</tr>
<tr>
<td>$d_i$</td>
<td>Inner diameter</td>
</tr>
<tr>
<td>$d_o$</td>
<td>Outer diameter</td>
</tr>
<tr>
<td>$t_w$</td>
<td>Winding width</td>
</tr>
<tr>
<td>$t_h$</td>
<td>Winding height</td>
</tr>
<tr>
<td>$t_s$</td>
<td>Space between windings</td>
</tr>
</tbody>
</table>

where

$$d_{\text{avg}} = \frac{d_o' + d_i'}{2}, \quad k = \frac{d_o' - d_i'}{d_o' + d_i'}, \quad c_1 = 2.46, \quad c_2 = 0.20,$$

$$d_i' = \max(0, \, d_i - \frac{t_w + t_s}{2}), \quad d_o' = d_o + \frac{t_w + t_s}{2}.$$

The dc resistance of the copper windings is estimated as

$$R_{\text{dc,spiral}} = \sum_{j=1}^{N} \frac{2\pi \rho}{t_h \ln \left( \frac{r_{o,j}}{r_{i,j}} \right)},$$

(2.7)

where

$$r_{i,j} = \frac{1}{2}d_i + (j-1)(t_w + t_s), \quad r_{o,j} = r_{i,j} + t_w,$$

and where $\rho = 0.0172 \cdot 10^{-6} \Omega \text{m}$ is the resistivity of copper. The ac resistance of the spiral inductor at the switching frequency and its harmonics is obtained from finite element method (FEM) simulations.

The area of the spiral inductor is

$$A_{L,\text{spiral}} = \pi \left[ \frac{1}{2}d_i + N(t_s + t_w) - t_s \right]^2.$$  

(2.8)
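
The spiral inductor model of (2.6) to (2.8) can be sketched in Python as shown below. The function names are hypothetical. Two assumptions are made: the fill ratio \( k \) uses the adjusted diameters \( d_o' \) and \( d_i' \) in both numerator and denominator, and the outer diameter follows from the concentric-circle geometry as \( d_o = d_i + 2[N t_w + (N-1) t_s] \), which is consistent with the bracket in (2.8). All lengths are in meters.

```python
import math

MU_0 = 4e-7 * math.pi   # permeability of free space in H/m
RHO_CU = 0.0172e-6      # resistivity of copper in Ohm m


def outer_diameter(n, d_i, t_w, t_s):
    """Outer diameter implied by N concentric windings (assumed relation)."""
    return d_i + 2.0 * (n * t_w + (n - 1) * t_s)


def spiral_inductance(n, d_i, d_o, t_w, t_s):
    """Approximate inductance of the air core spiral, see (2.6)."""
    half_pitch = (t_w + t_s) / 2.0
    d_i_adj = max(0.0, d_i - half_pitch)
    d_o_adj = d_o + half_pitch
    d_avg = (d_o_adj + d_i_adj) / 2.0
    k = (d_o_adj - d_i_adj) / (d_o_adj + d_i_adj)
    return MU_0 * n ** 2 * d_avg / 2.0 * (math.log(2.46 / k) + 0.20 * k ** 2)


def spiral_dc_resistance(n, d_i, t_w, t_h, t_s, rho=RHO_CU):
    """DC resistance as the sum over N circular windings, see (2.7)."""
    total = 0.0
    for j in range(1, n + 1):
        r_i = d_i / 2.0 + (j - 1) * (t_w + t_s)
        r_o = r_i + t_w
        total += 2.0 * math.pi * rho / (t_h * math.log(r_o / r_i))
    return total


def spiral_area(n, d_i, t_w, t_s):
    """Footprint of the spiral inductor, see (2.8)."""
    return math.pi * (d_i / 2.0 + n * (t_s + t_w) - t_s) ** 2


# Post-processed design of Tab. 2.3 (N = 3, d_i = 120 um, t_w = 46 um,
# t_h = 28 um, t_s = 28 um).
n, d_i, t_w, t_h, t_s = 3, 120e-6, 46e-6, 28e-6, 28e-6
d_o = outer_diameter(n, d_i, t_w, t_s)
l_spiral = spiral_inductance(n, d_i, d_o, t_w, t_s)
r_dc = spiral_dc_resistance(n, d_i, t_w, t_h, t_s)
area = spiral_area(n, d_i, t_w, t_s)
print(f"d_o = {d_o * 1e6:.0f} um, L = {l_spiral * 1e9:.2f} nH, "
      f"R_dc = {r_dc * 1e3:.1f} mOhm, A = {area * 1e6:.3f} mm^2")
```

For this design the analytical sketch gives values close to the FEM-based entries of Tab. 2.3 (508 µm, 2.3 nH, 40 mΩ), although the thesis extracts inductance and resistance from FEM simulations rather than from these closed-form expressions.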

### 2.2.1 Pareto Optimization Procedure

Using the above model, a Pareto optimization is performed to identify the air core spiral inductor designs that achieve the highest efficiency at a specific power density. Initially, the design space is defined by selecting a minimum and maximum value of each geometrical parameter from Tab. 2.1. Additionally, an incremental step size for each geometrical parameter can be set. For the air core spiral inductor, a generic FEM simulation that takes the geometrical parameters as inputs is set up in order to ensure accurate estimations of the dc and ac model components. For the air core spiral inductors considered here, the computational efforts required by the FEM simulator to evaluate all inductors in the design space are manageable.
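
Defining the design space from minimum values, maximum values, and step sizes can be sketched as a Cartesian product of parameter grids. The helper names are hypothetical, the limits follow the post-processed column of Tab. 2.2, and the step sizes are assumed for illustration because they are not stated in the text.

```python
import itertools


def axis(lo, hi, step):
    """Grid values lo, lo + step, ... up to and including hi (integers)."""
    return list(range(lo, hi + 1, step))


def build_design_space(ranges):
    """Cartesian product of all parameter axes; one dict per design x_i."""
    names = list(ranges)
    grids = [axis(*ranges[name]) for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*grids)]


# (min, max, step); all lengths in micrometers. Step sizes are assumed.
ranges = {
    "N": (1, 10, 1),
    "d_i": (40, 200, 40),
    "t_w": (10, 100, 30),
    "t_s": (10, 100, 30),
    "t_h": (10, 100, 30),
}
X = build_design_space(ranges)
print(len(X))  # number of designs m in the design space
```

Integer micrometer values are used here to avoid floating point drift when stepping through the grid.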

The Pareto optimization procedure is illustrated by the flowchart shown in Fig. 2.3. The inputs of the procedure are the converter operating conditions: \( V_{\text{in}}, V_{\text{out}}, I_{\text{out}}, \) and \( PAR \) from (2.3). The first set \( x_1 \) of geometrical parameters from the design space \( X \) is loaded into the generic air core spiral inductor FEM simulation and a dc simulation is run to extract the dc inductance \( L_{\text{dc},i} \) and the dc resistance \( R_{\text{dc},i} \). The inductor area \( A_{L,i} \) is also determined using (2.8). The switching frequency \( f_{\text{sw},i} \) is calculated based on \( L_{\text{dc},i} \) and the converter operating conditions using (2.4). Another FEM simulation is run at the switching frequency and each of the \( k \) harmonics considered, and an ac resistance \( R_{\text{ac},ik} \) for each harmonic is extracted. From the operating conditions, the amplitude of the triangular inductor current \( I_k \) at the \( k \)'th harmonic can be determined using (2.5). The total inductor power loss \( P_{\text{loss},i} \) can then be estimated as

\[
P_{\text{loss},i} = R_{\text{dc},i}I_{\text{out}}^2 + \sum_{k} R_{\text{ac},ik} \frac{I_k^2}{2}, \tag{2.9}
\]

Finally, the efficiency \( \eta_i \) and power density \( \rho_i \) for the \( i \)'th geometrical parameter set are calculated as

\[
\eta_i = \frac{P_{\text{out}}}{P_{\text{out}} + P_{\text{loss},i}}, \tag{2.10}
\]
\[
\rho_i = \frac{P_{\text{out}}}{A_{L,i}}. \tag{2.11}
\]
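
A minimal Python sketch of (2.9) to (2.11) is given below. The function names are hypothetical, and the numbers in the example are illustrative values, not simulation results. The output power is taken as \( P_{\text{out}} = V_{\text{out}} I_{\text{out}} \).

```python
def inductor_loss(r_dc, r_ac, i_out, i_k):
    """Total inductor loss from dc and harmonic ac contributions, see (2.9).

    r_ac and i_k are sequences holding the ac resistance and the current
    amplitude at each considered harmonic.
    """
    return r_dc * i_out ** 2 + sum(r * i ** 2 / 2.0 for r, i in zip(r_ac, i_k))


def efficiency(p_out, p_loss):
    """Inductor efficiency, see (2.10)."""
    return p_out / (p_out + p_loss)


def power_density(p_out, area):
    """Power density with respect to the inductor area, see (2.11)."""
    return p_out / area


# Illustrative example with a single harmonic.
p_out = 0.8 * 0.5
p_loss = inductor_loss(r_dc=0.04, r_ac=[0.132], i_out=0.5, i_k=[0.4])
eta = efficiency(p_out, p_loss)
print(f"P_loss = {p_loss * 1e3:.2f} mW, eta = {eta * 100:.1f} %")
```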

The entire inductor optimization procedure is complete when all designs in the design space have been processed. Thereafter, the highest efficiency inductor given a power density is found by searching the data,
Air core spiral inductor Pareto optimization procedure

Set operating conditions of the buck converter
\[ O = \{ V_{in}, V_{out}, I_{out}, PAR \} \]

Define design space \( X = \{ x_1, x_2, ..., x_m \} \), where
\[ x_i = [ N_i, d_{i,i}, t_{w,i}, t_{h,i}, t_{s,i}, c_{i,i} ]^T \]

Load \( x_i \) into generic air core spiral inductor FEM simulation

Run dc simulation
\[ Y_{dc,i} = \{ R_{dc,i}, L_{dc,i}, A_{L,i} \} \]

Determine \( f_{sw,i} \) from \( L_{dc,i} \) and \( O \)

Run ac simulations at \( f_{sw,i}, 2f_{sw,i}, \ldots, nf_{sw,i} \)
\[ Y_{ac,i} = \{ R_{ac,i1}, R_{ac,i2}, \ldots, R_{ac,in} \} \]

Determine \( \eta_i \) and \( \rho_i \) from \( O, Y_{dc,i} \), and \( Y_{ac,i} \)

Is \( i = m? \)

No \( \rightarrow i = i + 1 \)

Yes \( \rightarrow \) Generate Pareto front

Figure 2.3: Flowchart of the air core spiral inductor Pareto optimization procedure. The procedure uses a generic air core spiral inductor FEM simulation for accurate dc and ac component extractions.
Table 2.2: Buck converter specifications and design space for the post-processed and on-chip air core spiral inductor case study.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Post-processed</th>
<th>On-chip</th>
</tr>
</thead>
<tbody>
<tr>
<td>$V_{in}$</td>
<td>1.6 V</td>
<td>1.6 V</td>
</tr>
<tr>
<td>$V_{out}$</td>
<td>0.8 V</td>
<td>0.8 V</td>
</tr>
<tr>
<td>$I_{out}$</td>
<td>500 mA</td>
<td>50 mA</td>
</tr>
<tr>
<td>$PAR$</td>
<td>2 (BCM)</td>
<td>2 (BCM)</td>
</tr>
<tr>
<td>$N$</td>
<td>1\ldots10</td>
<td>1\ldots35</td>
</tr>
<tr>
<td>$d_i$</td>
<td>40 \mu m \ldots 200 \mu m</td>
<td>20 \mu m \ldots 400 \mu m</td>
</tr>
<tr>
<td>$t_w$</td>
<td>10 \mu m \ldots 100 \mu m</td>
<td>3 \mu m \ldots 50 \mu m</td>
</tr>
<tr>
<td>$t_s$</td>
<td>10 \mu m \ldots 100 \mu m</td>
<td>1.8 \mu m \ldots 5 \mu m</td>
</tr>
<tr>
<td>$t_h$</td>
<td>10 \mu m \ldots 100 \mu m</td>
<td>3 \mu m</td>
</tr>
</tbody>
</table>

and the Pareto front is generated. The geometrical parameters for the inductors forming the Pareto front can then be identified for practical realizations and experimental verification.
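
Searching the evaluated designs for the Pareto front can be sketched as follows. This is one possible implementation under the assumption that each design is reduced to a pair (power density, efficiency); the function name `pareto_front` is hypothetical.

```python
def pareto_front(designs):
    """Return the non-dominated (rho, eta) pairs, sorted by increasing rho.

    A design is kept if no other design offers both a power density that is
    at least as high and an efficiency that is at least as high.
    """
    # Sweep from the highest power density downwards and keep every design
    # that improves on the best efficiency seen so far.
    ordered = sorted(designs, key=lambda d: (-d[0], -d[1]))
    front = []
    best_eta = float("-inf")
    for rho, eta in ordered:
        if eta > best_eta:
            front.append((rho, eta))
            best_eta = eta
    front.reverse()
    return front


designs = [(1.0, 0.95), (2.0, 0.90), (1.5, 0.85), (0.5, 0.96), (2.0, 0.80)]
print(pareto_front(designs))
```

A switching frequency limit, as used for the shaded domains in the case study, could be applied by filtering the list of designs before calling the function.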

Thermal limitations are not considered in the Pareto optimization procedure from Fig. 2.3. As will be discussed in Section 3.2.4, a simple thermal model with a constant temperature of 85 °C is considered, since the die temperature is dictated by the microprocessor core and not the on-chip power converter. Adding thermal properties to the Pareto optimization is expected to limit the maximum achievable power density somewhat.

### 2.2.2 Case Study

A case study of the air core spiral inductor Pareto optimization procedure presented above is carried out with the buck converter specifications and geometrical parameter design space listed in Table 2.2. For the on-chip inductor, the geometrical parameter values are restricted to the design rules of the 32 nm semiconductor technology. Hence, the winding height $t_h = 3 \mu m$ for the on-chip inductor is a fixed value that cannot be altered. For the post-processed implementation, the geometrical parameter values are relaxed compared to the on-chip case. However, the wire width and spacing are restricted to be greater than or equal to the wire thickness for practical reasons. In both cases, the values used are considered to be representative for practical realizations.

The resulting Pareto fronts are shown in Fig. 2.4, where the dark, medium, and light blue domains represent three (arbitrarily chosen) switching frequency limits. The switching frequency limits differ between the two cases because higher limits are necessary for the on-chip spiral inductor to include designs that achieve an acceptable efficiency and power density performance. The post-processed inductors in Fig. 2.4(a) are seen to result in > 90% efficiency designs at > 1 W/mm² power densities and at reasonable switching frequencies. However, the on-chip inductors in Fig. 2.4(b) do not achieve comparable efficiencies. Only for very high switching frequencies does the on-chip implementation achieve acceptable efficiencies. Such high switching frequencies are typically undesirable due to increased transistor switching losses, which are not considered in this case study.

As an example, two optimized inductor designs are selected based on a predefined efficiency of $\eta = 95\%$ for the post-processed inductor and $\eta = 90\%$ for the on-chip inductor, since the on-chip inductor does not reach 95% efficiency for this design space. These two inductor designs are marked with circles in Fig. 2.4, and the parameters are listed in Tab. 2.3. The main reason that the on-chip air core inductor performs worse than the post-processed air core inductor is the $t_h = 3\,\mu m$ winding height limitation that leads to a high dc resistance of the winding, thereby compromising the efficiency. It is therefore concluded that on-chip air core inductors are unsuited for this application due to limitations in the 32 nm semiconductor technology. Instead, the post-processed inductor, which does not have the strict winding height limitation, is seen to result in attractive performance. Accordingly, post-processed inductors are found to be suitable for the application, assuming the additional post-processing manufacturing steps are feasible and reliable.

\textsuperscript{1}The inward bends, which are marked with dotted lines, at high power densities are unexpected since a vertical limit is expected when no thermal considerations are included. However, the inward bend represents the inductor designs resulting from the design space, and the missing vertical limit is due to the limited resolution of the parameter values in the design space.
Figure 2.4: Resulting Pareto fronts for the air core spiral inductor case study. The shadings correspond to maximum switching frequency limits of the buck converter.
Table 2.3: Geometry parameters and simulation results for the selected inductors found using the air core inductor optimization procedure.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Post-processed</th>
<th>On-chip</th>
</tr>
</thead>
<tbody>
<tr>
<td>$N$</td>
<td>3</td>
<td>6</td>
</tr>
<tr>
<td>$d_i$</td>
<td>120 $\mu$m</td>
<td>100 $\mu$m</td>
</tr>
<tr>
<td>$d_o$</td>
<td>508 $\mu$m</td>
<td>478 $\mu$m</td>
</tr>
<tr>
<td>$t_w$</td>
<td>46 $\mu$m</td>
<td>30 $\mu$m</td>
</tr>
<tr>
<td>$t_h$</td>
<td>28 $\mu$m</td>
<td>3 $\mu$m</td>
</tr>
<tr>
<td>$t_s$</td>
<td>28 $\mu$m</td>
<td>1.8 $\mu$m</td>
</tr>
<tr>
<td>$L$</td>
<td>2.3 nH</td>
<td>9.1 nH</td>
</tr>
<tr>
<td>$f_{sw}$</td>
<td>170 MHz</td>
<td>410 MHz</td>
</tr>
<tr>
<td>$R_{dc}$</td>
<td>40 m$\Omega$</td>
<td>1.0 $\Omega$</td>
</tr>
<tr>
<td>$R_{ac}$ @ $f_{sw}$</td>
<td>132 m$\Omega$</td>
<td>1.9 $\Omega$</td>
</tr>
<tr>
<td>$\eta$</td>
<td>94.5%</td>
<td>89.6%</td>
</tr>
<tr>
<td>$\rho$</td>
<td>1.97 W/mm$^2$</td>
<td>0.22 W/mm$^2$</td>
</tr>
</tbody>
</table>

2.3 Microfabricated Racetrack Inductors

This section treats microfabricated racetrack inductors having a magnetic core material to boost the inductance over an air core design. The increased inductance can be used to reduce the switching frequency for the same buck converter operating conditions. Only post-processed microfabricated racetrack inductors are considered since the required manufacturing steps to deposit magnetic material are not available in the BEOL of the 32 nm SOI CMOS technology. Still, the cored racetrack inductors can, like the air core spiral inductor, be considered for all the levels of integration defined in Section 1.2.

The cross section of the cored racetrack inductor is depicted in Fig. 2.5, and the geometrical parameters that define the design are listed in Tab. 2.4. The racetrack inductor is principally an elongated spiral where the straight parts of the winding are suited for thin-film permalloy magnetic material deposition with little anisotropic effects.

**Figure 2.5:** Cross-sectional view of the cored racetrack inductor with coreless half-spiral end turns.

**Table 2.4:** Geometrical parameters for the cored racetrack inductor.

<table>
<thead>
<tr>
<th>Geometrical parameter</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>$N$</td>
<td>Number of turns</td>
</tr>
<tr>
<td>$t_w$</td>
<td>Winding width</td>
</tr>
<tr>
<td>$t_t$</td>
<td>Winding thickness</td>
</tr>
<tr>
<td>$t_s$</td>
<td>Winding spacing</td>
</tr>
<tr>
<td>$c_w$</td>
<td>Core width</td>
</tr>
<tr>
<td>$c_t$</td>
<td>Core thickness</td>
</tr>
<tr>
<td>$c_l$</td>
<td>Core length</td>
</tr>
<tr>
<td>$d_h$</td>
<td>Device height</td>
</tr>
<tr>
<td>$d_w$</td>
<td>Device width</td>
</tr>
<tr>
<td>$d_l$</td>
<td>Device length</td>
</tr>
</tbody>
</table>

The following derives a model of cored racetrack inductors to be used in a Pareto optimization procedure that determines the inductor design with the highest efficiency at a specific power density. As opposed to the air core spiral inductor Pareto optimization procedure discussed above, the cored racetrack model is analytical. The computational effort to perform a large number of FEM simulations that include magnetic materials is impractically high compared to evaluating an analytical model. Hence, the model presented below can aid the design engineer to relatively quickly find the optimal geometrical parameters, and then implement this model in a FEM simulator for more accurate performance evaluations and further parameter fine adjustments.

### 2.3.1 Inductance Estimation

The inductance of the cored racetrack inductor is estimated by partitioning it into three parts: the core, the winding parts covered by the core, and the coreless end turns. From Fig. 2.5, the cross sectional area of the core is \( A_c = c_t c_l \) and the magnetic path length is \( l_m = 2(c_w + d_h) \), hence the inductance contribution from the two cores from Fig. 2.5 can be estimated as

\[
L_{\text{core}} = 2\mu_0\mu_c N^2 \frac{A_c}{l_m} = \frac{\mu_0\mu_c N^2 c_t c_l}{c_w + d_h},
\]  

(2.12)

where \( \mu_0 = 4\pi \cdot 10^{-7} \text{ H/m} \) is the permeability of free space and \( \mu_c \) is the relative permeability of the core material.
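
As a quick numerical check of (2.12), the following Python sketch evaluates the core inductance for the geometry of inductor 1 in Tab. 2.5. The function name is hypothetical, and using the relative permeability \( \mu_c = 280 \) from the case study in Section 2.3.6 for this inductor is an assumption.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space in H/m


def core_inductance(n, c_t, c_l, c_w, d_h, mu_c):
    """Inductance contribution of the two cores, see (2.12). Lengths in m."""
    return MU_0 * mu_c * n ** 2 * c_t * c_l / (c_w + d_h)


# Inductor 1 of Tab. 2.5: N = 5, c_t = 4.2 um, c_l = 2300 um,
# c_w = 750 um, d_h = 170 um; mu_c = 280 is assumed.
l_core = core_inductance(n=5, c_t=4.2e-6, c_l=2300e-6,
                         c_w=750e-6, d_h=170e-6, mu_c=280)
print(f"L_core = {l_core * 1e9:.1f} nH")
```

Under this assumption the result is about 92.4 nH, close to the first \( L_{\text{core}} \) value of 92 nH listed in Tab. 2.5.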

The winding inductance contribution of the cored part is estimated considering the self inductance of each wire and the mutual inductances of the adjacent wires. The self inductance (in \( \mu \text{H} \)) of a straight wire with rectangular cross section using [76] is

\[
L_{t, \text{self}} \approx 0.2c_l \left[ \ln \left( \frac{2c_l}{t_w + t_s} \right) + \frac{1}{2} \right]
\]  

(2.13)

and the total mutual inductance (in \( \mu \text{H} \)) for \( N > 1 \) between the adjacent straight wires in the cored part is

\[
L_{t, \text{mutual}} \approx \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} 0.2c_l \left[ \ln \left( \frac{2c_l}{(j-i)(t_w + t_s)} \right) - 1 + \frac{(j-i)(t_w + t_s)}{c_l} - \left( \frac{(j-i)(t_w + t_s)}{2c_l} \right)^2 \right].
\]

(2.14)

The above two equations neglect tabulated correction terms that were found to have negligible influence on the calculated inductances. The winding inductance of both cored parts therefore is

\[
L_{t, \text{core}} = 2 \left( NL_{t, \text{self}} + L_{t, \text{mutual}} \right).
\]  

(2.15)

The inductance contributions of the two coreless end windings are assumed to equal the inductance of an air core spiral inductor similar to those discussed in Section 2.2:

\[ L_{t,\text{spiral}} \approx \frac{\mu_0 N^2 d_{\text{avg}}}{2} \left[ \ln \left( \frac{2.46}{p} \right) + 0.2p^2 \right], \quad (2.16) \]

where \( d_{\text{avg}} = \frac{(d_o + d_i)}{2} \) is the average diameter, \( p = \frac{(d_o - d_i)}{(d_o + d_i)} \) is the fill ratio, and the empirical constants stem from curve fitting of measured circular planar spiral inductors [75]. The outer diameter \( d_o \) and the inner diameter \( d_i \) are derived using Fig. 2.5.

Combining all inductance contributions, the inductance of the cored racetrack inductor is

\[ L = L_{\text{core}} + L_{t,\text{core}} + L_{t,\text{spiral}}. \quad (2.17) \]

2.3.2 Copper Loss Analysis

Assuming the length of each winding to be \( 2c_l \) plus the circumference of the \( n \)'th winding circle that accounts for the end turns, the dc winding resistance can be estimated as

\[ R_{\text{dc}} = \frac{\rho_t}{t_w t_t} \left( 2Nc_l + 2\pi \sum_{n=1}^{N} r_n \right), \quad (2.18) \]

where \( r_n = \frac{d_o}{2} - n(t_w + t_s) \) is the radius of the \( n \)'th end winding circle and \( \rho_t \) is the resistivity of the winding material.

The Dowell analysis [77] for ac resistance factor calculations utilizes a one-dimensional modeling approach assuming horizontal field direction in the winding window. Although this assumption may yield limited accuracy for microfabricated inductors [78, 79], it is included here to indicate the effect of switching frequency on copper losses. Assuming the effective number of layers for the Dowell analysis is \( h = 0.5 \) as in [78], the ac resistance factor at the \( k \)'th switching frequency harmonic becomes

\[ F_k = \theta_k \left[ \frac{\sinh(2\theta_k) + \sin(2\theta_k)}{\cosh(2\theta_k) - \cos(2\theta_k)} + \frac{2(h^2 - 1)}{3} \frac{\sinh(\theta_k) - \sin(\theta_k)}{\cosh(\theta_k) + \cos(\theta_k)} \right], \quad (2.19) \]
where \( \theta_k = t_t / \delta_k = t_t / \sqrt{\rho_t / (\mu_0 \mu_t \pi k f_{\text{sw}})} \) is the winding thickness to skin depth ratio at the \( k \)'th switching frequency harmonic with \( \mu_t \) being the relative permeability of copper. Hence, the ac winding resistance at the \( k \)'th switching frequency harmonic is

\[
R_{ac,k} \approx F_k R_{dc}. \tag{2.20}
\]
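
The ac resistance factor of (2.19) and the resulting ac resistance of (2.20) can be sketched in Python as follows. The function names are hypothetical, and the copper resistivity in the example is the value given in Section 2.2.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space in H/m


def skin_depth(rho, f, mu_r=1.0):
    """Skin depth in m for resistivity rho (Ohm m) at frequency f (Hz)."""
    return math.sqrt(rho / (math.pi * f * MU_0 * mu_r))


def dowell_factor(theta, h=0.5):
    """AC resistance factor F_k for thickness to skin depth ratio theta, see (2.19)."""
    skin = (math.sinh(2 * theta) + math.sin(2 * theta)) / (
        math.cosh(2 * theta) - math.cos(2 * theta))
    proximity = (2.0 * (h ** 2 - 1.0) / 3.0) * (
        math.sinh(theta) - math.sin(theta)) / (
        math.cosh(theta) + math.cos(theta))
    return theta * (skin + proximity)


def ac_resistance(r_dc, t_t, rho, f_sw, k, h=0.5, mu_r=1.0):
    """AC winding resistance at the k'th harmonic of f_sw, see (2.20)."""
    theta = t_t / skin_depth(rho, k * f_sw, mu_r)
    return dowell_factor(theta, h) * r_dc


# Skin depth of copper at 20 MHz.
delta = skin_depth(rho=1.72e-8, f=20e6)
print(f"skin depth = {delta * 1e6:.2f} um")
```

For windings that are thin compared to the skin depth, the factor approaches 1, so the ac resistance reduces to the dc resistance as expected.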

Assuming triangular inductor current waveform, the total copper losses using (2.5), (2.18), and (2.20) therefore are

\[
P_t = R_{dc} I_{out}^2 + \sum_{k=1}^{k_{max}} R_{ac,k} I_k^2 / 2, \tag{2.21}
\]

where \( k_{max} = 20 \) is the maximum switching frequency harmonic considered.

### 2.3.3 Core Loss Analysis

The following core loss analysis is based on the assumption that the magnetic field, and thereby the flux density, is constant throughout the entire core material. The dc magnetic field and flux density are

\[
H_{dc} = \frac{NI_{out}}{2(c_w + d_h)}, \tag{2.22}
\]

\[
B_{dc} = \mu_0 \mu_c H_{dc} = \frac{\mu_0 \mu_c NI_{out}}{2(c_w + d_h)}. \tag{2.23}
\]

Deriving analytical expressions for core losses is challenging because of the non-linear loss mechanisms in magnetic materials [80, 81]. For that reason, core losses are most commonly determined using the Steinmetz equation [59], an empirical curve fit to measured core loss data,

\[
P_{\text{steinmetz}} = K f_{sw}^\alpha \left( \frac{\Delta B_{pp}}{2} \right)^\beta V_c, \tag{2.24}
\]

where \( K, \alpha, \) and \( \beta \) are the material dependent Steinmetz parameters, \( V_c \) is the core volume, and the peak-peak flux density is \( \Delta B_{pp} = B_{dc} \Delta I_{pp}/I_{dc} \).

If the Steinmetz parameters are unknown, the core losses can be determined by considering hysteresis losses and induced eddy current losses individually. The hysteresis losses, which are due to the hysteretic change in flux density versus magnetic field over a switching period, are approximately proportional to switching frequency and can be expressed as [59]

\[ P_h = K_h f_{sw} \left( \frac{\Delta B_{pp}}{2} \right)^b V_c, \]  

(2.25)

where \( K_h \) and \( b \) are material dependent parameters.

The proximity effect of the generated magnetic field gives rise to induced eddy currents in the core material. To estimate the eddy current losses, the core is considered to be composed of four bus bars of equal thickness: one bus bar for each top and bottom section, and one bus bar for each side wall section. The magnetic field inside each bus bar is assumed to be homogeneous and therefore the expression for proximity losses in a bus bar [82] can be applied to determine the eddy current losses in the two cores:

\[ P_e = 2 \rho c_2 (c_w + d_h) c_t \sum_{k=1}^{k_{\text{max}}} \nu_k \frac{\sinh(\nu_k) - \sin(\nu_k)}{\cosh(\nu_k) + \cos(\nu_k)} H_k^2, \]  

(2.26)

where \( \nu_k = c_t / \delta_{c,k} = c_t / \sqrt{\rho_c / (\mu_0 \mu_c \pi k f_{\text{sw}})} \) is the core thickness to skin depth ratio at the \( k \)'th switching frequency harmonic with \( \rho_c \) being the resistivity of the core material. The amplitude of the magnetic field at the \( k \)'th switching frequency harmonic is \( H_k = H_{dc} I_k / I_{dc} \).

Combining all loss contributions, the efficiency of the cored racetrack inductor becomes

\[ \eta = \frac{P_{out}}{P_{out} + P_{loss}} = \frac{P_{out}}{P_{out} + P_t + P_h + P_e}, \]  

(2.27)

with \( P_{out} = V_{out} I_{out} \), and the power density is

\[ \rho = \frac{P_{out}}{A}, \]  

(2.28)

where \( A = d_ld_w \) is the area of the cored racetrack inductor.
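
As a sanity check of (2.27) and (2.28), the following Python sketch uses the specifications, device dimensions, and the first set of loss values listed for inductor 1 in Tab. 2.5. The function names are hypothetical.

```python
def racetrack_efficiency(v_out, i_out, p_t, p_h, p_e):
    """Efficiency from copper, hysteresis, and eddy current losses, see (2.27)."""
    p_out = v_out * i_out
    return p_out / (p_out + p_t + p_h + p_e)


def racetrack_power_density(v_out, i_out, d_l, d_w):
    """Power density with respect to the device area A = d_l * d_w, see (2.28)."""
    return v_out * i_out / (d_l * d_w)


# Inductor 1: V_out = 1.12 V, I_out = 70 mA, P_t = 1.1 mW, P_h = 1.7 mW,
# P_e = 2.1 mW, d_l = 4130 um, d_w = 1800 um.
eta = racetrack_efficiency(1.12, 0.070, 1.1e-3, 1.7e-3, 2.1e-3)
rho = racetrack_power_density(1.12, 0.070, 4130e-6, 1800e-6)
print(f"eta = {eta * 100:.1f} %, rho = {rho:.0f} W/m^2")
```

The sketch gives an efficiency of about 94.1 % and a power density of roughly 10 500 W/m², i.e. about 10.5 mW/mm².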

2.3.4 Model Verification

To verify the cored racetrack inductor model derived above, the calculated results are compared with both FEM simulations and reported results of three different cored racetrack inductor designs. Manufacturing steps and further details of the microfabricated inductors can be found in [70, 38, 83].

Tab. 2.5 lists the buck converter specifications and geometrical parameters with which each inductor has been designed, and it lists the calculated, simulated, and reported results. As can be seen, the analytical inductance calculations from (2.17) fit the simulated and reported values well. The calculated dc resistances using (2.18) fit the simulated values well; however, the reported dc resistances for inductors 2 and 3 deviate slightly from the calculations and simulations. The ac winding resistance calculations from (2.20) are seen to deviate from the simulated resistances since the 1D field approximation inside the winding window yields limited accuracy. The dc copper loss dominates over the ac copper loss in all three inductors considered, and thus the deviation in $R_{ac}$ has a minor effect on the overall efficiency estimation. Inspection of the 3D FEM simulation results shows that the current density in the core resembles the current density in a bus bar as assumed in (2.26). Thus, the eddy current losses calculated with (2.26) match the simulated and reported values well.

2.3.5 Pareto Optimization Procedure

A Pareto optimization procedure of cored racetrack inductors is developed in a similar fashion as for the air core spiral inductor Pareto optimization discussed in Section 2.2.1. Based on the buck converter operating conditions, the optimization procedure outputs the Pareto front for the cored racetrack inductor using the analytical model derived above.

A flowchart that describes the cored racetrack Pareto optimization procedure is shown in Fig. 2.6. The inputs of the optimization procedure are the buck converter specifications and the design space $X$ containing $m$ cored racetrack inductors. Each set $x_i \in X$, where $i = \{1, 2, \ldots, m\}$, contains the geometrical parameters of the $i$th racetrack inductor design defined in Tab. 2.4. The inductance as well as dc and ac losses are computed using the analytical model derived above. Efficiency and power density for each set are determined using (2.27) and (2.28), respectively, and the Pareto front is plotted when all sets in
### Table 2.5: Comparison of calculated, simulated, and reported results for three microfabricated cored racetrack inductors.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Inductor 1</th>
<th>Inductor 2</th>
<th>Inductor 3</th>
</tr>
</thead>
<tbody>
<tr>
<td>$V_{in}$ [V]</td>
<td>1.8</td>
<td>3.0</td>
<td>3.6</td>
</tr>
<tr>
<td>$V_{out}$ [V]</td>
<td>1.12</td>
<td>1.5</td>
<td>1.2</td>
</tr>
<tr>
<td>$I_{out}$ [mA]</td>
<td>70</td>
<td>125</td>
<td>250</td>
</tr>
<tr>
<td>$P_{out}$ [mW]</td>
<td>78.4</td>
<td>188</td>
<td>300</td>
</tr>
<tr>
<td>$PAR$</td>
<td>1.9</td>
<td>1.6</td>
<td>1.3</td>
</tr>
<tr>
<td>$N$</td>
<td>5</td>
<td>5</td>
<td>7</td>
</tr>
<tr>
<td>$t_w$ [µm]</td>
<td>80</td>
<td>52</td>
<td>50</td>
</tr>
<tr>
<td>$t_t$ [µm]</td>
<td>50</td>
<td>28</td>
<td>50</td>
</tr>
<tr>
<td>$t_s$ [µm]</td>
<td>50</td>
<td>10</td>
<td>50</td>
</tr>
<tr>
<td>$c_w$ [µm]</td>
<td>750</td>
<td>335</td>
<td>850</td>
</tr>
<tr>
<td>$c_t$ [µm]</td>
<td>4.2</td>
<td>4.2</td>
<td>4.2</td>
</tr>
<tr>
<td>$c_l$ [µm]</td>
<td>2300</td>
<td>1400</td>
<td>3850</td>
</tr>
<tr>
<td>$d_h$ [µm]</td>
<td>170</td>
<td>100</td>
<td>100</td>
</tr>
<tr>
<td>$d_w$ [µm]</td>
<td>1800</td>
<td>800</td>
<td>2000</td>
</tr>
<tr>
<td>$d_l$ [µm]</td>
<td>4130</td>
<td>3120</td>
<td>5760</td>
</tr>
<tr>
<td>$\rho$ [mW/mm$^2$]</td>
<td>10.5</td>
<td>78.3</td>
<td>25.6</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Calc. Sim. [70]</th>
<th>Calc. Sim. [38]</th>
<th>Calc. Sim. [83]</th>
</tr>
</thead>
<tbody>
<tr>
<td>$L_{core}$ [nH]</td>
<td>92</td>
<td>97</td>
<td>102</td>
</tr>
<tr>
<td>$L_{t,core}$ [nH]</td>
<td>28</td>
<td>30</td>
<td>19</td>
</tr>
<tr>
<td>$L_{spiral}$ [nH]</td>
<td>47</td>
<td>48</td>
<td>30</td>
</tr>
<tr>
<td>$L$ [nH]</td>
<td>167</td>
<td>175</td>
<td>160</td>
</tr>
<tr>
<td>$R_{dc}$ [mΩ]</td>
<td>169</td>
<td>180</td>
<td>191</td>
</tr>
<tr>
<td>$f_{sw}$ [MHz]</td>
<td>20</td>
<td>19</td>
<td>20</td>
</tr>
<tr>
<td>$R_{ac}$ [mΩ]</td>
<td>249</td>
<td>344</td>
<td>–</td>
</tr>
<tr>
<td>$B_{pk}$ [mT]</td>
<td>153</td>
<td>–</td>
<td>–</td>
</tr>
<tr>
<td>$P_t$ [mW]</td>
<td>1.1</td>
<td>1.3</td>
<td>1.7</td>
</tr>
<tr>
<td>$P_h$ [mW]</td>
<td>1.7</td>
<td>1.1</td>
<td>1.7</td>
</tr>
<tr>
<td>$P_e$ [mW]</td>
<td>2.1</td>
<td>1.8</td>
<td>2.5</td>
</tr>
<tr>
<td>$P_{loss}$ [mW]</td>
<td>4.9</td>
<td>4.2</td>
<td>5.9</td>
</tr>
<tr>
<td>$\eta$ [%]</td>
<td>94.1</td>
<td>94.9</td>
<td>93</td>
</tr>
</tbody>
</table>
Cored racetrack inductor Pareto optimization procedure

Set buck converter operating conditions:
\[ O = \{V_{in}, V_{out}, I_{out}, PAR\} \]

Define design space \( X = \{x_1, x_2, ..., x_m\} \), where
\[ x_i = [N_i, t_{w,i}, t_{t,i}, t_{s,i}, c_{w,i}, c_{t,i}, c_{l,i}, d_{h,i}, d_{w,i}, d_{l,i}]^T \]

Load \( x_i \) to the analytical cored racetrack inductor model

Perform dc calculations
\[ Y_{dc,i} = \{L_i, R_{dc,i}, A\} \]

Determine \( f_{sw,i} \) using \( Y_{dc,i} \) and \( O \)

Perform ac calculations
\[ Y_{ac,i} = \{R_{ac,ki}, P_{t,i}, P_{h,i}, P_{e,i}\} \]

Determine \( \eta_i \) and \( \rho_i \) from \( O, Y_{dc,i}, \) and \( Y_{ac,i} \)

Is \( i = m? \)

\[ i = i + 1 \]

Generate Pareto front

---

**Figure 2.6:** Flowchart of the cored racetrack inductor optimization procedure. The Pareto front is generated from the maximum efficiency for a given power density resulting from evaluation of all cored racetrack inductors in the design space.
the design space have been processed. The Pareto front can be used to select the best inductor design for given buck converter specifications based on an optimum trade-off between efficiency and power density. The selected optimum inductor design may thereafter be implemented in a FEM simulator for more accurate characterization and fine tuning of the geometrical parameters.

### 2.3.6 Case Study

A case study of the cored racetrack inductor Pareto optimization procedure presented above is carried out with the buck converter specifications and geometrical parameter design space listed in Tab. 2.6. As for the on-chip spiral inductor case study in Section 2.2.1, no thermal considerations are employed for this Pareto optimization procedure. The following list contains the remaining geometrical parameters, which can be deduced using Fig. 2.5, along with additional design limitations that apply in this case study:

- The distance from the winding to the core side wall is assumed to equal $t_s$, hence the core width becomes $c_w = N t_w + (N + 1) t_s + 2 c_t$. However, designs having $c_w > 1500 \, \mu m$ are omitted.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Cored racetrack</th>
</tr>
</thead>
<tbody>
<tr>
<td>$V_{in}$</td>
<td>1.8 V</td>
</tr>
<tr>
<td>$V_{out}$</td>
<td>0.9 V</td>
</tr>
<tr>
<td>$I_{out}$</td>
<td>250 mA</td>
</tr>
<tr>
<td>$PAR$</td>
<td>2 (BCM)</td>
</tr>
<tr>
<td>$c_l$</td>
<td>1 mm . . . 9 mm</td>
</tr>
<tr>
<td>$c_t$</td>
<td>1 $\mu$m . . . 9 $\mu$m</td>
</tr>
<tr>
<td>$N$</td>
<td>1 . . . 8</td>
</tr>
<tr>
<td>$t_w$</td>
<td>10 $\mu$m . . . 1500 $\mu$m</td>
</tr>
<tr>
<td>$t_s$</td>
<td>10 $\mu$m . . . 100 $\mu$m</td>
</tr>
<tr>
<td>$t_t$</td>
<td>10 $\mu$m . . . 60 $\mu$m</td>
</tr>
</tbody>
</table>

Table 2.6: Specifications and design space for the cored racetrack inductor case study.

- The distance between the two cores is assumed to equal $2(t_w + t_s)$. Hence the device width and length become $d_w = 2(c_w + t_w + t_s)$ and $d_l = c_l + d_w - 2(t_s + c_t)$, respectively.

- The vertical distance between the winding and core is assumed to be $\frac{1}{2}t_s$. Hence the device height is obtained using $d_h = 2(t_s + c_t)$.

- The winding thickness maximum limit of 60 $\mu$m is due to the maximum plating mold thickness that can be reliably formed using photo resist while providing reasonable process yield.

- An inductor set in the design space is omitted if $t_w < t_t/2.5$ or $t_s < t_t/2.5$ due to yield issues in the fabrication process.

- The magnetic core material is permalloy Ni$_{45}$Fe$_{55}$, which has resistivity $\rho_c = 45 \, \mu\Omega \cdot \text{cm}$ and relative permeability $\mu_c = 280$. The Steinmetz parameters to be used in (2.24) and (2.25) are $\alpha = 1$ and $\beta = b = 1.73$. The saturation flux density of Ni$_{45}$Fe$_{55}$ is reported in the literature to be $B_{\text{sat}} = 1.6$ T [83]. Hence, an inductor design is omitted if $B_{\text{pk}} > B_{\text{sat}}$. Furthermore, anisotropic effects are neglected.
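The purely geometrical omission rules listed above can be collected into a small feasibility check. The sketch below is only illustrative: the function names are hypothetical, all lengths are assumed to be given in micrometers, and the saturation check $B_{\text{pk}} > B_{\text{sat}}$ is left out because it requires the inductor model.

```python
def core_width_um(n: int, t_w: float, t_s: float, c_t: float) -> float:
    """Core width c_w = N*t_w + (N + 1)*t_s + 2*c_t, all lengths in micrometers."""
    return n * t_w + (n + 1) * t_s + 2 * c_t


def is_feasible(n: int, t_w: float, t_t: float, t_s: float, c_t: float,
                c_w_max: float = 1500.0, t_t_max: float = 60.0) -> bool:
    """Apply the geometrical omission rules of the case study."""
    if t_t > t_t_max:
        return False  # plating mold thickness limit
    if t_w < t_t / 2.5 or t_s < t_t / 2.5:
        return False  # yield limit on winding width and spacing
    return core_width_um(n, t_w, t_s, c_t) <= c_w_max  # maximum core width


# Example: a wide four-turn winding with a thin core.
example_ok = is_feasible(n=4, t_w=300.0, t_t=60.0, t_s=40.0, c_t=1.0)
```

Filtering the design space before the dc and ac calculations avoids evaluating inductors that would be discarded anyway.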

The evaluated efficiencies and power densities of the cored racetrack inductors in the case study design space are mapped to the $\eta - \rho$ plane as shown in Fig. 2.7. The Pareto front is constructed from the designs that achieve the highest efficiencies for a given power density. Three Pareto fronts with different switching frequency limits are shown in dark, medium, and light blue shadings highlighting the inductor’s efficiency and power density improvement achieved by increasing the switching frequency.

Three different cored racetrack inductor designs on the Pareto front for $f_{\text{sw}} < 25$ MHz are highlighted in Fig. 2.7, and their geometrical parameters and evaluated performances are listed in Tab. 2.7. The very high efficiency design marked [I] on the Pareto front exhibits the lowest power density value considered. The core width is at the maximum limit resulting in wide windings that reduce the copper losses at the cost of increased area. The minimum core thickness and core length facilitate low core losses. The very high power density design marked [II] exhibits the lowest efficiency of the designs on the Pareto front; the number of turns and the winding dimensions are low resulting in low area at the cost of increased copper losses. The core thickness of 3 µm increases the inductance to ensure a switching frequency below 25 MHz at the cost of increased eddy current losses. The power loss density is $\alpha_{\text{loss}} = 117 \text{ mW/mm}^2$. This is less than typical power loss densities of advanced microprocessor systems, which can be more than 500 mW/mm$^2$. Thus, the realization of the highest power density design is feasible from a thermal point of view. The third highlighted design marked [III] on the Pareto front is included to exemplify a trade-off between the very high efficiency design and the very high power density design.

As can be seen in Fig. 2.7, increasing the switching frequency improves the efficiency and power density of the cored racetrack inductor. However, transistor switching losses, which are not included in the model, may still call for a moderate switching frequency. In any case, the cored racetrack inductors are seen to achieve > 90% efficiencies at > 1 W/mm$^2$ power densities, and are therefore considered suited for on-chip buck converters.

Table 2.7: Geometrical parameters and evaluated performances of the three highlighted cored racetrack inductor designs on the Pareto front for switching frequencies below 25 MHz.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>[I]</th>
<th>[II]</th>
<th>[III]</th>
</tr>
</thead>
<tbody>
<tr>
<td>$N$</td>
<td>4</td>
<td>2</td>
<td>3</td>
</tr>
<tr>
<td>$t_w$</td>
<td>300 µm</td>
<td>20 µm</td>
<td>50 µm</td>
</tr>
<tr>
<td>$t_t$</td>
<td>60 µm</td>
<td>20 µm</td>
<td>70 µm</td>
</tr>
<tr>
<td>$t_s$</td>
<td>40 µm</td>
<td>10 µm</td>
<td>20 µm</td>
</tr>
<tr>
<td>$c_l$</td>
<td>1000 µm</td>
<td>1000 µm</td>
<td>2000 µm</td>
</tr>
<tr>
<td>$c_t$</td>
<td>1 µm</td>
<td>3 µm</td>
<td>1 µm</td>
</tr>
<tr>
<td>$L$</td>
<td>41.7 nH</td>
<td>40.5 nH</td>
<td>38.6 nH</td>
</tr>
<tr>
<td>$f_{sw}$</td>
<td>21.6 MHz</td>
<td>22.2 MHz</td>
<td>23.3 MHz</td>
</tr>
<tr>
<td>$B_{pk}$</td>
<td>0.29 T</td>
<td>0.43 T</td>
<td>1.26 T</td>
</tr>
<tr>
<td>$P_t$</td>
<td>3.0 mW</td>
<td>16.6 mW</td>
<td>7.5 mW</td>
</tr>
<tr>
<td>$P_h$</td>
<td>0.9 mW</td>
<td>5.5 mW</td>
<td>3.3 mW</td>
</tr>
<tr>
<td>$P_e$</td>
<td>0.1 mW</td>
<td>8.0 mW</td>
<td>0.5 mW</td>
</tr>
<tr>
<td>$\eta$</td>
<td>98.3%</td>
<td>88.2%</td>
<td>95.2%</td>
</tr>
<tr>
<td>$\rho$</td>
<td>14 mW/mm²</td>
<td>1204 mW/mm²</td>
<td>107 mW/mm²</td>
</tr>
</tbody>
</table>
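The tabulated efficiencies can be cross-checked from the tabulated loss components. The sketch below assumes that the inductor efficiency is defined as $\eta = P_{out}/(P_{out} + P_t + P_h + P_e)$ with $P_{out} = V_{out} I_{out}$ from Tab. 2.6; this definition is not restated in this section and is therefore an assumption, although it reproduces the values of Tab. 2.7. The dictionary keys follow the column order of the table.

```python
V_OUT = 0.9    # V, output voltage from Tab. 2.6
I_OUT = 0.250  # A, output current from Tab. 2.6
P_OUT_MW = V_OUT * I_OUT * 1e3  # output power in mW

# (P_t, P_h, P_e) in mW for the three columns of Tab. 2.7.
losses_mw = {
    "I": (3.0, 0.9, 0.1),
    "II": (16.6, 5.5, 8.0),
    "III": (7.5, 3.3, 0.5),
}


def efficiency(p_out_mw: float, losses: tuple) -> float:
    """Assumed definition: output power over output power plus all losses."""
    return p_out_mw / (p_out_mw + sum(losses))


eta_percent = {name: round(100.0 * efficiency(P_OUT_MW, loss), 1)
               for name, loss in losses_mw.items()}
```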

### 2.4 Summary

For integrated buck converters, the design of efficient on-chip inductors proves to be a challenging task. Firstly, the small volume available limits the geometrical parameters, and thereby the achievable inductance, and this forces a drastic increase in switching frequency compared to discrete buck converters. Secondly, the cored inductors are limited by the availability of high-frequency magnetic materials. Once modeled using either analytical models or finite element method (FEM) simulations, the use of Pareto optimization procedures provides an excellent approach to evaluate the trade-off between efficiency and power density within a certain design space of inductor geometries.

For on-chip air core spiral inductors implemented using top metal layers of the 32 nm process, the Pareto fronts shown in Fig. 2.4(b) contain designs with limited efficiency and power density at switching frequencies below 200 MHz. The 3 \( \mu \)m winding height is identified as the main limiting geometrical parameter resulting in a high dc winding resistance and thereby low inductor efficiency. For a power density of 1 W/mm\(^2\), an inductor efficiency of 90\% at a switching frequency of 1 GHz can be achieved. Such high switching frequencies would further penalize the overall converter efficiency when taking transistor switching losses into account.

For post-processed (microfabricated) air core spiral inductors, the winding height limit is relaxed, and efficient inductors can be designed as shown in Fig. 2.4(a). Here, 1 W/mm\(^2\) power density can be achieved at 95\% inductor efficiency at 100 MHz switching frequency.

Incorporating magnetic materials is a means to boost inductance and thereby reduce switching frequency for the same operating mode. A popular approach is the cored racetrack inductor, which is an elongated spiral inductor where magnetic material is deposited around the straight parts of the windings. The analytical model derived in Section 2.3 is used in a Pareto optimization procedure to evaluate a certain design space of inductor geometries with significantly less computational effort compared to FEM simulations. From the resulting cored racetrack inductor Pareto front in Fig. 2.7, 1 W/mm\(^2\) power density is achieved at 92\% inductor efficiency and 50 MHz switching frequency.

The two key learnings from this chapter are:

- On-chip air core spiral inductors with windings formed by the top metal layers of the 32 nm semiconductor technology’s metal stack are unsuited for on-chip buck converter applications.

- Post-processed (microfabricated) inductors, both with and without magnetic materials, are suited for on-chip buck converters, assuming that the required post-processing steps can be reliably manufactured.

The post-processed inductors will, despite their promising performance results, not be pursued further in this thesis. The reason is the lack of the post-processing steps required to allow for thicker winding material and/or magnetic material deposition. However, on-chip buck converters are still a hot research topic showing promising results when the additional post-processing steps are available [84] or for 2.5D levels of integration that are currently being commercialized [33, 12]. Finally, the availability of high density deep trench capacitors makes on-chip switched capacitor converters an attractive candidate for high efficiency and high power density OCVR implementations. The next chapter therefore investigates the performance potential of on-chip switched capacitor converters.
### 3 Switched Capacitor Converters

SWITCHED CAPACITOR (SC) converters are typically used as charge pumps for low-power applications but not as POL converters for higher-power applications. However, SC converters for microprocessor power delivery have gained in popularity due to one simple fact: there are no inductors. In the previous chapter, we concluded that on-chip inductors require post-processing steps, which are typically not readily available in common semiconductor technologies, to achieve good performance. Capacitors, on the other hand, are readily available in most semiconductor processes, and on-chip SC converters therefore do not require additional processing steps. The available 32 nm SOI CMOS technology features the high-density deep trench capacitor from Fig. 1.8, which has a higher capacitance density and smaller parasitics compared to e.g. MIM, MOM, and MOSFET based on-chip capacitors. The use of deep trench capacitors therefore has tremendous potential for high efficiency and high power density on-chip SC converter designs.

A basic SC converter analysis is carried out in Section 3.1. However, the analysis is challenging to apply for SC converters with conversion ratios other than 2:1. Therefore, a general model framework for SC converters is sought. In Section 3.2, the SC converter equivalent circuit model and existing modeling frameworks are discussed. The existing model frameworks do not include the parasitic bottom plate capacitor, which plays an important role in on-chip SC converters affecting both the operation and the efficiency of the converter. Therefore, a state space model is developed to accurately take the effect of the parasitic bottom plate capacitor into account. The state space model framework, which is verified by simulations, gives new insights into the circuit behavior and its loss components, and it is used in a Pareto optimization procedure. Section 3.3 details the first converter design, which is based on the output of a Pareto optimization procedure of a 2:1 SC converter. The SC converter power stage furthermore features a charge recycling circuit that reduces the switching losses associated with the parasitic bottom plate capacitor. Section 3.4 presents the first hardware results. The measurement results validate the potential for SC converters to be used as OCVRs for granular microprocessor power delivery with per-core regulation.

This chapter is based on the publications [3], [4], and [7].

### 3.1 Basic SC Converter Analysis

In the following, an analysis of the 2:1 SC converter is carried out to illustrate the circuit operation and the governing design equations [62, 60]. The analysis aims to determine an expression for the charge, and thereby the current, delivered to the output as well as the equivalent output resistance $R_{eq}$ of the equivalent circuit model shown in Fig. 1.6(c).

As depicted in Fig. 3.1, the 2:1 SC converter consists of a flying capacitor $C$ and four switches having on-state resistances $R_{on}$. For simplicity, all switches are assumed to have equal on-state resistances; however, the analysis applies for unequal on-state resistances as well. In steady state, the flying capacitor is switched with 50% duty cycle between a) the charging state, where the flying capacitor is in series between the input and the output (switches $S_1$ and $S_3$ are on), and b) the discharging state, where the flying capacitor is in parallel with the output (switches $S_2$ and $S_4$ are on), leaving the input disconnected.

To charge a capacitor with capacitance $C$ through a resistor with resistance $R$, the capacitor voltage can be described as

$$v_C(t) = V_1 + (V_0 - V_1)e^{-t/(RC)}, \quad (3.1)$$

where $V_1$ is the voltage that the capacitor charges towards from its initial voltage $V_0$. 


![2:1 SC converter](image)

**Figure 3.1:** The 2:1 SC converter including the parasitic on-state resistances $R_{on}$ of the switches and the equivalent series resistance $R_{esr}$ of the flying capacitor.

For the 2:1 SC converter in Fig. 3.1, the capacitor charges towards $V_{in} - V_{out}$ in the charging state, and it discharges towards $V_{out}$ in the discharging state. The steady state flying capacitor voltage $v_C(t)$ for constant $V_{in}$ and $V_{out}$ is depicted in **Fig. 3.2**. As seen, the capacitor charges and discharges towards $V_{in} - V_{out}$ and $V_{out}$, respectively, and the voltages $V_{C,\text{max}1}$ and $V_{C,\text{min}1}$ denote the actual voltages that the capacitor charges or discharges to, respectively, within one switching period $T_{sw} = 1/f_{sw}$. Also shown is a steady state flying capacitor voltage waveform at four times the switching frequency, i.e. for $4f_{sw}$. Here, the capacitor still charges and discharges towards $V_{in} - V_{out}$ and $V_{out}$, respectively, but only charges and discharges to $V_{C,\text{max}2}$ and $V_{C,\text{min}2}$, respectively. The capacitor voltage always switches around $V_{in}/2$ for the 2:1 SC converter.

![Diagram of flying capacitor voltage waveform]

**Figure 3.2:** Steady state voltage $v_C(t)$ of the flying capacitor shown for two switching frequencies. The voltage difference $\Delta v_C$ is used to determine the charge delivered to the output per switching period $T_{sw}$.

Using (3.1), the expressions for $V_{C,\text{max}}$ and $V_{C,\text{min}}$ in the charging and discharging states, respectively, become

$$V_{C,max} = V_{in} - V_{out} + (V_{C,min} - V_{in} + V_{out})e^{-1/(2f_{sw}R_{tot}C)}, \quad (3.2)$$

$$V_{C,min} = V_{out} + (V_{C,max} - V_{out})e^{-1/(2f_{sw}R_{tot}C)}, \quad (3.3)$$

where $R_{tot} = 2R_{on} + R_{esr}$ denotes the total resistance in series with the capacitor in both states, and $t = 1/(2f_{sw})$ for 50% duty cycle.

Solving (3.2) and (3.3) for $V_{C,max}$ and $V_{C,min}$, the capacitor voltage difference $\Delta v_C$ can be found to be

$$\Delta v_C = V_{C,max} - V_{C,min} = (V_{in} - 2V_{out})k, \quad (3.4)$$

with

$$k = \frac{1 - e^{-1/(2f_{sw}R_{tot}C)}}{1 + e^{-1/(2f_{sw}R_{tot}C)}}. \quad (3.5)$$

The flying capacitor delivers an equal amount of charge \( C\Delta v_C \) to the output in each state, so the total output charge per switching period is

\[
Q_{\text{out}} = 2C\Delta v_C, \quad (3.6)
\]

and the output current becomes

\[
I_{\text{out}} = Q_{\text{out}} f_{\text{sw}} = 2C (V_{\text{in}} - 2V_{\text{out}}) kf_{\text{sw}}. \quad (3.7)
\]

Using the output current expression in (3.7) with the equivalent circuit model in Fig. 1.6(c), the equivalent output resistance of the SC converter becomes

\[
R_{\text{eq}} = \frac{V_{\text{in}}/2 - V_{\text{out}}}{I_{\text{out}}} = \frac{1}{4Ckf_{\text{sw}}}. \quad (3.8)
\]

As seen, \( R_{\text{eq}} \) is frequency dependent, and it contains the exponential terms governed by the factor \( k \) from (3.5).
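The closed-form results (3.4) to (3.8) can be checked numerically by iterating (3.2) and (3.3) until the capacitor voltage settles. The sketch below is only an illustration; the component values are borrowed from the verification example in Section 3.2.3 ($V_{in} = 1.8$ V, $V_{out} = 850$ mV, 0.5 Ω per resistance, $C = 2$ nF, $f_{sw} = 100$ MHz), and the variable names are hypothetical.

```python
import math

V_IN, V_OUT = 1.8, 0.85  # V
R_TOT = 1.5              # ohm, 2*R_on + R_esr with 0.5 ohm each
C_FLY = 2e-9             # F
F_SW = 100e6             # Hz

decay = math.exp(-1.0 / (2.0 * F_SW * R_TOT * C_FLY))
k = (1.0 - decay) / (1.0 + decay)  # (3.5)

# Iterate (3.2) and (3.3) from an arbitrary start value until steady state.
v_c_min = 0.0
for _ in range(100):
    v_c_max = (V_IN - V_OUT) + (v_c_min - (V_IN - V_OUT)) * decay
    v_c_min = V_OUT + (v_c_max - V_OUT) * decay

delta_v_c = v_c_max - v_c_min
i_out = 2.0 * C_FLY * delta_v_c * F_SW  # (3.6) and (3.7)
r_eq = (V_IN / 2.0 - V_OUT) / i_out     # (3.8)
```

For these values the iteration settles at a ripple of $(V_{in} - 2V_{out})k$ centered around $V_{in}/2$, in agreement with (3.4).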

The equivalent output resistance can be decomposed into two resistance contributions governing the slow switching limit (SSL) and the fast switching limit (FSL), respectively. As shown in [60], \( k \to 1 \) for \( f_{\text{sw}} \to 0 \), and \( k \to 1/(4CR_{\text{tot}}f_{\text{sw}}) \) for \( f_{\text{sw}} \to \infty \). Hence, the equivalent resistance frequency asymptotes can be derived using (3.8) as

\[
R_{\text{eq,SSL}} = \frac{1}{4Cf_{\text{sw}}}, \quad (3.9)
\]

\[
R_{\text{eq,FSL}} = R_{\text{tot}}. \quad (3.10)
\]
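
The two limits of $k$ quoted from [60] follow directly from (3.5). As a brief sketch, with the shorthand $x$ introduced here only for convenience (it is not part of the original notation):

$$x = \frac{1}{4 f_{sw} R_{tot} C} \quad \Rightarrow \quad k = \frac{1 - e^{-2x}}{1 + e^{-2x}} = \tanh(x)$$

$$f_{sw} \to 0: \quad x \to \infty, \quad \tanh(x) \to 1 \quad \Rightarrow \quad R_{eq} \to \frac{1}{4 C f_{sw}}$$

$$f_{sw} \to \infty: \quad x \to 0, \quad \tanh(x) \approx x \quad \Rightarrow \quad R_{eq} \to \frac{1}{4 C f_{sw} x} = R_{tot}$$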

In [60], a model framework to derive \( R_{\text{eq,SSL}} \) and \( R_{\text{eq,FSL}} \) for any SC converter topology is developed. Those resistances are then used to approximate the equivalent output resistance as

\[
R_{\text{eq,approx}} \approx \sqrt{R_{\text{eq,SSL}}^2 + R_{\text{eq,FSL}}^2}. \quad (3.11)
\]

The model framework can be applied on any SC converter using simple circuit analysis techniques.

**Figure 3.3:** Equivalent output resistance $R_{eq}$ as function of the switching frequency. The $R_{eq}$ has a $1/f_{sw}$ behavior in the slow switching limit (SSL) region and a frequency independent behavior in the fast switching limit (FSL) region. The approximated equivalent output resistance $R_{eq,\text{approx}}$ can easily be derived using simple circuit analysis.

The equivalent output resistance over switching frequency is shown in Fig. 3.3. As seen, the SSL asymptote dictates a $1/f_{sw}$ behavior of $R_{eq}$ at low switching frequencies, whereas the FSL asymptote is independent of frequency. Furthermore, $R_{eq,\text{approx}}$ from (3.11) can be seen to approximate the $R_{eq}$ from (3.8) fairly well.
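How well (3.11) tracks (3.8) can be quantified with a short frequency sweep. The following sketch is illustrative only: the values $R_{tot} = 1.5$ Ω and $C = 2$ nF are borrowed from the example design in Section 3.2.3, and the function names are hypothetical.

```python
import math

R_TOT = 1.5   # ohm
C_FLY = 2e-9  # F


def r_eq_exact(f_sw: float) -> float:
    """Exact equivalent output resistance from (3.5) and (3.8)."""
    decay = math.exp(-1.0 / (2.0 * f_sw * R_TOT * C_FLY))
    k = (1.0 - decay) / (1.0 + decay)
    return 1.0 / (4.0 * C_FLY * k * f_sw)


def r_eq_approx(f_sw: float) -> float:
    """Approximation (3.11) built from the SSL and FSL asymptotes."""
    return math.hypot(1.0 / (4.0 * C_FLY * f_sw), R_TOT)


# Logarithmic sweep from 1 MHz to 10 GHz.
freqs = [10.0 ** (6.0 + 4.0 * i / 400.0) for i in range(401)]
ratios = [r_eq_approx(f) / r_eq_exact(f) for f in freqs]
worst_ratio = max(ratios)
worst_freq = freqs[ratios.index(worst_ratio)]
```

For this 2:1 example the approximation never underestimates $R_{eq}$, and the largest deviation, somewhat below 10%, occurs in the transition between the SSL and FSL regions.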

It is elaborated in [60] that $R_{eq,\text{SSL}}$ and $R_{eq,\text{FSL}}$ are considerably simpler to derive than the exact expression of $R_{eq}$ in (3.8), especially for other SC converter topologies having more switches and capacitors. Accordingly, the model framework is considered by many to be the preferred approach for SC converter modeling and design. However, the model framework has two disadvantages regarding on-chip SC converters. Firstly, as seen in Fig. 3.3, the approximation applied when determining $R_{eq}$ is least accurate between the SSL and FSL regions. This region is often preferred for high efficiency operation since $R_{eq}$ is low and $f_{sw}$ is moderate. Secondly, it neglects switching losses that are mainly associated with the parasitic bottom plate capacitor of the flying capacitor. In practice, the FSL region is typically avoided due to the high switching losses not accounted for in Fig. 3.3. Switching losses may not be of major concern for discrete SC converters since the parasitic bottom plate capacitors of discrete capacitors can often be neglected. However, switching losses may influence the converter’s steady state operation and efficiency, and can therefore not be neglected for on-chip implementations. For this reason, the next section develops a model framework which accurately captures the effect of the parasitic bottom plate capacitor.

### 3.2 State Space Model Framework

Applying a model framework translates any SC converter into an equivalent circuit model that captures the steady state converter operation and power losses (conversion efficiency). As introduced in Section 1.3.3, a SC converter consists of a number of switches and flying capacitors. The configuration of capacitors and switches in first the charging state and thereafter the discharging state determines the ideal voltage conversion ratio $M$ of the SC converter topology. As shown in Fig. 1.6(c), the SC converter equivalent circuit model includes the switching frequency dependent equivalent output resistance $R_{eq}$ shown in Fig. 3.3. As detailed above, $R_{eq}$ represents the converter’s conduction losses resulting from charging and discharging the flying capacitors.

![Extended SC converter equivalent circuit model](image)

**Figure 3.4:** The extended SC converter equivalent circuit model, where the ideal transformer models the voltage conversion ratio $M$ defined by the topology and the resistors $R_{eq}$ and $R_{bp}$ model conduction and switching losses, respectively.
Chapter 3. Switched Capacitor Converters

The extended SC converter equivalent circuit model is shown in Fig. 3.4. It features $R_{bp}$, which models the switching losses associated with the parasitic bottom plate capacitor. As seen, $R_{bp}$ sinks a current $I_{bp}$ that would otherwise have been delivered to the output. Switching losses thereby affect both the steady state converter behavior and the efficiency. $R_{bp}$ is indicated in e.g. [23, 25], but not in a comprehensive manner that considers steady state operation.

In [85], a model framework based on conventional circuit analysis put into a state space representation is used. Once all node equations have been put into matrix form, $R_{eq}$ can be calculated accurately. However, this model framework does not account for switching losses either. The next section extends the state space model framework in [85] to take the effect of the parasitic bottom plate capacitor modeled by $R_{bp}$ into account.

### 3.2.1 Model Derivation

In the following, a SC converter state space model framework that includes the effects of the parasitic bottom plate capacitor on steady state operation and efficiency is developed, thereby being suited for on-chip SC converter design.

We have $2n$ capacitors ($n$ flying capacitors and $n$ bottom plate capacitors), which are put as diagonal elements into a $2n \times 2n$ diagonal matrix $C$. The input and the output voltages are composed into vector $u$. Vectors $v(t)$ and $i(t)$ collect all instantaneous capacitor voltages and currents, respectively, and they are related by

$$i(t) = C \dot{v}(t), \quad (3.12)$$

where $\dot{v}(t)$ is the time derivative of $v(t)$.

For the charging state (state 1), Kirchhoff’s voltage and current laws (KVL and KCL, respectively) are applied to determine $2n$ independent equations of the form

$$E_1 i(t) + F_1 v(t) + G_1 u = 0. \quad (3.13)$$

When KVL is applied, rows in $E_1$ are resistances (transistor on-state resistances and/or flying capacitor equivalent series resistances), and rows in $F_1$ and $G_1$ are -1, 0, or 1. When KCL is applied, rows in $E_1$ are -1, 0, or 1 and rows in $F_1$ and $G_1$ are all 0. Letting $v$ represent the system states, (3.12) and (3.13) can be combined into

\[
\begin{aligned}
\dot{v}(t) &= A_1 v(t) + B_1 u \\
A_1 &= -C^{-1} E_1^{-1} F_1 \\
B_1 &= -C^{-1} E_1^{-1} G_1,
\end{aligned} \quad (3.14)
\]

where \( C \) is always invertible because it is a diagonal matrix with nonzero diagonal elements, and \( E_1 \) is invertible when KVL and KCL have been applied correctly [85]. The general solution to the system of differential equations in (3.14) is

\[
v(t) = \underbrace{e^{A_1(t-t_0)}}_{\Phi_1(t)} v(t_0) + \underbrace{\left[ \int_{t_0}^{t} e^{A_1(t-\tau)} B_1 \, d\tau \right]}_{\Gamma_1(t)} u, \quad (3.15)
\]

where we have utilized that \( u \) is independent of \( \tau \). \( \Phi_1(t) \) is known as the state transition matrix. Using the same approach for the discharging state (state 2) results in \( A_2 \) and \( B_2 \), as well as \( \Phi_2(t) \) and \( \Gamma_2(t) \).
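
For numerical evaluation, the integral term of the general solution above, i.e. the bracketed factor multiplying $u$ that is denoted $\Gamma_1(t)$, does not have to be integrated explicitly. The identity below is a standard result and holds under the additional assumption, not stated in the original text, that $A_1$ is invertible:

$$\Gamma_1(t) = \left[ \int_{t_0}^{t} e^{A_1(t-\tau)} \, d\tau \right] B_1 = A_1^{-1} \left( e^{A_1(t-t_0)} - \mathbf{I} \right) B_1 = A_1^{-1} \left( \Phi_1(t) - \mathbf{I} \right) B_1$$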

With 50% duty cycle, \( t_1 = 1/(2f_{sw}) \) is the duration of the charging state, and \( t_2 = 1/(2f_{sw}) \) is the duration of the discharging state. Hence, assuming the charging state begins at \( t_0 = 0 \), the system states (capacitor voltages) at the end of each switching state equal

\[
v(t_1) = \Phi_1(t_1) v(0) + \Gamma_1(t_1) u, \quad (3.16)
\]

\[
v(t_1 + t_2) = \Phi_2(t_2) v(t_1) + \Gamma_2(t_2) u. \quad (3.17)
\]

In steady state, \( v(0) = v(t_1 + t_2) \) applies, which, using (3.16) and (3.17), gives the initial condition

\[
v(0) = (\mathbf{I} - \Phi_2(t_2) \Phi_1(t_1))^{-1} (\Phi_2(t_2) \Gamma_1(t_1) + \Gamma_2(t_2)) u, \quad (3.18)
\]

where \( \mathbf{I} \) is the \( 2n \times 2n \) identity matrix. The charge delivered by each capacitor per switching state is determined as

\[
q_1 = C (v(t_1) - v(0)), \quad (3.19)
\]

\[
q_2 = C (v(t_1 + t_2) - v(t_1)) = -q_1, \quad (3.20)
\]

where the last equality holds because of charge conservation.

Knowing the capacitor charges in (3.19) and (3.20) for a SC converter topology, the input and output charges, and thereby the input and output currents, in each switching state can be calculated. Finally, the input and output powers are determined to compute the efficiency. The following exemplifies how the state space model framework can be applied to the 2:1 SC converter.

### 3.2.2 2:1 SC Converter Model

The state space model framework derived above is applied to the 2:1 SC converter from Fig. 3.1, where the equivalent circuits in the charging and the discharging states are shown in Fig. 3.5. In the equivalent circuits, each switch is replaced by an on-state resistance $R_{on1-4}$ when on and an open circuit when off, and the flying capacitor model includes its equivalent series resistance $R_{esr}$ and the bottom plate capacitor $C_{bp}$ connected to ground. The input and output nodes are modeled as ideal dc voltage sources.

The application of KVL and KCL put into the form of (3.13) yields the system matrices

$$
C = \begin{pmatrix} C & 0 \\ 0 & C_{bp} \end{pmatrix}, \quad i = \begin{pmatrix} i_C \\ i_{Cbp} \end{pmatrix}, \quad v = \begin{pmatrix} v_C \\ v_{Cbp} \end{pmatrix}, \quad u = \begin{pmatrix} V_{in} \\ V_{out} \end{pmatrix},
$$

$$
E_1 = \begin{pmatrix} R_{on1} + R_{esr} & 0 \\ -R_{on3} & R_{on3} \end{pmatrix}, \quad E_2 = \begin{pmatrix} R_{on2} + R_{esr} & 0 \\ R_{on4} & -R_{on4} \end{pmatrix},
$$

$$
F_1 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad F_2 = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix},
$$

$$
G_1 = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}, \quad G_2 = \begin{pmatrix} 0 & -1 \\ 0 & 0 \end{pmatrix}.
$$

Now the state space model derived above can be applied to calculate the capacitor charges in (3.19) and (3.20). From Fig. 3.5, the output and input charge in each state for the 2:1 SC converter can be found as

$$
q_{out1} = q_C - q_{Cbp}, \quad (3.21)
$$

$$
q_{in1} = q_C, \quad (3.22)
$$

$$
q_{out2} = q_C, \quad (3.23)
$$

$$
q_{in2} = 0 \quad (3.24)
$$
and the total average output current over a full switching period becomes

\[
I_{\text{out}} = \frac{q_{\text{out}1} + q_{\text{out}2}}{t_1 + t_2} = \left(2q_C - q_{\text{Cbp}}\right)f_{\text{sw}}, \quad (3.25)
\]

where for \(q_{\text{Cbp}} = 0\), the output current expression in (3.25) reduces to the expression derived in (3.7). Likewise, the total average input current is

\[
I_{\text{in}} = \frac{q_{\text{in}1} + q_{\text{in}2}}{t_1 + t_2} = q_C f_{\text{sw}}. \quad (3.26)
\]

Hence, the output current is affected by the presence of the parasitic bottom plate capacitance. As will be further detailed in Section 3.2.3, the parasitic bottom plate capacitor therefore affects the equivalent output resistance $R_{eq}$ of the converter.

Using (3.25) and (3.26), the total efficiency of the 2:1 SC converter can be calculated as

$$\eta = \frac{P_{out}}{P_{in}} = \frac{V_{out}I_{out}}{V_{in}I_{in}} = \frac{V_{out}}{V_{in}} \left( 2 - \frac{q_{Cbp}}{q_C} \right). \quad (3.27)$$

As can be seen, if $C_{bp}$ is neglected, then $q_{Cbp}$ = 0, and the efficiency in (3.27) reduces to the ideal SC converter efficiency in (1.4) for $M = 1/2$. Additionally, it is seen from (3.27) how $q_{Cbp}$ directly influences the efficiency of the 2:1 SC converter.

To port this analysis to the equivalent circuit model from Fig. 3.4, the resistances can directly be determined as

$$R_{eq} = \frac{MV_{in} - V_{out}}{I_{out}} = \frac{\frac{1}{2} V_{in} - V_{out}}{(2q_C - q_{Cbp}) f_{sw}}, \quad (3.28)$$

$$R_{bp} = \frac{MV_{in}}{\frac{1}{M} I_{in} - I_{out}} = \frac{1}{2} \frac{V_{in}}{q_{Cbp} f_{sw}}, \quad (3.29)$$

where $M = 1/2$ is the voltage conversion ratio.
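The complete calculation chain for the 2:1 SC converter, from the system matrices to the efficiency and the equivalent resistances, can be sketched in plain Python. This is a minimal illustration and not the implementation used in the thesis: all function names are hypothetical, the matrix exponential uses a closed form that assumes a 2x2 matrix with real, distinct, negative eigenvalues (which holds for the example values), and the integral term is evaluated as $A^{-1}(\Phi - \mathbf{I})B$, which additionally assumes that $A$ is invertible. The component values are those of the verification example in Section 3.2.3, with $C_{bp} = 0.01\,C$ chosen arbitrarily.

```python
import math


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_vec(a, v):
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]


def mat_inv(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def mat_exp(a, t):
    """exp(a*t) for a 2x2 matrix, assuming real, distinct, negative eigenvalues."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    lam_fast = 0.5 * (tr - math.sqrt(tr * tr - 4.0 * det))
    lam_slow = det / lam_fast  # the product of the eigenvalues equals det
    e_slow, e_fast = math.exp(lam_slow * t), math.exp(lam_fast * t)
    c0 = (lam_slow * e_fast - lam_fast * e_slow) / (lam_slow - lam_fast)
    c1 = (e_slow - e_fast) / (lam_slow - lam_fast)
    return [[c0 * (i == j) + c1 * a[i][j] for j in range(2)] for i in range(2)]


def state_matrices(e, f, g, caps, t):
    """Return (Phi, Gamma) of one switching state lasting t seconds."""
    m = mat_mul([[1.0 / caps[0], 0.0], [0.0, 1.0 / caps[1]]], mat_inv(e))
    a = [[-x for x in row] for row in mat_mul(m, f)]  # A = -C^-1 E^-1 F
    b = [[-x for x in row] for row in mat_mul(m, g)]  # B = -C^-1 E^-1 G
    phi = mat_exp(a, t)
    phi_minus_i = [[phi[i][j] - (i == j) for j in range(2)] for i in range(2)]
    gamma = mat_mul(mat_inv(a), mat_mul(phi_minus_i, b))  # needs invertible A
    return phi, gamma


def sc_2to1(v_in, v_out, f_sw, c, alpha, r_on=0.5, r_esr=0.5):
    """Steady-state currents, efficiency, and resistances of the 2:1 converter."""
    caps, u, t_half = (c, alpha * c), [v_in, v_out], 1.0 / (2.0 * f_sw)
    e1 = [[r_on + r_esr, 0.0], [-r_on, r_on]]
    e2 = [[r_on + r_esr, 0.0], [r_on, -r_on]]
    f1, f2 = [[1.0, 1.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, -1.0]]
    g1, g2 = [[-1.0, 0.0], [0.0, -1.0]], [[0.0, -1.0], [0.0, 0.0]]
    phi1, gam1 = state_matrices(e1, f1, g1, caps, t_half)
    phi2, gam2 = state_matrices(e2, f2, g2, caps, t_half)
    # Steady-state initial condition v(0) = (I - Phi2 Phi1)^-1 (Phi2 Gam1 + Gam2) u.
    p21 = mat_mul(phi2, phi1)
    lhs = [[(i == j) - p21[i][j] for j in range(2)] for i in range(2)]
    rhs = mat_vec([[x + y for x, y in zip(r1, r2)]
                   for r1, r2 in zip(mat_mul(phi2, gam1), gam2)], u)
    v0 = mat_vec(mat_inv(lhs), rhs)
    v1 = [x + y for x, y in zip(mat_vec(phi1, v0), mat_vec(gam1, u))]
    q_c = caps[0] * (v1[0] - v0[0])     # charge into C during the charging state
    q_cbp = caps[1] * (v1[1] - v0[1])   # charge into C_bp during the charging state
    i_out = (2.0 * q_c - q_cbp) * f_sw  # (3.25)
    i_in = q_c * f_sw                   # (3.26)
    return {"i_out": i_out, "i_in": i_in,
            "eta": v_out * i_out / (v_in * i_in),
            "r_eq": (0.5 * v_in - v_out) / i_out,
            "r_bp": 0.5 * v_in / (q_cbp * f_sw)}


# Example design of Section 3.2.3; the 1 % bottom plate ratio is an arbitrary choice.
result = sc_2to1(v_in=1.8, v_out=0.85, f_sw=100e6, c=2e-9, alpha=0.01)
```

For a general topology with more than two states, a library routine for the matrix exponential would replace the 2x2 closed form used here.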

### 3.2.3 Model Verification

The state space model framework of the 2:1 SC converter is verified against simulations using the Matlab Simulink environment. For the verification, an example design with $V_{in} = 1.8$ V, $R_{on1-4} = R_{esr} = 0.5$ Ω, and $C = 2$ nF is selected. For output voltage sweeps, the switching frequency is arbitrarily chosen to equal $f_{sw} = 100$ MHz, and for switching frequency sweeps, the output voltage is arbitrarily chosen to equal $V_{out} = 850$ mV.

Typically, the ratio of the bottom plate capacitance to the flying capacitance is denoted as

$$\alpha = \frac{C_{bp}}{C}.$$

The value of $\alpha$ depends both on the semiconductor process and the on-chip capacitor technology.

**Figure 3.6:** Verification of (a) efficiency $\eta$ and (b) output current $I_{out}$ resulting from the state space model framework. The simulated results (dots) match the model results (lines) over output voltage for various values of $\alpha = C_{bp}/C$. The switching frequency is 100 MHz and the input voltage is 1.8 V.

**Figure 3.7:** Verification of (a) equivalent output resistance, and (b) equivalent bottom plate resistance resulting from the state space model framework. The simulated results (dots) match the model results (lines) over switching frequency for various values of $\alpha = \frac{C_{bp}}{C}$. The output voltage is 850 mV and the input voltage is 1.8 V.

The model results for various values of $\alpha$ are shown as solid lines 
in Fig. 3.6 and Fig. 3.7. The Matlab Simulink simulation results are 
shown as dots to verify the model results. As can be seen, the state 
space model framework is able to accurately capture the influence of 
the bottom plate capacitor on the converter’s steady state operation 
and efficiency. In the following, the model results excluding ($\alpha = 0\%$) 
and including ($\alpha > 0\%$) the switching losses associated with $C_{bp}$ are 
discussed. Although the discussion focuses on the 2:1 SC converter, the 
key learnings apply to other SC converter topologies, like the 3:2 SC 
converter discussed in Section 4.1.2, as well.

**Model results for $\alpha = 0\%$**

When $\alpha = 0\%$, the ideal efficiency shown in Fig. 3.6(a) approaches 100% 
as the output voltage approaches $V_{in}/2 = 900 \text{ mV}$. However, at this 
output voltage, the output current shown in Fig. 3.6(b) approaches 
0 mA. Hence the SC converter theoretically has 100% efficiency at 
$V_{out} \rightarrow V_{in}/2$ but with $I_{out} \rightarrow 0 \text{ mA}$. For output voltages below half 
the input voltage, the ideal efficiency decreases linearly with decreasing 
output voltage following (1.4), where $M = 1/2$ for the 2:1 converter, 
and current is delivered to the load, i.e. $I_{out} > 0 \text{ mA}$ for $V_{out} < 900 \text{ mV}$.
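
The trade-off between ideal efficiency and output current for $\alpha = 0\%$ can be illustrated with a short output voltage sweep. The sketch assumes that (1.4) has the form $\eta = V_{out}/(M V_{in})$, which is consistent with (3.27) for $q_{Cbp} = 0$ but is not restated in this section, and it uses (3.7) with the example values of this subsection. The names are hypothetical.

```python
import math

V_IN = 1.8    # V
M = 0.5       # ideal conversion ratio of the 2:1 converter
R_TOT = 1.5   # ohm, 2*R_on + R_esr with 0.5 ohm each
C_FLY = 2e-9  # F
F_SW = 100e6  # Hz

decay = math.exp(-1.0 / (2.0 * F_SW * R_TOT * C_FLY))
K = (1.0 - decay) / (1.0 + decay)  # (3.5)


def ideal_operating_point(v_out: float):
    """Return (efficiency, output current) without bottom plate capacitance."""
    eta = v_out / (M * V_IN)                               # assumed form of (1.4)
    i_out = 2.0 * C_FLY * (V_IN - 2.0 * v_out) * K * F_SW  # (3.7)
    return eta, i_out


sweep = {v: ideal_operating_point(v) for v in (0.70, 0.75, 0.80, 0.85, 0.90)}
```

At $V_{out} = 900$ mV the sketch returns an efficiency of 100% together with zero output current, matching the discussion above.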

The equivalent output resistance $R_{eq}$ shown in Fig. 3.7(a) exhibits 
the well-known $1/f_{sw}$ behavior at low switching frequencies and a constant behavior at high switching frequencies. These switching frequency 
regions are the SSL and FSL from Fig. 3.3. When disregarding $C_{bp}$, 
$R_{bp}$ in Fig. 3.7(b) is an open circuit.

**Model results for $\alpha > 0\%$**

When $\alpha > 0\%$, the efficiency shown in Fig. 3.6(a) drops when the output 
voltage approaches $V_{in}/2 = 900 \text{ mV}$. This is because the switching losses 
associated with the parasitic bottom plate capacitor become comparable 
to the output power for output voltages close to the ideal voltage ratio. The output current is shown in Fig. 3.6(b), and it is low for output 
voltages approaching 900 mV. Furthermore, for any given output 
voltage, the overall efficiency and the output current are reduced when 
switching losses are included.

For the equivalent output resistance $R_{eq}$ shown in Fig. 3.7(a), the inclusion of switching losses has a direct impact on the minimum resistance, and thereby highest efficiency, achievable. The upward bend at high switching frequencies is a result of taking the parasitic bottom plate capacitor into account. For the equivalent resistance in (3.28), the charge $q_{C_{bp}}$ of the bottom plate capacitor subtracts from the charge of the flying capacitor $q_C$. Hence, $q_{C_{bp}}$ affects $R_{eq}$ for $\alpha > 0\%$ by a) an overall increase at any switching frequency and b) an upward bend at high switching frequencies where $q_{C_{bp}}$ becomes comparable to $q_C$. These effects are not captured by existing modeling frameworks [60, 85]. Furthermore, Fig. 3.7(a) shows that there exists an optimum switching frequency (minimum $R_{eq}$) to operate the SC converter, and that this optimum switching frequency is a function of $\alpha$.

The bottom plate resistance $R_{bp}$ is shown in Fig. 3.7(b). From the equivalent circuit in Fig. 3.4, $R_{bp}$ sinks a current $I_{bp}$ that would otherwise have been delivered to the output, thereby affecting both the efficiency and the output current of the converter.

### 3.2.4 Device Models

To prepare the SC converter state space model framework presented above for a Pareto optimization, the device models for the transistors and the capacitor must be defined. The simplified transistor model, which is equivalent for both NMOS and PMOS transistors, and the deep trench capacitor model are shown in Fig. 3.8. The transistor model consists of the on-state resistance $R_{on}$ and the input and output capacitances $C_{iss}$ and $C_{oss}$, respectively. The capacitor model consists of the capacitance $C$, the equivalent series resistance $R_{esr}$, and the parasitic bottom plate capacitor $C_{bp}$.

Only the width for the thin-oxide transistors is considered, since the transistor length is fixed in this technology. The transistor on-state resistance and input and output capacitances depend on several parameters and are generally nonlinear with voltage and temperature. Although the nonlinear voltage dependency plays a role, it is disregarded in the parameter extractions for simplicity. Given the extracted parameters, a best-fit expression is sought for use in the Pareto optimization procedure, which is discussed in the next subsection. To accurately capture the nonlinear dependencies, the converter is later simulated using hardware-correlated models in the Cadence design environment.

A simple thermal model is considered for the parameter extractions. For the application of 2D microprocessor power delivery, the converter is integrated on the same die as the microprocessor load. Therefore, assuming a high converter efficiency, the load, and not the converter losses, dictates the die temperature and thereby the operating temperature of the converter. For this reason, the parameters are extracted for one temperature corresponding to the maximum allowable temperature of 85°C for a microprocessor core.

For the transistor on-state resistance, a drain current of $I_d = 20$ mA with a gate-source voltage of $V_{gs} = 900$ mV is applied. The extracted on-state resistance is then least-mean-square fitted using

$$R_{on}(T_w) = \frac{1}{p_1 T_w}, \quad (3.31)$$

where $T_w$ is the transistor width, and $p_1$ is a fitting coefficient [86].
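The fit in (3.31) can be illustrated with a minimal sketch. It assumes a through-origin least-squares fit on the conductance $1/R_{on}$, which is linear in $T_w$; this is one possible fitting approach and not necessarily the one used here. The sample data are synthetic and noise-free, generated from the NMOS coefficient $p_1 = 3002$ m$^{-1}$Ω$^{-1}$ of Table 3.1, and the function names are hypothetical.

```python
def r_on(t_w_m, p1):
    """On-state resistance in ohm following (3.31); t_w_m is the width in meters."""
    return 1.0 / (p1 * t_w_m)


def fit_p1(widths_m, r_on_ohm):
    """Through-origin least-squares fit of the conductance 1/R_on = p1 * T_w."""
    conductances = [1.0 / r for r in r_on_ohm]
    num = sum(w * g for w, g in zip(widths_m, conductances))
    den = sum(w * w for w in widths_m)
    return num / den


# Synthetic samples (NOT extracted data), built from the NMOS coefficient.
P1_NMOS = 3002.0  # 1/(m*ohm), Table 3.1
widths_m = [100e-6, 500e-6, 1000e-6, 5000e-6]
samples = [r_on(w, P1_NMOS) for w in widths_m]

p1_fit = fit_p1(widths_m, samples)
r_650um = r_on(650e-6, p1_fit)  # roughly 0.51 ohm for a 650 um wide NMOS
```

Because the samples are noise-free, the fit simply recovers the coefficient it was generated from; with extracted data, the residuals would indicate how well the $1/T_w$ scaling holds.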

---

Figure 3.8: Simplified transistor and deep trench capacitor models used for the Pareto optimization procedure.

![Graph showing extracted transistor parameters](image)

**Figure 3.9:** Extracted transistor parameters (dots) and resulting fitting functions (lines) for both NMOS and PMOS thin-oxide transistors at $85^\circ C$ in the available 32 nm SOI CMOS technology.

Using an ac analysis, the input capacitance $C_{iss}$ is extracted with the drain and source terminals shorted. Similarly, the output capacitance $C_{\text{oss}}$ is extracted with the gate and source terminals shorted. The two capacitances are then least-mean-square fitted using

\[
C_{\text{iss}}(T_w) = p_2 T_w \tag{3.32}
\]

\[
C_{\text{oss}}(T_w) = p_3 T_w, \tag{3.33}
\]

where $p_2$ and $p_3$ are fitting coefficients.

The extracted parameters and the resulting fitting functions as a function of transistor width $T_w$ are shown in **Fig. 3.9**. Note that although $T_w$ is given in millimeters, the transistor layout uses an array of many smaller-sized transistor units in parallel to form the actual transistor.\(^1\) As seen, the fitting functions accurately capture the transistor parameters over a wide range of transistor widths. The fitting parameters $p_1$–$p_3$ used in the extracted transistor parameter functions (3.31), (3.32), and (3.33) are listed in Table 3.1.

\(^1\)Using chip design nomenclature, this is equivalent to having a transistor with a large number of fingers.

For the deep trench capacitor, which is shown in Fig. 1.8, the model is linear, so the capacitance scales linearly with the number $X_C$ of unit capacitors considered. Hence,

$$C(X_C) = C_{\text{unit}} X_C, \quad (3.34)$$

where $C_{\text{unit}}$ is the capacitance of a deep trench capacitor unit. Likewise, the equivalent series resistance $R_{\text{esr}}$ of the deep trench capacitor is

$$R_{\text{esr}}(X_C) = \frac{R_{\text{esr,unit}}}{X_C}. \quad (3.35)$$

The extracted parameters for the deep trench capacitor in the 32 nm semiconductor technology are listed in Table 3.2. Since $X_C$ is an integer number, the parameters of the deep trench capacitor are discretized. However, as will be seen in Section 3.3, $X_C$ is large enough that the discretization is not an issue.

In an ac analysis, the output capacitances of the transistors connecting to a flying capacitor are effectively in parallel with the parasitic bottom plate capacitor $C_{\text{bp}}$. Hence, the modified parasitic bottom plate capacitance ratio $\alpha'$, which includes the output capacitances of the connecting transistors, needs to be determined and used in the state space model framework.

Table 3.1: Fitting coefficients for the extracted transistor parameter fitting functions.

<table>
<thead>
<tr>
<th>Transistor</th>
<th>$p_1$ [m$^{-1}$Ω$^{-1}$]</th>
</tr>
</thead>
<tbody>
<tr>
<td>NMOS</td>
<td>3002</td>
</tr>
<tr>
<td>PMOS</td>
<td>3165</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Table 3.2: Extracted parameters for the deep trench capacitor.</th>
</tr>
</thead>
<tbody>
<tr>
<td>$C_{\text{unit}}$ [nF]</td>
</tr>
<tr>
<td>Deep trench</td>
</tr>
</tbody>
</table>

In a similar fashion to the on-chip inductors for integrated buck converters discussed in Chapter 2, a Pareto optimization procedure for on-chip SC converters is developed. The converter performance for each design is estimated using the state space model framework derived in Section 3.2.1 and the device models from Section 3.2.4.

A flowchart of the SC converter Pareto optimization procedure is depicted in Fig. 3.11. The inputs of the procedure are the SC converter topology governed by $M$ and the electrical specifications $V_{in}$, $V_{out}$, and $I_{out}$. For a given SC converter topology, the design space $X$ contains $m$ SC converter designs. Each set $x_i \in X$, where $i \in \{1, 2, \ldots, m\}$, consists of 1) the number $X_C$ of unit capacitors for each flying capacitor and 2) the transistor width $T_w$ for each transistor in the power stage. The design space is shown in vectorial form to allow for different sizes of capacitors and/or transistors in the power stage. However, all capacitors, and likewise all transistors, may be of equal size. Furthermore, a range $f_{sw} = [f_{sw,1}, f_{sw,2}, \ldots, f_{sw,\text{max}}]$ of allowable switching frequencies is specified. Finally, the area $A_C$ per unit capacitor and the area $A_T$ per unit transistor width are determined from the device layouts. Together with a fixed area for the gate driver circuit, they are used to estimate the area and thereby the power density of a design.

To begin the Pareto optimization procedure, the first set $x_1$ ($i = 1$) is loaded and the extracted device parameters are determined using the fitting functions presented in Section 3.2.4. The modified parasitic bottom plate capacitance ratio $\alpha'$ is then estimated using (3.36). Thereafter, the first switching frequency $f_{sw,1}$ ($k = 1$) is selected from the predefined switching frequency range, and the state space model derived in Section 3.2.1 is used to evaluate the electrical performance of the $i$'th design. If the evaluated output current $I_{out,i}$ meets (or, due to the quantization of the design space parameters, slightly exceeds) the output current specification $I_{out}$, the design and its performance are stored and further processed in the subsequent step in the flowchart. However, if $I_{out,i}$ does not meet the output current specification $I_{out}$, the next switching frequency in the range is selected ($k = k + 1$) and the design is reevaluated using the state space model framework. This inner switching frequency loop continues until $I_{out,i} \geq I_{out}$ is satisfied. If the maximum switching frequency is reached, i.e. if $f_{sw,i} = f_{sw,\text{max}}$, and the design still does not satisfy $I_{out,i} \geq I_{out}$, the $i$'th design is skipped and the next design ($i = i + 1$) is loaded and evaluated.
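The inner switching frequency loop described above can be sketched as follows. The output current model used here is a deliberately simple stand-in and not the state space model of Section 3.2.1: it assumes a 2:1 converter with $R_{ssl} = 1/(4 C f_{sw})$, $R_{fsl} = 2 R_{on}$, and $R_{eq} = \sqrt{R_{ssl}^2 + R_{fsl}^2}$, and it ignores all parasitics. The unit capacitance `C_UNIT`, the function names, and the two example designs are hypothetical.

```python
import math

# Device constants for illustration only.
P1 = 3002.0      # 1/(m*ohm), NMOS coefficient from Table 3.1
C_UNIT = 1e-12   # F per unit capacitor, ASSUMED value (not from the thesis)


def toy_model(design, f_sw, v_in=1.8, v_out=0.83, m=0.5):
    """Stand-in output current model, NOT the state space model.

    Assumes R_ssl = 1/(4*C*f_sw), R_fsl = 2*R_on and
    R_eq = sqrt(R_ssl^2 + R_fsl^2); parasitics are ignored.
    """
    c_fly = design["x_c"] * C_UNIT
    r_on = 1.0 / (P1 * design["t_w"])
    r_eq = math.hypot(1.0 / (4.0 * c_fly * f_sw), 2.0 * r_on)
    return (m * v_in - v_out) / r_eq


def sweep_frequency(design, f_sw_range, i_out_spec, model):
    """Return (f_sw, i_out) at the first frequency that meets the spec."""
    for f_sw in f_sw_range:  # inner loop over index k
        i_out = model(design, f_sw)
        if i_out >= i_out_spec:
            return f_sw, i_out
    return None  # f_sw,max reached without meeting the spec: skip the design


f_sw_range = [k * 10e6 for k in range(1, 31)]  # 10 MHz ... 300 MHz
designs = [{"t_w": 650e-6, "x_c": 400}, {"t_w": 650e-6, "x_c": 100}]
results = [sweep_frequency(d, f_sw_range, 20e-3, toy_model) for d in designs]
```

With these assumed values, the first design meets the 20 mA specification at 190 MHz, whereas the second design is skipped because it does not reach the specification within the frequency range. Passing the model as an argument keeps the loop independent of how the electrical performance is evaluated.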

SC converter Pareto optimization procedure (flowchart steps):

1. Set the SC converter operating conditions: \( O = \{ M, V_{in}, V_{out}, I_{out} \} \).
2. Define the design space \( X = \{ x_1, x_2, \ldots, x_m \} \), where \( x_i = [T_{w,i}, X_{C,i}] \), and the switching frequency range \( f_{sw} = [f_{sw,1}, f_{sw,2}, \ldots, f_{sw,max}] \).
3. Load \( x_i \) and extract the device parameters \( D_i = \{ R_{on,i}, C_{iss,i}, C_{oss,i}, C_i, R_{esr,i} \} \).
4. Determine \( \alpha'_i \) from \( \alpha \) and \( C_{oss,i} \).
5. Select \( f_{sw,i} = f_{sw,k} \).
6. Apply the SC converter state space model to obtain \( Y_i = \{ I_{out,i}, P_{out,i}, P_{in,i} \} \).
7. Is \( I_{out,i} \geq I_{out} \)? If yes, extend the results to \( Y_i = \{ Y_i, P_{g,i}, \rho_i, \eta_i \} \). If no and \( f_{sw,i} = f_{sw,max} \), skip design \( x_i \). If no and \( f_{sw,i} < f_{sw,max} \), select the next switching frequency and return to step 5.
8. Is \( i = m \)? If no, set \( i = i + 1 \) and return to step 3. If yes, generate the Pareto front.

**Figure 3.11:** Flowchart of the SC converter Pareto optimization procedure, which is based on the state space model.

Since the transistor gate losses are not included in the state space model, the gate losses for the transistors in the power stage of the $i$'th design, which satisfies \( I_{\text{out}, i} \geq I_{\text{out}} \), are estimated using

\[
P_{g,i} \approx \sum_n C_{\text{in},n,i} V_{\text{gs},n,i}^2 f_{sw,i}, \tag{3.37}
\]

where index \( n \) refers to each transistor in the power stage, and \( C_{\text{in},n,i} \) and \( V_{\text{gs},n,i} \) are the input capacitance and the gate-source voltage of the \( n \)'th transistor, respectively. The efficiency and power density of the \( i \)'th converter therefore become

\[
\eta_i = \frac{P_{\text{out},i}}{P_{\text{in},i} + P_{g,i}}, \tag{3.38}
\]

\[
\rho_i = \frac{P_{\text{out},i}}{A_i} = \frac{P_{\text{out},i}}{N_T A_T T_{w,i} + N_C A_C X_{C,i}}, \tag{3.39}
\]

where \( N_T \) and \( N_C \) are the numbers of transistors and capacitors in the power stage, respectively.
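Equations (3.37) to (3.39) translate directly into a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the input capacitance of 1 pF per transistor and the input power of 18 mW are assumed values, and $N_T = 4$ and $N_C = 1$ are assumed for a 2:1 power stage with four switches and one flying capacitor. The area values and the design ($T_w = 650$ µm, $X_C = 400$, 830 mV, 19.6 mA) are taken from Tab. 3.3 and Tab. 3.4.

```python
def gate_loss(c_in_list, v_gs_list, f_sw):
    """Gate losses following (3.37): sum of C_in * V_gs^2 over all transistors, times f_sw."""
    return sum(c * v * v for c, v in zip(c_in_list, v_gs_list)) * f_sw


def efficiency(p_out, p_in, p_g):
    """Efficiency following (3.38), with gate losses added to the input power."""
    return p_out / (p_in + p_g)


def power_density(p_out, n_t, a_t, t_w, n_c, a_c, x_c):
    """Power density following (3.39); areas in mm^2, t_w in um, result in W/mm^2."""
    return p_out / (n_t * a_t * t_w + n_c * a_c * x_c)


# ASSUMED gate parameters: 1 pF input capacitance and 0.9 V gate drive per transistor.
p_g = gate_loss([1e-12] * 4, [0.9] * 4, 100e6)

p_out = 0.83 * 19.6e-3            # W, output voltage and current from Tab. 3.3 / 3.4
eta = efficiency(p_out, 18e-3, p_g)  # 18 mW input power is an ASSUMED value
rho = power_density(p_out, n_t=4, a_t=0.322e-6, t_w=650.0,
                    n_c=1, a_c=5.129e-6, x_c=400)
```

The resulting power density of about 5.6 W/mm² is not expected to reproduce the value in Tab. 3.4, since (3.39) as written contains no fixed gate driver area and the sketch uses the same width for all four transistors.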

The Pareto front is generated when all designs in the design space have been evaluated. The optimal design can then be selected based on a trade-off between efficiency and power density. The chosen design is thereafter implemented in the Cadence design environment, which features hardware-correlated device models, for fine tuning of the design.
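Generating the Pareto front from the stored designs amounts to discarding every design that is dominated, i.e. for which another design is at least as good in both efficiency and power density. A minimal sketch is given below; the function name and the sample $(\rho, \eta)$ points are hypothetical and do not represent results of this work.

```python
def pareto_front(points):
    """Return the non-dominated (rho, eta) points, maximizing both quantities.

    Points are visited in order of decreasing rho; a point is kept only if
    its eta exceeds that of every point with equal or higher rho.
    """
    front = []
    best_eta = float("-inf")
    for rho, eta in sorted(points, key=lambda p: (-p[0], -p[1])):
        if eta > best_eta:
            front.append((rho, eta))
            best_eta = eta
    return front


# Hypothetical (power density in W/mm^2, efficiency) pairs.
points = [(1.0, 0.88), (3.0, 0.87), (3.0, 0.80), (5.0, 0.86),
          (4.0, 0.84), (10.0, 0.75), (8.0, 0.70)]
front = pareto_front(points)
# front == [(10.0, 0.75), (5.0, 0.86), (3.0, 0.87), (1.0, 0.88)]
```

The sort makes the procedure $O(m \log m)$ in the number of stored designs, which is convenient when the design space contains many combinations of $X_C$ and $T_w$.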

### 3.3 First SC Converter Design

Using the Pareto optimization procedure developed above, this section treats the first SC converter design and its implementation. The first converter is intended to serve as a proof of concept that aims to investigate the efficiency and power density achievable with on-chip SC converters in the 32 nm semiconductor technology. Therefore, the simple 2:1 SC converter shown in Fig. 3.1 is designed, where \( S_1 \) and \( S_3 \) are implemented as PMOS transistors and \( S_2 \) and \( S_4 \) are implemented as NMOS transistors.

The electrical specifications and the parameter design space for the 2:1 SC converter design are listed in Tab. 3.3. The transistor area \( A_T \) includes the last stage of the gate driver buffer (to be discussed in Section 3.3.2) and power grid margin to give a more realistic transistor area estimation than simply the active transistor area. Although $T_w$ represents the width of each transistor in the power stage, PMOS transistors are 15% wider than NMOS transistors following a design rule recommendation, but this is not a strict requirement.

Table 3.3: Electrical specifications and design space for the first SC converter design.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>$M$</td>
<td>1/2</td>
</tr>
<tr>
<td>$V_{in}$</td>
<td>1.8 V</td>
</tr>
<tr>
<td>$V_{out}$</td>
<td>830 mV</td>
</tr>
<tr>
<td>$I_{out}$</td>
<td>20 mA</td>
</tr>
<tr>
<td>$X_C$</td>
<td>$100 \ldots 5000$</td>
</tr>
<tr>
<td>$T_w$</td>
<td>$100 \mu m \ldots 5000 \mu m$</td>
</tr>
<tr>
<td>$f_{sw}$</td>
<td>$10 \text{MHz} \ldots 300 \text{MHz}$</td>
</tr>
<tr>
<td>$A_C$</td>
<td>$5.129 \cdot 10^{-6} \text{mm}^2$</td>
</tr>
<tr>
<td>$A_T$</td>
<td>$0.322 \cdot 10^{-6} \text{mm}^2/\mu m$</td>
</tr>
</tbody>
</table>

Table 3.4: Selected Pareto optimized design for the first SC converter hardware implementation.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>First SC design</th>
</tr>
</thead>
<tbody>
<tr>
<td>$X_C$</td>
<td>400</td>
</tr>
<tr>
<td>$T_w$</td>
<td>$650 \mu m$</td>
</tr>
<tr>
<td>$f_{sw}$</td>
<td>100 MHz</td>
</tr>
<tr>
<td>$I_{out}$</td>
<td>19.6 mA</td>
</tr>
<tr>
<td>$\eta$</td>
<td>86.9%</td>
</tr>
<tr>
<td>$\rho$</td>
<td>5.1 W/mm$^2$</td>
</tr>
</tbody>
</table>

The SC converter Pareto optimization procedure developed in Section 3.2.5 is applied on the 2:1 SC converter. The resulting Pareto front is shown in Fig. 3.12. A design with a reasonable tradeoff between efficiency and power density is selected and marked by the cross on the Pareto front. The design and its performance are listed in Tab. 3.4.

The selected design is implemented in the Cadence design environment for additional testing using hardware-correlated models and for layout of the final converter. The simulated efficiency and power density for the complete converter schematic are shown with a dot in Fig. 3.12. As seen, there is good agreement between the modeled and simulated performances, although the simulated efficiency and power density are slightly lower. Recall from (3.37) that the optimization procedure takes the power losses of charging and discharging the transistor input capacitances into account, but it does not include the gate driver circuits that generate the gate signals. The slightly lower efficiency is attributed to the additional losses of the gate driver, which is discussed in Section 3.3.2. The slightly lower power density is attributed to the fact that the converter area estimation is based on a 100% area utilization of the transistors and capacitors, whereas the final layout has a lower area utilization due to layout constraints. The measured results, which are presented and discussed in Section 3.4, are seen to be in good agreement with the results simulated in Cadence.

---

**Figure 3.12:** Resulting Pareto fronts of the optimization procedure for chip 1. The cross represents the chosen design point, the dot marks the simulated results of the complete converter schematic, and the triangle marks the measurement results.

In Fig. 3.13, the design space, which is used to generate the Pareto front in Fig. 3.12, is investigated in more detail. This investigation provides insight into how the design space parameters, especially $X_C$, $T_w$, and $f_{sw}$, affect the efficiency and power density of the converter. From Fig. 3.13, the following is concluded:

- The flattening of the efficiency at low power densities can be attributed to the decrease in switching frequency, and thereby decreasing switching losses. For low switching frequencies, switching losses are low, and the converter losses are primarily governed by conduction losses, which from (1.4) follow directly from the specified input and output voltages, i.e. independent of switching frequency.

- The increase in switching frequency allows for the highest power density results, but the efficiency bends downwards due to increased switching losses.

- It can be seen that the maximum power density is achieved with the lowest transistor width $T_w$. The minimum transistor width of 520 $\mu$m is higher than the minimum value of 100 $\mu$m from the design space in Tab. 3.3. The designs having $T_w < 520$ $\mu$m are not shown, since they result in lower efficiency designs that are not part of the Pareto front.

- It can be seen that the maximum power density is achieved with the lowest number of unit capacitors $X_C$. The minimum number of capacitors of 210 is higher than the minimum value of 100 from the design space in Tab. 3.3. The designs having $X_C < 210$ do not fulfill the output current specification.

### 3.3.1 Power Stage with Charge Recycling

As discussed in Section 3.2, the parasitic bottom plate capacitor significantly influences both the output current and efficiency of the converter. However, techniques to recycle the charge on the parasitic bottom plate capacitor before it is discharged to ground exist [60, 31]. For this reason, the SC converter power stage, which forms the basis for the first hardware design, features a charge recycling circuit that reduces switching losses associated with the parasitic bottom plate capacitor.

Figure 3.13: Design space investigation showing how the design space parameters $X_C$, $T_w$, and $f_{sw}$ affect the efficiency and power density of the converter.

![Diagram showing the 2:1 SC converter power stage with charge recycling.](image)

**Figure 3.14**: The implemented 2:1 SC converter power stage with charge recycling. The power stage is split into two power stages, SC₁ and SC₂, that enable the implementation of the charge recycling circuit to reduce the switching losses associated with the parasitic bottom plate capacitors, \(C_{bp1}\) and \(C_{bp2}\), shown in gray.

The implementation of the 2:1 SC converter power stage with charge recycling is shown in **Fig. 3.14**. Switches are implemented using thin-oxide transistors and capacitors are implemented using the deep trench capacitor from the 32 nm semiconductor technology. The gate driver, which generates the level-shifted gate signals with deadtime, is discussed in the next subsection.

To design the charge recycling circuit, the power stage is split into two power stages, SC₁ and SC₂, as shown in Fig. 3.14. The two power stages are, as seen by the swapping of the gate signals, interleaved such that SC₁ is in the charging state when SC₂ is in the discharging state, and vice versa. By the end of SC₁’s charging state, \(C_{bp1}\) is charged
to $V_{out}$ and $C_{bp2}$ is discharged. During the following deadtime interval, the charge recycling transistor $S_{cr}$ is turned on, and charge from $C_{bp1}$ is recycled to $C_{bp2}$. When SC$_2$’s charging state (SC$_1$’s discharging state) begins, it will require less energy to charge $C_{bp2}$ to $V_{out}$ and less energy is lost when $C_{bp1}$ is discharged to ground. In the next deadtime interval, which occurs after SC$_2$’s charging state, charge is recycled from $C_{bp2}$ to $C_{bp1}$.
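The benefit of the recycling step can be illustrated with an idealized charge balance. The sketch below assumes equal parasitic capacitances, $C_{bp1} = C_{bp2} = C_{bp}$, and complete charge equalization through $S_{cr}$ within the deadtime interval; both are simplifying assumptions and not results from this work. Charge conservation during the deadtime gives

$$V_{bp1} = V_{bp2} = \frac{C_{bp} V_{out} + C_{bp} \cdot 0}{2 C_{bp}} = \frac{V_{out}}{2},$$

so that the charge required to bring $C_{bp2}$ up to $V_{out}$ in the following state becomes

$$\Delta q_{\text{recycled}} = C_{bp} \left( V_{out} - \frac{V_{out}}{2} \right) = \frac{1}{2} C_{bp} V_{out} \quad \text{instead of} \quad \Delta q = C_{bp} V_{out}.$$

Under these assumptions, at most half of the bottom plate charge is recovered per transition, and incomplete equalization reduces the benefit further.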

Simulations show that the efficiency gain when using the charge recycling circuit in this technology is on the order of 0.5 to 1 percentage points. Although this is not a large efficiency improvement, it requires only a very simple circuit and a small additional chip area.

### 3.3.2 Stacked Voltage Domain Gate Driver

Since the thin-oxide transistors in the 32 nm SOI CMOS technology cannot tolerate a voltage higher than 1.2 V, special care has to be taken to ensure that no transistor is exposed to overvoltage with an input voltage of 1.8 V. Hence, the gate driver, which generates the gate signals for the power transistors $S_{1x} - S_{2x}$ in Fig. 3.14, has to be designed such that no single transistor is exposed to overvoltage.

The gate driver implementation and its output gate signals are shown in Fig. 3.15. The transistor level schematic, which is shown in Fig. 3.15(a), employs a stacked voltage domain, where the upper voltage domain driving $S_{x1}$ and $S_{x2}$ is supplied between $V_{in}$ and $V_{out}$ and the lower voltage domain driving $S_{x3}$ and $S_{x4}$ is supplied between $V_{out}$ and ground (gnd). The input clock $clk_{in}$ is externally supplied in the lower voltage domain. Therefore, a level shifter circuit is implemented to shift the input clock to the upper voltage domain. In each voltage domain, the clock signal is passed through a latch with built-in delay (non-overlapping clock) to generate a deadtime interval between the clock edges to avoid shoot-through currents in the power stage transistors. The delay units, which determine the duration of the deadtime interval, consist of an even number of logic inverters. Tapered buffers are inserted after the deadtime circuits to provide sufficient drive strength to turn on and off the power transistors $S_{1x} - S_{2x}$.

Figure 3.15: The gate driver is implemented using a stacked voltage domain since the 1.8 V input voltage is higher than the 1.2 V maximum allowable blocking voltage of the transistors in the 32 nm technology: (a) gate driver transistor level schematic, which consists of a level shifter and two identical non-overlapping clock circuits (latches) that generate the deadtime interval in each voltage domain; (b) output level-shifted gate signals with deadtime.

The output waveforms of the gate driver are shown in Fig. 3.15(b). In the first hardware design, the deadtime is designed to match the requirements of the charge recycling circuit discussed in the previous subsection. In later hardware designs, which are discussed in the subsequent chapters, the deadtime is minimized to reduce latency.

Special care has to be taken to ensure start-up of the stacked voltage domain gate driver. If the output voltage is not sustained, some transistors may be exposed to the full input voltage. Simulations show that the converter starts up without overvoltage when the load is disconnected. The reason is that the impedances of the power stage in Fig. 3.14 and gate driver in Fig. 3.15 in both voltage domains are virtually identical, thereby providing a close to equal voltage divider. If needed, decoupling the input voltage with stacked capacitors that are tapped at the output voltage, as shown top left in Fig. 3.15, provides extra margin for a safe startup.

### 3.4 First Hardware Results

To investigate the feasibility of on-chip SC converters, the 2:1 SC converter design with charge recycling discussed in Section 3.3.1 is implemented in the 32 nm semiconductor technology that features the high-density deep trench capacitor.

The system overview of the implemented converter directly follows the transistor level schematics of the 2:1 SC converter power stage with charge recycling shown in Fig. 3.14 and the stacked voltage domain gate driver shown in Fig. 3.15.

A chip photo of the on-chip 2:1 SC converter design with magnified layout view is shown in Fig. 3.16. From the converter design listed in Tab. 3.4, the two deep trench capacitors are laid out using \( X_C = 400 \) deep trench capacitor units. The four zones with transistors each contain an NMOS and PMOS transistor pair having a total transistor width of \( T_w = 650 \) \( \mu \)m. Each transistor is laid out with 1300 fingers (small transistor units in parallel), resulting in 0.5 \( \mu \)m width per finger. Furthermore, the converter is laid out in a symmetrical fashion which is compatible with the possibility of interleaving several SC converter units to lower the output voltage ripple and increase output power. The total active converter area, which includes the gate driver and the charge recycling circuit, is 0.00344 mm$^2$. The flying capacitors account for 65%, the power stage transistors and charge recycling for 26%, and the gate driver for 9% of the total converter area.

### 3.4.1 Measured Efficiency and Power Density

Measurements are carried out using GBB PicoProbe needles on the unpackaged chip die mounted on a probe station. An overview of the general measurement setup used for this chip is depicted in Fig. 3.17. The input current is measured using Keithley 2400 series Sourcemeters, and the on-chip input and output voltages are measured on Kelvin contacts using an Agilent 34970 Data Acquisition/Switch Unit. For this design, various discrete resistors, which are attached external to the chip, act as loads for the converter. The output current is obtained as the voltage measured across the resistor load divided by the load resistance.
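The post-processing of a single measurement point can be sketched as follows. The function names are hypothetical; the numerical values correspond to one operating point reported in Tab. 3.5 ($V_{out} = 721$ mV across $R_{load} = 11.1$ Ω) and to the total active converter area of 0.00344 mm².

```python
AREA_MM2 = 0.00344  # total active converter area in mm^2


def output_current(v_out, r_load):
    """Output current as the measured load voltage divided by the load resistance."""
    return v_out / r_load


def output_power(v_out, r_load):
    """Output power dissipated in the resistive load."""
    return v_out * v_out / r_load


def power_density(p_out, area_mm2=AREA_MM2):
    """Power density in W/mm^2 for the given converter area."""
    return p_out / area_mm2


v_out, r_load = 0.721, 11.1             # V and ohm
i_out = output_current(v_out, r_load)   # about 65 mA
p_out = output_power(v_out, r_load)     # about 46.8 mW
rho = power_density(p_out)              # about 13.6 W/mm^2
```

The efficiency additionally requires the measured input current and input voltage, which are not part of this sketch.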

Additional input decoupling capacitance $C_{in}$ is added to compensate for the effects of the cables connecting the measurement equipment to the DUT. Since there is no on-chip output decoupling capacitor in this design, the output decoupling is also connected external to the chip, and a larger than required capacitance of $C_{out} = 33 \text{nF}$ is added to the output to ensure a negligible output voltage ripple. This enables a good characterization of the on-chip SC converter performance. However, it should be noted that the required output decoupling capacitance can be drastically lowered, or even omitted, by employing interleaving as discussed in Section 4.2. For this reason, the output decoupling capacitor is excluded in the power density estimations.

---

**Figure 3.16:** Chip photo with magnified layout view of the implemented 2:1 SC converter. The input clock and ground pads are not shown. The total active converter area is 0.00344 mm$^2$.

---

Figure 3.17: Overview diagram of the measurement setup with Kelvin contacts. The output load consists of discrete load resistors as well as output decoupling. This setup is used for chip 1.

In Fig. 3.18, the measured efficiency and power density over output voltage at $f_{sw} = 100$ MHz are shown. The maximum efficiency is 86% at $4.6 \text{ W/mm}^2$ power density. Operating the converter at voltages below the maximum efficiency point (830 mV) results in a linearly decreasing efficiency, but at the same time an increase in power density. These characteristics are in agreement with the modeling results discussed in Section 3.2.3. Comparing these efficiency and power density results with the estimated performance from the Pareto optimization shown in Fig. 3.12, there is a very good match between the maximum efficiency and power density.

In Fig. 3.19, the corresponding efficiency and power density for each measurement point acquired are mapped to the $\eta - \rho$ plane. Each point therefore represents a different output voltage, switching frequency, and load resistance value, and all points for a given input voltage illustrate the entire performance landscape of the converter. The performance landscapes for three different input voltages are shown in gray scale. The detailed performances of the maximum efficiency and maximum power density measurement points are listed in Tab. 3.5. As can be seen, the highest efficiency results (points 1 – 3) all have $f_{\text{sw}} = 71$ MHz and $R_{\text{load}} = 83.3 \Omega$. This indicates that the maximum efficiency is a particular operating point with a fixed input/output voltage conversion ratio, which equals 2.1:1 for this design. The highest power density results (points 4 – 6) all have very high switching frequency, but relatively low efficiency.

Figure 3.18: Efficiency and power density over output voltage measured with the annotated load resistances. The switching frequency is 100 MHz and the input voltage is $V_{\text{in}} = 1.8$ V.

Fig. 3.19 serves to illustrate the trade-off between efficiency and power density. For instance, for $V_{\text{in}} = 1.8$ V, power densities of more than 10 W/mm² can be achieved at efficiencies below 75%, whereas efficiencies above 85% can be achieved at power densities below 5 W/mm². The maximum efficiency is observed to be independent of input voltage, since the output voltage at the maximum efficiency point changes as well. This leads to a constant ratio $V_{\text{out}}/V_{\text{in}}$, which from (1.4) results in a constant efficiency. However, the maximum power density that can be achieved increases with input voltage since the higher output voltage over the fixed load resistor results in a higher output power for the same converter area. The decrease in efficiency at low power density levels is a result of limited parameter ranges of the measurement setup.

![Measured performance landscape for chip 1 at input voltages of 1.6 V, 1.8 V, and 2.0 V, with minimum efficiency curves for cooling power densities of 1 W/mm², 5 W/mm², and 10 W/mm²](image)

**Figure 3.19:** Envelope of the highest efficiency per power density measured with different input voltages. The minimum efficiencies required to fulfill cooling requirements are superimposed for three different cooling power densities, illustrating that cooling must be taken into account for high power density on-chip SC converters.

**Table 3.5:** Details of the highest efficiency and highest power density measurement results.

<table>
<thead>
<tr>
<th>Point #</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
</tr>
</thead>
<tbody>
<tr>
<td>$V_{in}$ [V]</td>
<td>1.6</td>
<td>1.8</td>
<td>2.0</td>
<td>1.6</td>
<td>1.8</td>
<td>2.0</td>
</tr>
<tr>
<td>$V_{out}$ [mV]</td>
<td>759</td>
<td>857</td>
<td>954</td>
<td>616</td>
<td>721</td>
<td>820</td>
</tr>
<tr>
<td>$f_{sw}$ [MHz]</td>
<td>71</td>
<td>71</td>
<td>71</td>
<td>167</td>
<td>200</td>
<td>200</td>
</tr>
<tr>
<td>$R_{load}$ [Ω]</td>
<td>83.3</td>
<td>83.3</td>
<td>83.3</td>
<td>11.1</td>
<td>11.1</td>
<td>11.1</td>
</tr>
<tr>
<td>$\eta$ [%]</td>
<td>88.4</td>
<td>87.6</td>
<td>88.0</td>
<td>71.3</td>
<td>73.3</td>
<td>75.1</td>
</tr>
<tr>
<td>$\rho$ [W/mm²]</td>
<td>2.0</td>
<td>3.0</td>
<td>3.2</td>
<td>9.9</td>
<td>13.6</td>
<td>17.6</td>
</tr>
</tbody>
</table>

Cooling requirements may become an issue for very high power density designs since also the power loss density will be high, especially when operated in high power density regions of the performance landscape where efficiency is low. With a cooling power density of $\rho_{\text{cool}} = P_{\text{loss}}/A$ that can be effectively cooled by the chosen cooling technology, the minimum converter efficiency $\eta_{\text{min}}$ can be expressed as

$$\eta_{\text{min}} = \frac{P_{\text{out}}}{P_{\text{out}} + P_{\text{loss}}} = \frac{P_{\text{out}}/A}{P_{\text{out}}/A + P_{\text{loss}}/A} = \frac{\rho}{\rho + \rho_{\text{cool}}},$$

where $P_{\text{loss}}$ is the total power loss and $A$ is the area of the converter.
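As a short numerical illustration of the minimum efficiency expression, the sketch below evaluates $\eta_{\text{min}}$ at a power density of 13.6 W/mm² (the highest value listed in Tab. 3.5 for $V_{in} = 1.8$ V) for the three cooling power densities considered in Fig. 3.19. The function name is hypothetical.

```python
def eta_min(rho, rho_cool):
    """Minimum efficiency so that the loss density does not exceed rho_cool.

    Both arguments are power densities in W/mm^2.
    """
    return rho / (rho + rho_cool)


rho = 13.6  # W/mm^2
limits = {rho_cool: eta_min(rho, rho_cool) for rho_cool in (1.0, 5.0, 10.0)}
# limits[1.0]  is about 0.932
# limits[5.0]  is about 0.731
# limits[10.0] is about 0.576
```

For a cooling power density of 5 W/mm², the required minimum efficiency of about 73.1% lies just below the 73.3% listed for this operating point in Tab. 3.5.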

The minimum efficiency requirements for three different cooling power densities are also shown in Fig. 3.19. For $V_{\text{in}} = 1.8$ V, it is seen that a cooling power density of at least 5 W/mm$^2$ is required to operate the converter in the entire operating region of interest. If the cooling power density is below 5 W/mm$^2$, the converter’s allowed operating region must be limited accordingly. For this design in particular, cooling is not an issue due to the relatively low maximum output power of 42 mW. From [87], a cooling power density of more than 7 W/mm$^2$ can be achieved using ultrathin manifold microchannel heat sinks. Hence, cooling of the on-chip SC converter for higher output powers is considered feasible using this or similarly performing cooling technologies.

### 3.5 Summary

Existing model frameworks for SC converters do not take the effect of the parasitic bottom plate capacitance $C_{bp}$ into account. Therefore, a state space model framework that accurately accounts for the switching losses associated with the parasitic bottom plate capacitance is presented. The model framework can be applied to any SC converter topology. The state space model framework is verified with Matlab Simulink simulations. As Figs. 3.6 and 3.7 show, $C_{bp}$ directly impacts both the operation and efficiency of the converter.

A Pareto optimization procedure for SC converters is developed based on the state space model framework. The Pareto optimization procedure uses extracted device parameters from hardware-correlated models in the design space. A model of the 2:1 SC converter is developed and evaluated using the Pareto optimization procedure to select the optimum parameters for the first SC converter design.

With the availability of high-density deep trench capacitors, a 2:1 SC converter with charge recycling is implemented. A stacked voltage domain gate driver is designed to support the 1.8 V input voltage since the maximum allowable blocking voltage of the thin-oxide transistors in the 32 nm SOI CMOS technology is 1.2 V. As shown in Fig. 3.18, a maximum efficiency of 86% at 4.6 W/mm² power density is measured, thereby proving the feasibility of on-chip SC converters for granular microprocessor power delivery.

The two key learnings from this chapter are:

► Understanding and characterization of the parasitic bottom plate capacitor’s influence on the converter operation and efficiency. The effects are accurately captured by the developed state space model framework for any SC converter topology.

► Experimental results highlight the feasibility of high efficiency and high power density on-chip SC converter designs.

Based on these promising results, on-chip SC converters are investigated further in the subsequent chapters of this thesis. The next chapter details the design, implementation, and experimental results of a complete on-chip switched capacitor voltage regulator.
In the literature, SC converters are often perceived as 1) having low efficiency, 2) being characterized by narrow input and/or output voltage ranges, 3) being difficult to regulate, and 4) being limited to output powers below 100 mW. According to the previous chapter, SC converters can be integrated with high efficiency and high power density. The next step, which is the focus of this chapter, is to design a complete switched capacitor voltage regulator (SCVR) that is suitable for granular microprocessor power delivery and per-core regulation over a wide output voltage range. Furthermore, this chapter suggests that all limitations frequently mentioned in the literature and listed above are unjustified.

A wide output voltage range is required to efficiently support per-core regulation with DVFS. Section 4.1 presents the 2:1 and 3:2 reconfigurable SC converter that provides a wide output voltage range at a fixed input supply voltage. The concept of interleaving is discussed in Section 4.2. Interleaving provides a simple approach to simultaneously reduce the output voltage and input current ripples of the converter. In Section 4.3, the single bound hysteretic control scheme and its implementation are presented. The digital control scheme allows for sub-nanosecond response time to transient load changes. Section 4.4 details the converter model and corresponding Pareto optimization procedure that is used to select the second SC converter design. The complete SCVR design features 16 interleaved phases of the reconfigurable power stage, and they are controlled by the single bound hysteretic control scheme. The experimental results of the implemented converter are presented in Section 4.5.

This chapter is based on the publications [5] and [8].

### 4.1 Reconfigurable SC Converters

Since the conversion ratio of a SC converter is fixed by the topology, SC converters are often perceived as being fixed input/output voltage ratio converters. However, this is not the complete picture, as the output voltage can be operated below the voltage resulting from the conversion ratio. For instance, as verified in Fig. 3.18, a 2:1 SC converter supports output voltages below half the input voltage. However, for a microprocessor application using DVFS, the required 0.7 V – 1.1 V output voltage range exceeds the range supported by a 2:1 SC converter at a fixed 1.8 V input voltage. Instead, a 3:2 conversion ratio SC converter, which supports output voltages below two-thirds of the input voltage, may be more suitable. Recall from (1.4) and Fig. 3.6 that the efficiency of an ideal SC converter decreases linearly with the output voltage. It is therefore undesirable to operate SC converters at output voltages far below the conversion ratio voltage. It is for example inefficient to operate a 3:2 SC converter below 900 mV when the input voltage is 1.8 V. Typically, this limitation is overcome by using reconfigurable (gearbox) power stages that can switch between multiple voltage conversion ratios in order to increase the input/output voltage range for which the converter operates efficiently [23, 29, 60, 88, 61, 89]. For instance, a SC converter that can be reconfigured between a 2:1 and a 3:2 configuration can, with 1.8 V input voltage, efficiently support the entire output voltage range of 0.7 V – 1.1 V required by DVFS. This particular power stage is discussed in more detail in the following.
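
The gearbox idea can be illustrated with a short Python sketch. It assumes that the ideal efficiency takes the form $\eta = V_{\text{out}}/(M V_{\text{in}})$ and that the gear is chosen as the smallest conversion ratio $M$ whose ideal output voltage $M V_{\text{in}}$ still lies above the requested output voltage. The selection rule, the function names, and the neglect of any regulation headroom are simplifying assumptions and not the implemented gear control.

```python
GEARS = {"2:1": 1 / 2, "3:2": 2 / 3}  # conversion ratios M of the two gears


def ideal_efficiency(v_out, v_in, m):
    """Ideal SC converter efficiency, assumed as eta = V_out / (M * V_in)."""
    v_ideal = m * v_in
    if not 0 < v_out <= v_ideal:
        raise ValueError("v_out must lie in (0, M * V_in]")
    return v_out / v_ideal


def select_gear(v_out, v_in):
    """Pick the gear with the smallest M that satisfies V_out < M * V_in."""
    feasible = [(m, name) for name, m in GEARS.items() if v_out < m * v_in]
    if not feasible:
        raise ValueError("no gear supports this output voltage")
    m, name = min(feasible)
    return name, m


if __name__ == "__main__":
    v_in = 1.8
    for v_out in (0.70, 0.85, 0.95, 1.10):
        name, m = select_gear(v_out, v_in)
        eta = ideal_efficiency(v_out, v_in, m)
        print(f"V_out = {v_out:.2f} V -> {name} gear, ideal efficiency {eta:.1%}")
```

With a 1.8 V input, output voltages below 900 mV map to the 2:1 gear and higher output voltages to the 3:2 gear, which keeps the ideal efficiency well above what a fixed 3:2 converter would reach at 0.7 V.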

### 4.1.1 2:1 and 3:2 Reconfigurable Power Stage

The 2:1 and 3:2 reconfigurable power stage is shown in Fig. 4.1. It consists of two flying capacitors and nine transistors that are operated as switches [23, 29]. In each configuration, the two flying capacitors are sequentially switched between a charging and a discharging state at 50% duty cycle. In the 2:1 configuration shown in Fig. 4.1(a), the switch connecting the two flying capacitors is always off, and the power stage reduces to two 2:1 converters operated in parallel. In the charging state, the two paralleled flying capacitors are in series with the input and the output nodes. In the discharging state, the two paralleled flying capacitors are in parallel with the output node, and the input node is unconnected. In the 3:2 configuration shown in Fig. 4.1(b), the parallel connection of the two flying capacitors is in series between the input and the output nodes in the charging state. In the discharging state, the series connection of the two flying capacitors is in parallel with the output node, and the input node is unconnected.

### 4.1.2 3:2 SC Converter Model

The model of the 3:2 SC converter is derived in a similar fashion as the model for the 2:1 SC converter in Section 3.2.2. The derivation is given in the following, and the resulting model is used in Section 4.4 in a Pareto analysis for the second SC converter design.

The 3:2 SC converter from Fig. 4.1(b) has the equivalent circuits in the charging and the discharging states as shown in Fig. 4.2. In the equivalent circuits, the switches are replaced by their on-state resistances \( R_{on1-9} \) when on and an open circuit when off. Note that switches \( S_4 \) and \( S_7 \) are off in this configuration. The flying capacitor model includes its equivalent series resistance \( R_{esr} \) and the bottom plate capacitor \( C_{bp} \) connected to ground. The input and output nodes are modeled as ideal dc voltage sources.

The application of KVL and KCL put into the form of (3.13) yields the system matrices

\[
\begin{aligned}
C &= \begin{pmatrix} C_1 & 0 & 0 & 0 \\
0 & C_2 & 0 & 0 \\
0 & 0 & C_{bp1} & 0 \\
0 & 0 & 0 & C_{bp2} \end{pmatrix}, \\
i &= \begin{pmatrix} i_{C1} \\
i_{C2} \\
i_{Cbp1} \\
i_{Cbp2} \end{pmatrix}, \\
v &= \begin{pmatrix} v_{C1} \\
v_{C2} \\
v_{Cbp1} \\
v_{Cbp2} \end{pmatrix}, \\
u &= \begin{pmatrix} V_{in} \\
V_{out} \end{pmatrix},
\end{aligned}
\]

\[
E_1 = \begin{pmatrix}
R_{on1} + R_{esr1} & 0 & 0 & 0 \\
0 & R_{on6} + R_{esr2} & 0 & 0 \\
R_{on3} & 0 & -R_{on3} & 0 \\
0 & R_{on8} & 0 & -R_{on8} 
\end{pmatrix},
\]

\[
E_2 = \begin{pmatrix}
R_{on2} + R_{esr1} & 0 & 0 & 0 \\
0 & R_{on5} + R_{esr2} & 0 & 0 \\
1 & -1 & -1 & 0 \\
0 & R_{on9} & 0 & -R_{on9} 
\end{pmatrix},
\]

Figure 4.2: The 3:2 SC converter equivalent circuits in (a) the charging state and (b) the discharging state include the switch on-state resistances \( R_{on1-9} \), the capacitor equivalent series resistances \( R_{esr1-2} \), and the parasitic bottom plate capacitors \( C_{bp1-2} \).

\[
F_1 = \begin{pmatrix}
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 0 & -1 & 0 \\
0 & 0 & 0 & -1
\end{pmatrix},
\]
\[
F_2 = \begin{pmatrix}
1 & 0 & 1 & 0 \\
0 & 1 & -1 & 1 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & -1
\end{pmatrix},
\]
\[
G_1 = \begin{pmatrix}
-1 & 0 \\
-1 & 0 \\
0 & 1 \\
0 & 1
\end{pmatrix},
\]
\[
G_2 = \begin{pmatrix}
0 & -1 \\
0 & 0 \\
0 & 0 \\
0 & 0
\end{pmatrix}.
\]

Now the state space model derived above can be applied to calculate the capacitor charges in (3.19) and (3.20). From Fig. 4.2, the output and input charge in each state can be found as

\[
q_{\text{out}1} = q_{C1} - q_{Cbp1} + q_{C2} - q_{Cbp2}, \quad (4.1)
\]
\[
q_{\text{in}1} = q_{C1} + q_{C2}, \quad (4.2)
\]
\[
q_{\text{out}2} = q_{C1}, \quad (4.3)
\]
\[
q_{\text{in}2} = 0, \quad (4.4)
\]

and the total average output current over a full switching period becomes

\[
I_{\text{out}} = \frac{q_{\text{out}1} + q_{\text{out}2}}{t_1 + t_2} = \left(2q_{C1} + q_{C2} - q_{Cbp1} - q_{Cbp2}\right) f_{\text{sw}}. \quad (4.5)
\]

Likewise, the total average input current is

\[
I_{\text{in}} = \frac{q_{\text{in}1} + q_{\text{in}2}}{t_1 + t_2} = \left(q_{C1} + q_{C2}\right) f_{\text{sw}}. \quad (4.6)
\]

Using (4.5) and (4.6), the total efficiency of the 3:2 SC converter can be calculated as

\[
\eta = \frac{P_{\text{out}}}{P_{\text{in}}} = \frac{V_{\text{out}} I_{\text{out}}}{V_{\text{in}} I_{\text{in}}} = \frac{V_{\text{out}}}{V_{\text{in}}} \frac{2q_{C1} + q_{C2} - q_{Cbp1} - q_{Cbp2}}{q_{C1} + q_{C2}}. \quad (4.7)
\]

As can be seen, if \(C_{bp1}\) and \(C_{bp2}\) are neglected, then \(q_{Cbp1} = q_{Cbp2} = 0\) and \(q_{C1} = q_{C2}\), and the efficiency in (4.7) reduces to the ideal SC converter efficiency in (1.4) for \(M = 2/3\).
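
The charge balance in (4.5) to (4.7) is straightforward to evaluate numerically once the capacitor charges are known. The following Python sketch takes the charges as given inputs (in the thesis they follow from the state space model) and checks the ideal limit stated above. The function names and the numerical charge values are illustrative assumptions.

```python
def currents_3to2(q_c1, q_c2, q_cbp1, q_cbp2, f_sw):
    """Average input and output currents of the 3:2 converter, per (4.5) and (4.6)."""
    i_out = (2 * q_c1 + q_c2 - q_cbp1 - q_cbp2) * f_sw
    i_in = (q_c1 + q_c2) * f_sw
    return i_in, i_out


def efficiency_3to2(v_in, v_out, q_c1, q_c2, q_cbp1, q_cbp2):
    """Efficiency per (4.7); the switching frequency cancels in the ratio."""
    i_in, i_out = currents_3to2(q_c1, q_c2, q_cbp1, q_cbp2, f_sw=1.0)
    return (v_out / v_in) * (i_out / i_in)


if __name__ == "__main__":
    q = 100e-12  # assumed flying capacitor charge per state in coulombs
    ideal = efficiency_3to2(1.8, 1.1, q, q, 0.0, 0.0)
    lossy = efficiency_3to2(1.8, 1.1, q, q, 2e-12, 2e-12)
    print(f"ideal: {ideal:.4f}, with bottom plate charge: {lossy:.4f}")
```

With zero bottom plate charge and equal flying capacitor charges, the result equals $V_{\text{out}}/(\tfrac{2}{3} V_{\text{in}})$, and any bottom plate charge lowers it.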

To port this analysis to the equivalent circuit model from Fig. 3.4, the resistances can be determined as

\[ R_{\text{eq}} = \frac{M V_{\text{in}} - V_{\text{out}}}{I_{\text{out}}} = \frac{\frac{2}{3} V_{\text{in}} - V_{\text{out}}}{\left(2q_{C1} + q_{C2} - q_{Cbp1} - q_{Cbp2}\right) f_{\text{sw}}}, \quad (4.8) \]

\[ R_{bp} = \frac{M V_{\text{in}}}{\frac{1}{M} I_{\text{in}} - I_{\text{out}}} = \frac{\frac{2}{3} V_{\text{in}}}{\left(\frac{q_{C2} - q_{C1}}{2} + q_{Cbp1} + q_{Cbp2}\right) f_{\text{sw}}}, \quad (4.9) \]

where \( M = 2/3 \) is the voltage conversion ratio.
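
A small Python sketch of (4.8) and (4.9) is given below. The charges are again treated as known inputs, and the numerical values in the example are assumptions chosen only to produce readable numbers. When the denominator of (4.9) vanishes, which is the case without bottom plate charge and with equal flying capacitor charges, the bottom plate resistance is returned as infinite.

```python
import math

M_3TO2 = 2 / 3  # voltage conversion ratio of the 3:2 configuration


def equivalent_resistances(v_in, v_out, q_c1, q_c2, q_cbp1, q_cbp2, f_sw):
    """Return (R_eq, R_bp) of the equivalent circuit model per (4.8) and (4.9)."""
    i_out = (2 * q_c1 + q_c2 - q_cbp1 - q_cbp2) * f_sw
    # Current through R_bp, written in the closed form of the (4.9) denominator
    i_bp = ((q_c2 - q_c1) / 2 + q_cbp1 + q_cbp2) * f_sw
    r_eq = (M_3TO2 * v_in - v_out) / i_out
    r_bp = math.inf if i_bp == 0 else M_3TO2 * v_in / i_bp
    return r_eq, r_bp


if __name__ == "__main__":
    # Assumed example values: 100 pC per flying capacitor, 1 pC per bottom plate
    r_eq, r_bp = equivalent_resistances(1.8, 1.1, 100e-12, 100e-12,
                                        1e-12, 1e-12, 100e6)
    print(f"R_eq = {r_eq:.3f} ohm, R_bp = {r_bp:.0f} ohm")
```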

### 4.1.3 Stacked Transistor Implementation

For the implementation of the reconfigurable power stage in the 32 nm SOI CMOS technology, the switches are implemented using thin-oxide transistors and the capacitors are implemented using deep trench capacitors. As discussed in Section 1.5.1, a thin-oxide transistor from this technology experiences an overvoltage situation if any pair of transistor terminals exceeds \( V_{\text{max}} = 1.2 \text{ V} \). The reconfigurable SC converter power stage in Fig. 4.1 inherently exposes transistors \( S_5 \) and \( S_6 \) to overvoltage, and transistor \( S_7 \) inherently exhibits undesired turn on behavior. The following discusses how these undesired circuit behaviors are overcome.

The transistor level schematic of the reconfigurable SC converter power stage is shown in **Fig. 4.3**. As seen, the power stage includes three stacked transistors \( S_{5s}, S_{6s}, \) and \( S_{7s} \). The purpose of \( S_{5s} \) and \( S_{6s} \) is to avoid overvoltage situations inherent in the power stage. Regarding \( S_{7s} \), it is important to note that the transistors in the 32 nm semiconductor technology are layout symmetrical, meaning that there is no physical difference between the drain and source terminals. This implies that a transistor can be turned on with zero gate-source voltage if the gate-drain voltage is above the threshold voltage. The purpose of \( S_{7s} \) is to prevent undesired turn on due to layout symmetry. The following discusses the function and operation of each stacked transistor in more detail.

**Stacked Transistor \( S_{5s} \)**

From Fig. 4.1, it is clear that \( S_5 \) in Fig. 4.3 should always be off in the 2:1 configuration. However, simply grounding its gate to turn it off leads to an overvoltage situation since the voltage at node $V_2$ is $V_{in}$ in the charging phase of the 2:1 configuration, thereby resulting in an unacceptably high gate-source voltage of $-V_{in}$ for $S_5$. Stacked transistor $S_{5s}$, which has its gate tied to $V_{out}$, results in a tolerable gate-source voltage of $V_{out} - V_{in}$. However, $S_5$ is still needed due to layout symmetry: without $S_5$, the gate-drain voltage of $S_{5s}$ would equal $V_{out}$ in the discharging state of the 2:1 configuration because $V_1 = 0$ V. This would cause $S_{5s}$ to turn on undesirably. For these reasons, both $S_5$ and $S_{5s}$ are needed, and together they prevent the overvoltage situation and ensure the desired turn off in the 2:1 configuration.

Figure 4.3: The 2:1 and 3:2 reconfigurable SC converter power stage implementation including stacked transistors $S_{5s}$ and $S_{6s}$ that protect against overvoltage situations and $S_{7s}$ that prevents undesired turn on of $S_7$.

**Stacked Transistor $S_{6s}$**

From Fig. 4.3, it is seen that the voltage at node $V_2$ can get as low as $V_{\text{out,min,3:2}}/2 = 900\text{ mV}/2 = 450\text{ mV}$ in the 3:2 configuration.$^1$ This low voltage imposes an overvoltage situation of $V_{\text{in}} - 450\text{ mV} = 1.35\text{ V}$ on the drain-source and gate-drain terminals of $S_6$. For the gate-drain overvoltage, the stacked transistor $S_{6s}$ effectively overcomes this overvoltage situation in a similar manner as for $S_{5s}$ discussed above. With the gate of $S_{6s}$ tied to $V_{\text{out}}$, the desired switching in both configurations is solely determined by the switching of $S_6$. For the drain-source overvoltage, the stacked transistor implementation shares the voltage between the two transistors, thereby eliminating the overvoltage situation.
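
The overvoltage arithmetic for $S_6$ can be reproduced with a few lines of Python. This is only a bookkeeping sketch of the numbers quoted above: the function names are illustrative, and the check compares a single terminal-pair voltage against the 1.2 V limit without modeling the transistors.

```python
V_MAX = 1.2  # V, thin-oxide limit between any pair of transistor terminals


def s6_unstacked_stress(v_in, v_out_min_3to2):
    """Worst-case voltage across an unstacked S6 in the 3:2 configuration.

    Node V2 can drop to half the minimum 3:2 output voltage, so the stress
    equals V_in - V_out_min / 2.
    """
    v2_min = v_out_min_3to2 / 2
    return v_in - v2_min


def exceeds_limit(voltage, v_max=V_MAX):
    """True if the magnitude of a terminal-pair voltage violates the limit."""
    return abs(voltage) > v_max


if __name__ == "__main__":
    stress = s6_unstacked_stress(v_in=1.8, v_out_min_3to2=0.9)
    print(f"stress = {stress:.2f} V, overvoltage: {exceeds_limit(stress)}")
```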

**Stacked Transistor $S_{7s}$**

Stacked transistor $S_{7s}$ is inserted to prevent an undesired turn on of $S_7$ in the 3:2 configuration, where, according to Fig. 4.3, it should always be turned off. As before, the voltage at $V_2$ can be as low as 450 mV, which, with $v_{g7}$ tied to $V_{\text{out}}$, results in a positive gate-drain voltage of $S_7$, causing it to undesirably turn on due to layout symmetry. Stacked transistor $S_{7s}$ effectively ensures that $S_7$ does not turn on in the 3:2 configuration, while at the same time ensuring the transistor turns on as desired in the 2:1 configuration.

### 4.1.4 Gate Driver

The gate driver for the reconfigurable SC converter power stage is basically the same stacked voltage domain gate driver as discussed in Section 3.3.2. Tab. 4.1 shows how the gate driver output waveforms from Fig. 3.15(b) are shared among the transistors in the reconfigurable power stage. For transistors $S_4$, $S_5$, $S_7$, and $S_{7s}$, which have different gate signals between configurations, logic multiplexers controlled by an external gear signal are added to the gate driver circuit. Following this gate signal distribution, the transistors in the reconfigurable power stage switch as desired without overvoltage, and the desired turn on and turn off states of all transistors are ensured.

---

$^1$In principle, the output voltage can be lower than 900 mV in the 3:2 configuration. This would result in an even lower voltage at $V_2$, making the need for $S_{6s}$ more pronounced. However, this design is intended to be operated in the 2:1 configuration for $V_{\text{out}} < 900\text{ mV}$.

---

Table 4.1: Gate signals for all transistors in the 2:1 and 3:2 reconfigurable SC converter power stage.

<table>
<thead>
<tr>
<th>Gate signal</th>
<th>2:1 configuration</th>
<th>3:2 configuration</th>
</tr>
</thead>
<tbody>
<tr>
<td>$v_{g1}$</td>
<td>$v_{g,pH}$</td>
<td>same as 2:1</td>
</tr>
<tr>
<td>$v_{g2}$</td>
<td>$v_{g,nH}$</td>
<td>same as 2:1</td>
</tr>
<tr>
<td>$v_{g3}$</td>
<td>$v_{g,pL}$</td>
<td>same as 2:1</td>
</tr>
<tr>
<td>$v_{g4}$</td>
<td>$v_{g,pH}$</td>
<td>gnd</td>
</tr>
<tr>
<td>$v_{g5}$</td>
<td>gnd</td>
<td>$V_{out}$</td>
</tr>
<tr>
<td>$v_{g5s}$</td>
<td>$V_{out}$</td>
<td>same as 2:1</td>
</tr>
<tr>
<td>$v_{g6}$</td>
<td>$v_{g,pH}$</td>
<td>same as 2:1</td>
</tr>
<tr>
<td>$v_{g6s}$</td>
<td>$V_{out}$</td>
<td>same as 2:1</td>
</tr>
<tr>
<td>$v_{g7}$</td>
<td>$v_{g,nH}$</td>
<td>$V_{out}$</td>
</tr>
<tr>
<td>$v_{g7s}$</td>
<td>gnd</td>
<td>$V_{out}$</td>
</tr>
<tr>
<td>$v_{g8}$</td>
<td>$v_{g,pL}$</td>
<td>same as 2:1</td>
</tr>
<tr>
<td>$v_{g9}$</td>
<td>$v_{g,nL}$</td>
<td>same as 2:1</td>
</tr>
</tbody>
</table>
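
The gate signal assignment of Tab. 4.1 can be captured as a simple lookup, which also makes it easy to see which gates need a multiplexer. The following Python sketch is a behavioral illustration only: the dictionary keys, the string encoding of the gear, and the function names are assumptions, while the signal assignments are copied from the table.

```python
# Gate signal per transistor gate: (2:1 configuration, 3:2 configuration)
GATE_SIGNALS = {
    "vg1": ("vg_pH", "vg_pH"),
    "vg2": ("vg_nH", "vg_nH"),
    "vg3": ("vg_pL", "vg_pL"),
    "vg4": ("vg_pH", "gnd"),
    "vg5": ("gnd", "Vout"),
    "vg5s": ("Vout", "Vout"),
    "vg6": ("vg_pH", "vg_pH"),
    "vg6s": ("Vout", "Vout"),
    "vg7": ("vg_nH", "Vout"),
    "vg7s": ("gnd", "Vout"),
    "vg8": ("vg_pL", "vg_pL"),
    "vg9": ("vg_nL", "vg_nL"),
}


def gate_signal(gate, gear):
    """Return the signal driving `gate` for gear '2:1' or '3:2'."""
    column = {"2:1": 0, "3:2": 1}[gear]
    return GATE_SIGNALS[gate][column]


def muxed_gates():
    """Gates whose signal differs between gears and therefore need a multiplexer."""
    return [gate for gate, (s21, s32) in GATE_SIGNALS.items() if s21 != s32]


if __name__ == "__main__":
    print("multiplexed gates:", muxed_gates())
```

Only the gates of $S_4$, $S_5$, $S_7$, and $S_{7s}$ differ between the two columns, in agreement with the multiplexers described above.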

### 4.2 Interleaving

Interleaving is a popular technique employed in SCVRs [29, 23, 90, 25, 88, 60, 16]. Instead of implementing only a single SC converter unit, the SC converter is divided into several smaller units having the input clock signals phase shifted with respect to each other.

Fig. 4.4 shows how employing interleaving is a means to simultaneously reduce the input current ripple and the output voltage ripple without excessive input and output decoupling capacitors. Although the input current and output voltage ripple decays are shown to be linear, they are strictly exponential waveforms following Fig. 3.2.

**Figure 4.4:** SC converter implementation (a) without interleaving and (b) with interleaving. Interleaving of SC converter units is a means to simultaneously reduce the input current ripple and the output voltage ripple, thereby greatly reducing (or even omitting) the need for input and output decoupling capacitors.

With a single-phase implementation as shown in Fig. 4.4(a), the output voltage and input current ripples typically require substantial decoupling capacitance to meet ripple specifications. This directly translates into occupying significant chip area with capacitors. With an $N$-phase interleaving implementation as shown in Fig. 4.4(b), whenever a SC converter phase changes state from charging to discharging or vice versa, the flying capacitors of the remaining $N - 1$ phases effectively act as decoupling to that switching event. Interleaving is therefore a technique to utilize the capacitances of the flying capacitors for decoupling. Hence, the input and output decoupling capacitors required to reduce the steady state ripples can be greatly reduced or even omitted, thereby saving precious chip area.

The drawback of interleaving is the slightly more complex clock generation required. However, compared to the major ripple reductions attained, interleaving proves a beneficial design technique for an on-chip SC converter design. Furthermore, interleaving works very well with the single-bound hysteretic control scheme discussed next.

### 4.3 Single Bound Hysteretic Control

A single bound hysteretic control scheme is implemented as the overall control loop [60, 91]. This control scheme is advantageous due to 1) its simple implementation, 2) its inherently stable operation, and 3) its high control bandwidth. This control scheme modulates the switching frequency to regulate the SC converters’ equivalent output resistance $R_{eq}$ to achieve the desired output voltage, see Fig. 3.7. Switching frequency modulation is a popular control technique for on-chip SC converters [60, 23, 88, 16, 90, 22, 25].

The single bound hysteretic control scheme implementation considered in this thesis consists of a clocked comparator that, in a sampled manner, compares the output voltage $V_{out}$ with a reference voltage $V_{ref}$. The clocked comparator produces a clock trigger $clk_{trig}$, which is fed to a digital clock interleaver that manages the input clock phases of the interleaved SC converter. The concept of the single bound hysteretic control scheme is shown in Fig. 4.5. As can be seen, $clk_{trig}$ transitions to logic high whenever $V_{out}$ is less than $V_{ref}$ at the rising edge of $clk_{cc}$. A rising edge on $clk_{trig}$ causes the digital clock interleaver (discussed below) to change the state of the next SC converter unit to deliver more charge to the output, thereby causing a rise in the output voltage. The output voltage slope is shown as vertical for simplicity, but the actual slope during the rise follows the flying capacitor voltage waveform from Fig. 3.2. If $V_{out}$ is greater than $V_{ref}$ at the rising edge of $clk_{cc}$, $clk_{trig}$ remains logic low and the clock pulse is skipped, and no SC converter unit changes state.
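
The sampled comparison and pulse skipping can be illustrated with a strongly simplified behavioral model in Python. The sketch assumes that every $clk_{trig}$ pulse delivers a fixed charge packet to a lumped output capacitance and that the load is a constant current. It ignores the exponential charge transfer, the loop latency, and the individual converter units. All component values are hypothetical and chosen only to give readable numbers.

```python
def simulate_single_bound(v_ref, i_load, n_samples,
                          f_cc=4e9, c_out=1e-9, q_packet=20e-12):
    """Behavioral model of single bound hysteretic control.

    f_cc, c_out and q_packet are hypothetical values. Returns the clk_trig
    decisions and the output voltage after each comparator clock period.
    """
    dv_load = i_load / (c_out * f_cc)  # droop per clk_cc period
    dv_packet = q_packet / c_out       # step per delivered charge packet
    v_out = v_ref
    clk_trig, trace = [], []
    for _ in range(n_samples):
        fire = v_out < v_ref           # comparison at the rising edge of clk_cc
        if fire:
            v_out += dv_packet         # next SC converter unit changes state
        v_out -= dv_load               # load discharges the output node
        clk_trig.append(fire)
        trace.append(v_out)
    return clk_trig, trace


if __name__ == "__main__":
    trig, trace = simulate_single_bound(v_ref=0.85, i_load=0.03, n_samples=4000)
    print(f"fraction of clock edges that trigger: {sum(trig) / len(trig):.3f}")
    print(f"V_out range: {min(trace):.4f} V to {max(trace):.4f} V")
```

In this model the fraction of non-skipped clock pulses settles at the ratio of the load current to the maximum deliverable current $q_{\text{packet}} f_{cc}$, which is the frequency modulation mechanism described above in its simplest form.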

Figure 4.5: The single bound hysteretic control scheme produces a trigger clock $clk_{trig}$ whenever the output voltage is below the reference voltage at the sampling event. A rising edge of $clk_{trig}$ changes the state of a SC converter unit to deliver more charge to the output node.

The circuit schematic of the clocked comparator with reset is shown in Fig. 4.6. Based on $clk_{cc}$, the clocked comparator generates $clk_{trig}$ used in the single bound hysteretic control illustrated in Fig. 4.5. From the circuit schematic in Fig. 4.6(a), the inputs of the clocked comparator are the differential transistor pair with $V_p$ and $V_n$. The comparator structure is a sense-amp latch, also known as a Strong-ARM latch. Offset calibration of the comparator is performed by the second differential transistor pair with $oc_p$ and $oc_n$. The clocked comparator with reset outputs a pulse on $out_p$ and no pulse on $out_n$ whenever $V_p < V_n$ following a rising edge of $clk_{cc}$. It outputs a pulse on $out_n$ and no pulse on $out_p$ whenever $V_p > V_n$ following a rising edge of $clk_{cc}$. For the single bound hysteretic control, $out_n$ is not used. The detailed analysis and design of the clocked comparator, which was initially designed for use in high-speed analog to digital converters, are treated further in [92].

In Fig. 4.6(b), the clocked comparator symbol is illustrated. The clocked comparator equivalent circuit, which is shown in Fig. 4.6(c), consists of an ideal comparator followed by a flip-flop that performs the sampling event at each rising edge of the comparator clock $clk_{cc}$. The AND gate provides a reset of the output trigger signal $clk_{trig}$ before the subsequent sampling event. If the equivalent circuit is implemented in a circuit simulator, it should be ensured that $clk_{trig}$ contains no glitches from possible non-zero propagation delays in the ideal comparator, flip-flop, and AND gate.

Figure 4.6: Clocked comparator with reset used to implement the single bound hysteretic control scheme.

### 4.3.1 Digital Clock Interleaver

The clocked comparator output $clk_{trig}$ is fed to the digital clock interleaver, whose function is to provide the phase shifted clock signals to the interleaved SC converter units. Fig. 4.7 shows the implemented digital clock interleaver, where a shift register performs frequency division of the high frequency \( \text{clk}_{\text{trig}} \) signal. The outputs are \( N = 2^b \) clock phases, each of which is fed to an individual SC converter unit.

**Figure 4.7:** The digital clock interleaver divides the high frequency \( \text{clk}_{\text{trig}} \) signal into \( N \) clock phases for the interleaved SC converter units. Shown for \( N = 4 \) (\( b = 2 \)) interleaved phases.

Prior art implementations of this control scheme use the inverted output of the last flip-flop and feed it to the first flip-flop of the shift register [60, 88]. Doing this requires the shift register to be properly initialized. As shown in Fig. 4.7, we use the most significant bit (MSB) of a \((b+1)\)-bit counter as input to the shift register. This solution does not require any initialization of the shift register since the desired flip-flop states are reached once the counter has completed one full count cycle. This still holds if the counter is not initialized to start counting from zero. Hence, using the counter’s MSB as input to the shift register is a very robust implementation of the single bound hysteretic control scheme.
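
The self-initializing behavior of the counter-based interleaver can be checked with a small cycle-based model in Python. The sketch assumes that the counter and all flip-flops update synchronously on every rising edge of $clk_{trig}$. The function name and the list representation of the shift register are illustrative and do not describe the implemented circuit.

```python
def run_interleaver(b, n_edges, counter_init=0, register_init=None):
    """Cycle-based model of the N = 2**b phase digital clock interleaver.

    A (b+1)-bit counter is incremented on every clk_trig edge and its MSB is
    shifted into an N-stage shift register. Returns the register contents
    (clk_0 ... clk_{N-1}) after each edge.
    """
    n = 2 ** b
    modulus = 2 ** (b + 1)
    counter = counter_init % modulus
    register = [0] * n if register_init is None else list(register_init)
    if len(register) != n:
        raise ValueError("register_init must contain N = 2**b values")
    history = []
    for _ in range(n_edges):
        msb = (counter >> b) & 1             # MSB value present before the edge
        register = [msb] + register[:-1]     # shift register clocked by clk_trig
        counter = (counter + 1) % modulus    # counter increments on the same edge
        history.append(tuple(register))
    return history


if __name__ == "__main__":
    # Arbitrary (uninitialized) start state for N = 4 phases
    for state in run_interleaver(2, 16, counter_init=5, register_init=[1, 0, 1, 1]):
        print(state)
```

After the start-up edges, exactly one clock phase changes state per $clk_{trig}$ edge and every phase repeats with a period of $2N$ edges, irrespective of the initial counter and register contents.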

### 4.3.2 $f_{sw,\text{max}}$ and Loop Latency

Assuming that no pulses of $clk_{\text{trig}}$ are skipped by the clocked comparator (as illustrated in Fig. 4.7), the maximum switching frequency $f_{sw,\text{max}}$ of each SC converter unit is limited by the digital clock interleaver to

$$f_{sw,\text{max}} = \frac{1}{2} \frac{f_{cc}}{N} = \frac{f_{cc}}{2^{b+1}}, \quad (4.10)$$

where $f_{cc}$ is the clock frequency of $clk_{cc}$. The factor $1/2$ is due to the fact that it takes two rising edges of $clk_{trig}$ per SC converter clock period.

Using (4.10), the maximum switching frequency for a given number of interleaved stages $N$ is shown to be limited by $f_{cc}$. It is therefore desirable to select $f_{cc}$ as high as possible to allow for a large number of interleaved stages. However, the loop latency, which is the propagation delay from when the sampling event occurs until the corresponding SC converter unit changes state, imposes an upper limit to $f_{cc}$. The total loop latency $t_{\text{lat}}$ is the sum of the propagation delays of 1) the clocked comparator, 2) the digital clock interleaver, 3) the gate driver, and 4) additional parasitic wiring capacitances. Therefore, $f_{cc}$ should be limited to

$$f_{cc} < \frac{1}{t_{\text{lat}}}, \quad (4.11)$$

If the criterion in (4.11) is not met, a double sampling event occurs in which the subsequent sampling event is triggered before the present sampling event has had its effect on the output voltage. When a double sampling event occurs, two SC converter units change switching states where only one unit is required to do so. Although this is not critical for the stability of the control loop, the output voltage experiences a higher voltage ripple, thereby not fully exploiting the ripple reductions of the interleaving scheme [91].
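
The interplay of (4.10) and (4.11) can be made concrete with a short Python sketch. The four propagation delays in the example are hypothetical placeholders, not measured values of the implemented circuit. The number of phases follows the 16-phase design mentioned in the chapter introduction.

```python
def f_sw_max(f_cc, b):
    """(4.10): maximum unit switching frequency for N = 2**b interleaved phases."""
    return f_cc / 2 ** (b + 1)


def f_cc_limit(t_comparator, t_interleaver, t_gate_driver, t_wiring):
    """(4.11): upper bound 1 / t_lat, with t_lat the sum of the loop delays."""
    t_lat = t_comparator + t_interleaver + t_gate_driver + t_wiring
    return 1.0 / t_lat


if __name__ == "__main__":
    # Hypothetical delays in seconds, summing to a 250 ps loop latency
    limit = f_cc_limit(100e-12, 60e-12, 60e-12, 30e-12)
    print(f"f_cc must stay below {limit / 1e9:.2f} GHz")
    # 16 interleaved phases correspond to b = 4
    print(f"f_sw,max at f_cc = 4 GHz: {f_sw_max(4e9, 4) / 1e6:.0f} MHz")
```

The example shows the tradeoff stated above: a shorter loop latency permits a higher $f_{cc}$, which in turn raises the switching frequency available to each of the $N$ units.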

### 4.4 Second SC Converter Design

The second SC converter design implements the 2:1 and 3:2 reconfigurable converter presented in Section 4.1. For this power stage, both the 2:1 SC converter model in Section 3.2.2 and the 3:2 SC converter model in Section 4.1.2 are used. The Pareto optimization procedure follows the flowchart depicted in Fig. 3.11.

Table 4.2: Electrical specifications and design space for the second SC converter design.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>2:1</th>
<th>3:2</th>
</tr>
</thead>
<tbody>
<tr>
<td>$M$</td>
<td>1/2</td>
<td>2/3</td>
</tr>
<tr>
<td>$V_{\text{in}}$</td>
<td>1.8 V</td>
<td>1.8 V</td>
</tr>
<tr>
<td>$V_{\text{out}}$</td>
<td>850 mV</td>
<td>1.1 V</td>
</tr>
<tr>
<td>$I_{\text{out}}$</td>
<td>34 mA</td>
<td>37 mA</td>
</tr>
<tr>
<td>$X_C$</td>
<td>100 μm...5000 μm</td>
<td></td>
</tr>
<tr>
<td>$T_w$</td>
<td>100 μm...300 MHz</td>
<td></td>
</tr>
<tr>
<td>$f_{\text{sw}}$</td>
<td>10 MHz...300 MHz</td>
<td></td>
</tr>
<tr>
<td>$A_C$</td>
<td>$5.129 \cdot 10^{-6}$ mm$^2$</td>
<td></td>
</tr>
<tr>
<td>$A_T$</td>
<td>$0.231 \cdot 10^{-6}$ mm$^2$/μm</td>
<td></td>
</tr>
</tbody>
</table>

The electrical specifications and parameter design space for the reconfigurable SC converter design are listed in Tab. 4.2. The design space is similar to that of the first SC converter design except for $A_T$, which has been reduced due to improvements in the transistor layout. Still, the transistor area $A_T$ includes the last stage of the gate driver buffer and power grid margin to give a more realistic transistor area estimation than simply the active transistor area. Although $T_w$ represents the width of each transistor in the power stage, PMOS transistors are 15% wider than NMOS transistors following a design rule recommendation, but this is not a strict requirement. Special care is taken regarding the stacked transistor implementation discussed in Section 4.1.3, since the stacked transistors affect the on-state resistances of the switches. Finally, using the parameter extraction methods described in Section 3.2.4, the on-state resistances of transistors $S_5$ and $S_{5s}$ are extracted with a lower gate-source voltage owing to the gate signal scheme shown in Tab. 4.1. In short, the combined on-state resistance of $S_5$ and $S_{5s}$ is approximately five times the on-state resistance of a transistor with the full gate-source voltage.
Table 4.3: Selected Pareto optimized design for second SC converter hardware implementation.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>2:1</th>
<th>3:2</th>
</tr>
</thead>
<tbody>
<tr>
<td>$X_C$</td>
<td>600</td>
<td></td>
</tr>
<tr>
<td>$T_w$</td>
<td>560 $\mu$m</td>
<td></td>
</tr>
<tr>
<td>$f_{sw}$</td>
<td>100 MHz</td>
<td></td>
</tr>
<tr>
<td>$I_{out}$</td>
<td>36.9 mA</td>
<td>37.2 mA</td>
</tr>
<tr>
<td>$\eta$</td>
<td>86.5%</td>
<td>84.9%</td>
</tr>
<tr>
<td>$\rho$</td>
<td>4.4 W/mm$^2$</td>
<td>5.2 W/mm$^2$</td>
</tr>
</tbody>
</table>

As discussed above, the reconfigurable power stage reduces to two 2:1 SC converters operated in parallel in the 2:1 configuration. To account for this in the Pareto optimization, the 2:1 SC converter model in Section 3.2.2 is applied with each transistor width $T_w$ being the combined width of the two parallel contributions. For instance, the on-state resistance for $S_1'$ becomes $R'_{on1} = R_{on1}|| (R_{on6} + R_{on6s})$. The same approach is used for the other transistors as well as the flying capacitors. In this way, the stacked transistor implementation is taken into account for the Pareto optimization of the reconfigurable converter operated in the 2:1 configuration.
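
The combination of parallel switch paths can be written as a small helper in Python. The sketch assumes that the second path consists of $S_6$ in series with its stacked transistor. The parameter name `r_on6_stack` is a hypothetical label for the on-state resistance of that stacked transistor, and the numerical values in the example are placeholders.

```python
def parallel(*resistances):
    """Equivalent resistance of resistors connected in parallel."""
    if any(r <= 0 for r in resistances):
        raise ValueError("resistances must be positive")
    return 1.0 / sum(1.0 / r for r in resistances)


def r_on1_effective(r_on1, r_on6, r_on6_stack):
    """Effective on-state resistance of the combined switch S1' in the 2:1 gear.

    S1 conducts in parallel with the series connection of S6 and its stacked
    transistor (r_on6_stack is an illustrative name for the latter).
    """
    return parallel(r_on1, r_on6 + r_on6_stack)


if __name__ == "__main__":
    # Placeholder resistances in ohms
    print(f"R'_on1 = {r_on1_effective(1.0, 0.5, 0.5):.3f} ohm")
```

Because the stacked path carries two on-state resistances in series, the effective resistance is higher than that of two identical unstacked switches in parallel.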

The results of the Pareto optimization procedure applied on the reconfigurable SC converter are shown in Fig. 4.8. A Pareto optimal design with a reasonable tradeoff between efficiency and power density in both configurations is selected and marked by the cross on the Pareto fronts. The design and its performance are listed in Tab. 4.3. The selected design is implemented in the Cadence design environment for additional testing using hardware-correlated models and for layout of the final converter. The simulated efficiency and power for the complete converter schematic are shown as the dot in Fig. 4.8. As seen, there is good agreement between the modeled and simulated performances. A slightly lower efficiency is observed. Recall from (3.37) that the state space model takes the power losses of charging and discharging the transistor input capacitances into account, but the state space model does not include the gate driver circuits that generate the gate signals.

Figure 4.8: Resulting Pareto fronts for the second SC converter design in both the $M = 1/2$ and $M = 2/3$ configuration. The cross represents the chosen design point and the dot marks the simulated results of the complete converter schematic.

The slightly lower efficiency is attributed to the additional losses of the gate driver, which is discussed in Section 3.3.2. The slightly lower power density result is attributed to the fact that the converter area estimation is based on a 100% area utilization of the transistors and capacitors. However, the final layout has a lower area utilization due to layout constraints, thereby affecting the area estimation.

The Pareto optimization investigations of the reconfigurable SC converter in the 2:1 and 3:2 configurations are shown in Fig. 4.9 and Fig. 4.10, respectively. These investigations are used to get insight into how the design space parameters, especially $X_C$, $T_w$, and $f_{sw}$, affect the efficiency and power density of the converter in both configurations. The conclusions drawn are similar to those described in Section 3.3.

The measurement results of the second and third chips, which are detailed in Sections 4.5 and 5.3, respectively, are added to the Pareto optimization shown in Fig. 4.8. As seen, the measured efficiency and power density do not match the modeled values well in either configuration for either chip 2 or chip 3. The measurement setup of these converters is discussed in later sections. However, a number of uncertainties in the measurement setup could explain the discrepancy between the model and the measurement results. The following list discusses several of these uncertainties:

- **IR voltage drops in the power grid**
  The layout of the entire converter incorporates a regular top-level power grid formed by the top metal layers of the metal stack. However, the power grid is resistive, which results in IR voltage drops that influence the on-chip voltage measurements. This effect depends on the amount of current flowing in the power grid. Since chip 3 operates at higher power than chip 2, the effect is more pronounced for chip 3.

- **On-chip load resistance measurements**
  The resistance of the on-chip load is characterized using a 4-point measurement to exclude the voltage drop of the cabling and interconnects to the prototype chip. However, due to the resistive power grid, the measured Kelvin voltage at the top metal layer may differ from the actual voltage across the load resistors at the bottom metal layer, and this may influence the estimated output power, and thereby efficiency, of the converter.

Figure 4.9: Design space investigation showing how the design space parameters $X_C$, $T_w$, and $f_{sw}$ affect the efficiency and power density of the reconfigurable SC converter in the 2:1 configuration.

Figure 4.10: Design space investigation showing how the design space parameters $X_C$, $T_w$, and $f_{sw}$ affect the efficiency and power density of the reconfigurable SC converter in the 3:2 configuration. Panels: (a) Pareto optimization investigation 3:2 I; (b) Pareto optimization investigation 3:2 II.

- **On-chip temperature effects**
  Parameters of a semiconductor device generally change with temperature. As argued in Section 3.2.4, a constant temperature is considered in the model since the microprocessor cores, and not the on-chip SC converter, dictate the die temperature. However, for the prototypes presented here, the on-chip resistive load results in a chip die temperature that varies with the input power of the converter. For chip 2, no additional temperature considerations are taken into account. For chip 3, a thermal model, see Section 5.3.1, is developed to take the effect of the temperature increase on the on-chip load resistance into account.

- **Device parameter extraction mismatches**
  From Section 3.2.4, the device parameters, which are used in the model evaluation, are extracted under constant voltage, current, and temperature conditions. As seen, there is good agreement with the simulation setup in the Cadence design environment. However, it is challenging to verify the device parameters, as well as the operating conditions, on-chip. Typically, IC designs need to function within $\pm 30\%$ tolerances on all device parameters owing to the uncertainties introduced during the chip manufacturing processes.

- **"Black box" measurement setup**
  In general, the measurement setups only allow verification of the terminal behavior, since internal nodes are not available for measurements. This makes a complete verification virtually impossible, as individual blocks, devices, controllers, etc. cannot be independently or accurately verified. Therefore, the on-chip converter can be treated only as a "black box", where discrepancies with expected results cannot be fully investigated.

### 4.4.1 System Overview

The design of the reconfigurable SC converter presented above is combined with the concepts of interleaving discussed in Section 4.2 and the single bound hysteretic control scheme presented in Section 4.3. The complete system overview of the implemented SCVR is illustrated in Fig. 4.11. The chip features 16 interleaved phases of the 2:1 and 3:2 reconfigurable SC converter design from above. This implies that the output power is 16 times higher than the single-unit design value, while the efficiency and power density are unchanged. Switching between the 2:1 and the 3:2 configuration is set by the signal gear, which is configured externally through a digital configuration interface (not shown). The programmable load is likewise configured externally through the digital configuration interface by enabling the signals $e_{1..31}$. For no load, all signals $e_{1..31}$ are logic 0.

Figure 4.11: Complete SCVR system consisting of a 16-phase reconfigurable SC converter and the single bound hysteretic control scheme formed by a clocked comparator with digital clock interleaver. The on-chip programmable load enables characterization of steady state performance under various loading conditions as well as transient responses during load steps.

An $N = 16$ ($b = 4$) single bound hysteretic control scheme is employed. The clock frequency of the clocked comparator is $f_{cc} = 4$ GHz, which, using (4.10), results in a maximum switching frequency of $f_{sw,max} = 125$ MHz for each SC converter unit. From extracted layout simulations, the total loop latency is $\sim 200$ ps, thereby fulfilling the criterion in (4.11) since $f_{cc} = 1/250\,\text{ps} < 1/200\,\text{ps}$. Therefore, the 250 ps sampling period of the clocked comparator enables sub-nanosecond response times to transient load changes.
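
The numbers quoted above can be cross-checked with a short calculation. The sketch below assumes that (4.10) has the form $f_{sw,max} = f_{cc}/(2N)$, which reproduces the stated 125 MHz but is not confirmed by the text shown here, and it interprets (4.11) as requiring the loop latency to be shorter than one comparator clock period. The function names are illustrative.

```python
def max_switching_frequency(f_cc_hz: float, n_phases: int) -> float:
    """Assumed form of (4.10): f_sw,max = f_cc / (2 * N)."""
    return f_cc_hz / (2 * n_phases)


def latency_criterion_met(f_cc_hz: float, loop_latency_s: float) -> bool:
    """Assumed reading of (4.11): f_cc < 1 / t_latency."""
    return f_cc_hz < 1.0 / loop_latency_s


f_cc = 4e9            # comparator clock frequency in Hz (from the text)
n_phases = 16         # number of interleaved phases N (from the text)
loop_latency = 200e-12  # total loop latency in s (from the text)

f_sw_max = max_switching_frequency(f_cc, n_phases)
criterion_ok = latency_criterion_met(f_cc, loop_latency)
print(f"f_sw,max = {f_sw_max / 1e6:.0f} MHz, latency criterion met: {criterion_ok}")
```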

An on-chip programmable load is implemented to verify the system performance under various loading conditions and to investigate the transient response time following a load step. The on-chip programmable load consists of an array of 31 resistors (resulting in 32 different load values including no load), each of which is in series with a switch. The resistance of the programmable load can be externally configured through the digital configuration interface.

As a comment on the implemented interleaving scheme, the 16 interleaved clock phases in this design are distributed between 0° – 180°. Having 32 phases interleaved from 0° – 360° instead of 16 phases interleaved from 0° – 180° does not reduce the output voltage ripple, since the frequency of the output voltage ripple is twice the switching frequency, as shown in Fig. 4.4. However, it does affect the input current ripple, since the frequency doubling of the ripple does not occur there. In hindsight, this design could easily have been implemented with 32 phases interleaved from 0° – 360° using twice the number of power stages clocked with the inverse of the 16 interleaved clock phases. The third hardware design, which is presented in the next chapter, uses interleaving from 0° – 360°.

### 4.5 Second Hardware Results

The chip photo of the 16-phase interleaved SCVR with single bound hysteretic control (digital controller) is shown in Fig. 4.12. The SC converter units are placed next to each other with the digital controller in the middle. Due to interleaving, no dedicated output decoupling capacitance and only very little input decoupling capacitance are implemented. Hence, the SC converter units take up the majority of the total available chip area. Also shown in Fig. 4.12 is the layout of the SC converter unit. The deep trench capacitors take up 72.1%, the transistors 27.3%, and the gate driver 0.6% of the total converter area. A regular top-level power grid consisting of $V_{\text{in}}$, $V_{\text{out}}$, and gnd covers the entire active chip area to minimize power grid resistance and inductance.

Figure 4.12: Chip photo of the 16-phase interleaved SCVR implemented in the 32 nm technology with high density deep trench capacitors. The total converter area is 0.15 mm².

### 4.5.1 Measured Efficiency and Power Density

Measurements are carried out using GGB PicoProbe needles on the unpackaged chip die mounted on a probe station. An overview of the measurement setup is depicted in Fig. 4.13. The input and output voltages are measured using Kelvin contacts to account for the voltage drops of cable and contact resistances. A Keithley SourceMeter is used as input supply. The input power $P_{\text{in}}$ is estimated using the current displayed on the input supply and the measured Kelvin input voltage. However, $P_{\text{in}}$ does not include the power consumption of the digital controller, as it is not possible to separate that particular power consumption from the total digital circuit power consumption, which includes several housekeeping functions for testing that are not part of the digital controller.

**Figure 4.13:** Overview diagram of the measurement setup with Kelvin contacts, 4-point resistance measurement, and oscilloscope for transient response measurements. This setup is used for chip 2 and chip 3.

The measured efficiency and power density for three different load levels are shown in **Fig. 4.14**. For $V_{in} = 1.8$ V, the efficiency is above 70% over the DVFS output voltage range of 0.7 V – 1.1 V. The maximum efficiency is 86% at 2.2 W/mm$^2$ power density in the 2:1 configuration and 90% at 3.7 W/mm$^2$ power density in the 3:2 configuration. The measured efficiency over output power for four different output voltages is shown in **Fig. 4.15**. As seen, the maximum output power varies with the output voltage and it is limited by the lowest resistance value of the on-chip programmable load. The maximum output power for this converter is 840 mW at $V_{out} = 1090$ mV.

### 4.5.2 Measured Transient Response

Transient responses are measured using a 20 GHz, 50 GS/s Tektronix DSA72004 oscilloscope. The Kelvin contacts are probed using 40 GHz needles from GGB Industries, Inc., and 30 GHz Sucoflex cables are used to connect the probes to the oscilloscope.

Figure 4.14: Measured efficiency and power density for three different load levels with $V_{\text{in}} = 1.8$ V. 1x nominal load corresponds to the case where 16 out of the 31 resistors in the programmable load are enabled.

Figure 4.15: Measured efficiency over output power for four different output voltages with $V_{in} = 1.8$ V.

The on-chip programmable load can provide a load step between any two load levels within 50 ps. The transient responses are measured when stepping between 30 mA and 365 mA output current at $V_{out} = 850$ mV. This load step characteristic equates to a load current slope of 7 A/ns. Such fast load steps are used to evaluate the sub-nanosecond response under worst-case conditions [22].
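
As a quick check of the stated load current slope, the step amplitude and step time given above can be combined directly. This is a minimal sketch that uses only the values quoted in the text; the variable names are illustrative.

```python
i_light = 30e-3    # light-load output current in A
i_heavy = 365e-3   # heavy-load output current in A
t_step = 50e-12    # duration of the load step in s

# Average current slope over the step, converted from A/s to A/ns.
slope_a_per_ns = (i_heavy - i_light) / t_step * 1e-9
print(f"load current slope = {slope_a_per_ns:.1f} A/ns")
```

The result is slightly below the rounded value of 7 A/ns quoted in the text.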

Figure 4.16: Transient responses for $V_{\text{out}} = 850$ mV and $V_{\text{in}} = 1.8$ V showing the sub-nanosecond response time of the single bound hysteretic control scheme. The output voltage is maintained for the duration $t_1 \approx 15$ ns after the first transient event at $t_{\text{up}}$. The output voltage droop starting at $t_{\text{up}} + t_1$ is caused by the droop of the input voltage. In steady state, the measured output voltage ripple is $V_{\text{ripple,pp}} = 30$ mV.

The measured transient responses for both step up and step down are shown in Fig. 4.16. At $t_{up}$, a transient step up event occurs. Prior to $t_{up}$, the converter is operated in light load and a low-frequency ripple is seen on the output voltage since the SC converter units are operated at low switching frequency. Right after $t_{up}$, the control scheme abruptly increases the switching frequency to make the SC converter deliver more current to the output and thereby maintain the output voltage. Accordingly, the frequency of the output voltage ripple increases. As seen in the zoom in Fig. 4.16, the output voltage is maintained for a duration of $t_1 \approx 15$ ns after the transient event occurs. This verifies the sub-nanosecond response time of the digital controller, since the output voltage would droop immediately if regulation had not been applied, i.e. if the digital controller had not increased the switching frequency within a nanosecond. At $t_{\text{up}} + t_1$, the output voltage is seen to experience a significant droop. The cause of this droop is to be found on the input side of the converter, as the input voltage droops as well.

As described in Section 4.1, the output voltage in the 2:1 configuration cannot be higher than half the input voltage. With the input voltage droop seen in Fig. 4.16, the input voltage is less than twice the desired output voltage, thereby causing the output voltage to droop accordingly. The input voltage droop is a result of the rapid current change, which is limited by the parasitic package inductance of the PDN from Fig. 1.2. As soon as the input power supply recovers the input node, the output voltage is again maintained at the desired level. Although the measurement setup does not directly reflect a microprocessor PDN, simulation results using netlists extracted from real microprocessor PDNs show similar droops on the input and output nodes. Clearly, such droops must be accounted for to fit the microprocessor power delivery application. A solution to be considered is to implement more decoupling capacitance on the input side, but this would penalize the achievable power density. The next chapter details a novel feedforward control scheme that mitigates the output voltage droop caused by the transient input voltage droop.
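
The limit imposed by the conversion ratio can be made concrete with a short calculation. The sketch below considers only the ideal no-load bound $V_{\text{out}} \le M \cdot V_{\text{in}}$ and neglects the voltage drop across the equivalent output resistance, so the input voltage required under load is higher than the values computed here. The function and variable names are illustrative.

```python
def min_input_voltage(v_out: float, ratio: float) -> float:
    """Smallest V_in for which the ideal no-load output M * V_in still reaches v_out."""
    return v_out / ratio


M_2_TO_1 = 1 / 2   # conversion ratio of the 2:1 configuration
M_3_TO_2 = 2 / 3   # conversion ratio of the 3:2 configuration

v_in_nominal = 1.8  # nominal input voltage in V (from the text)
v_out_target = 0.85  # regulated output voltage in V (from the text)

v_in_min_21 = min_input_voltage(v_out_target, M_2_TO_1)
v_in_min_32 = min_input_voltage(v_out_target, M_3_TO_2)

# Input droop that the 2:1 configuration tolerates in the ideal no-load case.
droop_margin_21 = v_in_nominal - v_in_min_21

print(f"2:1 needs V_in >= {v_in_min_21:.3f} V (margin {droop_margin_21 * 1e3:.0f} mV)")
print(f"3:2 needs V_in >= {v_in_min_32:.3f} V")
```

Under these idealized assumptions, the 2:1 configuration loses regulation at $V_{\text{out}} = 850$ mV once the input droops by more than about 100 mV, whereas the 3:2 configuration tolerates a considerably lower input voltage.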

Also shown in Fig. 4.16 is the step down event at $t_{\text{down}}$. Before $t_{\text{down}}$, the converter delivers 365 mA, and a high-frequency ripple appears on the output voltage, meaning that all SC converter units are operated at a high switching frequency to deliver the current to the output. Right after the transient event, current continues to be delivered to the now much smaller load. This causes the output voltage to increase even though none of the SC converter units change state. The output voltage thereafter drops as the light load continues to draw current, and normal operation is resumed once $V_{\text{out}}$ reaches $V_{\text{ref}}$. From an application point of view, the overshoot that follows a step down event is not critical as long as it does not lead to any overvoltage situations. With an overshoot of 125 mV, no transistors are exposed to overvoltage for this design.

### 4.6 Summary

A wide output voltage range is required to efficiently support DVFS. The 2:1 and 3:2 reconfigurable SC converter power stage in Fig. 4.1 provides a high efficiency over a wide output voltage range of $0.7 \text{ V} - 1.1 \text{ V}$ from a fixed 1.8 V input supply. The power stage implementation shown in Fig. 4.3 furthermore incorporates stacked transistors to protect the transistors in the 32 nm semiconductor technology against overvoltage situations.

The second SC converter design is selected based on a Pareto optimization of the reconfigurable power stage. A 16-phase interleaving scheme is employed to reduce the input current and output voltage ripples without implementing dedicated decoupling capacitors. The single bound hysteretic control scheme, which consists of a clocked comparator and a digital clock interleaver, is incorporated. Using a 4 GHz clock, this control scheme utilizes the fast transistors of the 32 nm SOI CMOS technology to achieve sub-nanosecond response time. The chip furthermore incorporates an on-chip programmable load to emulate a microprocessor core.

For $V_{in} = 1.8$ V, the measured efficiency shown in Fig. 4.14 is above 70% over the entire DVFS output voltage range of $0.7 \text{ V} - 1.1 \text{ V}$. The maximum efficiency is 86% at $2.2 \text{ W/mm}^2$ power density in the 2:1 configuration and 90% at $3.7 \text{ W/mm}^2$ power density in the 3:2 configuration. Due to the 16-phase interleaving scheme, the measured output voltage ripple in steady state is $V_{\text{ripple,pp}} = 30 \text{ mV}$ without the use of dedicated output decoupling capacitors.

The measured transient responses shown in Fig. 4.16 validate the sub-nanosecond response time of the digital controller. However, the output voltage experiences a significant droop due to the large input voltage droop, which is caused by the parasitic inductance of the PDN that limits the rate of change of the input current.

The two key learnings from this chapter are:

- On-chip SCVRs can simultaneously achieve 1) high efficiencies at high power densities, 2) wide output voltage ranges, 3) sub-nanosecond response time control, and 4) output powers above 100 mW.
- A sub-nanosecond response time control scheme does not necessarily prevent the output voltage droop due to the parasitic inductance of the PDN.

Based on the results obtained in this chapter, on-chip SCVRs can now be considered a prominent OCVR candidate for granular microprocessor power delivery with per-core regulation. The challenges remaining to meet the specifications from Tab. 1.1 are 1) the reduction of the voltage overhead by mitigating the output voltage droop and 2) an increase of the power rating to $>1\text{W}$ output power. The next chapter considers a novel feedforward control scheme that reduces the output voltage droop caused by the transient input voltage droop. Furthermore, a converter design with much higher output power is presented.

## Chapter 5. Feedforward Control for Reconfigurable SCVR

For the application of granular microprocessor power delivery, the on-chip SCVR must supply above a minimum output voltage $V_{\text{out, min}}$ at all times in order for the microprocessor core to meet setup time requirements. Following a transient load change, the output voltage typically experiences a droop due to the parasitic inductance of the PDN. Therefore, the steady-state output voltage is kept high enough to ensure $V_{\text{out}} > V_{\text{out, min}}$ at all times, thereby introducing a voltage overhead that leads to increased energy consumption [50, 12]. This chapter focuses on a novel feedforward control scheme that mitigates the output voltage droop, thereby enabling per-core DVFS with reduced voltage overhead as shown in Fig. 1.3(c).

The single bound hysteretic control discussed in the previous chapter provides a sub-nanosecond response time to a transient load change. However, as seen in Fig. 4.16, the output voltage still droops. As discussed, the droop is caused neither by the regulation loop being too slow nor by the lack of output decoupling capacitance. Instead, a significant droop at the input of the converter is considered to be the root cause of the output voltage droop. The input voltage droop, and thereby the output voltage droop, can be reduced by implementing sufficient on-chip input decoupling capacitors. However, a large capacitance is needed to reduce the droop sufficiently, and it becomes impractical to implement owing to the large chip area overhead required.

Section 5.1 presents a novel feedforward control scheme for reconfigurable SCVRs. The feedforward scheme changes the configuration of the SC converter when an input voltage droop is detected, thereby mitigating the output voltage droop, which allows the voltage overhead to be reduced. The design of the third SC converter, which features a 64-phase reconfigurable SC converter, is presented in Section 5.2. The design incorporates the novel feedforward control, which works in conjunction with the single bound hysteretic control (feedback control) from Section 4.3. In Section 5.3, measurement results of the feedforward controlled SCVR are presented. This design furthermore demonstrates an output power of 10 W, which is the highest output power achieved by an on-chip SCVR to date.

This chapter is based on the publications [6] and [8].

### 5.1 Novel Feedforward Control

The single bound hysteretic control scheme discussed in Section 4.3 is considered a feedback control since it regulates based on the output voltage. The novel regulation concept, discussed next, is a feedforward control since it regulates based on the input voltage. However, a feedback control scheme, e.g. the single bound hysteretic control discussed in Section 4.3, that regulates based on the output voltage, is still required. In Fig. 5.1, a conceptual overview of the typical feedback control in conjunction with the new feedforward control is illustrated. Recalling from Section 3.2, the SC converter equivalent circuit model (neglecting the switching losses) consists of a dc transformer with a fixed conversion ratio $M$ and an equivalent output resistance $R_{eq}$. As shown in Fig. 5.1(a), the typical feedback control regulates $R_{eq}$ to achieve the desired output voltage as shown in the corresponding flowchart. Typically, as in the single bound hysteretic control scheme, $R_{eq}$ is modulated by the switching frequency following the characteristics shown in Fig. 3.7(a).

The novel feedforward control, which can be implemented with a reconfigurable SC converter, introduces an additional control loop as depicted in Fig. 5.1(b). As shown in the corresponding flowchart, the feedforward control dynamically changes the configuration to a higher voltage conversion ratio when an input voltage droop is detected.

Figure 5.1: (a) Typical feedback; (b) additional feedforward. The typical SC converter feedback control modulates $R_{eq}$ to achieve the desired output voltage whereas the novel feedforward control dynamically changes the conversion ratio $M$ when an input voltage droop is detected.


#### 5.1.1 Digital Gear Controller

For interleaved designs, which are considered here, it is found from simulations that changing the configuration of all converter units simultaneously leads to unnecessarily high ripples at the output node. Therefore, the digital gear controller, which implements the feedforward control, is designed to change the configuration of one converter unit at a time. **Fig. 5.2** shows the circuit schematic of the digital gear controller for an example 4-phase reconfigurable SCVR. The input voltage $V'_{\text{in}}$ is compared with the reference $V'_{\text{in,ref}}$ by a clocked comparator having both positive (gp) and negative (gn) outputs, where $V'_{\text{in}}$ and $V'_{\text{in,ref}}$ denote scaled versions of $V_{\text{in}}$ and $V_{\text{in,ref}}$ that are used to avoid overvoltage situations. The clocked comparator is implemented using the same sense-amp topology shown in Fig. 4.6, and it is clocked with the same high-frequency clock $\text{clk}_{\text{cc}}$ as in the single bound hysteretic control from Section 4.3. The gear signals are governed by a bi-directional shift register, where the direction is controlled by the select signal (sel). When $V'_{\text{in}} < V'_{\text{in,ref}}$, a rising edge of gp appears, triggering sel to go high, and logic 1 is stored in the first flip-flop, causing gear$_0$ to go high. Consecutive gp triggers cause the following gear signals to go high. Once all gear signals are high, subsequent gp triggers have no further impact since logic 1 is stored in all flip-flops, i.e., all SC converter units operate in the 3:2 configuration. Once $V'_{\text{in}} > V'_{\text{in,ref}}$, a rising edge on gn appears, triggering sel to go low, and logic 0 is stored in the last flip-flop, causing gear$_3$ to go low. Again, consecutive gn triggers cause the gear signals to go low one at a time. From simulations, it is found that pulse skipping of gn (denoted gn$'$) leads to the smoothest transition back to the original conversion ratio. The pulse skipping in gn$'$ is shown in Fig. 5.2 by the gray tone of every second pulse. However, skipping more pulses is possible as well.
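
The behavior of the digital gear controller described above can be sketched as a small behavioral model. This is an illustrative sketch only and not the implemented circuit: it assumes that every comparator clock cycle produces either a gp or a gn pulse, that equality is treated as the gn case, and that pulse skipping acts on the first gn pulse and ignores every second one. The class name, the parameter `gn_skip`, and the voltage samples are hypothetical.

```python
from typing import List


class GearController:
    """Behavioral model of a bi-directional shift register driving the gear signals."""

    def __init__(self, n_phases: int = 4, gn_skip: int = 2) -> None:
        self.gear: List[int] = [0] * n_phases  # 0: 2:1 configuration, 1: 3:2 configuration
        self.gn_skip = gn_skip                 # act on every gn_skip-th gn pulse (assumed)
        self._gn_count = 0

    def clock(self, v_in: float, v_in_ref: float) -> List[int]:
        """Evaluate one comparator clock cycle and return a copy of the gear signals."""
        if v_in < v_in_ref:
            # gp pulse: shift a logic 1 in at the first flip-flop.
            self._gn_count = 0
            self.gear = [1] + self.gear[:-1]
        else:
            # gn pulse: shift a logic 0 in at the last flip-flop, with pulse skipping.
            if self._gn_count % self.gn_skip == 0:
                self.gear = self.gear[1:] + [0]
            self._gn_count += 1
        return list(self.gear)


# Hypothetical scaled input voltage samples, one per comparator clock cycle:
# a droop below the reference for 5 cycles followed by recovery for 8 cycles.
V_IN_REF = 0.9
samples = [0.8] * 5 + [1.0] * 8

controller = GearController(n_phases=4, gn_skip=2)
history = [controller.clock(v, V_IN_REF) for v in samples]
for cycle, gear in enumerate(history):
    print(cycle, gear)
```

In this model, the gear signals go high one per clock cycle during the droop and return low one per two clock cycles during recovery, so that the converter units never change configuration simultaneously.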

### 5.2 Third SC Converter Design

An on-chip SCVR featuring the feedforward control is designed and implemented in the 32 nm SOI CMOS technology. The converter, which builds upon the promising results presented in Section 4.5, is primarily designed to reduce the $V_{out,min}$ by means of the feedforward control.

The third SC converter design is similar to the 2:1 and 3:2 reconfigurable SC converter design from Section 4.4. Hence, no new Pareto optimization is carried out for this design. However, the number of interleaved phases as well as the number of unit power stages per phase are increased. This is done to be able to deliver 10 W maximum output power, thereby making this design the highest output power on-chip SCVR presented to date.

#### 5.2.1 System Overview

The complete overview of the implemented SCVR is shown in Fig. 5.3. A 64-phase interleaving scheme of a 2:1 and 3:2 reconfigurable SC converter is employed. The feedback control is implemented as a single bound hysteretic control comprising a clocked comparator and a digital clock interleaver as described in Section 4.3. The feedforward control is implemented as depicted in Fig. 5.2 using a clocked comparator and the digital gear controller that dynamically changes the configuration (gear) of the interleaved SC converter units. An on-chip programmable load is also incorporated. The programmable load is configured externally by a digital configuration interface (not shown) by enabling the signals $e_{1..31}$. For no load, all signals $e_{1..31}$ are logic 0.

Figure 5.3: System overview of the 64-phase 2:1 and 3:2 reconfigurable SCVR. The feedforward control scheme works in conjunction with the single bound hysteretic control scheme (feedback control).

### 5.3 Third Hardware Results

The chip photo is shown in Fig. 5.4. Four instances of a 16-phase interleaved SC converter are implemented. Each instance is laid out with the feedback and feedforward controllers in the center and the load resistors at the converter perimeter. Despite the separation into four instances, all converters and controllers are operated in unison. The input power is supplied through the middle pad row for symmetrical power delivery to the chip. The double pad rows to the left are for Kelvin probing at various points on the chip. The layout of the 2:1 and 3:2 reconfigurable SC converter unit is reused from the design presented in Section 4.5 with only minor modifications. The top-level power grid is also reused from the previous design.

Figure 5.4: Chip photo and SC converter unit layout of the 10 W SCVR implemented in the 32 nm technology with deep trench capacitors. The total converter consists of four 16-phase SC converter instances, where $R$ denotes the programmable load, which is distributed among the converter instances. The total active converter area is 1.968 mm$^2$.

#### 5.3.1 Thermal Model

Before proceeding with the efficiency and power density measurements, a thermal model is developed to predict the on-chip temperature during operation. Since the on-chip resistance of the programmable load array cannot be measured when the converter is operating, the resistance is measured under 'cold' conditions, i.e. when the converter is not operating. For low-power implementations, the temperature increase is typically small and can in most cases be neglected. However, for high-power implementations, the temperature dependency of the on-chip resistance must be taken into account when estimating the converter’s output power, from which efficiency and power density are calculated.

A thermal model of the SCVR design is set up using the 3D-ICE thermal model simulator [93, 94]. The entire chiplet measures 3 mm × 3 mm, but the SCVR design takes up 2 mm × 2 mm and is placed in the lower left corner of the chiplet. Using the 3D-ICE simulator, the expected heat flux per region is mapped to the model. Since the load is integrated on the same chip as the converter, the entire input power is dissipated on the chip. A converter efficiency of 90% is assumed. Using the floorplan in Fig. 5.4, this means that 90% of the input power is uniformly dissipated in the load regions and 10% of the input power is uniformly dissipated in the converter regions. The model furthermore assumes a silicon die thickness of 780 µm and a thermal interface material (TIM) of thickness 20 µm that glues the chiplet to a 5 mm × 5 mm water cooled copper coldplate, where the cooling water is kept at a temperature of 27°C using a chiller.

The results of the 3D-ICE simulations are shown in Fig. 5.5. The simulated heat map for an input power of 10 W is shown in Fig. 5.5(a), where the color coding represents the temperature on the chip. As can be seen, the maximum on-chip temperature for 10 W is 69°C. Using the same simulation setup for other power levels, Fig. 5.5(b) shows the maximum and average die temperatures as a function of the input power. As seen, the cooling setup manages to keep the on-chip temperature below 70°C, which is considered appropriate for the measurement setup.

The resistance of the on-chip programmable load is measured using a 4-point measurement setup. Using the chiller to heat the water flowing through the coldplate to a predefined temperature, the chip is heated to approximately the same temperature which enables the characterization of the on-chip load resistance over temperature. Fig. 5.6 shows measured load resistances over temperature for 50% and 100% loads. A close to linear increase in measured on-chip resistance over temperature is observed. Similar measurements are carried out for all 32 resistance levels provided by the on-chip programmable load, and the measured temperature-correlated load resistances are used to determine the converter’s output power, efficiency, and power density discussed next.
Figure 5.5: Thermal model simulation results using 3D-ICE: (a) simulated heat map shown for $P_{in} = 10$ W; (b) maximum and average temperatures as a function of the input power.
Figure 5.6: Measured on-chip load resistance over temperature. Although only shown for 50% ($R_{15}$) and 100% ($R_{31}$) load, similar measurements are carried out for all 32 load levels provided by the on-chip programmable load.

#### 5.3.2 Measured Efficiency and Power Density

Measurements are carried out using GGB PicoProbe needles on the unpackaged chip die mounted on a probe station. The input and output voltages are measured using Kelvin contacts to account for the voltage drops of cable and contact resistances. An Agilent E3633A power supply is used as input supply. Except for the input supply, the measurement setup is similar to the one shown in Fig. 4.13. The input power $P_{in}$ is estimated using the current displayed on the input supply and the measured Kelvin input voltage. However, $P_{in}$ does not include the power consumption of the digital controller, as it is not possible to separate that power consumption from the total digital power consumption, which includes several housekeeping functions for testing that are not part of the digital controller. The output power is determined using the thermal model discussed above to take the resistance increase as a function of temperature into account. For a given load value, the input power is determined using the Kelvin contact to measure the on-chip input voltage and the input current displayed by the input supply.

The input power is used to estimate the maximum on-chip temperature at a specific load resistance. Hence, the measured efficiency taking temperature effects into account is

\[ \eta(T_{\text{max}}) = \frac{P_{\text{out}}(T_{\text{max}})}{P_{\text{in}}} = \frac{V_{\text{out}}^2}{R_{\text{load}}(T_{\text{max}}(P_{\text{in}}))} \cdot \frac{1}{P_{\text{in}}}, \tag{5.1} \]

where \( T_{\text{max}} \) is the maximum operating temperature in Celsius, and \( R_{\text{load}}(T_{\text{max}}(P_{\text{in}})) \) is the measured load resistance evaluated at the maximum die temperature from Fig. 5.6. The maximum die temperature \( T_{\text{max}}(P_{\text{in}}) \) is determined from the measured input power using the thermal model results in Fig. 5.5(b).
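The evaluation chain behind (5.1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the linear thermal model, the linear resistance model, and all numerical values (`R_TH_K_PER_W`, `R_REF_OHM`, `ALPHA_PER_K`) are hypothetical placeholders standing in for Fig. 5.5(b) and Fig. 5.6, not measured data from this work.

```python
# Sketch of Eq. (5.1): efficiency with a temperature-dependent load resistance.
# All constants are hypothetical placeholders, not values from the thesis.

T_REF_C = 30.0        # reference temperature in degrees Celsius
R_TH_K_PER_W = 5.0    # assumed slope of T_max over P_in (stand-in for Fig. 5.5(b))
R_REF_OHM = 0.100     # assumed load resistance at T_REF_C (stand-in for Fig. 5.6)
ALPHA_PER_K = 0.001   # assumed linear temperature coefficient of the load


def t_max(p_in_w):
    """Maximum die temperature for a given input power (assumed linear)."""
    return T_REF_C + R_TH_K_PER_W * p_in_w


def r_load(temp_c):
    """Load resistance at a given temperature (assumed linear increase)."""
    return R_REF_OHM * (1.0 + ALPHA_PER_K * (temp_c - T_REF_C))


def efficiency(v_out, p_in_w, temperature_aware=True):
    """Evaluate Eq. (5.1); optionally ignore heating and use R_load at 30 C."""
    temp_c = t_max(p_in_w) if temperature_aware else T_REF_C
    p_out_w = v_out ** 2 / r_load(temp_c)
    return p_out_w / p_in_w


eta_hot = efficiency(v_out=1.0, p_in_w=12.0)
eta_cold = efficiency(v_out=1.0, p_in_w=12.0, temperature_aware=False)
print(f"temperature-aware: {eta_hot:.3f}, temperature ignored: {eta_cold:.3f}")
```

With these placeholder numbers, ignoring the temperature rise yields the higher efficiency value, because the load resistance in the denominator of (5.1) is then too small.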

The measured efficiency over output power and power density for four different output voltages at \( V_{\text{in}} = 1.8 \) V is shown in Fig. 5.7. The efficiency at nominal load in the 2:1 configuration for \( V_{\text{out}} = 0.85 \) V is 83% at 1.9 W/mm\(^2\) power density. For the same load with \( V_{\text{out}} = 1.1 \) V, the efficiency is 85% at 3.2 W/mm\(^2\) power density. Finally, the 10 W output power is achieved at 84% efficiency and 5 W/mm\(^2\) power density for \( V_{\text{out}} = 1100 \) mV.
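As a rough plausibility check of the quoted numbers, the power delivered to a fixed load resistance should scale approximately with $V_{\text{out}}^2$. The short Python sketch below compares that expectation with the ratio of the two quoted power densities. It assumes that both figures refer to the same load level and the same converter area, and it ignores the temperature dependence of the load resistance.

```python
# Consistency check: for the same load, power density should scale roughly with V_out^2.
v_out_low, v_out_high = 0.85, 1.1   # output voltages in V (quoted in the text)
pd_low, pd_high = 1.9, 3.2          # quoted power densities in W/mm^2

expected_ratio = (v_out_high / v_out_low) ** 2  # ideal V^2 scaling
quoted_ratio = pd_high / pd_low                 # ratio of the quoted values

relative_deviation = quoted_ratio / expected_ratio - 1.0
print(f"V^2 scaling: {expected_ratio:.3f}, quoted: {quoted_ratio:.3f}, "
      f"deviation: {relative_deviation:+.1%}")
```

The two ratios agree to within about one percent, which is within the rounding of the quoted values.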

Also shown in gray-scale in Fig. 5.7 are the efficiency and output power calculation results when disregarding the influence of the temperature on the load resistance, i.e. when \( R_{\text{load}}(T_{\text{max}} = 30^\circ\text{C}) \) is considered in (5.1). As can be seen, both the efficiency and the output power are overestimated when disregarding the temperature effects, especially for high output powers.
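The size of this overestimation follows directly from (5.1). Assuming the same measured \( V_{\text{out}} \) and \( P_{\text{in}} \) are used in both evaluations, the ratio of the temperature-agnostic to the temperature-aware efficiency is

\[ \frac{\eta(30^\circ\text{C})}{\eta(T_{\text{max}})} = \frac{R_{\text{load}}(T_{\text{max}}(P_{\text{in}}))}{R_{\text{load}}(30^\circ\text{C})} \geq 1 . \]

The same factor applies to the estimated output power. Since the load resistance increases with temperature (Fig. 5.6) and the die temperature increases with input power (Fig. 5.5(b)), the factor grows with the input power, which is consistent with the larger deviation observed at high output powers.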

5.3.3 Measured Transient Response

Transient responses are measured using a 20 GHz, 50 GS/s Tektronix DSA72004 oscilloscope. The Kelvin contacts are probed using 40 GHz needles from GGB Industries, Inc., and 30 GHz Sucoflex cables are used to connect the probes to the oscilloscope.

The measured transient responses are shown in Fig. 5.8. Without the feedforward control as shown in Fig. 5.8(a), the sub-nanosecond feedback control maintains the output voltage for a short duration following the transient event. However, the collapse of the input voltage causes the output node to experience a large droop, which leads to a relatively low $V_{out,\text{min}}$. These results are in agreement with the previous transient responses shown in Fig. 4.16.

Figure 5.7: Measured efficiency over output power and power density for four different output voltages at $V_{in} = 1.8$ V. The maximum output power is 10 W at $V_{out} = 1100$ mV in the 3:2 configuration. The gray-scale results are disregarding the influence of the temperature on the load resistance when estimating the converter output power and efficiency.

With the feedforward control as shown in Fig. 5.8(b), the reconfigurable SC converter dynamically changes from the 2:1 to the 3:2 configuration when the input voltage droop is detected. As observed, the resulting output voltage droop is significantly reduced, leading to an improved $V_{out,\text{min}}$. For this design, the voltage overhead is reduced by 60 mV, which can be used to reduce the steady state output voltage and still comply with $V_{out,\text{min}}$ requirements.
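The gear change described above can be illustrated with a small behavioral sketch in Python. It is not the implemented digital controller: the class name `FeedforwardGear`, the droop threshold value, and the latching behavior are assumptions made for illustration. Only the ideal (no-load) conversion ratios of 1/2 for the 2:1 and 2/3 for the 3:2 configuration are standard properties of these topologies. The return to the 2:1 configuration is handled by the gear controller of Section 5.1 and is not modeled here.

```python
from dataclasses import dataclass

# Ideal (no-load) conversion ratios V_out / V_in of the two configurations.
IDEAL_RATIO = {"2:1": 1.0 / 2.0, "3:2": 2.0 / 3.0}


@dataclass
class FeedforwardGear:
    """Behavioral sketch: switch to the higher ratio when V_in droops."""

    v_in_droop_threshold: float = 1.7  # hypothetical detection threshold in V
    gear: str = "2:1"

    def update(self, v_in: float) -> str:
        """Latch the 3:2 configuration once an input droop is detected."""
        if self.gear == "2:1" and v_in < self.v_in_droop_threshold:
            self.gear = "3:2"
        return self.gear

    def ideal_v_out(self, v_in: float) -> float:
        """Upper bound on V_out for the active configuration."""
        return IDEAL_RATIO[self.gear] * v_in


ctrl = FeedforwardGear()
for v_in in (1.8, 1.75, 1.6, 1.65):  # hypothetical input voltage samples
    gear = ctrl.update(v_in)
    print(f"V_in = {v_in:.2f} V -> {gear}, ideal V_out <= {ctrl.ideal_v_out(v_in):.3f} V")
```

The sketch shows why the higher ratio helps during a droop: for a drooped input of 1.6 V, the ideal output bound is 0.8 V in the 2:1 configuration but about 1.07 V in the 3:2 configuration.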

As furthermore seen in Fig. 5.8(b), the input voltage droop is worsened by the feedforward control. However, from an application point of view, ensuring that the output voltage always remains above $V_{out,\text{min}}$ is all that matters. Furthermore, the larger ripple after the transient event is a result of the converter being in the 3:2 configuration, which, for that configuration, is a relatively low output voltage operation with higher ripple. From an application point of view, this ripple is not considered to be critical since digital loads such as microprocessor cores are inherently insensitive to supply noise as long as the supply voltage remains within the allowable tolerance band. Alternatively, additional output decoupling capacitors could be added to further minimize the output voltage ripple. Although not shown, the converter transitions back to the more efficient 2:1 configuration once the transient has settled completely using the pulse skipping scheme of the digital gear controller discussed in Section 5.1. In conclusion, the feedforward control is an enabler for per-core DVFS with improved $V_{\text{out, min}}$, which, as shown in Fig. 1.3(c), has the potential to save significant amounts of compute energy in future multi-core and many-core microprocessor systems.

5.4 Summary

A novel feedforward control for reconfigurable SC converters is presented. The feedforward control dynamically changes the configuration of the converter to a higher voltage conversion ratio when an input voltage droop is detected. As seen in Fig. 5.8, the feedforward control reduces the output voltage droop from 90 mV to 30 mV, thereby improving $V_{\text{out, min}}$ by 60 mV without the use of dedicated input or output decoupling capacitors.

To account for the change of on-chip resistance with temperature, a thermal model is developed to predict the on-chip temperature. Correlating the measured on-chip load resistances with the operating temperature allows for a more accurate efficiency estimation at high output powers. Measurements of the third SC converter design demonstrate 1) maximum efficiencies above 85%, 2) power densities above $2.5 \text{ W/mm}^2$, 3) transient responses faster than 1 ns with reduced $V_{\text{out, min}}$ overhead, and 4) output powers up to 10 W.

The two key learnings from this chapter are:

- The feedforward control for reconfigurable SC converters reduces the voltage overhead required to meet microprocessor $V_{\text{out, min}}$ requirements. This can lead to significant energy savings in future multi-core and many-core microprocessor systems.
- The feasibility of high-power on-chip SCVR designs is demonstrated experimentally by achieving 10 W maximum output power.
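The link between voltage overhead and energy savings in the first key learning can be made concrete with a simple estimate. The Python sketch below assumes that the dynamic energy per operation of a digital load scales with the square of the supply voltage, which is a common first-order model and not a result of this chapter. The 60 mV overhead reduction is taken from the measurements, whereas the 0.91 V starting voltage is a hypothetical example value.

```python
def dynamic_energy_saving(v_before, overhead_reduction):
    """Relative dynamic energy saving, assuming energy per operation ~ V^2."""
    v_after = v_before - overhead_reduction
    return 1.0 - (v_after / v_before) ** 2


# Hypothetical steady-state supply of 0.91 V lowered by the measured 60 mV.
saving = dynamic_energy_saving(v_before=0.91, overhead_reduction=0.060)
print(f"Estimated dynamic energy saving: {saving:.1%}")
```

Under these assumptions the estimate is on the order of ten percent. Leakage and frequency effects are not included.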

Based on the third hardware design, SCVRs now enable all benefits of granular microprocessor power delivery with per-core regulation from Fig. 1.2. Furthermore, the specifications set out in Tab. 1.1 are met. Therefore, SC converters can now be considered as 1) high efficiency, 2) wide output voltage range, 3) easy to regulate, and 4) high-power converters.
Conclusions

FOR THE APPLICATION of power delivery for high-performance multi-core and many-core microprocessor systems, on-chip voltage regulators (OCVRs) can be incorporated to provide granular power delivery with per-core regulation, thereby enabling significant overall system energy and power savings. According to the 2013 International Technology Roadmap for Semiconductors (ITRS), the performance, and thereby the energy consumption, of future microprocessor systems continues to increase. Furthermore, the ever-decreasing supply voltages lead to increasing supply currents that are challenging to supply efficiently through the power delivery network. Hence, the potential energy and power savings provided by granular power delivery with per-core regulation enable future microprocessor systems to scale without hitting energy and power walls that would otherwise limit the scaling.

This thesis treats the design, analysis, and implementation of OCVRs for granular microprocessor power delivery. On-chip inductors for integrated buck converters are modeled to predict their efficiency and power density performances. Also, on-chip switched capacitor (SC) voltage regulators (SCVRs) are considered. The main experimental results in this thesis include three on-chip SCVRs implemented in a 32 nm SOI CMOS technology that features the high-density deep trench capacitor. Experimental verifications of these designs simultaneously achieve a 0.7 V – 1.1 V output voltage range for a fixed 1.8 V input supply, > 85% efficiency, > 2 W/mm² power density, up to 10 W output power, < 1 ns transient response time, and reduced voltage overhead while maintaining a certain minimum supply voltage $V_{\text{out}, \text{min}}$. The on-chip SCVRs presented in this thesis therefore meet the design specifications that enable granular microprocessor power delivery with per-core regulation.

The key learnings from this thesis are:

▶ On-chip inductors using the top metal layers of the metal stack achieve inadequate efficiency and power density performances. The main reason is the limited winding thickness given by the design rules of the semiconductor technology.

▶ Microfabricated inductors manufactured either with or without magnetic materials using additional post-processing manufacturing steps achieve attractive efficiency and power density performances.

▶ The deep trench capacitor, which is available for instance in the 32 nm SOI CMOS technology used in this thesis, is a game changer with respect to SCVR efficiency and power density due to its high capacitance density and low parasitic bottom plate capacitance.

▶ The parasitic bottom plate capacitor, which is included in the state space model framework developed in this thesis, influences both the steady state operation and efficiency of SC converters.

▶ The state space model framework is suited for a Pareto optimization analysis of SC converters.

▶ Reconfigurable SC converter power stages efficiently widen the supported output voltage range for a fixed input supply.

▶ Sub-nanosecond response times to transient load changes are feasible using a high number of interleaved stages and the single bound hysteretic control scheme clocked at gigahertz frequencies.

▶ The novel feedforward control for reconfigurable SC converters can reduce the voltage overhead required to maintain a certain minimum output voltage of the microprocessor core under all loading and transient conditions. Reducing the voltage overhead reduces the energy consumption per computation, thereby enabling significant energy savings in future microprocessor systems.

▶ SC converters are, contrary to common belief, not limited to low-power applications. This is demonstrated by the 10 W maximum output power achieved in the third converter design.

6.1 State of the Art – Year 2015 Landscape

Based on the 2010 state of the art overview shown in Fig. 1.7, an updated overview featuring OCVR designs published up until the beginning of 2015 is shown in Fig. 6.1. The new state of the art overview includes the three converter designs discussed in this thesis, and the 2D, 3D, and 2.5D integration levels follow the definitions from Fig. 1.5. Recall that the levels of integration are defined with respect to the load, such that 2D integration is with the load and the converter on the same die, 2.5D with the switches and control on the load die but with the passives on a separate interposer or in the laminate, and 3D integration with the converter (including passive components) on a die separate to the load. The maximum performance contours from the 2010 state of the art overview are added to illustrate the evolution of on-chip power converters.

Comparing the quoted efficiency and the corresponding power density shown in Fig. 6.1(a) in 2015 with the 2010 state of the art overview shown in Fig. 1.7(a), SC converters have now filled the power density gap between $0.02 \text{ W/mm}^2$ and $0.8 \text{ W/mm}^2$, and buck converters have filled the power density gap between $0.2 \text{ W/mm}^2$ and $1.5 \text{ W/mm}^2$. As seen, the first [3], second [5], and third [6] SCVR designs presented in the thesis place themselves among the highest efficiency and highest power density converters published to date.

Comparing the quoted efficiency and the maximum output power for 2015 shown in Fig. 6.1(b) with 2010 shown in Fig. 1.7(b), SC converters are no longer limited to 10 mW output power. For SC converters, the maximum output power achieved in the third converter design [6] is about an order of magnitude higher than other published SC converters. It is therefore no longer valid to claim that SC converters are limited to low-power applications.

Thanks in part to the SCVR designs presented in this thesis, the clear performance separation between buck converters and SC converters observed in the 2010 state of the art overview in Fig. 1.7 is smeared out for the 2015 state of the art overview in Fig. 6.1. Hence, neither the buck converter nor the SC converter should be ruled out based solely on electrical specifications. Instead, one must assess other parameters such as semiconductor technology, integration level, cost, complexity, etc. to choose the best converter topology for a given application.

Figure 6.1: 2015 state of the art overview of published OCVRs. The overview includes publications covering the first [3], second [5], and third [6] SCVR designs of this thesis. Comparing with the 2010 state of the art contours, both buck converters and SC converters have improved significantly in efficiency, power density, and output power.

6.2 Outlook

Based on the concepts and results presented in this thesis, research and further investigations within OCVRs could go in the following directions:

▶ The 32 nm SOI CMOS technology used throughout this thesis is no longer the latest semiconductor technology used in high-performance microprocessor systems. Porting the designs presented in this thesis to 14 nm is therefore a logical next step, and further improvements, especially in power density and transient response time, are expected from the technology scaling.

▶ As motivated in Section 1.1.1, the well-cited and often used results in [50] predict up to 21% system efficiency improvement using OCVRs with per-core DVFS. The efficiency improvements are estimated based on state of the art on-chip buck converters from 2008. Revising the analysis with recent state of the art, e.g. the SCVRs presented in this thesis, is expected to result in even more attractive energy and power savings estimations.

▶ The 1.8 V input supply is an issue with the 1.2 V maximum voltage of the transistors in the 32 nm semiconductor technology used in this thesis. The stacking of two transistors in the gate driver and SC converter power stage has proved to be a useful and robust implementation. Going to higher input voltages is attractive since it would further reduce the supply currents in the PDN.

▶ Exploration of hybrid converter topologies is receiving increasing attention since the best of both worlds of buck converters and SC converters can be achieved. Examples include the merged converter topology [53], the 3-level buck converter [36, 43], or resonant SC converters [54, 56]. All these topologies show great potential for certain OCVR applications.

▶ On-chip step-up converters are attractive in applications where the supply current is not a main issue or the number of voltage domains is strictly limited. For instance, a step-up converter could supply high-voltage domains like I/Os or embedded memory from the microprocessor’s nominal supply voltage, thereby not requiring separate supply voltages for these domains.


[39] M. Wens and M. Steyaert, “A fully-integrated 0.18 µm CMOS DC-DC step-down converter, using a bondwire spiral inductor,” in *Proc. of the IEEE Custom Integrated Circuits Conference (CICC)*, San Jose, CA, USA, Sep. 2008, pp. 17–20.

[40] M. Wens and M. Steyaert, “A fully-integrated 130 nm CMOS DC-DC step-down converter, regulated by a constant on/off-time control system,” in *Proc. of the IEEE European Solid-State Circuits Conference (ESSCIRC)*, Edinburgh, United Kingdom, Sep. 2008,

[41] M. Wens and M. Steyaert, “A fully integrated CMOS 800 mW four-phase semiconstant ON/OFF-time step-down converter,” *IEEE Transactions on Power Electronics*, vol. 26,

[43] W. Kim, D. Brooks, and G.-Y. Wei, “A fully-integrated 3-level DC-DC converter for nanosecond-scale DVFS,” *IEEE Journal of Solid-State Circuits*, vol. 47,

[44] J. Kwong, Y. Ramadass, N. Verma, M. Koesler, K. Huber, H. Moormann, and A. Chandrakasan, “A 65 nm sub-Vt microcontroller with integrated SRAM and switched-capacitor DC-DC converter,” in *Proc. of the IEEE International Solid-State Circuits Conference (ISSCC)*, San Francisco, CA, USA, Feb. 2008, pp. 318–616.

step-up and step-down DC/DC converters in 32 nm SOI with opportunistic
current borrowing and fast DVFS capabilities,” in *Proc. of the IEEE Asian


Available: www.itrs.net

impact of packaging and application properties on the memory and power walls,” in *Proc. of the IEEE Int. Symp. on Low Power Electronics and Design

teus, R. H. Dennard, and W. Haensch, “Practical strategies for power-efficient
215–236.


[90] T. V. Breussegem and M. Steyaert, “A 82% efficiency 0.5% ripple 16-phase fully integrated capacitive voltage doubler,” in *Proc. of the IEEE Symposium on VLSI Circuits (VLSIC)*, Kyoto, Japan, June 2009, pp. 198–199.


Curriculum Vitae
Toke Meyer Andersen

Contact information:
Company
Nordic Power Converters,
Smedeholm 13A, 2730 Herlev, Denmark
toke@nopoc.com,
Tel: +45 60 630 670

Private
Toke Meyer Andersen,
Bygmestervej 29, 5. Tv., 2400 København NV, Denmark,
tokeandersen@hotmail.com,
Tel: +45 20 97 20 23

Personal information:
Date & place of birth 2. February 1986, Copenhagen, Denmark.
Nationality Danish
Languages Danish (native), English (fluent), German (fluent)

Working experience:
Since May 2015 Co-Founder and Senior R&D Engineer at Nordic Power Converters
Oct. 2010 – Feb. 2015 PhD at ETH Zurich, Power Electronic Systems Laboratory (PES) under Prof. Dr. Johann W. Kolar. The PhD project was carried out in collaboration with IBM Research – Zurich.
Teaching assistant at ETH Zurich:
Modelierung Mechatronische Systeme (1 semester)
Netwerke und Schaltungen Praktikum (4 semesters)

Sep. 2010 (1 month) Research Assistant
Technical University of Denmark by Prof. Michael A. E. Andersen.

2008 – 2009 Teaching assistant at the Technical University of Denmark:
Integrated Analog Electronics (2 semesters)
CMOS RF Integrated Circuits (2 semesters)

Education:
2010 – 2015 PhD in Electrical Engineering
ETH Zurich, Switzerland; IBM Research – Zurich, Switzerland

2008 – 2010 M.Sc. in Electrical Engineering
DTU; Bang & Olufsen A/S

2005 – 2008 B.Sc. in Electrical Engineering
Thesis: “Programmable Class D Audio Amplifier Integrated in CMOS Technology”
DTU; Oticon A/S