# Investigation of High-Fidelity Clock Generation and Distribution for Microprocessor Applications

Siva Thyagarajan, Michael Lorek {sivavth,mlorek}@eecs.berkeley.edu

Abstract—As clock speeds have entered into the Gigahertz regime, constraints on clock jitter and duty cycle specifications have become more stringent. This presents unique challenges in the generation and distribution of accurate, balanced clock signals. In this paper, we present an in-depth analysis of the state-of-the-art techniques used to obtain high frequency 50% duty cycle clocks. The circuit examined consists of analog duty cycle corrector and detector blocks that mitigate duty cycle errors arising from device mismatches. Simulation results with real, imbalanced Phase Locked Loop clocks confirm the effectiveness of the technique to within 0.1% error in output duty cycle.

#### I. INTRODUCTION

Clock distribution in microprocessors has become an increasingly complex design challenge as computation has pushed into the Gigahertz frequency range. The short clock periods associated with GHz-frequency signals have forced clock transition accuracy specifications towards the low picosecond range. Novel circuit methods have been implemented inside and outside the traditional Phase Locked Loop (PLL) circuit to synthesize higher-fidelity, GHz-range clock signals. Such clock conditioning techniques enable higher-performance computing; digital logic operating frequencies can be increased as a result of reduced clock uncertainty. These techniques typically perform jitter reduction and duty cycle correction.

Clock jitter is defined as the deviation of a clock's output transition from its ideal position, and can be broken down into contributions from multiple sources [1]. Random jitter originates from random noise in electronic components and follows a Gaussian distribution. Deterministic jitter can be attributed to a specific source, such as cross-talk due to parasitic coupling capacitances, and is usually data- or operation-dependent. Total jitter is generally specified as an RMS value, which is obtained by sampling a large number of clock periods and plotting the jitter distribution. Many adverse effects due to jitter, including the direct effect of clock uncertainty on the timing constraints of a logic path, have made the minimization of PLL clock jitter in high-performance systems a topic of great interest.

A clock's duty cycle is the percentage of the signal period during which the wave is a logical "1". For differential signals, the duty cycle is taken as the percentage of the clock period during which the differential signal is greater than zero. Duty cycle errors largely originate from mismatch between pairs of devices. This mismatch is inversely proportional to $\sqrt{WL}$ and has become increasingly significant in deep sub-micron processes [2]. Circuits such as A/D converters and NORA logic blocks perform computation during both the positive and negative clock phases and are hence very sensitive to mismatch and duty cycle imbalances. Consequently, the synthesis of high-frequency clocks with precise 50% duty cycles is very important in VLSI circuits and is the focus of the analysis detailed in this paper.

The paper is organized as follows: Section II summarizes the literature in this area and examines current techniques for clock conditioning. Section III presents an in-depth analysis, circuit implementation and simulation results of the duty cycle correction technique. Section IV concludes our work.

#### II. SUMMARY OF THE STATE-OF-THE-ART

#### A. System-Level Microprocessor Clock Distribution

Considerable engineering effort goes into optimizing the system-level clock distribution networks of modern microprocessors. These implementations generally include multiple clock domains with various clock uncertainty correction schemes. This enables distributed clocks to meet multiple fidelity specifications and thus be used for different applications [3], [4]. The following paragraphs discuss Intel's most recent state-of-the-art clocking architecture.

Intel's Nehalem clocking micro-architecture, introduced in 2009, is a modular design with features to enable fast local PLL re-locking from low-power states and low jitter, consequently promoting improved system performance and enabling lower-power operation [4]. A central PLL generates 1X, 2X, and 4X multiples of the external reference frequency; these clocks are distributed to various local PLLs across the chip. The central PLL is designed to have a low bandwidth in order to attenuate high-frequency jitter on the reference clock. Higher reference frequency inputs enable higher local PLL loop bandwidths without compromising stability [5]. This results in faster lock time and reduced long-term jitter. Intel also employs two clock conditioning schemes in their Nehalem architecture: Adaptive Frequency System (AFS) and Duty Cycle Correction.

The AFS system aims to adjust the VCO operating frequency to maintain timing margins in the presence of digital voltage supply droops. This system actively monitors first-order digital voltage supply droops and adjusts the VCO analog supply voltage and thus the VCO output frequency. Therefore, when logic supply levels are lowered, propagation delays increase, but logic clock frequencies are reduced such that timing constraints are not violated. This enables faster clock speeds in comparison to conventional designs that simply reduce clock frequency to account for worst-case digital voltage supply variation [4].

Intel supports this clocking system architecture with results from chips fabricated in a 45 nm process. The data show a 25% decrease in PLL lock time and 20% lower RMS long-term jitter when using 2X local PLL reference clock inputs [4]. Increasing the reference clock input to 4X mode reduces lock time and RMS jitter by 56% and 30%, respectively. There is no published data showing the performance of the Nehalem DCC circuit. An in-depth analysis of such circuits follows in Section III.

#### B. PLL Jitter Reduction Techniques

In order to achieve low jitter operation in PLLs, various techniques have been proposed in literature. The efficacy of a proposed solution is determined by the amount of jitter reduction, the extra overheads (in terms of power and area), and most importantly its robustness to process and temperature variations.



Fig. 1. Integration of loop filter resistor R [6]

One of the earliest suggested techniques is self-biasing. As the input frequency to the PLL changes, a constant loop bandwidth can constrain its jitter performance. With self-biasing, the loop bandwidth of the PLL is made to track the reference frequency. This is achieved by keeping $R\sqrt{I_{cp}}$ constant, where $I_{cp}$ is the charge pump current and $R$ is the resistor in the loop filter. Fig. 1 shows the general idea for the loop filter resistance implementation in [6], [7]. The loop filter is decoupled into two circuits, requiring two charge pump currents. In [6], a differential delay element has been used to minimize jitter due to supply variations. This implementation translates to the modified PLL architecture shown in Fig. 2, where a second charge pump implements the zero through the loop filter transformation. Using a quantitative analysis, the loop bandwidth and damping factor can be shown to be proportional to the square root of the ratio of the loop filter and VCO output capacitances. Systematic variations in the capacitance values over process and temperature therefore do not alter the loop parameters, as this ratio is well matched.



Fig. 2. Modified PLL architecture with tracking bandwidth and damping factor [6]

A similar technique can be used in a PLL implemented using a ring oscillator VCO with a regulated supply voltage [7]. The theory of its operation is similar to [6], except that an amplifier is used in unity gain feedback to regulate the VCO control voltage. Square law behavior has been assumed for these transistors. The charge pump currents  $I_{cp1}$  and  $I_{cp2}$  are scaled independently to achieve optimum characteristics.

Various adaptive techniques have also been published in the literature. These techniques minimize jitter during system operation and calibration. The total output RMS jitter of a PLL is dominated by the input and VCO jitter contributions, which add as variances. Analytically, the total jitter is found to be:

$$\sigma_{tot}^2 = \sigma_{in}^2 + \sigma_{VCO}^2 \tag{1}$$

where $\sigma_{tot}$, $\sigma_{in}$ and $\sigma_{VCO}$ are the standard deviations of the total, input and VCO jitter, respectively [8], [9].
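
As a minimal sketch of (1), the snippet below combines two uncorrelated RMS jitter contributions in quadrature. The function name and the picosecond values are hypothetical and chosen only for illustration; they are not taken from [8], [9].

```python
import math


def total_rms_jitter(sigma_in: float, sigma_vco: float) -> float:
    """Combine uncorrelated RMS jitter terms as in equation (1).

    Both inputs must use the same time unit; the result is in that unit.
    """
    # sqrt(sigma_in^2 + sigma_vco^2)
    return math.hypot(sigma_in, sigma_vco)


if __name__ == "__main__":
    # Hypothetical contributions in picoseconds (illustration only).
    print(total_rms_jitter(3.0, 4.0))
```

Because the terms add as variances, the larger contribution dominates the total, which is why loop bandwidth is chosen to trade input jitter against VCO jitter.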



Fig. 3. Jitter measurement using dead-zone window [10]

The adaptive technique in [10] digitally varies the natural frequency $\omega_n$ and the loop filter zero $\omega_z$. Fig. 3 shows the jitter measurement using a dead-zone technique in which the data is sampled at the data transitions and edges. The number of transitions outside the dead zone (set by the data edges) for a given total number of transitions yields an estimate of the total jitter. This allows very low jitter to be measured without circuits operating on the time scale of the jitter itself (typically *ps*). The jitter measurement is performed off chip, and the jitter is minimized using a table-lookup or gradient-descent method. Factors that need to be considered include algorithm convergence, charge injection from the digital tuning circuit, and the achievable lock range.
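
The counting idea can be illustrated with a short sketch. It assumes zero-mean Gaussian jitter and a symmetric dead zone of known half-width, and it inverts the Gaussian tail probability to recover an RMS value. This is an illustrative estimator under those assumptions, not necessarily the procedure used in [10]; the function name and parameters are hypothetical.

```python
from statistics import NormalDist


def sigma_from_dead_zone(n_outside: int, n_total: int, half_width: float) -> float:
    """Estimate RMS jitter from the share of edges outside +/- half_width.

    Assumes zero-mean Gaussian jitter, for which
    P(|t| > w) = 2 * (1 - Phi(w / sigma)).
    """
    if not 0 < n_outside < n_total:
        raise ValueError("need 0 < n_outside < n_total")
    p_outside = n_outside / n_total
    # Number of standard deviations spanned by the dead-zone half-width.
    z = NormalDist().inv_cdf(1.0 - p_outside / 2.0)
    return half_width / z


if __name__ == "__main__":
    # Hypothetical counts: roughly 4.55% of edges outside a +/-10 ps window.
    print(sigma_from_dead_zone(45_500, 1_000_000, 10.0))
```

The estimate needs only a counter and a fixed window, which is the appeal of the dead-zone approach: no circuit has to resolve time at the scale of the jitter itself.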

On-chip measurement of jitter using the dead-zone technique has been implemented in [11] using voltage controlled delay lines (VCDL) and edge comparison circuits. The noise contributed by the jitter estimation circuitry is uncorrelated with the PLL noise and thus is a systematic error in the jitter estimation. As before, the control voltage of the VCDL is updated using a jitter estimation algorithm. Recent techniques for jitter minimization use two paths in a third-order loop filter. The slow path allows the charge pump and loop filter noise to be filtered by around 70% [3]. The fast path in this design allows for a shorter lock time when the PLL wakes up from power save modes. The use of digital state machines, voltage regulators and bandgap reference voltages helps to further reduce variations in the VCO control voltage. The loop parameters are also varied to minimize jitter, as discussed earlier.

#### C. Duty Cycle Correction

Duty cycle correction techniques for clock distribution circuits have been used in practice for many years [2], [4], [12]–[15]. The practical efficacy of these techniques can be seen in Fig. 4. A conventional technique to enforce a 50% clock duty cycle runs the PLL VCO at twice the desired operating frequency and then divides the output by two [14], [15]. However, this technique wastes considerable power due to the extra switching. Improved analog and digital DCC implementations have been shown to work in practice, with different implementations for various clocking applications [4]. The current implementations of these DCC circuits are introduced in this section.



Fig. 4. DCC circuit input and output duty cycle histogram [2]

A block diagram of a digital DCC control loop is shown in Fig. 5. A timing path with a variable setup time drives a latch that can be configured as active high or active low [16]. The setup time of the timing path can be swept to deduce the high and low clock phase times, and thus the input clock duty cycle. DCC adjustments are made in high-resolution digital steps, 1.25 ps in [4], driven by a digital state machine. A patent is currently pending on this duty cycle adjustment circuit and Intel has not published any results [16].



Fig. 5. Digital DCC loop implementation [4]

Fig. 6 shows a conceptual block diagram of an analog feedback duty cycle correction technique. The feedback loop error signal is generated in the detector circuit; the detector functions as an integrator to compare the positive and negative clock phases. The corrector circuit uses the error signal to create a steady-state current imbalance in the output clock generation circuit. This current imbalance changes the differential quiescent DC clock output voltages and thus a 50% differential output clock duty cycle can be achieved [2]. The integration of the described technique into a PLL system can be seen in Fig. 7. This DCC implementation has been shown to be very effective. Results from a fabricated $0.6\,\mu$m CMOS chip show an output duty cycle variation of 0.21% across input duty cycles of 20% to 80%, at 4 MHz [2]. A thorough analysis of this technique is presented in the following sections of this paper.

Fig. 6. Feedback mechanism in the duty cycle correction loop. [17]



Fig. 7. PLL system with analog duty cycle corrector circuit [4]

#### III. ANALYSIS

As detailed in the previous sections, a Duty Cycle Correction (DCC) technique is necessary for the synthesis of robust, high-frequency clock signals. To characterize the analog DCC technique proposed in [2], [4], a PLL with an output frequency of 3.2 GHz was designed using Verilog-A blocks and transistor-level circuitry. 32 nm Low Power Predictive Technology Models (PTM) were used for the transistor-level circuitry. The DCC circuit is also implemented as a combination of ideal blocks and transistor-level circuits. A mathematical analysis of the DCC control loop is presented, and the technique is verified using Cadence Spectre to simulate an integrated PLL and DCC system.

#### A. PLL Circuit Design

The 3.2 GHz VCO output frequency was chosen for our PLL design and subsequent analysis because most processors today operate near this frequency. The PLL has a 2X locking range (1.6 GHz - 3.2 GHz), with a reference frequency of 200 MHz. The natural frequency of the PLL was chosen to be 1/10 of the input reference frequency for stability reasons [5]. The PLL design consists of a phase frequency detector (PFD) with $K_{pfd} = \frac{1}{2\pi}$, a charge pump with current $I_{cp} = 20\,\mu$A, and a differential voltage controlled ring oscillator with $K_{vco} = 17 \text{ GHz/V}$. The PLL is a third order system with a second order loop filter. The second pole in the loop filter was added to attenuate reference clock feedthrough to the VCO output. With feedback divider ratio $N$ and loop filter components $R$ and $C_1$, the natural frequency and damping factor of the loop are

$$\omega_N = \sqrt{\frac{K_{pfd} K_{vco} I_{cp}}{NC_1}} \tag{2}$$

and

$$\zeta = \omega_N R C_1 / 2 \tag{3}$$

Designing for a phase margin of around 60 degrees, a damping factor $\zeta = 0.707$ was chosen. With $\zeta$ and the PLL natural frequency described above, the loop filter component values follow from the type-II PLL equations (2) and (3). This results in filter component values of $R = 7.897\,\mathrm{k\Omega}$ and $C_1 = 1.425\,\mathrm{pF}$. To find the second pole capacitor value, the unity gain frequency ($\omega_u$) of the open loop PLL transfer function was calculated. Placing the second pole at $2\omega_u$ to reduce the reference feedthrough, $C_2 = 475.13\,\mathrm{fF}$ was chosen.
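
The sizing procedure can be sketched numerically. The snippet below solves (2) for $C_1$ and (3) for $R$, assuming $K_{vco}$ is converted from Hz/V to rad/s/V and that the divider ratio is $N = 3.2\,\text{GHz} / 200\,\text{MHz} = 16$. The function names are hypothetical. Because the inputs are the rounded figures quoted in the text, the computed $C_1$ (about 1.35 pF) differs slightly from the quoted 1.425 pF, while applying (3) to the quoted $C_1$ reproduces $R \approx 7.9\,\mathrm{k\Omega}$.

```python
import math


def resistor_for(c1_f: float, zeta: float, omega_n: float) -> float:
    """Solve equation (3), zeta = omega_n * R * C1 / 2, for R in ohms."""
    return 2.0 * zeta / (omega_n * c1_f)


def type2_loop_filter(f_ref_hz, n_div, k_vco_hz_per_v, i_cp_a, zeta,
                      fn_over_fref=0.1):
    """Return (C1 in farads, R in ohms) from equations (2) and (3).

    Assumes K_pfd = 1/(2*pi) and a natural frequency set to a fixed
    fraction of the reference frequency.
    """
    omega_n = 2.0 * math.pi * fn_over_fref * f_ref_hz
    k_pfd = 1.0 / (2.0 * math.pi)
    k_vco = 2.0 * math.pi * k_vco_hz_per_v  # rad/s per volt
    c1 = k_pfd * k_vco * i_cp_a / (n_div * omega_n ** 2)
    return c1, resistor_for(c1, zeta, omega_n)


if __name__ == "__main__":
    c1, r = type2_loop_filter(200e6, 16, 17e9, 20e-6, 0.707)
    print(f"C1 = {c1 * 1e12:.3f} pF, R = {r / 1e3:.3f} kOhm")
    # Applying (3) to the C1 value quoted in the text.
    omega_n = 2.0 * math.pi * 20e6
    print(f"R for C1 = 1.425 pF: {resistor_for(1.425e-12, 0.707, omega_n):.0f} Ohm")
```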

Except for the PFD and the feedback divider, which were implemented using Verilog-A, the rest of the PLL blocks were implemented using transistor-level circuitry. Fig. 8 shows the differential current starved voltage controlled oscillator unit cell. The voltage  $V_{ctrl}$  controls the amount of current flowing through the inverter, thus adjusting the propagation delay of the inverter and hence the speed of the oscillator. The ring oscillator consists of six unit cells with the differential output of the final stage connected in a cross-coupled fashion to its input. Cross-coupled, minimum-sized inverters were added between positive and negative inverter chains after each stage to enforce differential switching.



Fig. 8. Schematic of a unit cell of a current starved voltage controlled oscillator

#### B. Duty Cycle Corrector and Detector

The analog duty cycle corrector and detector circuits introduced in Section II are discussed in the following text. It must be noted that this circuit assumes small signal inputs, as it relies on varying the common mode of the differential output clocks. Hence, we can expect two types of duty cycle errors at the output due to differential to single-ended conversion. The final clock required for distribution is generated using a comparator which outputs a rail-to-rail swing. Fig. 9 shows a typical scenario where the clocks can have 50% duty cycle but their DC offsets are different; sinusoidal voltages are shown here for simplicity. The second case, Fig. 10, shows that the common mode voltages may be equal but their duty cycles remain imbalanced. We observe that the duty cycle problem can be solved if the DC values of these waveforms are made equal. However, it must be noted that the maximum duty cycle correction that can be applied depends on the values of the rise time $(t_{rise})$ and fall time $(t_{fall})$ relative to the pulse width. The maximum possible duty cycle correction is $t_{rise} + t_{fall}$.
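
To make the first error type concrete, the sketch below computes the duty cycle seen by an ideal comparator when a sinusoidal differential clock carries a DC offset, as in the simplified sinusoidal picture of Fig. 9. The ideal sinusoid, the function name and the numerical values are assumptions for illustration only.

```python
import math


def sine_duty_cycle(v_offset: float, amplitude: float) -> float:
    """Fraction of the period for which amplitude*sin(wt) + v_offset > 0.

    Models an ideal comparator slicing a sinusoidal differential clock
    that carries a DC offset between its two legs.
    """
    ratio = v_offset / amplitude
    if abs(ratio) >= 1.0:
        # The offset exceeds the swing, so the comparator never toggles.
        return 1.0 if ratio > 0 else 0.0
    return 0.5 + math.asin(ratio) / math.pi


if __name__ == "__main__":
    # Hypothetical 200 mV amplitude with offsets of 0 mV and 60 mV.
    for offset_mv in (0.0, 60.0):
        print(offset_mv, round(100.0 * sine_duty_cycle(offset_mv, 200.0), 2))
```

The mapping shows why slow edges matter: the shallower the slope at the zero crossing, the more duty cycle shift a given DC offset produces, and the more correction range a common-mode adjustment provides.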



Fig. 9. DC offset correction with 50% duty cycle waveforms



Fig. 10. Duty cycle correction with non 50% duty cycle waveforms

1) Loop transfer function: We performed an analysis of the duty cycle corrector loop by linearizing the circuit under steady state conditions. We assume that the gain of the corrector loop is  $G_{corr}$  and the detector gain is  $G_{det}$ . With input and output clocks  $v_{clk,in}$  and  $v_{clk,out}$ , respectively, we get the following equations:

$$v_{clk,out} = G_{corr}(v_{clk,in} - v_{err}) \tag{4}$$

where  $v_{err}$  is the error voltage generated by the detector, and is integrated onto capacitance  $C_d$ . Therefore,

$$v_{err} = \frac{G_{det}}{sC_d} v_{clk,out} \tag{5}$$

Using (4) and (5), we obtain

$$v_{clk,out} = \frac{sC_dG_{corr}}{sC_d + G_{corr}G_{det}} v_{clk,in} \tag{6}$$

Since $v_{err}$ is an averaged value of the instantaneous error signal, the final output voltage $v_{clk,out}$ is an indicator of the steady state DC imbalance between the two differential VCO input legs. Hence, we can write the duty cycle transfer function to be approximately

$$D_{out} = \frac{sC_dG_{corr}}{sC_d + G_{corr}G_{det}} D_{in} \tag{7}$$

where $D_{out}$ and $D_{in}$ are the output and input duty cycle errors, i.e. the deviations from 50%. Using the final value theorem, for a step input $D_{in} = \frac{D_0}{s}$, we observe that the final steady state value of $D_{out}$ is zero; this is expected as we have an integrator in the loop. Therefore, we expect no residual duty cycle error from the first type of error (i.e. DC offset with 50% duty cycle clocks). In practice, the steady state ripple on the error signal propagates to the output clock and affects settling. A second order filter was used in our design to reduce this ripple.
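
The settling behavior implied by (7) can be sketched in the time domain. Interpreting $D_{in}$ as a step of size $D_0$ in duty cycle error, (7) gives $D_{out}(s) = G_{corr} D_0 / (s + a)$ with $a = G_{corr} G_{det} / C_d$, so the output error decays as $G_{corr} D_0 e^{-at}$. The component values below are hypothetical and serve only to show the exponential decay; ripple and the second order filter are not modeled.

```python
import math


def dcc_step_response(t_s, d0, g_corr, g_det, c_d):
    """Output duty cycle error at time t_s after an input step d0, from (7).

    g_det is treated as a transconductance (A/V) charging c_d (F),
    so a = g_corr * g_det / c_d is the loop pole in rad/s.
    """
    a = g_corr * g_det / c_d
    return g_corr * d0 * math.exp(-a * t_s)


if __name__ == "__main__":
    # Hypothetical loop: unity corrector gain, 50 uS detector, 1 pF capacitor.
    g_corr, g_det, c_d = 1.0, 50e-6, 1e-12
    tau = c_d / (g_corr * g_det)  # 20 ns time constant for these values
    for n in (0, 1, 3, 10):
        print(n, dcc_step_response(n * tau, 12.7, g_corr, g_det, c_d))
```

A larger detector gain or a smaller integrating capacitor shortens the time constant, at the cost of more ripple on the error signal.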

2) Circuit Implementation: The corrector and detector circuits proposed in [4] are now considered. Examining transistor-level circuit design, we need to consider the amplitude of the VCO clock signals (i.e. either small signal or full swing). Fig. 11 shows the corrector circuit used in [4]. The inputs to this circuit are ck and ckb. These clocks need to be small signal to preserve the slope information of the signal. Also, a DC error signal imbalance cannot be created if there is full current switching between the corrector MOS differential pair. Fig. 12 shows the corrector implementation used in our work. The transconductance  $g_m$  of the input differential pair is chosen based on the required input swing. We assume a gain of 1 between the input and the output clocks. The PMOS symmetric loads are replaced with resistors for simplicity. The  $g_m$  of the detector differential pair is designed to be large to avoid transistor operation in the triode region, due to a large difference in error signals.



Fig. 11. Circuit diagram of duty cycle corrector used in [4]

The generated output clocks *clkout* and *clkoutb* are fed to the detector circuit shown in Fig. 13, with capacitive loading. The cross-coupled PMOS loads at the output of the detector ensure that the capacitors carry equal currents (with opposite phases), making the detector a differential charge pump circuit. The generated error signal is coupled to the corrector using negative feedback to create an imbalance in the output clock common mode voltages. It is not clear from [4] whether the input clock to the detector must be full swing to switch the currents between the differential pairs. It is clear, however, that in steady-state operation the net charge accumulation on the integrating capacitor is zero. We use this fact to analyze the constraints on the detector circuit.

Fig. 12. Circuit diagram of the duty cycle corrector used in this work

Fig. 13. Circuit diagram of duty cycle detector used in [4]



Fig. 14. Single ended output current waveform with timing parameters

Consider the waveform shown in Fig. 14. Under steady state small signal operation of the circuits, the net charge accumulation on the capacitor must be zero. This means that the areas under the positive half and the negative half of the signal must be equal. Therefore, with a rise/fall slope of $m$, the positive and negative areas must satisfy

$$(t_0)(mt_0) + (x)(mt_0) = (t_1)(mt_1) + (y)(mt_1) \tag{8}$$

Therefore,

$$t_0(t_0 + x) = t_1(t_1 + y) \tag{9}$$

However, we require  $2t_0+x = 2t_1+y$  under steady state for duty cycle correction. Combining this with (9) gives a trivial solution  $t_0 = t_1$ , which cannot be true. Hence, the only way we can achieve  $2t_0 + x = 2t_1 + y$  under steady state is when the current is kept constant during the error measurement. This means that the clock input signal to the detector must be full swing. The loop transfer function analysis carried out above is still valid if we consider the rise/fall times to be much smaller than the pulse width. However, in high speed systems, the signal rise/fall time is comparable to the pulse widths. Thus, the detector must be able to deliver constant current almost immediately when the input clock transitions.
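
The algebra behind this step can be checked directly. Substituting $x = 2t_1 + y - 2t_0$ into (9) and collecting terms gives $(t_0 - t_1)(y - t_0 + t_1) = 0$. The second root implies $x = -y$, which is not possible for non-negative flat durations unless both vanish, so $t_0 = t_1$ is the only admissible root. The sketch below verifies the factorization with exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def area_mismatch(t0, t1, y):
    """Left side minus right side of (9) once 2*t0 + x = 2*t1 + y is enforced."""
    x = 2 * t1 + y - 2 * t0
    return t0 * (t0 + x) - t1 * (t1 + y)


def factored_form(t0, t1, y):
    """Claimed factorization of the same expression."""
    return (t0 - t1) * (y - (t0 - t1))


if __name__ == "__main__":
    grid = [Fraction(n, 4) for n in range(1, 9)]
    ok = all(area_mismatch(a, b, c) == factored_form(a, b, c)
             for a, b, c in product(grid, repeat=3))
    print("factorization holds on the grid:", ok)
```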



Fig. 15. Circuit diagram of duty cycle detector used in this work



Fig. 16. Simulated output duty cycle with DC offset of  $60\,\mathrm{mV}$  and 50% clock duty cycle

Fig. 15 shows the detector circuit used in this work. The push pull current source is implemented using ideal voltage controlled current sources. The charge pump current is  $50 \ \mu\text{A}$ . Fig. 16 shows the simulated output duty cycle with a DC offset of 60 mV. We observe that the input duty cycle is around 62.7% and the corrected output duty cycle is 49.7%.



Fig. 17. Simulated output duty cycle with DC offset 41.6% clock duty cycle

Fig. 17 shows the simulated output differential clock voltages with an input duty cycle of 41.6% (i.e. with a pulse width of 80 ps) and 50 ps rise/fall time in a 3.2 GHz clock. The corrected output duty cycle is 50.96%.
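
The quoted input duty cycle can be reproduced with simple arithmetic, assuming the 80 ps figure is the flat portion of a trapezoidal pulse and the duty cycle is measured between the midpoints of the rising and falling edges. This reading is an assumption; it is adopted here because it matches the 41.6% stated above. The function name is hypothetical.

```python
def duty_cycle_percent(flat_width_ps, t_rise_ps, t_fall_ps, f_clk_hz):
    """Duty cycle of a trapezoidal pulse measured at the edge midpoints."""
    period_ps = 1e12 / f_clk_hz
    # Half of each edge lies above the midpoint crossing.
    high_time_ps = flat_width_ps + 0.5 * (t_rise_ps + t_fall_ps)
    return 100.0 * high_time_ps / period_ps


if __name__ == "__main__":
    # 80 ps flat top, 50 ps edges, 3.2 GHz clock (312.5 ps period).
    print(round(duty_cycle_percent(80.0, 50.0, 50.0, 3.2e9), 2))
```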



Fig. 18. Simulated output duty cycle versus input duty cycle

In order to fully characterize the DCC loop, the input duty cycle was swept from 35% to 70%. The simulation results are shown in Fig. 18. The output duty cycle varies very little over large variations in the input duty cycle, which confirms the effectiveness of this technique. It can also be seen that the correction technique loses effectiveness as the input duty cycle exceeds a particular value. This is because, as previously mentioned, the DCC can correct the duty cycle only to within $t_{rise} + t_{fall}$. As this limit is approached, the output duty cycle error increases.

#### C. Duty Cycle Correction Integrated with PLL

Because duty cycle errors arise from a DC shift between differential clocks, the mismatch between devices was modeled by introducing a differential DC shift in the PLL VCO. The worst case mismatch was calculated using the following equations:

$$\sigma^2(\Delta V_t) = \frac{A_{V_t}^2}{WL} \tag{10}$$

where  $A_{V_t} \approx 3.5 - 4 \,\mathrm{mV}\mu\mathrm{m}$  for these technologies.

To allow an approximate modeling of the mismatch, an average of the worst case threshold variations was calculated as given below.

$$\sigma(\Delta V_t) = \sqrt{0.5[\sigma^2(\Delta V_{t,NMOS}) + \sigma^2(\Delta V_{t,PMOS})]} \tag{11}$$

Hence, for the inverter, we get an approximate threshold voltage mismatch of 69.877 mV. Fig. 19 shows the simulated eye diagram at the output of the PLL without mismatch or DCC. We observe that without any mismatch, the duty cycle of the output waveform remains almost 50%.
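
A minimal sketch of (10) and (11) follows. The device dimensions used for the 69.877 mV figure are not listed in the text, so the widths and lengths below are hypothetical and the printed result is not expected to match that value. $A_{V_t}$ is taken in mV·µm and dimensions in µm.

```python
import math


def sigma_delta_vt_mv(a_vt_mv_um: float, w_um: float, l_um: float) -> float:
    """Threshold mismatch standard deviation from equation (10), in mV."""
    return a_vt_mv_um / math.sqrt(w_um * l_um)


def inverter_mismatch_mv(sigma_nmos_mv: float, sigma_pmos_mv: float) -> float:
    """RMS average of NMOS and PMOS mismatch from equation (11), in mV."""
    return math.sqrt(0.5 * (sigma_nmos_mv ** 2 + sigma_pmos_mv ** 2))


if __name__ == "__main__":
    # Hypothetical device sizes (not taken from the paper).
    sigma_n = sigma_delta_vt_mv(4.0, w_um=0.10, l_um=0.04)
    sigma_p = sigma_delta_vt_mv(4.0, w_um=0.20, l_um=0.04)
    print(round(sigma_n, 2), round(sigma_p, 2),
          round(inverter_mismatch_mv(sigma_n, sigma_p), 2))
```

Since the mismatch scales with $1/\sqrt{WL}$, near-minimum devices in a 32 nm process can show tens of millivolts of threshold variation, which is why the VCO duty cycle is sensitive to it.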



Fig. 19. Simulated eye diagram at the output of the PLL without mismatch and without  $\ensuremath{\mathsf{DCC}}$ 

In order to characterize the effect of mismatch on the PLL output duty cycle, mismatch was introduced into the PLL VCO. The mismatch voltage calculated above was added to the gate of each VCO inverter stage with alternating polarities, as shown in Fig. 20. This causes one of the edges of the output waveform to slow down in every stage while the opposite edge speeds up equivalently. Hence, we obtain a large change in the duty cycle. Fig. 21 shows the simulated eye diagram at the output of the PLL with mismatch introduced, and no DCC. We observe a duty cycle of 55% at the output. We also observe that the introduction of VCO mismatch causes the output jitter to slightly increase.



Fig. 20. Mismatch introduction in the inverter chain

Fig. 22 shows the simulated eye diagram at the output of the PLL with mismatch and DCC active. A duty cycle of around 49.9% is obtained, thus confirming the operation of the DCC circuit.

#### IV. CONCLUSION

In this work, we discussed the state-of-the-art techniques used to improve the fidelity of distributed high-frequency clocks. Our main emphasis was an in-depth analysis of the duty cycle correction technique used in [4] to alleviate duty cycle imbalance due to mismatch in PLLs. Balanced duty cycles allow for the robust design of digital systems that use both the positive and negative clock phases. Simulation results show a worst case duty cycle variation of 2% in the output clock as the input duty cycle varies by over $\pm 15\%$. Integrating this correction loop with an imbalanced duty cycle PLL results in an output duty cycle with 0.1% error.

Fig. 21. Simulated eye diagram at the output of the PLL with mismatch and without DCC

Fig. 22. Simulated eye diagram at the output of the PLL with mismatch and with DCC

This duty cycle correction scheme works for signals where the sum of the rise and fall times is greater than the required correction. Although the rise/fall time is a significant proportion of the pulse width in high frequency systems, the variation in duty cycle is also significant and limits the accuracy of the analyzed technique. This is the key limitation of this technique. Mismatch effects in the corrector circuit also need to be considered to obtain precise 50% duty cycle outputs. An interesting extension of this problem would be to use this technique to reduce jitter in a PLL. By using a divide-by-2 output clock, a 50% duty cycle correction scheme could be applied to this new clock. This would stabilize the time periods of the original clock, thereby reducing its long-term jitter.

#### REFERENCES

- [1] K. Lim, C. Park, D. Kim, and B. Kim, "A Low-Noise Phase-Locked Loop Design by Loop Bandwidth Optimization," in *IEEE Journal of Solid-State Circuits*, vol. 35, no. 6, Jun 2000.
- [2] T. Ogawa and K. Taniguchi, "A 50% Duty-Cycle correction circuit for PLL output," in *IEEE International Symposium on Circuits and Systems*, vol. 4, 2002, pp. 21–24.

- [3] D. Fischette, A. Loke, M. Oshima, B. Doyle, R. Bakalski, R. DeSantis, A. Thiruvengadam, C. Wang, R. Talbot, and E. Fang, "A 45nm SOI-CMOS Dual-PLL Processor Clock System for Multi-Protocol I/O," in *IEEE International Solid-State Circuits Conference*, Feb 2010, pp. 246–247.
- [4] N. Kurd, P. Mosalikanti, M. Neidengard, J. Douglas, and R. Kumar, "Next Generation Intel Core Micro-Architecture (Nehalem) Clocking," in *IEEE Journal of Solid-State Circuits*, vol. 44, Apr 2009, pp. 1121–1129.
- [5] F. Gardner, "Charge-pump phase-lock loops," in *IEEE Transactions on Communications*, no. 11, Nov 1980, pp. 1849–1858.
- [6] J. G. Maneatis, "Low-Jitter Process-Independent DLL and PLL Based on Self-Biased Techniques," in *IEEE Journal of Solid-State Circuits*, vol. 31, no. 11, Nov 1996.
- [7] S. Sidiropoulos, D. Liu, J. Kim, G. Wei, and M. Horowitz, "Adaptive Bandwidth DLLs and PLLs using Regulated Supply CMOS Buffers," in *Symposium on VLSI Circuits Digest of Technical Papers*, 2000, pp. 124–127.
- [8] M. Mansuri and C. Ken Yang, "Jitter optimization based on phase-locked loop design parameters," in *IEEE Journal of Solid-State Circuits*, vol. 37, Nov 2002, pp. 1375–1382.
- [9] M. Mansuri, "Low-power low-jitter on-chip clock generation," in *Ph.D. dissertation, Univ. California, Los Angeles*, Jun 2003.
- [10] M. Mansuri, A. Hadiashar, and C. Ken Yang, "Methodology for On-Chip Adaptive Jitter Minimization in Phase-Locked Loops," in *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 50, no. 11, Nov 2003.
- [11] S. D. Vamvakos, C. Werner, and B. Nikolic, "Phase-Locked Loop Architecture for Adaptive Jitter Optimization," in *Proceedings of the International Symposium on Circuits and Systems, ISCAS 2004*, vol. 4, 2004.
- [12] F. Mu and C. Svensson, "Pulsewidth Control Loop in High-Speed CMOS Clock Buffers," in *IEEE Journal of Solid-State Circuits*, vol. 35, Feb 2000, pp. 134–141.
- [13] B. Stackhouse, S. Bhimji, C. Bostak, D. Bradley, B. Cherkauer, J. Desai, E. Francom, M. Gowan, P. Gronowski, D. Krueger, C. Morganti, and S. Troyer, "A 65 nm 2-Billion Transistor Quad-Core Itanium Processor," in *IEEE Journal of Solid-State Circuits*, vol. 44, Jan 2009, pp. 18–31.
- [14] I. Young, J. Greason, and K. Wong, "A PLL Clock Generator with 5 to 110 MHz of Lock Range for Microprocessors," in *IEEE Journal of Solid-State Circuits*, vol. 27, Nov 1992, pp. 1599–1607.
- [15] S. Rusu and S. Tam, "Clock Generation and Distribution for the First IA-64 Microprocessor," in *IEEE ISSCC Dig. Tech. Papers*, 2000, pp. 176–177.
- [16] R. Parker, "Duty cycle measurement circuit," in http://www.freshpatents.com/Duty-cycle-measurement-circuitdt20070405ptan20070075753.php.
- [17] S. Tam, "Modern Clock Distribution Systems," in *Clocking in Modern VLSI Systems, Springer US*, 2009, pp. 46–47.