A 9.8 Gbps, 6.5 mW forwarded-clock receiver with phase interpolator and equalized current sampler in 65 nm CMOS

A full-rate, energy-efficient forwarded-clock (FC) receiver is demonstrated in this paper. A current sampler with continuous-time equalization is realized with a 20 GHz sampling bandwidth for data recovery. Moreover, a phase interpolator is introduced to generate a deskewed sampling clock for data recovery. The test chip was fabricated in a 65 nm CMOS process and occupies an area of 0.16 mm². Measurements show that the FC receiver achieves a data rate of up to 9.8 Gbps with a power consumption of 6.5 mW.


I. INTRODUCTION
Source-synchronous links with a forwarded-clock (FC) architecture [1]-[7] are widely deployed in parallel I/O interfaces due to their low power consumption, inherent correction of clock and data jitter, and appropriate jitter tracking bandwidth (JTB). In the FC receiver, the static phase offset (SPO) between the input data and the sampling clock is corrected at start-up, while the dynamic phase offset (DPO), i.e. jitter, is tracked by the forwarded clock with jitter correction.
The model of the FC receiver is shown in Fig.1. The data and the clock are sent to the receiver simultaneously. However, due to PCB trace mismatch and the frequency-dependent delay of the channels, the data and the clock are misaligned in time at the receiver side, especially at high data rates. As a result, the SPO has to be corrected before sampling, which is realized by a phase interpolator (PI) in this paper. Owing to the appropriate JTB provided by the FC receiver structure, the DPO is also well restrained. The PI introduced in this paper can generate a wide range (0°-360°) of clock deskew, which covers the phase misalignment and ensures that sampling occurs at the center of the data, as shown in Fig.1. As a result, a low bit error rate can be achieved and the FC receiver becomes insensitive to jitter.
Moreover, the continuous-time linear equalizer (CTLE) is widely utilized in FC receivers [7]-[9] due to its compact structure and better high-frequency performance for middle-distance interconnects (such as interposer-based memory-logic integration), where decision feedback equalizer (DFE) taps are not needed. In traditional data recovery circuits, the CTLE is usually followed by a sampler. However, a voltage-sampling structure limits the bandwidth and speed of the sampler, which seriously degrades the overall speed even though the equalizer provides a high-frequency gain boost to compensate for channel loss [9]-[11]. In this paper, we use a current-sampling structure merged with the equalization function to realize high-speed sampling.
A current sampler is introduced with 20 GHz bandwidth, 10 GSps sampling rate, and 18 dB gain boost at 10 GHz. In contrast to a conventional voltage sampler placed after the equalizer, the switched-source-follower (SSF) based current sampler is merged with an active CTLE whose equalization is realized by inductive loading.
The test chip was fabricated in a 65 nm CMOS process within an area of 0.16 mm². The measurements show that a data rate of up to 9.8 Gbps can be achieved with a BER below 10⁻¹² and an energy efficiency of 0.67 mW/Gbps. The rest of the paper is organized as follows. Section II presents the equalized current sampler for data recovery. Section III discusses the PI for clock recovery. The FC receiver prototype with measurement results is presented in Section IV, and conclusions are drawn in Section V.
II. DATA RECOVERY: CURRENT SAMPLER WITH CONTINUOUS-TIME EQUALIZATION
As shown in Fig.2 (a)-(b), the proposed current sampler is merged with the active CTLE. It consists of an input buffer with an inductive load L1 for active equalization and a switched source follower (SSF), which is a current-sampling structure. The merging principle is as follows: when CLK=1, the current I1 flows through path-I and the input buffer boosts the high-frequency content of the data to realize the equalization function, as shown in Fig.2(a); when CLKB=1, the current I1 flows through path-II and M2 is turned off to hold the data. As such, the equalization and sampling functions are realized by the same circuit. Meanwhile, the input matching of the FC receiver is realized by the shunt resistor Rmatch.

A. CTLE Equalization
For middle-distance interconnects (<10 cm), such as interposers for memory-logic integration at the inter-die level, a continuous-time linear equalizer (CTLE) is sufficient for data recovery [7]-[9] without decision feedback equalizer (DFE) taps. As shown in Fig.2 (b), when the input data with channel loss arrives at the input (VIN, VIP) of the input buffer, high-frequency compensation is achieved through the gain boost of the inductive load L1. The gain of the input buffer is targeted to peak at 10 GHz for this compensation, so the value of the inductive load L1 must be optimized. As shown in Fig.3 (a), L1 is set to 1.2 nH, a value obtained by sweeping from 0.3 nH to 2.7 nH, and it is realized within a compact area of 50 µm × 50 µm. Moreover, the current source I2 can be tuned from 0.6 mA to 2.4 mA for adaptive equalization, as shown in Fig.3 (b).
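As a rough cross-check of the inductor sizing, the following sketch models the buffer load as a series R-L branch in parallel with a lumped node capacitance, which is a common shunt-peaking approximation and not necessarily the exact network of Fig.2. The values R_LOAD and C_NODE are illustrative assumptions that do not come from the design; C_NODE is simply chosen so that 1.2 nH resonates near 10 GHz.

```python
import math

R_LOAD = 20.0      # ohms, ASSUMED series resistance of the inductive branch
C_NODE = 211e-15   # farads, ASSUMED lumped capacitance at the buffer output

def load_impedance(freq_hz, l_henry, r_ohm=R_LOAD, c_farad=C_NODE):
    """Magnitude of (R + jwL) in parallel with 1/(jwC)."""
    w = 2 * math.pi * freq_hz
    z_rl = complex(r_ohm, w * l_henry)
    z_c = 1 / complex(0, w * c_farad)
    return abs(z_rl * z_c / (z_rl + z_c))

def peak_frequency(l_henry, f_start=1e9, f_stop=40e9, step=10e6):
    """Frequency on a linear grid where the load impedance is largest."""
    best_f, best_z = f_start, 0.0
    f = f_start
    while f <= f_stop:
        z = load_impedance(f, l_henry)
        if z > best_z:
            best_f, best_z = f, z
        f += step
    return best_f

# Sweep the same inductance range as the text (0.3 nH to 2.7 nH).
for l_nh in (0.3, 0.6, 1.2, 1.8, 2.7):
    f_pk = peak_frequency(l_nh * 1e-9)
    print(f"L1 = {l_nh:.1f} nH -> impedance peak near {f_pk / 1e9:.1f} GHz")
```

Under these assumed values, a larger inductance moves the gain peak to a lower frequency, which is the trade-off behind sweeping L1 around the 10 GHz target.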

B. Current Sampling
Compared to voltage sampling, current sampling can achieve superior sampling speed [11]-[12]. To implement current sampling, the SSF structure is commonly utilized.
As shown in Fig.2(a), the equalized data can be recognized as "0" or "1" at node X and is further sampled by the SSF. Note that the input buffer converts the input data from the voltage domain to the current domain through the transconductance of M8. When CLK=1, the current I1 flows through M2 via path-I, and the sampler tracks the equalized data (track mode); when CLKB=1, the current I1 flows through M4 and R1 via path-II, and the sampler holds the input data because the low voltage at node X turns off transistor M2 (hold mode). Moreover, the bandwidth of the SSF is improved because the inductor L1 can absorb part of the parasitic capacitance C at node X [12].
As a result, the equalized current sampler realizes both the sampling and equalization functions at the same time, with low power and high energy efficiency.
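The track and hold behaviour described above can be illustrated with an idealized discrete-time model. This is only a behavioural sketch: it assumes instantaneous tracking and a perfect hold, and it ignores the finite bandwidth, the equalization, and the transistor-level current steering of the actual SSF. The function name and sample values are hypothetical.

```python
def track_and_hold(samples, clk):
    """Idealized sampler: the output follows the input while clk is 1
    (track mode) and keeps the last tracked value while clk is 0 (hold mode)."""
    out = []
    held = 0.0  # assumed initial output before the first track phase
    for x, c in zip(samples, clk):
        if c:
            held = x      # track mode: CLK=1, output follows the input
        out.append(held)  # hold mode: CLKB=1, output stays frozen
    return out

data = [0.1, 0.4, 0.9, 0.7, 0.2, -0.3]  # hypothetical equalized waveform
clk = [1, 1, 0, 0, 1, 0]                # 1 = track, 0 = hold
print(track_and_hold(data, clk))
```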

III. CLOCK RECOVERY: PHASE INTERPOLATOR
In conventional clock recovery designs, the clock deskew is realized by a single injection-locked oscillator (ILO), and the deskew is highly dependent on the offset between the injected frequency and the ILO's free-running frequency. Worse, it can only provide a 90° phase deskew. In order to achieve a larger phase deskew that covers the phase misalignment between data and clock, a phase interpolator (PI) is applied in this paper to generate the clock deskew, instead of a single ILO.
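To make the deskew requirement concrete, the sketch below converts a data-to-clock skew into the phase shift a full-rate clock would need, and then shows how weighting quadrature (I/Q) clocks can place the output phase anywhere in 0°-360°. This is the generic textbook form of phase interpolation, not a description of the circuit in Fig.4; the function names and the 70 ps skew are hypothetical, and a full-rate clock whose period equals one unit interval is assumed.

```python
import math

def required_deskew_deg(skew_s, data_rate_bps):
    """Phase shift (0-360 degrees) of a full-rate clock that cancels a skew."""
    ui = 1.0 / data_rate_bps  # one unit interval = one full-rate clock period
    return (skew_s % ui) / ui * 360.0

def iq_weights(phase_deg):
    """Weights (a, b) such that a*cos(wt) + b*sin(wt) = cos(wt - phase)."""
    phi = math.radians(phase_deg)
    return math.cos(phi), math.sin(phi)

def interpolated_phase_deg(a, b):
    """Phase of a*cos(wt) + b*sin(wt), folded into 0-360 degrees."""
    return math.degrees(math.atan2(b, a)) % 360.0

skew = 70e-12  # hypothetical data-to-clock skew in seconds
phase = required_deskew_deg(skew, 9.8e9)
a, b = iq_weights(phase)
print(f"deskew {phase:.1f} deg, weights I={a:+.3f}, Q={b:+.3f}")
```

Because the two weights can each change sign, all four quadrants are reachable, which is what allows the interpolated phase to span the full clock period rather than a 90° window.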
As shown in Fig.4, the ILO-I with quadrature voltage controlled oscillator (QVCO) structure is firstly locked to

IV. MEASUREMENT RESULTS
The prototype of the proposed FC receiver was fabricated in a UMC 65 nm CMOS process. The channel is 4-5 cm long on a 2-layer FR-4 PCB. The test setup is shown in Fig.5. The random data is generated and transmitted by an Agilent J-BERT N4903A. The chip occupies an area of 0.16 mm², and its die photo is shown in Fig.5.

A. Data Recovery Measurements
Firstly, the data recovery is measured. The eye diagrams of the recovered 5 Gbps and 9.8 Gbps data with 2¹⁵-1 random data patterns are measured by the Agilent J-BERT N4903A, as shown in Fig.5 (a)-(b). The eye is well open with 200 mV, and the BER is below 10⁻¹² at 5 Gbps and below 10⁻¹⁰ at 9.8 Gbps.
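For readers who want to reproduce a 2¹⁵-1 test pattern in simulation, the following sketch generates a PRBS15 sequence with a linear feedback shift register, assuming the commonly used polynomial x^15 + x^14 + 1 (the source does not state which polynomial the instrument was set to). It also estimates how many error-free bits are needed to claim a BER bound; the 95% confidence level is an assumption, not a figure from the measurements.

```python
import math

def prbs15(n_bits, seed=0x7FFF):
    """Generate n_bits of PRBS15 (assumed polynomial x^15 + x^14 + 1)."""
    state = seed & 0x7FFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 14) ^ (state >> 13)) & 1  # XOR of taps 15 and 14
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7FFF      # shift in the new bit
    return bits

def bits_for_confidence(ber, confidence=0.95):
    """Error-free bits needed to claim BER < ber at the given confidence."""
    return -math.log(1.0 - confidence) / ber

period = 2**15 - 1
seq = prbs15(2 * period)
print("repeats after", period, "bits:", seq[:period] == seq[period:])

n = bits_for_confidence(1e-12)
print(f"{n:.2e} bits, about {n / 9.8e9:.0f} s at 9.8 Gbps")
```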

B. Clock Recovery Measurements
Secondly, the clock recovery is measured. The transient I/Q signals of ILO-I in the FC receiver are measured with an Agilent Infiniium 90008 (40 GSps sampling rate, 13 GHz bandwidth) to check the jitter performance. The measured 8 GHz I/Q signals are shown in Fig.7 (a), and their peak-to-peak jitter is around 20 ps, as shown in Fig.7 (b). The measured BER with phase deskew is shown in Fig.8.
Lastly, Table I compares recently published FC receivers. The proposed FC receiver achieves a data rate of 9.8 Gbps and the highest energy efficiency of 0.65 mW/Gbps with a full-rate architecture.
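The energy-efficiency figure is simply the ratio of power to data rate, and 1 mW/Gbps is numerically equal to 1 pJ/bit. A short sketch using the headline numbers of 6.5 mW and 9.8 Gbps is given here; the function name is hypothetical. The ratio comes out at roughly 0.66 mW/Gbps, in line with the 0.65 to 0.67 mW/Gbps range quoted in this paper.

```python
def energy_efficiency_mw_per_gbps(power_mw, data_rate_gbps):
    """Energy per bit; mW/Gbps is numerically the same as pJ/bit."""
    return power_mw / data_rate_gbps

eff = energy_efficiency_mw_per_gbps(6.5, 9.8)
print(f"{eff:.2f} mW/Gbps = {eff:.2f} pJ/bit")
```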
V. CONCLUSION
This paper presents an FC receiver that uses an equalized current sampler for data recovery and phase interpolation for clock recovery, implemented in 65 nm CMOS. The current sampler merges the CTLE function, with 18 dB gain at 10 GHz, and achieves a 10 GSps sampling speed with 20 GHz bandwidth. Moreover, the PI can provide 0°-360° clock deskew. The measurement results show a data rate of up to 9.8 Gbps with an energy efficiency of 0.65 mW/Gbps.