Joint carrier phase and frequency-offset estimation with parallel implementation for dual-polarization coherent receiver

Jianing Lu; Xiang Li; Songnian Fu; Ming Luo; Meng Xiang; Huibin Zhou; Ming Tang; Deming Liu

doi:10.1364/OE.25.005217

1. Introduction

To satisfy fast-growing Internet traffic demands in fiber-optic transmission systems, high-order M-ary quadrature amplitude modulation (M-QAM) combined with coherent detection has attracted worldwide research interest [1–4]. Meanwhile, polarization division multiplexing (PDM) uses the polarization of light as another degree of freedom to double the transmission capacity while keeping the signal bandwidth unchanged [5,6]. Therefore, dual-polarization M-QAM (DP-M-QAM) signals have been applied in current transmission systems with 100 Gb/s per channel, and will be the natural choice for next-generation 400 Gb/s optical transmission systems [7,8]. Frequency offset (FO) and phase noise are two main impairments, both caused by the free-running lasers used as the carrier source at the transmitter and as the local oscillator (LO) at the receiver [9]. The Viterbi-Viterbi (V-V) algorithm has been proposed for carrier phase estimation (CPE) of the QPSK format with good tolerance to laser linewidth [10]. When it is applied to 16-QAM, a QPSK partition modification is used to maintain its performance [11]. For higher-order QAM, however, the QPSK partition scheme is challenging to implement. Alternatively, blind phase search (BPS) is the most popular linewidth-tolerant algorithm that can be applied to arbitrary modulation formats [12]. However, the V-V and BPS algorithms are designed without considering the FO. For example, the performance of the V-V algorithm may degrade severely when the frequency offset exceeds 10⁻³ times the symbol rate at a laser linewidth-symbol duration product of $6.35 \times 10^{-5}$ [13]. Therefore, either the V-V or the BPS algorithm is generally implemented after the FO estimation (FOE) module.
For FOE, a popular solution is the fast Fourier transform based FOE (FFT-FOE), a non-data-aided estimator with limited estimation resolution and a modulation-format-dependent estimation range (± 1/2M of the symbol rate for M-PSK and ± 1/8 of the symbol rate for M-QAM) [14]. Since commercially available lasers usually have a frequency accuracy of ± 2.5 GHz over their lifetime, the FO between the LO and the transmitter laser can vary within [-5 GHz, +5 GHz] [15], which may exceed the estimation range of FFT-FOE at symbol rates lower than 40 Gbaud. As a result, the subsequent CPE fails to function well. In practice, the laser frequency drifts over time at rates in the MHz/s range due to aging or temperature variation, and may also experience sudden jumps due to mechanical disturbances to the laser cavity. Hence, the FO needs to be continuously tracked to achieve optimal BER performance at the receiver. A complex-weighted, decision-aided, maximum-likelihood (CW-DA-ML) carrier estimator (CE) has been proposed for joint CPE and FOE [16,17]. The CW-DA-ML CE uses a CW transversal filter to generate a carrier reference phasor, and the filter weights are automatically updated by linear regression on the observed signals. A complete FOE range of ± symbol rate/2, independent of the modulation format, has been achieved with low overhead. In particular, no phase unwrapping is required, which reduces the occurrence of cycle slips at low SNR or large laser linewidth [16]. Thanks to its observation-dependent weight vector, CW-DA-ML can continuously track the FO. The performance of CW-DA-ML has been numerically investigated for single-polarization 4/8/16-QAM signals; this scheme is referred to as SP-CW-DA-ML. The laser linewidth tolerance of SP-CW-DA-ML is comparable to that of the V-V algorithm for 4-QAM and of the QPSK partition scheme for 16-QAM, but worse than that of the BPS algorithm.
Although SP-CW-DA-ML involves only linear computation, which is feasible for real-time implementation, the feedback delay D cannot be ignored in its parallel implementation. In the presence of large laser phase noise, the performance penalty originating from the feedback delay is serious [16]. This is because the accumulated Gaussian random walk of the carrier phase over the previous D symbols is difficult to estimate: the feedback delay makes the most recent carrier phase estimates unavailable. Consequently, the performance of SP-CW-DA-ML degrades dramatically. The superscalar parallelization (SSP) structure has been proposed to implement CPE algorithms that operate in a feedback manner, e.g. the phase-locked loop (PLL), and to remove the feedback delay [18,19]. However, SSP cannot be applied directly to CW-DA-ML without modification, because CW-DA-ML carries out CPE and FOE simultaneously, which would result in high overhead and a much larger size for each buffer.

In this paper, we present a superscalar parallelization structure based dual-polarization CW-DA-ML (SSP-DP-CW-DA-ML) algorithm. Since the signals from dual polarizations suffer from identical laser phase noise with a constant phase offset [20,21], phase information of dual polarizations can be jointly processed to improve the performance of CPE. Simulation results show that our proposed algorithm can provide almost the same laser linewidth tolerance as that using FFT-FOE together with BPS. A complete frequency offset estimation range of ± symbol rate/2 is achieved. Finally, the proposed SSP-DP-CW-DA-ML is experimentally verified under the scenario of back-to-back (B2B) transmission using 10 Gbaud DP-16/32-QAM.

2. Operation principle

At the receiver side, a canonical model of the dual-polarization received signal after ideal analog-to-digital conversion (ADC), clock recovery and retiming, chromatic dispersion compensation, polarization division de-multiplexing, and polarization-mode dispersion (PMD) compensation can be written as [16]

$$r_{p}(k) = m_{p}(k) \exp(j(Δω k + θ_{p}(k))) + n_{p}(k), \quad k = 0, 1, 2, \dots, \quad p = x, y$$

where p represents the X or Y polarization, $m_{p}(k)$ is the k-th data symbol, and $Δω = 2πΔfT$ is the angular frequency offset, with T the symbol period. $θ_{p}(k) = η(k) + θ_{p}(k-1)$ is the laser phase noise modeled as a Wiener process, where {$η(k)$} is a sequence of independent and identically distributed Gaussian random variables with zero mean and variance $σ_{p}^{2} = 2πΔvT$ [12]. Here, $Δv$ is the combined linewidth of the transmitter and LO lasers, and $n_{p}(k)$ stands for additive complex white Gaussian noise (AWGN). The laser phase noise process {$θ_{p}(k)$} varies slower than the symbol rate. Therefore, SP-CW-DA-ML can approximate $θ_{p}(k)$ to be time-invariant over an interval longer than LT, where L is an integer. A reference phasor (RP) $V(k+1)$ is obtained for the carrier at time $k+1$ by filtering the immediate past L symbols, as

$$V(k+1) = C(k) \sum_{l=1}^{L} w_{l}(k) r(k-l+1) \hat{m}^{*}(k-l+1)$$

where each $w_{l}(k)$ is a complex weight, and $\hat{m}(k)$ is the symbol decision of $r(k)$. $C(k) = (\sum_{l=1}^{L} |\hat{m}(k-l+1)|^{2})^{-1}$ is the factor that normalizes the magnitude of $V(k+1)$, making SP-CW-DA-ML applicable to all modulation formats. The filter-input vector $y(k)$ is the L-by-1 vector

$$y(k) = [r(k) \hat{m}^{*}(k), \dots, r(k-L+1) \hat{m}^{*}(k-L+1)]^{T}$$

The filter-weight vector at time k,

$$w(k) = [w_{1}(k), w_{2}(k), \dots, w_{L}(k)]^{T}$$

is designed to rotate each filter-input term to have the same angular FO of $Δω (k+1)$ for correctly estimating the carrier of $r(k+1)$. Then $w(k)$ is adaptively calculated at each time k to minimize the sum-of-error-squares cost function $J(k)$,

$$J(k) = \sum_{l=1}^{k} \left| \frac{r(l)}{\hat{m}(l)} - C(l-1) w^{T}(k) y(l-1) \right|^{2}$$

By minimizing $J(k)$ with respect to $w(k)$, the optimal $w(k)$ is obtained as

$$w(k) = Φ^{-1}(k) z(k), \quad k \geq 1$$

where $Φ(k) = Φ(k-1) + C^{2}(k-1) y^{*}(k-1) y^{T}(k-1)$ is the time-average L-by-L autocorrelation matrix and $z(k) = z(k-1) + C(k-1) y^{*}(k-1) r(k) / \hat{m}(k)$ is the time-average L-by-1 cross-correlation vector. The recursion is initialized with $V(0) = 1$, $w(0) = 1$, and $Φ(0) = 0.01 I$, where $I$ is the identity matrix. The phase of $w(k)$, $\arg(w(k))$, adapts to track $[Δω, 2Δω, \dots, LΔω]^{T}$ through a series of recursive equations which will be shown later. On the other hand, the magnitude $|w|$ is adjusted adaptively according to $ΔvT$, i.e. $|w(1)| > |w(2)| > \dots > |w(L)|$. The larger $ΔvT$ is, the larger the $|w(l)|$ applied to the received symbols close to the current time. This is because the phase noise $θ(k)$ becomes less correlated with $θ(k-l)$ as $l$ increases. Hence, symbols $r(l)$ further back in time carry less useful information on the laser phase noise in symbol $r(k+1)$ and are thus weighted down. When $ΔvT$ is larger, this phenomenon is more apparent.
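As a minimal sketch of the reference phasor defined above, the following Python snippet evaluates $V(k+1)$ on a hypothetical noise-free test signal (QPSK symbols, a frequency offset of 0.2 times the symbol rate, a constant carrier phase, no AWGN). The weights are not learned here: they are set by hand to $w_{l} = \exp(j l Δω)$, the phases the adaptive filter is described as tracking, and known symbols stand in for the decisions. The function and variable names are illustrative assumptions, not taken from [16,17].

```python
import cmath
import math


def reference_phasor(r, m_hat, w, k):
    """V(k+1) = C(k) * sum_{l=1..L} w_l * r(k-l+1) * conj(m_hat(k-l+1))."""
    L = len(w)
    # 1 / C(k): total energy of the L most recent symbol decisions
    energy = sum(abs(m_hat[k - l + 1]) ** 2 for l in range(1, L + 1))
    acc = sum(w[l - 1] * r[k - l + 1] * m_hat[k - l + 1].conjugate()
              for l in range(1, L + 1))
    return acc / energy


# Hypothetical noise-free test signal (assumed values, for illustration only)
L = 12
delta_omega = 2 * math.pi * 0.2   # angular FO per symbol
theta0 = 0.3                      # constant carrier phase (zero linewidth)
qpsk = [cmath.exp(1j * (math.pi / 4 + math.pi / 2 * (n % 4))) for n in range(40)]
r = [m * cmath.exp(1j * (delta_omega * n + theta0)) for n, m in enumerate(qpsk)]

# Hand-set weights with arg(w_l) = l * delta_omega and unit magnitude
w_ideal = [cmath.exp(1j * l * delta_omega) for l in range(1, L + 1)]

k = 30
V_next = reference_phasor(r, qpsk, w_ideal, k)
carrier_next = cmath.exp(1j * (delta_omega * (k + 1) + theta0))
phasor_error = abs(V_next - carrier_next)
```

Under these idealized assumptions each weighted term is rotated onto the carrier at time $k+1$, so `phasor_error` is at the level of floating-point rounding. With phase noise present, equal-magnitude weights would no longer be the best choice, which is the motivation for the adaptive magnitudes discussed above.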

Figure 1(a) shows our proposed DSP flow including phase offset compensation as well as DP-CW-DA-ML, where $r_{x}(k)$ and $r_{y}(k)$ represent the k-th received symbol in the X and Y polarization, respectively. $m_{x}(k)$ and $m_{y}(k)$ denote the k-th training symbol in the X and Y polarization, and L is the filter length of SP-CW-DA-ML. SP-CW-DA-ML uses a training sequence for the initial symbol decisions. It has been shown that the SP-CW-DA-ML algorithm converges quickly, yielding rapid carrier phase and frequency tracking. Thus, a training sequence of length 2L is sufficient to help the complex filter weights track the FO [17]. The phase acquisition time of SP-CW-DA-ML, defined as the number of received symbols needed to bring the total phase estimation error variance within 3% of its error floor, is about $4 \times 10^{3}$ for all modulation formats owing to its modulation format independence [17]. Therefore, in our proposed structure, the first $N = 4 \times 10^{3}$ received symbols and 2L training symbols are fed to SP-CW-DA-ML to obtain the initial reference phasors $V_{x}(\mathrm{initial})$ and $V_{y}(\mathrm{initial})$ for the two polarizations. Thanks to its unambiguous phase tracking range of $[0, 2π)$, the constant phase offset between the X and Y polarizations can be calculated by comparing the phases of these two reference phasors without phase unwrapping:

$$φ_{\mathrm{offset}} = \arg(V_{x}(\mathrm{initial})) - \arg(V_{y}(\mathrm{initial}))$$
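The offset computation, and the X-polarization rotation it is used for, can be sketched in a few lines of Python. The two reference phasors below are hypothetical values chosen so that the raw phase difference falls outside $(-π, π]$; the function names are illustrative. Because the offset is only ever applied through $\exp(-jφ_{\mathrm{offset}})$, which is $2π$-periodic, the wrapped and unwrapped differences give the same rotation.

```python
import cmath


def phase_offset(V_x_initial, V_y_initial):
    """phi_offset = arg(V_x(initial)) - arg(V_y(initial)), no unwrapping applied."""
    return cmath.phase(V_x_initial) - cmath.phase(V_y_initial)


def rotate_x(r_x, phi_offset):
    """Rotate the X-polarization symbols by exp(-j * phi_offset)."""
    rotation = cmath.exp(-1j * phi_offset)
    return [s * rotation for s in r_x]


# Hypothetical initial reference phasors (assumed values, for illustration only)
V_x = cmath.exp(1j * 2.9)
V_y = cmath.exp(-1j * 2.8)

phi = phase_offset(V_x, V_y)                  # raw difference, 5.7 rad
phi_wrapped = cmath.phase(V_x * V_y.conjugate())  # same angle wrapped into (-pi, pi]

# Both forms give the same rotation factor
rotation_mismatch = abs(cmath.exp(-1j * phi) - cmath.exp(-1j * phi_wrapped))

# After rotation, the X reference phasor lines up with the Y reference phasor
aligned_error = abs(rotate_x([V_x], phi)[0] - V_y)
```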

The symbols in the X polarization, $r_{x}(k)$, are rotated by a factor of $\exp(-jφ_{\mathrm{offset}})$ to ensure that the symbols in both polarizations share the identical phase rotation. Then, the rotated $r_{x}'(k)$ and $r_{y}(k)$ are reshaped by parallel-to-serial conversion, as are the training symbols $m_{x}(k)$ and $m_{y}(k)$. All symbols and training symbols are sent into our proposed DP-CW-DA-ML, which is designed for dual-polarization joint CPE and FOE. Figure 1(b) shows the detailed schematic diagram of DP-CW-DA-ML. The $m_{x}(k)$ and $m_{y}(k)$ used for SP-CW-DA-ML are reused for DP-CW-DA-ML without extra overhead. The filter length of DP-CW-DA-ML is 2L, twice that of SP-CW-DA-ML. In fact, the length of every variable in DP-CW-DA-ML is twice that of its SP-CW-DA-ML counterpart. At each time $k+1$, a reference phasor $V(k+1)$ is obtained by filtering the last 2L received samples as

$$V(k+1) = C(k) w^{T}(k) y(k)$$

where $C(k)$ is the normalization factor,

$$C(k) = \left( \sum_{l=1}^{L} \left( |\hat{m}_{x}(k-l+1)|^{2} + |\hat{m}_{y}(k-l+1)|^{2} \right) \right)^{-1}$$

and $\hat{m}(k)$ is the decision of $r(k)$. In particular, $\hat{m}(k) = m(k)$ for $0 \leq k \leq 2L-1$. $w(k)$ is the 2L-by-1 filter-weight vector designed to rotate each filter-input term to have the same angular frequency offset,

$$w(k) = [w_{1}(k), w_{2}(k), w_{3}(k), w_{4}(k), \dots, w_{2L-1}(k), w_{2L}(k)]^{T}$$

and $y(k)$ is the filter-input vector

$$y(k) = [r'_{x}(k) \hat{m}_{x}^{*}(k), r'_{y}(k) \hat{m}_{y}^{*}(k), \dots, r'_{x}(k-L+1) \hat{m}_{x}^{*}(k-L+1), r'_{y}(k-L+1) \hat{m}_{y}^{*}(k-L+1)]^{T}$$

At the same time, $w_{x}(k)$ and $w_{y}(k)$ in the first-stage SP-CW-DA-ML have both approximately tracked $[Δω, 2Δω, \dots, LΔω]^{T}$ after the first N received samples, so we can use $w_{x}(k)$ and $w_{y}(k)$ to initialize the filter-weight vector $w(k)$ of DP-CW-DA-ML. Therefore, the parameters are initialized as $V(0) = 1$, $w(0) = [w_{1x}(k), w_{1y}(k), w_{2x}(k), w_{2y}(k), \dots, w_{Lx}(k), w_{Ly}(k)]^{T}$, and $Φ^{-1}(0) = δ^{-1} I_{2L}$. $Φ(k)$ is the 2L-by-2L time-average autocorrelation matrix for the adaptive iteration of $w(k)$ and $V(k)$. Similar to SP-CW-DA-ML, the implementation of DP-CW-DA-ML can be summarized as follows (for $k \geq 1$):

(1) Compute intermediate vector $ψ (k)$ , $ψ (k) = C (k - 1) Φ^{- 1} (k - 1) y^{*} (k - 1);$
(2) Compute gain vector $g (k)$ , $g (k) = \frac{ψ (k)}{1 + C (k - 1) y^{T} (k - 1) ψ (k)};$
(3) Compute an average prior estimation error between two polarizations $\bar{ξ} (k)$ , $\bar{ξ} (k) = \frac{(\frac{r_{x} (k)}{{\hat{m}}_{x} (k)} - V (k)) + (\frac{r_{y} (k)}{{\hat{m}}_{y} (k)} - V (k))}{2};$
(4) Recursively update filter-weight vector $w (k)$ , $w (k) = w (k - 1) + g (k) \bar{ξ} (k);$
(5) Recursively update inverse correlation matrix $Φ^{- 1} (k)$ , $Φ^{- 1} (k) = \mathrm{Tri} \{Φ^{- 1} (k - 1) - g (k) ψ^{H} (k)\};$
(6) Compute next reference phasor, $V (k + 1) = C (k) w^{T} (k) y (k);$
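The six steps above can be sketched in plain Python as follows. This is an illustration under several stated assumptions rather than the authors' implementation: the known symbols are used as decisions throughout (fully data-aided), the weights start from all ones instead of the first-stage SP-CW-DA-ML weights, the $\mathrm{Tri}\{\cdot\}$ operator in step (5) is omitted and the full matrix is updated, the loop starts at $k = L$ so that a full filter input is available, and the test signal is noise-free with a constant carrier phase. All function and variable names are hypothetical.

```python
import cmath
import math


def dp_cw_da_ml(rx, ry, mx, my, L, delta=0.01):
    """Data-aided sketch of steps (1)-(6); V[k] is the carrier prediction for time k."""
    n_taps = 2 * L
    w = [1 + 0j] * n_taps  # assumed start; the text initializes from SP-stage weights
    # Inverse correlation matrix, initialized to (1/delta) * identity
    P = [[(1 / delta if i == j else 0) + 0j for j in range(n_taps)]
         for i in range(n_taps)]
    V = [1 + 0j] * len(rx)
    for k in range(L, len(rx)):
        # Filter input y(k-1): X and Y products interleaved, newest symbol first
        y, energy = [], 0.0
        for l in range(1, L + 1):
            y.append(rx[k - l] * mx[k - l].conjugate())
            y.append(ry[k - l] * my[k - l].conjugate())
            energy += abs(mx[k - l]) ** 2 + abs(my[k - l]) ** 2
        C = 1 / energy
        # Step (6) of the previous iteration: V(k) = C(k-1) w^T(k-1) y(k-1)
        V[k] = C * sum(wi * yi for wi, yi in zip(w, y))
        # Step (1): intermediate vector psi(k)
        psi = [C * sum(P[i][j] * y[j].conjugate() for j in range(n_taps))
               for i in range(n_taps)]
        # Step (2): gain vector g(k)
        denom = 1 + C * sum(yi * psi_i for yi, psi_i in zip(y, psi))
        g = [psi_i / denom for psi_i in psi]
        # Step (3): prior estimation error averaged over the two polarizations
        xi = ((rx[k] / mx[k] - V[k]) + (ry[k] / my[k] - V[k])) / 2
        # Step (4): weight update
        w = [wi + gi * xi for wi, gi in zip(w, g)]
        # Step (5): inverse correlation matrix update (Tri{.} omitted here)
        for i in range(n_taps):
            for j in range(n_taps):
                P[i][j] -= g[i] * psi[j].conjugate()
    return V, w


# Hypothetical noise-free dual-polarization test signal (assumed values)
L = 2
d_omega = 2 * math.pi * 0.2   # FO of 0.2 x symbol rate, beyond +/- 1/8
theta0 = 0.3
n_sym = 300
mx = [cmath.exp(1j * (math.pi / 4 + math.pi / 2 * (n % 4))) for n in range(n_sym)]
my = [m.conjugate() for m in mx]
carrier = [cmath.exp(1j * (d_omega * n + theta0)) for n in range(n_sym)]
rx = [m * c for m, c in zip(mx, carrier)]
ry = [m * c for m, c in zip(my, carrier)]

V, w = dp_cw_da_ml(rx, ry, mx, my, L)
tracking_error = abs(V[-1] - carrier[-1])
```

In this idealized setting the last predicted phasor should lie very close to the true carrier, even though the frequency offset is well outside ± 1/8 of the symbol rate. The sketch says nothing about behaviour under phase noise, AWGN, or decision errors, which the simulations in Section 4 address.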

Finally, $V(k+1)$ is used to recover the received symbols $r'_{x}(k+1)$ and $r_{y}(k+1)$ at time $t = k+1$. In summary, DP-CW-DA-ML is valid under the following conditions:

Fig. 1 (a) Structure of proposed scheme with phase offset compensation; (b) Schematic diagram of DP-CW-DA-ML.


(1) The phase information of the two polarizations can be jointly processed to improve the performance of CPE, since each estimation then draws on more symbols that suffer from similar phase noise.
(2) $w_{x}(k)$ and $w_{y}(k)$ approximately track $[Δω, 2Δω, \dots, LΔω]^{T}$ after the first N received samples in the first-stage SP-CW-DA-ML. We use $w_{x}(k)$ and $w_{y}(k)$ to form the filter-weight vector $w(k)$ of DP-CW-DA-ML, so that the FO acquisition time of DP-CW-DA-ML is short. Please note that $w(k)$ tracks $[Δω, Δω, 2Δω, 2Δω, \dots, LΔω, LΔω]^{T}$ in DP-CW-DA-ML.
(3) In Eq. (11), the prior estimation error is calculated more accurately by averaging over the two polarizations. The updating of $w(k)$ is then more stable and accurate.
(4) As mentioned, $N = 4 \times 10^{3}$ symbols are necessary to keep the phase estimation error variance within 3% for SP-CW-DA-ML. It is better to process those symbols again with the stable $w(k)$ after the iterative process in order to achieve a lower BER. The training symbols in SP-CW-DA-ML can be reused at the same time without extra overhead.

3. Superscalar parallelization structure based parallel implementation

In fiber-optic transmission systems, parallel processing is indispensable to reduce the required clock speed [12]. Normally, the serial input symbols are interleaved into P channels, each processed by an individual CPE module at a lower clock speed, as shown in Fig. 2. In that case, the delay between two adjacent symbols in each channel increases to P symbols. Although CW-DA-ML involves only linear computation, it is a feedback algorithm, so the feedback delay D = P cannot be ignored in its parallel implementation. All previous simulation results use an ideal feedback delay of D = 1 [16,17], which is the minimum in a feedback system. In the presence of large laser phase noise, a performance penalty originating from the feedback delay is inevitable. In other words, the laser linewidth tolerance of such an implementation is reduced by a factor of P. This is because the accumulated Gaussian random walk of the carrier phase over the previous D symbols is difficult to estimate under feedback delay. As a result, the performance of CW-DA-ML degrades dramatically. The superscalar parallelization (SSP) structure was first proposed for the PLL, to remove the feedback delay D = P caused by interleaving parallelization and thereby improve performance [18]. Specifically, it employs a buffer of S × P symbols so that each parallelized channel holds consecutive symbols. With this technique, the feedback delay of CPE is reduced from D = P to D = 1. A more efficient SSP structure was then proposed in which the order of the symbols in the odd channels is inverted, so that every two channels begin with consecutive symbols [19]. By doing so, each adjacent odd and even channel can share training symbols because they experience similar phase noise. Consequently, the overhead is halved for the same buffer size. Please note that with this SSP structure, the symbols in the odd channels are processed in reverse order.
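The difference between the two buffer layouts can be illustrated with a short Python sketch. Symbol indices stand in for the symbols themselves, channels are numbered from 1 as in the text, and the exact indexing convention (channel p holds the p-th block of S consecutive symbols, reversed when p is odd) is an assumption made for illustration; the function names are hypothetical.

```python
def interleave(symbols, P):
    """Interleaving parallelization: channel p takes every P-th symbol."""
    return [symbols[p::P] for p in range(P)]


def superscalar(symbols, S, P):
    """Superscalar layout: channel p (1-indexed) holds S consecutive symbols,
    with the order inverted in odd channels so that each odd/even pair
    starts from adjacent symbols."""
    channels = []
    for p in range(1, P + 1):
        block = symbols[(p - 1) * S : p * S]
        channels.append(block[::-1] if p % 2 == 1 else block)
    return channels


def symbol_spacing(channel):
    """Distance, in symbols, between successive entries of one channel."""
    return [abs(b - a) for a, b in zip(channel, channel[1:])]


buf = list(range(16))            # 16 symbol indices, S = 4, P = 4
inter = interleave(buf, 4)       # [[0, 4, 8, 12], [1, 5, 9, 13], ...]
ssp = superscalar(buf, 4, 4)     # [[3, 2, 1, 0], [4, 5, 6, 7], [11, 10, 9, 8], ...]

inter_delay = symbol_spacing(inter[0])   # spacing of P symbols
ssp_delay = symbol_spacing(ssp[0])       # spacing of 1 symbol
```

In the superscalar layout the first entries of channels 1 and 2 are symbols 3 and 4, which are adjacent in time. This is why the pair can share training symbols placed at the block boundary.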
Here, we propose to adopt the SSP structure for the parallel implementation of DP-CW-DA-ML. However, unlike the situation in [19], where the PLL only compensates the phase noise, DP-CW-DA-ML carries out CPE and FOE simultaneously. For CPE, a few training symbols are enough. In [19], the CPE processing of each buffer in a channel is independent, with a few training symbols at the beginning of each buffer, which keeps the buffer size as small as possible. As mentioned above, however, CW-DA-ML needs an acquisition time of thousands of symbols for $w(k)$ to converge. If each buffer were processed independently, the buffer would therefore have to be huge to guarantee the system performance, which is not ideal for real-time buffer-by-buffer processing. Consequently, three modifications are made for SSP-DP-CW-DA-ML, as shown in Fig. 3:

Fig. 2 Interleaving implementation of PLL during parallelized processing. CH: channel.


Fig. 3 Proposed modified superscalar structure for DP-CW-DA-ML.


(1) The complex filter weights $w(p)$ of each channel are transferred from one block to the next between adjacent buffers, which greatly reduces the FO acquisition time. This is possible because the FO changes much more slowly than the symbol rate, i.e. at the MHz/s range in practice [17]. We then need only 2L training symbols for DP-CW-DA-ML. It should be mentioned that $w(p)$ tracks $[Δω, Δω, 2Δω, 2Δω, \dots, LΔω, LΔω]^{T}$ for even channels (p is even) and $[-Δω, -Δω, -2Δω, -2Δω, \dots, -LΔω, -LΔω]^{T}$ for odd channels (p is odd). In this way, continuous tracking of the FO is guaranteed at the same time.
(2) For joint processing using the phase information of the two polarizations, the buffer size is expanded to 2S × P. In each channel of the buffer, $r'_{x}(k)$ and $r_{y}(k)$ are arranged alternately. A training symbol sequence composed of $L / 2$ symbols from the X polarization and $L / 2$ symbols from the Y polarization is located at the beginning of each channel within the buffer. Each channel can then share 2L training symbols in total with its adjacent channel, which equals the filter length of DP-CW-DA-ML.
(3) Since CW-DA-ML has an unambiguous phase tracking range of $[0, 2π)$ as well as a very low cycle slip probability [16], we can remove the differential coding/decoding by using training symbols to initialize DP-CW-DA-ML at the beginning of each buffer. Moreover, even if a cycle slip occurs, it affects only one block, so the performance of the whole system is not destroyed.
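The sign flip of the tracked weight phases in modification (1) can be checked from the signal model. The short derivation below assumes a carrier phase that is constant over the filter span and an odd channel that processes its symbols in decreasing time order; for brevity it is written for the single-polarization indexing, and each entry appears twice in the dual-polarization weight vector. In a forward (even) channel the filter inputs $r(k-l+1) \hat{m}^{*}(k-l+1)$ carry the phase $Δω(k-l+1) + θ$ and must be aligned to time $k+1$. In a reversed (odd) channel the inputs are $r(k+l-1) \hat{m}^{*}(k+l-1)$ and must be aligned to time $k-1$:

$$
\begin{aligned}
\text{even (forward):}\quad & \arg w_{l} = Δω(k+1) - Δω(k-l+1) = lΔω \\
\text{odd (reversed):}\quad & \arg w_{l} = Δω(k-1) - Δω(k+l-1) = -lΔω, \qquad l = 1, \dots, L
\end{aligned}
$$

The two cases differ only in sign, which is consistent with weights being handed over only between blocks of the same channel.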

4. Performance

4.1 Simulations and discussions

Simulations are conducted to examine the FO and linewidth tolerance of the proposed DP-CW-DA-ML with and without the SSP structure. FFT-FOE combined with BPS (FFT-FOE + BPS) and SP-CW-DA-ML [16] are also implemented for comparison. For SP-CW-DA-ML, the filter length is chosen to be 12, which is close to the optimal filter length for various formats [17]. For DP-CW-DA-ML, the filter length is 24. Training symbols are arranged at the beginning of each buffer. Therefore, the overhead can be expressed as $\mathrm{Overhead} = N_{TS} / 2S$, where $N_{TS}$ is the number of training symbols in each channel of a buffer and $2S$ is the buffer length. For our SSP-DP-CW-DA-ML scheme, a buffer length of $2S = 1200$ symbols including 12 training symbols is chosen, leading to 1% overhead, the same as in a previous publication [21]. In our simulations and experiments using BPS, the numbers of test phase angles are 32, 32, and 64 for QPSK, 16-QAM, and 32-QAM, respectively [12]. The FFT size of FFT-FOE is 512 per polarization [22]. The parallelization degree P is set to 8. 2¹⁸-1 bits are used to calculate the BER for all 4/16/32-QAM symbol sequences. The two most significant bits (MSBs) of each symbol are differentially encoded [17], except for SSP-DP-CW-DA-ML. The laser phase noise is modelled as a Wiener process, and ASE noise is added to adjust the OSNR. In our simulations, the SNR references at BER = 10⁻³ without differential coding are chosen as the theoretical limits [12, 19], which are 9.8 dB, 16.6 dB, and 19.6 dB for QPSK, 16-QAM, and 32-QAM, respectively. First, we evaluate the performance of phase offset estimation using SP-CW-DA-ML. In [20], the authors estimate the phase offset by observing the difference between the X and Y polarizations at the V-V algorithm outputs. Meanwhile, a recursive equation is used to estimate the phase offset accurately.
Thus, it is reasonable to set the convergence length to 1000 symbols. However, the V-V algorithm operates without considering the frequency offset (FO). In other words, the phase offset is obtained after the FO compensation. For our proposed scheme, we use 4000 symbols and 2L training symbols to obtain the phase offset with SP-CW-DA-ML by calculating the initial reference phasors $V_{x}(\mathrm{initial})$ and $V_{y}(\mathrm{initial})$ for the two polarizations. Meanwhile, $w_{x}(k)$ and $w_{y}(k)$ obtained from the first-stage SP-CW-DA-ML both approximately track $[Δω, 2Δω, \dots, LΔω]^{T}$ after the 4000 received samples, so we can use them to initialize the filter-weight vector $w(k)$ of DP-CW-DA-ML. As a result, the FO acquisition time of DP-CW-DA-ML is substantially reduced. Therefore, the 4000 symbols and 2L training symbols are used not only to estimate the phase offset, but also to obtain a stable FO estimate for the subsequent DP-CW-DA-ML. Figures 4(a) and 4(b) show the BER versus the filter length of SP-CW-DA-ML for the purpose of phase offset estimation, for 16-QAM and 32-QAM, respectively. The linewidth-symbol duration products ($Δv \cdot T$) are set to $7 \times 10^{-5}$, $1 \times 10^{-4}$, and $2 \times 10^{-4}$ for 16-QAM, and to $2 \times 10^{-5}$, $4 \times 10^{-5}$, and $6 \times 10^{-5}$ for 32-QAM. Furthermore, the OSNR of the received signal is set so that the BER reaches 10⁻³. We observe that, when the filter length is within a reasonable range (9 to 18 for 16-QAM, and 9 to 15 for 32-QAM), there is no obvious performance fluctuation across the laser linewidths. Therefore, the filter lengths are fixed to 12 and 24 for SP-CW-DA-ML and DP-CW-DA-ML, respectively, in our work.

Fig. 4 BER versus filter length of SP-CW-DA-ML for (a) 16-QAM, (b) 32-QAM.


Next, we investigate the linewidth tolerance without FO, i.e. $Δf$ = 0. To concentrate on the performance penalty due to phase noise rather than feedback delay, both SP-CW-DA-ML and DP-CW-DA-ML are operated symbol by symbol with D = 1 in our simulations. It should be mentioned that for SP-CW-DA-ML, the first $4 \times 10^{3}$ symbols within the phase acquisition time are processed again with the stable $w(k)$ after the iterative process for better BER performance. Figures 5(a)-5(c) show the OSNR penalty at $\mathrm{BER} = 1 \times 10^{-3}$ as a function of the linewidth-symbol duration product ($Δv \cdot T$) for QPSK, 16-QAM, and 32-QAM, respectively. The theoretical limit is used as a reference. SP-CW-DA-ML has the worst performance under almost all conditions. DP-CW-DA-ML, however, achieves a linewidth tolerance at 1 dB OSNR penalty and $\mathrm{BER} = 1 \times 10^{-3}$ similar to that of FFT-FOE + BPS for QPSK and 16-QAM, and a slightly worse one for 32-QAM. This shows that using the phase information of both polarizations for joint CPE greatly improves the performance of CW-DA-ML. Moreover, we find that SSP not only enables parallel processing for CW-DA-ML, but also reduces the OSNR penalty, especially for small linewidths, by removing the differential coding/decoding. SSP-DP-CW-DA-ML achieves better tolerance than FFT-FOE + BPS for QPSK and 16-QAM, while similar tolerance is observed for 32-QAM. However, the performance of SSP-DP-CW-DA-ML degrades more rapidly than that of BPS as the linewidth increases. Meanwhile, the OSNR penalty of SSP-DP-CW-DA-ML becomes closer to that of DP-CW-DA-ML at larger linewidths. This phenomenon is due to the transfer of the filter weights $w(p)$ from one block to the next.
During transmission with a relatively stable FO, $\arg(w(p))$ is always forced to track $[Δω, Δω, 2Δω, 2Δω, \dots, LΔω, LΔω]^{T}$ for even channels or $[-Δω, -Δω, -2Δω, -2Δω, \dots, -LΔω, -LΔω]^{T}$ for odd channels. However, the magnitude $|w(p)|$ is adjusted adaptively according to the current laser phase noise. Between adjacent buffers, the phase noise evolves differently because it is a Wiener process. Therefore, the $|w(p)|$ acquired from the previous block needs to be adjusted again when it is transferred to the next block. This results in performance degradation for SSP-DP-CW-DA-ML, especially for large linewidths. Finally, we investigate the FOE range of the four algorithms. The OSNR penalty at $\mathrm{BER} = 1 \times 10^{-3}$ over the FO range [-R/2, R/2], where R is the symbol rate, is plotted in Figs. 6(a)-6(c) for QPSK, 16-QAM, and 32-QAM with $Δv \cdot T$ set to $2 \times 10^{-4}$, $7 \times 10^{-5}$, and $2 \times 10^{-5}$, respectively. Although FFT-FOE has a fast FO acquisition time within hundreds of symbols, its FOE range is limited to [-R/8, R/8] for M-QAM. As shown in Fig. 6, our proposed DP-CW-DA-ML with or without the SSP structure achieves the same FOE range of [-R/2, R/2] as SP-CW-DA-ML without performance penalty.

Fig. 5 OSNR penalty at BER = 1 × 10⁻³ versus linewidth-symbol duration product ( $Δ v \cdot T$ ) of various algorithms for (a) QPSK, (b) 16-QAM, (c) 32-QAM.


Fig. 6 OSNR penalty at BER = 1 × 10⁻³ versus frequency offset of various algorithms for (a) QPSK, $Δ v \cdot T = 2 \times 10^{- 4}$ , (b) 16-QAM, $Δ v \cdot T = 7 \times 10^{- 5}$ , (c) 32-QAM, $Δ v \cdot T = 2 \times 10^{- 5}$ .


4.2 Experiments and discussions

We carry out experimental verification to further investigate the performance of our proposed SSP-DP-CW-DA-ML as well as FFT-FOE + BPS for 10 Gbaud DP-16-QAM and DP-32-QAM signals. Figure 7 shows the experimental setup. An external cavity laser (ECL) with ~100 kHz linewidth is used as the transmitter laser source. The arbitrary waveform generator (AWG, Tektronix 7122C) provides 10 Gbaud binary electrical signals for both the in-phase and quadrature arms of the modulator. Then, the signal is polarization division multiplexed with a 140 ns optical delay between the two polarization tributaries. For the B2B measurements, a variable optical attenuator (VOA) and an erbium-doped fiber amplifier (EDFA) are deployed to adjust the OSNR of the received signal. At the receiving side, an ECL with ~100 kHz linewidth is used as the LO to realize coherent detection. The OSNR of the signal is monitored by an optical spectrum analyzer (OSA). Finally, the detected electrical signals are digitized and captured by a 50 GSa/s digital sampling oscilloscope (Tektronix, DPO73304D) for offline processing. The offline DSP flow is also shown in Fig. 7. First, the collected samples are processed with orthogonalization for IQ imbalance compensation [23] and down-sampled to 2 samples per symbol. After timing recovery, four 15-tap fractionally spaced (Ts/2) finite impulse response (FIR) filters arranged in a butterfly structure are employed to achieve polarization de-multiplexing and differential group delay (DGD) mitigation. These FIR filters are first adapted by the standard constant modulus algorithm (CMA) for pre-convergence. Then, the equalization is realized by switching from CMA to the radius-directed equalization (RDE) algorithm. Both frequency offset compensation (FOC) and CPE, using either the proposed SSP-DP-CW-DA-ML or FFT-FOE + BPS, are implemented before signal de-mapping and decoding. BER counting is finally carried out for performance evaluation.

Fig. 7 Experimental setup and DSP flow for 10 Gbaud DP-16/32-QAM system. AWG: arbitrary waveform generator, EDFA: erbium-doped fiber amplifier, ECL: external cavity laser, OBPF: optical band-pass filter, PC: polarization controller, PBS: polarization beam splitter, PBC: polarization beam combiner, VOA: variable optical attenuator, ASE: amplified spontaneous emission.


Fig. 8 BER performance as a function of OSNR, (a) 10 Gbaud DP-16-QAM, (b) 10 Gbaud DP-32-QAM.


Figures 8(a) and 8(b) show the BER performance versus OSNR for 10 Gbaud DP-16-QAM and DP-32-QAM, respectively. For the FFT-FOE + BPS scheme, the number of test angles is fixed at 32. For the proposed SSP-DP-CW-DA-ML, the filter length 2L and the parallelization degree P are set to 24 and 8, respectively, the same as in the simulations. The experimental results in Figs. 8(a) and 8(b) show that SSP-DP-CW-DA-ML provides better performance than FFT-FOE + BPS for B2B transmission. In particular, SSP-DP-CW-DA-ML reduces the required OSNR at $\mathrm{BER} = 3.8 \times 10^{-3}$ by 0.8 dB and 0.65 dB for 16-QAM and 32-QAM, respectively. The improved performance is attributed to the more accurate estimation using signals from both polarizations and to the removal of differential coding/decoding. To verify the FOE range of SSP-DP-CW-DA-ML, Figs. 9(a) and 9(b) show the BER performance versus FO over the range [-5 GHz, +5 GHz] for DP-16-QAM and DP-32-QAM, respectively. The FO is set by adjusting the central wavelength of the ECL at the receiving side. The OSNRs are set to 15.1 dB, 18.6 dB, and 21.2 dB for 16-QAM, and 19.6 dB, 22.3 dB, and 25.9 dB for 32-QAM. We can see that our SSP-DP-CW-DA-ML functions well under all FO conditions owing to its complete FOE range of ± symbol rate/2. For the FFT-FOE + BPS scheme, however, the signal is totally distorted and the BER reaches 0.5 outside the FO range of [-2 GHz, +2 GHz]. This is mainly because its FOE range is limited to [-R/8, +R/8], which is [-1.25 GHz, +1.25 GHz] for the symbol rate of 10 Gbaud. The experimental results confirm that the proposed SSP-DP-CW-DA-ML has a complete estimation range, satisfying all practical application requirements.
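The range figures quoted above follow from simple arithmetic, which the Python sketch below reproduces for the 10 Gbaud experiment. It also estimates the linewidth-symbol duration product of the setup under the assumption that the combined linewidth is the sum of the two nominal ~100 kHz ECL linewidths; that assumption, and all names in the snippet, are illustrative.

```python
import math


def foe_limit_hz(symbol_rate_hz, fraction):
    """One-sided FOE range as a fraction of the symbol rate."""
    return symbol_rate_hz * fraction


symbol_rate = 10e9      # 10 Gbaud, as in the experiment
max_fo = 5e9            # worst-case FO of +/- 5 GHz quoted in the Introduction

fft_foe_limit = foe_limit_hz(symbol_rate, 1 / 8)    # 1.25 GHz for M-QAM
cw_da_ml_limit = foe_limit_hz(symbol_rate, 1 / 2)   # 5 GHz

fft_foe_covers = fft_foe_limit >= max_fo
cw_da_ml_covers = cw_da_ml_limit >= max_fo

# Lowest symbol rate at which +/- R/8 would still cover +/- 5 GHz
min_rate_for_fft_foe = max_fo * 8                   # 40 Gbaud

# Assumed combined linewidth: sum of transmitter and LO linewidths
combined_linewidth = 100e3 + 100e3
linewidth_duration_product = combined_linewidth / symbol_rate
```

With these numbers FFT-FOE covers only a quarter of the ± 5 GHz span at 10 Gbaud, whereas a ± symbol rate/2 range covers it fully. The 40 Gbaud threshold matches the symbol rate quoted in the Introduction.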

Fig. 9 BER versus FO for different OSNR conditions, (a) 10 Gbaud DP-16-QAM, (b) 10 Gbaud DP-32-QAM.


4.3 Computation complexity analysis

The computation complexity of a DSP algorithm is critical for practical applications. The algorithms involved in our performance evaluations include FFT-FOE, BPS, SP-CW-DA-ML, and our proposed SSP-DP-CW-DA-ML. FFT-FOE and BPS are designed for estimating the frequency offset and the laser phase noise, respectively, whereas both SP-CW-DA-ML and SSP-DP-CW-DA-ML perform joint frequency offset and phase noise estimation. Please note that the computation complexity is evaluated for dual-polarization operation. Thus, the complexities of BPS, FFT-FOE and SP-CW-DA-ML must be doubled. The complexities of the commonly used FFT-FOE and BPS have been comprehensively investigated [17, 19]. For SP-CW-DA-ML, the complexity comes from the calculation of $V(k)$ per symbol. The least-squares solution $w(k) = Φ^{-1}(k) z(k)$ of SP-CW-DA-ML can be computed recursively using the matrix inversion lemma [16]. To obtain the phase and frequency offset estimator $V(k)$, it needs $6L^{2} + 14L + 10$ real multipliers, $6L^{2} + 8L + 6$ real adders, and a memory size of $L^{2} + 6L + 4$ buffer units [17], where $L$ is the filter length. For our proposed SSP-DP-CW-DA-ML, the complexity can be divided into three parts: (1) the signals at X polarization are rotated by a factor of $\exp(-jφ_{\mathrm{offset}})$ to ensure that the symbols in both polarizations have identical phase rotation. This takes 4 real multipliers and 2 real adders per symbol. Since the phase offset $φ_{\mathrm{offset}}$ is obtained once at the beginning, this operation adds no further complexity; (2) to obtain the phase and frequency offset estimator $V(k)$, it needs $6L_{DP}^{2} + 14L_{DP} + 10$ real multipliers, $6L_{DP}^{2} + 8L_{DP} + 6$ real adders, and $L_{DP}^{2} + 6L_{DP} + 4$ buffer units, which is similar to SP-CW-DA-ML. In our dual-polarization scheme, the filter length $L_{DP}$ is twice that of SP-CW-DA-ML $(L_{DP} = 2L)$; (3) the modified superscalar parallelization structure is employed to remove the feedback delay. A buffer of length $2S$ per parallel branch is required to store the signals, which consist of $r'_{x}(k)$ and $r_{y}(k)$. In our SSP-DP-CW-DA-ML scheme, a length of $2S = 1200$ including 12 training symbols is chosen, leading to 1% overhead. The complexities of all the algorithms mentioned are summarized in Table 1.
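As a rough cross-check, the closed-form counts above and in Table 1 can be evaluated numerically. The sketch below is not part of the original analysis. The values $2S = 1200$ and $L_{DP} = 24$ are taken from the text, while $L = 24$ and $B = 32$ or $64$ for BPS, $N = 512$ for FFT-FOE, and $L = 12$ for SP-CW-DA-ML are assumptions inferred from the numbers in parentheses in Table 1. All function names are hypothetical.

```python
def bps(L, B):
    """Dual-polarization BPS counts (filter length L, B test angles)."""
    return {"mult": 2 * 6 * B, "add": 2 * (L + 6) * B,
            "comp": 2 * B, "buf": 2 * L * B}


def fft_foe(N):
    """Dual-polarization FFT-FOE counts; N is assumed to be a power of two."""
    n_log = N * (N.bit_length() - 1)  # N * log2(N) for a power of two
    return {"mult": 2 * (2 * n_log + 10 * N + 2),
            "add": 2 * (3 * n_log + 5 * N),
            "comp": 2 * N, "buf": 2 * 2 * N}


def sp_cw_da_ml(L):
    """SP-CW-DA-ML counts, doubled for two polarizations."""
    return {"mult": 2 * (6 * L**2 + 14 * L + 10),
            "add": 2 * (6 * L**2 + 8 * L + 6),
            "comp": 0, "buf": 2 * (L**2 + 6 * L + 4)}


def ssp_dp_cw_da_ml(L_dp, two_S):
    """SSP-DP-CW-DA-ML counts: estimator plus X-pol rotation (4 mult, 2 add)
    plus the 2S-symbol buffer of the superscalar structure."""
    return {"mult": 6 * L_dp**2 + 14 * L_dp + 10 + 4,
            "add": 6 * L_dp**2 + 8 * L_dp + 6 + 2,
            "comp": 0, "buf": L_dp**2 + 6 * L_dp + 4 + two_S}


if __name__ == "__main__":
    print("BPS (B=32):      ", bps(24, 32))
    print("BPS (B=64):      ", bps(24, 64))
    print("FFT-FOE (N=512): ", fft_foe(512))
    print("SP-CW-DA-ML:     ", sp_cw_da_ml(12))
    print("SSP-DP-CW-DA-ML: ", ssp_dp_cw_da_ml(24, 1200))
    print("Training overhead:", 12 / 1200)  # 12 training symbols per 1200
```

With these assumed parameters the formulas reproduce the tabulated values, with one exception: the adder formula for SSP-DP-CW-DA-ML evaluates to 3656 at $L_{DP} = 24$, whereas 3654 (the value listed in Table 1) corresponds to the estimator alone, without the two adders of the X-polarization rotation.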

Table 1. Complexity comparison among three methods

Our proposed SSP-DP-CW-DA-ML has higher computation complexity than SP-CW-DA-ML. However, the laser linewidth tolerance is greatly improved by using the phase information of both polarizations. Moreover, by employing the SSP structure, the feedback delay is removed, which is favorable for real-time implementation, whereas the performance of SP-CW-DA-ML degrades dramatically due to the feedback-delay-induced penalty. Compared with BPS, our proposed SSP-DP-CW-DA-ML needs more multipliers and comparable numbers of adders and buffer units, while requiring no comparators and no unwrap function. The laser linewidth tolerance of SSP-DP-CW-DA-ML is better than that of BPS for QPSK and 16-QAM, and similar for 32-QAM. Note also that the BPS algorithm must be implemented after the FOE module, and Table 1 shows that the complexity of FFT-FOE is very high. Meanwhile, the range of FFT-FOE is limited to ± 1/8 symbol rate for M-QAM, while SSP-DP-CW-DA-ML can achieve an FOE range of ± 1/2 symbol rate. Generally, FFT-FOE can be used to estimate a static frequency offset after N symbols, but continuous tracking of the frequency offset is impractical for FFT-FOE due to its high computation complexity.

For real-time implementation, several parameters need to be updated. In order to obtain the phase and frequency offset estimator $V(k)$, we employ Eqs. (9)-(13) to update $w(k)$. Meanwhile, the normalized factor $C(k)$ and the inverse correlation matrix $Φ^{-1}(k)$ need to be stored and updated. At the beginning of each buffer, training symbols are used to update $C(k)$ and $Φ^{-1}(k)$. With this adaptive updating of $w(k)$, our SSP-DP-CW-DA-ML is suitable for environments with time-varying frequency offset. Moreover, the feedback delay is removed using the modified superscalar parallelization structure, which supports the real-time implementation of our proposed scheme. Considering these advantages of our SSP-DP-CW-DA-ML, we believe that it can function well under the condition of real-time processing.
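The recursive update of the inverse correlation matrix relies on the matrix inversion lemma. As a generic illustration only, the sketch below applies the rank-one (Sherman-Morrison) form of the lemma, assuming a correlation matrix that grows as $Φ(k) = Φ(k-1) + y(k) y^{H}(k)$. This growth model, the identity initialization, and all function names are assumptions for illustration; the sketch is not a transcription of Eqs. (9)-(13).

```python
def mat_vec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def rank_one_inverse_update(P, y):
    """Return the inverse of (Phi + y y^H), given P = Phi^{-1}.

    P is assumed Hermitian, so y^H P equals (P y)^H and only one
    matrix-vector product is needed instead of a full inversion.
    """
    Py = mat_vec(P, y)
    # Scalar 1 + y^H P y
    denom = 1 + sum(yi.conjugate() * pyi for yi, pyi in zip(y, Py))
    n = len(y)
    return [[P[i][j] - Py[i] * Py[j].conjugate() / denom for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Start from Phi = identity (assumed initialization), so P = identity.
    phi = [[1 + 0j, 0j], [0j, 1 + 0j]]
    P = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for y in ([1 + 1j, 2 - 1j], [0.5j, -1 + 0j]):
        # Direct update of Phi, used here only to check the recursion.
        phi = [[phi[i][j] + y[i] * y[j].conjugate() for j in range(2)]
               for i in range(2)]
        P = rank_one_inverse_update(P, y)
    check = mat_mul(P, phi)  # should be close to the identity matrix
    err = max(abs(check[i][j] - (1 if i == j else 0))
              for i in range(2) for j in range(2))
    print("max deviation from identity:", err)
```

The cost of each update is quadratic in the filter length rather than cubic, which is consistent with the $L^{2}$ terms in the multiplier and adder counts given for the estimator.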

5. Conclusions

A dual-polarization, complex-weighted, decision-aided, maximum-likelihood algorithm with a modified superscalar parallelization structure (SSP-DP-CW-DA-ML) for joint carrier phase and frequency-offset compensation is demonstrated. The feedback-delay-induced performance degradation is avoided by using the modified superscalar structure. In simulation, we show that our proposed algorithm has a laser linewidth tolerance comparable to that of the blind phase search (BPS) algorithm. A complete frequency offset estimation range of ± symbol rate/2 is achieved at the same time. The performance is also experimentally verified by B2B transmission of 10 Gbaud DP-16-QAM and DP-32-QAM signals, where our proposed SSP-DP-CW-DA-ML shows better performance and a wider FOE range than FFT-FOE + BPS.

Funding

National Natural Science Foundation of China (61575071, 61331010), National Key Scientific Instrument and Equipment Development Project (No. 2013YQ16048702), Key Project of Natural Science Foundation of Hubei Province (2016AAA012), Natural Science Foundation of Hubei Province (2016CFB302), and Open Fund (2016OCTN-01) of State Key Laboratory of Optical Communication Technologies and Networks, Wuhan Research Institute of Posts & Telecommunications.

References and links

1. E. Ip and J. M. Kahn, “Feedforward carrier recovery for coherent optical communications,” J. Lightwave Technol. 25(9), 2675–2692 (2007).

2. R. W. Tkach, “Scaling optical communications for the next decade and beyond,” Bell Labs Tech. J. 14(4), 3–9 (2010).

3. P. J. Winzer, “High-spectral-efficiency optical modulation formats,” J. Lightwave Technol. 30(8), 3824–3835 (2012).

4. S. J. Savory, “Digital filters for coherent optical receivers,” Opt. Express 16(2), 804–817 (2008).

5. X. Zhou, L. Nelson, P. Magill, R. Issac, B. Zhu, D. Peckham, P. Borel, and K. Carlson, “4000 km transmission of 50GHz spaced, 10x494.85-Gb/s hybrid 32-64QAM using cascaded equalization and training-assisted phase recovery,” in Proc. OFC’12 (2012), paper PDP5C.

6. G. Bosco, V. Curri, A. Carena, P. Poggiolini, and F. Forghieri, “On the performance of Nyquist-WDM terabit superchannels based on PM-BPSK, PM-QPSK, PM-8QAM or PM-16QAM subcarriers,” J. Lightwave Technol. 29(1), 53–61 (2011).

7. P. J. Winzer, “Beyond 100G Ethernet,” IEEE Commun. Mag. 48(7), 26–30 (2010).

8. C. Yu, S. Zhang, P. Y. Kam, and J. Chen, “Bit-error rate performance of coherent optical M-ary PSK/QAM using decision-aided maximum likelihood phase estimation,” Opt. Express 18(12), 12088–12103 (2010).

9. F. Derr, “Coherent optical QPSK intradyne system: Concept and digital receiver realization,” J. Lightwave Technol. 10(9), 1290–1296 (1992).

10. A. J. Viterbi and A. M. Viterbi, “Nonlinear estimation of PSK-modulated carrier phase with application to burst digital transmission,” IEEE Trans. Inf. Theory 29(4), 543–551 (1983).

11. I. Fatadin, D. Ives, and S. J. Savory, “Laser linewidth tolerance for 16-QAM coherent optical systems using QPSK partitioning,” IEEE Photonics Technol. Lett. 22(9), 631–633 (2010).

12. T. Pfau, S. Hoffmann, and R. Noé, “Hardware-efficient coherent digital receiver concept with feedforward carrier recovery for M-QAM constellations,” J. Lightwave Technol. 27(8), 989–999 (2009).

13. D. S. Ly-Gagnon, S. Tsukamoto, K. Katoh, and K. Kikuchi, “Coherent detection of optical quadrature phase shift keying signals with carrier phase estimation,” J. Lightwave Technol. 24(1), 12–21 (2006).

14. Y. Wang, E. Serpedin, P. Ciblat, and P. Loubaton, “Non-data aided feedforward cyclostationary statistics based carrier frequency offset estimators for linear modulations,” in Proceed. Conf. Rec. GLOBECOM 01, 1386–1390 (2001).

15. Optical Internetworking Forum, “Integrable tunable transmitter assembly multi source agreement,” OIF-ITTA-MSA-01.0, Nov. (2008).

16. A. Meiyappan, P. Y. Kam, and H. Kim, “A complex-weighted, decision-aided, maximum-likelihood carrier phase and frequency-offset estimation algorithm for coherent optical detection,” Opt. Express 20(18), 20102–20114 (2012).

17. A. Meiyappan, P. Kam, and H. Kim, “On decision aided carrier phase and frequency offset estimation in coherent optical receivers,” J. Lightwave Technol. 31(13), 2055–2069 (2013).

18. K. Piyawanno, M. Kuschnerov, B. Spinnler, and B. Lankl, “Low complexity carrier recovery for coherent QAM using superscalar parallelization,” in Proc. ECOC’10 (2010), paper We.7.A.3.

19. Q. Zhuge, M. Morsy-Osman, X. Xu, M. E. Mousa-Pasandi, M. Chagnon, Z. A. El-Sahn, and D. V. Plant, “Pilot-aided carrier phase recovery for M-QAM using superscalar parallelization based PLL,” Opt. Express 20(17), 19599–19609 (2012).

20. Y. Gao, A. P. T. Lau, S. Yan, and C. Lu, “Low-complexity and phase noise tolerant carrier phase estimation for dual-polarization 16-QAM systems,” Opt. Express 19(22), 21717–21729 (2011).

21. M. Qiu, Q. Zhuge, Y. Gao, W. Wang, F. Zhang, and D. V. Plant, “Cycle Slip Mitigation with Joint Carrier Phase Recovery in Coherent Subcarrier Multiplexing Systems,” in Proc. OFC’16 (2016), paper Tu3K2.

22. M. Selmi, Y. Jaouën, P. Ciblat, and B. Lankl, “Accurate digital frequency offset estimator for coherent PolMux QAM transmission systems,” in Proc. ECOC’09 (2009), paper P3.08.

23. I. Fatadin, S. J. Savory, and D. Ives, “Compensation of quadrature imbalance in an optical QPSK coherent receiver,” IEEE Photonics Technol. Lett. 20(20), 1733–1735 (2008).

Method	Algorithm	Real multipliers	Real adders	Comparators	Buffer units
1	BPS	$2 \cdot 6B$ (384/768)	$2 \cdot (L + 6) \cdot B$ (1920/3840)	$2 \cdot B$ (64/128)	$2 \cdot L \cdot B$ (1536/3072)
1	FFT-FOE	$2 \cdot (2N \log_{2} N + 10N + 2)$ (28676)	$2 \cdot (3N \log_{2} N + 5N)$ (32768)	$2 \cdot N$ (1024)	$2 \cdot 2N$ (2048)
2	SP-CW-DA-ML	$2 \cdot (6L^{2} + 14L + 10)$ (2084)	$2 \cdot (6L^{2} + 8L + 6)$ (1932)	0	$2 \cdot (L^{2} + 6L + 4)$ (440)
3	SSP-DP-CW-DA-ML	$6L_{DP}^{2} + 14L_{DP} + 14$ (3806)	$6L_{DP}^{2} + 8L_{DP} + 8$ (3654)	0	$L_{DP}^{2} + 6L_{DP} + 4 + 2S$ (1924)

Optics Express