
Regularizing the effect of input noise injection in feedforward neural networks training

Original Article, Neural Computing & Applications

Abstract

Injecting input noise during feedforward neural network (FNN) training can markedly improve generalization performance. Previous work justifies this by arguing that noise injection is equivalent to a smoothing regularization, with the input noise variance playing the role of the regularization parameter. The success of this approach depends on an appropriate choice of the input noise variance. However, it is often not known a priori whether the degree of smoothness imposed on the FNN mapping is consistent with the unknown function to be approximated. In order to gain better control over this smoothing effect, a loss function is proposed that balances the smoothed fitting induced by noise injection against the precision of the approximation. The second term, which penalizes the undesirable effect of input noise injection by controlling the deviation of the randomly perturbed loss, is obtained by expressing a distance between the original loss function and its randomly perturbed version. This term can be derived in general for parametric models that satisfy the Lipschitz property. An example is included to illustrate the effectiveness of learning with the proposed loss function when noise injection is used.
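The composite criterion described above can be sketched in code. The following is a minimal illustration, not the authors' exact formulation: it assumes Gaussian input noise of variance sigma2, r noise replications per training sample, and a penalty weight lam balancing the two terms; these names and the particular penalty form (the absolute deviation between the perturbed and unperturbed empirical losses) are illustrative assumptions.

```python
# Sketch of a noise-injected training loss with a penalty on the deviation
# between the perturbed and clean empirical losses (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def fnn(x, alpha, beta, gamma):
    # One-hidden-layer sigmoid network: f(x) = sum_l alpha_l * psi(beta_l^T x + gamma_l)
    return (alpha / (1.0 + np.exp(-(x @ beta.T + gamma)))).sum(axis=1)

def composite_loss(theta, x, y, sigma2=0.05, r=10, lam=1.0):
    # Noise-injected empirical MSE plus a penalty on its deviation from the clean MSE.
    alpha, beta, gamma = theta
    clean = np.mean((y - fnn(x, alpha, beta, gamma)) ** 2)
    noisy = np.mean([
        np.mean((y - fnn(x + rng.normal(0.0, np.sqrt(sigma2), size=x.shape),
                         alpha, beta, gamma)) ** 2)
        for _ in range(r)
    ])
    return noisy + lam * abs(noisy - clean)

# Toy usage with random parameters and data, just to show the call signature.
d, L, N = 2, 5, 200
theta = (rng.normal(size=L), rng.normal(size=(L, d)), rng.normal(size=L))
x = rng.uniform(-1.0, 1.0, size=(N, d))
y = np.sin(np.pi * x[:, 0]) + 0.1 * rng.normal(size=N)
print(composite_loss(theta, x, y))
```

The penalty weight plays the role of the balance discussed in the abstract: lam = 0 reduces the criterion to plain noise-injected training, while large lam forces the smoothed fit to stay close to the unperturbed one.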


References

  1. Döhler S, Rüschendorf L (2001) An approximation result for nets in functional estimation. Stat Probab Lett 52:373–380

  2. Faragó A, Lugosi G (1993) Strong universal consistency of neural network classifiers. IEEE Trans Inf Theory 39:1146–1151

  3. Barron AR (1993) Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans Inf Theory 39:930–945

  4. Dudley RM Real analysis and probability. In: Mathematics series. Wadsworth and Brooks/Cole, Pacific Grove, CA

  5. Geman S, Bienenstock E, Doursat R (1992) Neural networks and the bias/variance dilemma. Neural Comput 4:1–58

  6. Vapnik V (1982) Estimation of dependences based on empirical data. Springer, Berlin Heidelberg New York

  7. Pollard D (1984) Convergence of stochastic processes. Springer, Berlin Heidelberg New York

  8. Tikhonov AN, Arsenin VY (1977) Solutions of ill-posed problems. V. H. Winston, Washington, DC

  9. Wahba G (1990) Spline models for observational data. In: Series in applied mathematics, vol 59. SIAM, Philadelphia, PA

  10. Grandvalet Y, Canu S (1995) A comment on noise injection into inputs in backpropagation learning. IEEE Trans Syst Man Cybern 25:678–681

  11. Reed R, Marks RJ II, Oh S (1995) Similarities of error regularization, sigmoid gain scaling, target smoothing, and training with jitter. IEEE Trans Neural Netw 6:529–538

  12. Bishop CM (1995) Training with noise is equivalent to Tikhonov regularization. Neural Comput 7:108–116

  13. An G (1996) The effects of adding noise during backpropagation training on a generalization performance. Neural Comput 8:643–674

  14. Grandvalet Y (2000) Anisotropic noise injection for input variables relevance determination. IEEE Trans Neural Netw 11:1201–1212

  15. Lugosi G, Zeger K (1995) Nonparametric estimation via empirical risk minimization. IEEE Trans Inf Theory 41:677–687

  16. Leen TK (1995) From data distributions to regularization in invariant learning. Neural Comput 7:974–981

  17. Grandvalet Y, Canu S, Boucheron S (1997) Noise injection: theoretical prospects. Neural Comput 9:1093–1108

  18. Matsuoka K (1992) Noise injection into inputs in back-propagation learning. IEEE Trans Syst Man Cybern 22:436–440

  19. Bernier JL, Ortega J, Ros E, Rojas I, Prieto A (2000) A quantitative study of fault tolerance, noise immunity, and generalization ability of MLPs. Neural Comput 12:2941–2964

  20. Uykan Z, Guzelis C, Celebi ME, Koivo HN (2000) Analysis of input–output clustering for determining centers of RBFN. IEEE Trans Neural Netw 11:851–858


Author information

Corresponding author

Correspondence to Abd-Krim Seghouane.

Appendices

Appendix 1

Proof of Lemma 1

Recall that $F_L$ was defined as the set of FNN functions satisfying Assumption 2.1, and let $\mathcal{A} = E\left\{\sup_{F_L}\left|J_{\mathrm{emp}_{NI}}(\theta_L) - E\left[(y - f_{NN}(x+\eta,\theta_L))^2\right]\right|\right\}.$ Then

$$ \begin{aligned} \sup_{F_L}\left|J_{\mathrm{emp}_{NI}}(\theta_L) - J(\theta_L)\right| & \le \sup_{F_L}\left|J_{\mathrm{emp}_{NI}}(\theta_L) - E\left\{(y - f_{NN}(x+\eta,\theta_L))^2\right\}\right| \\ & \quad + \sup_{F_L}\left|E\left\{(y - f_{NN}(x+\eta,\theta_L))^2\right\} - J(\theta_L)\right|, \end{aligned} $$

where $z_i = x_i + \eta_i$, $z = x + \eta$ and $h(z,y) = (y - f_{NN}(z,\theta_L))^2$. From Theorems 1 and 2 of [15],

$$ \begin{aligned} \mathcal{A} & = E\left\{\sup_{F_L}\left|\frac{1}{N r}\sum_{j=1}^{r}\sum_{i=1}^{N}\left(y_i - f_{NN}(x_i + \eta_{j,i},\theta_L)\right)^2 - \frac{1}{r}\sum_{j=1}^{r}E\left[(y - f_{NN}(x+\eta,\theta_L))^2\right]\right|\right\} \\ & \le \frac{1}{r}\sum_{j=1}^{r}E\left\{\sup_{F_L}\left|\frac{1}{N}\sum_{i=1}^{N}\left(y_i - f_{NN}(x_i + \eta_{j,i},\theta_L)\right)^2 - E\left[(y - f_{NN}(x+\eta,\theta_L))^2\right]\right|\right\} \\ & = E\left\{\sup_{F_L}\left|\frac{1}{N}\sum_{i=1}^{N}\left(y_i - f_{NN}(x_i + \eta_{j,i},\theta_L)\right)^2 - E\left[(y - f_{NN}(x+\eta,\theta_L))^2\right]\right|\right\} \\ & = E\left\{\sup_{F_L}\left|\frac{1}{N}\sum_{i=1}^{N}h(z_i,y_i) - E\left[h(z,y)\right]\right|\right\}, \end{aligned} $$

where the set of functions $H_L$ is defined by

$$ H_L = \left\{h(x,y) = (y - f_{NN}(x,\theta_L))^2;\ \sum_{l=1}^{L}|\alpha_l| \le \omega_L\right\}. $$

Therefore,

$$ \sup_{H_L}\left|\frac{1}{N}\sum_{i=1}^{N}h(z_i,y_i) - E\{h(z,y)\}\right| \to 0, \quad \text{a.s.}, $$

which concludes the proof.
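As an informal numerical illustration of the quantity bounded in this proof, the sketch below compares the noise-injected empirical loss with a large-sample estimate of $E[(y - f_{NN}(x+\eta,\theta_L))^2]$ for a single fixed network. It does not evaluate the supremum over $F_L$, and the dimension, number of hidden units, noise level and data model are arbitrary assumptions.

```python
# Informal check: the averaged noise-injected empirical loss approaches the
# expected perturbed loss as N grows (for one fixed network, not the sup).
import numpy as np

rng = np.random.default_rng(1)
d, L, sigma = 2, 5, 0.2  # input dimension, hidden units, noise std (arbitrary)
alpha = rng.normal(size=L)
beta = rng.normal(size=(L, d))
gamma = rng.normal(size=L)

def fnn(x):
    # Fixed one-hidden-layer sigmoid network f_NN(., theta_L)
    return (alpha / (1.0 + np.exp(-(x @ beta.T + gamma)))).sum(axis=1)

def noise_injected_loss(N, r):
    # (1 / (N r)) * sum_j sum_i (y_i - f_NN(x_i + eta_{j,i}))^2 on fresh data
    x = rng.uniform(-1.0, 1.0, size=(N, d))
    y = np.sin(np.pi * x[:, 0]) + 0.1 * rng.normal(size=N)
    return np.mean([np.mean((y - fnn(x + sigma * rng.normal(size=x.shape))) ** 2)
                    for _ in range(r)])

reference = noise_injected_loss(200_000, 1)  # stands in for the expectation
for N in (100, 1_000, 10_000, 100_000):
    print(N, abs(noise_injected_loss(N, 5) - reference))  # gap shrinks with N
```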

Appendix 2

Proof of Proposition 1

It is easily shown that the sigmoid function $\psi(x) = \frac{1}{1+e^{-x}}$ has a derivative $\psi'$ such that $|\psi'(x)| \le \frac{1}{4}$. Hence $\psi$ is Lipschitz continuous, with Lipschitz constant $k = \frac{1}{4}$. Using the expression of $f_{NN}(\cdot)$, we have

$$ \begin{aligned} \left|f_{NN}(x_1) - f_{NN}(x_2)\right| & = \left|\sum_{l=1}^{L}\alpha_l \psi(\beta_l^{T} x_1 + \gamma_l) - \sum_{l=1}^{L}\alpha_l \psi(\beta_l^{T} x_2 + \gamma_l)\right| \\ & = \left|\sum_{l=1}^{L}\alpha_l\left[\psi(\beta_l^{T} x_1 + \gamma_l) - \psi(\beta_l^{T} x_2 + \gamma_l)\right]\right| \\ & \le \sum_{l=1}^{L}\left|\alpha_l\left[\psi(\beta_l^{T} x_1 + \gamma_l) - \psi(\beta_l^{T} x_2 + \gamma_l)\right]\right| \\ & = \sum_{l=1}^{L}|\alpha_l|\,\left|\psi(\beta_l^{T} x_1 + \gamma_l) - \psi(\beta_l^{T} x_2 + \gamma_l)\right|. \end{aligned} $$

Now, considering the Lipschitz continuity of $\psi(\cdot)$, we obtain

$$ \begin{aligned} \left|f_{NN}(x_1) - f_{NN}(x_2)\right| & \le k\sum_{l=1}^{L}|\alpha_l|\,|\beta_l^{T} x_1 - \beta_l^{T} x_2| \\ & \le k\sum_{l=1}^{L}|\alpha_l|\,|\beta_l^{T}|\,|x_1 - x_2| \\ & \le k\,|\boldsymbol{\alpha}|\,|\boldsymbol{\beta}|\,|x_1 - x_2|, \end{aligned} $$

where $|\boldsymbol{\alpha}|$ and $|\boldsymbol{\beta}|$ denote, respectively, the $L_1$-norm of the vector of linear parameters and the $L_1$-norm of the input-to-hidden-layer parameter matrix (taken as the sum of the absolute values of the elements of the vector or matrix). Therefore, the Lipschitz constant of $f_{NN}(\cdot,\theta)$ is $K = \frac{1}{4}\,|\boldsymbol{\alpha}|\,|\boldsymbol{\beta}|$.
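A quick numerical check of this bound is sketched below for a randomly drawn network: it verifies that $|f_{NN}(x_1) - f_{NN}(x_2)| \le K\,|x_1 - x_2|$ with $K = \frac{1}{4}|\boldsymbol{\alpha}||\boldsymbol{\beta}|$ and the $L_1$-norms defined as above. The network sizes and test points are arbitrary assumptions.

```python
# Numerical check of the Lipschitz bound from Proposition 1 (L1 norms).
import numpy as np

rng = np.random.default_rng(2)
d, L = 3, 4  # arbitrary input dimension and number of hidden units
alpha = rng.normal(size=L)
beta = rng.normal(size=(L, d))
gamma = rng.normal(size=L)

# Lipschitz constant from Proposition 1: K = (1/4) * |alpha|_1 * |beta|_1
K = 0.25 * np.abs(alpha).sum() * np.abs(beta).sum()

def fnn(x):
    # f_NN(x) = sum_l alpha_l * sigmoid(beta_l^T x + gamma_l)
    return (alpha / (1.0 + np.exp(-(beta @ x + gamma)))).sum()

for _ in range(5):
    x1 = rng.uniform(-2.0, 2.0, size=d)
    x2 = rng.uniform(-2.0, 2.0, size=d)
    lhs = abs(fnn(x1) - fnn(x2))
    rhs = K * np.abs(x1 - x2).sum()
    print(lhs <= rhs, round(lhs, 4), round(rhs, 4))  # the bound holds in each draw
```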

Proof of Lemma 2

The previous result is used in the demonstration of this lemma. After some simple manipulation, we can write

$$ \begin{aligned} E\{(y - f_{NN}(x+\eta,\theta_L))^2\} & = E\{(y - f_{NN}(x,\theta_L))^2\} \\ & \quad + 2E\{(y - f_{NN}(x,\theta_L))\,(f_{NN}(x,\theta_L) - f_{NN}(x+\eta,\theta_L))\} \\ & \quad + E\{(f_{NN}(x,\theta_L) - f_{NN}(x+\eta,\theta_L))^2\}. \end{aligned} $$

Under the asymptotic assumption, the cross term $E\{(y - f_{NN}(x,\theta_L))\,(f_{NN}(x,\theta_L) - f_{NN}(x+\eta,\theta_L))\}$ vanishes [11, 12]. Hence, using the Lipschitz property of $f_{NN}(\cdot,\theta_L)$ established above,

$$ E\{(y - f_{NN}(x+\eta,\theta_L))^2\} \le E\{(y - f_{NN}(x,\theta_L))^2\} + K^2 E\{|\eta|^2\} = E\{(y - f_{NN}(x,\theta_L))^2\} + K^2\sigma^2|I|, $$

where $|I| = d$ is the $L_1$-norm of the identity matrix and $K$ is the Lipschitz constant of the FNN function.
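The Lipschitz-based bound on the noise-induced term can also be checked numerically. The sketch below estimates $E\{(f_{NN}(x) - f_{NN}(x+\eta))^2\}$ by Monte Carlo and compares it with $K^2 E\{|\eta|^2\}$ (using the $L_1$-norm of $\eta$); the cancellation of the cross term rests on the asymptotic optimal-fit assumption and is not verified here, and all constants are arbitrary assumptions.

```python
# Monte Carlo check of the noise-induced term against the Lipschitz bound.
import numpy as np

rng = np.random.default_rng(3)
d, L, sigma = 3, 4, 0.1  # arbitrary dimension, hidden units, noise std
alpha = rng.normal(size=L)
beta = rng.normal(size=(L, d))
gamma = rng.normal(size=L)
K = 0.25 * np.abs(alpha).sum() * np.abs(beta).sum()  # Lipschitz constant (Proposition 1)

def fnn(x):
    # f_NN(x) = sum_l alpha_l * sigmoid(beta_l^T x + gamma_l)
    return (alpha / (1.0 + np.exp(-(x @ beta.T + gamma)))).sum(axis=1)

N = 100_000
x = rng.uniform(-1.0, 1.0, size=(N, d))
eta = sigma * rng.normal(size=(N, d))

noise_term = np.mean((fnn(x) - fnn(x + eta)) ** 2)                # E{(f_NN(x) - f_NN(x+eta))^2}
lipschitz_bound = K ** 2 * np.mean(np.abs(eta).sum(axis=1) ** 2)  # K^2 E{|eta|_1^2}
print(noise_term, lipschitz_bound, noise_term <= lipschitz_bound)
```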

About this article

Cite this article

Seghouane, A.-K., Moudden, Y. & Fleury, G. Regularizing the effect of input noise injection in feedforward neural networks training. Neural Comput & Applic 13, 248–254 (2004). https://doi.org/10.1007/s00521-004-0411-6