
Regularizing the effect of input noise injection in feedforward neural networks training

Original Article, Neural Computing & Applications

Abstract

Injecting input noise during feedforward neural network (FNN) training can markedly improve generalization performance. Previous work justifies this by arguing that noise injection is equivalent to a smoothing regularization, with the input noise variance playing the role of the regularization parameter. The success of this approach depends on an appropriate choice of the input noise variance. However, it is often not known a priori whether the degree of smoothness imposed on the FNN mapping is consistent with the unknown function to be approximated. In order to gain better control over this smoothing effect, a loss function is proposed that balances the smoothed fitting induced by noise injection against the precision of the approximation. The second term, which penalizes the undesirable effect of input noise injection by controlling the deviation of the randomly perturbed loss, is obtained by expressing a distance between the original loss function and its randomly perturbed version. This term can be derived in general for parametric models that satisfy the Lipschitz property. An example is included to illustrate the effectiveness of learning with the proposed loss function when noise injection is used.
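The composite criterion described above can be sketched in code. The following is a minimal illustration, not the authors' exact formulation: it assumes Gaussian input noise of variance sigma2, r noise replications per training sample, and a penalty weight lam balancing the two terms; these names and the particular penalty form (the absolute deviation between the perturbed and unperturbed empirical losses) are illustrative assumptions.

```python
# Sketch of a noise-injected training loss with a penalty on the deviation
# between the perturbed and clean empirical losses (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def fnn(x, alpha, beta, gamma):
    # One-hidden-layer sigmoid network: f(x) = sum_l alpha_l * psi(beta_l^T x + gamma_l)
    return (alpha / (1.0 + np.exp(-(x @ beta.T + gamma)))).sum(axis=1)

def composite_loss(theta, x, y, sigma2=0.05, r=10, lam=1.0):
    # Noise-injected empirical MSE plus a penalty on its deviation from the clean MSE.
    alpha, beta, gamma = theta
    clean = np.mean((y - fnn(x, alpha, beta, gamma)) ** 2)
    noisy = np.mean([
        np.mean((y - fnn(x + rng.normal(0.0, np.sqrt(sigma2), size=x.shape),
                         alpha, beta, gamma)) ** 2)
        for _ in range(r)
    ])
    return noisy + lam * abs(noisy - clean)

# Toy usage with random parameters and data, just to show the call signature.
d, L, N = 2, 5, 200
theta = (rng.normal(size=L), rng.normal(size=(L, d)), rng.normal(size=L))
x = rng.uniform(-1.0, 1.0, size=(N, d))
y = np.sin(np.pi * x[:, 0]) + 0.1 * rng.normal(size=N)
print(composite_loss(theta, x, y))
```

The penalty weight plays the role of the balance discussed in the abstract: lam = 0 reduces the criterion to plain noise-injected training, while large lam forces the smoothed fit to stay close to the unperturbed one.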


References

  1. Döhler S, Rüschendorf L (2001) An approximation result for nets in functional estimation. Stat Probab Lett 52:373–380

  2. Faragó A, Lugosi G (1993) Strong universal consistency of neural network classifiers. IEEE Trans Inf Theory 39:1146–1151

  3. Barron AR (1993) Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans Inf Theory 39:930–945

  4. Dudley RM Real analysis and probability. In: Mathematics series. Wadsworth and Brooks/Cole, Pacific Grove, CA

  5. Geman S, Bienenstock E, Doursat R (1992) Neural networks and the bias/variance dilemma. Neural Comput 4:1–58

  6. Vapnik V (1982) Estimation of dependences based on empirical data. Springer, Berlin Heidelberg New York

  7. Pollard D (1984) Convergence of stochastic processes. Springer, Berlin Heidelberg New York

  8. Tikhonov AN, Arsenin VY (1977) Solutions of ill-posed problems. V. H. Winston, Washington, DC

  9. Wahba G (1990) Spline models for observational data. In: Series in applied mathematics, vol 59. SIAM, Philadelphia, PA

  10. Grandvalet Y, Canu S (1995) A comment on noise injection into inputs in backpropagation learning. IEEE Trans Syst Man Cybern 25:678–681

  11. Reed R, Marks RJ II, Oh S (1995) Similarities of error regularization, sigmoid gain scaling, target smoothing, and training with jitter. IEEE Trans Neural Netw 6:529–538

  12. Bishop CM (1995) Training with noise is equivalent to Tikhonov regularization. Neural Comput 7:108–116

  13. An G (1996) The effects of adding noise during backpropagation training on a generalization performance. Neural Comput 8:643–674

  14. Grandvalet Y (2000) Anisotropic noise injection for input variables relevance determination. IEEE Trans Neural Netw 11:1201–1212

  15. Lugosi G, Zeger K (1995) Nonparametric estimation via empirical risk minimization. IEEE Trans Inf Theory 41:677–687

  16. Leen TK (1995) From data distributions to regularization in invariant learning. Neural Comput 7:974–981

  17. Grandvalet Y, Canu S, Boucheron S (1997) Noise injection: theoretical prospects. Neural Comput 9:1093–1108

  18. Matsuoka K (1992) Noise injection into inputs in back-propagation learning. IEEE Trans Syst Man Cybern 22:436–440

  19. Bernier JL, Ortega J, Ros E, Rojas I, Prieto A (2000) A quantitative study of fault tolerance, noise immunity, and generalization ability of MLPs. Neural Comput 12:2941–2964

  20. Uykan Z, Guzelis C, Celebi ME, Koivo HN (2000) Analysis of input–output clustering for determining centers of RBFN. IEEE Trans Neural Netw 11:851–858


Author information

Corresponding author

Correspondence to Abd-Krim Seghouane.

Appendices

Appendix 1

Proof of Lemma 1

Recall that $F_L$ was defined as the set of FNN functions satisfying Assumption 2.1, and let $\mathcal{A} = E\left\{\sup_{F_L}\left|J_{\mathrm{emp}_{NI}}(\theta_L) - E\left[(y - f_{NN}(x+\eta,\theta_L))^2\right]\right|\right\}.$ Then

$$ \begin{aligned} \sup_{F_L}\left|J_{\mathrm{emp}_{NI}}(\theta_L) - J(\theta_L)\right| & \le \sup_{F_L}\left|J_{\mathrm{emp}_{NI}}(\theta_L) - E\left\{(y - f_{NN}(x+\eta,\theta_L))^2\right\}\right| \\ & \quad + \sup_{F_L}\left|E\left\{(y - f_{NN}(x+\eta,\theta_L))^2\right\} - J(\theta_L)\right|, \end{aligned} $$

where $z_i = x_i + \eta_i$, $z = x + \eta$ and $h(z,y) = (y - f_{NN}(z,\theta_L))^2$. From Theorems 1 and 2 of [15],

$$ \begin{aligned} \mathcal{A} & = E\left\{\sup_{F_L}\left|\frac{1}{N r}\sum_{j=1}^{r}\sum_{i=1}^{N}\left(y_i - f_{NN}(x_i + \eta_{j,i},\theta_L)\right)^2 - \frac{1}{r}\sum_{j=1}^{r}E\left[(y - f_{NN}(x+\eta,\theta_L))^2\right]\right|\right\} \\ & \le \frac{1}{r}\sum_{j=1}^{r}E\left\{\sup_{F_L}\left|\frac{1}{N}\sum_{i=1}^{N}\left(y_i - f_{NN}(x_i + \eta_{j,i},\theta_L)\right)^2 - E\left[(y - f_{NN}(x+\eta,\theta_L))^2\right]\right|\right\} \\ & = E\left\{\sup_{F_L}\left|\frac{1}{N}\sum_{i=1}^{N}\left(y_i - f_{NN}(x_i + \eta_{j,i},\theta_L)\right)^2 - E\left[(y - f_{NN}(x+\eta,\theta_L))^2\right]\right|\right\} \\ & = E\left\{\sup_{F_L}\left|\frac{1}{N}\sum_{i=1}^{N}h(z_i,y_i) - E\left[h(z,y)\right]\right|\right\}, \end{aligned} $$

where the set of functions $H_L$ is defined by

$$ H_L = \left\{h(x,y) = (y - f_{NN}(x,\theta_L))^2;\ \sum_{l=1}^{L}|\alpha_l| \le \omega_L\right\}. $$

Therefore,

$$ \sup_{H_L}\left|\frac{1}{N}\sum_{i=1}^{N}h(z_i,y_i) - E\{h(z,y)\}\right| \to 0, \quad \text{a.s.}, $$

which concludes the proof.
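As an informal numerical illustration of the quantity bounded in this proof, the sketch below compares the noise-injected empirical loss with a large-sample estimate of $E[(y - f_{NN}(x+\eta,\theta_L))^2]$ for a single fixed network. It does not evaluate the supremum over $F_L$, and the dimension, number of hidden units, noise level and data model are arbitrary assumptions.

```python
# Informal check: the averaged noise-injected empirical loss approaches the
# expected perturbed loss as N grows (for one fixed network, not the sup).
import numpy as np

rng = np.random.default_rng(1)
d, L, sigma = 2, 5, 0.2  # input dimension, hidden units, noise std (arbitrary)
alpha = rng.normal(size=L)
beta = rng.normal(size=(L, d))
gamma = rng.normal(size=L)

def fnn(x):
    # Fixed one-hidden-layer sigmoid network f_NN(., theta_L)
    return (alpha / (1.0 + np.exp(-(x @ beta.T + gamma)))).sum(axis=1)

def noise_injected_loss(N, r):
    # (1 / (N r)) * sum_j sum_i (y_i - f_NN(x_i + eta_{j,i}))^2 on fresh data
    x = rng.uniform(-1.0, 1.0, size=(N, d))
    y = np.sin(np.pi * x[:, 0]) + 0.1 * rng.normal(size=N)
    return np.mean([np.mean((y - fnn(x + sigma * rng.normal(size=x.shape))) ** 2)
                    for _ in range(r)])

reference = noise_injected_loss(200_000, 1)  # stands in for the expectation
for N in (100, 1_000, 10_000, 100_000):
    print(N, abs(noise_injected_loss(N, 5) - reference))  # gap shrinks with N
```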

Appendix 2

Proof of Proposition 1

It is easily shown that the sigmoid function $\psi(x) = \frac{1}{1+e^{-x}}$ has a derivative $\psi'$ such that $|\psi'(x)| \le \frac{1}{4}$. Hence $\psi$ is Lipschitz continuous, with Lipschitz constant $k = \frac{1}{4}$. Using the expression of $f_{NN}(\cdot)$, we have

$$ \begin{aligned} \left|f_{NN}(x_1) - f_{NN}(x_2)\right| & = \left|\sum_{l=1}^{L}\alpha_l \psi(\beta_l^{T} x_1 + \gamma_l) - \sum_{l=1}^{L}\alpha_l \psi(\beta_l^{T} x_2 + \gamma_l)\right| \\ & = \left|\sum_{l=1}^{L}\alpha_l\left[\psi(\beta_l^{T} x_1 + \gamma_l) - \psi(\beta_l^{T} x_2 + \gamma_l)\right]\right| \\ & \le \sum_{l=1}^{L}\left|\alpha_l\left[\psi(\beta_l^{T} x_1 + \gamma_l) - \psi(\beta_l^{T} x_2 + \gamma_l)\right]\right| \\ & = \sum_{l=1}^{L}|\alpha_l|\,\left|\psi(\beta_l^{T} x_1 + \gamma_l) - \psi(\beta_l^{T} x_2 + \gamma_l)\right|. \end{aligned} $$

Now, considering the Lipschitz continuity of $\psi(\cdot)$, we obtain

$$ \begin{aligned} \left|f_{NN}(x_1) - f_{NN}(x_2)\right| & \le k\sum_{l=1}^{L}|\alpha_l|\,|\beta_l^{T} x_1 - \beta_l^{T} x_2| \\ & \le k\sum_{l=1}^{L}|\alpha_l|\,|\beta_l^{T}|\,|x_1 - x_2| \\ & \le k\,|\boldsymbol{\alpha}|\,|\boldsymbol{\beta}|\,|x_1 - x_2|, \end{aligned} $$

where $|\boldsymbol{\alpha}|$ and $|\boldsymbol{\beta}|$ denote, respectively, the $L_1$-norm of the vector of linear parameters and the $L_1$-norm of the input-to-hidden-layer parameter matrix (taken as the sum of the absolute values of the elements of the vector or matrix). Therefore, the Lipschitz constant of $f_{NN}(\cdot,\theta)$ is $K = \frac{1}{4}\,|\boldsymbol{\alpha}|\,|\boldsymbol{\beta}|$.
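A quick numerical check of this bound is sketched below for a randomly drawn network: it verifies that $|f_{NN}(x_1) - f_{NN}(x_2)| \le K\,|x_1 - x_2|$ with $K = \frac{1}{4}|\boldsymbol{\alpha}||\boldsymbol{\beta}|$ and the $L_1$-norms defined as above. The network sizes and test points are arbitrary assumptions.

```python
# Numerical check of the Lipschitz bound from Proposition 1 (L1 norms).
import numpy as np

rng = np.random.default_rng(2)
d, L = 3, 4  # arbitrary input dimension and number of hidden units
alpha = rng.normal(size=L)
beta = rng.normal(size=(L, d))
gamma = rng.normal(size=L)

# Lipschitz constant from Proposition 1: K = (1/4) * |alpha|_1 * |beta|_1
K = 0.25 * np.abs(alpha).sum() * np.abs(beta).sum()

def fnn(x):
    # f_NN(x) = sum_l alpha_l * sigmoid(beta_l^T x + gamma_l)
    return (alpha / (1.0 + np.exp(-(beta @ x + gamma)))).sum()

for _ in range(5):
    x1 = rng.uniform(-2.0, 2.0, size=d)
    x2 = rng.uniform(-2.0, 2.0, size=d)
    lhs = abs(fnn(x1) - fnn(x2))
    rhs = K * np.abs(x1 - x2).sum()
    print(lhs <= rhs, round(lhs, 4), round(rhs, 4))  # the bound holds in each draw
```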

Proof of Lemma 2

The previous result is used in the demonstration of this lemma. After some simple manipulation, we can write

$$ \begin{aligned} E\{(y - f_{NN}(x+\eta,\theta_L))^2\} & = E\{(y - f_{NN}(x,\theta_L))^2\} \\ & \quad + 2E\{(y - f_{NN}(x,\theta_L))\,(f_{NN}(x,\theta_L) - f_{NN}(x+\eta,\theta_L))\} \\ & \quad + E\{(f_{NN}(x,\theta_L) - f_{NN}(x+\eta,\theta_L))^2\}. \end{aligned} $$

Under the asymptotic assumption, the cross term $E\{(y - f_{NN}(x,\theta_L))\,(f_{NN}(x,\theta_L) - f_{NN}(x+\eta,\theta_L))\}$ vanishes [11, 12]. Hence, using the Lipschitz property of $f_{NN}(\cdot,\theta_L)$ established above,

$$ E\{(y - f_{NN}(x+\eta,\theta_L))^2\} \le E\{(y - f_{NN}(x,\theta_L))^2\} + K^2 E\{|\eta|^2\} = E\{(y - f_{NN}(x,\theta_L))^2\} + K^2\sigma^2|I|, $$

where $|I| = d$ is the $L_1$-norm of the identity matrix and $K$ is the Lipschitz constant of the FNN function.
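The Lipschitz-based bound on the noise-induced term can also be checked numerically. The sketch below estimates $E\{(f_{NN}(x) - f_{NN}(x+\eta))^2\}$ by Monte Carlo and compares it with $K^2 E\{|\eta|^2\}$ (using the $L_1$-norm of $\eta$); the cancellation of the cross term rests on the asymptotic optimal-fit assumption and is not verified here, and all constants are arbitrary assumptions.

```python
# Monte Carlo check of the noise-induced term against the Lipschitz bound.
import numpy as np

rng = np.random.default_rng(3)
d, L, sigma = 3, 4, 0.1  # arbitrary dimension, hidden units, noise std
alpha = rng.normal(size=L)
beta = rng.normal(size=(L, d))
gamma = rng.normal(size=L)
K = 0.25 * np.abs(alpha).sum() * np.abs(beta).sum()  # Lipschitz constant (Proposition 1)

def fnn(x):
    # f_NN(x) = sum_l alpha_l * sigmoid(beta_l^T x + gamma_l)
    return (alpha / (1.0 + np.exp(-(x @ beta.T + gamma)))).sum(axis=1)

N = 100_000
x = rng.uniform(-1.0, 1.0, size=(N, d))
eta = sigma * rng.normal(size=(N, d))

noise_term = np.mean((fnn(x) - fnn(x + eta)) ** 2)                # E{(f_NN(x) - f_NN(x+eta))^2}
lipschitz_bound = K ** 2 * np.mean(np.abs(eta).sum(axis=1) ** 2)  # K^2 E{|eta|_1^2}
print(noise_term, lipschitz_bound, noise_term <= lipschitz_bound)
```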

About this article

Cite this article

Seghouane, A.-K., Moudden, Y. & Fleury, G. Regularizing the effect of input noise injection in feedforward neural networks training. Neural Comput & Applic 13, 248–254 (2004). https://doi.org/10.1007/s00521-004-0411-6