Abstract
In learning systems, hyperparameters are parameters that are not learned but need to be set a priori. In Reservoir Computing, several such parameters need to be set a priori depending on the task. Newcomers to Reservoir Computing often lack intuition about which hyperparameters to tune and how to tune them. For instance, beginners often explore the reservoir sparsity, but in practice this parameter has little influence on performance for ESNs. Most importantly, many authors keep performing suboptimal hyperparameter searches: using grid search to explore more than two hyperparameters, while constraining the spectral radius to be below unity. In this short paper, we give some suggestions and intuitions, and propose a general method to find robust hyperparameters while understanding their influence on performance. We also provide a graphical interface (included in ReservoirPy) to make this hyperparameter search more intuitive. Finally, we discuss some potential refinements of the proposed method.
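As an illustration of the kind of random search advocated here, the following sketch defines a search space with log-uniform priors over four common ESN hyperparameters and runs hyperopt's random-search algorithm; the parameter names, ranges, and placeholder objective are illustrative only and are not taken from the paper's experiments.

```python
import numpy as np
from hyperopt import STATUS_OK, Trials, fmin, hp, rand

# Log-uniform priors spanning several orders of magnitude;
# note that the spectral radius is allowed to exceed unity.
search_space = {
    "spectral_radius": hp.loguniform("spectral_radius", np.log(1e-2), np.log(1e1)),
    "leaking_rate": hp.loguniform("leaking_rate", np.log(1e-3), np.log(1e0)),
    "input_scaling": hp.loguniform("input_scaling", np.log(1e-2), np.log(1e1)),
    "ridge": hp.loguniform("ridge", np.log(1e-8), np.log(1e1)),
}

def objective(params):
    # Placeholder: train an ESN with `params` and return its validation error
    # (e.g. the NRMSE averaged over folds and reservoir instances).
    loss = np.random.uniform()  # replace with the actual measured error
    return {"loss": loss, "status": STATUS_OK}

trials = Trials()
best = fmin(objective, search_space, algo=rand.suggest, max_evals=300, trials=trials)
print(best)
```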
Notes
- 1. We call evaluations the training of reservoir instances for given sets of HP values.
- 2.
- 3. A figure summarizing the results of this first random search on the Lorenz time series prediction task is available here: https://hal.inria.fr/hal-03203318/document.
- 4. \(W\) proba and \(W_{in}\) proba have little influence on the error distribution (diagonal plots), and no linear dependence with the other parameters can be seen in the figure available here: https://hal.inria.fr/hal-03203318/document.
- 5. With grid search, exploring v values for each of p HPs requires \(v^p\) evaluations (e.g. \(v = 10\) values for \(p = 4\) HPs already amounts to \(10^4\) evaluations), whereas with random search v is simply equal to the total number of evaluations performed.
- 6. For the interdependency between IS and ridge, see https://hal.inria.fr/hal-03203318/document.
- 7. Of course, as we plot many variables on log scales, the equation would often look like \(\log(Y) = a\log(X) + b\).
References
Ferreira, A., et al.: An approach to reservoir computing design and training. Expert Syst. Appl. 40(10), 4172–4182 (2013)
Schrauwen, B., et al.: An overview of reservoir computing: theory, applications and implementations. In: Proceedings of ESANN, pp. 471–482 (2007)
Jaeger, H., et al.: Optimization and applications of echo state networks with leaky-integrator neurons. Neural Netw. 20(3), 335–352 (2007)
Bergstra, J., et al.: Hyperopt: a python library for optimizing the hyperparameters of machine learning algorithms. In: Proceedings SciPy, pp. 13–20. Citeseer (2013)
Trouvain, N., Pedrelli, L., Dinh, T.T., Hinaut, X.: ReservoirPy: an efficient and user-friendly library to design echo state networks. In: Farkaš, I., Masulli, P., Wermter, S. (eds.) ICANN 2020. LNCS, vol. 12397, pp. 494–505. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61616-8_40
Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012)
Hinaut, X.: Which input abstraction is better for a robot syntax acquisition model? phonemes, words or grammatical constructions? In: ICDL-EpiRob (2018)
Jaeger, H.: The “echo state” approach to analysing and training recurrent neural networks. GMD Technical Report, vol. 148, p. 34. German National Research Center for Information Technology, Bonn, Germany (2001)
Langton, C.G.: Computation at the edge of chaos: phase transitions and emergent computation. Physica D 42(1–3), 12–37 (1990)
Legenstein, R., Maass, W.: Edge of chaos and prediction of computational performance for neural circuit models. Neural Netw. 20(3), 323–334 (2007)
Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–141 (1963)
Lukoševičius, M.: A practical guide to applying echo state networks. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 7700, pp. 659–686. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35289-8_36
Lukoševičius, M., Jaeger, H.: Reservoir computing approaches to recurrent neural network training. Comput. Sci. Rev. 3(3), 127–149 (2009)
Mackey, M.C., Glass, L.: Oscillation and chaos in physiological control systems. Science 197(4300), 287–289 (1977)
McInnes, L., Healy, J., Saul, N., Großberger, L.: UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3(29), 861 (2018)
Variengien, A., Hinaut, X.: A journey in ESN and LSTM visualisations on a language task. arXiv preprint arXiv:2012.01748 (2020)
Yperman, J., Becker, T.: Bayesian optimization of hyper-parameters in reservoir computing. arXiv preprint arXiv:1611.05193 (2016)
5 Supplementary Material
5.1 Implementation details
We used the ReservoirPy library [5], which is interfaced with hyperopt [4], to run the following experiments and produce the figures exploring the HPs. We illustrated our proposed method on two different chaotic time series prediction tasks. These tasks consist in predicting the value of the series at time step \(t+1\) given its value at time step t. To assess the performance of our models on these tasks, we performed cross-validation. Each fold is composed of a training and a validation data set. The folds are defined as contiguous slices of the original series, with an overlap in time: e.g., the validation set of the first fold is the training set of the second, and the two sets are adjacent sequences in the series. For the last fold, the training set is the last available slice of data, and the training set of the first fold is reused as its validation set.
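For concreteness, a minimal sketch of this fold construction is given below (with a NumPy array standing in for the time series; the helper function is illustrative and not part of ReservoirPy):

```python
import numpy as np

def rolling_folds(series, n_folds=3, chunk=2000):
    """Cut a time series into contiguous chunks and build overlapping folds.

    Fold k trains on chunk k and validates on chunk k+1; the last fold
    trains on the last chunk and validates on the first one (wrap-around),
    so every chunk is used both for training and for validation.
    """
    folds = []
    for k in range(n_folds):
        train = series[k * chunk:(k + 1) * chunk]
        val_start = ((k + 1) % n_folds) * chunk
        val = series[val_start:val_start + chunk]
        folds.append((train, val))
    return folds

# Example with a 6000-step series: 3 folds of 2000 + 2000 time steps each.
series = np.sin(np.linspace(0, 120 * np.pi, 6000))
for i, (tr, va) in enumerate(rolling_folds(series)):
    print(f"fold {i}: train={tr.shape}, val={va.shape}")
```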
For all tasks, we used a 3-fold cross-validation measure on a time series of 6000 time steps, i.e. each fold is composed of 4000 time steps, with a training set and a validation set of 2000 time steps each. We used two metrics for this measure: the Normalized Root Mean Squared Error (NRMSE), defined in Eq. 1, and the \(R^2\) correlation coefficient, defined in Eq. 2, where y is the time series defined over N time steps, \(\hat{y}\) is the estimate of the time series predicted by the model, and \(\bar{y}\) is the average value of y over time. The NRMSE was used as an error measure, expected to reach a value near 0, while \(R^2\) was used as a score, expected to reach 1, its maximum possible value. All measures were obtained by averaging these two metrics across all folds, with 5 different initializations of the models for each fold (Figs. 5 and 6).
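A compact sketch of these two metrics is given below; the normalization of the NRMSE by the standard deviation of the target series is an assumption here (other conventions, e.g. normalizing by the range of y, also exist):

```python
import numpy as np

def nrmse(y_true, y_pred):
    """RMSE normalized by the standard deviation of the target series.

    The choice of normalizer (std, variance, or range of y_true) is a
    convention; scaling by the standard deviation is assumed here.
    """
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    return rmse / np.std(y_true)

def r2(y_true, y_pred):
    """R^2 score: 1 minus the ratio of residual to total sum of squares."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Final measures: average each metric over the 3 folds and over the
# 5 reservoir initializations per fold, as described above.
```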
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Hinaut, X., Trouvain, N.: Which Hype for My New Task? Hints and Random Search for Echo State Networks Hyperparameters. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2021. Lecture Notes in Computer Science, vol. 12895. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86383-8_7