Industrial laboratory. Diagnostics of materials

Robust Selection of Multicollinear Features in Forecasting

Abstract

The problem of constructing a stable forecasting model by means of feature selection is considered. We propose a multicollinearity detection criterion, which is needed when the number of features is excessive. A model is considered stable if small changes in the feature vector entail small changes in the target output vector; a mathematical definition of model stability is also given. Multicollinearity arises from correlation between features and causes the model to lose stability. To study the properties of the detection criterion, additional research was undertaken, which led to a further development of the Belsley method. Both theoretical reasoning and an additional experiment are provided to demonstrate the correctness and applicability of the approach to the multicollinearity problem. The proposed criterion drives an algorithm that excludes correlated features, reduces the dimensionality of the feature space, and yields robust estimates of the model parameters. The algorithm is based on stepwise regression: features are added and removed sequentially according to the criterion. The Lasso and LARS algorithms were chosen as baselines for comparison. In the computational experiment, the proposed and the baseline (reference) algorithms are applied to the problem of forecasting the hourly price forward curve. The experiment uses real time series of German electricity prices.
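The Belsley method mentioned in the abstract is usually understood as the collinearity diagnostics of reference 9: condition indices plus variance-decomposition proportions computed from the singular value decomposition of the column-scaled feature matrix. The following is a minimal sketch of those classical diagnostics, not the authors' criterion or their selection algorithm. The function name `belsley_diagnostics`, the synthetic data, and the threshold of 30 (a commonly quoted rule of thumb) are assumptions made for illustration.

```python
import numpy as np


def belsley_diagnostics(X):
    """Condition indices and variance-decomposition proportions (illustrative).

    X is an (n_samples, n_features) matrix with no all-zero columns.
    Returns (cond_idx, props), where props[j, k] is the share of the
    variance of coefficient k associated with singular component j.
    """
    X = np.asarray(X, dtype=float)
    # Scale every column to unit Euclidean length so that the diagnostics
    # do not depend on the units of the individual features.
    Xs = X / np.linalg.norm(X, axis=0)
    # Thin SVD; singular values come back sorted in descending order.
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    # Condition index of component j: largest singular value over s_j.
    cond_idx = s[0] / s
    # phi[j, k] = v_kj^2 / s_j^2, where row j of Vt is the j-th right
    # singular vector and k indexes the feature (coefficient).
    phi = (Vt ** 2) / (s ** 2)[:, None]
    # Normalise over components so that each column sums to one.
    props = phi / phi.sum(axis=0)
    return cond_idx, props


# Hypothetical synthetic data: the third feature is almost x1 + x2.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + 1e-3 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

cond_idx, props = belsley_diagnostics(X)
# Components with a large condition index (assumed threshold: 30) signal
# a near-linear dependence among the features.
suspect_components = np.where(cond_idx > 30)[0]
# Features whose variance is dominated by such a component are involved in it.
involved_features = np.where(props[-1] > 0.5)[0]
print(cond_idx, suspect_components, involved_features)
```

A feature selection procedure of the kind described in the abstract could use such diagnostics to decide which features to drop, but how the proposed criterion combines them is specified in the paper itself, not here.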

About the Authors

R. G. Neichev
Moscow Institute of Physics and Technology
Russian Federation


A. M. Katrutsa
Moscow Institute of Physics and Technology
Russian Federation


V. V. Strizhov
Dorodnicyn Computing Centre of the Russian Academy of Sciences
Russian Federation


References

1. Zinovyev A. Y., Gorban A. N., Sumner N. R. Topological grammars for data approximation / Appl. Math. Lett. 2007. Vol. 20. N 4. P. 382-386.

2. Chi-Hyuck Jun, Il-Gyo Chong. Performance of some variable selection methods when multicollinearity is present / Chemometrics and Intelligent Laboratory Systems. 2005. Vol. 78. N 1, 2. P. 103-112.

3. Jiang Guohua, Wang Hansheng, Li Guodong. Robust regression shrinkage and consistent variable selection through the LAD-lasso / J. Business Econ. Stat. 2008. Vol. 25. P. 347-355.

4. Herzog F., Hildmann M. Robust calculation and parameter estimation of the hourly price forward curve / 17th Power Systems Computation Conference. Stockholm. 2011. P. 1-7.

5. Efron B., Hastie T., Johnstone I., Tibshirani R. Least angle regression / The Annals of Statistics. 2004. Vol. 32. N 3. P. 407-499.

6. Stepashko V. S., Ivakhnenko A. G. Noise Immunity of Modeling. - Kiev: Naukova Dumka, 1985. - 216 p. [in Russian]

7. Smith H., Draper N. R. Applied regression analysis. - New York: John Wiley and Sons, 1998. - 736 p.

8. Grant P. M., Chen S., Cowan C. F. N. Orthogonal least squares learning algorithm for radial basis function network / Neural Networks. 1991. Vol. 2. N 2. P. 302-309.

9. Belsley D. A. Conditioning Diagnostics: Collinearity and Weak Data in Regression. - New York: John Wiley and Sons, 1991. - 396 p.

10. Abdolkhalig A. Optimized calculation of hourly price forward curve (HPFC) / Int. J. Electr. Comp. Electronics Comm. Eng. 2008. Vol. 2. N 9. P. 840-850.

11. Caro G., Hildmann M. What makes a good hourly price forward curve? / European Energy Market, IEEE 10th International Conference, 2013. Stockholm. P. 1-7.

12. Kachapova F., Kachapov I. Orthogonal projection in teaching regression and financial mathematics / J. Stat. Education. 2010. Vol. 18. N 1. P. 1-18.

13. Electricity price time series: https://svn.code.sf.net/p/dmba/code/data/germanspotprice.csv

14. Leonteva L. N. Selection of models for electricity price forecasting / JMLDA. 2011. Vol. 1. N 2. P. 127-137. [in Russian]

15. Strizhov V. V., Krymova E. A. A feature selection algorithm for linear regression models from finite and countable sets / Industrial laboratory. Diagnostics of materials. 2011. Vol. 77. N 5. P. 63-68. [in Russian]

16. Tsonis A. A., Elsner J. B. Singular Spectrum Analysis. A New Tool in Time Series Analysis. - Springer US. 1996. - 164 p.


For citations:


Neichev R.G., Katrutsa A.M., Strizhov V.V. Robust Selection of Multicollinear Features in Forecasting. Industrial laboratory. Diagnostics of materials. 2016;82(3):68-74. (In Russ.)


ISSN 1028-6861 (Print)
ISSN 2588-0187 (Online)