Industrial laboratory. Diagnostics of materials

ERRORS IN THE USE OF CORRELATION AND DETERMINATION COEFFICIENTS

https://doi.org/10.26896/1028-6861-2018-84-3-68-72

Abstract

Correlation and determination coefficients are widely used in statistical data analysis. This article considers some of the errors associated with their use, confining the discussion to the case of two variables. The most commonly used coefficients are the Pearson linear correlation coefficient and the nonparametric rank coefficients of Spearman and Kendall. According to measurement theory, the Pearson correlation coefficient can be applied to variables measured on an interval scale (and on scales with a narrower group of admissible transformations, for example the ratio scale), but it cannot be used to analyze ordinal data. The nonparametric rank coefficients of Spearman and Kendall are designed to evaluate the relationship between ordinal variables. They can also be used on scales with a narrower group of admissible transformations, for example interval or ratio scales. The critical value for testing whether a correlation coefficient differs significantly from zero depends on the sample size and tends to zero as the sample size grows. Therefore, the use of the "Chaddock scale", with its fixed thresholds, is incorrect. With a passive experiment, correlation coefficients can reasonably be used only for forecasting, not for control. Obtaining statistical models valid for control requires an active experiment. S. N. Bernshtein showed that outliers have a very strong effect on the Pearson correlation coefficient. The "inflation" effect of the correlation coefficient is that, as the number of analyzed sets of predictors increases, the maximum of the corresponding correlation coefficients, taken as the measure of approximation quality, increases noticeably. A common mistake is to use the determination coefficient to assess the quality of a dependence fitted by least squares.
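Two of the points in the abstract can be illustrated numerically: the sensitivity of the Pearson coefficient to a single outlier, and the shrinking of the critical value as the sample grows. The following is a minimal sketch and is not taken from the article. The data, the helper names (`pearson`, `ranks`, `spearman`, `approx_critical_r`) and the large-sample normal approximation used for the critical value are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n of the values (this sketch assumes there are no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = float(rank)
    return result


def spearman(x, y):
    """Spearman rank coefficient: Pearson coefficient computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def approx_critical_r(n, alpha=0.05):
    """Approximate two-sided critical value of |r| under independence.

    Assumption: large-sample normal approximation, sqrt(n) * r ~ N(0, 1).
    An exact small-sample test would use Student's t distribution instead.
    """
    return NormalDist().inv_cdf(1 - alpha / 2) / math.sqrt(n)


# Illustrative (made-up) data with almost no linear relationship.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [5, 9, 2, 7, 1, 10, 4, 8, 3, 6]
r_base, rho_base = pearson(x, y), spearman(x, y)

# The same data with one extreme point appended to both variables.
x_out, y_out = x + [100], y + [100]
r_out, rho_out = pearson(x_out, y_out), spearman(x_out, y_out)

print(f"without outlier: Pearson={r_base:.3f}, Spearman={rho_base:.3f}")
print(f"with outlier:    Pearson={r_out:.3f}, Spearman={rho_out:.3f}")

# The critical value shrinks roughly as 1 / sqrt(n).
for n in (10, 100, 10000):
    print(f"n={n}: approximate critical |r| = {approx_critical_r(n):.4f}")
```

In this made-up sample a single extreme point moves the Pearson coefficient from near zero to close to 1, while the rank coefficient changes far less. The loop shows why a fixed verbal scale is problematic: a coefficient that is not significant for a small sample can be highly significant for a large one.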

About the Author

A. I. Orlov
Institute of High Statistical Technologies and Econometrics, N. E. Bauman Moscow State Technical University, Moscow, Russian Federation.


References

1. Orlov A. I. Applied statistics. — Moscow: Ekzamen, 2006. — 671 p. [in Russian].

2. Orlov A. I. Stability in socio-economic models. — Moscow: Nauka, 1979. — 296 p. [in Russian].

3. Nalimov V. V. Theory of experiment. — Moscow: Nauka, 1971. — 208 p. [in Russian].

4. Ermakov S. M., Brodskii V. Z., Zhiglyavskii A. A., et al. Mathematical theory of design of experiments. — Moscow: Fizmatlit, 1983. — 392 p. [in Russian].

5. Bernshtein S. N. On an elementary property of the correlation coefficient / Zap. Khar’k. Matem. Tov. 1932. N 5. P. 65 – 66 [in Russian]; Bernshtein S. N. Collected works. Vol. IV. Probability theory. Mathematical statistics. — Moscow: Nauka, 1964. P. 233 – 234 [in Russian].

6. Kolmogorov A. N. On the question of the suitability of forecasting formulas found statistically / Zh. Geofiz. 1933. Vol. 3. P. 78 – 82 [in Russian]; Kolmogorov A. N. Theory of Probability and Mathematical Statistics. — Moscow: Nauka, 1986. P. 161 – 167 [in Russian].

7. Orlov A. I. Methods for finding the most informative sets of characteristics in regression analysis / Zavod. Lab. Diagn. Mater. 1995. Vol. 61. N 1. P. 56 – 58 [in Russian].

8. Orlov A. I. The problem of multiple tests of statistical hypotheses / Zavod. Lab. Diagn. Mater. 1996. Vol. 62. N 5. P. 51 – 54 [in Russian].

9. Serdobol’skii V. I., Orlov A. I. Statistical analysis with a large number of parameters / Software and algorithmic support of applied multidimensional statistical analysis. Abstracts of the III All-Union School-Seminar. — Moscow: TsÉMI AN SSSR, 1987. P. 151 – 160.

10. Orlov A. I. Organizational-economic modeling: textbook. In 3 parts. Part 1. Non-numeric statistics. — Moscow: Izd. MGTU im. N. É. Baumana, 2009. — 542 p. [in Russian].

11. Orlov A. I. Statistical control of two alternative variables and a method for verifying their independence from a set of small samples / Zavod. Lab. Diagn. Mater. 2000. Vol. 66. N 1. P. 58 – 62 [in Russian].

12. Loiko V. I., Lutsenko E. V., Orlov A. I. Modern approaches in scientometrics: monograph. — Krasnodar: KubGAU, 2017. — 532 p. https://elibrary.ru/item.asp?id=29306423 [in Russian].

13. Orlov A. I. Statistical packages — researcher tools / Zavod. Lab. Diagn. Mater. 2008. Vol. 74. N 5. P. 76 – 78 [in Russian].


For citations:


Orlov A.I. ERRORS IN THE USE OF CORRELATION AND DETERMINATION COEFFICIENTS. Industrial laboratory. Diagnostics of materials. 2018;84(3):68-72. (In Russ.) https://doi.org/10.26896/1028-6861-2018-84-3-68-72


ISSN 1028-6861 (Print)
ISSN 2588-0187 (Online)