Yu. A. Dubnov On Entropic Criteria For Feature Selection In Data Analysis Problems


The paper considers the problem of reducing the dimensionality of the feature space used to describe objects in data analysis problems, using binary classification as an example. It gives a detailed overview of existing approaches to this problem and proposes several modifications in which dimensionality reduction is treated as the problem of extracting the most relevant information from the feature description of objects and is solved in terms of Shannon entropy. To identify the most significant features, information criteria such as cross-entropy, mutual information, and Kullback-Leibler divergence are used.
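As a rough illustration of how such information criteria can rank features for a binary classification task, the following minimal sketch scores discrete features by their empirical mutual information with the class label. It is not the method proposed in the paper: the function names, the toy data, and the restriction to discrete features are all assumptions made for this example.

```python
import math
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(feature, labels):
    """Empirical I(X; Y) = H(X) + H(Y) - H(X, Y) for discrete X and Y."""
    joint = list(zip(feature, labels))
    return entropy(feature) + entropy(labels) - entropy(joint)


def kl_divergence(p, q):
    """D_KL(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def rank_features(columns, labels):
    """Sort feature names by decreasing mutual information with the labels."""
    scores = {name: mutual_information(col, labels)
              for name, col in columns.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Hypothetical toy data: three binary features and a binary class label.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "informative": [0, 0, 0, 0, 1, 1, 1, 1],  # identical to the label
    "partial":     [0, 0, 0, 1, 0, 1, 1, 1],  # agrees with the label 6 times out of 8
    "noise":       [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
}

ranking, scores = rank_features(columns, labels)
print(ranking)
```

Features with a score near zero carry almost no information about the class and are candidates for removal. The `kl_divergence` helper is included only to show how the divergence between two discrete distributions, for example the class-conditional distributions of a feature, could be computed in the same setting.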


dimensionality reduction, feature selection, classification, entropy.

pp. 60-69


