Abstract.
Finding an interpretable, non-redundant representation of real-world data is one of the key problems in machine learning. Biological neural networks are known to solve this problem quite well in an unsupervised manner, yet unsupervised artificial neural networks either struggle to do so or require fine-tuning for each task individually. We attribute this to the fact that a biological brain learns in the context of relationships between observations, while an artificial network does not. We also note that, although naive data augmentation can be very useful for supervised learning problems, autoencoders typically fail to generalize transformations from data augmentations. We therefore believe that providing additional knowledge about relationships between data samples will improve a model's ability to find a useful inner data representation. More formally, we consider a dataset not as a manifold but as a category, where the examples are objects. Two objects are connected by a morphism if they represent different transformations of the same entity. Following this formalism, we propose a novel method of using data augmentations when training autoencoders. We train a Variational Autoencoder in such a way that the outcome of a transformation becomes predictable by an auxiliary network in terms of the hidden representation. We believe that the classification accuracy of a linear classifier on the learned representation is a good metric of its interpretability. In our experiments, the proposed approach outperforms β-VAE and is comparable with Gaussian-mixture VAE.
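As a rough illustration only (not the paper's implementation), the training signal described above can be sketched with toy linear maps: an encoder, a decoder, and an auxiliary predictor `P` that must map the latent code of a sample onto the latent code of its transformed version, so that the transformation outcome is predictable in latent space. The linear setup, the fixed unit variance, and all names here are simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    # Toy linear "encoder": returns the latent mean only
    # (the posterior variance is assumed fixed to the identity).
    return x @ W

def kl_unit_gaussian(mu):
    # KL( N(mu, I) || N(0, I) ) with unit variance reduces to 0.5 * ||mu||^2.
    return 0.5 * np.sum(mu ** 2, axis=-1)

def augmented_vae_loss(x, x_aug, W, W_dec, P):
    """Toy combined loss: the two usual VAE terms plus an auxiliary
    prediction term that asks P to map the latent code of x onto the
    latent code of its augmented view x_aug."""
    z, z_aug = encode(x, W), encode(x_aug, W)
    recon = np.sum((x - z @ W_dec) ** 2, axis=-1)   # reconstruction error
    kl = kl_unit_gaussian(z)                        # latent regularizer
    pred = np.sum((z @ P - z_aug) ** 2, axis=-1)    # predictability of the transform
    return np.mean(recon + kl + pred)

x = rng.normal(size=(8, 4))
x_aug = -x                      # stand-in "transformation" of the same entity
W = rng.normal(size=(4, 2))
W_dec = rng.normal(size=(2, 4))
P = -np.eye(2)                  # a predictor that exactly tracks this transform
loss = augmented_vae_loss(x, x_aug, W, W_dec, P)
```

With this choice of `P`, the auxiliary term vanishes: the transformation's effect on the latent code is perfectly predictable, which is exactly the property the training objective rewards.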
Keywords:
machine learning, deep learning, neural networks, autoencoder, variational autoencoder, latent data representation, interpretability of the latent data representation, applications of category theory.
DOI: 10.14357/20718632200303
PP. 30-39.
