R. K. Klassen, V. A. Raikhlin
Improving the Efficiency of Clusterix-Like DBMS for Big Data Analytical Processing
Abstract. Commercial OLAP systems are unaffordable for organizations with limited financial resources. Such organizations can perform analytical processing of large data volumes using open-source software systems on a cost-effective cluster platform. Previously created Clusterix-like DBMS were not efficient enough by the «performance/cost» criterion. To improve the efficiency of such systems, the article considers their further development with a focus on fully loading the processor cores and on using GPU acceleration (the Clusterix-N systems, where N stands for New), up to the development of a system comparable in efficiency to the open-source system Spark, which is currently considered the most promising. The development followed the constructive system modeling methodology.
Keywords: analytical processing of significant data volumes, open-source software systems on a cluster platform, increasing the efficiency of Clusterix-like DBMS, full load of processor cores, GPU acceleration, comparison with Spark, accepted methodology.
Pp. 43-59.
DOI: 10.14357/20718632190405