Abstract.
Based on methods of inductive logic, an approach to identifying implication relationships “If A, then b” in Big Data is considered under conditions of low data reliability and inconsistency. To work under these conditions, logics with vector semantics in the form of VTF logics are used. The presence or absence of phenomena in tables of their joint occurrence is formalized by truth vectors with components v+ and v-, where v+ is a measure of the truth of a statement about the presence of a phenomenon and v- is a measure of its falsity. On the basis of the principle of statistical induction, the indicator of the validity of a causal relationship is calculated as the average of the truth vectors of the corresponding non-strict propositions. The resulting value is interpreted as a non-strict probability of the relationship, which acts as a vector indicator of its validity. The applicability of the approach to qualitative and quantitative data, as well as to data containing artifacts, is shown.
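The averaging step described in the abstract can be illustrated with a minimal sketch. It assumes that each observation has already been assigned a truth vector (v+, v-) for the proposition “If A, then b”, with both components in [0, 1]. How that vector is derived from the raw data is not specified here. The function name `average_truth_vector` and the sample values are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Tuple

# (v_plus, v_minus); both components are assumed to lie in [0, 1]
TruthVector = Tuple[float, float]


def average_truth_vector(vectors: Iterable[TruthVector]) -> TruthVector:
    """Return the componentwise mean of a collection of truth vectors."""
    items = list(vectors)
    if not items:
        raise ValueError("at least one truth vector is required")
    n = len(items)
    v_plus = sum(v[0] for v in items) / n   # mean measure of truth
    v_minus = sum(v[1] for v in items) / n  # mean measure of falsity
    return v_plus, v_minus


# Hypothetical truth vectors of "If A, then b" for four observations:
# two confirmations, one refutation, one ambiguous record.
observations = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
p_plus, p_minus = average_truth_vector(observations)
```

The two components are averaged independently, so this sketch imposes no constraint on their sum. The resulting pair plays the role of the vector validity indicator that the abstract calls a non-strict probability.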
Keywords:
big data, data mining, inductive inference, non-strict probability, logic with vector semantics.
DOI 10.14357/20718632240201
EDN HUCOJV
PP. 3-14.