Automated System for Numerical Similarity Evaluation of Android Applications

Abstract

This paper presents the design and development of a system that automates the numerical assessment of similarity between Android applications. The task of evaluating application similarity is reduced to evaluating the similarity of sets of control flow graphs built from the code in the applications' classes.dex files. The similarity value is computed from a similarity matrix. Graph edit distance and Levenshtein distance algorithms are used to compare control flow graphs. Criteria for application similarity are formulated, and the forms in which they can be represented are examined. The types of Android application models and the methods for constructing them are presented. A prototype of the system is developed, and the software is optimized using parallel programming tools. Experiments are carried out, leading to the conclusion that the developed system is able to detect similarities between Android applications.
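
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each control flow graph has already been linearized into a sequence of opcode names, that two graphs are compared by normalized Levenshtein distance only (graph edit distance is omitted), and that the final score is the mean of the row maxima of the similarity matrix. The function names and the aggregation rule are hypothetical; the abstract does not specify them.

```python
from typing import List, Sequence

# Hypothetical representation: one CFG = one sequence of opcode names.
Cfg = Sequence[str]


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x -> y
        prev = curr
    return prev[-1]


def pair_similarity(a: Cfg, b: Cfg) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def similarity_matrix(app_a: List[Cfg], app_b: List[Cfg]) -> List[List[float]]:
    """Entry [i][j] is the similarity of CFG i of app A and CFG j of app B."""
    return [[pair_similarity(g, h) for h in app_b] for g in app_a]


def app_similarity(app_a: List[Cfg], app_b: List[Cfg]) -> float:
    """Assumed aggregation: average best match for each CFG of app A."""
    if not app_a or not app_b:
        return 0.0
    matrix = similarity_matrix(app_a, app_b)
    return sum(max(row) for row in matrix) / len(matrix)


if __name__ == "__main__":
    original = [["const", "if-eq", "return"], ["invoke", "return"]]
    repackaged = [["const", "if-eq", "return"], ["invoke", "nop", "return"]]
    print(similarity_matrix(original, repackaged))
    print(app_similarity(original, repackaged))
```

Under this aggregation the score is asymmetric (it measures how well app A is covered by app B), and every matrix entry can be computed independently of the others, which is what makes this kind of comparison a natural candidate for parallel execution.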
