Model and Architecture of Multi-Level Similarity Analysis of Android Applications based on Static Features

Valery Vladimirovich Petrov

Abstract

The paper addresses the problem of multi-level similarity analysis of Android applications based on static features in digital application collections. Such collections may contain duplicates, forks, repackaged builds, and other modified variants; malicious payloads are treated as a special case of modification rather than as a synonym of repackaging. The paper formulates a similarity function for Android applications, introduces a static application model as the working object of comparison, and presents a multi-level pipeline that separates candidate screening, in-depth pairwise analysis, result interpretation, and a decision layer. Meaningful similarity signals are sought not only in classes.dex bytecode, but also in AndroidManifest.xml, resources, APK-internal metadata, and library dependencies. A numerical similarity score is computed only when static models are built successfully; otherwise the pipeline records a dedicated technical failure status together with a normalized failure reason. Preliminary evidence is reported on a local pilot set of five core pairs and two boundary cases. These results indicate that explicit handling of shared library code may improve interpretability, but they do not yet constitute a full validation of the proposed architecture on large collections.
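
The control flow described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the names `StaticModel`, `Result`, `Status`, `is_candidate`, and `compare`, the use of Jaccard similarity, the layer weights, the screening threshold, and the failure-reason string are all assumptions chosen for illustration. The sketch shows only three ideas from the abstract: a score is produced only when both static models exist, a failed build yields a status plus a normalized reason instead of a number, and features attributed to shared libraries are handled explicitly (here, by excluding them from the code layer).

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Status(Enum):
    OK = "ok"
    SCREENED_OUT = "screened_out"        # hypothetical: rejected by cheap screening
    ANALYSIS_FAILED = "analysis_failed"  # technical failure, no score is reported


@dataclass(frozen=True)
class StaticModel:
    """Hypothetical static model: one feature set per layer of the APK."""
    code: FrozenSet[str]       # e.g. method-level hashes from classes.dex
    manifest: FrozenSet[str]   # e.g. permissions and components
    resources: FrozenSet[str]  # e.g. resource file hashes
    libraries: FrozenSet[str]  # code features attributed to third-party libraries


@dataclass(frozen=True)
class Result:
    status: Status
    score: Optional[float] = None
    failure_reason: Optional[str] = None


# Assumed layer weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"code": 0.5, "manifest": 0.2, "resources": 0.3}


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity; two empty sets are scored 0.0 (no evidence)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_candidate(left: StaticModel, right: StaticModel, threshold: float) -> bool:
    """Cheap screening on manifest and resource features only."""
    cheap_left = left.manifest | left.resources
    cheap_right = right.manifest | right.resources
    return jaccard(cheap_left, cheap_right) >= threshold


def compare(left: Optional[StaticModel], right: Optional[StaticModel],
            threshold: float = 0.2) -> Result:
    # No numeric score unless both static models were built successfully.
    if left is None or right is None:
        return Result(Status.ANALYSIS_FAILED, failure_reason="model_build_failed")
    if not is_candidate(left, right, threshold):
        return Result(Status.SCREENED_OUT)
    # In-depth pairwise step: library features are removed from the code layer
    # so that shared third-party code does not inflate the similarity.
    layers = {
        "code": jaccard(left.code - left.libraries, right.code - right.libraries),
        "manifest": jaccard(left.manifest, right.manifest),
        "resources": jaccard(left.resources, right.resources),
    }
    score = sum(WEIGHTS[name] * value for name, value in layers.items())
    return Result(Status.OK, score=score)
```

In this sketch, two applications that share only a bundled library and a few manifest entries receive a low code-layer contribution, while a pair with a missing model is reported as `ANALYSIS_FAILED` with no score at all. The interpretation and decision layers mentioned in the abstract are deliberately left out.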

Article Details

How to Cite
Petrov, V. V. “Model and Architecture of Multi-Level Similarity Analysis of Android Applications Based on Static Features”. Russian Digital Libraries Journal, vol. 29, no. 3, June 2026, pp. 877-97, doi:10.26907/1562-5419-2026-29-3-877-897.

References

1. Li L. et al. Understanding Android App Piggybacking: A Systematic Study of Malicious Code Grafting // IEEE Transactions on Information Forensics and Security. 2017. Vol. 12, No. 6. P. 1269–1284. https://doi.org/10.1109/TIFS.2017.2656460
2. Petrov V.V. System of Automated Numerical Similarity Evaluation of Android Applications // Nauchnyi servis v seti Internet: Trudy XXV Vserossiiskoi nauchnoi konferentsii. 2023. P. 283–297. https://doi.org/10.20948/abrau-2023-33
3. Petrov V.V. System of Automated Numerical Similarity Evaluation of Android Applications // Russian Digital Libraries Journal. 2024. Vol. 27, No. 3. P. 336–365. https://doi.org/10.26907/1562-5419-2024-27-3-336-365
4. Petrov V.V. Automated System for Numerical Similarity Evaluation of Android Applications // Automatic Documentation and Mathematical Linguistics. 2024. Vol. 58 (Suppl. 3). P. 131–142. https://doi.org/10.3103/S0005105525700207
5. Cesare S., Xiang Y. Software Similarity and Classification. London, Springer, 2012. 88 p. https://doi.org/10.1007/978-1-4471-2909-7
6. Desnos A. Android: Static Analysis Using Similarity Distance // Proc. of the 45th Hawaii International Conference on System Sciences. 2012. P. 5394–5403. https://doi.org/10.1109/HICSS.2012.114
7. Rastogi V., Chen Y., Jiang X. DroidChameleon: Evaluating Android Anti-Malware Against Transformation Attacks // Proc. of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security. 2013. P. 329–334. https://doi.org/10.1145/2484313.2484355
8. Zhauniarovich Y. et al. FSquaDRA: Fast Detection of Repackaged Applications // Data and Applications Security and Privacy XXVIII. 2014. P. 130–145. https://doi.org/10.1007/978-3-662-43936-4_9
9. Li L., Bissyande T. F., Klein J. Rebooting Research on Detecting Repackaged Android Apps // IEEE Transactions on Software Engineering. 2021. Vol. 47, No. 4. P. 676–693. https://doi.org/10.1109/TSE.2019.2901679
10. Li L., Bissyande T. F., Klein J. SimiDroid: Identifying and Explaining Similarities in Android Apps // 2017 IEEE Trustcom/BigDataSE/ICESS. 2017. P. 136–143. https://doi.org/10.1109/Trustcom/BigDataSE/ICESS.2017.230
11. Backes M., Bugiel S., Derr E. Reliable Third-Party Library Detection in Android and Its Security Applications // Proc. of the ACM Conf. on Computer and Comm. Security. 2016. P. 356–367. https://doi.org/10.1145/2976749.2978333
12. Li M., Wang W., Wang P. et al. LibD: Scalable and Precise Third-Party Library Detection in Android Markets // Proc. of the 39th International Conference on Software Engineering. 2017. P. 335–346. https://doi.org/10.1109/ICSE.2017.38
13. Huang J., Zhang Y., Tan H. et al. Scalably Detecting Third-Party Android Libraries With Two-Stage Bloom Filtering // IEEE Transactions on Software Engineering. 2023. Vol. 49, No. 4. P. 2272–2284. https://doi.org/10.1109/TSE.2022.3215628