Archival Handwritten Letter Attribution using Siamese Neural Networks
Abstract
This paper presents a method for the automated attribution of archival handwritten letters based on a Siamese neural network, addressing a key challenge in digital humanities: the authentication of historical documents. The research is motivated by the mass digitization of 17th- to 19th-century archives, where attribution is often hindered by incomplete or inaccurate metadata about the authors.
The method is designed for real-world document collections and accounts for challenges typical of archival materials: poor-quality scans, significant handwriting variation, and substantial class imbalance (from 1 to over 50 samples per author). The Siamese network architecture is used to extract discriminative vector representations (embeddings). Based on these embeddings, the method not only assigns documents to known authors but also identifies manuscripts that do not match any known author in the archive. This significantly narrows the pool of candidates for subsequent expert verification.
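The abstract does not state the decision rule applied to the embeddings, so the following is only a minimal sketch of one common way to do open-set attribution: compare a query embedding with a per-author mean embedding and reject the match when the nearest author is farther than a threshold. The function names, the Euclidean distance, the centroid averaging, the threshold value, and the toy two-dimensional vectors are all assumptions made for illustration, not the authors' implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def author_centroids(embeddings_by_author):
    """Average each author's embeddings into one reference vector.

    Works for any number of samples per author, including a single one.
    """
    centroids = {}
    for author, vectors in embeddings_by_author.items():
        dim = len(vectors[0])
        centroids[author] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return centroids


def attribute(query, centroids, threshold):
    """Return (author, distance) for the nearest centroid.

    The author is None when even the nearest centroid is farther than
    the (hypothetical) threshold, i.e. the manuscript is treated as
    written by someone not present in the archive.
    """
    best_author, best_dist = min(
        ((author, euclidean(query, c)) for author, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    if best_dist > threshold:
        return None, best_dist
    return best_author, best_dist


# Toy embeddings standing in for the output of a trained network.
known = {
    "author_a": [[0.0, 0.0], [0.2, 0.0]],
    "author_b": [[1.0, 1.0]],  # an author with a single sample
}
centroids = author_centroids(known)

print(attribute([0.1, 0.1], centroids, threshold=0.5))  # close to author_a
print(attribute([5.0, 5.0], centroids, threshold=0.5))  # far from everyone
```

In this sketch the threshold controls the trade-off between wrongly rejecting a known author and wrongly accepting an unknown one; in practice it would have to be tuned on held-out data.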
The study introduces a data preprocessing algorithm and provides a comparative analysis of two approaches to text analysis: at the image fragment level (300×300 px) and at the individual text line level. The developed tool offers archivists and philologists an effective solution for the preliminary sorting and attribution of large collections of handwritten documents.
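To make the fragment-level approach concrete, here is a rough sketch of how a scanned page could be divided into 300×300 px fragments. The abstract does not say how fragments are sampled, so the regular grid, the `stride` parameter, the choice to drop partial tiles at the page edges, and the function name `fragment_boxes` are all assumptions.

```python
def fragment_boxes(width, height, size=300, stride=300):
    """Return (left, top, right, bottom) boxes of size x size pixels.

    Only boxes that fit entirely inside a width x height page are kept;
    partial tiles at the right and bottom edges are dropped. A stride
    smaller than size would produce overlapping fragments.
    """
    boxes = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            boxes.append((left, top, left + size, top + size))
    return boxes


# A hypothetical 1000 x 700 px scan yields a 3 x 2 grid of fragments.
boxes = fragment_boxes(1000, 700)
print(len(boxes))
print(boxes[0], boxes[-1])
```

Each box uses the (left, upper, right, lower) convention, so it could be passed directly to an image library's crop function, for example Pillow's `Image.crop`.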

This work is licensed under a Creative Commons Attribution 4.0 International License.