Hiding in Meaning: Semantic Encoding for Generative Text Steganography
Abstract
We propose a novel framework for steganographic text generation that hides binary messages within semantically coherent natural language using latent-space conditioning of large language models (LLMs). Secret messages are first encoded into continuous vectors via a learned binary-to-latent mapping, which then guides text generation through prefix tuning. Unlike prior token-level or syntactic steganography, our method avoids explicit word manipulation and instead operates entirely within the latent semantic space, enabling more fluent and less detectable outputs. On the receiver side, the latent representation is recovered from the generated text and decoded back into the original message. As a key theoretical contribution, we provide a robustness guarantee: if the recovered latent vector lies within a bounded distance of the original, exact message reconstruction is ensured, with the bound determined by the decoder's Lipschitz continuity and the minimum logit margin. This formal result offers a principled view of the reliability–capacity trade-off in latent steganographic systems. Empirical evaluation on both synthetic data and real-world domains such as Amazon reviews shows that our method achieves high message recovery accuracy (above 91%), strong text fluency, and competitive capacity of up to 6 bits per sentence element, while maintaining resilience against neural steganalysis. These findings demonstrate that latent-conditioned generation offers a secure and practical pathway for embedding information in modern LLMs.
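The robustness guarantee described above can be illustrated with a minimal numeric sketch. The snippet below is not the paper's learned binary-to-latent mapping; it substitutes a toy sign-based codec (each bit mapped to a ±scale coordinate, so the decoder's minimum logit margin is γ = scale) purely to show the claimed property: as long as the recovered latent vector stays within the margin of the original, every bit is reconstructed exactly. The names `encode_bits` and `decode_bits` are illustrative, not from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_bits(bits, scale=1.0):
    """Toy stand-in for the learned binary-to-latent mapping:
    each bit becomes a +scale / -scale coordinate, so the
    sign-based decoder has minimum margin gamma = scale."""
    return scale * (2.0 * np.asarray(bits, dtype=float) - 1.0)

def decode_bits(z):
    """Recover bits from a (possibly perturbed) latent vector by sign."""
    return (np.asarray(z) > 0.0).astype(int).tolist()

# Robustness property (informal): if the recovered latent z_hat
# satisfies ||z_hat - z||_inf < gamma, every coordinate keeps its
# sign, so the message is reconstructed exactly. With an
# L-Lipschitz decoder this takes the form L * ||z_hat - z|| < gamma.
bits = [1, 0, 1, 1, 0, 1]                      # 6 bits, matching the reported capacity
z = encode_bits(bits, scale=1.0)               # gamma = 1.0
noise = rng.uniform(-0.9, 0.9, size=z.shape)   # ||noise||_inf < gamma
assert decode_bits(z + noise) == bits          # exact reconstruction
```

In the paper's setting the perturbation comes from recovering the latent vector from generated text rather than from additive noise, but the reliability–capacity trade-off is the same: packing more bits into a fixed latent space shrinks the margin γ, tightening the tolerance on recovery error.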
This work is licensed under a Creative Commons Attribution 4.0 International License.