Hiding in Meaning: Semantic Encoding for Generative Text Steganography

Oleg Yurievich Rogov
Dmitrii Evgenievich Indenbom
Dmitrii Sergeevich Korzh
Darya Valeryaevna Pugacheva
Vsevolod Alexandrovich Voronov
Elena Viktorovna Tutubalina

Abstract

We propose a novel framework for steganographic text generation that hides binary messages within semantically coherent natural language using latent-space conditioning of large language models (LLMs). Secret messages are first encoded into continuous vectors via a learned binary-to-latent mapping, which is then used to guide text generation through prefix tuning. Unlike prior token-level or syntactic steganography, our method avoids explicit word manipulation and instead operates entirely within the latent semantic space, enabling more fluent and less detectable outputs. On the receiver side, the latent representation is recovered from the generated text and decoded back into the original message. As a key theoretical contribution, we provide a robustness guarantee: if the recovered latent vector lies within a bounded distance of the original, exact message reconstruction is ensured, with the bound determined by the decoder’s Lipschitz continuity and the minimum logit margin. This formal result offers a principled view of the reliability–capacity trade-off in latent steganographic systems. Empirical evaluation on both synthetic data and real-world domains such as Amazon reviews shows that our method achieves high message recovery accuracy (above 91%), strong text fluency, and competitive capacity of up to 6 bits per sentence element, while maintaining resilience against neural steganalysis. These findings demonstrate that latent-conditioned generation offers a secure and practical pathway for embedding information in modern LLMs.
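
The robustness guarantee above has the shape of a standard Lipschitz–margin argument. The sketch below is an illustrative reconstruction under assumed notation (latent vector \(z\), recovered latent \(\hat{z}\), decoder \(D\) with Lipschitz constant \(L\) mapping latents to per-symbol logits, minimum logit margin \(\gamma\)); it is not the paper’s verbatim theorem statement.

\[
L\,\lVert \hat{z} - z \rVert \;<\; \tfrac{\gamma}{2}
\quad\Longrightarrow\quad
\arg\max_k D_k(\hat{z}) \;=\; \arg\max_k D_k(z) \quad \text{for every decoded symbol},
\]

where \(\lVert D(\hat{z}) - D(z) \rVert_\infty \le L\,\lVert \hat{z} - z \rVert\) expresses the decoder’s Lipschitz continuity and \(\gamma\) is the smallest gap between the top logit and the runner-up across the message. If every symbol’s \(\arg\max\) is preserved, the message is reconstructed exactly; under this reading, higher capacity per sentence element tends to shrink the attainable margin \(\gamma\), which is the reliability–capacity trade-off referred to above.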

Article Details

How to Cite
Rogov, O. Y., D. E. Indenbom, D. S. Korzh, D. V. Pugacheva, V. A. Voronov, and E. V. Tutubalina. “Hiding in Meaning: Semantic Encoding for Generative Text Steganography”. Russian Digital Libraries Journal, vol. 28, no. 5, Dec. 2025, pp. 1165-85, doi:10.26907/1562-5419-2025-28-5-1165-1185.


