Abstract:
Currently, the major part of transmitted and stored data is unstructured, and the amount of unstructured data is growing rapidly each year, although it is hardly searchable, unqueryable, and its processing is not automated. At the same time, there is a growth of electronic document management systems. This paper proposes a solution for extracting data from paper documents considering their structure and layout based on document photos. By examining different approaches, including neural networks and plain algorithmic methods, we present their results and discuss them.