Our platform semantha® provides flexible web services that you can use out of the box for semantic processing of text documents and for data extraction. These services can be combined as needed for your application. In many cases, one of our applications is already sufficient, for example for a hotspot analysis or a direct comparison of documents. We also make all analyses available via a standardized REST interface, so that semantha® can be integrated into your application, your processes and your IT landscape.
All semantic services of semantha are accessible via a JSON-based REST API and can thus be integrated into any other service. Developer documentation is included in each server component. An API key is required to use the API.
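As a rough illustration of what calling a JSON-based REST API with an API key can look like, here is a minimal Python sketch. The host, the endpoint path, the payload shape and the use of a bearer token are all assumptions made for this example and are not taken from the semantha documentation; the developer documentation included in your server component is the authoritative source. The sketch only prepares the request and does not send it.

```python
import json
import urllib.request


def build_request(base_url, path, api_key, payload):
    """Prepare (but do not send) a JSON POST request that carries an API key."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        method="POST",
    )
    request.add_header("Content-Type", "application/json")
    # Assumption: the key is passed as a bearer token. Your server may
    # expect a different header or scheme.
    request.add_header("Authorization", f"Bearer {api_key}")
    return request


request = build_request(
    "https://semantha.example.com",   # placeholder host
    "/api/hypothetical-endpoint",     # placeholder path, not a documented route
    "YOUR_API_KEY",
    {"text": "Sample sentence to analyze."},
)
# Passing `request` to urllib.request.urlopen() would actually send it.
```

Separating request construction from sending keeps the example safe to run without a server and makes the headers easy to inspect.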
semantha can process text documents regardless of the file format, as long as they contain text. She can read text documents (.docx, .txt, .xml, .json), tabular data (.xlsx) and PDF documents (.pdf). She can also digest special file formats, such as the XML-based ReqIF format. Other XML formats can be processed using XSLT. Furthermore, it is possible to upload a ZIP archive and import all documents in it at once.
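To prepare such a bulk import, the documents first have to be packed into a ZIP archive. The following sketch uses only the Python standard library and keeps just the file extensions listed above. The function name `bundle_documents` and the flat archive layout are choices made for this example, not requirements stated by semantha.

```python
import zipfile
from pathlib import Path

# File extensions named in the text above; everything else is skipped.
SUPPORTED_EXTENSIONS = {".docx", ".txt", ".xml", ".json", ".xlsx", ".pdf"}


def bundle_documents(paths, archive_path):
    """Write all supported files in `paths` into one ZIP archive.

    Returns the sorted list of file names that were added.
    """
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in map(Path, paths):
            if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
                # Files are stored flat, by name only (an assumption of this
                # sketch). Two inputs with the same name would collide.
                archive.write(path, arcname=path.name)
                added.append(path.name)
    return sorted(added)
```

A call such as `bundle_documents(Path("contracts").iterdir(), "contracts.zip")` would then produce an archive ready for upload, where `contracts` is a hypothetical folder name.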
Results can be displayed and analyzed directly in the application. Alternatively, they can be exported as commented PDF files or Excel tables. If you want to process the results further in other software, they can also be extracted directly.
Today, the majority of documents processed with semantha are PDF files. However, unlike other formats such as Word documents (.docx), PDF documents do not expose a document structure, such as headings and paragraphs, that can be read out. To better understand the structure and layout of a PDF file, we have developed the Document Annotator.
The Document Annotator can be used to visually group documents of the same kind, such as package inserts or quotation letters, into one document type. The annotator is taught using sample documents: machine learning (ML) methods, a branch of artificial intelligence (AI), are used to create a document-type model. This model is then applied to new, unknown documents of this type to better recognize their structure, which in turn helps semantha process such documents semantically.
Additionally, you can configure a document type to ignore certain pages of a document (for example, to always ignore the cover page) or to read only a certain area of each page (for example, only the upper half, or only the right column of a bilingual document).
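The effect of such page and area rules can be pictured with a small Python sketch. Everything in it is hypothetical: the `TextBlock` and `DocumentTypeRules` classes, the normalized coordinates and the `select_blocks` function are illustrative names and do not reflect semantha's actual data model or configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBlock:
    page: int    # 1-based page number
    x: float     # horizontal centre, 0.0 (left edge) to 1.0 (right edge)
    y: float     # vertical centre, 0.0 (top edge) to 1.0 (bottom edge)
    text: str


@dataclass(frozen=True)
class DocumentTypeRules:
    # Pages to skip entirely, e.g. {1} for the cover page.
    ignored_pages: frozenset = frozenset()
    # (left, top, right, bottom) as page fractions; default is the whole page.
    region: tuple = (0.0, 0.0, 1.0, 1.0)


def select_blocks(blocks, rules):
    """Keep only blocks on non-ignored pages whose centre lies in the region."""
    left, top, right, bottom = rules.region
    return [
        block
        for block in blocks
        if block.page not in rules.ignored_pages
        and left <= block.x <= right
        and top <= block.y <= bottom
    ]


# Skip the cover page and read only the right column of every other page.
rules = DocumentTypeRules(ignored_pages=frozenset({1}), region=(0.5, 0.0, 1.0, 1.0))
blocks = [
    TextBlock(page=1, x=0.75, y=0.5, text="cover"),
    TextBlock(page=2, x=0.25, y=0.5, text="left column"),
    TextBlock(page=2, x=0.75, y=0.5, text="right column"),
]
selected = select_blocks(blocks, rules)
```

With these rules, only the block from the right column of page 2 remains in `selected`.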