Our platform semantha provides flexible web services that you can use out of the box for the semantic processing of text documents and for data extraction. Depending on the application, these services can be combined as needed. In many cases, one of our applications is already sufficient, for example for a hotspot analysis or a direct comparison of documents. In addition, we make all analyses available via a standardized REST interface, so that semantha can be used in your application and integrated into your processes and IT landscape.
The semantha Analyzer searches entire documents for relevant topics (hotspots) and quickly highlights all matching passages. Users specify the hotspots in their own wording. If you like, you can export the results as an Excel or PDF file.
The semantha Compare user interface is used to display and interactively revise a semantic comparison. Missing content or paragraphs can be displayed directly, and differences within paragraphs can be highlighted visually. To make recurring comparisons more efficient, intermediate results can be saved.
semantha Requirements supports the evaluation of new specifications. Based on historical specifications, semantha classifies requirements, identifies risks, and cross-references external standards and other applicable documents. This input makes the requirements process more efficient and improves its quality.
Interface (REST API)
All semantic services of semantha are accessible via a JSON-based REST API and can thus be integrated into other services. Developer documentation is included in each server component and in the SDKs (see for example the semantha-sdk for Python). An API key is required to use the API.
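As a rough illustration of what a call to a JSON-based REST API secured by an API key can look like, the sketch below builds (but does not send) an HTTP request using only the Python standard library. The base URL, the endpoint path, the payload fields and the bearer-token header are assumptions made for this example and are not taken from the semantha documentation; the developer documentation shipped with the server component is the authoritative source for real routes and authentication.

```python
import json
import urllib.request

# Hypothetical values: replace with the URL and API key of your installation.
BASE_URL = "https://semantha.example.com/api"
API_KEY = "YOUR_API_KEY"


def build_json_request(path, payload, api_key=API_KEY, base_url=BASE_URL):
    """Build a POST request with a JSON body and an API key header.

    The path, the payload layout and the bearer-token scheme are assumptions.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{path.lstrip('/')}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


# "example-endpoint" is a placeholder, not a real semantha route.
request = build_json_request("/example-endpoint", {"text": "Example paragraph"})
print(request.get_method(), request.full_url)

# Sending is left out on purpose; against a real server it would be:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the URL, headers and body before any call reaches the server.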
semantha can process text documents regardless of the file format, as long as they contain text. She can read text documents from Microsoft Word, Microsoft PowerPoint and LibreOffice/OpenOffice (.docx, .pptx, .odt), plain text files (.txt), tabular data (.xlsx) and PDF documents (.pdf). She can also digest special file formats, such as the XML-based ReqIF format. Other XML formats can be processed using custom XSL transformations. Furthermore, it is possible to upload ZIP archives and import all documents they contain in one go.
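Before uploading, it can be useful to check which files in a folder or ZIP archive fall under the formats listed above. The following is a minimal client-side sketch using only the Python standard library; it does not reflect how semantha itself detects formats. The helper names, the grouping labels and the `.reqif` extension for ReqIF files are assumptions for illustration.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Extensions named in the text above; labels are illustrative.
# ".reqif" as the ReqIF file extension is an assumption.
SUPPORTED_EXTENSIONS = {
    ".docx": "text document",
    ".pptx": "text document",
    ".odt": "text document",
    ".txt": "text document",
    ".xlsx": "tabular data",
    ".pdf": "PDF document",
    ".reqif": "ReqIF (XML-based)",
}


def classify(filename):
    """Return the format label for a file name, or None if it is not listed."""
    return SUPPORTED_EXTENSIONS.get(PurePosixPath(filename).suffix.lower())


def list_importable(zip_bytes):
    """List the archive members whose extension appears in the table above."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [
            name
            for name in archive.namelist()
            if not name.endswith("/") and classify(name) is not None
        ]


# Build a small archive in memory to try the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("specs/requirements.docx", b"placeholder")
    archive.writestr("specs/notes.md", b"placeholder")
    archive.writestr("report.PDF", b"placeholder")

importable = list_importable(buffer.getvalue())
print(importable)
```

The check only looks at file extensions, so it cannot tell whether a file actually contains extractable text.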
Extraction and Processing
Results can be displayed and analyzed directly in the application. Alternatively, they can be exported as commented PDF files or Excel tables. If you want to process the results further in other software, they can also be extracted directly.
Document Annotator (Document Types)
Today, the majority of documents processed with semantha are PDF files. However, unlike other formats such as Word documents (.docx), PDF documents do not provide a document structure, such as headings and paragraphs, that can be read out. To better understand the structure and layout of a PDF file, we have developed the Document Annotator.
The Document Annotator can be used to visually combine documents of the same kind, such as package inserts or quotation letters, into one document type. The annotator is taught using sample documents. Here, machine learning (ML) methods from the field of artificial intelligence (AI) are used to create a document-type model. This model is then applied to new, unknown documents of this type to better recognize their structure, so that semantha can process them semantically in the best possible way.
Additionally, you can configure a document type to ignore certain pages of a document (for example, always ignore the cover page) or to read only a certain area of each page (for example, only the upper half of the page, or only the right column in bilingual documents).
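To make the effect of such settings concrete, here is a small, purely illustrative sketch of how ignored pages and a reading area could be applied to extracted text blocks. The class names, the relative coordinate system and the block layout are hypothetical and do not describe the Document Annotator's actual data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """Rectangle in relative page coordinates (0.0 to 1.0, origin at top left)."""

    left: float = 0.0
    top: float = 0.0
    right: float = 1.0
    bottom: float = 1.0

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass
class DocumentTypeSettings:
    """Hypothetical per-document-type settings."""

    ignored_pages: set = field(default_factory=set)  # 1-based page numbers
    read_region: Region = field(default_factory=Region)  # default: whole page

    def keep(self, page, x, y):
        """Keep a block if its page is not ignored and it lies in the region."""
        return page not in self.ignored_pages and self.read_region.contains(x, y)


# Text blocks as (page, x, y, text); x and y are the relative block centre.
blocks = [
    (1, 0.5, 0.5, "Cover page title"),
    (2, 0.25, 0.3, "Left column, original language"),
    (2, 0.75, 0.3, "Right column, translation"),
]

# Ignore the cover page and read only the right half of every page.
settings = DocumentTypeSettings(ignored_pages={1}, read_region=Region(left=0.5))
kept = [text for page, x, y, text in blocks if settings.keep(page, x, y)]
print(kept)
```

Relative coordinates are used here so that the same region applies to pages of different sizes.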