Smart Cluster: Bringing order to chaos

5 May 2023
Timo Haberl

semantha is a platform that excels at matching texts by their meaning, independent of the words chosen. The basic functionality is simple: give semantha an example of the text you would like to find, our artificial intelligence (AI) understands the meaning of that text and can then find sections in your documents with the same or a similar meaning. But what if we have no examples to give her and instead want her to analyse a large amount of text? Smart Cluster is our first step in that direction.

How does Smart Cluster work?

If you’ve worked with semantha, you may already know that the AI creates fingerprints from words and sentences to understand the meaning behind a text. She then compares the similarity of two fingerprints to find texts with similar meanings. Smart Cluster generalises this by comparing many fingerprints with each other at the same time. This allows semantha to form clusters – groups of fingerprints that lie close together and therefore carry similar meanings. Once a cluster has been formed, semantha looks at the words used in its texts, picks out keywords and uses them to name the cluster. At the end of the process, semantha returns a list of clusters named by their keywords, together with the documents that belong to each cluster. All of this happens with the same out-of-the-box language models that are already built into the AI.
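To make the three steps (compare fingerprints, group them, name the groups) more concrete, here is a minimal Python sketch. It is not semantha's implementation: the 3-dimensional "fingerprints" are made-up toy vectors, cosine similarity stands in for whatever similarity measure semantha uses, the greedy threshold clustering is a deliberately simple stand-in for a real clustering algorithm, and `cluster`, `name_cluster` and `STOPWORDS` are hypothetical names.

```python
import math

# Hypothetical toy data: each text paired with a made-up 3-dimensional
# "fingerprint". Real fingerprints would come from a language model.
DOCS = [
    ("flood damage to the building", [0.9, 0.1, 0.0]),
    ("building damage after flood", [0.85, 0.15, 0.05]),
    ("payment of the premium", [0.0, 0.2, 0.95]),
    ("premium payment due date", [0.05, 0.1, 0.9]),
]

# Hypothetical stopword list, ignored when picking keywords.
STOPWORDS = {"a", "after", "and", "due", "in", "of", "the", "to"}


def cosine(a, b):
    """Cosine similarity between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(fingerprints, threshold=0.8):
    """Greedy single-pass clustering: an item joins the first cluster whose
    seed (first member) is at least `threshold` similar, else starts a new one."""
    clusters = []  # each cluster is a list of indices into `fingerprints`
    for i, fp in enumerate(fingerprints):
        for members in clusters:
            if cosine(fingerprints[members[0]], fp) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def name_cluster(texts, top_n=2):
    """Name a cluster after its most frequent non-stopword words."""
    counts = {}
    for text in texts:
        for word in text.lower().split():
            if word not in STOPWORDS:
                counts[word] = counts.get(word, 0) + 1
    # Sort by descending frequency, then alphabetically to break ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(word for word, _ in ranked[:top_n])


groups = cluster([fp for _, fp in DOCS])
result = {
    name_cluster([DOCS[i][0] for i in g]): [DOCS[i][0] for i in g]
    for g in groups
}
for name, members in result.items():
    print(f"{name}: {members}")
```

With these toy vectors the two flood texts and the two premium texts end up in separate clusters, named "building damage" and "payment premium". Production systems typically use more robust density- or hierarchy-based clustering and smarter keyword extraction than raw word counts.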

In the video, Solution Engineer Joseph Daniels briefly shows the new Smart Cluster feature.

There are, of course, variables that go into the analysis which users will be able to configure: which documents Smart Cluster should analyse, how large the clusters need to be and how similar the contents of a cluster need to be to one another. If you expect some documents to be independent and not belong to any cluster, you will also have the option to include or reduce outliers.
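As an illustration of how a minimum cluster size and outlier handling can interact, here is a small Python sketch. The parameter names `min_cluster_size`, `reduce_outliers` and `outlier_threshold` are hypothetical and do not reflect semantha's actual settings; "reducing" outliers is assumed here to mean moving each outlier into the most similar remaining cluster when it is similar enough.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def apply_settings(fingerprints, clusters, min_cluster_size=2,
                   reduce_outliers=False, outlier_threshold=0.5):
    """Drop clusters below `min_cluster_size` (their members become outliers);
    optionally move each outlier into the most similar remaining cluster."""
    kept = [list(c) for c in clusters if len(c) >= min_cluster_size]
    outliers = [i for c in clusters if len(c) < min_cluster_size for i in c]
    if not (reduce_outliers and kept):
        return kept, outliers
    # Centroids are computed once, before any outliers are reassigned.
    centroids = [centroid([fingerprints[i] for i in c]) for c in kept]
    remaining = []
    for i in outliers:
        sims = [cosine(fingerprints[i], ctr) for ctr in centroids]
        best = max(range(len(kept)), key=lambda k: sims[k])
        if sims[best] >= outlier_threshold:
            kept[best].append(i)
        else:
            remaining.append(i)
    return kept, remaining


# Hypothetical 2-dimensional fingerprints and an initial clustering.
fingerprints = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.3]]
clusters = [[0, 1], [2], [3]]
print(apply_settings(fingerprints, clusters))
print(apply_settings(fingerprints, clusters, reduce_outliers=True))
```

Keeping outliers leaves items 2 and 3 unassigned; reducing them moves item 3, which points in almost the same direction as the remaining cluster, into it, while item 2 stays an outlier because it is too dissimilar.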

How can Smart Cluster help?

Generating clusters of similar content at the click of a button can bring great insight into unstructured document sets at many different levels:

Research

If you collect texts or documents for any kind of research, from marketing to science, you will want to classify and cluster them easily so that you can find relevant information quickly. Smart Cluster can cluster either the documents themselves or “just” the paragraphs within them. That way, finding a collection of information about a specific topic is as easy as selecting the relevant cluster.
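Clustering at paragraph level means the unit being fingerprinted is a paragraph rather than a whole document, while each paragraph still remembers where it came from. The Python sketch below shows one simple way to prepare such units; splitting on blank lines and the `to_units` helper are assumptions for illustration, not how semantha segments documents.

```python
def to_units(documents, level="paragraph"):
    """Turn {doc_id: text} into (doc_id, paragraph_index, text) units.

    level="document" keeps each document whole; level="paragraph" splits on
    blank lines so each paragraph can land in its own cluster.
    """
    units = []
    for doc_id, text in documents.items():
        if level == "document":
            units.append((doc_id, 0, text.strip()))
        elif level == "paragraph":
            paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
            units.extend((doc_id, i, p) for i, p in enumerate(paragraphs))
        else:
            raise ValueError(f"unknown level: {level!r}")
    return units


# Hypothetical research notes.
docs = {
    "study_a.txt": "Sample size was 120.\n\nResults show a strong effect.",
    "study_b.txt": "Results were inconclusive.",
}
for unit in to_units(docs):
    print(unit)
```

Because every unit carries its document ID, a paragraph-level cluster about "results" can point straight back to the studies it was drawn from.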

Archive organisation

Many companies have a large digital footprint. Those that have been around for a while likely have archives of unstructured documents. Information will be forgotten if it is impractical to retrieve, but with Smart Cluster you can structure archives and easily add new documents to the existing clusters.
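One common way to add new documents to existing clusters without re-clustering everything is to compare each new fingerprint with the centroid (average fingerprint) of every cluster. The sketch below assumes that approach; the cluster names, centroid vectors and the `assign_to_cluster` helper and its `threshold` are hypothetical, and semantha may handle new documents differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_to_cluster(fingerprint, centroids, threshold=0.7):
    """Return the name of the most similar existing cluster, or None if no
    cluster centroid is at least `threshold` similar (a new topic/outlier)."""
    if not centroids:
        return None
    sims = {name: cosine(fingerprint, ctr) for name, ctr in centroids.items()}
    best = max(sims, key=sims.get)
    return best if sims[best] >= threshold else None


# Hypothetical centroids of clusters found in an earlier run over the archive.
centroids = {
    "flood damage": [0.9, 0.1, 0.0],
    "premium payment": [0.0, 0.15, 0.95],
}
print(assign_to_cluster([0.8, 0.2, 0.1], centroids))
print(assign_to_cluster([0.1, 0.9, 0.1], centroids))
```

The first new document lands in "flood damage"; the second is not similar enough to either cluster and is returned as `None`, signalling that it may belong to a topic the archive has not seen before.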

Communication

There are many communication channels that can be monitored and analysed, but when people use free text it is difficult to automate any process. Inbound letters, answers in questionnaires, internal meeting notes – these can all be grouped and classified with Smart Cluster for further analysis.

Supporting other semantha use cases

Many use cases in semantha require giving her examples of text to find again. Configuring a new use case means our customers have to find these examples manually and upload them into the library. With Smart Cluster, you can let semantha recognise the topics for you, providing a selection of hotspots to choose from. For example: upload a few insurance contracts, semantha finds a cluster related to terrorism, and you have labelled paragraphs for finding terrorism clauses in future contracts!
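Turning a cluster into reusable examples can be as simple as attaching a label to each of its paragraphs. The sketch below assumes clusters are available as a mapping from keyword name to paragraph texts and that examples are plain label/text records; `clusters_to_examples` and this record format are hypothetical and are not semantha's library format or API.

```python
def clusters_to_examples(clusters, keyword, label):
    """Collect the paragraphs of every cluster whose name contains `keyword`
    as labelled examples, skipping duplicates that differ only in case or spacing."""
    examples = []
    seen = set()
    for name, paragraphs in clusters.items():
        if keyword.lower() not in name.lower().split():
            continue
        for text in paragraphs:
            key = " ".join(text.lower().split())  # normalise case and whitespace
            if key not in seen:
                seen.add(key)
                examples.append({"label": label, "text": text})
    return examples


# Hypothetical clusters found in a handful of insurance contracts.
clusters = {
    "terrorism exclusion": [
        "Losses caused by acts of terrorism are excluded.",
        "losses caused by acts of  terrorism are excluded.",
        "Cover for terrorism may be purchased separately.",
    ],
    "premium payment": ["The premium is due monthly."],
}
examples = clusters_to_examples(clusters, "terrorism", "terrorism clause")
for example in examples:
    print(example)
```

The near-duplicate clause is kept only once, so the resulting examples cover distinct wordings of terrorism clauses rather than repeating the same one.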

We’re not done yet

With Smart Cluster providing support in these different use cases, it is a great and logical addition to semantha’s portfolio, but the journey doesn’t stop there. We have a number of ideas in the pipeline to enhance the features of Smart Cluster, including cluster manipulation and visual representations. If you want to try Smart Cluster or have any suggestions for your use case, let us know!

Picture: AdobeStock / arthead
