Automated Framework for Making Sense of Unstructured Data

80-90 percent of the data generated daily is unstructured, and it is growing at 55-60 percent each year. Until recently, businesses had no option but to read and categorize unstructured data manually to gain insights from it. Unstructured data analytics tools have now made this job easier.

Businesses are leveraging unstructured data to:

  1. Listen to the Voice of the Customer (VoC).
  2. Discover the gap between the customer and the product.
  3. Improve the customer’s experience.

We were approached by a company for the same purpose.

The Client:

The client provided information and intelligence to most of the top 500 US companies.

The Problem:

Analyzing news articles to gather insights on companies, events, and people made up a good chunk of the workload of our client’s market research team. A typical person can read through and analyze at most 100 articles per day, and not all of those articles yield meaningful insights. For an average company, however, thousands of articles need to be analyzed. Doing this manually is time-consuming and likely to compromise the quality of the insights delivered.

The Solution:

Aidetic came up with an automated framework that enabled the client to enter the name of the company to be analyzed and get results within a few minutes. Not only were the insights delivered quickly, but the results were also organized neatly into interactive visualizations.

We came up with the following workflow to arrive at high-quality results within an automated framework:

  • First, we fetched articles from news source APIs based on specified keywords and time frames. These articles were preprocessed and enriched before being fed into the NLP engine.
  • Once the data was in the NLP engine, our proprietary algorithms analyzed the text to extract named entities, flag events, and organize pieces of text for subsequent modeling.
  • Next, we employed a topic modeling framework that used the Non-negative Matrix Factorization (NMF) algorithm to segregate text data into topics.
  • Then, using our proprietary algorithms, interactive visualizations were generated that made analyzing the huge corpus of documents easier, organized, and intuitive.
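The topic modeling step can be illustrated with a minimal sketch. It is not the framework described above: it assumes a plain term-count matrix and the classic multiplicative update rule for NMF, written in pure Python so it runs without dependencies. The headlines, the stopword list, and every function name are hypothetical, and a production pipeline would more likely rely on a tuned library implementation.

```python
import math
import random
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a real lexicon).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "for", "on", "its"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def build_matrix(docs):
    """Return (vocab, V) where V[i][j] is the count of term j in document i."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[float(c[t]) for t in vocab] for c in counts]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def frobenius_error(V, W, H):
    """Frobenius norm of V - WH, i.e. the reconstruction error."""
    WH = matmul(W, H)
    return math.sqrt(sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)))

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into non-negative W (docs x k) and H (k x terms)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

def top_terms(H, vocab, n=4):
    """For each topic (row of H), list the n highest-weighted terms."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in H]

# Hypothetical headlines standing in for fetched news articles.
docs = [
    "Acme Corp announces merger with rival retailer",
    "Merger talks between Acme Corp and retailer advance",
    "Regulators review the Acme Corp merger",
    "Chipmaker unveils new processor for data centers",
    "New processor boosts chipmaker data center sales",
    "Data center demand lifts chipmaker outlook",
]

vocab, V = build_matrix(docs)
W, H = nmf(V, k=2)

# Assign each document to the topic with the largest weight in W.
dominant_topics = [max(range(2), key=lambda t: row[t]) for row in W]

for t, terms in enumerate(top_terms(H, vocab)):
    print(f"topic {t}: {', '.join(terms)}")
print("dominant topic per document:", dominant_topics)
```

Each row of `W` gives a document's weight on every topic, and each row of `H` gives a topic's weight on every term, so the highest-weighted terms serve as a readable label for the topic. The number of topics `k` is chosen by hand here; in practice it would need tuning against the corpus.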

Our models also listed the most talked-about people, organizations, and events in each article so that the user could have a brief yet solid overview of the contents of an article with a cursory look. 
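As a rough illustration of such a per-article overview, the sketch below counts how often each entity is mentioned and keeps the most frequent ones. It assumes the entities have already been extracted by an upstream step; the `(type, name)` tuple format, the sample names, and the `top_entities` helper are hypothetical and do not reflect the actual models.

```python
from collections import Counter

def top_entities(entities, n=3):
    """Return the n most frequent (entity_type, name) pairs with their counts."""
    return Counter(entities).most_common(n)

# Hypothetical output of an upstream entity-extraction step for one article.
article_entities = [
    ("ORG", "Acme Corp"), ("PERSON", "Jane Doe"), ("ORG", "Acme Corp"),
    ("EVENT", "merger"), ("ORG", "Acme Corp"), ("PERSON", "Jane Doe"),
]

summary = top_entities(article_entities, n=2)
for (entity_type, name), count in summary:
    print(f"{entity_type:<7}{name:<12}{count}")
```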

Results:

The automated framework of topic modeling allowed the client’s market research team to filter important information quickly, helped them organize the information contained in their unstructured data reliably, and needless to say, saved them countless hours overall.

Below is an example of the visualization that our software produces. All of the articles are neatly plotted in the graph, color-coded by the topic they talk about. The user can hover over an article to see its title, keywords, entities, and events. If an article looks interesting, the user can open it with a single click. The user can also zoom in to see local clusters of information.

Technology Stack:
