How one of the world's largest market research companies monetized data and reduced operating costs by bringing its customers, vendors and analysts onto a single platform using Niyuj’s Enterprise Data Storage and Analytics Engine

Executive Summary

The ability to access, analyze, and manage vast volumes of data while rapidly evolving the Information Architecture is increasingly critical to market research companies.

Our customer provides products and services that ensure reliable and secure delivery of market research information to its customers.

Niyuj worked with our customer, one of the top five market research companies in the world, with offices across three continents, to help them create a new service offering that generated additional revenue and to address ongoing operational management inefficiencies in order to reduce operating costs.

This paper presents a specific use case. However, the approach and guidance offered are the by-product of dozens of projects. They highlight the choices that Niyuj customers faced and the decisions that they made, drawing on Niyuj’s experience and expertise across many industries.

Key Business Challenges

Our customer, being one of the world’s largest market research companies, has access to data from a variety of sources. This data includes, but is not limited to, market research information about companies, products and services that spans different dimensions such as technology, geography, sector and industry.

However, the sheer volume and velocity of data made traditional approaches to capture, reporting and analysis ineffective. As a result, data feeds were either not utilized at all or had to be manually processed from time to time to keep them manageable and consumable for the end customer.

Moreover, as new instances of the dimensions in which our customer operates emerge, the data volume and velocity increase exponentially, making the problem even larger.

Our customer has a clear strategy to leverage data and analytics to:
  1. Improve customer experience and monetize data through increased average revenue per user.
  2. Optimize performance of internal and external analysts and save costs.

This strategy relies on machine learning to score and analyze the data, and on a state-of-the-art cloud-based platform to deliver the results efficiently, so that customers can store, retrieve and consume over 1 million data points, personalized based on their behavior and preferences.

The Environment

Our customer has an ecosystem of vendors, customers and analysts who are the critical stakeholders. Our customer manually gathers data from vendors, which is then analyzed and scored by analysts based on a host of different criteria across multiple dimensions such as technology, geography, market and industry. This analysis is then compiled into a series of reports and is also visualized graphically as a quadrant for a comparative assessment of different vendors across different dimensions and corresponding criteria. This entire process is managed and operated manually by an ever-expanding team of analysts, customer account managers and vendor relationship managers.

The Data Problem

Each product and service offering from a vendor has multiple data points for each of the criteria specified in a multi-dimensional space defined by dimensions like technology, geography, market and industry. Each data point is scored and commented on by all the stakeholders in the system like analysts, customers and other vendors. With several thousand users in the network, and each creating data, we end up with millions of data points that grow to over a billion data points very quickly. The first challenge was to address the data volume and velocity problem.

The Process

The next challenge was to address the data volume and velocity problem with the least disruption to ongoing operational processes. The aim was to create a common system on which all stakeholders could collaborate, with a scoring algorithm that, based on the parameters and weights that are configured, automatically scores the data points and, consequently, the companies and their products and services.
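The weight-driven scoring described above can be illustrated with a minimal sketch. It assumes that each data point carries one score per criterion and that the configured weights are simple per-criterion numbers; the function name, the criteria and the values below are hypothetical and are not taken from the customer's system.

```python
from typing import Dict


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of per-criterion scores.

    Criteria that have no configured weight are ignored (an assumption
    made for this sketch).
    """
    total_weight = sum(weights[c] for c in scores if c in weights)
    if total_weight == 0:
        raise ValueError("no scored criterion has a configured weight")
    weighted_sum = sum(scores[c] * weights[c] for c in scores if c in weights)
    return weighted_sum / total_weight


# Hypothetical configuration: relative importance of each dimension.
weights = {"technology": 0.5, "geography": 0.2, "industry": 0.3}

# Hypothetical scores for one product, on a 0-10 scale.
product_scores = {"technology": 8.0, "geography": 6.0, "industry": 7.0}

product_score = weighted_score(product_scores, weights)
print(round(product_score, 2))
```

Changing the configured weights changes the resulting score without touching the underlying data points, which is the property that lets the same data be scored differently for different stakeholders.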

The Solution Architecture

Given the legacy constraints, Niyuj came up with an architecture that addresses the most urgent needs: the ability to store and manage the volume of data, a personalized scoring algorithm that is accurate and independently scalable, and a mechanism to deliver this in a plug-and-play model with a cloud-based technology stack that can scale. The architecture is also extensible in the future for more advanced analytics such as predictive analytics and forecasting.

The Data

Creation

Data is created by each stakeholder’s interaction with the system. Vendors create data by adding information about their products and services, which are then rated and commented on by Analysts, and Customers consume this data in the form of reports.

Behavioral information of each stakeholder is also captured, for example who viewed what, whether they liked or disliked something, and whether they commented on something.

Scoring

All the above data, which includes but is not limited to the scores provided by the different stakeholders (primarily Analysts) and the behavioral data above, is factored into the scoring algorithm. The algorithm scores the content as well as the actors (stakeholders) in the system, deriving a key opinion leader (KOL) score for each actor. The KOL score of a user is taken into account when that user's behavior is factored in. For example, if the CTO of IBM commented on a particular piece of data, that comment would carry far more weight than one from a junior analyst. This results in a continuous-feedback scoring algorithm that provides a 360° view of the system.
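One way to picture how a KOL score can weight user input is the sketch below. It assumes that each rating is weighted by the KOL score of the user who gave it; the Rating type, the user labels and the numbers are hypothetical, and the production algorithm factors in more signals than this.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rating:
    user: str
    kol_score: float  # hypothetical influence weight of the user, must be > 0
    value: float      # rating the user gave to a piece of content


def kol_weighted_score(ratings: List[Rating]) -> float:
    """Average the ratings, weighting each one by its author's KOL score."""
    total_kol = sum(r.kol_score for r in ratings)
    if total_kol <= 0:
        raise ValueError("ratings must carry a positive total KOL score")
    return sum(r.kol_score * r.value for r in ratings) / total_kol


# Hypothetical example: a high-influence executive and a junior analyst
# rate the same piece of content.
ratings = [
    Rating(user="senior_executive", kol_score=9.0, value=4.0),
    Rating(user="junior_analyst", kol_score=1.0, value=9.0),
]

plain_mean = sum(r.value for r in ratings) / len(ratings)
print(plain_mean)                   # unweighted mean of the two ratings
print(kol_weighted_score(ratings))  # pulled toward the executive's rating
```

In this sketch the weighted result sits much closer to the high-influence user's rating than the plain average does, which mirrors the behavior described above.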

Personalization

Based on the behavior of users and their preferences, the content is scored and delivered in a personalized manner based on customer intent. Users can also create personalized quadrant reports and compare vendors, their products and services based on custom criteria across dimensions that are important to them.
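A personalized quadrant report of this kind could be sketched as follows, assuming each vendor has a score per criterion and the user picks two criteria as the axes. The function name, the vendor names, the scores and the midpoint of 5.0 are all hypothetical.

```python
from typing import Dict, List


def build_quadrant_report(
    vendors: Dict[str, Dict[str, float]],
    x_criterion: str,
    y_criterion: str,
    midpoint: float = 5.0,
) -> Dict[str, List[str]]:
    """Group vendors into four quadrants using two user-selected criteria."""
    report: Dict[str, List[str]] = {
        "high_x_high_y": [],
        "high_x_low_y": [],
        "low_x_high_y": [],
        "low_x_low_y": [],
    }
    for name, scores in sorted(vendors.items()):
        x_side = "high" if scores[x_criterion] >= midpoint else "low"
        y_side = "high" if scores[y_criterion] >= midpoint else "low"
        report[f"{x_side}_x_{y_side}_y"].append(name)
    return report


# Hypothetical vendor scores on a 0-10 scale.
vendors = {
    "VendorA": {"technology": 8.0, "geography": 7.0},
    "VendorB": {"technology": 3.0, "geography": 9.0},
    "VendorC": {"technology": 6.0, "geography": 2.0},
    "VendorD": {"technology": 1.0, "geography": 4.0},
}

report = build_quadrant_report(vendors, "technology", "geography")
for quadrant_name, members in report.items():
    print(quadrant_name, members)
```

Because the axes are parameters, two users can compare the same vendors on different criteria and see different quadrant placements.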

The System

Once content and behavioral data are acquired via a state-of-the-art web interface, they need to be persisted in a manner that allows efficient retrieval in the future. The choice was to trade off storage for retrieval efficiency, so as to support efficient data retrieval at scale that could then be used for downstream static and dynamic analytics and for more intelligent processing using machine learning techniques. Also, given that the structure of the data is continuously in flux, we needed a data store that is resilient to these changes.

Niyuj investigated using traditional relational databases such as MySQL, and determined that they would not lend themselves to changes in the structure of the data while independently being able to scale for storage and retrieval based on the volume of data in play. Niyuj therefore proposed a NoSQL data store. Given the numerous providers offering NoSQL databases, it was important to pick the appropriate one for our use case.

In order to do this, Niyuj evaluated different options based on the following criteria:
  1. The nature and inherent characteristics of the data being processed.
  2. The nature of the storage and retrieval.
  3. The requirements for disaster recovery and high availability.
  4. The volume and velocity of the data.

Increased revenue and reduced operating costs through a common technology platform for all stakeholders that’s independently scalable

Database

The combination of relational and non-relational features drove the choice of Postgres as the data store. This allowed us to scale write throughput while maintaining the same level of read performance and resilience to changes in data structure.

AngularJS web application

Since we were dealing with large volumes of data that need to be visualized and processed, and with many different possible interactions, we needed a web UI framework that can scale. This requirement drove our choice of AngularJS as the web UI framework. The framework has features that allow data to be changed asynchronously without requiring a refresh of the whole page.

Scale Analytics

The combination of JasperReports and Postgres made it possible to provide richer analytics on significantly larger volumes of data. It also made it possible to leverage existing reporting templates that had been carefully designed with the end users.

Machine learning and predictive analytics

With the availability of large volumes of data that can be retrieved efficiently, it was possible to integrate advanced machine learning frameworks such as Apache Mahout and Apache Spark to do predictive analytics on behavioral data. This made it possible to monetize data through advanced analytics and new use cases.

Increased Revenue

The combination of the big data infrastructure, advanced machine learning, and a state-of-the-art web UI allowed us to provide a personalized experience for all stakeholders. This lets our customer scale their business without needing to increase processing staff to keep up with the growth in business.

Client's Perspective