How the world's largest broadcasting and cable company monetized data and reduced operating costs by effectively managing the volume and velocity of network data using Niyuj's Enterprise Data Storage and Analytics Engine

Executive Summary

The ability to access, analyze, and manage vast volumes of data while rapidly evolving the Information Architecture is increasingly critical to communications service providers.

Our customer provides products and services that ensure reliable and secure delivery of IP-based voice, video, and data to communications service providers.

Niyuj worked with our customer to help their end user, the world's largest broadcasting and cable company, headquartered in Philadelphia, PA, create new service offerings that generate additional revenue, and to address ongoing operational management inefficiencies in order to reduce operating costs.

This paper presents a specific use case. However, the approach and guidance offered are the by-product of dozens of projects, and they highlight the choices Niyuj customers faced and the decisions they made leveraging Niyuj's experience and expertise across many industries.

Key Business Challenges

Our end user, the world's largest broadcasting and cable company, has access to data from a variety of sources. This data includes, but is not limited to, call detail records (CDRs), network and signaling data from protocols like SIP, data from network management systems (NMS), network device data like NetFlow, and log data like syslog.

However, the sheer volume and velocity of this data made traditional approaches to ingestion, reporting, and analysis ineffective. As a result, data feeds were either not utilized at all or had to be discarded from time to time to keep them manageable.

Moreover, as new network technologies like IoT sensors and mobile Internet devices emerge, data volume and velocity increase exponentially, making the problem even larger.

Our end user has a clear strategy to leverage data and analytics to:
  1. Improve customer experience and monetize data through increased average revenue per user.
  2. Optimize network operations and save costs.

Using Redis, Cassandra, and Jasper Reports to efficiently consume, store, and retrieve over 1 billion points of network data per month, arriving at over 20,000 points per minute.

The Environment

Our customer has a deployment of 20,000 devices (session border controllers). These include a mix of IP-to-IP session border controllers and multiservice gateways that together form a unified communications platform used to connect enterprises to cloud-based services like SIP trunking, hosted PBXs, and video-centric IP communications. This entire infrastructure is managed, provisioned, and configured by a cloud-based service control center, which also provides real-time visibility into performance and service quality.

The Data Problem

Each session border controller appliance transmits SIP analytics and syslog data every minute. This data includes, but is not limited to, CDRs (call detail records). With 20,000 appliances in the network, each sending data every minute, we end up with over 1 million data points an hour, over 28 million data points a day, and almost a billion data points a month. The first challenge was to address this data volume and velocity problem.
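
The arithmetic behind these figures is easy to verify. Below is a minimal sketch in Python; the device count and one-minute reporting interval are taken from the deployment described above:

    # Back-of-the-envelope volume estimate for 20,000 devices reporting every minute.
    devices = 20_000                   # session border controllers
    points_per_minute = devices        # one data point per device per minute

    points_per_hour = points_per_minute * 60   # 1,200,000   -> "over 1 M an hour"
    points_per_day = points_per_hour * 24      # 28,800,000  -> "over 28 M a day"
    points_per_month = points_per_day * 30     # 864,000,000 -> "almost a billion a month"

    print(points_per_hour, points_per_day, points_per_month)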

The Legacy Infrastructure

The next challenge was to do so with the least disruption to ongoing operations. The legacy analytics system ran on a MySQL relational database, used Jasper Reports for charting and analysis, and was hosted in a virtualized environment on VMware images.

The Solution Architecture

Given the legacy constraints of both the application and the underlying infrastructure, Niyuj designed an architecture that addresses the most urgent need: managing the velocity and volume of data in a plug-and-play model with the existing technology stack, causing the least disruption, while remaining extensible for more advanced analytics such as predictive analytics and forecasting.

This architecture has three main components: consuming data at the desired velocity, storing the large volume of data for efficient retrieval and further analysis, and meeting infrastructure SLAs for disaster recovery and high availability.

The Data Plane

Niyuj set up an efficient technique to consume data in a reliable and generic manner using a high-throughput queuing infrastructure. Niyuj used open-source technologies like Redis and RabbitMQ, which are cost-effective, enterprise-scale software products with a large community and user base. This choice provides a way to consume data at high velocity while transparently managing redundancy, so that data is not lost even at peak rates. It works with the legacy platform infrastructure, needs no special platform prerequisites, and is agnostic to the format and structure of the data.
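
As a minimal sketch of this ingestion pattern, the snippet below uses a Redis list as the buffer between the device-facing collectors and the downstream consumers. The queue name, payload shape, and function names are illustrative assumptions, not the production design:

    import json
    import redis  # redis-py client; assumes a reachable Redis instance

    r = redis.Redis(host="localhost", port=6379)
    QUEUE = "sbc_ingest"  # hypothetical queue for session border controller data

    def enqueue(device_id, payload):
        # Collectors push raw device records onto the tail of the list.
        r.rpush(QUEUE, json.dumps({"device": device_id, "data": payload}))

    def consume_forever():
        # Consumers block until a record arrives, then hand it downstream.
        while True:
            _, raw = r.blpop(QUEUE)  # blocking pop from the head of the list
            record = json.loads(raw)
            print(record["device"])  # ...persist to the data store described below...

Because the buffered records live in Redis rather than in consumer memory, a slow or restarting consumer does not drop data; RabbitMQ can play the same buffering role where stronger delivery guarantees are needed.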

The Data Store

Once the data was acquired from the network infrastructure, it needed to be persisted in a manner that allows efficient retrieval in the future. The choice was to trade off storage for retrieval efficiency, so as to support data retrieval at scale that could then feed downstream static and dynamic analytics and more intelligent processing using machine learning techniques.

Niyuj studied the legacy MySQL relational database, determined that it would not scale to store and retrieve this volume of data efficiently, and proposed a NoSQL data store. Given the numerous NoSQL databases on offer, it was important to pick the appropriate one for our use case.

To choose one, Niyuj evaluated the options against the following criteria:

1. The nature and inherent characteristics of the data being processed.

2. The nature of the storage and retrieval.

3. The requirements for disaster recovery and high availability.

4. The volume and velocity of the data.

Keeping these criteria in mind, Niyuj recommended Cassandra as the data store, given that we were dealing primarily with time-series data, which arrives at high velocity but needs to be retrieved only periodically for reporting and analytics. Cassandra provides high write throughput and very efficient retrieval for large volumes of data that fit its columnar structure. Moreover, the reporting is on metrics that appear in groups and are accessed together. Cassandra, with its peer-to-peer, multi-node architecture, inherently provides high availability and disaster recovery.
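
As a minimal sketch of how such a time-series layout might look (the keyspace, table, and column names here are illustrative assumptions, not the production schema), using the DataStax Python driver:

    from cassandra.cluster import Cluster  # DataStax Python driver for Cassandra

    session = Cluster(["127.0.0.1"]).connect()

    # Replication across nodes is what gives Cassandra its built-in HA/DR story;
    # production clusters would typically use NetworkTopologyStrategy instead.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS telemetry
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
    """)

    # Partitioning by device and day puts each device's minute-by-minute points
    # in one wide row; clustering by timestamp keeps them in time order, which
    # makes the periodic range scans used for reporting cheap.
    session.execute("""
        CREATE TABLE IF NOT EXISTS telemetry.sbc_metrics (
            device_id text,
            day       date,
            ts        timestamp,
            metrics   map<text, double>,   -- metric groups accessed together
            PRIMARY KEY ((device_id, day), ts)
        )
    """)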

Reporting and Analytics Framework

Given that the legacy application already used the Jasper Reports framework, and Jasper has adapters for Cassandra, Niyuj modified the retrieval business logic to read from Cassandra while keeping the reporting templates the same. Niyuj also processed this data using machine learning techniques and stored the results back in Cassandra so they could be visualized through the existing reporting framework. This avoided disruption both to operations and to users who were accustomed to consuming reports in a particular way.
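
A minimal sketch of the retrieval side, reusing the illustrative telemetry.sbc_metrics table above (the Jasper adapter wiring itself is not shown, and the device ID and time window are hypothetical):

    from datetime import datetime
    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("telemetry")

    # A report pulls one device's metric groups for a day in a single partition scan.
    rows = session.execute(
        """
        SELECT ts, metrics FROM sbc_metrics
        WHERE device_id = %s AND day = %s AND ts >= %s AND ts < %s
        """,
        ("sbc-0042", datetime(2017, 3, 1).date(),
         datetime(2017, 3, 1, 0, 0), datetime(2017, 3, 2, 0, 0)),
    )
    for row in rows:
        print(row.ts, row.metrics)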

Reduced operating costs through technology built for scale, high availability, and disaster recovery. Increased revenue through new use cases made possible by predictive analytics.

Scale Improvements

The combination of Redis and Cassandra resulted in a 5X increase in write throughput while maintaining the same level of consistency and resilience to data loss as the legacy MySQL database. This allowed us to store and retrieve 100X more data, which resulted in richer reports and analytics.

Disaster Recovery and High Availability

Cassandra, because of its distributed, multi-node (peer-to-peer) architecture, inherently provides disaster recovery and high availability. This allowed us to move away from the external software, such as Pacemaker and DRBD, that the legacy system relied on. The architecture also remains extensible for the future.

Scale Analytics

The combination of Jasper Reports and Cassandra made it possible to provide richer analytics on a significantly larger volume of data without disrupting the existing reporting infrastructure. This made it possible to keep the existing reporting templates that had been carefully designed with the end users.

Machine Learning and Predictive Analytics

With large volumes of data available for efficient retrieval, it became possible to integrate advanced machine learning frameworks like Apache Mahout and Apache Spark to run predictive analytics on network data. This made it possible to monetize data through advanced analytics and new use cases.
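
As a minimal sketch of that integration, the snippet below assumes the Spark Cassandra connector package is available and a hypothetical flattened feature table, telemetry.sbc_features, with call_volume, error_rate, and failed_next_hour columns; none of this is the production pipeline:

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("sbc-predictive").getOrCreate()

    # Read the (hypothetical) per-interval feature table through the connector;
    # the "org.apache.spark.sql.cassandra" format is provided by that package.
    df = (spark.read.format("org.apache.spark.sql.cassandra")
          .options(keyspace="telemetry", table="sbc_features")
          .load())

    # Assemble numeric features and train a simple classifier that flags
    # intervals likely to precede a device failure.
    assembler = VectorAssembler(inputCols=["call_volume", "error_rate"],
                                outputCol="features")
    train = assembler.transform(df).select("features", "failed_next_hour")
    model = LogisticRegression(labelCol="failed_next_hour").fit(train)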

Increased Revenue

The combination of the big data infrastructure and advanced machine learning allowed us to explore several new use cases, such as traffic classification, bandwidth provisioning, personalized billing plans, load balancing, and failure prediction.

Client's Perspective