All-Purpose Big Data Pipeline for DDMR Data Marketplace

The Intelliarts team developed an end-to-end data pipeline that helps DDMR process large data sets and transform them into valuable insights.

Solution Highlights

  • Built an end-to-end data pipeline, including data collection, storage, processing, and delivery
  • Made it possible to process hundreds of terabytes of data per day
  • Helped the customer grow to operating clusters with 2000 cores and 3700 GB of RAM
  • Protected DDMR’s business against data leakage
  • Provided space savings and cost optimization thanks to more efficient data regrouping

About the Project

Customer:

DDMR is a US-based, data-driven market research company that provides clickstream data. It sells this data to other organizations so that their end users can better understand their business and customer preferences and, hence, maximize the chance of successful sales.

Challenges & Project Goals:

As clickstream data is DDMR’s core product, managing and processing big data became the company’s major business challenge. DDMR contacted Intelliarts with a request to redesign, develop, and implement a large-scale data transformation and data augmentation system.

Solution:

Over eight years of partnership with DDMR, Intelliarts built an efficient and scalable big data solution that helps the customer manage big data from collection through storage and processing to delivery. We also optimized the solution to make it more reliable and secure.

Business Value Delivered:

  • An end-to-end data pipeline, with efficient data collection, storage, processing, and delivery
  • Annual revenue multiplied several times and a larger base of loyal customers
  • Ability to process large volumes of data and translate them into actionable insights
  • Opportunity to access the necessary data easily and without delays
  • Increased reliability and reduced human error

Location: US
Industry: Information Technology
Partnership period: Apr 2015 to Feb 2023
Services: Data Engineering
Expertise: Big Data
Technologies used: Databricks, AWS, Amazon EMR, Scala, Python, Spark, Snowflake, DBT, AWAA, Terraform, Ansible

Testimonials

Intelliarts’ work is amazing. Thanks to their help, we’ve been able to process hundreds of terabytes of data per day. We’ve definitely seen an increase in our revenue since we started working with them.

Jawad Laraqui
CEO @DDMR


Technology Solution

Over the years of partnership, Intelliarts helped DDMR develop and optimize an end-to-end data pipeline. The major milestones of the project were:

  • Data collection. In the first phase of cooperation, our data engineers developed a platform-agnostic data collection tool with browsers as a data source, taking into account the compliance requirements.
  • Data storage. By splitting data into hot and cold storage, we allowed the customer to access the data it needs easily and without delays via hot storage. This decision also cut storage costs, since the company moved rarely accessed data to cold storage. The split also solved the data compression problem: by grouping data more efficiently before uploading it to storage, the company could save space and cut costs.
  • Data transformation. Another challenge was the inefficient data processing pipeline built on the Java-based Hadoop and Cascading frameworks. We redesigned the pipeline and moved data processing to Spark and Databricks. A few years later, our data experts replaced this stack with the new industry standard, i.e., Snowflake, DBT, and AWAA, with the aim of replacing ETL processes with ELT and lighter cloud solutions.
  • DevOps. A critical improvement to DDMR’s data pipeline was the introduction of Terraform, which increased DevOps maturity in the project. Together with the provisioning tool Ansible, it allowed us to automate the building and provisioning of the infrastructure, making it more reliable and reducing the risk of human error. The Intelliarts team also provided 24/7 support for the project, backed by automatic notifications and efficient trend monitoring. By making the infrastructure multi-region, we helped DDMR deliver its services to end users faster and keep workloads consistent.
  • Security. Our data engineers protected the business against data leakage and introduced data cleaning and other measures that help the customer protect sensitive data on its side. We also supported GDPR compliance, PII detection, and sensitive data removal.
  • Value-added products. At some point, the Intelliarts team moved from solving only engineering problems to tackling product challenges as well. Given the large volumes of clickstream data available, we suggested that DDMR produce spin-off products to expand its market share. For example, we augmented the clickstream data by mapping domains to public company stock tickers. This gave the company a strong value proposition for fintech companies, which were looking for this type of data as an extra source for their analytics.
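
To make the PII detection and sensitive data removal from the Security milestone more concrete, here is a minimal Python sketch of scrubbing a clickstream URL. It is an illustration only, not the production logic: the email pattern, the list of sensitive parameter names, and the function name scrub_url are all assumptions made for this example.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical pattern and parameter names, chosen only for illustration.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SENSITIVE_PARAMS = {"email", "token", "password", "ssn"}
REDACTED = "REDACTED"


def scrub_url(url: str) -> str:
    """Return the URL with sensitive query values and embedded emails masked."""
    parts = urlsplit(url)
    cleaned = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        # Mask the value if the parameter name is sensitive or it contains an email.
        if key.lower() in SENSITIVE_PARAMS or EMAIL_RE.search(value):
            value = REDACTED
        cleaned.append((key, value))
    # Mask emails that appear directly in the path.
    path = EMAIL_RE.sub(REDACTED, parts.path)
    # The fragment is dropped entirely in this sketch.
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(cleaned), ""))


if __name__ == "__main__":
    print(scrub_url("https://shop.example.com/u/jane.doe@example.com/cart?item=42&email=x"))
```

A real pipeline would likely need many more detectors (phone numbers, IDs, percent-encoded values in the path) and would run at cluster scale rather than one URL at a time.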
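
The domain-to-ticker augmentation mentioned under value-added products can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the lookup table, the event fields "url" and "ticker", and the function names are hypothetical, and a real mapping would come from a maintained reference dataset.

```python
from typing import Dict, Iterable, Iterator, Optional
from urllib.parse import urlsplit

# Hypothetical lookup table used only for this example.
DOMAIN_TO_TICKER: Dict[str, str] = {
    "amazon.com": "AMZN",
    "walmart.com": "WMT",
    "apple.com": "AAPL",
}


def lookup_ticker(url: str) -> Optional[str]:
    """Match the URL's host, or any parent domain of it, against the table."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # For "www.amazon.com", try "www.amazon.com", then "amazon.com", then "com".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in DOMAIN_TO_TICKER:
            return DOMAIN_TO_TICKER[candidate]
    return None


def augment(events: Iterable[dict]) -> Iterator[dict]:
    """Yield a copy of each event with an added 'ticker' field (None if unknown)."""
    for event in events:
        yield {**event, "ticker": lookup_ticker(event["url"])}


if __name__ == "__main__":
    sample = [{"url": "https://www.amazon.com/dp/123"}, {"url": "https://example.org/"}]
    for row in augment(sample):
        print(row)
```

Matching on parent domains lets subdomains inherit the ticker of the registered domain. URLs without a scheme have no parsed hostname and are left unmapped in this sketch.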

Business Outcomes

By developing DDMR’s data pipeline, Intelliarts helped the company find the most efficient way to process large clickstream datasets and transform them into valuable insights. As a result, the customer:

  • Multiplied its annual revenue a few times and won a base of loyal customers. During the years of our cooperation, the project grew from zero operating clusters to those with 2000 cores and 3700 GB of RAM. And the company continues to grow and develop.
  • Became able to process hundreds of terabytes of data per day.
  • Stayed ahead of technology trends, which led to greater efficiency, speed, and ease of use of the big data solution we built. For DDMR, it also meant better productivity and optimized processes.
  • Significantly increased its end-user satisfaction. We never waited for end users to come to us with a challenge but took the initiative: whenever our engineers saw that an end user had not accessed the data for some time, we reached out and asked whether any assistance was needed.
  • Produced several spin-off products based on our recommendation. One of them was the result of data augmentation we performed, which helped the company create a new value proposition for fintech companies looking for a specific type of data.