Staff Writer

Target's Data Makeover: Navigating Analytics & Scalability

The transformation of data from analog to digital has been astounding, and Vinay Joshi, Senior Director of Data Engineering at Target, is at the forefront of that shift. He recently shared a deep dive into how Target restructured its analytical data management, overcoming challenges with forward-thinking solutions that changed the game in data utilization.

"Finding a data set is like finding a needle in a haystack. Moreover, if you're lucky to find it, there's no guarantee it's the right needle," says Vinay Joshi, Senior Director of Data Engineering at Target.

In 2018, Target found itself grappling with a paradox: it had an ocean of data, but the information wasn't helping decision-makers. Finding the right data set was like looking for a needle in an oversized haystack, made even messier by redundant data. The dilemma wasn't Target's alone; 2018 was a watershed moment for the industry. Companies were drowning in data but parched for actionable insights. The 'data paradox' was the talk of the town, with businesses collecting more data than they knew what to do with, and redundant data was turning data lakes into confusing swamps across the industry.

Users were wrestling with data inconsistencies and complicated database joins just to make basic sense of it all. Old-school data management techniques, a relic from the days of relational databases, were adding extra work for end users. Managing thousands of ETL (Extract, Transform, Load) jobs was no picnic, either, and a lack of visibility led to complaints about data quality. These problems are common across the industry: data inconsistencies are often worsened by non-integrated software systems, and complicated joins remain a hurdle when data quality and consistency are at stake. The old ways of managing data are becoming increasingly inadequate, as highlighted by the need for better data governance and the threat of compromised data security.

Joshi and his crew zeroed in on the issues, dividing them into three buckets: data curation, maintenance, and consumption. To tackle these, they set four cornerstone goals:

  • Achieve consistent data architecture and language.
  • Implement governance to certify data sets.
  • Boost efficiency through pre-joined, domain-centric data sets.
  • Increase transparency with standardized observability and data lineage.
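To make the governance goal concrete, here is a minimal sketch of what a certification gate could look like. It is not Target's implementation: the `DatasetMetadata` fields, the `certification_issues` function, the data set names, and the 1% null-rate threshold are all hypothetical, chosen only to show the idea of checking a data set against fixed criteria before it is certified.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical metadata a governance process might require."""
    name: str
    owner: str = ""
    schema: dict = field(default_factory=dict)    # column name -> type name
    upstream: list = field(default_factory=list)  # lineage: source data set names
    null_rate: float = 1.0                        # observed share of nulls, 0.0 to 1.0


def certification_issues(meta, max_null_rate=0.01):
    """Return the reasons a data set fails certification; an empty list means it passes."""
    issues = []
    if not meta.owner:
        issues.append("missing owner")
    if not meta.schema:
        issues.append("missing schema")
    if not meta.upstream:
        issues.append("missing lineage")
    if meta.null_rate > max_null_rate:
        issues.append("null rate above threshold")
    return issues


# Illustrative data sets; the names and values are made up.
sales = DatasetMetadata(
    name="sales_daily",
    owner="sales-data-team",
    schema={"store_id": "int", "net_sales": "decimal"},
    upstream=["sales_atomic_history"],
    null_rate=0.002,
)
draft = DatasetMetadata(name="scratch_export")

print(certification_issues(sales))  # []
print(certification_issues(draft))
```

Returning a list of reasons rather than a plain yes or no tells the owning team exactly what to fix before the data set can be certified.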

The focus on data curation is in line with the industry's recognition of its importance in machine learning and analytics. “Data maintenance is not just about storing data but also involves ensuring its quality, reliability, and usability,” says George Brown, Advisor Cloud Engineer at Gainwell Technologies, an IT consulting firm in Tennessee, United States. As for data consumption, it's about making data more useful for users, especially in data discovery and analysis. The four cornerstone goals set by Joshi and his team could serve as a blueprint for any organization looking to overhaul its data management systems.

Enter the Analytical Data Platform Architecture, Target's in-house solution. It's built on three pillars:

  • Atomic Histories: Clean, enriched raw data forms the backbone of domain-specific histories.
  • Cross-Domain Joins: These allow aggregation of data from various domains.
  • Access Control: APIs act as the gateway for operational systems, while data scientists and analysts interact directly with the data sets.

This structure didn't just simplify data management; it instilled rigorous process controls and established benchmarks to certify data sets for company-wide usage. The approach aligns well with the trends in data architecture for 2023, which emphasize the importance of hybrid and multi-cloud data architectures. The concept of Atomic Histories is akin to the industry's focus on real-time analytics. Cross-Domain Joins resonate with the trend towards Data Mesh, allowing for more flexible and comprehensive data aggregation. Access Control is crucial in an era where data accessibility is a key focus, ensuring that the right people have the right access to the data.
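The relationship between atomic histories and cross-domain joins can be illustrated with a small sketch. It uses plain Python rather than Spark so it runs anywhere, and every name in it (`sales_history`, `item_history`, `build_units_by_department`, the sample rows) is an assumption made for illustration, not part of Target's platform. The point is that the join between domains happens once, upstream, so consumers read a pre-joined result instead of repeating the join themselves.

```python
from collections import defaultdict

# Hypothetical atomic histories: one cleaned record per event, kept per domain.
sales_history = [
    {"store_id": 1, "item_id": "A", "units": 3},
    {"store_id": 1, "item_id": "B", "units": 1},
    {"store_id": 2, "item_id": "A", "units": 5},
]
item_history = [
    {"item_id": "A", "department": "grocery"},
    {"item_id": "B", "department": "apparel"},
]


def build_units_by_department(sales, items):
    """Join the sales domain to the item domain once, then aggregate units."""
    department_of = {row["item_id"]: row["department"] for row in items}
    totals = defaultdict(int)
    for row in sales:
        # Sales rows with no matching item are grouped under "unknown".
        department = department_of.get(row["item_id"], "unknown")
        totals[department] += row["units"]
    return dict(totals)


units_by_department = build_units_by_department(sales_history, item_history)
print(units_by_department)  # {'grocery': 8, 'apparel': 1}
```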

Before rolling out this architecture, there was the not-so-small matter of upskilling engineers. Many were SQL pros but newbies when it came to modern tech stacks like Spark or Scala. The team addressed this by bringing in fresh talent, investing in hands-on training, and nurturing an innovative culture. In 2023, the need for upskilling in Spark and Scala has become more critical than ever, making Target's approach a timely one.

“Kelsa gave back precious problem-solving time to the engineers and provided standardized observability, ensuring transparency for data consumers.”

Key to this initiative was Kelsa, a framework that streamlined pipeline development. It took care of repetitive tasks and standardized how data lineage and observability were implemented, giving engineers the space to innovate. Yet the road was far from smooth. As the business grew, so did data volumes, and scalability became a concern. The team's solution? A layered parallel processing framework that handled data at scale, ensuring timely delivery and consistent quality.
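The article does not describe Kelsa's internals, so the following is only a sketch of the general pattern: a wrapper that records lineage and basic metrics for every pipeline step, plus a helper that runs a step over several partitions at once. The names `pipeline_step`, `RUN_LOG`, `run_in_parallel`, and the sample data are hypothetical. Threads are used here only to keep the example self-contained; a production system processing data at scale would more likely rely on an engine such as Spark.

```python
import functools
import time
from concurrent.futures import ThreadPoolExecutor

RUN_LOG = []  # hypothetical in-memory store for lineage and metrics


def pipeline_step(inputs, output):
    """Wrap a transform so that every run records lineage and basic metrics."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(rows):
            started = time.perf_counter()
            result = func(rows)
            RUN_LOG.append({
                "step": func.__name__,
                "inputs": list(inputs),   # lineage: where the data came from
                "output": output,         # lineage: what the step produces
                "rows_in": len(rows),
                "rows_out": len(result),
                "seconds": time.perf_counter() - started,
            })
            return result
        return wrapper
    return decorator


@pipeline_step(inputs=["sales_raw"], output="sales_clean")
def drop_invalid_sales(rows):
    """Engineers write only the business logic; the wrapper handles the rest."""
    return [row for row in rows if row["units"] > 0]


def run_in_parallel(step, partitions, max_workers=4):
    """Apply one step to each partition concurrently, preserving partition order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(step, partitions))


partitions = [
    [{"units": 2}, {"units": 0}],
    [{"units": 5}],
    [{"units": -1}, {"units": 4}, {"units": 1}],
]
cleaned = run_in_parallel(drop_invalid_sales, partitions)
print([len(part) for part in cleaned])  # [1, 1, 2]
print(len(RUN_LOG))                     # 3, one entry per partition
```

Because the logging lives in the wrapper, every step reports the same fields in the same way, which is the kind of standardization that makes observability and lineage consistent across pipelines.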

The impact of these changes wasn't just theoretical; it led to tangible outcomes. The number of data sets was cut by 90%, and the number of joins fell by 55%. Data pipelines could process historical data 75% faster than before.

Looking toward the future, Joshi and his squad understand the work isn't over. They aim to shift from frameworks to centralized platforms, further democratizing data creation under standardized quality controls.

“Our journey from a legacy platform to a modern tech stack was not without bumps, but we navigated challenges through product thinking and innovation.”

To sum it up, Target's data management overhaul shows what can be achieved with clear goals, ingenious strategies, and a relentless pursuit of excellence. It's not just about surviving in the digital age; it's about thriving. This echoes the importance of continuous learning and adaptability in 2023, making Target's journey a case study in modern data management.

Watch the full video of the talk, here.

Banner Image Credits: Vinay Joshi at Great International Developer Summit
