/Data

Data Teams Are Getting Larger, Faster

tl;dr: "But something happens when a data team grows past 10 people. You no longer know if the data you use is reliable, the lineage is too large to make sense of and end-users start complaining about data issues every other day." Mikkel discusses how to deal with scaling teams.

featured in #334


Emerging Architectures For Modern Data Infrastructure

- Matt Bornstein, Jennifer Li, Martin Casado tl;dr: "To help data teams stay on top of the changes happening in the industry, we’re publishing in this post an updated set of data infrastructure architectures. They show the current best-in-class stack across both analytic and operational systems, as gathered from numerous operators we spoke with over the last year."

featured in #304


Why We Switched Our Data Orchestration Service

- Guillaume Perchais tl;dr: "Within Spotify, we run 20,000 batch data pipelines defined in 1,000+ repositories, owned by 300+ teams — daily. The majority of our pipelines rely on two tools: Luigi (Python) and Flo (Java). The data orchestration team decided to move away from these tools, and in this post, the team details why the decision was made, and the journey they took to make the transition."

featured in #301


Why Becoming A Data-Driven Organization Is So Hard

- Randy Bean tl;dr: Being data-driven has been a priority for companies, but many have seen mixed results. According to a survey of executives, company culture is a harder hurdle to clear than any technical problem, and the explosion in the amount of data, privacy concerns, and questions of data ownership keep making the task harder. Randy offers three principles: (1) Think different and be creative. (2) Fail fast, learn faster. (3) Focus on the long-term.

featured in #296


The Next Big Challenge For Data Is Organizational

- Bryan Offutt tl;dr: "Software development teams have a few key characteristics that make them efficient, even at scale:" (1) Specialization of specific roles. (2) Modularization, as problems are broken into chunks. (3) Clarity of ownership. (4) Organizational buy-in that tech debt needs management. Bryan argues that the structure around data teams is in a different place for each of the above characteristics, and that we are at a "tipping point" for this to change.

featured in #273


Building The DataDog Platform For Processing Timeseries Data At Massive Scale

- Vadim Semenov tl;dr: Podcast interview with Vadim Semenov discussing "the systems that DataDog has built to power their business, and how their teams are organized to allow for rapid growth and massive scale." 

featured in #167