/Data

Struggling with Snowflake Costs? Try our Cost Optimization Calculator

tl;dr: Snowflake costs skyrocket for SaaS providers because the demand for real-time, interactive analytics never switches off. If your Snowflake bill is spiraling, try our cost optimization calculator to discover your potential savings when using a Snowflake warehouse for ad-hoc queries. (No form required)

featured in #501


Top 5 Challenges of Designing Your Data Warehouse for Multi-Tenant Analytics

tl;dr: Data warehouses are built to store large volumes of data from numerous sources, not to serve SaaS platforms delivering multi-tenant analytics, where data security is vital. This guide helps you avoid the headaches that come with that architecture mismatch, featuring solutions from our analytics experts.

featured in #499


Custom Data Models: The Key to Unlocking Powerful Embedded Analytics

- Brian Dreyer tl;dr: Without custom data models, even the most advanced analytics fail to deliver value, leading to customer churn. If you’re a SaaS leader, learn why custom data models are imperative for multi-tenant software platforms and four features of conventional data warehousing that are limiting your growth.

featured in #495


How DoorDash Used A Service Mesh To Manage Data Transfer, Reducing Hops And Cloud Spend

- Levon Stepanian and Hochuen Wong tl;dr: DoorDash gained many benefits from its evolution from a monolithic application architecture to one based on microservices: the new architecture reduced the time required for development, testing, and deployment, and improved scalability and resiliency. However, DoorDash also observed an uptick in data transfer costs, which prompted the engineering team to investigate alternative ways to provide the same level of service more efficiently.

featured in #483


Data Quality Score: The Next Chapter Of Data Quality At Airbnb

- Clark Wright tl;dr: "With 1.4 billion cumulative guest arrivals as of year-end 2022, Airbnb’s growth pushed us to an inflection point where diminishing data quality began to hinder our data practitioners. Weekly metric reports were difficult to land on time. Seemingly basic metrics like “Active Listings” relied on a web of upstream dependencies. Conducting meaningful data work required significant institutional knowledge to overcome hidden caveats in our data." Clark discusses the implementation of a Data Quality Score.
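
To make the idea concrete, here is a minimal sketch of how a weighted data quality score might be computed. The dimensions and weights are illustrative assumptions for this sketch, not Airbnb's actual scoring model.

```python
# A minimal sketch of a weighted data quality score. The dimensions and
# weights below are illustrative assumptions, not Airbnb's actual model.
DIMENSION_WEIGHTS = {
    "accuracy": 0.35,      # is the data correct?
    "reliability": 0.25,   # does it land on time?
    "stewardship": 0.20,   # is it owned and documented?
    "usability": 0.20,     # is it easy to consume?
}

def data_quality_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores, each in [0, 100]."""
    return sum(w * dimension_scores[d] for d, w in DIMENSION_WEIGHTS.items())

print(data_quality_score(
    {"accuracy": 90, "reliability": 70, "stewardship": 80, "usability": 60}
))  # 77.0
```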

featured in #471


Clerk Webhooks: Data Sync with Convex

- Dev Agrawal tl;dr: “Composing an application out of multiple data sources can be challenging, and while Clerk’s Backend APIs work great for most use cases, Webhooks offer the next level of integration that enables you to take full advantage of your existing stack. This post will cover how to synchronize user data from Clerk into your own backend using Webhooks.”
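
As a rough illustration of the pattern the post covers, here is a minimal webhook receiver, assuming FastAPI and the `svix` Python package (Clerk signs its webhooks with Svix). The endpoint path, secret handling, and the `upsert_user()` helper are hypothetical.

```python
# Minimal sketch of a Clerk webhook receiver, assuming FastAPI and the
# `svix` package. Endpoint path, secret handling, and upsert_user() are
# illustrative assumptions, not Clerk's prescribed setup.
import os
from fastapi import FastAPI, Request, HTTPException
from svix.webhooks import Webhook, WebhookVerificationError

app = FastAPI()
WEBHOOK_SECRET = os.environ["CLERK_WEBHOOK_SECRET"]  # from the Clerk dashboard

def upsert_user(data: dict) -> None:
    """Hypothetical helper: write the Clerk user record to your own database."""
    ...

@app.post("/webhooks/clerk")
async def clerk_webhook(request: Request):
    body = await request.body()
    headers = {
        "svix-id": request.headers.get("svix-id", ""),
        "svix-timestamp": request.headers.get("svix-timestamp", ""),
        "svix-signature": request.headers.get("svix-signature", ""),
    }
    try:
        # Verify the signature before trusting the payload.
        event = Webhook(WEBHOOK_SECRET).verify(body, headers)
    except WebhookVerificationError:
        raise HTTPException(status_code=400, detail="invalid webhook signature")

    if event["type"] in ("user.created", "user.updated"):
        upsert_user(event["data"])
    return {"ok": True}
```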

featured in #470


The Ultimate Guide To Modernizing Your Data Import Solution

tl;dr: When should you invest in modernizing your tech stack to drive long-term success? Discover crucial business milestones that signal the need for a tech upgrade and learn how to evaluate alternative solutions.

featured in #465


From Big Data To Better Data: Ensuring Data Quality With Verity

- Michael McPhillips tl;dr: Michael emphasizes that "data quality is paramount for accurate insights," highlighting the challenge of ensuring data reliability. Michael introduces Lyft’s in-house data quality platform, Verity, whose comprehensive flow starts with the following steps: (1) Data Profiling: Incoming data is scrutinized for its structure, schema, and content, allowing Verity to identify potential anomalies and inconsistencies. (2) Customizable Rules Engine: Enables data experts to define data quality rules tailored to their unique needs, from data format validations to more intricate domain-specific checks. (3) Automated Quality Checks: Once the rules are set, they are applied to incoming data streams, scanning each data point for discrepancies.
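
Here is a minimal sketch of the customizable-rules idea described in step (2); this is not Lyft's actual Verity API, and the rule names and example fields are invented for illustration.

```python
# A minimal sketch of a customizable data quality rules engine — not
# Lyft's actual Verity API. Each rule is a named predicate applied to
# every incoming record; failures are collected as quality violations.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Rule:
    name: str
    check: Callable[[dict[str, Any]], bool]  # True = record passes

RULES = [
    Rule("ride_id is present", lambda r: bool(r.get("ride_id"))),
    Rule("fare is non-negative", lambda r: r.get("fare", 0) >= 0),
    Rule("city is a known region", lambda r: r.get("city") in {"SFO", "NYC", "SEA"}),
]

def run_quality_checks(records, rules=RULES):
    """Apply every rule to every record; return (record_index, rule_name) violations."""
    return [
        (i, rule.name)
        for i, record in enumerate(records)
        for rule in rules
        if not rule.check(record)
    ]

violations = run_quality_checks([
    {"ride_id": "r1", "fare": 12.5, "city": "SFO"},
    {"ride_id": "", "fare": -3.0, "city": "LAX"},  # fails all three rules
])
print(violations)
```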

featured in #457


Best Practices For Collecting And Querying Data From Multiple Sources

- Zoe Steinkamp tl;dr: In a data-centric era, efficiently collecting and querying data from diverse sources is paramount. Zoe emphasizes best practices in data collection, such as optimizing ingestion pipelines and advanced querying. With varied data streams like IoT and cloud computing, storing everything in a single database is outdated; instead, strategies like effective data modeling and understanding your data sources are vital. Tools like InfluxDB, a time series database, and Pandas, a Python library, facilitate data management and analysis. Leveraging multiple data sources optimizes cost, efficiency, and user experience.
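
As a small sketch of the InfluxDB + Pandas pairing mentioned above, the snippet below pulls a time series into a DataFrame using the `influxdb-client` package; the URL, token, bucket, and measurement names are placeholder assumptions.

```python
# A minimal sketch of querying InfluxDB into Pandas via `influxdb-client`.
# URL, token, org, bucket, and measurement names are placeholders.
from influxdb_client import InfluxDBClient

FLUX_QUERY = '''
from(bucket: "sensors")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "temperature")
'''

with InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org") as client:
    # query_data_frame() returns the result as a Pandas DataFrame
    # (multi-table results may come back as a list of frames).
    df = client.query_api().query_data_frame(FLUX_QUERY)

# From here, standard Pandas analysis applies — e.g. a 5-minute rolling mean.
df = df.set_index("_time").sort_index()
print(df["_value"].rolling("5min").mean().tail())
```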

featured in #449


You Don’t Have To Sacrifice Streaming Data Performance To Cut Cloud Costs

tl;dr: Redpanda is faster and more efficient than Apache Kafka… but how much faster exactly? We ran 200+ hours of benchmarks to find out how both platforms perform across various workloads and hardware configurations. Here’s our breakdown of how Redpanda achieves 10x the performance while cutting cloud spend by over $500k.

featured in #407