<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Data Engineering ACID]]></title><description><![CDATA[Every Friday, we deliver your weekend win: copy-paste tutorial, cost-optimisation technique, CFPs worth your pitch, and fresh ideas from the field. Stop surfing fluff.]]></description><link>https://newsletter.e6data.com</link><image><url>https://substackcdn.com/image/fetch/$s_!TZyV!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f89561c-1fae-4ea0-9c5e-568932e1c37d_1024x1024.png</url><title>Data Engineering ACID</title><link>https://newsletter.e6data.com</link></image><generator>Substack</generator><lastBuildDate>Tue, 19 May 2026 04:41:28 GMT</lastBuildDate><atom:link href="https://newsletter.e6data.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[e6data, Inc.]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[dataengineering.acid@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[dataengineering.acid@substack.com]]></itunes:email><itunes:name><![CDATA[Data Engineering ACID | e6data]]></itunes:name></itunes:owner><itunes:author><![CDATA[Data Engineering ACID | e6data]]></itunes:author><googleplay:owner><![CDATA[dataengineering.acid@substack.com]]></googleplay:owner><googleplay:email><![CDATA[dataengineering.acid@substack.com]]></googleplay:email><googleplay:author><![CDATA[Data Engineering ACID | e6data]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Semantic Layers Beat Prompts, Flink Goes Agentic, Flotilla's 18x Speed Jump, and Why Your Spark Jobs Struggle With 
Images]]></title><description><![CDATA[How AI is reshaping data infrastructure: from agentic stream processing to production-grade semantic layers and purpose-built multimodal engines.]]></description><link>https://newsletter.e6data.com/p/semantic-layers-beat-prompts-flink</link><guid isPermaLink="false">https://newsletter.e6data.com/p/semantic-layers-beat-prompts-flink</guid><dc:creator><![CDATA[Data Engineering ACID | e6data]]></dc:creator><pubDate>Mon, 13 Oct 2025 16:03:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!g-fo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff49af35f-6f46-4d80-856c-068903dcf647_2000x1200.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!g-fo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff49af35f-6f46-4d80-856c-068903dcf647_2000x1200.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g-fo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff49af35f-6f46-4d80-856c-068903dcf647_2000x1200.webp 424w, https://substackcdn.com/image/fetch/$s_!g-fo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff49af35f-6f46-4d80-856c-068903dcf647_2000x1200.webp 848w, https://substackcdn.com/image/fetch/$s_!g-fo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff49af35f-6f46-4d80-856c-068903dcf647_2000x1200.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!g-fo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff49af35f-6f46-4d80-856c-068903dcf647_2000x1200.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!g-fo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff49af35f-6f46-4d80-856c-068903dcf647_2000x1200.webp" width="1456" height="874" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f49af35f-6f46-4d80-856c-068903dcf647_2000x1200.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:874,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Dead Poets Society | Hyde Park Picture House (HPPH) Leeds&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Dead Poets Society | Hyde Park Picture House (HPPH) Leeds" title="Dead Poets Society | Hyde Park Picture House (HPPH) Leeds" srcset="https://substackcdn.com/image/fetch/$s_!g-fo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff49af35f-6f46-4d80-856c-068903dcf647_2000x1200.webp 424w, https://substackcdn.com/image/fetch/$s_!g-fo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff49af35f-6f46-4d80-856c-068903dcf647_2000x1200.webp 848w, 
https://substackcdn.com/image/fetch/$s_!g-fo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff49af35f-6f46-4d80-856c-068903dcf647_2000x1200.webp 1272w, https://substackcdn.com/image/fetch/$s_!g-fo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff49af35f-6f46-4d80-856c-068903dcf647_2000x1200.webp 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Image Credits: Google</figcaption></figure></div><h3>&#128269; <strong>&#8220;Show me your GitHub&#8221; - 
The Quest for Real Data Engineering Projects</strong></h3><p>A semi-senior data engineer asked the r/dataengineering community to share <strong>production-grade GitHub repos</strong>, hoping to see how senior engineers structure projects, choose tools, and design architectures. The responses revealed a fundamental problem: most real data engineering work is proprietary.</p><ul><li><p>The standout contribution: <a href="https://www.reddit.com/r/dataengineering/comments/1nupeh2/could_senior_data_engineers_share_examples_of/">tobymao&#8217;s sqlglot</a>, a SQL parser and transpiler that handles dialect translation across 20+ SQL variants (it&#8217;s what powers dialect conversion in dbt, Airflow, and a bunch of other tools you probably use daily)</p></li></ul><ul><li><p>The technical gap is real: junior engineers are learning from tutorials with toy datasets, not systems that handle petabyte-scale data with complex SLA requirements, data quality checks, and orchestration patterns</p></li></ul><ul><li><p>The consensus: strong fundamentals matter more than tool knowledge (CI/CD pipelines, proper testing, schema management, idempotent workflows: the infrastructure that keeps data pipelines from becoming unmaintainable messes)</p></li></ul><p></p><h3>&#9889; <strong>When SQL Isn&#8217;t Enough (And When It Absolutely Is)</strong></h3><p>A <a href="https://www.reddit.com/r/dataengineering/comments/1nveze1/why_spark_and_many_other_tools_when_sql_can_do/">Reddit thread</a> kicked off with a pointed question: if <strong>Snowflake&#8217;s SQL handles complex transformations (CTEs, window functions, UDFs), why do we need Spark, Airflow, and dbt?</strong> The answers cut through a common architectural confusion.</p><ul><li><p><strong>Spark vs. 
SQL</strong> is a category error: Spark is a distributed compute engine that happens to support SQL (via Spark SQL), while Snowflake is a cloud data warehouse where SQL is the primary interface (you&#8217;re comparing an execution engine to a storage-compute platform)</p></li></ul><ul><li><p>The <strong>cost-performance trade-off</strong>: Snowflake charges for compute time, so transforming 10TB of raw logs directly in Snowflake can be 5-10x more expensive than preprocessing with Spark on cheaper compute, then loading cleaned data</p></li></ul><ul><li><p><strong>Where Snowflake SQL breaks down</strong>: when you need to read from Kafka streams, hit external APIs mid-transformation, or apply complex ML models that aren&#8217;t SQL-expressible (Snowpark helps, but you&#8217;re still constrained by what the warehouse supports)</p></li></ul><p></p><h3>&#129302; <strong>Flink Gets Agentic: Real-Time AI That Actually Responds to Events</strong></h3><p><a href="https://medium.com/@Joannahe/flink-agents-an-event-driven-ai-agent-framework-based-on-apache-flink-45688be46dad">Flink Agents</a> is a new framework from the <strong>Apache Flink community</strong> that <strong>combines stateful stream processing with LLM-based agents</strong>. 
If you&#8217;ve been trying to build <strong>AI systems that react to real-time event streams (not batch data)</strong>, this architecture is worth understanding.</p><ul><li><p>The technical setup: Flink handles event ingestion, stateful processing, and exactly-once semantics, while AI agents (LLM-powered) make decisions based on streaming context (think fraud detection that adapts its rules based on emerging patterns, not static thresholds)</p></li></ul><ul><li><p>Why this matters: traditional AI data tools work in request-response mode with static data; <strong>Flink Agents can maintain conversation state across millions of concurrent event streams with sub-second latency</strong> (Flink&#8217;s distributed snapshotting gives you fault tolerance without losing agent state)</p></li></ul><ul><li><p>Real use cases emerging: real-time content moderation for live streams that adapts to context, intelligent alerting systems that reduce false positives by understanding event sequences, automated trading systems that process market data and execute decisions in the same pipeline</p><p></p></li></ul><h3>&#129504; <strong>Text-to-SQL Is Just the Appetizer: Building Production AI Data Analysts</strong></h3><p><a href="https://www.pedronasc.com/articles/lessons-building-ai-data-analyst">Pedro Nascimento&#8217;s deep-dive</a> on building Findly&#8217;s AI data analyst is packed with architectural lessons from taking text-to-SQL from demo to production. 
His core argument: <strong>the SQL generation is 20% of the problem; the other 80% is context, validation, and multi-step reasoning.</strong></p><ul><li><p><strong>Multi-agent architecture</strong>: separate agents for planning (decompose &#8220;analyze cohort retention&#8221; into specific steps), SQL generation (with compile-time validation), Python execution (for post-query transforms like statistical tests), and result synthesis (the system runs 5-7 LLM calls per complex query, not one)</p></li></ul><ul><li><p><strong>Semantic layer as context</strong>: they use <a href="https://www.pedronasc.com/articles/lessons-building-ai-data-analyst">Malloy</a> to define metrics, joins, and business logic in code, then compile it to optimized SQL (this gives you type-checking and prevents the LLM from hallucinating table relationships - it&#8217;s working with a known schema graph)</p></li></ul><ul><li><p><strong>RAG as a recommendation pipeline</strong>: keyword search (BM25) for exact term matching &#8594; embedding search (dense retrieval) for semantic similarity &#8594; fine-tuned reranker (instruction-following model) to pick the top-k most relevant schema fragments (they found off-the-shelf rerankers underperform by 15-20% without domain fine-tuning)</p></li></ul><ul><li><p><strong>Latency architecture</strong>: fast models (GPT-4o mini) for planning and simple queries, reasoning models (Claude Sonnet, Gemini 2.5 Pro) for complex SQL generation, aggressive caching at every layer (they hit 200-300ms for cached queries, 3-5s for complex uncached analysis)</p></li></ul><p></p><h3>&#128640; <strong>Flotilla: When Spark Can&#8217;t Handle Your Multimodal Data (18x Faster)</strong></h3><p>Daft&#8217;s new <a href="https://www.daft.ai/blog/introducing-flotilla-simplifying-multimodal-data-processing-at-scale">distributed execution engine Flotilla</a> is purpose-built for <strong>workloads that mix structured data with images, videos, PDFs, and audio</strong>. 
If you&#8217;ve tried processing terabytes of images with Spark and watched it crawl, the architectural choices here are instructive.</p><ul><li><p>Performance claims: <strong>18x faster than Spark and Ray Data</strong> on multimodal benchmarks (they tested on image embedding generation across 10TB+ datasets - the kind of preprocessing needed for training vision models or RAG over images)</p></li></ul><ul><li><p>The technical difference: Spark&#8217;s task scheduler assumes uniform task duration and homogeneous data, which breaks down when some tasks process 100KB images and others handle 50MB videos (Flotilla does <strong>content-aware scheduling and dynamic resource allocation based on actual data characteristics</strong>)</p></li></ul><ul><li><p>Where this matters: ML preprocessing pipelines that need to resize/normalize millions of images, extract frames from videos, run inference models (CLIP embeddings, OCR), and join results with structured metadata (doing this efficiently requires understanding data size distribution, not just partition count)</p></li></ul><ul><li><p><strong>No manual tuning</strong>: Spark typically needs careful partition sizing, memory configuration, and shuffle tuning for multimodal data; Flotilla profiles data characteristics and auto-tunes (which matters when your data engineer doesn&#8217;t want to become a Spark performance expert)</p><p></p></li></ul><h3><strong>&#128161; e6data AI Analyst Early Access</strong></h3><p>We&#8217;re launching <strong><a href="https://forms.gle/5bWwxmyKvRKTqM2o6">early access to e6data AI Analyst</a></strong>, and honestly, it&#8217;s about time someone built a data querying interface that actually understands how humans think about data. 
Ask questions exactly as they occur to you, get contextual multi-turn conversations, and for once, actually enjoy the follow-up process.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;115d3cdb-75ec-4a3d-8ce3-e777a9bf502e&quot;,&quot;duration&quot;:null}"></div><ul><li><p><strong>95%+ accuracy on enterprise workloads</strong>: Because &#8220;pretty good&#8221; genuinely isn&#8217;t good enough when you&#8217;re dealing with 1000+ tables and the kind of complex relationships that make SQL joins look like abstract art</p></li></ul><ul><li><p><strong>Multi-turn conversations that make sense</strong>: Your data can finally talk back in a way that doesn&#8217;t require translating human curiosity into rigid query syntax</p></li></ul><ul><li><p><strong>Zero migration headaches</strong>: Works with your existing data platform because we know exactly how much &#8220;fun&#8221; those migration projects actually are</p></li></ul><p><strong>&#8594; Get early access <a href="https://forms.gle/5bWwxmyKvRKTqM2o6">here</a></strong></p><p class="cta-caption">Thanks for reading Data Engineering ACID! Please like, follow, and share if you liked it.</p>
<p></p><p></p>]]></content:encoded></item><item><title><![CDATA[Spark Testing Pain, GPU Reliability Reality, Agent Speculation, and the Art of Profiling Pre-optimizing]]></title><description><![CDATA[From 16-byte string optimizations to AI agent workloads: how data engineering is evolving beyond human query patterns, plus the real performance wins that actually matter.]]></description><link>https://newsletter.e6data.com/p/spark-testing-pain-gpu-reliability</link><guid isPermaLink="false">https://newsletter.e6data.com/p/spark-testing-pain-gpu-reliability</guid><dc:creator><![CDATA[Data Engineering ACID | e6data]]></dc:creator><pubDate>Fri, 26 Sep 2025 12:30:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FKVv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48c732b9-7589-4e1e-8dfa-861042948926_1600x900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FKVv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48c732b9-7589-4e1e-8dfa-861042948926_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FKVv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48c732b9-7589-4e1e-8dfa-861042948926_1600x900.png 424w, 
https://substackcdn.com/image/fetch/$s_!FKVv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48c732b9-7589-4e1e-8dfa-861042948926_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!FKVv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48c732b9-7589-4e1e-8dfa-861042948926_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!FKVv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48c732b9-7589-4e1e-8dfa-861042948926_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FKVv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48c732b9-7589-4e1e-8dfa-861042948926_1600x900.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48c732b9-7589-4e1e-8dfa-861042948926_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Karate Kid' y el renacimiento de una saga - AXN Espa&#241;a&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Karate Kid' y el renacimiento de una saga - AXN Espa&#241;a" title="Karate Kid' y el renacimiento de una saga - AXN Espa&#241;a" 
srcset="https://substackcdn.com/image/fetch/$s_!FKVv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48c732b9-7589-4e1e-8dfa-861042948926_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!FKVv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48c732b9-7589-4e1e-8dfa-861042948926_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!FKVv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48c732b9-7589-4e1e-8dfa-861042948926_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!FKVv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48c732b9-7589-4e1e-8dfa-861042948926_1600x900.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a>
<figcaption class="image-caption">Image Source: <a href="https://www.axn.es/blog/karate-kid-renacimiento-de-una-saga">Google</a></figcaption></figure></div><h3>&#128295; <strong>German Strings: The 16-Byte Memory Trick That&#8217;s Everywhere Now</strong></h3><p>Remember when string processing was just something you accepted would be slow? The data engineering world has quietly adopted an elegant optimization called <strong><a href="https://www.e6data.com/blog/german-strings-faster-analytics">German Strings</a></strong>, and it&#8217;s delivering <strong>3x faster string comparisons across analytics engines</strong>. The core insight is beautifully simple: pack everything into a fixed 16-byte struct with length, inline prefixes, and buffer references instead of pointer-heavy layouts.</p><ul><li><p><strong>Cache locality wins</strong>: Fixed 16-byte structs mean your CPU can load multiple string headers per cache line (compared to scattered std::string objects at ~24 bytes each)</p></li></ul><ul><li><p><strong>Prefix shortcuts</strong>: About 95% of equality checks in practice short-circuit on the 4-byte inline prefix before touching the full payload</p></li></ul><ul><li><p><strong>Zero-copy operations</strong>: Substrings become offset adjustments, not memory copies (which is huge for window functions and text manipulation)</p></li></ul><p>What strikes us is how this &#8220;small&#8221; change ripples through everything from Parquet ingestion to dictionary encoding.</p><p></p><h3>&#129302; <strong>AI Agents Are Breaking Our Data Systems (And That&#8217;s Actually Good)</strong></h3><p>LLM agents query data 
completely differently than humans. While we write careful, targeted SQL, agents perform what researchers call <strong><a href="https://arxiv.org/pdf/2509.00997">&#8220;agentic speculation&#8221;</a></strong>: high-throughput exploratory querying where they might run dozens of variations to figure out what they actually need.</p><ul><li><p><strong>Volume explosion</strong>: Traditional systems optimized for human query patterns (batch, predictable) suddenly face agent workloads that are continuous and exploratory</p></li></ul><ul><li><p><strong>New interface needs</strong>: Agents need different abstractions than SQL: think &#8220;give me data to help with X&#8221; rather than &#8220;SELECT specific_columns FROM known_table&#8221;</p></li></ul><ul><li><p><strong>Steerability requirements</strong>: Unlike humans, who adapt their queries, agents need systems that can guide them toward efficient query patterns</p></li></ul><p>This feels like one of those paradigm shifts where we&#8217;ll look back and say <strong>&#8220;of course data systems needed to be agent-first.&#8221;</strong> The <strong><a href="https://arxiv.org/pdf/2509.00997">paper</a></strong> hints at fascinating research directions around query interfaces designed for AI reasoning rather than human syntax.</p><p></p><h3>&#129514; <strong>Why Spark Unit Testing Feels Like Pulling Teeth (And How Pybujia Might Fix It)</strong></h3><p>That <a href="https://www.reddit.com/r/dataengineering/comments/1nnhtxt/why_dont_data_engineers_unit_test_their_spark_jobs/">Reddit thread</a> about Spark testing hit close to home, didn&#8217;t it? 
The brutal truth is that <strong>creating DataFrame fixtures is genuinely painful</strong>: you end up with more boilerplate than actual test logic, and debugging multi-table joins in tests becomes its own engineering project.</p><ul><li><p><strong>Fixture fatigue</strong>: Setting up realistic DataFrames for testing often takes longer than writing the actual Spark job (which explains why so many teams skip it)</p></li></ul><ul><li><p><strong>Debug complexity</strong>: When a test fails on a complex transformation, good luck figuring out which of your 47 fixture setup lines caused the issue</p></li></ul><ul><li><p><strong>Markdown magic</strong>: The Pybujia approach of defining table fixtures in Markdown tables is surprisingly elegant: readable test data that doesn&#8217;t require DataFrame constructor gymnastics</p></li></ul><p>What&#8217;s interesting is how this mirrors the broader testing philosophy debate: do we mock everything or test against realistic data? For Spark jobs that inherently deal with data shape and volume, the realistic fixture approach probably wins.</p><p></p><h3>&#9889; <strong>GPU Training Reality Check: H100 vs GB200 Performance vs Reliability</strong></h3><p>SemiAnalysis dropped some sobering hardware truths about the <strong><a href="https://semianalysis.com/2025/08/20/h100-vs-gb200-nvl72-training-benchmarks/">GB200 NVL72 versus H100 comparison</a></strong>. 
Yes, GB200 shows impressive performance-per-dollar on paper, but reliability issues in large-scale training runs are creating real operational headaches.</p><ul><li><p><strong>Power efficiency gains</strong>: GB200 NVL72 delivers meaningful improvements in cost-per-token metrics, especially for sustained training workloads</p></li></ul><ul><li><p><strong>Reliability tax</strong>: The newer architecture faces stability challenges that can crater your training run after hours or days of progress (ouch)</p></li></ul><ul><li><p><strong>Ecosystem maturity</strong>: Software stack optimization for H100 is simply more mature. Sometimes the &#8220;boring&#8221; choice is the right infrastructure choice</p></li></ul><p>This reinforces something we&#8217;ve seen repeatedly: <strong>breakthrough hardware performance often comes with operational complexity that isn&#8217;t obvious in benchmarks.</strong> For production ML training, reliability might matter more than peak performance when you&#8217;re thinking about wall-clock time to trained model.</p><p></p><h3>&#128161; <strong>Real-World Performance Wins That Actually Moved the Needle</strong></h3><p>This <a href="https://www.reddit.com/r/dataengineering/comments/1mzjnms/what_reallife_changes_have_you_made_that_gave_a/">Reddit discussion</a> about performance improvements in practice revealed some gems. 
The <strong>most upvoted responses weren&#8217;t about fancy algorithms; they were about fundamentals</strong> like proper partitioning strategies, eliminating unnecessary shuffles, and (shocker) actually profiling before optimizing.</p><ul><li><p><strong>Partitioning precision</strong>: Moving from default hash partitioning to deliberate partition strategies based on actual query patterns (especially for time-series data)</p></li></ul><ul><li><p><strong>Shuffle surgery</strong>: Identifying and eliminating unnecessary shuffles through better join ordering and data locality planning</p></li></ul><ul><li><p><strong>Memory reality checks</strong>: Right-sizing executor memory and actually monitoring GC pressure instead of guessing (which apparently many teams skip)</p></li></ul><p>Multiple engineers mentioned that their biggest wins came from profiling tools revealing bottlenecks they hadn&#8217;t suspected, a classic reminder that intuition about performance is often wrong.</p><p></p><h3><strong>&#128161; e6data AI Analyst Early Access</strong></h3><p>We&#8217;re launching <strong><a href="https://forms.gle/5bWwxmyKvRKTqM2o6">early access to e6data AI Analyst</a></strong>, and honestly, it&#8217;s about time someone built a data querying interface that actually understands how humans think about data. 
Ask questions exactly as they occur to you, get contextual multi-turn conversations, and for once, actually enjoy the follow-up process.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;fbcf826f-033f-4075-8e14-4178dcce74e4&quot;,&quot;duration&quot;:null}"></div><ul><li><p><strong>95%+ accuracy on enterprise workloads</strong>: Because &#8220;pretty good&#8221; genuinely isn&#8217;t good enough when you&#8217;re dealing with 1000+ tables and the kind of complex relationships that make SQL joins look like abstract art</p></li></ul><ul><li><p><strong>Multi-turn conversations that make sense</strong>: Your data can finally talk back in a way that doesn&#8217;t require translating human curiosity into rigid query syntax</p></li></ul><ul><li><p><strong>Zero migration headaches</strong>: Works with your existing data platform because we know exactly how much &#8220;fun&#8221; those migration projects actually are</p></li></ul><p><strong>&#8594; Get early access <a href="https://forms.gle/5bWwxmyKvRKTqM2o6">here</a></strong></p><p></p><h3><strong>Community &amp; Events:</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qi4i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qi4i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 424w, 
https://substackcdn.com/image/fetch/$s_!qi4i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 848w, https://substackcdn.com/image/fetch/$s_!qi4i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 1272w, https://substackcdn.com/image/fetch/$s_!qi4i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qi4i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Streaming + Lakehouse fusion &quot;,&quot;title&quot;:&quot;Streaming + Lakehouse fusion &quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Streaming + Lakehouse fusion " title="Streaming + Lakehouse fusion " srcset="https://substackcdn.com/image/fetch/$s_!qi4i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 424w, 
https://substackcdn.com/image/fetch/$s_!qi4i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 848w, https://substackcdn.com/image/fetch/$s_!qi4i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 1272w, https://substackcdn.com/image/fetch/$s_!qi4i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We are hosting the next <strong><a href="https://www.meetup.com/bangalore-streams/events/310884955/">Lakehouse Days with Bengaluru Streams</a></strong> on data streaming, lakehouse architecture, and the future of real-time analytics this <strong>27th September.</strong></p><p>AND we&#8217;re <strong>hunting for data engineers</strong> who get excited about AI and aren&#8217;t afraid to build the future.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.e6data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Data Engineering ACID! Please like, follow, and share if you liked it.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[XTX's 500PB Open Source Lessons, Enterprise RAG Reality, Cachey, and the Art of Performance Engineering]]></title><description><![CDATA[When algorithmic trading meets open source, RAG systems hit enterprise reality, and custom optimization beats general-purpose excellence.]]></description><link>https://newsletter.e6data.com/p/xtxs-500pb-open-source-lessons-enterprise</link><guid isPermaLink="false">https://newsletter.e6data.com/p/xtxs-500pb-open-source-lessons-enterprise</guid><dc:creator><![CDATA[Data Engineering ACID | e6data]]></dc:creator><pubDate>Fri, 19 Sep 2025 13:03:37 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!kI-B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c150a9a-6cf9-463e-8400-857a4921c523_800x533.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kI-B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c150a9a-6cf9-463e-8400-857a4921c523_800x533.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kI-B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c150a9a-6cf9-463e-8400-857a4921c523_800x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kI-B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c150a9a-6cf9-463e-8400-857a4921c523_800x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kI-B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c150a9a-6cf9-463e-8400-857a4921c523_800x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kI-B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c150a9a-6cf9-463e-8400-857a4921c523_800x533.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kI-B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c150a9a-6cf9-463e-8400-857a4921c523_800x533.jpeg" width="800" height="533" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8c150a9a-6cf9-463e-8400-857a4921c523_800x533.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;SLUMDOG MILLIONAIRE, Dev Patel, 2008. &#169;Fox Searchlight/courtesy Everett Collection&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="SLUMDOG MILLIONAIRE, Dev Patel, 2008. &#169;Fox Searchlight/courtesy Everett Collection" title="SLUMDOG MILLIONAIRE, Dev Patel, 2008. &#169;Fox Searchlight/courtesy Everett Collection" srcset="https://substackcdn.com/image/fetch/$s_!kI-B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c150a9a-6cf9-463e-8400-857a4921c523_800x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kI-B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c150a9a-6cf9-463e-8400-857a4921c523_800x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kI-B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c150a9a-6cf9-463e-8400-857a4921c523_800x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kI-B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c150a9a-6cf9-463e-8400-857a4921c523_800x533.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: <a href="https://www.rollingstone.com/tv-movies/tv-movie-lists/oscar-best-picture-winners-21st-century-ranked-1234685153/million-dollar-baby-2004-2-1234685178/">Rollingstone</a></figcaption></figure></div><h3>&#127959;&#65039; <strong>XTX Markets Just Open-Sourced Their Exabyte-Scale Filesystem (And It's Actually Interesting)</strong></h3><p>When an algorithmic trading firm tells you they've been running <strong>500PB across 40,000 drives "without losing a single byte"</strong> - well, that gets your attention. 
XTX Markets just <strong><a href="https://www.xtxmarkets.com/tech/2025-ternfs/">open-sourced TernFS</a></strong>, their distributed filesystem, and reading through their technical deep-dive is like getting a masterclass in what large-scale storage actually looks like in practice.</p><ul><li><p><strong>The metadata sharding approach is clever</strong>: 256 logical shards from day one means no rebalancing nightmares when you scale (which is exactly the kind of forward-thinking that separates real engineering from "we'll figure it out later")</p></li></ul><ul><li><p><strong>Immutable files with snapshot protection</strong>: Once written, files can't be changed, but you get automatic protection against the dreaded rm -rf moments (truth!)</p></li></ul><ul><li><p><strong>Purpose-built for substantial files</strong>: Median size of 2MB means this isn't trying to be everything to everyone - it's optimized for the data engineering reality where you're actually processing meaningful datasets</p></li></ul><p>A good way to think about this is that TernFS acknowledges the brutal economics of large-scale data storage and builds solutions around those constraints rather than pretending they don't exist. 
The firm started with "a couple of desktops and an NFS server" and ended up needing to store hundreds of petabytes - which is probably a more relatable scaling journey than most of us would like to admit.</p><div><hr></div><h3>&#128202; <strong>Databricks Shares Their Database Reliability Playbook (The Honest Version)</strong></h3><p>Database reliability conversations usually involve a lot of uptime percentages and theoretical guarantees, but this <a href="https://www.databricks.com/blog/databricks-databricks-scaling-database-reliability?utm_source=substack&amp;utm_medium=email">Databricks piece</a> cuts through that to talk about what actually works when your database is the nervous system of someone's business operations (which is terrifying when you think about it).</p><ul><li><p><strong>Monitoring that actually helps</strong>: They focus on capturing the full context of failures, not just error counts - because knowing something broke is utterly useless without understanding why it broke</p></li></ul><ul><li><p><strong>Automated recovery with human judgment</strong>: Smart enough to handle the obvious cases automatically, but wise enough to escalate the weird edge cases to humans who can actually think through novel problems</p></li></ul><ul><li><p><strong>Making reliability a cultural priority</strong>: Treating reliability as a first-class engineering concern rather than something you retrofit later (which should be obvious but apparently isn't)</p></li></ul><p>What's interesting here is the implicit acknowledgment that reliability engineering isn't fundamentally a technical problem - it's an organizational one where technology is just the implementation mechanism. 
You can have perfect monitoring and automated failover, but if your team culture doesn't prioritize reliability, you'll still find creative ways to break things.</p><div><hr></div><h3>&#9889; <strong>Cachey: A Read-Through Cache That Actually Understands Object Storage</strong></h3><p>Object storage is fantastic until you need to read the same blob repeatedly, at which point you remember why caching was invented in the first place. <strong><a href="https://github.com/s2-streamstore/cachey">Cachey</a></strong> is a new open-source read-through <strong>cache designed specifically for S3-compatible storage</strong>, and it has that refreshing quality of solving exactly one problem really well.</p><ul><li><p><strong>Hybrid memory-disk caching using the Foyer library</strong>: Intelligent about what goes where based on access patterns, so your frequently accessed data stays fast without your cache becoming a memory hog</p></li></ul><ul><li><p><strong>S3-compatible everything</strong>: Works with any S3-like storage and includes a /fetch API for pre-signed URLs (which is exactly how you'd want to integrate this into existing workflows)</p></li></ul><ul><li><p><strong>Built for immutable blobs</strong>: Acknowledges that most object storage use cases involve data that doesn't change, so the cache can be much more aggressive about retention</p></li></ul><p>This falls into that category of tools that make you think "why didn't this exist already?" 
The answer, of course, is that building good caching is harder than it looks, but when someone gets it right, it feels obvious in retrospect.</p><div><hr></div><h3>&#129302; <strong>Building RAG Systems at Enterprise Scale: The Brutal Realities Nobody Discusses</strong></h3><p>This <a href="https://www.reddit.com/r/dataengineering/comments/1nj27gt/building_rag_systems_at_enterprise_scale_our/">Reddit thread</a> is one of those refreshingly honest discussions that cuts through the RAG evangelism to talk about what actually happens when you try to implement <strong>RAG in banking, pharma, and legal environments</strong>. Spoiler: it's considerably messier than the conference talks suggest.</p><ul><li><p><strong>OCR noise is the silent productivity killer</strong>: Real enterprise documents aren't clean markdown files - they're scanned PDFs with inconsistent formatting that systematically destroys your chunking strategies</p></li></ul><ul><li><p><strong>Metadata becomes mission-critical</strong>: Enterprise documents have complex relationships and hierarchical structures that simple vector similarity completely misses (which explains why your retrieval quality is mysteriously terrible)</p></li></ul><ul><li><p><strong>Domain-specific chunking strategies are non-negotiable</strong>: Generic text splitting fails spectacularly when dealing with legal contracts or financial reports that have meaningful structural boundaries</p></li></ul><p>The honest truth about RAG in production environments is that roughly 80% of your engineering effort goes into data quality and preprocessing work, not the sophisticated ML components that get all the conference attention. 
It's unglamorous work, but it's the difference between a demo and a system that actually works.</p><div><hr></div><h3>&#128295; <strong>Open-Source Ingestion Tools in 2025: What's Actually Working According to Practitioners</strong></h3><p>This <a href="https://www.reddit.com/r/dataengineering/comments/1ng9w5e/whats_your_opensource_ingest_tool_these_days/">community discussion</a> reveals fascinating patterns about <strong>which open-source ingestion tools are winning in practice</strong>. The responses tell a story about how the data tooling landscape has matured (and where it definitely hasn't).</p><ul><li><p><strong>dltHub and dlt are gaining serious momentum</strong>: Simple, well-documented tools are consistently beating feature-heavy platforms because most teams just want reliable data movement without vendor complications</p></li></ul><ul><li><p><strong>DuckDB keeps appearing in unexpected contexts</strong>: Its combination of performance and operational simplicity makes it a compelling choice for transform-heavy ingestion workflows</p></li></ul><ul><li><p><strong>Direct database integration remains surprisingly popular</strong>: Many teams are bypassing specialized ingestion tools entirely and going straight to PostgreSQL (which says something interesting about complexity creep in data tooling)</p></li></ul><p>The pattern emerging here is that teams are gravitating toward boring, reliable solutions over exciting, complex ones - which is probably a healthy sign that the industry is maturing beyond the "let's rebuild everything" phase.</p><div><hr></div><h3>&#128161; <strong>e6data AI Analyst Early Access</strong></h3><p>We're launching <strong><a href="https://forms.gle/5bWwxmyKvRKTqM2o6">early access to e6data AI Analyst</a></strong>, and honestly, it's about time someone built a data querying interface that actually understands how humans think about data. 
Ask questions exactly as they occur to you, get contextual multi-turn conversations, and for once, actually enjoy the follow-up process.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;4a1b5caf-05e4-472d-af3b-1a3504b1ca75&quot;,&quot;duration&quot;:null}"></div><ul><li><p><strong>95%+ accuracy on enterprise workloads</strong>: Because "pretty good" genuinely isn't good enough when you're dealing with 1000+ tables and the kind of complex relationships that make SQL joins look like abstract art</p></li></ul><ul><li><p><strong>Multi-turn conversations that make sense</strong>: Your data can finally talk back in a way that doesn't require translating human curiosity into rigid query syntax</p></li></ul><ul><li><p><strong>Zero migration headaches</strong>: Works with your existing data platform because we know exactly how much "fun" those migration projects actually are</p></li></ul><p><strong>&#8594; Get early access <a href="https://forms.gle/5bWwxmyKvRKTqM2o6">here</a></strong></p><p></p><h3><strong>Community &amp; Events:</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qi4i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qi4i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 424w, https://substackcdn.com/image/fetch/$s_!qi4i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 
848w, https://substackcdn.com/image/fetch/$s_!qi4i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 1272w, https://substackcdn.com/image/fetch/$s_!qi4i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qi4i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Streaming + Lakehouse fusion &quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Streaming + Lakehouse fusion " title="Streaming + Lakehouse fusion " srcset="https://substackcdn.com/image/fetch/$s_!qi4i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 424w, https://substackcdn.com/image/fetch/$s_!qi4i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 848w, 
https://substackcdn.com/image/fetch/$s_!qi4i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 1272w, https://substackcdn.com/image/fetch/$s_!qi4i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ol><li><p>The team published a new technical read on 
<strong><a href="https://www.e6data.com/blog/german-strings-faster-analytics">German Strings: The 16-Byte Secret to Faster Analytics</a></strong>!</p></li><li><p>We are hosting the next <strong><a href="https://www.meetup.com/bangalore-streams/events/310884955/">Lakehouse Days with Bengaluru Streams</a></strong> on data streaming, lakehouse architecture, and the future of real-time analytics this <strong>27th September.</strong></p></li><li><p>You'll also find us at: <strong><a href="https://www.bigdataldn.com/en-gb.html">Big Data London</a></strong> (24-25 Sep, 2025)</p></li></ol><p>AND we're <strong>hunting for data engineers</strong> who get excited about AI and aren't afraid to build the future.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.e6data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Data Engineering ACID! 
Please like, follow, and share if you liked it.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Rust's Perf Survey'25, AI eating code, Apache Wayang, and Data Engineering's Wall Street Moment]]></title><description><![CDATA[When performance matters more than preferences: the evolving reality of data engineering work with AI.]]></description><link>https://newsletter.e6data.com/p/rusts-perf-survey25-ai-eating-code</link><guid isPermaLink="false">https://newsletter.e6data.com/p/rusts-perf-survey25-ai-eating-code</guid><dc:creator><![CDATA[Data Engineering ACID | e6data]]></dc:creator><pubDate>Fri, 12 Sep 2025 12:59:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!OERv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3f612ed-da75-4af3-a00b-0a4dc2d47a11_1440x810.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OERv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3f612ed-da75-4af3-a00b-0a4dc2d47a11_1440x810.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OERv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3f612ed-da75-4af3-a00b-0a4dc2d47a11_1440x810.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!OERv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3f612ed-da75-4af3-a00b-0a4dc2d47a11_1440x810.jpeg 848w, https://substackcdn.com/image/fetch/$s_!OERv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3f612ed-da75-4af3-a00b-0a4dc2d47a11_1440x810.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!OERv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3f612ed-da75-4af3-a00b-0a4dc2d47a11_1440x810.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OERv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3f612ed-da75-4af3-a00b-0a4dc2d47a11_1440x810.jpeg" width="1440" height="810" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f3f612ed-da75-4af3-a00b-0a4dc2d47a11_1440x810.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:810,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Moneyball Review | Movie - Empire&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Moneyball Review | Movie - Empire" title="Moneyball Review | Movie - Empire" srcset="https://substackcdn.com/image/fetch/$s_!OERv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3f612ed-da75-4af3-a00b-0a4dc2d47a11_1440x810.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!OERv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3f612ed-da75-4af3-a00b-0a4dc2d47a11_1440x810.jpeg 848w, https://substackcdn.com/image/fetch/$s_!OERv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3f612ed-da75-4af3-a00b-0a4dc2d47a11_1440x810.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!OERv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3f612ed-da75-4af3-a00b-0a4dc2d47a11_1440x810.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image credits: <a href="https://www.google.com/url?sa=i&amp;url=https%3A%2F%2Fwww.empireonline.com%2Fmovies%2Freviews%2Fmoneyball-review%2F&amp;psig=AOvVaw3Ak9Ac5nztMiWS6anO31CT&amp;ust=1757674573974000&amp;source=images&amp;cd=vfe&amp;opi=89978449&amp;ved=0CBUQjRxqFwoTCMDRj5rG0I8DFQAAAAAdAAAAABAM">Google</a></figcaption></figure></div><h3>&#128640; <strong>Rust's Build Performance Reality Check: 3,700 Developers Weigh In</strong></h3><p>The <a href="https://blog.rust-lang.org/2025/09/10/rust-compiler-performance-survey-2025-results/">Rust community</a> just dropped some serious insights about compile times, and spoiler alert: it's complicated. The 2025 Compiler Performance Survey pulled in over <strong>3,700 responses</strong> with an <strong>average satisfaction rating of 6/10.</strong></p><ul><li><p><strong>Incremental rebuilds are the biggest pain point</strong> - 55% of developers wait more than <strong>10 seconds</strong> for rebuilds (ouch!)</p></li></ul><ul><li><p><strong>Workspace dependencies create unnecessary cascade rebuilds</strong> - change one crate, recompile everything downstream</p></li></ul><ul><li><p><strong>Linking is always from scratch</strong> - no incremental magic here, but LLD linker adoption is coming to help</p></li></ul><p>The fascinating part? Build experience varies wildly across workflows. Some developers love Rust's performance compared to C++, while others point enviously at Go's speed. 
What's clear is that optimizing for different workflows requires completely different solutions (which explains why this is so hard to solve universally).</p><p></p><h3>&#9889; <strong>From 8 Days to 90 Minutes: The 75GB CSV Streaming Success Story</strong></h3><p>A <a href="https://www.reddit.com/r/dataengineering/comments/1ncs01c/how_i_streamed_a_75gb_csv_into_sql_without/">data engineer just shared their journey</a> of ingesting a massive 75GB CSV into SQL Server without melting their laptop.</p><ul><li><p><strong>Java's InputStream + BufferedReader combo proved to be the hero</strong> - it reads the file as a stream instead of loading everything into memory</p></li></ul><ul><li><p><strong>Batching and parallel ingestion techniques</strong> cut processing time from 8 days to 90 minutes per file</p></li></ul><ul><li><p><strong>Tool selection matters more than you think</strong> - sometimes the "data science" tools aren't the right hammer for every nail</p></li></ul><p>This reinforces something we see repeatedly: when dealing with truly large datasets, stepping outside your comfort zone language-wise can yield dramatic performance improvements (and can even be fun).</p><p></p><h3>&#129302; <strong>"70% of My Workload is All Used by AI" - The New Data Engineering Reality</strong></h3><p>The <strong><a href="https://www.reddit.com/r/dataengineering/comments/1nd2r08/70_of_my_workload_is_all_used_by_ai/">r/dataengineering community</a></strong> is buzzing with a confession that hits close to home: most data engineering work is now AI-driven. 
This isn't just about building ML pipelines anymore.</p><ul><li><p><strong>AI systems are becoming the primary consumers of data infrastructure</strong> - not just dashboards and reports</p></li></ul><ul><li><p><strong>Data engineers are evolving into AI enablement specialists</strong> whether we planned for it or not</p></li></ul><ul><li><p><strong>The skills gap is real</strong> - traditional ETL knowledge needs to expand into ML ops, vector databases, and real-time inference pipelines</p></li></ul><p>A good way to think about this is: if your data architecture wasn't designed with AI workloads in mind, you're probably going to need some retrofitting soon (if you haven't already started).</p><p></p><h3>&#128200; <strong>Oracle Hits Record Highs as Wall Street Discovers Data Engineering</strong></h3><p><strong><a href="https://www.cnbc.com/2025/09/10/oracle-stock-cloud-backlog-ai.html">Oracle's stock</a></strong> is shattering records, and the Street is <strong>attributing it directly to the AI and data engineering boom</strong> across enterprises. 
This isn't just another tech stock story.</p><ul><li><p><strong>Enterprise data modernization is driving serious investment</strong> - companies are finally putting real money behind their data strategies</p></li></ul><ul><li><p><strong>Data engineering roles are becoming strategic, not just operational</strong> - we're moving from "keep the lights on" to "enable the business transformation"</p></li></ul><ul><li><p><strong>The talent premium is real</strong> - when your stock price depends on data capabilities, you pay up for the people who can deliver them</p></li></ul><p>What's interesting here is how quickly the narrative has shifted from "data engineering is a cost center" to "data engineering is a competitive advantage" (and Wall Street has noticed).</p><p></p><h3>&#128260; <strong>Apache Wayang: The Federation Play for Multi-Engine Processing</strong></h3><p><strong><a href="https://wayang.apache.org/">Apache Wayang</a></strong> (still incubating) is making some bold claims about federated data processing - up to 150x faster than centralized platforms by keeping data in place.</p><ul><li><p><strong>Application independence across processing engines</strong> - write once, run on Spark, Flink, PostgreSQL, Java Streams, or whatever</p></li></ul><ul><li><p><strong>In-situ processing philosophy</strong> - avoid the expensive data movement that kills performance in traditional architectures</p></li></ul><ul><li><p><strong>Minimal code changes for engine switching</strong> - the holy grail of not being locked into specific processing frameworks</p></li></ul><p>This feels like a response to the very real pain of vendor lock-in and the complexity of managing multiple processing engines in modern data stacks. 
Time will tell if the federation approach delivers on its promises, but the problem it's solving is definitely real.</p><div><hr></div><h3><strong>&#128161; Meet e6data AI Analyst (Early Access)</strong></h3><p>Ask questions exactly as you think them, get contextual multi-turn conversations, and actually enjoy the follow-up process for once. Early access here.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kE0T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F000c355f-1204-4817-bd2e-df0e3ffeb223_2910x1522.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kE0T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F000c355f-1204-4817-bd2e-df0e3ffeb223_2910x1522.png 424w, https://substackcdn.com/image/fetch/$s_!kE0T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F000c355f-1204-4817-bd2e-df0e3ffeb223_2910x1522.png 848w, https://substackcdn.com/image/fetch/$s_!kE0T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F000c355f-1204-4817-bd2e-df0e3ffeb223_2910x1522.png 1272w, https://substackcdn.com/image/fetch/$s_!kE0T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F000c355f-1204-4817-bd2e-df0e3ffeb223_2910x1522.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kE0T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F000c355f-1204-4817-bd2e-df0e3ffeb223_2910x1522.png" width="1456" height="762" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/000c355f-1204-4817-bd2e-df0e3ffeb223_2910x1522.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:646924,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.e6data.com/i/173343512?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F000c355f-1204-4817-bd2e-df0e3ffeb223_2910x1522.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kE0T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F000c355f-1204-4817-bd2e-df0e3ffeb223_2910x1522.png 424w, https://substackcdn.com/image/fetch/$s_!kE0T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F000c355f-1204-4817-bd2e-df0e3ffeb223_2910x1522.png 848w, https://substackcdn.com/image/fetch/$s_!kE0T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F000c355f-1204-4817-bd2e-df0e3ffeb223_2910x1522.png 1272w, https://substackcdn.com/image/fetch/$s_!kE0T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F000c355f-1204-4817-bd2e-df0e3ffeb223_2910x1522.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><strong>95%+ accuracy on enterprise workloads</strong> - because "pretty good" isn't good enough when you're dealing with 1000+ tables and complex relationships</p></li></ul><ul><li><p><strong>Multi-turn conversations</strong>- make your data talk back</p></li><li><p><strong>Zero migration required</strong> - works with your existing data platform (we know how much fun those migration projects are)</p></li></ul><p></p><h3><strong>Community &amp; Events:</strong></h3><ol><li><p>The team published a new technical deep dive on<a href="https://www.e6data.com/blog/partition-projection-data-lakehouse"> </a><strong><a href="https://www.e6data.com/blog/partition-projection-data-lakehouse">Partition Projection in Data Lakehouses</a></strong> this week!</p></li><li><p>Also, we are hosting the next <strong><a 
href="https://www.meetup.com/bangalore-streams/events/310884955/">Lakehouse Days with Bengaluru Streams</a></strong> on data streaming, lakehouse architecture, and the future of real-time analytics this <strong>27th September.</strong> </p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qi4i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qi4i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 424w, https://substackcdn.com/image/fetch/$s_!qi4i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 848w, https://substackcdn.com/image/fetch/$s_!qi4i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 1272w, https://substackcdn.com/image/fetch/$s_!qi4i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qi4i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Streaming + Lakehouse fusion &quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Streaming + Lakehouse fusion " title="Streaming + Lakehouse fusion " srcset="https://substackcdn.com/image/fetch/$s_!qi4i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 424w, https://substackcdn.com/image/fetch/$s_!qi4i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 848w, https://substackcdn.com/image/fetch/$s_!qi4i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 1272w, https://substackcdn.com/image/fetch/$s_!qi4i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a1f4fcd-062b-40c7-99d3-c5fe6adbdf87_2245x1263.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ol start="3"><li><p>You'll also find us at:</p></li></ol><ul><li><p><strong><a href="https://www.bigdataldn.com/en-gb.html">Big Data London</a></strong> (24-25 Sep, 2025)</p></li><li><p><strong><a href="https://www.databricks.com/dataaisummit/worldtour?region=apac">Databricks World Tour Mumbai</a></strong> (19 Sep, 2025)<br></p></li></ul><p>AND we're <strong>hunting for data engineers</strong> who get excited about AI and aren't afraid to build the future. 
No corporate speak here - just cool problems and smart people to solve them with.</p><p></p>]]></content:encoded></item><item><title><![CDATA[Fluss Fast-Tracks, Rust's Learning Gains, Parquet's Duality, Kafka-Iceberg Paths, and Snowflake Costs]]></title><description><![CDATA[This week we explore the latest emerging formats and projects in data engineering with their real-life implications at scale.]]></description><link>https://newsletter.e6data.com/p/fluss-fast-tracks-rusts-learning</link><guid isPermaLink="false">https://newsletter.e6data.com/p/fluss-fast-tracks-rusts-learning</guid><dc:creator><![CDATA[Data Engineering ACID | e6data]]></dc:creator><pubDate>Fri, 05 Sep 2025 13:02:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0Ccj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67920f90-2535-4dbd-9a97-b53f17371d63_2092x1382.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!0Ccj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67920f90-2535-4dbd-9a97-b53f17371d63_2092x1382.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0Ccj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67920f90-2535-4dbd-9a97-b53f17371d63_2092x1382.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0Ccj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67920f90-2535-4dbd-9a97-b53f17371d63_2092x1382.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0Ccj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67920f90-2535-4dbd-9a97-b53f17371d63_2092x1382.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0Ccj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67920f90-2535-4dbd-9a97-b53f17371d63_2092x1382.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0Ccj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67920f90-2535-4dbd-9a97-b53f17371d63_2092x1382.jpeg" width="1456" height="962" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/67920f90-2535-4dbd-9a97-b53f17371d63_2092x1382.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:962,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;As Seen on 'Good Will Hunting': A Winslow Homer Rip-Off by Gus Van 
Sant&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="As Seen on 'Good Will Hunting': A Winslow Homer Rip-Off by Gus Van Sant" title="As Seen on 'Good Will Hunting': A Winslow Homer Rip-Off by Gus Van Sant" srcset="https://substackcdn.com/image/fetch/$s_!0Ccj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67920f90-2535-4dbd-9a97-b53f17371d63_2092x1382.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0Ccj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67920f90-2535-4dbd-9a97-b53f17371d63_2092x1382.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0Ccj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67920f90-2535-4dbd-9a97-b53f17371d63_2092x1382.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0Ccj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67920f90-2535-4dbd-9a97-b53f17371d63_2092x1382.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 
6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image Credits: Google (Good Will Hunting)</figcaption></figure></div><h3>&#128293; <strong>Apache Fluss: Flink's Fast New Table Storage Engine That Actually Gets Changelog Right</strong></h3><p><a href="https://jack-vanlightly.com/blog/2025/9/2/understanding-apache-fluss">Alibaba's newest contribution to the Apache ecosystem</a> tackles the painful reality that <strong>even the best lakehouse formats like Paimon just aren't fast enough for real-time data</strong> engineering (truth!).</p><ul><li><p>Fluss provides <strong>dual-tier architecture</strong> with RocksDB-backed tablet servers for hot data and tiering to Paimon for historical storage (finally, someone who understands that object storage alone isn't enough for microsecond-latency use cases)</p></li></ul><ul><li><p>Primary key tables now generate <strong>efficient changelogs</strong> without the lookup hell that makes Paimon painful for high-throughput streaming</p></li></ul><ul><li><p>Client-side stitching intelligently merges real-time and historical data, giving <strong>Flink jobs a unified view</strong> without the complexity of managing multiple storage layers</p></li></ul><p>What's fascinating is how Fluss essentially 
admits that the <strong>"one table format to rule them all" dream is dead</strong> - sometimes you need speed, sometimes you need scale, and the magic happens in making them work together seamlessly.</p><div><hr></div><h3>&#129408; <strong>The Rust Productivity Paradox: Why Fighting the Compiler Actually Makes You Faster</strong></h3><p>That brutal initial <a href="https://lubeno.dev/blog/rusts-productivity-curve">learning curve with Rust's ownership model</a> isn't a bug - it's the feature that eventually makes you surprisingly productive.</p><ul><li><p>Rust forces you to think about memory safety and concurrency upfront, eliminating entire classes of runtime bugs that usually haunt production systems</p></li></ul><ul><li><p>The ownership model creates surprisingly maintainable codebases where refactoring doesn't feel like defusing a bomb</p></li></ul><ul><li><p>Strong typing and pattern matching reduce cognitive load once you internalize the patterns, making complex data transformations more readable than traditional imperative code</p></li></ul><p><strong>For data engineers tired of mysterious Spark crashes and memory leaks in long-running pipelines, Rust might be worth the investment -</strong> or so our e6data team now argues.</p><div><hr></div><h3>&#9889; <strong>Kafka-to-Iceberg Integration: Three Paths, Each With Its Own Gotchas</strong></h3><p><a href="https://rmoff.net/2025/08/18/kafka-to-iceberg-exploring-the-options/">Robin Moffatt</a> breaks down the eternal data engineering question: <strong>how do you get streaming data from Kafka into your lakehouse without losing your sanity (or your data quality)?</strong></p><ul><li><p><strong>Flink SQL</strong> provides the most control but requires managing yet another streaming runtime and understanding Flink's occasionally mystical state management</p></li></ul><ul><li><p><strong>Kafka Connect</strong> offers operational simplicity but can struggle with complex transformations and schema 
evolution (great until you need to do anything beyond basic ETL)</p></li></ul><ul><li><p><strong>Confluent's Tableflow</strong> promises managed convenience but locks you into their ecosystem and pricing model (the classic build-vs-buy decision with modern cloud economics)</p></li></ul><p>There's no silver bullet here - your choice depends entirely on whether you value operational simplicity, transformation flexibility, or vendor independence most.</p><div><hr></div><h3>&#128202; <strong>The Two Parquets: Why Your Files Might Not Play Nice Together</strong></h3><p><a href="https://www.jeronimo.dev/the-two-versions-of-parquet/">Jer&#243;nimo L&#243;pez</a> exposes an uncomfortable truth about the Parquet ecosystem - we're living in a world of format fragmentation that most data engineers don't even realize exists.</p><ul><li><p><strong>Parquet v1 and v2 aren't just version numbers</strong> - they represent fundamentally different encoding schemes that affect both performance and compatibility across tools</p></li></ul><ul><li><p><strong>Many popular engines still default to v1 for compatibility</strong>, leaving performance gains on the table (looking at you, Spark with your conservative defaults)</p></li></ul><ul><li><p>The <strong>ecosystem's partial adoption</strong> creates invisible data pipeline bottlenecks where different tools read the same files with wildly different performance characteristics</p></li></ul><p>This is a perfect example of why understanding your file formats matters - that "simple" Parquet file might be the reason your queries are mysteriously slow, and switching versions could be a free <strong>2x performance improvement.</strong></p><div><hr></div><h3>&#128184; <strong>Snowflake: When Your Data Warehouse Bill Exceeds Your AWS Compute</strong></h3><p>A candid <a href="https://www.reddit.com/r/snowflake/comments/1n78hxj/snowflake_costs_are_killing_our_logistics_margins/">Reddit discussion</a> about logistics margins being devoured 
by Snowflake costs reveals the hidden complexity of cloud data warehouse economics.</p><ul><li><p>Real-world usage patterns often trigger <strong>expensive auto-scaling and clustering</strong> that can turn a reasonable monthly bill into a budget-busting surprise</p></li></ul><ul><li><p>The community suggests aggressive query optimization, warehouse right-sizing, and even considering hybrid architectures to regain cost control (we write about it <a href="https://www.e6data.com/query-and-cost-optimization-hub/snowflake-query-optimization">here</a>)</p></li></ul><ul><li><p>Many organizations discover too late that Snowflake's consumption model works great for predictable workloads but can be <a href="https://www.e6data.com/query-and-cost-optimization-hub/snowflake-cost-optimization-15-proven-tactics-to-cut-your-snowflake-cost">financially catastrophic for spiky, exploratory analytics</a></p></li></ul><p>This thread is a sobering reminder that in the cloud era, <strong>architectural decisions have direct P&amp;L impact</strong> - and that understanding your cost model is as important as understanding your query performance.</p><div><hr></div><h3>&#128161; <strong>We Built the Cost Optimization Hub Your Team's Been Demanding</strong></h3><p><strong><a href="https://www.e6data.com/query-and-cost-optimization-hub">Our Query and Cost Optimization Hub</a></strong> is a comprehensive guide that actually tells you how to optimize your compute engine(s) for maximum output. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n70W!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bee6eb9-f126-43b3-8bf5-5b76ac10ec12_2886x1642.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n70W!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bee6eb9-f126-43b3-8bf5-5b76ac10ec12_2886x1642.png 424w, https://substackcdn.com/image/fetch/$s_!n70W!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bee6eb9-f126-43b3-8bf5-5b76ac10ec12_2886x1642.png 848w, https://substackcdn.com/image/fetch/$s_!n70W!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bee6eb9-f126-43b3-8bf5-5b76ac10ec12_2886x1642.png 1272w, https://substackcdn.com/image/fetch/$s_!n70W!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bee6eb9-f126-43b3-8bf5-5b76ac10ec12_2886x1642.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n70W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bee6eb9-f126-43b3-8bf5-5b76ac10ec12_2886x1642.png" width="1456" height="828" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3bee6eb9-f126-43b3-8bf5-5b76ac10ec12_2886x1642.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:828,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:599965,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.e6data.com/i/172763373?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bee6eb9-f126-43b3-8bf5-5b76ac10ec12_2886x1642.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n70W!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bee6eb9-f126-43b3-8bf5-5b76ac10ec12_2886x1642.png 424w, https://substackcdn.com/image/fetch/$s_!n70W!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bee6eb9-f126-43b3-8bf5-5b76ac10ec12_2886x1642.png 848w, https://substackcdn.com/image/fetch/$s_!n70W!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bee6eb9-f126-43b3-8bf5-5b76ac10ec12_2886x1642.png 1272w, https://substackcdn.com/image/fetch/$s_!n70W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bee6eb9-f126-43b3-8bf5-5b76ac10ec12_2886x1642.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><strong>Multi-engine coverage</strong> across Snowflake, Databricks, BigQuery, Redshift, and more - because your stack is probably more fragmented than you'd like to admit, and optimization is necessary</p></li></ul><ul><li><p><strong>Beginner to advanced techniques</strong> with actual code examples, not just vague suggestions about "right-sizing your clusters" (looking at you, vendor documentation)</p></li></ul><h3><strong>Community &amp; Events:</strong></h3><p>While we don't have specific CFPs to announce this time, keep an eye on the usual suspects -<strong> <a href="https://www.datacouncil.ai/data-council-2026">DataCouncil</a></strong>, and the regional meetups where the best conversations happen in the hallway track anyway.</p><p>You'll also find us at:</p><ul><li><p><strong><a 
href="https://www.bigdataldn.com/en-gb.html">Big Data London</a></strong> (24-25 Sep, 2025)</p></li><li><p><strong><a href="https://www.databricks.com/dataaisummit/worldtour?region=apac">Databricks World Tour Mumbai</a></strong> (19 Sep, 2025)</p></li></ul><p>Don't miss <strong>Bengaluru Streams x Lakehouse Days on 27 Sep</strong> - <a href="https://luma.com/Lakehouse-days-with-e6data">subscribe to our calendar</a> for registration. </p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.e6data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Data Engineering ACID! Please like, follow, and share if you liked it.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Microsoft Fabric SQL Mirroring, BI Chaos, Medallion Architecture, Wide Tables for Warehouses, and AI for Enterprises]]></title><description><![CDATA[This week's reality checks from the data engineering frontlines on Text-to-SQL, wide tables, Medallion architecture, Microsoft Fabric, and more (plus where to find us IRL)]]></description><link>https://newsletter.e6data.com/p/bi-chaos-microsoft-fabric-sql-mirroring</link><guid isPermaLink="false">https://newsletter.e6data.com/p/bi-chaos-microsoft-fabric-sql-mirroring</guid><dc:creator><![CDATA[Data Engineering ACID | e6data]]></dc:creator><pubDate>Tue, 02 Sep 2025 13:03:36 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!1k2X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dc95bb4-33ac-4168-b7fa-7993dbf3ccb4_1920x1200.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1k2X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dc95bb4-33ac-4168-b7fa-7993dbf3ccb4_1920x1200.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1k2X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dc95bb4-33ac-4168-b7fa-7993dbf3ccb4_1920x1200.jpeg 424w, https://substackcdn.com/image/fetch/$s_!1k2X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dc95bb4-33ac-4168-b7fa-7993dbf3ccb4_1920x1200.jpeg 848w, https://substackcdn.com/image/fetch/$s_!1k2X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dc95bb4-33ac-4168-b7fa-7993dbf3ccb4_1920x1200.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!1k2X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dc95bb4-33ac-4168-b7fa-7993dbf3ccb4_1920x1200.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1k2X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dc95bb4-33ac-4168-b7fa-7993dbf3ccb4_1920x1200.jpeg" width="1456" height="910" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8dc95bb4-33ac-4168-b7fa-7993dbf3ccb4_1920x1200.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Interstellar: Ideas in Multitudes | Film Talk&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Interstellar: Ideas in Multitudes | Film Talk" title="Interstellar: Ideas in Multitudes | Film Talk" srcset="https://substackcdn.com/image/fetch/$s_!1k2X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dc95bb4-33ac-4168-b7fa-7993dbf3ccb4_1920x1200.jpeg 424w, https://substackcdn.com/image/fetch/$s_!1k2X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dc95bb4-33ac-4168-b7fa-7993dbf3ccb4_1920x1200.jpeg 848w, https://substackcdn.com/image/fetch/$s_!1k2X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dc95bb4-33ac-4168-b7fa-7993dbf3ccb4_1920x1200.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!1k2X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dc95bb4-33ac-4168-b7fa-7993dbf3ccb4_1920x1200.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" 
fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image Credits: Google Images</figcaption></figure></div><h3>&#128202; <strong>Wide Tables: The Performance Paradox in Modern Data Warehouses</strong></h3><p>Turns out even cloud-native warehouses have their limits when it comes to handling tables with hundreds of columns - who would have thought more isn't always better? 
Let&#8217;s see what this <a href="https://www.reddit.com/r/dataengineering/comments/1n2qrgu/do_modern_data_warehouses_struggle_with_wide/">Reddit discussion</a> says:</p><ul><li><p><strong>Column count matters more than row count</strong> for query performance, especially when your SELECT * queries start timing out (looking at you, analysts who never learned to specify columns)</p></li></ul><ul><li><p><strong>Indexing and partitioning strategies</strong> can save wide tables from themselves, but proper schema design beats optimization band-aids every time</p></li></ul><ul><li><p><strong>The normalization vs denormalization debate</strong> continues - wide tables offer query simplicity but at the cost of maintenance complexity and storage efficiency</p></li></ul><p>Modern doesn't mean magical - even Snowflake and BigQuery have physics to contend with, so design your schemas thoughtfully rather than throwing everything into one massive table.</p><p></p><h3>&#128260; <strong>SQL Server Mirroring: A Journey Through On-Prem to Cloud Reality</strong></h3><p>One <a href="https://www.reddit.com/r/MicrosoftFabric/comments/1n2d0yt/mirroring_an_onprem_sql_server_my_story/">brave soul documented</a> their <strong>Microsoft Fabric mirroring</strong> adventure, complete with the inevitable "it's more complicated than the documentation suggests" moments we all know and love.</p><ul><li><p><strong>Thorough testing isn't optional</strong> - what works in dev might fail spectacularly in production with real data volumes and network constraints (Murphy's Law applies double to data replication)</p></li></ul><ul><li><p><strong>Unexpected downtime happens</strong> even with the best planning, so having rollback strategies and communication plans ready isn't paranoia, it's professionalism</p></li></ul><ul><li><p><strong>Real-world accounts beat vendor demos</strong> every time - this kind of honest experience sharing helps the community avoid the same pitfalls (we need 
more of this transparency)</p></li></ul><p>Migration stories like this are worth their weight in gold because they show the messy reality behind glossy case studies (plus they remind us we're not alone in fighting these battles).</p><p></p><h3>&#129302; <strong>ToolFront: Open Source Text-to-SQL That Actually Works</strong></h3><p>Finally, <a href="https://www.reddit.com/r/dataengineering/comments/1n46piw/i_opensourced_a_text2sql_rag_for_all_your/">someone built a text-to-SQL tool that doesn't hallucinate database schemas</a> - this open-source Python library provides AI agents with safe, read-only access to understand and query your actual database structure.</p><ul><li><p><strong>Schema hallucination is a real problem</strong> when AI models invent table names and columns that don't exist (we've all seen those confidently wrong SQL queries)</p></li></ul><ul><li><p><strong>Read-only by design</strong> means you can let AI explore without worrying about accidental data modification or deletion (security through architecture, not just permissions)</p></li></ul><ul><li><p><strong>Bridging natural language and SQL</strong> effectively could democratize data access for non-technical users while keeping data teams in the loop</p></li></ul><p>This feels like a genuine step toward making AI useful for database interactions rather than just impressive demos.</p><p></p><h3>&#127919; <strong>When Your BI Users Go Rogue (And How to Manage the Chaos)</strong></h3><p>Ever wonder what happens when business users start creating their own reports without guardrails? 
This <a href="https://www.reddit.com/r/dataengineering/comments/1n1hy4g/how_do_you_handle_your_bi_setup_when_users/">Reddit thread</a> dives into the eternal struggle between user autonomy and data governance (spoiler: it's messier than your staging tables).</p><ul><li><p><strong>Self-service can backfire spectacularly</strong> when users don't understand the underlying data model (leading to those "why don't our numbers match?" conversations)</p></li></ul><ul><li><p><strong>The consensus points toward controlled flexibility</strong> - give users sandbox environments and clear training rather than locking everything down (nobody wants to be the data team that says "no" to everything)</p></li></ul><ul><li><p><strong>Dynamic datasets beat one-off reports</strong> - building reusable, parameterized datasets serves multiple users better than custom builds for every stakeholder request</p></li></ul><p>The sweet spot isn't choosing between control and chaos, but designing systems that let users explore safely while keeping your sanity intact (trust me, future you will thank present you).</p><p></p><h3>&#127959;&#65039; <strong>The Medallion Architecture Farce: When Marketing Meets Data Modeling</strong></h3><p>This brutal takedown from <a href="https://www.confessionsofadataguy.com/the-medallion-architecture-farce/">Confessions of a Data Guy</a> pulls no punches about Databricks' "medallion architecture", calling it rebranded data warehousing concepts with shinier marketing.</p><ul><li><p><strong>Bronze/Silver/Gold is just RAW -&gt; FACT/DIM with extra steps</strong> (and extra storage costs that conveniently benefit your cloud provider)</p></li></ul><ul><li><p><strong>The confusion is real</strong> - Reddit threads still ask "what's the difference between Silver and Gold?" 
because the distinctions are genuinely unclear (when seasoned engineers can't explain it simply, something's wrong)</p></li></ul><ul><li><p><strong>Three decades of proven data warehousing patterns</strong> work just fine in your lakehouse - no need to invent new terminology for staging, transformation, and marts</p></li></ul><p>Sometimes the emperor really has no clothes, and this piece reminds us to stress-test vendor propositions before adopting their latest architectural innovations.</p><div><hr></div><h3><strong>Community &amp; Events:</strong></h3><p>We recently hosted an <strong>SRE meetup right here at the e6data office</strong> where Pranav from our team spoke about <strong><a href="https://www.linkedin.com/posts/pranav-a-a029921b1_sre-sre-meetup-activity-7368130502072991744-brpf?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAACnF3gsBj1XmPTNO09ii5XVPJVgsplAPAWs">"Battle-Tested GitOps with ArgoCD"</a></strong> to a packed house.
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!erow!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abeb749-e46e-4c28-bb86-52aca429f58e_2048x1153.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!erow!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abeb749-e46e-4c28-bb86-52aca429f58e_2048x1153.jpeg 424w, https://substackcdn.com/image/fetch/$s_!erow!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abeb749-e46e-4c28-bb86-52aca429f58e_2048x1153.jpeg 848w, https://substackcdn.com/image/fetch/$s_!erow!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abeb749-e46e-4c28-bb86-52aca429f58e_2048x1153.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!erow!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abeb749-e46e-4c28-bb86-52aca429f58e_2048x1153.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!erow!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abeb749-e46e-4c28-bb86-52aca429f58e_2048x1153.jpeg" width="1456" height="820" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9abeb749-e46e-4c28-bb86-52aca429f58e_2048x1153.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:820,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;No alternative 
text description for this image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="No alternative text description for this image" title="No alternative text description for this image" srcset="https://substackcdn.com/image/fetch/$s_!erow!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abeb749-e46e-4c28-bb86-52aca429f58e_2048x1153.jpeg 424w, https://substackcdn.com/image/fetch/$s_!erow!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abeb749-e46e-4c28-bb86-52aca429f58e_2048x1153.jpeg 848w, https://substackcdn.com/image/fetch/$s_!erow!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abeb749-e46e-4c28-bb86-52aca429f58e_2048x1153.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!erow!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abeb749-e46e-4c28-bb86-52aca429f58e_2048x1153.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 
15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Also, </strong>we're hitting the road (and the skies) this season! You'll find us at:</p><ul><li><p><strong><a href="https://www.bigdataldn.com/en-gb.html">Big Data London</a></strong> (24-25 Sep, 2025)</p></li><li><p><strong><a href="https://www.databricks.com/dataaisummit/worldtour?region=apac">Databricks World Tour Mumbai</a></strong> (19 Sep, 2025)</p></li></ul><p>Two events where the data engineering community gathers to share war stories and debate the merits of yet another lakehouse architecture.</p><p><strong>We&#8217;re growing! Explore open engineering roles &#8594; <a href="https://www.e6data.com/careers">here</a></strong></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.e6data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Data Engineering ACID! 
Please like, follow, and share if you liked it.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[1.4T-Event Spotify Dashboard Machine, Perfect Query Plans, Databricks Pro Playbook, and Agents for Data Analysts]]></title><description><![CDATA[Real-world experience to ace Databricks DE Pro, How Spotify handles their dashboards, resilient scraping tactics, why &#8220;optimal&#8221; query plans mislead, and AI agents for enterprise data.]]></description><link>https://newsletter.e6data.com/p/14t-event-spotify-dashboard-machine</link><guid isPermaLink="false">https://newsletter.e6data.com/p/14t-event-spotify-dashboard-machine</guid><dc:creator><![CDATA[Data Engineering ACID | e6data]]></dc:creator><pubDate>Fri, 22 Aug 2025 12:31:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!OE7K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb037dd-f917-49c2-9844-f7dbd473b565_1200x675.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OE7K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb037dd-f917-49c2-9844-f7dbd473b565_1200x675.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!OE7K!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb037dd-f917-49c2-9844-f7dbd473b565_1200x675.jpeg 424w, https://substackcdn.com/image/fetch/$s_!OE7K!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb037dd-f917-49c2-9844-f7dbd473b565_1200x675.jpeg 848w, https://substackcdn.com/image/fetch/$s_!OE7K!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb037dd-f917-49c2-9844-f7dbd473b565_1200x675.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!OE7K!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb037dd-f917-49c2-9844-f7dbd473b565_1200x675.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OE7K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb037dd-f917-49c2-9844-f7dbd473b565_1200x675.jpeg" width="1200" height="675" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fcb037dd-f917-49c2-9844-f7dbd473b565_1200x675.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:675,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Ford v Ferrari (2019) directed by James Mangold &#8226; Reviews, film + cast &#8226;  Letterboxd&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Ford v Ferrari (2019) directed by James Mangold &#8226; Reviews, film + 
cast &#8226;  Letterboxd" title="Ford v Ferrari (2019) directed by James Mangold &#8226; Reviews, film + cast &#8226;  Letterboxd" srcset="https://substackcdn.com/image/fetch/$s_!OE7K!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb037dd-f917-49c2-9844-f7dbd473b565_1200x675.jpeg 424w, https://substackcdn.com/image/fetch/$s_!OE7K!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb037dd-f917-49c2-9844-f7dbd473b565_1200x675.jpeg 848w, https://substackcdn.com/image/fetch/$s_!OE7K!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb037dd-f917-49c2-9844-f7dbd473b565_1200x675.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!OE7K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb037dd-f917-49c2-9844-f7dbd473b565_1200x675.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A tribute to the iconic scene from &#8220;Ford vs Ferrari&#8221; this week! {Image sources: Google}</figcaption></figure></div><h3><strong>&#9881;&#65039; When PostgreSQL's "Perfect" Query Plans Aren't</strong></h3><p><strong>Tomas Vondra</strong> shared a <strong><a href="https://vondra.me/posts/how-often-is-the-query-plan-optimal/">fascinating case study</a></strong> this week that every data engineer should bookmark. <em><strong>A PostgreSQL query planner was confidently choosing a 5-second index scan when a 2-second sequential scan would have won.</strong></em></p><ul><li><p>The details reveal why database optimization remains more art than science: Bitmap scans often <strong>outperform index scans by 10x for 1-5%</strong> selectivity ranges. </p></li><li><p>Cost estimates built on coarse statistics make "perfect" planning impossible. Hardware quirks like prefetching and cache warmth can shift the winner mid-execution. <em><strong>"The optimizer's best guess may only be 'good enough,'" as Vondra puts it</strong>, and this resonates with anyone who's spent time performance-tuning production queries.</em></p></li><li><p>What this means for data engineers is clear: even with decades of development, database optimizers operate with incomplete information by design. 
They're fast because they use simplified stats, but that <strong>simplification comes with blind spots.</strong></p></li></ul><p>The implication is that we should <strong>benchmark our critical queries ourselves.</strong> Don't assume the planner knows best, especially in cloud environments where the hardware abstraction adds even more complexity to the cost model.</p><p></p><h3><strong>&#127891; How to Ace the Databricks Pro Exam?</strong></h3><p>We came across a cool <a href="https://www.reddit.com/r/databricks/comments/1mscplb/i_scored_95_on_the_databricks_data_engineer/">Reddit thread</a> this week from someone who absolutely crushed the <strong>Databricks Data Engineer Professional exam with a 95% score</strong>. What struck us wasn't just the impressive result, but their honest breakdown of what actually moved the needle versus what felt like busy work.</p><ul><li><p>The conventional wisdom says to <strong>rely heavily on courses</strong> like Derar Alhussein's Udemy offering. This test-taker had a different take: skim it for breadth, but don't expect it to carry you across the finish line. </p></li><li><p>The real meat was in drilling down on <strong>Delta Lake, Spark Structured Streaming, and security concepts</strong>. These aren't just exam topics - they're the daily reality of most data engineering teams we encounter.</p></li><li><p>What we found most valuable was their emphasis on <strong>rotating through multiple practice exams</strong>. Not because repetition breeds success, but because it exposes the outdated "gotcha" syntax questions that still lurk in these certifications. </p></li><li><p>The data ecosystem moves fast, but <strong>exam content often lags behind.</strong></p></li></ul><p></p><h3><strong>&#127911; Inside Spotify's 1.4 Trillion-Event Data Engine</strong></h3><p>Sometimes you encounter a number that makes you pause and reconsider your assumptions about scale. 
For us this week, it was <strong>Spotify's</strong> <strong><a href="https://www.junaideffendi.com/p/spotify-data-tech-stack">1.4 trillion events per day</a></strong>. Not per month. Per day. What fascinated us wasn't just the raw volume, but how they've architected around it.</p><ul><li><p>The stack reads like a greatest-hits list of modern data infrastructure: Pub/Sub feeding into Beam and Flink pipelines, orchestrated through <strong>38,000 Flyte workflows</strong>, with data landing in BigQuery and GCS/HDFS to serve roughly <strong>5,000 Looker and Tableau dashboards for 6,000 users.</strong></p></li><li><p>The <strong>migration story from Luigi/Flo to Flyte</strong> particularly caught our attention. Fragmented orchestration and visibility gaps - these are the unglamorous problems that don't make conference talks but absolutely cripple teams at scale.</p></li><li><p><strong>Spotify's solution was refreshingly standard</strong>: battle-tested GCP primitives plus a scheduler that actually works. When you're processing 1.4 trillion events daily, the temptation to over-engineer is immense. Instead, they've prioritized observability and reliability over cleverness.</p></li></ul><p></p><h3><strong>&#128375;&#65039; Web Scraping at Scale: The Uncomfortable Truths</strong></h3><p>A <a href="https://www.reddit.com/r/dataengineering/comments/1mtr5d2/how_do_you_manage_web_scraping_pipelines_at_scale/">community discussion</a> this week perfectly captured <strong>why large-scale web scraping feels like an endless game.</strong></p><ul><li><p>The usual suspects were all there: <strong>rotating proxies, CAPTCHA solvers, the perpetual arms race</strong> between scrapers and anti-bot measures.</p></li><li><p>Rotating proxies and CAPTCHA solvers buy you time, but they still break. 
Alerting becomes mandatory because failure is inevitable, not just possible.</p></li><li><p>The wisdom in the thread gravitated toward two key insights: <strong>Hidden or internal APIs are absolute gold - always exhaust these options before building brittle DOM parsers.</strong></p></li><li><p>And past a certain pain threshold, <strong>most teams simply buy the data instead of maintaining Selenium farms</strong>. This maps to a broader pattern we see in data engineering. We often treat scraping as a permanent ETL source when it should be viewed as a stopgap. The smart teams we know invest in robust monitoring and plan their exit strategy (paid APIs, direct partnerships) from day one.</p></li></ul><div><hr></div><h3><strong>&#129302; Our Next Lakehouse Days: AI Agents for Enterprise Data</strong></h3><p>We're hosting another <strong><a href="https://lu.ma/8ufzg6gi">hands-on meetup in Bengaluru,</a></strong> and this time we're diving deep into something every data team is grappling with: <strong>AI agents that actually work at enterprise scale</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N0Y0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20e93f1b-af8b-43cb-9a17-0d6b76c46e35_800x826.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N0Y0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20e93f1b-af8b-43cb-9a17-0d6b76c46e35_800x826.jpeg 424w, https://substackcdn.com/image/fetch/$s_!N0Y0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20e93f1b-af8b-43cb-9a17-0d6b76c46e35_800x826.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!N0Y0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20e93f1b-af8b-43cb-9a17-0d6b76c46e35_800x826.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!N0Y0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20e93f1b-af8b-43cb-9a17-0d6b76c46e35_800x826.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N0Y0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20e93f1b-af8b-43cb-9a17-0d6b76c46e35_800x826.jpeg" width="375" height="387.1875" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/20e93f1b-af8b-43cb-9a17-0d6b76c46e35_800x826.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:826,&quot;width&quot;:800,&quot;resizeWidth&quot;:375,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;graphical user interface, application, Teams&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="graphical user interface, application, Teams" title="graphical user interface, application, Teams" srcset="https://substackcdn.com/image/fetch/$s_!N0Y0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20e93f1b-af8b-43cb-9a17-0d6b76c46e35_800x826.jpeg 424w, https://substackcdn.com/image/fetch/$s_!N0Y0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20e93f1b-af8b-43cb-9a17-0d6b76c46e35_800x826.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!N0Y0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20e93f1b-af8b-43cb-9a17-0d6b76c46e35_800x826.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!N0Y0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20e93f1b-af8b-43cb-9a17-0d6b76c46e35_800x826.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The agenda goes <strong>beyond the usual "ChatGPT + your database" demos.</strong> We're tackling the real problems: how do you handle 
messy schemas without requiring perfect catalogs? How do you avoid the join errors that plague most text-to-SQL attempts? And critically, how do you move from basic query-and-result to genuinely conversational, context-aware workflows?</p><p><strong>Talks we're hosting:</strong></p><ul><li><p><strong>Bharath Harish</strong> (our Head of Product) will break down how we built Text-to-SQL agents with <strong>95%+ accuracy</strong> <strong>for enterprise scale</strong>. He'll cover knowledge graph-driven relationship discovery, SQL engine-like planning to reduce hallucinations, and performance techniques like partition key usage and optimized CTEs.</p></li></ul><ul><li><p><strong>Harsh Sharma</strong> from Flipkart will share why his team chose Milvus (open-source vector database) over traditional approaches for recommendation systems, plus the indexing and retrieval strategies that make it work at e-commerce scale.</p></li></ul><p><strong>Register here: https://lu.ma/8ufzg6gi</strong></p><p></p><h3><strong>Community &amp; Events</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XRx_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe710a5eb-380d-46a1-a924-028f4eaefc52_6584x3980.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XRx_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe710a5eb-380d-46a1-a924-028f4eaefc52_6584x3980.png 424w, https://substackcdn.com/image/fetch/$s_!XRx_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe710a5eb-380d-46a1-a924-028f4eaefc52_6584x3980.png 848w, 
https://substackcdn.com/image/fetch/$s_!XRx_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe710a5eb-380d-46a1-a924-028f4eaefc52_6584x3980.png 1272w, https://substackcdn.com/image/fetch/$s_!XRx_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe710a5eb-380d-46a1-a924-028f4eaefc52_6584x3980.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XRx_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe710a5eb-380d-46a1-a924-028f4eaefc52_6584x3980.png" width="1456" height="880" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e710a5eb-380d-46a1-a924-028f4eaefc52_6584x3980.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:880,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XRx_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe710a5eb-380d-46a1-a924-028f4eaefc52_6584x3980.png 424w, https://substackcdn.com/image/fetch/$s_!XRx_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe710a5eb-380d-46a1-a924-028f4eaefc52_6584x3980.png 848w, 
https://substackcdn.com/image/fetch/$s_!XRx_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe710a5eb-380d-46a1-a924-028f4eaefc52_6584x3980.png 1272w, https://substackcdn.com/image/fetch/$s_!XRx_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe710a5eb-380d-46a1-a924-028f4eaefc52_6584x3980.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><strong>AWS Storage Day: </strong>Rajath Gowda, our founding data engineer spoke about S3 tables and how e6data powers them, 
at the AWS event!</p></li><li><p><strong>Blog Series</strong>: Building a Modern Data Pipeline in Snowflake- <strong><a href="https://www.e6data.com/blog/snowflake-modern-data-pipeline-snowpipe-managed-iceberg-tables">deep dive</a></strong> from our engineering team</p></li><li><p><strong>Hiring</strong>: We&#8217;re growing! Explore open engineering roles &#8594; <strong><a href="https://www.e6data.com/careers">here</a></strong></p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.e6data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Data Engineering ACID! Please like, follow, and share if you liked it.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Cursor & GPT-5 took over our newsletter!]]></title><description><![CDATA[Skewed joins are quiet cost multipliers. One hot key creates massive shuffle imbalance, long-tail tasks, and 2&#8211;5x cost. 
Here's how to fix it.]]></description><link>https://newsletter.e6data.com/p/we-wrote-this-newsletter-completely</link><guid isPermaLink="false">https://newsletter.e6data.com/p/we-wrote-this-newsletter-completely</guid><dc:creator><![CDATA[Data Engineering ACID | e6data]]></dc:creator><pubDate>Fri, 08 Aug 2025 13:03:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!4mHP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3314f8d6-d85d-47a7-802e-5501d82e03bd_2914x1832.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey there,</p><p>So we're experimenting with GPT-5 for this week's newsletter. Jury's still out, but here's what it cooked up for you...</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4mHP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3314f8d6-d85d-47a7-802e-5501d82e03bd_2914x1832.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4mHP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3314f8d6-d85d-47a7-802e-5501d82e03bd_2914x1832.png 424w, https://substackcdn.com/image/fetch/$s_!4mHP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3314f8d6-d85d-47a7-802e-5501d82e03bd_2914x1832.png 848w, https://substackcdn.com/image/fetch/$s_!4mHP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3314f8d6-d85d-47a7-802e-5501d82e03bd_2914x1832.png 1272w, 
https://substackcdn.com/image/fetch/$s_!4mHP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3314f8d6-d85d-47a7-802e-5501d82e03bd_2914x1832.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4mHP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3314f8d6-d85d-47a7-802e-5501d82e03bd_2914x1832.png" width="1456" height="915" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3314f8d6-d85d-47a7-802e-5501d82e03bd_2914x1832.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:915,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:527790,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.e6data.com/i/170430040?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3314f8d6-d85d-47a7-802e-5501d82e03bd_2914x1832.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4mHP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3314f8d6-d85d-47a7-802e-5501d82e03bd_2914x1832.png 424w, https://substackcdn.com/image/fetch/$s_!4mHP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3314f8d6-d85d-47a7-802e-5501d82e03bd_2914x1832.png 848w, 
https://substackcdn.com/image/fetch/$s_!4mHP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3314f8d6-d85d-47a7-802e-5501d82e03bd_2914x1832.png 1272w, https://substackcdn.com/image/fetch/$s_!4mHP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3314f8d6-d85d-47a7-802e-5501d82e03bd_2914x1832.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Sneak peek: PMM team at work</figcaption></figure></div><h2><strong>The problem that's probably burning your 
budget right now</strong></h2><p>You know that feeling when one <strong>Spark job takes 3x longer than it should?</strong> A very common culprit is a skewed join. Some hot key (anonymous users, null tenant IDs, that one power user with 10M events) creates a few massive shuffle partitions while the rest of your cluster sits there twiddling its thumbs.</p><p>We see this constantly in <strong>r/dataengineering</strong> threads. "Why is my job so slow?" Often it's join skew torching their AWS bill.</p><p></p><h2><strong>Here's the copy-paste fix</strong></h2><p><strong>The setup that's probably familiar:</strong></p><pre><code>from pyspark.sql import functions as F

# This looks innocent enough...

events = spark.table("fact_events") # 100M rows
users = spark.table("dim_users") # 1M rows

# But user_id has that one enterprise customer with 20% of all events

joined = events.join(users, "user_id", "left")</code></pre><p><strong>Step 1: Turn on the good stuff (AQE)</strong></p><pre><code># AQE is on by default since Spark 3.2; set these explicitly on older 3.x clusters

spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
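# A partition counts as skewed when it is larger than factor x the median partition size
# AND larger than the byte threshold below
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "5")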
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", 268435456) # 256MB

# Now run the same join - AQE will split those massive partitions

joined = events.join(users, "user_id", "left")</code></pre><p><strong>Step 2: Broadcast the small side (if it actually IS small)</strong></p><pre><code># Rule of thumb: if your dimension is &lt; 500MB after filters, broadcast it
# Without a hint, Spark only auto-broadcasts tables under spark.sql.autoBroadcastJoinThreshold (10MB by default)

joined = events.join(users.hint("BROADCAST"), "user_id", "left")</code></pre><p><strong>Step 3: For the really stubborn cases - salting</strong></p><pre><code># When you have that one customer that's just... massive
HOT_CUSTOMER_ID = 42  # Replace with your actual hot key
SALT_BUCKETS = 16

events_salted = (events
    .withColumn("salt", 
        F.when(F.col("user_id") == HOT_CUSTOMER_ID, 
               (F.rand() * SALT_BUCKETS).cast("int"))
         .otherwise(F.lit(0))))

users_salted = (users
    .withColumn("salt",
        F.when(F.col("user_id") == HOT_CUSTOMER_ID,
               F.sequence(F.lit(0), F.lit(SALT_BUCKETS-1)))
         .otherwise(F.array(F.lit(0))))
    .withColumn("salt", F.explode("salt")))
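
# Optional sanity check (our addition, a sketch rather than part of the original recipe):
# the hot key's events should now be spread roughly evenly across the salt buckets.
(events_salted
    .filter(F.col("user_id") == HOT_CUSTOMER_ID)
    .groupBy("salt")
    .count()
    .orderBy("salt")
    .show(SALT_BUCKETS))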

# Join on both user_id AND salt
joined = events_salted.join(users_salted, ["user_id", "salt"], "left")
# Drop the salt column after the join - it's just a technical detail
joined = joined.drop("salt")</code></pre><p></p><h2><strong>How to spot this in the wild</strong></h2><p><strong>Spark UI red flags:</strong></p><ul><li><p>Stage timeline looks like a hockey stick (most tasks finish fast, a few stragglers kill you)</p></li><li><p>Shuffle read size varies wildly across tasks (compare max against median in the stage's summary metrics)</p></li><li><p>One executor is pegged while others are idle</p></li></ul><p><strong>Quick diagnostic:</strong></p><pre><code># Always run this before big joins

events.groupBy("user_id").count().orderBy(F.desc("count")).show(20, False)</code></pre><p><strong>If the top key has roughly 10x more records than the keys below it, you've got skew.</strong></p><p></p><h2><strong>If you're using Snowflake/BigQuery instead</strong></h2><p><strong>Snowflake:</strong></p><pre><code>-- Snowflake has no broadcast join hint: its optimizer picks the join strategy,
-- so the practical lever is to filter early and keep the join input small
SELECT
    events.user_id, 
    SUM(events.amount)
FROM events 
JOIN users ON events.user_id = users.user_id
WHERE events.event_date &gt;= CURRENT_DATE - 7
GROUP BY 1;</code></pre><p><strong>BigQuery:</strong></p><ul><li><p>Cluster your large tables by join keys</p></li><li><p>Use partitioning aggressively to limit scanned bytes</p></li><li><p>Pre-aggregate when possible</p></li></ul><h2><strong>The real talk</strong></h2><p>Look, skewed joins are everywhere. That anonymous user ID, deleted records with NULL foreign keys, your biggest enterprise customer - they all create hot partitions. AQE helps, but sometimes you need to get your hands dirty with salting.</p><p>The good news? Once you fix it, it tends to stay fixed until your key distribution shifts.</p><div><hr></div><blockquote><p><strong>Human here, let us know in the comments how well our experiment worked. We&#8217;d love to test more!</strong></p></blockquote><p>Now, on to other things:<br></p><h2>Launching e6data&#8217;s Hybrid Data Lakehouse: 10x Faster Queries, Near-Zero Egress, Sub-Second Latency</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qUPJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5014c1-4d05-4316-8073-6f0015eb99d9_1600x935.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qUPJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5014c1-4d05-4316-8073-6f0015eb99d9_1600x935.png 424w, https://substackcdn.com/image/fetch/$s_!qUPJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5014c1-4d05-4316-8073-6f0015eb99d9_1600x935.png 848w, https://substackcdn.com/image/fetch/$s_!qUPJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5014c1-4d05-4316-8073-6f0015eb99d9_1600x935.png 1272w, 
https://substackcdn.com/image/fetch/$s_!qUPJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5014c1-4d05-4316-8073-6f0015eb99d9_1600x935.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qUPJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5014c1-4d05-4316-8073-6f0015eb99d9_1600x935.png" width="1456" height="851" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ce5014c1-4d05-4316-8073-6f0015eb99d9_1600x935.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:851,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qUPJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5014c1-4d05-4316-8073-6f0015eb99d9_1600x935.png 424w, https://substackcdn.com/image/fetch/$s_!qUPJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5014c1-4d05-4316-8073-6f0015eb99d9_1600x935.png 848w, https://substackcdn.com/image/fetch/$s_!qUPJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5014c1-4d05-4316-8073-6f0015eb99d9_1600x935.png 1272w, 
https://substackcdn.com/image/fetch/$s_!qUPJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5014c1-4d05-4316-8073-6f0015eb99d9_1600x935.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">e6data&#8217;s Hybrid Data Lakehouse Architecture: An Example Setup</figcaption></figure></div><p>Most enterprises today deal with high egress costs, governance issues, migration, and latency issues in hybrid lakehouses due to their fundamental architectural limitations.</p><p>We are solving this with our <strong>federated SQL engine + hybrid 
cluster architecture</strong>. The hybrid cluster is abstracted away from the end user's querying experience, so they write queries as though a single cluster were talking to all these data sources.</p><p>The latest benchmarks on e6data&#8217;s hybrid data lakehouse deployment, which customers run in production, are as follows:</p><ul><li><p><strong>10x faster queries</strong> by keeping compute local.</p></li><li><p><strong>~0%</strong> egress fees.</p></li><li><p>Adding a cache cuts latency by <strong>a further ~40%</strong> with no extra data movement.</p></li></ul><p>For more details, including the benchmark setup, case study, product features, and docs, see this <a href="https://www.e6data.com/blog/hybrid-data-lakehouse-10x-faster-queries-near-zero-egress-sub-second-latency">page</a>!</p><h3><strong>Community &amp; Events</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Jj-k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1ab06a-e260-477c-ac3c-50c859af7927_1646x995.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Jj-k!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1ab06a-e260-477c-ac3c-50c859af7927_1646x995.png 424w, https://substackcdn.com/image/fetch/$s_!Jj-k!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1ab06a-e260-477c-ac3c-50c859af7927_1646x995.png 848w, https://substackcdn.com/image/fetch/$s_!Jj-k!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1ab06a-e260-477c-ac3c-50c859af7927_1646x995.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Jj-k!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1ab06a-e260-477c-ac3c-50c859af7927_1646x995.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Jj-k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1ab06a-e260-477c-ac3c-50c859af7927_1646x995.png" width="1456" height="880" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bf1ab06a-e260-477c-ac3c-50c859af7927_1646x995.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:880,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Jj-k!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1ab06a-e260-477c-ac3c-50c859af7927_1646x995.png 424w, https://substackcdn.com/image/fetch/$s_!Jj-k!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1ab06a-e260-477c-ac3c-50c859af7927_1646x995.png 848w, https://substackcdn.com/image/fetch/$s_!Jj-k!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1ab06a-e260-477c-ac3c-50c859af7927_1646x995.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Jj-k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf1ab06a-e260-477c-ac3c-50c859af7927_1646x995.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong>Lakehouse Days</strong> &#8211; We are hosting a new edition this month, on Agentic AI. 
Subscribe to our <strong><a href="https://lu.ma/Lakehouse-days-with-e6data">calendar</a></strong> to get updates first!</p></li><li><p><strong>Blog Series</strong> &#8211; Embedding Essentials: Cosine Similarity in SQL &#8212; <strong><a href="https://www.e6data.com/blog/embedding-essentials-cosine-similarity-sql-with-vectors">deep dive</a></strong> from our engineering team</p></li><li><p><strong>Hiring</strong> &#8211; We&#8217;re growing! Explore open engineering roles &#8594; <strong><a href="https://www.e6data.com/careers">here</a></strong></p></li></ul><p></p>]]></content:encoded></item><item><title><![CDATA[Embeddings Exposed, Kafka in 300 Lines, Redundant SQL, and Lakehouse Costs Breakdown]]></title><description><![CDATA[Deep dives into vectors, minimalist Kafka builds, primary keys, SQL as a language, and transparent lakehouse pricing.]]></description><link>https://newsletter.e6data.com/p/embeddings-exposed-kafka-in-300-lines</link><guid isPermaLink="false">https://newsletter.e6data.com/p/embeddings-exposed-kafka-in-300-lines</guid><dc:creator><![CDATA[Data 
Engineering ACID | e6data]]></dc:creator><pubDate>Fri, 01 Aug 2025 12:52:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!4LrM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa52c17b0-7008-4aab-a21e-59ec760db279_2400x1350.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4LrM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa52c17b0-7008-4aab-a21e-59ec760db279_2400x1350.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4LrM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa52c17b0-7008-4aab-a21e-59ec760db279_2400x1350.jpeg 424w, https://substackcdn.com/image/fetch/$s_!4LrM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa52c17b0-7008-4aab-a21e-59ec760db279_2400x1350.jpeg 848w, https://substackcdn.com/image/fetch/$s_!4LrM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa52c17b0-7008-4aab-a21e-59ec760db279_2400x1350.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!4LrM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa52c17b0-7008-4aab-a21e-59ec760db279_2400x1350.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4LrM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa52c17b0-7008-4aab-a21e-59ec760db279_2400x1350.jpeg" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a52c17b0-7008-4aab-a21e-59ec760db279_2400x1350.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;10 Things I Hate About You | Netflix&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="10 Things I Hate About You | Netflix" title="10 Things I Hate About You | Netflix" srcset="https://substackcdn.com/image/fetch/$s_!4LrM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa52c17b0-7008-4aab-a21e-59ec760db279_2400x1350.jpeg 424w, https://substackcdn.com/image/fetch/$s_!4LrM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa52c17b0-7008-4aab-a21e-59ec760db279_2400x1350.jpeg 848w, https://substackcdn.com/image/fetch/$s_!4LrM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa52c17b0-7008-4aab-a21e-59ec760db279_2400x1350.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!4LrM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa52c17b0-7008-4aab-a21e-59ec760db279_2400x1350.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>&#128300; How Are Vector Embeddings Made?</h2><p>An interactive <a href="https://huggingface.co/spaces/hesamation/primer-llm-embedding?section=bert_(bidirectional_encoder_representations_from_transformers)">Hugging Face Space</a> peels back the <strong>1,536-dimensional token-embedding matrix</strong> of <strong>DeepSeek-R1-Distill-Qwen-1.5B</strong>, letting you poke real vectors with cosine-similarity tools.</p><ul><li><p>Dissects <strong>static token vectors vs contextual hidden-state embeddings</strong>, clarifying why only the input embedding layer is a fixed lookup while the outputs of deeper layers change with context through self-attention.</p></li><li><p>Code shows how the <code>torch.nn.Embedding</code> layer is extracted, cached, and queried in milliseconds, then uses <code>torch.topk</code> on cosine scores to map local neighbourhoods.</p></li><li><p>Notebook prints raw rows: e.g. 
the token "HTML" expands to a 1,536-float vector, showing that the model&#8217;s dictionary can sit in RAM rather than VRAM.</p></li></ul><p></p><h2>&#128736;&#65039; Kafka Core in ~300 LOC of Python</h2><p>A Redditor <strong><a href="https://www.reddit.com/r/dataengineering/comments/1mc9qcp/built_kafka_from_scratch_in_python_inspired_by/">rewrites Kafka&#8217;s 2011 design in pure Python</a></strong>: single-threaded broker, producers, consumers, offset tracking, no ZooKeeper, no partitions.</p><ul><li><p><strong>Append-only log + pull-based</strong> consumer loop illustrate back-pressure mechanics.</p></li><li><p>Community calls out the <strong>missing bits</strong>: partitioning, segment compaction, persistence, concurrency-safety.</p></li><li><p>Author plans to add partitions next; goal is architectural clarity, not prod-grade throughput.</p></li></ul><p><strong>Our Takeaway:</strong> Re-implementing the log-structured core is the fastest way to grok why sequential disk I/O + monotonic offsets scale.</p><p></p><h2>&#128273; Primary Keys vs Petabytes: When Constraints Bite Back</h2><p>Another <a href="https://www.reddit.com/r/dataengineering/comments/1m9xbk5/primary_keys_am_i_crazy/">Reddit thread</a> debates surrogate <strong>hashes &amp; GUIDs versus brute-force uniqueness checks.</strong></p><ul><li><p>Surrogate keys (MD5/UUID) enable SCD2 change tracking while decoupling from source schemas.</p></li><li><p>One team&#8217;s 5.7 trillion-row sensor fact table grew its 8-byte PK index to <strong>45 TB uncompressed</strong>: <strong>sometimes the index is the data.</strong></p></li><li><p>Others drop keys on <strong>ultra-wide fact tables</strong>, gaining insert speed at the cost of read-time de-duplication.</p></li></ul><p><strong>Our Takeaway:</strong> Keys are good insurance until the index dwarfs the payload; at that point, explore late-binding dedupe or zone maps.</p><p></p><h2>&#128465;&#65039; SQL Spec Gripes: MERGE, RIGHT JOIN &amp; Recursive Rumbles</h2><p>A Reddit <a 
href="https://www.reddit.com/r/dataengineering/comments/1m8g4e6/are_some_parts_of_the_sql_spec_hot_garbage/">rant</a> <strong>blacklists </strong><code>MERGE</code><strong>, recursive CTEs, and </strong><code>RIGHT JOIN</code>, sparking a heated debate.</p><ul><li><p><strong>OP slams</strong> non-idempotent <code>MERGE</code>, &#8220;clever&#8221; recursive CTEs, and unreadable RIGHT JOINs</p></li><li><p><strong>Veterans defend</strong> recursive CTEs for graph traversals; say RIGHT JOIN is merely rare, not wrong</p></li><li><p><strong>Tangent</strong>: BigQuery&#8217;s new pipe operator (<code>|&gt;</code>) pitched as a saner logical-order alternative</p></li></ul><p><strong>Our Takeaway:</strong> A disciplined, cross-engine SQL subset&#8212;plus pipeline syntax&#8212;beats holy wars over edge-case clauses.</p><div><hr></div><h2>&#128161; Cost Calculator: Instantly Price Your Lakehouse</h2><p>We&#8217;re launching an interactive <strong><a href="https://www.e6data.com/pricing">Cost Calculator</a></strong> that shows what you&#8217;ll spend on e6data versus legacy engines, before you deploy.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HHrS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9781694-a316-4f69-974d-a7ba05c588a7_2578x1194.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HHrS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9781694-a316-4f69-974d-a7ba05c588a7_2578x1194.png 424w, https://substackcdn.com/image/fetch/$s_!HHrS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9781694-a316-4f69-974d-a7ba05c588a7_2578x1194.png 
848w, https://substackcdn.com/image/fetch/$s_!HHrS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9781694-a316-4f69-974d-a7ba05c588a7_2578x1194.png 1272w, https://substackcdn.com/image/fetch/$s_!HHrS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9781694-a316-4f69-974d-a7ba05c588a7_2578x1194.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HHrS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9781694-a316-4f69-974d-a7ba05c588a7_2578x1194.png" width="1456" height="674" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9781694-a316-4f69-974d-a7ba05c588a7_2578x1194.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:674,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:230428,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.e6data.com/i/169733725?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9781694-a316-4f69-974d-a7ba05c588a7_2578x1194.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HHrS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9781694-a316-4f69-974d-a7ba05c588a7_2578x1194.png 424w, 
https://substackcdn.com/image/fetch/$s_!HHrS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9781694-a316-4f69-974d-a7ba05c588a7_2578x1194.png 848w, https://substackcdn.com/image/fetch/$s_!HHrS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9781694-a316-4f69-974d-a7ba05c588a7_2578x1194.png 1272w, https://substackcdn.com/image/fetch/$s_!HHrS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9781694-a316-4f69-974d-a7ba05c588a7_2578x1194.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p>Estimates <strong>query costs on your lakehouse platform in under a minute.</strong></p></li><li><p>Pick a cloud (AWS, Azure, GCP), region, cluster size, and hours/day, and see side-by-side monthly totals in seconds.</p></li></ul><div><hr></div><h3>Community &amp; Events</h3><ul><li><p><strong>Lakehouse Days Replay</strong> &#8211; Missed our &#8220;<strong><a href="https://lu.ma/kzs7jh63">From Stream to Lakehouse</a></strong>&#8221; session? Catch the full recording on <strong><a href="https://www.youtube.com/channel/UC2Q6NW9E3lbUKrYm551fwDA">e6data&#8217;s YouTube</a></strong>.</p></li><li><p><strong>Blog Series</strong> &#8211; <em>Building a Modern Data Pipeline in Snowflake: From Snowpipe to Managed Iceberg Tables with Sync Checks</em> &#8212; <strong><a href="https://www.e6data.com/blog/snowflake-modern-data-pipeline-snowpipe-managed-iceberg-tables">deep dive</a></strong> from our engineering team (Part 1 out now).</p></li><li><p><strong>Hiring</strong> &#8211; We&#8217;re growing! Explore open engineering roles &#8594; <strong><a href="https://www.e6data.com/careers">here</a></strong></p></li></ul><p></p>]]></content:encoded></item><item><title><![CDATA[LLMs Stall, Queries Spill, and e6data Bridges Delta, Hudi, Iceberg & Polaris]]></title><description><![CDATA[Tactics to tame runaway memory, curb Bronze disasters, and refactor tests, plus AI drama. Learn how to build a modern data pipeline in Snowflake, and catch our product updates on Iceberg, Polaris, Hudi, and Delta Lake.]]></description><link>https://newsletter.e6data.com/p/llms-stall-queries-spill-e6data-unifies-delta-hudi-iceberg-polaris</link><guid isPermaLink="false">https://newsletter.e6data.com/p/llms-stall-queries-spill-e6data-unifies-delta-hudi-iceberg-polaris</guid><dc:creator><![CDATA[Data Engineering ACID | e6data]]></dc:creator><pubDate>Fri, 25 Jul 2025 11:31:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Yx4i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe51f72bd-146a-4d00-bb16-4f1653db7625_1600x898.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Yx4i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe51f72bd-146a-4d00-bb16-4f1653db7625_1600x898.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!Yx4i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe51f72bd-146a-4d00-bb16-4f1653db7625_1600x898.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Yx4i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe51f72bd-146a-4d00-bb16-4f1653db7625_1600x898.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Yx4i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe51f72bd-146a-4d00-bb16-4f1653db7625_1600x898.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Yx4i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe51f72bd-146a-4d00-bb16-4f1653db7625_1600x898.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Yx4i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe51f72bd-146a-4d00-bb16-4f1653db7625_1600x898.jpeg" width="1456" height="817" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e51f72bd-146a-4d00-bb16-4f1653db7625_1600x898.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:817,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:240394,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.e6data.com/i/169209738?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe51f72bd-146a-4d00-bb16-4f1653db7625_1600x898.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Yx4i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe51f72bd-146a-4d00-bb16-4f1653db7625_1600x898.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Yx4i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe51f72bd-146a-4d00-bb16-4f1653db7625_1600x898.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Yx4i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe51f72bd-146a-4d00-bb16-4f1653db7625_1600x898.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Yx4i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe51f72bd-146a-4d00-bb16-4f1653db7625_1600x898.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>&#128736;&#65039; <strong>OpenRewrite Beats Gemini AI at JUnit Migration</strong><br>This week, our senior engineer pitted Google&#8217;s Gemini against OpenRewrite on a test migration from <a href="https://docs.openrewrite.org/running-recipes/popular-recipe-guides/migrate-from-junit-4-to-junit-5">JUnit 4 &#8594; 5</a>. Gemini timed out, while OpenRewrite finished in minutes.<br>&#8226; Gemini burned API quota and stalled on &#8220;complex&#8221; tests<br>&#8226; OpenRewrite migrated nearly all tests in ~10 min; only edge cases needed tweaks<br>&#8226; Moral: pick the right-sized tool. LLMs aren&#8217;t a silver bullet for structured code refactors<br><strong>Our Takeaway:</strong> Rule-based refactoring still trumps LLM magic for deterministic migrations. 
The big a-ha moment for him: the migrated suite passed every test except tpox (a known failure) and one geospatial test, which uses <code>assertThrows</code> and now gets a different error message than the one it expects.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PqJZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36f86e7b-06fe-4af0-9189-a3fad173f12e_1648x2116.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PqJZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36f86e7b-06fe-4af0-9189-a3fad173f12e_1648x2116.png 424w, https://substackcdn.com/image/fetch/$s_!PqJZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36f86e7b-06fe-4af0-9189-a3fad173f12e_1648x2116.png 848w, https://substackcdn.com/image/fetch/$s_!PqJZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36f86e7b-06fe-4af0-9189-a3fad173f12e_1648x2116.png 1272w, https://substackcdn.com/image/fetch/$s_!PqJZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36f86e7b-06fe-4af0-9189-a3fad173f12e_1648x2116.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PqJZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36f86e7b-06fe-4af0-9189-a3fad173f12e_1648x2116.png" width="1456" height="1869" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/36f86e7b-06fe-4af0-9189-a3fad173f12e_1648x2116.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1869,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:917708,&quot;alt&quot;:&quot;migrate junit4 tests in executor to junit5&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.e6data.com/i/169209738?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36f86e7b-06fe-4af0-9189-a3fad173f12e_1648x2116.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="migrate junit4 tests in executor to junit5" title="migrate junit4 tests in executor to junit5" srcset="https://substackcdn.com/image/fetch/$s_!PqJZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36f86e7b-06fe-4af0-9189-a3fad173f12e_1648x2116.png 424w, https://substackcdn.com/image/fetch/$s_!PqJZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36f86e7b-06fe-4af0-9189-a3fad173f12e_1648x2116.png 848w, https://substackcdn.com/image/fetch/$s_!PqJZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36f86e7b-06fe-4af0-9189-a3fad173f12e_1648x2116.png 1272w, https://substackcdn.com/image/fetch/$s_!PqJZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36f86e7b-06fe-4af0-9189-a3fad173f12e_1648x2116.png 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">migrate junit4 tests in executor to junit5</figcaption></figure></div><p>&#128190; <strong>Databases Chase Unified Memory Limits</strong><br>An <a href="https://buttondown.com/jaffray/archive/unified-memory-management/">in-depth post</a> argues modern query engines must police themselves with strict memory caps and spill-to-disk safety nets&#8212;language choice matters.<br>&#8226; Two must-haves: kill a query before it OOMs, and spill gracefully when it would<br>&#8226; Go&#8217;s silent allocations make tracking painful; Rust&#8217;s lack of fallible allocators is &#8220;annoying&#8221; but fixable<br>&#8226; Teams layer custom allocators/monitors to meter both memory and CPU for 
every pipeline operator<br><strong>Our Takeaway:</strong> Expect next-gen engines to ship with first-class resource governors, not bolt-ons.</p><p>&#128678; <strong>Stopping the Bronze Data Deluge</strong><br><a href="https://www.reddit.com/r/dataengineering/comments/1m7hlxu/how_do_you_handle_incremental_full_loads_in_a/">A Reddit thread</a> tackles medallion-layer mayhem: incremental feeds piggy-back on a full-load pipeline and accidentally delete 95% of the data.<br>&#8226; Hash-based &#8220;delete-on-absence&#8221; logic breaks with sparse incrementals, because every row missing from a partial feed looks deleted<br>&#8226; Community suggests adding <code>load_type</code>, <code>is_current</code>, and soft-delete flags instead of hard deletes<br>&#8226; Keep only 30&#8211;90 days in Bronze; archive deep history cold and manage SCD 2 in Silver<br><strong>Our Takeaway:</strong> Treat Bronze as a transient landing zone, preserve history higher up, and let incrementals append, not annihilate.</p><p>&#129482; <strong>Delta, Iceberg &amp; Hudi: Same Parquet, Different Vibes</strong><br>Another <a href="https://www.reddit.com/r/dataengineering/comments/1m517th/why_do_delta_iceberg_and_hudi_all_feel_the_same/">lively debate</a> says the three lakehouse table formats feel like 
Oracle, Postgres and MySQL: different logos, same spoon.<br>&#8226; Hudi wins for low-latency streaming upserts; Delta shines in Databricks&#8217; managed world; Iceberg rules with vendor-neutral engine support<br>&#8226; All wrap Parquet files with metadata layers; feature gaps are narrowing fast<br>&#8226; Consensus: as of 2025, Iceberg is the safest long-term bet for neutrality across Snowflake, Flink, Trino &amp; friends<br><strong>Our Takeaway:</strong> Master the core concepts (partitioning, snapshots, manifest files) and you can hop between formats as ecosystems blur. Btw, we launched <a href="https://www.e6data.com/blog/improved-support-iceberg-polaris-hudi-delta-lake">improved support for Iceberg, Polaris, Hudi, and Delta Lake</a> this week.</p><div><hr></div><h3>&#128161; Product Launch</h3><p><strong>Product Update: Improved support for Iceberg, Polaris, Hudi &amp; Delta Lake</strong></p><p>Our improved support lets you natively read and browse all four open-table formats <a href="https://docs.e6data.com/product-documentation/release-notes-and-updates/23rd-july-2025">(Iceberg, Polaris, Hudi, Delta Lake) inside one e6data workspace</a> with zero-copy ingestion, cross-catalog joins, Polaris RBAC/Unity Catalog enforcement, and a unified, metadata-aware query path.</p><h3>&#128218; Blog Release</h3><p><strong>Snowflake Snowpipe &#8594; Managed Iceberg Tables (with sync checks)</strong></p><p>Our lead engineer wrote a complete <a href="https://www.e6data.com/blog/snowflake-modern-data-pipeline-snowpipe-managed-iceberg-tables">blueprint for building a modern Snowflake pipeline</a> that ingests with <strong>Snowpipe</strong>, transforms via SQL/stored procedures, writes to <strong>Snowflake&#8209;managed Iceberg tables</strong>, and adds <strong>row&#8209;level sync checks</strong>, all while keeping the data open and queryable by any Iceberg-compatible engine. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3uV2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f494e7-e478-4715-abdd-4eed7ead67fa_3808x1922.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3uV2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f494e7-e478-4715-abdd-4eed7ead67fa_3808x1922.png 424w, https://substackcdn.com/image/fetch/$s_!3uV2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f494e7-e478-4715-abdd-4eed7ead67fa_3808x1922.png 848w, https://substackcdn.com/image/fetch/$s_!3uV2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f494e7-e478-4715-abdd-4eed7ead67fa_3808x1922.png 1272w, https://substackcdn.com/image/fetch/$s_!3uV2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f494e7-e478-4715-abdd-4eed7ead67fa_3808x1922.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3uV2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f494e7-e478-4715-abdd-4eed7ead67fa_3808x1922.png" width="1456" height="735" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94f494e7-e478-4715-abdd-4eed7ead67fa_3808x1922.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:735,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:568282,&quot;alt&quot;:&quot;Snowflake Modern 
Data Pipeline Diagram&#8239;&#8211;&#8239;Snowpipe Ingestion to Managed&#8239;Iceberg Tables with Sync&#8239;Checks&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.e6data.com/i/169209738?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f494e7-e478-4715-abdd-4eed7ead67fa_3808x1922.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Snowflake Modern Data Pipeline Diagram&#8239;&#8211;&#8239;Snowpipe Ingestion to Managed&#8239;Iceberg Tables with Sync&#8239;Checks" title="Snowflake Modern Data Pipeline Diagram&#8239;&#8211;&#8239;Snowpipe Ingestion to Managed&#8239;Iceberg Tables with Sync&#8239;Checks" srcset="https://substackcdn.com/image/fetch/$s_!3uV2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f494e7-e478-4715-abdd-4eed7ead67fa_3808x1922.png 424w, https://substackcdn.com/image/fetch/$s_!3uV2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f494e7-e478-4715-abdd-4eed7ead67fa_3808x1922.png 848w, https://substackcdn.com/image/fetch/$s_!3uV2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f494e7-e478-4715-abdd-4eed7ead67fa_3808x1922.png 1272w, https://substackcdn.com/image/fetch/$s_!3uV2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f494e7-e478-4715-abdd-4eed7ead67fa_3808x1922.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Snowflake Modern Data Pipeline Diagram&#8239;&#8211;&#8239;Snowpipe Ingestion to Managed&#8239;Iceberg Tables with Sync&#8239;Checks</figcaption></figure></div><p><strong>Key highlights:</strong></p><ul><li><p>End&#8209;to&#8209;end SQL: landing/core tables, <code>CREATE ICEBERG TABLE</code>, and a <code>transform_orders()</code> stored procedure.</p></li><li><p><strong>Snowpipe AUTO_INGEST</strong> from S3 to landing tables; continuous or scheduled loads.</p></li><li><p><strong>Data quality &amp; sync checks</strong> (row counts, diffs, optional checksums) between curated and Iceberg tables.</p></li><li><p><strong>Snowflake Tasks</strong> to automate the daily run (<code>CRON 0 2 * * * UTC</code>).</p></li><li><p><strong>Open catalog, multi&#8209;engine 
interoperability</strong>: query the same Iceberg data from Snowflake, Spark, Trino, Presto, e6data, etc.</p></li><li><p>Benefits called out: automated ingestion, a curated &amp; governed analytics layer, and zero lock&#8209;in via Iceberg&#8217;s open metadata</p></li></ul><div><hr></div><h3>&#129309; Community &amp; Events</h3><ul><li><p><strong>Lakehouse Days Replay</strong> &#8211; ICYMI, the full recording of <strong>&#8220;<a href="https://lu.ma/kzs7jh63">From Stream to Lakehouse</a>&#8221;</strong> is on <strong><a href="https://www.youtube.com/channel/UC2Q6NW9E3lbUKrYm551fwDA">e6data&#8217;s YouTube</a></strong>.</p></li><li><p><strong><a href="https://www.e6data.com/blog/snowflake-modern-data-pipeline-snowpipe-managed-iceberg-tables">Building a Modern Data Pipeline in Snowflake: From Snowpipe to Managed Iceberg Tables with Sync Checks</a></strong>: an in-depth blog series by our engineering team</p></li><li><p><strong>Meet Us Today at <a href="https://machinecon.aimmediahouse.com/">MachineCon USA</a></strong> &#8211; We&#8217;ll be on-site; come say hi!</p></li><li><p><strong>Hiring:</strong> We&#8217;re growing! Check out open engineering roles [<strong><a href="https://e6data.zohorecruit.in/jobs/Careers">here</a></strong>]</p></li></ul>
]]></content:encoded></item><item><title><![CDATA[How to Rust CGP, Iceberg's the wrong spec?, Spark's testimony, Froid-inspired UDF engine, and more]]></title><description><![CDATA[Today we talk about Rust again, why Iceberg might have a metadata issue, Spark's future, testing frameworks, and building a better UDF-engine inspired by Microsoft&#8217;s Froid framework.]]></description><link>https://newsletter.e6data.com/p/how-to-not-vibe-code-iceberg-the</link><guid isPermaLink="false">https://newsletter.e6data.com/p/how-to-not-vibe-code-iceberg-the</guid><dc:creator><![CDATA[Data Engineering ACID | 
e6data]]></dc:creator><pubDate>Fri, 18 Jul 2025 12:30:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!F7cS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b8fd246-4aea-4560-bb70-be076c396788_1440x810.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!F7cS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b8fd246-4aea-4560-bb70-be076c396788_1440x810.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F7cS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b8fd246-4aea-4560-bb70-be076c396788_1440x810.jpeg 424w, https://substackcdn.com/image/fetch/$s_!F7cS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b8fd246-4aea-4560-bb70-be076c396788_1440x810.jpeg 848w, https://substackcdn.com/image/fetch/$s_!F7cS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b8fd246-4aea-4560-bb70-be076c396788_1440x810.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!F7cS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b8fd246-4aea-4560-bb70-be076c396788_1440x810.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F7cS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b8fd246-4aea-4560-bb70-be076c396788_1440x810.jpeg" width="1440" height="810" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3b8fd246-4aea-4560-bb70-be076c396788_1440x810.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:810,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;The Godfather Review | Movie - Empire&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Godfather Review | Movie - Empire" title="The Godfather Review | Movie - Empire" srcset="https://substackcdn.com/image/fetch/$s_!F7cS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b8fd246-4aea-4560-bb70-be076c396788_1440x810.jpeg 424w, https://substackcdn.com/image/fetch/$s_!F7cS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b8fd246-4aea-4560-bb70-be076c396788_1440x810.jpeg 848w, https://substackcdn.com/image/fetch/$s_!F7cS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b8fd246-4aea-4560-bb70-be076c396788_1440x810.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!F7cS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b8fd246-4aea-4560-bb70-be076c396788_1440x810.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h3>&#129302; Rust Context-Generic Programming (CGP) in a Nutshell</h3><p>A very <a href="https://medium.com/lifefunk/understanding-rust-cgp-context-generic-programming-not-so-a-beginners-guide-9c09be297dc4">insightful take on Rust CGP</a> took over our engineering team this week. Here are a few highlights:</p><ul><li><p><strong>CGP swaps &#8220;objects&#8221; for contexts</strong>: functions declare the few capabilities they need via consumer traits, and any struct that offers those capabilities can satisfy them. The compiler wires everything together, so there&#8217;s no runtime cost. 
</p></li><li><p><strong>It tackles Rust&#8217;s big headaches</strong> (bloated trait APIs, orphan-rule conflicts, forced public types, copy-pasted code) by cleanly splitting provider and consumer roles and letting overlapping implementations live side by side.</p></li><li><p>Power tools include <code>HasField</code> and <code>cgp_auto_getter</code> macros, <strong>extensible records/builders</strong> for piecemeal struct construction, and <strong>extensible variants/visitors</strong> for enums that grow without breaking old code. </p></li></ul><p><strong>Our Takeaway</strong>: Use CGP when systems have lots of inter-dependent services or tight performance demands; stick with plain trait-object DI for small, stable parts. CGP also keeps tests easy: swap contexts to mix real and mock back-ends.</p><p></p><h3>&#129482; Iceberg, The Right Idea &#8211; The Wrong Spec</h3><p>A <a href="https://database-doctor.com/posts/iceberg-is-wrong-1.html">contrarian post by the Database Doctor</a> called <strong>&#8220;Iceberg, The Right Idea &#8211; The Wrong Spec&#8221;</strong> draws on storage history to argue that object-store tables risk repeating 1990s storage issues:</p><ul><li><p>The author shows how old-school filesystems and today&#8217;s object stores <strong>choke on millions of tiny files</strong>, lock handling, and fragmentation; in contrast, classic databases solved these &#8220;space-management&#8221; headaches decades ago.</p></li><li><p>Because <strong>databases already handle storage, compression, atomic writes</strong>, and bit-rot, pushing data into object stores just shifts cost and pain to users while giving cloud vendors more lock-in. 
</p></li><li><p>Data lake tables still need fast, tiny metadata updates, yet <strong>object stores are not the place for that</strong>; even at multi-petabyte scale, metadata stays small enough to fit on a single server, so a real metadata database beats file blobs.</p></li><li><p>Conclusion: <strong>openness matters</strong>, but be wary until Iceberg tackles metadata and lock-in as cleanly as proven database engines do.</p></li></ul><p><strong>Our Takeaway:</strong> Before betting 100% on open-table formats, scrutinise how they handle metadata at scale to unlock the best ROI. <br></p><h3>&#128165; <strong>Spark: Still Got the Fire?</strong></h3><p>Another <a href="https://www.reddit.com/r/dataengineering/comments/1lenuvj/how_many_of_you_are_still_using_apache_spark_in/">r/dataengineering thread</a> asks whether Apache Spark is pass&#233; in 2025 and finds the community surprisingly loyal. The verdict: <strong>&#8220;Yes, when the data&#8217;s </strong><em><strong>truly</strong></em><strong> big.&#8221;</strong></p><ul><li><p><strong>Petabyte shops</strong> say Spark &#8220;just works&#8221; for multi-PB tables </p></li><li><p>Critics highlight launch <strong>latency and opaque debugging</strong>, steering smaller teams to Polars first </p></li><li><p>Hidden cost of &#8220;<strong>tool churn</strong>&#8221;: migrating stacks as volumes grow burns hiring and context-switching budget </p></li><li><p>Many would <em><strong>still</strong></em><strong> choose Spark</strong> once workloads cross the petabyte barrier; below that, lightweight engines win on ergonomics</p></li></ul><p><strong>Our Takeaway:</strong> Spark isn&#8217;t dead; just reserve it for workloads that justify the cluster costs.<br></p><h3>&#129514; <strong>Unit Tests &#8800; Data Quality Checks</strong></h3><p>Another great <a 
href="https://www.reddit.com/r/dataengineering/comments/1ljq8r4/unit_tests_data_quality_checks_cmv/?utm_source=share&amp;utm_medium=web3x&amp;utm_name=web3xcss&amp;utm_term=1&amp;utm_content=share_button">Reddit debate</a> <strong>separates build-time tests from run-time data validation.</strong></p><ul><li><p><strong>Unit testing</strong> is about making sure that a dependency change or code refactor doesn&#8217;t introduce code that gives wrong results. Integration and e2e tests check that the whole integrated pipeline performs as expected. </p></li><li><p><strong>Data quality checks</strong> verify the integrity of production data each time it flows. They are a &#8220;runtime&#8221; construct, i.e., they run after your code is released.</p></li><li><p>The community agrees the two are complementary; <strong>DQ systems should be platform-agnostic and drive feedback to data stewards. </strong></p></li><li><p><strong>Overlap is rare</strong>: unit tests may cover complex transforms, but DQ thresholds still catch unexpected business-data anomalies.</p></li></ul><p><strong>Our Takeaway:</strong> Keep CI tests and data-quality observability as distinct layers; each fails fast in its own domain.</p><p></p><h3>We have made a faster Froid-inspired UDF engine!</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tVFM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f28a7b8-98ec-4738-92a3-118d9d1e6641_1858x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tVFM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f28a7b8-98ec-4738-92a3-118d9d1e6641_1858x600.png 424w, 
https://substackcdn.com/image/fetch/$s_!tVFM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f28a7b8-98ec-4738-92a3-118d9d1e6641_1858x600.png 848w, https://substackcdn.com/image/fetch/$s_!tVFM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f28a7b8-98ec-4738-92a3-118d9d1e6641_1858x600.png 1272w, https://substackcdn.com/image/fetch/$s_!tVFM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f28a7b8-98ec-4738-92a3-118d9d1e6641_1858x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tVFM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f28a7b8-98ec-4738-92a3-118d9d1e6641_1858x600.png" width="1456" height="470" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f28a7b8-98ec-4738-92a3-118d9d1e6641_1858x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:470,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:174656,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.e6data.com/i/168618814?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f28a7b8-98ec-4738-92a3-118d9d1e6641_1858x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!tVFM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f28a7b8-98ec-4738-92a3-118d9d1e6641_1858x600.png 424w, https://substackcdn.com/image/fetch/$s_!tVFM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f28a7b8-98ec-4738-92a3-118d9d1e6641_1858x600.png 848w, https://substackcdn.com/image/fetch/$s_!tVFM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f28a7b8-98ec-4738-92a3-118d9d1e6641_1858x600.png 1272w, https://substackcdn.com/image/fetch/$s_!tVFM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f28a7b8-98ec-4738-92a3-118d9d1e6641_1858x600.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Why do Developers Reach for UDFs?</figcaption></figure></div><ul><li><p>UDFs let you keep loops and business rules right inside SQL, but the usual multi-statement kind makes the engine hop out for every row, <strong>killing speed.</strong> </p></li><li><p>That drag comes from three things: context-switch overhead, row-by-row work, and an optimizer that can&#8217;t peek inside the function, <strong>which can make some queries 10&#8211;1000&#215; slower.</strong> </p></li><li><p>Databricks and Snowflake sidestep the pain by only allowing single-expression UDFs, which forces devs to cram logic into one line or shift it to external code. </p></li><li><p><strong>e6data&#8217;s Froid-style &#8220;inliner&#8221; dissolves multi-statement UDFs back into one relational plan</strong>, so the optimizer can parallelize freely and queries run orders of magnitude faster, without changing user code.</p></li></ul><p></p><h3>Community &amp; Events</h3><ul><li><p><strong>Lakehouse Days Replay</strong> &#8211; The full recording of <strong>&#8220;<a href="https://lu.ma/kzs7jh63">From Stream to Lakehouse</a>&#8221;</strong> is now live on <strong><a href="https://www.youtube.com/channel/UC2Q6NW9E3lbUKrYm551fwDA">e6data&#8217;s YouTube</a></strong>.</p></li><li><p><strong><a href="https://www.e6data.com/blog/inside-e6datas-froid-inspired-udf-engine">Inside e6data&#8217;s Froid-Inspired UDF Engine</a></strong>: an in-depth blog series by our engineering team</p></li><li><p><strong>Meet Us at <a 
href="https://machinecon.aimmediahouse.com/">MachineCon USA</a></strong> &#8211; We&#8217;ll be on-site; come say hi!</p></li><li><p><strong>Hiring:</strong> We&#8217;re growing! Check out open engineering roles [<strong><a href="https://e6data.zohorecruit.in/jobs/Careers">here</a></strong>]</p></li></ul>]]></content:encoded></item><item><title><![CDATA[How to Rust, Cursor's vector search dump, Supabase MCP's leak (?), Apache Arrow Summit, and more]]></title><description><![CDATA[Every Friday, we deliver your weekend win: copy-paste tutorial, cost-optimisation technique, CFPs worth your pitch, and fresh ideas from the field. 
Stop surfing fluff.]]></description><link>https://newsletter.e6data.com/p/how-to-rust-cursors-vector-search</link><guid isPermaLink="false">https://newsletter.e6data.com/p/how-to-rust-cursors-vector-search</guid><dc:creator><![CDATA[Data Engineering ACID | e6data]]></dc:creator><pubDate>Fri, 11 Jul 2025 13:03:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ItDt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a51e279-cb33-4101-937e-9922a3f72e07_1024x425.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ItDt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a51e279-cb33-4101-937e-9922a3f72e07_1024x425.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ItDt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a51e279-cb33-4101-937e-9922a3f72e07_1024x425.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ItDt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a51e279-cb33-4101-937e-9922a3f72e07_1024x425.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ItDt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a51e279-cb33-4101-937e-9922a3f72e07_1024x425.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ItDt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a51e279-cb33-4101-937e-9922a3f72e07_1024x425.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ItDt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a51e279-cb33-4101-937e-9922a3f72e07_1024x425.jpeg" width="1024" height="425" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a51e279-cb33-4101-937e-9922a3f72e07_1024x425.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:425,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Everything Everywhere All At Once (2022) &#8211; Scene by Green&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Everything Everywhere All At Once (2022) &#8211; Scene by Green" title="Everything Everywhere All At Once (2022) &#8211; Scene by Green" srcset="https://substackcdn.com/image/fetch/$s_!ItDt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a51e279-cb33-4101-937e-9922a3f72e07_1024x425.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ItDt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a51e279-cb33-4101-937e-9922a3f72e07_1024x425.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ItDt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a51e279-cb33-4101-937e-9922a3f72e07_1024x425.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!ItDt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a51e279-cb33-4101-937e-9922a3f72e07_1024x425.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h3>&#129408; <strong>How to Write Idiomatic Rust</strong></h3><p><strong>David Drysdale&#8217;s freely available </strong><em><strong><a href="https://www.lurklurk.org/effective-rust/">Effective Rust</a></strong></em> condenses years of field experience into short, actionable items here:</p><ul><li><p><strong>Six chapters</strong> cover types, 
traits, concepts, dependencies, tooling and beyond, for 35 focused lessons in total.</p></li><li><p>Rust&#8217;s rigorous type system is the star: if it compiles, it probably works, though newcomers still wrestle with lifetimes and the borrow checker.</p></li><li><p><strong>Practical advice</strong> spans Clippy linting, CI setup and taming dependency creep for production-ready crates.</p></li></ul><p><strong>Our Takeaway:</strong> Must-read. Rust is taking over codebases.</p><p></p><h3>&#128269; <strong>Why Cursor Might Dump Vector Search for Plain Text</strong></h3><p>A <strong><a href="https://www.linkedin.com/posts/jjackyliang_i-have-a-bold-prediction-cursor-is-going-activity-7346586323849211904-Ou6r/">LinkedIn post</a> by Jacky Liang from Tigerdata</strong> argues that grep-style lexical search beats embeddings for code, <strong>and that Cursor could follow Claude Code&#8217;s lead.</strong></p><ul><li><p>The post claims Claude Code uses only exact-match tools like <code>grep</code> and still outperforms Cursor&#8217;s vector pipeline for code retrieval.</p></li><li><p>The architects behind Claude Code are joining Cursor, sparking talk of ripping out vectors or relegating them to coarse filtering.</p></li><li><p>Commenters note a broader industry shift: vectors excel at semantic chat, but keyword search wins for structured codebases.</p></li></ul><p><strong>Our Takeaway:</strong> Expect a renaissance of smarter lexical search in developer tools; embeddings aren&#8217;t always the right hammer.</p><p></p><h3>&#128680; <strong>Supabase MCP Can Dump Your Private Tables?</strong></h3><p>A <strong><a href="https://www.generalanalysis.com/blog/supabase-mcp-blog">new write-up</a></strong> shows how a single malicious support-ticket message can make <strong>Supabase&#8217;s MCP</strong> dump your private tables.</p><ul><li><p>An attacker embeds SQL-style instructions in a ticket; the Cursor IDE&#8217;s LLM agent obediently executes them and posts the results back to the 
thread.</p></li><li><p>Because the agent runs with the all-powerful <code>service_role</code>, it sails past row-level security and can read anything.</p></li><li><p><strong>Fixes proposed</strong>: run MCP in read-only mode and add prompt-injection filters to scrub user text before it hits the model.</p></li></ul><p><strong>Our Takeaway:</strong> Be extra careful with IAM now that LLM agents act on your credentials.<br></p><h3>&#128161; <strong>Apache Arrow Summit 2025 &#8211; CFP Now Open!</strong></h3><p><strong><a href="https://lnkd.in/dTZ39Ty2">Apache Arrow Summit</a></strong> is on <strong>2 Oct 2025 in Paris</strong>, hosted by <strong>PyData Paris</strong>. It&#8217;s the place to gather the Arrow community, share what&#8217;s coming next, and spark new ideas.</p><ul><li><p>Call for Proposals is <strong>open now</strong> &#8211; submit talks by <strong>26 July</strong>.</p></li><li><p>Topics welcome: Arrow internals, ecosystem tooling, real-world use cases, roadmap discussions.</p></li><li><p>Meet contributors, users and maintainers face-to-face in Paris!<br></p></li></ul><h3><strong>Community &amp; Events:</strong></h3><ul><li><p><strong>Event:</strong> <em>Lakehouse Days</em>: &#8220;<strong>Real&#8209;time Streaming Ingest</strong>&#8221; is <strong>tomorrow</strong> at the <strong>AWS Office, Bangalore</strong> with speakers from <strong>Confluent, AWS, and e6data</strong>. 
<em>Subscribe to the <strong><a href="https://lu.ma/Lakehouse-days-with-e6data">calendar</a></strong> to register early for future events!</em></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6Trk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Trk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png 424w, https://substackcdn.com/image/fetch/$s_!6Trk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png 848w, https://substackcdn.com/image/fetch/$s_!6Trk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png 1272w, https://substackcdn.com/image/fetch/$s_!6Trk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6Trk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png" width="271" height="316.2287087912088" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1699,&quot;width&quot;:1456,&quot;resizeWidth&quot;:271,&quot;bytes&quot;:3968834,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.e6data.com/i/167509682?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!6Trk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png 424w, https://substackcdn.com/image/fetch/$s_!6Trk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png 848w, https://substackcdn.com/image/fetch/$s_!6Trk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png 1272w, https://substackcdn.com/image/fetch/$s_!6Trk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>We are going to <strong><a href="https://machinecon.aimmediahouse.com/">MachineCon USA</a></strong> in New York City on 25 July!</p></li><li><p><strong>Hiring:</strong> We&#8217;re growing! 
Check out open engineering roles [<strong><a href="https://e6data.zohorecruit.in/jobs/Careers">here</a></strong>].</p></li></ul><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://newsletter.e6data.com/p/agent-built-junit-suites-cdc-vs-daily?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&amp;token=eyJ1c2VyX2lkIjozMzY5NTEzMTYsInBvc3RfaWQiOjE2NzUwOTY4MiwiaWF0IjoxNzUyMTc1Mzc3LCJleHAiOjE3NTQ3NjczNzcsImlzcyI6InB1Yi00ODUyMjE2Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.DQESL70eQ20xYvtmtAa6hdFZdbWsxLwOqCYmWYU1vbs&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading Data Engineering ACID! Please like, follow, and share if you liked it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.e6data.com/p/how-to-rust-cursors-vector-search?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.e6data.com/p/how-to-rust-cursors-vector-search?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><p></p>]]></content:encoded></item><item><title><![CDATA[Agent-Built JUnit Suites, CDC vs Daily Snapshots, Ducklake vs Iceberg, Queries, Costs, and more]]></title><description><![CDATA[Every Friday, we deliver your weekend win: copy-paste tutorial, cost-optimisation technique, CFPs worth your pitch, and fresh ideas from the field. 
Stop surfing fluff.]]></description><link>https://newsletter.e6data.com/p/agent-built-junit-suites-cdc-vs-daily</link><guid isPermaLink="false">https://newsletter.e6data.com/p/agent-built-junit-suites-cdc-vs-daily</guid><dc:creator><![CDATA[Data Engineering ACID | e6data]]></dc:creator><pubDate>Fri, 04 Jul 2025 13:02:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!G3W-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9ee9cf-fe0f-4bf3-a8c5-6f40d7cbdf76_720x536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NJdO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13643942-e489-405b-9dff-4dc66f8c88db_500x213.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NJdO!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13643942-e489-405b-9dff-4dc66f8c88db_500x213.gif 424w, https://substackcdn.com/image/fetch/$s_!NJdO!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13643942-e489-405b-9dff-4dc66f8c88db_500x213.gif 848w, https://substackcdn.com/image/fetch/$s_!NJdO!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13643942-e489-405b-9dff-4dc66f8c88db_500x213.gif 1272w, https://substackcdn.com/image/fetch/$s_!NJdO!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13643942-e489-405b-9dff-4dc66f8c88db_500x213.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!NJdO!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13643942-e489-405b-9dff-4dc66f8c88db_500x213.gif" width="722" height="307.572" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/13643942-e489-405b-9dff-4dc66f8c88db_500x213.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:213,&quot;width&quot;:500,&quot;resizeWidth&quot;:722,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Farce the Music: Famous Movie Scenes Country Reaction Gifs&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Farce the Music: Famous Movie Scenes Country Reaction Gifs" title="Farce the Music: Famous Movie Scenes Country Reaction Gifs" srcset="https://substackcdn.com/image/fetch/$s_!NJdO!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13643942-e489-405b-9dff-4dc66f8c88db_500x213.gif 424w, https://substackcdn.com/image/fetch/$s_!NJdO!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13643942-e489-405b-9dff-4dc66f8c88db_500x213.gif 848w, https://substackcdn.com/image/fetch/$s_!NJdO!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13643942-e489-405b-9dff-4dc66f8c88db_500x213.gif 1272w, https://substackcdn.com/image/fetch/$s_!NJdO!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13643942-e489-405b-9dff-4dc66f8c88db_500x213.gif 1456w" sizes="100vw" 
fetchpriority="high"></picture><div></div></div></a></figure></div><p></p><h3>&#128295; <strong>We</strong> used a few agents to bring in JUnit runner support for regression runs. It sort of worked?</h3><p>An engineer from our team pointed <strong>IntelliJ&#8217;s Junie agent</strong> (Claude 3.7 under the hood) at a hand-rolled regression harness (~2000 SQL queries) and asked it to surface each query as an IDE-runnable JUnit test. Here is what went down:</p><ul><li><p><strong>Agent #1: </strong><em><strong>Junie</strong></em><strong> + JUnit 4 reflection adapter.</strong> A single prompt asked Junie to analyse <code>RegressionTest.runTests</code> and emit an adapter that registers every query as a JUnit test. Junie delivered code that <strong>works</strong>, but it hinges on brittle reflection hooks and produces hash-based test names with no Run/Debug actions.</p></li><li><p><strong>Agent #1, retry: JUnit 5 Parameterized / Dynamic tests.</strong> After we nudged Junie toward Jupiter, it rewrote the runner, solving the name issue but introducing new ones: incremental discovery kept appending tests mid-run, only ~70% of the 2006 tests were detected, and per-test stdout/stderr never reached the console.</p></li><li><p><strong>Edge cases that still bite.</strong></p><ul><li><p>The IDE agent loops on feedback instead of asking for help.</p></li><li><p>Environment variables (default schema, storage endpoint) are ignored, so failed tests hit remote buckets.</p></li></ul></li></ul><p><strong>Our Takeaway:</strong> IDE agents can scaffold a runner in minutes, but deep plumbing (stable discovery, clean logs, env-aware execution) still needs a human who knows JUnit extension points. 
<strong>Treat today&#8217;s agents as junior devs: great at boilerplate, lost in integration hell.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G3W-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9ee9cf-fe0f-4bf3-a8c5-6f40d7cbdf76_720x536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G3W-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9ee9cf-fe0f-4bf3-a8c5-6f40d7cbdf76_720x536.png 424w, https://substackcdn.com/image/fetch/$s_!G3W-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9ee9cf-fe0f-4bf3-a8c5-6f40d7cbdf76_720x536.png 848w, https://substackcdn.com/image/fetch/$s_!G3W-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9ee9cf-fe0f-4bf3-a8c5-6f40d7cbdf76_720x536.png 1272w, https://substackcdn.com/image/fetch/$s_!G3W-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9ee9cf-fe0f-4bf3-a8c5-6f40d7cbdf76_720x536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G3W-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9ee9cf-fe0f-4bf3-a8c5-6f40d7cbdf76_720x536.png" width="420" height="312.6666666666667" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ae9ee9cf-fe0f-4bf3-a8c5-6f40d7cbdf76_720x536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:536,&quot;width&quot;:720,&quot;resizeWidth&quot;:420,&quot;bytes&quot;:98011,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.e6data.com/i/167509682?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9ee9cf-fe0f-4bf3-a8c5-6f40d7cbdf76_720x536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!G3W-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9ee9cf-fe0f-4bf3-a8c5-6f40d7cbdf76_720x536.png 424w, https://substackcdn.com/image/fetch/$s_!G3W-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9ee9cf-fe0f-4bf3-a8c5-6f40d7cbdf76_720x536.png 848w, https://substackcdn.com/image/fetch/$s_!G3W-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9ee9cf-fe0f-4bf3-a8c5-6f40d7cbdf76_720x536.png 1272w, https://substackcdn.com/image/fetch/$s_!G3W-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9ee9cf-fe0f-4bf3-a8c5-6f40d7cbdf76_720x536.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" 
fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">BTS of how it went down. :p</figcaption></figure></div><p></p><h3>&#128260; <strong>CDC vs. 
Daily Snapshots?</strong></h3><p>Data engineers on <a href="https://www.reddit.com/r/dataengineering/comments/1llt6y9/do_you_use_cdc_if_yes_how_does_it_benefit_you/">Reddit</a> recently argued (and sort of concluded) that <strong>CDC is cheaper, safer and more granular than nightly full snapshots:</strong></p><ul><li><p>Pulling changes straight from the DB WAL avoids table locks and <code>SELECT *</code> hell.</p></li><li><p>Real-time extraction doesn&#8217;t force real-time delivery: buffer the changes and land them hourly or daily.</p></li><li><p>For tables with 100M+ rows, daily snapshots hammer network &amp; compute; Debezium-style CDC streams are faster.</p></li></ul><p><strong>Our Takeaway:</strong> Unless your tables are tiny, CDC + append-only logs win on both cost and correctness, while still letting teams materialize snapshot views whenever they like.</p><p></p><h3>&#129414; <strong>DuckLake vs. Iceberg: Metadata Wars (Again?)</strong></h3><p><a href="https://www.reddit.com/r/dataengineering/comments/1lmmhz4/will_ducklake_overtake_iceberg/">r/dataengineering</a> is hot with another metadata war this week:</p><ul><li><p><strong>One-binary catalog.</strong> DuckLake boots from a single DuckDB file; you&#8217;re querying with plain SQL in minutes, no Hive metastore, no Spark session spin-up.</p></li><li><p><strong>Adoption hurdle.</strong> Redditors call it a &#8220;pure improvement&#8221; over Iceberg but say they&#8217;ll jump only after Spark / Trino / Flink connectors and real petabyte tests land.</p></li><li><p><strong>Could just fold in.</strong> Even Iceberg fans note its catalog is already pluggable; a SQL-backed option like DuckLake might slip upstream instead of forking the ecosystem.</p></li></ul><p><strong>Our Takeaway:</strong> The fight isn&#8217;t about table files but about where metadata lives. 
We will watch out for connector support: once the major engines speak DuckLake, JSON catalogs might become legacy.</p><p></p><h3>&#128161; <strong>We&#8217;re cooking up a <a href="https://www.e6data.com/query-and-cost-optimization-hub">Query &amp; Cost Optimization Hub</a></strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mZ60!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ab7515-7716-46c5-82e0-d6f96a0c4b0e_2784x1502.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mZ60!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ab7515-7716-46c5-82e0-d6f96a0c4b0e_2784x1502.png 424w, https://substackcdn.com/image/fetch/$s_!mZ60!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ab7515-7716-46c5-82e0-d6f96a0c4b0e_2784x1502.png 848w, https://substackcdn.com/image/fetch/$s_!mZ60!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ab7515-7716-46c5-82e0-d6f96a0c4b0e_2784x1502.png 1272w, https://substackcdn.com/image/fetch/$s_!mZ60!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ab7515-7716-46c5-82e0-d6f96a0c4b0e_2784x1502.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mZ60!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ab7515-7716-46c5-82e0-d6f96a0c4b0e_2784x1502.png" width="713" height="384.9024725274725" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30ab7515-7716-46c5-82e0-d6f96a0c4b0e_2784x1502.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:786,&quot;width&quot;:1456,&quot;resizeWidth&quot;:713,&quot;bytes&quot;:472756,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.e6data.com/i/167509682?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ab7515-7716-46c5-82e0-d6f96a0c4b0e_2784x1502.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mZ60!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ab7515-7716-46c5-82e0-d6f96a0c4b0e_2784x1502.png 424w, https://substackcdn.com/image/fetch/$s_!mZ60!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ab7515-7716-46c5-82e0-d6f96a0c4b0e_2784x1502.png 848w, https://substackcdn.com/image/fetch/$s_!mZ60!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ab7515-7716-46c5-82e0-d6f96a0c4b0e_2784x1502.png 1272w, https://substackcdn.com/image/fetch/$s_!mZ60!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ab7515-7716-46c5-82e0-d6f96a0c4b0e_2784x1502.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We&#8217;re launching a hub that distills <a href="https://www.e6data.com/query-and-cost-optimization-hub">cost-optimising and latency-crushing playbooks</a>:</p><ul><li><p>One for every major compute engine: <strong>Snowflake, Databricks, BigQuery, Redshift, Athena, ClickHouse, Fabric, Starburst, and more.</strong> </p></li><li><p>Filter by engine, choose &#8220;query&#8221; or &#8220;cost&#8221; mode, and grab beginner-to-advanced tactics. 
</p></li></ul><p>First one is out with <strong><a href="https://www.e6data.com/query-and-cost-optimization-hub/snowflake-cost-optimization-15-proven-tactics-to-cut-your-snowflake-cost">Snowflake Cost Optimisation for Beginners</a></strong>, rest to follow soon.</p><p></p><h3><strong>Community &amp; Events:</strong></h3><ol><li><p><strong>Event:</strong> <em>Lakehouse Days</em> &#8212; &#8220;<strong>Real&#8209;time Streaming Ingest</strong>&#8221; on <strong>12 July, AWS Office, Bangalore</strong> with speakers from <strong>Confluent, AWS, and e6data</strong>. <em>Subscribe to the <strong><a href="https://lu.ma/Lakehouse-days-with-e6data">calendar</a></strong> and register before seats run out!</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6Trk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Trk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png 424w, https://substackcdn.com/image/fetch/$s_!6Trk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png 848w, https://substackcdn.com/image/fetch/$s_!6Trk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png 1272w, 
https://substackcdn.com/image/fetch/$s_!6Trk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6Trk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png" width="271" height="316.2287087912088" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1699,&quot;width&quot;:1456,&quot;resizeWidth&quot;:271,&quot;bytes&quot;:3968834,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.e6data.com/i/167509682?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6Trk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png 424w, https://substackcdn.com/image/fetch/$s_!6Trk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png 848w, 
https://substackcdn.com/image/fetch/$s_!6Trk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png 1272w, https://substackcdn.com/image/fetch/$s_!6Trk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F305313c1-492f-441b-ac5c-a9d592732983_4800x5600.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p></li><li><p><strong>Hiring:</strong> We&#8217;re growing! 
Check out open engineering roles [<strong><a href="https://e6data.zohorecruit.in/jobs/Careers">here</a></strong>].</p></li></ol><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://newsletter.e6data.com/p/agent-built-junit-suites-cdc-vs-daily?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading Data Engineering ACID! Please like, follow, and share if you liked it. </p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.e6data.com/p/agent-built-junit-suites-cdc-vs-daily?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.e6data.com/p/agent-built-junit-suites-cdc-vs-daily?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div>]]></content:encoded></item><item><title><![CDATA[How to check Snowflake costs, My slowing Spark pipeline, IndiCS CFPs, NL2SQL MCP, and more]]></title><description><![CDATA[Every Friday, we deliver your weekend win: copy-paste tutorial, cost-optimisation technique, CFPs worth your pitch, and fresh ideas from the field. 
Stop surfing fluff.]]></description><link>https://newsletter.e6data.com/p/how-to-check-snowflake-costs-my-slowing</link><guid isPermaLink="false">https://newsletter.e6data.com/p/how-to-check-snowflake-costs-my-slowing</guid><dc:creator><![CDATA[Data Engineering ACID | e6data]]></dc:creator><pubDate>Fri, 27 Jun 2025 13:02:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!D8kd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de1f557-8af6-466e-b72f-323c1141e6b1_3200x1808.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!D8kd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de1f557-8af6-466e-b72f-323c1141e6b1_3200x1808.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!D8kd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de1f557-8af6-466e-b72f-323c1141e6b1_3200x1808.jpeg 424w, https://substackcdn.com/image/fetch/$s_!D8kd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de1f557-8af6-466e-b72f-323c1141e6b1_3200x1808.jpeg 848w, https://substackcdn.com/image/fetch/$s_!D8kd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de1f557-8af6-466e-b72f-323c1141e6b1_3200x1808.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!D8kd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de1f557-8af6-466e-b72f-323c1141e6b1_3200x1808.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!D8kd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de1f557-8af6-466e-b72f-323c1141e6b1_3200x1808.jpeg" width="1456" height="823" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9de1f557-8af6-466e-b72f-323c1141e6b1_3200x1808.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:823,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Ryan Gosling, Brad Pitt go full un-hot in 'Big Short'&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Ryan Gosling, Brad Pitt go full un-hot in 'Big Short'" title="Ryan Gosling, Brad Pitt go full un-hot in 'Big Short'" srcset="https://substackcdn.com/image/fetch/$s_!D8kd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de1f557-8af6-466e-b72f-323c1141e6b1_3200x1808.jpeg 424w, https://substackcdn.com/image/fetch/$s_!D8kd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de1f557-8af6-466e-b72f-323c1141e6b1_3200x1808.jpeg 848w, https://substackcdn.com/image/fetch/$s_!D8kd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de1f557-8af6-466e-b72f-323c1141e6b1_3200x1808.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!D8kd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9de1f557-8af6-466e-b72f-323c1141e6b1_3200x1808.jpeg 1456w" sizes="100vw" 
fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h3>&#129302; Is Your Snowflake Bill <em>Really</em> High? Here is How to Check</h3><ul><li><p><strong>Warehouse size</strong> &#8211; Anything above <strong>LARGE (&gt; 8 credits/hr/cluster)</strong> is too much. 
Cost doubles each size step (XS 1 &#8594; 4XL 128 credits/hr); leave a 4XL up for 8 hrs and you burn 1,024 credits, or <strong>$2 k&#8211;$6 k</strong> at $2&#8211;$6 per credit.</p></li><li><p><strong>Idle ratio</strong> &#8211; If a warehouse sits <strong>idle &gt; 30&#8211;40 %</strong> of the time, you&#8217;re paying for nothing (billing is per-second, with a 60-second minimum each time a warehouse resumes).</p></li><li><p><strong>Cloud-services share</strong> &#8211; When cloud-services charges exceed <strong>10 %</strong> of daily compute, you&#8217;re past the free allowance and likely over-using serverless features.</p></li><li><p><strong>Query cost</strong> &#8211; An interactive query over <strong>0.1 credit (~$0.20&#8211;$0.60)</strong> starts to sting; dashboard queries usually finish under <strong>0.01 credit</strong>.</p></li><li><p><strong>Account run-rate</strong> &#8211; If your 90-day burn rate is <strong>&gt;20 % above</strong> your annual commitment pace, you&#8217;re in high-cost territory.</p></li><li><p><strong>Serverless features</strong> &#8211; Snowpipe, materialised views, auto-clustering, etc. should stay <strong>&lt;15 %</strong> of total compute; once they creep higher, review them.</p></li></ul><p><strong>Takeaway</strong> &#8211; If two or more bullets fire, you have real savings hiding in plain sight. <strong>Watch next week for our beginner &amp; advanced optimisation guides.</strong></p><p></p><h3>&#128227; Call for Proposals: ACM India <strong>IndiCS</strong> Seminars</h3><p><strong><a href="https://india.acm.org/research/indics">IndiCS</a></strong> is ACM India&#8217;s Dagstuhl&#8209;style, fully&#8209;immersive seminar series that brings 45 invited researchers together for 3&#8211;5&#8239;days of deep dives at rotating venues across India. <strong>The program funds student travel and encourages open&#8209;ended collaboration on frontier CS topics.</strong></p><p><strong>Next deadline:</strong> <strong>31 July 2025</strong> (for seminars in autumn/winter 2025). 
A second window closes <strong>31 December 2025</strong> for spring/summer 2026 slots. </p><p></p><h3>&#128012; <strong>Spark&#8217;s Hybrid Engine: Unified &#8800; Fast</strong></h3><p><strong>Vanilla Spark juggles three execution paths</strong>&#8212;row&#8209;oriented Volcano, JVM&#8209;generated WholeStage, and a thin slice of vectorization. That Swiss&#8209;army flexibility serves logs, streams, ML, and SQL in one runtime, but it also brings GC churn, branchy code, and frequent fall&#8209;backs when UDFs or exotic types appear.</p><p>For classic lakehouse OLAP (Parquet + flat schemas, few UDFs), a <em>fully</em> vectorized kernel wins. Vendors are swapping out Spark&#8217;s execution layer:</p><ul><li><p><strong>Databricks Photon</strong>, <strong>Apache Gluten</strong>, and <strong>DataFusion Comet</strong> drop C++/Rust kernels into the DAG and see <strong>2&#8211;4&#215; speed&#8209;ups</strong> with no API break.</p></li><li><p><strong>Spark&#8217;s own vector path is narrow</strong>&#8212;only some Parquet/ORC scans use SIMD batches&#8212;so WholeStage still dominates most queries.</p></li></ul><p><strong>Takeaway</strong>: <strong>Spark is slow for pure OLAP by design, not accident</strong>. If your workload is <strong>&gt;&#8239;90&#8239;% columnar SQL, a vector back&#8209;end or Arrow&#8209;native engine can reduce runtime and cost</strong>&#8212;keep vanilla Spark for truly mixed jobs. 
<a href="https://semyonsinchenko.github.io/ssinchenko/post/why-spark-is-slow/">Read more</a>.</p><p></p><h3>&#129514; <strong>Why Data Engineers Still Skip Tests &#8212; Reddit Reacts</strong></h3><p>An <strong><a href="https://www.reddit.com/r/dataengineering/comments/1lioeql/why_data_engineers_dont_test_according_to_reddit/">r/dataengineering thread</a></strong> boiled the problem down to four points:</p><ol><li><p><strong>Cost &amp; deadlines</strong> &#8211; quick&#8209;and&#8209;dirty DAGs ship first; bug&#8209;fixes bill to the next sprint.</p></li><li><p><strong>Skills gap</strong> &#8211; many DEs start in SQL/Excel land and meet <code>pytest</code> only after pain.</p></li><li><p><strong>Unstable inputs</strong> &#8211; schemas drift, APIs mutate; unit tests feel brittle, so teams lean on downstream DQ checks.</p></li><li><p><strong>Tooling debt</strong> &#8211; unclear lines between unit tests, data&#8209;quality checks, and prod monitors stall adoption despite dbt, GX, and SQLMesh.</p></li></ol><p><strong>Takeaway</strong>: testing is an org&#8209;level habit, not a novel tech problem. Seed the habit with lightweight assertions + CI on critical transforms&#8212;rerunning petabyte jobs is the expensive path.</p><p></p><h3>&#129504; <strong>Ask in English, Get Perfect SQL &#8212; Even Across Hundreds of Tables</strong></h3><p>We&#8217;re sipping our own Kool-Aid again at e6data: <strong>NL2SQL v0</strong> plugs into e6-MCP and turns a plain-language prompt into a rock-solid, schema-faithful query&#8212;even when your data store looks like a city map.</p><p>How? 
An <em>agentic</em> triple-play:</p><ul><li><p><strong>Vector search</strong> surfaces the likeliest tables &amp; columns</p></li><li><p><strong>Random-walk graph traversal</strong> traces real relationships, not guesswork</p></li><li><p><strong>Cross-attention re-ranker</strong> locks the winning set, then lets the agent self-reflect and patch any schema slip-ups</p></li></ul><p>The result: accurate answers, zero hard-coded limits. Where Databricks stalls at 25 tables and Snowflake taps out at 10, NL2SQL sails past&#8212;ready for enterprise-scale schemas.</p><p>&#10145;&#65039; <strong>Join the <a href="https://www.e6data.com/contact/demo">private beta</a></strong></p><p></p><h3><strong>Community &amp; Events:</strong></h3><ol><li><p><strong>Blog:</strong> <em>Iceberg Catalogs 2025</em> &#8212; a deep dive into modern metadata management across Project Nessie, Apache Gravitino, Apache Polaris, Lakekeeper, and more. [<strong><a href="https://www.e6data.com/blog/iceberg-catalogs-2025-emerging-catalogs-modern-metadata-management">Read here</a></strong>]</p></li><li><p><strong>Event:</strong> <em>Lakehouse Days</em> &#8212; &#8220;<strong>Real&#8209;time Streaming Ingest</strong>&#8221; on <strong>12 July, Bangalore</strong> with speakers from <strong>Confluent</strong>. <em>Subscribe to the <strong><a href="https://lu.ma/Lakehouse-days-with-e6data">calendar</a></strong> for early registration.</em></p></li><li><p><strong>Hiring:</strong> We&#8217;re growing! Check out open engineering roles [<strong><a href="https://e6data.zohorecruit.in/jobs/Careers">here</a></strong>].<br></p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://newsletter.e6data.com/p/how-to-check-snowflake-costs-my-slowing?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading Data Engineering ACID! 
This post is public so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.e6data.com/p/how-to-check-snowflake-costs-my-slowing?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.e6data.com/p/how-to-check-snowflake-costs-my-slowing?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><p><br></p></li></ol>]]></content:encoded></item><item><title><![CDATA[Deep-Fake Interviews, ETL Bottlenecks, and Shift-Left Showdowns]]></title><description><![CDATA[Your weekly sweep of the top 3 trends and hacks in data engineering from leading tech companies, plus a peek at what we&#8217;re building at e6data.]]></description><link>https://newsletter.e6data.com/p/deep-fake-interviews-elt-bottlenecks</link><guid isPermaLink="false">https://newsletter.e6data.com/p/deep-fake-interviews-elt-bottlenecks</guid><dc:creator><![CDATA[Data Engineering ACID | e6data]]></dc:creator><pubDate>Fri, 20 Jun 2025 13:31:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Elw2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd10a62-d1e3-4730-9e3a-db2cf87b3654_1600x900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Elw2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd10a62-d1e3-4730-9e3a-db2cf87b3654_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!Elw2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd10a62-d1e3-4730-9e3a-db2cf87b3654_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!Elw2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd10a62-d1e3-4730-9e3a-db2cf87b3654_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!Elw2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd10a62-d1e3-4730-9e3a-db2cf87b3654_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!Elw2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd10a62-d1e3-4730-9e3a-db2cf87b3654_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Elw2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd10a62-d1e3-4730-9e3a-db2cf87b3654_1600x900.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dfd10a62-d1e3-4730-9e3a-db2cf87b3654_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!Elw2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd10a62-d1e3-4730-9e3a-db2cf87b3654_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!Elw2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd10a62-d1e3-4730-9e3a-db2cf87b3654_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!Elw2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd10a62-d1e3-4730-9e3a-db2cf87b3654_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!Elw2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd10a62-d1e3-4730-9e3a-db2cf87b3654_1600x900.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3><br>&#129302; Deep-Fake Devs: Well, this could have been a sitcom episode</h3><p>Recently, a hiring manager spent <strong>30 minutes interviewing a deepfake AI bot</strong> posing as a data engineer, and the internet is having a field day with it. <br><br><strong>Redditors point out the irony</strong>: HR wanted algorithms to pick humans, so now algorithms are applying for the jobs. Because ATS filters let bot-written CVs sail through (thanks to tools like Jobscan and HyperWrite&#8217;s Resume Aligner), real engineers sometimes lose out. Give the entire thread a read <a href="https://www.reddit.com/r/dataengineering/comments/1l9y4pf/ai_is_literally_coming_for_you_job/">here</a>. <br><br><strong>Takeaway</strong>: Tech hiring teams are increasingly adding checks such as live&#8209;coding, camera checks, &#8220;put your hand in front of your face&#8221; requests, or &#8220;tell me five facts about X, unrelated to engineering&#8221;&#8209;style prompts to screen out bots.<br></p><h3>&#9881;&#65039; Modern ETL is all about living up to modern software deployment principles</h3><p>In yet another <a href="https://www.reddit.com/r/dataengineering/comments/1kxb3ip/dbt_slower_than_original_etl/">ETL rant</a>, a Reddit user claimed that an <strong>Oracle migration to dbt</strong> went wrong: one stored procedure became hundreds of models, stretching runtime from 1 hour to 4 hours. The community notes that dbt&#8217;s strength is testable, incremental ETL rather than raw speed, that each model incurs orchestration overhead, and that Oracle&#8217;s optimizer prefers fatter queries. 
<br><br><strong>Takeaway</strong>: Power users fix it by materialising large views, batching models, or shifting heavy lifts to Spark/Python.</p><h3><br>&#128737;&#65039; Shift Left or Shift Blame?</h3><p><a href="https://www.reddit.com/r/dataengineering/comments/1ldtqzt/start_right_shift_left_is_that_just_another/">Reddit</a> is split yet again on whether this is an actual trend or just another marketing gimmick: <strong>&#8220;start right&#8221;</strong> (spray downstream observability), then <strong>&#8220;shift left&#8221;</strong> (convert findings into upstream data contracts). <br><br>Some people argue it is the latter, while a few offer examples: if a dashboard user spots bad numbers, you trace the issue back to the raw data and repair it there. That single upstream fix saves every downstream report, not just the one someone complained about, which is why shift left is the stronger detection and prevention strategy. <br><br><strong>Takeaway</strong>: Choose your battles depending on your scale. 
Meanwhile, vendors already sell both ends of the funnel.</p><p></p><h3>We&#8217;re fixing multi-cloud pain, too &#8212; query data wherever it sits</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_xDI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c05924-5d23-40c7-99fa-21e80eaf4664_1600x935.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_xDI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c05924-5d23-40c7-99fa-21e80eaf4664_1600x935.png 424w, https://substackcdn.com/image/fetch/$s_!_xDI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c05924-5d23-40c7-99fa-21e80eaf4664_1600x935.png 848w, https://substackcdn.com/image/fetch/$s_!_xDI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c05924-5d23-40c7-99fa-21e80eaf4664_1600x935.png 1272w, https://substackcdn.com/image/fetch/$s_!_xDI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c05924-5d23-40c7-99fa-21e80eaf4664_1600x935.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_xDI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c05924-5d23-40c7-99fa-21e80eaf4664_1600x935.png" width="602" height="351.8557692307692" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7c05924-5d23-40c7-99fa-21e80eaf4664_1600x935.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:851,&quot;width&quot;:1456,&quot;resizeWidth&quot;:602,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_xDI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c05924-5d23-40c7-99fa-21e80eaf4664_1600x935.png 424w, https://substackcdn.com/image/fetch/$s_!_xDI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c05924-5d23-40c7-99fa-21e80eaf4664_1600x935.png 848w, https://substackcdn.com/image/fetch/$s_!_xDI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c05924-5d23-40c7-99fa-21e80eaf4664_1600x935.png 1272w, https://substackcdn.com/image/fetch/$s_!_xDI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c05924-5d23-40c7-99fa-21e80eaf4664_1600x935.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We&#8217;re sipping our own Kool-Aid at e6data: <strong>Hybrid Lakehouse</strong> lets you drop a single, location-aware cluster next to every storage bucket&#8212;on-prem, AWS, or &#8220;that other cloud.&#8221; Compute stays put, so nothing mixes across regions, <strong>cutting egress fees by up to 99%</strong> and keeping every table compliant and governed.</p><p>With affinity-aware execution and streaming ingest baked in, dashboards light up with <strong>sub-minute latency</strong>&#8212;no duplicate copies, no forklift migrations, and zero ACL drift.</p><p><em><strong>Want to spin up a hybrid cluster? 
&#10145;&#65039; <a href="https://www.e6data.com/contact/demo">Book a 15-min walkthrough</a><br></strong></em></p><h3>Some cool stories from our team:</h3><ol><li><p><a href="https://www.e6data.com/blog/vector-semantic-search-lakehouse-faster-insight-unstructured-data">Vector Search on S3 through SQL</a>: breaking down e6data&#8217;s ability to query unstructured data</p></li><li><p><a href="https://www.e6data.com/blog/geospatial-analytics-performance-bottleneck-h3-vs-quadkey-for-spatial-indexing">Solving Geospatial Analytics Performance Bottleneck: H3 vs Quadkey</a>: unlocking geospatial analytics in production at scale</p></li><li><p><a href="https://www.linkedin.com/pulse/hive-metastore-apache-iceberg-catalog-ankur-ranjan-qmhpc/?trackingId=GkwDThlRRJWQUqlV2lJNoQ%3D%3D">Hive Metastore as an Apache Iceberg Catalog</a>: moving pieces, end-to-end setup, locking, and concurrency quirks<br></p></li></ol><h3>Meet us on the Road</h3><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NelP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351466b1-0562-45a9-bffe-e4f92a0b60cb_1200x1056.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NelP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351466b1-0562-45a9-bffe-e4f92a0b60cb_1200x1056.png 424w, https://substackcdn.com/image/fetch/$s_!NelP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351466b1-0562-45a9-bffe-e4f92a0b60cb_1200x1056.png 848w, 
https://substackcdn.com/image/fetch/$s_!NelP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351466b1-0562-45a9-bffe-e4f92a0b60cb_1200x1056.png 1272w, https://substackcdn.com/image/fetch/$s_!NelP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351466b1-0562-45a9-bffe-e4f92a0b60cb_1200x1056.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NelP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351466b1-0562-45a9-bffe-e4f92a0b60cb_1200x1056.png" width="241" height="212.08" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/351466b1-0562-45a9-bffe-e4f92a0b60cb_1200x1056.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1056,&quot;width&quot;:1200,&quot;resizeWidth&quot;:241,&quot;bytes&quot;:2205413,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NelP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351466b1-0562-45a9-bffe-e4f92a0b60cb_1200x1056.png 424w, https://substackcdn.com/image/fetch/$s_!NelP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351466b1-0562-45a9-bffe-e4f92a0b60cb_1200x1056.png 848w, 
https://substackcdn.com/image/fetch/$s_!NelP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351466b1-0562-45a9-bffe-e4f92a0b60cb_1200x1056.png 1272w, https://substackcdn.com/image/fetch/$s_!NelP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351466b1-0562-45a9-bffe-e4f92a0b60cb_1200x1056.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>We had fun at the <strong><a href="https://www.databricks.com/dataaisummit">Data + AI Summit</a></strong>, <strong><a href="https://www.snowflake.com/en/summit/">Snowflake Summit</a></strong>, and <strong><a href="https://aws.amazon.com/events/summits/mumbai/">AWS Summit</a></strong> this week. Here&#8217;s what went down:</p><ul><li><p>Vishnu took the stage at the Data + AI Summit to show how e6data brings your lakehouse to any environment: cloud, on-prem, or hybrid.</p></li><li><p>Bharath, our Head of Product, took the stage at the AWS Summit to present how e6data powers vector search.</p></li><li><p>We exchanged ideas with a host of cutting-edge teams, filling our calendar with months of interesting conversations.</p></li></ul><p>Also, the <strong><a href="https://lu.ma/Lakehouse-days-with-e6data">Lakehouse Days community</a></strong> now officially has 1,200 members and counting, from top engineering teams like Netflix, Uber, and Meta. 
<strong>We are hosting the next one on July 12th in Bangalore. Subscribe to the <a href="https://lu.ma/Lakehouse-days-with-e6data">calendar</a> to get updates first!</strong></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.e6data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Data Engineering ACID! Subscribe and share your love!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Spark’s Crossroads, Polars Momentum, and YouTube’s Guardrails for Petabytes]]></title><description><![CDATA[Your weekly sweep of the top 3 trends and hacks in data engineering from leading tech companies, plus a peek at what we&#8217;re building at e6data.]]></description><link>https://newsletter.e6data.com/p/sparks-crossroads-polars-momentum</link><guid isPermaLink="false">https://newsletter.e6data.com/p/sparks-crossroads-polars-momentum</guid><dc:creator><![CDATA[Data Engineering ACID | e6data]]></dc:creator><pubDate>Fri, 13 Jun 2025 14:45:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!F-3x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F659ccbea-7d95-4d1c-8f9c-2b0623406216_1600x900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!F-3x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F659ccbea-7d95-4d1c-8f9c-2b0623406216_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F-3x!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F659ccbea-7d95-4d1c-8f9c-2b0623406216_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!F-3x!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F659ccbea-7d95-4d1c-8f9c-2b0623406216_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!F-3x!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F659ccbea-7d95-4d1c-8f9c-2b0623406216_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!F-3x!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F659ccbea-7d95-4d1c-8f9c-2b0623406216_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F-3x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F659ccbea-7d95-4d1c-8f9c-2b0623406216_1600x900.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/659ccbea-7d95-4d1c-8f9c-2b0623406216_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!F-3x!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F659ccbea-7d95-4d1c-8f9c-2b0623406216_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!F-3x!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F659ccbea-7d95-4d1c-8f9c-2b0623406216_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!F-3x!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F659ccbea-7d95-4d1c-8f9c-2b0623406216_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!F-3x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F659ccbea-7d95-4d1c-8f9c-2b0623406216_1600x900.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>&#128293; &#8220;Spark Is the New Hadoop&#8221; &#8212; or Is It?</h2><p>A <a href="https://www.reddit.com/r/dataengineering/comments/1kb974e/spark_is_the_new_hadoop/">widely shared post</a> argues we&#8217;ve reached <strong>peak&#8239;Spark</strong>: JVM GC pain, sluggish startup, and the rise of Rust/C++ query engines (Photon, DataFusion&#8239;+&#8239;Comet, Velox, Daft) point to a future with lighter, Arrow&#8209;native back&#8209;ends. 
Critics counter that Spark&#8217;s <em>API</em> and rich ecosystem, not its engine, are what make it sticky, and that drop&#8209;in Rust/C++ cores may keep it alive.</p><p><strong>Takeaway:</strong> Decouple APIs from engines, and keep an eye on emerging Rust projects before declaring Spark &#8220;<strong>dead.</strong>&#8221;<br></p><h2>&#9654;&#65039; <strong>YouTube&#8209;Scale Guardrails</strong></h2><p><em>Two decades in, <a href="https://blog.youtube/news-and-events/happy-birthday-youtube-20/">YouTube still breaks the curve</a>: </em><strong>20&#8239;million</strong> videos are uploaded per day, <strong>20&#8239;billion</strong> videos are stored, and <strong>3.5&#8239;billion</strong> likes are given daily.</p><p>At that size, a forgotten <code>SELECT *</code> or a rogue cluster can cost millions, so the data infra team enforces automated guardrails:</p><ul><li><p>Mandatory partition filters (<code>upload_date</code> or <code>shard_id</code>)</p></li><li><p>Alerts on queries costing &#8805; $1,000</p></li><li><p>&#8220;Design for safe failure&#8221;, so teams don&#8217;t accidentally melt the budget<br></p></li></ul><h2><strong>&#128059; Polars &amp; the &#8220;Goldilocks&#8221; Zone</strong></h2><p>Early adopters report <strong>5&#8211;7x speed&#8209;ups</strong> (and ~80% cost cuts) by replacing PySpark jobs with Rust&#8209;based <strong><a href="https://pola.rs/">Polars</a></strong> on a single <code>r6g.8xlarge</code>. Key features:</p><ul><li><p>Lazy API with query optimization (eager mode is also available)</p></li><li><p>Multithreaded execution on Arrow columnar data</p></li><li><p>Upcoming distributed mode via Ray</p></li><li><p>Optional GPU acceleration through RAPIDS</p></li></ul><p>Expect Polars&#8239;+&#8239;Iceberg to gobble up the 10&#8239;GB to 1&#8239;TB jobs that don&#8217;t justify a full cluster.</p><p></p><h2><strong>We are working on something cool too! 
(land and query event data in &lt;15 seconds)</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mtt3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27709739-0938-4e4f-a7dd-7d423dc9b269_942x330.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mtt3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27709739-0938-4e4f-a7dd-7d423dc9b269_942x330.png 424w, https://substackcdn.com/image/fetch/$s_!Mtt3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27709739-0938-4e4f-a7dd-7d423dc9b269_942x330.png 848w, https://substackcdn.com/image/fetch/$s_!Mtt3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27709739-0938-4e4f-a7dd-7d423dc9b269_942x330.png 1272w, https://substackcdn.com/image/fetch/$s_!Mtt3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27709739-0938-4e4f-a7dd-7d423dc9b269_942x330.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Mtt3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27709739-0938-4e4f-a7dd-7d423dc9b269_942x330.png" width="942" height="330" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/27709739-0938-4e4f-a7dd-7d423dc9b269_942x330.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:330,&quot;width&quot;:942,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mtt3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27709739-0938-4e4f-a7dd-7d423dc9b269_942x330.png 424w, https://substackcdn.com/image/fetch/$s_!Mtt3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27709739-0938-4e4f-a7dd-7d423dc9b269_942x330.png 848w, https://substackcdn.com/image/fetch/$s_!Mtt3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27709739-0938-4e4f-a7dd-7d423dc9b269_942x330.png 1272w, https://substackcdn.com/image/fetch/$s_!Mtt3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27709739-0938-4e4f-a7dd-7d423dc9b269_942x330.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>At e6data, we&#8217;re drinking our own Kool-Aid. The team is working on a new product that lands Kafka (and soon CDC) streams directly in your lakehouse and makes them queryable with sub-second latency, with no Flink clusters, no real-time OLAP database, and no data shuffles. This can help <strong>reduce time-to-insight by 95%</strong> and relieve some of that infrastructure headache. 
<em>Look for native CDC support and mobile-SDK ingestion in v2 next quarter.</em></p><p><strong>Early&#8209;access &#10145;&#65039;<a href="https://www.e6data.com/contact/demo"> Book a 15&#8209;min walkthrough</a></strong></p><p></p><h3><strong>Fresh From the Team</strong></h3><ul><li><p><strong>Podcast:</strong> <em>Decoding e6data&#8217;s Architecture</em> &#8211; our Head of Engineering with Pete&#8239;Soderling, <em>Zero Prime</em> (<a href="https://www.e6data.com/blog/e6data-architectural-bets-zero-prime-podcast">listen</a>)</p></li><li><p><strong>Blog:</strong> <em>Why Catalogs Matter &#8211; the Book&#8209;keeping of Apache Iceberg</em> (<a href="https://www.linkedin.com/pulse/why-catalogs-matter-book-keeping-apache-iceberg-ankur-ranjan-0mfrc/?trackingId=J81LuGU1RdidyyIYdRSkSw%3D%3D">read</a>)</p></li><li><p><strong>Tech Note:</strong> <em>Eliminating Redundant Computations with Automatic CTE Detection</em> (<a href="https://www.e6data.com/blog/eliminating-redundant-computations-query-plans-automatic-cte-detection">read</a>)<br></p></li></ul><h3><strong>Meet Us on the Road</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jONB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208df38f-667d-45c4-9a05-cb27781a404f_684x410.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jONB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208df38f-667d-45c4-9a05-cb27781a404f_684x410.png 424w, https://substackcdn.com/image/fetch/$s_!jONB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208df38f-667d-45c4-9a05-cb27781a404f_684x410.png 848w, 
https://substackcdn.com/image/fetch/$s_!jONB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208df38f-667d-45c4-9a05-cb27781a404f_684x410.png 1272w, https://substackcdn.com/image/fetch/$s_!jONB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208df38f-667d-45c4-9a05-cb27781a404f_684x410.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jONB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208df38f-667d-45c4-9a05-cb27781a404f_684x410.png" width="720" height="431.57894736842104" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/208df38f-667d-45c4-9a05-cb27781a404f_684x410.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:410,&quot;width&quot;:684,&quot;resizeWidth&quot;:720,&quot;bytes&quot;:539488,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jONB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208df38f-667d-45c4-9a05-cb27781a404f_684x410.png 424w, https://substackcdn.com/image/fetch/$s_!jONB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208df38f-667d-45c4-9a05-cb27781a404f_684x410.png 848w, 
https://substackcdn.com/image/fetch/$s_!jONB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208df38f-667d-45c4-9a05-cb27781a404f_684x410.png 1272w, https://substackcdn.com/image/fetch/$s_!jONB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208df38f-667d-45c4-9a05-cb27781a404f_684x410.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>We are at the <a href="https://www.snowflake.com/en/summit/">Snowflake Summit</a>, Databricks&#8217; <a 
href="https://www.databricks.com/dataaisummit">Data + AI Summit</a> 2025, and the <a href="https://www.gartner.com/en/conferences/apac/data-analytics-india">Gartner Summit</a> this month. Here&#8217;s what&#8217;s going down:</strong></p><ul><li><p>Vishnu is taking the stage at the Data + AI Summit to present how e6data can help deploy your data platform (like Databricks) to any environment: cloud, on-prem, or hybrid.</p></li><li><p>The engineering team is demoing our real-time streaming ingest at our booths at both summits.</p></li><li><p>Giveaway: enter to win limited&#8209;edition <em>Star Wars</em> LEGO&#174; sets.<br></p></li></ul><h3><strong>Lakehouse&#8239;Days Community</strong></h3><blockquote><p>We just crossed <strong>1,100 members</strong> from teams like Netflix, Uber, and Meta. Next meetup: <strong>June 21</strong>. <a href="https://lu.ma/Lakehouse-days-with-e6data">Add the calendar</a> to get first dibs. Subscribe to our <a href="https://www.youtube.com/@e6data">YouTube channel</a> for recordings of previous sessions.</p></blockquote><p><em>Thanks for reading, and see you next month!</em> &#128640;</p><p></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.e6data.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Data Engineering ACID! Subscribe and share your love!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item></channel></rss>