Andrew Fairless, Ph.D.
Entries tagged :: data engineering
2025-08-07
What I Read: database, scale
2025-08-05
What I Read: Python Polars
2025-07-17
What I Read: Iceberg, Lance
2025-07-10
What I Read: Clickhouse Clusters
2025-06-04
What I Read: composable data platforms
2025-05-12
What I Read: AI-ready data
2025-05-05
What I Read: single-node processing
2025-03-31
What I Read: data engineering
2025-03-19
What I Read: polars, pandas
2025-03-12
What I Read: Incremental Jobs, Data Quality
2025-03-11
What I Read: Declarative Data Stack
2025-02-19
What I Read: data migrations
2024-12-10
What I Read: Future of Distributed Systems
2024-12-05
What I Read: data catalogs
2024-11-04
What I Read: Big Data is Dead
2024-10-29
What I Read: Generative AI Platform
2024-10-17
What I Read: Data Flywheels, LLM
2024-09-16
What I Read: Musings on AI Engineering
2024-09-12
What I Read: SSD, Database
2024-09-10
What I Read: Command-line Tools, Faster
2024-09-09
What I Read: Scaling to Multi-Terabyte Datasets
2024-08-27
What I Read: string ufuncs, NumPy 2.0
2024-07-23
What I Read: PostgreSQL
2024-07-10
What I Read: anti-patterns, data reuse
2024-05-29
What I Read: predicate pushdown
2024-05-28
What I Read: Cloud native data loaders ML
2024-05-08
What I Read: How fast process CSV file
2024-04-18
What I Read: Scaling ChatGPT, Engineering Challenges
2024-04-15
What I Read: Probabilistic Linkage, Data Deduplication
2024-04-09
What I Read: Deploy Model
2024-03-28
What I Read: SQL order
2024-03-27
What I Read: Database Disassembly
2024-03-21
What I Read: misleading GPU, CPU benchmarks
2024-02-28
What I Read: Navigating Data Tensions
2024-01-18
What I Read: Unify Batch and ML Systems
2024-01-11
What I Read: Enterprise AI, RAG + Fine Tuning
2023-12-20
What I Read: Distributed Training, Finetuning
2023-12-14
What I Read: data integration
2023-12-13
What I Read: Retrieval Augmented Generation at scale
2023-12-07
What I Read: LLM Apps, Data Pipelines
2023-11-28
What I Read: Data, The Land DevOps Forgot
2023-10-19
What I Read: Composable Data Systems
2023-09-20
Add Columns to Polars DataFrames Quickly
There are straightforward, slow ways to do things, and then there are faster ways. Know how to choose.
2023-08-28
What I Read: What we don’t talk about
2023-08-02
What I Read: Why Most Data Projects Fail
2023-07-25
What I Read: Kubernetes, Batch
2023-05-18
What I Read: MLOps, Data Engineering
2023-05-02
What I Read: databases
2023-04-26
What I Read: Bloom filter
2023-04-12
What I Read: Infrastructure
2023-04-05
What I Read: Data Product vs. Service
2023-03-27
What I Read: SQL pipelines
2023-03-14
What I Read: Feature Platforms
2023-03-08
What I Read: Data Pipeline Design Patterns
2023-03-01
What I Read: Build vs. Buy, Modern Data Stack
2023-02-20
What I Read: Data Engineering 2023 Predictions
2023-02-16
What I Read: Realtime ML
2023-02-15
What I Read: ELT Schedules, Root Cause Analysis
2023-02-01
What I Read: Realtime ML Pipelines
2023-01-24
What I Learn: Simplest Data Pipeline
2023-01-18
What I learn: How, learn machine learning
2023-01-04
What I Read: Data Pipeline Smoke Tests
2022-12-01
What I Read: Data Engineers, What’s the profession about
2022-11-28
What I Read: data catalogs, metadata
2022-11-01
What I Read: ML Engineering
2022-10-31
What I Read: deliberately create data
2022-09-21
What I Read: streaming for data scientists
2022-09-05
What I Read: Is Data Scientist Still the Sexiest Job?
2022-08-17
What I Read: Pandas Anti-Patterns
2022-08-15
What I Read: Hidden Technical Debts
2022-08-03
What I Read: State of Data Engineering 2022
2022-07-25
What I Read: data replication in production
2022-07-18
What I Read: Death of Data Modeling
2022-07-05
What I Read: Bundling into the Database
2022-06-28
What I Read: Deploying Deep Learning
2022-06-14
What I Read: Should Warehouse Be Immutable?
2022-06-13
What I Read: Modern Stack for ML Infrastructure
2022-05-25
What I Read: Real World Recommendation System
2022-05-23
What I Read: Dataset-Centric Visualization
2022-04-25
What I Read: Data Observability vs. Data Testing
2022-04-04
What I Read: Scale Real-time Data Infrastructure
2022-03-16
What I Watch: Engineering For Data
2022-03-09
What I Read: Real-time machine learning
2022-02-23
What I Read: How Should Organizations Structure their Data?
2022-02-02
What I Read: MLOps Documentation
2022-01-04
What I Read: AntiPatterns, MLOps
2021-12-21
What I Read: From Data Engineer to SysAdmin: Put down the K8s cluster
2021-12-20
What I Read: Lessons on ML Platforms
2021-11-22
What I Read: Is the data engineer still the “worst seat at the table?”
2021-11-10
What I Read: Data Orchestration w/ Nick Schrock (Elementl)
2021-11-03
What I Read: ETL Pipelines with Airflow
2021-10-27
What I Watch: Streaming Data Systems w/ Eric Sammer (Decodable)
2021-09-09
What I Read: The dysfunctions of Data Engineering
2021-09-08
What I Read: Machine Learning, Rendezvous Architecture
2021-09-01
What I Read: Against SQL
2021-08-26
What I Read: The one data platform to rule them all
2021-08-12
What I Read: Building Data Platform
2021-07-12
What I Read: Dask vs Vaex
2021-07-06
What I Read: What is a Data Mesh?
2021-06-21
What I Read: Hiring Data Scientists
2021-05-31
What I Read: Feature stores
2021-05-11
What I Read: Models of Data Science teams
2021-04-14
What I Read: Common Errors when Debugging Airflow DAGs
2021-03-30
What I Read: Kedro Pipelines with Airflow
2021-03-01
What I Watch: Future of Data Engineering
2021-03-01
What I Read: Long Live Data Discovery
2021-02-25
What I Read: Feature Store vs Data Warehouse
2021-02-20
What I Read: What is Data Observability?
2021-02-19
What I Watch: Functional Data Engineering
2021-02-14
What I Read: How DAGs grow
2021-02-04
What I Read: Architectures for Modern Data Infrastructure
2021-01-27
What I Read: data quality, ML Ops
2021-01-19
What I Read: Maintaining Machine Learning in Production
2021-01-17
What I Read: Data Scientists Should Be More End-to-End
2021-01-12
What I Read: Intro to Data Engineering for Data Scientists
2021-01-11
What I Read: Making Netflix’s Data Infrastructure Cost-Effective
2021-01-07
What I Read: Running Machine Learning at Scale
2020-12-23
What I Read: DevOps for ML Data