Features
- In-process OLAP database — no server needed
- Directly queries CSV, Parquet, JSON, and S3 files
- Columnar storage with vectorized execution
- Full SQL support with window functions and CTEs
Pros
- Blazing fast analytical queries on local data
- Zero setup — embeds into any application
- Query files directly without importing
Cons
- Not designed for concurrent write-heavy OLTP workloads
- Single-node only — no distributed mode
- Newer project with a smaller, though growing, ecosystem
Overview
DuckDB is an in-process analytical database management system, often described as “SQLite for analytics.” It’s designed for fast analytical queries on local data, using columnar storage and vectorized execution to process large datasets efficiently without a separate server process.
DuckDB can directly query CSV, Parquet, JSON, and even remote files on S3 without importing them first. This makes it invaluable for data analysis, ETL pipelines, and any scenario where you need to quickly analyze large datasets locally.
When to Use
DuckDB is ideal for data analysis, ETL pipelines, local analytics dashboards, and any workload involving analytical queries on structured data. It’s perfect when you need SQLite-like simplicity but for analytical (OLAP) rather than transactional (OLTP) workloads.
Getting Started
```shell
npm install duckdb
```

```javascript
import duckdb from "duckdb";

// ":memory:" keeps the database in RAM; pass a file path to persist it.
const db = new duckdb.Database(":memory:");

// Query the CSV file in place -- DuckDB infers the schema automatically.
db.all(
  "SELECT * FROM read_csv_auto('data.csv') WHERE amount > 100",
  (err, rows) => {
    if (err) throw err;
    /* process rows */
  }
);
```