ClickHouse

Column-oriented database for real-time analytics

Features

  • Columnar storage with per-column compression
  • Vectorized query execution for analytical workloads
  • Real-time data ingestion at millions of rows per second
  • Distributed queries across clusters

Pros

  • Among the fastest analytical databases for large-scale queries
  • Handles petabytes of data efficiently
  • Real-time ingestion and querying simultaneously

Cons

  • Not suited for transactional (OLTP) workloads
  • UPDATE and DELETE operations are expensive
  • Operational complexity for cluster management

Overview

ClickHouse is an open-source column-oriented database management system designed for online analytical processing (OLAP). Originally developed at Yandex for web analytics, it can process up to billions of rows and gigabytes of data per second on a single server, depending on the query and the hardware.

ClickHouse achieves its performance through columnar storage (only reading columns needed for a query), vectorized execution (processing data in batches), and aggressive compression. It supports distributed deployments for horizontal scaling and integrates with tools like Kafka for real-time data pipelines.
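The three ideas in the paragraph above can be illustrated with a small pure-Python sketch. It is a toy model under stated assumptions, not ClickHouse's implementation: the sample rows are made up, the helper names (`to_columns`, `batched_sum`, `run_length_encode`) are hypothetical, and run-length encoding stands in for compression in general rather than for the codecs ClickHouse actually applies.

```python
from itertools import groupby

# Hypothetical sample data, shaped like a small event table.
rows = [
    {"timestamp": 1, "user_id": 10, "event": "click"},
    {"timestamp": 2, "user_id": 10, "event": "click"},
    {"timestamp": 3, "user_id": 11, "event": "view"},
    {"timestamp": 4, "user_id": 12, "event": "view"},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column (columnar layout)."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def batched_sum(values, batch_size=2):
    """Sum a column one batch at a time, mimicking batch-wise execution."""
    total = 0
    for start in range(0, len(values), batch_size):
        total += sum(values[start:start + batch_size])
    return total


def run_length_encode(values):
    """Collapse runs of equal neighbours into [value, count] pairs."""
    return [[value, len(list(run))] for value, run in groupby(values)]


columns = to_columns(rows)

# An aggregate over user_id touches only that column, not the whole row.
user_id_total = batched_sum(columns["user_id"])

# Values within one column tend to repeat, so they compress well.
encoded_events = run_length_encode(columns["event"])
```

In this toy layout, summing `user_id` never reads the `timestamp` or `event` lists, which is the same reason a columnar engine can skip unneeded columns on disk.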

When to Use

ClickHouse is the right choice for real-time analytics dashboards, log and event analysis, time-series data at scale, and any workload involving aggregations over billions of rows. For transactional workloads, use PostgreSQL or MySQL.

Getting Started

docker run -d --name clickhouse \
  -p 8123:8123 -p 9000:9000 \
  clickhouse/clickhouse-server

# HTTP interface
curl 'http://localhost:8123/' --data "SELECT 1"

-- Example table definition
CREATE TABLE events (
  timestamp DateTime,
  user_id UInt32,
  event String
) ENGINE = MergeTree()
ORDER BY timestamp;
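As a follow-up to the curl call, here is a minimal Python sketch of talking to the same HTTP interface using only the standard library. It assumes the container started above is reachable at localhost:8123 and accepts unauthenticated requests from the default user, which may not hold for every image version or configuration. The helper names (`build_query_request`, `build_insert_request`, `send`) are hypothetical. For inserts, the statement goes in the `query` URL parameter and the tab-separated rows go in the request body.

```python
import urllib.parse
import urllib.request

# Assumption: the server from the docker command listens here without auth.
BASE_URL = "http://localhost:8123/"


def build_query_request(sql, base_url=BASE_URL):
    """Build a POST request whose body is the SQL statement."""
    return urllib.request.Request(
        base_url, data=sql.encode("utf-8"), method="POST"
    )


def build_insert_request(table, rows_tsv, base_url=BASE_URL):
    """Build an INSERT: statement in the URL, tab-separated rows in the body."""
    params = urllib.parse.urlencode(
        {"query": f"INSERT INTO {table} FORMAT TabSeparated"},
        quote_via=urllib.parse.quote,  # encode spaces as %20 rather than '+'
    )
    return urllib.request.Request(
        f"{base_url}?{params}", data=rows_tsv.encode("utf-8"), method="POST"
    )


def send(request, timeout=10):
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# One row matching the events table: timestamp, user_id, event.
sample_row = "2024-01-01 00:00:00\t42\tclick\n"
insert_request = build_insert_request("events", sample_row)
count_request = build_query_request(
    "SELECT event, count() FROM events GROUP BY event"
)
```

With a server running and the `events` table created, calling `send(insert_request)` followed by `send(count_request)` should insert the sample row and return the per-event counts as tab-separated text. Nothing is sent until `send` is called, so the request builders can be inspected offline.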