Month: November 2022

Apache Iceberg

What does Apache Iceberg do?

  • manages large, slow-changing tabular data and provides a SQL interface so that the data can be queried efficiently
  • splits table data into partitioned files and stores those files in an object store such as S3. Queries can filter on the partition key(s) so that only matching partitions are read. Iceberg calls this “hidden partitioning”: the system handles the partitioning for you, without exposing the details to the client.
  • separates metadata management from the data. Metadata is not stored in the data files.
  • separates the table schema from the data. A change of column name does not affect the data files. See Schema Evolution.
  • allows accessing data as it existed at a specific point in time. This Time Travel feature is useful for auditing, debugging and reproducing issues that occurred in the past. Time travel relies on snapshots, which allow multiple versions of the same table to exist at the same time. (Copy-on-write is used in the implementation.)
  • provides ACID-compliant transactions for data modifications and snapshot isolation for queries, which help ensure the consistency and correctness of the data
  • does all this through a lightweight design with minimal coordination
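
The time travel and copy-on-write ideas in the list above can be illustrated with a toy in-memory model. This is a minimal sketch, not Iceberg's API: the names `Snapshot`, `ToyTable`, `commit` and `as_of` are hypothetical, and the sketch assumes commits arrive with increasing timestamps. Each commit produces a new immutable snapshot listing the data files visible in that version, and older snapshots are left untouched so they can still be read.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """One immutable version of the table (hypothetical toy type)."""
    snapshot_id: int
    timestamp_ms: int
    data_files: Tuple[str, ...]  # files visible in this version


@dataclass
class ToyTable:
    """Toy table that keeps every snapshot ever committed."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def current(self) -> Optional[Snapshot]:
        # The latest snapshot, or None for a table with no commits yet.
        return self.snapshots[-1] if self.snapshots else None

    def commit(self, timestamp_ms: int,
               add: Iterable[str] = (), remove: Iterable[str] = ()) -> Snapshot:
        # Copy-on-write: build a new file list instead of mutating the old one.
        # Assumes timestamp_ms is not older than the previous commit.
        base = self.snapshots[-1].data_files if self.snapshots else ()
        removed = set(remove)
        files = tuple(f for f in base if f not in removed) + tuple(add)
        snap = Snapshot(len(self.snapshots) + 1, timestamp_ms, files)
        self.snapshots.append(snap)
        return snap

    def as_of(self, timestamp_ms: int) -> Snapshot:
        # Time travel: the newest snapshot committed at or before the given time.
        candidates = [s for s in self.snapshots if s.timestamp_ms <= timestamp_ms]
        if not candidates:
            raise ValueError("no snapshot at or before the requested time")
        return candidates[-1]


table = ToyTable()
table.commit(1000, add=["a.parquet"])
table.commit(2000, add=["b.parquet"])
# Rewriting a.parquet replaces the file in a new snapshot; old snapshots keep it.
table.commit(3000, add=["a-v2.parquet"], remove=["a.parquet"])

print(table.as_of(2500).data_files)
print(table.current().data_files)
```

A reader pinned to the snapshot from time 2000 keeps seeing the same file list even after the later rewrite, which is the property that snapshot isolation gives to queries.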

Figure. The Iceberg table format is used by multiple engines and can write to multiple storage types. source.

Ryan Blue’s discussion on the rationale for the design is here and a presentation with performance improvements is at

“By building support for Iceberg, data warehouses can skip the query layer and share data directly. Iceberg was built on the assumption that there is no single query layer. Instead, many different processes all use the same underlying data and coordinate through the table format along with a very lightweight catalog. Iceberg enables direct data access needed by all of these use cases and, uniquely, does it without compromising the SQL behavior of data warehouses.”

The client is a Java JAR file that can be embedded.

How does Iceberg store files in S3?

The top-level directory contains the table’s metadata files, including the schema and partition information. The metadata files are stored in the S3 object store using the table name as the S3 prefix.

The data files are stored in a directory structure that reflects the table partitioning. Partition values are encoded in the directory name.
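
As a rough illustration of the layout described above, the following sketch builds object keys for metadata and data files. Everything specific here is an assumption made for the example: the bucket and table location, the `metadata/` and `data/` sub-prefixes, the `event_day` partition name, and the helper functions themselves. In Iceberg itself, readers locate data files through the table metadata rather than by listing directories, so the path-based filter below is only meant to show how a partition value ends up in the directory name.

```python
from datetime import datetime, timezone
from typing import List

# Hypothetical table location; the table name acts as the S3 prefix.
TABLE_LOCATION = "s3://example-bucket/warehouse/events"


def day_transform(ts: datetime) -> str:
    """Derive a day partition value from a timezone-aware timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def metadata_key(file_name: str) -> str:
    """Key for a metadata file under the table prefix."""
    return f"{TABLE_LOCATION}/metadata/{file_name}"


def data_file_key(ts: datetime, file_name: str) -> str:
    """Key for a data file; the partition value is encoded in the directory name."""
    return f"{TABLE_LOCATION}/data/event_day={day_transform(ts)}/{file_name}"


def files_for_day(keys: List[str], day: str) -> List[str]:
    """Keep only the keys whose partition directory matches the given day."""
    needle = f"/event_day={day}/"
    return [k for k in keys if needle in k]


keys = [
    data_file_key(datetime(2022, 11, 5, 23, 30, tzinfo=timezone.utc), "part-0.parquet"),
    data_file_key(datetime(2022, 11, 6, 0, 15, tzinfo=timezone.utc), "part-1.parquet"),
]
for key in keys:
    print(key)
print(files_for_day(keys, "2022-11-06"))
```

The writer only supplies the timestamp; the partition value is derived from it, which is the idea behind not exposing partitioning details to the client.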



Why a new table format –

A hands-on look at Iceberg tables by Dremio is here.

A blog on the Adobe experience with Iceberg is here.

A blog on creating a real-time data warehouse with Flink and Iceberg –