Implementation status

This page summarizes the features supported by different Parquet implementations.

Note: This is a work in progress and we would welcome help expanding its scope.

Legend

The value in each box means:

  • ✅: supported
  • ❌: not supported
  • (R/W): partial support, reader only (R) or writer only (W)
  • (blank): no data

Implementations: C++, Java, Go, Rust, cuDF

Physical types

Data type | C++ | Java | Go | Rust | cuDF
BOOLEAN
INT32
INT64
INT96 (1)
FLOAT
DOUBLE
BYTE_ARRAY
FIXED_LEN_BYTE_ARRAY
  • (1) This type is deprecated, but as of 2024 it is still common in newly written Parquet files

Logical types

Data type | C++ | Java | Go | Rust | cuDF
STRING
ENUM ✅(*)
UUID ✅(*)
8-, 16-, 32-, and 64-bit signed and unsigned INT
DECIMAL (INT32)
DECIMAL (INT64)
DECIMAL (BYTE_ARRAY)
DECIMAL (FIXED_LEN_BYTE_ARRAY)
DATE
TIME (INT32)
TIME (INT64)
TIMESTAMP (INT64)
INTERVAL ✅(*)
JSON ✅(*) ✅(*)
BSON ✅(*) ✅(*)
LIST
MAP
UNKNOWN (always null)
FLOAT16 ✅(*)

(*): Supported only via the underlying physical type that it annotates

Encodings

Encoding | C++ | Java | Go | Rust | cuDF
PLAIN
PLAIN_DICTIONARY
RLE_DICTIONARY
RLE
BIT_PACKED (deprecated) ❌(*) (R)
DELTA_BINARY_PACKED
DELTA_LENGTH_BYTE_ARRAY
DELTA_BYTE_ARRAY
BYTE_STREAM_SPLIT

(*): Partial read support, limited to level data with a bit width of 0
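
Of the encodings listed above, BYTE_STREAM_SPLIT is simple enough to illustrate in a few lines. The following is a minimal sketch, not taken from any of the implementations in the table: it assumes little-endian 4-byte floats, and the function names `byte_stream_split_encode` and `byte_stream_split_decode` are hypothetical. The idea is that byte k of every value is scattered into stream k, and the streams are then concatenated.

```python
import struct


def byte_stream_split_encode(values, fmt="<f"):
    """Scatter byte k of each fixed-width value into stream k, then concatenate."""
    width = struct.calcsize(fmt)  # 4 for "<f" (little-endian float32)
    raw = [struct.pack(fmt, v) for v in values]
    # Stream k holds the k-th byte of every value, in value order.
    return b"".join(bytes(r[k] for r in raw) for k in range(width))


def byte_stream_split_decode(data, fmt="<f"):
    """Gather one byte from each stream to rebuild every value."""
    width = struct.calcsize(fmt)
    count = len(data) // width  # number of encoded values
    return [
        struct.unpack(fmt, bytes(data[k * count + i] for k in range(width)))[0]
        for i in range(count)
    ]


# Illustrative data only; these values are exactly representable as float32.
values = [1.0, 2.5, -3.0]
encoded = byte_stream_split_encode(values)
decoded = byte_stream_split_decode(encoded)
```

The encoding does not shrink the data by itself. It groups similar bytes (for example, the sign and exponent bytes) next to each other so that a subsequent compression step tends to work better.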

Compressions

Compression | C++ | Java | Go | Rust | cuDF
UNCOMPRESSED
BROTLI (R)
GZIP (R)
LZ4 (deprecated)
LZ4_RAW
LZO
SNAPPY
ZSTD

Other format-level features

Feature | C++ | Java | Go | Rust | cuDF
xxHash-based bloom filters (R) (R)
Bloom filter length (1) (R) (R)
Statistics min_value, max_value
Page index
Page CRC32 checksum
Modular encryption
Size statistics (2)
  • (1) In parquet.thrift: ColumnMetaData->bloom_filter_length

  • (2) In parquet.thrift: ColumnMetaData->size_statistics

High-level data APIs for Parquet feature usage

Format | C++ | Java | Go | Rust | cuDF
External column data (1) (W)
Row group “Sorting column” metadata (2) (W)
Row group pruning using statistics
Row group pruning using bloom filter
Reading select columns only
Page pruning using statistics
  • (1) In parquet.thrift: ColumnChunk->file_path

  • (2) In parquet.thrift: RowGroup->sorting_columns
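
To make "row group pruning using statistics" concrete, here is a minimal sketch of the idea in plain Python. It does not use any Parquet library: `RowGroupStats` and `prune_row_groups` are hypothetical names, and the statistics are made-up values standing in for the min_value and max_value a reader would find in the file metadata. A row group can be skipped when its [min, max] range cannot overlap the range requested by the query.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for one column's statistics in one row group."""
    index: int
    min_value: int
    max_value: int


def prune_row_groups(stats, lower, upper):
    """Return indexes of row groups that may hold values in [lower, upper]."""
    return [
        s.index
        for s in stats
        # Keep the row group only if its value range overlaps the query range.
        if s.max_value >= lower and s.min_value <= upper
    ]


# Illustrative statistics only, not read from a real file.
row_groups = [
    RowGroupStats(index=0, min_value=1, max_value=100),
    RowGroupStats(index=1, min_value=101, max_value=200),
    RowGroupStats(index=2, min_value=150, max_value=300),
]

# A filter such as 120 <= x <= 160 only needs row groups 1 and 2.
selected = prune_row_groups(row_groups, 120, 160)
print(selected)  # [1, 2]
```

Pruning of this kind is conservative: a selected row group may still contain no matching rows, so the filter must be applied again to the data that is actually read.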