DataFusion is an extensible query planning, optimization, and execution
framework, written in Rust, that uses Apache Arrow as its in-memory format.

Features:
- SQL query planner with support for multiple SQL dialects
- DataFrame API
- Native support for Parquet, CSV, JSON, and Avro file formats. Custom
  file formats can be supported by implementing the `TableProvider` trait.
- Support for popular object stores, including AWS S3, Azure Blob
  Storage, and Google Cloud Storage, with extension points for
  implementing custom object stores.

Use Cases:
DataFusion is modular in design, with many extension points. It can be
used without modification as an embedded query engine, or serve as
a foundation for building new systems. Here are some example use cases:
- DataFusion can be used as a SQL query planner and query optimizer, providing
  optimized logical plans that can then be mapped to other execution engines.
- DataFusion can be used to build fast, efficient data pipelines, ETL
  processes, and database systems that need the performance of Rust and
  Apache Arrow while offering their users the convenience of a SQL
  interface or a DataFrame API.