ORC
ORC (Optimized Row Columnar) is a columnar storage file format designed for big data environments, particularly the Hadoop ecosystem. Because it stores the values of each column together, it achieves high compression ratios and lets queries read only the columns they need. It also supports complex data types such as structs, lists, and maps. ORC is optimized for read-heavy analytical workloads, which makes it a good fit for data warehousing and large-scale data processing.
Developers should consider ORC when working with Hadoop-based data lakes or data warehouses, where it typically reduces storage costs and speeds up analytical queries compared to row-based formats. It is especially beneficial in Apache Hive, Apache Spark, or Presto environments, where column pruning reads only the columns a query references and predicate pushdown skips data that cannot match a filter. Use cases include log analysis, business intelligence reporting, and ETL pipelines that require efficient storage and fast aggregations on large datasets.
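
To make column pruning and predicate pushdown concrete, here is a minimal sketch in plain Python. It is a toy model, not the real ORC file layout or any library's API: the names Stripe, write_stripes, and scan are hypothetical, and the sample log rows are made up for illustration. It assumes only that data is split into chunks ("stripes") that store each column contiguously along with per-column min/max statistics, which is the general idea these optimizations rely on.

```python
from dataclasses import dataclass


@dataclass
class Stripe:
    """Toy stand-in for a stripe: per-column values plus min/max statistics."""
    columns: dict  # column name -> list of values for this stripe
    stats: dict    # column name -> (min, max) over this stripe


def write_stripes(rows, stripe_size):
    """Split row dicts into stripes, storing each column contiguously."""
    stripes = []
    for start in range(0, len(rows), stripe_size):
        chunk = rows[start:start + stripe_size]
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
        stripes.append(Stripe(columns, stats))
    return stripes


def scan(stripes, select, column, lower):
    """Return (rows, stripes_read) for rows where `column` >= `lower`.

    Only the columns named in `select` are materialized.
    """
    out = []
    stripes_read = 0
    for stripe in stripes:
        _, col_max = stripe.stats[column]
        if col_max < lower:
            # Predicate pushdown: the statistics prove no row can match,
            # so the stripe's data is never touched.
            continue
        stripes_read += 1
        # Column pruning: fetch only the requested columns.
        picked = [stripe.columns[name] for name in select]
        for i, value in enumerate(stripe.columns[column]):
            if value >= lower:
                out.append(tuple(col[i] for col in picked))
    return out, stripes_read


# Hypothetical log records: 12 rows written as 3 stripes of 4 rows each.
rows = [
    {"ts": i, "status": 200 if i % 5 else 500, "bytes": i * 10}
    for i in range(12)
]
stripes = write_stripes(rows, stripe_size=4)

result, stripes_read = scan(stripes, select=["ts", "bytes"], column="ts", lower=9)
print(result)                                   # [(9, 90), (10, 100), (11, 110)]
print(stripes_read, "of", len(stripes), "stripes read")  # 1 of 3 stripes read
```

In this sketch the filter on ts lets the scan skip two of the three stripes using statistics alone, and the status column is never read because the query does not select it. Real engines apply the same idea at much larger scale, which is where the savings on analytical scans come from.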