HDFS (the Hadoop Distributed File System) is a distributed filesystem for large-scale data storage and a core part of the Apache Hadoop ecosystem.
- High-throughput access to large datasets
- Fault-tolerant storage across nodes
- Tight integration with Hadoop components
- Big data storage and analytics
- Data lakes for processing pipelines
- Distributed storage for batch workloads
- Open-source and self-hosted
- Typically deployed with other Hadoop components
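The throughput and fault-tolerance properties above come from HDFS splitting each file into fixed-size blocks and replicating every block across nodes. The sketch below illustrates the arithmetic using the stock defaults (128 MiB blocks, replication factor 3); real clusters may override both in `hdfs-site.xml`, and the helper names here are illustrative, not part of any HDFS API.

```python
# Sketch of HDFS block math, assuming stock defaults:
# dfs.blocksize = 128 MiB, dfs.replication = 3.
import math

BLOCK_SIZE = 128 * 1024 * 1024   # default dfs.blocksize in bytes
REPLICATION = 3                  # default dfs.replication

def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of HDFS blocks a file of file_size bytes occupies."""
    return max(1, math.ceil(file_size / block_size))

def raw_storage(file_size: int, replication: int = REPLICATION) -> int:
    """Total bytes consumed across the cluster, counting all replicas."""
    return file_size * replication

# A 1 GiB file spans 8 blocks and consumes 3 GiB of raw cluster storage.
one_gib = 1024 ** 3
print(block_count(one_gib))   # 8
print(raw_storage(one_gib))   # 3221225472
```

Because each block is replicated independently, losing a node costs only the replicas it held, and readers can pull different blocks of the same file from different nodes in parallel, which is what makes the batch-workload throughput possible.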
## History and References