HDFS (Hadoop Distributed File System) is a distributed filesystem for large-scale data storage, part of the Apache Hadoop ecosystem.
## Key Features

- High-throughput access to large datasets (see the Java sketch after this list)
- Fault-tolerant storage across nodes
- Tight integration with Hadoop components
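As a minimal sketch of what programmatic access looks like, the snippet below reads a file through Hadoop's Java `FileSystem` API. The NameNode address (`hdfs://localhost:9000`) and the file path are assumptions for illustration; both depend on your cluster configuration.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        // Point the client at the NameNode; this address is an assumption
        // and must match your cluster's fs.defaultFS setting.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");

        try (FileSystem fs = FileSystem.get(conf);
             BufferedReader reader = new BufferedReader(
                 new InputStreamReader(fs.open(new Path("/data/example.txt")),
                                       StandardCharsets.UTF_8))) {
            // Stream the file line by line; HDFS serves each block from
            // whichever DataNode holds a replica of it.
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```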
## Use Cases

- Big data storage and analytics
- Data lakes for processing pipelines
- Distributed storage for batch workloads
## Deployment

- Open-source and self-hosted
- Typically deployed with other Hadoop components
## Resources

- Documentation: https://hadoop.apache.org/docs/
- Source Code: https://github.com/apache/hadoop
## HDFS Setup
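Once a cluster (or a single-node setup) is running, a quick way to verify it accepts writes is a small smoke test through the Java client. This is a hedged sketch: the NameNode address, the target path, and the class name are assumptions, and it presumes the `hadoop-client` dependency is on the classpath.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        // fs.defaultFS must match the NameNode address configured in
        // core-site.xml; hdfs://localhost:9000 is an assumed single-node value.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/tmp/hdfs-smoke-test.txt");
            // Create (or overwrite) a small file to confirm the cluster
            // accepts writes, then report its size back.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("Wrote " + fs.getFileStatus(path).getLen()
                               + " bytes to " + path);
        }
    }
}
```

If the write succeeds, the client, NameNode, and at least one DataNode are communicating correctly.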