¶ Apache Cassandra History
¶ Origins and Development
Apache Cassandra was originally developed at Facebook to power inbox search at scale. Its design combined the distribution model described in Amazon’s Dynamo paper with the column-family data model of Google’s Bigtable. Facebook released it as open source in 2008; it entered the Apache Incubator in 2009 and became a top-level Apache project in 2010.
- 2008: Developed at Facebook and released as open source on Google Code
- 2009: Entered the Apache Incubator
- 2010: Graduated to Apache Top-Level Project
- 2011: Cassandra 1.0 released with major stability improvements
¶ Growth and Maturity (2013-2017)
- 2013: Cassandra 2.0 introduced lightweight transactions (Paxos-based compare-and-set) and CQL improvements
- 2014: Cassandra 2.1 added user-defined types and a redesigned counter implementation
- 2015: Cassandra 3.0 introduced a rewritten storage engine and materialized views
- 2016: The tick-tock 3.x releases delivered features incrementally, including SASI indexes in Cassandra 3.4
- 2017: Cassandra 3.11 released as the final 3.x release line, with long-term support
¶ Recent Releases (2018-2024)
- 2018-2020: Focus on stability and bug fixes for the 3.x series while 4.0 was hardened through extensive testing
- 2021: Cassandra 4.0 released with virtual tables, audit and full query logging, zero-copy streaming, and a lower default vnode count per node
- 2022-2024: Continued 4.x development, including Cassandra 4.1 (2022) with guardrails and a pluggable memtable API
- 2024: Cassandra 5.0 released with major storage engine improvements
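¶ Architecture and Storage Engine
- Cassandra 4.0: Default num_tokens lowered from 256 to 16, with token allocation tuned to keep clusters balanced (virtual nodes themselves have been the default since Cassandra 1.2)
- Cassandra 5.0: Trie-based memtables and the BTI (Big Trie-Indexed) SSTable format, improving memory efficiency and read/write performance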
¶ Compaction
- Early versions: Size-tiered and leveled compaction strategies
- Cassandra 3.x: Time-window compaction strategy for time-series data
- Cassandra 5.0: Unified Compaction Strategy (UCS), which can be tuned to behave like size-tiered or leveled compaction within a single strategy
¶ Security
- Early versions: Basic authentication and authorization
- Cassandra 4.x: Audit logging and improved encryption options
- Cassandra 5.0: Dynamic Data Masking for compliance and privacy
¶ Query and Indexing
- Cassandra 0.7: Native secondary indexes introduced
- Cassandra 3.x: SASI (SSTable-Attached Secondary Index), added in 3.4
- Cassandra 5.0: Storage-Attached Indexes (SAI) with better performance and reliability
¶ Current Status and Future
Cassandra remains relevant for write-heavy, horizontally scaled deployments where operators accept the complexity of distributed data modeling. The 5.0 release represents the most significant advancement since 4.0, with improvements in:
- Storage engine efficiency (Trie memtables, BTI SSTables)
- Compaction performance (Unified Compaction Strategy)
- Query capabilities (vector search, Storage-Attached Indexes)
- Security features (Dynamic Data Masking)
The project continues to evolve with a focus on AI/ML workloads (through vector search), operational efficiency, and security. Cassandra is particularly well-suited for time-series data, IoT applications, and other high-throughput use cases.