Architecture
Git as a Database
Git is best known as a version control system for code, but its internal architecture has properties that make it well suited to storing chronological data.
Git Internal Structure
Git stores data in four main types of objects:
Blobs: Store file contents
Trees: Represent directories and contain references to blobs and other trees
Commits: Capture the state of the repository at a specific point in time
Tags: Point to specific commits with friendly names
This structure forms a content-addressed database: each object is identified by the SHA-1 hash of its content, prefixed with a short header that records the object type and size.
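As an illustration of content addressing, the following minimal sketch recomputes the identifier Git assigns to a blob. It relies only on Git's documented blob format (a "blob <size>" header, a NUL byte, then the raw bytes); the function name git_blob_id is chosen here for illustration and is not part of ChronDB.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git hashes a header ("blob <size>" followed by a NUL byte) plus the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Identical content always yields the same ID; any change yields a different one.
print(git_blob_id(b'{"name": "Ada"}'))
print(git_blob_id(b'{"name": "Ada"}') == git_blob_id(b'{"name": "Ada"}'))
print(git_blob_id(b""))  # ID of the empty blob
```

The empty blob hashes to e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the same value git hash-object reports for an empty file. Because the ID is derived from the content, two documents with identical bytes are stored once.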
Alignment with Database Concepts
In ChronDB, these concepts are mapped to database terminology:
Git Repository → Database
Git Branch → Schema
Directory → Table/Collection
File → Document/Record
Commit → Transaction
Commit Hash → Transaction ID
Tag → Named Snapshot
ChronDB Architecture
ChronDB is built in layers:
Storage Layer: Uses JGit to interact with Git's internal structure
Indexing Layer: Delegates to Apache Lucene for fast document search, supporting configurable secondary and composite indexes, geospatial shapes, and a planner-driven execution engine with result caching
Access Layer: Provides multiple interfaces (Clojure API, REST, Redis, PostgreSQL)
Concurrency Layer: Manages concurrent transactions and conflicts
Data Flow
Write operations are converted to Git operations
Documents are serialized as JSON and stored as files
Each transaction results in a Git commit
Indices are updated to reflect changes
Reads can access any point in time using specific commits
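The flow above can be sketched with a small in-memory model. This is not ChronDB's API or its JGit-based implementation: the class ToyChronStore, its methods, and the users/1.json path are hypothetical names used only to show how JSON serialization, one commit per write, and point-in-time reads fit together. Index updates are left out.

```python
import hashlib
import json


class ToyChronStore:
    """In-memory stand-in for a Git-backed document store (hypothetical, not ChronDB)."""

    def __init__(self):
        self.snapshots = {}  # commit id -> {path: serialized JSON}
        self.head = None     # id of the most recent commit

    def put(self, path, document):
        # Copy the previous snapshot so that older commits are never modified.
        snapshot = dict(self.snapshots[self.head]) if self.head else {}
        # Documents are serialized as JSON and stored under a file path.
        snapshot[path] = json.dumps(document, sort_keys=True)
        # Derive the commit id from the parent id and the snapshot contents.
        payload = (self.head or "") + json.dumps(snapshot, sort_keys=True)
        commit_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()
        self.snapshots[commit_id] = snapshot
        self.head = commit_id
        return commit_id

    def get(self, path, at=None):
        # Read from the latest commit unless a specific commit id is given.
        commit_id = at or self.head
        if commit_id is None:
            return None
        raw = self.snapshots[commit_id].get(path)
        return json.loads(raw) if raw is not None else None


store = ToyChronStore()
first = store.put("users/1.json", {"name": "Ada", "role": "admin"})
second = store.put("users/1.json", {"name": "Ada", "role": "owner"})

print(store.get("users/1.json"))            # latest state
print(store.get("users/1.json", at=first))  # state as of the first commit
```

The second write does not replace the first: both snapshots remain addressable by commit id, which is what makes reads at an earlier point in time possible.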
Transaction Metadata via Git Notes
Every commit recorded by ChronDB receives a Git note under the chrondb ref. The note payload is JSON and contains:
A transaction identifier (tx_id) shared by all commits that belong to the same logical operation
The origin (for example rest, redis, sql, cli)
Optional user information and request correlation ids
Semantic flags such as bulk-load, rollback, migration, or automated-merge
Additional metadata supplied by the protocol handler (HTTP endpoint, Redis command, SQL table, etc.)
These notes provide an append-only audit trail without mutating commit messages or tracked files. Operators can inspect them with standard Git tooling, for example git notes --ref=chrondb show <commit> or git log --show-notes=chrondb.
Because they live under a separate ref, outside the commit history of the documents themselves, notes can be replicated, filtered, and queried independently of the document contents while preserving Git’s immutable history.
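To make the shape of a note payload concrete, here is a minimal sketch that assembles one as JSON. Only the tx_id field name comes from the description above; the other keys (origin, flags, user, metadata), the helper name build_note_payload, and the example endpoint are assumptions made for illustration, so the actual ChronDB payload may be laid out differently.

```python
import json
import uuid


def build_note_payload(origin, flags=(), user=None, metadata=None, tx_id=None):
    """Assemble a JSON note body. Every key except tx_id is a hypothetical name."""
    payload = {
        # Commits of one logical operation would all reuse the same tx_id.
        "tx_id": tx_id or str(uuid.uuid4()),
        "origin": origin,
        "flags": sorted(flags),
    }
    # Optional fields are included only when supplied.
    if user is not None:
        payload["user"] = user
    if metadata:
        payload["metadata"] = metadata
    return json.dumps(payload, sort_keys=True)


note = build_note_payload(
    "rest",
    flags=["bulk-load"],
    metadata={"endpoint": "/example/save"},  # hypothetical endpoint
    tx_id="tx-0001",
)
print(note)
```

A payload like this could be attached by hand with git notes --ref=chrondb add -m "<payload>" <commit>, which leaves the commit itself unchanged.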
Indexing Layer Details
The Lucene layer receives document mutations from the storage layer and updates the appropriate secondary indexes. It maintains statistics about term distributions and query plans so that complex requests (such as multi-field boolean filters or temporal slices) can be executed without scanning entire collections. When a query arrives, the planner chooses which combination of indexes to use, warms the cache when necessary, and streams results back to the access layer. Geospatial fields are stored in BKD trees, while full-text fields use analyzers that can be tuned per collection.
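The idea of answering a multi-field boolean filter from indexes rather than a collection scan can be shown with a toy structure. This sketch does not use Lucene and does not reflect ChronDB's planner; ToySecondaryIndex and its methods are hypothetical, and the "smallest posting set first" ordering stands in for the statistics-driven choices a real planner would make.

```python
from collections import defaultdict


class ToySecondaryIndex:
    """Maps (field, value) pairs to document ids; a stand-in for index postings."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, document):
        # Record the document id under every field/value pair it contains.
        for field, value in document.items():
            self.postings[(field, value)].add(doc_id)

    def search_all(self, **filters):
        # Boolean AND: intersect posting sets, starting with the smallest one.
        sets = sorted(
            (self.postings.get((f, v), set()) for f, v in filters.items()),
            key=len,
        )
        if not sets:
            return set()
        result = set(sets[0])
        for postings in sets[1:]:
            result &= postings
        return result


index = ToySecondaryIndex()
index.add("users/1", {"role": "admin", "country": "BR"})
index.add("users/2", {"role": "admin", "country": "US"})
index.add("users/3", {"role": "guest", "country": "BR"})

print(sorted(index.search_all(role="admin", country="BR")))  # ['users/1']
```

Only the posting sets for the two filtered values are touched, so the cost depends on how many documents match each term, not on the size of the collection.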
Architecture Benefits
Immutability: Data is never overwritten, only added
Traceability: Complete history of changes
Reliability: Leveraging Git's proven robustness
Flexibility: Support for multiple protocols and interfaces