Architecture

Git as a Database

Git is traditionally known as a version control system for code, but its internal architecture presents characteristics that make it suitable for chronological data storage.

Git Internal Structure

Git stores data in four main types of objects:

Blobs: Store file contents
Trees: Represent directories and contain references to blobs and other trees
Commits: Capture the state of the repository at a specific point in time
Tags: Point to specific commits with friendly names

This structure creates a content-addressed database, where each object is identified by the SHA-1 hash of its content.

Alignment with Database Concepts

In ChronDB, these concepts are mapped to database terminology:

Git Repository → Database
Git Branch → Schema
Directory → Table/Collection
File → Document/Record
Commit → Transaction
Commit Hash → Transaction ID
Tag → Named Snapshot

ChronDB Architecture

ChronDB is built in layers:

Storage Layer: Uses JGit to interact with Git's internal structure
Indexing Layer: Delegates to Apache Lucene for fast document search, supporting configurable secondary and composite indexes, geospatial shapes, and a planner-driven execution engine with result caching
Access Layer: Provides multiple interfaces (Clojure API, REST, Redis, PostgreSQL)
Concurrency Layer: Manages concurrent transactions and conflicts

Data Flow

Write operations are converted to Git operations
Documents are serialized as JSON and stored as files
Each transaction results in a Git commit
Indices are updated to reflect changes
Reads can access any point in time using specific commits

Transaction Metadata via Git Notes

Every commit recorded by ChronDB receives a Git note under the chrondb ref. The note payload is JSON and contains:

A transaction identifier (tx_id) shared by all commits that belong to the same logical operation
The origin (for example rest, redis, sql, cli)
Optional user information and request correlation ids
Semantic flags such as bulk-load, rollback, migration, or automated-merge
Additional metadata supplied by the protocol handler (HTTP endpoint, Redis command, SQL table, etc.)

These notes provide an append-only audit trail without mutating commit messages or tracked files. Operators can inspect them using standard tooling:

git log --show-notes=chrondb
git notes --ref=chrondb show <commit>

Because they live outside the object graph, notes can be replicated, filtered, and queried independently of the document contents while preserving Git’s immutable history.

Indexing Layer Details

The Lucene layer receives document mutations from the storage layer and updates the appropriate secondary indexes. It maintains statistics about term distributions and query plans so that complex requests—such as multi-field boolean filters or temporal slices—can be executed without scanning entire collections. When a query arrives, the planner determines the optimal combination of indexes, warms the cache when necessary, and streams results back to the access layer. Geospatial fields are stored in BKD trees, while full-text fields use analyzers that can be tuned per collection.

Architecture Benefits

Immutability: Data is never overwritten, only added
Traceability: Complete history of changes
Reliability: Leveraging Git's proven robustness
Flexibility: Support for multiple protocols and interfaces

PreviousVersion Control Features NextProtocols Overview

Last updated 1 hour ago

Good afternoon

hashtagGit as a Database

hashtagGit Internal Structure

hashtagAlignment with Database Concepts

hashtagChronDB Architecture

hashtagData Flow

hashtagTransaction Metadata via Git Notes

hashtagIndexing Layer Details

hashtagArchitecture Benefits