Indexing and Search
ChronDB uses Apache Lucene 9.8 for full-text and structured search. Every document saved through any protocol (REST, Redis, SQL, FFI) is automatically indexed. This guide covers how indexing works, available query types, and performance tuning.
How Indexing Works
Write Path
When a document is saved:
The document is stored in Git (immutable commit)
The document is indexed in Lucene with all fields
The Lucene NRT (Near-Real-Time) reader is refreshed
Reads see indexed documents immediately after the write returns.
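The three steps can be pictured with a small in-memory model. This is only a sketch: `TinyStore` and its attributes are hypothetical stand-ins for Git, the Lucene writer, and the NRT reader, not ChronDB's actual classes.

```python
class TinyStore:
    """Hypothetical in-memory stand-in for the write path described above."""

    def __init__(self):
        self.commits = []      # stand-in for Git: append-only history
        self.pending = {}      # stand-in for the index writer's buffer
        self.searchable = {}   # stand-in for what the NRT reader can see

    def save(self, doc_id, doc):
        # 1. Store the document as an immutable commit.
        self.commits.append((doc_id, dict(doc)))
        # 2. Index the document with all of its fields.
        self.pending[doc_id] = dict(doc)
        # 3. Refresh the reader so the new document is visible to searches.
        self.searchable = dict(self.pending)

    def search(self, field, value):
        return sorted(i for i, d in self.searchable.items() if d.get(field) == value)


store = TinyStore()
store.save("user:1", {"name": "Ana", "status": "active"})
print(store.search("status", "active"))  # ['user:1']
```

Because the refresh happens inside `save`, a search issued right after `save` returns already sees the new document.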
Field Type Detection
Lucene needs to know field types for storage and querying. ChronDB automatically detects types from document values:
String: StringField + TextField (exact match + full-text)
Integer/Long: LongPoint + NumericDocValuesField (range queries + sorting)
Float/Double: DoublePoint + NumericDocValuesField (range queries + sorting)
Boolean: StringField (exact match on "true" / "false")
Nested Object: flattened with dot notation (address.city becomes a field)
Every field is also stored as a StoredField so the original document can be retrieved from the index.
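The detection rules can be sketched in a few lines of Python. The helpers `flatten` and `lucene_fields` are hypothetical names used only for illustration. They mirror the mapping described above and are not ChronDB's implementation.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into dot-notation keys (address.city)."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def lucene_fields(value):
    """Return the Lucene field types listed above for a Python value."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return ["StringField"]
    if isinstance(value, int):
        return ["LongPoint", "NumericDocValuesField"]
    if isinstance(value, float):
        return ["DoublePoint", "NumericDocValuesField"]
    if isinstance(value, str):
        return ["StringField", "TextField"]
    raise TypeError(f"unsupported value type: {type(value).__name__}")


doc = {"name": "Ana", "age": 31, "score": 4.5, "active": True,
       "address": {"city": "Lisbon"}}
for field, value in flatten(doc).items():
    print(field, lucene_fields(value))
```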
Full-Text Analysis
Text fields are indexed twice:
Exact match: stored as-is for term queries and filters
Analyzed: processed through Lucene's StandardAnalyzer for full-text search (tokenized, lowercased, stop words removed)
Full-text search fields use the _fts suffix internally (e.g., content becomes content_fts).
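To make the double indexing concrete, here is a rough Python approximation. The `analyze` function is a simplified stand-in for StandardAnalyzer, and the stop-word list is illustrative only, so real tokenization may differ in detail.

```python
import re

# Small illustrative stop-word list; the analyzer's real list may differ.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


def analyze(text):
    """Tokenize on word characters, lowercase, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def index_text_field(name, value):
    """Produce both index entries for one text field."""
    return {
        name: value,                    # exact match: stored as-is
        f"{name}_fts": analyze(value),  # analyzed: full-text tokens
    }


entry = index_text_field("content", "The History of Databases")
print(entry["content"])      # The History of Databases
print(entry["content_fts"])  # ['history', 'databases']
```

A term query for "history" against `content` would not match here, because the exact field holds the whole original string. The same word does match against `content_fts`.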
Query Types
ChronDB supports several query types through the AST query system. All query types are available across REST, Redis, and SQL protocols.
Term Query
Exact match on a field value:
Wildcard Query
Pattern matching with * (multiple characters) and ? (single character):
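The matching semantics of the two wildcard characters can be illustrated by translating a pattern into a regular expression. This is a sketch of the semantics only. The function names are hypothetical, and this is not how Lucene evaluates wildcard queries internally.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * and ? into a regular expression for whole-value matching."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any run of characters, including none
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, value):
    # fullmatch requires the pattern to cover the entire value.
    return wildcard_to_regex(pattern).fullmatch(value) is not None


print(wildcard_match("user@*", "user@example.com"))  # True
print(wildcard_match("te?t", "test"))                # True
print(wildcard_match("te?t", "teest"))               # False
```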
Range Query
Numeric and string range comparisons:
Full-Text Search (FTS)
Search analyzed text fields using natural language:
FTS queries are processed through Lucene's QueryParser, which supports:
Boolean operators: AND, OR, NOT
Phrase search: "exact phrase"
Field targeting: title:database
Grouping: (database OR store) AND distributed
Prefix Query
Match documents where a field starts with a given string:
Exists / Missing
Check for field presence or absence:
Boolean Combinations
Combine queries with boolean logic:
must: all conditions must match (AND)
should: at least one should match (OR)
must-not: none of these may match (NOT)
filter: like must, but does not affect relevance scoring
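The clause semantics can be modeled with plain predicates over in-memory documents. Everything in this sketch (`term`, `prefix`, `exists`, `number_range`, `bool_query`) is a hypothetical Python helper written for illustration, not ChronDB's query API. The handling of should follows the description above as written, and scoring is left out entirely.

```python
def term(field, value):
    return lambda doc: doc.get(field) == value


def prefix(field, start):
    return lambda doc: str(doc.get(field, "")).startswith(start)


def exists(field):
    return lambda doc: field in doc


def number_range(field, low, high):
    def check(doc):
        value = doc.get(field)
        return isinstance(value, (int, float)) and low <= value <= high
    return check


def bool_query(must=(), should=(), must_not=(), filters=()):
    """Combine predicates using the clause meanings listed above."""
    def check(doc):
        if not all(q(doc) for q in must):
            return False
        if not all(q(doc) for q in filters):  # same as must, minus scoring
            return False
        if any(q(doc) for q in must_not):
            return False
        # With no should clauses there is nothing left to satisfy.
        return not should or any(q(doc) for q in should)
    return check


docs = [
    {"id": 1, "status": "active", "age": 34, "email": "ana@example.com"},
    {"id": 2, "status": "active", "age": 17},
    {"id": 3, "status": "banned", "age": 40, "email": "bo@example.com"},
]
query = bool_query(
    must=[term("status", "active")],
    filters=[number_range("age", 18, 65)],
    must_not=[prefix("email", "bo@")],
)
print([d["id"] for d in docs if query(d)])  # [1]
```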
AST Query System
For a comprehensive reference on building structured queries, including pagination, sorting, and cursor-based navigation, see AST Queries.
Quick example via REST:
Sorting
Search results can be sorted by any indexed field:
Sort type is detected automatically from the field name:
Fields containing date, time, created, updated → long (timestamp)
Fields containing age, count, size, price, score → numeric
Default → string (lexicographic)
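The name-based detection can be sketched as below. It assumes case-insensitive substring matching and that the rules are checked in the order listed. Both are assumptions made for illustration, and `sort_type` is a hypothetical helper.

```python
LONG_HINTS = ("date", "time", "created", "updated")
NUMERIC_HINTS = ("age", "count", "size", "price", "score")


def sort_type(field_name):
    """Guess the sort type from the field name, checking rules in order."""
    name = field_name.lower()
    if any(hint in name for hint in LONG_HINTS):
        return "long"
    if any(hint in name for hint in NUMERIC_HINTS):
        return "numeric"
    return "string"


print(sort_type("created_at"))   # long
print(sort_type("price"))        # numeric
print(sort_type("title"))        # string
print(sort_type("page_title"))   # numeric, because "page" contains "age"
```

Under the substring assumption, a string field such as `page_title` would be sorted as a number. If sort order looks wrong for a field, its name is worth checking against these hints.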
Background Reindexing
ChronDB maintains index consistency through automatic background reindexing:
On Startup
When ChronDB starts, a background process walks all Git commits to ensure every document is present in the Lucene index. This handles:
First startup after restoring from a Git backup
Recovery after a crash that left the index incomplete
Index rebuilds after deleting the index directory
Periodic Maintenance
A scheduled task runs reindexing verification every hour by default. This is safe for production: it processes incremental batches and does not block reads or writes.
Monitoring Reindexing
Forcing a Full Reindex
To rebuild the index from scratch:
Performance Tuning
NRT Reader Configuration
The Near-Real-Time reader pool controls how quickly new writes become visible in search results. Default settings work well for most workloads:
Batch size: 100 documents before committing to the index
Commit interval: Periodic flush to ensure durability
RAM buffer: In-memory buffer before flushing segments to disk
Query Performance Tips
Use specific field queries over FTS: term("status", "active") is faster than fts("content", "status:active")
Use filter instead of must: filter clauses skip scoring, which is faster when relevance ordering is not needed
Limit results: always set a limit to avoid loading all matches
Prefer cursor-based pagination: for deep pagination (offset > 1000), use the after cursor instead of offset
Avoid leading wildcards: *@example.com requires scanning all terms; user@* only scans terms starting with user@
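The cost difference behind the leading-wildcard tip can be shown with a simplified model of a sorted term dictionary. The functions below are hypothetical illustrations of the idea, not Lucene code.

```python
from bisect import bisect_left


def prefix_terms(sorted_terms, prefix):
    """Seek to the first candidate, then read only terms sharing the prefix."""
    matches = []
    for i in range(bisect_left(sorted_terms, prefix), len(sorted_terms)):
        if not sorted_terms[i].startswith(prefix):
            break  # sorted order guarantees no later term can match
        matches.append(sorted_terms[i])
    return matches


def suffix_terms(sorted_terms, suffix):
    """No ordering helps here: every term has to be inspected."""
    return [t for t in sorted_terms if t.endswith(suffix)]


terms = sorted(["admin@corp.io", "user@example.com",
                "user@test.org", "zoe@example.com"])
print(prefix_terms(terms, "user@"))
print(suffix_terms(terms, "@example.com"))
```

The prefix lookup touches only the matching terms plus one more, while the suffix lookup grows linearly with the total number of distinct terms in the field.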
Index Storage
The Lucene index is stored alongside the Git data directory. On production systems:
Use SSDs for the data directory
Ensure the filesystem supports mmap (ext4, xfs, APFS)
Monitor index size: it typically grows to 30-50% of the raw document data size