Version: 3.0
Date: February 2026
Status: Production
The CPE cache system provides persistent storage and retrieval of NVD CPE API responses to minimize redundant API calls during CVE analysis. The system uses a sharded architecture with hash-based distribution, lazy loading, and proactive memory management.
Architecture: 16-shard distributed cache (compact JSON format)
Memory Management: Proactive eviction with configurable memory limits
Data Integrity: 4-layer validation prevents corrupted data from entering cache
Hash-based distributed cache with lazy loading and compact JSON persistence.
4-layer validation (HTTP → JSON → Schema → Serialization) ensures only valid NVD data enters cache.
Proactive eviction maintains configurable shard limits to prevent memory exhaustion during processing.
Standalone script (refresh_nvd_cpe_base_strings_cache.py) for controlled cache updates independent of runtime expiration settings.
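The four validation layers named above (HTTP → JSON → Schema → Serialization) can be illustrated with a minimal sketch. This is not the project's validate_nvd_cpe_response(); the function name validate_response_layers is hypothetical, and the schema layer is a simplified stand-in that checks only two top-level fields instead of the full NVD CPE API 2.0 schema.

```python
import json
from typing import Any, Optional, Tuple


def validate_response_layers(status_code: int, body: str) -> Tuple[bool, Optional[Any], str]:
    """Hypothetical helper: run four checks in order, return (ok, data, reason)."""
    # Layer 1: HTTP. Only a 200 response is considered for caching.
    if status_code != 200:
        return False, None, f"http: unexpected status {status_code}"

    # Layer 2: JSON. The body must parse.
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        return False, None, f"json: {exc.msg}"

    # Layer 3: Schema. Simplified stand-in (assumption): a real check would
    # validate against the cached NVD CPE API 2.0 JSON schema.
    if not isinstance(data, dict):
        return False, None, "schema: top-level value is not an object"
    if not isinstance(data.get("totalResults"), int):
        return False, None, "schema: totalResults missing or not an integer"
    if not isinstance(data.get("products"), list):
        return False, None, "schema: products missing or not a list"

    # Layer 4: Serialization. The data must survive a strict compact dump,
    # which rejects values such as NaN that Python's parser tolerates.
    try:
        json.dumps(data, separators=(",", ":"), allow_nan=False)
    except (TypeError, ValueError) as exc:
        return False, None, f"serialization: {exc}"

    return True, data, "ok"
```

Only a response that passes all four layers would be handed to the cache; the reason string identifies which layer rejected it.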
Implementation: See src/analysis_tool/storage/cpe_cache.py
ShardedCPECache._get_shard_index() - MD5 hash-based shard routing
ShardedCPECache._get_shard_filename() - Shard file naming convention
Shard Distribution:
File Structure:
cache/
  cpe_base_strings/
    cpe_cache_shard_00.json
    cpe_cache_shard_01.json
    ...
    cpe_cache_shard_15.json
  cache_metadata.json (shard count, last modified)
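The MD5-based routing and the shard file naming convention shown above could look roughly like the following sketch. The function names mirror the documented methods but are standalone illustrations; in particular, reducing the full digest modulo the shard count is an assumption, and the real _get_shard_index() may derive the index differently.

```python
import hashlib


def get_shard_index(cpe_base_string: str, num_shards: int = 16) -> int:
    """Map a cache key to a shard number in [0, num_shards)."""
    # Assumption: the whole MD5 digest is interpreted as an integer and
    # reduced modulo the shard count. MD5 is used for distribution only,
    # not for security.
    digest = hashlib.md5(cpe_base_string.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def get_shard_filename(shard_index: int) -> str:
    """Build the zero-padded file name used in cache/cpe_base_strings/."""
    return f"cpe_cache_shard_{shard_index:02d}.json"
```

Because the hash depends only on the key, the same CPE base string always routes to the same shard file across runs.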
Key Benefits:
Implementation: See src/analysis_tool/storage/cpe_cache.py _load_shard() method
max_loaded_shards limit (default: 4)
Memory Management Example:
Processing Run (default: max 4 shards in memory):
Load shard 0 (CPE batch 1)
Load shard 3 (CPE batch 2)
Load shard 7 (CPE batch 3)
Load shard 11 (CPE batch 4)
Attempt to load shard 5:
→ Memory limit reached (4/4 shards loaded)
→ Save shard 0 if dirty
→ Evict shard 0 from memory
→ Load shard 5 (now 4/4 shards: 3,7,11,5)
Continue processing...
End of run:
→ Save all dirty shards
→ Evict all shards
→ Memory freed for next run
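The run traced above can be reproduced with a small sketch of proactive eviction. ShardPool and its loader and saver callbacks are hypothetical names, not part of ShardedCPECache. The trace evicts the shard that was loaded first, so the sketch evicts in load order; whether the real implementation uses load order or least-recent use is an assumption here.

```python
from collections import OrderedDict
from typing import Callable, List, Set


class ShardPool:
    """Hypothetical pool that keeps at most max_loaded_shards in memory."""

    def __init__(self, loader: Callable[[int], dict],
                 saver: Callable[[int, dict], None],
                 max_loaded_shards: int = 4) -> None:
        self._loader = loader
        self._saver = saver
        self.max_loaded_shards = max_loaded_shards
        self._loaded: "OrderedDict[int, dict]" = OrderedDict()
        self._dirty: Set[int] = set()

    def get_shard(self, index: int) -> dict:
        if index in self._loaded:
            return self._loaded[index]
        # Proactive step: make room before loading, so the limit is never exceeded.
        while len(self._loaded) >= self.max_loaded_shards:
            oldest, data = self._loaded.popitem(last=False)
            if oldest in self._dirty:  # save only shards with unsaved changes
                self._saver(oldest, data)
                self._dirty.discard(oldest)
        self._loaded[index] = self._loader(index)
        return self._loaded[index]

    def mark_dirty(self, index: int) -> None:
        if index in self._loaded:
            self._dirty.add(index)

    def loaded_indices(self) -> List[int]:
        return list(self._loaded)

    def close(self) -> None:
        """End of run: save dirty shards, then release all memory."""
        for index in sorted(self._dirty):
            self._saver(index, self._loaded[index])
        self._dirty.clear()
        self._loaded.clear()
```

Loading shards 0, 3, 7, and 11 and then requesting shard 5 saves shard 0 if it is dirty, evicts it, and leaves shards 3, 7, 11, and 5 in memory, matching the trace.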
Configuration (default: 4 loaded shards, approximately 1.2GB of memory):
{
  "cache_settings": {
    "cpe_cache": {
      "max_loaded_shards": 4
    }
  }
}
Core Class: ShardedCPECache in src/analysis_tool/storage/cpe_cache.py
Key Methods:
get() - Retrieve cache entry with lazy shard loading
put() - Store cache entry with validation and proactive eviction
_load_shard() - Lazy load with proactive memory limit enforcement
save_all_shards() - Persist loaded shards to disk (skips clean shards)
save_changed_shards_only() - Efficient save for end-of-run cleanup
evict_all_shards() - Clear memory while keeping singleton alive
load_shard_from_disk() - Static method for external shard loading
save_shard_to_disk() - Static method for external shard saving
Data Validation: validate_nvd_cpe_response() in src/analysis_tool/core/schema_validator.py
gatherData.py at API boundary (lines 856, 920)
Global Manager: GlobalCPECacheManager - Singleton wrapper for session-level cache management
Script: utilities/refresh_nvd_cpe_base_strings_cache.py
Purpose: Standalone forced refresh of oldest cached CPE base strings independent of runtime expiration settings
Phase 1: Discovery
/cpematch/2.0 API for changes since oldest entry
Phase 2: Selective Refresh
/cpes/2.0 API
Phase 3: Finalize
The script includes reactive corruption handling that activates when shard loading fails:
Detection & Diagnosis:
Recovery Actions:
Note: Deleted shard data is permanently lost. For full recovery, consider periodic backups of cache/cpe_base_strings/.
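A periodic backup of the kind suggested in the note could be as simple as the sketch below. backup_cache_dir and the timestamped naming scheme are hypothetical and not part of the refresh script; the source directory is the cache/cpe_base_strings/ directory named above.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_cache_dir(cache_dir: Path, backup_root: Path) -> Path:
    """Copy the shard directory to a new timestamped directory and return its path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_root / f"{cache_dir.name}_{stamp}"
    # copytree creates missing parent directories and raises FileExistsError
    # if the target already exists (for example, two backups in one second).
    shutil.copytree(cache_dir, target)
    return target
```

Running this before a refresh, for example backup_cache_dir(Path("cache/cpe_base_strings"), Path("backups")), keeps a copy that a deleted shard can be restored from.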
Key Functions:
find_oldest_cache_entry() - Scans shards to find oldest timestamp (Phase 1)
query_cpematch_changes() - Queries NVD change tracking API (Phase 1)
extract_unique_cpe_bases() - Extracts unique CPE base strings (Phase 1)
query_nvd_cpes_api() - Retrieves full metadata from CPE API (Phase 2)
flush_staged_updates() - Merges updates while preserving statistics (Phase 3)
diagnose_shard_corruption() - Analyzes corruption cause and provides diagnostics
log_corruption_diagnostics() - Logs detailed corruption information before recovery
# Run cache refresh (requires NVD API key configured in config.json)
python -m utilities.refresh_nvd_cpe_base_strings_cache
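To illustrate the Phase 1 extraction step, the sketch below reduces full CPE 2.3 strings to unique base strings. It is not the script's extract_unique_cpe_bases(). The definition of a base string used here (part, vendor, and product kept, the remaining eight attributes wildcarded) is an assumption, and the plain split on ":" does not handle escaped colons inside attribute values.

```python
from typing import Iterable, List


def cpe_base_string(cpe: str) -> str:
    """Reduce a CPE 2.3 formatted string to part:vendor:product plus wildcards."""
    parts = cpe.split(":")  # simplification: ignores escaped colons
    if len(parts) != 13 or parts[0] != "cpe" or parts[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    # Keep 'cpe', '2.3', part, vendor, product; wildcard the other 8 attributes.
    return ":".join(parts[:5] + ["*"] * 8)


def unique_cpe_bases(cpes: Iterable[str]) -> List[str]:
    """Return base strings in first-seen order with duplicates removed."""
    return list(dict.fromkeys(cpe_base_string(c) for c in cpes))
```

Deduplicating before Phase 2 means each base string is queried against the CPE API at most once per refresh.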
Requirements:
config.json (default_api_key setting)
cache/cpe_base_strings/
Configuration File: config.json
Relevant Settings:
{
  "cache_settings": {
    "cpe_cache": {
      "sharding": {
        "num_shards": 16
      },
      "max_loaded_shards": 4,
      "auto_save_threshold": 50,
      "refresh_strategy": {
        "notify_age_hours": 0
      }
    }
  }
}
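A minimal sketch of reading these settings with fallbacks follows. CPECacheSettings and parse_cpe_cache_settings are hypothetical names, not part of the tool. The fallback values follow the defaults listed under Key Parameters, except notify_age_hours, where 0 is the recommended value and treating it as the fallback is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CPECacheSettings:
    num_shards: int = 16
    max_loaded_shards: int = 4
    auto_save_threshold: int = 50
    notify_age_hours: int = 0  # assumption: recommended value used as fallback


def parse_cpe_cache_settings(config: dict) -> CPECacheSettings:
    """Extract cpe_cache settings from a parsed config.json, filling in defaults."""
    cpe = config.get("cache_settings", {}).get("cpe_cache", {})
    defaults = CPECacheSettings()
    return CPECacheSettings(
        num_shards=cpe.get("sharding", {}).get("num_shards", defaults.num_shards),
        max_loaded_shards=cpe.get("max_loaded_shards", defaults.max_loaded_shards),
        auto_save_threshold=cpe.get("auto_save_threshold", defaults.auto_save_threshold),
        notify_age_hours=cpe.get("refresh_strategy", {}).get(
            "notify_age_hours", defaults.notify_age_hours
        ),
    )


def load_cpe_cache_settings(path: str = "config.json") -> CPECacheSettings:
    """Read config.json from disk and parse the cpe_cache section."""
    with open(path, "r", encoding="utf-8") as fh:
        return parse_cpe_cache_settings(json.load(fh))
```

Keeping the parsing separate from the file read lets the nested lookups be exercised with plain dictionaries.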
Key Parameters:
num_shards: Number of shard files (default: 16)
max_loaded_shards: Memory limit in shards (default: 4 ≈ 1.2GB)
auto_save_threshold: Trigger auto-save after N new entries (default: 50)
notify_age_hours: Runtime expiration threshold (recommended: 0 to disable auto-refresh)
CPE Cache Storage:
cache/cpe_base_strings/cpe_cache_shard_00.json through cpe_cache_shard_15.json - Individual cache shards
cache/cache_metadata.json - Shard metadata (count, timestamps)
Schema Cache (organized by source):
cache/schemas/nvd_project/nvd_cpes_2_0_schema.json - NVD CPE API 2.0 schema
cache/schemas/nvd_project/nvd_cves_2_0_schema.json - NVD CVE API 2.0 schema
cache/schemas/nvd_project/nvd_source_2_0_schema.json - NVD Source API 2.0 schema
cache/schemas/cve_program/cve_cve_5_2_schema.json - CVE Program List V5.2 schema
cache/schemas/first_cvss/cvss-v*.json - FIRST CVSS schemas (v2.0, v3.0, v3.1, v4.0)
cache/schemas/analysis_tool/cpe_base_strings_cache_schema.json - Analysis Tool CPE base strings cache schema (optimized)
Scripts:
Test Suites:
Run Tests:
python test_suites/tool_infrastructure/test_cpe_cache.py
python test_suites/tool_infrastructure/test_cpe_cache_eviction.py
python test_suites/validation/test_nvd_schema_validation.py
Implementation: See src/analysis_tool/storage/cpe_cache.py
Behavior: load_shard_from_disk() raises RuntimeError if shard file exists but cannot be loaded (corruption, I/O error).
Recovery: Investigate and repair corrupted shard file, then retry operation.
Behavior: save_shard_to_disk() logs warnings but does not raise exceptions. Cache remains in memory for retry on next save attempt.
Rationale: Transient I/O errors should not terminate long-running analysis. Data persists in memory until successful save.
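The asymmetry between the two behaviors (loading fails loudly, saving fails softly) can be sketched as follows. load_shard and save_shard are hypothetical stand-ins for load_shard_from_disk() and save_shard_to_disk(); returning an empty dict for a missing file and a boolean from the save are assumptions made for the illustration.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def load_shard(path: Path) -> dict:
    """Raise RuntimeError if the file exists but cannot be read or parsed."""
    if not path.exists():
        return {}  # assumption: a missing shard is treated as empty
    try:
        with path.open("r", encoding="utf-8") as fh:
            return json.load(fh)
    except (OSError, ValueError) as exc:  # JSONDecodeError subclasses ValueError
        raise RuntimeError(f"Cannot load shard {path}: {exc}") from exc


def save_shard(path: Path, data: dict) -> bool:
    """Log a warning and return False on failure instead of raising."""
    try:
        with path.open("w", encoding="utf-8") as fh:
            json.dump(data, fh, separators=(",", ":"))  # compact JSON
        return True
    except (OSError, TypeError, ValueError) as exc:
        logger.warning("Could not save shard %s: %s", path, exc)
        return False
```

A caller that receives False keeps the shard in memory and marked as changed, so the next save attempt retries the write.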