Analysis_Tools

CPE Cache System - Reference Documentation

Version: 3.0
Date: February 2026
Status: Production


Overview

The CPE cache system provides persistent storage and retrieval of NVD CPE API responses to minimize redundant API calls during CVE analysis. The system uses a sharded architecture with hash-based distribution, lazy loading, and proactive memory management.

Architecture: 16-shard distributed cache (compact JSON format)
Memory Management: Proactive eviction with configurable memory limits
Data Integrity: 4-layer validation prevents corrupted data from entering cache


Architecture Components

1. Sharded Cache Storage

Hash-based distributed cache with lazy loading and compact JSON persistence.

2. Data Validation System

4-layer validation (HTTP → JSON → Schema → Serialization) ensures only valid NVD data enters cache.

3. Memory Management

Proactive eviction enforces a configurable limit on the number of shards held in memory, preventing memory exhaustion during processing.

4. Manual Refresh Tool

Standalone script (refresh_nvd_cpe_base_strings_cache.py) for controlled cache updates independent of runtime expiration settings.


Sharded Cache Architecture

Hash-Based Shard Distribution

Implementation: See src/analysis_tool/storage/cpe_cache.py

Shard Distribution:
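The hash function used by cpe_cache.py is not shown in this document. As a minimal sketch of hash-based distribution, assuming a SHA-256 digest reduced modulo the shard count (the function names shard_index and shard_filename are illustrative, not taken from the implementation):

```python
import hashlib

NUM_SHARDS = 16  # matches the 16-shard layout described in this document


def shard_index(cpe_string: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a CPE string to a shard number in [0, num_shards).

    Assumption: SHA-256 is used here only as a stable, evenly distributed
    hash. The real implementation may use a different function.
    """
    digest = hashlib.sha256(cpe_string.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def shard_filename(index: int) -> str:
    """Build a shard file name in the cpe_cache_shard_NN.json pattern."""
    return f"cpe_cache_shard_{index:02d}.json"


if __name__ == "__main__":
    cpe = "cpe:2.3:a:apache:http_server:*:*:*:*:*:*:*:*"
    print(shard_filename(shard_index(cpe)))
```

A digest-based hash is used rather than Python's built-in hash() because string hashing is randomized per process by default, which would send the same CPE string to different shards on different runs.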

File Structure:

cache/
  cpe_base_strings/
    cpe_cache_shard_00.json
    cpe_cache_shard_01.json
    ...
    cpe_cache_shard_15.json
  cache_metadata.json        (shard count, last modified)

Key Benefits:

  1. Lazy Loading: Only load shards containing requested CPE strings
  2. Fast Access: Load individual shards in ~0.1s as needed
  3. Memory Management: Proactive eviction enforces a hard limit on the number of loaded shards (default: 4)
  4. Scalability: System handles large cache sizes without memory exhaustion
  5. Data Integrity: NVD schema validation prevents corrupted data from entering cache

Proactive Memory Eviction

Implementation: See src/analysis_tool/storage/cpe_cache.py _load_shard() method

Memory Management Example:

Processing Run (default: max 4 shards in memory):
  Load shard 0 (CPE batch 1)
  Load shard 3 (CPE batch 2)
  Load shard 7 (CPE batch 3)
  Load shard 11 (CPE batch 4)
  
  Attempt to load shard 5:
    → Memory limit reached (4/4 shards loaded)
    → Save shard 0 if dirty
    → Evict shard 0 from memory
    → Load shard 5 (now 4/4 shards: 3,7,11,5)
  
  Continue processing...
  
  End of run:
    → Save all dirty shards
    → Evict all shards
    → Memory freed for next run
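The run above can be sketched as a small bounded store. This is an illustration under stated assumptions, not the code in _load_shard(): the class name BoundedShardStore and the loader/saver callables are hypothetical, and the eviction order (oldest-loaded shard first) is inferred from the example, where shard 0 is the one evicted.

```python
from collections import OrderedDict
from typing import Any, Callable, Dict, List, Set


class BoundedShardStore:
    """Holds at most max_loaded shards, evicting the oldest-loaded first."""

    def __init__(
        self,
        loader: Callable[[int], Dict[str, Any]],        # hypothetical disk reader
        saver: Callable[[int, Dict[str, Any]], None],   # hypothetical disk writer
        max_loaded: int = 4,
    ) -> None:
        self._loader = loader
        self._saver = saver
        self._max_loaded = max_loaded
        self._shards: "OrderedDict[int, Dict[str, Any]]" = OrderedDict()
        self._dirty: Set[int] = set()

    def get(self, index: int) -> Dict[str, Any]:
        if index in self._shards:
            return self._shards[index]
        # Proactive eviction: make room before the new shard is loaded.
        while len(self._shards) >= self._max_loaded:
            oldest = next(iter(self._shards))
            if oldest in self._dirty:
                self._saver(oldest, self._shards[oldest])  # save if dirty
                self._dirty.discard(oldest)
            del self._shards[oldest]                       # then evict
        self._shards[index] = self._loader(index)
        return self._shards[index]

    def put(self, index: int, key: str, value: Any) -> None:
        self.get(index)[key] = value
        self._dirty.add(index)

    def flush_and_clear(self) -> None:
        """End of run: save all dirty shards, then drop everything."""
        for index in sorted(self._dirty):
            self._saver(index, self._shards[index])
        self._dirty.clear()
        self._shards.clear()

    def loaded(self) -> List[int]:
        return list(self._shards)


saved: List[int] = []
store = BoundedShardStore(loader=lambda i: {}, saver=lambda i, d: saved.append(i))
for i in (0, 3, 7, 11):
    store.get(i)
store.put(0, "cpe:2.3:a:example:product", {"totalResults": 0})
store.get(5)
print(store.loaded())  # [3, 7, 11, 5]
print(saved)           # [0]
```

Saving before eviction means a dirty shard is written out before it leaves memory, which is the ordering shown in the example.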

Configuration (default: 4 shards, ~1.2GB memory):

{
  "cache_settings": {
    "cpe_cache": {
      "max_loaded_shards": 4
    }
  }
}

Cache Implementation

Core Class: ShardedCPECache in src/analysis_tool/storage/cpe_cache.py

Key Methods:

Data Validation: validate_nvd_cpe_response() in src/analysis_tool/core/schema_validator.py
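The four validation layers (HTTP → JSON → Schema → Serialization) could be arranged roughly as follows. This is a simplified sketch and not the body of validate_nvd_cpe_response(): the function name validate_response_sketch is hypothetical, and the short list of required keys stands in for the full NVD schema check.

```python
import json
from typing import Tuple

# Assumption: a reduced stand-in for the real schema. These are top-level
# fields of an NVD CPE API 2.0 response, not the complete schema.
REQUIRED_KEYS = ("resultsPerPage", "startIndex", "totalResults", "products")


def validate_response_sketch(status_code: int, body: str) -> Tuple[bool, str]:
    """Return (is_valid, reason), stopping at the first failing layer."""
    # Layer 1: HTTP status
    if status_code != 200:
        return False, f"http: unexpected status {status_code}"

    # Layer 2: the body must parse as JSON
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        return False, f"json: {exc.msg}"

    # Layer 3: simplified schema check
    if not isinstance(data, dict):
        return False, "schema: top-level value is not an object"
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        return False, f"schema: missing keys {missing}"

    # Layer 4: the data must serialize to strict, compact JSON.
    # allow_nan=False rejects NaN/Infinity, which json.loads accepts.
    try:
        json.dumps(data, separators=(",", ":"), allow_nan=False)
    except (TypeError, ValueError) as exc:
        return False, f"serialization: {exc}"

    return True, "ok"
```

Ordering the checks from cheapest to most specific lets a failed request be rejected before any parsing work is done, and only a response that passes all four layers would be written to a shard.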

Global Manager: GlobalCPECacheManager - Singleton wrapper for session-level cache management


Manual Cache Refresh Script

Script: utilities/refresh_nvd_cpe_base_strings_cache.py
Purpose: Standalone forced refresh of the oldest cached CPE base strings, independent of runtime expiration settings

Refresh Process

Phase 1: Discovery

Phase 2: Selective Refresh

Phase 3: Finalize

Automatic Corruption Recovery

The script includes reactive corruption handling that activates when shard loading fails:

Detection & Diagnosis:

Recovery Actions:

Note: Deleted shard data is permanently lost. For full recovery, consider periodic backups of cache/cpe_base_strings/.
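One way to take the periodic backup suggested in the note is a timestamped copy of the shard directory. The helper below is a hypothetical sketch (backup_cache_dir is not part of the tool), and the backup location is an assumption to adapt.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_cache_dir(cache_dir: Path, backup_root: Path) -> Path:
    """Copy cache_dir into backup_root under a UTC-timestamped name."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_root / f"{cache_dir.name}_{stamp}"
    # copytree creates the target (and missing parents) and fails if the
    # target already exists, so an earlier backup is never overwritten.
    shutil.copytree(cache_dir, target)
    return target


if __name__ == "__main__":
    # Paths are assumptions; adjust to the actual cache location.
    print(backup_cache_dir(Path("cache/cpe_base_strings"), Path("cache_backups")))
```

Running the backup while no analysis or refresh is writing shards avoids copying a half-written file.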

Implementation Details

Key Functions:

Usage

# Run cache refresh (requires NVD API key configured in config.json)
python -m utilities.refresh_nvd_cpe_base_strings_cache

Requirements:


Configuration

Configuration File: config.json

Relevant Settings:

{
  "cache_settings": {
    "cpe_cache": {
      "sharding": {
        "num_shards": 16
      },
      "max_loaded_shards": 4,
      "auto_save_threshold": 50,
      "refresh_strategy": {
        "notify_age_hours": 0
      }
    }
  }
}
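A reader for these settings might look like the sketch below. The function name read_cpe_cache_settings is hypothetical. Only the default of 4 for max_loaded_shards is stated in this document; the other fallback values simply mirror the sample above and are assumptions.

```python
import json
from typing import Any, Dict

# Fallbacks mirror the sample configuration. Only max_loaded_shards=4 is a
# documented default; the remaining values are assumptions for illustration.
FALLBACKS: Dict[str, int] = {
    "num_shards": 16,
    "max_loaded_shards": 4,
    "auto_save_threshold": 50,
    "notify_age_hours": 0,
}


def read_cpe_cache_settings(config_text: str) -> Dict[str, int]:
    """Extract CPE cache settings from config.json text, applying fallbacks."""
    config: Dict[str, Any] = json.loads(config_text)
    cpe = config.get("cache_settings", {}).get("cpe_cache", {})
    settings = {
        "num_shards": cpe.get("sharding", {}).get(
            "num_shards", FALLBACKS["num_shards"]),
        "max_loaded_shards": cpe.get(
            "max_loaded_shards", FALLBACKS["max_loaded_shards"]),
        "auto_save_threshold": cpe.get(
            "auto_save_threshold", FALLBACKS["auto_save_threshold"]),
        "notify_age_hours": cpe.get("refresh_strategy", {}).get(
            "notify_age_hours", FALLBACKS["notify_age_hours"]),
    }
    # Sanity check (an assumption, not documented behavior): the in-memory
    # limit must allow at least one shard and cannot exceed the shard count.
    if not 1 <= settings["max_loaded_shards"] <= settings["num_shards"]:
        raise ValueError("max_loaded_shards must be between 1 and num_shards")
    return settings
```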

Key Parameters:

File Locations

CPE Cache Storage:

Schema Cache (organized by source):

Scripts:

Testing

Test Suites:

Run Tests:

python test_suites/tool_infrastructure/test_cpe_cache.py
python test_suites/tool_infrastructure/test_cpe_cache_eviction.py
python test_suites/validation/test_nvd_schema_validation.py

Error Handling

Implementation: See src/analysis_tool/storage/cpe_cache.py

Load Failures

Behavior: load_shard_from_disk() raises RuntimeError if a shard file exists but cannot be loaded (for example, because of corruption or an I/O error).

Recovery: Investigate and repair the corrupted shard file, then retry the operation.
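The fail-loudly load behavior can be illustrated as follows. This is a sketch, not the actual load_shard_from_disk(): the name load_shard_sketch is hypothetical, and treating a missing file as an empty shard is an assumption based on the "file exists but cannot be loaded" wording.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_shard_sketch(path: Path) -> Dict[str, Any]:
    """Load one shard file, raising RuntimeError if it exists but is unreadable."""
    if not path.exists():
        # Assumption: a shard that was never written starts out empty.
        return {}
    try:
        with path.open("r", encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, ValueError) as exc:
        # ValueError covers json.JSONDecodeError and UnicodeDecodeError.
        raise RuntimeError(f"Failed to load shard {path}: {exc}") from exc
    if not isinstance(data, dict):
        raise RuntimeError(f"Shard {path} does not contain a JSON object")
    return data
```

Raising instead of returning an empty shard matters here: silently treating a corrupted file as empty would let the next save overwrite data that might still be repairable.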

Save Failures

Behavior: save_shard_to_disk() logs a warning but does not raise an exception. The shard stays in memory so that the next save attempt can retry.

Rationale: Transient I/O errors should not terminate a long-running analysis. Unsaved data remains in memory until a save succeeds.
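A tolerant save along these lines might be written as below. This is a sketch and not the actual save_shard_to_disk(): the name save_shard_sketch, the logger name, the boolean return value, and the write-to-temporary-file-then-rename step are all assumptions added for illustration.

```python
import json
import logging
import os
from pathlib import Path
from typing import Any, Dict

logger = logging.getLogger("cpe_cache_sketch")  # hypothetical logger name


def save_shard_sketch(path: Path, data: Dict[str, Any]) -> bool:
    """Write a shard as compact JSON. Log and return False on I/O failure."""
    tmp_path = path.with_suffix(".tmp")
    try:
        with tmp_path.open("w", encoding="utf-8") as handle:
            # Compact separators drop the whitespace json.dump adds by default.
            json.dump(data, handle, separators=(",", ":"))
        # Assumption: replace the old file only after a complete write, so a
        # failed save never leaves a truncated shard behind.
        os.replace(tmp_path, path)
        return True
    except OSError as exc:
        logger.warning("Could not save shard %s: %s", path, exc)
        return False
```

A caller would keep the shard marked dirty whenever the function returns False, so the data is written on the next save attempt.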