This document details the end-to-end workflow that transforms CVE List V5 affected array data into the cpeDetermination content of NVD-ish records. The process involves multiple transformation stages, NVD API queries, and data consolidation steps.
CVE List V5 Affected Entry
↓
1. Platform Entry Creation
↓
2. CPE Base String Generation
↓
3. NVD /cpes/ API Query & Processing
↓
4. Top 10 CPE Suggestions Generation
↓
5. Confirmed Mappings Detection
↓
6. NVD-ish Record Integration
↓
Final cpeDetermination Structure
Location: src/analysis_tool/core/processData.py → processCVEData()

Example input (a CVE List V5 affected entry):
{
"vendor": "example_vendor",
"product": "example_product",
"versions": [
{
"version": "1.0.0",
"status": "affected",
"lessThan": "1.2.5"
}
],
"platforms": ["windows", "linux"]
}
Resulting platform entry:
{
'sourceID': 'security@example-vendor.com',
'sourceRole': 'CNA',
'rawPlatformData': {affected_entry_copy},
'platformEntryMetadata': {
'dataResource': 'CVEAPI',
'platformFormatType': 'vendor_product_versions',
'hasCPEArray': False,
'cpeBaseStrings': [], # Populated in Stage 2
'cpeVersionChecks': [version_objects],
'duplicateRowIndices': []
},
'rawCPEsQueryData': [],
'sortedCPEsQueryData': [],
'trimmedCPEsQueryData': []
}
Key Functions:

- processCVEData(): Main entry point
- determine_platform_format_type(): Categorizes affected entry structure
- create_product_key(): Generates unique keys for duplicate detection

Location: src/analysis_tool/core/processData.py → suggestCPEData()
| Affected Entry Property | CPE Attribute Target | Processing Approach | Example Transform |
|---|---|---|---|
| vendor | vendor | Direct mapping | "microsoft" |
| product | product | Direct mapping | "windows_server" |
| platforms | targetHW | Supported value mapping | ["x64", "x86"] → Generate base string per array value |
| packageName | vendor/product | Scoped package parsing | "@angular/core" → "angular" + "core" |
| cpes | vendor/product/etc. | CPE string parsing | "cpe:2.3:a:apache:kafka:2.8.0:*:*:*:*:*:*:*" → Multiple search variants |
Note: collectionURL and repo fields are not currently supported in the CPE determination process. While these fields are collected in platform data, they are not processed for CPE generation. Future enhancement could add URL parsing to extract vendor/product components from repository URLs.
Step 1: Primary Property Detection
# Direct vendor/product mapping (highest priority)
if 'vendor' in platform_data and 'product' in platform_data:
    vendor_value = platform_data['vendor']
    product_value = platform_data['product']
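Step 2: packageName Parsing
# Scoped package: "@angular/core" → vendor="angular", product="core"
if package_name.startswith('@') and '/' in package_name:
    # maxsplit=1 keeps any further '/' in the product part instead of raising ValueError
    vendor_value, product_value = package_name[1:].split('/', 1)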
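This section references two packageName shapes: scoped npm names such as "@angular/core", and colon-delimited coordinates such as "org.apache:kafka" (covered under packageName Processing). A single small helper can split both. This is a minimal sketch under stated assumptions. split_package_name is a hypothetical name, not the tool's function, and splitting on the first separator is an assumed rule.

```python
from typing import Optional, Tuple


def split_package_name(package_name: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: split a packageName into (vendor-like, product-like) parts.

    - "@scope/name"    -> ("scope", "name")      (scoped npm-style package)
    - "group:artifact" -> ("group", "artifact")  (GroupId:ArtifactId style)
    Returns None when no recognizable separator is present.
    """
    name = package_name.strip()
    if name.startswith('@') and '/' in name:
        # Split only on the first '/' so extra slashes stay in the product part
        scope, _, rest = name[1:].partition('/')
        return (scope, rest) if scope and rest else None
    if ':' in name:
        group_id, _, artifact_id = name.partition(':')
        return (group_id, artifact_id) if group_id and artifact_id else None
    return None


print(split_package_name("@angular/core"))     # ('angular', 'core')
print(split_package_name("org.apache:kafka"))  # ('org.apache', 'kafka')
print(split_package_name("lodash"))            # None
```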
Step 3: Platforms Array Processing
# Process platforms array to generate architecture-specific CPE variants
if 'platforms' in platform_data and isinstance(platform_data['platforms'], list):
    platforms = platform_data['platforms']
    for platform_item in platforms:
        platform_string = platform_item.lower() if isinstance(platform_item, str) else ""
        # Skip placeholder values (n/a, unknown, etc.)
        is_placeholder = platform_string in [v.lower() for v in GENERAL_PLACEHOLDER_VALUES]
        if is_placeholder:
            continue  # Skip mapping attempt for placeholder values
        # Supported platform patterns generate CPE base strings (x64/x86 branches shown).
        # 64-bit is tested first because "x86_64" also contains the substring "x86".
        if "64-bit" in platform_string or "x64" in platform_string or "x86_64" in platform_string:
            targetHW = "x64"
            rawMatchString = f"cpe:2.3:*:{vendor_value}:{product_value}:*:*:*:*:*:*:{targetHW}:*"
            cpeBaseStrings.append(constructSearchString(breakoutCPEAttributes(rawMatchString), "baseQuery"))
        elif "32-bit" in platform_string or "x32" in platform_string or "x86" in platform_string:
            targetHW = "x86"
            rawMatchString = f"cpe:2.3:*:{vendor_value}:{product_value}:*:*:*:*:*:*:{targetHW}:*"
            cpeBaseStrings.append(constructSearchString(breakoutCPEAttributes(rawMatchString), "baseQuery"))
# Platform validation and mapping:
# ONLY architecture platforms generate CPE strings (mapped to targetHW field):
# 'x86', 'x86_64', 'x64', 'arm', 'arm64', '32-bit', '64-bit' → targetHW values
# Other values throw warnings for proper support mapping review
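The sketch below isolates the platform-to-targetHW mapping. It uses substring matching, so "x86_64" has to be tested before "x86", and it skips placeholder values. The arm/arm64 branches follow the comment list above, and "aarch64" → "arm64" follows the platform curation examples. The function names and the small placeholder subset are assumptions for illustration. The strings it returns are raw and have not yet been wildcarded by constructSearchString().

```python
from typing import List, Optional

# Illustrative subset of GENERAL_PLACEHOLDER_VALUES (assumption, not the full list)
PLACEHOLDERS = {'unspecified', 'unknown', 'n/a', 'all', '-'}


def map_platform_to_target_hw(platform: object) -> Optional[str]:
    """Hypothetical: map one platforms[] value to a targetHW value, or None."""
    if not isinstance(platform, str):
        return None
    value = platform.strip().lower()
    if value in PLACEHOLDERS:
        return None  # placeholder: skip the mapping attempt
    # 64-bit checks first: "x86_64" also contains "x86"
    if "64-bit" in value or "x86_64" in value or "x64" in value:
        return "x64"
    if "32-bit" in value or "x32" in value or "x86" in value:
        return "x86"
    if value in ("arm64", "aarch64"):
        return "arm64"
    if value == "arm":
        return "arm"
    return None  # unsupported value; per the comments above, the tool warns here


def build_arch_base_strings(vendor: str, product: str, platforms: List[object]) -> List[str]:
    """One raw base string per distinct mapped targetHW value, in first-seen order."""
    strings: List[str] = []
    for item in platforms:
        target_hw = map_platform_to_target_hw(item)
        if target_hw is None:
            continue
        candidate = f"cpe:2.3:*:{vendor}:{product}:*:*:*:*:*:*:{target_hw}:*"
        if candidate not in strings:
            strings.append(candidate)
    return strings
```

For example, ["x64", "x86", "n/a", "x86_64"] produces one x64 string and one x86 string. "n/a" is skipped as a placeholder, and "x86_64" collapses into the existing x64 variant.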
Step 4: Placeholder Detection & Validation
# Uses GENERAL_PLACEHOLDER_VALUES from platform_entry_registry.py
GENERAL_PLACEHOLDER_VALUES = [
'unspecified', 'unknown', 'none', 'undefined', 'various',
'n/a', 'not available', 'not applicable', 'unavailable',
'na', 'nil', 'tbd', 'to be determined', 'pending',
'not specified', 'not determined', 'not known', 'not listed',
'not provided', 'missing', 'empty', 'null', '-',
'see references', 'see advisory', 'check', 'noted', 'all'
]
if vendor_value.lower() in [v.lower() for v in GENERAL_PLACEHOLDER_VALUES]:
    return []  # Skip CPE generation for placeholder values
For each supported platform architecture, the full CPE base string generation and validation pipeline runs:
Important: The examples below represent common enumeration patterns but do not include all possible combinations. The system generates CPE base strings for all enumeration combinations of all present affected entry properties as defined in Section 2.1. Each affected entry with different property combinations (vendor, product, packageName, platforms, cpes) will generate its unique set of CPE Base Strings based on available data.
The system generates CPE base strings organized by specificity level, from least to most specific:
Single Attribute Patterns (Least Specific):

- cpe:2.3:*:vendor:*:*:*:*:*:*:*:*:*
- cpe:2.3:*:*:*product*:*:*:*:*:*:*:*:*
- cpe:2.3:*:*:*package_name*:*:*:*:*:*:*:*:*
- cpe:2.3:*:group_id:*:*:*:*:*:*:*:*:*
- cpe:2.3:*:*:*artifact_id*:*:*:*:*:*:*:*:*

Enumeration: 1-5 patterns per entry based on available fields

Dual Attribute Patterns (Moderate Specificity):

- cpe:2.3:*:vendor:*product*:*:*:*:*:*:*:*:*
- cpe:2.3:*:vendor:*packageName*:*:*:*:*:*:*:*:*
- cpe:2.3:*:group_id:*artifact_id*:*:*:*:*:*:*:*:*
- cpe:2.3:a:alphasoft:dataprocessor:*:*:*:*:*:*:*:*
- cpe:2.3:a:alphasoft:*dataprocessor*:*:*:*:*:*:*:*:*

Enumeration: 2-5 patterns per entry based on data combinations

Triple Attribute Patterns (High Specificity):

- cpe:2.3:*:vendor:*product*:*:*:*:*:*:*:x64:*
- cpe:2.3:*:vendor:*package*:*:*:*:*:*:*:x86:*
- cpe:2.3:*:group_id:*artifact_id*:*:*:*:*:*:*:x64:*

Enumeration: (Dual patterns) × (Number of supported architectures)
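The single/dual/triple layering can be sketched as a generator that fills the 11 CPE 2.3 attributes (part through other) from whichever values are present. This is a simplified illustration under assumptions. It only handles vendor, product, and targetHW values, it wraps the product in wildcards (*product*) like the "baseQuery" format, and build_cpe and enumerate_base_strings are hypothetical names.

```python
import itertools
from typing import List, Optional

# The 11 CPE 2.3 attributes that follow the "cpe:2.3:" prefix
CPE_FIELDS = ["part", "vendor", "product", "version", "update", "edition",
              "language", "sw_edition", "target_sw", "target_hw", "other"]


def build_cpe(vendor: Optional[str] = None, product: Optional[str] = None,
              target_hw: Optional[str] = None) -> str:
    """Assemble a CPE 2.3 base string; attributes that are not supplied become '*'."""
    values = {name: "*" for name in CPE_FIELDS}
    if vendor:
        values["vendor"] = vendor
    if product:
        values["product"] = f"*{product}*"  # wildcarded product, mirroring "baseQuery"
    if target_hw:
        values["target_hw"] = target_hw
    return "cpe:2.3:" + ":".join(values[name] for name in CPE_FIELDS)


def enumerate_base_strings(vendors: List[str], products: List[str],
                           archs: List[str]) -> List[str]:
    """Single, dual, then triple attribute patterns, deduplicated in first-seen order."""
    generated: List[str] = []
    generated += [build_cpe(vendor=v) for v in vendors]              # single: vendor
    generated += [build_cpe(product=p) for p in products]            # single: product
    pairs = list(itertools.product(vendors, products))
    generated += [build_cpe(vendor=v, product=p) for v, p in pairs]  # dual
    generated += [build_cpe(vendor=v, product=p, target_hw=a)        # triple
                  for (v, p), a in itertools.product(pairs, archs)]
    return list(dict.fromkeys(generated))
```

For example, enumerate_base_strings(["apache"], ["kafka", "apache_kafka"], ["x64", "x86"]) yields 3 single, 2 dual, and 4 triple patterns, for 9 strings in total. The triple count is the dual count multiplied by the number of architectures.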
Each CPE undergoes multiple transformation stages:

- formatFor23CPE(): Transforms raw attribute strings to CPE 2.3 compliant format:
  - normalizeToASCII()
  - … * and :) BEFORE escaping to prevent malformed CPE construction
- breakoutCPEAttributes(): Parses CPE strings into a component dictionary, with validation for malformed entries
- constructSearchString(): Converts components to "baseQuery" format with wildcarded product fields (*product*) for broader NVD matching
- curateCPEAttributes(): Applies vendor/product normalization, including vendor prefix removal, version pattern cleaning, and suffix trimming

vendor Curation (curateCPEAttributes('vendor')):

- apache_software_foundation → apache
- microsoft_inc → microsoft
- apple_inc. → apple

product Curation (curateCPEAttributes('product')):

- apache_tomcat → tomcat
- notepad_software → notepad
- firefox_version → firefox
- jquery_plugin → jquery
- chrome_95.0.1 → chrome
- office_version_2019 → office
- nodejs_v18.2 → nodejs

vendorProduct Curation (curateCPEAttributes('vendorProduct')):

- "apache" + "apache/kafka" → "apache" + "kafka"
- "lunary-ai" + "lunary-ai\/lunary" → "lunary-ai" + "lunary"
- "microsoft" + "microsoft_office" → "microsoft" + "office"

platform Curation (curateCPEAttributes('platform')):

- "ios" → "iphone_os", "mac os" → "macos", etc.
- "x86_64" → "x64", "32-bit" → "x86", "aarch64" → "arm64"
- Returns (targetSW, targetHW, was_mapped), where either targetSW or targetHW can be None if not applicable

packageName Processing (Colon-Delimited):

- "org.apache:kafka" → GroupId: "org.apache"
- "org.apache:kafka" → ArtifactId: "kafka"

Each generated CPE undergoes comprehensive validation before inclusion:
NVD API Compatibility (is_nvd_api_compatible()):
- cpe:2.3: prefix
- … \: or \*) in vendor/product fields
- … \: pattern in raw CPE string (indicates malformed escaping)

Specificity Validation (validate_cpe_specificity()):
Deduplication Process (deriveCPEMatchStringList()):
| Example | Generated CPE Base String | Status |
|---|---|---|
| raw | cpe:2.3:*:raw_vendor:raw_product:*:*:*:*:*:*:*:* | ✅ Validated |
| curated | cpe:2.3:*:curated_vendor:curated_product:*:*:*:*:*:*:mapped_targetHW:* | ✅ Validated |
| culled | cpe:2.3:*:*:b:*:*:*:*:*:*:*:* | ❌ Culled: "Two characters or less 'b' in only populated attribute 'product'" |
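The validation outcomes in this table can be approximated with a small filter. The real rules live in is_nvd_api_compatible(), validate_cpe_specificity(), and deriveCPEMatchStringList(). The checks below are assumptions modeled on this section:

- the string starts with the cpe:2.3: prefix and has 11 attributes after it
- the raw string contains no escaped colon
- no attribute exceeds a 100-character limit (suggested by the nvd_api_field_too_long reason)
- a candidate is culled when its only populated attribute has two characters or fewer
- duplicates are removed in first-seen order

The function names and reason strings are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

MAX_FIELD_LENGTH = 100  # assumed limit, suggested by the "nvd_api_field_too_long" reason


def cull_reason(cpe: str) -> Optional[str]:
    """Return a (hypothetical) culling reason for a candidate, or None if it passes."""
    if not cpe.startswith("cpe:2.3:"):
        return "missing_cpe23_prefix"
    if "\\:" in cpe:
        return "malformed_escaping"  # escaped colon in the raw string
    fields = cpe.split(":")[2:]
    if len(fields) != 11:
        return "wrong_attribute_count"
    if any(len(field) > MAX_FIELD_LENGTH for field in fields):
        return "nvd_api_field_too_long"
    # Populated attributes after 'part', with surrounding search wildcards removed
    populated = [field.strip("*") for field in fields[1:] if field.strip("*")]
    if not populated:
        return "no_populated_attributes"
    if len(populated) == 1 and len(populated[0]) <= 2:
        return "single_attribute_too_short"
    return None


def partition_base_strings(candidates: List[str]) -> Tuple[List[str], List[Dict[str, str]]]:
    """Split candidates into unique valid strings and culled entries with reasons."""
    kept: List[str] = []
    culled: List[Dict[str, str]] = []
    for cpe in dict.fromkeys(candidates):  # order-preserving deduplication
        reason = cull_reason(cpe)
        if reason is None:
            kept.append(cpe)
        else:
            culled.append({"cpeString": cpe, "reason": reason})
    return kept, culled
```

With the table's rows as input, the raw and curated strings pass, and "cpe:2.3:*:*:b:*:*:*:*:*:*:*:*" lands in the culled list.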
Result Storage:

- cpeBaseStrings: Array of validated, unique CPE strings ready for NVD queries
- culledCpeBaseStrings: Array of rejected CPE strings with specific culling reasons for transparency

Example enumeration for a single affected entry:
# Input: {
# "vendor": "apache",
# "product": "apache_kafka",
# "packageName": "org.apache:kafka",
# "platforms": ["x64", "x86"],
# "cpes": ["cpe:2.3:a:apache:kafka:2.8.0:*:*:*:*:*:*:*"]
# # Note: collectionURL and repo fields are not processed for CPE generation
# }
# SINGLE ATTRIBUTE PATTERNS (5 generated, 4 unique):
[
    "cpe:2.3:*:apache:*:*:*:*:*:*:*:*:*",            # vendor-only
    "cpe:2.3:*:*:*kafka*:*:*:*:*:*:*:*:*",           # product-only (curated)
    "cpe:2.3:*:*:*apache_kafka*:*:*:*:*:*:*:*:*",    # product-only (raw)
    "cpe:2.3:*:org.apache:*:*:*:*:*:*:*:*:*",        # packageName GroupId-only
    "cpe:2.3:*:*:*kafka*:*:*:*:*:*:*:*:*"            # packageName ArtifactId-only (duplicate of product-only curated)
]
# DUAL ATTRIBUTE PATTERNS (6 generated, 5 unique):
[
    "cpe:2.3:*:apache:*apache_kafka*:*:*:*:*:*:*:*:*",   # vendor + product (raw)
    "cpe:2.3:*:apache:*kafka*:*:*:*:*:*:*:*:*",          # vendor + product (curated)
    "cpe:2.3:*:apache:*kafka*:*:*:*:*:*:*:*:*",          # vendor + packageName (duplicate of vendor + product curated)
    "cpe:2.3:*:org.apache:*kafka*:*:*:*:*:*:*:*:*",      # packageName GroupId + ArtifactId
    # CPE Array Processing (from cpes: ["cpe:2.3:a:apache:kafka:2.8.0:*:*:*:*:*:*:*"]):
    "cpe:2.3:a:apache:kafka:*:*:*:*:*:*:*:*",            # Exact CPE (version wildcarded)
    "cpe:2.3:a:apache:*kafka*:*:*:*:*:*:*:*:*"           # Wildcarded product search
]
# TRIPLE ATTRIBUTE PATTERNS (12 generated = 6 dual × 2 architectures, 10 unique):
# x64 Architecture Variants:
[
    "cpe:2.3:*:apache:*apache_kafka*:*:*:*:*:*:*:x64:*",   # vendor + product (raw) + x64
    "cpe:2.3:*:apache:*kafka*:*:*:*:*:*:*:x64:*",          # vendor + product (curated) + x64
    "cpe:2.3:*:apache:*kafka*:*:*:*:*:*:*:x64:*",          # vendor + packageName + x64 (duplicate)
    "cpe:2.3:*:org.apache:*kafka*:*:*:*:*:*:*:x64:*",      # packageName GroupId + ArtifactId + x64
    "cpe:2.3:a:apache:kafka:*:*:*:*:*:*:x64:*",            # Exact CPE + x64
    "cpe:2.3:a:apache:*kafka*:*:*:*:*:*:*:x64:*"           # Wildcarded product search + x64
]
# x86 Architecture Variants:
[
    "cpe:2.3:*:apache:*apache_kafka*:*:*:*:*:*:*:x86:*",   # vendor + product (raw) + x86
    "cpe:2.3:*:apache:*kafka*:*:*:*:*:*:*:x86:*",          # vendor + product (curated) + x86
    "cpe:2.3:*:apache:*kafka*:*:*:*:*:*:*:x86:*",          # vendor + packageName + x86 (duplicate)
    "cpe:2.3:*:org.apache:*kafka*:*:*:*:*:*:*:x86:*",      # packageName GroupId + ArtifactId + x86
    "cpe:2.3:a:apache:kafka:*:*:*:*:*:*:x86:*",            # Exact CPE + x86
    "cpe:2.3:a:apache:*kafka*:*:*:*:*:*:*:x86:*"           # Wildcarded product search + x86
]
# TOTAL: 23 generated, 19 unique CPE base strings after deduplication
# Note: packageName processing handles colon-delimited formats (e.g., Maven groupId:artifactId)
# by splitting them into GroupId and ArtifactId components for broader coverage
Location: src/analysis_tool/core/processData.py → bulkQueryandProcessNVDCPEs()
API Endpoint: gatherData.gatherNVDCPEData(apiKey, "cpeMatchString", query_string)
Query Parameters:

- cpeMatchString: CPE base string to search
- resultsPerPage: 2000 (maximum)

API Response Structure:
{
"resultsPerPage": 2000,
"startIndex": 0,
"totalResults": 156,
"format": "NVD_CPE",
"version": "2.0",
"timestamp": "2024-11-10T20:30:45.123",
"products": [
{
"cpe": {
"cpeName": "cpe:2.3:a:example_vendor:example_product:2.1.4:*:*:*:*:*:*:*",
"deprecated": false,
"created": "2024-01-15T15:20:10.000",
"lastModified": "2024-01-15T15:20:10.000"
}
}
]
}
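For reference, the public NVD CPE API 2.0 endpoint is https://services.nvd.nist.gov/rest/json/cpes/2.0. It accepts cpeMatchString, resultsPerPage, and startIndex as query parameters, and expects the key in an apiKey request header. The standard-library sketch below builds a paged request and shows how totalResults drives pagination. It performs no network call, and build_cpe_request and page_start_indices are hypothetical helpers, not gatherNVDCPEData().

```python
import urllib.parse
import urllib.request
from typing import List

NVD_CPE_ENDPOINT = "https://services.nvd.nist.gov/rest/json/cpes/2.0"


def build_cpe_request(api_key: str, match_string: str, start_index: int = 0,
                      results_per_page: int = 2000) -> urllib.request.Request:
    """Build (but do not send) one page request for a CPE base string."""
    query = urllib.parse.urlencode({
        "cpeMatchString": match_string,
        "resultsPerPage": results_per_page,
        "startIndex": start_index,
    })
    # NVD API 2.0 reads the key from the "apiKey" request header
    return urllib.request.Request(f"{NVD_CPE_ENDPOINT}?{query}",
                                  headers={"apiKey": api_key})


def page_start_indices(total_results: int, results_per_page: int = 2000) -> List[int]:
    """startIndex values needed to retrieve every result reported by totalResults."""
    return list(range(0, total_results, results_per_page))
```

With the response shown above (totalResults 156), one request at startIndex 0 is enough. A base string matching 4,500 products would need startIndex values 0, 2000, and 4000.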
Location: analyzeBaseStrings() in src/analysis_tool/core/processData.py
This stage performs comprehensive data extraction, consolidation, and mapping between NVD 2.0 /cpes/ API responses and affected entry data from CVE List V5 records.
API Response Structure Processing:
# Input: NVD API /cpes/ response
{
"totalResults": 156,
"resultsPerPage": 2000,
"products": [
{
"cpe": {
"cpeName": "cpe:2.3:a:apache:kafka:2.8.0:*:*:*:*:*:*:*",
"deprecated": false,
"created": "2024-01-15T15:20:10.000",
"lastModified": "2024-01-15T15:20:10.000",
"refs": [
{"ref": "https://kafka.apache.org", "type": "Vendor"}
]
}
}
]
}
# Processing: Extract and normalize CPE components
for product in json_response["products"]:
    cpe_name = product["cpe"]["cpeName"]
    cpe_attributes = breakoutCPEAttributes(cpe_name)
    base_cpe_name = constructSearchString(cpe_attributes, "base")
Base String Aggregation:
cpe:2.3:a:apache:kafka:2.8.0:*:*:*:*:*:*:* → cpe:2.3:a:apache:kafka:*:*:*:*:*:*:*:*

Statistical Accumulation:
from collections import defaultdict

base_strings = defaultdict(lambda: {
"depTrueCount": 0, # Count of deprecated CPE entries for this base
"depFalseCount": 0, # Count of active CPE entries for this base
"versionsFound": 0, # Total version matches found
"versionsFoundContent": [], # Detailed version match objects
"references": [] # Aggregated reference data with frequency tracking
})
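The extraction loop and the accumulator can be combined into a self-contained sketch of the per-base-string counting. The sketch makes these simplifying assumptions:

- version_wildcarded() is a split-based stand-in for breakoutCPEAttributes() followed by constructSearchString(..., "base")
- only the version attribute is wildcarded
- escaped colons are ignored

The function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict


def version_wildcarded(cpe_name: str) -> str:
    """Stand-in for the 'base' search string: replace the version attribute with '*'."""
    parts = cpe_name.split(":")  # assumes no escaped colons in the name
    parts[5] = "*"  # index 5 = version (cpe, 2.3, part, vendor, product, version, ...)
    return ":".join(parts)


def aggregate_products(json_response: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Group CPE products by version-wildcarded base string and count deprecation."""
    base_strings: Dict[str, Dict[str, Any]] = defaultdict(lambda: {
        "depTrueCount": 0, "depFalseCount": 0,
        "versionsFound": 0, "versionsFoundContent": [], "references": [],
    })
    for item in json_response.get("products", []):
        cpe = item["cpe"]
        stats = base_strings[version_wildcarded(cpe["cpeName"])]
        if cpe.get("deprecated", False):
            stats["depTrueCount"] += 1
        else:
            stats["depFalseCount"] += 1
    return dict(base_strings)
```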
Input Mapping: Each affected entry contains cpeVersionChecks derived from version constraints. Version checks are performed against ALL CPE Names (both active and deprecated):
# Example from affected entry processing
cpeVersionChecks = [
{"version": "2.8.0"},
{"lessThan": "3.0.0"},
{"lessThanOrEqual": "2.8.5"}
]
Reference Extraction: Only from non-deprecated CPE products to avoid outdated provenance data
if not product["cpe"]["deprecated"] and 'refs' in product['cpe']:
    for ref in product['cpe']['refs']:
        ref_url = ref.get('ref', '')
        ref_type = ref.get('type', 'Unknown')
        # Frequency tracking for duplicate references
        existing_ref = find_existing_reference(ref_url, ref_type)
        if existing_ref:
            existing_ref['frequency'] += 1
        else:
            add_new_reference(ref_url, ref_type, frequency=1)
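find_existing_reference() and add_new_reference() appear above as pseudocode helpers. The same frequency count can be kept in a dictionary keyed by (url, type). The sketch below does that and emits the url/type/frequency shape used in the consolidated results later in this stage. collect_references is a hypothetical name.

```python
from typing import Any, Dict, List, Tuple


def collect_references(products: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Count references from non-deprecated CPE products, keyed by (url, type)."""
    counts: Dict[Tuple[str, str], int] = {}
    for item in products:
        cpe = item["cpe"]
        if cpe.get("deprecated", False):
            continue  # deprecated products do not contribute provenance data
        for ref in cpe.get("refs", []):
            key = (ref.get("ref", ""), ref.get("type", "Unknown"))
            counts[key] = counts.get(key, 0) + 1
    # dicts preserve insertion order, so references appear in first-seen order
    return [{"url": url, "type": ref_type, "frequency": n}
            for (url, ref_type), n in counts.items()]
```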
Location: bulkQueryandProcessNVDCPEs() in src/analysis_tool/core/processData.py
Purpose: Track which affected entries contain relevant CPE Base String query results.
Mapping Structure Example:
# Example mapping between CPE query strings and affected entry indices
row_query_mapping = {
"cpe:2.3:*:apache:*kafka*:*:*:*:*:*:*:*:*": [0, 2, 5], # 3 entries interested
"cpe:2.3:*:apache:*:*:*:*:*:*:*:*:*": [0, 1, 2, 5], # 4 entries interested
"cpe:2.3:*:*:*kafka*:*:*:*:*:*:*:*:*": [0, 2], # 2 entries interested
"cpe:2.3:*:microsoft:*office*:*:*:*:*:*:*:*:*": [1, 3, 4], # 3 entries interested
"cpe:2.3:*:org.apache:*kafka*:*:*:*:*:*:*:*:*": [2] # 1 entry interested
}
# Benefits:
# - Single API call per unique CPE Base String
# - Entry-specific version matching against same NVD 2.0 /cpes/ API content
Processing Flow: For each unique CPE base string, the system queries the NVD API once and applies entry-specific version constraints to generate tailored results for each interested affected entry.
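row_query_mapping is essentially an inverted index from each unique base string to the affected entries that generated it. The minimal sketch below assumes each row's generated strings are available as a list, as in platformEntryMetadata['cpeBaseStrings']. query_fn is a hypothetical stand-in for the NVD call. The sketch covers only the fan-out step, so entry-specific version filtering would run afterwards and is not shown.

```python
from typing import Any, Callable, Dict, List


def build_row_query_mapping(rows: Dict[int, List[str]]) -> Dict[str, List[int]]:
    """Map each unique CPE base string to the sorted row indices that generated it."""
    mapping: Dict[str, List[int]] = {}
    for row_index, base_strings in rows.items():
        for base in dict.fromkeys(base_strings):  # ignore repeats within one row
            mapping.setdefault(base, []).append(row_index)
    return {base: sorted(indices) for base, indices in mapping.items()}


def query_once_per_string(mapping: Dict[str, List[int]],
                          query_fn: Callable[[str], Any]) -> Dict[int, Dict[str, Any]]:
    """Call query_fn once per unique string and fan the result out to each row."""
    per_row: Dict[int, Dict[str, Any]] = {}
    for base, indices in mapping.items():
        result = query_fn(base)  # single API call per unique base string
        for row_index in indices:
            per_row.setdefault(row_index, {})[base] = result
    return per_row
```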
Notes:
The query_analysis_results structure shown below represents the internal processing format, with base_strings as a container object. It will be transformed into the final NVD-ish cpeDeterminationMetadata array format, where each CPE base string becomes a separate object with cpeBaseString as a property.
This metadata provides transparency into the analysis process, data quality metrics and contextually relevant data for each CPE determination. See II.B. CPE Determination Metadata (NVD /cpes/ API Query Results).
Consolidated Response Structure:
# Per-query statistics from analyzeBaseStrings() for single CPE base string query
query_analysis_results = {
"base_strings": {
# Each base string represents version-wildcarded grouping of CPE products from the query
"cpe:2.3:a:apache:kafka:*:*:*:*:*:*:*:*": {
"depTrueCount": 5, # Deprecated CPE products for this version-wildcarded base
"depFalseCount": 142, # Active CPE products for this version-wildcarded base
"versionsFound": 2, # Count of version matches for this base
"versionsFoundContent": [
{"version": "cpe:2.3:a:apache:kafka:2.8.0:*:*:*:*:*:*:*"},
{"lessThan": "cpe:2.3:a:apache:kafka:3.0.0:*:*:*:*:*:*:*"}
],
"references": [
{"url": "https://kafka.apache.org", "type": "Vendor", "frequency": 3},
{"url": "https://github.com/apache/kafka", "type": "Advisory", "frequency": 1}
]
},
"cpe:2.3:a:apache:kafka_client:*:*:*:*:*:*:*:*": {
"depTrueCount": 2, # Different product variant found in same query
"depFalseCount": 28, # Active CPE products for this variant
"versionsFound": 0, # No version matches for this variant
"versionsFoundContent": [],
"references": []
}
}
}
# Row-specific results mapping query results to affected entries
row_specific_results = {
0: query_analysis_results, # Results for affected entry 0 from this query
1: query_analysis_results, # Results for affected entry 1 from this query
# ... additional entry mappings for entries interested in this CPE base string
}
Location: src/analysis_tool/core/processData.py → reduceToTop10()
Function: consolidateBaseStrings()
Purpose: Transform per-query analysis results from Stage 3 into consolidated CPE determination metadata for each affected entry (dataframe row), preparing data for ranking and final selection.
Processing Scope: Each affected entry is processed individually to consolidate its unique set of CPE query results.
Input Data Sources:
# For each affected entry row:
row_data = {
'sortedCPEsQueryData': {
# Multiple query results from Stage 3
"cpe:2.3:*:apache:*kafka*:*:*:*:*:*:*:*:*": {
"base_strings": {...}, # analyzeBaseStrings() output
"total_deprecated": 7,
"total_active": 170
},
"cpe:2.3:*:*:*kafka*:*:*:*:*:*:*:*:*": {
"base_strings": {...},
"total_deprecated": 12,
"total_active": 89
}
# ... additional query results for this affected entry
},
'platformEntryMetadata': {
'cpeVersionChecks': [...], # Version constraints from affected entry
'cpeSourceTypes': [...], # Source type tracking
'cpeBaseStrings': [...] # Original generated CPE base strings
}
}
Consolidation Process:
Cross-Query Deduplication
Confidence Measurement through Search Source Diversity: Count how many different CPE base string search queries returned the same result (searchCount) as a confidence indicator. When multiple independent search strategies discover the same CPE base string, it increases confidence the determination is relevant.
# Example: Same base string discovered through multiple independent searches
# Query 1 (vendor+product): "cpe:2.3:*:apache:*kafka*:*:*:*:*:*:*:*:*"
# Query 2 (product-only): "cpe:2.3:*:*:*kafka*:*:*:*:*:*:*:*:*"
# Query 3 (packageName-only): "cpe:2.3:*:*:*kafka*:*:*:*:*:*:*:*:*"
# All discover: "cpe:2.3:a:apache:kafka:*:*:*:*:*:*:*:*"
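A rough sketch of the cross-query merge follows. Each base string keeps the statistics from the first query that discovered it, searchCount counts the distinct queries that returned it, and the generating source is recorded under a searchSource&lt;category&gt; key. The input shape, a list of (source_category, query_string, base_strings) tuples, and the function name are assumptions for illustration.

```python
import copy
from typing import Any, Dict, List, Tuple

# (source category, query string, analyzeBaseStrings()-style base_strings dict)
QueryResult = Tuple[str, str, Dict[str, Dict[str, Any]]]


def consolidate_base_strings(results: List[QueryResult]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical merge of per-query results into unique base strings."""
    unique: Dict[str, Dict[str, Any]] = {}
    for source, query, base_strings in results:
        for base, stats in base_strings.items():
            entry = unique.get(base)
            if entry is None:
                # First discovery: preserve this query's statistics
                entry = copy.deepcopy(stats)
                entry["searchCount"] = 0
                entry["_queries"] = set()
                unique[base] = entry
            if query not in entry["_queries"]:
                entry["_queries"].add(query)
                entry["searchCount"] += 1
            entry.setdefault(f"searchSource{source}", query)
    for entry in unique.values():
        del entry["_queries"]  # internal bookkeeping only
    return unique
```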
Search Source Value Tracking
Confidence Measurement through Search Source Specificity: Identify the searchSource categories that returned analysis results. When results are available from a search source that is considered more valuable (e.g., cpes array data vs raw vendor extraction), it increases confidence the determination is relevant.
Search Source Hierarchy Examples (ranked by value/specificity):

1. searchSourcecveAffectedCPEsArray → (explicit CVE cpes arrays)
2. searchSourcevendorproduct → (vendor + product combinations)
3. searchSourceproduct → (product-only searches)
4. searchSourcevendor → (vendor-only searches)

# These fields drive composite_priority ranking: primary_priority + secondary_priority
# Higher value sources receive better ranking positions in final top 10 selection
Version Match Validation and Deduplication
Function: compare_versions()
Purpose: Validate and refine the existing versionsFoundContent data populated during Stage 3 (analyzeBaseStrings()). This consolidation step keeps version matches consistent across merged base strings and deduplicates them. Version matches between the affected entry content and the CPE Names increase confidence that the determination is relevant.
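The sketch below is one way to picture this step; the actual compare_versions() logic is not reproduced here. It does three things:

- merges versionsFoundContent lists gathered from several queries
- keeps only entries whose CPE Name version equals a cpeVersionChecks value of the same constraint type
- drops duplicates

The normalization rule is an assumption: lowercase, strip a leading "v" before a digit, and ignore trailing ".0" segments. The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def normalize_version(version: str) -> str:
    """Assumed normalization: lowercase, drop 'v' before a digit, trim trailing '.0' parts."""
    v = version.strip().lower()
    if len(v) > 1 and v[0] == "v" and v[1].isdigit():
        v = v[1:]
    parts = v.split(".")
    while len(parts) > 1 and parts[-1] == "0":
        parts.pop()
    return ".".join(parts)


def merge_version_matches(version_checks: List[Dict[str, str]],
                          *match_lists: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge {constraint: cpeName} matches, keeping valid, non-duplicate entries."""
    allowed: Set[Tuple[str, str]] = {
        (constraint, normalize_version(value))
        for check in version_checks for constraint, value in check.items()
    }
    seen: Set[Tuple[str, str]] = set()
    merged: List[Dict[str, str]] = []
    for matches in match_lists:
        for match in matches:
            for constraint, cpe_name in match.items():
                cpe_version = cpe_name.split(":")[5]  # version attribute
                key = (constraint, cpe_name)
                if key in seen or (constraint, normalize_version(cpe_version)) not in allowed:
                    continue
                seen.add(key)
                merged.append({constraint: cpe_name})
    return merged
```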
Metadata Consolidation:

- depTrueCount and depFalseCount are preserved from the original query that first discovered each base string
- versionsFoundContent arrays are carried forward for subsequent version comparison processing
- searchCount indicates how many different generation methods found the same base string

Consolidated Structure Example:
# Result for single affected entry after consolidation
unique_base_strings = {
"cpe:2.3:a:apache:kafka:*:*:*:*:*:*:*:*": {
# Preserved from original analyzeBaseStrings() output
"depTrueCount": 5,
"depFalseCount": 142,
"versionsFound": 2,
"versionsFoundContent": [...],
"references": [...],
# Added during consolidation
"searchCount": 3, # Found through 3 different generation methods
"searchSourcevendor": "cpe:2.3:*:apache:*:*:*:*:*:*:*:*:*",
"searchSourcevendorproduct": "cpe:2.3:*:apache:*kafka*:*:*:*:*:*:*:*:*",
"searchSourcecveAffectedCPEsArray": "cpe:2.3:a:apache:kafka:2.8.0:*:*:*:*:*:*:*"
},
"cpe:2.3:a:apache:kafka_client:*:*:*:*:*:*:*:*": {
"depTrueCount": 2,
"depFalseCount": 28,
"versionsFound": 0,
"versionsFoundContent": [],
"references": [],
"searchCount": 1,
"searchSourceproduct": "cpe:2.3:*:*:*kafka*:*:*:*:*:*:*:*:*"
}
# ... additional base strings for this affected entry
}
Function: sort_base_strings()
Purpose: Sort consolidated CPE base strings to identify the Top 10 most relevant determinations for each affected entry.
Sort Key Calculation: Each CPE base string gets a 7-tuple score. Lower values rank higher.
# Step 1: Calculate source type priority (0-13)
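has_cpes_array = any(key.startswith('searchSourcecveAffectedCPEsArray') for key in attributes)
has_vendor_product = any(key.startswith('searchSourcevendorproduct') for key in attributes)
has_product = any(key.startswith('searchSourceproduct') for key in attributes)
has_vendor = any(key.startswith('searchSourcevendor') for key in attributes)  # also true for vendorproduct keys; the elif order below handles this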
primary = 0 if has_cpes_array else 10 # CVE cpes = 0, Generated = 10
if has_vendor_product: secondary = 0
elif has_product: secondary = 1
elif has_vendor: secondary = 2
else: secondary = 3
composite_priority = primary + secondary # Final: 0-13
# Step 2: Calculate deprecation metrics
dep_true_count = attributes.get('depTrueCount', 0)
dep_false_count = attributes.get('depFalseCount', 0)
total_results = dep_true_count + dep_false_count
all_deprecated = (dep_false_count == 0 and dep_true_count > 0)
deprecation_ratio = dep_true_count / total_results if total_results > 0 else 1.0
# Step 3: Create final sort key
sort_key = (
composite_priority, # 1. Source priority (0=CVE cpes+vendor+product, 13=Generated+other)
all_deprecated, # 2. Has active CPEs? (False=good, True=bad)
deprecation_ratio, # 3. Deprecated ratio (0.0=all active, 1.0=all deprecated)
-dep_false_count, # 4. Active CPE count (more is better)
-attributes.get('searchCount', 0), # 5. Search source diversity (more is better)
-attributes.get('versionsFound', 0), # 6. Version matches (more is better)
-total_results # 7. Total CPE volume (more is better)
)
Output: Top 10 entries from sorted results become the final CPE determinations.
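The sort key above can be run end to end with a small wrapper. This sketch iterates over the attribute keys in the any() calls and emits entries shaped like the top10SuggestedCPEBaseStrings items in the final record. sort_key and top10 are hypothetical names.

```python
from typing import Any, Dict, List, Tuple


def sort_key(attributes: Dict[str, Any]) -> Tuple:
    """7-tuple ranking key; lower sorts first."""
    keys = list(attributes)
    has_cpes_array = any(k.startswith('searchSourcecveAffectedCPEsArray') for k in keys)
    has_vendor_product = any(k.startswith('searchSourcevendorproduct') for k in keys)
    has_product = any(k.startswith('searchSourceproduct') for k in keys)
    has_vendor = any(k.startswith('searchSourcevendor') for k in keys)

    primary = 0 if has_cpes_array else 10
    if has_vendor_product:
        secondary = 0
    elif has_product:
        secondary = 1
    elif has_vendor:
        secondary = 2
    else:
        secondary = 3

    dep_true = attributes.get('depTrueCount', 0)
    dep_false = attributes.get('depFalseCount', 0)
    total = dep_true + dep_false
    return (
        primary + secondary,                       # 1. source priority
        dep_false == 0 and dep_true > 0,           # 2. all deprecated?
        dep_true / total if total > 0 else 1.0,    # 3. deprecation ratio
        -dep_false,                                # 4. active CPE count
        -attributes.get('searchCount', 0),         # 5. search source diversity
        -attributes.get('versionsFound', 0),       # 6. version matches
        -total,                                    # 7. total CPE volume
    )


def top10(unique_base_strings: Dict[str, Dict[str, Any]]) -> List[Dict[str, str]]:
    """Rank base strings and keep the first ten, with string ranks starting at "1"."""
    ranked = sorted(unique_base_strings.items(), key=lambda item: sort_key(item[1]))
    return [{"cpeBaseString": base, "rank": str(i)}
            for i, (base, _) in enumerate(ranked[:10], start=1)]
```

Python's sorted() is stable, so base strings with identical keys keep their consolidation order.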
Location: src/analysis_tool/core/processData.py
Purpose: Detect and validate authoritative, human-verified CPE mappings for affected entries using external mapping files and alias matching logic.
This stage operates as an independent feature within the --cpe-determination processing pipeline, providing confirmed CPE base strings that represent verified mappings for specific vendor/product combinations.
Functions: find_confirmed_mappings(), extract_confirmed_mappings_for_affected_entry(), process_confirmed_mappings()
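The matching approach this stage describes can be sketched compactly:

- an alias matches when its vendor and product equal the affected entry's values case-insensitively
- placeholder values never match
- an alias platform, if present, must appear among the entry's non-placeholder platforms
- when several confirmed base strings match, only the most specific are kept

Here "more specific" is assumed to mean keeping every populated attribute of the other string and populating at least one more. The alias and entry shapes, the helper names, and the placeholder subset are assumptions. The real check_alias_match(), is_more_specific_than(), and filter_most_specific_cpes() may differ.

```python
from typing import Any, Dict, List

# Illustrative subset of GENERAL_PLACEHOLDER_VALUES
PLACEHOLDERS = {"unspecified", "unknown", "n/a", "all", "-"}


def _norm(value: Any) -> str:
    return value.strip().lower() if isinstance(value, str) else ""


def alias_matches(alias: Dict[str, Any], entry: Dict[str, Any]) -> bool:
    """Hypothetical alias check: exact, case-insensitive vendor/product (+ optional platform)."""
    for field in ("vendor", "product"):
        wanted, actual = _norm(alias.get(field)), _norm(entry.get(field))
        if not wanted or wanted in PLACEHOLDERS or wanted != actual:
            return False
    platform = _norm(alias.get("platform"))
    if platform:
        platforms = {_norm(p) for p in entry.get("platforms") or []}
        if platform not in platforms - PLACEHOLDERS:
            return False
    return True


def _populated(cpe: str) -> Dict[int, str]:
    """Positions of non-wildcard attributes after the 'cpe:2.3:' prefix."""
    return {i: f for i, f in enumerate(cpe.split(":")[2:]) if f != "*"}


def is_more_specific_than(a: str, b: str) -> bool:
    """Assumed rule: a keeps every populated attribute of b and populates more."""
    pa, pb = _populated(a), _populated(b)
    return len(pa) > len(pb) and all(pa.get(i) == v for i, v in pb.items())


def filter_most_specific(cpes: List[str]) -> List[str]:
    """Drop any base string that another candidate strictly refines."""
    unique = list(dict.fromkeys(cpes))
    return [c for c in unique if not any(is_more_specific_than(o, c) for o in unique)]
```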
Function: load_mapping_file(source_id)
- cnaId field
- confirmedMappings array

Function: extract_confirmed_mappings_for_affected_entry() in src/analysis_tool/core/processData.py
Extract alias data from CVE List V5 affected entry fields into raw_platform_data structure for matching:
- vendor, product, platforms (used in alias matching)
- modules, packageName, repo, programRoutines, programFiles, collectionURL (extracted but not used in current matching logic)

Function: check_alias_match(alias, raw_platform_data) in src/analysis_tool/core/processData.py
Matching Logic: The confirmed mappings system uses an exact-match approach with comprehensive placeholder filtering:
- vendor and product for exact matches (case-insensitive, normalized)
- Any value in GENERAL_PLACEHOLDER_VALUES is excluded from matching before comparison
- platform (singular) field against platforms array, filtering out placeholder platforms

Function: filter_most_specific_cpes(confirmed_cpe_bases)
- cpeBaseString values where aliases matched
- is_more_specific_than() to compare CPE components

Location: src/analysis_tool/storage/nvd_ish_collector.py
Purpose: Convert all CPE determination processing results (Top 10, confirmed mappings, searched/culled strings) into the final NVD-ish record structure.
The collector integrates multiple data sources into the final cpeDetermination structure:
```json
{
  "cpeDetermination": {
    "sourceId": "Hashmire/Analysis_Tools v0.2.0",
    "cvelistv5AffectedEntryIndex": "cve.containers.cna.affected.[0]",
    "top10SuggestedCPEBaseStrings": [
      {
        "cpeBaseString": "cpe:2.3:a:example_vendor:example_product:*:*:*:*:*:*:*:*",
        "rank": "1"
      },
      {
        "cpeBaseString": "cpe:2.3:a:example_vendor:*:*:*:*:*:*:*:*:*",
        "rank": "2"
      }
    ],
    "confirmedMappings": [
      "cpe:2.3:a:example_vendor:example_product:*:*:*:*:*:*:*:*",
      "cpe:2.3:a:example_vendor:*:*:*:*:*:*:*:*:*"
    ],
    "cpeMatchStringsSearched": [
      "cpe:2.3:a:example_vendor:example_product:*:*:*:*:*:*:*:*",
      "cpe:2.3:a:example_vendor:*:*:*:*:*:*:*:*:*",
      "cpe:2.3:*:example_vendor:example_product:*:*:*:*:*:*:*:*",
      "cpe:2.3:a:*:example_product:*:*:*:*:*:*:*:*"
    ],
    "cpeMatchStringsCulled": [
      {
        "cpeString": "cpe:2.3:*:*:*:*:*:*:*:*:*:*:*",
        "reason": "insufficient_specificity_vendor_product_required"
      },
      {
        "cpeString": "cpe:2.3:*:extremely_long_vendor_name_that_exceeds_one_hundred_characters:*:*:*:*:*:*:*:*:*",
        "reason": "nvd_api_field_too_long"
      }
    ]
  }
}
```