The generate_dataset.py utility generates CVE datasets from the NVD API and includes an integrated analysis workflow.
Query CVEs by lastModified date:
# CVEs modified in the last N days (max 120)
python generate_dataset.py --last-days 30
# CVEs modified within date range
python generate_dataset.py --start-date 2024-01-01 --end-date 2024-01-31
# CVEs modified since last run
python generate_dataset.py --since-last-run
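The 120-day cap matches the NVD CVE API 2.0 rule that lastModStartDate and lastModEndDate must be supplied together and may span at most 120 consecutive days. The following is a minimal sketch of how a --last-days value could be mapped onto those two query parameters. The helper name last_modified_params is hypothetical and is not taken from generate_dataset.py.

```python
from datetime import datetime, timedelta, timezone

MAX_RANGE_DAYS = 120  # NVD API 2.0 limit for a lastModified date range


def last_modified_params(last_days, now=None):
    """Build NVD lastModified query parameters (hypothetical helper)."""
    if not 1 <= last_days <= MAX_RANGE_DAYS:
        raise ValueError(f"last_days must be between 1 and {MAX_RANGE_DAYS}")
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=last_days)
    # ISO-8601 timestamps; an HTTP client must URL-encode the "+" in the offset
    return {
        "lastModStartDate": start.isoformat(timespec="milliseconds"),
        "lastModEndDate": end.isoformat(timespec="milliseconds"),
    }


params = last_modified_params(30, now=datetime(2024, 1, 31, tzinfo=timezone.utc))
print(params)
```

Passing a fixed now value keeps the result reproducible; in normal use the argument would be omitted so the window ends at the current time.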
# Default: CVEs with statuses "Received", "Awaiting Analysis", "Undergoing Analysis"
python generate_dataset.py
# Custom statuses
python generate_dataset.py --statuses "Received" "Modified"
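How the tool applies the status filter is not described here. One plausible approach, sketched below, is to filter client-side on the vulnStatus field that each CVE record carries in an NVD API 2.0 response. The function name filter_by_status and the placeholder CVE IDs are made up for illustration.

```python
DEFAULT_STATUSES = ("Received", "Awaiting Analysis", "Undergoing Analysis")


def filter_by_status(vulnerabilities, statuses=DEFAULT_STATUSES):
    """Return the IDs of CVE records whose vulnStatus is in `statuses`."""
    wanted = set(statuses)
    return [
        item["cve"]["id"]
        for item in vulnerabilities
        if item.get("cve", {}).get("vulnStatus") in wanted
    ]


# Hand-written sample shaped like the "vulnerabilities" array of an NVD
# response; the IDs are placeholders, not real CVEs.
sample = [
    {"cve": {"id": "CVE-0000-0001", "vulnStatus": "Received"}},
    {"cve": {"id": "CVE-0000-0002", "vulnStatus": "Analyzed"}},
    {"cve": {"id": "CVE-0000-0003", "vulnStatus": "Modified"}},
]

print(filter_by_status(sample))                          # default statuses
print(filter_by_status(sample, ["Received", "Modified"]))  # custom statuses
```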
Analysis runs automatically unless disabled:
# Generate dataset and run analysis (default)
python generate_dataset.py --last-days 7
# Generate dataset only
python generate_dataset.py --last-days 7 --no-analysis
# Pass options to analysis tool
python generate_dataset.py --last-days 7 --external-assets
# Show when last dataset generation occurred
python generate_dataset.py --show-last-run
Each dataset generation creates a timestamped run directory:
runs/[timestamp]_[context]/
├── logs/
│   ├── dataset_tracker.json    # Run tracking metadata
│   ├── cve_dataset.txt         # Generated dataset
│   ├── workflow_log.json       # Real-time dashboard data
│   └── dashboard_data.json     # Additional monitoring data
└── generated_pages/            # HTML reports (if analysis enabled)
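Because every run gets its own directory, scripts that consume the dataset need a way to find the newest one. The sketch below picks the most recently modified directory under runs/. It relies on modification time because the exact timestamp format in the directory name is not specified here, and latest_run_dir is a hypothetical helper.

```python
from pathlib import Path


def latest_run_dir(runs_root="runs"):
    """Return the most recently modified run directory, or None if there is none."""
    root = Path(runs_root)
    if not root.is_dir():
        return None
    run_dirs = [p for p in root.iterdir() if p.is_dir()]
    return max(run_dirs, key=lambda p: p.stat().st_mtime, default=None)


run = latest_run_dir()
if run is None:
    print("No runs found")
else:
    # File location follows the run directory layout described above
    print("Dataset:", run / "logs" / "cve_dataset.txt")
```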
During dataset generation, open dashboards/generateDatasetDashboard.html in a browser. The dashboard data is written under the runs/[timestamp]_[context]/ structure.
# Generate initial dataset for recent CVEs
python generate_dataset.py --last-days 30
# Daily differential updates
python generate_dataset.py --since-last-run
# Analyze specific time period
python generate_dataset.py --start-date 2024-06-01 --end-date 2024-06-15
# Dataset only (no analysis)
python generate_dataset.py --since-last-run --no-analysis
The system creates dataset_tracker.json in each run’s logs/ directory:
{
  "last_full_pull": "2024-07-02T10:30:00.000000",
  "run_history": [
    {
      "run_id": "status_based_20240702_103000",
      "run_type": "status_based",
      "timestamp": "2024-07-02T10:30:00.000000",
      "cve_count": 150,
      "output_file": "runs/[timestamp]/logs/cve_dataset.txt"
    }
  ]
}
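A tracker file in this shape can be read with the standard library alone. The sketch below parses the sample shown above and reports the last pull time and the total CVE count across recorded runs. The function summarize_tracker is a hypothetical helper, not part of the tool.

```python
import json
from datetime import datetime

# Sample mirroring the tracker layout shown above
tracker_text = """
{
  "last_full_pull": "2024-07-02T10:30:00.000000",
  "run_history": [
    {
      "run_id": "status_based_20240702_103000",
      "run_type": "status_based",
      "timestamp": "2024-07-02T10:30:00.000000",
      "cve_count": 150,
      "output_file": "runs/[timestamp]/logs/cve_dataset.txt"
    }
  ]
}
"""


def summarize_tracker(tracker):
    """Return (last pull time, total CVEs across recorded runs)."""
    last_pull = datetime.fromisoformat(tracker["last_full_pull"])
    total = sum(run["cve_count"] for run in tracker.get("run_history", []))
    return last_pull, total


last_pull, total = summarize_tracker(json.loads(tracker_text))
print(last_pull.date(), total)
```

To use a real file, replace json.loads(tracker_text) with json.load on an opened dataset_tracker.json from a run's logs/ directory.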
Entry Point: Run python generate_dataset.py from the project root.