The generate_dataset.py utility adds date-based queries, differential updates, run tracking, and optional analysis integration while remaining fully backward compatible.
Query CVEs based on their lastModified date using NVD API parameters:
--last-days N - CVEs modified in the last N days (max 120)
--start-date and --end-date - CVEs modified within a specific date range
Examples:
# CVEs modified in the last 30 days
python -m src.analysis_tool.utilities.generate_dataset --last-days 30
# CVEs modified in January 2024
python -m src.analysis_tool.utilities.generate_dataset --start-date 2024-01-01 --end-date 2024-01-31
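As a rough illustration of how a --last-days value could translate into an NVD API query window, the sketch below builds the lastModStartDate and lastModEndDate parameters and enforces the 120-day limit. The function name last_modified_window and the exact timestamp format are assumptions made for this example; the utility's own implementation may differ.

```python
from datetime import datetime, timedelta, timezone

MAX_RANGE_DAYS = 120  # the NVD API caps a lastModified window at 120 days


def last_modified_window(last_days, now=None):
    """Hypothetical helper: build NVD-style date parameters for the last N days."""
    if not 1 <= last_days <= MAX_RANGE_DAYS:
        raise ValueError(f"last_days must be between 1 and {MAX_RANGE_DAYS}")
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=last_days)
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"  # ISO-8601 with an explicit UTC marker
    return {
        "lastModStartDate": start.strftime(fmt),
        "lastModEndDate": end.strftime(fmt),
    }


if __name__ == "__main__":
    # Fixed "now" so the result is reproducible.
    fixed_now = datetime(2024, 7, 2, 10, 30, tzinfo=timezone.utc)
    print(last_modified_window(30, now=fixed_now))
```

Both parameters must be supplied together when querying the NVD API, which is why the helper always returns the pair.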
Generate datasets containing only CVEs that have changed since your last run:
# Generate dataset for CVEs modified since the last run
python -m src.analysis_tool.utilities.generate_dataset --since-last-run
All dataset generation runs are automatically tracked in datasets/dataset_tracker.json. View your history:
# Show when the last dataset generation occurred
python -m src.analysis_tool.utilities.generate_dataset --show-last-run
Automatically run the analysis tool after dataset generation:
# Generate dataset and immediately analyze it
python -m src.analysis_tool.utilities.generate_dataset --last-days 7 --run-analysis --api-key YOUR_KEY
All existing functionality remains unchanged:
# Traditional status-based generation still works exactly as before
python -m src.analysis_tool.utilities.generate_dataset --statuses "Received" "Awaiting Analysis"
python -m src.analysis_tool.utilities.generate_dataset --test-mode
The tracking system creates datasets/dataset_tracker.json with this structure:
{
  "last_full_pull": "2024-07-02T10:30:00.000000",
  "run_history": [
    {
      "run_id": "status_based_20240702_103000",
      "run_type": "status_based",
      "timestamp": "2024-07-02T10:30:00.000000",
      "cve_count": 150,
      "output_file": "/path/to/datasets/cve_dataset.txt"
    }
  ]
}
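If you want to inspect the tracker from your own scripts, a minimal sketch along these lines reads the file and picks out the most recent run. It relies only on the fields shown in the structure above; the helper names load_tracker and latest_run are hypothetical and not part of the utility.

```python
import json
from datetime import datetime
from pathlib import Path


def load_tracker(path="datasets/dataset_tracker.json"):
    """Hypothetical helper: load the tracker, or an empty structure if the file is missing."""
    tracker_path = Path(path)
    if not tracker_path.exists():
        return {"last_full_pull": None, "run_history": []}
    return json.loads(tracker_path.read_text(encoding="utf-8"))


def latest_run(tracker):
    """Return the run_history entry with the newest timestamp, or None if empty."""
    runs = tracker.get("run_history", [])
    if not runs:
        return None
    return max(runs, key=lambda run: datetime.fromisoformat(run["timestamp"]))


if __name__ == "__main__":
    run = latest_run(load_tracker())
    if run is None:
        print("No dataset generation runs recorded yet.")
    else:
        print(f"{run['run_id']}: {run['cve_count']} CVEs at {run['timestamp']}")
```

Comparing parsed timestamps rather than relying on list order keeps the result correct even if entries were appended out of sequence.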
Initial Setup:
# Generate your first dataset for recent CVEs
python -m src.analysis_tool.utilities.generate_dataset --last-days 30 --output initial_dataset.txt --run-analysis
Regular Updates:
# Daily/weekly differential updates
python -m src.analysis_tool.utilities.generate_dataset --since-last-run --run-analysis
Investigating Specific Periods:
# Analyze CVEs from a specific incident timeframe
python -m src.analysis_tool.utilities.generate_dataset --start-date 2024-06-01 --end-date 2024-06-15 --output incident_analysis.txt
No migration needed. The enhanced version: