RuleMinimizer: Advanced Hashcat Rule Processor

Multi-Modal Rule Deduplication with GPU Acceleration & Interactive Processing

Multi-Core CPU · GPU Acceleration · NumPy Optimized · OpenCL Support

Enhanced Architecture & Next-Gen Features

RuleMinimizer v2 now features GPU-accelerated processing, OpenCL validation, recursive file discovery, and smart processing selection for optimal performance across all dataset sizes.

  • GPU Acceleration: OpenCL-based rule counting and validation
  • Smart Processing: Auto-selection between CPU/GPU based on dataset size
  • File Discovery: Recursive directory scanning with max depth control
  • Interactive Mode: Real-time processing with immediate feedback
  • Hashcat Validation: Full rule validation for CPU/GPU compatibility

Enhanced Technical Stack

• PyOpenCL GPU Processing
• Recursive File Discovery
• Smart Processing Selection
• Interactive Processing Loop
• Hashcat Rule Validation
• Multi-format Rule Support
• Real-time Statistics
• Memory Management

New Enhanced Features

GPU Acceleration
• OpenCL-based rule counting
• GPU-accelerated validation
• Automatic CPU fallback
• Memory-optimized kernels
Disable with --no-gpu

Smart Processing
• Auto CPU/GPU selection
• Dataset size analysis
• Performance recommendations
• Memory usage optimization
Selection is based on rule count

File Discovery
• Recursive directory scanning
• Multiple file extensions
• Configurable depth limit
• Duplicate prevention
Extensions: .rule, .rules, .hr, .hashcat, .txt

Interactive Mode
• Real-time processing
• Immediate statistics
• Step-by-step filtering
• Dataset reset capability
Live progress updates

Hashcat Validation
• CPU/GPU compatibility
• Rule syntax validation
• Operation limit enforcement
• Invalid rule removal
Mode 5 in interactive processing

Enhanced Levenshtein
• GPU-accelerated option
• Interactive distance setting
• Progress tracking
• Large dataset warnings
Mode 6 in interactive processing
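The NumPy-optimized Levenshtein path referenced above can be sketched as a row-by-row dynamic program. This is an illustrative reimplementation, not the tool's actual code; the function name is hypothetical:

```python
import numpy as np

def levenshtein(a: str, b: str) -> int:
    """Edit distance via a NumPy dynamic-programming row sweep."""
    if len(a) < len(b):
        a, b = b, a
    # previous row of the DP table: distance from "" to each prefix of b
    prev = np.arange(len(b) + 1)
    for i, ca in enumerate(a, start=1):
        curr = np.empty(len(b) + 1, dtype=prev.dtype)
        curr[0] = i
        # substitution cost: 0 where characters match, else 1
        cost = np.fromiter((ca != cb for cb in b), dtype=prev.dtype, count=len(b))
        # deletion (prev[1:] + 1) and match/substitution (prev[:-1] + cost)
        curr[1:] = np.minimum(prev[:-1] + cost, prev[1:] + 1)
        # insertions propagate left to right, so finish with a scan
        for j in range(1, len(b) + 1):
            curr[j] = min(curr[j], curr[j - 1] + 1)
        prev = curr
    return int(prev[-1])
```

A rule pair whose distance falls at or below the configured threshold (`-ld`) would be treated as near-duplicates, keeping only one of the two.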

Six Interactive Processing Modes

Mode 1: Min Occurrence
• Filter by occurrence threshold
• Remove low-frequency rules
• Statistical significance
Keeps rules with count ≥ threshold

Mode 2: Top-N Statistical
• Keep top N frequent rules
• Pareto principle
• Automated suggestions
Keeps the first N rules by frequency

Mode 3: Functional Minimization
• Full rule engine simulation
• Multi-core processing
• Semantic redundancy removal
Removes rules with identical outputs

Mode 4: Inverse Mode
• Keep rules BELOW threshold
• Capture "long tail" rules
• Dual-phase attack support
Skips the top N rules, keeps the rest

Mode 5: Hashcat Cleanup
• Rule validation
• CPU/GPU compatibility
• Syntax checking
• Invalid rule removal
Enforces hashcat standards

Mode 6: Levenshtein Filter
• Similarity detection
• GPU acceleration
• Distance threshold
• Semantic grouping
Removes near-duplicate rules within the distance threshold
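The core idea behind Mode 3 (functional minimization) is to run every rule through a rule-engine simulation and keep only the first rule per distinct output. A minimal sketch, supporting just a handful of hashcat operations (`:`, `l`, `u`, `c`, `r`, `$X`, `^X`) rather than the full rule language, with hypothetical function names and probe words:

```python
def apply_rule(rule: str, word: str) -> str:
    """Apply a tiny subset of hashcat rule operations to one word."""
    i = 0
    while i < len(rule):
        op = rule[i]
        if op == ':':                           # no-op
            pass
        elif op == 'l':                         # lowercase all
            word = word.lower()
        elif op == 'u':                         # uppercase all
            word = word.upper()
        elif op == 'c':                         # capitalize first, lowercase rest
            word = word.capitalize()
        elif op == 'r':                         # reverse
            word = word[::-1]
        elif op == '$' and i + 1 < len(rule):   # append character
            word += rule[i + 1]
            i += 1
        elif op == '^' and i + 1 < len(rule):   # prepend character
            word = rule[i + 1] + word
            i += 1
        i += 1
    return word

def functional_minimize(rules, probe_words=("Password", "test123")):
    """Keep the first rule seen for each distinct output signature."""
    seen, kept = set(), []
    for rule in rules:
        signature = tuple(apply_rule(rule, w) for w in probe_words)
        if signature not in seen:
            seen.add(signature)
            kept.append(rule)
    return kept
```

For example, `l` and `ll` produce identical outputs on any input, so only the first of the two survives. The real mode fans this simulation out across CPU cores.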

Enhanced Processing Pipeline

Phase 1: Smart File Discovery & Data Loading

Recursive File Discovery
File Detection
• Recursive directory scanning (max depth: 3)
• Multiple extensions: .rule, .rules, .hr, .hashcat, .txt
• Duplicate prevention
• Progress reporting
Smart Loading
• Auto CPU/GPU processing selection
• Dataset size estimation
• Performance recommendations
• Memory optimization

Phase 2: Smart Processing Selection

Intelligent Method Selection
• GPU processing (<50K rules): OpenCL-accelerated counting; fastest for small datasets
• CPU processing (<1M rules): optimized Counter-based counting; balanced performance
• Chunked CPU (≥1M rules): memory-efficient processing; handles massive datasets
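The selection logic above reduces to a size check. A minimal sketch (the function name is hypothetical and the exact cut-over points are illustrative):

```python
def choose_method(rule_count: int, gpu_available: bool = True) -> str:
    """Pick a processing method from dataset size, per the tiers above."""
    if gpu_available and rule_count < 50_000:
        return "gpu"          # OpenCL-accelerated counting
    if rule_count < 1_000_000:
        return "cpu"          # single-pass Counter-based counting
    return "cpu-chunked"      # memory-efficient chunked processing
```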

Phase 3: Interactive Processing Loop

Real-time Statistics
• Current dataset size
• Unique rule count
• Processing history
• Immediate feedback
• Reset capability
Live Processing
• Apply filters sequentially
• See results immediately
• Compare different approaches
• Export at any stage
• Pareto analysis on-demand
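The sequential occurrence filters applied in this loop (Modes 1 and 2) can be sketched with `collections.Counter`; the function names here are hypothetical:

```python
from collections import Counter

def filter_min_occurrence(rules, threshold):
    """Mode 1: keep rules whose occurrence count meets the threshold."""
    counts = Counter(rules)
    return [r for r, c in counts.items() if c >= threshold]

def filter_top_n(rules, n):
    """Mode 2: keep the N most frequent rules."""
    return [r for r, _ in Counter(rules).most_common(n)]
```

Because each filter returns a plain list, the output of one mode can feed directly into the next, which is what makes the sequential-filtering workflow cheap to iterate on.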

Phase 4: Advanced Processing Features

Enhanced Processing Capabilities
Hashcat Validation:
• Full rule syntax validation
• CPU/GPU compatibility checking
• Operation limit enforcement
• Invalid rule removal
GPU Acceleration:
• OpenCL rule counting
• Memory-optimized kernels
• Automatic fallback handling
• Progress tracking
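The hashcat validation pass above boils down to walking a rule string operation by operation and checking operand counts. A heavily simplified sketch: the operation sets below cover only a subset of hashcat's rule language, and the 31-function cap is an assumption about the on-GPU limit, not a verified constant:

```python
# Functions that take no operand (subset of hashcat's rule language)
ZERO_ARG = set(":lucCtrdfq{}[]kK")   # e.g. l = lowercase, r = reverse
# Functions that consume exactly one operand character
ONE_ARG = set("$^TDpzZ'@LR")         # e.g. $X append X, TN toggle at N
# Functions that consume exactly two operand characters
TWO_ARG = set("sxio*")               # e.g. sXY substitute X with Y

def is_valid_rule(rule: str, max_ops: int = 31) -> bool:
    """Accept a rule only if every function is known and fully parameterized."""
    rule = rule.strip()
    if not rule or rule.startswith('#'):
        return False                  # comments and empty lines are not rules
    i, ops = 0, 0
    while i < len(rule):
        if rule[i] == ' ':            # spaces may separate functions
            i += 1
            continue
        op = rule[i]
        if op in ZERO_ARG:
            i += 1
        elif op in ONE_ARG and i + 1 < len(rule):
            i += 2
        elif op in TWO_ARG and i + 2 < len(rule):
            i += 3
        else:
            return False              # unknown function or missing operand
        ops += 1
        if ops > max_ops:             # enforce a per-rule operation limit
            return False
    return True
```

Rules failing this check would be dropped by the cleanup pass rather than passed through to hashcat.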

GPU Acceleration & OpenCL Features

OpenCL Rule Counting

Kernel Operations:
• DJB2 hash calculation
• Rule length computation
• Unique flag determination
• Occurrence counting
Optimizations:
• Work group size tuning
• Memory-efficient buffers
• Fallback error handling
• Progress tracking
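The kernel's DJB2 hash step has a simple host-side reference. Assuming the kernel accumulates into a 32-bit unsigned integer (typical for an OpenCL `uint`), a CPU equivalent looks like:

```python
def djb2(rule: str) -> int:
    """DJB2 hash of one rule string, wrapped to 32 bits."""
    h = 5381                              # DJB2 magic initial value
    for byte in rule.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF  # hash * 33 + byte, mod 2^32
    return h
```

Hashing each rule once lets the kernel compare cheap integers instead of strings when flagging duplicates, with occurrence counts accumulated per unique hash.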

Smart Processing Selection

Dataset Size Analysis:
• <50K rules: GPU recommended
• 50K-100K: Either method
• >100K: CPU recommended

User Control:
• Auto-selection (default)
• Force CPU (--no-gpu)
• Interactive choice
• Performance warnings

GPU Performance Benefits

Small Datasets (<50K):
• 2-5x faster than CPU
• Parallel hash computation
• Memory bandwidth utilization

Medium Datasets (50K-500K):
• Balanced performance
• Memory considerations
• User choice recommended

Large Datasets (>500K):
• CPU recommended
• Memory safety
• Stable processing

OpenCL Implementation

• Device discovery and selection
• Context and queue management
• Kernel compilation
• Buffer allocation and transfer
• Error handling and fallbacks
• Memory optimization

Enhanced File Discovery & Management

Recursive Discovery

Directory Scanning:
• Max depth: 3 levels (configurable)
• Multiple file extensions supported
• Duplicate file prevention
• Progress reporting

Supported Extensions:
.rule, .rules, .hr, .hashcat, .txt
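The depth-limited recursive scan described above can be sketched with `os.walk`, pruning directories once the depth limit is reached and deduplicating by resolved path; the function name is hypothetical:

```python
import os

RULE_EXTS = {".rule", ".rules", ".hr", ".hashcat", ".txt"}

def discover_rule_files(root: str, max_depth: int = 3):
    """Recursively collect rule files up to max_depth, skipping duplicates."""
    root = os.path.abspath(root)
    found, seen = [], set()
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath[len(root):].count(os.sep)
        if depth >= max_depth:
            dirnames[:] = []          # prune: do not descend past the limit
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in RULE_EXTS:
                path = os.path.realpath(os.path.join(dirpath, name))
                if path not in seen:  # duplicate prevention (symlinks, overlaps)
                    seen.add(path)
                    found.append(path)
    return found
```

Clearing `dirnames` in place is the standard `os.walk` idiom for stopping descent, so directories beyond the depth limit are never even opened.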

Smart Processing

Automatic Selection:
• Dataset size estimation
• Processing method recommendation
• Memory usage optimization
• Performance warnings

User Interaction:
• Processing method choice
• Confirmation for large datasets
• Real-time progress updates

Discovery Output Example

[SEARCH] Scanning directory: ./rules (max depth: 3)
[FOUND] Rule file: ./rules/basic.rule
[FOUND] Rule file (depth 1): ./rules/advanced/best64.rule
[FOUND] Rule file (depth 2): ./rules/advanced/legacy/d3ad0ne.rule
[INFO] Found 15 rule files in: ./rules
[TOTAL] Found 15 unique rule files to process

Processing Results

Files Found: 15 rule files
Total Rules: 250,000 lines
Unique Rules: 45,000 rules
Processing: GPU Accelerated
Time: 45 seconds

Interactive Processing Mode

Real-time Processing

Live Statistics:
Current dataset size, unique rule count, processing history with immediate feedback after each operation.
Sequential Filtering:
Apply multiple filters in sequence, compare results, and choose the optimal approach for your specific needs.
Flexible Workflow:
Reset to original dataset at any point, export intermediate results, or continue refining with different parameters.

Interactive Features

Menu System: Six processing modes with clear descriptions
Parameter Control: Dynamic threshold setting with suggestions
Progress Tracking: Real-time updates with tqdm integration
Data Persistence: Maintain dataset between operations
Export Options: Save at any processing stage
Analysis Tools: On-demand Pareto analysis

Interactive Session Example

Current dataset: 45,000 unique rules
----------------------------------------
FILTERING OPTIONS:
(1) Filter by MINIMUM OCCURRENCE
(2) Filter by MAXIMUM NUMBER OF RULES
(3) Filter by FUNCTIONAL REDUNDANCY
(4) **INVERSE MODE**
(5) **HASHCAT CLEANUP**
(6) **LEVENSHTEIN FILTER**
(p) Show PARETO analysis
(s) SAVE current rules
(r) RESET to original
(q) QUIT program
----------------------------------------
Enter your choice: 3
[FUNCTIONAL MINIMIZATION] Starting logic-based redundancy removal...
[MP] Using 8 processes for functional simulation.
Simulating rules: 100%|██████████| 45000/45000 [02:15<00:00, 332.15 rules/s]
[FUNCTIONAL MINIMIZATION] Removed 18,500 functionally redundant rules.
[STATUS] Dataset updated: 26,500 unique rules

Enhanced Command Line Interface

New Arguments

--no-gpu
  Disable GPU acceleration entirely
-ld MAX_DIST
  Levenshtein distance threshold (0 = disabled)
-o / --output-stdout
  Output to STDOUT for piping
-d / --use-disk
  Use disk mode for large files
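An `argparse` definition mirroring the arguments listed above might look like the following; the long-name spellings paired with `-ld` and the positional `paths` argument are assumptions for illustration:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI surface matching the arguments documented above."""
    p = argparse.ArgumentParser(prog="ruleminimizer.py")
    p.add_argument("paths", nargs="+",
                   help="rule files or directories to scan recursively")
    p.add_argument("--no-gpu", action="store_true",
                   help="disable GPU acceleration entirely")
    p.add_argument("-ld", "--max-dist", type=int, default=0, metavar="MAX_DIST",
                   help="Levenshtein distance threshold (0 = disabled)")
    p.add_argument("-o", "--output-stdout", action="store_true",
                   help="write output to STDOUT for piping")
    p.add_argument("-d", "--use-disk", action="store_true",
                   help="use disk mode for large files")
    return p
```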

Usage Examples

Basic Processing:
  ruleminimizer.py rules/*.rule
GPU Disabled:
  ruleminimizer.py rules/ --no-gpu
Pipeline Output:
  ruleminimizer.py rules/ -o | head -n 1000
Full Features:
  ruleminimizer.py rules/ -ld 2 --use-disk

Smart Processing Output

[INFO] PyOpenCL found. GPU-accelerated rule counting enabled.
[INFO] NumPy found. Using optimized Levenshtein distance calculation.
[SEARCH] Scanning directory: ./rules (max depth: 3)
[TOTAL] Found 15 unique rule files to process
[PROCESSING] Using GPU method for 250,000 rules
[GPU] Counting 250,000 rules using GPU acceleration...
[GPU] Preparing rules data...
[GPU] Creating GPU buffers...
[GPU] Executing hash calculation kernel...
[GPU] Counting complete: 45,000 unique rules found

Performance Comparison

Small Dataset (10K rules):
GPU: 2.1 seconds
CPU: 8.7 seconds

Medium Dataset (100K rules):
GPU: 15.3 seconds
CPU: 22.8 seconds

Large Dataset (1M rules):
CPU: 45.2 seconds
GPU: 68.1 seconds (memory limited)