RuleMinimizer: Advanced Hashcat Rule Processor

Multi-Modal Rule Deduplication with GPU Acceleration & Interactive Processing

Multi-Core CPU · GPU Acceleration · NumPy Optimized · OpenCL Support

Enhanced Architecture & Next-Gen Features

RuleMinimizer v2 now features GPU-accelerated processing, OpenCL validation, recursive file discovery, and smart processing selection for optimal performance across all dataset sizes.

  • GPU Acceleration: OpenCL-based rule counting and validation
  • Smart Processing: Auto-selection between CPU/GPU based on dataset size
  • File Discovery: Recursive directory scanning with max depth control
  • Interactive Mode: Real-time processing with immediate feedback
  • Hashcat Validation: Full rule validation for CPU/GPU compatibility

Enhanced Technical Stack

• PyOpenCL GPU Processing
• Recursive File Discovery
• Smart Processing Selection
• Interactive Processing Loop
• Hashcat Rule Validation
• Multi-format Rule Support
• Real-time Statistics
• Memory Management

New Enhanced Features

GPU Acceleration
• OpenCL-based rule counting
• GPU-accelerated validation
• Automatic CPU fallback
• Memory-optimized kernels
Disable with --no-gpu

Smart Processing
• Auto CPU/GPU selection
• Dataset size analysis
• Performance recommendations
• Memory usage optimization
Selection is based on rule count

File Discovery
• Recursive directory scanning
• Multiple file extensions
• Configurable depth limit
• Duplicate prevention
Extensions: .rule, .rules, .hr, .hashcat, .txt

Interactive Mode
• Real-time processing
• Immediate statistics
• Step-by-step filtering
• Dataset reset capability
Live progress updates

Hashcat Validation
• CPU/GPU compatibility
• Rule syntax validation
• Operation limit enforcement
• Invalid rule removal
Mode 5 in interactive processing

Enhanced Levenshtein
• GPU-accelerated option
• Interactive distance setting
• Progress tracking
• Large dataset warnings
Mode 6 in interactive processing
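The NumPy-optimized Levenshtein path referenced above can be sketched as a row-by-row dynamic program. This is an illustrative reimplementation, not the tool's actual code; the function name is hypothetical:

```python
import numpy as np

def levenshtein(a: str, b: str) -> int:
    """Edit distance via a NumPy dynamic-programming row sweep."""
    if len(a) < len(b):
        a, b = b, a
    # previous row of the DP table: distance from "" to each prefix of b
    prev = np.arange(len(b) + 1)
    for i, ca in enumerate(a, start=1):
        curr = np.empty(len(b) + 1, dtype=prev.dtype)
        curr[0] = i
        # substitution cost: 0 where characters match, else 1
        cost = np.fromiter((ca != cb for cb in b), dtype=prev.dtype, count=len(b))
        # deletion (prev[1:] + 1) and match/substitution (prev[:-1] + cost)
        curr[1:] = np.minimum(prev[:-1] + cost, prev[1:] + 1)
        # insertions propagate left to right, so finish with a scan
        for j in range(1, len(b) + 1):
            curr[j] = min(curr[j], curr[j - 1] + 1)
        prev = curr
    return int(prev[-1])
```

A rule pair whose distance falls at or below the configured threshold (`-ld`) would be treated as near-duplicates, keeping only one of the two.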

Six Interactive Processing Modes

Mode 1: Min Occurrence
• Filter by occurrence threshold
• Remove low-frequency rules
• Statistical significance
Keeps rules with count ≥ threshold

Mode 2: Top-N Statistical
• Keep top N frequent rules
• Pareto principle
• Automated suggestions
Keeps the first N rules by frequency

Mode 3: Functional Minimization
• Full rule engine simulation
• Multi-core processing
• Semantic redundancy removal
Removes rules with identical outputs

Mode 4: Inverse Mode
• Keep rules BELOW threshold
• Capture "long tail" rules
• Dual-phase attack support
Skips the top N rules, keeps the rest

Mode 5: Hashcat Cleanup
• Rule validation
• CPU/GPU compatibility
• Syntax checking
• Invalid rule removal
Enforces hashcat standards

Mode 6: Levenshtein Filter
• Similarity detection
• GPU acceleration
• Distance threshold
• Semantic grouping
Removes near-duplicate rules within the distance threshold
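The core idea behind Mode 3 (functional minimization) is to run every rule through a rule-engine simulation and keep only the first rule per distinct output. A minimal sketch, supporting just a handful of hashcat operations (`:`, `l`, `u`, `c`, `r`, `$X`, `^X`) rather than the full rule language, with hypothetical function names and probe words:

```python
def apply_rule(rule: str, word: str) -> str:
    """Apply a tiny subset of hashcat rule operations to one word."""
    i = 0
    while i < len(rule):
        op = rule[i]
        if op == ':':                           # no-op
            pass
        elif op == 'l':                         # lowercase all
            word = word.lower()
        elif op == 'u':                         # uppercase all
            word = word.upper()
        elif op == 'c':                         # capitalize first, lowercase rest
            word = word.capitalize()
        elif op == 'r':                         # reverse
            word = word[::-1]
        elif op == '$' and i + 1 < len(rule):   # append character
            word += rule[i + 1]
            i += 1
        elif op == '^' and i + 1 < len(rule):   # prepend character
            word = rule[i + 1] + word
            i += 1
        i += 1
    return word

def functional_minimize(rules, probe_words=("Password", "test123")):
    """Keep the first rule seen for each distinct output signature."""
    seen, kept = set(), []
    for rule in rules:
        signature = tuple(apply_rule(rule, w) for w in probe_words)
        if signature not in seen:
            seen.add(signature)
            kept.append(rule)
    return kept
```

For example, `l` and `ll` produce identical outputs on any input, so only the first of the two survives. The real mode fans this simulation out across CPU cores.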

Enhanced Processing Pipeline

Phase 1: Smart File Discovery & Data Loading

Recursive File Discovery
File Detection
• Recursive directory scanning (max depth: 3)
• Multiple extensions: .rule, .rules, .hr, .hashcat, .txt
• Duplicate prevention
• Progress reporting
Smart Loading
• Auto CPU/GPU processing selection
• Dataset size estimation
• Performance recommendations
• Memory optimization

Phase 2: Smart Processing Selection

Intelligent Method Selection
• GPU processing (<50K rules): OpenCL-accelerated counting; fastest for small datasets
• CPU processing (<1M rules): optimized Counter-based counting; balanced performance
• Chunked CPU (≥1M rules): memory-efficient processing; handles massive datasets
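The selection logic above reduces to a size check. A minimal sketch (the function name is hypothetical and the exact cut-over points are illustrative):

```python
def choose_method(rule_count: int, gpu_available: bool = True) -> str:
    """Pick a processing method from dataset size, per the tiers above."""
    if gpu_available and rule_count < 50_000:
        return "gpu"          # OpenCL-accelerated counting
    if rule_count < 1_000_000:
        return "cpu"          # single-pass Counter-based counting
    return "cpu-chunked"      # memory-efficient chunked processing
```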

Phase 3: Interactive Processing Loop

Real-time Statistics
• Current dataset size
• Unique rule count
• Processing history
• Immediate feedback
• Reset capability
Live Processing
• Apply filters sequentially
• See results immediately
• Compare different approaches
• Export at any stage
• Pareto analysis on-demand
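The sequential occurrence filters applied in this loop (Modes 1 and 2) can be sketched with `collections.Counter`; the function names here are hypothetical:

```python
from collections import Counter

def filter_min_occurrence(rules, threshold):
    """Mode 1: keep rules whose occurrence count meets the threshold."""
    counts = Counter(rules)
    return [r for r, c in counts.items() if c >= threshold]

def filter_top_n(rules, n):
    """Mode 2: keep the N most frequent rules."""
    return [r for r, _ in Counter(rules).most_common(n)]
```

Because each filter returns a plain list, the output of one mode can feed directly into the next, which is what makes the sequential-filtering workflow cheap to iterate on.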

Phase 4: Advanced Processing Features

Enhanced Processing Capabilities
Hashcat Validation:
• Full rule syntax validation
• CPU/GPU compatibility checking
• Operation limit enforcement
• Invalid rule removal
GPU Acceleration:
• OpenCL rule counting
• Memory-optimized kernels
• Automatic fallback handling
• Progress tracking
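The hashcat validation pass above boils down to walking a rule string operation by operation and checking operand counts. A heavily simplified sketch: the operation sets below cover only a subset of hashcat's rule language, and the 31-function cap is an assumption about the on-GPU limit, not a verified constant:

```python
# Functions that take no operand (subset of hashcat's rule language)
ZERO_ARG = set(":lucCtrdfq{}[]kK")   # e.g. l = lowercase, r = reverse
# Functions that consume exactly one operand character
ONE_ARG = set("$^TDpzZ'@LR")         # e.g. $X append X, TN toggle at N
# Functions that consume exactly two operand characters
TWO_ARG = set("sxio*")               # e.g. sXY substitute X with Y

def is_valid_rule(rule: str, max_ops: int = 31) -> bool:
    """Accept a rule only if every function is known and fully parameterized."""
    rule = rule.strip()
    if not rule or rule.startswith('#'):
        return False                  # comments and empty lines are not rules
    i, ops = 0, 0
    while i < len(rule):
        if rule[i] == ' ':            # spaces may separate functions
            i += 1
            continue
        op = rule[i]
        if op in ZERO_ARG:
            i += 1
        elif op in ONE_ARG and i + 1 < len(rule):
            i += 2
        elif op in TWO_ARG and i + 2 < len(rule):
            i += 3
        else:
            return False              # unknown function or missing operand
        ops += 1
        if ops > max_ops:             # enforce a per-rule operation limit
            return False
    return True
```

Rules failing this check would be dropped by the cleanup pass rather than passed through to hashcat.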

GPU Acceleration & OpenCL Features

OpenCL Rule Counting

Kernel Operations:
• DJB2 hash calculation
• Rule length computation
• Unique flag determination
• Occurrence counting
Optimizations:
• Work group size tuning
• Memory-efficient buffers
• Fallback error handling
• Progress tracking
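The kernel's DJB2 hash step has a simple host-side reference. Assuming the kernel accumulates into a 32-bit unsigned integer (typical for an OpenCL `uint`), a CPU equivalent looks like:

```python
def djb2(rule: str) -> int:
    """DJB2 hash of one rule string, wrapped to 32 bits."""
    h = 5381                              # DJB2 magic initial value
    for byte in rule.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF  # hash * 33 + byte, mod 2^32
    return h
```

Hashing each rule once lets the kernel compare cheap integers instead of strings when flagging duplicates, with occurrence counts accumulated per unique hash.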

Smart Processing Selection

Dataset Size Analysis:
• <50K rules: GPU recommended
• 50K-100K: Either method
• >100K: CPU recommended

User Control:
• Auto-selection (default)
• Force CPU (--no-gpu)
• Interactive choice
• Performance warnings

GPU Performance Benefits

Small Datasets (<50K):
• 2-5x faster than CPU
• Parallel hash computation
• Memory bandwidth utilization

Medium Datasets (50K-500K):
• Balanced performance
• Memory considerations
• User choice recommended

Large Datasets (>500K):
• CPU recommended
• Memory safety
• Stable processing

OpenCL Implementation

• Device discovery and selection
• Context and queue management
• Kernel compilation
• Buffer allocation and transfer
• Error handling and fallbacks
• Memory optimization

Enhanced File Discovery & Management

Recursive Discovery

Directory Scanning:
• Max depth: 3 levels (configurable)
• Multiple file extensions supported
• Duplicate file prevention
• Progress reporting

Supported Extensions:
.rule, .rules, .hr, .hashcat, .txt
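The depth-limited recursive scan described above can be sketched with `os.walk`, pruning directories once the depth limit is reached and deduplicating by resolved path; the function name is hypothetical:

```python
import os

RULE_EXTS = {".rule", ".rules", ".hr", ".hashcat", ".txt"}

def discover_rule_files(root: str, max_depth: int = 3):
    """Recursively collect rule files up to max_depth, skipping duplicates."""
    root = os.path.abspath(root)
    found, seen = [], set()
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath[len(root):].count(os.sep)
        if depth >= max_depth:
            dirnames[:] = []          # prune: do not descend past the limit
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in RULE_EXTS:
                path = os.path.realpath(os.path.join(dirpath, name))
                if path not in seen:  # duplicate prevention (symlinks, overlaps)
                    seen.add(path)
                    found.append(path)
    return found
```

Clearing `dirnames` in place is the standard `os.walk` idiom for stopping descent, so directories beyond the depth limit are never even opened.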

Smart Processing

Automatic Selection:
• Dataset size estimation
• Processing method recommendation
• Memory usage optimization
• Performance warnings

User Interaction:
• Processing method choice
• Confirmation for large datasets
• Real-time progress updates

Discovery Output Example

[SEARCH] Scanning directory: ./rules (max depth: 3)
[FOUND] Rule file: ./rules/basic.rule
[FOUND] Rule file (depth 1): ./rules/advanced/best64.rule
[FOUND] Rule file (depth 2): ./rules/advanced/legacy/d3ad0ne.rule
[INFO] Found 15 rule files in: ./rules
[TOTAL] Found 15 unique rule files to process

Processing Results

Files Found: 15 rule files
Total Rules: 250,000 lines
Unique Rules: 45,000 rules
Processing: GPU Accelerated
Time: 45 seconds

Interactive Processing Mode

Real-time Processing

Live Statistics:
Current dataset size, unique rule count, processing history with immediate feedback after each operation.
Sequential Filtering:
Apply multiple filters in sequence, compare results, and choose the optimal approach for your specific needs.
Flexible Workflow:
Reset to original dataset at any point, export intermediate results, or continue refining with different parameters.

Interactive Features

Menu System: Six processing modes with clear descriptions
Parameter Control: Dynamic threshold setting with suggestions
Progress Tracking: Real-time updates with tqdm integration
Data Persistence: Maintain dataset between operations
Export Options: Save at any processing stage
Analysis Tools: On-demand Pareto analysis

Interactive Session Example

Current dataset: 45,000 unique rules
----------------------------------------
FILTERING OPTIONS:
(1) Filter by MINIMUM OCCURRENCE
(2) Filter by MAXIMUM NUMBER OF RULES
(3) Filter by FUNCTIONAL REDUNDANCY
(4) **INVERSE MODE**
(5) **HASHCAT CLEANUP**
(6) **LEVENSHTEIN FILTER**
(p) Show PARETO analysis
(s) SAVE current rules
(r) RESET to original
(q) QUIT program
----------------------------------------
Enter your choice: 3
[FUNCTIONAL MINIMIZATION] Starting logic-based redundancy removal...
[MP] Using 8 processes for functional simulation.
Simulating rules: 100%|██████████| 45000/45000 [02:15<00:00, 332.15 rules/s]
[FUNCTIONAL MINIMIZATION] Removed 18,500 functionally redundant rules.
[STATUS] Dataset updated: 26,500 unique rules

Enhanced Command Line Interface

New Arguments

--no-gpu
  Disable GPU acceleration entirely
-ld MAX_DIST
  Levenshtein distance threshold (0 = disabled)
-o / --output-stdout
  Output to STDOUT for piping
-d / --use-disk
  Use disk mode for large files
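An `argparse` definition mirroring the arguments listed above might look like the following; the long-name spellings paired with `-ld` and the positional `paths` argument are assumptions for illustration:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI surface matching the arguments documented above."""
    p = argparse.ArgumentParser(prog="ruleminimizer.py")
    p.add_argument("paths", nargs="+",
                   help="rule files or directories to scan recursively")
    p.add_argument("--no-gpu", action="store_true",
                   help="disable GPU acceleration entirely")
    p.add_argument("-ld", "--max-dist", type=int, default=0, metavar="MAX_DIST",
                   help="Levenshtein distance threshold (0 = disabled)")
    p.add_argument("-o", "--output-stdout", action="store_true",
                   help="write output to STDOUT for piping")
    p.add_argument("-d", "--use-disk", action="store_true",
                   help="use disk mode for large files")
    return p
```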

Usage Examples

Basic Processing:
  ruleminimizer.py rules/*.rule
GPU Disabled:
  ruleminimizer.py rules/ --no-gpu
Pipeline Output:
  ruleminimizer.py rules/ -o | head -n 1000
Full Features:
  ruleminimizer.py rules/ -ld 2 --use-disk

Smart Processing Output

[INFO] PyOpenCL found. GPU-accelerated rule counting enabled.
[INFO] NumPy found. Using optimized Levenshtein distance calculation.
[SEARCH] Scanning directory: ./rules (max depth: 3)
[TOTAL] Found 15 unique rule files to process
[PROCESSING] Using GPU method for 250,000 rules
[GPU] Counting 250,000 rules using GPU acceleration...
[GPU] Preparing rules data...
[GPU] Creating GPU buffers...
[GPU] Executing hash calculation kernel...
[GPU] Counting complete: 45,000 unique rules found

Performance Comparison

Small Dataset (10K rules):
GPU: 2.1 seconds
CPU: 8.7 seconds

Medium Dataset (100K rules):
GPU: 15.3 seconds
CPU: 22.8 seconds

Large Dataset (1M rules):
CPU: 45.2 seconds
GPU: 68.1 seconds (memory limited)