Hashcat Rule Minimizer

minimizer

Eliminate functionally redundant Hashcat rules using signature-based deduplication. SQLite-backed storage, parallel workers, multibyte mode, and a built-in 33-word probe set — no wordlist required.

Pure Python · CPU-only, no GPU · Signature-based dedup · SQLite storage · Preserves file order · Parallel workers · Multibyte / Unicode
What does it do?

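Two rules are functionally equivalent if they produce identical output on every possible input word. Checking every input is impossible, so the tool approximates this test with a carefully chosen probe set of 33 words covering short and long words, mixed case, digits, special characters, and repeated characters, plus an optional 11-word multibyte set for non-ASCII wordlists. Agreement on every probe is therefore strong evidence of equivalence, not a proof of it.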
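Why probe coverage matters can be shown in a few lines of Python. This is an illustration, not the tool's code: the two Hashcat rules (the no-op : and the lowercase rule l) are modelled as ordinary functions, and the probe lists are made up. On all-lowercase probes the two rules look identical; one capitalized probe is enough to separate them.

```python
from typing import Callable, Sequence

# Two Hashcat rules modelled as plain Python functions (illustration only).
RULES: dict[str, Callable[[str], str]] = {
    ":": lambda w: w,          # no-op
    "l": lambda w: w.lower(),  # lowercase all letters
}

def signature(rule: Callable[[str], str], probes: Sequence[str]) -> tuple[str, ...]:
    """Outputs of one rule over every probe, in probe order."""
    return tuple(rule(p) for p in probes)

lowercase_probes = ["pass", "root", "letmein"]
mixed_probes = lowercase_probes + ["Password"]

same_on_lower = signature(RULES[":"], lowercase_probes) == signature(RULES["l"], lowercase_probes)
same_on_mixed = signature(RULES[":"], mixed_probes) == signature(RULES["l"], mixed_probes)

print(same_on_lower)  # True: lowercase-only probes cannot tell the rules apart
print(same_on_mixed)  # False: "Password" stays "Password" under ":" but becomes "password" under "l"
```

This is why the built-in set deliberately mixes cases, lengths, digits and specials: every probe category is a chance to expose two rules that only look equal.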

Rules that produce the same transformed strings for every probe are treated as equivalent. The rule that appears earlier in the file is kept and later duplicates are discarded, preserving the original frequency/priority ranking. Signatures are stored in a temporary SQLite database, so the signature map lives on disk rather than in the Python heap, however large the ruleset is.

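Example: rule c (capitalize the first letter, lowercase the rest) and the chain l c produce identical output for every input, because c already lowercases everything after the first character. Both rules get the same signature, so only the first occurrence survives.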
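A minimal Python sketch of this example, assuming the space-separated rule syntax used on this page. The helpers apply_rule and signature and the two-entry opcode table are hypothetical, not the tool's interpreter; they model only l and c and are intended for ASCII probes.

```python
# Hypothetical two-opcode subset, enough to compare "c" with "l c".
def lower(w: str) -> str:
    return w.lower()

def capitalize(w: str) -> str:
    # Hashcat "c": upper-case the first character, lower-case the rest
    return w[:1].upper() + w[1:].lower()

OPS = {"l": lower, "c": capitalize}

def apply_rule(rule: str, word: str) -> str:
    """Apply space-separated single-character opcodes left to right."""
    for op in rule.split():
        word = OPS[op](word)
    return word

def signature(rule: str, probes: list[str]) -> tuple[str, ...]:
    return tuple(apply_rule(rule, p) for p in probes)

probes = ["ab", "password", "Password", "HelloWorld", "pass123"]
print(signature("c", probes))
# ('Ab', 'Password', 'Password', 'Helloworld', 'Pass123')
print(signature("c", probes) == signature("l c", probes))  # True
```

The mixed-case probes matter here: they confirm that the leading l never changes the result, since c lowercases the tail on its own.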
Key Features
Built-in 33-word probe set
Covers short (len 2–4), typical bases (7–9), long (10+), mixed-case, digits, specials, repeated chars, and leet-substitution targets. Works out of the box.
Signature-based dedup via SQLite
Computes an output tuple per rule and stores the signatures in a temporary SQLite database, so the signature map is kept on disk instead of in Python heap memory, whatever the ruleset size. The database is deleted automatically on completion.
Multibyte / Unicode mode
--multibyte: process words as UTF-8 bytes and activate 11 additional non-ASCII probes (Polish, German, French, Russian, CJK).
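Hashcat rules operate on bytes, and in --multibyte mode words are likewise processed as UTF-8 bytes. The short Python sketch below (plain standard library, not the tool's code) shows the consequence: deleting the last position of café removes a whole character at the character level but only half of é at the byte level, leaving a byte string that is no longer valid UTF-8.

```python
word = "café"
raw = word.encode("utf-8")
print(len(word), len(raw))  # 4 5  ("é" takes two bytes)

char_level = word[:-1]  # "]" (delete last) applied to characters
byte_level = raw[:-1]   # "]" applied to bytes, as a byte-oriented rule engine does

def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(char_level, byte_level, is_valid_utf8(byte_level))
# caf b'caf\xc3' False
```

Positional and length-based rules are exactly where non-ASCII probes add information that the ASCII set cannot.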
Parallel workers
--workers N distributes signature computation across CPU cores; --workers 0 auto-detects the core count. Deduplication always runs in the main process.
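The split between parallel signature computation and single-process deduplication can be sketched with multiprocessing.Pool. This is an assumption-laden sketch, not the tool's implementation: PROBES, OPS, compute_signature and minimize are hypothetical names, the opcode table is a toy, and an in-memory set stands in for the SQLite table.

```python
import os
from multiprocessing import Pool

# Hypothetical probe subset and toy opcode table, for illustration only.
PROBES = ("ab", "password", "Password", "pass123")
OPS = {":": lambda w: w, "l": str.lower, "u": str.upper}

def compute_signature(rule: str) -> tuple[str, tuple[str, ...]]:
    """Runs in a worker process: apply one rule to every probe."""
    outputs = []
    for probe in PROBES:
        word = probe
        for op in rule.split():
            word = OPS[op](word)
        outputs.append(word)
    return rule, tuple(outputs)

def minimize(rules: list[str], workers: int) -> list[str]:
    processes = (os.cpu_count() or 1) if workers == 0 else workers
    seen: set[tuple[str, ...]] = set()  # in-memory stand-in for the SQLite table
    kept: list[str] = []
    with Pool(processes=processes) as pool:
        # imap yields results in input order, so the earliest rule still wins
        for rule, sig in pool.imap(compute_signature, rules, chunksize=64):
            if sig not in seen:  # keep/drop decisions are made only in the parent
                seen.add(sig)
                kept.append(rule)
    return kept

if __name__ == "__main__":
    print(minimize([":", "l", "u", "l l", "u l", ": u"], workers=0))
    # [':', 'l', 'u']
```

Using the ordered imap rather than imap_unordered is what keeps "first occurrence in the file wins" true regardless of which worker finishes first.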
Progress & statistics
Two tqdm progress bars (computing + deduplicating) plus counters: rules loaded, kept, removed, and % reduction.
Debug & rule trace
--debug logs every keep/drop decision. --debug-rule "RULE" traces a single rule against all probes and exits — no input file needed.
Built-in Probe Set (33 words + 11 multibyte)

These words are always used — no wordlist needed. They exercise every important Hashcat rule category. Use --multibyte to additionally activate 11 non-ASCII words.

Category 1 — ASCII (33 words, always active):

ababcabcd passroottestadminlogin letmeinwelcomepasswordsunshine footballbaseballprincessdragon12 qwertyuiopiloveyou12monkey12345superman123mustang2024 PasswordAdminUserMySecretHelloWorld pass123admin2024test1234user9999 p@ssw0rds3cur1tyaaaabbbb masterleeteliteaccess

Category 2 — Multibyte UTF-8 (11 words, active with --multibyte):

hasło, żółw, źródło, straße, münchen, übermensch, café, naïve, пароль, 密码, パスワード

Use --list-probes (or --list-probes --multibyte) to see the full list with byte counts and UTF-8 encodings.

Installation & Usage
# Clone the repository
git clone https://github.com/A113L/minimizer.git
cd minimizer

# Install optional dependency
pip install tqdm

# Basic minimization (built-in probe set only)
python minimizer.py my_rules.rule -o minimized.rule

# Use parallel workers (0 = auto-detect CPU count)
python minimizer.py my_rules.rule -o minimized.rule --workers 8

# Add your own probe words
python minimizer.py my_rules.rule --extra-probes password admin 123456

# Sample extra probes from a wordlist
python minimizer.py my_rules.rule --probe-file rockyou.txt --probe-words 100

# Enable UTF-8/multibyte mode (non-ASCII wordlists)
python minimizer.py my_rules.rule -o minimized.rule --multibyte

# Trace a single rule against the probe set (no input file needed)
python minimizer.py --debug-rule "l r \$1"

# Combine all sources
python minimizer.py rules.txt \
  --extra-probes "letmein" "welcome" \
  --probe-file /usr/share/wordlists/fasttrack.txt \
  --probe-words 50 \
  -o final.rule
How the Signature is Computed
Step 1: Probe set assembly
Built-in 33 words + --extra-probes + sampled words from --probe-file (deduplicated). With --multibyte, 11 additional UTF-8 words are added.
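Probe assembly with order-preserving deduplication might look like the Python sketch below. The page does not say how words are sampled from --probe-file, so the uniform reservoir sampling (Algorithm R, one pass, constant memory) is an assumption, as are the names assemble_probes and reservoir_sample; BUILTIN_PROBES is a truncated hypothetical stand-in for the 33 built-in words, and the 11 multibyte words are omitted.

```python
import random
from typing import Iterable, Optional

# Hypothetical, truncated stand-in for the 33 built-in probes.
BUILTIN_PROBES = ["ab", "abc", "abcd", "password"]

def reservoir_sample(lines: Iterable[str], k: int, rng: random.Random) -> list[str]:
    """Algorithm R: uniform sample of k non-empty lines in a single pass."""
    sample: list[str] = []
    seen = 0
    for line in lines:
        word = line.rstrip("\r\n")
        if not word:
            continue
        seen += 1
        if len(sample) < k:
            sample.append(word)
        else:
            j = rng.randrange(seen)  # 0 <= j < seen
            if j < k:
                sample[j] = word
    return sample

def assemble_probes(extra: Iterable[str],
                    wordlist: Optional[Iterable[str]] = None,
                    probe_words: int = 0,
                    seed: int = 0) -> list[str]:
    """Built-ins first, then extra probes, then sampled words; duplicates removed."""
    candidates = list(BUILTIN_PROBES) + list(extra)
    if wordlist is not None and probe_words > 0:
        candidates += reservoir_sample(wordlist, probe_words, random.Random(seed))
    # dict.fromkeys keeps the first occurrence of each word, in order
    return list(dict.fromkeys(candidates))
```

With a real wordlist, pass an open text handle (for example one opened with encoding="utf-8", errors="ignore") as wordlist; streaming avoids loading a large file such as rockyou.txt into memory just to pick 100 words.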
Step 2: Rule application
A pure-Python interpreter applies every rule atom (covering all GPU-compatible Hashcat opcodes) to each probe word. Rules containing an unsupported opcode all receive the same signature, so they form a single equivalence class and only the first such rule in the file is kept.
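The rule-application step can be sketched as a tokenizer plus a small interpreter. This is a hypothetical, deliberately partial re-implementation, not the tool's code: it models only the opcodes : l u c r d f [ ] $X ^X @X sXY, and every rule containing anything else gets one shared sentinel signature (UNSUPPORTED), mirroring the single equivalence class described above. Python's lower()/upper() are Unicode-aware while Hashcat's case functions work on ASCII bytes, so the sketch assumes ASCII probes.

```python
from typing import Optional

NO_ARG = {":", "l", "u", "c", "r", "d", "f", "[", "]"}
ONE_ARG = {"$", "^", "@"}         # one literal character follows the opcode
TWO_ARG = {"s"}                   # sXY: two literal characters follow
UNSUPPORTED = ("<unsupported>",)  # hypothetical shared signature

ACTIONS = {
    ":": lambda w, a: w,
    "l": lambda w, a: w.lower(),
    "u": lambda w, a: w.upper(),
    "c": lambda w, a: w[:1].upper() + w[1:].lower(),
    "r": lambda w, a: w[::-1],
    "d": lambda w, a: w + w,
    "f": lambda w, a: w + w[::-1],
    "[": lambda w, a: w[1:],
    "]": lambda w, a: w[:-1],
    "$": lambda w, a: w + a,
    "^": lambda w, a: a + w,
    "@": lambda w, a: w.replace(a, ""),
    "s": lambda w, a: w.replace(a[0], a[1]),
}

def tokenize(rule: str) -> Optional[list[str]]:
    """Split a rule into atoms; None if an opcode is outside the modelled subset."""
    atoms: list[str] = []
    i = 0
    while i < len(rule):
        op = rule[i]
        if op == " ":  # a space between functions is a separator, not an opcode
            i += 1
            continue
        width = 1 if op in NO_ARG else 2 if op in ONE_ARG else 3 if op in TWO_ARG else 0
        if width == 0 or i + width > len(rule):
            return None  # unknown opcode or truncated argument
        atoms.append(rule[i:i + width])  # arguments are read positionally, so "$ " appends a space
        i += width
    return atoms

def signature(rule: str, probes: list[str]) -> tuple[str, ...]:
    atoms = tokenize(rule)
    if atoms is None:
        return UNSUPPORTED
    outputs = []
    for probe in probes:
        word = probe
        for atom in atoms:
            word = ACTIONS[atom[0]](word, atom[1:])
        outputs.append(word)
    return tuple(outputs)

probes = ["ab", "Password", "p@ss"]
print(signature("c $1", probes))  # ('Ab1', 'Password1', 'P@ss1')
print(signature("T0", probes) == signature("x02", probes))  # True: both fall outside this subset
```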
Step 3: Signature tuple
For each rule, collect transformed outputs for all probes into a tuple: ("Abc", "Def", ...). In multibyte mode, invalid UTF-8 byte sequences are wrapped in an InvalidUTF8Bytes object rather than silently decoded.
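A sketch of how byte-level outputs might be turned into signature items, assuming a hashable wrapper type. InvalidUTF8Bytes below is a hypothetical re-implementation written for this example (the tool's own class is not shown here), and to_item and byte_signature are illustrative names. Valid UTF-8 becomes a normal string; anything else keeps its exact bytes.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass(frozen=True)  # frozen => hashable, so it can sit inside a signature tuple
class InvalidUTF8Bytes:
    """Raw rule output that is not valid UTF-8 (hypothetical re-implementation)."""
    raw: bytes

SignatureItem = Union[str, InvalidUTF8Bytes]

def to_item(out: bytes) -> SignatureItem:
    try:
        return out.decode("utf-8")    # strict decoding raises on malformed bytes
    except UnicodeDecodeError:
        return InvalidUTF8Bytes(out)  # keep the exact bytes, never guess

def byte_signature(rule: Callable[[bytes], bytes], probes: list[str]) -> tuple[SignatureItem, ...]:
    return tuple(to_item(rule(p.encode("utf-8"))) for p in probes)

def reverse_bytes(b: bytes) -> bytes:
    return b[::-1]  # byte-level "r"

print(byte_signature(reverse_bytes, ["abc", "hasło"]))
# ('cba', InvalidUTF8Bytes(raw=b'o\x82\xc5sah'))

# Lossy decoding would merge different byte strings into the same text:
print(b"\xff".decode("utf-8", "replace") == b"\xfe".decode("utf-8", "replace"))  # True
```

The last line is the reason not to decode silently: errors="replace" maps distinct invalid outputs to the same U+FFFD string, which would make genuinely different rules share a signature.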
Step 4: SQLite deduplication
Signatures are stored in a temporary SQLite database via INSERT OR IGNORE. The rule appearing first in the file wins. The temp DB is deleted unconditionally on completion (including on error or Ctrl+C).
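First-wins deduplication with INSERT OR IGNORE can be sketched using only the standard library. The table name, the TEXT primary key and the json.dumps serialization are assumptions made for this example; the source only states that signatures go into a temporary SQLite database via INSERT OR IGNORE and that the file is always removed. dedupe is a hypothetical name, and the signatures are assumed to be tuples of strings.

```python
import json
import os
import sqlite3
import tempfile
from typing import Iterable

def dedupe(rule_sigs: Iterable[tuple[str, tuple[str, ...]]]) -> list[str]:
    """Keep the first rule for each signature, using an on-disk SQLite table."""
    fd, db_path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)  # SQLite treats the empty file as a new database
    kept: list[str] = []
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE sigs (sig TEXT PRIMARY KEY)")
        for rule, sig in rule_sigs:
            cur = conn.execute(
                "INSERT OR IGNORE INTO sigs (sig) VALUES (?)", (json.dumps(sig),)
            )
            if cur.rowcount == 1:  # 0 means this signature was already stored
                kept.append(rule)
        conn.commit()
    finally:
        conn.close()
        os.remove(db_path)  # runs on success, on exceptions and on Ctrl+C alike
    return kept

pairs = [
    ("c", ("Ab", "Password")),
    ("l c", ("Ab", "Password")),
    ("u", ("AB", "PASSWORD")),
]
print(dedupe(pairs))  # ['c', 'u']
```

Storing the full serialized tuple keeps the example readable; hashing it first (for example with hashlib.sha256) would give fixed-size keys at a negligible collision risk. Which of the two the tool does is not stated here.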
Get the Source

Minimizer is open source and hosted on GitHub. MIT licensed.

View minimizer on GitHub: https://github.com/A113L/minimizer