Hashcat Rule Minimizer
minimizer
Eliminate functionally redundant Hashcat rules using signature-based deduplication. SQLite-backed storage, parallel workers, multibyte mode, and a built-in 33-word probe set — no wordlist required.
Pure Python
CPU-only · No GPU
Signature-based dedup
SQLite storage
Preserves file order
Parallel workers
Multibyte / Unicode
What does it do?
Two rules are functionally equivalent if they produce identical output on every possible input word. Testing every input is impossible, so this tool approximates the check with a fixed probe set of 33 words. The set covers short and long words, mixed case, digits, special characters, and repeated characters. An optional 11-word multibyte set is available for non-ASCII wordlists.
Rules that produce the same transformed strings for all probes are treated as equivalent. This is a heuristic: two rules that differ only on inputs outside the probe set are also merged. When rules match, the one appearing earlier in the file is kept and the later one is discarded, which preserves the original frequency/priority ranking. Signatures are stored in a temporary SQLite database, so the signature map stays out of Python heap memory however large the ruleset is.
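Example: rule c (capitalize the first letter, lowercase the rest) and chain l c produce identical output on every input, because c already lowercases everything after the first character. The two rules share a signature, so only the first occurrence survives.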
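A minimal sketch of signature-based deduplication follows. It assumes a toy interpreter that understands only a few opcodes (:, l, u, c, r, $X) written as space-separated tokens, and it runs on ASCII probe words. It uses an in-memory set in place of the SQLite store. The probe list and the function names are illustrative, not the tool's built-in set or API.

```python
# Toy sketch: keep-first dedup by output signature.
# Assumptions (hypothetical): space-separated ops, ASCII probes, only :, l, u, c, r, $X.

def apply_op(word: str, op: str) -> str:
    if op == ":":                      # no-op
        return word
    if op == "l":                      # lowercase everything
        return word.lower()
    if op == "u":                      # uppercase everything
        return word.upper()
    if op == "c":                      # capitalize first letter, lowercase the rest
        return word[:1].upper() + word[1:].lower()
    if op == "r":                      # reverse
        return word[::-1]
    if len(op) == 2 and op[0] == "$":  # append one character
        return word + op[1]
    raise ValueError(f"op not supported in this sketch: {op!r}")


def signature(rule: str, probes: list[str]) -> tuple[str, ...]:
    """Outputs of `rule` on every probe, in probe order."""
    outputs = []
    for word in probes:
        for op in rule.split():
            word = apply_op(word, op)
        outputs.append(word)
    return tuple(outputs)


def minimize(rules: list[str], probes: list[str]) -> list[str]:
    """Keep the first rule seen for each signature, preserving file order."""
    seen: set[tuple[str, ...]] = set()
    kept = []
    for rule in rules:
        sig = signature(rule, probes)
        if sig not in seen:
            seen.add(sig)
            kept.append(rule)
    return kept


PROBES = ["ab", "Password", "pass123", "p@ssw0rd"]  # illustrative subset
RULES = ["c", "l c", "u", "l u", "r", "$1", ": $1"]
print(minimize(RULES, PROBES))
```

Running the sketch keeps c, u, r and $1. It drops l c, l u and : $1, because each one matches an earlier rule on every probe.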
Key Features
Built-in 33-word probe set
Covers short (len 2–4), typical bases (7–9), long (10+), mixed-case, digits, specials, repeated chars, and leet-substitution targets. Works out of the box.
Signature-based dedup via SQLite
Computes an output tuple for each rule and stores the signatures in a temporary SQLite database. This keeps the signature map out of Python heap memory however large the ruleset is. The database is deleted automatically on completion.
Multibyte / Unicode mode
--multibyte: process words as UTF-8 bytes and activate 11 additional non-ASCII probes (Polish, German, French, Russian, CJK).
Parallel workers
--workers N distributes signature computation across CPU cores, and --workers 0 auto-detects the core count. Deduplication always runs in the main process.
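The next sketch splits the work into parallel signature computation and in-order deduplication, assuming the standard-library multiprocessing.Pool. The names rule_signature, OPS and PROBES are hypothetical stand-ins for the real per-rule work. Pool.imap returns results in input order, and that ordering lets the main process keep the first occurrence of each signature.

```python
import os
from multiprocessing import Pool

# Hypothetical stand-ins for the real per-rule work.
PROBES = ("ab", "Password", "pass123")
OPS = {
    ":": lambda w: w,
    "l": str.lower,
    "u": str.upper,
    "r": lambda w: w[::-1],
}


def rule_signature(rule: str) -> tuple[str, ...]:
    """Apply space-separated ops to every probe; runs inside a worker."""
    outputs = []
    for word in PROBES:
        for op in rule.split():
            word = OPS[op](word)
        outputs.append(word)
    return tuple(outputs)


def resolve_workers(requested: int) -> int:
    """0 means 'use all cores'; any other value is clamped to at least 1."""
    if requested == 0:
        return os.cpu_count() or 1
    return max(1, requested)


def compute_signatures(rules: list[str], workers: int) -> list[tuple[str, ...]]:
    n = resolve_workers(workers)
    if n == 1:
        return [rule_signature(r) for r in rules]
    with Pool(processes=n) as pool:
        # imap yields results in input order, so result i still matches rules[i].
        return list(pool.imap(rule_signature, rules, chunksize=256))


def dedup_in_main(rules: list[str], sigs: list[tuple[str, ...]]) -> list[str]:
    """Sequential keep-first pass; only the main process touches `seen`."""
    seen = set()
    kept = []
    for rule, sig in zip(rules, sigs):
        if sig not in seen:
            seen.add(sig)
            kept.append(rule)
    return kept
```

Call compute_signatures(rules, workers=0) from under an if __name__ == "__main__": guard. multiprocessing requires that guard on platforms that start workers with spawn.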
Progress & statistics
Two tqdm progress bars (computing + deduplicating) plus counters: rules loaded, kept, removed, and % reduction.
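The sketch below shows one plausible way to derive the counters. It assumes that % reduction is measured against the number of rules loaded. The ImportError fallback reflects tqdm being an optional dependency. The function names are hypothetical.

```python
try:
    from tqdm import tqdm
except ImportError:  # tqdm is optional: fall back to the bare iterable
    def tqdm(iterable, **kwargs):
        return iterable


def count_unique(signatures) -> int:
    """Deduplication pass wrapped in a progress bar; returns rules kept."""
    seen = set()
    for sig in tqdm(signatures, desc="Deduplicating", unit="rule"):
        seen.add(sig)
    return len(seen)


def summarize(loaded: int, kept: int) -> dict:
    """Counters as listed above; reduction is relative to rules loaded."""
    removed = loaded - kept
    reduction = 100.0 * removed / loaded if loaded else 0.0
    return {"loaded": loaded, "kept": kept, "removed": removed,
            "reduction_pct": round(reduction, 2)}
```

For example, 1,000 rules loaded with 640 kept gives 360 removed, a 36.0% reduction.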
Debug & rule trace
--debug logs every keep/drop decision. --debug-rule "RULE" traces a single rule against all probes and exits — no input file needed.
Built-in Probe Set (33 words + 11 multibyte)
These words are always used, so no wordlist is needed. They are chosen to exercise the major Hashcat rule categories. Use --multibyte to activate 11 additional non-ASCII words.
Category 1 — ASCII (33 words, always active):
ababcabcd
passroottestadminlogin
letmeinwelcomepasswordsunshine
footballbaseballprincessdragon12
qwertyuiopiloveyou12monkey12345superman123mustang2024
PasswordAdminUserMySecretHelloWorld
pass123admin2024test1234user9999
p@ssw0rds3cur1tyaaaabbbb
masterleeteliteaccess
Category 2 — Multibyte UTF-8 (11 words, active with --multibyte):
hasło żółw źródło
straße münchen übermensch
café naïve
пароль 密码 パスワード
Use --list-probes (or --list-probes --multibyte) to see the full list with byte counts and UTF-8 encodings.
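Multibyte mode matters because character length and byte length differ for the 11 multibyte probes. The sketch below compares the two. It only approximates what --list-probes reports, since the real output format is not shown here. It also assumes precomposed (NFC) characters.

```python
# The 11 multibyte probes listed above (assumed NFC / precomposed).
MULTIBYTE_PROBES = [
    "hasło", "żółw", "źródło",
    "straße", "münchen", "übermensch",
    "café", "naïve",
    "пароль", "密码", "パスワード",
]


def byte_report(words):
    """(word, character count, UTF-8 byte count, hex bytes) for each word."""
    rows = []
    for word in words:
        raw = word.encode("utf-8")
        rows.append((word, len(word), len(raw), raw.hex(" ")))
    return rows


for word, chars, nbytes, hexed in byte_report(MULTIBYTE_PROBES):
    print(f"{word}\t{chars} chars\t{nbytes} bytes\t{hexed}")
```

For example, hasło is 5 characters but 6 bytes, and パスワード is 5 characters but 15 bytes. A rule that works on byte positions can therefore split a character in half.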
Installation & Usage
# Clone the repository
git clone https://github.com/A113L/minimizer.git
cd minimizer
# Install optional dependency
pip install tqdm
# Basic minimization (built-in probe set only)
python minimizer.py my_rules.rule -o minimized.rule
# Use parallel workers (0 = auto-detect CPU count)
python minimizer.py my_rules.rule -o minimized.rule --workers 8
# Add your own probe words
python minimizer.py my_rules.rule --extra-probes password admin 123456
# Sample extra probes from a wordlist
python minimizer.py my_rules.rule --probe-file rockyou.txt --probe-words 100
# Enable UTF-8/multibyte mode (non-ASCII wordlists)
python minimizer.py my_rules.rule -o minimized.rule --multibyte
# Trace a single rule against the probe set (no input file needed)
python minimizer.py --debug-rule "l r \$1"
# Combine all sources
python minimizer.py rules.txt \
--extra-probes "letmein" "welcome" \
--probe-file /usr/share/wordlists/fasttrack.txt \
--probe-words 50 \
-o final.rule
How the Signature is Computed
Step 1
Probe set assembly
Built-in 33 words + --extra-probes + sampled words from --probe-file (deduplicated). With --multibyte, 11 additional UTF-8 words are added.
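A sketch of probe assembly follows. It assumes an order-preserving merge and uses reservoir sampling (Algorithm R) for --probe-file. The actual sampling method is not documented here, so the sampling strategy, the seed parameter, and the function names are all assumptions.

```python
import random
from pathlib import Path


def reservoir_sample(items, k, rng):
    """Algorithm R: uniform sample of k items in one pass, O(k) memory."""
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive bounds
            if j < k:
                sample[j] = item
    return sample


def assemble_probes(builtin, extra=(), probe_file=None, probe_words=0,
                    multibyte_words=(), seed=0):
    """Built-in + extra + sampled + multibyte, deduplicated in first-seen order."""
    candidates = list(builtin) + list(extra)
    if probe_file is not None and probe_words > 0:
        # errors="replace" is a simplification; non-UTF-8 lines get U+FFFD.
        with Path(probe_file).open(encoding="utf-8", errors="replace") as fh:
            words = (line.rstrip("\r\n") for line in fh)
            candidates += reservoir_sample((w for w in words if w), probe_words,
                                           random.Random(seed))
    candidates += list(multibyte_words)
    return list(dict.fromkeys(candidates))  # dict keeps insertion order
```

Because of the streaming sample, a large file such as rockyou.txt never has to be loaded into memory to draw --probe-words entries.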
Step 2
Rule application
A pure-Python interpreter applies each rule's operations to every probe word and supports all GPU-compatible Hashcat opcodes. All rules that contain an unsupported opcode are assigned to one shared equivalence class. As a result, only the first such rule in the file is kept and any later ones are removed.
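The sketch below shows how such an interpreter can be structured. It supports only a small opcode subset (:, l, u, c, C, t, r, d, $X, ^X) applied to ASCII probes. Any other opcode raises an exception, and every rule that raises receives the same UNSUPPORTED signature. That reproduces the behaviour described above: the first such rule is kept and the rest are dropped. UnsupportedOp, UNSUPPORTED and the other names are hypothetical.

```python
class UnsupportedOp(Exception):
    """Raised for any opcode outside this sketch's subset."""


UNSUPPORTED = object()  # one shared, collision-free signature for all such rules


def tokenize(rule: str) -> list[str]:
    """Split a rule into ops; $X and ^X take one argument (which may be a space)."""
    ops, i = [], 0
    while i < len(rule):
        ch = rule[i]
        if ch == " ":                 # separator between ops
            i += 1
        elif ch in "$^":
            ops.append(rule[i:i + 2])
            i += 2
        else:
            ops.append(ch)
            i += 1
    return ops


def apply_rule(rule: str, word: str) -> str:
    for op in tokenize(rule):
        if op == ":":                 # no-op
            pass
        elif op == "l":
            word = word.lower()
        elif op == "u":
            word = word.upper()
        elif op == "c":               # capitalize first, lowercase rest
            word = word[:1].upper() + word[1:].lower()
        elif op == "C":               # lowercase first, uppercase rest
            word = word[:1].lower() + word[1:].upper()
        elif op == "t":               # toggle case of every letter
            word = word.swapcase()
        elif op == "r":               # reverse
            word = word[::-1]
        elif op == "d":               # duplicate the word
            word = word + word
        elif len(op) == 2 and op[0] == "$":
            word = word + op[1]
        elif len(op) == 2 and op[0] == "^":
            word = op[1] + word
        else:
            raise UnsupportedOp(op)
    return word


def signature(rule: str, probes):
    try:
        return tuple(apply_rule(rule, w) for w in probes)
    except UnsupportedOp:
        return UNSUPPORTED


def minimize(rules, probes):
    seen, kept = set(), []
    for rule in rules:
        sig = signature(rule, probes)
        if sig not in seen:
            seen.add(sig)
            kept.append(rule)
    return kept
```

With probes ["ab", "Password", "pass123"], minimize(["c", "X123", "l c", "M", "sa@", "u"], probes) keeps c, X123 and u. X, M and s are real Hashcat opcodes that are simply outside this sketch's subset, so M and sa@ collapse into the same class as X123.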
Step 3
Signature tuple
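For each rule, the transformed outputs for all probes are collected into a tuple: ("Abc", "Def", ...). In multibyte mode, invalid UTF-8 byte sequences are wrapped in an InvalidUTF8Bytes object rather than decoded lossily. This prevents two rules with different raw byte outputs from collapsing into the same signature.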
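The wrapper matters for the reason shown in the following sketch. It assumes that rule outputs are handled as bytes and that InvalidUTF8Bytes is a simple hashable wrapper. The class name comes from the description above; its field and the helper functions are hypothetical. Reversing the bytes of hasło splits the two-byte ł, and a lossy decode would map different invalid byte strings to the same text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvalidUTF8Bytes:
    """Hashable wrapper for rule output that is not valid UTF-8."""
    raw: bytes


def to_signature_item(out: bytes):
    """Decode strictly; keep undecodable output as its exact bytes."""
    try:
        return out.decode("utf-8")
    except UnicodeDecodeError:
        return InvalidUTF8Bytes(out)


def signature(rule_fn, probes):
    """rule_fn maps bytes -> bytes; probes are str, encoded to UTF-8 first."""
    return tuple(to_signature_item(rule_fn(p.encode("utf-8"))) for p in probes)


def reverse_bytes(b: bytes) -> bytes:
    """Byte-level equivalent of the r rule."""
    return b[::-1]


def drop_last_byte(b: bytes) -> bytes:
    """Byte-level equivalent of the ] rule."""
    return b[:-1]


print(signature(reverse_bytes, ["hasło", "abc"]))
print(signature(drop_last_byte, ["café"]))
```

The raw outputs b"o\x82\xc5sah" and b"o\x83\xc5sah" both decode to "o\ufffd\ufffdsah" with errors="replace". Kept as InvalidUTF8Bytes, they remain distinct.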
Step 4
SQLite deduplication
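Rules are processed in file order, and each signature is written to a temporary SQLite database with INSERT OR IGNORE. The first rule to produce a given signature claims it, and later rules with the same signature are dropped. The temporary database is deleted unconditionally when the run ends, including on error or Ctrl+C.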
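A minimal sketch of the keep-first SQLite step follows. It assumes the signatures consist of plain strings serialized with json.dumps; an InvalidUTF8Bytes value would need its own encoding. The signature is the table's PRIMARY KEY. cursor.rowcount is 1 when INSERT OR IGNORE adds a row and 0 when the signature already exists. The finally block also removes the file on errors and KeyboardInterrupt. The function name is hypothetical.

```python
import json
import os
import sqlite3
import tempfile


def sqlite_dedup(rules_with_sigs):
    """Keep the first rule for each signature, using an on-disk SQLite table."""
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    conn = None
    try:
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE sigs (sig TEXT PRIMARY KEY)")
        kept = []
        for rule, sig in rules_with_sigs:
            key = json.dumps(sig)  # deterministic text form of a tuple of str
            cur = conn.execute("INSERT OR IGNORE INTO sigs (sig) VALUES (?)", (key,))
            if cur.rowcount == 1:  # row inserted: first time this signature appears
                kept.append(rule)
        conn.commit()
        return kept
    finally:
        if conn is not None:
            conn.close()
        os.remove(path)  # runs on success, exceptions, and Ctrl+C alike
```

Only the kept rule list is held in memory here. A fuller version could stream kept rules straight to the output file instead.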