Researchers are using machine learning algorithms to decrypt historical pencil-and-paper ciphers.
# Research: Decoding Historical Secrets with Neural Networks
## Metadata
- **Authors:** Various (Collaborators often include Beáta Megyesi, Kevin Knight, and the DECRYPT project team)
- **Institution:** Uppsala University and various international collaborators (The DECRYPT Project)
- **Publication:** Analysis via *BBC Future* as highlighted by *Schneier on Security*
- **Date:** June 3, 2026 (Analysis date)
## Abstract
This research highlights the application of Machine Learning (ML) and Natural Language Processing (NLP) to automate the decryption of complex, historical pencil-and-paper ciphers. By treating transcription and decryption as a computational linguistic problem, researchers are uncovering centuries of diplomatic, personal, and occult secrets that were previously inaccessible to traditional manual cryptanalysis.
## Research Objective
The study aims to overcome the "transcription bottleneck" and the linguistic complexity of historical homophonic substitution ciphers. The primary question is: can ML architectures designed for modern machine translation be adapted to identify patterns in non-standardized historical scripts and break complex substitution ciphers?
## Methodology
### Approach
1. **Optical Symbol Recognition (OSR):** Using neural networks to recognize and digitize non-standard symbols, often handwritten by various scribes.
2. **Language Modeling:** Using Large Language Models (LLMs) and n-gram analysis to model the underlying plaintext language (e.g., Latin, archaic French, or German) and score candidate decryptions, even when the cipher disguises its letter frequencies.
3. **Beam Search & Constraint Satisfaction:** Using search algorithms to explore millions of candidate symbol-to-letter mappings, keeping only the partial mappings that score best under the language model instead of testing every possible key.
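The beam-search step can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the DECRYPT implementation: candidate keys are scored with a character-bigram language model trained on a plaintext sample, the most frequent cipher symbols are fixed first, and only the `beam_width` best partial keys survive each step. The function names (`train_bigram_lm`, `partial_score`, `beam_decipher`), the add-alpha smoothing, and the `one_to_one` switch are hypothetical choices made for this example.

```python
import math
from collections import Counter


def train_bigram_lm(text, alphabet, alpha=0.5):
    """Add-alpha smoothed character-bigram log-probabilities log P(b | a)."""
    chars = [c for c in text if c in alphabet]
    pairs = Counter(zip(chars, chars[1:]))
    firsts = Counter(chars[:-1])
    k = len(alphabet)
    return {
        (a, b): math.log((pairs[(a, b)] + alpha) / (firsts[a] + alpha * k))
        for a in alphabet
        for b in alphabet
    }


def partial_score(cipher, mapping, lm):
    """LM log-probability of every adjacent cipher pair whose two symbols are already mapped."""
    return sum(
        lm[(mapping[x], mapping[y])]
        for x, y in zip(cipher, cipher[1:])
        if x in mapping and y in mapping
    )


def beam_decipher(cipher, alphabet, lm, beam_width=100, one_to_one=True):
    """Fix cipher symbols one at a time (most frequent first), keeping the best partial keys.

    one_to_one=True models a simple substitution; False lets several symbols
    share one plaintext letter, as in a homophonic cipher.
    """
    order = [sym for sym, _ in Counter(cipher).most_common()]
    if one_to_one and len(order) > len(alphabet):
        raise ValueError("more cipher symbols than plaintext letters")
    beam = [({}, 0.0)]
    for sym in order:
        candidates = []
        for mapping, _ in beam:
            used = set(mapping.values()) if one_to_one else set()
            for letter in alphabet:
                if letter not in used:
                    extended = {**mapping, sym: letter}
                    candidates.append((extended, partial_score(cipher, extended, lm)))
        # Prune: only the beam_width highest-scoring partial keys survive.
        candidates.sort(key=lambda cand: cand[1], reverse=True)
        beam = candidates[:beam_width]
    return beam[0]
```

Every hypothesis in the beam has mapped the same set of symbols, so their partial scores are directly comparable. With `one_to_one=False` the search permits homophones, but the candidate count per step then grows with the full alphabet, so a realistic search would need a much stronger language model and tighter pruning than this toy.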
### Dataset/Environment
- Thousands of encrypted manuscripts from European archives (15th–18th centuries).
- The *DECRYPT* database, a centralized repository of historical ciphertexts.
- Diverse formats including letters, diplomatic dispatches, and secret society manuscripts (e.g., the Copiale Cipher).
### Tools & Technologies
- Neural Machine Translation (NMT) frameworks.
- Cryptanalysis toolsets (such as *CrypTool*) and custom DECRYPT scripts.
- OCR/OSR engines trained on handwritten historical datasets.
## Key Findings
### Primary Results
1. **Automation of Transcription:** AI significantly reduces the time required to convert handwritten symbols into digital strings, a process that used to take human experts months.
2. **High-Order Homophonic Success:** ML models successfully identified patterns in "one-to-many" ciphers, where a single plaintext letter can be written with several different symbols to flatten the letter-frequency distribution that classical frequency analysis relies on.
3. **Language Identification:** The models can accurately predict the source language of a ciphertext without any prior metadata, based purely on the statistical distribution of character transitions.
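The source does not say how the language-identification model works. One substitution-invariant way to compare "character transitions" is sketched here purely as an assumption: sort the relative frequencies of adjacent symbol pairs (which discards the symbols' identities) and pick the reference language whose sorted profile is closest. The invariance holds only for one-to-one substitution, since homophones break it; the L1 distance is an arbitrary choice, and `transition_profile`, `profile_distance`, and `identify_language` are hypothetical names.

```python
from collections import Counter


def transition_profile(text):
    """Relative frequencies of adjacent-character pairs, sorted in descending order.

    Sorting discards which characters form each pair, so the profile is
    unchanged by any one-to-one symbol substitution.
    """
    pairs = Counter(zip(text, text[1:]))
    total = sum(pairs.values())
    return sorted((n / total for n in pairs.values()), reverse=True)


def profile_distance(p, q):
    """L1 distance between two sorted profiles, zero-padding the shorter one."""
    size = max(len(p), len(q))
    p = p + [0.0] * (size - len(p))
    q = q + [0.0] * (size - len(q))
    return sum(abs(a - b) for a, b in zip(p, q))


def identify_language(ciphertext, reference_texts):
    """Return the key of the reference text whose profile is closest to the ciphertext's."""
    target = transition_profile(ciphertext)
    return min(
        reference_texts,
        key=lambda lang: profile_distance(target, transition_profile(reference_texts[lang])),
    )
```

In practice the reference samples would need to be large plaintext corpora in each candidate language; on short texts the profiles are noisy and the comparison is unreliable.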
### Supporting Evidence
- Successful decryption of the **Copiale Cipher**, a 105-page manuscript of roughly 75,000 handwritten characters belonging to an 18th-century German secret society (the Oculists), achieved in 2011 through a combination of computer-assisted statistical analysis and human expertise.
### Novel Contributions
- Transferring **Unsupervised Machine Translation** techniques to cryptanalysis, allowing the AI to learn "translation" between ciphertext and plaintext without a parallel training set.
## Technical Details
Many historical ciphers are "homophonic substitutions." Unlike a simple Caesar cipher, which replaces each letter with exactly one symbol, these assign several symbols to the same letter (typically more to frequent letters) to mask the language's letter-frequency profile, such as the "E-T-A-O-I-N" pattern of English. The researchers use **Expectation-Maximization (EM) algorithms** to iteratively re-estimate the symbol-to-letter mapping, together with language models such as **Recurrent Neural Networks (RNNs)** that score how probable a candidate plaintext is. The AI essentially treats the ciphertext as a foreign language and the plaintext as the target language, seeking the mapping that maximizes linguistic "fluency."
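The EM idea can be sketched as a hidden Markov model, a framing used in earlier decipherment research: hidden states are plaintext letters, transitions come from a fixed character-bigram language model, and EM (via the forward-backward algorithm) learns the emission table P(cipher symbol | plaintext letter). Because that table is many-to-one by construction, several symbols can end up mapped to the same letter, as in a homophonic cipher. This is an illustration under assumptions, not the researchers' code: a bigram model stands in for an RNN, and `bigram_lm`, `forward_backward`, `em_decipher`, and `viterbi` are hypothetical names.

```python
import math
import random
from collections import Counter


def bigram_lm(text, alphabet, alpha=0.1):
    """Add-alpha smoothed start probabilities and transition matrix P(next | current)."""
    chars = [c for c in text if c in alphabet]
    unigrams = Counter(chars)
    pairs = Counter(zip(chars, chars[1:]))
    firsts = Counter(chars[:-1])
    k = len(alphabet)
    pi = [(unigrams[a] + alpha) / (len(chars) + alpha * k) for a in alphabet]
    A = [[(pairs[(a, b)] + alpha) / (firsts[a] + alpha * k) for b in alphabet]
         for a in alphabet]
    return pi, A


def forward_backward(obs, pi, A, B):
    """Scaled forward-backward: per-position letter posteriors and log P(obs)."""
    K, T = len(pi), len(obs)
    alphas, scales = [], []
    for t, o in enumerate(obs):
        if t == 0:
            row = [pi[j] * B[j][o] for j in range(K)]
        else:
            prev = alphas[-1]
            row = [sum(prev[i] * A[i][j] for i in range(K)) * B[j][o] for j in range(K)]
        scale = sum(row)
        alphas.append([x / scale for x in row])
        scales.append(scale)
    betas = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        o = obs[t + 1]
        betas[t] = [sum(A[i][j] * B[j][o] * betas[t + 1][j] for j in range(K)) / scales[t + 1]
                    for i in range(K)]
    gammas = []
    for a_row, b_row in zip(alphas, betas):
        g = [x * y for x, y in zip(a_row, b_row)]
        total = sum(g)
        gammas.append([x / total for x in g])
    return gammas, sum(math.log(s) for s in scales)


def em_decipher(cipher, alphabet, pi, A, iterations=30, seed=0):
    """EM over the emission table P(symbol | letter); the language model stays fixed."""
    symbols = sorted(set(cipher))
    obs = [symbols.index(s) for s in cipher]
    K, S = len(alphabet), len(symbols)
    rng = random.Random(seed)
    B = []
    for _ in range(K):  # random start so letters do not begin with identical emission rows
        row = [rng.random() + 0.1 for _ in range(S)]
        total = sum(row)
        B.append([x / total for x in row])
    history = []
    for _ in range(iterations):
        gammas, loglik = forward_backward(obs, pi, A, B)  # E-step
        history.append(loglik)
        counts = [[0.0] * S for _ in range(K)]
        for o, g in zip(obs, gammas):
            for i in range(K):
                counts[i][o] += g[i]
        B = []
        for row in counts:  # M-step: normalise expected counts for each letter
            total = sum(row)
            B.append([c / total for c in row])
    return symbols, obs, B, history


def _safe_log(x):
    return math.log(x) if x > 0 else float("-inf")


def viterbi(obs, pi, A, B):
    """Most probable sequence of plaintext-letter indices under the fitted model."""
    K = len(pi)
    delta = [_safe_log(pi[j]) + _safe_log(B[j][obs[0]]) for j in range(K)]
    backptrs = []
    for o in obs[1:]:
        ptr = [max(range(K), key=lambda i: delta[i] + _safe_log(A[i][j])) for j in range(K)]
        delta = [delta[ptr[j]] + _safe_log(A[ptr[j]][j]) + _safe_log(B[j][o]) for j in range(K)]
        backptrs.append(ptr)
    state = max(range(K), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

A typical call sequence is `pi, A = bigram_lm(corpus, alphabet)`, then `symbols, obs, B, history = em_decipher(cipher, alphabet, pi, A)`, then `viterbi(obs, pi, A, B)`, whose indices map back through `alphabet`. EM only guarantees a non-decreasing likelihood (visible in `history`), not the correct key, so a practical tool would likely combine many random restarts with a stronger language model; this pure-Python version also costs roughly T·K² operations per pass.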
## Practical Implications
### For Security Practitioners
- **Legacy Vulnerability:** This demonstrates that "security through obscurity" or artisanal encryption methods eventually succumb to advances in compute power and algorithmic efficiency.
- **Data Longevity:** Encryption strength is not permanent; data captured and stored today may be decrypted later as techniques and computing power advance (the "Store Now, Decrypt Later" (SNDL) approach).
### For Defenders
- **Entropy Matters:** Hand-drawn symbols and invented scripts add no real cryptographic strength, because statistical models operate on symbol frequencies and sequences, not on how the symbols look.
- **Modern Standards:** Only computationally hard, standardized algorithms (AES, RSA, etc.) should be trusted for sensitive data.
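The "Entropy Matters" point can be made concrete with a worked key count. In this sketch (function names hypothetical), a one-to-one substitution over 26 letters has 26! keys, about 88.4 bits, while a 100-symbol homophonic key has at most 26^100 symbol-to-letter assignments, about 470 bits, well above the 128-bit key of AES-128. Such ciphers nevertheless fall to statistical attacks, because those attacks exploit the redundancy of the underlying language rather than enumerating keys.

```python
import math


def simple_substitution_bits(n_letters=26):
    """log2 of the number of one-to-one substitution keys, i.e. log2(n!)."""
    return math.log2(math.factorial(n_letters))


def homophonic_bits(n_symbols, n_letters=26):
    """Upper bound: log2 of the ways to assign each cipher symbol to some letter."""
    return n_symbols * math.log2(n_letters)


print(f"simple substitution: {simple_substitution_bits():.1f} bits")  # 88.4
print(f"100-symbol homophonic: {homophonic_bits(100):.1f} bits")      # 470.0
```

The homophonic figure is an upper bound because it also counts assignments that leave some letters without any symbol.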
### For Researchers
- **Cross-Disciplinary Potential:** Tools built for historical analysis may carry over to modern steganalysis and to detecting hidden communication patterns in digital traffic.
## Limitations
- **Transcription Errors:** If the initial AI-driven OSR step misreads a symbol, the error cascades through the decryption phase.
- **Nomenclators:** Historical texts often use "nomenclators" (code tables that assign dedicated symbols or code groups to names, places, and common words), which cannot be solved by frequency analysis alone and require human context.
- **Compute Intensity:** Training models for every specific style of historical handwriting remains resource-heavy.
## Comparison to Prior Work
Traditional historical cryptanalysis relied on manual card-sorting and human intuition (e.g., the early modern European Black Chambers or, in the 20th century, Herbert Yardley's American Black Chamber). This research shifts the paradigm from human **pattern recognition** to machine **statistical inference**, allowing for the "brute-forcing" of linguistic rules.
## Real-world Applications
- **Historical Research:** Uncovering diplomatic secrets that could revise accepted historical narratives.
- **Intelligence Analysis:** Identifying and categorizing "dead" or extremely niche ciphers used by non-state actors who may attempt to use archaic methods to bypass digital filters.
## Future Work
- **Multimodal Models:** Integrating symbol recognition and decryption into a single "end-to-end" neural pipeline.
- **Deciphering "Lost" Languages:** Applying these algorithms to undeciphered scripts and manuscripts (e.g., the ancient Linear A script or the 15th-century Voynich Manuscript).
## References
- The DECRYPT Project: hxxps://cl[.]lingfil[.]uu[.]se/decrypt/
- Megyesi, B., et al. (2020). "The DECRYPT Database: An Inventory of Cryptic Documents."
- Schneier, B. (2026). "AI Used to Decrypt Medieval Ciphers." *Schneier on Security.*