
This repository contains a Python implementation of HMM and MCMC methods for text decryption. These methods are applied to decrypt messages that have been encoded using various ciphers.


alessimichele/HMM-for-text-decryption


HMM-for-text-decryption

This repository contains a Python implementation of MCMC and HMM methods for text decryption. The methods are applied to decrypt messages that have been encoded using a substitution cipher, a homophonic cipher, and a double cipher. This code was written in collaboration with @SDavenia.

Here is the outline of our project.

Repository description

  • main_HMM.ipynb runs the HMM model on the substitution cipher.

  • main_HMM_homophonic.ipynb runs the HMM model on the homophonic cipher.

  • main_HMM_double_cipher.ipynb runs the HMM model on the double cipher.

  • main_MCMC.ipynb runs the MCMC model on the substitution cipher.

  • main_MCMC_double_cipher.ipynb runs the MCMC model on the double cipher.

  • src contains the implementation of the algorithms, plus the other functions needed for preprocessing and evaluation, all written from scratch.

    • CipherUtils.py

      • Cipher Generator
        The Cipher Generator is a Python class that allows you to generate a random cipher. A cipher is essentially a shuffled version of the alphabet. This class provides a method called generate_cipher() that returns a list representing the generated cipher.

      • Text Encoder
        The Text Encoder is a Python class that provides functionality to encode text using a given cipher. It has a method called encode_text(text, cipher) that takes an input text and a cipher as parameters and returns the encoded text as a string.

      • Text Decoder
        The Text Decoder is a Python class that enables the decoding of encoded text using a provided cipher. It contains a method called decode_text(text, cipher) that takes an encoded text and a cipher as input and returns the decoded text as a string.

      • Text Preprocessor
        The Text Preprocessor is a Python class that performs preprocessing operations on text. It provides methods for converting text to lowercase, finding unknown characters in the text, removing unknown characters from the text, removing extra spaces, and saving the preprocessed text to a file.

    • ProbabilityMatrix.py
      Probability Matrix is a class used to compute the probability table and the probability matrix for all bigrams within a given text. It is used by both the MCMC and the HMM approaches.

    • MCMC

      • CipherBreaker.py
        The Cipher Breaker is a Python class that tries to break a given cipher by repeatedly swapping elements of the current cipher, using MCMC exploration. It uses a probability table, a decoder, and a likelihood calculator to evaluate the quality of each proposed cipher during the breaking process. The class can also generate an animation of the breaking process.
    • HMM

      • HMM_functions.py
        Contains implementations of the Baum-Welch and Viterbi algorithms.
      • HMM_utils.py
        This module provides functions to map characters in the alphabet to corresponding numbers, convert strings to lists of numbers based on a given mapping, and create mappings between indices and characters.
    • evaluation.py
      This module provides functions for comparing the performance of the MCMC and HMM approaches.

  • texts contains the corpus used to learn the transition probabilities.

  • outputs contains:

    • accuracies produced by the evaluation module.
    • the probability matrix of bigrams computed from the corpus.
  • GIF contains .gif outputs from main.ipynb.

  • articles contains some interesting articles about the topic.
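
To make the building blocks described above more concrete, here is a minimal sketch of a substitution cipher (generate, encode, decode) and of a bigram log-probability table. It is not the code in src: the function names mirror the method names mentioned above, but the 27-symbol alphabet (a-z plus space), the add-one smoothing, and the helper `bigram_log_probs` are assumptions made for illustration.

```python
import math
import random
import string

# Assumed alphabet: lowercase letters plus space (the repository may differ).
ALPHABET = list(string.ascii_lowercase + " ")


def generate_cipher(rng=random):
    """Return a shuffled copy of the alphabet, i.e. a random substitution cipher."""
    cipher = ALPHABET.copy()
    rng.shuffle(cipher)
    return cipher


def encode_text(text, cipher):
    """Replace each plaintext character by the cipher symbol at the same index."""
    table = dict(zip(ALPHABET, cipher))
    return "".join(table[ch] for ch in text)


def decode_text(text, cipher):
    """Invert encode_text: map each cipher symbol back to its plaintext character."""
    table = dict(zip(cipher, ALPHABET))
    return "".join(table[ch] for ch in text)


def bigram_log_probs(corpus, smoothing=1.0):
    """Hypothetical helper: log P(next | current) for every pair of symbols.

    The corpus must already be preprocessed to contain only ALPHABET symbols.
    Smoothing keeps unseen bigrams from getting probability zero.
    """
    counts = {a: {b: smoothing for b in ALPHABET} for a in ALPHABET}
    for a, b in zip(corpus, corpus[1:]):
        counts[a][b] += 1
    log_probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        log_probs[a] = {b: math.log(c / total) for b, c in row.items()}
    return log_probs


cipher = generate_cipher(random.Random(0))
message = "she is not acting by design"
encoded = encode_text(message, cipher)
decoded = decode_text(encoded, cipher)   # round-trips back to `message`
table = bigram_log_probs("the theory of the thing ")
```

Working with log-probabilities rather than raw probabilities keeps the likelihood of a long text numerically stable, since it becomes a sum instead of a product of many small numbers.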

Example

main_MCMC.ipynb contains running examples of the MCMC approach on the substitution cipher. Some results are reported below. Play with it!
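
For readers who want the idea before opening the notebook, the swap-based MCMC search can be sketched as a Metropolis sampler over cipher keys. This is an illustrative sketch, not the CipherBreaker class itself: the function names, the 27-symbol alphabet, the smoothing, and the iteration count are all assumptions.

```python
import math
import random
import string

ALPHABET = string.ascii_lowercase + " "  # assumed alphabet: a-z plus space


def bigram_log_probs(corpus, smoothing=1.0):
    """Smoothed log P(b | a) for every bigram 'ab', estimated from the corpus."""
    counts = {a + b: smoothing for a in ALPHABET for b in ALPHABET}
    for a, b in zip(corpus, corpus[1:]):
        counts[a + b] += 1
    totals = {a: sum(counts[a + b] for b in ALPHABET) for a in ALPHABET}
    return {pair: math.log(c / totals[pair[0]]) for pair, c in counts.items()}


def log_likelihood(text, log_probs):
    """Sum of bigram log-probabilities over the text."""
    return sum(log_probs[a + b] for a, b in zip(text, text[1:]))


def decode(text, key):
    """key[i] is the ciphertext symbol that stands for ALPHABET[i]."""
    table = dict(zip(key, ALPHABET))
    return "".join(table[ch] for ch in text)


def break_cipher(encoded, log_probs, iterations=2000, seed=0):
    """Metropolis search: propose a swap, accept it if it helps (or by chance)."""
    rng = random.Random(seed)
    key = list(ALPHABET)
    rng.shuffle(key)
    current = log_likelihood(decode(encoded, key), log_probs)
    best_key, best = key[:], current
    for _ in range(iterations):
        i, j = rng.sample(range(len(key)), 2)
        key[i], key[j] = key[j], key[i]          # propose: swap two symbols
        proposed = log_likelihood(decode(encoded, key), log_probs)
        # Accept with probability min(1, exp(proposed - current)).
        if proposed >= current or rng.random() < math.exp(proposed - current):
            current = proposed
            if current > best:
                best_key, best = key[:], current
        else:
            key[i], key[j] = key[j], key[i]      # reject: undo the swap
    return best_key, best


# Toy run: a tiny corpus and message, far too small for reliable decryption.
corpus = "she is not acting by design as yet she cannot even be certain of the degree "
log_probs = bigram_log_probs(corpus)
secret = list(ALPHABET)
random.Random(1).shuffle(secret)
encoded = "".join(dict(zip(ALPHABET, secret))[ch] for ch in "she cannot be certain")
best_key, best = break_cipher(encoded, log_probs, iterations=500)
```

Occasionally accepting a worse key is what lets the search escape local optima. With a realistic corpus and a longer message, many more iterations are typically needed than in this toy run.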

Original Text: she is not acting by design as yet she cannot even be certain of the degree of her own regard nor of its reasonableness she has known him only a fortnight she danced four dances with him at meryton she saw him one morning at his own house and has since dined in company with him four times

Original Text: your plan is a good one replied elizabeth where nothing is in question but the desire of being well married and if i were determined to get a rich husband or any husband i dare say i should adopt it but these are not jane s feelings

Original Text: there were better sense in the sad mechanic exercise of determining the reason of its absence where it is not in the novels of the last hundred years there are vast numbers of young ladies with whom it might be a pleasure to fall in love there are at least five with whom as it seems to me no man of taste and spirit can help doing so
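
The HMM notebooks approach the same kind of text differently. One common setup, assumed here rather than taken from the repository, treats plaintext characters as hidden states, ciphertext symbols as observations, the bigram matrix as the transition model, and an emission table (for example one estimated with Baum-Welch) as the unknown cipher. Given those tables, Viterbi decoding recovers the most likely plaintext. The sketch below is a generic log-space Viterbi on a toy two-state model; the names and the toy probabilities are illustrative and are not the contents of HMM_functions.py.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most likely hidden state sequence for a discrete HMM, in log space."""
    # scores[s]: best log-probability of any path that ends in state s.
    scores = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        previous, scores, pointers = scores, {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: previous[p] + log_trans[p][s])
            scores[s] = previous[best_prev] + log_trans[best_prev][s] + log_emit[s][obs]
            pointers[s] = best_prev
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: scores[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[last]


def logs(table):
    """Convert a nested dict of probabilities into log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Toy model: hidden states "a"/"b" tend to alternate and emit "x"/"y" respectively.
states = ["a", "b"]
log_start = {s: math.log(0.5) for s in states}
log_trans = logs({"a": {"a": 0.1, "b": 0.9}, "b": {"a": 0.9, "b": 0.1}})
log_emit = logs({"a": {"x": 0.8, "y": 0.2}, "b": {"x": 0.2, "y": 0.8}})
path, score = viterbi("xyxy", states, log_start, log_trans, log_emit)
# path is ["a", "b", "a", "b"]
```

In the decryption setting the state list would be the full alphabet, which makes each step quadratic in the alphabet size but still cheap for 27 symbols.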
