Atul Lal

Breaking Boundaries: A Scalable Distributed MD5 Hash Matching System

password-cracker

A full stack distributed password cracker that can crack passwords using dictionary-based attacks.

Language: Python
Topics: distributed-systems, md5-hash, python3, react, socket-programming
Check this project out on GitHub

This article delves into the design and implementation of a scalable distributed system for MD5 hash matching. (The MD5 message-digest algorithm is a widely used hash function producing a 128-bit hash value. Though heavily compromised, it's great for academic distributed computing!) With a user-friendly web interface and a robust management service, the system cracks passwords using a brute-force approach.

*Why MD5?* Sure, MD5 is cryptographically broken for modern use-cases, but it's the perfect algorithmic sandbox for testing distributed compute performance! 🕵️‍♂️
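For readers who have not worked with MD5 directly, here is a minimal sketch of computing a digest with Python's standard `hashlib` module. The helper name `md5_hex` is made up for illustration and is not taken from the project.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of `text` as 32 lowercase hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


digest = md5_hex("hello")
print(digest)           # 5d41402abc4b2a76b9719d911017c592
print(len(digest) * 4)  # 128 (each hex character encodes 4 bits)
```

A string like this digest is what the user would paste into the web interface.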

The project demonstrates the power of distributed systems, REST APIs, and real-world web applications, while also touching upon important concepts like SSH tunneling and client-server communication.

Introduction

In recent years, distributed systems have gained massive prominence in a variety of applications, offering improved performance, scalability, and fault tolerance. One such classic application involves matching MD5 hashes for password-cracking purposes.

Why crack passwords on one machine when you can use ten?

This article explores the design and implementation of a distributed system that cracks 5-character alphabetical passwords using a brute-force approach, leveraging multiple worker nodes in a cluster.

Problem Statement and Learning Outcomes

The primary goal of the project is to create a scalable distributed system that cracks MD5 hashes for 5-character passwords. The system must be capable of distributing the search space (the brute-force workload) evenly among multiple worker nodes and responding to the client with the appropriate outcome as quickly as possible.
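One way to picture "evenly" is to number every candidate password and hand each worker a contiguous index range. The sketch below is only an illustration of that idea: the function name `split_search_space` is hypothetical, and the project itself may chunk the work differently.

```python
from typing import List, Tuple


def split_search_space(total: int, workers: int) -> List[Tuple[int, int]]:
    """Split range(total) into `workers` contiguous half-open ranges.

    Range sizes differ by at most one, so no worker gets noticeably more work.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    base, extra = divmod(total, workers)
    ranges = []
    start = 0
    for i in range(workers):
        # The first `extra` workers each take one additional candidate.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# 5 letters drawn from a-z and A-Z, shared among 10 workers.
chunks = split_search_space(52 ** 5, 10)
print(chunks[0])   # (0, 38020404)
print(chunks[-1])  # the last range ends at 52 ** 5
```

A static split like this is simple, but it assumes all workers run at the same speed. Handing out smaller blocks on demand is a common alternative when they do not.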

The Search Space: A 5-character alphabetical password (a-z, A-Z) has $52^5$ combinations. That's 380,204,032 possible hashes to check!
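To make the search space concrete, each candidate can be treated as a 5-digit number in base 52. The following sketch maps an index to its candidate string and scans an index range for a target hash. The names `index_to_candidate` and `crack_range`, and the a-z-then-A-Z ordering, are assumptions for illustration rather than the project's actual code.

```python
import hashlib
import string
from typing import Optional

ALPHABET = string.ascii_lowercase + string.ascii_uppercase  # 52 symbols: a-z, then A-Z
LENGTH = 5
TOTAL = len(ALPHABET) ** LENGTH  # 52 ** 5 == 380204032


def index_to_candidate(index: int) -> str:
    """Decode an integer in [0, TOTAL) as a 5-character base-52 string."""
    if not 0 <= index < TOTAL:
        raise ValueError("index outside the search space")
    chars = []
    for _ in range(LENGTH):
        index, digit = divmod(index, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))  # most significant character first


def crack_range(target_hash: str, start: int, end: int) -> Optional[str]:
    """Hash every candidate with index in [start, end); return the match or None."""
    target = target_hash.lower()
    for index in range(start, end):
        candidate = index_to_candidate(index)
        if hashlib.md5(candidate.encode("ascii")).hexdigest() == target:
            return candidate
    return None


if __name__ == "__main__":
    demo_hash = hashlib.md5(b"aaabc").hexdigest()
    print(TOTAL)                            # 380204032
    print(crack_range(demo_hash, 0, 1000))  # aaabc
```

Because every candidate has a unique index, a worker only needs a `(start, end)` pair to know exactly which passwords it is responsible for.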

In the process, the project aims to provide insights into:

  • The implementation of a distributed system architecture.
  • The significance of REST APIs for job orchestration.
  • Deployment of single-page applications (React).
  • SSH tunneling for secure communication.
  • Maintaining robust client-server connections.

Design and Setup

The system's architecture follows a client-server model, using resources from the GENI network (Global Environment for Network Innovations, a virtual laboratory for exploring future internets at scale) for the server and client nodes.

A frontend developed in React is deployed on the web to enable user input. The user inputs the MD5 hash, and the frontend communicates with the password cracker management server (written in Python/Flask) running on a GENI node. The server then chunks the $52^5$ combinations and assigns blocks of work to the active worker nodes.
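The article does not spell out how the management server tracks which block each worker holds, so the following is only a plain-Python sketch of one possible bookkeeping scheme. The class `JobTracker`, its methods, and the block size are all hypothetical. In the real system this logic would sit behind the Flask REST endpoints, which are omitted here.

```python
from typing import Dict, Iterator, Optional, Tuple

Block = Tuple[int, int]  # half-open index range [start, end)


def iter_blocks(total: int, block_size: int) -> Iterator[Block]:
    """Yield consecutive blocks that together cover range(total)."""
    for start in range(0, total, block_size):
        yield (start, min(start + block_size, total))


class JobTracker:
    """Hands out blocks to workers until a password is reported or blocks run out."""

    def __init__(self, total: int, block_size: int) -> None:
        self._blocks = iter_blocks(total, block_size)
        self.assigned: Dict[str, Block] = {}
        self.result: Optional[str] = None

    def next_block(self, worker_id: str) -> Optional[Block]:
        """Return the next unclaimed block, or None when there is nothing left to do."""
        if self.result is not None:
            return None
        block = next(self._blocks, None)
        if block is not None:
            self.assigned[worker_id] = block
        return block

    def report(self, worker_id: str, password: Optional[str]) -> None:
        """Record that a worker finished its block, with the password if it found one."""
        self.assigned.pop(worker_id, None)
        if password is not None:
            self.result = password


tracker = JobTracker(total=52 ** 5, block_size=1_000_000)
print(tracker.next_block("worker-1"))  # (0, 1000000)
print(tracker.next_block("worker-2"))  # (1000000, 2000000)
```

Keeping the `assigned` map makes it possible to notice which block was in flight if a worker disappears, although re-queuing that block is left out of this sketch.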

Execution and Results

To reproduce the experiment, a slice on GENI with 1 management server and 10 client worker nodes is created using a provided RSPEC file.

The environment uses Python 3.7 and Flask 2.2.2. We used Ngrok, a cross-platform application that exposes local network services behind NATs and firewalls to the public internet, to expose the GENI server securely to our React frontend. The server and clients are configured using a series of automated SSH commands.
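As a rough idea of what automating the SSH step could look like, the sketch below builds an `ssh` command line for each worker node without running it. The usernames, hostnames, ports, key path, and the remote command (including the script name `worker.py`) are placeholders invented for this example. The real values come from the GENI slice and the project repository.

```python
import shlex
from typing import List


def build_ssh_command(user: str, host: str, port: int,
                      key_path: str, remote_command: str) -> List[str]:
    """Build an argv list for running `remote_command` on a node over SSH."""
    return [
        "ssh",
        "-i", key_path,    # private key used to authenticate
        "-p", str(port),   # SSH port of the node
        f"{user}@{host}",
        remote_command,
    ]


# Hypothetical worker nodes; replace with the hosts and ports of your own slice.
workers = [("alice", "node1.example.net", 22), ("alice", "node2.example.net", 22)]

for user, host, port in workers:
    cmd = build_ssh_command(user, host, port, "~/.ssh/id_geni", "python3 worker.py")
    print(" ".join(shlex.quote(part) for part in cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids local shell quoting problems when it is later passed to `subprocess.run`.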

Metrics and Analysis

The experiment was run exactly 50 times with varying bandwidth values and numbers of worker clients. We analyzed two key metrics:

  1. The total time required to break a single given hash.
  2. The total time required to process a batch file of hashes.

*Results are in!* As expected, the time required to crack a password decreases almost linearly as more clients are added to the cluster.

Furthermore, we observed that as the network bandwidth value between the GENI nodes increases, the overhead of distributing the search space chunks decreases, leading to an even tighter linear scaling factor. Horizontal scaling works!
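To check how close a set of timings comes to ideal scaling, speedup and parallel efficiency can be computed from the measured times. The sketch below assumes a dictionary of timings keyed by worker count. The function name `scaling_table` is hypothetical, and the sample numbers are made-up placeholders, not measurements from this experiment.

```python
from typing import Dict


def scaling_table(times_by_workers: Dict[int, float]) -> Dict[int, Dict[str, float]]:
    """Compute speedup and efficiency relative to the single-worker time.

    speedup(n)    = T(1) / T(n)
    efficiency(n) = speedup(n) / n   (1.0 means perfectly linear scaling)
    """
    baseline = times_by_workers[1]
    table = {}
    for n, elapsed in sorted(times_by_workers.items()):
        speedup = baseline / elapsed
        table[n] = {"speedup": speedup, "efficiency": speedup / n}
    return table


# Placeholder timings in seconds, for illustration only.
sample = {1: 120.0, 2: 60.0, 4: 40.0}
for n, row in scaling_table(sample).items():
    print(n, row["speedup"], row["efficiency"])
```

An efficiency that stays near 1.0 as workers are added corresponds to the near-linear scaling described above, while a drop points to distribution or network overhead.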

Conclusion

This project demonstrates the immense power and flexibility of distributed systems in efficiently cracking MD5 hashes. Through a user-friendly web interface and an effective management service, the system enables the dynamic distribution of workload among multiple worker nodes, resulting in highly improved performance and scalability.

The learnings from this project, specifically around job chunking, worker fault tolerance, and API orchestration, can be applied to any modern distributed system application.

Check out more about the project through this detailed PDF report, a deep dive covering the architecture, mathematical scaling, and network topologies tested during the project.

References & Citations

  1. Rivest, R. (1992). "The MD5 Message-Digest Algorithm". RFC 1321.
  2. Berman, M., et al. (2014). "GENI: A federated testbed for innovative network experiments". Computer Networks, 61, 5-23.

About Atul Lal

I am a software engineer with a passion for creating innovative and impactful applications that solve real-world problems. At Commvault Systems, I optimized APIs, developed distributed systems, and automated cloud environments for over two years.