Breaking Boundaries: A Scalable Distributed MD5 Hash Matching System

password-cracker
A full stack distributed password cracker that can crack passwords using dictionary-based attacks.
This article delves into the design and implementation of a scalable distributed system for MD5 hash matching. The MD5 message-digest algorithm is a widely used hash function that produces a 128-bit hash value; though heavily compromised, it's great for academic distributed computing! With a user-friendly web interface and a robust management service, the system efficiently cracks passwords using a brute-force approach.
The project demonstrates the power of distributed systems, REST APIs, and real-world web applications, while also touching upon important concepts like SSH tunneling and client-server communication.
Introduction
In recent years, distributed systems have gained massive prominence in a variety of applications, offering improved performance, scalability, and fault tolerance. One such classic application involves matching MD5 hashes for password-cracking purposes.
Why crack passwords on one machine when you can use ten?
This article explores the design and implementation of a distributed system that can efficiently crack 5-character alphabetical passwords using a brute force approach, leveraging the power of multiple worker nodes in a cluster.
Problem Statement and Learning Outcomes
The primary goal of the project is to create a scalable distributed system that cracks MD5 hashes for 5-character passwords. The system must be capable of distributing the search space (the brute-force workload) evenly among multiple worker nodes and responding to the client with the appropriate outcome as quickly as possible.
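To make the idea of distributing the search space concrete, here is a minimal sketch of one way to split it evenly. It assumes "alphabetical" means lowercase a-z only, which gives 26^5 = 11,881,376 candidates; the function name `split_search_space` and the use of contiguous index ranges are illustrative choices, not necessarily what the project does.

```python
import string

# Assumption: the alphabet is lowercase a-z only.
ALPHABET = string.ascii_lowercase
PASSWORD_LENGTH = 5
SEARCH_SPACE = len(ALPHABET) ** PASSWORD_LENGTH  # 26**5 = 11,881,376


def split_search_space(total, num_workers):
    """Split [0, total) into num_workers contiguous half-open ranges.

    Range sizes differ by at most one, so the load is as even as possible.
    (Hypothetical helper, not taken from the project.)
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    base, extra = divmod(total, num_workers)
    ranges = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers each take one additional candidate.
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


chunks = split_search_space(SEARCH_SPACE, 10)
for worker_id, (start, end) in enumerate(chunks):
    print(f"worker {worker_id}: indices [{start}, {end})")
```

Handing out integer ranges rather than lists of strings keeps each work assignment tiny: a worker only needs two numbers to know which candidates to try.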
In the process, the project aims to provide insights into:
- The implementation of a distributed system architecture.
- The significance of REST APIs for job orchestration.
- Deployment of single-page applications (React).
- SSH tunneling for secure communication.
- Maintaining robust client-server connections.
Design and Setup
The system's architecture consists of a server-client interaction model, utilizing resources from the GENI network (Global Environment for Network Innovations, a virtual laboratory for exploring future internets at scale) for server and client nodes.
A frontend developed in React is deployed on the web to enable user input. The user inputs the MD5 hash, and the frontend communicates with the password cracker management server (written in Python/Flask) running on a GENI node. The server then chunks the combinations and assigns blocks of work to the active worker nodes.
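What a worker does with its assigned block can be sketched as follows. This is an illustration under assumptions rather than the project's actual worker code: it assumes a lowercase a-z alphabet, assumes a block is described by a half-open range of integer indices, and uses the hypothetical helper names `index_to_candidate` and `search_chunk`. Only Python's standard `hashlib` module is used.

```python
import hashlib
import string

# Assumption: lowercase a-z only, 5 characters long.
ALPHABET = string.ascii_lowercase
PASSWORD_LENGTH = 5


def index_to_candidate(index):
    """Map an integer in [0, 26**5) to a 5-letter string via base-26 digits."""
    letters = []
    for _ in range(PASSWORD_LENGTH):
        index, digit = divmod(index, len(ALPHABET))
        letters.append(ALPHABET[digit])
    # Digits come out least-significant first, so reverse them.
    return "".join(reversed(letters))


def search_chunk(target_hash, start, end):
    """Return the candidate in [start, end) whose MD5 matches, or None."""
    target = target_hash.lower()
    for index in range(start, end):
        candidate = index_to_candidate(index)
        if hashlib.md5(candidate.encode("ascii")).hexdigest() == target:
            return candidate
    return None


# Usage: hash a known password, then search a small block that contains it.
target = hashlib.md5(b"aaabc").hexdigest()
print(search_chunk(target, 0, 100))    # finds the password in this block
print(search_chunk(target, 100, 200))  # None: the password is not in this block
```

A worker that returns `None` simply reports that its block is exhausted, so the server can stop the remaining workers as soon as any one of them reports a match.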
Execution and Results
To reproduce the experiment, a slice on GENI with 1 management server and 10 client worker nodes is created using a provided RSPEC file.
The environment utilizes Python 3.7 and Flask 2.2.2. We used Ngrok, a cross-platform application that exposes local network services behind NATs and firewalls to the public internet, to expose the GENI server securely to our React frontend. The server and clients are configured using a series of automated SSH commands.
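As a rough sketch of what automating those SSH commands might look like, the snippet below only builds the command lines and prints them. Everything specific here is hypothetical: the host names, port, user name, and the `worker.py` script are placeholders, not values from the project. The `-p` (port) and `-i` (identity file) flags are standard OpenSSH client options.

```python
def build_ssh_command(user, host, port, remote_command, key_path=None):
    """Build an argument list for the OpenSSH client.

    Hypothetical helper; pass the result to subprocess.run to execute it.
    """
    command = ["ssh", "-p", str(port)]
    if key_path is not None:
        command += ["-i", key_path]
    command += [f"{user}@{host}", remote_command]
    return command


# Placeholder worker nodes; a real GENI slice lists its own hosts and ports.
workers = [(f"worker-{n}.example.net", 22) for n in range(1, 4)]

for host, port in workers:
    cmd = build_ssh_command("alice", host, port, "python3 worker.py")
    print(" ".join(cmd))
    # To actually run it:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids local shell quoting problems when it is handed to `subprocess.run`.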
Metrics and Analysis
The experiment was run exactly 50 times with varying bandwidth values and numbers of worker clients. We analyzed two key metrics:
- The total time required to break a single given hash.
- The total time required to process a batch file of hashes.
We also observed that as the network bandwidth between the GENI nodes increases, the overhead of distributing the search-space chunks decreases, which brings the speedup closer to linear in the number of workers. Horizontal scaling works!
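For readers who want to quantify scaling from their own runs, speedup and parallel efficiency are the usual measures. The sketch below uses placeholder timings chosen purely for illustration; they are not measurements from this experiment, and the function names are hypothetical.

```python
def speedup(single_worker_seconds, multi_worker_seconds):
    """How many times faster the multi-worker run is than one worker."""
    return single_worker_seconds / multi_worker_seconds


def efficiency(single_worker_seconds, multi_worker_seconds, num_workers):
    """Speedup divided by worker count; 1.0 means perfectly linear scaling."""
    return speedup(single_worker_seconds, multi_worker_seconds) / num_workers


# Placeholder numbers, NOT results from the article's experiment.
t_one = 100.0   # seconds with 1 worker (made up)
t_ten = 12.5    # seconds with 10 workers (made up)

print(speedup(t_one, t_ten))
print(efficiency(t_one, t_ten, 10))
```

An efficiency below 1.0 reflects overhead such as distributing chunks over the network, which is the cost that shrinks as bandwidth increases.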
Conclusion
This project demonstrates the immense power and flexibility of distributed systems in efficiently cracking MD5 hashes. Through a user-friendly web interface and an effective management service, the system enables the dynamic distribution of workload among multiple worker nodes, resulting in highly improved performance and scalability.
The learnings from this project—specifically around job chunking, worker fault tolerance, and API orchestration—can be applied to any modern distributed system application.
Check out more about the project through this detailed PDF report, a deep dive covering the architecture, mathematical scaling, and network topologies tested during the project.
References & Citations
- Rivest, R. (1992). "The MD5 Message-Digest Algorithm". RFC 1321.
- Berman, M., et al. (2014). "GENI: A federated testbed for innovative network experiments". Computer Networks, 61, 5-23.
About Atul Lal
I am a software engineer with a passion for creating innovative and impactful applications that solve real-world problems. At Commvault Systems, I optimized APIs, developed distributed systems, and automated cloud environments for over two years.
