Hashcat performance on AWS EC2 GPU instances

Why GPU instances are better for hashcat

GPUs are better suited to this workload than CPUs because they are designed to perform work in parallel. When there are many identical, independent jobs to perform (such as hashing one password candidate after another), a GPU scales much better. That made me curious to benchmark Hashcat on the AWS EC2 p3 and g4 instances.

Setup

To take advantage of the GPU capabilities of these EC2 instances, we need to:

Run a supported Linux distribution.
Install the NVIDIA driver.
Optionally, install the CUDA toolkit.

Instead of reinventing the wheel, you can use the Deep Learning AMI (Ubuntu 18.04) provided by AWS. Although these AMIs are built for machine learning, they also work well for Hashcat, because they come prepackaged with GPU drivers (v26 of the AMI includes driver version 418.87.01) and the latest version of the CUDA SDK. The AMI also ships with nvidia-docker, which lets us run Hashcat in a container that has access to the GPUs.
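Before starting a benchmark, it can be worth confirming that the driver actually sees the GPU. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH (it is installed together with the NVIDIA driver). The helper names `parse_gpu_csv` and `list_gpus` are made up for this example.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Turn 'name, driver_version' CSV lines into a list of (name, driver) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        name, driver = [field.strip() for field in line.split(",", 1)]
        gpus.append((name, driver))
    return gpus


def list_gpus():
    """Return the visible GPUs, or an empty list if nvidia-smi is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("No NVIDIA GPU visible; check the instance type and driver install.")
    for name, driver in gpus:
        print(f"{name}: driver {driver}")
```

On the AMI version mentioned above, the reported driver should match the bundled version. An empty list usually means the instance type has no GPU or the driver is not loaded.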

Running Hashcat in Docker

After spinning up the instance, run the Docker container below to start a Hashcat benchmark:

nvidia-docker run javydekoning/hashcat:latest hashcat -b
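A full `hashcat -b` run covers many hash modes and takes a while. Here is a small sketch of how the same command could be scripted for a handful of modes, using hashcat's `-m` option to select one hash mode per run. The `use_gpus_flag` variant relies on `docker run --gpus all`, which newer Docker releases (19.03 and later, with the NVIDIA container toolkit) offer in place of the `nvidia-docker` wrapper. Whether this image behaves identically under that flag is an assumption I have not verified. The function names are illustrative only.

```python
import subprocess

IMAGE = "javydekoning/hashcat:latest"  # image name taken from the command above


def benchmark_command(hash_mode=None, use_gpus_flag=False):
    """Build the docker command line for a hashcat benchmark run."""
    if use_gpus_flag:
        # Assumption: Docker 19.03+ with the NVIDIA container toolkit installed.
        cmd = ["docker", "run", "--rm", "--gpus", "all"]
    else:
        cmd = ["nvidia-docker", "run", "--rm"]
    cmd += [IMAGE, "hashcat", "-b"]
    if hash_mode is not None:
        # -m limits the benchmark to a single hash mode (e.g. 0 = MD5).
        cmd += ["-m", str(hash_mode)]
    return cmd


def run_benchmark(hash_mode=None, use_gpus_flag=False):
    """Run the benchmark and return hashcat's stdout as text."""
    result = subprocess.run(
        benchmark_command(hash_mode, use_gpus_flag),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only print the commands here; call run_benchmark() on a GPU instance.
    for mode in (0, 1400, 1000):
        print(" ".join(benchmark_command(mode)))
```

The `--rm` flag simply removes the container after the run, so repeated benchmarks do not leave stopped containers behind.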

You can find both CUDA and OpenCL containers here:

Benchmark results

The benchmarks below were run using CUDA; performance on OpenCL is nearly identical.

Hashmode: 0 - MD5
Speed.#1.........: 20625.3 MH/s (64.90ms)
@ Accel:64 Loops:512 Thr:1024 Vec:1

Hashmode: 1400 - SHA2-256
Speed.#1.........:  3099.2 MH/s (53.98ms)
@ Accel:8 Loops:512 Thr:1024 Vec:1

Hashmode: 1000 - NTLM
Speed.#1.........: 36730.7 MH/s (72.91ms)
@ Accel:64 Loops:1024 Thr:1024 Vec:1

Hashmode: 0 - MD5
Speed.#1.........: 55627.8 MH/s (47.88ms)
@ Accel:32 Loops:1024 Thr:1024 Vec:8

Hashmode: 1400 - SHA2-256
Speed.#1.........:  7602.7 MH/s (87.86ms)
@ Accel:8 Loops:1024 Thr:1024 Vec:1

Hashmode: 1000 - NTLM
Speed.#1.........:   100.8 GH/s (26.25ms)
@ Accel:32 Loops:1024 Thr:1024 Vec:8
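To compare the two result blocks above, it helps to convert every speed to plain H/s first, since one NTLM figure is in GH/s while the rest are in MH/s. The sketch below parses the output as shown and computes the ratio per hash mode. The labels `RUN_A` and `RUN_B` are placeholders for the first and second block, because the listing itself does not say which instance type produced which. The parser assumes a single device (`Speed.#1`).

```python
import re

UNIT_FACTORS = {"H/s": 1, "kH/s": 1e3, "MH/s": 1e6, "GH/s": 1e9, "TH/s": 1e12}

MODE_RE = re.compile(r"^Hashmode:\s*(\d+)\s*-\s*(.+?)\s*$")
SPEED_RE = re.compile(r"^Speed\.#\d+\.*:\s*([\d.]+)\s*([kMGT]?H/s)")

RUN_A = """\
Hashmode: 0 - MD5
Speed.#1.........: 20625.3 MH/s (64.90ms)
Hashmode: 1400 - SHA2-256
Speed.#1.........:  3099.2 MH/s (53.98ms)
Hashmode: 1000 - NTLM
Speed.#1.........: 36730.7 MH/s (72.91ms)
"""

RUN_B = """\
Hashmode: 0 - MD5
Speed.#1.........: 55627.8 MH/s (47.88ms)
Hashmode: 1400 - SHA2-256
Speed.#1.........:  7602.7 MH/s (87.86ms)
Hashmode: 1000 - NTLM
Speed.#1.........:   100.8 GH/s (26.25ms)
"""


def parse_benchmark(text):
    """Map each hash mode number to (name, speed in H/s) for one benchmark run."""
    results = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        mode = MODE_RE.match(line)
        if mode:
            current = (int(mode.group(1)), mode.group(2))
            continue
        speed = SPEED_RE.match(line)
        if speed and current is not None:
            value = float(speed.group(1)) * UNIT_FACTORS[speed.group(2)]
            results[current[0]] = (current[1], value)
            current = None  # keep only the first device line per mode
    return results


def speedups(run_a, run_b):
    """Return speed(run_b) / speed(run_a) for every mode present in both runs."""
    return {m: run_b[m][1] / run_a[m][1] for m in run_a if m in run_b}


if __name__ == "__main__":
    a, b = parse_benchmark(RUN_A), parse_benchmark(RUN_B)
    for mode, ratio in speedups(a, b).items():
        print(f"{a[mode][0]:>10}: {ratio:.2f}x")
```

For these three modes the second block comes out roughly 2.5 to 2.7 times faster than the first.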

Performance per dollar at on-demand prices is almost equal for both instance types, with the g4 instances being slightly more cost-effective (based on us-east-1 pricing).
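The cost comparison comes down to simple arithmetic: hashes per second, times 3600 seconds, divided by the hourly price. Here is a minimal sketch of that calculation. The price in the example is a placeholder and not an actual AWS rate, so substitute the current on-demand price for your instance type and region. The function names are hypothetical.

```python
def hashes_per_dollar(speed_hs, price_per_hour_usd):
    """Number of hashes one US dollar buys at the given speed (H/s) and hourly price."""
    if price_per_hour_usd <= 0:
        raise ValueError("price_per_hour_usd must be positive")
    return speed_hs * 3600 / price_per_hour_usd


def cost_to_exhaust(keyspace, speed_hs, price_per_hour_usd):
    """Cost in USD to try every candidate in a keyspace of the given size."""
    return keyspace / hashes_per_dollar(speed_hs, price_per_hour_usd)


if __name__ == "__main__":
    md5_speed = 20625.3e6     # H/s, first MD5 result from the benchmarks above
    placeholder_price = 1.00  # USD per hour; hypothetical, NOT a real AWS price
    keyspace = 26 ** 8        # all 8-character lowercase passwords

    print(f"hashes per dollar: {hashes_per_dollar(md5_speed, placeholder_price):.3e}")
    print(f"cost to exhaust:   {cost_to_exhaust(keyspace, md5_speed, placeholder_price):.4f} USD")
```

Running the same calculation for each instance type with its real hourly price gives a direct basis for the per-dollar comparison.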

Find a full list of benchmarks on my GitHub page here: