M K Gharzai

Research, projects, musings, quotes, and memes

Hashcat with OpenCL/ROCm

Posted at — Jul 21, 2019

Introduction

I had some issues getting Hashcat running on my local machine and started digging into GPU acceleration. My main goal was to run some cryptanalysis on offline hashes, but I had previously been confined to doing this in a virtual machine running ParrotOS. I don’t mind Parrot, as pretty much everything works out of the box and it has drivers for many cheap WiFi chipsets. However, performance suffers greatly in a VM (how much depends on your VM configuration), and I wanted to use GPU acceleration anyway.

The project I found was Radeon Open Compute (ROCm), AMD’s open-source GPU compute stack (roughly the counterpart to CUDA) for cards that run on the newer AMDGPU Linux driver. I have been consistently impressed with AMDGPU’s performance on Linux, so I kept buying Radeon graphics cards even as the GTX 1080 Ti and GTX 1660 were being released. The card I am using is an RX 580, but I hope to upgrade to Navi or Radeon VII when those become more available. After installing ROCm on my machine, I had many issues getting Hashcat to recognize it as an OpenCL environment. Hashcat would either stall or segfault, with little indication of what I was doing wrong.

I’ll walk you through what I did to get things working, and try to show you the pitfalls to avoid.

Installation

I’m using Gentoo Linux, but you can use Ubuntu, Fedora, or whatever you prefer. ROCm needs some binaries installed (from their GitHub or your package manager) as well as a custom-built Linux kernel that carries some yet-to-be-merged patches for the AMDGPU driver.

I don’t think you need all of these packages, but here’s what I have installed:

$ eix -cI "(amdgpu|rocm|hashcat)"
[I] app-crypt/hashcat-utils (1.9@06/09/2019): a set of small utilities that are useful in advanced password cracking
[I] dev-libs/rocm-cmake [1] (9999@06/10/2019): ROCm-CMake
[I] dev-libs/rocm-opencl-driver [1] (2.2.0@06/11/2019): ROCm-OpenCL-Driver
[I] dev-libs/rocm-opencl-runtime [1] (2.5.0@06/10/2019): ROCm-OpenCL-Runtime
[I] dev-util/rocm-smi [1] (2.5.0@06/16/2019 -> (~)9999): ROCm System Management Interface
[I] dev-util/rocminfo [1] (9999@06/12/2019): ROCm Application for Reporting System Info
[I] sys-devel/amd-rocm-meta [1] (2.5.0(2.4)@06/11/2019): Meta package for ROCm
[I] sys-kernel/rocm-sources [1] (2.5.9999(2.5)@06/11/2019): ROCm kernel sources
[I] x11-drivers/xf86-video-amdgpu (19.0.1@07/17/2019): Accelerated Open Source driver for AMDGPU cards

You could install rocm-smi, rocminfo, and rocm-sources from GitHub, but if you also use Gentoo (or Funtoo), justxi’s portage overlay was plenty up-to-date for me.

I am using hashcat-utils for some of the useful binaries it installs, but I am not using my repository’s hashcat package (more on this later).

$ equery f hashcat-utils
 * Searching for hashcat-utils ...
 * Contents of app-crypt/hashcat-utils-1.9:
/usr
/usr/bin
/usr/bin/cap2hccapx
/usr/bin/cleanup-rules
/usr/bin/combinator
/usr/bin/combinator3
/usr/bin/combipow
/usr/bin/ct3_to_ntlm
/usr/bin/cutb
/usr/bin/expander
/usr/bin/gate
/usr/bin/generate-rules
/usr/bin/hcstat2gen
/usr/bin/hcstatgen
/usr/bin/keyspace
/usr/bin/len
/usr/bin/mli2
/usr/bin/morph
/usr/bin/permute
/usr/bin/permute_exist
/usr/bin/prepare
/usr/bin/req-exclude
/usr/bin/req-include
/usr/bin/rli
/usr/bin/rli2
/usr/bin/rules_optimize
/usr/bin/splitlen
/usr/bin/strip-bsn
/usr/bin/strip-bsr

Kernel Configuration and Installation

If you’ve never set up a Linux kernel before, I wouldn’t worry too much, as this is going to be pretty straightforward. However, if you have anything unusual in your kernel (like RAID, special drivers you need to boot your OS, etc.), this method may need some tweaking. In general, we’re going to use the ROCm team’s base kernel config, compile a kernel, copy it to our boot partition, and update grub. After booting into the new kernel, Hashcat will see ROCm’s amdgpu-kfd OpenCL implementation and use the kernel driver to access our GPU resources.

As always, make sure you have a backup of anything important. There is potential to mess up your machine, in which case you would need to boot into a recovery environment to restore your boot partition.

Obtain the rocm-sources either from GitHub or, with the Gentoo overlay, from your package manager. Navigate to the sources directory (the one containing the Makefile) and run as root:

# make rock-rel_defconfig
# make nconfig && make -sj9 && make modules_install && make install
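The -sj9 in the command above hard-codes nine parallel jobs, which matches a CPU with eight logical cores plus one. As a minimal sketch, assuming nproc from GNU coreutils is available, the job count can be derived instead. The make_jobs helper name is made up for illustration, and the script only prints the resulting command rather than running it.

```shell
#!/bin/sh
# Print a value for "make -j": logical CPU count plus one.
# Falls back to a single CPU if nproc is not available.
make_jobs() {
    cpus=$(nproc 2>/dev/null || echo 1)
    echo $((cpus + 1))
}

# Show the build command with the computed job count (not executed here).
echo "make -sj$(make_jobs)"
```

To use it for real, replace the echo with the actual make invocation once you are happy with the number it prints.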

The make nconfig step brings you to an ncurses kernel config menu where you can make any kernel tweaks. Save and exit with F9, and compilation will start. You can set the make -j option to your number of logical CPUs + 1 to speed up compilation. Once the kernel is compiled, the modules are installed to /lib/modules and an initramfs is built. Lastly, your new Linux kernel image and initramfs are copied to /boot. Next, you need to rebuild the grub boot menu:

# cd /boot
# ls
# cd grub
# grub-mkconfig -o grub.cfg

I recommend running the commands as shown so you can see which images are on your boot partition before you create the grub file. I like to have an expectation of what it will create, then make sure it added what I expected to the grub config. You don’t want any surprises that lock you out of your OS. One caveat: depending on your Linux distribution (Ubuntu, Fedora, Gentoo, etc.), your /boot partition layout will be different, and you may even be booting directly without grub, in which case you probably know what to do. The main things to check are whether the directory is /boot/grub or /boot/grub2, and whether the command is grub-mkconfig or grub2-mkconfig.
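The grub versus grub2 question can be checked from a script before touching anything. The following is a rough sketch, not a tested installer: the helper names find_grub_mkconfig and find_grub_dir are invented for this example, and it only prints the command it would run so you can compare it with what you expected.

```shell
#!/bin/sh
# Pick the GRUB config tool: some distributions ship grub-mkconfig,
# others ship grub2-mkconfig.
find_grub_mkconfig() {
    for tool in grub-mkconfig grub2-mkconfig; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    return 1
}

# Pick the GRUB directory under a boot root (defaults to /boot).
# Prefers <root>/grub and falls back to <root>/grub2.
find_grub_dir() {
    boot_root=${1:-/boot}
    for dir in "$boot_root/grub" "$boot_root/grub2"; do
        if [ -d "$dir" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Dry run: print the command instead of executing it.
if tool=$(find_grub_mkconfig) && dir=$(find_grub_dir); then
    echo "would run: $tool -o $dir/grub.cfg"
else
    echo "GRUB tool or directory not found; adjust for your boot setup" >&2
fi
```

If the printed command matches your layout, run it by hand as root.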

Hashcat

Once you reboot, make sure you press the arrow keys at the grub menu to get to advanced options. Boot to the newly created ROCm kernel, which for me was Gentoo GNU/Linux, with Linux x86_64-5.0.0-rc1-kfd+.
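After the reboot, it is worth confirming that you actually landed in the new kernel. Here is a small sketch under two assumptions: that your kernel release string contains "kfd" like mine did (yours may not, depending on how the kernel was configured), and that the compute driver exposes the /dev/kfd device node. The check_rocm_kernel function name is hypothetical.

```shell
#!/bin/sh
# Report whether the running kernel looks like the ROCm build.
# $1: kernel release string (default: output of uname -r)
# $2: KFD device node to look for (default: /dev/kfd)
check_rocm_kernel() {
    release=${1:-$(uname -r)}
    node=${2:-/dev/kfd}
    status=0
    case $release in
        *kfd*) echo "kernel: $release (looks like the ROCm build)" ;;
        *)     echo "kernel: $release (no 'kfd' in the release string)"
               status=1 ;;
    esac
    if [ -e "$node" ]; then
        echo "device: $node present"
    else
        echo "device: $node missing"
        status=1
    fi
    return $status
}

check_rocm_kernel || echo "ROCm kernel checks did not all pass" >&2
```

A failed check does not necessarily mean the setup is broken, only that it differs from the one described here.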

I would first try your package manager’s Hashcat, but I had issues with hashcat-5.1.0 only seeing Mesa OpenCL and segfaulting even when I ran it with --force. If that doesn’t work, or you want to be on the latest version, you can clone Hashcat from GitHub, build it, and (optionally) install it. I recommend putting it in /opt/hashcat and symlinking the built binary to /usr/local/bin/hashcat.
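The source build described above could look roughly like the sketch below. It assumes the upstream repository at github.com/hashcat/hashcat and that a plain make in the source root produces a binary named hashcat. The run wrapper and the DRY_RUN, PREFIX, and LINK variables are made up for this example. By default it only prints the commands; writing to /opt and /usr/local/bin normally needs root.

```shell
#!/bin/sh
# Build Hashcat from source and symlink the resulting binary.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
PREFIX=${PREFIX:-/opt/hashcat}
LINK=${LINK:-/usr/local/bin/hashcat}

# Either echo a command or execute it, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone, build in place, then link the binary onto the PATH.
install_hashcat() {
    run git clone https://github.com/hashcat/hashcat.git "$PREFIX" &&
    run make -C "$PREFIX" &&
    run ln -sf "$PREFIX/hashcat" "$LINK"
}

install_hashcat
```

Set DRY_RUN=0 once the printed commands look right for your system.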

Now, you can run hashcat -I to verify Hashcat sees your GPU:

$ hashcat -I
hashcat (v5.1.0-1243-gd1f473d6+) starting...

OpenCL Info:
============

OpenCL Platform ID #1
  Vendor..: Advanced Micro Devices, Inc.
  Name....: AMD Accelerated Parallel Processing
  Version.: OpenCL 2.0 AMD-APP.internal.dbg (2901.0)

  Backend Device ID #1
    Type...........: GPU
    Vendor.ID......: 1
    Vendor.........: Advanced Micro Devices, Inc.
    Name...........: gfx803
    Version........: OpenCL 1.2 
    Processor(s)...: 36
    Clock..........: 1365
    Memory.........: 6963/8192 MB allocatable
    OpenCL.Version.: OpenCL C 2.0 
    Driver.Version.: 2901.0 (HSA1.1,LC)
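If you want to script this check, the output above can be scanned for a device of type GPU. This is only a sketch that assumes the "Type...........: GPU" line format shown here, which may change between Hashcat versions. The has_gpu_device helper is a name invented for illustration.

```shell
#!/bin/sh
# Read "hashcat -I" output on stdin and succeed if any device
# reports its type as GPU.
has_gpu_device() {
    grep -Eq '^[[:space:]]*Type\.+: GPU[[:space:]]*$'
}

# Only run the check when hashcat is actually on the PATH.
if command -v hashcat >/dev/null 2>&1; then
    if hashcat -I 2>/dev/null | has_gpu_device; then
        echo "hashcat sees a GPU"
    else
        echo "hashcat does not report a GPU device" >&2
    fi
fi
```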

You can run one of the included example scripts to test its functionality:

$ ./example400.sh 
hashcat (v5.1.0-1243-gd1f473d6+) starting...

OpenCL API (OpenCL 2.0 AMD-APP.internal.dbg (2901.0)) - Platform #1 [Advanced Micro Devices, Inc.]
==================================================================================================
* Device #1: gfx803, 6963/8192 MB allocatable, 36MCU

Hashes: 1 digests; 1 unique digests, 1 unique salts
Bitmaps: 16 bits, 65536 entries, 0x0000ffff mask, 262144 bytes, 5/13 rotates
Rules: 1

Applicable optimizers:
* Zero-Byte
* Single-Hash
* Single-Salt

Minimum password length supported by kernel: 0
Maximum password length supported by kernel: 256

ATTENTION! Pure (unoptimized) backend kernels selected.
Using pure kernels enables cracking longer passwords but for the price of drastically reduced performance.
If you want to switch to optimized backend kernels, append -O to your commandline.
See the above message to find out about the exact limits.

Watchdog: Temperature abort trigger set to 90c

Host memory required for this attack: 696 MB

Starting attack in stdin mode...        

$H$9y5boZ2wsUlgl2tI6b5PrRoADzYfXD1:hash234
Session..........: hashcat
Status...........: Cracked
Hash.Name........: phpass
Hash.Target......: $H$9y5boZ2wsUlgl2tI6b5PrRoADzYfXD1
Time.Started.....: Sun Jul 21 20:44:08 2019 (1 sec)
Time.Estimated...: Sun Jul 21 20:44:09 2019 (0 secs)
Guess.Base.......: Pipe
Speed.#1.........:        0 H/s (7.11ms) @ Accel:1024 Loops:1024 Thr:64 Vec:1
Recovered........: 1/1 (100.00%) Digests
Progress.........: 128416
Rejected.........: 0
Restore.Point....: 0
Restore.Sub.#1...: Salt:0 Amplifier:0-0 Iteration:0-1024
Candidates.#1....: [Copying]
Hardware.Mon.#1..: Temp: 61c Fan: 40% Core:1365MHz Mem:2000MHz Bus:0
Started: Sun Jul 21 20:44:01 2019
Stopped: Sun Jul 21 20:44:09 2019

Lastly, I’d get rocm-smi and rocminfo to keep an eye on your hardware resources and see more detailed info. These utilities work even without the custom kernel, in case you just want to see resource info:

========================ROCm System Management Interface========================
================================================================================
GPU  Temp   AvgPwr    SCLK     MCLK     Fan    Perf PwrCap  VRAM%  GPU%
0    74.0c  145.067W  1365Mhz  2000Mhz  60.0%  auto 145.0W   33%   100%
================================================================================
$ ./rocminfo
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
  Marketing Name:          Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   5000                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    32869532(0x1f58c9c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    32869532(0x1f58c9c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        TRUE                               
  ISA Info:                
    N/A                      
*******                  
Agent 2                  
*******                  
  Name:                    gfx803                             
  Marketing Name:          Ellesmere [Radeon RX 470/480/570/570X/580/580X]
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 26591(0x67df)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1365                               
  BDFID:                   256                                
  Internal Node ID:        1                                  
  Compute Unit:            36                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8388608(0x800000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Acessible by all:        FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx803          
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             
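For keeping an eye on temperature during a long run, the rocm-smi table shown earlier can be reduced to just the GPU id and temperature. This sketch assumes the column layout from my rocm-smi version (GPU id first, temperature second with a trailing "c"), which may differ in other releases. The smi_temps helper is a hypothetical name.

```shell
#!/bin/sh
# Read rocm-smi's default table on stdin and print "<gpu id> <temp>"
# for each data row. Header and separator rows are skipped because
# their first field is not a number.
smi_temps() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /c$/ { sub(/c$/, "", $2); print $1, $2 }'
}

# Only query the hardware when rocm-smi is installed.
if command -v rocm-smi >/dev/null 2>&1; then
    rocm-smi | smi_temps
fi
```

Wrapping the last command in watch gives a simple live readout while Hashcat is running.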

Thanks for reading and happy cracking!