Does OpenSSL/LibreSSL already use the hardware crypto engine?

When I run rsync, scp, or sftp, one of the CPU cores is 100% used, and the total bandwidth isn’t great either. (The files are on NVMe.)

So, as the title asks, does OpenSSL/LibreSSL already use the hardware crypto engine?

Did you change the encryption algorithm away from the default ( chacha20-poly1305@openssh.com, mentioned in the documentation: $ man sshd_config ) to an algorithm that is currently supported by the JH7110 encryption engine (aes128-cbc, aes192-cbc, aes256-cbc, aes128-ctr, aes192-ctr, aes256-ctr), with a valid key length that the hardware supports (32/64/96/128/160/192/224/256-bit)?

I struck 3des-cbc, aes128-gcm@openssh.com, and aes256-gcm@openssh.com from that list. 3des could be supported in hardware, but no software exists for it (yet). I suspect that the two GCM ciphers are actually supported, but I am not 100% sure. So I only left the crypto algorithms that I would expect to work with the software that is in place (some, maybe not enough yet) for hardware acceleration.

e.g.
$ sftp -c aes256-ctr username@IP_address_of_VF2

$ ssh -Q cipher
This lists the ciphers that are available locally for ssh (and sftp) to use.

$ nmap --script ssh2-enum-algos -sV -p 22 IP_address_of_VF2
This lists the KexAlgorithms, HostKeyAlgorithms, Ciphers, and MACs offered by sshd on a remote machine.

On the VF2, confirm that the kernel exposes the API ( $ cat /proc/crypto ).

I have already set it as you described, but I cannot tell whether OpenSSL is actually offloading to the hardware crypto engine or not.

In case it adds a little information: cryptsetup -c aes-xts-plain64 -s 512 benchmark makes the [16000000.crypto] process pop to the top, but in my experience there is no benefit:

root@serval:/tmp/r# cryptsetup -c aes-xts-plain64 -s 512 benchmark
# Tests are approximate using memory only (no storage IO).
# Algorithm |       Key |      Encryption |      Decryption
    aes-xts        512b        26.2 MiB/s        26.2 MiB/s

vs

root@serval:/tmp/r# openssl speed -evp aes-256-xts
Doing AES-256-XTS for 3s on 16 size blocks: 2404619 AES-256-XTS's in 2.97s
Doing AES-256-XTS for 3s on 64 size blocks: 1005061 AES-256-XTS's in 2.97s
Doing AES-256-XTS for 3s on 256 size blocks: 301774 AES-256-XTS's in 2.96s
Doing AES-256-XTS for 3s on 1024 size blocks: 79557 AES-256-XTS's in 2.97s
Doing AES-256-XTS for 3s on 8192 size blocks: 10092 AES-256-XTS's in 2.97s
Doing AES-256-XTS for 3s on 16384 size blocks: 5036 AES-256-XTS's in 2.97s
version: 3.0.8
built on: Fri Apr  7 14:51:31 2023 UTC
options: bn(64,64)
compiler: riscv64-slackware-linux-clang -fPIC -pthread -menable-experimental-extensions -mabi=lp64d -march=rv64imafdczbb_zba -mcpu=sifive-u74 -mtune=sifive-7-series -O2 -pipe -fomit-frame-pointer --param l1-cache-size=32 --param l2-cache-size=2048 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DZLIB -DNDEBUG -menable-experimental-extensions -mabi=lp64d -march=rv64imafdczbb_zba -mcpu=sifive-u74 -mtune=sifive-7-series -O2 -pipe -fomit-frame-pointer --param l1-cache-size=32 --param l2-cache-size=2048
CPUINFO: N/A
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-256-XTS      12954.18k    21657.88k    26099.37k    27429.75k    27836.25k    27781.09k

The difference is that cryptsetup uses the kernel driver whilst OpenSSL does not.
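
The two benchmarks report in different units: cryptsetup prints MiB/s, while openssl speed prints thousands of bytes per second. A minimal sketch of the conversion, assuming awk is installed (kb_to_mib is a helper name made up for this example):

```shell
# Convert an `openssl speed` figure (1000s of bytes per second)
# to MiB/s, the unit printed by `cryptsetup benchmark`.
# kb_to_mib is a hypothetical helper name, not part of either tool.
kb_to_mib() {
    awk -v k="$1" 'BEGIN { printf "%.2f\n", k * 1000 / 1048576 }'
}

# The 16384-byte AES-256-XTS figure from the openssl run above.
kb_to_mib 27781.09
```

This prints 26.49, so the software-only OpenSSL result and the 26.2 MiB/s from cryptsetup are practically the same number.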

What happens if you use one of the supported AES drivers (jh7110-ecb-aes, jh7110-cbc-aes, jh7110-ctr-aes, jh7110-cfb-aes, jh7110-ofb-aes, jh7110-gcm-aes, jh7110-ccm-aes) and a supported key length?
The hardware only supports key lengths of 32/64/96/128/160/192/224/256 bits (and the software may not support all of them yet).

So something like:

$ openssl speed -evp aes-256-cbc
$ openssl speed -decrypt -evp aes-256-cbc
$ /sbin/cryptsetup benchmark --cipher aes-cbc --key-size=256
$ openssl speed -evp aes-128-cbc
$ openssl speed -decrypt -evp aes-128-cbc
$ sudo cryptsetup benchmark -c aes-cbc -s 128
$ /sbin/cryptsetup benchmark --cipher aes-cbc --key-size=128

Clues may be gained from:

$ cat /proc/crypto | grep -i aes | grep -i cbc
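
grep drops the record structure of /proc/crypto, where every implementation is a block of "key : value" lines separated by a blank line. A small sketch that keeps name, driver and priority together, assuming awk and sort are available (crypto_impls is a made-up helper name). The kernel normally hands out the implementation with the highest priority for a given algorithm name, so the first line printed is the one in-kernel users would get.

```shell
# List every implementation of one algorithm from /proc/crypto as
# "priority driver", highest priority first.
# crypto_impls is a hypothetical helper name.
#   $1: algorithm name exactly as in the "name" field, e.g. 'cbc(aes)'
#   $2: file to read (defaults to /proc/crypto)
crypto_impls() {
    awk -v want="$1" '
        $1 == "name"     { name = $3 }
        $1 == "driver"   { driver = $3 }
        $1 == "priority" { if (name == want) print $3, driver }
    ' "${2:-/proc/crypto}" | sort -rn
}

# Usage on the VF2:
#   crypto_impls 'cbc(aes)'
#   crypto_impls 'ctr(aes)'
```

If a jh7110-* driver is not on the first line, some other implementation has a higher priority and will be preferred.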

Does openssl list an engine?

$ openssl engine -c

It could end up being something like:

$ openssl speed -engine jh7110 -evp aes-256-cbc
$ openssl speed -engine jh7110-cbc-aes -evp aes-256-cbc
$ /sbin/cryptsetup benchmark --cipher jh7110-cbc-aes --key-size=256

For now, it might even require a custom configuration file (similar to FIPS). e.g.

OPENSSL_CONF=/my/nondefault/openssl.cnf openssl speed -evp aes-256-cbc

(I’m nowhere near a VF2 right now).

Maybe I should try cryptodev-linux/cryptodev-linux on GitHub (Cryptodev-linux is a Linux kernel device that allows user-space access to hardware cryptographic accelerators) and then rebuild OpenSSL?

Edit: I just found this package in the AUR: cryptodev-linux.

Try using an old version of OpenSSH (maybe < 7.0 or 6.5? I forget.)
It has a zero-crypt engine.

Good news! I just successfully built cryptodev and OpenSSL with the devcrypto engine. The results are below: on the left is the original Arch Linux OpenSSL, on the right is OpenSSL with devcrypto as the default engine.

Thank you. I was expecting more. Now I’m wondering if the performance increase was due to running in the kernel space or if it really was accessing the encryption engine in the JH7110.

Thanks, I’ve got it working too, although it’s very quirky (OpenSSL devs, stop pretending to be smart in every area you touch!)

root@serval:/tmp/r# openssl speed -engine devcrypto -evp aes-256-ctr -elapsed
Engine "devcrypto" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing AES-256-CTR for 3s on 16 size blocks: 92533 AES-256-CTR's in 3.00s
Doing AES-256-CTR for 3s on 64 size blocks: 43929 AES-256-CTR's in 3.00s
Doing AES-256-CTR for 3s on 256 size blocks: 38245 AES-256-CTR's in 3.00s
Doing AES-256-CTR for 3s on 1024 size blocks: 34041 AES-256-CTR's in 3.00s
Doing AES-256-CTR for 3s on 8192 size blocks: 14204 AES-256-CTR's in 3.00s
Doing AES-256-CTR for 3s on 16384 size blocks: 7766 AES-256-CTR's in 3.00s
version: 3.0.8
built on: Mon Apr 10 15:09:42 2023 UTC
options: bn(64,64)
compiler: riscv64-slackware-linux-clang -fPIC -pthread -menable-experimental-extensions -mabi=lp64d -march=rv64imafdczbb_zba -mcpu=sifive-u74 -mtune=sifive-7-series -O2 -pipe -fomit-frame-pointer --param l1-cache-size=32 --param l2-cache-size=2048 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DZLIB -DNDEBUG -menable-experimental-extensions -mabi=lp64d -march=rv64imafdczbb_zba -mcpu=sifive-u74 -mtune=sifive-7-series -O2 -pipe -fomit-frame-pointer --param l1-cache-size=32 --param l2-cache-size=2048
CPUINFO: N/A
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-256-CTR        493.51k      937.15k     3263.57k    11619.33k    38786.39k    42412.71k

The [16000000.crypto] process popped up during the test. The CPU frequency governor is also important: I get nearly 27M/s if my cpufreq is set to ondemand (perhaps it saves clocks aggressively, or frequent transitions cause lag). The result above is from a core running at 1500MHz.
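
The table at the end of an openssl speed run is derived from the "Doing ..." lines as count × block size / seconds / 1000. Recomputing it also gives the number of calls per second, which makes the cost per request visible. A rough sketch, assuming awk is installed and the line layout shown above (speed_summary is a made-up helper name):

```shell
# Read `openssl speed` output on stdin and print, per block size:
#   block size in bytes, calls per second, throughput in 1000s of bytes/s
# speed_summary is a hypothetical helper name.
speed_summary() {
    awk '/^Doing .* size blocks: / {
        secs  = $NF + 0          # "3.00s" -> 3
        count = $(NF - 3) + 0    # number of operations completed
        size  = 0
        for (i = 2; i <= NF; i++)
            if ($i == "size") size = $(i - 1) + 0
        if (secs > 0 && size > 0)
            printf "%d %.0f %.2f\n", size, count / secs, count * size / secs / 1000
    }'
}

# Usage:
#   openssl speed -engine devcrypto -evp aes-256-ctr -elapsed 2>&1 | speed_summary
```

For the run above, the 16-byte line works out to about 30844 calls per second (roughly 32 µs per call) and the 16384-byte line to about 2589 calls per second. One possible reading, not something measured in this thread, is that a fixed cost per /dev/crypto request dominates at small block sizes.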

Note that you need to edit /etc/ssl/openssl.cnf and include the following in the area shown:

# To use this configuration file with the "-extfile" option of the
# "openssl x509" utility, name here the section containing the
# X.509v3 extensions to use:
# extensions            =
# (Alternatively, use a configuration file that has only
# X.509v3 extensions in its main [= default] section.)

openssl_conf=openssl_conf

[openssl_conf]
engines=engines

[engines]
devcrypto=devcrypto

[devcrypto]
default_algorithms = ALL
USE_SOFTDRIVERS = 1
CIPHERS = ALL
DIGESTS = NONE

As suggested by the OpenWrt community in “Solved (sorta) /dev/crypto No Ciphers or Hashes” (#28 by rossb, For Developers, OpenWrt Forum).
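
To try the engine without touching the system-wide file, the same settings can go into a private configuration file that is selected per command with the OPENSSL_CONF environment variable mentioned earlier in the thread. A minimal sketch, assuming a POSIX shell (write_devcrypto_conf and the /tmp path are made up for this example):

```shell
# Write a stand-alone OpenSSL config that enables the devcrypto engine
# with the settings quoted above.
# write_devcrypto_conf is a hypothetical helper name; $1 is the output path.
write_devcrypto_conf() {
    cat > "$1" <<'EOF'
openssl_conf = openssl_conf

[openssl_conf]
engines = engines

[engines]
devcrypto = devcrypto

[devcrypto]
default_algorithms = ALL
USE_SOFTDRIVERS = 1
CIPHERS = ALL
DIGESTS = NONE
EOF
}

# Usage (the path is only an example):
#   write_devcrypto_conf /tmp/devcrypto.cnf
#   OPENSSL_CONF=/tmp/devcrypto.cnf openssl engine -t -c
#   OPENSSL_CONF=/tmp/devcrypto.cnf openssl speed -evp aes-256-ctr -elapsed
```

Only processes started with that variable see the engine, so other programs that link against OpenSSL keep using the unmodified /etc/ssl/openssl.cnf.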

To check whether it really uses /dev/crypto, run openssl engine -t -c:

(dynamic) Dynamic engine loading support
     [ unavailable ]
(devcrypto) /dev/crypto engine
 [AES-128-CBC, AES-192-CBC, AES-256-CBC, AES-128-CTR, AES-192-CTR, AES-256-CTR, AES-128-ECB, AES-192-ECB, AES-256-ECB, CAMELLIA-128-CBC, CAMELLIA-192-CBC, CAMELLIA-256-CBC]
     [ available ]

The reason why USE_SOFTDRIVERS = 1 is suggested is probably because of this:

root@serval:/usr/src# openssl engine -post DUMP_INFO -t -c
(dynamic) Dynamic engine loading support
     [ unavailable ]
(devcrypto) /dev/crypto engine
 [AES-128-CBC, AES-192-CBC, AES-256-CBC, AES-128-CTR, AES-192-CTR, AES-256-CTR, AES-128-ECB, AES-192-ECB, AES-256-ECB, CAMELLIA-128-CBC, CAMELLIA-192-CBC, CAMELLIA-256-CBC]
     [ available ]
Information about ciphers supported by the /dev/crypto engine:
Cipher DES-CBC, NID=31, /dev/crypto info: id=1, CIOCGSESSION (session open call) failed
Cipher DES-EDE3-CBC, NID=44, /dev/crypto info: id=2, CIOCGSESSION (session open call) failed
Cipher BF-CBC, NID=91, /dev/crypto info: id=3, CIOCGSESSION (session open call) failed
Cipher CAST5-CBC, NID=108, /dev/crypto info: id=4, CIOCGSESSION (session open call) failed
Cipher AES-128-CBC, NID=419, /dev/crypto info: id=11, driver=jh7110-cbc-aes (software)
Cipher AES-192-CBC, NID=423, /dev/crypto info: id=11, driver=jh7110-cbc-aes (software)
Cipher AES-256-CBC, NID=427, /dev/crypto info: id=11, driver=jh7110-cbc-aes (software)
Cipher RC4, NID=5, /dev/crypto info: id=12, CIOCGSESSION (session open call) failed
Cipher AES-128-CTR, NID=904, /dev/crypto info: id=21, driver=jh7110-ctr-aes (software)
Cipher AES-192-CTR, NID=905, /dev/crypto info: id=21, driver=jh7110-ctr-aes (software)
Cipher AES-256-CTR, NID=906, /dev/crypto info: id=21, driver=jh7110-ctr-aes (software)
Cipher AES-128-ECB, NID=418, /dev/crypto info: id=23, driver=jh7110-ecb-aes (software)
Cipher AES-192-ECB, NID=422, /dev/crypto info: id=23, driver=jh7110-ecb-aes (software)
Cipher AES-256-ECB, NID=426, /dev/crypto info: id=23, driver=jh7110-ecb-aes (software)
Cipher CAMELLIA-128-CBC, NID=751, /dev/crypto info: id=101, driver=cbc(camellia-generic) (software)
Cipher CAMELLIA-192-CBC, NID=752, /dev/crypto info: id=101, driver=cbc(camellia-generic) (software)
Cipher CAMELLIA-256-CBC, NID=753, /dev/crypto info: id=101, driver=cbc(camellia-generic) (software)

Information about digests supported by the /dev/crypto engine:
Digest MD5, NID=4, /dev/crypto info: id=13, driver=md5-generic (software), CIOCCPHASH capable
Digest SHA1, NID=64, /dev/crypto info: id=14, driver=jh7110-sha1 (software), CIOCCPHASH capable
Digest RIPEMD160, NID=117, /dev/crypto info: id=102, driver=unknown. CIOCGSESSION (session open) failed
Digest SHA224, NID=675, /dev/crypto info: id=103, driver=jh7110-sha224 (software), CIOCCPHASH capable
Digest SHA256, NID=672, /dev/crypto info: id=104, driver=jh7110-sha256 (software), CIOCCPHASH capable
Digest SHA384, NID=673, /dev/crypto info: id=105, driver=jh7110-sha384 (software), CIOCCPHASH capable
Digest SHA512, NID=674, /dev/crypto info: id=106, driver=jh7110-sha512 (software), CIOCCPHASH capable

[Success]: DUMP_INFO

Note the driver=jh7110-ctr-aes (software) strings: /dev/crypto thinks that the driver is not hardware.
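
The DUMP_INFO listing is long, so a filter that reduces it to algorithm, driver, and the text in parentheses after the driver makes it easier to see what each algorithm maps to. A sketch only, assuming awk and the line layout shown above (devcrypto_drivers is a made-up helper name):

```shell
# Read `openssl engine -post DUMP_INFO -t -c` output on stdin and print
# "algorithm driver kind" for every cipher or digest that has a driver.
# "kind" is whatever text appears in parentheses after the driver name.
# devcrypto_drivers is a hypothetical helper name.
devcrypto_drivers() {
    awk '/^(Cipher|Digest) / && /driver=/ && !/failed/ {
        alg = $2
        sub(/,$/, "", alg)                   # "AES-128-CBC," -> "AES-128-CBC"
        drv = ""
        for (i = 3; i <= NF; i++)
            if ($i ~ /^driver=/) drv = substr($i, 8)
        kind = "unknown"
        if (match($0, / \([a-z ]+\)/))
            kind = substr($0, RSTART + 2, RLENGTH - 3)
        print alg, drv, kind
    }'
}

# Usage (2>&1 in case the engine writes the dump to stderr):
#   openssl engine -post DUMP_INFO -t -c 2>&1 | devcrypto_drivers | grep jh7110
```

With the listing above, every jh7110-* line comes out with the kind "software", which is the mismatch pointed out here.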

This option causes sshd to stop working on my VF2.
Maybe it is related to the issue SSH with openssl "mux digest failed" (Issue #56, cryptodev-linux/cryptodev-linux on GitHub)?

Well, I wanted to make OpenSSL faster because rsync and scp/sftp were slow. If we make OpenSSL faster but it cannot be used for sshd, then this is a showstopper.

I would roll back the change, and then record what sshd offers with a command like:

$ script sshd_protocols.txt
(local) $ nmap --script ssh2-enum-algos -sV -p 22 127.0.0.1
(remote) $ nmap --script ssh2-enum-algos -sV -p 22 ip_address_of_VF2
$ exit

Or you can find out what is offered by the server and available from the client, in preferential order with a command like:

(local) $ ssh -vv 0
(remote) $ ssh -vv ip_address_of_VF2

Hidden in the debug output (the proposal lists are printed at debug level 2, hence -vv), the lines you are looking for start with KEX algorithms:, host key algorithms:, ciphers ctos:, ciphers stoc:, MACs ctos:, MACs stoc: (ctos is client to server and stoc is the opposite direction). If there are problems with sshd in this particular case (CIPHERS = ALL, DIGESTS = NONE), the most important of these is probably going to be ciphers stoc. Unless, of course, “DIGESTS = NONE” is somehow disabling access for “MACs stoc”.

But that would show up if you log the speed of all working algorithms before making the change, with commands like:

$ script before_change.txt
$ openssl speed -seconds 1
$ exit

Then do the same again after reinstating the change, and compare what sshd expects against what was there before and what is now missing.

$ script after_change.txt
$ openssl speed -seconds 1
$ exit

If there is a problem with some algorithms you can always edit /etc/ssh/sshd_config on the VF2 and add new lines (man sshd_config) to prevent the ssh daemon from offering any default protocols that are broken by the change (- means disable).
e.g. (something similar to the following)
Ciphers -aes192-ctr
MACs -hmac-sha1

Or for now you could limit sshd to only offer protocols that are the fastest (see the results of the openssl speed and compare to the ciphers used by the sshd and ssh client), at least until any crypto problems are fixed.

Another option, if you just need higher throughput, might be enabling compression (zlib@openssh.com), which can sometimes help, unless the data you are transferring is already heavily compressed or not very compressible (multimedia files).

sftp -C username@ip_address_of_VF2
ssh -C username@ip_address_of_VF2

It seems the driver that is currently being upstreamed has better logic, which might improve performance:
https://patchwork.kernel.org/project/linux-riscv/cover/20230411081424.131912-1-jiajie.ho@starfivetech.com/

If /proc/crypto exists ( $ cat /proc/crypto ), you can also benchmark the kernel cipher and hash algorithms with commands like the following:

$ sudo modinfo tcrypt
$ sudo modprobe tcrypt mode=1000

mode=1000 means list algorithms that are available, or not, to the kernel.

$ sudo modprobe tcrypt sec=1 mode=200

mode=200 means run speed tests on AES encryption and decryption cipher algorithms in the kernel for various key lengths.
sec=1 means run each test for 1 second, and then test next algorithm

$ sudo modprobe tcrypt sec=1 mode=400 alg="sha256"

This tests only the speed of the asynchronous sha256 (sha256-generic) hashing algorithm.

The benchmark results can be viewed, in another terminal, with either $ sudo dmesg or $ sudo tail -f /var/log/kern.log
All output is sent to log files, so the kernel module command will appear to have hung and look like it is doing nothing at all. The prompt will return once the processing has finished.

Oh, sorry about that. So far I have only run tests in a chroot with bind mounts. I will update and report soon.

Yes, I can confirm this. My sshd does not use /dev/crypto (probably because I built it myself), but nginx fails while using it. I’ll see now whether that patch can resolve this.
UPD: nginx still fails even after applying the patch. This way cryptodev-linux is useless for me, even if nginx is configured with modern lines like these:

ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers CHACHA20:ECDHE:ECDSA:AES256:!AES128:ARIA256:!ARIA128:CAMELLIA256:!CAMELLIA128:!aNULL:!MD5:!SHA1:!SHA256:!SHA384:!CBC:!aPSK:!aDSS:!aRSA:!MEDIUM:!LOW;

Either it passes everything through, as suggested by the dmesg line "nginx" (31742) uses obsolete ecb(arc4) skcipher, or I need to explicitly make it cryptodev cipher-safe.

So it passed the OpenSSL test suite, but failed in real usage with both sshd and nginx? What a shame.

Yeah.
And so far I haven’t even been able to fix the nginx side. It still refuses to work and spits the same line to dmesg. It seems to require much more investigation, but I have little incentive to do that unless someone rolls out a driver which offloads crypto onto the S76/E24 core or something. I mean, it does not even perform well right now. So it is not worth trying.

At least kernel cryptography is accelerated and I can do FDE. That’s enough for now.

Have you tried the upstream version of the driver yet?
