While I rsync / scp / sftp, one of the CPU cores is 100% used, and the total bandwidth isn’t great either. (The files are on NVMe.)
So, as the title asks, does openssl/libressl already use the hardware crypto engine?
Did you change the encryption algorithm away from the default (chacha20-poly1305@openssh.com, mentioned in the documentation, $ man sshd_config) to an algorithm that is currently supported by the JH7110 encryption engine (aes128-cbc, aes192-cbc, aes256-cbc, aes128-ctr, aes192-ctr, aes256-ctr; struck through: 3des-cbc, aes128-gcm@openssh.com, aes256-gcm@openssh.com) with a valid supported key length (32/64/96/128/160/192/224/256-bit) for the hardware? 3des could be supported in hardware but no software exists for that (yet), and I suspect that the last two that I put a line through are actually supported, but I am not 100% sure, so I only left the crypto algorithms that I would expect to work with the software that is in place (some, maybe not enough yet) for hardware acceleration.
e.g.
$ sftp -c aes256-ctr username@IP_address_of_VF2
$ ssh -Q cipher
Will list the local ciphers available for ssh (and sftp) to use.
$ nmap --script ssh2-enum-algos -sV -p 22 IP_address_of_VF2
Will list the available KexAlgorithms, HostKeyAlgorithms, Ciphers and MACs offered by sshd on a remote machine.
On the VF2, confirm that the API is exposed from the kernel ($ cat /proc/crypto).
I already set it as you said, but I have no idea whether openssl is already offloading to the hardware crypto engine or not.
If it adds a little information, cryptsetup -c aes-xts-plain64 -s 512 benchmark will pop the [16000000.crypto] process to the top, but in my experience there is no benefit:
root@serval:/tmp/r# cryptsetup -c aes-xts-plain64 -s 512 benchmark
# Tests are approximate using memory only (no storage IO).
# Algorithm | Key | Encryption | Decryption
aes-xts 512b 26.2 MiB/s 26.2 MiB/s
vs
root@serval:/tmp/r# openssl speed -evp aes-256-xts
Doing AES-256-XTS for 3s on 16 size blocks: 2404619 AES-256-XTS's in 2.97s
Doing AES-256-XTS for 3s on 64 size blocks: 1005061 AES-256-XTS's in 2.97s
Doing AES-256-XTS for 3s on 256 size blocks: 301774 AES-256-XTS's in 2.96s
Doing AES-256-XTS for 3s on 1024 size blocks: 79557 AES-256-XTS's in 2.97s
Doing AES-256-XTS for 3s on 8192 size blocks: 10092 AES-256-XTS's in 2.97s
Doing AES-256-XTS for 3s on 16384 size blocks: 5036 AES-256-XTS's in 2.97s
version: 3.0.8
built on: Fri Apr 7 14:51:31 2023 UTC
options: bn(64,64)
compiler: riscv64-slackware-linux-clang -fPIC -pthread -menable-experimental-extensions -mabi=lp64d -march=rv64imafdczbb_zba -mcpu=sifive-u74 -mtune=sifive-7-series -O2 -pipe -fomit-frame-pointer --param l1-cache-size=32 --param l2-cache-size=2048 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DZLIB -DNDEBUG -menable-experimental-extensions -mabi=lp64d -march=rv64imafdczbb_zba -mcpu=sifive-u74 -mtune=sifive-7-series -O2 -pipe -fomit-frame-pointer --param l1-cache-size=32 --param l2-cache-size=2048
CPUINFO: N/A
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
AES-256-XTS 12954.18k 21657.88k 26099.37k 27429.75k 27836.25k 27781.09k
The difference is that cryptsetup uses the kernel driver whilst OpenSSL does not.
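To compare the two benchmarks directly, note that cryptsetup prints MiB/s while openssl speed prints thousands of bytes per second. A minimal conversion sketch (the helper name kbytes_to_mib is made up for illustration):

```shell
# Convert an "openssl speed" figure (1000s of bytes per second) to MiB/s.
# kbytes_to_mib is a hypothetical helper, not part of any tool.
kbytes_to_mib() {
    awk -v k="$1" 'BEGIN { printf "%.2f\n", k * 1000 / 1048576 }'
}

kbytes_to_mib 27781.09   # AES-256-XTS, 16384-byte blocks, from the run above
```

This prints 26.49, so the software OpenSSL result is roughly the same as the 26.2 MiB/s that cryptsetup reports through the kernel driver.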
What happens if you use one of the supported aes drivers (jh7110-ecb-aes, jh7110-cbc-aes, jh7110-ctr-aes, jh7110-cfb-aes, jh7110-ofb-aes, jh7110-gcm-aes, jh7110-ccm-aes) and a supported key length?
The hardware only supports key lengths of 32/64/96/128/160/192/224/256 bits (and the software may not support all of them yet).
So something like:
$ openssl speed -evp aes-256-cbc
$ openssl speed -decrypt -evp aes-256-cbc
$ /sbin/cryptsetup benchmark --cipher aes-cbc --key-size=256
$ openssl speed -evp aes-128-cbc
$ openssl speed -decrypt -evp aes-128-cbc
$ sudo cryptsetup benchmark -c aes-cbc -s 128
$ /sbin/cryptsetup benchmark --cipher aes-cbc --key-size=128
Clues may be gained from:
$ cat /proc/crypto | grep -i aes | grep -i cbc
Does openssl list an engine:
$ openssl engine -c
It could end up being something like:
$ openssl speed -engine jh7110 -evp aes-256-cbc
$ openssl speed -engine jh7110-cbc-aes -evp aes-256-cbc
$ /sbin/cryptsetup benchmark --cipher jh7110-cbc-aes --key-size=256
For now, it might even require a custom configuration file (similar to FIPS). e.g.
OPENSSL_CONF=/my/nondefault/openssl.cnf openssl speed -evp aes-256-cbc
(I’m nowhere near a VF2 right now).
Maybe I should try cryptodev-linux (GitHub - cryptodev-linux/cryptodev-linux: Cryptodev-linux is a Linux-kernel device that allows user-space access to hardware cryptographic accelerators) and then rebuild openssl?
Edit: just found this package in the AUR: AUR (en) - cryptodev-linux
Try using an old version of openssh (maybe < 7.0 or 6.5? I forget.)
It has a zero-crypt engine.
Good news! I just successfully built cryptodev and openssl with the devcrypto engine. The results are below: on the left is the original Arch Linux openssl, on the right is openssl with devcrypto as the default engine.
Thank you. I was expecting more. Now I’m wondering if the performance increase was due to running in the kernel space or if it really was accessing the encryption engine in the JH7110.
Thanks, I’ve got it working too, although it’s very quirky (OpenSSL devs, stop pretending to be smart in every area you touch!)
root@serval:/tmp/r# openssl speed -engine devcrypto -evp aes-256-ctr -elapsed
Engine "devcrypto" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing AES-256-CTR for 3s on 16 size blocks: 92533 AES-256-CTR's in 3.00s
Doing AES-256-CTR for 3s on 64 size blocks: 43929 AES-256-CTR's in 3.00s
Doing AES-256-CTR for 3s on 256 size blocks: 38245 AES-256-CTR's in 3.00s
Doing AES-256-CTR for 3s on 1024 size blocks: 34041 AES-256-CTR's in 3.00s
Doing AES-256-CTR for 3s on 8192 size blocks: 14204 AES-256-CTR's in 3.00s
Doing AES-256-CTR for 3s on 16384 size blocks: 7766 AES-256-CTR's in 3.00s
version: 3.0.8
built on: Mon Apr 10 15:09:42 2023 UTC
options: bn(64,64)
compiler: riscv64-slackware-linux-clang -fPIC -pthread -menable-experimental-extensions -mabi=lp64d -march=rv64imafdczbb_zba -mcpu=sifive-u74 -mtune=sifive-7-series -O2 -pipe -fomit-frame-pointer --param l1-cache-size=32 --param l2-cache-size=2048 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DZLIB -DNDEBUG -menable-experimental-extensions -mabi=lp64d -march=rv64imafdczbb_zba -mcpu=sifive-u74 -mtune=sifive-7-series -O2 -pipe -fomit-frame-pointer --param l1-cache-size=32 --param l2-cache-size=2048
CPUINFO: N/A
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
AES-256-CTR 493.51k 937.15k 3263.57k 11619.33k 38786.39k 42412.71k
The [16000000.crypto] process popped up during the test, and the working frequency/mode is also important: I get only about 27M/s if my cpufreq governor is set to ondemand (perhaps it aggressively saves clocks, or frequent transitions cause lags). The result above is from a core running at 1500MHz.
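The throughput row in the openssl speed output can be reproduced from the operation counts, which also gives a rough per-request cost. A small sketch using the numbers from the run above (the function names are hypothetical):

```shell
# throughput_k COUNT BLOCK_BYTES SECONDS -> 1000s of bytes per second
throughput_k() {
    awk -v n="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.2f\n", n * b / s / 1000 }'
}

# usec_per_op COUNT SECONDS -> average microseconds per operation
usec_per_op() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", s * 1000000 / n }'
}

throughput_k 92533 16 3.00      # 16-byte blocks    -> 493.51
throughput_k 7766 16384 3.00    # 16384-byte blocks -> 42412.71
usec_per_op 92533 3.00          # one 16-byte request -> 32.42
```

With these figures each 16-byte request takes about 32 microseconds of wall-clock time, which would suggest that the fixed cost per request dominates for small blocks and that only large blocks get anywhere near the engine's real throughput.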
Note that you need to edit /etc/ssl/openssl.cnf and include the following in the area shown:
# To use this configuration file with the "-extfile" option of the
# "openssl x509" utility, name here the section containing the
# X.509v3 extensions to use:
# extensions =
# (Alternatively, use a configuration file that has only
# X.509v3 extensions in its main [= default] section.)
openssl_conf=openssl_conf
[openssl_conf]
engines=engines
[engines]
devcrypto=devcrypto
[devcrypto]
default_algorithms = ALL
USE_SOFTDRIVERS = 1
CIPHERS = ALL
DIGESTS = NONE
As suggested by the OpenWrt community: Solved (sorta) /dev/crypto No Ciphers or Hashes - #28 by rossb - For Developers - OpenWrt Forum
To check whether it really uses /dev/crypto, run openssl engine -t -c:
(dynamic) Dynamic engine loading support
[ unavailable ]
(devcrypto) /dev/crypto engine
[AES-128-CBC, AES-192-CBC, AES-256-CBC, AES-128-CTR, AES-192-CTR, AES-256-CTR, AES-128-ECB, AES-192-ECB, AES-256-ECB, CAMELLIA-128-CBC, CAMELLIA-192-CBC, CAMELLIA-256-CBC]
[ available ]
The reason why USE_SOFTDRIVERS = 1 is suggested is probably this:
root@serval:/usr/src# openssl engine -post DUMP_INFO -t -c
(dynamic) Dynamic engine loading support
[ unavailable ]
(devcrypto) /dev/crypto engine
[AES-128-CBC, AES-192-CBC, AES-256-CBC, AES-128-CTR, AES-192-CTR, AES-256-CTR, AES-128-ECB, AES-192-ECB, AES-256-ECB, CAMELLIA-128-CBC, CAMELLIA-192-CBC, CAMELLIA-256-CBC]
[ available ]
Information about ciphers supported by the /dev/crypto engine:
Cipher DES-CBC, NID=31, /dev/crypto info: id=1, CIOCGSESSION (session open call) failed
Cipher DES-EDE3-CBC, NID=44, /dev/crypto info: id=2, CIOCGSESSION (session open call) failed
Cipher BF-CBC, NID=91, /dev/crypto info: id=3, CIOCGSESSION (session open call) failed
Cipher CAST5-CBC, NID=108, /dev/crypto info: id=4, CIOCGSESSION (session open call) failed
Cipher AES-128-CBC, NID=419, /dev/crypto info: id=11, driver=jh7110-cbc-aes (software)
Cipher AES-192-CBC, NID=423, /dev/crypto info: id=11, driver=jh7110-cbc-aes (software)
Cipher AES-256-CBC, NID=427, /dev/crypto info: id=11, driver=jh7110-cbc-aes (software)
Cipher RC4, NID=5, /dev/crypto info: id=12, CIOCGSESSION (session open call) failed
Cipher AES-128-CTR, NID=904, /dev/crypto info: id=21, driver=jh7110-ctr-aes (software)
Cipher AES-192-CTR, NID=905, /dev/crypto info: id=21, driver=jh7110-ctr-aes (software)
Cipher AES-256-CTR, NID=906, /dev/crypto info: id=21, driver=jh7110-ctr-aes (software)
Cipher AES-128-ECB, NID=418, /dev/crypto info: id=23, driver=jh7110-ecb-aes (software)
Cipher AES-192-ECB, NID=422, /dev/crypto info: id=23, driver=jh7110-ecb-aes (software)
Cipher AES-256-ECB, NID=426, /dev/crypto info: id=23, driver=jh7110-ecb-aes (software)
Cipher CAMELLIA-128-CBC, NID=751, /dev/crypto info: id=101, driver=cbc(camellia-generic) (software)
Cipher CAMELLIA-192-CBC, NID=752, /dev/crypto info: id=101, driver=cbc(camellia-generic) (software)
Cipher CAMELLIA-256-CBC, NID=753, /dev/crypto info: id=101, driver=cbc(camellia-generic) (software)
Information about digests supported by the /dev/crypto engine:
Digest MD5, NID=4, /dev/crypto info: id=13, driver=md5-generic (software), CIOCCPHASH capable
Digest SHA1, NID=64, /dev/crypto info: id=14, driver=jh7110-sha1 (software), CIOCCPHASH capable
Digest RIPEMD160, NID=117, /dev/crypto info: id=102, driver=unknown. CIOCGSESSION (session open) failed
Digest SHA224, NID=675, /dev/crypto info: id=103, driver=jh7110-sha224 (software), CIOCCPHASH capable
Digest SHA256, NID=672, /dev/crypto info: id=104, driver=jh7110-sha256 (software), CIOCCPHASH capable
Digest SHA384, NID=673, /dev/crypto info: id=105, driver=jh7110-sha384 (software), CIOCCPHASH capable
Digest SHA512, NID=674, /dev/crypto info: id=106, driver=jh7110-sha512 (software), CIOCCPHASH capable
[Success]: DUMP_INFO
Note the driver=jh7110-ctr-aes (software) strings: /dev/crypto thinks that it’s not hardware.
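One way to cross-check which drivers the kernel has registered, independent of what the engine reports, is to pull the name, driver and priority fields out of /proc/crypto. A sketch, assuming the usual layout of that file (blank-line-separated entries of "key : value" lines); the function name is made up:

```shell
# Print "name driver priority" for every /proc/crypto entry whose driver
# name matches the given pattern (default: jh7110). Reads from stdin.
# list_crypto_drivers is a hypothetical helper.
list_crypto_drivers() {
    awk -v pat="${1:-jh7110}" '
        BEGIN { RS = ""; FS = "\n" }     # one record per blank-line-separated entry
        {
            name = driver = prio = ""
            for (i = 1; i <= NF; i++) {
                split($i, kv, /[ \t]*:[ \t]*/)
                if (kv[1] == "name")     name = kv[2]
                if (kv[1] == "driver")   driver = kv[2]
                if (kv[1] == "priority") prio = kv[2]
            }
            if (driver ~ pat) print name, driver, prio
        }'
}

# On the board itself:
#   list_crypto_drivers jh7110 < /proc/crypto
```

When several drivers provide the same algorithm name, the kernel normally picks the one with the highest priority, so comparing the priority of the jh7110 entry with the generic one for the same name shows which implementation a request for, say, cbc(aes) should end up on.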
This option causes sshd to stop working on my VF2.
Maybe it is related to this: SSH with openssl "mux digest failed" · Issue #56 · cryptodev-linux/cryptodev-linux · GitHub?
Well, I wanted to make openssl faster because rsync and scp/sftp were slow. If we get openssl faster but it cannot be used for sshd, then that is a show stopper.
I would roll back the change, and then record the output of what is offered by sshd with a command like:
$ script sshd_protocols.txt
(local) $ nmap --script ssh2-enum-algos -sV -p 22 127.0.0.1
(remote) $ nmap --script ssh2-enum-algos -sV -p 22 ip_address_of_VF2
$ exit
Or you can find out what is offered by the server and available from the client, in preferential order with a command like:
(local) $ ssh -vv 0
(remote) $ ssh -vv ip_address_of_VF2
Hidden in the debug output, the lines you are looking for start with KEX algorithms:, host key algorithms:, ciphers ctos:, ciphers stoc:, MACs ctos: and MACs stoc: (ctos is client to server and stoc is the opposite direction). The most important of these, if there are problems with sshd, is probably going to be ciphers stoc in this particular case (CIPHERS = ALL, DIGESTS = NONE). Unless of course “DIGESTS = NONE” is somehow disabling access for “MACs stoc”.
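To avoid hunting through the whole debug log, the proposal lines can be filtered out. A minimal sketch, assuming the lines are printed with the labels listed above (on the OpenSSH versions I know of they appear at the debug2 level, i.e. with ssh -vv); the function name is hypothetical:

```shell
# Keep only the algorithm proposal lines from ssh debug output on stdin.
# kex_proposals is a made-up helper name.
kex_proposals() {
    grep -E '(KEX algorithms|host key algorithms|ciphers (ctos|stoc)|MACs (ctos|stoc)): '
}

# Example use (debug output goes to stderr, hence 2>&1):
#   ssh -vv ip_address_of_VF2 exit 2>&1 | kex_proposals
```

The client and server proposals are printed one after the other, so each label normally shows up twice in the filtered output.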
But that would show up if you log the speed of all working algorithms before making a change, with commands like:
$ script before_change.txt
$ openssl speed -seconds 1
$ exit
And then again after reinstating the change, so that you can compare what sshd expects against what was available before and what is now missing.
$ script after_change.txt
$ openssl speed -seconds 1
$ exit
If there is a problem with some algorithms you can always edit /etc/ssh/sshd_config on the VF2 and add new lines (man sshd_config) to prevent the ssh daemon from offering any default protocols that are broken by the change (- means disable).
e.g. (something similar to the following)
Ciphers -aes192-ctr
MACs -hmac-sha1
Or for now you could limit sshd to only offer the protocols that are the fastest (see the results of openssl speed and compare them to the ciphers used by sshd and the ssh client), at least until any crypto problems are fixed.
Another option, if you just need higher throughput, might be enabling compression (zlib@openssh.com), which can sometimes help, unless the data you are transferring is already heavily compressed or not very compressible (multimedia files).
sftp -C username@ip_address_of_VF2
ssh -C username@ip_address_of_VF2
It seems the driver that is currently being upstreamed has better logic, which might improve performance:
https://patchwork.kernel.org/project/linux-riscv/cover/20230411081424.131912-1-jiajie.ho@starfivetech.com/
If /proc/crypto exists ($ cat /proc/crypto), you can also benchmark the kernel cipher and hash algorithms with commands like the following:
$ sudo modinfo tcrypt
$ sudo modprobe tcrypt mode=1000
mode=1000 means list algorithms that are available, or not, to the kernel.
$ sudo modprobe tcrypt sec=1 mode=200
mode=200 means run speed tests on AES encryption and decryption cipher algorithms in the kernel for various key lengths.
sec=1 means run each test for 1 second, and then test next algorithm
$ sudo modprobe tcrypt sec=1 mode=400 alg="sha256"
Only test the speed of asynchronous sha256 (sha256-generic) hashing algorithm.
The benchmark results can be viewed, in another terminal, with either $ sudo dmesg or $ sudo tail -f /var/log/kern.log
All output is sent to log files, so the kernel module command will appear to have hung and look like it is doing nothing at all. The prompt will return once the processing has finished.
Oh, sorry about that. To date, I did tests only in chroot with bind mounts. I will update & report soon.
Yes, I can confirm this, although my sshd does not use /dev/crypto (probably because I built it myself), but nginx fails while using it. I’ll see now how that patch can resolve this.
UPD. nginx still fails even after applying this patch. cryptodev-linux is useless for me this way, even if nginx is configured with modern lines like these:
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers CHACHA20:ECDHE:ECDSA:AES256:!AES128:ARIA256:!ARIA128:CAMELLIA256:!CAMELLIA128:!aNULL:!MD5:!SHA1:!SHA256:!SHA384:!CBC:!aPSK:!aDSS:!aRSA:!MEDIUM:!LOW;
Either it passes everything through, as seen in the dmesg line "nginx" (31742) uses obsolete ecb(arc4) skcipher, or I need to explicitly make it cryptodev cipher-safe.
So it passed the openssl test suite, but failed in real usage with both sshd and nginx? Such a shame.
Yeah.
And to this day, I wasn’t even able to fix the nginx side. It still refuses to work and spits the same line to dmesg. It seems to require much more investigation, but I don’t have much incentive to do so unless someone rolls out a driver which offloads crypto onto the S76/E24 core or similar. I mean, it doesn’t even perform well right now. So it’s not worth trying.
At least kernel cryptography is accelerated and I can do FDE. That’s enough for now.
Have you tried the upstream version of the driver yet?