I would like to share that we have just managed to run sherpa-ncnn, a subproject of next-gen Kaldi, on the VisionFive2 for real-time speech recognition with a USB microphone.
You can find the documentation at
Everything is open source: the code, the model, the data, and the documentation.
The video demo is available at
Have you tried the same thing on the Raspberry Pi 4? If so, how do they compare?
I have tried it on a Raspberry Pi 4 Model B. It is faster than the VisionFive2 and can also run a larger model in real time.
VisionFive2 can only run a smaller model in real time.
By real time, I mean that the real-time factor (RTF), i.e., the processing time divided by the duration of the audio, is less than 1.
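In case it helps anyone reading along, here is a minimal sketch of how the RTF check works. The function names `real_time_factor` and `is_real_time` are my own for illustration and are not part of sherpa-ncnn, and the timings in the example are made-up numbers, not measurements from either board.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return the RTF: time spent decoding divided by the audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def is_real_time(rtf: float) -> bool:
    """A system keeps up with the incoming audio when its RTF is below 1."""
    return rtf < 1.0


if __name__ == "__main__":
    # Hypothetical numbers: 4 seconds of compute for 10 seconds of audio.
    rtf = real_time_factor(processing_seconds=4.0, audio_seconds=10.0)
    print(f"RTF = {rtf:.2f}, real-time: {is_real_time(rtf)}")
```

An RTF above 1 means the decoder falls behind the microphone, so the latency keeps growing the longer you speak.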