Experimental Gentoo Image

andrew · April 3, 2023, 6:26am

After copying the GPU files to my Gentoo install and installing OpenCL and PyOpenCL, I ran the Python benchmark below to compare the speed of the CPU and the GPU.

# example provided by Roger Pau Monné

import pyopencl as cl
import numpy
from time import time

a = numpy.random.rand(1000).astype(numpy.float32)
b = numpy.random.rand(1000).astype(numpy.float32)
c_result = numpy.empty_like(a)

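# CPU baseline: plain Python loop. The inner j loop repeats the same
# calculation 1000 times per element to match the kernel's loop.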
time1 = time()
for i in range(1000):
        for j in range(1000):
                c_result[i] = a[i] + b[i]
                c_result[i] = c_result[i] * (a[i] + b[i])
                c_result[i] = c_result[i] * (a[i] / 2.0)
time2 = time()
print ("Execution time of test without OpenCL: ", time2 - time1, "s")


for platform in cl.get_platforms():
    for device in platform.get_devices():
        print ("===============================================================")
        print ("Platform name:", platform.name)
        print ("Platform profile:", platform.profile)
        print ("Platform vendor:", platform.vendor)
        print ("Platform version:", platform.version)
        print ("---------------------------------------------------------------")
        print ("Device name:", device.name)
        print ("Device type:", cl.device_type.to_string(device.type))
        print ("Device memory: ", device.global_mem_size//1024//1024, 'MB')
        print ("Device max clock speed:", device.max_clock_frequency, 'MHz')
        print ("Device compute units:", device.max_compute_units)

        # Simple speed test
        ctx = cl.Context([device])
        queue = cl.CommandQueue(ctx, 
                properties=cl.command_queue_properties.PROFILING_ENABLE)

        mf = cl.mem_flags
        a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
        b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
        dest_buf = cl.Buffer(ctx, mf.WRITE_ONLY, b.nbytes)

        prg = cl.Program(ctx, """
            __kernel void sum(__global const float *a,
            __global const float *b, __global float *c)
            {
                        int loop;
                        int gid = get_global_id(0);
                        for(loop=0; loop<1000;loop++)
                        {
                                c[gid] = a[gid] + b[gid];
                                c[gid] = c[gid] * (a[gid] + b[gid]);
                                c[gid] = c[gid] * (a[gid] / 2.0f);  // 'f' keeps the literal single precision
                        }
                }
                """).build()

        exec_evt = prg.sum(queue, a.shape, None, a_buf, b_buf, dest_buf)
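        exec_evt.wait()
        # Profiling timestamps cover kernel execution only, not the
        # program build or reading the result back
        elapsed = 1e-9*(exec_evt.profile.end - exec_evt.profile.start)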

        print ("Execution time of test: %g s" % elapsed)

        c = numpy.empty_like(a)
        # Copy the result from the device buffer back into the host array c
        cl.enqueue_copy(queue, c, dest_buf).wait()

        # Compare the GPU result with the CPU result element by element
        if numpy.array_equal(c, c_result):
                print ("Results OK")
        else:
                print ("Results don't match!!")
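The CPU side of the benchmark above runs an interpreted Python loop over NumPy scalars, so its time is dominated by per-element Python overhead rather than by the arithmetic itself. A fairer CPU reference is sketched below, assuming only NumPy: it applies the same three steps to whole float32 arrays and repeats them 1000 times, like the kernel's loop. `cpu_vectorized` is a hypothetical helper name, and the time it prints will depend on the machine.

```python
import time
import numpy


def cpu_vectorized(a, b, repeats=1000):
    """Apply the benchmark's formula to whole arrays at once.

    Each repeat recomputes the same result, mirroring the kernel's
    1000-iteration loop; only the last repeat's value is returned.
    """
    c = numpy.empty_like(a)
    for _ in range(repeats):
        c = a + b
        c = c * (a + b)
        # numpy.float32 keeps the division in single precision
        c = c * (a / numpy.float32(2.0))
    return c


a = numpy.random.rand(1000).astype(numpy.float32)
b = numpy.random.rand(1000).astype(numpy.float32)

start = time.perf_counter()
c_vec = cpu_vectorized(a, b)
elapsed = time.perf_counter() - start
print("Execution time of vectorized NumPy test: %g s" % elapsed)
```

Because every step stays in float32, the vectorized result should match the element-wise `c_result` exactly, so the same exact comparison used for the GPU result still applies.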

Here is the output of the benchmark (saved as test.py):

StarFive ~ # python test.py 
Execution time of test without OpenCL:  34.012295722961426 s
===============================================================
Platform name: PowerVR
Platform profile: EMBEDDED_PROFILE
Platform vendor: Imagination Technologies
Platform version: OpenCL 3.0 
---------------------------------------------------------------
Device name: PowerVR B-Series BXE-4-32
Device type: ALL | GPU
Device memory:  7927 MB
Device max clock speed: 594 MHz
Device compute units: 1
Execution time of test: 0.014952 s
Results OK
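The two timings above put the GPU roughly three orders of magnitude ahead. The arithmetic is sketched below with a small hypothetical `speedup` helper and the two reported numbers. Note that the figures measure different things: the CPU time is wall-clock time for an interpreted Python loop, while the GPU time comes from the kernel's OpenCL profiling event, which excludes building the program and reading the result back.

```python
def speedup(baseline_s, accelerated_s):
    """Return how many times faster the accelerated run was than the baseline."""
    if accelerated_s <= 0:
        raise ValueError("accelerated time must be positive")
    return baseline_s / accelerated_s


# Timings reported in the run above
cpu_s = 34.012295722961426  # pure Python CPU loop (wall clock)
gpu_s = 0.014952            # kernel execution only (OpenCL profiling event)
print("GPU kernel speedup: about %.0fx" % speedup(cpu_s, gpu_s))
```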

bing · April 21, 2023, 12:20pm

In case someone wants to use a GUI with Gentoo on the StarFive VisionFive 2, I created a Gentoo overlay for building mesa-21.3.9 with img-gpu-powervr-bin-1.17.6210866: bing c / gentoo-overlay · GitLab

andrew · April 21, 2023, 1:42pm

@bing Thank you.

NOTE: You may want to review the directories you install to. For example, src_install() contains:

cd ${S}/target/usr/lib
insinto /usr/lib
doins *

On my system these libraries are installed in /usr/lib64.

cd ${S}/target/etc/vulkan/icd.d
insinto /usr/share/vulkan/icd.d
newins icdconf.json pvr_icd.json

On my system this file is installed in /etc/vulkan/icd.d.

bing · April 22, 2023, 1:26pm

Yeah, I think /usr/lib64 is the default path; I just used the DDK’s path.

I prefer to put the ICD file in /usr/share so that updates won’t overwrite the user’s custom config.

andrew · April 22, 2023, 6:57pm

We should stick to the Gentoo standards, or it may cause problems down the road with merge-usr and other changes.
See https://projects.gentoo.org/qa/policy-guide/filesystem.html

bing · April 23, 2023, 7:15am

Good point, I’ve updated the ebuild for img-gpu-powervr-bin to use /usr/lib64.
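After path changes like these, one way to confirm where the Vulkan ICD manifest was installed, and which library it points to, is to read the manifests directly. The sketch below is a hypothetical helper (`find_icd_manifests` is not part of any Gentoo or Vulkan tool). It only looks in the two directories discussed in this thread, although the Vulkan loader searches other locations as well, and it reads the `ICD.library_path` field of the standard ICD manifest JSON.

```python
import json
from pathlib import Path

# Only the two ICD directories discussed in this thread; the Vulkan
# loader searches further locations, so this list is not exhaustive.
ICD_DIRS = (Path("/etc/vulkan/icd.d"), Path("/usr/share/vulkan/icd.d"))


def find_icd_manifests(dirs=ICD_DIRS):
    """Return (manifest_path, library_path) for every *.json manifest found.

    library_path is None when the file cannot be read, is not valid JSON,
    or has no ICD.library_path entry.
    """
    found = []
    for d in dirs:
        if not d.is_dir():
            continue  # directory not present on this system
        for manifest in sorted(d.glob("*.json")):
            try:
                data = json.loads(manifest.read_text())
            except (OSError, ValueError):
                data = None  # unreadable file or invalid JSON
            icd = data.get("ICD") if isinstance(data, dict) else None
            lib = icd.get("library_path") if isinstance(icd, dict) else None
            found.append((str(manifest), lib))
    return found


if __name__ == "__main__":
    for manifest, lib in find_icd_manifests():
        print(manifest, "->", lib)
```

Seeing a manifest in both directories, or a `library_path` outside /usr/lib64, would point to a leftover from an older install.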

andrew · April 23, 2023, 10:01pm

Here are the results from the clpeak benchmark running under Gentoo on my VisionFive 2.
The source code is on GitHub: krrishnarraj/clpeak, "A tool which profiles OpenCL devices to find their peak capacities".

Platform: PowerVR
  Device: PowerVR B-Series BXE-4-32
    Driver version  : 1.17@6210866 (Linux unknown)
    Compute units   : 1
    Clock frequency : 594 MHz

    Global memory bandwidth (GBPS)
      float   : 1.29
      float2  : 2.34
      float4  : 7.01
      float8  : 3.02
      float16 : 4.69

    Single-precision compute (GFLOPS)
      float   : 8.92
      float2  : 17.53
      float4  : 17.08
      float8  : 10.61
      float16 : 12.29

    Half-precision compute (GFLOPS)
      half   : 8.43
      half2  : 17.52
      half4  : 17.07
      half8  : 10.59
      half16 : 12.29

    No double precision support! Skipped

    Integer compute (GIOPS)
      int   : 8.90
      int2  : 8.93
      int4  : 8.99
      int8  : 9.27
      int16 : 9.19

    Integer compute Fast 24bit (GIOPS)
      int   : 8.90
      int2  : 8.93
      int4  : 8.99
      int8  : 9.27
      int16 : 9.19

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 0.52
      enqueueReadBuffer               : 0.13
      enqueueWriteBuffer non-blocking : 0.52
      enqueueReadBuffer non-blocking  : 0.13
      enqueueMapBuffer(for read)      : 4977.81
        memcpy from mapped ptr        : 0.13
      enqueueUnmap(after write)       : 14166.56
        memcpy to mapped ptr          : 0.51

    Kernel launch latency : 71.72 us


Platform: Clover
clCreateContextFromType (-1)
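clpeak prints its results as indented `label : value` lines grouped under section titles, which makes the output easy to post-process. The sketch below is a hypothetical parser (not part of clpeak) that groups the numeric values by section and picks the highest entry in each. Lines whose value is not a plain number, such as the driver version or the kernel launch latency in microseconds, are skipped.

```python
import re

# "label : number" lines, e.g. "float4  : 7.01" or
# "enqueueWriteBuffer non-blocking : 0.52"
VALUE_RE = re.compile(r"^\s*(.+?)\s*:\s*([0-9.]+)\s*$")
# Section titles, e.g. "Single-precision compute (GFLOPS)"
SECTION_RE = re.compile(r"^\s*(.+?)\s*\((\w+)\)\s*$")


def parse_clpeak(text):
    """Group clpeak's numeric results as {section_title: {label: value}}.

    Lines outside a section (platform and device info) are ignored;
    a blank line ends the current section.
    """
    results = {}
    section = None
    for line in text.splitlines():
        if ":" not in line and SECTION_RE.match(line):
            section = line.strip()
            results[section] = {}
            continue
        value = VALUE_RE.match(line)
        if section and value:
            results[section][value.group(1)] = float(value.group(2))
        elif not line.strip():
            section = None
    return results


def best_entry(section_values):
    """Return the (label, value) pair with the highest value."""
    return max(section_values.items(), key=lambda kv: kv[1])
```

For the output above, `best_entry(parse_clpeak(text)["Single-precision compute (GFLOPS)"])` would return `("float2", 17.53)`, showing that float2 gave the best single-precision throughput in this run.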

andrew · April 1, 2024, 6:14pm

If you want to try one of the latest versions of Chromium on your VisionFive 2 running Gentoo, I have created a pull request on the riscv Gentoo Overlay. It is for www-client/chromium 123.0.6312.58 and takes 3-4 days to emerge.

I have it running and it seems faster than Firefox on my VisionFive 2.
If it works for you, please comment on the PR and let them know that you have successfully emerged it.

andrew · April 8, 2024, 9:23pm

My pull request has been merged and is now available on the riscv Gentoo Overlay.