VisionFive hardware performance counters

Has anyone successfully enabled the hardware performance counters on the VisionFive board? The SiFive manual describes two programmable event counters that can count events such as ‘exceptions taken’ or ‘cache misses’. I have tried, with no success so far. If anyone has managed this, I would love an overview of how it is done. Thanks in advance!

Hi,
The performance monitoring hardware on the VisionFive doesn’t support interrupts, so perf sampling on hardware events isn’t available. However, I have been able to build a local version of the 6.2 kernel (GitHub - esmil/linux: Linux kernel source tree) and the tools/perf binary on my VisionFive. Below is example output from that system:

$ perf stat /bin/true                                                                
                                                                                                                      
 Performance counter stats for '/bin/true':                                                                           
                                                                                                                      
              2.51 msec task-clock                       #    0.549 CPUs utilized                                     
                 0      context-switches                 #    0.000 /sec                                              
                 0      cpu-migrations                   #    0.000 /sec                                              
                41      page-faults                      #   16.354 K/sec                                             
         2,500,833      cycles                           #    0.998 GHz                                               
         1,019,572      instructions                     #    0.41  insn per cycle                                    
     <not counted>      branches                                                                (0.00%)               
     <not counted>      branch-misses                                                           (0.00%)               
                                                                                                                      
       0.004562571 seconds time elapsed                                                                               
                                                                                                                      
       0.005002000 seconds user                                                                                       
       0.000000000 seconds sys

The following are the hardware and cache events that it lists out:

$ perf list hardware cache
                                                                                                                      
List of pre-defined events (to be used in -e or -M):                                                                  
                                                                                                                      
  branch-instructions OR branches                    [Hardware event]                                                 
  branch-misses                                      [Hardware event]                                                 
  bus-cycles                                         [Hardware event]                                                 
  cache-misses                                       [Hardware event]                                                 
  cache-references                                   [Hardware event]                                                 
  cpu-cycles OR cycles                               [Hardware event]                                                 
  instructions                                       [Hardware event]                                                 
  ref-cycles                                         [Hardware event]                                                 
  stalled-cycles-backend OR idle-cycles-backend      [Hardware event]                                                 
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]                                                 
  L1-dcache-load-misses                              [Hardware cache event]                                           
  L1-dcache-loads                                    [Hardware cache event]                                           
  L1-dcache-prefetch-misses                          [Hardware cache event]                                           
  L1-dcache-prefetches                               [Hardware cache event]                                           
  L1-dcache-store-misses                             [Hardware cache event]                                           
  L1-dcache-stores                                   [Hardware cache event]                                           
  L1-icache-load-misses                              [Hardware cache event]                                           
  L1-icache-loads                                    [Hardware cache event]                                           
  L1-icache-prefetch-misses                          [Hardware cache event]                                           
  L1-icache-prefetches                               [Hardware cache event]                                           
  LLC-load-misses                                    [Hardware cache event]                                           
  LLC-loads                                          [Hardware cache event]                                           
  LLC-prefetch-misses                                [Hardware cache event]                                           
  LLC-prefetches                                     [Hardware cache event]                                           
  LLC-store-misses                                   [Hardware cache event]                                           
  LLC-stores                                         [Hardware cache event]                                           
  branch-load-misses                                 [Hardware cache event]                                           
  branch-loads                                       [Hardware cache event]                                           
  dTLB-load-misses                                   [Hardware cache event]                                           
  dTLB-loads                                         [Hardware cache event]                                           
  dTLB-prefetch-misses                               [Hardware cache event]                                           
  dTLB-prefetches                                    [Hardware cache event]                                           
  dTLB-store-misses                                  [Hardware cache event]                                           
  dTLB-stores                                        [Hardware cache event]                                           
  iTLB-load-misses                                   [Hardware cache event]                                           
  iTLB-loads                                         [Hardware cache event]                                           
  node-load-misses                                   [Hardware cache event]                                           
  node-loads                                         [Hardware cache event]                                           
  node-prefetch-misses                               [Hardware cache event]                                           
  node-prefetches                                    [Hardware cache event]                                           
  node-store-misses                                  [Hardware cache event]                                           
  node-stores                                        [Hardware cache event]

If you want to do sampling on the machine with perf, you will need to use a software event, as in the following example:

perf record -e cpu-clock du
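
As a rough sketch of the full record-and-report cycle with that software event: the helper name `sample_with_cpu_clock`, the `PERF` override variable, and the `perf.data` file name are illustrative assumptions, not part of the setup described above.

```shell
#!/bin/sh
# PERF can be overridden (for example PERF=echo) to print the commands
# instead of running them. This variable is a convention of this sketch only.
PERF="${PERF:-perf}"

# Hypothetical helper: sample a command with the cpu-clock software event,
# then print a text report from the recorded data.
sample_with_cpu_clock() {
    # cpu-clock is timer driven, so it does not rely on PMU overflow interrupts.
    # -o names the output file; everything after "--" is the profiled command.
    $PERF record -e cpu-clock -o perf.data -- "$@" &&
    # -i selects the input file; --stdio prints the report as plain text.
    $PERF report -i perf.data --stdio
}

# Example invocation:
# sample_with_cpu_clock du
```

Wrapping the two steps in a function simply keeps the recorded file and the report paired; running the two perf commands by hand should work the same way.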

-Will Cohen

Thanks for the reply! I actually got them enabled as well, almost right after I posted :).
However, I noticed that most of the counters show ‘<not counted>’:

perf stat -e cache-misses -e dTLB-load-misses -e dTLB-store-misses -e cycles -e instructions -e iTLB-load-misses -e iTLB-loads -e exception_taken -e integer_load_retired -e integer_store_retired -e atomic_memory_retired -e dcache_miss_mmio_accesses perf bench sched messaging -g 5 -l 5
Running 'sched/messaging' benchmark:
20 sender and receiver processes per group
5 groups == 200 processes run

 Total time: 0.177 [sec]

Performance counter stats for 'perf bench sched messaging -g 5 -l 5':

 <not counted>      cache-misses                                                            (0.00%)
 <not counted>      dTLB-load-misses                                                        (0.00%)
 <not counted>      dTLB-store-misses                                                       (0.00%)
     270313389      cycles                                                                  (19.80%)
     111009817      instructions                     #    0.41  insn per cycle              (34.09%)
 <not counted>      iTLB-load-misses                                                        (0.00%)
 <not counted>      iTLB-loads                                                              (0.00%)
 <not counted>      exception_taken                                                         (0.00%)
 <not counted>      integer_load_retired                                                    (0.00%)
 <not counted>      integer_store_retired                                                   (0.00%)
 <not counted>      atomic_memory_retired                                                   (0.00%)
 <not counted>      dcache_miss_mmio_accesses                                               (0.00%)

   0.404258240 seconds time elapsed

   0.203738000 seconds user
   0.415510000 seconds sys

My initial thought is that I may need to update OpenSBI so that it writes to the mcountinhibit CSR (a machine-mode register of its own, separate from mstatus). Or is it possible that these events are not supported on this board? Thanks in advance!
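
Since the manual describes only two programmable event counters, one way to narrow this down might be to request the events one at a time, so that an event that is merely never scheduled can be told apart from one that never counts at all. The sketch below is only that: the helper name `stat_each_event` and the `PERF` override variable are assumptions made for illustration, and the workload and event names are copied from the command above.

```shell
#!/bin/sh
# PERF can be overridden (for example PERF=echo) to print the commands
# instead of running them. This variable is a convention of this sketch only.
PERF="${PERF:-perf}"

# Workload copied from the perf stat command in this post.
WORKLOAD="perf bench sched messaging -g 5 -l 5"

# Hypothetical helper: run perf stat once per event given as an argument,
# so each run asks for at most one programmable counter.
stat_each_event() {
    for ev in "$@"; do
        echo "== $ev =="
        # cycles and instructions are requested in every run as a sanity
        # check, since both of them did report values in the run above.
        $PERF stat -e cycles -e instructions -e "$ev" -- $WORKLOAD
    done
}

# Example invocation:
# stat_each_event cache-misses dTLB-load-misses exception_taken
```

If an event still shows `<not counted>` when it is the only extra event requested, counter multiplexing is probably not the cause, and the firmware or kernel configuration becomes the more likely suspect.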