Video editing software such as Shotcut or Premiere Pro, which also needs frame-by-frame navigation, solves this with a workflow known as "proxy editing".
Essentially, the software creates lower-resolution, editing-friendly versions of the files. You make your edits on these proxies, and when you are ready for the final export, the original footage is swapped back in.
This seems like the way to go. The fact that it is an industry standard there suggests that no amount of buffer optimization or GPU acceleration would truly solve the problem.
I could add a function to create a frame-by-frame-friendly version of the video. I think the simplest approach would be to create the file next to the original with a suffix, as this would make it possible to automatically detect the kva annotation file when opening the original video. This way we can have a similar workflow: add your annotations on the proxy, then export a video based on the original.
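A minimal sketch of the naming convention and the conversion step, written in Python for illustration only. Everything here is an assumption, not a decided design: the `_proxy` suffix, the `.mp4` proxy container, the idea that annotations are saved as `<proxy name>.kva` next to the proxy, and the choice of FFmpeg with all-intra H.264 (`-g 1`, every frame a keyframe) as the "frame-by-frame friendly" encoding. The helper names `proxy_path_for`, `annotations_for_original` and `proxy_command` are hypothetical.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical suffix; the actual naming convention is still open.
PROXY_SUFFIX = "_proxy"


def proxy_path_for(original: Path) -> Path:
    """Place the proxy next to the original, e.g. jump.mp4 -> jump_proxy.mp4."""
    # Assumption: the proxy always uses an .mp4 container.
    return original.with_name(original.stem + PROXY_SUFFIX + ".mp4")


def annotations_for_original(original: Path) -> Optional[Path]:
    """Return the proxy's .kva file if it exists, so the original can reuse it."""
    # Assumption: annotations made on the proxy are saved as <proxy name>.kva.
    candidate = proxy_path_for(original).with_suffix(".kva")
    return candidate if candidate.exists() else None


def proxy_command(original: Path, height: int = 540) -> List[str]:
    """Build an FFmpeg command that writes a lower-resolution, all-intra proxy."""
    return [
        "ffmpeg", "-i", str(original),
        # Scale to the target height; -2 keeps the aspect ratio with an even width.
        "-vf", f"scale=-2:{height}",
        # GOP size 1: every frame is a keyframe, so seeking to any frame is cheap.
        "-c:v", "libx264", "-g", "1", "-crf", "18", "-preset", "veryfast",
        # 4:2:0 chroma for broad decoder compatibility.
        "-pix_fmt", "yuv420p",
        str(proxy_path_for(original)),
    ]
```

The command could then be run with `subprocess.run(proxy_command(Path("jump.mp4")), check=True)`. All-intra encoding trades a larger file for fast random access, which is the property that matters for stepping frame by frame; the lower resolution partly compensates for the size increase.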
One caveat concerns measurements and tracking, since the lower resolution will reduce precision. But for the main use case of visually inspecting technique or posture, I think it would be a good solution.
There are two ways to go about it: either from the file browser, by right-clicking the thumbnail and using a simple convert menu, or by first opening the video and having an option under export.
The resolution should probably be chosen from a preset, while keeping the original aspect ratio.
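One way to keep the aspect ratio with a preset is to fix the target height and derive the width. The sketch below assumes hypothetical height-based presets (`low`, `medium`, `high`), does not upscale videos that are already smaller than the preset, and rounds both dimensions down to even numbers, since H.264 with 4:2:0 chroma subsampling requires even dimensions. The function name `proxy_size` is illustrative only.

```python
from typing import Tuple

# Hypothetical presets: target heights in pixels.
PRESETS = {"low": 360, "medium": 540, "high": 720}


def _even_floor(value: float) -> int:
    """Round down to the nearest even integer, never below 2."""
    return max(2, int(value // 2) * 2)


def proxy_size(width: int, height: int, preset: str) -> Tuple[int, int]:
    """Scale (width, height) to the preset height while keeping the aspect ratio."""
    # Never upscale: a video smaller than the preset keeps its own height.
    target_h = min(PRESETS[preset], height)
    target_w = width * target_h / height
    return _even_floor(target_w), _even_floor(target_h)
```

For example, a 1920x1080 video with the `medium` preset gives 960x540, and a portrait 1080x1920 video gives 302x540. Basing presets on height rather than width keeps portrait and landscape footage at a comparable vertical detail level.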