Stateless refers to the HW interface. Stateless HW can accept decoding jobs in any order, as long as all the information is properly provided (parameters extracted and deduced from the bitstream, along with the previously decoded references). As a side effect, it is trivial to multiplex multiple streams on this type of HW.

The V4L2 layer keeps a bit of state (more like caching, to avoid re-uploading too much information for each job). Userspace is responsible for bitstream parsing and DPB management (including re-ordering).
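
For the curious, a rough sketch of what "properly provided" looks like with the mainline stateless H.264 controls (the control IDs and structs are the ones in linux/v4l2-controls.h; the media-request plumbing that ties controls to a specific buffer is elided, submit_decode_params is my own name, and error handling is omitted):

    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>
    #include <linux/v4l2-controls.h>

    /* Attach one decode job's worth of parsed state to a media request:
       SPS/PPS plus the per-frame decode params, which include the DPB
       (i.e. which previously decoded buffers act as references). */
    static int submit_decode_params(int video_fd, int request_fd,
                                    struct v4l2_ctrl_h264_sps *sps,
                                    struct v4l2_ctrl_h264_pps *pps,
                                    struct v4l2_ctrl_h264_decode_params *dec)
    {
        struct v4l2_ext_control ctrl[3];
        struct v4l2_ext_controls ctrls;

        memset(ctrl, 0, sizeof(ctrl));
        ctrl[0].id   = V4L2_CID_STATELESS_H264_SPS;
        ctrl[0].size = sizeof(*sps);
        ctrl[0].ptr  = sps;
        ctrl[1].id   = V4L2_CID_STATELESS_H264_PPS;
        ctrl[1].size = sizeof(*pps);
        ctrl[1].ptr  = pps;
        ctrl[2].id   = V4L2_CID_STATELESS_H264_DECODE_PARAMS;
        ctrl[2].size = sizeof(*dec);
        ctrl[2].ptr  = dec;

        memset(&ctrls, 0, sizeof(ctrls));
        ctrls.which      = V4L2_CTRL_WHICH_REQUEST_VAL;
        ctrls.request_fd = request_fd;
        ctrls.count      = 3;
        ctrls.controls   = ctrl;

        return ioctl(video_fd, VIDIOC_S_EXT_CTRLS, &ctrls);
    }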



Oh, so you just provide whatever reference frames, if any, are needed, and it's on you to make sure you've decoded what's necessary first? The difference here basically being that the hardware will not do the "bookkeeping"?


Correct.


Thanks for explaining.


Are there performance implications from needing to upload the entire state needed for a single frame? Or do none of these decoders have such caching anyway, so this just pushes the complex pieces of resource management out to user space, where they arguably belong?


I don't think anything is uploaded anywhere; you just need more RAM to keep frames around for as long as they are necessary. The decoder operates on data in system RAM.
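
In other words, the bookkeeping moves up a layer: userspace keeps its own table of decoded buffers and only recycles one once it has been displayed and no future frame can reference it. A toy sketch of that table (the names are mine, not from any V4L2 header):

    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_DPB 16  /* H.264 allows up to 16 reference frames */

    /* Userspace-side decoded picture buffer: frames stay pinned in
       system RAM for as long as the bitstream can still use them. */
    struct dpb_slot {
        void    *pixels;             /* decoded frame in system RAM */
        int32_t  frame_num;          /* identifier from the bitstream */
        bool     is_reference;       /* future frames may predict from it */
        bool     needed_for_output;  /* not yet displayed (re-ordering) */
    };

    struct dpb_slot dpb[MAX_DPB];

    /* A slot's memory can be recycled only when the frame has been
       displayed AND can no longer serve as a prediction reference. */
    static bool slot_free(const struct dpb_slot *s)
    {
        return !s->is_reference && !s->needed_for_output;
    }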


Is it generally faster that way, rather than having dedicated RAM alongside the ASIC? Or are the unit economics not worth it, and unified-memory systems are simply the dominant design these days?


Considering most pixels in the reference frames will be read, on average, less than once per generated frame, it makes no sense to have dedicated RAM.


But each reference frame is on average used for many generated frames, no? I mean that's kinda the point of them, isn't it?


Input is way smaller than output, so memory-performance considerations on the input side probably don't even register in the larger scheme of things (compared to having to write the decompressed frame to RAM and read it back again to scan it out to the display).

Say 4-8 KiB per frame on input turns into a ~4 MiB frame on output.
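
Back-of-envelope, assuming 1080p with 8-bit 4:2:0 output and a ~1.5 Mbit/s stream at 30 fps:

    #include <stdio.h>

    int main(void)
    {
        /* 8-bit 4:2:0: 1 byte of luma per pixel + half that for chroma */
        const double out = 1920.0 * 1080.0 * 1.5;  /* ~3 MiB per frame */
        const double in  = 6.0 * 1024.0;           /* ~6 KiB per frame at
                                                      ~1.5 Mbit/s, 30 fps */
        printf("output/input ratio: ~%.0fx\n", out / in);  /* ~500x */
        return 0;
    }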


I admit I'm not familiar with H.264; I thought the motion vectors and such were applied to the decompressed reference image. At least that's how we implemented the pseudo-MPEG1 encoder/decoder in class.

Not having to decompress the reference frame for every decoded frame seems like a win.
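
For illustration, the core of that kind of motion compensation is just a displaced block copy from the decoded reference (luma only; no sub-pel interpolation, bounds clipping, or residual add):

    #include <stdint.h>

    /* Predict one 16x16 macroblock at (bx, by) in the current frame by
       copying from the decoded reference frame, displaced by the motion
       vector (mvx, mvy). The decoded residual is then added on top. */
    static void motion_compensate(uint8_t *cur, const uint8_t *ref,
                                  int stride, int bx, int by,
                                  int mvx, int mvy)
    {
        for (int y = 0; y < 16; y++) {
            for (int x = 0; x < 16; x++) {
                int sx = bx + x + mvx;  /* source pixel in the reference */
                int sy = by + y + mvy;
                cur[(by + y) * stride + (bx + x)] = ref[sy * stride + sx];
            }
        }
    }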



