Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> Each time the [std::deque] slides, every value in the queue has to change position in memory

I don't think that this is true. `std::deque` has guaranteed O(1) complexity for insertions or deletions at the beginning or end of the container. It couldn't provide this if it was moving every value in the queue.

> How do we alter the stackSum when the queue moves?

You never quite answer this question. How are `sumOut` and `sumIn` related to `stackSum`? This is available in the code but I feel like there should at least be a one-line summary available without having to click through to the code.

> sumOut is what we need to add to the stack.

This is supposed to be `sumIn`?

The examples that you provide use a radius of 2, thus having a `sizeOfStack` of (radius+1)*2 = 9 and requiring a division by 9 for each pixel. Have you considered using a radius of 1 or a radius of 3 instead? This would require division by (1+1)*2 = 4 or (3+1)*2 = 16 respectively which could then be done with a bitshift instead of a division. You'd have to template the code on `radius` instead of passing it in as a runtime parameter so that the compiler could lower the divisions to bitshifts.



Thanks for the proofread! I had so much text juggling between this and the README that it was guaranteed some things would fall through the cracks! I updated the things you mentioned about `stackSum` and thanks for the catch on the `sumIn` definition.

> It couldn't provide this if it was moving every value in the queue.

I actually don't remember anymore what std::deque does under the hood, I did look into it, but the only thing I remember is that it was quite slow!

> You'd have to template the code on `radius` instead of passing it in as a runtime parameter so that the compiler could lower the divisions to bitshifts.

Yes, I really like this idea. Especially because radii only really vary between 1-48px for most drop shadow needs. It would be nice to have a handful of the common radii be ripping fast.


std::deque will never beat your hand-rolled ring buffer. It's a container, it will spend time managing memory every time you walk off the end of a chunk. If it's implemented right*, it will hold onto a chunk which fell off, and will only allocate twice. If it's implemented wrong*, it will allocate after `chunksize` pushes, and free every `chunksize` pops (which might be handled properly by your allocator). But it's still going to need to shuffle those chunks around, adding unnecessary overhead because you're using a ring in a way that is a perfect fit for your application: you use the value you're popping every time you push.

* for your purpose! Therein lies the challenge of writing standard libraries... choices must be made.


> std::deque will never beat your hand-rolled ring buffer.

std::deque may never beat a hand-rolled ring buffer of fixed size that maintains cursors, but it will beat a hand-rolled linked-list implementation.

The typical implementation will generally exhibit better cache locality than a linked list, and will outperform it for random access as required by the specification (amortized constant, vs linear).

(The typical implementation per cppreference, although I don't think this is formally required by the spec, is a sequence of fixed-length arrays.)


After some thought, I figured out a way to implement a chunked array of the sort specified by the c++ standard library, which does not require reallocation if the queue length stays below the chunksize. It involves storing cyclic offsets for each chunk.

So what I wrote isn't quite correct about needing to fiddle with memory regardless. But there is still overhead -- even if there's well-optimized code for the single chunk case, the container needs to check that condition.


Thanks for this!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: