More

apoorvgarg · on April 20, 2019

https://www.youtube.com/watch?v=37wFVVVZlVU

This recent panel (Armstrong, Hewitt and Hoare) discussion on concurrency was the first time I got to know about him and it gave me a lot of food for thought.

apoorvgarg · on April 4, 2019

Before we make a technology choice, we should be clear what those choices are. SQL is a query language and DynamoDB is a database. "NoSQL" technologies can be fronted with SQL interfaces. Relational databases can vary a lot too in their ACID compliance ("I" being the component with maximum variance).

The choice of technology should first be based on the problem and not whether it is serverless. Choosing DynamoDB when your application demands strict serializability would be totally wrong. I hope more people think about the semantic specifications of their applications (especially under concurrency).

apoorvgarg · on Feb 21, 2019

Yes, and I think the quoted paragraph has so much more to do with coding around interfaces (behaviour) than with abstraction using non-exported package symbols.

apoorvgarg · on Feb 21, 2019

Regarding single-letter variables, mathematical functions might be an exception. I think writing func gcd(a, b int) int {...} is better than other alternatives. There is simply no need to assign any more meaning to the arguments other than their type.

Cthulhu_ · on Feb 22, 2019

Maybe if you - and more importantly, everyone that will read the code in the future - are comfortable with domain-specific expressions like that. It depends on the audience really.

As an extreme example, scalaz is similarly a very specialized DSL. Or in my personal experience, functional constructions like map, flatMap, foldLeft and reduce (which I never learned in school).

bnolsen · on Feb 22, 2019

no, it's not. bigger deal is that single letter vars make it much harder to search for variable usage in files using arbitrary editors.

coldtea · on Feb 22, 2019

Don't use arbitrary editors, or editors that don't support "search for standalone identifier" (e.g. surrounded by space, in (), with ; after etc.".

apoorvgarg · on Feb 22, 2019

What would be the signature of your gcd function ?

base698 · on Feb 22, 2019

Numerator, denominator?

mrgriffin · on Feb 22, 2019

For GCD? I would find those names very misleading since semantically the order of the arguments to GCD is irrelevant (even if in the implementation you typically mod by b there's no reason you couldn't mod by a).

sacado2 · on Feb 22, 2019

This is not mathematically correct. Those names would be pretty misleading, actually.

coldtea · on Feb 22, 2019

Good luck with any bigger equation then

apoorvgarg · on Jan 11, 2019

To make your comment useful [0], I would suggest explaining why and giving some context. For example, you could start with the size of your AWS infra, which other tools are you using other than CloudWatch etc.

[0] https://news.ycombinator.com/newsguidelines.html

apoorvgarg · on Nov 18, 2018

The question is not suggesting a follow feature. It is just a hypothetical. The aim is to list great contributors on this platform, so that anyone looking at this thread can learn something new from their contributions.

apoorvgarg · on Nov 18, 2018

"Oracle DB is at least 10 years ahead of anything else"

Is this just sales talk or does it hold any technical truth?

apoorvgarg · on Oct 17, 2018

This might be useful https://github.com/ant-design/ant-design-pro

apoorvgarg · on Oct 9, 2018

Again, all this confusion is due to not being clear about virtual and physical memory. Every OS thread in linux does have a fixed virtual memory size (it is claimed when a thread is created, but this claim is on the virtual memory. This value is fixed at creation time). As you grow your program's conceptual stack in this virtual memory area, you would soon hit new pages of virtual memory, leading to page faults and linux allocating physical memory for you.

From a virtual memory standpoint, every thread would appear to have a fixed size which can be set when you create it. [0]

From a physical memory standpoint, every thread would appear to have a dynamic size (but bounded by the virtual memory size of course).

[0] http://man7.org/linux/man-pages/man3/pthread_create.3.html

apoorvgarg · on Oct 8, 2018

I think it is so important to not arbitrarily throw words like 'RAM' in such discussions, and use more unambiguous terminology such as - virtual address space and the resident set size (RSS).

On a 64-bit machine, with 1MB stack size (this is a claim on the virtual address space, not the physical memory), you can have millions of threads too (that you may hit a lower limit due to other OS level knobs is another matter).

timClicks · on Oct 8, 2018

But wouldn't the consequence of spawning millions of native threads be to spill those virtual addresses into many, many pages and incur many more TLB misses and page faults generally?

nostrademons · on Oct 8, 2018

Only the portions of the stack that are actually touched trigger a TLB miss and page fault. If you've got a million threads but each only touches the first 4K of stack space, you end up only touching 4G of RAM even though you've mapped 1T of it.

cryptonector · on Oct 9, 2018

When you have terabytes of RAM you want large pages anyways.

gpderetta · on Oct 9, 2018

As each stack will need to use [1] at least one page of physical memory, large pages are not great for stacks as they will normally waste memory.

[1] for classic contiguous stacks of course. That doesn't apply for separately allocated frames or segmented stacks.

cryptonector · on Oct 9, 2018

Sure, you can have millions of 1MB stacks, but unless you have terabytes of RAM performance is going to suffer.

If you write your app with more explicit state rather than implicit state (bound up in the stack), such as by writing a C10K style callback-hell, thread-per-CPU application, you're going to get much much better performance. The reason is that you'll be using less memory per-client, which means fewer cache fills to service any one client, which means less cache pressure, all of which means more performance.

The key point is that thread-per-client applications are just very inefficient in terms of memory use. The reason is that application state gets bound up in large stacks. Whereas if the programmer takes the time to make application state explicit (e.g., as in the context arguments to callbacks, or as in context structures closed over by callback closures) then the program's per-client footprint can shrink greatly.

Writing memory-efficient code is difficult. But it is necessary in order to get decent throughput and scale.

apoorvgarg · on Oct 9, 2018

Nobody is suggesting that you create a million threads. But to think that it is not possible without TBs of physical memory is a fallacy (for 64-bit machines).

The thread switching costs itself could be prohibitive to performance. IIRC, goroutine switching cost is roughly 1/10th of linux thread switching costs.

cryptonector · on Oct 9, 2018

It's "possible". If you care about performance, then it's not. Obviously I'm not defining "performance", but you'll know this when you see this. Paging is the kiss of death for performance.

apoorvgarg · on Oct 9, 2018

Paging has nothing to do with this. When I say 1 MB thread stacks, it means the maximum size of that thread's stack in the virtual address space [0]. Each of these million threads could be using only a few KBs for its stack (out of that 1 MB of stack space). That would imply a few gigs of physical memory => no paging.

[0] http://man7.org/linux/man-pages/man3/pthread_attr_setstacksi...