V8 does heavy code optimization, and the Node.js http server code should be well optimized too. Maybe the load balancer for the multicore Node test isn't optimized enough. So these results feel a little shady. Anyway, Vert.x was the trigger for me to finally download the JVM (JDK) and try Vert.x (and maybe Clojure later).
But there's still one major drawback: the nonexistent ecosystem. I know the hint to look for libs from the Java world, but I need a concrete and precise guide on how to do this. Let's say I want to plug in some Java lib for image manipulation. How? And who will guarantee that these libs will be concurrent and/or non-blocking as well? The lib developer or Vert.x? At the moment there is, except for a few out-of-the-box modules, nothing. No 3rd party libs, no module manager a la npm and no guide or documentation on how to glue Java libs to this Vert.x thing. Nothing. Correct me if I'm wrong.
It's funny, the path software technology takes. A Java clone of a fairly basic web engine (which is a fairly simple application type to begin with) convinces people to try the JVM, which is the most performant managed runtime around and definitely one of the most advanced pieces of software ever created. I'm sometimes amazed that people coming from easy-to-use web languages have never been exposed to it, and that if they have, they still have their doubts.
Dude, I've been developing real-time military applications and we've moved from C/C++ to Java and got performance gains (I'm not saying that C can't beat Java doing specific computations - of course it can - but when building a large application with millions of lines of code and a large team of developers, Java would be a safer bet than C++ if speed is your concern). In other words, no other environment can beat the JVM.
And, yeah, you should try Clojure. It's a cathartic experience.
EDIT: The only doubt I have about vert.x is whether it can consistently beat "old" JVM servlet containers under heavy, real-world loads. That remains to be seen.
Nonexistent ecosystem is a harsh assessment, especially given the massive amount of Java code out there. Any jar file can be used, so the Java ImageIO package can easily be leveraged. It's early on for this tool chain, but all you have to do is get a 3rd party jar file in the class path of the server, and you can directly call Java methods from Javascript.
importPackage(java.io);
var file = new File('/blah/blah/blah.txt');
Other languages would be equally easy as well. It's buried, but here is some relevant information from the docs on integrating 3rd party libs:
-cp <path> The path on which to search for the main and any other resources used by the verticle. This is ignored if you are running an installed module. This defaults to . (current directory). If your verticle references other scripts, classes or other resources (e.g. jar files) then make sure these are on this path. The path can contain multiple path entries separated by : (colon). Each path entry can be an absolute or relative path to a directory containing scripts, or absolute or relative filenames for jar or zip files. An example path might be:
-cp classes:lib/otherscripts:jars/myjar.jar:jars/otherjar.jar
Always use the path to reference any resources that your verticle requires. Please, do not put them on the system classpath as this can cause isolation issues between deployed verticles.
Since they are not limited by a single threaded VM, you would run those using the worker pools and communicate with them via an actor type model similar to erlang. So, modulo a small amount of verticle code to deploy your image processor, yes.
Not automagically, but java has some really good tools for helping. See the http://docs.oracle.com/javase/7/docs/api/java/util/concurren... package. Check the Executors section and you can easily distribute work via threads to all your cores. In fact, vert.x is using netty, which makes heavy use of threads along with async IO. You can do both.
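To make the Executors point concrete, here is a minimal sketch of spreading CPU-bound work over all cores with a fixed thread pool. The class name ParallelSum and the sum-of-squares workload are made up for illustration; only the java.util.concurrent calls are the real API. It uses lambdas, so it assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    // Hypothetical workload: sums the squares of 1..n, one chunk of the range per core.
    static long sumOfSquares(final int n) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = Math.max(1, (n + cores - 1) / cores);
            for (int start = 1; start <= n; start += chunk) {
                final int lo = start;
                final int hi = Math.min(n, start + chunk - 1);
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (long i = lo; i <= hi; i++) {
                        sum += i * i;
                    }
                    return sum;
                };
                parts.add(pool.submit(task)); // runs on one of the pool threads
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // blocks until that chunk is finished
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000));
    }
}
```

In a real server you would create the pool once and reuse it instead of building one per call.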
You don't need to necessarily have non-blocking libraries to take advantage of the async I/O. Netty uses worker threads to prevent blocking the server event loop and I assume Vert.X does as well. You can do blocking JDBC queries, Redis, etc.
As far as concurrency, most libraries state in the docs what is threadsafe and what is not, so this is more or less "caveat emptor" and RTFM.
Vert.x has its own module subsystem which appears to package up .jar files and some glue code for the Vert.x-specific interface, but I assume you can call any included Java classes directly:
http://vertx.io/mods_manual.html
(Interestingly enough, Vert.x uses Gradle for its build/dependency manager, but I guess wanted something more Node-like for its module system)
We know the java libraries are there for lots of things, but Node is all about async and handling thousands of connections. It does this by forcing the entire ecosystem to be async too (including things like database drivers).
By using a Java JDBC database driver you're completely losing any async support. Same presumably goes for redis or Mongo drivers. You can do some of the work with threads and pooling, but it's still not the same, and makes this another useless micro benchmark.
The value of this is far overstated. You can think of your server as being composed of async queues and thread pools, even in node. About the only thing in a stack that truly is async is connection handling via epoll. Mysql itself is a big threadpool, bounded by the number of cores and table locking.
The jvm has amazing threading support, doubly so if you use it with a language like scala or clojure. You can and should handle the connections asynchronously and use a thread pool for things like db access. It works well, people have done this with the jvm for years.
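A rough sketch of that pattern, not taken from vert.x or Netty: the caller gets a future back immediately and the blocking call runs on a worker pool. blockingLookup is a made-up stand-in for something like a JDBC query, and CompletableFuture assumes Java 8 or later (on Java 7 you would use a plain ExecutorService and Future).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingOffload {
    // Hypothetical stand-in for a blocking call such as a JDBC query.
    static String blockingLookup(int userId) {
        try {
            Thread.sleep(50); // simulates waiting on the database
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "user-" + userId;
    }

    // Returns immediately; the blocking work runs on a thread from the worker pool.
    static CompletableFuture<String> lookupAsync(int userId, ExecutorService workers) {
        return CompletableFuture.supplyAsync(() -> blockingLookup(userId), workers);
    }

    public static void main(String[] args) {
        ExecutorService workers = Executors.newFixedThreadPool(4);
        CompletableFuture<String> reply = lookupAsync(42, workers)
                .thenApply(name -> "Hello, " + name); // callback runs once the lookup is done
        // The calling thread is free to do other work here.
        System.out.println(reply.join());
        workers.shutdown();
    }
}
```

The pool size (4 here) is an arbitrary choice; for db access it would normally track the size of your connection pool.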
Node's API is async. Under the hood everything is done via threadpools, same as in Java or any other stack. Your hardware knows how to run threads; that's all. Whether or not that's what's exposed to you as a programmer is a different story.
You don't actually get a performance boost from Node being "async". Node's async abilities simply give you transparent access to threads that are otherwise unavailable with javascript, and it's the threads giving you performance.
I don't think this is true. Nodejs uses epoll/kqueue/select etc to multiplex access to multiple file descriptors from a single main thread.
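For comparison, the same multiplexing idea is available on the JVM through java.nio's Selector, which sits on top of epoll/kqueue/select depending on the platform. This is only a sketch of the technique, not how Node is implemented: it uses in-process pipes instead of sockets so it stays self-contained, and the class and method names are made up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SingleThreadMultiplex {
    // Reads one short message from each of `count` pipes using one thread and one Selector.
    static List<String> readAll(int count) {
        List<String> received = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            List<Pipe> pipes = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                Pipe pipe = Pipe.open();
                pipe.source().configureBlocking(false); // required before registering
                pipe.source().register(selector, SelectionKey.OP_READ);
                byte[] msg = ("msg-" + i).getBytes(StandardCharsets.UTF_8);
                pipe.sink().write(ByteBuffer.wrap(msg));
                pipes.add(pipe);
            }
            ByteBuffer buf = ByteBuffer.allocate(64);
            while (received.size() < count) {
                selector.select(); // blocks until at least one channel is readable
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    buf.clear();
                    int n = ((Pipe.SourceChannel) key.channel()).read(buf);
                    if (n > 0) {
                        received.add(new String(buf.array(), 0, n, StandardCharsets.UTF_8));
                        key.cancel(); // only one message per pipe is expected in this sketch
                    }
                }
            }
            for (Pipe pipe : pipes) {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(readAll(3)); // arrival order is not guaranteed
    }
}
```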
The async API is actually a price to pay (spaghettification).
For example, the Go language took a different approach: it created a cheap thread-like construct which doesn't incur the biggest overheads of classical threading (namely pre-allocated linear stacks, and context switching/sleeping that requires a system call; all this aided by the compiler), and a cheap means of communication (channels).
Then, the whole core IO library was written using a multiplexing async model (epoll...), which communicates with the user part of the library via channels. The result is a blocking-like API which under the hood behaves like an async implementation.
A similar goal is also met by http://www.neilmix.com/narrativejs/doc/ and other javascript 'weavers' which convert "sequential looking" code into callbacks.
Yes, but at the end of the day, underneath it all, you gotta have threads because that's what the hardware understands. Even if you use hardware interrupts to detect IO, you still need a plain old thread to handle it. The only difference between various languages and runtimes is how you distribute tasks among the threads. Some environments provide green threads that have a partial stack, but even they are handed off to a thread pool (or a single thread) for execution.
It's been found that if you employ only a single thread (that can run any number of tasks) you get a performance boost over using a larger threadpool under some conditions, but a single thread wouldn't let you scale by taking advantage of all processors.
I feel that the cause of the misunderstanding lies in the fact that "thread" in this context usually means "thread based IO", which means that when a thread issues an IO request it remains blocked until the IO request returns, leaving CPU time to other threads. All this holds regardless of how many processors you have; it works perfectly fine with a single processor.
Async IO is different: it's a different pattern of access to IO, and as such it's orthogonal to any threading or multiprocessing that's going on in order to actually do stuff in response to that IO.
> It's been found that if you employ only a single thread (that can run any number of tasks) you get a performance boost over using a larger threadpool under some conditions, but a single thread wouldn't let you scale by taking advantage of all processors.
Indeed. The Nodejs solution to this problem is to have a cluster of nodejs processes and a dispatcher process on top. So multiprocessing is done the "old way".
In that case, Java gives you both options: blocking IO and non-blocking, polling IO. Netty can use both, but most people use it with the non-blocking option. Experiments have shown that sometimes one is faster and sometimes the other.
Umm, hardware knows NOTHING about threads. Threads give you a very fake view of the hardware. Everything about threads is an emulation over the hardware layer, hence why they have a large memory overhead.
The CPU is aware of a thread's instruction pointer and stack pointer (that's how some CPUs are able to support hyperthreading). Perhaps it's possible that the OS could somehow manipulate that to implement threads that are not as heavyweight as "common" threads, but I'm not aware of any OS that does that. Threads are the only multiprocessing abstraction provided by the CPU and the OS (although now there are some new abstractions for GPUs).
I would call it "node.js's largest implementation issue". It's not as if JavaScript gives you another choice, yet you make it sound like it was a principled decision.
Other platforms/languages have real concurrency constructs and don't suffer node's limitations.
Well, no, you could have done all of these things synchronously, and in fact JS would have preferred it because JS is intrinsically single-threaded. Ryan Dahl's stated inspiration for Node was that he struggled with a certain slowness in Ruby because it blocked for everything, so he tried to build an entire language that simply wouldn't let you sleep(). You can go listen to his talks; they're on YouTube. It was a principled decision.
I don't know whether making JS single-threaded was a principled decision -- if anything it was presumably the KISS principle at work. However, it was actually a ridiculously nice choice to offer a single-threaded-asynchrony model. It sometimes gets in the way rather obtusely -- Firefox can still (if very rarely) fail to introspect and then crash when some ad script on your page goes into an infinite loop! -- but on the whole, it is very nice to always know that while I'm in this function, modifying this variable, nobody else can interfere.
With that said, I also think that good concurrency planning is indeed missing, and that it will probably enter the language at a future time.
>We know the java libraries are there for lots of things, but Node is all about async and handling thousands of connections. It does this by forcing the entire ecosystem to be async too (including things like database drivers).
>By using a Java JDBC database driver you're completely losing any async support. Same presumably goes for redis or Mongo drivers. You can do some of the work with threads and pooling, but it's still not the same, and makes this another useless micro benchmark.
Nothing inherently special about "forcing the entire ecosystem to be async too", especially since Node is more or less FORCED to do that, because javascript is single threaded.
Add the bad callback spaghetti implementation of async, and the main benefit of Node is easy deployment, and accessibility to the millions of javascript programmers.
As an async environment it doesn't offer anything either new or too compelling.
> As an async environment it doesn't offer anything either new or too compelling
That isn't strictly true.
If you focus on the traditional scripting languages as your competition (Ruby, Python, PHP, Perl), then you start to realise that Node.js offers a similar language structure (dynamic, no compilation, etc.) with the benefit of thousands of concurrent connections (which those languages will do with certain modules), while forcing all the libraries to also be async (which those languages DO NOT do).
At my last job I had to build an SMTP server capable of scaling to 50k concurrent connections. Building this in Perl was fine, except for any library I wanted to use: all of the libraries were synchronous. So I wrote Haraka, which Craigslist are now using as their incoming SMTP server.
If you compare all that to Java you get slightly less performance but probably lower memory requirements. And that's OK. Different strokes for different folks.
No, not really. You can happily run blocking and non-blocking code on the JVM without problems. The inability of JavaScript to do that created the need to make everything asynchronous, not the other way around.
> Let's say I want to plugin some Java lib for image manipulation. How? And who will guarantee that these libs will be concurrent and/or non-blocking as well?
What? What do you mean by non blocking in this context?
>Let's say I want to plugin some Java lib for image manipulation. How? And who will guarantee that these libs will be concurrent and/or non-blocking as well?
Well, this is the JVM, not a single threaded javascript engine. As the other guy says below:
"You don't need to necessarily have non-blocking libraries to take advantage of the async I/O. Netty uses worker threads to prevent blocking the server event loop and I assume Vert.X does as well. You can do blocking JDBC queries, Redis, etc."
And threadsafe libs, of which there are tons, usually advertise it "on the box".