Hacker News

I work with supercomputers.

Mainframes are very widespread nowadays. Researchers use them, governments use them, and some big businesses use them.

From my point of view, there are four "mainframe" concepts, and one of them is not a monolithic mainframe per se:

1. Supercomputers: thousands of CPUs, each with access to a few gigabytes of RAM, which can run massively parallel jobs

2. Shared memory machines: 64-256 CPUs, each with access to 0.5-2 TB of memory

3. GPU clusters: hundreds of CPUs, each with access to 1-4 GPUs and a few gigabytes of RAM, to run highly parallel vector jobs

4. Clusters: Commodity hardware (either racked or workstations) connected by a network which uses a queue system

The first three belong to the modern implementation of the mainframe; the last one is a "supercomputing"-like facility built with regular hardware.

This brings us to OP's article, and to my main point. There are many research centres, businesses and governments which use them, but there is a trend towards using regular Intel processors instead of mainframe-like vector processors.

Why? Because they're easier to program for. We at the Barcelona Supercomputing Center have worked with ppc, ppc64, Cell, ia64, i686 and x86_64 architectures, and here's what happened.

Some of these architectures don't have debugging/profiling tools available.

I joke with my colleagues about how it has been 10 years since the first CPU with hyper-threading and we still don't have fully automated tools to take advantage of parallelism in legacy code. We know it's difficult to do, but that has to be taken into account when planning to buy a new machine. Old code might run slower on the new machine because of the higher parallelism but lower raw speed.

Others (Cell) are just so difficult to optimize that we needed 2 postdocs working 2 years just to make a matrix multiplication reach about 80% of the full computing power. I'll only say that our next machine was going to be a Cell supercomputer, but that idea was dropped after a scientific panel advised against it.

Some months ago I took a course on GPU programming, and for those of you who have never programmed on vector processors, it is like the Hello World in brainfuck. My CS degree + masters just wasn't enough. It is overwhelming. The guy teaching the course was a PhD with 3 years' experience programming GPUs, and he admitted that most of the time, if you compare the time it takes you to program on a GPU with the time you save running the software, it is just not worth it. Hey, sometimes it is, and you save a lot of time. But not in most cases. So we're back to regular Intel processors again.

What I'm trying to say is that nowadays we buy machines for different purposes (large number of cores, large amount of memory, large throughput for vector multiplication, etc) but most of them are x86_64. GPUs are an exception because we buy software which has already been adapted to vector operations, but somebody had to spend many days adapting the code.

We work with Intel and IBM and we have access to the most recent CPUs and tools. IBM makes great hardware but they don't have anybody writing a development framework for that hardware, so you never reach even 70% of its peak throughput.

TL;DR: Yes, many companies use mainframes, but most of them try to put x86_64 processors in there, because they are easier to parallelise and there are many tested programming tools available.



Nice rant, but not really relevant.

Unfortunately, none of these are mainframes. Mainframes are a fairly different architecture and usually run OS/390 or z/OS these days. They really aren't about performance computing and are more oriented towards databases and IO.


We also work with banks and process huge amounts of data, but we don't use that kind of mainframe.

I'm not saying nobody uses them, but rather that there is a trend to use a NoSQL database on a regular Linux system rather than z/OS.

It's easier to find developers, support is more widespread, and hopefully they will be more compatible in the future.

Edit: chuckMcM's comment [1] provides more insight on this. I must admit I've never worked with traditional mainframes myself, but I know of many research groups who are trying to develop new high-throughput data channels for commodity hardware (e.g. Fibre Channel disks).

The current trend is going towards Intel+Nvidia+Linux. Anything else might be a good investment for some, but generally a bad idea for most.

http://news.ycombinator.com/item?id=4447122


Most banks use Linux boxes for downstream systems (reporting and so on), but a mainframe is the centrepiece of it all, executing end-of-day batch jobs (anything from the feeds to the clearing house to end-of-day reconciliation) and processing trades.

You may be surprised, but it is due to the fact that banks were some of the first to start using computing decades ago, and some of the programs I have seen in use at banks were older than me. Moving everything to an Intel-based platform removes that backwards compatibility, but z/OS is made with this in mind.


> Most banks use Linux boxes for downstream systems (reporting and so on)

I don't know - for regular client-server computing, I've seen a mix at the 6 banks I've worked at - some Linux, some Windows Server, some Solaris. Most banks have a mix of all three, but have one that they "major" in.

Also, mainframes tend to be more of a retail banking thing, for the core banking platform. Some investment banks use mainframes for back office stuff, but not many (in my experience).

Where I am working now, I have seen plaques in recognition for using a particular mainframe program from VISA for 20+ years. Also, the recently-retired core-banking platform from an acquired retail bank ran on mainframes, but was written in assembler.


"process huge amount of data"

What kinds of workloads though? I can see that "supercomputers" might be appropriate for modelling, analysis and reporting but that's a very different kind of scenario to highly available high throughput transaction processing that mainframes have traditionally excelled at.


And reliability too, right?


Yeah, I was just talking about architecture - I left out a lot of good stuff.

These things are fascinating (as is AS/400, which is a mini and also an entirely different thing) in convergent evolution with modern hardware.


Nice rant, but all 4 of your examples are supercomputers, not mainframes. A supercomputer is optimized for absolute performance. A mainframe is optimized for reliability and guaranteed performance. With a supercomputer, if you get a soft error on a block, you retry the block. On a mainframe, it's too late.


More to the point, mainframes are designed for very good performance on commercial applications -- often database applications that aren't too different from what you could knock together with PHP and MySQL except the names of the tools are different.


Yes, yes, of course. My point is that even traditional mainframe users are now moving towards regular supercomputers, because of their convenience.


It seems strange to say that it's so difficult to program for a GPU. It wasn't hard to learn or to do. The OpenCL specification struck me as quite accessible and straightforward; plus there are a plethora of SDK examples.

Sorry, I don't mean to detract from your excellent points. I was just wondering if there was some miscommunication. The idea of me (a self-taught 24-year-old) being more competent than a postdoc feels absurd, so it probably isn't true. Yet GPU programming is very easy for me.

Just an interesting mystery. I would like to help people out if I can.


Learning the syntax of programming for a GPU is easy. The problem is porting algorithms to utilize the GPU most efficiently. This means taking into account the SIMD architecture, warps, the different kinds of memory, and so on. It's easy to port code to run on the GPU; it's not easy to actually make it run faster than a general-purpose CPU.
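As a hedged illustration (plain NumPy standing in for SIMD hardware here, since real GPU code wouldn't fit in a comment): the same computation can be written as a branchy loop, which is trivially correct but is exactly the shape that makes GPU warps diverge, or as branch-free lane-wise arithmetic, which is the kind of restructuring that takes real effort on larger algorithms.

```python
import numpy as np

def clip_scalar(xs, lo, hi):
    # Straightforward port: a branchy loop. Easy to write, but
    # divergent branches like these are what stall GPU warps.
    out = []
    for x in xs:
        if x < lo:
            out.append(lo)
        elif x > hi:
            out.append(hi)
        else:
            out.append(x)
    return out

def clip_vector(xs, lo, hi):
    # SIMD-friendly rewrite: every lane does the same work, and
    # the branches become arithmetic (select via min/max).
    return np.minimum(np.maximum(xs, lo), hi)

xs = np.array([-3.0, 0.5, 7.0, 2.0])
assert clip_scalar(xs, 0.0, 5.0) == [0.0, 0.5, 5.0, 2.0]
assert np.array_equal(clip_vector(xs, 0.0, 5.0), np.array([0.0, 0.5, 5.0, 2.0]))
```

Clipping is about the easiest possible case; the point is that this select-instead-of-branch transformation is what you have to find, by hand, for every hot loop in a real code base.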

If you understood this and still think it's easy, then congratulations you'll be able to pick and choose your jobs :)


Can you give an example of an algorithm that you feel is difficult to port to a GPU in a way that takes advantage of its capabilities? I'd like to try my hand at porting one. It's a lot of fun and hasn't been difficult so far. I'm genuinely curious whether it's odd that "taking advantage of the characteristics of different kinds of memory" is quite natural to me, or if I have a distorted view of my own capabilities. Either way I'll learn something, though, which is the fun part.

(Obviously something like scrypt would be difficult to port to a GPU since by definition it's un-parallelizable, but other algorithms should be doable.)


Dijkstra's shortest path (i.e. pointer chasing), and Goertzel's algorithm.

If you have a simple, straight-through algorithm running on a few million datapoints, GPUs are great. If it deviates even a little from there, GPUs start to fumble. Writing those two algorithms is an afternoon's exercise in Mathematica or MATLAB, but a several-week adventure on a GPU.
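For what it's worth, here is a minimal Dijkstra sketch (plain Python with a binary heap, my own toy version) just to make the pointer-chasing point concrete: the loop is inherently sequential, because which node gets relaxed next depends on every previous iteration, and the adjacency lists are irregular memory accesses; none of this maps naturally onto thousands of lock-step GPU lanes.

```python
import heapq

def dijkstra(adj, src):
    # adj: {node: [(neighbor, weight), ...]}
    # The while-loop is a serial dependency chain: the next node to
    # relax is only known after the previous extract-min. The
    # neighbor lists are pointer-chasing, data-dependent reads --
    # exactly the access pattern GPUs handle poorly.
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

g = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
assert dijkstra(g, "a") == {"a": 0, "b": 1, "c": 3}
```

There are published GPU variants (e.g. relaxing whole frontiers at once), but they are restructured algorithms, not ports of this loop, which is rather the parent's point.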

And remember, computers are cheaper than humans (currently $0.12/hr at Amazon vs. a $10/hr minimum wage).
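A rough break-even sketch with made-up numbers (only the two hourly rates come from the comment; the porting time, speedup and job size are pure assumptions for illustration):

```python
# All figures below are illustrative assumptions, not real quotes.
dev_rate = 10.0        # $/hr developer time (the minimum wage above)
cpu_rate = 0.12        # $/hr compute (the Amazon price above)
port_hours = 3 * 40    # assume "several weeks" of GPU porting ~ 3 weeks
speedup = 10.0         # assumed GPU speedup over the CPU version
cpu_hours = 1000.0     # assumed CPU time the job would otherwise take

cost_cpu = cpu_hours * cpu_rate
cost_gpu = port_hours * dev_rate + (cpu_hours / speedup) * cpu_rate
print(cost_cpu, cost_gpu)  # 120.0 vs 1212.0
```

Under these assumptions the port costs ten times more than just renting CPUs; it only pays off if the job is re-run at much larger scale.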


Try any P-complete problem or arbitrary integer multiplication.


Have you started benchmarking, yet? Just see if you a) can do way better than the CPU, and b) are competitive with other people porting to the GPU. Of course, you'll have to choose an algorithm that's already ported for the second comparison.


Regarding the second comparison, it's easy to do trivially better than someone else's work if you use their work as a starting point. But that type of optimization doesn't really matter as much as the first one you mentioned: the ability to take that which would have run on a CPU and parallelize it to run on a GPU. It has always felt quite natural for me to do that, so it was strange to hear that it's so hard for others. Why is it difficult for them, but easy for me?

The easiest way for me to solve this mystery is to try to port an example someone considers difficult. Do you know of any?


Try matrix multiplication. I mean real matrices, say 10000x10000. And once you're there, try an LU decomposition of a matrix, with proper numerical error handling. You'll soon get to the point where you feel it's harder than you thought. And BTW, matrix multiplication and LU decomposition are super fundamental stuff when solving equations (which in turn is super fundamental when you want to compute bridges, constructions, aerodynamics, etc.).
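For reference, here is what the CPU-side version looks like in a few lines of NumPy (a minimal Doolittle-style LU with partial pivoting; `lu_partial_pivot` is my own illustrative name, not a library routine). Note the two things that make a fast GPU port hard: each elimination step depends on the pivot chosen in the previous one, and the row swaps needed for numerical stability are irregular memory traffic.

```python
import numpy as np

def lu_partial_pivot(A):
    # Minimal LU with partial pivoting: P @ A == L @ U.
    # Each elimination step depends on the pivot chosen in the
    # previous one (a serial dependency chain), and the row swaps
    # for numerical stability scatter memory accesses.
    A = A.astype(float).copy()
    n = A.shape[0]
    P = np.eye(n)
    L = np.eye(n)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))  # pick the largest pivot
        if p != k:
            A[[k, p], k:] = A[[p, k], k:]    # swap the rows...
            P[[k, p]] = P[[p, k]]
            L[[k, p], :k] = L[[p, k], :k]    # ...and the L history
        L[k+1:, k] = A[k+1:, k] / A[k, k]
        A[k+1:, k:] -= np.outer(L[k+1:, k], A[k, k:])
    return P, L, np.triu(A)

A = np.array([[2.0, 1.0], [4.0, 3.0]])
P, L, U = lu_partial_pivot(A)
assert np.allclose(P @ A, L @ U)
```

At 10000x10000 a competitive version also needs blocking for the cache/shared-memory hierarchy, which is where the real work starts; this sketch only shows the structure of the problem.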


Sparse or dense matrices?


I don't know very much about this, but do you think that part of your success has just been the benefit you get from going from dynamic RAM to static RAM? It seems that alone would give you a baseline performance improvement.


GPUs use dynamic RAM too (though discrete cards usually have GDDR rather than DDR, which gives higher bandwidth). The most fundamental difference between a CPU and GPU isn't the memory hierarchy (modern GPUs are growing cache hierarchies as well) but the core architecture (vector/highly threaded "small cores", vs. out-of-order/superscalar "big cores").


Also remember that each GPU architecture will likely have different strengths and weaknesses, making optimization difficult. To say the least.


They do have different characteristics, but presently you can focus on just two architectures: the 7970 and the 680.


Hey, maybe you are just very good at organizing addresses and data.

I'm serious. It's just so difficult for the majority of us. As relix said, if you find that easy, you'll make a lot of money working as a GPU programmer.

That being said, there are many real-world problems with so many data dependencies that transforming the input into independent vectors is plain impossible. For those cases, letting the CPU optimize the datapath is better than programming a GPU version yourself.
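A tiny example of such a dependence chain (plain Python; exponential smoothing, chosen by me as an illustration rather than taken from the thread): every output depends on the previous one, so there are no independent lanes to hand to a GPU without a non-obvious scan-style reformulation, while a CPU just churns through the loop.

```python
def smooth(xs, alpha):
    # First-order recurrence: y[i] = alpha*x[i] + (1-alpha)*y[i-1].
    # The loop-carried dependency on `prev` means iteration i cannot
    # start before iteration i-1 finishes -- nothing to vectorize
    # naively, even though each step is trivial arithmetic.
    ys = []
    prev = 0.0
    for x in xs:
        prev = alpha * x + (1 - alpha) * prev
        ys.append(prev)
    return ys

assert smooth([1.0, 1.0, 1.0], 0.5) == [0.5, 0.75, 0.875]
```

(Recurrences like this can sometimes be recast as parallel scans, but that is exactly the kind of non-obvious restructuring being discussed, not a port.)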



