I saw that. From the context of the article, do you think he was really talking about a system using perfect hash tables? Be that as it may, "use a perfect hash table" is rarely a good answer to the job interview question, which is the only reason I commented on this.
Only if you neglect hashing time (significant with perfect hashes) and comparison time. That makes them O(k), not O(1). I find it helpful to think of hashing as being roughly as expensive as copying.
Radix tries, on the other hand, don't hash. They're always at least O(k), and you can find the next and previous values (lexicographically) in O(k) as well. They're also more compact, and you never have to resize your tables.
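In case it helps, here's the kind of thing I mean: a minimal (uncompressed) byte trie, where a lookup costs one step per byte of the key, independent of how many keys are stored. The names here are just for illustration, and a real radix trie would compress single-child chains instead of spending a 256-slot array per node.

```cpp
#include <array>
#include <memory>
#include <string>

struct TrieNode {
    bool terminal = false;
    std::array<std::unique_ptr<TrieNode>, 256> child{};
};

void insert(TrieNode& root, const std::string& key) {
    TrieNode* n = &root;
    for (unsigned char c : key) {          // one step per byte: O(k)
        if (!n->child[c]) n->child[c] = std::make_unique<TrieNode>();
        n = n->child[c].get();
    }
    n->terminal = true;
}

bool contains(const TrieNode& root, const std::string& key) {
    const TrieNode* n = &root;
    for (unsigned char c : key) {          // again O(k), and no hashing
        if (!n->child[c]) return false;
        n = n->child[c].get();
    }
    return n->terminal;
}
```

Note there's no hash pass over the key and no table to resize; the key's own bytes are the index.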
But isn't it really as much of an oversimplification as saying they're O(1)?
No, because big-oh notation implies worst case: it's an asymptotic upper bound. If you want to talk about average case, then you need to qualify what your assumptions are. So insertion into a hash table is O(n), but if we assume an even distribution of keys over a sufficiently large table, then insertion is O(1).
Big-O doesn't imply anything, it talks about functions, not algorithms. There are no laws of the English language that say a big-O bound implicitly refers to the function f(n) = max { T(input) : input is of size n }. Or with hash table insertion it's more like f(n) = max { T(h, input) : h is a hash table of size n, input is of size g(n) } where g is some Theta(log(n)) function. The proof that there's no guarantee that big-O notation refers to some worst-of-worst-case running time function is that people are using it in a way that makes you angry.
Yes, it talks about functions, and when used in the context of algorithms it describes the asymptotic growth of the function that describes an algorithm's runtime behavior. There are no laws of English, but there are conventions of Computer Science that say that when you call an algorithm O(n), you mean that in the worst case it will be linear.
For any given algorithm, the worst-case runtime behavior and the average-case runtime behavior are described by totally different functions, each of which has its own big-O.
There are no such conventions, you have just implanted in your mind this idea that there should be such conventions. People say that quicksort is O(n log n) all the time. They say that hash tables are O(1). It is normal for O notation to refer to average case or a set of cases with probability 1 of occurring.
Sure, but they should not say that hash tables are "O(1)" (perhaps "average-case O(1)" is better). Unlike qsort's n^2 worst case, the conditions that degrade hash tables are very common: all you have to do is specify a crappy hash function for your load, or size the table poorly.
This is kind of a silly semantics argument... but if you interview for a job and look like a deer in the headlights when I say hash tables aren't O(1), NO HIRE.
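To make the "crappy hash function" point concrete, here's a sketch (CrappyHash and longest_chain are made-up names for illustration; std::unordered_set is standard C++) of how a degenerate hash collapses the whole table into a single bucket, so every operation becomes a linear chain scan:

```cpp
#include <cstddef>
#include <unordered_set>

struct CrappyHash {
    std::size_t operator()(int) const { return 0; }  // every key collides
};

// With all keys hashing to the same value, they all land in one bucket,
// and the "constant time" table quietly degrades to O(n) per operation.
std::size_t longest_chain(std::size_t n) {
    std::unordered_set<int, CrappyHash> bad;
    for (int i = 0; i < static_cast<int>(n); ++i) bad.insert(i);
    return bad.bucket_size(bad.bucket(0));  // length of the chain holding key 0
}
```

The standard guarantees keys with equal hash values share a bucket, so here the chain length equals the element count; the constant-factor bad hash functions you hit in practice are milder versions of the same failure.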
Unlike qsort's n^2 worst case, the conditions that degrade hash tables are very common: all you have to do is specify a crappy hash function for your load, or size the table poorly.
(I'll assume you're talking about the quicksort algorithm rather than the qsort() function because you're comparing it to hash table accesses in general rather than a specific implementation of them.)
But I'm not sure I agree. The worst case for quicksort is an already-sorted list. I run into this scenario all the time, though usually in a slightly different form. When I iterate through the elements of one of the tree-based containers std::set and std::map they emerge in sorted order. If, inside my loop, I then insert them into a similar container I end up with worst-case performance. The other day I replaced std::set with std::unordered_set and saw a dramatic increase in performance, although it may not have been entirely due to this effect.
On the other hand, a non-crappy hash function should be available to everyone at this point. Personally I like FNV-1a, but haven't found an issue with whatever my C++ standard library is supplying. I have spoken with other programmers though who didn't realize hash tables needed some empty space to perform well or had stumbled into a pathological case with their oversimplified hash function.
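For reference, FNV-1a really is just a few lines. This is a 32-bit sketch using the published FNV offset basis and prime; note that it also illustrates the earlier point that hashing itself is O(k), since it touches every byte of the key:

```cpp
#include <cstdint>
#include <string_view>

// FNV-1a, 32-bit: XOR each byte into the state, then multiply by the
// FNV prime. The two constants are the published FNV-1a parameters.
std::uint32_t fnv1a(std::string_view data) {
    std::uint32_t h = 2166136261u;      // FNV offset basis
    for (unsigned char c : data) {
        h ^= c;
        h *= 16777619u;                 // FNV prime
    }
    return h;
}
```

It's not a cryptographic hash, but for table loads of short keys it avoids the pathological clustering that oversimplified homemade hashes tend to produce.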
This is kind of a silly semantics argument... but if you interview for a job and look like a deer in the headlights when I say hash tables aren't O(1), NO HIRE.
Gee Ptacek, what do you have against deer? I can just imagine some poor interviewee with a bright desk lamp shining into his eyes...
There are such conventions. Take a look at an algorithms textbook, or a theory paper. That some people are semantically lazy does not change the fact that these terms have established definitions within computer science.
Well, the idea of conventions is that people agree to use them. I see a lot of variation in the degree to which people know about or are attached to this one.
While there may be a set of textbooks and papers for which the editors would have flagged "worst case O(..)" as redundant, I think it's relevant to point out that the use of big-oh notation in mathematics predates computer science. So to the extent algorithm analysis is a branch of mathematics, those who use big-oh in this more general way are in fact consistent with the larger body of work.
Those are the conventions in Computer Science, but the conventions in Software Engineering are different. Words having different meanings in the jargons of related fields is a really old problem, and one which has bitten me from time to time. At least we aren't arguing over whether a factor of 10 increase in something is a 10 dB or a 20 dB change.
Knuth, TAOCP 1.2.11.2: People often abuse O-notation by assuming that it gives an exact order of growth; they use it as if it specifies a lower bound as well as an upper bound. [It looks to me like Knuth does this himself on page 389, 2.3.4.4, right after equation 14.] For example, an algorithm to sort n numbers might be called inefficient "because its running time is O(n^2)." But a running time of O(n^2) does not necessarily imply that the running time is not also O(n). There's another notation, Big Omega, for lower bounds.
I find this a bit confusing because it's unclear to me what Knuth actually endorses for use in average case description.
Wikipedia: "Although developed as a part of pure mathematics, this notation is now frequently also used in the analysis of algorithms to describe an algorithm's usage of computational resources: the worst case or average case running time or memory usage of an algorithm is often expressed as a function of the length of its input using big O notation." Wikipedia and the web in general have many examples of "worst case O(...)", which would be redundant under your convention.
So clearly one needs to be vigilant about assumptions when discussing average case, but common usage does not agree that big-oh notation implies worst case.
I'm only commenting because of the "job interview" comment, but when someone compares hash tables to B-trees, I usually assume (fairly or not) that they don't really know what a B-tree is.
Ah, thanks for clarifying. I had to implement a B+-tree using only a 2D byte array as a sophomore, so I don't think I'll ever be able to forget how they work.
It's possible. There are parallels between main memory and disks. A disk seek gets you hundreds or thousands of bytes and requires (blocking for) millions of CPU cycles; a cache line fill gets you 64 bytes (typically) and requires (spinning for) hundreds of CPU cycles. You want to minimize both of those, but with memory the difference is much less dramatic.
Well, you can walk spatial partitions with trees in ways that become clunky with hashes (taking a branch means all children are 'near' each other). Also, B-trees might work better in cached architectures -- this is certainly true with respect to disk accesses but I'm not sure if that carries over to L3/L2/L1 cache.
B-Trees actually have superior cache performance compared to other types of trees (such as T-Trees or plain binary search trees). In part, this is because B-Trees have a higher ratio of actual data to pointers.
There is a tradeoff between computation (doing binary search inside the node) and storage (keeping fewer pointers). In environments where a memory access is much more expensive than a CPU instruction, it is preferable to perform these computations than to read all the extra pointer data just to jump to the right places.
In fact, a breed of cache-conscious data structures is based on precisely this idea: B+-Trees with even fewer pointers, where the algorithm computes those pointers instead (CSS- and CSB+-Trees).
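As a sketch of that binary-search-inside-the-node idea (the node layout here is illustrative, not any particular implementation): the keys of a B-tree node sit in one contiguous sorted array, so choosing where to go next is a binary search over cache-resident data rather than a chain of pointer hops.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct BTreeNode {
    std::vector<int> keys;             // sorted, contiguous in memory
    std::vector<BTreeNode*> children;  // keys.size() + 1 for an internal node
};

// Position of `key` among the node's separators. If keys[pos] == key we
// have a hit in this node; otherwise pos is the child subtree to descend
// into. One binary search over one cache-friendly array either way.
std::size_t search_pos(const BTreeNode& n, int key) {
    return std::lower_bound(n.keys.begin(), n.keys.end(), key)
           - n.keys.begin();
}
```

The point is the trade the parent comment describes: a few extra compare instructions inside the node buy you far fewer pointer dereferences, which is the right trade when a memory access costs much more than an instruction.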
It does carry over to L3/L2/L1 cache, but only if the tree is designed to benefit from that. See http://blogs.msdn.com/b/devdev/archive/2007/06/12/cache-obli... for one strategy to do that. (With that structure lookups become faster, but walking the tree in order becomes difficult.)
This is a really great article and I wish I could give you some more points to make this appear higher. I once used a cache-oblivious matrix transpose algorithm that was startlingly simple and effective but I didn't know the approach had been so widely applied. Thanks!
For some reason, binary trees look so cool when you learn about them from a textbook that people sometimes forget about hash tables. But that might just be my personal experience from candidate interviews.
B-trees are an external storage technique; he means balanced binary trees.
The answer to the interview question is another question: "what operations does the container need to support?". It's not "hash tables are O(1)".