On Binary Search

MauranKilom · on Oct 31, 2019

Is this machine-generated text? That or it's badly translated. Either way, it's strangely incoherent and subtly flawed.

Take just the first section. It talks about a bubble sort of ints. Then provides code for bubble sorting strings [1]. Then talks about "sign position in the ASCII table". "linking to the native sort function" is... not the right term (what happened to "call"?). And then we suddenly talk about "the element to be found" (weren't we just sorting?). The rest of the article is in the same vein.

I hope I'm not coming across as just picking on someone's English skills - even with perfect English I would find it hard to follow the train of thought.

[1] This is subtly different because (naive) string comparison is not constant-time, unlike comparing ints in most languages. I was half expecting the article to go there but it didn't.

PS: If not for the discussion here, I would've skipped the read because the site has no HTTPS. Please get a cert, it's free: https://letsencrypt.org/

_4bcr · on Oct 31, 2019

@Mauran Kilom. Thank you for your criticism, especially your mention of https://letsencrypt.org/. I will look into that! And I am also very grateful that your pointed out the obvious: the red thread was missing, and I have tried to fix this by erasing large parts of the text. My english is however as it is, much to improve! :) Thank your the time!

idbfs · on Oct 31, 2019

I was wondering the same thing. It's either a horrible human-written article, or an impressive machine-written one.

gigatexal · on Oct 31, 2019

It could be a non-native speaker of English trying his or her best.

adrianmonk · on Oct 31, 2019

> The problem with linearity is that the time to return the correct position grows exponentiality with the size of the dataset and of the position of the value to be found

No, the time required for linear search does not grow exponentially. It doesn't even grow quadratically. It grows linearly.

tempguy9999 · on Oct 31, 2019

Agreed. It's such a strange statement I suspect he meant something else but I can't see what.

akamoonknight · on Oct 31, 2019

Could they have meant exponentially with respect to using binary search? That is, using linear search is exponentially worse than using binary search on a sorted data set.

adrianmonk · on Oct 31, 2019

That's an interesting theory. It is indeed exponentially worse then. Binary search is O(log N), and linear search is O(N).

If you let M=log N, then binary search is O(M) and linear search is O(e^(log N)) = O(e^M). So viewed that way, it's O(M) vs. O(e^M), an exponential relationship.

But the blog would still not be 100% right because it adds a qualifier. It says "grows exponentiality with the size of the dataset".

_4bcr · on Oct 31, 2019

I want to try attempt explaining what I meant. @adriamonk: if you have the time, feel free to help me out here. :)

First, my point relates to JavaScript.

``` Take this code: const list = [2,1,2,"4", 5]; console.log( list.sort((a,b) => a-b) ); // [1, 2, 2, "4", 5] ``` This is kind of correct (or wrong), depending on how to view things. But I think we must acknowledge this as a part of the language design of JavaScript.

However, using strict eq... ``` console.log( list.findIndex((value) => value === 4) ); // -1 ``` Correct, sure.

On the other hand... ``` console.log( list.findIndex((value) => value == 4) ); // 3 ``` But now we're using a non strict form of equality.

In JavaScript, the aspects of the language described above can cause a problem concerning binary search. Because of JSs lack of strictness, we can't trust our data in the same way as in C or Rust or other languages (even though TypeScript could save us).

Sure if the preconditions aren't met, then we can't use BS. My point is that even though another language still would have to sort the elements in an array, we would know that all elements were of the same type in an array.

As programmers we want the behavior of our code to be predictable. And my point - I don't know if it is valid though - is that in JavaScript we can't. Therefore we would have to check every value in the array beforehand, and in JS we would have to do this linearly. I don't know if I misuse the terminology now, in that case, please tell me so (and I will be at least a bit wiser). My main point is that the big O of binary search is irrelevant in the context of JavaScript, if we're not 100% sure that our array is sorted and of the same type. Many other languages would have the first problem (we must trust the array to sorted), while JavaScript has to problems and this is more unpredictable. And I think we can only ensure this by the use of a linear method in JS.

Liquid_Fire · on Oct 31, 2019

Trivial nitpick: since this is binary search, the base for the logarithm is 2, not e. So it would be 2^M.

tempguy9999 · on Oct 31, 2019

Nitpick upon nitpick, the base may be 2 rather than e but IIRC the difference between lg(n) and log2(n) is a constant, and the log/lg term dominates, so it still is 0(log). I think.

Big fleas have little fleas upon their backs to bite 'em \ And little fleas have lesser fleas, and so, ad infinitum.

(https://en.wikipedia.org/wiki/The_Siphonaptera>

thethirdone · on Oct 31, 2019

Nitpick on Nitpick on Nitpick, it is indeed O(2^m) and not O(e^m). Those are not the same complexity.

Logs get to not care, but when we exponentiate it matters.

adrianmonk · on Oct 31, 2019

I guess, informally, people sometimes say exponential when they just mean "way too much" or "extreme".

Someone might say, for example, "The party was already getting out of hand, but then my roommate's drinking buddies showed up and that's when things got exponentially worse."

_4bcr · on Oct 31, 2019

Thanks for comments! I will attempt to fix the flaws. Fixing the english will be harder though, for the time being my english is flawed (period). :)

I should provide some background. I don’t have any education in CS. This is an antempt on my behalf to fill gaps of knowledge. Being a junior frontend dev they are plenty, and I am thankful for your criticism (thanks again! You’ll make me better).

Anyway, I have changed the blog post! Mostly by erasing text. But if there are no good, clear red thread this is the best way. :) Thanks again!

kqr · on Oct 31, 2019

Thank you for exploring algorithmic concepts as a front-end developer! Keep that up and you'll get very far in your career.

I say this because we have interviewed nearly 150 front-end candidates and had to reject all but one due to their lack of very basic algorithm/data structure skills. I encourage anything that makes that pool less depressing!

_4bcr · on Oct 31, 2019

I am not saying you are guilty of this, by I think this is sad in a way. My prejudice is that many frontend developers are interested in CS for the wrong reasons; I don't know if it's true though.

But CS as a career boost is very instrumental. I think it's better to focus on the good parts:

a) CS is fascinating, b) I also think there is an ethical side of CS. For each year, more 'logic' ends up in the frontend - I think. If this is true, there is an ethical dimension to it; not everyone have a new computer with the latest browser - performance has a ethical dimension.

Also, I for one hates when web sites are slow (so I guess there also is a business/economical side as well). But my main point is that - and no, I am not saying you're guilty of this, you clearly like CS (why comment otherwise), but the reasons you provide would gain from adding the perspective I am trying to provide, I think - is that CS should be studied because it is important, and fascinating. :)

I will never be good with CS. But if my own take on CS were only instrumental (career boost), I would most likely end up with less knowledge in comparison with reasons related to fascination (and ethics). I think this is an important point. Sure, it is perhaps made often. But in that case, I think it is important to repeat it. I see ads for many CS courses on i.e. Udemy with titles like CS for interviews and so on. I am not saying those course are bad (don't know); all I am saying is this is an instrumental view and that we (developers as a community) sometimes should be better at finding the right reasons.

Sorry, this was a long reply. :( :)

smellia · on Oct 31, 2019

Then you must have a look at the amazing elastic binary trees used in haproxy, the article describes their properties quite well in terms of binary search: https://wtarreau.blogspot.com/2011/12/elastic-binary-trees-e...

These were featured at QCon last June by one of their authors. Really great work.

_4bcr · on Oct 31, 2019

@smellia: thank you very much for this suggestion! This seems really interesting. :)

carapace · on Oct 31, 2019

Keep up the good work! :-)

j-pb · on Oct 31, 2019

Pretty blog, but though the title suggests some "insight" into binary search, there is not more than what would be taught in the first two weeks of an undergrad CS curriculum.

_4bcr · on Oct 31, 2019

Thank you. CS is what I am trying to learn. :) I have not studied CS, only some vocational education in programming, and I am into front-end. However, I think this is important for someone like me, and it is really fascinating. :)

person_of_color · on Oct 31, 2019

Good for interview prep, then!

devnonymous · on Oct 31, 2019

Each time I see an implementation of binary search like this I can't help but recall:

https://ai.googleblog.com/2006/06/extra-extra-read-all-about...

Which reminds me just how young the field of computer science is and how even the ^well known^, fundamental algorithms might have buggy implementations.

smitty1e · on Oct 31, 2019

Working in IT affords occasion to be humble at least daily.

agentultra · on Oct 31, 2019

A friend of mine did an experiment with his grad students in his Formal Methods class.

They all submitted their implementation of binary search and he modelled them each in TLA+. The majority of them all had errors.

person_of_color · on Oct 31, 2019

Why does this implementation used signed integers for indices at all?

MauranKilom · on Oct 31, 2019

Java. There are no unsigned integers in Java. Many would consider that a benefit (although opinions vary on signed vs unsigned bytes). FWIW, the creators of C++ have stated that they'd go with signed integers for e.g. indexes in the library interfaces nowadays (but it's way too late to change that).

Popular problem with using unsigned : Write a reverse for loop from n (exclusive) to 0 (inclusive). See if you can do it without discarding your two most intuitive (but wrong) approaches.

smellia · on Oct 31, 2019

Its the opposite. Its easy to write such a loop with unsigned ints: for(x=top;x<=top;x--). However you can't reliably write a loop spanning the whole space with signed ints without breaking the no-overflow rules. With unsigned you can do x=0;do{stuff(x);}while(++x); Good luck doing this with signed ints.

mypalmike · on Oct 31, 2019

This is why integer sign should be part of the operation rather than the type. Every language but assembly gets this wrong.

hansdieter1337 · on Oct 31, 2019

"The problem with linearity is that the time to return the correct position grows exponentiality with the size of the dataset and of the position of the value to be found"

Eh...what? Linear = Exponential growth?

bla3 · on Oct 31, 2019

Your loop body looks incorrect. You're both adding 1 when updating low and subtracting 1 when updating high -- but you let the high index start at array.length. It's possible that your additional || clause before the high update mitigates the effects of this, but it's at best inefficient. To implement binary search effectively, it's useful to think about invariants. https://zhu45.org/posts/2018/Jan/12/how-to-write-binary-sear... looks like a decent post on this topic. Searching for "binary search loop invariant" or similar will find many similar articles. I recommend giving them a look, it's a useful thought framework for this algorithm :)

bla3 · on Oct 31, 2019

The highest-rated comment on https://stackoverflow.com/questions/504335/what-are-the-pitf... is also good.

_4bcr · on Oct 31, 2019

Thank you!

RcouF1uZ4gsC · on Oct 31, 2019

> The problem with linearity is that the time to return the correct position grows exponentiality with the size of the dataset and of the position of the value to be found;

I don't this is quite true.

xrisk · on Oct 31, 2019

This TopCoder article [1] on binary search is the most comprehensive one I've read on the topic so far. It covers the discrete and continuous variants and mentions some common pitfalls too.

[1]: https://www.topcoder.com/community/competitive-programming/t...

OrWestSide · on Oct 31, 2019

Just read your article. Something about the way you write made me look at your archives. Now I am coding a maze generator using prim's algorithm. For fun.

Yes, you should get a certificate for your site, yes there is room for improvement for your english and yes you have a long long way to go to just get a grasp of CS fundamentals.

But don't stop trying man. You just inspired me :)

_4bcr · on Oct 31, 2019

Great! That’s why we have communities such as this one?! :) To inspire each other; even when we are wrong... Curiosity is contagious, I guess. :)

userbinator · on Oct 31, 2019

    let mid = Math.floor((low + high) / 2);

I don't know if JS has the same problem with overflow, but seeing that expression always reminds me of https://news.ycombinator.com/item?id=14906429

dsego · on Oct 31, 2019

JS only has 64-bit floats.

manu3000 · on Nov 1, 2019

a fun exercise is to write this in Dafny https://github.com/dafny-lang/dafny

It's simple enough to be attempted with a limited knowledge of Dafny, but subtle enough that you will need to think about what you are doing

https://www.youtube.com/watch?v=-_tx3lk7yn4