
I notice that successive runs in the same (unchanging) directory give different results. These runs are all within 20 seconds of each other. What gives?

  $ rg each | md5sum
  670b544e15f9430d9934334a11a87b7e  -
  $ rg each | md5sum
  4d13be6b4531ad52b1b476314fe98fb7  -
  $ rg each | md5sum
  88e15dbb943665ea54482cb499741938  -
  $ rg each | md5sum
  eec6d6d5c9a592cec25aa8b0c19aae15  -
  $ rg each | md5sum
  ad74b78ef8f0d21450f8f87415555af0  -
And:

  $ date
  Sat Sep 24 01:42:27 EEST 2016
  $ rg each > foo1
  $ rg each > foo2
  $ rg each > foo3
  $ rg each > foo4
  $ ls -la foo*
  -rw-r--r--  1 coldtea  staff  1429646 Sep 24 01:42 foo1
  -rw-r--r--  1 coldtea  staff  2250868 Sep 24 01:42 foo2
  -rw-r--r--  1 coldtea  staff  4536031 Sep 24 01:42 foo3
  -rw-r--r--  1 coldtea  staff  9140652 Sep 24 01:42 foo4
  $ date
  Sat Sep 24 01:42:44 EEST 2016
OS X 10.12, installed with brew.


This can happen because rg searches files in parallel, so the order in which it finishes the files can be nondeterministic. If you run with -j1 (single-threaded) then it is deterministic.

To get deterministic output in multi-threaded mode, rg could wait and buffer the output until it can print it in sorted order. This might increase memory usage, and possibly time, though I think the increase would be minor.
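
To make the buffering idea concrete, here is a rough Python sketch. It is not how rg is implemented; `search_file` and `search_sorted` are made-up names for illustration, and it assumes plain substring matching over every file under a directory. Workers finish in whatever order they finish, but nothing is emitted until all results are in and can be walked in sorted path order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def search_file(path, needle):
    """Return (path, matching lines) for one file."""
    hits = []
    with open(path, errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if needle in line:
                hits.append(f"{path}:{lineno}:{line.rstrip()}")
    return path, hits


def search_sorted(root, needle, workers=4):
    """Search files in parallel, but return results in sorted path order."""
    files = sorted(p for p in Path(root).rglob("*") if p.is_file())
    buffered = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(search_file, p, needle) for p in files]
        # Completion order can vary between runs, so results are only
        # stored here, never emitted.
        for fut in as_completed(futures):
            path, hits = fut.result()
            buffered[path] = hits
    # Walking the sorted file list makes the output reproducible.
    out = []
    for path in files:
        out.extend(buffered[path])
    return out
```

The cost is the one mentioned above: all matches sit in memory until the last file is done, and the first line of output is delayed until then.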


In the first case, it's searching in parallel, so I bet the order of results is different each time.

In the second case, rg each > foo2 found results in foo1 and put them in foo2. Then rg each > foo3 found results in foo1 and foo2, and put them in foo3. Etc. That's why the file size increases so quickly.
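
A small Python simulation of that feedback loop, as a sketch only. `search_dir` and `simulate_redirects` are hypothetical helpers, not anything from rg, and the sketch assumes the output file is still empty when the search reads the directory (the shell creates it before the search starts).

```python
import tempfile
from pathlib import Path


def search_dir(root, needle):
    """Return every line containing needle, from every file directly in root."""
    hits = []
    for path in sorted(Path(root).iterdir()):
        if path.is_file():
            for line in path.read_text().splitlines():
                if needle in line:
                    hits.append(f"{path.name}:{line}")
    return hits


def simulate_redirects(root, needle, runs=4):
    """Mimic running `rg needle > fooN` repeatedly in the same directory."""
    sizes = []
    for i in range(1, runs + 1):
        out = Path(root) / f"foo{i}"
        out.write_text("")  # assumption: the redirect target exists but is empty
        hits = search_dir(root, needle)  # earlier fooN files are searched too
        out.write_text("\n".join(hits) + "\n")
        sizes.append(out.stat().st_size)
    return sizes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "notes.txt").write_text("for each item\nunrelated\n")
        print(simulate_redirects(d, "each"))
```

Every earlier output file consists entirely of matching lines, so each run copies all of them again and the sizes grow roughly geometrically, much like foo1 through foo4 above.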


>In the first case, it's searching in parallel, so I bet the order of results is different.

Aha. I thought parallel search needed the -j flag (the CLI help says "default threads: 0").

Could it do anything to put them out in order of "depth" (and directory/file sorting order)?

>In the second case, rg each > foo2 found results in foo1 and put them in foo2. Then rg each > foo3 found results in foo1 and foo2, and put them in foo3. Etc. That's why the file size increases so quickly.

LOL, facepalm -- yes.


Forcing it to use one worker (-j1, I think) should give it deterministic output.



