
The idea is to use a two stage process:

1. take a SHA-1 hash of the last 4 KB of each file.

2. for any files whose hashes match, compare the whole files byte for byte.

With this method you should be able to skip reading many large files in their entirety.
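A minimal Python sketch of the two-stage process described above. The helper names (`tail_hash`, `find_duplicates`, `full_compare`) and the 4 KB tail size are illustrative choices, not anything from the original comment:

```python
import hashlib
import os
from collections import defaultdict

TAIL_SIZE = 4096  # stage 1 hashes only the last 4 KB of each file

def tail_hash(path):
    """SHA-1 of the final TAIL_SIZE bytes (or the whole file if smaller)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(max(0, size - TAIL_SIZE))
        return hashlib.sha1(f.read()).hexdigest()

def full_compare(a, b, chunk=1 << 16):
    """Stage 2: streamed byte-for-byte comparison of two files."""
    if os.path.getsize(a) != os.path.getsize(b):
        return False
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca, cb = fa.read(chunk), fb.read(chunk)
            if ca != cb:
                return False
            if not ca:  # both files exhausted with no mismatch
                return True

def find_duplicates(paths):
    """Bucket files by tail hash, then fully compare only within buckets."""
    buckets = defaultdict(list)
    for p in paths:
        buckets[tail_hash(p)].append(p)

    dupes = []
    for group in buckets.values():
        # Only files whose 4 KB tails collide ever get read in full.
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if full_compare(a, b):
                    dupes.append((a, b))
    return dupes
```

Files whose tails differ are rejected after a single 4 KB read, so the expensive full comparison runs only on the (usually small) set of tail-hash collisions.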



That is one approach, and in typical cases it would avoid reading most large files in their entirety, but it is not what I read in signa11's comment.



