master
Mirror of a627388 but for the forum image path. The same image is often
re-uploaded under different filenames across pages/posts, so existsSync
on the target name can't catch content-duplicates. After fetching the
buffer, hash the first 64KB and compare against existing same-size files
in the target folder (same md5+size signature as gallery's duplicate
scanner). Confirmed against a known dani-speegle-2 pair:
skip IMG_79695f8914f20ce38b07.jpg — same content as
72759c89-7e53-4976-839a-7d952c444579.jpg
buildSizeIndex is built once per job in runForumScrape and threaded
through scrapeForumPage → downloadImage; the hash cache amortizes across
all pages in the job.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Description
No description provided
Languages
JavaScript
96.9%
Python
2.1%
Shell
0.6%
CSS
0.2%
Dockerfile
0.1%