The same video gets uploaded to turbo.cr under different IDs and resolves to
different filenames, so the existsSync(filename) check can't catch
content duplicates. Switched to the same signature the gallery scanner
uses (md5 of the first 64KB plus an exact byte-size match) and now apply it
during the download stream, aborting as soon as an existing file with the
same content is detected. This avoids re-downloading content the user
already has (or has deliberately deleted via the duplicate scanner).
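
A minimal sketch of the in-stream check, not the committed code. It assumes
the total size is known up front (for example from Content-Length) and that
known signatures are kept as "md5:size" strings in a Set; headMd5,
createDuplicateDetector and that key format are hypothetical names chosen
for illustration.

```javascript
const crypto = require('crypto');

const HEAD_BYTES = 64 * 1024; // the signature covers the first 64KB only

// Hypothetical helper: hex md5 of the first 64KB of a buffer.
function headMd5(buf) {
  return crypto.createHash('md5').update(buf.subarray(0, HEAD_BYTES)).digest('hex');
}

// Hypothetical helper: buffers incoming chunks until the first 64KB (or the
// whole body, if it is shorter) has arrived, then reports whether a known
// file has the same head hash and the same exact byte size.
// Returns null while undecided, then true or false.
function createDuplicateDetector(expectedSize, knownSignatures) {
  const chunks = [];
  let buffered = 0;
  let verdict = null;

  return function onChunk(chunk) {
    if (verdict !== null) return verdict;
    chunks.push(chunk);
    buffered += chunk.length;
    if (buffered < Math.min(HEAD_BYTES, expectedSize)) return null;
    const key = `${headMd5(Buffer.concat(chunks))}:${expectedSize}`;
    verdict = knownSignatures.has(key);
    return verdict;
  };
}
```

In a download loop, the caller would feed each chunk to onChunk and destroy
the response stream the first time it returns true, so at most about 64KB is
transferred for a duplicate.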

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

DDoS-Guard now binds session cookies to the issuing browser's fingerprint, so
a direct Node fetch returns 403 even with valid cookies. Page HTML for any
forum_site with stored cookies is now fetched via a FlareSolverr browser
session that is opened once per scrape job.
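
A rough sketch of the once-per-job session pattern, using FlareSolverr's
sessions.create, request.get and sessions.destroy commands. The endpoint URL,
the 60s timeout, and the helper names (createSession, getPage,
destroySession, withBrowserSession) are assumptions for illustration, not
the code in this commit.

```javascript
// Assumed default FlareSolverr endpoint; adjust to the actual deployment.
const FLARESOLVERR_URL = 'http://localhost:8191/v1';

// Payload builders for the three FlareSolverr commands a job needs.
function createSession() {
  return { cmd: 'sessions.create' };
}

function destroySession(session) {
  return { cmd: 'sessions.destroy', session };
}

function getPage(url, session, cookies = []) {
  return {
    cmd: 'request.get',
    url,
    session,
    cookies, // [{ name, value }] built from the stored site cookies
    maxTimeout: 60000, // milliseconds (assumed value)
  };
}

// Thin JSON POST wrapper; relies on the global fetch in Node 18+.
async function postJson(body) {
  const res = await fetch(FLARESOLVERR_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  return res.json();
}

// Hypothetical helper: opens one browser session, hands the job a page
// fetcher bound to it, and destroys the session even if the job throws.
async function withBrowserSession(job, post = postJson) {
  const { session } = await post(createSession());
  const fetchHtml = async (url, cookies) => {
    const out = await post(getPage(url, session, cookies));
    if (out.status !== 'ok') throw new Error(`FlareSolverr: ${out.message}`);
    return out.solution.response; // rendered page HTML
  };
  try {
    return await job(fetchHtml);
  } finally {
    await post(destroySession(session));
  }
}
```

Reusing one session keeps the solved challenge and cookies in the same
browser instance for every page in the job instead of solving it per request.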
- Hybrid cookie refresh: FlareSolverr clears the DDoS-Guard captcha, its
cookies seed undetected_chromedriver, Turnstile auto-solves in the real
browser, the login form is submitted, and the final cookies + browser UA
are persisted to forum_sites
- Per-site user_agent column so subsequent scraper requests match the UA the
cookies were issued for (DDoS-Guard rejects UA mismatches)
- XenForo search rewritten as a proper CSRF-token POST to /search/search
followed by a results-page parse, replacing the broken ?q=... GET that only
returned the search form
- Pagination regex fallback in detectMaxPage catches XenForo pages that
cheerio's class-based selectors miss
- New scrapers/turbo.js handles turbo.cr /embed/ and /a/ URLs by rendering
the page via FlareSolverr and grabbing the signed mp4 from the resolved
<video src> attribute (gallery-dl can't extract these — obfuscated WASM)
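
For the pagination fallback listed above, a sketch of what a regex pass over
the raw HTML could look like. It assumes XenForo page links take the form
.../page-N or carry a page=N query parameter; maxPageFromHtml is a
hypothetical name, and the real detectMaxPage may use a different pattern.

```javascript
// Hypothetical fallback: scan raw HTML for page links and return the
// highest page number seen, or 1 when no pager links are present.
function maxPageFromHtml(html) {
  let max = 1;
  // Matches "/page-12" path segments and "?page=12" / "&page=12" /
  // "&amp;page=12" query parameters.
  for (const m of html.matchAll(/(?:\/page-|[?&;]page=)(\d+)/g)) {
    max = Math.max(max, Number(m[1]));
  }
  return max;
}
```

Such a pass would only run when the cheerio selectors find no pager, since a
plain regex can also pick up page links that point at other threads.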

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>