5 Commits

Author SHA1 Message Date
Trey T 4ba88d96f4 forum scraper: dedup downloads by content hash, not just filename
Mirror of a627388 but for the forum image path. The same image is often
re-uploaded under different filenames across pages/posts, so existsSync
on the target name can't catch content-duplicates. After fetching the
buffer, hash the first 64KB and compare against existing same-size files
in the target folder (same md5+size signature as gallery's duplicate
scanner). Confirmed against a known dani-speegle-2 pair:

  skip IMG_79695f8914f20ce38b07.jpg — same content as
       72759c89-7e53-4976-839a-7d952c444579.jpg

buildSizeIndex is built once per job in runForumScrape and threaded
through scrapeForumPage → downloadImage; the hash cache amortizes across
all pages in the job.
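
A minimal sketch of the size + partial-md5 check described above, not the actual code from this commit. Only the name buildSizeIndex comes from the message; its body here, along with signatureOf, signatureOfFile, findDuplicate, and the Map-based hash cache, are assumptions made for illustration.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

const HASH_BYTES = 64 * 1024; // only the first 64KB is hashed, per the commit message

// Signature of an in-memory download: byte size plus md5 of the first 64KB.
function signatureOf(buffer) {
  const head = buffer.subarray(0, HASH_BYTES);
  const md5 = crypto.createHash('md5').update(head).digest('hex');
  return `${buffer.length}:${md5}`;
}

// Assumed shape of buildSizeIndex: map byte size -> files of that size in dir.
function buildSizeIndex(dir) {
  const index = new Map();
  for (const name of fs.readdirSync(dir)) {
    const full = path.join(dir, name);
    const stat = fs.statSync(full);
    if (!stat.isFile()) continue;
    if (!index.has(stat.size)) index.set(stat.size, []);
    index.get(stat.size).push(full);
  }
  return index;
}

// Same signature for a file on disk, reading only its first 64KB.
// Results are cached so each existing file is hashed at most once per job.
function signatureOfFile(file, size, hashCache) {
  if (hashCache.has(file)) return hashCache.get(file);
  const fd = fs.openSync(file, 'r');
  try {
    const head = Buffer.alloc(Math.min(size, HASH_BYTES));
    fs.readSync(fd, head, 0, head.length, 0);
    const md5 = crypto.createHash('md5').update(head).digest('hex');
    const sig = `${size}:${md5}`;
    hashCache.set(file, sig);
    return sig;
  } finally {
    fs.closeSync(fd);
  }
}

// Returns the path of an existing file with the same content signature, or null.
function findDuplicate(buffer, sizeIndex, hashCache) {
  const candidates = sizeIndex.get(buffer.length) || [];
  if (candidates.length === 0) return null; // no same-size file, skip hashing
  const sig = signatureOf(buffer);
  for (const file of candidates) {
    if (signatureOfFile(file, buffer.length, hashCache) === sig) return file;
  }
  return null;
}
```

In this sketch a caller would build the index and cache once per job, call findDuplicate on each fetched buffer before writing, and add newly written files to the index so later pages can match against them.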

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 10:14:23 -05:00
Trey T b75b6542d9 Make detectMaxPage thread-scoped to avoid sidebar false positives
The previous fallback scanned for any anchor whose text was a number, which
matched widget elements (online count, trending threads, etc.) and inflated
the page count — a sidebar showing "26" caused detectMaxPage to report 26
pages on threads that were actually 12 and 8 pages long.

Now we derive the thread's URL prefix from the input baseUrl and only count
page-N references in hrefs that match that thread, ignoring sidebar
references to unrelated threads. The bare numeric-anchor scan is dropped.
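
The thread-scoped counting described above could look roughly like the sketch below. It is an illustration under assumptions, not the committed code: it scans href attributes with a regex rather than cheerio, assumes XenForo-style paths of the form /threads/name.id/page-N, and threadPrefix is a hypothetical helper name.

```javascript
// Derive the thread's path prefix from the input URL by dropping any
// trailing /page-N segment and trailing slash.
function threadPrefix(baseUrl) {
  const { pathname } = new URL(baseUrl);
  return pathname.replace(/\/page-\d+\/?$/, '').replace(/\/$/, '');
}

// Count only page-N references whose href belongs to this thread.
// Anchors that merely have numeric text are never considered.
function detectMaxPage(html, baseUrl) {
  const prefix = threadPrefix(baseUrl);
  let max = 1;
  for (const [, href] of html.matchAll(/href="([^"]+)"/g)) {
    let pathname;
    try {
      pathname = new URL(href, baseUrl).pathname; // resolves relative hrefs
    } catch {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    if (!pathname.startsWith(prefix + '/')) continue; // other thread or widget
    const m = pathname.slice(prefix.length).match(/^\/page-(\d+)\/?$/);
    if (m) max = Math.max(max, Number(m[1]));
  }
  return max;
}
```

Matching on prefix + '/' rather than the bare prefix keeps a thread such as mine.12 from also matching mine.123.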

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 19:57:26 -05:00
Trey T aa4f1157d1 Route SimpCity forum scraping through FlareSolverr + add turbo.cr resolver
DDoS-Guard now binds session cookies to the issuing browser's fingerprint, so
direct Node fetch returns 403 even with valid cookies. Page HTML for any
forum_site with stored cookies is now fetched via a FlareSolverr browser
session opened once per scrape job.

- Hybrid cookie refresh: FlareSolverr clears the DDoS-Guard captcha, those
  cookies seed undetected_chromedriver, Turnstile auto-solves in the real
  browser, login form submits, final cookies + browser UA persist to forum_sites
- Per-site user_agent column so subsequent scraper requests match the UA the
  cookies were issued for (DDoS-Guard rejects UA mismatches)
- XenForo search rewritten as proper CSRF POST /search/search → results page
  parse, replacing the broken ?q=... GET that only returned the search form
- Pagination regex fallback in detectMaxPage catches XenForo pages that
  cheerio's class-based selectors miss
- New scrapers/turbo.js handles turbo.cr /embed/ and /a/ URLs by rendering
  the page via FlareSolverr and grabbing the signed mp4 from the resolved
  <video src> attribute (gallery-dl can't extract these — obfuscated WASM)
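
As a rough illustration of "a FlareSolverr browser session opened once per scrape job", the sketch below creates one session, fetches every page through it, and always destroys it afterwards. The sessions.create, request.get, and sessions.destroy commands and the solution.response / solution.userAgent fields are part of FlareSolverr's v1 API; the function names, the endpoint value, and the injectable fetchImpl parameter are assumptions for this example and do not come from the commit.

```javascript
// POST one command to a FlareSolverr endpoint and return the parsed reply.
// fetchImpl defaults to the global fetch (Node 18+); it is injectable for tests.
async function callFlareSolverr(endpoint, body, fetchImpl = fetch) {
  const res = await fetchImpl(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  const json = await res.json();
  if (json.status !== 'ok') {
    throw new Error(`FlareSolverr ${body.cmd} failed: ${json.message}`);
  }
  return json;
}

// Open one browser session, fetch every URL through it, then clean up.
async function fetchPagesViaSession(endpoint, sessionId, urls, fetchImpl = fetch) {
  await callFlareSolverr(endpoint, { cmd: 'sessions.create', session: sessionId }, fetchImpl);
  try {
    const pages = [];
    for (const url of urls) {
      const { solution } = await callFlareSolverr(
        endpoint,
        { cmd: 'request.get', url, session: sessionId, maxTimeout: 60000 },
        fetchImpl
      );
      // solution.userAgent is the UA the session's cookies were issued for.
      pages.push({ url, html: solution.response, userAgent: solution.userAgent });
    }
    return pages;
  } finally {
    // Destroy the session even if a page fetch throws.
    await callFlareSolverr(endpoint, { cmd: 'sessions.destroy', session: sessionId }, fetchImpl);
  }
}
```

A call might look like fetchPagesViaSession('http://localhost:8191/v1', 'job-42', pageUrls), where the endpoint is FlareSolverr's default port and the session id is an arbitrary label chosen for this example.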

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 19:33:54 -05:00
Trey T 236f36aae6 Add app auth, dashboard, scheduler, video management, and new scrapers
- JWT-based app authentication with user roles, folder/route access control
- Dashboard with storage stats, health checks, and recent activity
- Auto-download/scrape scheduler (12h interval) with per-user and per-job configs
- Video upload, tagging, HLS transcoding, and detail pages
- New scrapers: LeakGallery, Mega (megajs), yt-dlp
- FlareSolverr integration for Cloudflare-protected sites
- Gallery: advanced filtering (date, size, search), sort modes, equal-mix shuffle
- Forum sites management with stored cookies/auth
- GridWall/GridCell components for responsive media grid
- Media API with folder-access permissions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:48:10 -05:00
Trey t 1e5f54f60b Add DRM downloads, scrapers, gallery index, and UI improvements
- DRM video download pipeline with pywidevine subprocess for Widevine key acquisition
- Scraper system: forum threads, Coomer/Kemono API, and MediaLink (Fapello) scrapers
- SQLite-backed media index for instant gallery loads with startup scan
- Duplicate detection and gallery filtering/sorting
- HLS video component, log viewer, and scrape management UI
- Dockerfile updated for Python/pywidevine, docker-compose volume for CDM

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 11:29:11 -06:00