RaidForums archive!
Hey everyone,

Kinda sucks that RF is down, and I'm beginning to think it's not coming back up. So I was thinking of scraping the existing caches of RF
and combining them into a simple archive site. As you may understand, not every thread is archived, and even if a thread is archived it only has 12
messages max. I made two Python scripts. The first scrapes the Wayback Machine to get all the thread URLs (it actually saves a lot of search
pages). The second script checks whether a Google cache page is available for a given thread. If so, it gets saved to a db.
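
For anyone who wants to try the first step themselves, here is a minimal sketch of one way to list archived thread URLs. It is not the script described above: it assumes the Wayback Machine's public CDX search endpoint rather than scraping saved search pages, and the `raidforums.com/Thread-*` pattern in the usage note and all function names are illustrative assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public CDX search endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url_pattern, limit=None):
    """Build a CDX query URL listing captures that match url_pattern."""
    params = {
        "url": url_pattern,            # a trailing * means prefix match
        "output": "json",
        "fl": "timestamp,original",    # only the fields we need
        "filter": "statuscode:200",    # skip redirects and errors
        "collapse": "urlkey",          # collapse repeated captures of one URL
    }
    if limit is not None:
        params["limit"] = str(limit)
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(payload):
    """Turn a CDX JSON payload (header row + data rows) into snapshot URLs."""
    rows = json.loads(payload)
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    ts = header.index("timestamp")
    orig = header.index("original")
    return [f"https://web.archive.org/web/{row[ts]}/{row[orig]}" for row in data]


def list_thread_snapshots(url_pattern, limit=None):
    """Fetch and parse the capture list (hypothetical helper, needs network)."""
    with urlopen(build_cdx_query(url_pattern, limit), timeout=30) as resp:
        return parse_cdx_rows(resp.read().decode("utf-8"))
```

Usage would look something like `list_thread_snapshots("raidforums.com/Thread-*", limit=100)`, which returns one snapshot URL per archived thread page that can then be fetched and saved to the db.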

edit: site is already live on https://rf-archive.com/ if that wasn't clear ^^

Should I continue scraping this? Otherwise I will just do it locally. Perhaps people have more ideas on how to get cached copies of threads,
and/or want certain pages checked? Also, does pompom perhaps already have all the thread archives? Because then it's kinda pointless
doing this haha. I also don't have every thread page / message scraped yet; I think I did the first 50 or so. On the site you can sort
by number of messages to see full threads. Also, Google doesn't like being scraped a lot, so does anyone know if proxies work to bypass that?

Hope you guys like it ^^

Here are some screenshots:
[screenshot not preserved]

Another image of a thread:
[screenshot not preserved]

EDIT:
The Wayback Machine scraper works really well, but I am having some trouble with the Google scraper. It works, but it's very sensitive to IP blocking (captcha verification).
I thought that putting a minute or so between every 5 requests would be good enough, but it still seems to block my IP. Also tried proxies, random headers etc., but it's Google so they probably know it
all haha. But if anyone has a good idea how to scrape Google cache, then let me know ^^
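
For reference, the pacing described above can be sketched roughly as follows. This assumes "a minute or so between 5 requests" means a pause of about 60 seconds after each batch of 5, and adds a doubling backoff for when a block is detected; the function names, jitter, and cap are illustrative assumptions. It only slows the scraper down and is not a way around captchas.

```python
import random


def pacing_delays(n_requests, batch_size=5, batch_pause=60.0, jitter=0.25, rng=None):
    """Return the sleep time (seconds) to take before each request.

    Requests inside a batch go out back to back; before the first request of
    every later batch we pause for batch_pause seconds, plus or minus a
    random fraction (jitter) so the rhythm is not perfectly regular.
    """
    rng = rng or random.Random()
    delays = []
    for i in range(n_requests):
        if i > 0 and i % batch_size == 0:
            delays.append(batch_pause * (1 + rng.uniform(-jitter, jitter)))
        else:
            delays.append(0.0)
    return delays


def backoff_delay(attempt, base=60.0, cap=3600.0):
    """Wait time after the attempt-th consecutive block (captcha or HTTP 429).

    Doubles on every consecutive block and is capped at one hour.
    """
    return min(cap, base * 2 ** attempt)
```

In a real loop you would `time.sleep()` for each value from `pacing_delays(...)` before the matching request, and switch to `backoff_delay(attempt)` as soon as a response looks like a captcha page, resetting `attempt` to 0 after the next successful fetch.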
Reply
this looks really cool, definitely keep it online!


eidolon



Reply
cool will try to fetch some more data ^^ and I will try to keep those filthy posts away from my db @Minori ^^
Reply
(March 20, 2022, 04:33 PM)SpotnikSignal Wrote: cool will try to fetch some more data ^^ and I will try to keep those filthy posts away from my db @Minori ^^


You should also blacklist anything with "kilob" in it, thx :P

#databreach
#RIU
Reply
(March 20, 2022, 04:34 PM)thekilob Wrote:
(March 20, 2022, 04:33 PM)SpotnikSignal Wrote: cool will try to fetch some more data ^^ and I will try to keep those filthy posts away from my db @Minori ^^


You should also blacklist anything with "kilob" in it, thx :P


Cool, will post em on the front page ^^
Reply
Nice work!
Reply
(March 20, 2022, 04:41 PM)SpotnikSignal Wrote:
(March 20, 2022, 04:34 PM)thekilob Wrote:
(March 20, 2022, 04:33 PM)SpotnikSignal Wrote: cool will try to fetch some more data ^^ and I will try to keep those filthy posts away from my db @Minori ^^


You should also blacklist anything with "kilob" in it, thx :P


Cool, will post em on the front page ^^


I will fucking murder you.

#databreach
#RIU
Reply
Thanks @jacka113 and yea cool will blacklist your posts/threads @thekilob ^^
Reply
This is cool and weird at the same time, well done
//~ Young, Wild & Free ~//

Reply



please delete this
BTC: bc1qkrmfskhwfkxc9rf009cn5z0uf2vsjtnme0qne9

Reply

