platypii

BASEjumper.net archive

People say "the internet never forgets", but that is bullshit. Archival takes a lot of hard work. Thank your local data hoarder.

https://basejumper.net/

This was the main documented history of base jumping from 2002-2021. Incident reports, ideas from jumpers who are gone. I felt it was important to preserve everything, including the attachments, etc. If you see bugs let me know, I might be able to fix them.

Hosting provided by BASEline. If you feel like donating, there is a paypal link on https://baseline.ws

You're welcome, assholes :-)

Kenny / BASEline

 

2 hours ago, Colm said:

Thanks, Kenny!

Out of curiosity, how much space (roughly) does the archive take up?

Approximately 1gb zipped, including all attachments. I would be happy to share copies with people so that there are backups. In case my basejumper.net mirror ever goes in.

On 12/6/2021 at 8:47 AM, platypii said:

You're welcome, assholes :-)

... I Knew It ...

Worked & conducted myself as if you weren't.

But so knew you would also be working on this.

Thank You, Kenny! Awesome setup for an archive.

Given the totality of Meso's statements within "personal messages",
I'm guessing Sangiro isn't going to have a problem with your archive.

However, if there are any copyright issues, may I "take you on as my first client"?

Ha ... JK.  As I'm going to be a party. -- Although, I guess I'd have to host something first...

In addition to gladly accepting a "direct copy" for backup (in the unlikely demise of your hosting), I would be very interested to hear how you went about preservation:

(1) Direct Technique? -- I knew of several ways but was only directly familiar with one. Because of the CGI Script issues, I stuck with that; Went Well.

(2) Did you preserve the entire website, & then simply extract the forum aspects, or only keep the forums? Or everything, but it's all cut up into pieces?

(3) Post Numbers?

(4) Post Views?

Issues / Bugs: 

(1) Appears to have the exact same problem Meso had within new DZ Format:

Sporadic Multiple Character Error -- My favorite BASE picture

General: Make Scene ... MS 2 -- General: NPS -- Technical: CORVID

Hangout: Not in (2) surprisingly, but massively in -- JRE BASE Number

(2) Some Inline Attachments appear to be a problem. -- Taylor Swift

(3) Missing Posts 1:

I'm confused ... favorite -- Your archive stops after the split: 

Oct 12, 2007, 3:32 PM
Post #518 of 762

Tons of activity in 2011 ... & with the last post by OP: 

The Pick -- "still mt fav."

Registered: Mar 24, 2004
Posts: 170

Jan 28, 2015, 10:19 PM
Post #762 of 762 (3391 views)

(4) Missing Posts 2:

Did you preserve posting after your "full download" at the start of November? (Meso did not. lol.)

:bang::fire:



By The Way... 

..."if it's any consolation":

The ah, "Magnificent Migration" ... killed that thread, completely.  

Just like from me stalking Sangiro's profile, within Off the Mountain: 

Probably wouldn't have realized the stopping point within your archive,
if I hadn't been stalking the old BASE forum, prior to the site migration. 

So, basically Kenny ... "it's the simulation" ... or not; shrug.



P.S. The "Post Preview" on this new Dorkzone Setup... Absolute Guano.
:bang:Boom:fire:


I fixed some of the character encoding bugs people were seeing. That should take care of the "â€" bugs. I also fixed the inline image width with a css tweak. @dmcoco84 thanks for the heads up! Also thanks to those who have donated.
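For anyone wondering where "â€" comes from: it is typically what UTF-8 punctuation (curly quotes, dashes) looks like after being decoded as Windows-1252. The post doesn't say how the fix was actually done, so the following is only a minimal sketch of one common repair, assuming that is the cause; `fix_mojibake` is a made-up helper name.

```python
def fix_mojibake(text: str) -> str:
    """Undo one round of UTF-8-read-as-cp1252 damage, if possible."""
    try:
        # Re-encode to the bytes that were misread, then decode them properly.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of mojibake (or already correct): leave it alone.
        return text


print(fix_mojibake("donâ€™t"))
```

Text that is already correct fails the round trip and is returned unchanged, so the function is safe to run over a whole archive. Some damaged sequences may not round-trip through Python's `cp1252` codec and would be left as they are.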

For those curious, it was not as simple as using a web scraper. Because of the way that basejumper.com was structured, there were "duplicate" pages based on sorting, pagination, etc. Also I wanted to make sure I got the incidents section which required being logged in.

So I wrote a structured scraper that parsed each page of the forums, threads, and individual posts. Raw data was written to json files. Attachments were also downloaded. The classic bj emojis obviously needed to be preserved too.
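A rough idea of what "parse each page, write json" can look like, using only the Python standard library. This is not the actual scraper: the `<article data-post-id=... data-author=...>` markup is invented for the example, and the real site's HTML would need its own rules.

```python
import json
from html.parser import HTMLParser


class PostParser(HTMLParser):
    """Collect posts from one thread page (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._current = None  # post being read, or None between posts

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            a = dict(attrs)
            self._current = {
                "id": int(a["data-post-id"]),
                "author": a["data-author"],
                "body": "",
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["body"] += data

    def handle_endtag(self, tag):
        if tag == "article" and self._current is not None:
            self._current["body"] = self._current["body"].strip()
            self.posts.append(self._current)
            self._current = None


def parse_thread_page(html):
    parser = PostParser()
    parser.feed(html)
    parser.close()
    return parser.posts


page = """
<article data-post-id="1" data-author="alice">First post</article>
<article data-post-id="2" data-author="bob">A reply</article>
"""
posts = parse_thread_page(page)
print(json.dumps(posts, indent=2))
```

Because each post ends up as a plain record, sorting and pagination become a rendering decision made later rather than something baked into the saved pages.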

Then I made a quick webapp to render the json data as html. I exported those rendered html files to amazon S3, fronted by cloudfront. There was some hacking to get the content-type to be correct on S3.
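The post doesn't describe the content-type hack, so this is only a guess at the kind of thing involved: picking a `Content-Type` per object key, with a fallback for keys that have no file extension. The `text/html` default and the example keys are assumptions.

```python
import mimetypes


def s3_extra_args(key, default="text/html"):
    """Pick a Content-Type for an object key.

    Falling back to text/html for extensionless keys is an assumption;
    it suits rendered pages saved under paths like "forum/thread-123".
    """
    guessed, _encoding = mimetypes.guess_type(key)
    return {"ContentType": guessed or default}


for key in ["index.html", "attachments/photo.jpg", "forum/thread-123"]:
    print(key, s3_extra_args(key))
```

If the upload is done with boto3, a dict of this shape can be passed as `ExtraArgs` to `upload_file`; other upload tools have their own way of setting the header.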

Here are downloadable archives of the rendered html, and the structured json data. @Colm @sfzombie13

https://basejumper.net/bj-html-v1.zip (820mb)
https://basejumper.net/bj-json-v1.zip (50mb)


when i was working on it, i logged in and used httrack.  it got the forums that needed to be logged in for, but i stopped so it didn't take away bandwidth from another user also grabbing it.  would that not have done the same thing without all the work?  i am glad you did the work, but just wondering.  i haven't used httrack much before, and it was a while ago.  thanx for sharing the information.  information should be free.


I started down the path of using a scraper like httrack, but I don't think it's possible to do a good job with it in this case.

The problem is that the site had url parameters like "page", "max hits" and "sort". And the button for "add friend" was an http GET endpoint. So crawling links on the page will result in weird behavior, a huge number of downloads, and still probably won't get everything. The resulting archive would be massive. I think it was actually easier to write a structured parser specifically for bj.com.
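To make the duplicate-URL problem concrete, here is a small sketch of how a crawler could collapse view-only parameters and refuse links that have side effects. The parameter names, the `/add_friend` path, and the example.com URLs are hypothetical stand-ins, not the real site's URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names standing in for the "page", "max hits", "sort"
# parameters and the "add friend" GET endpoint described above.
VIEW_ONLY_PARAMS = {"page", "max_hits", "sort"}
SIDE_EFFECT_PATHS = {"/add_friend"}


def canonical_url(url):
    """Return the URL identifying the underlying thread, or None if the
    link must never be fetched."""
    parts = urlsplit(url)
    if parts.path in SIDE_EFFECT_PATHS:
        return None  # a GET here would change state on the site
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in VIEW_ONLY_PARAMS]
    query = urlencode(sorted(kept))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


urls = [
    "https://example.com/forum/thread?id=42&sort=asc&page=2",
    "https://example.com/forum/thread?id=42&max_hits=50",
    "https://example.com/add_friend?user=7",
]
for u in urls:
    print(canonical_url(u))
```

The first two URLs collapse to the same thread, and the third is skipped. Pagination still has to be walked, but deliberately and once per thread, which is roughly what a structured scraper does.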


that's what i thought initially, but if set up right it gets everything perfectly, an exact mirror, if you turn off the links.  i thought there was a setting to ignore external or internal links, but it gets complicated if you don't work with it every day.  i have a friend who swears by wget, but it wasn't working right for me at first so i went with httrack until i stopped to save bandwidth for the other attempt.  i just hate information going away.  i haven't had a chance to look at this yet, thanx for putting it out there. 


:halo:

May I Make a Joke, First...? 

23 hours ago, platypii said:

...I don't think it's possible to do a good job with it in this case.

Wrong

23 hours ago, platypii said:

I started down the path of using a scraper like httrack, but I don't think it's possible to do a good job with it in this case.

I think it was actually easier to write a structured parser specifically for bj.com.

What exactly did you try using? -- ("Like HTTrack")

.

So ... You Used Wget?


I did actually start with httrack, but I'm not an expert with it, and switched to wget. So you are right in that sense.

But I stand by my statement -- I don't think it's possible to do a good job here by taking snapshots.

Basejumper.com was a dynamic site, and it would have been very difficult to get a static snapshot of it. Would you want httrack to have a copy of every permutation of "sort by date (ascending|descending), on page number (X), with (20|50|100) hits per page"? It also had a "threaded" view that basically no one used, but that makes for a whole new set of problems. All those links would be available to users in the mirror, so you either capture all of them, or there will be broken links. Also the site does some weird stuff: if you're logged in, it tries to take you to the last page of a thread, but once you've viewed it, it will take you to the first page. If you don't want "NEW" to show up everywhere, you need to handle the site's logic on viewed threads carefully. The ads would have been snapshotted too, but would likely break after some time. Oh, and don't forget to block the "add friend" links or else you'll spam everyone on the site. Sorry to anyone who got a friend request from throwaway6667 :halo:
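To put a rough number on the permutations, take the 762-post thread mentioned earlier in this topic. This back-of-the-envelope count only assumes the two sort orders and three page sizes listed above; it ignores the threaded view and anything else that multiplies URLs.

```python
from math import ceil

posts_in_thread = 762  # size of the large thread discussed earlier
sort_orders = ("ascending", "descending")
hits_per_page = (20, 50, 100)

# Every (sort order, page size) pair produces its own run of page URLs.
page_urls = sum(len(sort_orders) * ceil(posts_in_thread / h) for h in hits_per_page)
print(page_urls)  # 126
```

That is 126 page snapshots for a single thread, each one a different slice of the same 762 posts.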

Those are just some of the issues I ran into before I changed my strategy.


i looked at the data you zipped, can't make a bit of sense out of it.  i looked at the httrack copy that was almost done, looks like the website looked before it went away.  you can even browse it like you were there.  you didn't spend enough time tweaking the preferences.  it doesn't take any snapshots of anything, it preserves what was there as it was.  you may lose something that was posted after you started it, but it would have been minuscule compared to having a zip file like the one you posted.  thanx for saving all the info, but it would be nice to make it usable.  now i have another project, unless you wrote something to search it with...i wish i would have kept what i had so you could see just how good the copy would have been.  i didn't get it all and deleted what i had.


:seenoevil::hearnoevil::speaknoevil:

May I Make a Second Joke, First... Again

...This however is directed not at you, but at xmesox. 

But, in a nice way; in a comical way. -- Just Breakin' Ur Ballz Bro.

15 hours ago, platypii said:

But I stand by my statement -- I don't think it's possible to do a good job...

You're Wrong

Scrubs: "The Wrong Song."

(I'm a Paramedic ... used to work in an ED.)

15 hours ago, platypii said:

Sorry anyone who got a friend request from throwaway6667 :halo:

See Attached: 

:halo: :bang: :fire:

BJ - TA - P.jpg

On 12/13/2021 at 9:22 PM, dmcoco84 said:

 -- My favorite BASE picture --

(3) Missing Posts 1:

I'm confused ... favorite -- Your archive stops after the split: 

Oct 12, 2007, 3:32 PM
Post #518 of 762

Tons of activity in 2011 ... & with the last post by OP: 

The Pick -- "still mt fav."

Registered: Mar 24, 2004
Posts: 170

Jan 28, 2015, 10:19 PM
Post #762 of 762 (3391 views)


So ... Given that I now know exactly how you went about preservation, I'd say I'm even more confused about how this has occurred. -- I've checked a few of the other biggest threads & everything appears intact. So Far. 

Any thoughts on what has occurred? 
