Posts: 27 Threads: 0 Joined: N/A April 17, 2022 at 4:28 PM
What in the fuck is this? Every file has 50 copies of the same data! This is the kind of shit in this db. These are the contents of file 000000589 in the compressed archive: https://ghostbin.com/rO7RB
There are 700k files in this archive. Not ALL of them are like this, but a fuck ton are. Whoever made this file is a fucking moron. 50x the file size for nothing. You might be able to pull 200k lines of data out of here? I would call this a pass if you don't have a lot of credits.

Posts: 48 Threads: 0 Joined: N/A April 17, 2022 at 7:52 PM
(April 17, 2022, 04:28 PM)oneeyedwillis Wrote: What in the fuck is this? Every file has 50 copies of the same data! [...]
It's probably the way the data was scraped and, mind you, each file is unique.
You know, it's easy to admit you don't know how to access the data and ask for help in doing so.

Posts: 27 Threads: 0 Joined: N/A April 18, 2022 at 1:38 AM
(April 17, 2022, 07:52 PM)Martinabel007 Wrote: It's probably the way the data was scraped and, mind you, each file is unique. [...]
yA, I mUsT nOt KnOw WhAt Im DoInG bEcAuSe I cAnT cTrL-F oN a TeXt DoCuMeNt... https://imgur.com/a/Jwt9iTv
It's OK to admit you didn't even bother looking into the file before posting it here. Seems to be just folders 001-004.
I'm not shitting on you, @Martinabel007. I am just pointing this out about the file so others can make their own decisions. There is some data here, but it might not be worth it to some, and 30 GB uncompressed is a lot for a small number of lines. Some wankers in here with their Pentium 4s can't even process data that big...

Posts: 9 Threads: 0 Joined: N/A April 18, 2022 at 12:57 PM
Thanks!

Posts: 15 Threads: 0 Joined: N/A April 18, 2022 at 1:12 PM
Thanks

Posts: 21 Threads: 0 Joined: N/A April 18, 2022 at 1:52 PM
Good job, thanks

Posts: 53 Threads: 0 Joined: N/A April 18, 2022 at 3:13 PM
(April 15, 2022, 09:44 PM)Martinabel007 Wrote:
File type : Multiple folders containing JSON files
File size : 4 GB compressed (~30 GB uncompressed)
Date : fairly recent, I guess (early 2020)
Source : RF
My little project now is trying to work out a way to make the files accessible through a database.
Hope MySQL is a good start.
Converting to CSV with Python would have been an option, but... I'm going with a database :D
credits: RF user that posted (forget the user) Data from the first file [ { "address": { "city": "HICKORY", "state": "KY", "street": "3919 STATE ROUTE 301", "zip": "42051" }, "age": 22, "aka": [ { "firstname": "ABBY", "lastname": "PADGETT", "middlename": "A" }, { "firstname": "ABBY", "lastname": "SPENCER", "middlename": "A" } ], "autos": [ { "make": "TOYOTA", "model": "COROLLA", "year": 2011 } ], "court": { "judgements": true }, "dateOfBirth": "1996-04-08", "dateOfDeath": "1997-12-18", "education": { "educationLevel": "High School" }, "emails": [ "[email protected]" ], "firstname": "ABBY", "gender": "F", "id": "5c6e3ed2d20c6db42860a479", "jobs": [ { "title": "Student" } ], "lastname": "BRYANT", "middlename": "A", "pastAddresses": [ { "city": "Billings", "dateRange": { "endYear": 2013, "startYear": 2010 }, "loc": "45.7813,-108.5727", "state": "MT", "street": "3909 Swallow Ln", "zip": "59102" } ], "phone": "7326951761", "politicalParty": "Republican", "professionalLicense": "NURSING", "profilePics": [ { "url": "https://media.licdn.com/mpr/mprx/0_YSs4eLVFQfq_ou1wVDo9IXMFGfE_mmPwVJXvmkwWWwIhW73LZgRq2mTCzRj" }, { "url": "https://graph.facebook.com/1179099847/picture?type=large" }, { "url": "https://s-media-cache-ak0.pinimg.com/avatars/booth_abby-12_140.jpg" }, { "url": "https://lh4.googleusercontent.com/-EPRX48-Rp8A/AAAAAAAAAAI/AAAAAAAAAAA/Te7uAkwYeuE/photo.jpg" }, { "url": "https://media.licdn.com/mpr/mpr/shrinknp_400_400/p/8/000/272/397/1e30b4b.jpg" }, { "url": "http://media.licdn.com/mpr/mpr/p/8/000/272/397/1e30b4b.jpg" } ], "race": "Caucasian", "religion": "Christian", "social": [ { "domain": "facebook", "url": "https://www.facebook.com/abbywonwon" }, { "domain": "linkedin", "url": "https://www.linkedin.com/pub/abby-shroyer/53/a92/605" }, { "domain": "facebook", "url": "https://www.facebook.com/people/_/100000628123662" } ], "title": "MS" } ]
Will post more samples when I get to my PC.
Can you upload it somewhere else? Google has limited this file.

Posts: 48 Threads: 0 Joined: N/A April 18, 2022 at 7:24 PM
(April 18, 2022, 03:13 PM)Vanish82 Wrote: Can you upload it somewhere else? Google has limited this file.
I'd advise you to try again. It's still accessible.

Posts: 22 Threads: 0 Joined: N/A April 20, 2022 at 7:41 PM
Does it work?

Posts: 48 Threads: 0 Joined: N/A April 20, 2022 at 10:33 PM
(April 20, 2022, 07:41 PM)johnfk Wrote: Does it work?
Link is up and running.