Been verified scraped data
What in the fuck is this?
every file has 50 copies of the same data!
This is the kind of shit in this db:
This is the contents of file 000000589 in the compressed archive.
https://ghostbin.com/rO7RB

There are 700k files in this archive. Not ALL of them are like this, but a fuck ton are. Whoever made this file is a fucking moron. 50x file size for nothing.
Might be able to pull 200k data out of here?
I would call this a pass if you don't have a lot of credits.
Reply
(April 17, 2022, 04:28 PM)oneeyedwillis Wrote: What in the fuck is this?

It's probably the way the data was scraped and mind you each file is unique.

You know it's easy to admit you don't know how to access the data and ask for help in doing so.
Reply
(April 17, 2022, 07:52 PM)Martinabel007 Wrote:
It's probably the way the data was scraped and mind you each file is unique.

You know it's easy to admit you don't know how to access the data and ask for help in doing so.


yA, I mUsT nOt KnOw WhAt Im DoInG bEcAuSe I cAnT cTrL-F oN a TeXt DoCuMeNt...
https://imgur.com/a/Jwt9iTv

It's OK to admit you didn't even bother looking into the file before posting it here.

Seems to be just folders 001-004
I'm not shitting on you, @Martinabel007. I'm just pointing this out about the file so others can make their own decisions.
There is some data here. But it might not be worth it to some, and 30 GB uncompressed is a lot for a small number of lines.
Some wankers in here with their Pentium 4s can't even process data that big...
Reply
Thanks!
Reply
Thanks
Reply
Good job, thanks
Reply
(April 15, 2022, 09:44 PM)Martinabel007 Wrote: File type : Multiple folders containing JSON files
File size : 4 GB compressed (~30 GB uncompressed)
Date : fairly recent, I guess (early 2020)
Source : RF

My little project now is to work out a way to make the file accessible through a database.

I hope MySQL is a good start.

Converting to CSV with Python would have been an option, but... I'm going with a database :D


Credits: the RF user who posted it (I forget who)
Data from the first file
[
  {
    "address": {
      "city": "HICKORY",
      "state": "KY",
      "street": "3919 STATE ROUTE 301",
      "zip": "42051"
    },
    "age": 22,
    "aka": [
      {
        "firstname": "ABBY",
        "lastname": "PADGETT",
        "middlename": "A"
      },
      {
        "firstname": "ABBY",
        "lastname": "SPENCER",
        "middlename": "A"
      }
    ],
    "autos": [
      {
        "make": "TOYOTA",
        "model": "COROLLA",
        "year": 2011
      }
    ],
    "court": {
      "judgements": true
    },
    "dateOfBirth": "1996-04-08",
    "dateOfDeath": "1997-12-18",
    "education": {
      "educationLevel": "High School"
    },
    "emails": [
      "[email protected]"
    ],
    "firstname": "ABBY",
    "gender": "F",
    "id": "5c6e3ed2d20c6db42860a479",
    "jobs": [
      {
        "title": "Student"
      }
    ],
    "lastname": "BRYANT",
    "middlename": "A",
    "pastAddresses": [
      {
        "city": "Billings",
        "dateRange": {
          "endYear": 2013,
          "startYear": 2010
        },
        "loc": "45.7813,-108.5727",
        "state": "MT",
        "street": "3909 Swallow Ln",
        "zip": "59102"
      }
    ],
    "phone": "7326951761",
    "politicalParty": "Republican",
    "professionalLicense": "NURSING",
    "profilePics": [
      {
        "url": "https://media.licdn.com/mpr/mprx/0_YSs4eLVFQfq_ou1wVDo9IXMFGfE_mmPwVJXvmkwWWwIhW73LZgRq2mTCzRj"
      },
      {
        "url": "https://graph.facebook.com/1179099847/picture?type=large"
      },
      {
        "url": "https://s-media-cache-ak0.pinimg.com/avatars/booth_abby-12_140.jpg"
      },
      {
        "url": "https://lh4.googleusercontent.com/-EPRX48-Rp8A/AAAAAAAAAAI/AAAAAAAAAAA/Te7uAkwYeuE/photo.jpg"
      },
      {
        "url": "https://media.licdn.com/mpr/mpr/shrinknp_400_400/p/8/000/272/397/1e30b4b.jpg"
      },
      {
        "url": "http://media.licdn.com/mpr/mpr/p/8/000/272/397/1e30b4b.jpg"
      }
    ],
    "race": "Caucasian",
    "religion": "Christian",
    "social": [
      {
        "domain": "facebook",
        "url": "https://www.facebook.com/abbywonwon"
      },
      {
        "domain": "linkedin",
        "url": "https://www.linkedin.com/pub/abby-shroyer/53/a92/605"
      },
      {
        "domain": "facebook",
        "url": "https://www.facebook.com/people/_/100000628123662"
      }
    ],
    "title": "MS"
  }
]



Will post more samples when I get to my PC.



Can you upload it somewhere else? Google has limited this file.
Reply
(April 18, 2022, 03:13 PM)Vanish82 Wrote: Can you upload it somewhere else? Google has limited this file.


I'd advise you to try again. It's still accessible.
Reply
does it work?
Reply
(April 20, 2022, 07:41 PM)johnfk Wrote: does it work?


Link is up and running
Reply