How do I sort a very large database?
by - Thursday, January 1, 1970 at 12:00 AM
EmEditor, Python DataFrames (pandas)
Reply
Go get the breachcompilation script template from GitHub; you can study that. I tried it myself but haven't dived into it much. Besides, I don't think it's good for really huge amounts of data. I've built up my own over time, currently at 3.5bn credentials, but it eats power, which is expensive nowadays, so I'm still missing a bunch of data to merge into it.

There is the original 43 GB data out there, maybe still on GitHub, and then there is another one of 100 GB+ floating around in cyberspace.
Reply
(November 25, 2022, 10:49 AM)trollinator321 Wrote:
(October 22, 2022, 12:46 PM)geniusentity Wrote: I think if you need it sorted, better parse line by line and import content to a relational database where you can query and sort any way you like.



Hi all,

That would be the way I'd choose as well. While parsing, you could also remove all the "irrelevant" data that might make sorting more complex.
A good index on the sort column should help as well.
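
To make the "import into a relational database and index the sort column" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, index name, and function names are made up for illustration, and it assumes the file is valid UTF-8 with one record per line. Note that SQLite's NOCASE collation only folds ASCII letters, so non-ASCII text would need a different approach.

```python
import sqlite3


def load_lines(src_path, db_path, batch_size=100_000):
    """Import a text file line by line into SQLite and index it for sorting."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS records (line TEXT NOT NULL)")
    with open(src_path, encoding="utf-8") as src:
        batch = []
        for raw in src:
            # Strip only the line terminator; the original case is kept as-is.
            batch.append((raw.rstrip("\n"),))
            if len(batch) >= batch_size:
                conn.executemany("INSERT INTO records (line) VALUES (?)", batch)
                batch.clear()
        if batch:
            conn.executemany("INSERT INTO records (line) VALUES (?)", batch)
    # Case-insensitive index (ASCII only) so ORDER BY can walk the index.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_records_line_nocase "
        "ON records (line COLLATE NOCASE)"
    )
    conn.commit()
    return conn


def iter_sorted(conn):
    """Yield the stored lines in case-insensitive order, case unchanged."""
    query = "SELECT line FROM records ORDER BY line COLLATE NOCASE"
    for (line,) in conn.execute(query):
        yield line
```

Creating the index after the bulk insert is usually faster than maintaining it during the import.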

Best


thanks for the advice
Reply
(November 24, 2022, 02:09 AM)Magroll Wrote:
(November 23, 2022, 11:34 PM)drsnape Wrote:
(October 20, 2022, 07:37 AM)Magroll Wrote: Given: a database as a text file, 100 GB in size. It needs to be sorted alphabetically, ignoring case (but not changing it!). How do I sort it? I've tried writing a Python program, but I'll grow old before it finishes.


I have a splitter program I can send you if you need it. It simply splits a txt or csv file into multiple files; all you have to do is specify what size you want each file to be. Easy.


How will this help me? Then the pieces will need to be sorted separately and somehow glued back together. It's not clear how.


https://askubuntu.com/questions/28847/text-editor-to-edit-large-4-3-gb-plain-text-file

Splitting and joining are built into Linux (the split and cat commands), if that's what you're using. Don't use other programs when it's already a shell command! There's probably something similar already installed on Windows.
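
As for how the split pieces get "glued together": the usual answer is an external merge sort. Sort each piece on its own, then merge the sorted pieces in a single streaming pass. Below is a rough Python sketch of that idea, not a tuned implementation. The function names and the lines_per_chunk parameter are my own, and it assumes one record per line in a UTF-8 (or mostly UTF-8) file.

```python
import heapq
import os
import tempfile
from itertools import islice

# Undecodable bytes survive the round trip thanks to surrogateescape.
OPEN_KW = {"encoding": "utf-8", "errors": "surrogateescape"}


def sort_key(line):
    # Compare case-insensitively; the raw line breaks ties deterministically.
    return (line.casefold(), line)


def external_sort(src_path, dst_path, lines_per_chunk=1_000_000):
    """Sort a text file too large for memory, ignoring case but keeping it."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        run_paths = []
        # Pass 1: read a bounded number of lines, sort them, write a sorted run.
        with open(src_path, "r", **OPEN_KW) as src:
            while True:
                chunk = list(islice(src, lines_per_chunk))
                if not chunk:
                    break
                if not chunk[-1].endswith("\n"):
                    chunk[-1] += "\n"  # last line of the file may lack a newline
                chunk.sort(key=sort_key)
                path = os.path.join(tmp_dir, f"run_{len(run_paths):06d}.txt")
                with open(path, "w", **OPEN_KW) as run:
                    run.writelines(chunk)
                run_paths.append(path)

        # Pass 2: k-way merge; only one line per run is held in memory.
        runs = [open(path, "r", **OPEN_KW) for path in run_paths]
        try:
            with open(dst_path, "w", **OPEN_KW) as dst:
                dst.writelines(heapq.merge(*runs, key=sort_key))
        finally:
            for run in runs:
                run.close()
```

The merge pass keeps every run file open at once, so for a 100 GB input pick lines_per_chunk large enough that the number of runs stays below the open-file limit of your system. On Linux, GNU sort does the same kind of external merge by itself and has an -f option to ignore case.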

The link I gave lists a few text editors that can handle larger files.

I was also going to say that after getting it sorted, it'd probably be easiest to write a simple script to query it. I've made projects using just HTML and JavaScript (OK, and a little PHP) to query text and SQL files and print the results on the screen. Writing it yourself makes it a lot easier, although you could also just use grep and write a command to display it all in the terminal, even with the large file.
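
A "simple script to query it" can be very small if it streams the file instead of loading it. Here is a minimal grep-like sketch in Python; the function name search_lines is hypothetical, and it does a plain case-insensitive substring match, nothing smarter.

```python
def search_lines(path, needle):
    """Yield (line_number, line) for every line containing needle, ignoring case."""
    needle = needle.casefold()
    # Iterating over the file object reads one line at a time, so memory use
    # stays flat no matter how large the file is.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line.casefold():
                yield number, line.rstrip("\n")
```

This scans the whole file on every query. Once the file is sorted, a binary search on a known prefix would be much faster, but that is a separate exercise.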
Reply
You need a good architecture for this, because the best database is the one with the least data. You need to cut repeated data and use a good direction/routing system for it. Maybe a good API can help you manage this complexity.
Reply
(November 28, 2022, 01:03 AM)etkiliadam Wrote: You need a good architecture for this, because the best database is the one with the least data. You need to cut repeated data and use a good direction/routing system for it. Maybe a good API can help you manage this complexity.


The fact of the matter is that there is no extra data; all duplicates have already been deleted.
Reply

