How do I sort a very large database?
by - Thursday, January 1, 1970 at 12:00 AM
Given: a database stored as a text file, 100GB in size. I need to sort it alphabetically, ignoring case (but not changing it!). How do I sort it? I've tried writing a Python program, but I'll grow old before it finishes.
Reply
https://linux.die.net/man/1/uniq
https://linux.die.net/man/1/sort

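HINT: sort's -f flag ignores case. Add -u only if you also want duplicates removed (with -f, lines that differ only in case count as duplicates).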
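
Since the original attempt was in Python, here is a minimal sketch of driving the system sort from a script instead of sorting in Python itself. The function name and its parameters are made up for illustration; -f, -s, -S, -T and -o are standard GNU sort options. It assumes a sort binary is on the PATH and that ASCII-only case folding (LC_ALL=C) is good enough for your data.

```python
import os
import subprocess


def sort_ignoring_case(src, dst, tmp_dir=None, buffer_size="1G"):
    """Sort src into dst with the system `sort`, comparing lines case-insensitively."""
    # -f folds lowercase to uppercase for comparison only (the output keeps its case),
    # -s keeps lines that compare equal in their original order,
    # -S caps the in-memory buffer, -o names the output file.
    cmd = ["sort", "-f", "-s", "-S", buffer_size, "-o", dst]
    if tmp_dir is not None:
        cmd += ["-T", tmp_dir]  # directory for sort's temporary files
    cmd.append(src)
    # LC_ALL=C means plain byte comparison; with it, -f folds only ASCII letters.
    env = dict(os.environ, LC_ALL="C")
    subprocess.run(cmd, check=True, env=env)
```

sort spills to temporary files on its own when the input does not fit in memory, so point tmp_dir at a disk with enough free space for a file this size.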
Reply
Sorting is my biggest issue too. If you could also PM me on how to sort better, I'd appreciate it.
Reply
Use EmEditor. It is built for big files and manages memory and threads well enough to be fast and safe. Worst case, you can use an AWS service to sort it for you and pay a couple of cents.
Reply
I think if you need it sorted, it is better to parse it line by line and import the content into a relational database, where you can query and sort any way you like.
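
A minimal sketch of that idea, using SQLite because it ships with Python. The function name, the table name "lines" and the db_path argument are my own choices for illustration, and I am assuming the file is valid UTF-8 with one record per line. COLLATE NOCASE only folds ASCII letters.

```python
import sqlite3


def sort_via_sqlite(src, dst, db_path):
    """Load every line of src into an SQLite table, then write them to dst sorted."""
    con = sqlite3.connect(db_path)
    try:
        con.execute("CREATE TABLE lines (line TEXT)")
        with open(src, encoding="utf-8") as fin:
            # executemany consumes the generator lazily, so the file is streamed.
            con.executemany(
                "INSERT INTO lines VALUES (?)",
                ((line.rstrip("\n"),) for line in fin),
            )
        con.commit()
        with open(dst, "w", encoding="utf-8") as out:
            # NOCASE compares without regard to ASCII case; rowid breaks ties
            # so lines that differ only in case keep their original order.
            query = "SELECT line FROM lines ORDER BY line COLLATE NOCASE, rowid"
            for (line,) in con.execute(query):
                out.write(line + "\n")
    finally:
        con.close()
```

The stored text is never modified, so the original case survives. Expect the database file to be at least as large as the input.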
Reply
I agree that for a file that large, you need to get it into a DB. Use one of the free ones out there. Once it is there, sorting will be easier, but it will still take time.
Reply
Try importing it into something like MySQL and sorting it that way. Otherwise the sort command on Linux may help.
Reply
Use the radix sort algorithm (yes, you will need to write your own program). You will need as many passes as there are symbols in your longest string (i decreases from the length of your strings to 1), and at each pass you will have as many files on disk as there are distinct symbols in the i-th position of all your strings. After each pass, merge the files back together in symbol order.
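
A rough sketch of those passes, with in-memory lists standing in for the per-symbol files to keep it short. The function name is made up, and treating strings shorter than position i as having the smallest possible symbol is my assumption about how to handle lines of different lengths.

```python
def radix_sort_ignoring_case(lines):
    """LSD radix sort: one stable bucket pass per character position, last position first."""
    if not lines:
        return []
    width = max(len(s) for s in lines)
    for i in range(width - 1, -1, -1):
        buckets = {}  # on disk, each bucket would be its own file
        for s in lines:
            # Lowercase only the bucket key; the string itself is left untouched.
            # Strings too short to have position i go into the "" bucket, which sorts first.
            key = s[i].lower() if i < len(s) else ""
            buckets.setdefault(key, []).append(s)
        # Join the buckets in symbol order; order inside each bucket is preserved.
        lines = [s for key in sorted(buckets) for s in buckets[key]]
    return lines
```

Every pass reads and rewrites the whole data set, so the number of full passes over the 100GB equals the length of the longest line.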
Reply
(October 23, 2022, 03:39 AM)achaba742 Wrote: Use the radix sort algorithm (yes, you will need to write your own program). You will need as many passes as there are symbols in your longest string (i decreases from the length of your strings to 1), and at each pass you will have as many files on disk as there are distinct symbols in the i-th position of all your strings. After each pass, merge the files back together in symbol order.


Thanks, I used this method, even before I read this post.
Reply
(October 20, 2022, 07:37 AM)Magroll Wrote: Given: a database stored as a text file, 100GB in size. I need to sort it alphabetically, ignoring case (but not changing it!). How do I sort it? I've tried writing a Python program, but I'll grow old before it finishes.


I have a splitter program I can send you if you need it. It simply splits a txt or csv file into multiple files; all you have to do is specify what size you want each file to be. Easy.
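
Splitting alone does not sort anything, but it is the first half of an external merge sort: sort each piece in memory, then merge the sorted pieces. This is not the splitter program mentioned above, just a minimal sketch of the idea. The function names are made up, and it splits by line count rather than by file size, assuming UTF-8 text.

```python
import heapq
import os
import tempfile
from itertools import islice


def fold(line):
    """Comparison key: the line without its newline, case-folded."""
    return line.rstrip("\n").casefold()


def external_sort(src, dst, lines_per_chunk=1_000_000):
    """Split src into sorted chunk files, then merge them into dst."""
    with tempfile.TemporaryDirectory() as tmp:
        chunk_paths = []
        with open(src, encoding="utf-8") as fin:
            while True:
                chunk = list(islice(fin, lines_per_chunk))
                if not chunk:
                    break
                if not chunk[-1].endswith("\n"):
                    chunk[-1] += "\n"  # the last line of the file may lack a newline
                chunk.sort(key=fold)  # stable, and the lines keep their original case
                path = os.path.join(tmp, f"chunk{len(chunk_paths)}.txt")
                with open(path, "w", encoding="utf-8") as out:
                    out.writelines(chunk)
                chunk_paths.append(path)
        files = [open(p, encoding="utf-8") for p in chunk_paths]
        try:
            with open(dst, "w", encoding="utf-8") as out:
                # heapq.merge reads the chunk files lazily, one line at a time each.
                out.writelines(heapq.merge(*files, key=fold))
        finally:
            for f in files:
                f.close()
```

Pick lines_per_chunk so that one chunk fits comfortably in RAM. All chunk files are open at once during the merge, so a very small chunk size on a 100GB file can hit the operating system's open-file limit.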
Reply

