September 23, 2022 at 7:41 PM
(September 23, 2022, 08:47 AM)M0nk Wrote: I have the latest version with hash 0B63399191A6CC2979141333B012EC53.
The following are corrupted, but they can be extracted in part:
ahzybaby.com {184.411} [HASH] (NoCategory)
artvictoria.com {57.029} [HASH+NOHASH] (NoCategory)
rmaxinternational.com {2.698} [HASH+NOHASH] (Medicine)
Two or three other files in the 000-Corrupted folder can be extracted in full. I do not remember which ones.
Still in those folders, a few RAR files are ZIP in disguise.
Yes, that hash is for the BF version.
I was able to extract this one in full:
rmaxinternational.com {2.698} [HASH+NOHASH] (Medicine)
But not the other two you mention, at all.
What tool are you using for this? I know it's possible to extract, in full, some of the RAR and ZIP files that are reported as corrupted during initial extraction. This relates to the XSS version and the non-premium folder. I actually had them moved to a folder I named "_failed_to_extract" and within that folder I made the folder "_but_still_extracted_somehow" that I moved the fully extracted archives to.
I tried to keep a clean log of the failed files. I have not checked and compared my failed files from XSS against the other versions, but these are the files that commonly fail to extract:
From the premium folder:
awardsatlanta.com {3.433} [NOHASH] (Shopping)_special_for_XSS.IS.rar
comune.torreannunziata.na.it {1.161} [NOHASH] (Government and Legal Organizations)_special_for_XSS.IS.rar
encens-naturel.com {2.842} [NOHASH] (Shopping)_special_for_XSS.IS.rar
giornalistinews.it {1.918} [NOHASH] (Business)_special_for_XSS.IS.rar
hathawaycreativecenter.com {3.201} [NOHASH] (Real Estate)_special_for_XSS.IS.rar
klaus.com {1.419} [NOHASH] (Restaurant)_special_for_XSS.IS.rar
mis.bcnsurin.ac.th {1.202} [HASH+NOHASH] (Web Analytics)_special_for_XSS.IS.rar
paths-123.com {5.205} [NOHASH] (Medicine)_special_for_XSS.IS.rar
pinypon.es {1.157} [HASH+NOHASH] (Shopping)_special_for_XSS.IS.rar
rentamile.com {1.510} [NOHASH] (Business)_special_for_XSS.IS.rar
rubicameran.com {2.596} [NOHASH] (Arts)_special_for_XSS.IS.rar
tir.mab.hu {3.646} [HASH] (Education)_special_for_XSS.IS.rar
travel-in-morocco.com {40.042} [HASH] (Travel)_special_for_XSS.IS.rar
tuljabhavanipujari.com {1.511} [HASH+NOHASH] (Religion)_special_for_XSS.IS.rar
From the non-premium folder:
asbiswasco.com {45.661} [HASH+NOHASH] (Business)_special_for_XSS.IS.rar
careersatcore.com {1.961} [NOHASH] (Job Search)_special_for_XSS.IS.rar
citaprevia.autocom.com.co {1.266} [HASH+NOHASH] (File Sharing)_special_for_XSS.IS.rar
cmsa.in {1.086} [HASH+NOHASH] (Business)_special_for_XSS.IS.rar
convertmyads.com {3.503} [NOHASH] (Business)_special_for_XSS.IS.rar
coworkguide.com.br {1.942} [NOHASH] (Business)_special_for_XSS.IS.rar
gabelstaplertechnik.de {350} [HASH] (Auto)_special_for_XSS.IS.rar
galerie.transitmag.ch {1.057} [HASH+NOHASH] (Entertainment)_special_for_XSS.IS.rar
gowebrachnasagar.com {1.265} [NOHASH] (Business)_special_for_XSS.IS.rar
horonumber.com {786} [NOHASH] (Religion)_special_for_XSS.IS.rar
jalstore.com {1.059} [NOHASH] (Advertising)_special_for_XSS.IS.rar
loveorigami.info {19.173} [HASH+NOHASH] (Social)_special_for_XSS.IS.rar
mail.agency-i2b.ru {9.788} [NOHASH] (Business)_special_for_XSS.IS.rar
news.educarriere.ci {22.377} [NOHASH] (Job Search)_special_for_XSS.IS.rar
poleinfo3.dauphine.fr {937} [NOHASH] (Education)_special_for_XSS.IS.rar
seodynamics.net {56.140} [HASH+NOHASH] (IT)_special_for_XSS.IS.rar
street58.com {2.711} [NOHASH] (Social)_special_for_XSS.IS.rar
sun-calgary.attractive.web.id {13.383} [NOHASH] (Adult)_special_for_XSS.IS.rar
switzerland-association-des-bib-abbs.maret.us {8.488} [NOHASH] (Adult)_special_for_XSS.IS.rar
thaitechno.net {47.471} [NOHASH] (Business)_special_for_XSS.IS.rar
tratorrental.com.br {1.044} [HASH+NOHASH] (Business)_special_for_XSS.IS.rar
uk.immofute.com {2.028} [NOHASH] (Real Estate)_special_for_XSS.IS.rar
vbckantibaden.ch {676} [HASH+NOHASH] (Sports)_special_for_XSS.IS.rar
wbetfm.com {893} [NOHASH] (Media)_special_for_XSS.IS.rar
wonderclub.com {3.941} [HASH] (Reference)_special_for_XSS.IS.rar
ahzybaby.com {.184.411} [HASH] (NoCategory)_special_for_XSS.IS.rar
ahzybaby.com {.184.411} [HASH] (NoCategory)_special_for_XSS.IS.rar
altaionline.ru {1.173} [NOHASH] (Travel)_special_for_XSS.IS.rar
artvictoria.com {57.029} [HASH+NOHASH] (NoCategory)_special_for_XSS.IS.rar
artvictoria.com {57.029} [HASH+NOHASH] (NoCategory)_special_for_XSS.IS.rar
cpp66.com {3.460} [NOHASH] (Business)_special_for_XSS.IS.rar
crescitsoftware.com {5.547} [HASH+NOHASH] (Business)_special_for_XSS.IS.rar
forum.dex-rpg.com {66.939} [HASH] (Games)_special_for_XSS.IS.rar
globaldance.info {6.679} [NOHASH] (Entertainment)_special_for_XSS.IS.rar
heraldry.rus.net {4.240} [HASH+NOHASH] (Shopping)_special_for_XSS.IS.rar
itrax.com {18.599} [HASH+NOHASH] (Software)_special_for_XSS.IS.rar
jasuda.net {7.287} [NOHASH] (Media)_special_for_XSS.IS.rar
jasuda.net {7.287} [NOHASH] (Media)_special_for_XSS.IS.rar
kompromat.flb.ru {12.750} [HASH+NOHASH] (Social)_special_for_XSS.IS.rar
modbulvar.ru {4.921} [HASH+NOHASH] (Shopping)_special_for_XSS.IS.rar
ns-catalog.ru {148.208} [HASH] (Business)_special_for_XSS.IS.rar
ntc-school.com {16.537} [HASH+NOHASH] (Education)_special_for_XSS.IS.rar
pugpups.com {83.335} [HASH+NOHASH] (Social)_special_for_XSS.IS.rar
refips.org {269.982} [HASH+NOHASH] (Medicine)_special_for_XSS.IS.rar
ribolovenatlas.com {606} [HASH+NOHASH] (Business)_special_for_XSS.IS.rar
rmaxinternational.com {2.698} [HASH+NOHASH] (Medicine)_special_for_XSS.IS.rar
seurch.rabota.bg {267.827} [HASH+NOHASH] (Job Search)_special_for_XSS.IS.rar
travelfelix.com {245.753} [HASH+NOHASH] (Business)_special_for_XSS.IS.rar
wifi.opengenova.org {586} [HASH] (Business)_special_for_XSS.IS.rar
That's 14 files in non-premium and 49 files in premium folder. Of the 14 files in non-premium folder, 13 of them "still extracted somehow".
I have not compared the corrupted files inside this BF release to those of XSS yet, but I can tell that the counts are about the same. There are 47 corrupted files in the non-premium folder (maybe OP missed 2 corrupted files during re-organization?) and 14 files in the premium folder, which is the same number. Interestingly, since you mentioned artvictoria.com: I tried to extract it from the BF release and failed, but I can see that I did succeed in extracting it from the XSS release. They may not be the same file, or the extractions are failing for some reason other than corruption.
Thanks for the tip about RAR files being ZIP in disguise. I guess someone made a blunder during a batch file rename process or something of that nature. I will look into it and see what I can uncover.
Anyone here who has the original from MEGA that has no corrupted files inside? Anyone else who knows something more about this? According to HIBP, the data was provided to Troy by Dehashed. But the version he himself used in the blog post appears to be the XSS release: https://www.troyhunt.com/inside-the-cit0day-breach-collection/
Troy wrote:
I extracted all the files, ran my usual email address extraction tool over it (effectively just a regex that can quickly enumerate through a large number of files), and found a total of 226,883,414 unique addresses. A substantial number, although not even in the top 10 largest breaches already in HIBP.
He never mentions any issues with corrupted files inside. The set he used for the blog post was 13 GB. But I'm thinking it may not be the same version that was loaded into HIBP.


