
d3adm4n · August 3, 2022 at 2:05 AM

I am very new to Python and image processing, so I want to ask whether captchas like the ones below can be solved by preprocessing them (e.g. with OpenCV) and then using Google's Tesseract (or another OCR/ML approach) to read them. To be exact, I want to know how to make these images more readable for Tesseract.

darkhero77 · August 11, 2022 at 5:21 PM

Maybe you can first remove the long colored lines that obscure the characters and then turn the image into a high-contrast black-and-white image? Then separate the characters using the large white space between them. That would surely make things easier for the OCR to do its work.
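A minimal NumPy-only sketch of the last two steps (high-contrast conversion, then splitting on the white gaps). It assumes the input is already a grayscale array with dark text on a light background and that the colored lines have already been removed; any line that still crosses a gap will glue neighbouring characters together. The fixed threshold of 128 and `min_width=2` are assumptions to tune for your images, and the function names are hypothetical.

```python
from typing import List, Tuple

import numpy as np


def binarize(gray: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Turn a 0-255 grayscale image into a high-contrast 0/1 mask.

    Dark pixels (ink) become 1, light background becomes 0.
    The default threshold is an assumption; tune it per captcha style.
    """
    return (gray < threshold).astype(np.uint8)


def split_on_blank_columns(mask: np.ndarray, min_width: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges that contain ink.

    Columns with no ink at all are treated as the white gaps between
    characters. Runs narrower than min_width are dropped as specks of noise.
    """
    has_ink = mask.sum(axis=0) > 0
    spans: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # entering a character
        elif not ink and start is not None:
            if x - start >= min_width:     # leaving a character
                spans.append((start, x))
            start = None
    if start is not None and len(has_ink) - start >= min_width:
        spans.append((start, len(has_ink)))  # character touches right edge
    return spans


# Tiny synthetic example: two dark "characters" plus a 1-pixel speck.
demo = np.full((10, 20), 255, dtype=np.uint8)
demo[2:8, 2:5] = 0
demo[2:8, 9:13] = 0
demo[4, 16] = 0

mask = binarize(demo)
spans = split_on_blank_columns(mask)
crops = [mask[:, a:b] for a, b in spans]  # one crop per character
print(spans)  # [(2, 5), (9, 13)]
```

Each crop can then be fed to the OCR (or a classifier) on its own, which is usually easier than reading the whole string at once.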

dthang · August 11, 2022 at 10:43 PM

If you start by mapping all the different chars you can find and assigning labels to them, you can use a CNN in PyTorch to solve the problem, something like the models used for ImageNet.
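A rough sketch of that idea in PyTorch, assuming each segmented character has been resized to a 1×32×32 grayscale tensor and that the label set is digits plus uppercase letters (36 classes). Both the input size and the class count are assumptions; set them from the characters you actually collect. This is a much smaller network than the ImageNet models, just the same conv → pool → classifier pattern, and the names `CharCNN` and `train_step` are hypothetical.

```python
import torch
import torch.nn as nn

# Assumed label set: 0-9 and A-Z. Change to match your labeled characters.
NUM_CLASSES = 36


class CharCNN(nn.Module):
    """Small CNN that classifies one 1x32x32 character crop."""

    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 32x32 -> 32x32
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 16x16 -> 16x16
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),  # raw logits, one per character class
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               images: torch.Tensor, labels: torch.Tensor) -> float:
    """One supervised step. images: (N, 1, 32, 32) float, labels: (N,) int64."""
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
model = CharCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random tensors stand in for real labeled crops in this sketch.
images = torch.rand(8, 1, 32, 32)
labels = torch.randint(0, NUM_CLASSES, (8,))
loss = train_step(model, optimizer, images, labels)

with torch.no_grad():
    logits = model(images)
    predicted = logits.argmax(dim=1)  # predicted class index per crop
```

In practice you would loop `train_step` over a `DataLoader` of your labeled crops for several epochs and map each predicted index back to its character.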

Mysterious · August 12, 2022 at 9:59 AM

Captchas are often poorly implemented, e.g. the captcha isn't even sent with the POST request, so you can bypass it. Try that first.

Next, I recommend finding out which captcha code is used (I mean the service, like how Breached also used the bb2 captcha) or collecting samples of it, then using OCR (you can use existing code made by someone else or write your own, which I'd prefer if you can do it) and training a model yourself.

plaga789 · August 16, 2022 at 4:24 AM

(August 11, 2022, 10:43 PM) dthang Wrote: If you start by mapping all the different chars you can find and assigning labels to them, you can use a CNN in PyTorch to solve the problem, something like the models used for ImageNet.

Thanks bro, it's a great idea.

d3adm4n · August 17, 2022 at 7:36 AM

(August 12, 2022, 09:59 AM) Mysterious Wrote: Captchas are often poorly implemented, e.g. the captcha isn't even sent with the POST request, so you can bypass it. Try that first.

Next, I recommend finding out which captcha code is used (I mean the service, like how Breached also used the bb2 captcha) or collecting samples of it, then using OCR (you can use existing code made by someone else or write your own, which I'd prefer if you can do it) and training a model yourself.

Right, the website actually doesn't seem to be written very well, so I'm gonna look into that method. Image processing also looks like another monster to learn lol. Thanks for the suggestion.

zimzubs111 · August 18, 2022 at 1:06 PM

If you use OpenCV, you can apply a thresholding filter to the image to make the letters stand out, and those thin lines can probably be ignored since they are just too small.

jdkfl93 · August 19, 2022 at 9:31 AM

Surely they are.

tf1234567 · August 19, 2022 at 2:44 PM

There are a large number of websites that have a captcha but don't actually need it, or that store the captcha in the session so you can reuse it over and over again.

nomadkora · August 21, 2022 at 4:53 AM

(August 16, 2022, 04:24 AM) plaga789 Wrote:
(August 11, 2022, 10:43 PM) dthang Wrote: If you start by mapping all the different chars you can find and assigning labels to them, you can use a CNN in PyTorch to solve the problem, something like the models used for ImageNet.

Thanks bro, it's a great idea.

It's a great idea, thank you.