Python: Disable images in Selenium Google ChromeDriver

Here is another way to disable images:

    from selenium import webdriver

    chrome_options = webdriver.ChromeOptions()
    # A value of 2 tells Chrome to block images.
    prefs = {"profile.managed_default_content_settings.images": 2}
    chrome_options.add_experimental_option("prefs", prefs)
    # Current Selenium releases take options=; the original answer used the older chrome_options= keyword.
    driver = webdriver.Chrome(options=chrome_options)

Source: http://nullege.com/codes/show/src@o@s@osintstalker-HEAD@fbstalker1.py/56/selenium.webdriver.ChromeOptions.add_experimental_option

Change IP address dynamically?

One approach with Scrapy uses two components, RandomProxy and RotateUserAgentMiddleware. Register them in DOWNLOADER_MIDDLEWARES in settings.py:

    # Scrapy 1.0 and later keep the built-in middlewares under scrapy.downloadermiddlewares
    # (the original answer used the older scrapy.contrib.downloadermiddleware paths).
    DOWNLOADER_MIDDLEWARES = {
        'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
        'tutorial.randomproxy.RandomProxy': 100,
        'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
        'tutorial.spiders.rotate_useragent.RotateUserAgentMiddleware': 400,
    }

Random Proxy

You can use scrapy-proxies. This component will process Scrapy requests …
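As a rough illustration of what such components do, here is a minimal sketch of a single downloader middleware that picks a random proxy and a random User-Agent for each request. It is not the scrapy-proxies implementation: the class name, the USER_AGENTS and PROXIES lists, and the constructor arguments are all hypothetical. The only Scrapy conventions it relies on are the process_request(request, spider) hook and the request.meta["proxy"] key read by HttpProxyMiddleware.

```python
import random

# Hypothetical placeholder values; substitute your own lists.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) ExampleAgent/1.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleAgent/2.0",
]
PROXIES = ["http://127.0.0.1:8080", "http://127.0.0.1:8081"]


class RandomProxyAndUserAgentMiddleware:
    """Illustrative downloader middleware (hypothetical name)."""

    def __init__(self, user_agents=USER_AGENTS, proxies=PROXIES, rng=random):
        self.user_agents = list(user_agents)
        self.proxies = list(proxies)
        self.rng = rng

    def process_request(self, request, spider):
        # HttpProxyMiddleware looks for the proxy URL under this meta key.
        request.meta["proxy"] = self.rng.choice(self.proxies)
        # Overwrite whatever User-Agent was set earlier in the chain.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        # Returning None lets Scrapy continue with the remaining middlewares.
        return None
```

A middleware like this would need a priority number lower than HttpProxyMiddleware's so that the proxy is chosen before that middleware runs, which matches the ordering in the settings above.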

How do I make a simple crawler in PHP? [closed]

Meh. Don’t parse HTML with regexes. Here’s a DOM version inspired by Tatu’s:

    <?php
    function crawl_page($url, $depth = 5)
    {
        static $seen = array();
        if (isset($seen[$url]) || $depth === 0) {
            return;
        }

        $seen[$url] = true;

        $dom = new DOMDocument('1.0');
        @$dom->loadHTMLFile($url);

        $anchors = $dom->getElementsByTagName('a');
        foreach ($anchors as $element) {
            $href = $element->getAttribute('href');
            if (0 !== …
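The same idea, extracting links with a real HTML parser instead of regexes, can be sketched in Python. Note the swap in technique: this uses the standard library's event-based html.parser rather than a DOM tree, and the names AnchorCollector and extract_links are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = AnchorCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Anchors without an href (for example named anchors) are skipped, and relative links are resolved against the base URL.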

How to write a crawler?

You’ll be reinventing the wheel, to be sure. But here are the basics:

- A list of unvisited URLs – seed this with one or more starting pages
- A list of visited URLs – so you don’t go around in circles
- A set of rules for URLs you’re not interested in – so you don’t index the …
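The three pieces listed above can be sketched as a short loop. This is only an outline under stated assumptions: fetch_links is a caller-supplied function that returns the links found on a page (no networking is done here), is_allowed stands in for the URL rules, and same_host_only and max_pages are illustrative additions.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(seeds, fetch_links, is_allowed, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were visited."""
    frontier = deque(seeds)  # unvisited URLs, seeded with the starting pages
    visited = set()          # visited URLs, so we don't go around in circles
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        # Skip anything already seen or excluded by the rules.
        if url in visited or not is_allowed(url):
            continue
        visited.add(url)
        order.append(url)
        for link in fetch_links(url):
            if link not in visited:
                frontier.append(link)
    return order


def same_host_only(host):
    """Example rule: only follow URLs on the given host."""
    def is_allowed(url):
        return urlparse(url).netloc == host
    return is_allowed
```

Passing fetch_links in as a parameter keeps the loop independent of how pages are downloaded and parsed, so it can be exercised against an in-memory link graph before being pointed at real sites.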

How to do HTTP-request/call with JSON payload from command-line?

You could use wget as well:

    wget -O- --post-data='{"some data to post…"}' \
      --header='Content-Type:application/json' \
      'http://www.example.com:9000/json'

Calling wget with the option -O and - as its value (the space in between is optional, so it can also be written as -O -) causes wget to output the HTTP response directly to …

Python: maximum recursion depth exceeded while calling a Python object

Python doesn’t have great support for recursion because it lacks TRE (Tail Recursion Elimination). This means that each call to your recursive function creates a new stack frame, and there is a limit on stack depth (1000 by default), which you can check with sys.getrecursionlimit() (of course you …
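A small sketch makes the limit concrete. The two functions below are hypothetical examples, not code from the question: the recursive one fails once its depth passes the interpreter's limit, while the loop computes the same result at any depth.

```python
import sys


def depth_recursive(n):
    """Counts down recursively; each call adds one stack frame."""
    return 0 if n == 0 else 1 + depth_recursive(n - 1)


def depth_iterative(n):
    """Same result with a loop, so stack depth stays constant."""
    depth = 0
    while n > 0:
        depth += 1
        n -= 1
    return depth


if __name__ == "__main__":
    limit = sys.getrecursionlimit()
    print("recursion limit:", limit)
    print("iterative:", depth_iterative(limit * 2))
    try:
        depth_recursive(limit * 2)
    except RecursionError as exc:
        print("recursive failed:", exc)
```

Rewriting the recursion as a loop is usually a safer fix than raising the limit, since a very high limit can exhaust the real stack.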

Click a Button in Scrapy

Scrapy cannot interpret JavaScript. If you absolutely must interact with the JavaScript on the page, use Selenium. If you are using Scrapy, the solution depends on what the button does. If it just shows content that was previously hidden, you can scrape the data without a problem; it doesn’t …