TA: Who doesn't like proxies? Me!

Are you on a company network that uses a proxy to connect to the internet? And do you want (or are you planning) to connect your tests to an external selenium-grid? Then stay tuned, before you get lost in a bunch of answers on the internet that don't resolve the issue you're facing.



I'm on a new project and looking into Robot Framework. But then I bumped into an issue with the POC I had to make, which might result in me convincing the company to switch to Ruby. Anyway, long story short: they use a proxy, and although the internet has all the answers, you do need to know what specific question to ask to get the correct result. And since proxies are not used that much anymore, it's hard to find skilled people who can help you out and push you in the right direction for a solution.

The setup

To access the internet, I need a configuration on my computer that will tell Windows to connect to a proxy server. I can access the internet, I can reach all the internal web applications as well as external websites. No additional proxy settings needed (no capabilities/profiles/chromeoptions/etc). It will just take the proxy from my computer. When I start up a local browser (IE/Edge/Chrome/FF) all works fine and I can run my automation scripts against the internal web applications. So far so good, everything works. 

The issue

So where is the fire? Well, I also wanted to use an external selenium-grid service and that is where the issue is in this setup. 

To access the internal applications from the external selenium-grid you need to give the grid a way to connect back into the network (e.g. search for BrowserStack Local). For the issue I describe here this is not needed, so you can ignore it from now on, but keep it in mind because you will most likely need it after your first simple test. For now just imagine a simple test, where you use an external selenium-grid to launch a node and perform a simple Google search.

Before I knew it, I had spent about a week trying to figure out what the issue was and how to fix it with a simple solution.

Assumptions

Below are a bunch of assumptions I made, none of which turned out to be the cause in my case. But who knows, it can't hurt to give you all the details; maybe you will find a solution for your situation by reading this.

It's a network issue

My lack of understanding of the underlying cause made me think that something was blocked in the network traffic. It turned out that the service I needed to access could be reached via curl without problems (once I added some proxy settings), so why did my tests not connect?

Robotframework keywords

I thought, maybe I made a mistake in the keywords because the documentation is there, but... well see for yourself. Which one do you need to use?
I used: "Open Browser"
This does mention a remote grid, but I don't see an example of a proxy
http://robotframework.org/SeleniumLibrary/SeleniumLibrary.html#Open%20Browser
Maybe I should have used: "Create Webdriver", but still I could not get this to work. I kept getting the same error message.
http://robotframework.org/SeleniumLibrary/SeleniumLibrary.html#Create%20Webdriver

Python + Selenium

At some point I was like... You know what, I don't know Robot Framework well enough, let's give Python + Selenium and Python + Nerodia a try. But I got the same kind of issue as with Robot Framework:
[ WARN ] Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x05FE20B0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond')': /wd/hub/session

This confused me and I misunderstood the underlying issue. I thought the problem was that I had not figured out how to give the browser its proxy settings. So I went back to what I knew best: Ruby + Watir.

Ruby and Watir + Proxies

So with this bias in mind, I looked at http://watir.com/guides/proxies/ and thought... now it will be easy. I have used Ruby + Watir before, even in combination with BrowserStack, so this will be a piece of cake. But this was not the real issue. The examples on that page show how to set up a browser connection and only then add the proxy to that browser. But I could not even connect to the remote URL, so it had to be something else. I just did not know what. I read more topics, and some even made me think it could be the type of proxy.

The real problem

After a bunch of assumptions, trial and error, knowing all the related Stack Overflow questions, reading all the documentation over and over, and asking questions on Slack and Yammer in various channels, I kind of gave up. But then two programmers in my company who know Python and Java started to debug the issue and figured out... "It looks like Python is not using the proxy that is set internally". So they tried it with Selenium + Java and found that the same happens there. However, they did come up with a workaround to force Java to use the proxy.
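
If you want to check this yourself, here is a minimal Python sketch. It tries to open a plain TCP connection straight to the grid, which is roughly what a client does when it ignores the proxy. The function name and the host `hub.example.com` used in the note below are made up for illustration, so use your own grid host.

```python
import socket


def can_connect_directly(host, port=443, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds without a proxy."""
    try:
        # socket.create_connection knows nothing about proxy settings:
        # it talks straight to the target host.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (such as WinError 10060) and refused connections.
        return False
```

If `can_connect_directly("hub.example.com")` returns False on the company network while curl with proxy settings works, that suggests the client is bypassing the proxy, which would match the WinError 10060 timeout shown earlier.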

The solutions

Whitelist the external selenium-grid

The easiest solution would have been for the network administrators to whitelist this external selenium-grid service. This would have resolved all the issues, and I would not have needed to figure out what was going wrong. But that was a big NO-GO, so not an option for me.

Force the environment to use a proxy

The solution is to tell your programming language's runtime environment to use the proxy.

Java solution

What they showed me with Java (before starting the remote driver):
System.setProperty("https.proxyHost","192.168.1.10");
System.setProperty("https.proxyPort","3128");
Then I finally understood what to look for and found the following posts with a bunch more solutions that might help you if the above does not do the trick:
https://memorynotfound.com/configure-http-proxy-settings-java/
https://stackabuse.com/how-to-configure-network-settings-in-java/
I haven't verified these yet, but feel free to do so and let me know.

Ruby solution

For Ruby you find a ton of answers on Stack Overflow about setting a proxy for installing a gem (which I also needed, by the way), but that was not what I was looking for. Like this one: https://stackoverflow.com/questions/4418/how-do-i-update-ruby-gems-from-behind-a-proxy-isa-ntlm
And even though it was not the answer I needed, one of the answers put me on the track of the real one: it has to be in the environment. So the solution for Ruby was (before starting the remote browser):
ENV['HTTP_PROXY'] = 'http://192.168.1.10:3128'

Python

Then I thought the solution in Python was easy:
https://stackoverflow.com/questions/31639742/how-to-pass-all-pythons-traffics-through-a-http-proxy
But no luck there. I tried it with Python + Selenium and Python + Nerodia, but it all led back to the same connection issue.
I even tried these keywords with Robot Framework, but they didn't work either:
Set Environment Variable    http_proxy    ${HTTP_PROXY}
Set Environment Variable    HTTP_PROXY    ${HTTP_PROXY}
Set Environment Variable    https_proxy    ${HTTP_PROXY}
Set Environment Variable    HTTPS_PROXY    ${HTTP_PROXY}
It just looks like the real issue lies within the `urllib3` library, and that is a bit too far outside my comfort zone.
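
To rule out that the variables simply never reached the Python process, you could look at what Python's standard library sees. A minimal sketch (the function name `proxy_report` is made up; `urllib.request.getproxies()` is part of the standard library):

```python
import urllib.request


def proxy_report():
    """Return the http/https proxy URLs as seen by Python's standard library."""
    # getproxies() checks <scheme>_proxy environment variables first and,
    # on Windows, falls back to the system proxy settings in the registry.
    proxies = urllib.request.getproxies()
    return {scheme: proxies.get(scheme) for scheme in ("http", "https")}
```

If the report shows your proxy but the connection still times out, the variables are set correctly and the library that makes the request is probably not reading them.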
Below is part of a conversation I had on Yammer with a colleague:
"Looks indeed as if there isn't any more environment variable based injection of proxy settings possible for urllib3. You need to replace calls for urllib3.PoolManager(...) with urllib3.ProxyManager('http://localhost:8080',...) or whatever your proxy_url is.
Since Selenium hasn't implemented its own setting for this within their API, you need to make this change on your own within site-packages\selenium\webdriver\remote\remote_connection.py
Use of explicit proxies is meanwhile quite old skool, but of course that does not stop IT departments from using them. So a feature request to add some remote_connection_proxy parameter within Selenium webdriver remote might have a chance of success. See here: https://github.com/SeleniumHQ/selenium/issues/6264"
I did not try the urllib3 PoolManager solution, because I did not understand how to try it, so any suggestions with detailed examples are very welcome. In the meantime I'm shifting back to Ruby + Watir, since this works for me and has an easy solution.
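
For reference, the swap described in the quote above can be sketched as follows. This is only a sketch and has not been tested against Selenium: it does not touch `remote_connection.py`, it just puts the two urllib3 classes side by side. The helper names `proxy_from_env` and `make_http_client` are made up, and it assumes urllib3 is installed.

```python
import os


def proxy_from_env(scheme="https", environ=None):
    """Return the proxy URL for `scheme` from the environment, or None."""
    env = os.environ if environ is None else environ
    # Check the lowercase name first, then the uppercase one,
    # e.g. https_proxy and then HTTPS_PROXY.
    for name in (f"{scheme}_proxy", f"{scheme.upper()}_PROXY"):
        value = env.get(name)
        if value:
            return value
    return None


def make_http_client(proxy_url=None):
    """Build a urllib3 manager that goes through proxy_url when one is given."""
    # Third-party import kept inside the function so that the helper
    # above works without urllib3 installed.
    import urllib3

    if proxy_url:
        # ProxyManager sends every request through the given proxy.
        return urllib3.ProxyManager(proxy_url)
    # PoolManager connects directly to the target host.
    return urllib3.PoolManager()
```

With the proxy from the examples above, `make_http_client("http://192.168.1.10:3128")` gives you a ProxyManager whose `request(...)` calls go through the proxy. How to make Selenium use such a manager is the part that remains open.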

The Alternatives

If you need to use Python and there is no way for you to figure out how to connect to the external selenium-grid, there are alternatives.

Think about an in-house selenium-grid with real machines, VMs or Docker.
If you go for Docker, check out the Zalenium project. This might even give you the option to connect to the external selenium-grid, but I have not verified this. https://opensource.zalando.com/zalenium/ (see "how it works", scenario 2).

Another alternative might be to wait for a solution within Selenium itself (if that is possible): https://github.com/SeleniumHQ/selenium/issues/7247
Or wait for this issue to be fixed: https://github.com/SeleniumHQ/selenium/issues/6264 

Last but not least, as I mentioned before: if someone knows how to fix this in `urllib3` and can explain it to a scripter (I'm not a programmer, so I need an almost step-by-step explanation), please reach out to me.
