Python/Mechanize - Get source of iFrame without reloading?

chatmasta

A bit stuck on this problem right here...

I download pageA.html with mechanize. pageA.html embeds pageB.html (which is on a different domain) in an iframe.

I want to get the contents of pageB.html without requesting it a second time, because the contents change on every load. Is this possible without something like Selenium?

FYI, pageB.html is a recaptcha frame. Thanks


EDIT: Turns out this isn't necessary for recaptcha, but I would still like to know if it's possible.
 


Would you mind explaining how you're getting the recaptcha image then? Could do with this myself.
 

I didn't realize how recaptcha was doing it. When you have JavaScript turned off, the iframe is simply a page on their site (pageB.html, as I called it) that keeps serving you a recaptcha image until you solve one correctly. Once you solve it, they give you a code to paste into a textarea on pageA.html (the site using recaptcha).

I didn't realize they were having you paste something into pageA.html. Knowing that, circumventing it was trivial.

lwbco said:
I'm not familiar with Mechanize -- when you load FrameA, does it also issue an HTTP request for FrameB? If so, you should be able to get at it through mechanize somehow.

That is my question. Is it loading FrameB? And if so, how do I access it?
 
A browser automatically loads all the images, iframes, etc. on a webpage. Mechanize doesn't until you tell it to, so the iframe is never requested behind your back. You get the iframe URL from the page source and open it yourself, solve the captcha or whatever, then continue with whatever you were doing in the first place. Mechanize has open_novisit(), which is good for this kind of thing: it fetches a URL without changing the browser's current page or history, and the cookies are still sent.
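To make that concrete, here is a rough sketch of the "get the iframe URL and open it" step. It is only one way to do it, under a few assumptions: the helper names (IframeSrcParser, iframe_urls, fetch_iframes) are made up for this example, the page is assumed to decode as UTF-8, and the iframe src is pulled out with the standard library's html.parser rather than anything mechanize-specific. The only mechanize calls it relies on are Browser.open(), Browser.geturl() and Browser.open_novisit().

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(html, base_url):
    """Return absolute URLs for all iframes found in html."""
    parser = IframeSrcParser()
    parser.feed(html)
    return [urljoin(base_url, src) for src in parser.sources]


def fetch_iframes(browser, page_url):
    """Open page_url, then fetch each iframe exactly once.

    `browser` is expected to be a mechanize.Browser, or anything with the
    same open / open_novisit / geturl methods.
    """
    response = browser.open(page_url)
    html = response.read()
    if isinstance(html, bytes):
        # Assumption: the page is UTF-8; adjust if the site says otherwise.
        html = html.decode("utf-8", errors="replace")

    frames = {}
    for url in iframe_urls(html, browser.geturl()):
        # open_novisit sends the browser's cookies but leaves its current
        # page and history untouched, so pageA stays "open".
        frames[url] = browser.open_novisit(url).read()
    return frames


# Usage (needs the third-party mechanize package and network access):
#   import mechanize
#   br = mechanize.Browser()
#   frames = fetch_iframes(br, "http://www.url-with-captcha.com")
```

Since each iframe is requested only once here, the bytes stored in frames are the single copy of pageB.html; keep working from that saved response instead of opening the URL again.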

PM me if you still have trouble.
 
I'm a newb, I still don't understand. Are you saying do it like the following?

(pseudo code)

Code:
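b1 = new browser
b1.goto("http://www.url-with-captcha.com")

# pull the captcha iframe's address out of the page source
captcha_url = find(iframe.src)

b2 = new browser
b2.goto(captcha_url)

# grab the captcha image, solve it, and copy the code the captcha page gives back

b1.insert('recaptcha_challenge_field', 'CAPTCHA-VALUE')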
 
You can still use cURL in Python or Ruby or any other language. It's just ugly as sin compared to things like Mechanize (same name for both the Python and Ruby libraries, btw).

The selling point for me is that this is my first mechanize project, and I have spent 100% of my debugging time debugging problems related to my actual project. With curl, I spent at least 40% of my debugging time hunting down strange oddities and curl gotchas, and the other 60% copy and pasting from Live HTTP Headers.

I've been about 100x more productive with python/mechanize so far than php/curl. Not to mention it just works better.