Python/Mechanize - Get source of iFrame without reloading?

chatmasta · Aug 17, 2011

A bit stuck on this problem right here...

I download pageA.html with mechanize. pageA.html includes an iFrame to pageB.html (which is on a different domain).

I want to get the contents of pageB.html without reloading it, because the contents will change if I do. Is this possible without something like selenium?

FYI, pageB.html is a recaptcha frame. Thanks

EDIT: Turns out this isn't necessary for recaptcha, but I would still like to know if it's possible.

Jake232 · Aug 17, 2011

Would you mind explaining how you're getting the recaptcha image then? Could do with this myself.

chatmasta · Aug 17, 2011

Jake232 said:
Would you mind explaining how you're getting the recaptcha image then? Could do with this myself.

I didn't realize how recaptcha was doing it. When you have javascript turned off, the iframe is simply a page on their site (pageB.html as I called it) that serves you a recaptcha image until you solve it correctly. Once you solve it, they give you a code to paste into a textarea on pageA.html (the site using recaptcha).

I didn't realize they were having you paste something into pageA.html. Knowing that, circumventing it was trivial.

lwbco said:
I'm not familiar with Mechanize -- when you load FrameA, does it also HTTP request the frame for FrameB? If so, you should be able to get at it through mechanize somehow.

That is my question. Is it loading FrameB? And if so, how do I access it?

j0hnsmith · Aug 17, 2011

A browser automatically loads all images, iframes, etc. on a webpage. Mechanize doesn't: it fetches only the URL you ask for, so the iframe hasn't been loaded at all until you request it yourself. So you get the iframe URL and open it, solve the captcha or whatever, then continue with whatever you were doing in the first place. Mechanize has a method, open_novisit(), which is good for this kind of stuff: the cookies are still sent, but the browser stays on the page it was already on.
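A rough sketch of the "get the iframe URL and open it" step, in case it helps. The iframe lookup uses only the standard library; fetch_first_iframe() and the names in it are made up for illustration, and it assumes the browser object is a mechanize.Browser (anything with the same open() and open_novisit() methods will do). Treat it as a starting point, not a tested recipe for any particular site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeFinder(HTMLParser):
    """Collects the src attribute of every <iframe> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(html, base_url):
    """Return absolute URLs for all iframes found in html."""
    finder = IframeFinder()
    finder.feed(html)
    return [urljoin(base_url, src) for src in finder.sources]


def fetch_first_iframe(browser, page_url):
    """Open page_url, then fetch its first iframe without visiting it.

    Assumption: browser behaves like mechanize.Browser, i.e. open() and
    open_novisit() return a response with read() and geturl().
    """
    response = browser.open(page_url)
    body = response.read()
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    urls = iframe_urls(body, response.geturl())
    if not urls:
        return None
    # open_novisit() still sends the browser's cookies but leaves the
    # current page in place, so forms on page_url remain selectable.
    return browser.open_novisit(urls[0]).read()
```

Note that this is still the first and only request for the frame, so nothing gets "reloaded": mechanize never fetched it when it downloaded the outer page.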

PM me if you still have trouble.

stevehnsn · Aug 17, 2011

Wow. I long for the days when everyone used cURL.

mattseh · Aug 18, 2011

stevehnsn said:
Wow. I long for the days when everyone used cURL.

ewwwwwwww

Jake232 · Aug 18, 2011

I'm a newb, I still don't understand. Are you saying do it like the following?

(pseudo code)

Code:

b1 = new browser
b1.goto("http://www.url-with-captcha.com")
captcha_url = find(iframe.src)
b2 = new browser
b2.goto(captcha_url)
# grab the captcha image and solve it
b1.insert('recaptcha_challenge_field', 'CAPTCHA-VALUE')
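For reference, the same flow might look roughly like this in Python with a single mechanize browser instead of two, since open_novisit() leaves the first page in place. Everything here is a sketch under assumptions: fetch_and_fill() and the solve callback are hypothetical names, the field name is taken from the pseudocode above, and the target form is assumed to be the first one on the page.

```python
def fetch_and_fill(browser, captcha_url, solve,
                   field="recaptcha_challenge_field", form_index=0):
    """Fetch the captcha frame, get a value for it, and fill the form field.

    Assumptions: browser is a mechanize.Browser that has already opened
    the page containing the form; solve is a caller-supplied function
    (hypothetical) that takes the frame's HTML and returns the value to
    paste into the field.
    """
    # Fetch the frame without leaving the page that holds the form.
    frame_html = browser.open_novisit(captcha_url).read()
    value = solve(frame_html)

    # Select the form on the original page and set the field.
    browser.select_form(nr=form_index)
    browser[field] = value
    return value
```

After this returns, calling browser.submit() would send the form; that step is left out so the filled-in value can be checked first.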

Jake232 · Aug 18, 2011

Never mind, I see what you mean now. Damn, they make that easy for us.

dchuk · Aug 18, 2011

stevehnsn said:
Wow. I long for the days when everyone used cURL.

You can still use cURL in Python or Ruby or any other language. It's just ugly as sin compared to things like Mechanize (same name for both Python and Ruby, btw).

chatmasta · Aug 18, 2011

dchuk said:
you can still use cURL in python or ruby or any other language. It's just ugly as sin compared to things like Mechanize (same name for both python and ruby, btw)

The selling point for me is that this is my first mechanize project, and I have spent 100% of my debugging time on problems related to my actual project. With curl, I spent at least 40% of my debugging time hunting down strange oddities and curl gotchas, and the other 60% copying and pasting from Live HTTP Headers.

I've been about 100x more productive with python/mechanize so far than php/curl. Not to mention it just works better.

Jake232 · Aug 19, 2011

On another note, textCaptcha isn't used as widely as reCaptcha, but it's still pretty popular.

Here's a library to beat textCaptchas: https://github.com/kbhomes/TextCaptchaBreaker

j0hnsmith · Aug 20, 2011

Jake232 said:
On another note, textCaptcha isn't used as widely as reCaptcha, but it's still pretty popular.

Here's a library to beat textCaptchas: https://github.com/kbhomes/TextCaptchaBreaker

nice share, +rep
