Check out SIKULI (Picture-driven computing).

RockDiesel

DISCLAIMER: I haven't used SIKULI yet. I just watched the video, but it looks damn cool.

From the website - Project SIKULI
What's SIKULI?

Sikuli is a visual technology to search and automate graphical user interfaces (GUI) using images (screenshots). The first release of Sikuli contains Sikuli Script, a visual scripting API for Jython, and Sikuli IDE, an integrated development environment for writing visual scripts with screenshots easily. Sikuli Script automates anything you see on the screen without internal API support. You can programmatically control a web page, a desktop application running on Windows/Linux/Mac OS X, or even an iPhone application running in an emulator.
VIDEO:

http://www.youtube.com/watch?v=FxDOlhysFcM


My first thought after watching the video was combining SIKULI with UBot to automate even crazier web tasks with ease. My second thought was "Yo...Lord Brar should put some of this picture-driven computing stuff in UBot".
 


Yup, saw it a few days ago. Gotta say the screenshot thing is indeed very interesting and it is something I'd love to see if we can implement in UBot Studio.

That said, UBot Studio is very different from this one -- our target audience is marketers, and that makes all the difference in how we design the software and what features we add. The next version of UBot Studio will have some amazing features that will raise the whole automation game to the next level. ;)
 
it looks slow as fuck and doesn't run in the background :( but a great/easy to use macro program idea nonetheless
 
Yeah, seems kind of slow. It's an interesting idea tho and might help with some test automation we are looking at.

Thanks!
 
it looks slow as fuck and doesn't run in the background :( but a great/easy to use macro program idea nonetheless
As a UBot Studio customer, it is my duty to let you know that multi-threading is in the pipeline as is the UI overhaul. ;)
 
The thing that popped in my head while watching the Sikuli stuff was "is it affected by screen size/resolution/other windows/etc?"

Maybe I didn't pay attention well enough, but is it for just your desktop, or can you package it up and give it to someone and have it work for them too? If the latter, that would be incredible, if not, it's just a novelty...made by MIT.
 
I'm using Sikuli for all my botting endeavors. No offense to Lord B, but it's way easier and faster for me than uBot. The fact that it can interact with anything means it's stupid easy for me to click the "New Tor Identity" button, and I don't have to fuck around with proxies. I have a simple subroutine that ensures Tor is always running, and I refresh for a new IP as often as I want.

The best part about Sikuli is that it's all Jython. I installed a few SQL drivers, and in under a week, I have bots that create accounts on a couple major sites [ace-Fa ook-Ba, !Y] and read/store login info in my remote database. Tomorrow, I'm gonna write a captcha cracker for it, and it'll be fully automated -- I don't have to wait for "the next version", because anything you can code, you can do in Sikuli [granted, using Jython is a real bitch, and I wish it were native Python]. If you're a programmer, and you want to actually write code to do your botting, Sikuli's the way to go.

If anyone else is seriously writing for Sikuli, hit me up on AIM, it'd be cool to trade tips, tricks and scripts.
 
So your post has me thinking a lot now. In regards to what I asked above, can you distribute "bots" made with it? Can you compile them like uBot? Or do you script it to do something and then walk away as it takes over your machine to do its work?

It would be amazing if you could script it with screenshots and then when you run it, you can minimize it and get back to work, but it seems like it takes over the whole machine...or am I way the fuck off?
 
Alright, my 60 second test is complete. It does take over your machine. It literally moves your mouse around and just assumes control of the whole computer. So any practical use of this would need a dedicated machine.

Additionally, it doesn't seem too forgiving to being distributed to different machines. As it runs, it seems to take a screenshot, search for whatever it needs to find, then take the action, so I guess if you capture common screen elements it would work fine, but the potential for breakage is high.
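To make the "take a screenshot, search, then act" loop concrete, here's a toy sketch in plain Python. It is not Sikuli's code: images are just 2D lists of numbers, the function names are made up for illustration, and the match is exact (as far as I know, Sikuli scores similarity instead of requiring equality, which is more forgiving). It does show why a target captured on one machine can go missing on another:

```python
# Toy model of a "screenshot, search, act" loop.
# Images are 2D lists of integers standing in for pixel values.
# All names here are hypothetical; this is not Sikuli's API.

def find_template(screen, template):
    """Return (row, col) of the first exact match of template in screen, or None."""
    rows, cols = len(screen), len(screen[0])
    t_rows, t_cols = len(template), len(template[0])
    for r in range(rows - t_rows + 1):
        for c in range(cols - t_cols + 1):
            if all(screen[r + i][c:c + t_cols] == template[i] for i in range(t_rows)):
                return (r, c)
    return None


def click_target(screen, template):
    """Return the centre of the match (where a click would land), or None."""
    hit = find_template(screen, template)
    if hit is None:
        return None
    r, c = hit
    return (r + len(template) // 2, c + len(template[0]) // 2)


screen = [
    [0, 0, 0, 0, 0],
    [0, 7, 8, 0, 0],
    [0, 9, 7, 0, 0],
    [0, 0, 0, 0, 0],
]
button = [[7, 8], [9, 7]]

# The same button drawn at twice the size, e.g. on a different resolution or theme.
scaled_button = [
    [7, 7, 8, 8],
    [7, 7, 8, 8],
    [9, 9, 7, 7],
    [9, 9, 7, 7],
]

found_at = find_template(screen, button)
click_at = click_target(screen, button)
missing = find_template(screen, scaled_button)
```

The captured button is found, but the rescaled version of the same button is not, which is the breakage risk when a script moves to a machine that renders things differently.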

You can export executables, but they're in a proprietary Sikuli format. It would be cool to package up a self-contained exe, but it's not the end of the world.

For free, it's pretty sweet. I'm gonna try and play with it and make a bot for something along the lines of submitting a social bookmark or something and see what happens.

@uplinked - sending a pm to pick your brain about how you use the more advanced side of this thing...
 
Alright, my 60 second test is complete. It does take over your machine. It literally moves your mouse around and just assumes control of the whole computer. So any practical use of this would need a dedicated machine.

Which is why my copy of Windows Server 2008 will come in so handy with multiple profile logins via RDP.
 
Damn...still playing with this thing. You can include pretty much any Python code you want as well. This could make interfacing with servers/databases/APIs a breeze.

Gears are spinning
 
waiting for a sikuli script vault where we can trade

I'm still trying to decide whether this is possible or not. I guess if the script is for specific applications/for the browser, they should be able to run on any system. Actual system stuff wouldn't really translate well to different setups I don't think.

What's really interesting me right now though is that the scripts you write with it are saved as just standard python files, so that opens the door to chaining together different scripts dynamically. I'm going to keep playing with the system to see if it's possible to build libraries of sikuli functions that are system independent, because then we could create a wordpress.com account creator script for instance, then just host it somewhere for a community to keep up to date.

Then we can just pull down the specific scripts we need for a project and put them together however we want. Plus, all of this can be combined with standard python, for making server calls and such, that can then be passed into the sikuli calls.
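A rough sketch of what that kind of library could look like, in plain Python so it runs anywhere. Everything here is hypothetical (the registry, the step names, the context dict); in a real Sikuli script each step would wrap Sikuli calls, while here the steps only record what they would do:

```python
# Hypothetical sketch: small reusable steps registered by name, then chained.
# None of these names come from Sikuli. Each step takes and returns a context dict.

STEPS = {}


def step(name):
    """Decorator that registers a function in the shared step library."""
    def register(func):
        STEPS[name] = func
        return func
    return register


@step("open_browser")
def open_browser(ctx):
    ctx["log"].append("open browser")
    return ctx


@step("fetch_title")
def fetch_title(ctx):
    # Stand-in for a plain-Python server call whose result feeds a later step.
    ctx["title"] = "Example " + ctx["topic"]
    ctx["log"].append("fetch title")
    return ctx


@step("fill_form")
def fill_form(ctx):
    # Uses the value produced by the earlier step.
    ctx["log"].append("fill form: " + ctx["title"])
    return ctx


def run_chain(names, ctx):
    """Run the named steps in order, passing the context along."""
    for name in names:
        ctx = STEPS[name](ctx)
    return ctx


result = run_chain(
    ["open_browser", "fetch_title", "fill_form"],
    {"topic": "page", "log": []},
)
```

Since steps are looked up by name, a project can pull in only the ones it needs and order them however it wants; only the screenshot-dependent parts inside each step would have to be recaptured per system.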

Lots of power in this system if it's approached the right way
 
VirtualBox (/VMWare) + Sikuli = win.

Basically, I've got a development VM, and four bot VMs. Each bot VM runs a copy of Tor, opens the Vidalia control panel (for managing Tor connections), clicks "Obtain New Identity", then pulls a DB row like ["fname", "lname", "male", "01-01-1900"] from my cloud server, and runs registerAccount(db_row) (which is a function I've defined). When I make updates to the development VM, I clone the hard drive and launch 4 new bots. Voila! Instant internet monies!

Sending Sikuli scripts around would be pretty useless, as you pointed out, because of screenshot limitations. Besides, I've no interest in sending any of you lazy asshats my scripts; the signup bots themselves are easy enough to write that anyone who can't figure it out shouldn't be running it. What would greatly interest me is an exchange of libraries like you said, for doing cool things in Sikuli. They can be packaged as JAR files, and would have to be written in Java or Jython. In particular, I'm working on a Decaptcher package ("import decaptcher; code = decaptcher.crack(<screenshot of captcha>, api_key); type('\t' + code)" or something) and I'd like to eventually write a 'semi-multithreader' that will open up 10 Firefox windows, rename them to '#1', '#2', '#3', etc, and switch between them to keep 10 concurrent signups loading at once.
 
Lovin it. I think this same idea can be extended into setting up micro self-contained Linux setups that can run as Windows apps. If they interface with servers, they don't need disk access, and with something like Damn Small Linux and a stripped-down window manager, you can probably have a very basic Linux install that can handle Firefox fine and can run in the background.

Does anyone know how to create a self contained exe of a linux distro? (that's a ridiculous question I know, but it's a shot)
 
self contained? not like you're talking about. IMO, XP is doing pretty well; it runs decent on 256mb RAM, so 4 instances only take ~1GB on my desktop. i don't think you can package a linux VM as an exe that simply, you're probably still looking at distributing a virtual hard drive and running it on some sort of VM software (my vote is for VBox, it's free and cross-platform).

the above mention of Windows Server 2008 got me thinking- you can do multiple XSessions on a linux desktop by default, or you can use something like Xephyr to open up an XSession within an XSession. however, since they'd all be running on the same OS, it wouldn't work with my current Tor implementation (99% sure that Tor is system wide, so all the sessions would wind up using the same IP)

edit: i'm not nearly so interested in botting "hard" as i am in botting "smart"; that is, i'd rather spend my time working on better bot scripts and signing up for more types of accounts than cramming more instances into a machine. i can rent EC2 servers with 68gb RAM, and something tells me ~250 instances of XP will be enough to register anything I really want. shit, i might write a sikuli bot to boot XP vms and launch more Sikuli bots :-p
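The memory numbers in this post are easy to sanity check. A quick sketch, assuming 256 MB per XP instance and counting nothing else (the helper names and the reserved-RAM figure are made up for illustration, and the host OS and hypervisor overhead will eat into the real ceiling):

```python
# Back-of-the-envelope VM memory arithmetic.
# Assumption: each instance needs 256 MB and nothing else is counted.

MB_PER_GB = 1024


def ram_needed_gb(instances, mb_per_instance=256):
    """Total RAM in GB for a given number of instances."""
    return instances * mb_per_instance / MB_PER_GB


def max_instances(host_ram_gb, mb_per_instance=256, reserved_gb=0):
    """How many instances fit after setting aside reserved_gb for the host."""
    usable_mb = (host_ram_gb - reserved_gb) * MB_PER_GB
    return int(usable_mb // mb_per_instance)


four_vms_gb = ram_needed_gb(4)                    # the 4-instance desktop case
ceiling = max_instances(68)                       # raw ceiling on a 68 GB host
with_headroom = max_instances(68, reserved_gb=4)  # hypothetical 4 GB held back
```

Four instances come to 1 GB, and a 68 GB host tops out at 272 instances before any overhead, so ~250 is a plausible estimate once some memory is held back for the host.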