Up until now I've been using the Market Samurai Rank Tracker and it's been good enough for what I've needed, but with all the bullshit that's happened recently, I think I'd prefer to look after rank tracking myself.
I'm normally a PHP developer (WordPress for simple stuff, CodeIgniter for more complicated things), but I've been playing with Python recently and I like what I see.
I know this won't be the quickest way to build it, but it's a learning project too.
I've been playing around with the Scrapy framework and written a few little scripts and scrapers, but I think a rank tracker could be a decent project to really get my teeth into Python with.
For those of you with any relevant experience, I'd appreciate any pointers that might be useful.
I'm only going to check the top 100 results in Google for now, which should limit the need for proxies, at least initially. My first priority is getting the data into a database that I can run manual queries against; I'll probably build out some sort of front-end at some point.
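Here's roughly the spider I have in mind so far. To be clear about assumptions: the div.g selector and the num=100 parameter are just what I've gathered from poking at the results page, and Google changes its markup constantly, so treat this as a sketch rather than something that will keep working:

```python
import scrapy
from urllib.parse import quote_plus


class GoogleSerpSpider(scrapy.Spider):
    """Sketch: pull the top 100 Google results for a single keyword."""

    name = "google_serp"

    def __init__(self, keyword="example keyword", *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.keyword = keyword

    def start_requests(self):
        # num=100 has historically asked Google for 100 results on one
        # page; no guarantee it stays that way.
        url = f"https://www.google.com/search?q={quote_plus(self.keyword)}&num=100"
        yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # div.g is an illustrative selector for an organic result block;
        # it will need re-checking against the live page regularly.
        for rank, result in enumerate(response.css("div.g"), start=1):
            link = result.css("a::attr(href)").get()
            if link:
                yield {"keyword": self.keyword, "rank": rank, "url": link}
```

Something like scrapy runspider spider.py -a keyword="blue widgets" -o results.json should be enough to test it standalone before wiring up the database.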
I'm interested in running the following kinds of reports/queries for each domain:
- 24 hour rank movement
- 7 day rank movement
- 30 day rank movement
- New domains entering top 100 in last 7/30 days for competing keywords
I've developed plenty of databases in the past, but nothing beyond a few thousand records, so I'm wondering whether this is the right way to set it up:
Domains table: Stores minimal info about each domain being tracked; mainly present for an eventual front-end so that I can return all rankings for every keyword associated with a domain.
Following fields:
- domain_id
- domain_name
Keywords table: Stores all keywords being tracked and the relevant country for local tracking. Used to populate the list of keywords to be scraped.
Following fields:
- keyword_id
- keyword_name
- keyword_locale
DomainKeywords table: Many-to-many link table between the above two for running reports/queries
- domainkeyword_id
- domain_id
- keyword_id
Serps table: This is where all the scraped data will go, with possibly the following fields:
- serp_id: Primary key
- keyword_id: Foreign key to the keywords table
- serp_rank: Position in the SERPs
- serp_url: The URL holding that position (without this I don't think the competitor reports above are possible, since a rank on its own doesn't say who holds it)
- serp_datetime: Date/time the result was pulled
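To sanity-check the design, I sketched it out as an actual script (SQLite just for prototyping; the index is my guess at what the report queries will want):

```python
import sqlite3

conn = sqlite3.connect("ranktracker.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS domains (
    domain_id   INTEGER PRIMARY KEY,
    domain_name TEXT NOT NULL UNIQUE
);

CREATE TABLE IF NOT EXISTS keywords (
    keyword_id     INTEGER PRIMARY KEY,
    keyword_name   TEXT NOT NULL,
    keyword_locale TEXT NOT NULL,
    UNIQUE (keyword_name, keyword_locale)
);

-- Many-to-many link between domains and keywords
CREATE TABLE IF NOT EXISTS domain_keywords (
    domainkeyword_id INTEGER PRIMARY KEY,
    domain_id        INTEGER NOT NULL REFERENCES domains(domain_id),
    keyword_id       INTEGER NOT NULL REFERENCES keywords(keyword_id),
    UNIQUE (domain_id, keyword_id)
);

-- One row per (keyword, position, snapshot); serp_datetime is stored
-- as ISO-8601 text so plain string comparison sorts chronologically
CREATE TABLE IF NOT EXISTS serps (
    serp_id       INTEGER PRIMARY KEY,
    keyword_id    INTEGER NOT NULL REFERENCES keywords(keyword_id),
    serp_rank     INTEGER NOT NULL,
    serp_url      TEXT NOT NULL,
    serp_datetime TEXT NOT NULL
);

-- The reports all filter by keyword and time range, so index for that
CREATE INDEX IF NOT EXISTS idx_serps_kw_time
    ON serps (keyword_id, serp_datetime);
""")
conn.commit()
conn.close()
```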
I was thinking I'd like to pull data hourly, but I'm not sure whether I'll end up with too much data if I'm tracking a lot of keywords. As a rough back-of-envelope: at, say, 500 keywords, storing the full top 100 hourly is 500 × 100 × 24 = 1.2 million rows a day, or around 440 million a year; checking every 12 hours instead drops that to 100,000 rows a day.
Do you think this will be an issue?
My thinking is that if I only check rankings every 12 hours, I can afford to store the full top 100 for competitor sites, which would be useful for running automated reports to identify new competitors and track the movement of existing ones.
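To make the first report concrete, here's how I imagine the 24-hour movement check working against the schema above. The LIKE-based domain matching is a naive placeholder (it would happily match substrings of other domains), and the function names are just mine:

```python
import sqlite3
from datetime import datetime, timedelta


def rank_at(conn, keyword_id, domain_name, as_of):
    """Most recent stored rank for a domain/keyword at or before as_of.
    Returns None if the domain wasn't in the stored top 100 then."""
    row = conn.execute(
        """
        SELECT serp_rank FROM serps
        WHERE keyword_id = ?
          AND serp_url LIKE ?      -- naive: matches domain anywhere in URL
          AND serp_datetime <= ?
        ORDER BY serp_datetime DESC
        LIMIT 1
        """,
        (keyword_id, f"%{domain_name}%", as_of.isoformat()),
    ).fetchone()
    return row[0] if row else None


def movement_24h(conn, keyword_id, domain_name):
    """Return (rank 24 hours ago, rank now); None = not in top 100."""
    now = datetime.utcnow()
    return (
        rank_at(conn, keyword_id, domain_name, now - timedelta(hours=24)),
        rank_at(conn, keyword_id, domain_name, now),
    )

# The 7- and 30-day reports would be the same query with a different
# timedelta; "new domains in the top 100" is a set difference between
# the distinct serp_url values of two snapshots.
```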
I know this part of the forum is pretty quiet, so hopefully this thread can develop into something useful for others as well.
Any other tips you might have? In particular:
- Problems I might run into scraping Google compared to sites that don't actively defend against scraping (I've put the settings I'm planning to start with after this list)
- Recommended libraries
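On the first point, here's the settings file I'm planning to start with. These are all standard Scrapy settings, but the values are just my guesses at "conservative enough":

```python
# settings.py -- cautious defaults for a target that actively blocks bots
DOWNLOAD_DELAY = 30               # long fixed delay between requests
RANDOMIZE_DOWNLOAD_DELAY = True   # jitter the delay (0.5x to 1.5x)
AUTOTHROTTLE_ENABLED = True       # back off further when responses slow down
CONCURRENT_REQUESTS = 1           # strictly one request at a time
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
RETRY_HTTP_CODES = [429, 503]     # what Google tends to return when it's unhappy
```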
Thanks for reading, and for any advice.
TL;DR: Looking for advice on coding a personal rank tracker using Python to replace Market Samurai.