Content scrapers and protecting your work

Status
Not open for further replies.

BlueYonder

Flaming panties
Aug 13, 2008
New York Metro
I'm reading about scrapers that steal your content and publish it on their own sites as if it were theirs.

Further reading explained methods to make the content theft actually work for me, and I have a question about one of them. At the site The Tao of Blog, Grizzly said: "Just put a link in the first paragraph of your posts pointing back at your site. If they get removed then hide your link in a period."

How do I hide my site URL in a period? I have no idea what this means. I thought Big G goes nuts when hidden text is detected.
 


Code:
I think they mean like this: <a href="http://www.mylink.com">.</a>
Thanks, that's clear enough. Assuming Big G won't care if I link to myself.

I should have added the other point he made: "Remember to use a keyword you want to be indexed for in your link's anchor text. Use scrapers to increase your backlinks and keyword authority."

Allowing the theft can be a good way to increase your inbound links.
 
Further reading explained methods to make the content theft actually work for me, and I have a question about one of them. At the site The Tao of Blog, Grizzly said: "Just put a link in the first paragraph of your posts pointing back at your site. If they get removed then hide your link in a period."

Well that might work in a few cases, but I'd be stripping out anything but plain text if I were doing this.
 
Well that might work in a few cases, but I'd be stripping out anything but plain text if I were doing this.

Grizzly also said this: "Most scrapers use auto software to scrape sites and your link won't get removed."

He feels they mostly don't want to be bothered processing your text, and I agree. I don't see why they would care if your stolen article has a linkback to yourself.

Is there a reason for it? Please clarify.
 
My content gets scraped all the time. I once found a post of mine that was spun so beautifully that I couldn't recognize it myself.
But I don't care, I scrape content too on a daily basis.
And don't do hidden links and other bullshit. Just link back to your blog, and you will get loads of links.
 
And don't do hidden links and other bullshit. Just link back to your blog, and you will get loads of links.
Do you mean that you openly weave your site URL into the content, as part of a normal paragraph?

This is really important stuff I only found out about today. An article I'm almost finished writing has tremendous potential for theft, so I want to process it correctly so I don't get screwed over.

Sorry to be so n00bish. I did notice that you guys don't seem to mind n00bs too much when they make a genuine effort and ask decent questions.
 
Grizzly also said this: "Most scrapers use auto software to scrape sites and your link won't get removed."

To be honest, I have no idea what percentage of scrapers are 'auto software' as compared to custom-made ones. I do know this, though: anyone who knows what they are doing will be using a custom-written one, so it does exactly what they want it to, like stripping out everything they don't want.

He feels they mostly don't want to be bothered processing your text, and I agree. I don't see why they would care if your stolen article has a linkback to yourself.

Is there a reason for it? Please clarify.

Why wouldn't you bother processing the scraped content once you have it? You could end up with anything on your sites, including links to places you don't want them to go to, JavaScript that may be damaging, etc. I really couldn't imagine going to the trouble of scraping a site and not processing the data before republishing it.
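
To make that concrete, here is a minimal sketch of the kind of processing I mean, using only Python's standard library. The class and function names (TextOnly, strip_to_text) and the sample string are made up for illustration, and I'm not claiming any particular scraper works this way. It keeps the visible text and drops every tag, so a link hidden in a period comes out as a plain period.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects visible text and drops every tag, including <a> links."""

    SKIPPED = {"script", "style"}  # the contents of these tags are discarded too

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside <script> or <style>.
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_to_text(html):
    """Return the plain text of an HTML fragment (hypothetical helper)."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Sample input: a link hidden in a period plus an inline script.
scraped = 'Great post<a href="http://www.mylink.com">.</a> <script>alert(1)</script>Read on.'
print(strip_to_text(scraped))  # Great post. Read on.
```

The period survives but the href does not, which is why a hidden link only helps against scrapers that republish the HTML untouched.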
 
BlueJam is 100% correct here. Links and all other HTML shit are removed before content is spun, so dropping a link in a period or leaving an empty link (<a href="..."></a>) will not help your cause at all. Besides, the links will be worthless.

Scrapers have much to gain by removing the links: a link left in passes authority to someone else and also leaves footprints ... no decent programmer will let this happen.

Your best bet is to have a unique string (your URL without dots or whatever) in your text that will probably be left untouched. Then just do regular Google searches for "wwwsitedomaincom" and send a DMCA notice to the hosts of whoever shows up.

That's really about all you can do to avoid this stuff, as weak as it may sound.
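
If it helps, the unique-string idea can be sketched in a few lines of Python. The helper names (make_marker, contains_marker) and the sample text are hypothetical, and the search itself is still something you do by hand in a search engine; this only builds the string and checks text you have already copied or downloaded.

```python
import re


def make_marker(domain):
    """Turn 'www.sitedomain.com' into 'wwwsitedomaincom' (hypothetical helper)."""
    return re.sub(r"[^a-z0-9]", "", domain.lower())


def contains_marker(page_text, marker):
    """Case-insensitive check for the marker in text from a suspect page."""
    return marker in page_text.lower()


marker = make_marker("www.sitedomain.com")

# Made-up example of a spun copy that still carries the marker.
suspect = "Some spun article text ... WWWsitedomaincom ... more text."
print(marker, contains_marker(suspect, marker))  # wwwsitedomaincom True
```

Put the marker in the body text of each post, then search for it in quotes from time to time.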
 
Didn't shoemonkey make a post on his blog about this a while back?
I just did a search and found it - will read, but whose advice do I take? So many conflicting opinions.

I just bookmarked the Copyscape site and will use it regularly to check on my content, but isn't it hard to get people to drop your stolen work? Seriously, why wouldn't they just ignore me?

I have a feeling some black hats are seeing themselves in this thread.
 
I just did a search and found it - will read, but whose advice do I take? So many conflicting opinions.

If you are considering learning anything from shitmonkey, you've already lost the game and we can't help you.

A DMCA notice to the host will take a site down; you get to bypass the developer altogether. If they ignore it, their site goes down (just happened to me over some fucking Flickr pictures).

Some of the better BHers will spoof their server or host in third-world countries, and there's not really much you can do about this.

The best answer I can really give is for you to be big enough to move past this and not let it waste too much of your time. Cheaters happen, there's no reason for you to just tread water with your work because of it.

File the DMCA and move on with your life.
 
Not every scraper is competent enough to have software created for them that removes links. There are lots of people using WordPress plugins that create posts from RSS feeds and that sort of thing.

Most of the sites I've found that have scraped my stuff have left the links in, so it's worth putting links back to your site in your content.
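
If you want to check a scraped copy yourself, a rough sketch like this one lists the links in a page that still point back at your domain. It assumes you have already saved the page's HTML; LinkCollector, links_back_to and the sample markup are names I made up, and mylink.com is just the example domain from earlier in the thread.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Gathers the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back_to(html, my_domain):
    """Return hrefs whose host is my_domain or one of its subdomains."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    my_domain = my_domain.lower()
    found = []
    for href in collector.hrefs:
        # Scheme-less hrefs such as "www.mylink.com" have no hostname here,
        # so they are not counted as links back.
        host = (urlparse(href).hostname or "").lower()
        if host == my_domain or host.endswith("." + my_domain):
            found.append(href)
    return found


# Made-up scraped copy with one surviving link back and one unrelated link.
copy = '<p>Intro <a href="http://www.mylink.com/post">my keyword</a> and <a href="http://other.example/">x</a></p>'
print(links_back_to(copy, "mylink.com"))  # ['http://www.mylink.com/post']
```

An empty list means the scraper stripped your links (or the copy is plain text), which is when a unique string in the text is the only thing left to track.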
 
Besides leaving a link in the blog, add a string nobody would ever use, like 'blueyonderisalegend' :), and then check the search engines from time to time with that string. For the hardcore scrapers who strip all the HTML, the string should still appear in the post and therefore make it easy to track.
 