Removing weird characters when scraping a page

Status
Not open for further replies.

s1mplygo1d

New member
Mar 16, 2008
Hi,
I am scraping the contents of a page (this isn't for spammy sites or anything; two separate servers sometimes need to share information, and the quickest/easiest way is to just scrape it).

But every time there is a bullet point, em dash, or similar character, it puts in extra characters like ”.

How can I stop this? It displays fine on the page it is being scraped from, and both pages declare the same character encoding.

I am doing it in PHP, btw.
 


Hah! There are tons of ways to synchronize servers, and scraping the content is the worst of them.
Your problem is with character encoding.

Usually converting everything to UTF-8 works. And I mean everything: the HTML encoding declaration, the MySQL table and column character sets, all of it.

CHARACTER SET utf8 COLLATE utf8_general_ci

Adding that to your table definitions usually fixes the SQL issues.
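
On the PHP side, a rough sketch of the "convert everything to UTF-8" step for the scraped text might look like the following. It assumes the mbstring extension is loaded and that the source page is really served as Windows-1252, which is only a guess; the function name `to_utf8` and the example URL are made up for illustration, so swap in whatever charset the source server actually sends.

```php
<?php
// Hypothetical helper (not from the thread): normalize scraped bytes to UTF-8.
// $sourceCharset is an assumption; use the charset the source page really sends.
function to_utf8(string $raw, string $sourceCharset = 'Windows-1252'): string
{
    // Bytes that already form valid UTF-8 are returned untouched,
    // so the text is not converted twice.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    // Otherwise reinterpret the bytes from the assumed source charset.
    return mb_convert_encoding($raw, 'UTF-8', $sourceCharset);
}

// Usage sketch (placeholder URL):
// $html = to_utf8(file_get_contents('http://example.com/page.html'));
// header('Content-Type: text/html; charset=utf-8');
// echo $html;
```

The validity check is there because converting text that is already UTF-8 a second time is one common way to end up with the kind of junk characters described above. Sending the `Content-Type` header with `charset=utf-8` keeps the browser from guessing a different encoding for the output page.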
 