Jason Leveille's Blog

Web Development Intoxication

MySQL Replace Rocks

We have a client who is having issues with character encoding.  They recently had a massive content migration and along with the migration came numerous character issues.  They moved from a character set of iso-8859-1 to UTF-8, and with multiple content editors copying/pasting from many different resources (pdf’s, Word, Web Pages, etc) they now have some cleanup to do.  We were recently asked if we could provide a report on all the pages where certain characters resided.  No problem.  We ran some SQL queries (most content resides in the database thankfully) that looked similar to the following:

SELECT p.id, p.title FROM `pages` p WHERE
p.content REGEXP '(¢|â|Ã)';

In the case of the above query, we’re just taking advantage of MySQL’s native REGEXP operator to check for the existence of any of the three characters (simplified for illustration) in a content block.  After running our SQL statement we quickly realized that the work necessary to clean this up would likely amount to at least a few weeks for our client.  With an additional few hours of work (after backing things up and testing, testing, testing) we were able to provide a solution that will take a matter of minutes to implement (if the client chooses to hire us to do so). Here is an abbreviated version of what we came up with:

UPDATE `pages` p SET p.content = replace(p.content,'â€Â','');
UPDATE `pages` p SET p.content = replace(p.content,'â€Å\“','');
UPDATE `pages` p SET p.content = replace(p.content,'¢â‚','');
...etc

Conclusion

There isn’t anything earth shattering here.  These are simple queries, but they illustrate the inherent abilities of the technology employed with our client’s application.  These inherent abilities, if known, have the potential to save dozens of hours.  In the case of the above example, I truly believe we will likely save our client at least 30 man hours.


3 Comments for "MySQL Replace Rocks"

Comment 1 pokoka - Gravatar pokoka

Hmmm ... I never tried MySQLs REGEXP. Looks good. Normally I'd take the "just rip it out change and throw back" approach. Your's is a lot better ;P

Wed, 16 Jul 2008 17:42:31 +0000 Link

Comment 2 leveille - Gravatar leveille

Up to this point I had never had a need for replace, and its use here was definitely a time saver.

Wed, 16 Jul 2008 20:08:24 +0000 Link

Comment 3 leveille - Gravatar leveille

Sometimes when I go back and read old blog posts I have to laugh at myself. I should have just taken the time to pay more attention to utf-8 encoding and I wouldn't have had to use any kind of find/replace method to solve the problem I had above.

Mon, 13 Apr 2009 14:16:10 +0000 Link

Comments have been disabled for this post.

Mobify empowers marketers and developers to create amazing mobile web experiences. Tap to learn more

Mobify