Stephen Reese

Blogger is doing away with the option to host your blog on your own server and is migrating everything to the cloud. I wanted the option to keep hosting my blog on my own server, even though as of now I am still hosting with Blogger. My main concern was redirecting the URLs that Blogger had created to a new blogging platform such as WordPress. I looked around and found several methods here, here, and here for redirecting one URL to another. The two primary methods were HTTP redirects by modifying the page header or Apache's [mod_rewrite][]. I like Apache so I opted for the latter.

I only had about 60 posts, so creating a few mod_rewrite rules by hand is not a big deal. However, a number of bloggers had complained about Blogger removing its FTP/SFTP publishing capabilities and were considering a migration away from Blogger. This got me thinking about how to help others transfer thousands of blog entries.

I decided to try to automate this process somewhat with a little scripting fu. The pieces below could be combined into a single script, and if there is enough interest I will make it happen.

The first step is to get your Blogger posts somewhere WordPress can import them from. Blogger can export its posts, but WordPress does not have a native plug-in for importing the XML format that Blogger exports. WordPress can, however, import posts and comments from a Blogspot-hosted Blogger blog. Create a Blogspot-hosted blog and import into it the posts that you have backed up in your main profile's XML file. Make sure to disable search engine indexing for the temporary site so that you do not hurt your SEO.

The second step is to import the posts into WordPress. This is relatively easy to do: log in to your WordPress administrative tools and import the Blogger posts from the Blogspot blog that you created in the first step. I tried the tools recommended by WordPress and a third-party tool, but they did not work very well for me.

Your WordPress install should now have all of your content and comments and be working correctly. This tutorial also assumes you are using the following permalink format for your WordPress posts; if not, you will have to adjust the steps accordingly:

/%year%/%monthnum%/%postname%/

You will notice that your URLs conform to the WordPress format and not to Blogger's. This means that when you migrate your DNS to point at your shiny WordPress install, all of the links that users have bookmarked and that search engines have crawled will no longer be valid. Worse, this could hurt your search engine rankings, as it will take time for search engines to pick up the new URLs, and during that time you will have duplicate content floating around. Not an ideal situation.

The third step is to determine all of the URLs that your Blogger account was using, based on the XML file that you exported from your Blogger blog's profile. This will produce a file with your Blogger file names. The number of entries should be the same as the number of posts you have published on Blogger or, in other words, imported to WordPress. Note you will need to change the XML file name and domain name to match your settings:

# Produces blogger file names.
# The domain must be the same in the grep and sed patterns (GNU sed is assumed for \n).
sed "s/\(href='[^']*'\)/\1\n/g" blog-02-04-2010.xml |
grep "href='http://www.rsreese.com/20.*html'" |
sed "s+.*href='http://www.rsreese.com/\(20[^']*\)'.*+\1+" |
sort -ut/ -k3 | xargs -I{} basename {} | sort -u > /tmp/blogger.txt
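
Since the file should hold one line per published post, a quick count is a cheap sanity check before going further. This is only a minimal sketch: `count_entries` is a hypothetical helper name I made up for illustration, and it assumes the `/tmp/blogger.txt` path used above.

```shell
#!/bin/sh
# count_entries FILE: print the number of non-empty lines in FILE.
# (Hypothetical helper; compare the result to your Blogger post count.)
count_entries() {
    grep -c . "$1"
}

# Example usage once the file exists:
# count_entries /tmp/blogger.txt
```

If the number is lower than your post count, the likely culprit is the domain or year pattern in the grep line not matching some of your URLs.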

Next you want to generate a similar listing from your WordPress install that is populated with all of your Blogger content. This involves logging into your MySQL install and exporting a little data.

mysql -u wordpress_user -p
mysql> USE wordpress_db;
mysql> SELECT post_name FROM wp_posts INTO OUTFILE '/tmp/wp.txt';
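
One thing to watch for: the wp_posts table also stores pages, revisions, and attachments, so the plain SELECT can return more rows than you have posts. A possible variation, sketched here under the assumption of the default `wp_` table prefix and the database and user names shown above, is to restrict the query to published posts. `wp_slug_query` is a hypothetical helper name, and MySQL's ORDER BY may not sort exactly like the `sort` command does, so the pairing still needs a manual check.

```shell
#!/bin/sh
# wp_slug_query [PREFIX]: print a query listing the slugs of published posts only.
# PREFIX defaults to "wp_" (an assumption; use your own table prefix if it differs).
wp_slug_query() {
    prefix=${1:-wp_}
    printf "SELECT post_name FROM %sposts WHERE post_type = 'post' AND post_status = 'publish' ORDER BY post_name;\n" "$prefix"
}

# Example usage (-N drops the column header, -B gives plain batch output):
# wp_slug_query | mysql -N -B -u wordpress_user -p wordpress_db > /tmp/wp.txt
```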

Next you want to ensure that your posts line up between the two files. In my case some were not sorted exactly the same way; this step basically let me know how much manipulating I would have to do. Paste this into a file on your Linux machine and give it executable permissions with ‘chmod +x filename’. Then run the file with ‘./filename’. Note you will need to specify the paths to your wp.txt and blogger.txt in the small script.

#!/bin/sh
# Print each Blogger file name next to the WordPress slug on the same line number.
paste blogger.txt wp.txt | while read -r blogger wp
do
echo "This is from FileA: $blogger"
echo "This is from FileB: $wp"
done
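
With more than a handful of posts, eyeballing every pair gets tedious. The sketch below is a rough heuristic of my own, not something from the original process: it flags pairs whose first hyphen-separated word differs, on the assumption that Blogger and WordPress usually agree on how a title starts. `flag_mismatches` is a hypothetical name, and a pair that passes is not guaranteed to be correct.

```shell
#!/bin/sh
# flag_mismatches: read "blogger-file wp-slug" pairs on stdin and print the
# pairs whose first hyphen-separated word differs (a rough heuristic only).
flag_mismatches() {
    while read -r blogger wp
    do
        b_first=${blogger%%-*}      # text before the first hyphen
        b_first=${b_first%.html}    # single-word names still carry .html
        w_first=${wp%%-*}
        if [ "$b_first" != "$w_first" ]; then
            echo "CHECK: $blogger -> $wp"
        fi
    done
}

# Example usage:
# paste /tmp/blogger.txt /tmp/wp.txt | flag_mismatches
```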

Lastly, let's actually generate the mod_rewrite rules for Apache. Again, because the two files are sorted independently, the file names may not pair up exactly right, so you may have to do some manual manipulation.

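# The $1 and $2 in the printf format are Apache backreferences (year and month),
# not shell variables; the two %s are the Blogger file name and the WordPress slug.
paste blogger.txt wp.txt | while read -r blogger wp
do
printf 'RewriteRule ^([0-9]{4})/([0-9]{1,2})/%s$ $1/$2/%s/ [NC,R=301,L]\n' "$blogger" "$wp"
done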

You probably want to redirect the output to a file so you can go in and fix the values that have not sorted correctly.
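
As a variation on the generator, the dot in each Blogger file name can be escaped so that it only matches a literal dot in the rewrite pattern. This is a minimal sketch: `make_rule` and the output path `/tmp/rewrite_rules.txt` are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# make_rule BLOGGER_FILE WP_SLUG: print one RewriteRule, escaping dots in the
# Blogger file name so that "." only matches a literal dot.
make_rule() {
    escaped=$(printf '%s' "$1" | sed 's/\./\\./g')
    # $1 and $2 inside the single-quoted format are Apache backreferences.
    printf 'RewriteRule ^([0-9]{4})/([0-9]{1,2})/%s$ $1/$2/%s/ [NC,R=301,L]\n' "$escaped" "$2"
}

# Example usage, writing the rules to a file for manual review:
# paste /tmp/blogger.txt /tmp/wp.txt | while read -r b w; do make_rule "$b" "$w"; done > /tmp/rewrite_rules.txt
```

The unescaped version works too, since a dot in a regular expression also matches a literal dot; escaping just keeps the rule from matching more than intended.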

The last part of the configuration here is a section from my Apache configuration file. I have also included a little bit to redirect the feeds, though for me this was not very important as I syndicate through FeedBurner, which allows me to modify my feed without affecting subscribers.

# This has two of my rewrite rules, I have many more but kept it brief for readability.
<Directory /var/www/apache2-default/wordpress/>
RewriteEngine On
RewriteBase /wordpress/
RewriteRule ^atom.xml$ feed/ [NC,R=301,L]
RewriteRule ^rss.xml$ feed/ [NC,R=301,L]
RewriteRule ^([0-9]{4})/([0-9]{1,2})/adding-character-to-line-using-perl.html$ $1/$2/adding-a-character-to-a-line-using-perl/ [NC,R=301,L]
RewriteRule ^([0-9]{4})/([0-9]{1,2})/authenicating-kerberos-against-active.html$ $1/$2/authenicating-kerberos-against-active-directory/ [NC,R=301,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /wordpress/index.php [L]
</Directory>

Finally, you should test your setup to confirm that all of the links redirect. First, build a list of the full Blogger URLs from the export:

sed "s/\(href='[^']*'\)/\1\n/g" blog-02-07-2010.xml |
grep "href='http://www.rsreese.com/20.*html'" |
sed "s+.*href='\([^']*\)'.*+\1+" |
sort -ut/ -k3 > /tmp/full_blogger_urls.txt

Next you can use wget to test the URLs to make sure they all redirect correctly.

wget -i /tmp/full_blogger_urls.txt
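
The wget run downloads every page, so you have to read through its output to spot failures. If you only care about the status codes, something along these lines might be easier to scan. It is a sketch that assumes curl is installed; `classify_status` and `check_urls` are hypothetical helper names, and the labels are my own.

```shell
#!/bin/sh
# classify_status CODE: label an HTTP status code for the redirect check.
classify_status() {
    case "$1" in
        301) echo "OK" ;;
        302|303|307|308) echo "REDIRECT-NOT-301" ;;
        *) echo "FAIL" ;;
    esac
}

# check_urls FILE: request each URL in FILE without following redirects and
# print a label, the status code, the URL, and the redirect target curl reports.
check_urls() {
    while read -r url
    do
        result=$(curl -s -o /dev/null -w '%{http_code} %{redirect_url}' "$url")
        code=${result%% *}      # everything before the first space
        target=${result#* }     # everything after the first space
        echo "$(classify_status "$code") $code $url -> $target"
    done < "$1"
}

# Example usage:
# check_urls /tmp/full_blogger_urls.txt
```

Anything not labelled OK is worth a look: a 404 usually means a rule is missing or mispaired, and a 200 means the old URL is still being served directly.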

This tutorial is not an end-all solution and is not perfect by any means. It still requires some manipulation of data, but if you have a large number of URLs to redirect then you may find it useful. Your mileage may vary, though if you have problems or recommendations then drop a comment…

