Friday, February 12, 2010

Redirect Blogger URL using Mod Rewrite and shell scripting fu

Blogger is doing away with the option to publish your blog to your own host and is migrating everything to the cloud. I wanted to keep the option of hosting my blog on my own server, even though as of now I am still hosting with Blogger. My main concern was redirecting the URLs that Blogger had created to a new blogging platform such as WordPress. I looked around and found several methods here, here, and here for redirecting one URL to another. The two primary methods were HTTP redirects by modifying the page header or Apache's mod_rewrite. I like Apache so I opted for the latter.

I only had about 60 posts, so creating a few mod_rewrite rules by hand is not a big deal. However, a number of bloggers have complained about Blogger removing its FTP/SFTP publishing capabilities and were considering a migration away from Blogger. This got me thinking about how to help others transfer thousands of blog entries.

I decided to try to automate this process somewhat with a little scripting fu. The steps below could be rolled into a single script, and if there is enough interest I will make it happen.

The first step is to get your Blogger posts into a form that WordPress can import. Blogger can export its posts, but WordPress does not have a native plug-in for importing posts in the XML format that Blogger exports. WordPress can, however, import posts and comments from a Blogspot-hosted Blogger profile. Create a temporary Blogspot blog and import into it the posts that you have backed up from your main profile's XML file. Make sure to disable search engine indexing for the temporary site so that you don't hurt your SEO.

The second step is to import the posts into WordPress. This is relatively easy to do: log in to your WordPress administrative tools and import the Blogger posts from the Blogspot profile that you created in the first step. I tried using the tools recommended by WordPress and a third-party tool, but they did not work very well for me.

Now your WordPress install should have all of your content and comments, and it should be working correctly. This tutorial also assumes you are using the following permalink format for your WordPress posts; if not, you will have to adjust the steps to match:
/%year%/%monthnum%/%postname%/

You will notice that your URLs conform to the WordPress format and not to Blogger's. This means that when you migrate your DNS to point at your shiny WordPress install, all of the links that users have bookmarked and the search engines have crawled will no longer be valid. Worse, this could hurt your search engine rankings, as it will take time for search engines to pick up the new URLs, and during that time you will have duplicate content floating around. Not an ideal situation.

The third step is to extract all of the URLs that your Blogger account was using from the XML file that you exported from your Blogger profile. This will produce a file with your Blogger file names. The number of lines should be the same as the number of posts you have published on Blogger, or in other words imported to WordPress. Note that you will need to change the XML file name and domain name to match your settings:

# Produces Blogger file names, one post file name per line.
# Change the XML file name and the domain (www.rsreese.com here) to match yours.
sed "s/\(href='[^']*'\)/\1\n/g" blog-02-04-2010.xml | \
grep "href='http://www.rsreese.com/20.*html'" | \
sed "s+.*href='http://www.rsreese.com/\(20[^']*\)'.*+\1+" | \
sort -ut/ -k3 | xargs -I{} basename {} | sort -u > /tmp/blogger.txt

Next you want to generate a similar listing from your WordPress install that is populated with all of your Blogger content. This involves logging into your MySQL install and exporting a little data.

mysql -u wordpress_user -p
mysql> USE wordpress_db;
mysql> SELECT post_name FROM wp_posts WHERE post_type = 'post' AND post_status = 'publish' ORDER BY post_name INTO OUTFILE '/tmp/wp.txt';
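
With both listings written out, a quick first check is whether they contain the same number of lines. A minimal sketch, assuming the two files are at /tmp/blogger.txt and /tmp/wp.txt as above; count_lines and compare_counts are hypothetical helper names, not part of any tool used here:

```shell
#!/bin/sh
# Print the number of lines in a file as a bare integer.
count_lines() {
    # Arithmetic expansion strips the padding some wc versions add.
    echo $(( $(wc -l < "$1") ))
}

# Compare the Blogger listing ($1) with the WordPress listing ($2).
compare_counts() {
    blogger_count=$(count_lines "$1")
    wp_count=$(count_lines "$2")
    if [ "$blogger_count" -eq "$wp_count" ]; then
        echo "OK: both files list $blogger_count posts"
    else
        echo "MISMATCH: Blogger has $blogger_count, WordPress has $wp_count"
        return 1
    fi
}

# Usage: compare_counts /tmp/blogger.txt /tmp/wp.txt
```

If the counts differ, the extra lines are worth tracking down before pairing the two files, since every later step relies on line N of one file matching line N of the other.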

Next you want to check that your posts line up between the two files. In my case some were not sorted exactly right; this basically let me know how much manipulating I would have to do. Paste this into a file on your Linux system and make it executable with 'chmod +x filename'. Then run it with './filename'. Note that you will need to specify the paths to your wp.txt and blogger.txt in the small script.

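#!/bin/sh
# Print each Blogger file name next to the WordPress slug from the same line number.
paste blogger.txt wp.txt | while read -r blogger wp
do
echo "This is from blogger.txt: $blogger"
echo "This is from wp.txt: $wp"
done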
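
Reading every pair by eye gets tedious with a long list, so one option is to print only the pairs that look wrong. The sketch below is a rough heuristic rather than a real comparison: it flags a pair when the first hyphen-separated word of the Blogger file name differs from the first word of the WordPress slug. flag_mismatches is a hypothetical helper name.

```shell
#!/bin/sh
# Flag pasted pairs whose first hyphen-separated word differs.
# Heuristic only: Blogger shortens titles, so comparing whole names would over-report.
flag_mismatches() {
    paste "$1" "$2" | while read -r blogger wp
    do
        blogger_first=${blogger%%-*}          # text before the first hyphen
        blogger_first=${blogger_first%.html}  # one-word names still carry .html
        wp_first=${wp%%-*}
        if [ "$blogger_first" != "$wp_first" ]; then
            echo "CHECK: $blogger <-> $wp"
        fi
    done
}

# Usage: flag_mismatches blogger.txt wp.txt
```

A pair that passes this check can still be wrong (two posts may start with the same word), so it narrows the manual review rather than replacing it.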


Lastly, let's actually generate the mod_rewrite rules for Apache. Again, when this runs the sorted file names may not line up exactly, so you may have to do some manual manipulation.

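paste blogger.txt wp.txt | while read -r blogger wp
do
# The $1 and $2 in the format string are mod_rewrite backreferences, not shell variables.
printf 'RewriteRule ^([0-9]{4})/([0-9]{1,2})/%s$ $1/$2/%s/ [NC,R=301,L]\n' "$blogger" "$wp"
done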

You probably want to redirect the output to a file so you can go in and fix the values that have not sorted correctly.
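
One detail of the generated rules: the Blogger file name is used as a regular expression, so the dot in ".html" matches any character. That is harmless in practice, but if you want stricter patterns the name can be escaped first. A minimal sketch; escape_regex and make_rule are hypothetical helper names:

```shell
#!/bin/sh
# Backslash-escape characters that are special in a regular expression.
escape_regex() {
    printf '%s\n' "$1" | sed 's/[][\.*^$+?(){}|]/\\&/g'
}

# Build one rule from a Blogger file name ($1) and a WordPress slug ($2).
# The $1 and $2 in the format string are mod_rewrite backreferences.
make_rule() {
    printf 'RewriteRule ^([0-9]{4})/([0-9]{1,2})/%s$ $1/$2/%s/ [NC,R=301,L]\n' \
        "$(escape_regex "$1")" "$2"
}

# Usage:
#   paste blogger.txt wp.txt | while read -r blogger wp
#   do make_rule "$blogger" "$wp"; done > /tmp/rewrite_rules.txt
```

The output file name /tmp/rewrite_rules.txt is only an example; any path works.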

For the last part of the configuration, here's a section from my Apache configuration file. I have also included a couple of rules to redirect the feeds, though for me this was not very important as I syndicate through FeedBurner, which allows me to modify my feed without affecting subscribers.

# This has two of my rewrite rules, I have many more but kept it brief for readability.
<Directory /var/www/apache2-default/wordpress/>
RewriteEngine On
RewriteBase /wordpress/
RewriteRule ^atom.xml$ feed/ [NC,R=301,L]
RewriteRule ^rss.xml$ feed/ [NC,R=301,L]
RewriteRule ^([0-9]{4})/([0-9]{1,2})/adding-character-to-line-using-perl.html$ $1/$2/adding-a-character-to-a-line-using-perl/ [NC,R=301,L]
RewriteRule ^([0-9]{4})/([0-9]{1,2})/authenicating-kerberos-against-active.html$ $1/$2/authenicating-kerberos-against-active-directory/ [NC,R=301,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /wordpress/index.php [L]
</Directory>

Finally, you should test your setup to confirm that all of the links redirect. First, build a list of the full Blogger URLs:

sed "s/\(href='[^']*'\)/\1\n/g" blog-02-07-2010.xml| \
grep "href='http://www.rsreese.com/20.*html'" | \
sed "s+.*href='\([^']*\)'.*+\1+" | \
sort -ut/ -k3 > /tmp/full_blogger_urls.txt

Next you can use wget to test the URLs to make sure they all redirect correctly.

wget -i /tmp/full_blogger_urls.txt
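
wget follows the redirects and downloads every page. If you only want to see whether each old URL answers with a 301, a loop around curl is one alternative. A minimal sketch, assuming curl is installed; report_redirect and check_redirects are hypothetical helper names:

```shell
#!/bin/sh
# Print OK when the old URL answered with a permanent redirect, FAIL otherwise.
report_redirect() {
    url=$1 status=$2 target=$3
    if [ "$status" = "301" ] && [ -n "$target" ]; then
        echo "OK   $url -> $target"
    else
        echo "FAIL $url (HTTP $status)"
    fi
}

# Request each URL in the list ($1) without following the redirect.
check_redirects() {
    while read -r url
    do
        [ -n "$url" ] || continue
        # -w prints the status code and the URL curl would be redirected to.
        result=$(curl -s -o /dev/null -w '%{http_code} %{redirect_url}' "$url")
        report_redirect "$url" "${result%% *}" "${result#* }"
    done < "$1"
}

# Usage: check_redirects /tmp/full_blogger_urls.txt
```

Any line starting with FAIL points at a rule that is missing or paired with the wrong post.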

This tutorial is not an end-all solution and is not perfect by any means. It still requires some manual manipulation of data, but if you have a large number of URLs to redirect then you may find it useful. Your mileage may vary, though if you have problems or recommendations then drop a comment...
posted by Stephen Reese at 2 Comments

Monday, February 08, 2010

A few tools that may help get rid of malware

These tools may help rid a computer system of malware, but be warned: they can be very destructive to your system. In other words, if you don't know what you're doing then back up what you can and take it to a professional.

Of course, keep your current anti-spyware and anti-virus installs and definitions up to date.
posted by Stephen Reese at 0 Comments

Setting up maildrop with Courier MTA

Before I get into maildrop, here are a few notes to myself for setting up Courier.

Before running ./configure you should add the ssl bin directory to your path.
To receive local mail regardless of case, touch {your/etc/courier/dir}locallowercase

The postmaster@ account HAS to be set up as well, in the /usr/lib/courier/etc/aliases/system file.

To tell Courier about hosted domains, add the domain to /etc/courier/hosteddomains and then, as root, run makehosteddomains.

To tell Courier to accept ESMTP connections for the domain, add the domain to /etc/courier/esmtpacceptmailfor.dir/domains and then, as root, run makeacceptmailfor.

Also, remember that the email account postmaster@ HAS to be set up.
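
The two domain lists can be updated in one go. A minimal sketch, assuming both are plain text files under /etc/courier as in the notes above; add_courier_domain is a hypothetical helper name, and example.com is a placeholder domain:

```shell
#!/bin/sh
# Append a domain ($1) to both Courier domain lists unless it is already there.
# $2 optionally overrides the config directory (defaults to /etc/courier).
add_courier_domain() {
    domain=$1
    etc_dir=${2:-/etc/courier}
    for list in "$etc_dir/hosteddomains" "$etc_dir/esmtpacceptmailfor.dir/domains"
    do
        mkdir -p "$(dirname "$list")"
        touch "$list"
        # -x matches whole lines, -F treats the domain as a fixed string.
        grep -qxF "$domain" "$list" || echo "$domain" >> "$list"
    done
}

# Usage (as root), followed by the two rebuild commands from the notes:
#   add_courier_domain example.com
#   makehosteddomains
#   makeacceptmailfor
```

Running it twice for the same domain leaves a single entry in each file.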


Here's the maildrop stuff:

1. Edit "/usr/lib/courier/etc/courierd" and set DEFAULTDELIVERY to "| /usr/lib/courier/bin/maildrop" so that maildrop is your delivery method

2. Create a "$HOME/.mailfilter" file to be read by maildrop. For the most part there is no need for a ".courier" file since maildrop is already being used!


3. Make sure your "/usr/lib/courier/etc/maildroprc" doesn't break the install, e.g.:


#attempt at a maildroprc file...
if ( $SIZE < 26144 )
{
    exception {
        xfilter "/usr/bin/spamassassin"
    }
}
if (/^X-Spam-Flag: *YES/)
{
    exception {
        to "$HOME/Maildir/.Trash/"
    }
}
#else
#{
#    exception {
#        to "$HOME/Maildir/"
#    }
#}

The commented-out part is no good, since with it your ".mailfilter" will never be read. So DON'T specify the default delivery: unless an exit command says otherwise, Courier will deliver to the default "$HOME/Maildir" no matter what. The same goes for the .mailfilter: no matter where you send the mail, there is no need to send it to the default location unless you have some crazy chaos going on that is beyond my lame howto =)

4. The contents of your ".mailfilter" should be something like the following:

"| /usr/lib/courier/bin/mailbot -t autoresponse -s 'AutoGoAwayMessage' -A 'From: test@prcdigital.com' /usr/sbin/sendmail -f "

An "autoresponse" file should be created and placed in the same $HOME directory as the ".mailfilter", though a shared file can be created for multiple users to access if desired.

5. "chmod 600 .mailfilter autoresponse"

Also, the same user:group that owns the Maildir should own these two files, so "chown user:group .mailfilter autoresponse"

Or, once spam gets to maildrop, you don't want to bounce it. Your best bet is to just drop it. Also, I would suggest using spamc/spamd if at all possible. This is what I would do:

if ( $SIZE < 204800 )
{
    exception {
        xfilter "/usr/bin/spamc"
    }
}

if ((/^X-Spam-Flag: YES/))
{
    if ((/^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*/))
    {
        echo "***** Dropping 15+ Spam *****"
        EXITCODE = 0
        exit
    }
    else
    {
        to "$HOME/Maildir/.Trash/"
    }
}
to "$HOME/Maildir/"


You can get rid of the echo if you don't want an entry in the log when it drops an email.
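
A rough way to see which branch a saved message would hit, without waiting for live mail, is to run the same two patterns through grep. This is a sketch in shell, not maildrop's filter language; classify_message is a hypothetical helper name, and grep scans the whole file, body included, so treat the result as an approximation:

```shell
#!/bin/sh
# Report which branch of the filter a saved message file ($1) would take.
classify_message() {
    if grep -q '^X-Spam-Flag: YES' "$1"; then
        # \*{15} is the same test as the fifteen escaped asterisks in the filter.
        if grep -Eq '^X-Spam-Level: \*{15}' "$1"; then
            echo "drop"
        else
            echo "trash"
        fi
    else
        echo "deliver"
    fi
}

# Usage: classify_message saved-message.eml
```

Since the pattern is not anchored at the end, a level of sixteen or more stars also counts as "drop", which matches the "15+" wording in the log message.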

if ((/^X-Spam-Flag: YES/))


Why double parentheses? This is what I am using and it is not working, though it seemed to work until recently:

if (/^X-Spam-Level: *\*\*\*\*\*\*\*/)
{
exception {
to "/dev/null"
}
}
posted by Stephen Reese at 0 Comments

Wednesday, February 03, 2010

Migrating from Blogger to WordPress

Blogger is removing the functionality to host your own "Blogger" content by disabling the FTP/SFTP functionality from their system. I'm considering their hosting solution or migrating to a WordPress solution.

If I stick with Google's Blogger hosting then bandwidth should never be an issue, as they have a distributed computing system. The only downside is that I'll probably have to use a sub-domain to host any static files. If I move to hosting my own WordPress then I'll probably have to increase my virtual host resources, since PHP and MySQL will be required and will use more system resources. This also increases my host's vulnerability footprint. Not only am I essentially adding two services, but WordPress has had its fair share of security issues.

If you want to stick with Blogger the simple alternative is just to migrate to a hosted Blogspot and use custom domains. You can simply point your DNS host domain.com or sub.domain.com to Google's DNS servers and within a short amount of time you will be up and running again. With this said there are a number of variables that come into play.

Google's Blogspot does not support subfolders; per the migration tool there is no sub-folder support. One alternative is to use a URL redirection to point to the new host, which means you will need to search around for the code to insert into the header of your template to accomplish this.

domain.com/blog/ --> blog.domain.com

Since Google would be hosting your blog, there really isn't a wonderful way to handle this, as there is no provision to use mod_rewrite or something similar, though with the number of complaints Google has received on their blog they may implement such a feature.

If you are considering hosting with another solution such as WordPress then you have more options available to you, depending on your hosting solution. WordPress has an integrated import function for importing from other blogging platforms, but you must first convert your existing hosted Blogger account to a Blogspot solution. Blogger does have an export function but it seems broken per these posts. WordPress also has custom URL functionality, so it would be easier to match the format that Blogger was using, especially if you can utilize mod_rewrite.

Personally, I'm still undecided...
posted by Stephen Reese at 0 Comments