Clean URLs in dasBlog

After many long years of being annoyed by dasBlog's URLs, I finally decided to spend the several hours required to fix them. I'm using ISAPI Rewrite, which I've discussed in the past, to clean up the URLs for this blog, which runs on dasBlog.

The solution is nowhere near perfect, and I would need the dasBlog team's help to improve it. Here is the ISAPI Rewrite rule set I used, complete with a bit of explanation:

### www.mikeschinkel.com/blog/
[ISAPI_Rewrite]
#Leave these alone
RewriteRule ^/blog/(.*)\.(gif|jpg|png|css|js)$ $& [I,L]
#dasBlog needs these to have .aspx extensions
RewriteRule ^/blog/Login\.aspx$ $& [I,L]
RewriteRule ^/blog/Edit(.*)\.aspx$ $& [I,L]
#Don't rewrite anything in the admin subdir so I don't have to test it.
RewriteRule ^/blog/admin/(.*)$ $& [I,L]
#Fix searching to work within subdirectories (i.e. like categories)
RewriteRule ^/blog/(.+)/?SearchView\.aspx\?(.*)$ /blog/search/\?$2 [I,RP]
RewriteRule ^/blog/search/?\?(.*) /blog/SearchView.aspx\?$1 [I,L]
#If it has a parameter, pass it thru
RewriteRule ^/blog/(.*)\.aspx\?(.*)$ $& [I,L]
#Specific rule for categories - THIS IS CASE SENSITIVE!
RewriteRule ^/blog/CategoryView,category,(.*)\.aspx$ /blog/category/$1/ [I,RP]
RewriteRule ^/blog/category/([^/]+)/?$ /blog/CategoryView.aspx\?category=$1 [I,O,L]
#Specific rule set for cleaning up the disclaimer page
RewriteRule ^/blog/FormatPage\.aspx\?path=siteConfig/disclaimer\.format\.html$ /blog/disclaimer/ [I,CL,RP]
RewriteRule ^/blog/disclaimer/?$ /blog/FormatPage.aspx\?path=siteConfig/disclaimer.format.html [I,CL,L]
#Specific rule set for cleaning up the /Default.aspx home page
RewriteRule ^/blog/Default\.aspx$ /blog/ [I,CL,RP]
#General rule to remove ASPX and ASHX extensions from any files (CDF.ashx is a special case)
RewriteRule ^/blog/Permalink\.aspx\?title=(.*)$ /blog/$1/ [I,CL,RP]
RewriteRule ^/blog/(.*)\.(aspx|ashx)$ /blog/$1/ [I,CL,RP]
RewriteRule ^/blog/(cdf|microsummary)/ /blog/$1.ashx [I,CL,L]
RewriteRule ^/blog/([^/]+)(/?)$ /blog/$1.aspx [I,CL,L]
#Somewhat general rule to clean the Rss and Atom URLs ("R"&"A" in GetRss and GetAtom must be capitalized)
RewriteRule ^/blog/SyndicationService\.asmx/Get(Rss|Atom)$ /blog/$1/ [I,CL,RP]
RewriteRule ^/blog/rss/$ /blog/SyndicationService.asmx/GetRss [I,L]
RewriteRule ^/blog/atom/$ /blog/SyndicationService.asmx/GetAtom [I,L]
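The small Python model below shows how RP and L interact in a rule list like this one. It is a sketch built on stated assumptions, not ISAPI Rewrite itself. It uses Python's `re` module in place of ISAPI Rewrite's regex engine. It includes only a hypothetical subset of the rules, with format strings written without the `\?` escaping. It treats a redirect as ending rule processing, lowercases the substitution result for CL, and ignores the O flag. The names `Rule`, `rule`, `expand`, `process`, and `request` are illustrative only.

```python
import re
from typing import NamedTuple, Tuple

class Rule(NamedTuple):
    pattern: str      # regex matched against the path plus query string
    target: str       # format string: "$&" = whole match, "$1".."$9" = groups
    flags: frozenset  # e.g. {"I", "CL", "RP"}

def rule(pattern: str, target: str, flags: str) -> Rule:
    return Rule(pattern, target, frozenset(flags.split(",")))

# Hypothetical subset of the rule set above, translated to Python's re syntax.
RULES = [
    rule(r"^/blog/(.*)\.(gif|jpg|png|css|js)$", "$&", "I,L"),
    rule(r"^/blog/CategoryView,category,(.*)\.aspx$", "/blog/category/$1/", "I,RP"),
    rule(r"^/blog/category/([^/]+)/?$", "/blog/CategoryView.aspx?category=$1", "I,O,L"),
    rule(r"^/blog/FormatPage\.aspx\?path=siteConfig/disclaimer\.format\.html$",
         "/blog/disclaimer/", "I,CL,RP"),
    rule(r"^/blog/disclaimer/?$",
         "/blog/FormatPage.aspx?path=siteConfig/disclaimer.format.html", "I,CL,L"),
    rule(r"^/blog/Default\.aspx$", "/blog/", "I,CL,RP"),
]

def expand(target: str, m: re.Match) -> str:
    """Substitute $& (whole match) and $N (N-th group) into the format string."""
    def sub(t: re.Match) -> str:
        ref = t.group(1)
        return m.group(0) if ref == "&" else (m.group(int(ref)) or "")
    return re.sub(r"\$(&|\d)", sub, target)

def process(url: str) -> Tuple[str, str]:
    """Run one request through the rules and return (action, url)."""
    for r in RULES:
        m = re.match(r.pattern, url, re.IGNORECASE if "I" in r.flags else 0)
        if m is None:
            continue
        result = expand(r.target, m)
        if "CL" in r.flags:
            result = result.lower()       # CL: lowercase the substitution result
        if "RP" in r.flags:
            return ("redirect", result)   # 301: the browser re-requests this URL
        if "L" in r.flags:
            return ("serve", result)      # internal rewrite; stop scanning rules
        url = result                      # neither L nor RP: keep going
    return ("serve", url)                 # no rule stopped processing

def request(url: str, max_hops: int = 5) -> str:
    """Follow redirects as a browser would and return the URL dasBlog serves."""
    for _ in range(max_hops):
        action, url = process(url)
        if action == "serve":
            return url
    raise RuntimeError("redirect loop")
```

In this model, requesting the ugly category URL takes two round trips: first the 301 to `/blog/category/dasBlog/`, then the rewrite back to `CategoryView.aspx`. That is the slowdown RP costs. The category name also keeps its case, because that pair of rules has no CL.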

The following are my comments explaining what I did and why:

  • Any line that starts with a hash (“#”) is a comment.
  • The caret (“^”) and the dollar sign (“$”) in the rewrite rules denote the beginning and end of the URL, so a pattern won't match just a portion of the URL. I used them to ensure the rules didn't match URLs I didn't expect them to match.
  • The square-bracket enclosed characters at the end of each rewrite rule are directives.
  • I – Ignore case, i.e. match an incoming URL with uppercase “.ASPX” even if the rule references lowercase “.aspx”
  • RP – Permanent Redirect. This doesn't rewrite; instead it redirects. That lets ugly clicked links be made pretty in the browser's URL box, although it slows page loads because it takes two round trips to the server instead of one. The permanent redirect should tell Google and others that the old ugly URL has been replaced by the new clean URL. If the dasBlog developers added an option for “Clean URLs with ISAPI Rewrite”, dasBlog could generate these clean URLs itself and the lines with RP would not be needed.
  • CL – Convert to lowercase. URLs should ideally be in lowercase; I plan to write a blog post at blog.welldesignedurls.org in the future to explain why. NOTE: Most rules with CL are paired with an RP, because the main reason to convert to lowercase is so the redirect lands on a lowercase URL.
  • L – Last rule. This stops the scanning of rules and makes this the last rule processed. The rules with L are the real rewrite rules. Without the “L”, ISAPI Rewrite would loop indefinitely because of the way I first redirect to a clean URL and then rewrite it back to the ugly URL.
  • The rule with (gif|jpg|png|css|js) simply says “Don't change anything for URLs ending in .GIF, .JPG, .PNG, .CSS, or .JS, and stop processing rules right now.” The “$&” just provides a copy of the URL, so it passes through unchanged.
  • The next two do the same for Login.aspx and Edit(.*).aspx:
  • Login.aspx – I couldn’t get Login.aspx to work correctly during rewriting and wasn’t sure if it was dasBlog battling Firefox’s caching or what, so rather than stress over it I just didn’t worry about it.
  • Edit(.*).aspx – These are the admin URLs. I didn't really need to rewrite them (although I would have liked to), so I ignored them and didn't have to debug them.
  • /Admin/(.*) – I also ignored this one so I wouldn't have to debug it.
  • CategoryView – This is where it gets interesting. The dasBlog developers chose a novel strategy for rewriting category URLs (I'm not sure why). Category URLs take the form /blog/CategoryView,category,{category-name}.aspx, where {category-name} is the category name shown in the category list on the sidebar. This URL works too: /blog/CategoryView.aspx?category={category-name}. However, it only works if {category-name} is in exactly the case shown on the sidebar (i.e. “dasBlog”, not “Dasblog” or “dasblog”). That means I can't use lowercase, which I'd prefer. At least the category URLs are clean, even though they aren't lowercase and remain case-sensitive.
  • Disclaimer – The next pair simply cleans up the disclaimer URL so it is just /blog/disclaimer/
  • Default.aspx – This simply gets rid of the Default.aspx that is linked from my blog title.
  • Parameters (?) – Again, rather than do a lot of debugging, I decided to pass through anything with a parameter. Later I might tackle cleaning these up the way I did for SearchView.aspx.
  • Trailing Slashes – Note that I use the syntax “/?” when matching incoming URLs. The “?” matches zero or one occurrences of the preceding character, so this lets me match both “/blog/disclaimer/” and “/blog/disclaimer”.
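A few lines of Python can check the effect of the anchors, the I flag, and the “/?” trailing-slash pattern. This sketch uses Python's `re.search` as a stand-in for the rewrite engine. It assumes, as the bullet on “^” and “$” describes, that an unanchored pattern can match a portion of the URL. Whether ISAPI Rewrite's engine behaves identically is an assumption worth confirming against the real filter. `UNANCHORED` is a hypothetical variant included only for contrast, and `matches` is an illustrative helper.

```python
import re

# Disclaimer rewrite pattern from the rule set, in Python's re syntax.
ANCHORED = r"^/blog/disclaimer/?$"
# Hypothetical variant without ^ and $, for contrast only.
UNANCHORED = r"/blog/disclaimer/?"

def matches(pattern: str, url: str, ignore_case: bool = True) -> bool:
    """True if pattern matches anywhere in url; ignore_case mimics the I flag."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, url, flags) is not None

for url in ["/blog/disclaimer/", "/blog/disclaimer", "/blog/Disclaimer",
            "/blog/disclaimer/extra", "/old/blog/disclaimer/"]:
    print(url, matches(ANCHORED, url), matches(UNANCHORED, url))
```

The anchored pattern accepts the URL with or without the trailing slash and in any case. It rejects URLs that merely contain “/blog/disclaimer”, which the unanchored variant would happily match.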