Search engine-friendly links with mod_rewrite
Introduction
One of the most frequent questions posted in the Apache Server forum is “How can I change my dynamic URLs to static URLs using mod_rewrite?” So this post is intended to answer that question and to clear up a very common misconception.
Mod_rewrite cannot “change” the URLs on your pages
First, the misconception: Mod_rewrite cannot be used to change the URL that the visitor sees in his/her browser address bar unless an external redirect is invoked. But an external redirect would ‘expose’ the underlying dynamic URL to search engines and would therefore completely defeat the purpose here. This application calls for an internal server rewrite, not an external client redirect.
It’s also important to realize that mod_rewrite works on requested URLs after the HTTP request is received by the server, and before any scripts are executed or any content is served. That is, mod_rewrite changes the server filepath and script variables associated with a requested URL, but has no effect whatsoever on the content of ‘pages’ output by the server.
How to change dynamic to static URLs
With that in mind, here’s the procedure to implement search engine-friendly static URLs on a dynamic site:
An earnest warning
It is not my purpose here to explain all about regular expressions and mod_rewrite; the Apache mod_rewrite documentation and many other tutorials are readily available online to anyone who searches for them (see also the references cited in the Apache Forum Charter and the tutorials in the Apache forum section of the WebmasterWorld Library).
Trying to use mod_rewrite without studying that documentation thoroughly is an invitation to disaster. Keep in mind that mod_rewrite affects your server configuration, and that one single typo or logic error can make your site inaccessible or quickly ruin your search engine rankings. If you depend on your site’s revenue for your livelihood, intense study is indicated.
That said, here’s an example which should be useful for study, and might serve as a base from which you can customize your own solution.
Working example
Old dynamic URL format: /index.php?product=widget&color=blue&size=small&texture=fuzzy&maker=widgetco
New static URL format: /product/widget/blue/small/fuzzy/widgetco
Mod_rewrite code for use in .htaccess file:
# Enable mod_rewrite, start rewrite engine
Options +FollowSymLinks
RewriteEngine on
#
# Internally rewrite search engine friendly static URL to dynamic filepath and query
RewriteRule ^product/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/?$ /index.php?product=$1&color=$2&size=$3&texture=$4&maker=$5 [L]
#
# Externally redirect client requests for old dynamic URLs to equivalent new static URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?product=([^&]+)&color=([^&]+)&size=([^&]+)&texture=([^&]+)&maker=([^\ ]+)\ HTTP/
RewriteRule ^index\.php$ http://example.com/product/%1/%2/%3/%4/%5? [R=301,L]
Note that the keyword “product” always appears in both the static and dynamic forms. This is intended to make it simple for mod_rewrite to detect requests where the above rules need to be applied. Other methods, such as testing whether a requested file exists, are also possible, but they are less efficient and more error-prone than this approach.
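The two rules above can be exercised outside Apache before they go anywhere near a live server. What follows is only a rough Python sketch of the pattern matching, not of mod_rewrite itself: the function names `internal_rewrite` and `external_redirect` are made up for illustration, and Python's `re` module is assumed to treat these simple patterns the same way Apache's regex library does.

```python
import re

# Same pattern as the internal RewriteRule. In .htaccess context the
# URL-path that Apache tests has no leading slash.
STATIC_PATTERN = re.compile(
    r"^product/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/?$"
)

# Same idea as the RewriteCond on THE_REQUEST (the raw request line).
# Apache needs "\ " for a literal space; Python accepts a plain space.
REQUEST_LINE_PATTERN = re.compile(
    r"^[A-Z]{3,9} /index\.php\?product=([^&]+)&color=([^&]+)"
    r"&size=([^&]+)&texture=([^&]+)&maker=([^ ]+) HTTP/"
)


def internal_rewrite(url_path):
    """Return the dynamic path for a static URL-path, or None if no match."""
    m = STATIC_PATTERN.match(url_path)
    if m is None:
        return None
    return ("/index.php?product={}&color={}&size={}&texture={}&maker={}"
            .format(*m.groups()))


def external_redirect(request_line):
    """Return the 301 target for an old dynamic request line, or None."""
    m = REQUEST_LINE_PATTERN.match(request_line)
    if m is None:
        return None
    return "http://example.com/product/{}/{}/{}/{}/{}".format(*m.groups())


print(internal_rewrite("product/widget/blue/small/fuzzy/widgetco"))
print(external_redirect(
    "GET /index.php?product=widget&color=blue&size=small"
    "&texture=fuzzy&maker=widgetco HTTP/1.1"
))
```

Because the redirect test looks at the original request line rather than the current URL-path, a request that has already been rewritten internally to /index.php does not trigger the redirect again.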
Differences between .htaccess code and httpd.conf or conf.d code
If you wish to use this code in a container in the httpd.conf or conf.d server configuration files, you will need to add a leading slash to the patterns in both RewriteRules, i.e. change “RewriteRule ^index\.php$” to “RewriteRule ^/index\.php$”. Also remember that you will need to restart your server before changes in these server config files take effect.
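The reason for the extra slash is that the two contexts hand the rule a different string to test. Here is a small Python illustration, assuming the .htaccess file sits in the document root, so that the per-directory path is simply the URL-path with its leading slash removed (the variable names are illustrative only):

```python
import re

# Pattern as written for .htaccess (per-directory context).
HTACCESS_PATTERN = re.compile(r"^index\.php$")
# Pattern as written for the server configuration files.
SERVER_PATTERN = re.compile(r"^/index\.php$")

per_directory_path = "index.php"   # what the .htaccess rule is tested against
server_path = "/index.php"         # what the server-level rule is tested against

# Each pattern matches only in the context it was written for.
print(bool(HTACCESS_PATTERN.match(per_directory_path)))  # True
print(bool(HTACCESS_PATTERN.match(server_path)))         # False
print(bool(SERVER_PATTERN.match(server_path)))           # True
```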
How this works
In order for the code above to work, it must be placed either in the .htaccess file in the same directory as the /index.php file, or in a <Directory> container in httpd.conf or conf.d that refers to that directory. Alternatively, the code can be modified for placement in any Web-accessible directory above the /index.php directory by changing the URL-paths used in the regular-expressions patterns for RewriteCond and RewriteRule.
Regular-expressions patterns
Just one comment on the regular expressions subpatterns used in the code above. I have avoided using the very easy, very popular, and very inefficient construct “(.*)/(.*)” in the code. That’s because multiple “.*” subpatterns in a regular-expressions pattern are highly ambiguous and highly inefficient.
The reason for this is twofold. First, “.*” means “match any number of any characters”. Second, “.*” is ‘greedy,’ meaning it will match as many characters as possible. So what happens with a pattern like “(.*)/(.*)” is that multiple matching attempts must be made before the requested URL can match the pattern or be rejected, with the number of attempts equal to (the number of characters between “/” and the end of the requested URL plus two) multiplied by (the number of “(.*)” subpatterns minus one). It is easy to make a multiple-“(.*)” pattern that requires dozens or even hundreds of passes to match or reject a particular requested URL.
Let’s take a short example, bearing in mind that back-reference $1 contains the characters matched into the first parenthesized sub-pattern, while $2 contains those matched into the second sub-pattern:
Requested URL: http://example.com/abc/def
Local URL-path: abc/def
Rule pattern: ^(.*)/(.*)$
Pass# ¦ $1 value ¦ $2 value ¦ Result
1     ¦ abc/def  ¦ (empty)  ¦ no match
2     ¦ abc/de   ¦ f        ¦ no match
3     ¦ abc/d    ¦ ef       ¦ no match
4     ¦ abc/     ¦ def      ¦ no match
5     ¦ abc      ¦ def      ¦ Match
I’ll hazard a guess that many many sites are driven to unnecessary server upgrades every year by this one error alone.
Instead, I used the unambiguous constructs “([^/]+)”, “([^&]+)”, and “([^\ ]+)”. Roughly translated, these mean “match one or more characters not equal to a slash,” “match one or more characters not equal to an ampersand,” and “match one or more characters not equal to a space,” respectively. The effect is that each of those subpatterns will ‘consume’ one or more characters from the requested URL, up to the next occurrence of the excluded character, thereby allowing the regex parser to match the requested URL to the pattern in one single left-to-right pass.
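The pass counting for “^(.*)/(.*)$” can be reproduced with a deliberately simplified model of greedy backtracking. This Python sketch is an illustration only: `greedy_passes` is a hypothetical helper, it models just the two-subpattern case, and real regex engines apply optimizations that it ignores.

```python
import re


def greedy_passes(path):
    """Model ^(.*)/(.*)$: the first group starts with everything and gives
    back one character per pass until the next character is a slash."""
    passes = 0
    for end in range(len(path), -1, -1):
        passes += 1
        first, rest = path[:end], path[end:]
        if rest.startswith("/"):
            # The second (.*) takes whatever follows the slash.
            return passes, first, rest[1:]
    return passes, None, None


# Negated character classes stop at the slash, so no give-back is needed.
NEGATED_PATTERN = re.compile(r"^([^/]+)/([^/]+)$")

print(greedy_passes("abc/def"))                   # (5, 'abc', 'def')
print(NEGATED_PATTERN.match("abc/def").groups())  # ('abc', 'def')
```

Both forms capture the same values for this URL-path; the difference is how much trial and error it takes to get there, and that difference grows with the length of the URL and the number of subpatterns.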
Common problems
A common problem encountered when implementing static-to-dynamic URL rewrites is that relative links to images, included CSS files, and external JavaScripts on your pages will become broken. The key is to remember that it is the client (e.g. the browser) that resolves relative links. For example, if you are rewriting the URL /product/widget/blue/small/fuzzy/widgetco to your script, the browser will see a page called “widgetco”, and see a relative link on that page as being relative to the ‘virtual’ directory /product/widget/blue/small/fuzzy/. The two easiest solutions are to use server-relative or absolute (canonical) links, or to add additional code to rewrite image, CSS, and external JS URLs to the correct location. An example would be to use the server-relative link <img src="/logo.gif"> to replace the page-relative link <img src="logo.gif">.
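The way a browser resolves those links can be checked with Python’s standard `urllib.parse.urljoin`, which follows the same general URL-resolution rules. This is a sketch for illustration, using the example static URL from earlier in this post:

```python
from urllib.parse import urljoin

# The URL the browser believes it is viewing (the static form).
page_url = "http://example.com/product/widget/blue/small/fuzzy/widgetco"

# A page-relative link resolves against the 'virtual' directory.
page_relative = urljoin(page_url, "logo.gif")

# A server-relative link resolves against the site root.
server_relative = urljoin(page_url, "/logo.gif")

print(page_relative)
# http://example.com/product/widget/blue/small/fuzzy/logo.gif
print(server_relative)
# http://example.com/logo.gif
```

The first result points at a location that does not exist on the server, which is why the image breaks; the second is unaffected by how deep the static URL appears to be.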
Avoiding testing problems
For both .htaccess and server config file code, remember to flush your browser cache before testing any changes. Otherwise, your browser will likely serve any previously-requested pages from its cache instead of fetching them from your server. Obviously, in that case, no code on your server can have any effect on the transaction.
Read first, then write and test
I hope this post is helpful. If you still have problems after studying the mod_rewrite documentation and regular expressions tutorials, and writing and testing your own code, feel free to post relevant entries from your server error log and ask specific questions in the Apache Server forum. Please take a few minutes to read the WebmasterWorld Terms of Service and the Apache Forum Charter before posting (Thanks!).