.htaccess file

stub

New Member
I've just signed up for a Managed VPS. I have only a little knowledge of VPSs and hosting in general. I have an .htaccess file that has been passed down over many years. I'd like to modernize it, bring it up to date, and remove or improve anything that is no longer needed. Don't laugh :) It's not a joke :) I understand <Files .htaccess>....</Files>; everything else is fair game. Suggestions, additions, corrections, deletions, and rearrangements are all welcome. Thank you, everybody, in advance.

RewriteEngine on
Options +FollowSymlinks
RewriteCond %{SERVER_PORT} ^80$
RewriteRule ^.*$ https://%{SERVER_NAME}%{REQUEST_URI} [R=301,L]

<Files .htaccess>
Order allow,deny
Deny from all
</Files>

# Error Documents
ErrorDocument 401 /error.php?error=401
ErrorDocument 403 /error.php?error=403
ErrorDocument 404 /error.php?error=404
ErrorDocument 500 /error.php?error=500
ErrorDocument 503 /error.php?error=503


# Gzip Compression
<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE text/javascript
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE image/x-icon
AddOutputFilterByType DEFLATE image/svg+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/x-font
AddOutputFilterByType DEFLATE application/x-font-truetype
AddOutputFilterByType DEFLATE application/x-font-ttf
AddOutputFilterByType DEFLATE application/x-font-otf
AddOutputFilterByType DEFLATE application/x-font-opentype
AddOutputFilterByType DEFLATE application/vnd.ms-fontobject
AddOutputFilterByType DEFLATE font/ttf
AddOutputFilterByType DEFLATE font/otf
AddOutputFilterByType DEFLATE font/opentype

# For Older Browsers Which Can't Handle Compression
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
</IfModule>

# BEGIN GZIP
#<ifmodule mod_deflate.c>
#AddOutputFilterByType DEFLATE text/text text/html text/plain text/xml text/css application/x-javascript application/javascript
#</ifmodule>
# END GZIP

# Prevent hot-linking of file types
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https://(www\.)?xxxx\.com/.*$ [NC]
RewriteRule \.(gif|jpg|png|js|css)$ https://www.xxxx.com/images/clear.gif [R,L]

# BLOCK attempts to use my server as a proxy, but allow absolute URI requests to my site
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /?http:// [NC]
RewriteCond %{THE_REQUEST} !^(GET|HEAD|POST|OPTIONS|PROPFIND|TRACE)\ /?http://([^.]+\.)?xxxx\.com/
RewriteRule .* - [F]

# Blocking bad bots and site rippers
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]

# php -- BEGIN cPanel-generated handler, do not edit
# Set the “ea-php56” package as the default “PHP” programming language.
<IfModule mime_module>
AddType application/x-httpd-ea-php56 .php .php5 .phtml
</IfModule>
# php -- END cPanel-generated handler, do not edit
 
I see that you're using Options +FollowSymlinks. I used to have "Options +FollowSymlinks" in my .htaccess too, but someone on WebmasterWorld told me that it is no longer necessary these days. Can someone confirm whether that's true?
 
Thank you for any comments that might help improve my .htaccess file as well. I'm looking forward to seeing the replies to your question.
 
Most of what I see in the example above relates to custom configuration for blocking bots or for denying files. I'd like to point out though that your gzip directives can probably be removed and can be addressed by using the "Optimize Website" feature in cPanel, which can create an .htaccess file in your /home/username directory that will compress all website content.
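For comparison, if you do keep compression in your own .htaccess, the long list of one-type-per-line directives can be collapsed, since AddOutputFilterByType accepts several MIME types on one line. This is only a sketch: it assumes Apache 2.4 (where AddOutputFilterByType is provided by mod_filter), the choice of types is illustrative, and it is not what "Optimize Website" itself writes. The BrowserMatch lines for very old browsers are left out here on the assumption that you no longer serve them.

```apache
# Sketch only: condensed compression rules, assuming Apache 2.4
# with both mod_deflate and mod_filter available.
<IfModule mod_deflate.c>
<IfModule mod_filter.c>
    # Text formats
    AddOutputFilterByType DEFLATE text/html text/plain text/css text/xml text/javascript
    # Script and XML application types
    AddOutputFilterByType DEFLATE application/javascript application/xml application/rss+xml application/xhtml+xml
    # SVG images and uncompressed font formats
    AddOutputFilterByType DEFLATE image/svg+xml font/ttf font/otf
</IfModule>
</IfModule>
```

If the cPanel feature is enabled as well, only one of the two sets of rules is needed.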

Additionally, cPanel has Hotlink Protection options as well that can be used to prevent hotlinking and manage those rules automatically.
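If you would rather keep a hand-written rule, a tidier version of the hotlink block might look like the sketch below. The domain xxxx.com is the placeholder from the original post, the extension list is an assumption, and the rules cPanel generates may look different. Returning 403 with [F] instead of redirecting to clear.gif is a swapped-in choice, not something the original file did.

```apache
# Sketch only: refuse image requests whose Referer is set but is not this site.
RewriteEngine On
# Allow requests with no Referer at all (direct visits, some privacy tools)
RewriteCond %{HTTP_REFERER} !^$
# Allow this site over http or https, with or without www (dots escaped)
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?xxxx\.com/ [NC]
# Anything else asking for an image gets 403 Forbidden
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```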

Sometimes, position is important in .htaccess, so you may need to move the cPanel-generated directives above or below other rules or rewrites in the file for expected functionality to occur.
 
Thanks Daniel,

I'll take a look at what cPanel has to offer, although I'm not really comfortable using cPanel or any of this .htaccess stuff. This is a very old .htaccess file which I would like to update.

One question about all these bots: do they all still need to be listed individually? Is there a simpler way of blocking them? Or of refining anything else, for that matter :)
 
There is a lot of different stuff to consider here. It would really be best to go through the file line by line, decide whether each rule is something you really need, and check that it isn't already covered by other cPanel features like "Optimize Website" (which enables gzip compression) or Hotlink Protection.

As far as bots go, there are a few that we notice getting overzealous from time to time. You may or may not need to block them with .htaccess. Most well-behaved bots can be kept away with robots.txt directives that tell them not to crawl the site. However, .htaccess can help in cases where the bots are malicious and aren't going to check a robots.txt file at all.

For my needs, I don't run a lot of websites that need public exposure. If I did, I would probably want to account for GoogleBot, msnbot/bingbot, and a few others to ensure those bots have access to the website. I'd probably want to block the bigger search engines that cater to foreign languages like Yandex, Baidu, etc, as well as bots that crawl for SEO purposes like MJ12bot and Ahrefsbot. Those are probably the busiest crawlers I typically see in logs, though there are some others. All of those bots are considered legitimate and can be blocked using robots.txt, but you'll need .htaccess for additional user agent filtering.
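As a rough sketch of the .htaccess side of this, a single alternation can stand in for a long chain of [OR] conditions. The two crawler names below come from the post above. Their exact user-agent spelling and the case-insensitive match are assumptions that should be checked against your own access logs, and the second group is just a sample taken from the older list earlier in the thread.

```apache
# Sketch only: block selected crawlers by User-Agent with one condition each.
RewriteEngine On

# SEO crawlers named above; [NC] makes the match case-insensitive
RewriteCond %{HTTP_USER_AGENT} (MJ12bot|AhrefsBot) [NC,OR]
# A few of the older site rippers, anchored to the start of the string
RewriteCond %{HTTP_USER_AGENT} ^(BlackWidow|HTTrack|WebZIP|WebCopier)
# No substitution ("-"), just answer 403 Forbidden
RewriteRule ^ - [F]
```

Any name you don't actually see in your logs can simply be dropped from the list.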

There isn't really a "right answer" here. It depends on how strict you want to be about keeping potential bad actors out, and often this is tailored to the application that the .htaccess file covers. I wouldn't want to give advice that runs counter to your website's needs, so my main advice would be to look at which common vulnerabilities and bots affect the applications you are running, then secure those points and block the user agents commonly associated with attacks.
 
I know this is old, but that looks like one of my .htaccess files from 2008, on a major site I used to manage. I'd start over from scratch and re-evaluate based on traffic and load. Every time the site is accessed, Apache has to run through every one of those rules, which is very inefficient.
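A from-scratch starting point might look something like the sketch below. It assumes Apache 2.4, where "Require all denied" replaces the older "Order allow,deny / Deny from all" pair (which only keeps working through mod_access_compat). Testing %{HTTPS} and redirecting to %{HTTP_HOST} are swapped in for the original SERVER_PORT and SERVER_NAME checks, and may need adjusting if the site sits behind a proxy that terminates TLS.

```apache
# Sketch only: a minimal baseline, assuming Apache 2.4.

# Keep the .htaccess file itself from being served
<Files ".htaccess">
    Require all denied
</Files>

# Send plain-HTTP requests to HTTPS with a permanent redirect
RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

# Custom error page, same script as in the original file
ErrorDocument 404 /error.php?error=404
```

Bot and hotlink rules can then be added back one at a time, and only where the logs show they are needed.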
 