.htaccess file

Discussion in 'The Lounge' started by stub, Jul 6, 2018.

  1. stub

    stub New Member

    I've just signed up for a Managed VPS. I have a little, though limited, knowledge of VPSs and hosting in general. I have an .htaccess file which has been passed down over many years. I'd like to modernize it, bring it up to date, and remove or improve anything which might or might not be needed. Don't laugh :) It's not a joke :) I can understand <Files .htaccess>....</Files> Everything else is fair game? Suggestions/Additions/Corrections/Deletions/Rearrangements are all welcome. I would like to thank everybody in advance.

    RewriteEngine on
    Options +FollowSymlinks
    RewriteCond %{SERVER_PORT} ^80$
    RewriteRule ^.*$ https://%{SERVER_NAME}%{REQUEST_URI} [R=301,L]
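    A commonly suggested modern variant of this redirect checks %{HTTPS} instead of the port number, which also works when the site isn't served on port 80 (a sketch, not part of the original file):

    ```apache
    RewriteEngine on
    # Redirect any plain-HTTP request to the same path over HTTPS
    RewriteCond %{HTTPS} off
    RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
    ```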

    <Files .htaccess>
    Order allow,deny
    Deny from all
    </Files>
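    For what it's worth, the Order/Deny pair is Apache 2.2 syntax. On Apache 2.4 (with mod_authz_core) the equivalent is:

    ```apache
    <Files .htaccess>
    Require all denied
    </Files>
    ```

    Note that stock Apache configurations already deny access to files beginning with .ht by default, so this block is belt-and-braces either way.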

    # Error Documents
    ErrorDocument 401 /error.php?error=401
    ErrorDocument 403 /error.php?error=403
    ErrorDocument 404 /error.php?error=404
    ErrorDocument 500 /error.php?error=500
    ErrorDocument 503 /error.php?error=503


    # Gzip Compression
    <IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html
    AddOutputFilterByType DEFLATE text/css
    AddOutputFilterByType DEFLATE text/javascript
    AddOutputFilterByType DEFLATE text/xml
    AddOutputFilterByType DEFLATE text/plain
    AddOutputFilterByType DEFLATE image/x-icon
    AddOutputFilterByType DEFLATE image/svg+xml
    AddOutputFilterByType DEFLATE application/rss+xml
    AddOutputFilterByType DEFLATE application/javascript
    AddOutputFilterByType DEFLATE application/x-javascript
    AddOutputFilterByType DEFLATE application/xml
    AddOutputFilterByType DEFLATE application/xhtml+xml
    AddOutputFilterByType DEFLATE application/x-font
    AddOutputFilterByType DEFLATE application/x-font-truetype
    AddOutputFilterByType DEFLATE application/x-font-ttf
    AddOutputFilterByType DEFLATE application/x-font-otf
    AddOutputFilterByType DEFLATE application/x-font-opentype
    AddOutputFilterByType DEFLATE application/vnd.ms-fontobject
    AddOutputFilterByType DEFLATE font/ttf
    AddOutputFilterByType DEFLATE font/otf
    AddOutputFilterByType DEFLATE font/opentype

    # For Older Browsers Which Can't Handle Compression
    BrowserMatch ^Mozilla/4 gzip-only-text/html
    BrowserMatch ^Mozilla/4\.0[678] no-gzip
    BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
    </IfModule>

    # BEGIN GZIP
    #<ifmodule mod_deflate.c>
    #AddOutputFilterByType DEFLATE text/text text/html text/plain text/xml text/css application/x-javascript application/javascript
    #</ifmodule>
    # END GZIP
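    The long per-type list above can be written more compactly, since AddOutputFilterByType accepts several MIME types on one line; the BrowserMatch workarounds target Netscape 4-era browsers and are generally considered safe to drop nowadays (a sketch covering the same types as the original):

    ```apache
    <IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/css text/plain text/xml \
        application/xml application/xhtml+xml application/javascript \
        application/rss+xml image/svg+xml image/x-icon \
        font/ttf font/otf application/vnd.ms-fontobject
    </IfModule>
    ```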

    # Prevent hot-linking of file types
    RewriteEngine on
    RewriteCond %{HTTP_REFERER} !^$
    RewriteCond %{HTTP_REFERER} !^https://(www\.)?xxxx\.com/.*$ [NC]
    RewriteRule \.(gif|jpg|png|js|css)$ https://www.xxxx.com/images/clear.gif [R,L]
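    A lighter-weight variant of the same hotlink rule returns a plain 403 instead of serving a placeholder image, which avoids redirect loops if the placeholder itself matches the rule (a sketch; xxxx.com stands in for the real domain as above):

    ```apache
    RewriteCond %{HTTP_REFERER} !^$
    RewriteCond %{HTTP_REFERER} !^https://(www\.)?xxxx\.com/ [NC]
    # Forbid the request outright rather than redirecting to an image
    RewriteRule \.(gif|jpe?g|png|js|css)$ - [F,NC,L]
    ```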

    # BLOCK attempts to use my server as a proxy, but allow absolute URI requests to my site
    RewriteCond %{THE_REQUEST} ^[A-Z]+\ /?http:// [NC]
    RewriteCond %{THE_REQUEST} !^(GET|HEAD|POST|OPTIONS|PROPFIND|TRACE)\ /?http://([^.]+\.)?xxxx\.com/
    RewriteRule .* - [F]

    # Blocking bad bots and site rippers
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:[email protected] [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
    RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
    RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
    RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
    RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
    RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
    RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
    RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
    RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
    RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
    RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
    RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
    RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
    RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
    RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
    RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
    RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
    RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Zeus
    RewriteRule ^.* - [F,L]

    # php -- BEGIN cPanel-generated handler, do not edit
    # Set the "ea-php56" package as the default "PHP" programming language.
    <IfModule mime_module>
    AddType application/x-httpd-ea-php56 .php .php5 .phtml
    </IfModule>
    # php -- END cPanel-generated handler, do not edit
     
  2. Yagami

    Yagami Member

    I see that you're using Options +FollowSymlinks. I was also using "Options +FollowSymlinks" in my .htaccess before, but someone on WebmasterWorld told me that it's no longer necessary these days. Can anyone confirm whether that's true?
     
  3. stub

    stub New Member

    Thanks for any and all comments which might help to improve my .htaccess file. So I'm also looking forward to seeing the replies to your question.
     
  4. KH-DanielL

    KH-DanielL New Member Staff Member

    Most of what I see in the example above relates to custom configuration for blocking bots or for denying files. I'd like to point out, though, that your gzip directives can probably be removed; compression can instead be handled by the "Optimize Website" feature in cPanel, which can create an .htaccess file in your /home/username directory that will compress all website content.

    Additionally, cPanel has Hotlink Protection options as well that can be used to prevent hotlinking and manage those rules automatically.

    Sometimes, position is important in .htaccess, so you may need to move the cPanel-generated directives above or below other rules or rewrites in the file for expected functionality to occur.
     
  5. stub

    stub New Member

    Thanks Daniel,

    I'll take a look into what cPanel has to offer for me. Although I'm not really so comfortable using cPanel, or any of this .htaccess stuff. This is a very old version of .htaccess which I would like to update.

    One question about all these bots. Are these all still needed individually? Is there a simpler way of preventing all these bots? Or for refining anything else for that matter :)
     
  7. KH-DanielL

    KH-DanielL New Member Staff Member

    There is a lot of different stuff to consider here. It would really be best to break each line down one by one and determine whether it is something you really need, and additionally ensure it's not covered by other cPanel features like "Optimize Website" (which enables gzip compression) or Hotlink Protection.

    As far as bots, there are a few bots that we seem to notice getting overzealous from time to time. You may or may not need to block them with .htaccess. Most bots can actually be blocked by robots.txt directives to tell them not to scan the site. However, .htaccess might be helpful in some cases where the bots are malicious and aren't going to be checking a robots.txt file.
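    For well-behaved crawlers, the robots.txt approach looks along these lines (the bot names here are just examples):

    ```text
    # Ask specific crawlers to stay away entirely
    User-agent: MJ12bot
    Disallow: /

    User-agent: AhrefsBot
    Disallow: /
    ```

    Malicious scrapers simply ignore this file, which is why .htaccess filtering remains the fallback.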

    For my needs, I don't run a lot of websites that need public exposure. If I did, I would probably want to account for GoogleBot, msnbot/bingbot, and a few others to ensure those bots have access to the website. I'd probably want to block the bigger search engines that cater to foreign languages like Yandex, Baidu, etc, as well as bots that crawl for SEO purposes like MJ12bot and Ahrefsbot. Those are probably the busiest crawlers I typically see in logs, though there are some others. All of those bots are considered legitimate and can be blocked using robots.txt, but you'll need .htaccess for additional user agent filtering.
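    As a sketch of that user agent filtering, the long one-condition-per-agent list in the original file can usually be condensed into a single alternation (the agents shown are examples; adjust to taste):

    ```apache
    RewriteEngine On
    # Match any of the listed substrings in the User-Agent, case-insensitively
    RewriteCond %{HTTP_USER_AGENT} (MJ12bot|AhrefsBot|Baiduspider|YandexBot|HTTrack|WebZIP) [NC]
    RewriteRule .* - [F,L]
    ```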

    There isn't really a "right answer" here; it really depends on how strict you want to be about keeping potential bad actors out, and oftentimes this is tailored to the application that is being covered by .htaccess. I wouldn't want to provide advice here that might be counter to your websites' needs, so my main advice would be to look around at what kinds of common vulnerabilities and bots affect the applications you are running, and look into securing those points against the user agents commonly associated with attacks.
     
