If you are well versed in, or even slightly familiar with, Search Engine Optimization (SEO) techniques, you're probably aware of the SEO concerns around duplicate content. If not, duplicate content means having the same content published online in different places, whether within the same domain or across different domains. Search engines may penalize it because it is often used to deceive them into awarding better rankings and, subsequently, more traffic.
One can understand why this would often accompany malware such as droppers and bitcoin mining sites. The more traffic driven to these sites, the more successful the malware campaign is.
Another consideration with duplicate content is the search engine's responsibility to the user. A user researching a particular topic will not want to see the same content behind every link clicked. This is frustrating for the user and leads to a degraded user experience overall. Hence, even unintentional duplicate content must be handled accordingly.
In cPanel, a mail subdomain is generated automatically when a user creates an account. Unbeknownst to most, this mail subdomain is accessible via the browser and displays the exact same content as the main domain. This leaves site administrators with SEO 1) concerns regarding duplicate content penalties.
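One way to confirm this behavior on a given account is to fetch both hostnames and compare the response bodies. The following is a minimal Python sketch, not a cPanel tool. The function names and the `example.com` hostname are placeholders introduced for illustration. It assumes both hosts answer over plain HTTP and that the page has no per-request dynamic content (timestamps, tokens), which would make the digests differ even when the pages are effectively the same.

```python
import hashlib
from urllib.request import urlopen


def body_digest(body: bytes) -> str:
    """Return a SHA-256 hex digest of a response body, ignoring surrounding whitespace."""
    return hashlib.sha256(body.strip()).hexdigest()


def is_duplicate(body_a: bytes, body_b: bytes) -> bool:
    """True when two response bodies are identical after trimming whitespace."""
    return body_digest(body_a) == body_digest(body_b)


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download a URL and return the raw body (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def mail_subdomain_duplicates(domain: str) -> bool:
    """Hypothetical helper: compare http://<domain>/ with http://mail.<domain>/."""
    return is_duplicate(fetch(f"http://{domain}/"), fetch(f"http://mail.{domain}/"))
```

Calling `mail_subdomain_duplicates("example.com")` against your own domain returns `True` when the mail subdomain serves the same page as the main site.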
Google's stance is that it will not intentionally penalize duplicate content that is unintentional. But can you trust Google to correctly identify duplicated content as inadvertent on the site administrator's part and leave the site's rankings alone? Google admits that mistakes in classifying duplicate content as deceptive or not are possible. 2)
Google tries hard to index and show pages with distinct information. This filtering means, for instance, that if your site has a "regular" and "printer" version of each article, and neither of these is blocked with a noindex meta tag, we'll choose one of them to list. In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we'll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.
Because of Google's obligations to the user, they must choose one of the domains to index. Will Google also choose to index the main domain over the mail subdomain as desired?
cPanel configures the mail subdomain this way because AutoSSL needs it in order to automatically generate a certificate for the mail subdomain. Many users enter the mail subdomain as the server name when setting up their mail clients, so a valid SSL certificate on this subdomain is necessary for them to connect securely to their mail server without SSL errors.
cPanel forums contain the following statement: 3)
This behavior is by-design as of cPanel version 60:
Change in mail. alias behavior for Apache server
The system now automatically creates an Apache server alias for the mail. subdomain for each domain, parked domain, and addon domain (but not subdomains). This allows the mail alias to appear in the same virtual host as the parent domain. We made this change in order to simplify Mail SNI and SSL certificate management and reduce unnecessary mail client warnings.
For example, Apache will now respond to mail.example.com as an alias for example.com. However, Apache will not automatically respond to mail.subdomain.example.com as an alias for the subdomain.example.com subdomains.
Technically, one could simply use the domain itself or the server's hostname instead, so the SEO issue could be averted completely if cPanel advised against using the mail subdomain for secure connections and didn't configure the mail alias at all. There are other options, too.
Instructions for removing the mail subdomain alias are posted on the cPanel forums 4), but the easiest method is to simply remove the DNS records for the mail subdomain (that is, if you don't require the mail subdomain to have a valid SSL certificate):
You can manually remove the "mail" entry from the "serveralias" line in the following configuration files under the /var/cpanel/userdata/$username directory:
Then, remove the .cache files for these domain names:
Next, rebuild the Apache configuration file:
However, keep in mind this is part of what allows SSL certificate validation for mail.domain.tld as part of the Domain TLS functionality:
What is Domain TLS - cPanel Knowledge Base - cPanel Documentation 5)
After performing the steps from the cPanel post above, restart Apache and PHP-FPM if needed, then remove the CNAME or A records for the mail subdomain from the DNS zone file for the main domain. The mail subdomain, when accessed via the browser, should now behave like any other non-existent subdomain.
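The alias edit in the quoted forum steps can be sketched as a small text transformation. This is only an illustration in Python, assuming the userdata file contains a line of the form `serveralias: name1 name2 ...`. The function names are hypothetical, the specific file names from the forum post are not reproduced here, and the sketch works on text you pass in rather than touching any file under /var/cpanel/userdata. Back up the real files before editing them by any method.

```python
def strip_mail_alias(line: str, domain: str) -> str:
    """Remove mail.<domain> from a 'serveralias: a b c' line; other lines pass through."""
    key, sep, value = line.partition(":")
    if not sep or key.strip() != "serveralias":
        return line  # not a serveralias line, so return it unchanged
    ending = "\n" if line.endswith("\n") else ""
    unwanted = f"mail.{domain}".lower()
    # Keep every alias except the mail subdomain of this domain.
    kept = [name for name in value.split() if name.lower() != unwanted]
    return f"{key}: {' '.join(kept)}".rstrip() + ending


def strip_mail_alias_from_text(text: str, domain: str) -> str:
    """Apply strip_mail_alias to every line of a configuration file's text."""
    return "".join(
        strip_mail_alias(line, domain) for line in text.splitlines(keepends=True)
    )
```

For example, `strip_mail_alias("serveralias: mail.example.com www.example.com", "example.com")` yields `serveralias: www.example.com`, leaving the `www` alias in place.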
Setting a canonical URL is the most recommended method. Google has published a thorough article on canonicalization 6), but I'll try to sum it up succinctly here.
Canonicalization is the method by which Google reviews duplicated/similar URLs and tries to determine which is most complete and would be most valued by visitors. It then marks this page as the canonical URL and crawls this page most often and the others far less. After all, you don't want Googlebot consuming all of your server's resources by crawling the same content repeatedly. The canonical URL is also what is shown in search results.
You can choose to explicitly signify your preferred canonical URL to Google using any of the following methods:
Additionally, following the other guidelines that Google recommends for optimal SEO on your preferred URL, such as using HTTPS with a valid SSL certificate, will help Google know that you prefer this page over any duplicates that may not have SSL. Also, don't misuse other tools to try to indicate the canonical URL (e.g., the robots.txt file, noindex directives, or the URL removal tool, which removes all versions of a URL from search). Do link consistently to your preferred URL rather than a duplicate URL when linking throughout your site.
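One of the mechanisms Google names for marking duplicates is the rel="canonical" link element. As a rough illustration, the Python sketch below builds that element for a preferred URL and reads back which canonical URL a page declares. The function and class names are made up for this example, and only the standard library is used. It inspects what the page declares, which is not necessarily what Google ends up selecting.

```python
from html import escape
from html.parser import HTMLParser
from typing import Optional


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element to place in the page's <head>."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


class _CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> encountered."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.href is None:
            self.href = attr.get("href")


def declared_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in an HTML document, or None if absent."""
    finder = _CanonicalFinder()
    finder.feed(html)
    return finder.href
```

Serving the same tag, pointing at the main domain, from both the main domain and the mail subdomain tells crawlers which hostname you prefer while still letting them crawl both.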
Do note, though, that Google may select a different canonical URL, despite explicitly setting one:
Use the URL Inspection tool 8) to learn which page Google considers canonical. Note that even if you explicitly designate a canonical page, Google might choose a different canonical for various reasons, such as performance or content.
Another option proposed on the cPanel forums 9) is the following, which allows the mail subdomain to be issued an SSL certificate without permitting it to serve duplicate content.
As a workaround, you could manually remove the serveralias entry for mail from the Apache configuration using the instructions on the following post:
Mail Subdomain added as alias to main domain in httpd.conf 10)
Then, remove the DNS entry for "mail" from this domain name using "WHM » Edit DNS Zone", and add "mail.domain.tld" as a subdomain to the cPanel account using the "Subdomains" option. Once you do this, AutoSSL should still work for the mail subdomain, and you can upload a custom index page to display the content you prefer to load when someone opens "mail.domain.tld" in their web browser.
Rather than adding a custom index page, you could alternatively add a 301 redirect to the main site.
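To make the redirect logic concrete, here is a minimal Python sketch written as a WSGI application. It is not how cPanel or the forum post implements the redirect, and in practice a web server rewrite rule would more likely do this job. The names `redirect_target` and `app` are illustrative, and the sketch assumes the main site is served over HTTPS at the domain obtained by dropping the leading `mail.` label.

```python
from typing import Optional

MAIL_PREFIX = "mail."


def redirect_target(host: str, path: str = "/", query: str = "") -> Optional[str]:
    """Return the main-site URL for a request to mail.<domain>, or None for other hosts."""
    hostname = host.split(":", 1)[0].lower()  # drop any port number
    if not hostname.startswith(MAIL_PREFIX) or len(hostname) == len(MAIL_PREFIX):
        return None
    main_domain = hostname[len(MAIL_PREFIX):]
    url = f"https://{main_domain}{path or '/'}"
    return f"{url}?{query}" if query else url


def app(environ, start_response):
    """Minimal WSGI app: 301 to the main domain, 404 for any other host."""
    target = redirect_target(
        environ.get("HTTP_HOST", ""),
        environ.get("PATH_INFO", "/"),
        environ.get("QUERY_STRING", ""),
    )
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]
    # A permanent redirect signals that the main domain is the preferred location.
    start_response("301 Moved Permanently", [("Location", target), ("Content-Length", "0")])
    return [b""]
```

A request for `mail.example.com/about?x=1` would be answered with a 301 whose `Location` header is `https://example.com/about?x=1`, preserving the path and query string.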
This option can be accomplished easily via an .htaccess file; however, it isn't recommended by Google and thus should be avoided:
Google does not recommend blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can't crawl pages with duplicate content, they can't automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects. In cases where duplicate content leads to us crawling too much of your website, you can also adjust the crawl rate setting in Search Console. 11)