CSF blocking Googlebot / Google site indexing

By admin, 25 April 2025

We had a problem on one of our web servers where Google was refusing to index the site, even though our robots.txt was correct. On investigation we found that Googlebot's requests were being blocked (HTTP 403) by our trusty CSF firewall.
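Before changing anything, it helps to confirm that the IPs being blocked in your web server or lfd logs really belong to Googlebot. Here is a minimal sketch. It assumes you have fetched Google's googlebot.json (the same file the script below downloads) and parsed it with json.load. The function name `is_googlebot_ip` and the sample prefixes are illustrative only.

```python
import ipaddress

def is_googlebot_ip(ip: str, ranges: dict) -> bool:
    """Return True if ip falls inside any prefix of a googlebot.json-style dict."""
    addr = ipaddress.ip_address(ip)
    for entry in ranges.get("prefixes", []):
        # Each entry carries either an "ipv4Prefix" or an "ipv6Prefix" key
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix is None:
            continue
        network = ipaddress.ip_network(prefix)
        # Skip networks of the other IP version, then test membership
        if addr.version == network.version and addr in network:
            return True
    return False

# Illustrative sample in the googlebot.json shape; use the real downloaded file in practice
sample = {"prefixes": [{"ipv4Prefix": "66.249.64.0/27"},
                       {"ipv6Prefix": "2001:4860:4801:10::/64"}]}
print(is_googlebot_ip("66.249.64.5", sample))   # True
print(is_googlebot_ip("203.0.113.9", sample))   # False
```

If an IP from your 403 log entries comes back True against the real file, the block is coming from your side, not Google's.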

If you have the same problem, you can apply the fix we used.

Edit /etc/csf/csf.rignore (the list of domains that lfd ignores based on reverse and forward DNS lookups) and make sure the following line is present and uncommented:

.googlebot.com
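An entry like `.googlebot.com` works by matching the host name that a connecting IP reverse-resolves to. Google's documented way to verify Googlebot follows the same idea: do a reverse DNS lookup on the IP, check that the host name ends in googlebot.com or google.com, then do a forward lookup and confirm it returns the original IP. The sketch below models that check in Python. It illustrates the idea only and is not lfd's actual implementation. The function names `host_matches` and `verify_googlebot` are hypothetical.

```python
import ipaddress
import socket

# Host name suffixes Google documents for its common crawlers
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def host_matches(host: str, suffixes=GOOGLE_SUFFIXES) -> bool:
    """True if host ends with one of the dotted suffixes (case-insensitive)."""
    host = host.lower().rstrip(".")
    return any(host.endswith(s) for s in suffixes)

def verify_googlebot(ip: str) -> bool:
    """Reverse-resolve ip, check the domain, then forward-resolve and confirm the IP maps back."""
    try:
        addr = ipaddress.ip_address(ip)      # reject anything that is not an IP address
        host = socket.gethostbyaddr(ip)[0]   # reverse (PTR) lookup
        if not host_matches(host):
            return False
        # Forward lookup: collect every address the host name resolves to
        forward = {ipaddress.ip_address(ai[4][0]) for ai in socket.getaddrinfo(host, None)}
    except (OSError, ValueError):            # lookup failed or an address could not be parsed
        return False
    return addr in forward
```

`verify_googlebot` needs working DNS and its result depends on live PTR records. The forward lookup is what stops someone from simply setting a fake googlebot.com PTR record on their own IP.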

Then create /opt/allowgooglebotincsf.py, a small Python script that fetches Google's published list of Googlebot IPv4 ranges and writes it into your /etc/csf/csf.ignore file. This lets the Googlebot IPs bypass CSF's LFD checks.

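#!/usr/bin/env python3
# Refresh the Googlebot IPv4 ranges in /etc/csf/csf.ignore and restart csf.
import re, shutil, subprocess, requests

START, END = '# Googlebot IP ranges\n', '# End Googlebot IP ranges\n'
r = requests.get('https://developers.google.com/search/apis/ipranges/googlebot.json', timeout=30)
r.raise_for_status()  # don't touch csf.ignore if the download failed
if ips := [p['ipv4Prefix'] for p in r.json()['prefixes'] if 'ipv4Prefix' in p]:
    shutil.copy('/etc/csf/csf.ignore', '/etc/csf/csf.ignore.bak')  # backup first
    with open('/etc/csf/csf.ignore', 'r+') as f:
        # Remove the previous block, including the old IPs between the markers
        text = re.sub(re.escape(START) + '.*?' + re.escape(END), '', f.read(), flags=re.S)
        if text and not text.endswith('\n'):
            text += '\n'
        f.seek(0)
        f.write(text + START + ''.join(ip + '\n' for ip in ips) + END)
        f.truncate()
    subprocess.run(['csf', '-r'], check=True)  # restart csf to apply the new list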

Make the script executable (chmod +x /opt/allowgooglebotincsf.py) and schedule it with cron, for example to run at midnight every night:

# Allow Googlebot by updating csf.ignore from Google's IP ranges
00 00 * * * /opt/allowgooglebotincsf.py > /dev/null 2>&1
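To confirm that the nightly job is actually refreshing the list, you could count the entries between the two marker comments the script writes. Here is a small read-only sketch. The function name `googlebot_entries` and the sample text are hypothetical.

```python
def googlebot_entries(text: str) -> list[str]:
    """Return the non-empty lines between the Googlebot marker comments in csf.ignore text."""
    entries, inside = [], False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "# Googlebot IP ranges":
            inside = True
        elif stripped == "# End Googlebot IP ranges":
            inside = False
        elif inside and stripped:
            entries.append(stripped)
    return entries

# Hypothetical csf.ignore contents for illustration
sample = ("127.0.0.1\n# Googlebot IP ranges\n66.249.64.0/27\n"
          "66.249.64.32/27\n# End Googlebot IP ranges\n")
print(len(googlebot_entries(sample)))  # 2
```

On the server, pass it the contents of /etc/csf/csf.ignore instead of the sample. A count of zero after the cron run suggests the download or the write failed.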

Afterwards, go to Google Search Console and request indexing again, or re-submit your site's XML sitemap to prompt a fresh crawl :)