獲取最新惡意爬蟲列表配置 Fail2ban Filter

secure Nginx with fail2ban badbots filter

安裝並設置好 Fail2ban 後(可以參考Y Cheung 之前寫的『Fail2ban 配置 Nginx filter』),可以看到在 /etc/fail2ban/filter.d/nginx-badbots.conf 內容如下:

# Fail2Ban configuration file
#
# Regexp to catch known spambots and software alike. Please verify
# that it is your intent to block IPs which were driven by
# above mentioned bots.

[Definition]

badbotscustom = EmailCollector|WebEMailExtrac|TrackBack/1\.02|sogou music spider|(?:Mozilla/\d+\.\d+ )?Jorgee

badbots = Atomic_Email_Hunter/4\.0|atSpider/1\.0|autoemailspider|bwh3_user_agent|China Local Browse 2\.6|ContactBot/0\.2|ContentSmartz|DataCha0s/2\.0|DBrowse 1\.4b|DBrowse 1\.4d|Demo Bot DOT 16b|Demo Bot Z 16b|DSurf15a 01|DSurf15a 71|DSurf15a 81|DSurf15a VA|EBrowse 1\.4b|Educate Search VxB|EmailSiphon|EmailSpider|EmailWolf 1\.00|ESurf15a 15|ExtractorPro|Franklin Locator 1\.8|FSurf15a 01|Full Web Bot 0416B|Full Web Bot 0516B|Full Web Bot 2816B|Guestbook Auto Submitter|Industry Program 1\.0\.x|ISC Systems iRc Search 2\.1|IUPUI Research Bot v 1\.9a|LARBIN-EXPERIMENTAL \([email protected]\.net\)|LetsCrawl\.com/1\.0 \+http\://letscrawl\.com/|Lincoln State Web Browser|LMQueueBot/0\.2|LWP\:\:Simple/5\.803|Mac Finder 1\.0\.xx|MFC Foundation Class Library 4\.0|Microsoft URL Control - 6\.00\.8xxx|Missauga Locate 1\.0\.0|Missigua Locator 1\.9|Missouri College Browse|Mizzu Labs 2\.2|Mo College 1\.9|MVAClient|Mozilla/2\.0 \(compatible; NEWT ActiveX; Win32\)|Mozilla/3\.0 \(compatible; Indy Library\)|Mozilla/3\.0 \(compatible; scan4mail \(advanced version\) http\://www\.peterspages\.net/?scan4mail\)|Mozilla/4\.0 \(compatible; Advanced Email Extractor v2\.xx\)|Mozilla/4\.0 \(compatible; Iplexx Spider/1\.0 http\://www\.iplexx\.at\)|Mozilla/4\.0 \(compatible; MSIE 5\.0; Windows NT; DigExt; DTS Agent|Mozilla/4\.0 [email protected]\.net|Mozilla/5\.0 \(Version\: xxxx Type\:xx\)|NameOfAgent \(CMS Spider\)|NASA Search 1\.0|Nsauditor/1\.x|PBrowse 1\.4b|PEval 1\.4b|Poirot|Port Huron Labs|Production Bot 0116B|Production Bot 2016B|Production Bot DOT 3016B|Program Shareware 1\.0\.2|PSurf15a 11|PSurf15a 51|PSurf15a VA|psycheclone|RSurf15a 41|RSurf15a 51|RSurf15a 81|searchbot [email protected]\.com|ShablastBot 1\.0|snap\.com beta crawler v0|Snapbot/1\.0|Snapbot/1\.0 \(Snap Shots, \+http\://www\.snap\.com\)|sogou develop spider|Sogou Orion spider/3\.0\(\+http\://www\.sogou\.com/docs/help/webmasters\.htm#07\)|sogou spider|Sogou web spider/3\.0\(\+http\://www\.sogou\.com/docs/help/webmasters\.htm#07\)|sohu agent|SSurf15a 11 |TSurf15a 11|Under the Rainbow 2\.2|User-Agent\: Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1\)|VadixBot|WebVulnCrawl\.unknown/1\.0 libwww-perl/5\.803|Wells Search II|WEP Search 00

failregex = ^<HOST> -.*"(GET|POST|HEAD).*HTTP.*"(?:%(badbots)s|%(badbotscustom)s)"$

ignoreregex =

datepattern = {^LN-BEG}%%ExY(?P<_sep>[-/.])%%m(?P=_sep)%%d[T ]%%H:%%M:%%S(?:[.,]%%f)?(?:\s*%%z)?
              ^[^\[]*\[({DATE})
              {^LN-BEG}
      
# DEV Notes:
# List of bad bots fetched from http://www.user-agents.org
# Generated on Thu Nov  7 14:23:35 PST 2013 by files/gen_badbots.
#
# Author: Yaroslav Halchenko
/etc/fail2ban/filter.d/nginx-badbots.conf

如果沒有找到這個文件,請新建一個,或者將apache-badbots.conf 拷貝並重命名。

可以看到,這個文件裡的badbots 列表很陳舊了,該列表最後更新也是差不多十年前了,已經不能阻擋猶如雨後春筍般的spam bots attack了。

因此本文的重點在於修改這個filter文件的 badbots 參數值。

Y Cheung 參考 『apache-badbots -- update? #1950』中的討論,實踐總結出以下設置步驟。

1.使用以下命令獲取最新惡意爬蟲列表生成 badbots 值。

wget -q -O- "https://raw.githubusercontent.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/master/_generator_lists/bad-user-agents.list" | uniq | sed -e 's/\\\\ / /g' | sed -e 's/\./\\\./g' | tr '\n' '|'

解釋一下這條命令:

wget -q -O- "https://..." 意思是靜默下載鏈接並輸出/dev/null

uniq 意思是忽略重複的行。

sed -e 's/\\\\ / /g' 意思是將 \ 和一個空格都替換成一個空格,也就是刪除 \

sed -e 's/./\./g' 意思是將 . 都替換成 \. 也就是在點號 . 前增加一個 \

tr '\n' '|' 意思是將換行用  | 連接。

執行完命令後,在屏幕上你會看到以下輸出:

01h4x\.com|360Spider|404checker|404enemy|80legs|ADmantX|AIBOT|ALittle Client|ASPSeek|Abonti|Aboundex|Aboundexbot|Acunetix|AfD-Verbotsverfahren|AhrefsBot|AiHitBot|Aipbot|Alexibot|AllSubmitter|Alligator|AlphaBot|Anarchie|Anarchy|Anarchy99|Ankit|Anthill|Apexoo|Aspiegel|Asterias|Attach|AwarioRssBot|AwarioSmartBot|BBBike|BDCbot|BDFetch|BLEXBot|BackDoorBot|BackStreet|BackWeb|Backlink-Ceck|BacklinkCrawler|Badass|Bandit|Barkrowler|BatchFTP|Battleztar Bazinga|BetaBot|Bigfoot|Bitacle|BlackWidow|Black Hole|Blackboard|Blow|BlowFish|Boardreader|Bolt|BotALot|Brandprotect|Brandwatch|Buck|Buddy|BuiltBotTough|BuiltWith|Bullseye|BunnySlippers|BuzzSumo|CATExplorador|CCBot|CODE87|CSHttp|Calculon|CazoodleBot|Cegbfeieh|CensysInspect|CheTeam|CheeseBot|CherryPicker|ChinaClaw|Chlooe|Claritybot|Cliqzbot|Cloud mapping|Cocolyzebot|Cogentbot|Collector|Copier|CopyRightCheck|Copyscape|Cosmos|Craftbot|Crawling at Home Project|CrazyWebCrawler|Crescent|CrunchBot|Curious|Custo|CyotekWebCopy|DBLBot|DIIbot|DSearch|DTS Agent|DataCha0s|DatabaseDriverMysqli|Demon|Deusu|Devil|Digincore|DigitalPebble|Dirbuster|Disco|Discobot|Discoverybot|Dispatch|DittoSpyder|DnBCrawler-Analytics|DnyzBot|DomCopBot|DomainAppender|DomainCrawler|DomainSigmaCrawler|DomainStatsBot|Domains Project|Dotbot|Download Wonder|Dragonfly|Drip|ECCP/1\.0|EMail Siphon|EMail Wolf|EasyDL|Ebingbong|Ecxi|EirGrabber|EroCrawler|Evil|Exabot|Express WebPictures|ExtLinksBot|Extractor|ExtractorPro|Extreme Picture Finder|EyeNetIE|Ezooms|FDM|FHscan|FemtosearchBot|Fimap|Firefox/7\.0|FlashGet|Flunky|Foobot|Freeuploader|FrontPage|Fuzz|FyberSpider|Fyrebot|G-i-g-a-b-o-t|GT::WWW|GalaxyBot|Genieo|GermCrawler|GetRight|GetWeb|Getintent|Gigabot|Go!Zilla|Go-Ahead-Got-It|GoZilla|Gotit|GrabNet|Grabber|Grafula|GrapeFX|GrapeshotCrawler|GridBot|HEADMasterSEO|HMView|HTMLparser|HTTP::Lite|HTTrack|Haansoft|HaosouSpider|Harvest|Havij|Heritrix|Hloader|Humanlinks|HybridBot|IDBTE4M|IDBot|IRLbot|Iblog|Id-search|IlseBot|Image Fetch|Image Sucker|IndeedBot|Indy Library|InfoNaviRobot|InfoTekies|Intelliseek|InterGET|InternetSeer|Internet Ninja|Iria|Iskanie|IstellaBot|JOC Web Spider|JamesBOT|Jbrofuzz|JennyBot|JetCar|Jetty|JikeSpider|Joomla|Jorgee|JustView|Jyxobot|Kenjin Spider|Keyword Density|Kinza|Kozmosbot|LNSpiderguy|LWP::Simple|Lanshanbot|Larbin|Leap|LeechFTP|LeechGet|LexiBot|Lftp|LibWeb|Libwhisker|LieBaoFast|Lightspeedsystems|Likse|LinkScan|LinkWalker|Linkbot|Linkdexbot|LinkextractorPro|LinkpadBot|LinksManager|LinqiaMetadataDownloaderBot|LinqiaRSSBot|LinqiaScrapeBot|Lipperhey|Lipperhey Spider|Litemage_walker|Lmspider|Ltx71|MFC_Tear_Sample|MIDown tool|MIIxpc|MJ12bot|MQQBrowser|MSFrontPage|MSIECrawler|MTRobot|Mag-Net|Magnet|Mail\.RU_Bot|Majestic-SEO|Majestic12|Majestic SEO|MarkMonitor|MarkWatch|Mass Downloader|Masscan|Mata Hari|MauiBot|Mb2345Browser|MeanPath Bot|Meanpathbot|Mediatoolkitbot|MegaIndex\.ru|Metauri|MicroMessenger|Microsoft Data Access|Microsoft URL Control|Mister PiX|Moblie Safari|Mojeek|Mojolicious|Morfeus Fucking Scanner|Mozlila|Mr\.4x3|Msrabot|Musobot|NICErsPRO|NPbot|Name Intelligence|Nameprotect|Navroad|NearSite|Needle|Nessus|NetAnts|NetLyzer|NetMechanic|NetSpider|NetZIP|Net Vampire|Netcraft|Nettrack|Netvibes|NextGenSearchBot|Nibbler|Niki-bot|Nikto|NimbleCrawler|Nimbostratus|Ninja|Nmap|Nuclei|Nutch|Octopus|Offline Explorer|Offline Navigator|OnCrawl|OpenLinkProfiler|OpenVAS|Openfind|Openvas|OrangeBot|OrangeSpider|OutclicksBot|OutfoxBot|PECL::HTTP|PHPCrawl|POE-Component-Client-HTTP|PageAnalyzer|PageGrabber|PageScorer|PageThing\.com|Page Analyzer|Pandalytics|Panscient|Papa Foto|Pavuk|PeoplePal|Petalbot|Pi-Monster|Picscout|Picsearch|PictureFinder|Piepmatz|Pimonster|Pixray|PleaseCrawl|Pockey|ProPowerBot|ProWebWalker|Probethenet|Psbot|Pu_iN|Pump|PxBroker|PyCurl|QueryN Metasearch|Quick-Crawler|RSSingBot|RankActive|RankActiveLinkBot|RankFlex|RankingBot|RankingBot2|Rankivabot|RankurBot|Re-re|ReGet|RealDownload|Reaper|RebelMouse|Recorder|RedesScrapy|RepoMonkey|Ripper|RocketCrawler|Rogerbot|SBIder|SEOkicks|SEOkicks-Robot|SEOlyticsCrawler|SEOprofiler|SEOstats|SISTRIX|SMTBot|SalesIntelligent|ScanAlert|Scanbot|ScoutJet|Scrapy|Screaming|ScreenerBot|ScrepyBot|Searchestate|SearchmetricsBot|SemanticJuice|Semrush|SemrushBot|SentiBot|SeoSiteCheckup|SeobilityBot|Seomoz|Shodan|Siphon|SiteCheckerBotCrawler|SiteExplorer|SiteLockSpider|SiteSnagger|SiteSucker|Site Sucker|Sitebeam|Siteimprove|Sitevigil|SlySearch|SmartDownload|Snake|Snapbot|Snoopy|SocialRankIOBot|Sociscraper|Sogou web spider|Sosospider|Sottopop|SpaceBison|Spammen|SpankBot|Spanner|Spbot|Spinn3r|SputnikBot|Sqlmap|Sqlworm|Sqworm|Steeler|Stripper|Sucker|Sucuri|SuperBot|SuperHTTP|Surfbot|SurveyBot|Suzuran|Swiftbot|Szukacz|T0PHackTeam|T8Abot|Teleport|TeleportPro|Telesoft|Telesphoreo|Telesphorep|TheNomad|The Intraformant|Thumbor|TightTwatBot|Titan|Toata|Toweyabot|Tracemyfile|Trendiction|Trendictionbot|True_Robot|Turingos|Turnitin|TurnitinBot|TwengaBot|Twice|Typhoeus|URLy\.Warning|URLy Warning|UnisterBot|Upflow|V-BOT|VB Project|VCI|Vacuum|Vagabondo|VelenPublicWebCrawler|VeriCiteCrawler|VidibleScraper|Virusdie|VoidEYE|Voil|Voltron|WASALive-Bot|WBSearchBot|WEBDAV|WISENutbot|WPScan|WWW-Collector-E|WWW-Mechanize|WWW::Mechanize|WWWOFFLE|Wallpapers|Wallpapers/3\.0|WallpapersHD|WeSEE|WebAuto|WebBandit|WebCollage|WebCopier|WebEnhancer|WebFetch|WebFuck|WebGo IS|WebImageCollector|WebLeacher|WebPix|WebReaper|WebSauger|WebStripper|WebSucker|WebWhacker|WebZIP|Web Auto|Web Collage|Web Enhancer|Web Fetch|Web Fuck|Web Pix|Web Sauger|Web Sucker|Webalta|WebmasterWorldForumBot|Webshag|WebsiteExtractor|WebsiteQuester|Website Quester|Webster|Whack|Whacker|Whatweb|Who\.is Bot|Widow|WinHTTrack|WiseGuys Robot|Wonderbot|Woobot|Wotbox|Wprecon|Xaldon WebSpider|Xaldon_WebSpider|Xenu|YoudaoBot|Zade|Zauba|Zermelo|Zeus|Zitebot|ZmEu|ZoomBot|ZoominfoBot|ZumBot|ZyBorg|adscanner|archive\.org_bot|arquivo-web-crawler|arquivo\.pt|autoemailspider|backlink-check|cah\.io\.community|check1\.exe|clark-crawler|coccocbot-web|cognitiveseo|com\.plumanalytics|crawl\.sogou\.com|crawler\.feedback|crawler4j|dataforseo\.com|demandbase-bot|domainsproject\.org|eCatch|evc-batch|facebookscraper|gopher|heritrix|instabid|internetVista monitor|ips-agent|isitwp\.com|lwp-request|lwp-trivial|magpie-crawler|meanpathbot|mediawords|muhstik-scan|netEstate NE Crawler|oBot|page scorer|pcBrowser|plumanalytics|polaris version|probe-image-size|ripz|s1z\.ru|satoristudio\.net|scalaj-http|scan\.lol|seobility|seocompany\.store|seoscanners|seostar|serpstatbot|sexsearcher|sitechecker\.pro|siteripz|sogouspider|sp_auditbot|spyfu|sysscan|tAkeOut|trendiction\.com|trendiction\.de|ubermetrics-technologies\.com|voyagerx\.com|webmeup-crawler|webpros\.com|webprosbot|x09Mozilla|x22Mozilla|xpymep1\.exe|zauba\.io|zgrab|

將這一大串複製粘貼至 /etc/fail2ban/filter.d/nginx-badbots.conf 文件中的 batbots = 後面,作為它的值。

2. 修改匹配正則

failregex = ^<HOST> -.*"(GET|POST|HEAD).*HTTP.*(?:%(badbotscustom)s|%(badbots)s).*"$

3. 保存文件並退出編輯

4. 查看確認jail配置文件

[nginx-badbots]
#屏蔽恶意爬虫
enabled  = true
port    = http,https
filter  = nginx-badbots
action = iptables-multiport[name=BadBots, port="http,https"]
logpath  = /var/log/nginx/*access.log
           /var/log/nginx/*error.log
maxretry = 1
bantime  = 604800
findtime = 604800
/etc/fail2ban/jail.d/nginx.conf

5.重啟 Fail2ban 客戶端

fail2ban-client restart	

延伸閱讀:

Y Cheung
Blogger, Programer & Traveler.
Shanghai