gbBotDetectPlugin plugin
Overview
The gbBotDetectPlugin is a symfony plugin that detects bots among incoming web requests.
WARNING: Since version 1.0.0, gbBotDetectPlugin is not backward compatible with earlier versions. The most important changes are:
- In actions, the facility is now provided by the sfRequest object instead of the sfUser object.
- The configuration directives have been updated.
Please read on for more information.
Installation
1.1 Using the Symfony plugin installation task:
./symfony plugin:install gbBotDetectPlugin
1.2 Using the svn method:
cd plugins
svn co http://svn.symfony-project.com/plugins/gbBotDetectPlugin
2. Enable the plugin in your ProjectConfiguration
Edit your project's config/ProjectConfiguration.class.php and add the line below to the setup() function, if the install task has not already added it:
$this->enablePlugins('gbBotDetectPlugin');
Configuration
The gbBotDetect plugin can be configured at two levels:
The app.yml configuration file (global or application), used for regular configuration
The bot_detect_factories.yml global configuration file, used for advanced configuration
Regular configuration - app.yml
The configuration directives in app.yml are specified under the gbBotDetectPlugin key. The following directives are supported:
all:
  gbBotDetectPlugin:
    listtype: basic
Explanations:
- listtype - The list type to use when searching for bots. See the Bot list types section below.
Bot list types
For performance reasons, two built-in bot list types exist: basic (the default), which contains a small list of bots, and extended, which contains a large list of bots.
To switch to the extended list, set the following in app.yml:
all:
  gbBotDetectPlugin:
    listtype: extended
You can also add custom list types by creating a <fileprefix><custom_list_type_name>.yml file in <basedir> and setting listtype to "custom_list_type_name" in app.yml.
By default <fileprefix> is "bots." and <basedir> is "data/" (global); both can be configured (see the Advanced configuration section below).
For example, to define a mylist list type, create a file named bots.mylist.yml under the data/ directory (in your symfony project root) and specify in app.yml:
all:
  gbBotDetectPlugin:
    listtype: mylist
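As a rough sketch of what the matching data/bots.mylist.yml could contain, the fragment below follows the format described in the Bot list definition section. The bot id, agent string, and meta values are made up for illustration and do not come from the plugin's built-in lists.

```yaml
# data/bots.mylist.yml (hypothetical example list)
Bots:
  examplebot:                 # bot id, returned by whatBot()
    agent: '*ExampleBot*'     # string matched against the user agent
    ip: ~                     # null: no IP restriction
    match: patterni           # case-insensitive pattern with * and ? meta chars
    meta:                     # optional, arbitrary fields
      url: 'http://www.example.com/bot.html'
      co: 'Example Inc.'
      type: Crawler
```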
Usage
There are two basic ways to use the gbBotDetect facility:
the sfRequest object, from an action
the sfContext object, from filters, or anywhere else
sfRequest object
The following methods are available as extensions to the sfRequest object:
$request->isBot()
Will return true if the request comes from a known bot, and false otherwise.
$request->whatBot()
Will return the recognized bot id (as specified through the bots.*.yml id key), or false when not found.
The same methods are available in a template by using the symfony $sf_request template variable.
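To illustrate how the return value of whatBot() might be consumed, the sketch below wraps it in a small helper. The function name visitorLabel is hypothetical and not part of the plugin; the only assumption is that the object passed in exposes whatBot() as described above.

```php
<?php
// Hypothetical helper, not part of gbBotDetectPlugin.
// $request is assumed to be the symfony request object as extended by the
// plugin, i.e. any object exposing whatBot() as documented above.
function visitorLabel($request)
{
    // whatBot() returns the bot id from the bots.*.yml list, or false.
    $botId = $request->whatBot();

    return $botId === false ? 'visitor' : 'bot:' . $botId;
}
```

In an action this could be called as visitorLabel($request), for example to tag log entries or to skip per-visitor bookkeeping for bots.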
sfContext object
The bot detection facility is also available through the sfContext singleton object, for use anywhere other than actions (in actions you should always use the sfRequest object). Note that sfContext::getInstance() should, in general, be avoided; obtain the sfContext object from a local getter when one is available. For example, to get the context in a filter, use $this->getContext().
To get an instance of gbBotDetect object:
$context->getBotDetect()
The following utility methods are provided by gbBotDetect class:
$gbBotDetect->whatBot($useragent, $ip, $type = null)
Will return the bot id, or false if not found, for the provided $useragent string, $ip address and bot list $type (the listtype from the app config is used when $type is not provided).
$gbBotDetect->getMeta($botid)
Will return an associative array of bot meta data as specified in the bot list. Note that meta data is not mandatory for every bot definition. See the Bot list definition section below.
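The two utility methods can be combined as in the following sketch. The helper name lookupBot and its return shape are hypothetical; it only assumes an object with the whatBot() and getMeta() methods described above, and it guards against getMeta() returning something other than an array, since the behaviour for bots without meta data is not specified here.

```php
<?php
// Hypothetical helper, not part of gbBotDetectPlugin.
// $botDetect is assumed to be the object returned by $context->getBotDetect(),
// i.e. any object exposing whatBot() and getMeta() as documented above.
function lookupBot($botDetect, $userAgent, $ip)
{
    // Third argument omitted: the listtype from the app config is used.
    $botId = $botDetect->whatBot($userAgent, $ip);
    if ($botId === false) {
        return null; // not a known bot
    }

    // Meta data is optional in the bot list, so fall back to an empty array.
    $meta = $botDetect->getMeta($botId);

    return array('id' => $botId, 'meta' => is_array($meta) ? $meta : array());
}
```

In a filter, the first argument would be $this->getContext()->getBotDetect(), and the user agent and address could be taken from $_SERVER['HTTP_USER_AGENT'] and $_SERVER['REMOTE_ADDR'].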
Advanced configuration - bot_detect_factories.yml
The gbBotDetect plugin manages its advanced configuration through the symfony factories mechanism. This provides more flexibility than the app.yml method, such as using a configurable class for the gbBotDetect object. It also features a standard way of config caching that is both efficient and fast.
To configure the bot_detect factory, copy plugin_dir/config/bot_detect_factories.yml to the global or application config/ directory and set the configuration entries you need.
The configuration entries and their default values are:
all:
  bot_detect:
    class: gbBotDetectFile
    param:
      basedir: data
      fileprefix: 'bots.'
      defaultmatch: patterni
Explanations:
- class - The gbBotDetect backend class. For the moment only the file backend is supported.
- basedir - The base directory, relative to sf_root_dir, where the bot definition files live. This can also be a full path. Configuration variable substitution is also performed (e.g. %%SF_ROOT_DIR%%).
- fileprefix - The prefix for the bot definition filenames. The final name has the form <fileprefix><listtype>.yml (e.g. bots.mylist.yml).
- defaultmatch - The default match operator for the bot list entries, when one is not specified. See Bot list definition for supported operators.
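For instance, to keep the list files in their own subdirectory, a copied bot_detect_factories.yml might look like the sketch below. The data/bots path is an arbitrary choice for illustration; the remaining values are the defaults listed above.

```yaml
all:
  bot_detect:
    class: gbBotDetectFile
    param:
      basedir: data/bots      # hypothetical directory, relative to sf_root_dir
      fileprefix: 'bots.'
      defaultmatch: patterni
```

With these settings, and going by the naming rule above, a mylist list type would presumably be read from data/bots/bots.mylist.yml.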
Bot list definition
A bot definition list is a YAML file with the following entries:
Bots:
  %botId%:
    agent: %matchstring%
    ip: [can be null]
    match: [optional] regexp (default), regexpi (case insensitive), exact, pattern (*,? meta chars), patterni (case insensitive)
    meta: [optional, also all subfields optional and arbitrary]
      url: %boturl%
      co: %botcompany%
      co_url: %company url%
      type: Bot|Crawler|....
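To make the pattern and patterni operators more concrete, here is one plausible reading of them as shell-style wildcards, sketched in plain PHP. This is not the plugin's actual implementation: the function name wildcardMatch is hypothetical, and anchoring the pattern to the whole user agent string is an assumption.

```php
<?php
// Illustrative sketch only, not taken from gbBotDetectPlugin.
// Treats * as "any run of characters" and ? as "any single character",
// and assumes the pattern must cover the whole subject string.
function wildcardMatch($pattern, $subject, $caseInsensitive = true)
{
    // Escape regex meta characters, then turn the escaped wildcards back on.
    $quoted = preg_quote($pattern, '/');
    $body = strtr($quoted, array('\\*' => '.*', '\\?' => '.'));
    $regex = '/^' . $body . '$/' . ($caseInsensitive ? 'i' : '');

    return preg_match($regex, $subject) === 1;
}
```

Under this reading, an agent entry such as '*examplebot*' with match: patterni would match any user agent containing "ExampleBot" in any letter case, while match: pattern would require the exact case.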
TODO (28/02/2012)
- Add an update-bots-list task, possibly reading the http://user-agent-string.info/ file format, and/or add a UASParser conversion script
- Add Database version with Administration Module
- Add database backend with import, export tasks