gbBotDetectPlugin - 1.0.0
Detect search bots/crawlers and spam bots.
The gbBotDetectPlugin is a symfony plugin that provides bot detection for web requests.
WARNING: Since version 1.0.0, gbBotDetectPlugin is not backward compatible. The most important changes are:
Please read on for more information.
1.1 Using the Symfony plugin installation task:
./symfony plugin:install gbBotDetectPlugin
1.2 Using the svn method
cd plugins
svn co http://svn.symfony-project.com/plugins/gbBotDetectPlugin
2. Enable the plugin in your ProjectConfiguration
Edit config/ProjectConfiguration.class.php and add the line below to the setup() function to enable the gbBotDetect plugin (if the install task did not add it automatically):
$this->enablePlugins('gbBotDetectPlugin');
The gbBotDetect plugin can be configured at two levels:
The app.yml configuration file (global or application level), used for regular configuration
The bot_detect_factories.yml global configuration file, used for advanced configuration
app.yml
The configuration directives in app.yml are specified under the gbBotDetectPlugin key. The following configuration directives are supported:
all:
  gbBotDetectPlugin:
    listtype: basic
Explanations:
For performance reasons, there are two built-in bot list types: basic (the default), which has a small list of bots, and extended, which has a large list of bots.
To switch to the extended list, set the following in app.yml:
all:
  gbBotDetectPlugin:
    listtype: extended
You can also add custom list types by creating a <fileprefix><custom_list_type_name>.yml file in <basedir> and setting listtype to "custom_list_type_name" in app.yml.
By default <fileprefix>="bots." and <basedir>="data/" (global), but both can be configured (see the Advanced configuration section below).
For example, to define a mylist list type, create a file bots.mylist.yml under the data/ directory (in your symfony project root) and specify in app.yml:
all:
  gbBotDetectPlugin:
    listtype: mylist
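The naming convention for list files can be summarized with a small sketch. The function below is hypothetical and is not part of the plugin; it only illustrates how a list type presumably maps to a file name, assuming the default basedir and fileprefix values described above.

```php
<?php
/**
 * Hypothetical helper (not part of gbBotDetectPlugin): builds the path of a
 * bot list file as <basedir>/<fileprefix><listtype>.yml, relative to the
 * symfony project root. The defaults mirror the documented default values.
 */
function bot_list_path($listType, $baseDir = 'data', $filePrefix = 'bots.')
{
    // Normalize a trailing slash so that "data" and "data/" give the same result.
    return rtrim($baseDir, '/') . '/' . $filePrefix . $listType . '.yml';
}
```

With the defaults, bot_list_path('mylist') yields data/bots.mylist.yml, which is the file created in the example above.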
There are two basic ways to use the gbBotDetect facility:
the sfRequest object, from an action
the sfContext object, from filters, or anywhere else
The following methods are available as extensions to the sfRequest object:
$request->isBot()
Will return true if the user is a known bot, and false otherwise.
$request->whatBot()
Will return the recognized bot id (as specified through the bots.*.yml id key), or false when not found.
The same methods are available in a template by using the symfony $sf_request template variable.
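As a minimal sketch of how these two methods might be combined, the function below builds a short label for the current visitor. The function name and the label format are hypothetical and not part of the plugin; $request stands for any object exposing isBot() and whatBot(), such as the request object passed to a symfony action.

```php
<?php
/**
 * Hypothetical helper (not part of gbBotDetectPlugin): returns a label that
 * could be used for logging or statistics.
 *
 * $request is assumed to expose the isBot() and whatBot() methods
 * described above.
 */
function describe_visitor($request)
{
    if (!$request->isBot()) {
        return 'human';
    }

    // whatBot() returns the bot id from the bots.*.yml list, or false when not found.
    $botId = $request->whatBot();

    return $botId === false ? 'bot:unknown' : 'bot:' . $botId;
}
```

Inside an action this could be called as describe_visitor($request), for example to skip visit counters or session-heavy logic for crawlers.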
The bot detect facility is also available through the sfContext singleton object, for use anywhere other than actions (in actions you should always use the sfRequest object). Note that sfContext::getInstance() should, in general, be avoided; the sfContext object should be obtained through local object getters when appropriate. For example, to get the context in a filter, use $this->getContext().
To get an instance of gbBotDetect object:
$context->getBotDetect()
The following utility methods are provided by gbBotDetect class:
$gbBotDetect->whatBot($useragent, $ip, $type = null)
Will return the bot id, or false if not found, for the provided $useragent string, $ip address and bot list $type (or the listtype from the app configuration when $type is not provided).
$gbBotDetect->getMeta($botid)
Will return an associative array of bot meta data as specified in the bot list. Note that the bot meta data is not mandatory for every bot definition. See the Bot list definition section below.
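The two utility methods can be chained, as in the following sketch. The function name and the shape of the returned array are hypothetical; $botDetect is assumed to be the object returned by $context->getBotDetect(), and the code is defensive about getMeta() because the return value for a bot without meta data is not documented here.

```php
<?php
/**
 * Hypothetical helper (not part of gbBotDetectPlugin): identifies a bot and
 * collects a few of its optional meta fields.
 *
 * Returns null when the user agent / IP pair is not recognized as a bot.
 */
function lookup_bot($botDetect, $userAgent, $ip)
{
    // No $type argument: the listtype from the app configuration is used.
    $botId = $botDetect->whatBot($userAgent, $ip);
    if ($botId === false) {
        return null;
    }

    // Meta data is optional, so fall back to an empty array (assumption).
    $meta = $botDetect->getMeta($botId);
    if (!is_array($meta)) {
        $meta = array();
    }

    return array(
        'id'   => $botId,
        'type' => isset($meta['type']) ? $meta['type'] : 'unknown',
        'url'  => isset($meta['url']) ? $meta['url'] : null,
    );
}
```

In a filter, $botDetect would typically be obtained with $this->getContext()->getBotDetect(), and the user agent and IP address taken from the current request.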
The gbBotDetect plugin manages its advanced configuration through the symfony factories mechanism. This provides more flexibility than the app.yml method, such as using configurable classes for the gbBotDetect object. It also benefits from symfony's standard config caching, which is efficient and fast.
To configure the bot_detect factory, copy plugin_dir/config/bot_detect_factories.yml to the global or application config/ directory and set the desired configuration entries.
The configuration entries and their default values are:
all:
  bot_detect:
    class: gbBotDetectFile
    param:
      basedir: data
      fileprefix: 'bots.'
      defaultmatch: patterni
Explanations:
PREFIX TYPE.yml
The bot definition list is a YAML file with the following entries:
Bots:
  %botId%:
    agent: %matchstring%
    ip: [can be null]
    match: [optional] regexp (default), regexpi (case insensitive), exact, pattern (*,? meta chars), patterni (case insensitive)
    meta: [optional, also all subfields optional and arbitrary]
      url: %boturl%
      co: %botcompany%
      co_url: %company url%
      type: Bot|Crawler|....
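To illustrate the five match types, here is a self-contained sketch of how an agent string could be compared with a user agent. This is not the plugin's implementation. It assumes that pattern and patterni must match the whole user agent string, that regular expressions are written without delimiters, and that * means any run of characters while ? means exactly one character. Both function names are hypothetical.

```php
<?php
/**
 * Converts a wildcard pattern (* and ? meta chars) into an anchored regex.
 * Hypothetical helper, not part of gbBotDetectPlugin.
 */
function wildcard_to_regex($pattern, $caseInsensitive)
{
    // preg_quote() escapes * and ? as \* and \?, which are then turned into wildcards.
    $quoted = preg_quote($pattern, '/');
    $body = str_replace(array('\*', '\?'), array('.*', '.'), $quoted);

    return '/^' . $body . '$/' . ($caseInsensitive ? 'i' : '');
}

/**
 * Returns true when $userAgent matches $agent under the given match type.
 * Sketch only: the matching semantics are assumptions, see the text above.
 */
function agent_matches($agent, $match, $userAgent)
{
    switch ($match) {
        case 'exact':
            return $agent === $userAgent;
        case 'pattern':
            return preg_match(wildcard_to_regex($agent, false), $userAgent) === 1;
        case 'patterni':
            return preg_match(wildcard_to_regex($agent, true), $userAgent) === 1;
        case 'regexp':
        case 'regexpi':
            // Wrap the raw expression in ~ delimiters, escaping any ~ it contains.
            $flags = $match === 'regexpi' ? 'i' : '';
            return preg_match('~' . str_replace('~', '\~', $agent) . '~' . $flags, $userAgent) === 1;
        default:
            throw new InvalidArgumentException('Unknown match type: ' . $match);
    }
}
```

For example, under these assumptions agent_matches('*Googlebot*', 'patterni', 'Mozilla/5.0 (compatible; googlebot/2.1)') is true, while the same call with 'pattern' is false because of the case difference.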