gbBotDetectPlugin - 1.0.0

Detects search bots, crawlers, and spam bots.

gbBotDetectPlugin plugin

Overview

The gbBotDetectPlugin is a symfony plugin that provides a bot detection facility for web requests.

WARNING: As of version 1.0.0, gbBotDetectPlugin is not backward compatible with earlier releases. The most important changes are:

  • In actions, the facility is now provided by the sfRequest object instead of the sfUser object.
  • The configuration directives have been updated.

Please read on for more information.

Installation

1.1 Using the Symfony plugin installation task:

./symfony plugin:install gbBotDetectPlugin

1.2 Using the svn method

cd plugins
svn co http://svn.symfony-project.com/plugins/gbBotDetectPlugin

2. Enable the plugin in your ProjectConfiguration

Edit your project's config/ProjectConfiguration.class.php and add the line below to the setup function (if the install task did not add it automatically):

$this->enablePlugins('gbBotDetectPlugin');

Configuration

The gbBotDetect plugin can be configured at two levels:

  • The app.yml configuration file (global or application level), used for regular configuration

  • The bot_detect_factories.yml global configuration, used for advanced configuration

Regular configuration - app.yml

The configuration directives in app.yml are specified under the gbBotDetectPlugin key. The following configuration directives are supported in app.yml:

all:
  gbBotDetectPlugin:
    listtype: basic

Explanations:

  • listtype - The list type to use when searching for bots. See the Bot list types section below.

Bot list types

For performance reasons, two built-in bot list types exist: basic (the default), which has a small list of bots, and extended, which has a large list of bots.

To switch to the extended list, set the following in app.yml:

all:
  gbBotDetectPlugin:
    listtype: extended

You can also add custom list types: create a file named <fileprefix><custom_list_type_name>.yml in <basedir> and set listtype to "custom_list_type_name" in app.yml. By default <fileprefix>="bots." and <basedir>="data/" (global); both can be changed (see the Advanced configuration section below).

For example, to define a mylist list type, create the file bots.mylist.yml under the data/ directory (in your symfony project root) and specify in app.yml:

all:
  gbBotDetectPlugin:
    listtype: mylist
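
As an illustration, a minimal data/bots.mylist.yml could look like the sketch below. The bot id examplebot and its agent string are hypothetical placeholders. The layout follows the Bot list definition section, and because no match key is given the entry is assumed to fall back to the configured defaultmatch operator (patterni in the shipped defaults).

```yaml
# data/bots.mylist.yml: hypothetical custom list for the "mylist" list type
Bots:
  examplebot:                # bot id reported by whatBot(); the name is made up
    agent: '*ExampleBot*'    # compared against the User-Agent string
    ip: ~                    # null: no IP address restriction
```

The file name is built from the default prefix "bots." and the list type name, so changing fileprefix or basedir in the advanced configuration changes where this file must live.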

Usage

There are two basic ways to use the gbBotDetect facility:

  • the sfRequest object, from an action

  • the sfContext object, from filters, or anywhere else

sfRequest object

The following methods are available as extensions to the sfRequest object:

$request->isBot()

Will return true if the user is a known bot, and false otherwise.

$request->whatBot()

Will return the recognized bot id (as specified through the bots.*.yml id key), or false when not found.

The same methods are available in a template by using the symfony $sf_request template variable.

sfContext object

Outside of actions, the bot detect facility is available through the sfContext singleton object (in actions you should always use the sfRequest object). Note that sfContext::getInstance() should, in general, be avoided; obtain the sfContext object through local getters where available. For example, to get the context in a filter, use $this->getContext().

To get an instance of the gbBotDetect object:

$context->getBotDetect()

The following utility methods are provided by the gbBotDetect class:

$gbBotDetect->whatBot($useragent, $ip, $type = null)

Will return the bot id, or false if not found, for the provided $useragent string, $ip address and bot list $type (the listtype from the app configuration is used when $type is not provided).

$gbBotDetect->getMeta($botid)

Will return an associative array of bot meta data as specified in the bot list. Note that meta data is not mandatory for every bot definition. See the Bot list definition section below.

Advanced configuration - bot_detect_factories.yml

The gbBotDetect plugin manages its advanced configuration through the symfony factories mechanism. This provides more flexibility than the app.yml method, such as configurable classes for the gbBotDetect object. It also uses symfony's standard config caching, which is both efficient and fast.

To configure the bot_detect factory, copy plugin_dir/config/bot_detect_factories.yml to the global or application config/ directory and set the desired configuration entries.

The configuration entries and their default values are:

all:
  bot_detect:
    class: gbBotDetectFile
    param:
      basedir: data
      fileprefix: 'bots.'
      defaultmatch: patterni

Explanations:

  • class - The gbBotDetect backend class. Currently only the file backend is supported.
  • basedir - The base directory, relative to sf_root_dir, where the bot definition files live. This can also be a full path. Configuration variable substitution is also done (e.g. %%SF_ROOT_DIR%%).
  • fileprefix - The prefix for the bot definition filenames. The final name has the form <fileprefix><listtype>.yml (e.g. bots.basic.yml with the defaults).
  • defaultmatch - The default match operator for bot list entries that do not specify one. See Bot list definition for supported operators.
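
As a sketch of how these entries combine, the override below (placed in a project-level config/bot_detect_factories.yml) moves the list files into a subdirectory and changes the prefix and default operator. The directory data/bots and the prefix 'botlist.' are hypothetical values chosen for illustration; only the keys and the operator names come from this document.

```yaml
# config/bot_detect_factories.yml: illustrative override, values are assumptions
all:
  bot_detect:
    class: gbBotDetectFile       # the only backend currently provided
    param:
      basedir: data/bots         # hypothetical directory, relative to sf_root_dir
      fileprefix: 'botlist.'     # hypothetical prefix
      defaultmatch: regexpi      # used by entries that omit the match key
```

With these settings and listtype: mylist in app.yml, the plugin would be expected to look for data/bots/botlist.mylist.yml.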

Bot list definition

The bot definition list is a YAML file with the following entries:

Bots:
  %botId%:
    agent: %matchstring%
    ip: [can be null]
    match: [optional] regexp, regexpi (case insensitive), exact, pattern (*,? meta chars), patterni (case insensitive); when omitted, the defaultmatch factory parameter applies
    meta: [optional, also all subfields optional and arbitrary]
        url: %boturl%
        co: %botcompany%
        co_url: %company url%
        type: Bot|Crawler|....
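
To make the template concrete, here is a hedged sample with two entries. The googlebot entry relies on the fact that Googlebot user agent strings contain the token "Googlebot"; the second entry, its id, and its agent string are invented for illustration. Exact matching semantics (for example whether patterns are anchored) are not stated in this document, so treat the agent values as assumptions to verify against the shipped bots.basic.yml.

```yaml
Bots:
  googlebot:
    agent: '*googlebot*'          # wildcard pattern, compared case-insensitively
    ip: ~                         # null: match on user agent only
    match: patterni
    meta:                         # all meta fields are optional and arbitrary
      url: http://www.google.com/bot.html
      co: Google
      co_url: http://www.google.com/
      type: Crawler
  examplespambot:                 # hypothetical id
    agent: 'ExampleSpamBot/1.0'   # hypothetical full User-Agent string
    ip: ~
    match: exact                  # whole string must be equal
```

For the first entry, getMeta('googlebot') would be expected to return the four meta fields as an associative array, while the second entry defines no meta data at all.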

TODO (28/02/2012)

  • Add an update-bots-list task, possibly reading the http://user-agent-string.info/ file format, and/or add a UASParser conversion script
  • Add Database version with Administration Module
  • Add database backend with import, export tasks