CLUSTERER Module

Marius Cristian Eseanu

   OpenSIPS Solutions

   Copyright © 2015 www.opensips-solutions.com
     __________________________________________________________

   Table of Contents

   1. Admin Guide

        1.1. Overview
        1.2. Dependencies

              1.2.1. OpenSIPS Modules
              1.2.2. External Libraries or Applications

        1.3. Exported Parameters

              1.3.1. db_url
              1.3.2. db_table
              1.3.3. server_id
              1.3.4. persistent_mode
              1.3.5. cluster_id_col
              1.3.6. machine_id_col
              1.3.7. clusterer_id_col
              1.3.8. state_col
              1.3.9. url_col
              1.3.10. description_col
              1.3.11. failed_attempts_col
              1.3.12. last_attempt_col
              1.3.13. duration_col
              1.3.14. no_tries_col

        1.4. Exported Functions
        1.5. Exported MI Functions

              1.5.1. clusterer_reload
              1.5.2. clusterer_list
              1.5.3. clusterer_set_status

        1.6. Usage Example

   2. Developer Guide

        2.1. Available Functions

              2.1.1. get_nodes(cluster_id, proto)
              2.1.2. free_nodes(nodes)
              2.1.3. set_state(cluster_id, machine_id, state,
                      proto)

              2.1.4. check(cluster_id, sockaddr, server_id, proto)

              2.1.5. get_my_id()
              2.1.6. send_to(cluster_id, protocol)
              2.1.7. register_module(module_name, protocol,
                      callback_function, timeout, auth_check,
                      cluster_id)

   List of Examples

   1.1. Set db_url parameter
   1.2. Set db_table parameter
   1.3. Set server_id parameter
   1.4. Set persistent_mode parameter
   1.5. Set cluster_id_col parameter
   1.6. Set machine_id_col parameter
   1.7. Set clusterer_id_col parameter
   1.8. Set state_col parameter
   1.9. Set url_col parameter
   1.10. Set description_col parameter
   1.11. Set failed_attempts_col parameter
   1.12. Set last_attempt_col parameter
   1.13. Set duration_col parameter
   1.14. Set no_tries_col parameter
   1.15. Example database content - clusterer table
   1.16. Node A configuration
   1.17. Node B configuration
   2.1. get_nodes usage
   2.2. free_nodes usage
   2.3. set_state usage
   2.4. check usage
   2.5. get_my_id usage
   2.6. send_to usage
   2.7. register_module usage

Chapter 1. Admin Guide

1.1. Overview

   The clusterer module is used to organize multiple OpenSIPS
   instances into groups that can communicate with each other in
   order to replicate, share information or perform distributed
   tasks. The module itself only stores information about the
   nodes in a group/cluster and provides an interface to check or
   tune their state and parameters. The distributed logic is
   performed by different modules that use this interface (i.e.
   the dialog module can replicate profiles, the ratelimit module
   can share pipes across multiple instances, etc). Provisioning
   the nodes within a cluster is done over the database but, for
   efficiency, the node-related information is cached into
   OpenSIPS memory. This information can be checked or updated by
   sending commands over the MI interface.

   The clusterer module can also detect node availability, by
   using certain parameters provisioned in the database. When a
   destination is not reachable, it is put in a probing state - it
   is periodically pinged until a maximum number of failed
   attempts is reached, when it is marked as temporarily disabled.
   It stays in this state for a period (equal to the duration
   parameter), and then the number of retries reset to 0 and the
   node is considered up again.

   Modules (like dialog or ratelimit can use nodes within the
   cluster to replicate information. They also register a specific
   timeout to invalidate data from specific nodes, in case no
   updates have been within an interval. The clusterer notifies
   the module if the timeout is reached and puts the node in a
   temporary disabled state. If a packet has arrived for a
   temporary disabled server, the packet is dropped and a
   temporary disabled notification is sent to the registered
   module. After the disabled period (2 * timeout) has passed, the
   server is up again.

   By default, the state information of the nodes is not
   persistent. To make them persistent via a database, one must
   set the persistent_mode parameter.

1.2. Dependencies

1.2.1. OpenSIPS Modules

   The following modules must be loaded before this module:
     * a database module.

1.2.2. External Libraries or Applications

   The following libraries or applications must be installed
   before running OpenSIPS with this module loaded:
     * None.

1.3. Exported Parameters

1.3.1. db_url

   The database url.

   Default value is “NULL”.

   Example 1.1. Set db_url parameter
...
modparam("clusterer", "db_url",
        "mysql://opensips:opensipsrw@localhost/opensips")
...

1.3.2. db_table

   The name of the table storing the clustering information.

   Default value is “clusterer”.

   Example 1.2. Set db_table parameter
...
modparam("clusterer", "db_table", "clusterer")
...

1.3.3. server_id

   It specifies the server_id the current instance has. this field
   should correspond with one of the machine_id fields in the
   database.

   Default value is 0.

   Example 1.3. Set server_id parameter
...
modparam("clusterer", "server_id", 2)
...

1.3.4. persistent_mode

   If persistent mode is enabled, a timer synchronizes the
   information used by the clusterer module and the information
   stored in the database.

   Default value is 0 (disabled).

   Example 1.4. Set persistent_mode parameter
...
modparam("clusterer", "persistent_mode", 1)
...

1.3.5. cluster_id_col

   The name of the column in the db_table where the cluster_id is
   stored.

   Default value is “cluster_id”.

   Example 1.5. Set cluster_id_col parameter
...
modparam("clusterer", "cluster_id_col", "cluster_id")
...

1.3.6. machine_id_col

   The name of the column in the db_table where the machine_id is
   stored.

   Default value is “machine_id”.

   Example 1.6. Set machine_id_col parameter
...
modparam("clusterer", "machine_id_col", "machine_id")
...

1.3.7. clusterer_id_col

   The name of the column in the db_table where the machine_id is
   stored.

   Default value is “id”.

   Example 1.7. Set clusterer_id_col parameter
...
modparam("clusterer", "clusterer_id_col", "clusterer_id")
...

1.3.8. state_col

   The name of the column in the db_table where the state is
   stored.

   Default value is “state”.

   Example 1.8. Set state_col parameter
...
modparam("clusterer", "state_col", "state")
...

1.3.9. url_col

   The name of the column in the db_table where the url is stored.

   Default value is “url”.

   Example 1.9. Set url_col parameter
...
modparam("clusterer", "url_col", "url")
...

1.3.10. description_col

   The name of the column in the db_table where the machine's
   description is stored.

   Default value is “description”.

   Example 1.10. Set description_col parameter
...
modparam("clusterer", "description_col", "description")
...

1.3.11. failed_attempts_col

   The name of the column in the db_table where the maximum
   allowed number of failed attempts is stored.

   Default value is “failed_attempts”.

   Example 1.11. Set failed_attempts_col parameter
...
modparam("clusterer", "failed_attempts_col", "failed_attempts")
...

1.3.12. last_attempt_col

   The name of the column in the db_table where the UNIX time of
   last last failed attempt is stored.

   Default value is “last_attempt”.

   Example 1.12. Set last_attempt_col parameter
...
modparam("clusterer", "last_attempt_col", "last_attempt")
...

1.3.13. duration_col

   The name of the column in the db_table where the duration of a
   machine belonging to a certain cluster temporary disabled state
   is stored.

   Default value is “duration”.

   Example 1.13. Set duration_col parameter
...
modparam("clusterer", "duration_col", "duration")
...

1.3.14. no_tries_col

   The name of the column in the db_table where the number of
   failed connecting tries is stored.

   Default value is “no_tries”.

   Example 1.14. Set no_tries_col parameter
...
modparam("clusterer", "no_tries_col", "no_tries")
...

1.4. Exported Functions

   none

1.5. Exported MI Functions

1.5.1.  clusterer_reload

   Reloads data from the clusterer database. If the persistent
   mode is disabled the changes made to the locally stored data
   are lost.

   Name: clusterer_reload

   Parameters:none

   MI FIFO Command Format:
                :clusterer_reload
                _empty_line_

1.5.2.  clusterer_list

   Lists in a table format all the data stored in OpenSIPS cache.

   Name: clusterer_list

   Parameters:none

   MI FIFO Command Format:
                :clusterer_list
                _empty_line_

1.5.3.  clusterer_set_status

   Sets the status(UP, DOWN) of a machine belonging to a certain
   cluster.

   Name: clusterer_set_status

   Parameters:
     * cluster_id - indicates the id of the cluster.
     * machine_id - indicates the id of the machine.
     * status - indicates the new status( 0 - permanent down, 1 -
       up).
     * protocol - indicates the protocol.

   MI FIFO Command Format:
                :clusterer_set_status:
                1
                2
                0
                bin
                _empty_line_

1.6. Usage Example

   This section provides an usage example for replicating
   ratelimit pipes between two OpenSIPS instances. It uses the
   clusterer module to manage the replicating nodes, and the
   proto_bin modules to send the replicated information.

   The setup topology is simple: we have two OpenSIPS nodes
   running on two separate machines (although they could run on
   the same machine as well): Node A has IP 192.168.0.5 and Node B
   has IP 192.168.0.6. Both have, besides the traffic listeners
   (UDP, TCP, etc.), bin listeners bound on port 5566. These
   listeners will be used by the ratelimit module to replicate the
   pipes. Therefore, we have to provision them in the clusterer
   table.

   Example 1.15. Example database content - clusterer table
+----+------------+------------+----------------------+-------+---------
-----+-----------------+----------+----------+-------------+
| id | cluster_id | machine_id | url                  | state | last_att
empt | failed_attempts | no_tries | duration | description |
+----+------------+------------+----------------------+-------+---------
-----+-----------------+----------+----------+-------------+
|  1 |          1 |          1 | bin:192.168.0.5:5566 |     1 |
   0 |               0 |        0 |       30 | Node A      |
|  2 |          1 |          2 | bin:192.168.0.6:5566 |     1 |
   0 |               0 |        0 |       30 | Node B      |
+----+------------+------------+----------------------+-------+---------
-----+-----------------+----------+----------+-------------+

     * “cluster_id” - this column represents the identifier of the
       cluster. All nodes within a group/cluster should have the
       same id (in our example, both nodes have ID 1)
     * “machine_id” - this represents the identifier of the
       machine/node, and each instance within a cluster should
       have a different ID. In our example, Node A will have ID 1,
       and Node B ID 2
     * “url” - this indicates the URL where the instance will
       receive the replication information. In our example, each
       node will receive the date over the bin protocol
     * “state” - this is the state of the machine: 1 means on, 0
       means off, and 2 means it is in probing. Note that if you
       want the node to be active right away, you have to set it
       in state 1
     * “last_attempt”, “failed_attempts” and “no_tries” - are
       fields used for the probing mechanisms, and should be set
       to 0 by default. They are automatically updated by the
       clusterer module if the persistent_mode parameter is set to
       1
     * “duration” - is used to specify the period a node stays in
       the temporary disabled state. In our example, if the node
       does not respond, it is disabled for 30 seconds before
       retrying to send data again
     * “description” - is an opaque value used to identify the
       node

   After provisioning the two nodes in the database, we have to
   configure the two instances of OpenSIPS. First, we configure
   Node A:

   Example 1.16. Node A configuration
...
listen = bin:192.168.0.5:5566 # bin listener for Node A

loadmodule "proto_bin.so"

loadmodule "clusterer.so"
modparam("clusterer", "db_url", "mysql://opensips@192.168.0.7/opensips")
modparam("clusterer", "server_id", 1) # machine_id for Node A

loadmodule "ratelimit.so"
# replicate pipes to cluster id 1
modparam("ratelimit", "replicate_pipes_to", 1)
# accept replicated data from nodes within cluster 1
modparam("ratelimit", "accept_pipes_from", 1)
# if a node does not reply in a 5 seconds interval,
#the information from that node is invalidated
modparam("ratelimit", "accept_pipes_timeout", 5)
...

   Similarly, the configuration for Node B is as follows:

   Example 1.17. Node B configuration
...
listen = bin:192.168.0.6:5566 # bin listener for Node B

loadmodule "proto_bin.so"

loadmodule "clusterer.so"
# ideally, use the same database for both nodes
modparam("clusterer", "db_url", "mysql://opensips@192.168.0.7/opensips")
modparam("clusterer", "server_id", 2) # machine_id for Node B

loadmodule "ratelimit.so"
# replicate pipes to cluster id 1
modparam("ratelimit", "replicate_pipes_to", 1)
# accept replicated data from nodes within cluster 1
modparam("ratelimit", "accept_pipes_from", 1)
# if a node does not reply in a 5 seconds interval,
# the information from that node is invalidated
modparam("ratelimit", "accept_pipes_timeout", 5)
...

   Note that the server_id parameter for Node B is 2. Starting the
   two OpenSIPS instances with the above configurations provides
   your platform the ability to used shared ratelimit pipes in a
   very efficient and scalable way.

Chapter 2. Developer Guide

2.1. Available Functions

2.1.1.  get_nodes(cluster_id, proto)

   The function will return all a copy of all the needed
   information from the nodes (machine_id, state, description,
   sock address) stored in shm, whos state is up(1) and have a
   certain cluster_id and protocol.

   This function is usually used for replication purposes.

   This function returns NULL on error.

   Meaning of the parameters is as follows:
     * int cluster_id - the cluster id
     * int proto - the protocol

   Example 2.1. get_nodes usage
...
get_nodes(cluster_id, proto);
...

2.1.2.  free_nodes(nodes)

   This function will free the allocated data for the copy.

   Meaning of the parameters is as follows:
     * clusterer_node_t *nodes - the data returned by the
       get_nodes function

   Example 2.2. free_nodes usage
...
free_nodes(nodes);
...

2.1.3.  set_state(cluster_id, machine_id, state, proto)

   The function sets the state of a machine belonging to a certain
   cluster, which have the specified protocol.

   This function is usually used for replication purposes.

   Meaning of the parameters is as follows:
     * int cluster_id - cluster_id
     * int machine_id - machine_id
     * int state - the server state
     * int proto - protocol

   Example 2.3. set_state usage
...
set_state(1,1,2,PROTO_BIN);
...

2.1.4.  check(cluster_id, sockaddr, server_id, proto)

   This function is used to check if the source of a receiving
   packet is known.

   It returns 1 if the source is known, else it returns 0.

   Meaning of the parameters is as follows:
     * int cluster_id - cluster id
     * union sockaddr_union* sockaddr - incoming connexion socket
       address
     * int server_id - incoming connexion server_id
     * int proto - protocol

   Example 2.4. check usage
...
check(1, sockaddr, 2, PROTO_BIN)
...

2.1.5.  get_my_id()

   This function will return the server id's.

   Example 2.5. get_my_id usage
...
get_my_id()
...

2.1.6.  send_to(cluster_id, protocol)

   This function will replicate information to the nodes belonging
   to a cluster_id that have a specific protocol.

   Meaning of the parameters is as follows:
     * int cluster_id - cluster_id
     * int protocol - protocol

   Example 2.6. send_to usage
...
send_to(cluster_id, protocol)
...

2.1.7.  register_module(module_name, protocol, callback_function,
timeout, auth_check, cluster_id)

   This function registers a module to a certain protocol. It acts
   like an intermediary: when a valid packet has arrived, if the
   auth_check parameter is specified then it is checked for
   authenticity. After that, the timestamps are updated and the
   callback function from the registered module is called. The
   clusterer module checks for every registered module if the
   duration between the last receiving packet and the current time
   is greater than the module specified timeout. If it is, the
   servers are temporary disabled for a period of timestamp * 2.
   If any packets are received for the temporary disabled servers
   the registered module is notified.

   Meaning of the parameters is as follows:
     * char *module_name - module name
     * int protocol - protocol
     * void (*callback_function)(int, struct receive_info *, int)
       - the registered module callback function
     * int timeout - timeput
     * int auth_check - 0 if the authentication check is disabled,
       1 if the authentication check is enabled
     * int cluster_id - cluster_id

   Example 2.7. register_module usage
...
register_module(dialog, PROTO_BIN, cb, timeout, auth_check, cluster_id)
...
