Wednesday 30 September 2015

Check Point - Clustering

Virtual Router Redundancy Protocol (VRRP)

A VRRP cluster can be used for High Availability or Load Sharing. The Check Point implementation of VRRP includes additional functionality called Monitored Circuit VRRP, which prevents black holes.
You cannot use a standalone deployment (Security Gateway and SMS on the same computer) in a Gaia VRRP cluster.

A VRRP router might participate in more than one VRID. The VRID mappings and priorities are different for each VRID. 

Monitored Circuit VRRP eliminates black holes caused by asymmetric routes, which can arise when only one interface on the master fails rather than the entire platform. Monitored Circuit VRRP monitors all of the VRRP-configured interfaces on the platform. If an interface fails, the master releases its priority over all of the VRRP-configured interfaces. To release the priority, Gaia subtracts the Priority Delta from the priority to calculate the Effective Priority. Choose the Priority Delta so that the Effective Priority drops below the priority of the backup; otherwise Gaia does not release priority over all interfaces on the virtual router, and failover does not occur when one interface fails.
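The priority arithmetic can be illustrated with a small worked sketch. The function names and the example priorities (100 for the master, 95 for the backup) are hypothetical and chosen only for illustration. The sketch assumes that a single Priority Delta is subtracted once when any monitored interface is down, and that the router with the higher effective priority holds the Master role. It does not validate value ranges.

```python
def effective_priority(priority: int, priority_delta: int, interface_failed: bool) -> int:
    """Return the priority a router would advertise (illustrative model only)."""
    if interface_failed:
        return priority - priority_delta
    return priority


def failover_occurs(master_priority: int, backup_priority: int, priority_delta: int) -> bool:
    """True if one failed interface on the master lets the backup take over."""
    return effective_priority(master_priority, priority_delta, True) < backup_priority


if __name__ == "__main__":
    # A delta of 10 drops the master from 100 to 90, below the backup's 95.
    print(failover_occurs(100, 95, 10))  # True
    # A delta of 3 leaves the master at 97, still above the backup's 95.
    print(failover_occurs(100, 95, 3))   # False
```

With a delta that is too small, the master keeps its role despite the failed interface, which is the black-hole case that Monitored Circuit VRRP is meant to avoid.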

If the platforms run firewall software, you must configure the firewall policies to accept VRRP packets. The multicast address assigned to VRRP is 224.0.0.18. If the policy does not accept packets to 224.0.0.18, the firewall platforms in the same VRRP group do not receive each other's advertisements, and all of them take on the Master state.

With Monitored Circuit VRRP, some Ethernet switches might not recognize the VRRP MAC address after a master-to-backup change. This is because many switches cache the MAC address of the Ethernet device attached to a port. When the change to a backup router occurs, the MAC address for the virtual router shifts to a different port, and switches that cache the MAC address might not switch to the correct port during the VRRP change. To prevent this, replace the switch with a hub, disable MAC address caching, or set the address ageing value sufficiently low. Also enable PortFast.

ClusterXL

ClusterXL provides both load sharing and high availability solutions. ClusterXL must be installed in a distributed configuration, in which the SMS and the cluster members are on different machines. ClusterXL is part of the standard Security Gateway installation.

A Critical Device is a device that is critical to the operation of the cluster member. It is also known as a Problem Notification (PNote). It can be hardware or a process. The fwd and cphad processes, as well as the Security Policy itself, are predefined as critical devices. Use the cphaprob command to register additional critical devices.

Cluster Control Protocol (CCP) is used specifically in clustered environments to allow gateways to report their own states and learn about the states of other members in the cluster. It is the essential means by which State Synchronization works to provide failover when an active member goes down.
There is no need to add a rule to the Rule Base that accepts CCP. When clustering is configured on the gateways, an implied rule is created making this provision.

ClusterXL uses unique physical IP and MAC addresses for the cluster members and virtual IP addresses to represent the cluster itself. Virtual IP addresses do not belong to an actual machine interface.

Cluster Synchronization  

To make sure each Gateway cluster member is aware of the connections going through the other members, a mechanism called State Synchronization shares status information about connections on the Security Gateways between the members. Every IP-based service recognized by the Security Gateway, including TCP and UDP, is synchronized. State Synchronization is used both by ClusterXL and by third-party OPSEC-certified clustering products. It has two modes:
   Full Synchronization - Transfers all Firewall Kernel table information from one cluster member to another. It is handled by the fwd daemon, using an encrypted TCP connection. Full synchronization is used for initial transfers of state information for thousands of connections. If a cluster member is brought up after failing, it performs a full sync. Once all members are synchronized, only updates are transferred via delta sync.

   Delta Synchronization - Transfers changes in the Kernel tables between cluster members. Delta sync is handled by the Firewall Kernel using UDP Multicast or Broadcast on port 8116.
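The relationship between the two modes can be sketched as follows. This is a toy in-memory model, not Check Point code: the Member class, its method names, and the dictionary used as a connection table are all hypothetical, and the real transports described above (TCP for full sync, UDP port 8116 for delta sync) are not modeled.

```python
from typing import Dict, Optional


class Member:
    """Toy cluster member holding a connection table (hypothetical model)."""

    def __init__(self) -> None:
        self.table: Dict[str, str] = {}

    def full_sync_from(self, peer: "Member") -> None:
        # Full sync: copy the peer's entire table in one transfer.
        self.table = dict(peer.table)

    def apply_delta(self, changes: Dict[str, Optional[str]]) -> None:
        # Delta sync: apply only the entries that changed.
        # A value of None marks a connection that was removed.
        for key, state in changes.items():
            if state is None:
                self.table.pop(key, None)
            else:
                self.table[key] = state


if __name__ == "__main__":
    active = Member()
    active.apply_delta({"conn-1": "ESTABLISHED"})

    # A member that comes back up starts with a full sync...
    recovered = Member()
    recovered.full_sync_from(active)

    # ...and from then on both members only exchange changes.
    update = {"conn-2": "ESTABLISHED", "conn-1": None}
    active.apply_delta(update)
    recovered.apply_delta(update)
    print(recovered.table == active.table)  # True
```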

A User Authenticated connection through a cluster member is lost if the cluster member fails. However, a Client Authenticated or Session Authenticated connection is not lost.
On failover, accounting information that was accumulated on the failed member but not yet reported to the SMS is lost.

Check Point recommends securing the synchronization interfaces by using a dedicated sync network or by connecting the physical network interfaces of the cluster members directly.

ClusterXL: Load Sharing

Machines in a ClusterXL load sharing configuration must be synchronized. Machines in a ClusterXL HA configuration do not have to be synchronized, but connections will be lost upon failover if they are not. Multicast and Unicast are the two modes available in a load sharing environment.

Multicast Load Sharing 

Every member of the cluster receives all of the packets sent to the cluster IP address. The ClusterXL decision algorithm, running on all cluster members, decides which cluster member should perform enforcement processing on the packet. Only that machine processes the packet and sends it to its destination. The other machines drop the packet.
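The idea that every member runs the same decision on every packet can be sketched as below. The internals of ClusterXL's decision algorithm are not described in these notes, so the hash used here is purely an assumption for illustration; owning_member, handle_packet, and the connection-key string are hypothetical names.

```python
import hashlib


def owning_member(connection_key: str, member_count: int) -> int:
    """Map a connection key to one member index, identically on every member."""
    digest = hashlib.sha256(connection_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % member_count


def handle_packet(member_id: int, connection_key: str, member_count: int) -> str:
    """Every member sees the packet; only the owning member processes it."""
    if owning_member(connection_key, member_count) == member_id:
        return "process"
    return "drop"


if __name__ == "__main__":
    key = "192.0.2.10:40000->198.51.100.5:443"
    # Each of the three members evaluates the same packet independently.
    print([handle_packet(member, key, 3) for member in range(3)])
```

Because the function is deterministic and every member uses the same inputs, exactly one member answers "process" for a given packet without any per-packet coordination between members.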

Unicast Load Sharing

In this mode, one machine called the Pivot machine receives all traffic from a router with a Unicast configuration and redistributes the packets to the other machines in the cluster. The Pivot machine is chosen automatically by ClusterXL.

The Pivot is the only machine that communicates with the router and the router uses only the Pivot's Unicast MAC address to communicate to the cluster.

Sticky connections - A connection is sticky when all of its packets are handled, in either direction, by a single cluster member. In HA mode, all connections are routed through the same cluster member. In load sharing mode, this is not the case but certain connections can be made sticky by enabling the Sticky Decision Function (SDF).

Perform a Manual Failover of the Firewall Cluster

The best-practice method for initiating a manual failover is to run the command below on the active cluster member. It registers a problem notification entry named STOP, in the problem state and with no refresh time.
     cphaprob -d STOP -s problem -t 0 register
Running the command cphaprob list on this machine will show an entry named STOP. To remove the STOP entry from the cluster member, run:
     cphaprob -d STOP unregister

A manual failover can also be done from expert mode:
     clusterXL_admin down
     clusterXL_admin up

A manual failover can also be induced from the Gateways status screen in SmartView Monitor via Stop Cluster Member.

CCP on the ClusterXL cluster members uses multicast by default because it is more efficient. If the connecting switch cannot forward multicast, change the CCP mode to broadcast:
     cphaconf set_ccp broadcast
     cphaconf set_ccp multicast   //to change back to multicast

--------------------------------------------------------------------------------------------------------------------

Management High Availability

The SMS consists of several databases with information on different aspects of the system such as objects, users and policy information. In the absence of SMS, essential operations performed by the gateways, such as fetching of the Security Policy and the retrieval of the CRL, cannot take place.

In Management HA, the Active SMS always has one or more backup Standby SMS. These standby SMS must all be of the same operating system and version. In a Management HA deployment, the first installed SMS is specified as the Primary SMS.

The Secondary SMS is created with empty databases that are filled with information received from the Active SMS. The Secondary SMS is ready once:
  • It is represented on the Primary SMS by a network object
  • SIC has been initialized between it and the Primary SMS
  • Manual synchronization has been completed with the Primary SMS for the first time
All management operations are done by the Active SMS. The transition from Standby to Active must be initiated manually. The Standby SMS are synchronized with the Active SMS, so they are kept up to date with all changes in the databases and Security Policy. Security Gateways can fetch the Security Policy and retrieve a CRL from both SMS.

For Management HA to function properly, the following data must be backed up: the databases (such as Objects and Users), certificate information (such as Certificate Authority data and the CRL), and the installed Security Policy.

Synchronization can be manual or automatic. The synchronization status can be viewed in the Management High Availability Servers window or in SmartView Monitor, depending on whether you are connected to the Active or Standby SMS. The possible statuses are: never been synchronized, synchronized, lagging (the peer SMS has not been synchronized since changes were applied to the Active SMS), advanced (the peer SMS is more up to date), and collision (the Active SMS and its peer have different installed policies and/or databases).
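One way to reason about these statuses is as a comparison of what changed on each side since the last synchronization. The sketch below is a simplified model under that assumption, not a description of how the SMS computes the status internally. The function name, its parameters, and the reading of collision as "both sides changed independently" are hypothetical.

```python
def sync_status(ever_synced: bool, active_changed: bool, peer_changed: bool) -> str:
    """Classify the peer SMS relative to the Active SMS (simplified model)."""
    if not ever_synced:
        return "never been synchronized"
    if active_changed and peer_changed:
        # Both sides diverged, so neither copy can simply overwrite the other.
        return "collision"
    if active_changed:
        # The Active SMS has changes the peer has not received yet.
        return "lagging"
    if peer_changed:
        # The peer holds changes the Active SMS does not have.
        return "advanced"
    return "synchronized"


if __name__ == "__main__":
    print(sync_status(ever_synced=True, active_changed=True, peer_changed=False))  # lagging
```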
