Wednesday, 30 September 2015

Check Point - Clustering

Virtual Router Redundancy Protocol (VRRP)

A VRRP cluster can be used for High Availability or Load Sharing. The Check Point implementation of VRRP includes additional functionality called Monitored Circuit VRRP, which prevents black holes.
You cannot deploy a standalone deployment (Security Gateway and SMS on the same computer) in a Gaia VRRP cluster.

A VRRP router might participate in more than one VRID. The VRID mappings and priorities are different for each VRID. 

Monitored Circuit VRRP eliminates black holes caused by asymmetric routes that can be created if only one interface on the master fails, as opposed to the entire platform. Monitored Circuit VRRP monitors all of the VRRP-configured interfaces on the platform. If an interface fails, the master releases its priority over all of the VRRP-configured interfaces. To release the priority, Gaia subtracts the Priority Delta from the priority to calculate the Effective Priority. Make sure to choose the Priority Delta value so that Gaia releases priority over all interfaces on a virtual router, which lets failover occur when one interface fails.
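
The priority arithmetic described above can be sketched as follows. This is only an illustration of the subtraction, not Gaia's implementation; the function names and the example values (priorities 100 and 95, deltas 10 and 3) are hypothetical.

```python
def effective_priority(priority: int, priority_delta: int, interface_failed: bool) -> int:
    """Effective Priority = priority - Priority Delta once a monitored interface fails."""
    return priority - priority_delta if interface_failed else priority


def failover_occurs(master_priority: int, backup_priority: int, priority_delta: int) -> bool:
    """True if the master's Effective Priority drops below the backup's priority."""
    return effective_priority(master_priority, priority_delta, True) < backup_priority


# Hypothetical values: master 100, backup 95.
print(failover_occurs(100, 95, 10))  # delta large enough for failover
print(failover_occurs(100, 95, 3))   # delta too small, master keeps its role
```

The second call shows why the delta has to be chosen carefully: if it is smaller than the gap between the master and backup priorities, the master never releases its role.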

If the platforms run firewall software, you must configure the firewall policies to accept VRRP packets. The multicast address assigned to VRRP is 224.0.0.18. If the policy does not accept packets to 224.0.0.18, all firewall platforms in the same VRRP group take on the Master state.

With Monitored Circuit VRRP, some Ethernet switches might not recognize the VRRP MAC address after a master-to-backup change. This is because many switches cache the MAC address related to the Ethernet device attached to a port. When the change to a backup router occurs, the MAC address for the virtual router shifts to a different port, and switches that cache the MAC address might not change to the correct port during the VRRP change. To prevent this, replace the switch with a hub, disable MAC address caching, or set the address aging value sufficiently low. Also enable PortFast.

ClusterXL

ClusterXL provides both load sharing and high availability solutions. ClusterXL must be installed in a distributed configuration in which the SMS and the cluster members are on different machines. ClusterXL is part of the standard Security Gateway installation.

A Critical Device is a device that is critical to the operation of the cluster member. It is also known as a Problem Notification (PNote). It can be hardware or a process. The fwd and cphad processes, as well as the Security Policy itself, are predefined as critical devices. Use the cphaprob command to register additional critical devices.

Cluster Control Protocol (CCP) is used specifically in clustered environments to allow gateways to report their own states and learn about the states of other members in the cluster. It is the essential means by which State Synchronization works to provide failover in the event an active member goes down.
There is no need to add a rule to the Rule Base that accepts CCP. When clustering is configured on the gateways, an implied rule is created making this provision.

ClusterXL uses unique physical IP and MAC addresses for the cluster members and virtual IP addresses to represent the cluster itself. Virtual IP addresses do not belong to an actual machine interface.

Cluster Synchronization  

To make sure each Gateway cluster member is aware of the connections going through the other members, a mechanism called State Synchronization allows status information about connections on the Security Gateways to be shared between the members. Every IP-based service, including TCP and UDP, recognized by the Security Gateway is synchronized. State Synchronization is used both by ClusterXL and by third-party OPSEC-certified clustering products. There are two modes:
   Full Synchronization - Transfers all Firewall Kernel table information from one cluster member to another. It is handled by the fwd daemon, using an encrypted TCP connection. Full synchronization is used for initial transfers of state information for thousands of connections. If a cluster member is brought up after going down, it performs a full sync. Once all members are synchronized, only updates are transferred via delta sync.

   Delta Synchronization - Transfers changes in the Kernel tables between cluster members. Delta sync is handled by the Firewall Kernel using UDP Multicast or Broadcast on port 8116.

A user authenticated connection through a cluster member will be lost if the cluster member fails. However, a Client Authenticated or Session Authenticated connection will not be lost.
On failover, accounting information that was accumulated on the failed member but not yet reported to the SMS is lost.

Check Point recommends securing the synchronization interfaces by using a dedicated sync network or by connecting the physical network interfaces of the cluster members directly.

ClusterXL: Load Sharing

Machines in a ClusterXL load sharing configuration must be synchronized. Machines in a ClusterXL HA configuration do not have to be synchronized, but connections will be lost upon failover if they are not. Multicast and unicast are the two modes available in a load sharing environment.

Multicast Load Sharing 

Every member of the cluster receives all of the packets sent to the cluster IP address. The ClusterXL decision algorithm on all cluster members decides which cluster member should perform enforcement processing on the packet. Only that machine processes the packet and sends it to its destination. The other machines drop the packet.
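
The idea that every member runs the same decision and only one of them processes a packet can be sketched as below. The notes do not describe the actual ClusterXL decision algorithm, so the hash used here (SHA-256 over the sorted address pair, modulo the member count) is purely an assumption for illustration, and the function names are hypothetical.

```python
import hashlib


def choose_member(src_ip: str, dst_ip: str, member_count: int) -> int:
    """Pick a member index deterministically from the connection endpoints.

    Sorting the endpoints makes both directions of a connection map to the
    same member (assumed hash, not the real ClusterXL algorithm).
    """
    key = "|".join(sorted([src_ip, dst_ip])).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % member_count


def should_process(my_index: int, src_ip: str, dst_ip: str, member_count: int) -> bool:
    """Each member runs this on every packet; exactly one member returns True."""
    return choose_member(src_ip, dst_ip, member_count) == my_index


# Three hypothetical members all see the same packet; only one accepts it.
decisions = [should_process(i, "10.1.1.1", "192.0.2.7", 3) for i in range(3)]
print(decisions.count(True))
```

Because every member computes the same result independently, no coordination message is needed per packet.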

Unicast Load Sharing

In this mode, one machine called the Pivot machine receives all traffic from a router with a Unicast configuration and redistributes the packets to the other machines in the cluster. The Pivot machine is chosen automatically by ClusterXL.

The Pivot is the only machine that communicates with the router and the router uses only the Pivot's Unicast MAC address to communicate to the cluster.

Sticky connections - A connection is sticky when all of its packets are handled, in either direction, by a single cluster member. In HA mode, all connections are routed through the same cluster member. In load sharing mode, this is not the case but certain connections can be made sticky by enabling the Sticky Decision Function (SDF).

Perform a Manual Failover of the Firewall Cluster

The best-practice method for initiating a manual failover is to run the command below on an active cluster member. It registers a problem notification entry, with no refresh timeout, in the problem state.
     cphaprob -d STOP -s problem -t 0 register
Running the command cphaprob list on this machine will show an entry named STOP. To remove the STOP entry from the cluster member, run:
     cphaprob -d STOP unregister

This can also be done from expert mode:
     clusterXL_admin down
     clusterXL_admin up

A manual failover can also be induced from the Gateways status screen in SmartView Monitor via Stop Cluster Member.

ClusterXL CCP on the cluster members uses multicast by default because it is more efficient. If the connecting switch is incapable of forwarding multicast, change the CCP mode to broadcast:
     cphaconf set_ccp broadcast
     cphaconf set_ccp multicast   //to change back to multicast

--------------------------------------------------------------------------------------------------------------------

Management High Availability

The SMS consists of several databases with information on different aspects of the system such as objects, users and policy information. In the absence of SMS, essential operations performed by the gateways, such as fetching of the Security Policy and the retrieval of the CRL, cannot take place.

In Management HA, the Active SMS always has one or more backup Standby SMS. These standby SMS must all be of the same operating system and version. In a Management HA deployment, the first installed SMS is specified as the Primary SMS.

The Secondary SMS is created with empty databases that are filled with information received from the Active SMS. Secondary SMS is ready once
  • It is represented on the Primary SMS by a network object
  • SIC has been initialized between it and the Primary SMS
  • Manual synchronization has been completed with the Primary SMS for the first time
All management operations are done by the Active SMS. The transition from Standby to Active must be initiated manually. The Standby SMS are synchronized to the Active SMS so they are kept up to date with all changes in the databases and Security Policy. Security Gateways can fetch the Security Policy and retrieve a CRL from both SMS.

In order for Management HA to function properly, there must be a backup of Database (such as Objects and Users), Certificate information such as Certificate Authority data and CRL, and the installed Security Policy.

Synchronization can be manual or automatic. Synchronization status can be viewed in the Management High Availability Servers window or in SmartView Monitor, depending on whether you are connected to the Active or Standby SMS. The possible statuses are: never been synchronized, synchronized, lagging (the peer SMS has not been synchronized since the Active SMS had changes applied to it), advanced (the peer SMS is more up to date), and collision (the Active SMS and its peer have different installed policies and/or databases).

Saturday, 26 September 2015

Wireshark notes - 4 - Tips

Try to keep your Wireshark trace files to a maximum size of 100 MB.

Define a useful naming scheme for your trace files as soon as possible. Consider including capture location, capture purpose and any notes about the trace file in your trace file names.
sw1-msmith-slowsalesforce.pcapng
sw1-msmith-backgroundidle.pcapng
local-gspicer-slowbrowse.pcapng
local-gspicer-uploadstuck.pcapng
fs2-disconnects.pcapng
rtr2side1-slowpath.pcapng
rtr2side2-slowpath.pcapng


Tips for Analyzing TCP-Based Applications

-Look at the TCP handshake to get a snapshot of round trip time.
   If capturing at the client, measure the time between the SYN and the SYN/ACK.
   If capturing at the server, measure the time between the SYN/ACK and ACK.
-Open SYN and SYN/ACK packets and examine TCP peer capabilities (TCP Options).
   Decent MSS size?
   SACK supported by both?
   Window Scaling supported by both?
   Decent scaling factor?
-Launch the IO Graph and look for drops in throughput.
   Add the Bad TCP coloring rule filter to the IO Graph to correlate drops in throughput with TCP issues (the Golden Graph).
-Open the Expert Infos to view detected problems.
   Focus on Errors, Warnings and Notes.
   Expand sections and click on packets to jump to that location in the trace file and explore further.
-View and sort the TCP Delta column (tcp.time_delta).
   Sort the column from high to low and examine delays.
   Do not get distracted by "normal delays" (refer to Do not Focus on "Normal" or Acceptable Delays).
-View and sort the Calculated window size field to look for issues.
   Do not worry about FIN or RST packets with Window 0 values.
   Look for low window size values and delays in close proximity.

Tips for Locating the Cause of Intermittent Problems

Consider using a Ring Buffer during the capture process. To capture intermittent problems, set up a capture machine close to one of the machines that experiences the problem. Start capturing traffic to a file set and define the number of files to be saved by the Ring Buffer. Do not set an auto stop condition—stop the capture as soon as possible after the problem occurs.


When you stop capturing, the last file is displayed. Work backwards through this file and then the
other files in the file set to locate the problem. Select File | File Set | List Files to view and navigate
between files in the file set.
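
A ring-buffer capture like the one described above can also be started from the command line with dumpcap, which ships with Wireshark. The sketch below only builds the argument list. It assumes dumpcap is installed, and the interface name, file count and base file name are hypothetical. The `-b filesize:` value is given in kB, so 100000 corresponds to the 100 MB guideline, and no `-a` auto-stop condition is added.

```python
import shlex


def ring_buffer_command(interface: str, base_name: str,
                        file_size_kb: int = 100_000, file_count: int = 10) -> list[str]:
    """Build a dumpcap command that captures to a ring buffer of files."""
    return [
        "dumpcap",
        "-i", interface,                  # capture interface (hypothetical name below)
        "-b", f"filesize:{file_size_kb}",  # switch to the next file at this size (kB)
        "-b", f"files:{file_count}",       # keep only this many files, overwriting the oldest
        "-w", base_name,                  # base name of the file set
    ]


cmd = ring_buffer_command("eth0", "fs2-disconnects.pcapng")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd)` on the capture machine; stop it manually as soon as possible after the problem occurs.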

Tips for Detecting WLAN Problems

You need to capture the 802.11 Management, Control and Data frames, the 802.11 header, and have a pseudoheader applied. Management and Control frames are necessary to identify problems with associating and authenticating to a WLAN. Data frames provide us with the actual throughput rates on a WLAN.

Tips for Sanitizing Trace Files

Security rule: Never share trace files that may contain confidential information. Use TraceWrangler, which was created specifically to sanitize .pcapng files.

Tips for When you get stuck

Search www.ietf.org, www.wiresharkbook.com/resources.html, and also consider asking for help at ask.wireshark.org. 


Saturday, 19 September 2015

Wireshark notes - 3 - Application Errors and Advanced IO Graph


dns.flags.rcode > 0
http.response.code >= 400  -- equivalent to http.response.code > 399

HTTP response code
1xx: Informational—Request received, continuing process
2xx: Success—The action was successfully received, understood, and accepted
3xx: Redirection—Further action must be taken in order to complete the request
4xx: Client Error—The request contains bad syntax or cannot be fulfilled
5xx: Server Error—The server failed to fulfill an apparently valid request


An SMB response code (NT Status) of 0 indicates that the request was successful.
smb.nt_status > 0 || smb2.nt_status > 0


SIP is a request/response-based application. SIP can run over UDP or TCP. When SIP is configured to run over TCP, we hope to see an ACK to our SIP request in a reasonable amount of time and then a successful response. SIP response codes are
1xx: Provisional — request received, continuing to process the request.
2xx: Success — the action was successfully received, understood, and accepted.
3xx: Redirection — further action needs to be taken in order to complete the request.
4xx: Client Error — the request contains bad syntax or cannot be fulfilled at this server.
5xx: Server Error — the server failed to fulfill an apparently valid request.
6xx: Global Failure — the request cannot be fulfilled at any server.

sip.Status-Code >= 400  -- equivalent to sip.Status-Code > 399
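
The status classes listed above for HTTP and SIP, and the ">= 400" error test used in the filters, can be sketched in a few lines. The function names are hypothetical, and the 6xx class applies to SIP only.

```python
STATUS_CLASSES = {
    1: "Informational/Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure (SIP only)",
}


def status_class(code: int) -> str:
    """Map a three-digit HTTP or SIP status code to its class by its first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


def is_error(code: int) -> bool:
    """Same test as the display filters: code >= 400 (equivalently code > 399)."""
    return code >= 400


for code in (200, 302, 404, 503, 603):
    print(code, status_class(code), is_error(code))
```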

----------------------------------------------------------------------------------------------------

A picture is worth a thousand packets

Use an IO Graph to compare the throughput of separate conversations
Use an IO Graph to compare application throughput based on port numbers in use.
Consider using Advanced IO Graphs when you need the Calc functions (such as MIN, AVG, MAX).

When the application runs over TCP and you have the option of using an application name filter (such as http), it is recommended that you use a port-based filter (such as tcp.port==80) instead, in order to include the TCP overhead (such as TCP handshake packets, ACKs, FINs, and RSTs) in your graph.


The Advanced IO Graph offers Calc functions for summing the contents of a field, counting the occurrences of a field and more.
  -Use Calc: SUM(*) to add the contents of a numerical field, such as tcp.len, which does not exist in a packet, but is Wireshark's field to count just data bytes in packets.
  -Use Calc: COUNT FRAMES(*) to count the occurrence of specific type of frame or Expert Infos item such as tcp.analysis.retransmission.
  -Use Calc: COUNT FIELDS(*) to count the occurrence of a field, such as the IP ID (ip.id) field which occurs twice in some ICMP packets.
  -Use Calc: MIN(*), AVG(*) and MAX(*) to graph the minimum, average and maximum value of a numerical field, such as the tcp.window_size field.
  -Use Calc: LOAD(*) to graph response time fields, such as smb.time.
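
What the SUM, COUNT, MIN, AVG and MAX Calc functions compute for each graph interval can be sketched as below. This is a rough model of the aggregation only, not Wireshark's code; the function name, the (timestamp, value) input format and the sample values are assumptions.

```python
from collections import defaultdict


def calc_per_interval(samples, interval=1.0):
    """Group (timestamp_seconds, field_value) samples into intervals and aggregate them."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[int(timestamp // interval)].append(value)
    return {
        tick: {
            "SUM": sum(values),
            "COUNT": len(values),
            "MIN": min(values),
            "AVG": sum(values) / len(values),
            "MAX": max(values),
        }
        for tick, values in sorted(buckets.items())
    }


# Hypothetical tcp.len values observed at the given capture times.
samples = [(0.1, 100), (0.5, 300), (1.2, 50)]
for tick, stats in calc_per_interval(samples).items():
    print(tick, stats)
```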

There is no field in a packet called tcp.len, but Wireshark uses this value to define the number of data bytes in each TCP segment. The tcp.len value does not count header bytes.
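
Since tcp.len is derived rather than read from the packet, it helps to see the arithmetic. The sketch below assumes IPv4, where the data length is the IP total length minus the IP and TCP header lengths; the function name and the example sizes are illustrative.

```python
def tcp_len(ip_total_length: int, ip_header_length: int, tcp_header_length: int) -> int:
    """Data bytes in a TCP segment: IP total length minus both header lengths (IPv4)."""
    return ip_total_length - ip_header_length - tcp_header_length


print(tcp_len(1500, 20, 20))  # full-size segment without TCP options
print(tcp_len(1500, 20, 32))  # same packet size with 12 bytes of TCP options
print(tcp_len(40, 20, 20))    # bare ACK carries no data
```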

---------------------------------------------------

Detect Consistently low throughput due to low packet sizes

Low packet sizes may be caused by an application that intentionally transfers smaller amounts of data. Low packet sizes can also be an indication of a low Maximum Segment Size (MSS) setting. To find the MSS setting, check the TCP handshake.


Identify Queuing Delays along a Path

Interconnecting devices can inject delays by queuing (holding the packets temporarily before forwarding them) along a path. Consider using a traffic generator to detect queuing along a path. A tool such as iPerf/jPerf can be used to transmit traffic at a steady rate.



Correlate drops in Throughput with TCP Problems (the Golden Graph)

This graph can determine if throughput issues are related to network problems such as lost packets or
zero window sizes. This is a great graph to build whenever anyone complains about slow performance of a TCP-based application.


Graph Time Delays

This is a great way to identify slow responses for an application that does not have a delta time function.

Graph High TCP Delta Time (TCP-Based Application)

Some TCP-based applications (such as HTTP and SMB) have a delta time tracking function in Wireshark. If the application does not have the delta time tracking function built into the dissector, you can still graph high delta times using tcp.time_delta.


Graph Other Network Problems

You can graph window size issues based on the TCP analysis flag (tcp.analysis.zero_window) or the actual Calculated window size field value.
You can graph packet loss and recovery processes using the TCP analysis flags for each part of the process.
Although the TCP time-sequence graph can be very busy, it can depict not only packet loss but also selective ACKs.




The above graph clearly depicts the points in the trace where Wireshark noticed packet loss. In addition, the graph depicts the packet loss recovery process by graphing Duplicate ACKs and Retransmissions.



Thursday, 10 September 2015

Wireshark notes - 2

Expert Info Messages

Previous Segment not Captured

tcp.analysis.lost_segment

Packet loss recovery method #1 - Fast Recovery
If the receiver supports Fast Recovery and notices the jump in sequence number value, it will immediately begin sending Duplicate Acknowledgments requesting sequence number 7,920. Upon receipt of four identical ACKs (there can be more than four), the sender retransmits the packet.


Packet loss recovery method #2 - Sender Retransmission Timeout (RTO)
If the sender notices that a data packet has not been acknowledged within its Retransmission Timeout (RTO) timer value, it will retransmit the packet.



To determine how many packets were lost, add three columns: sequence number, next sequence number and acknowledgment number.
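
With those columns visible, the size of the gap is the difference between the sequence number of the packet that arrived and the next sequence number expected after the previous packet. The sketch below shows that subtraction. The function names, the 1,460-byte segment size and the arriving sequence number 10,840 are assumptions for illustration, and sequence number wraparound is ignored.

```python
def missing_bytes(expected_next_seq: int, arrived_seq: int) -> int:
    """Bytes skipped between the expected next sequence number and the one that arrived."""
    return max(0, arrived_seq - expected_next_seq)


def estimate_lost_segments(expected_next_seq: int, arrived_seq: int,
                           segment_size: int = 1460) -> int:
    """Estimate how many segments are missing, assuming a fixed segment size."""
    gap = missing_bytes(expected_next_seq, arrived_seq)
    return -(-gap // segment_size)  # ceiling division


# Receiver expected 7920 next, but the packet that arrived starts at 10840 (hypothetical).
print(missing_bytes(7920, 10840))
print(estimate_lost_segments(7920, 10840))
```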



Location of the capture



Since we know the sequence number of the packet that is missing, we can use that information to
determine if we see the original and the Retransmission or just the Retransmission. That will tell us if
we are upstream or downstream from packet loss. Filter tcp.seq==9164761.

Duplicate ACKs

tcp.analysis.duplicate_ack Duplicate ACKs are an indication that a host supports Fast Recovery and noticed that a packet arrived with a sequence number beyond the calculated next sequence number. Duplicate ACKs are usually a sign of packet loss, but Duplicate ACKs can also be an indication of out-of-order packets.

If the packet with the missing sequence number arrives within 3 ms, Wireshark marks that packet as
Out-of-Order (tcp.analysis.out_of_order). If the packet with the missing sequence number arrives more than 3 ms later, Wireshark indicates that the packet is either a Retransmission or a Fast Retransmission.
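
The 3 ms rule can be sketched as a small classifier. This is a deliberate simplification based only on these notes, not Wireshark's actual heuristics, which consider more conditions. The function name is hypothetical, and the assumption that a late packet preceded by Duplicate ACKs counts as a Fast Retransmission is a simplification.

```python
OUT_OF_ORDER_WINDOW = 0.003  # 3 ms, as described in the notes


def classify_late_segment(delay_seconds: float, preceded_by_duplicate_acks: bool) -> str:
    """Classify a segment that fills a sequence gap (simplified model)."""
    if delay_seconds < OUT_OF_ORDER_WINDOW:
        return "Out-of-Order"
    if preceded_by_duplicate_acks:
        return "Fast Retransmission"
    return "Retransmission"


print(classify_late_segment(0.001, False))
print(classify_late_segment(0.050, True))
print(classify_late_segment(0.300, False))
```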

These Duplicate ACKs complain about a missing sequence number. If SACK is in use we should see
only the missing packets being retransmitted. The SACK Left Edge and SACK Right Edge fields in
the TCP Options area acknowledge other data packets received while the Acknowledgment Number
field still indicates the desired missing sequence number.
If SACK is not in use we may see many unnecessary retransmissions as the sender retransmits every
data packet starting at the missing sequence number.


Out-of-Order Packets

tcp.analysis.out_of_order  Out-of-order packets may not affect performance if there is very little time (1-3 ms) between their expected arrival and their actual arrival. If out-of-order packets arrive after quite a delay, or there are many out-of-order packets, there may be a noticeable degradation in performance. TCP cannot pass received data up to the application until all the bytes are in the correct order.

Determining if a packet is Out of order, Retransmission or Fast Retransmission

Fast Retransmission

tcp.analysis.fast_retransmission  Fast Retransmissions are triggered by receipt of four identical ACKs (the original ACK and three Duplicate ACKs).

Retransmission 

tcp.analysis.retransmission  Standard Retransmissions are not triggered by Duplicate ACKs. Standard Retransmissions are triggered by a Retransmission Time Out (RTO) at the sender. The RTO timer is used to ensure data delivery continues even if the TCP peer stops communicating (with ACKs). When the RTO timer expires without receiving an ACK for the data packet, the sender retransmits the unacknowledged data packet.




You do not want to spend time troubleshooting Retransmissions or Fast Retransmissions when these
packets are actually Out-of-Order packets that did not arrive within 3 ms of the higher Sequence
Number field value.

Remember, Duplicate ACKs lead to Fast Retransmissions. An expired RTO at the sender leads to
Retransmissions. Each of these is an indication of packet loss which typically occurs at interconnecting devices. Capturing at different points on the network can help you find the point of
packet loss.
Applications cannot pick up data from the buffer until all sequential bytes have been received.
Out-of-Order problems typically aren't felt by network users unless there is a large gap in time between
the expected arrival time and actual arrival time.

ACKed Unseen Segment

tcp.analysis.ack_lost_segment  This Expert Infos warning indicates that Wireshark sees an ACK, but it did not see the data packet that is being acknowledged.



Zero Window

tcp.analysis.zero_window  Each side of a TCP conversation advertises its receive buffer space in the Window Size Value field (tcp.window_size_value). When a receiving application cannot pull data out of the receive buffer fast enough, this advertised Window Size value can drop to zero.



The Window Size Value field indicates the actual Window Size being advertised. When Window
Scaling is in use, Wireshark multiplies the Scaling Factor by the advertised Window Size Value field
to provide the scaled Window Size (Calculated window size field). TCP FIN or RST packets with a Window Size of 0 are not colored by the Bad TCP coloring rule.
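
The multiplication described above can be written out as below. The scaling factor comes from the shift count each side advertises in the Window Scale option during the handshake (factor = 2 to the power of the shift count). The function names and the sample values are illustrative.

```python
def scaling_factor(shift_count: int) -> int:
    """Window Scale option: the factor is 2 raised to the advertised shift count."""
    return 1 << shift_count


def calculated_window_size(window_size_value: int, factor: int) -> int:
    """Scaled window = advertised Window Size Value multiplied by the scaling factor."""
    return window_size_value * factor


factor = scaling_factor(2)                   # shift count 2 gives a factor of 4
print(calculated_window_size(5840, factor))  # advertised value 5840
```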



Window Full

tcp.analysis.window_full  Window Full is an indication that the target will be out of receive buffer space when the data packet arrives. Use "Bytes in Flight" to watch a stuck application.

Zero Window Probe and Zero Window Probe ACK

tcp.analysis.zero_window_probe, tcp.analysis.zero_window_probe_ack  Zero Window Probe packets may be sent by a host to a TCP peer that is advertising a Zero Window condition in the hope of eliciting a Window Update response. Keep in mind that a host may send a TCP Keep Alive (decrementing the sequence number by 1) instead of a Zero Window Probe packet.


Tuesday, 8 September 2015

Wireshark notes - 1

Some checksum algorithms can recover from an error by calculating what the error is and repairing it. Others cannot, so the data is retransmitted if the protocol supports it.

Flow Graph --> Statistics - Flow Graph
 -TCP handshake in full view - review the handshake and patterns associated with communications
 -Helps to find errors in communications


TCP Stream Graph --> Statistics - TCP Stream Graph


Service Response Time --> Statistics -
 For protocols such as SMB, LDAP and others.


Analyzing Packet Lengths
 -shows fragmentation problems
 -tiny packets problems

IO Graph (covered in foundation videos)
===================================

If the application is TCP-based, you should use a display filter based on the port number in order to view the TCP overhead (such as the TCP handshake, ACKs and connection tear down) as well as the application traffic. For example, the filter tcp.port==21 would display the FTP command channel traffic, including the TCP handshake, ACKs, and the TCP connection teardown packets.


Sample useful display filters

eth.addr == d4:85:64:a7:bf:a3
ip.addr==10.1.1.1
http.request.method  --to view all HTTP client request packets
dns.flags.rcode > 0  -- to identify DNS error responses
tcp.window_size < 1000 -- to identify advertised buffer space issue. Look for window update messages

Use !/not with ==/eq when you filter on a field name that matches two fields, such as ip.addr, tcp.port or udp.port.
Use != when you filter on a field name that only matches one field, such as dns.flags.rcode or tcp.dstport.
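
The reason for this rule is easier to see when a field such as ip.addr is treated as a list of values (source and destination). The sketch below models the behavior these notes describe, where a comparison matches if any value of the field satisfies it. It is a model, not Wireshark code, and the function names are hypothetical. Later Wireshark releases are reported to treat != differently for multi-value fields, so check the documentation for your version.

```python
def not_equal_any(field_values, target):
    """Models `ip.addr != target` with any-value semantics: true if ANY value differs."""
    return any(value != target for value in field_values)


def not_of_equal(field_values, target):
    """Models `!(ip.addr == target)`: true only if NO value equals the target."""
    return not any(value == target for value in field_values)


# ip.addr of one packet holds both the source and the destination address.
packet_addrs = ["10.1.1.1", "10.2.2.2"]
print(not_equal_any(packet_addrs, "10.1.1.1"))  # packet still matches, so it is not filtered out
print(not_of_equal(packet_addrs, "10.1.1.1"))   # packet is excluded as intended
```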

tcp.flags.syn == 1

Change the TCP Dissector Reassembly Setting to Properly Measure HTTP Response Times


Use Statistics --> Conversations to find top talkers.
Use Statistics --> IO Graph to quickly spot a throughput problem.

=======================================================
Wireshark profiles are saved in Personal configuration folder.

Normal or acceptable delays should be ignored in the trace file. They are
 -Delays before DNS queries,
 -delays before TCP FIN or Reset packets,
 -delays before a client sends a request to server,
 -delays before keep-alive or zero window probes (a zero window probe is sent during a zero window situation to determine whether more buffer space is available at the target),
 -delays before a TLS encrypted alert followed by a TCP FIN or RST,
 -delays before a periodic set of packets in a connection that is otherwise idle (the application's own keep-alive packets)

Knowing what "normal" delay times are will help. These delays below do matter
 -delays before a server responds with a SYN/ACK
 -delays before a client completes the 3-way TCP handshake
 -delays before a server sends a response
 -delays before the next packet in a data stream (buffer space)
 -delays before an ACK from a TCP peer (delays before transmitted data is ACKed)
 -delays before a window update (tcp.window_size; there is no Expert Info warning for this "low window size" problem)

Various time measurements and application response time measurements
Delta time (frame.time_delta)
Delta time displayed (frame.time_delta_displayed)
TCP delta time (tcp.time_delta)
DNS response time (dns.time)
HTTP response time (http.time)
SMB response time (smb.time)


Using IO Graph to display latency


Calculating conversation timestamps of TCP

Wireshark numbers each separate TCP conversation with a TCP Stream index (tcp.stream) value starting with 0. After you have enabled the Calculate conversation timestamp preference setting, Time since previous frame in this TCP stream (tcp.time_delta) will be visible at the end of the TCP header. Unlike the basic delta time value, this time value tracks the time from the end of one packet in a TCP conversation (aka "stream") to the end of the next packet in that same TCP conversation.
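
The difference from the basic delta time is that the previous packet is looked up per TCP stream rather than across the whole trace. A rough model of that bookkeeping is shown below; the function name and the (timestamp, stream index) input format are assumptions, and the first packet of each stream is given a delta of 0.

```python
def tcp_time_deltas(packets):
    """Return, for each (timestamp, stream_index) packet, the time since the
    previous packet in the same stream (0.0 for the first packet of a stream)."""
    last_seen = {}
    deltas = []
    for timestamp, stream in packets:
        deltas.append(timestamp - last_seen.get(stream, timestamp))
        last_seen[stream] = timestamp
    return deltas


# Two interleaved streams (hypothetical capture times in seconds).
packets = [(0.0, 0), (0.25, 1), (0.5, 0), (1.0, 1)]
print(tcp_time_deltas(packets))
```

A plain frame-to-frame delta for the same packets would show 0.25 or 0.5 seconds for every packet, hiding the 0.75 second gap inside stream 1.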

The TCP Delta column is a key column to add when troubleshooting TCP-based applications. It's one of the first steps I use when locating the cause of poor performance of TCP-based applications on a network.

In a large trace file, to find the most active TCP conversation, use the Conversations menu. From there, click on TCP/UDP and sort by Bytes. Right-click on the conversation with the highest byte count and apply it as a filter.


Obtain Round Trip Time (RTT) using TCP Handshake
If capturing at the client, look at the tcp.time_delta value between the client's TCP SYN packet and the server's TCP SYN/ACK response. If capturing at the server, look at the value between the server's TCP SYN/ACK and the client's TCP ACK response. If capturing inside the infrastructure, add up the delta times between the TCP SYN and ACK packets of the handshake.

To filter the first 2 packets of TCP handshake - tcp.flags.syn==1
To filter SYN/ACK - (tcp.flags.syn==1 && tcp.flags.ack==1)
To filter ACK - (tcp.seq==1 && tcp.ack==1)
To filter SYN/ACK and ACK - (tcp.flags.syn==1 && tcp.flags.ack==1) || (tcp.seq==1 && tcp.ack==1) && tcp.len==0 && tcp.flags.fin==0
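
Once the handshake packets are isolated, the round trip time follows from their timestamps according to where the capture was taken. The sketch below encodes the three cases from the paragraph above; the function name, the capture point labels and the timestamps are hypothetical.

```python
def handshake_rtt(syn_time: float, synack_time: float, ack_time: float,
                  capture_point: str) -> float:
    """Estimate RTT from TCP handshake timestamps, depending on the capture location."""
    if capture_point == "client":
        return synack_time - syn_time   # SYN to SYN/ACK
    if capture_point == "server":
        return ack_time - synack_time   # SYN/ACK to ACK
    if capture_point == "infrastructure":
        return ack_time - syn_time      # both deltas added together
    raise ValueError(f"unknown capture point: {capture_point}")


# Hypothetical timestamps in seconds for SYN, SYN/ACK and ACK.
print(handshake_rtt(0.0, 0.25, 0.375, "client"))
print(handshake_rtt(0.0, 0.25, 0.375, "server"))
print(handshake_rtt(0.0, 0.25, 0.375, "infrastructure"))
```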


filter: tcp.time_delta > 1 && tcp.flags.fin==0 && tcp.flags.reset==0 && !http.request.method=="GET"
Using IO Graph for TCP Delay


Identify High HTTP Response Time

The HTTP response time field is called http.time, and it exists only in HTTP response packets.