ValueOps ConnectALL Product Community


TIM: Fine tuning the TIM (incl. OS)

By Jörg Mertin posted Jun 17, 2015 03:44 AM

  

Introduction

The default installation of the Software-Only TIM does not apply any fine-tuning to the operating system, nor does it address a number of TIM-specific issues. This document tries to address these shortcomings in the regular (or non-existing) documentation, and also provides the "why" with a detailed explanation.

Operating system preparation

After installing the OS (CentOS or RedHat), some simple steps can be performed to increase the performance of the TIM.

File systems

Since the creation of the TIM, the Linux operating system has evolved, and so have the file systems and the Linux kernel.

One thing that many Linux developers, including Linus Torvalds, consider a design failure is the atime update of files. The POSIX standard mandates that operating systems maintain file-system metadata recording when each file was last accessed. This time-stamp is known as atime, and it comes with a performance penalty: every read operation on a file system also generates a write operation. The problem is that for very small files, or for frequently read files, this extra write slows down read operations by up to 30%.
The best comment on this, however, comes from Ingo:

Linux kernel developer Ingo Molnár claimed that it (atime) was “perhaps the most stupid Unix design idea of all times.” 
To disable the tracking of atime, the noatime option can be used when mounting file systems. For I/O-intensive tasks, the performance reward for turning off atime can be immediately apparent. But turning off atime unconditionally will occasionally break certain software, like mail tools that compare mtime and atime to determine whether there is unread mail. The tmpwatch utility and some backup tools also use atime and can misbehave if atime is incorrect. Audit requirements are another reason for keeping atime enabled.

Linux keeps three time-stamps for each file on the file-system – mtime, ctime and atime (modified time, change time and access time). These can be displayed with the stat command.

jmertin@antigone:~$ stat GIT.txt 
File: ‘GIT.txt’
Size: 1145 Blocks: 8 IO Block: 4096 regular file
Device: 801h/2049d Inode: 526728 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1000/ jmertin) Gid: ( 1000/ jmertin)
Access: 2015-03-27 09:50:20.244079506 +0100
Modify: 2015-03-27 11:11:54.220167423 +0100
Change: 2015-03-27 11:11:54.220167423 +0100
Birth: -


For the TIM, the noatime option can be used without problems (it is already set on some of the field kickstart images and has shown no side effects).

To do that, edit the file /etc/fstab and add the option noatime to the options column of the file system in use. Note: the swap entry does not take this option.

UUID=96ed7aac-1b3e-40be-bf75-4a1c15494618 / ext3 defaults 1 1
UUID=25a6aa00-a693-4e90-bff6-f1402cb440ae /boot ext3 defaults 1 2
UUID=2555242a-cbcf-4c47-9efa-6ed40fe92175 swap swap defaults 0 0

change it to:

UUID=96ed7aac-1b3e-40be-bf75-4a1c15494618 / ext3 defaults,noatime 1 1
UUID=25a6aa00-a693-4e90-bff6-f1402cb440ae /boot ext3 defaults 1 2
UUID=2555242a-cbcf-4c47-9efa-6ed40fe92175 swap swap defaults 0 0

Note that this is valid for ext3 and ext4. On ext4 file systems, you can also use the relatime option instead of noatime; it updates the file access time only under certain circumstances (for example, when the stored atime is older than the mtime or ctime).
The change in /etc/fstab is not active yet, but it makes the noatime/relatime option reboot-safe. You can either reboot, or remount the drive manually with the noatime option on the live system:

~# mount -o remount,noatime /
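
A quick way to verify that the option is active (using a hypothetical test file on the remounted file system): read the file and check that the access time no longer changes.

~# touch /root/atime-test
~# stat -c '%x' /root/atime-test
~# sleep 1; cat /root/atime-test > /dev/null
~# stat -c '%x' /root/atime-test

With noatime active, both stat calls should print the same access time.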

 

TIM Network interface buffer sizing

Network cards have to deliver the data to the underlying OS. A network card has, let's say, 2 x 4096 descriptors (each descriptor with a 2048-Byte buffer), and as long as this buffer is not nearly full, the network card driver will not issue an IRQ (Interrupt Request) for the OS/kernel/CPU to poll the data. On a high-volume bidirectional data feed this is not really an issue if the network card has access to the full ~16 MBytes of buffer - note that the card only has half of it at its disposal per data direction (Tx or Rx).

On a unidirectional data feed (for example, when the data comes from a SPAN or TAP), we already only have half the buffer. But also take the following into account:

On a SPAN data feed, the two traffic directions are re-aligned and merged into a single stream before being sent to the SPAN port at the switch level. The network card, because it receives the data through one single physical link (effectively half duplex), also only uses half the available buffer - 8 MBytes in our example. Here, a complete full-duplex link is re-arranged into one half-duplex link, which increases the data on that link.

That per se would not really be an issue, if network card manufacturers did not have the bad habit of connecting two or more ports to a single network card chip without increasing the number of available descriptors.

So, in the normal case (if we assume a total buffer size of 16 MBytes), we have 8 MBytes of SPAN data the card can buffer before it needs to get rid of it. On a 2-port card we already only have 4 MBytes per SPAN feed, and on a 4-port card only 2 MBytes.

It is physically almost impossible for the system to get all the data off the network card if the CPU has a lot to do and a high data volume comes in: there are simply too many interrupts triggered. The OS will try to fetch the data from the card to free up the buffers, but simply cannot; the buffers fill up far too fast and the interface starts dropping packets. This behavior gets worse the more ports per network card chip you have, if the buffer size on these cards has not been adapted accordingly. The worst part, however, is that a SPAN feed is unidirectional and has no way to tell the remote end to re-send the missing packets. There is no real communication happening here - it is simply "shoot and forget". Whatever is dropped on either side (SPAN provider or receiver) is lost.

Hence, if you use a 2-port card to collect the data stream of a network TAP (one Tx and one Rx), it should be fine. If, however, the usual switch/SPAN approach of "I don't think, I feed it everything I have available" is used, we start to have a problem.

And if a 10 Gbps port is used without taking the above into account, problems are guaranteed: not only will more than one real data feed be aggregated and re-aligned into a single feed, but the volume received by the SPAN receiver will be too much for the hardware and OS to handle.

In the end, expect dropped packets at the interface level already. The TIM itself will also have problems because of a probable port over-subscription, which causes the SPAN sender port to drop packets as well (something the TIM will never know about). The TIM will start filling its OOQB buffers waiting for the missing packets, which will never arrive, and in the end it will drop these too.

So what we have here is an overloaded network data feed, and the data that reaches the probe is so broken that analyzing it no longer makes sense, as most of what will be shown are false positives.
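
To check whether the capture interface is already dropping packets, inspect its statistics. A minimal sketch, using eth1 (the SPAN receiver port in the examples below); the exact counter names vary per driver:

~# ethtool -S eth1 | grep -i -E 'drop|miss|fifo'
~# ip -s link show eth1

Steadily increasing drop or missed counters are a strong hint that the ring buffers are too small or that the feed is over-subscribed.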

Note that the 16 MBytes I mentioned before is actually large. On-board adapters usually only have 1 MByte of total memory by default - and so do some dual-port and quad-port cards. But this is also where the network port adapter can be tuned.

If a multi-port card adapter is in use, make sure it has at least 16 MBytes of hardware buffers per port on board to compensate!

To re-assign more descriptors to a port under Linux, first check what the current setup provides.

For example, this is the default configuration on a regular TIM appliance (eth1 is the SPAN receiver port):

[caadmin@saswattim01 ~]$ ethtool -g eth1
Ring parameters for eth1:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 256
RX Mini: 0
RX Jumbo: 0
TX: 256

The interface uses 256 descriptors per data direction, which makes 256 * 2048 = 512 KBytes of buffer. A 4 Mbps data stream would require the system to free that buffer every second, a 40 Mbps stream 10 times per second, a 400 Mbps stream 100 times per second, and a 4 Gbps link 1000 times per second.
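
As a back-of-the-envelope sketch of these figures (assuming the 2048-Byte descriptor size mentioned above), the buffer size and the number of buffer drains per second can be estimated directly in the shell:

~# DESC=256; LINK_MBPS=400
~# echo $(( DESC * 2048 )) $(( LINK_MBPS * 1000000 / 8 / (DESC * 2048) ))
524288 95

524288 Bytes is the 512 KBytes buffer, and ~95 drains per second matches the roughly 100 times per second estimated above for a 400 Mbps feed.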

Reconfigure this interface to provide the system all available buffers with:

[caadmin@saswattim01 ~]$ sudo ethtool -G eth1 rx 4096 tx 4096
[sudo] password for caadmin:
[caadmin@saswattim01 ~]$ sudo ethtool -g eth1
Ring parameters for eth1:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096

The network card now has larger buffers (all that are available - note that the Tx buffer does not strictly need to be set, as the TIM will never send out data), and the data polling from the card should occur less often.
The CPU will have more time to handle requests from the TIM analysis.
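
Note that the ethtool -G setting is lost on reboot. One simple sketch to make it persistent (assuming /etc/rc.d/rc.local is executed at boot, as on a default CentOS/RedHat installation) is to append the command there:

~# cat >> /etc/rc.d/rc.local <<'EOF'
# Enlarge the ring buffers of the SPAN receiver port (eth1) at boot
/sbin/ethtool -G eth1 rx 4096 tx 4096
EOF

Alternatively, distribution-specific mechanisms such as ETHTOOL_OPTS in the ifcfg files on RedHat-based systems can be used; check your initscripts version before relying on it.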

 

Log-file tuning

By default, the TIM log-file size is configured to be 1 MByte to prevent the client browser from crashing in case the file gets too large. The problem, however, is that when transaction tracing is activated and the data volume analyzed by the TIM is high, the logging generates approximately 8 times more volume than the analyzed data coming in.

See the table below for the log-file size to volume ratio (the rotation count assumes the default 1 MByte log size):

Analyzed Bps    Volume/sec       Log data size/sec                  Rotations/sec
10 Mbps         1.25 MBytes/s    1.25 MBytes/s * 8 = 10 MBytes      10
100 Mbps        12.5 MBytes/s    12.5 MBytes/s * 8 = 100 MBytes     100
1 Gbps          125 MBytes/s     125 MBytes/s * 8 = 1 GByte         1000

When the logging is activated, we also have to take into account the CPU cycles required to do the logging (approximately 10% more) and the I/O strain imposed on the file system. On a 1 Gbps feed, in the worst case, the log file will be rotated 1000 times per second. On a regular TIM this may not happen, but an MTP will run into this problem very quickly. The best approach is to avoid logging when in production. Unfortunately, the verbosity of the TIM workers on the MTP cannot be completely disabled, hence there will always be some data going into the log.

What can be done is to set the TIM log size to 1 GB in the TIM Settings (at least 100 MBytes).
Note, though, that when accessing the TIM log through the web interface, chances are that the browser will crash, and sometimes (Windows XP) the OS itself will crash as well.
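
To get a feel for how quickly the log actually grows with the current traffic and verbosity (and therefore which maximum size makes sense), the log directory can simply be sampled twice, for example one minute apart:

~# du -sk /opt/CA/APM/tim/logs; sleep 60; du -sk /opt/CA/APM/tim/logs

The difference between the two values (in KBytes) is the log growth per minute.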

On some MTPs it is sometimes necessary to move the tim/logs directory to a RAM file system.
For this, first identify the current size of the log directory:

[caadmin@saswattim01 tim]$ pwd
/opt/CA/APM/tim
[caadmin@saswattim01 tim]$ cd logs
[caadmin@saswattim01 logs]$ du -sh .
42M .

Stop the TIM, move the old logs directory to a temporary location, create the new RAM file system, copy all the files of the old log directory to it, and finally re-start the TIM.
The size of the RAM file system depends on the available RAM on the TIM and the size of the log-file directory. In this case, the log-file size is configured at 100 MBytes.

As the default configuration keeps 1 rotation, account for at least 200 MBytes for the TIM log file, and add 1 GB for any other log files that could eventually grow.

[caadmin@saswattim01 ~]$ sudo -i
[sudo] password for caadmin:
[root@saswattim01 ~]# cd /opt/CA/APM/tim
[root@saswattim01 tim]# pwd
/opt/CA/APM/tim
[root@saswattim01 tim]# service tim stop
Stopping tim: [ OK ]
[root@saswattim01 tim]# mv logs logs.static
[root@saswattim01 tim]# mkdir logs
[root@saswattim01 tim]# chown apache logs
[root@saswattim01 tim]# echo "tmpfs      /opt/CA/APM/tim/logs          tmpfs      defaults,noatime,size=1200M,mode=0755,uid=apache    0    0" >> /etc/fstab
[root@saswattim01 tim]# mount /opt/CA/APM/tim/logs
[root@saswattim01 tim]# cp -aR logs.static/* logs/
[root@saswattim01 tim]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 16476584 3530088 12109532 23% /
tmpfs 1953964 0 1953964 0% /dev/shm
/dev/sda1 99150 64144 29886 69% /boot
tmpfs 1228800 42448 1186352 4% /opt/CA/APM/tim/logs
[root@saswattim01 tim]# service tim start
Starting tim: [ OK ]
The logging will now take place on the tmpfs. Note that any reboot or crash of the OS will wipe the content of the RAM file system.
The data loss on reboot can be prevented by modifying the following lines (in the start and stop sections) of the /etc/rc.d/tim init script:
# See how we were called.
case "$1" in
start)
  # Check if the timlog.txt file exists. If it does, this is just a TIM restart.
  if [ ! -f /opt/CA/APM/tim/logs/timlog.txt ]
  then
    cp -aR /opt/CA/APM/tim/logs.static/* /opt/CA/APM/tim/logs/
  fi
  start "$2"
  ;;
stop)
  stop
  # Remove the static logs dir.
  rm -rf /opt/CA/APM/tim/logs.static/*
  # Copy the content of the TIM RAM logs directory to the static logs directory
  cp -aR /opt/CA/APM/tim/logs/* /opt/CA/APM/tim/logs.static/
  ;;

This keeps the content of the logs directory (in RAM) and the static copy in sync across reboots. It cannot prevent log-data loss in case the OS crashes, but even then the content of the old logs.static directory will be played back, ensuring the directory structure the TIM expects to find.

This drastically reduces the I/O impact on the file system, which the MTP/TIM badly needs for other processes.

 

RAM/TMP Filesystem for apmpacket (TIM 9.7.x and later - high performance TIM)

The new high-performance TIM works like the MTP in that it expects packet capture files to be available in a per-TIM-worker directory. The pcap file snippets are copied to a data directory by the apmpacket program.
By default, apmpacket writes these to the regular file system under /opt/CA/APM/apmpacket/data/pcap.
When the traffic volume gets high, however, the file-system speed and I/O become the limiting factor. So it is smart to make the apmpacket pcap directory a RAM file system, as is done by default on the MTP.

For this, stop the TIM and apmpacket, add the RAM file-system definition to /etc/fstab, create the RAM file-system target directory (if it does not already exist) and mount the RAM file system. Then start apmpacket and the TIM, in that order.
This would look like the following:

[root@saswattim01 data]# pwd
/opt/CA/APM/apmpacket/data
[root@saswattim01 data]# service tim stop
Stopping tim: [ OK ]
[root@saswattim01 data]# service apmpacket stop
Stopping apmpacket
[root@saswattim01 data]# echo "tmpfs /opt/CA/APM/apmpacket/data/pcap tmpfs defaults,noatime,mode=0744 0 0" >> /etc/fstab
[root@saswattim01 data]# mount /opt/CA/APM/apmpacket/data/pcap
[root@saswattim01 data]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 16476584 3520056 12119564 23% /
tmpfs 1953964 0 1953964 0% /dev/shm
/dev/sda1 99150 64144 29886 69% /boot
tmpfs 1228800 32120 1196680 3% /opt/CA/APM/tim/logs
tmpfs 1953964 0 1953964 0% /opt/CA/APM/apmpacket/data/pcap
[root@saswattim01 data]# service apmpacket start
Starting apmpacket
[root@saswattim01 data]# service tim start
Starting tim: [ OK ]

Note that the size of the RAM file system can be adapted by adding a size parameter to the options. By default half of the physical RAM (about 2 GB here) will be used; however, some circumstances will require the admin to increase the size to 4 GB.

tmpfs /opt/CA/APM/apmpacket/data/pcap tmpfs defaults,noatime,mode=0744,size=4G 0 0

Make sure, however, that there really is enough memory on the system!
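
A quick check before enlarging the tmpfs, to see how much RAM is installed and currently in use:

~# free -m
~# grep MemTotal /proc/meminfo

Keep in mind that pages used by the tmpfs compete with the TIM and apmpacket processes for the same physical memory.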

 

Note: The main document is on the cawiki/SWAT Team. Please send any comments, enhancement requests and questions to the author.
