Monday

SilkPerformer - Linux - counters analysis

Linux - counters, symptoms and Resolution

Virtual Memory Statistics ( vmstat )
vmstat – vmstat reports virtual memory statistics of process, virtual memory, disk, trap, and CPU activity.

example% vmstat 5
procs memory page disk faults cpu
r b w swap free re mf pi p fr de sr s0 s1 s2 s3 in sy cs us sy id
0 0 0 11456 4120 1 41 19 1 3 0 2 0 4 0 0 48 112 130 4 14 82
0 0 1 10132 4280 0 4 44 0 0 0 0 0 23 0 0 211 230 144 3 35 62

Following columns has to be watched to determine if there is any cpu issue
1. Processes in the run queue (procs r)
2. User time (cpu us)
3. System time (cpu sy)
4. Idle time (cpu id)
procs cpu
r b w us sy id
0 0 0 4 14 82
0 0 1 3 35 62
0 0 1 3 33 64
0 0 1 1 21 78
Problem symptoms

A. CPU issues
1.) Number of processes in run queue
1.) If the number of processes in run queue (procs r) are consistently greater than the number of CPUs on the system it will slow down system as there are more processes then available CPUs .
2.) if this number is more than four times the number of available CPUs in the system then system is facing shortage of cpu power and will greatly slow down the processess on the system.
3.) If the idle time (cpu id) is consistently 0 and if the system time (cpu sy) is double the user time (cpu us) system is facing shortage of CPU resources.
Resolution
Resolution to these kind of issues involves tuning of application procedures to make efficient use of cpu and as a last resort increasing the cpu power or adding more cpu to the system.

B. Memory Issues
Memory bottlenecks are determined by the scan rate (sr) . The scan rate is the pages scanned by the clock algorithm per second. If the scan rate (sr) is continuously over 200 pages per second then there is a memory shortage.
Resolution
1. Tune the applications & servers to make efficient use of memory and cache.
2. Increase system memory .
3. Implement priority paging in s in pre solaris 8 versions by adding line “set priority paging=1″ in
/etc/system. Remove this line if upgrading from Solaris 7 to 8 & retaining old /etc/system file.

The top Command

The top program provides a dynamic real-time view of a running system. It can display system summary information as well as a list of tasks currently being managed by the Linux kernel.
The top command monitors CPU utilization, process statistics, and memory utilization. The top section contains information related to overall system status - uptime, load average, process counts, CPU status, and utilization statistics for both memory and swap space.
[root@bigboy tmp]# top
3:04pm up 25 days, 23:23, 2 users, load average: 0.00, 0.02, 0.00
78 processes: 76 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: 0.9% user, 0.5% system, 0.0% nice, 0.8% idle
Mem: 384716K av, 327180K used, 57536K free, 0K shrd, 101544K buff
Swap: 779112K av, 0K used, 779112K free 130776K cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
27191 root 15 0 1012 1012 780 R 5.6 0.2 0:00 top
4545 root 16 0 5892 5888 4956 S 0.9 1.5 169:26 magicdev
1 root 15 0 476 476 432 S 0.0 0.1 0:05 init
2 root 15 0 0 0 0 SW 0.0 0.0 0:00 keventd
5 root 15 0 0 0 0 SW 0.0 0.0 0:41 kswapd
6 root 25 0 0 0 0 SW 0.0 0.0 0:00 bdflush

Find out who is monopolizing or eating the CPUs

Finally, you need to determine which process is monopolizing or eating the CPUs. Following command will displays the top 10 CPU users on the Linux system.
# ps -eo pcpu,pid,user,args | sort -k 1 -r | head -10
OR
# ps -eo pcpu,pid,user,args | sort -r -k1 | less
Output:
%CPU PID USER COMMAND
96 2148 vivek /usr/lib/vmware/bin/vmware-vmx -C /var/lib/vmware/Virtual Machines/Ubuntu 64-bit/Ubuntu 64-bit.vmx -@ ""
0.7 3358 mysql /usr/libexec/mysqld --defaults-file=/etc/my.cnf --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-locking --socket=/var/lib/mysql/mysql.sock
0.4 29129 lighttpd /usr/bin/php
0.4 29126 lighttpd /usr/bin/php
0.2 2177 vivek [vmware-rtc]
0.0 9 root [kacpid]
0.0 8 root [khelper]
Now you know vmware-vmx process is eating up lots of CPU power. ps command displays every process (-e) with a user-defined format (-o pcpu). First field is pcpu (cpu utilization). It is sorted in reverse order to display top 10 CPU eating process.

Input Output statistics ( iostat )

iostat reports terminal and disk I/O activity and CPU utilization. The first line of output is for the time period since boot & each subsequent line is for the prior interval . Kernel maintains a number of counters to keep track of the values.

Basic synctax is iostat interval count
option – let you specify the device for which information is needed like disk , cpu or terminal. (-d , -c , -t or -tdc ) . x options gives the extended statistics .

To run iostat , sysstat package should be installed.
iostat -xtc 5 2
extended disk statistics tty cpu
disk r/s w/s Kr/s Kw/s wait actv svc_t %w %b tin tout us sy wt id
sd0 2.6 3.0 20.7 22.7 0.1 0.2 59.2 6 19 0 84 3 85 11 0
sd1 4.2 1.0 33.5 8.0 0.0 0.2 47.2 2 23
sd2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
The fields have the following meanings:
disk name of the disk
r/s reads per second
w/s writes per second
Kr/s kilobytes read per second
Kw/s kilobytes written per second
wait average number of transactions waiting for service (Q length)
actv average number of transactions actively being serviced
(removed from the queue but not yet completed)
%w percent of time there are transactions waiting
for service (queue non-empty)
%b percent of time the disk is busy (transactions
in progress)

The values to look from the iostat output are:
* Reads/writes per second (r/s , w/s)
* Percentage busy (%b)
* Service time (svc_t)
If a disk shows consistently high reads/writes along with , the percentage busy (%b) of the disks is greater than 5 percent, and the average service time (svc_t) is greater than 30 milliseconds, then one of the following action needs to be taken
1.) Tune the application to use disk i/o more efficiently by modifying the disk queries and using available cache facilities of application servers .
2.) Spread the file system of the disk on to two or more disk using disk striping feature of volume manager /disksuite etc.
3.) Increase the system parameter values for inode cache , ufs_ninode , which is Number of inodes to be held in memory. Inodes are cached globally (for UFS), not on a per-file system basis
4.) Move the file system to another faster disk /controller or replace existing disk/controller to a faster one.

Network Statistics (netstat)

netstat displays the contents of various network-related data structures in depending on the options selected.

Example 1:
Network Response
$ netstat -i
Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
lo0 8232 loopback localhost 77814 0 77814 0 0 0
hme0 1500 server1 server1 10658 3 48325 0 279257 0

This option is used to diagnose the network problems when the connectivity is there but it is slow in response .
Values to look at:
1. Collisions (Collis)
2. Output packets (Opkts)
3. Input errors (Ierrs)
4. Input packets (Ipkts)

The above values will give information to workout
i. Network collision rate as follows :
Network collision rate = Output collision counts / Output packets
Network-wide collision rate greater than 10 percent will indicate
* Overloaded network,
* Poorly configured network,
* Hardware problems.
ii. Input packet error rate as follows :
Input Packet Error Rate = Ierrs / Ipkts.
If the input error rate is high (over 0.25 percent), the host is dropping packets. Hub/switch cables etc needs to be checked for potential problems.

Example 2:
#netstat -a
Local Address Remote Address Swind Send-Q Rwind Recv-Q State
*.* *.* 0 0 24576 0 IDLE
*.22 *.* 0 0 24576 0 LISTEN
if you see a lots of connections in FIN_WAIT state tcp/ip parameters have to be tuned because the
connections are not being closed and they gets accumulating . After some time system may run out of
resource . TCP parameter can be tuned to define a time out so that connections can be released and used by new connection.


sar command
The command is run by typing in:
sar -options int #samples
where valid options generally are:
-g or -p Paging
-q Average Q length
-u CPU Usage
-w Swapping and Paging
-y Terminal activity
-v State of kernel tables

The free Utility

The free utility can determine the amount of free RAM on your system. The output is easier to understand than vmstat's. Here's a sample.

[root@bigboy tmp]# free
total used free shared buffers cached
Mem: 126060 119096 6964 0 58972 40028
-/+ buffers/cache: 20096 105964
Swap: 522072 15496 506576
[root@bigboy tmp]#

You should generally try to make your system run with at least 20% free memory on average, which should allow it to handle moderate spikes in usage caused by running memory-intensive cron batch jobs or tape backups. If you cannot achieve this, consider running more efficient versions of programs, offloading some applications to servers with less load, and, of course, upgrading the capacity of your RAM.

No comments:

Post a Comment