As already pointed out, without the onstat output it's very difficult to
diagnose what went on. However, there are some things you can look at in
hindsight, which I will detail, and there are a couple of suspect ONCONFIG
values I can point you to (the suggested tests will help sort these out as well).
If you have not yet zeroed out your server stats (onstat -z), DON'T DO IT YET!
First calculate my three basic metrics:
Bufwaits Ratio (BR), Buffer Turnover Rate (BTR), and ReadAhead Utilization
(RAU). You can download the script ratios.ksh from the IIUG site to do this for
you. These are the formulas:
BR  = (bufwaits / (pagreads + bufwrits)) * 100
BTR = ((bufwrits + pagreads) / BUFFERS) / <fractional hours since stats were cleared>
RAU = ((ixda-RA + idx-RA + da-RA) / RA-pgsused) * 100
BR is the % of buffer accesses that had to wait for a buffer or for an LRU queue:
BR <= 7 server's cruising
7 < BR <= 10 server's struggling
BR > 10 users are complaining, server's hosed
BTR is an approximation of how often per hour your server turns over the entire
cache (in practice a high BTR more often tells you that a smaller number of
buffers is being churned very quickly).
BTR should be in the small single digits - I like to keep mine at 6 or less
RAU is the percentage of readahead pages that are actually accessed. This value
should be as close to 100% as possible (I worry when it falls below 99.5%!)
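If you'd rather plug the numbers in by hand than download ratios.ksh, a minimal
sketch of the arithmetic (a hypothetical helper, NOT the IIUG script - you read
the counters from onstat -p and BUFFERS from your ONCONFIG yourself) looks like
this:

#!/bin/ksh
# calc_ratios.ksh (hypothetical name) - just the arithmetic from the formulas
# above; hyphens in the onstat -p field names become underscores here.
# usage: calc_ratios.ksh bufwaits pagreads bufwrits BUFFERS hours ixda_RA idx_RA da_RA RA_pgsused
awk -v bw="$1" -v pr="$2" -v bwr="$3" -v buf="$4" -v hrs="$5" \
    -v ixda="$6" -v idx="$7" -v da="$8" -v rap="$9" 'BEGIN {
        printf "BR  = %.2f\n", (bw / (pr + bwr)) * 100
        printf "BTR = %.2f\n", ((bwr + pr) / buf) / hrs
        printf "RAU = %.2f\n", ((ixda + idx + da) / rap) * 100
    }'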
Note that you have only 8 LRUS configured; with 300-400 users that means
roughly 40-50 sessions are trying to access each LRU queue concurrently during
peak load. I would be surprised if your BR value isn't in double digits -
perhaps as high as 25 or so! If it is, increase LRUS to the maximum value (128
for 32-bit versions, 256 for 64-bit versions). You should always set CLEANERS
>= LRUS, so increase that value as well.
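For reference, that change amounts to nothing more than these two ONCONFIG
lines (use 128 if your 9.40 port turns out to be 32-bit, 256 if it is 64-bit;
the new values take effect the next time the instance is restarted):

LRUS            128        # Number of LRU queues (was 8)
CLEANERS        128        # Number of buffer cleaner processes (was 8, keep CLEANERS >= LRUS)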
BUFFERS = 90000 - Unless every one of those 300+ users is accessing the same
data, this looks low. BTR will start you in the right direction on that.
Also look at onstat -P over time during peak loads to see which partnums are
hogging the cache and whether a small number of partnums are trading buffers
back and forth between them - all signs that you need more buffers.
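One cheap way to collect that evidence is a snapshot loop during the peak
window - the sketch below uses made-up paths and an arbitrary 10-minute
interval, so adjust to taste:

#!/bin/ksh
# Capture onstat -P periodically during peak load so you can compare afterwards
# which partnums dominate the buffer pool (log directory and interval are assumptions).
LOGDIR=/opt/informix/Logs/onstat_P
mkdir -p $LOGDIR
while true
do
    onstat -P > $LOGDIR/onstat_P.$(date +%Y%m%d_%H%M%S)
    sleep 600    # every 10 minutes
done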
Are you using KAIO? If not, monitor onstat -g iov to make sure that at least
one AIO VP has an io/wup value below 1.0. If none does, you can take advantage
of more AIO VPs. These are VERY low overhead (they sleep if they are not busy),
so add a bunch if needed and cull them later (decrease by the number of VPs
with io/wup < 0.5 or so).
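Since you already have a logging cron job, adding something like this to it
(script and log file names are just examples) will give you the io/wup history
you need to decide how many AIO VPs to add now and cull later:

#!/bin/ksh
# log_iov.ksh (hypothetical) - append a timestamped onstat -g iov snapshot;
# run it from cron every few minutes during the peak window.
{ date; onstat -g iov; echo; } >> /opt/informix/Logs/onstat_iov.log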
Art S. Kagel
----- Original Message -----
From: Chris Salch <ids@iiug.org>
At: 8/31 13:23:45
We have an HP9000/L2000 with 4 64-bit PA-RISC chips running at 440MHz and
5 Gigs of RAM. We're running Informix 9.40.HC5 on HP-UX 11i. We ran
into an issue on Monday where our database engine seemed to grind to a
halt. The machine itself responded reasonably to everything but
database queries. It seemed like anything having to do with extracting
or querying data in the database took exceptionally long to respond;
interacting with onstat did not. Unfortunately, no one thought to save
off a copy of what onstat dumped during the incident.
Our application mix is a touch heterogeneous: we have everything from
Cognos Impromptu and MS Access "applications" to a massive 4GL-based
application using shm to connect, plus Perl DBI connections, all running
off the same database. The majority of users probably connect via shm
through the 4GL application, but MS Access makes an excessive number of
connections per user of a given application. Everything but the MS
Access and Cognos Impromptu reports/applications runs on the same
machine as the engine. This machine also runs an Apache web server to
handle some CGI scripts, mostly Perl code and the source of most of our
Perl DBI connections.
When our problem occurred, we had a reasonably heavy load from all
around. The machine had approximately 560 or so processes running in
total, some 300-400 connections appearing in onstat at any one instant,
and three oninit processes eating up 90% of the CPU time solid for about
an hour. Simple queries would take 2 to 3 minutes to complete. There
were no apparent hardware problems with the equipment, no excessive
numbers of locks or full logical logs, and no single process or small
group of processes that looked like it was doing anything out of the
ordinary.
Fortunately, everything seemed to calm down again about an hour later.
As a note, that day represented an exceptionally high load in comparison
to our normal operations, and this could all be related to having maxed
out what our hardware could handle.
Is there any tuning that might be suggested for our system? I've
attached a copy of our config file. (I've since increased the frequency
of my logging cron job.)
CONFIG FILE:
#**************************************************************************
#
# IBM Corporation
#
# Title: onconfig.std
# Description: IBM Informix Dynamic Server Configuration Parameters
#
#**************************************************************************
# Root Dbspace Configuration
ROOTNAME root # Root dbspace name
ROOTPATH /opt/informix/dev/root.1 # Path for device containing root dbspace
ROOTOFFSET 0 # Offset of root dbspace into device (Kbytes)
ROOTSIZE 1024000 # Size of root dbspace (Kbytes)
# Disk Mirroring Configuration Parameters
MIRROR 1 # Mirroring flag (Yes = 1, No = 0)
MIRRORPATH /opt/informix/dev/root.1-m # Path for device containing mirrored root
MIRROROFFSET 0 # Offset into mirrored device (Kbytes)
# Physical Log Configuration
PHYSDBS root # Location (dbspace) of physical log
PHYSFILE 50000 # Physical log file size (Kbytes)
# Logical Log Configuration
LOGFILES 60 # Number of logical log files
LOGSIZE 10000 # Logical log size (Kbytes)
# Diagnostics
MSGPATH /opt/informix/Logs/cars.log # System message log file path
CONSOLE /dev/console # System console message path
# To automatically backup logical logs, edit alarmprogram.sh and set
# BACKUPLOGS=Y
ALARMPROGRAM /opt/informix/etc/log_full.sh # Alarm program path
TBLSPACE_STATS 1 # Maintain tblspace statistics
# System Archive Tape Device
TAPEDEV /dev/rmt/1m # Tape device path
TAPEBLK 32 # Tape block size (Kbytes)
TAPESIZE 0 # Maximum amount of data to put on tape (Kbytes)
# Log Archive Tape Device
LTAPEDEV /dev/rmt/0m # Log tape device path
LTAPEBLK 32 # Log tape block size (Kbytes)
LTAPESIZE 0 # Max amount of data to put on log tape (Kbytes)
# Optical
STAGEBLOB # Informix Dynamic Server staging area
# System Configuration
SERVERNUM 0 # Unique id corresponding to a OnLine instance
DBSERVERNAME paul # Name of default database server
DBSERVERALIASES carsitcp # List of alternate dbservernames
NETTYPE ipcshm,2,500,CPU # Configure poll thread(s) for nettype
NETTYPE soctcp,1,100,NET # Configure poll thread(s) for nettype
DEADLOCK_TIMEOUT 60 # Max time to wait of lock in distributed env.
RESIDENT 1 # Forced residency flag (Yes = 1, No = 0)
MULTIPROCESSOR 1 # 0 for single-processor, 1 for multi-processor
NUMCPUVPS 3 # Number of user (cpu) vps
SINGLE_CPU_VP 0 # If non-zero, limit number of cpu vps to one
NOAGE 1 # Process aging
AFF_SPROC 0 # Affinity start processor
AFF_NPROCS 0 # Affinity number of processors
# Shared Memory Parameters
LOCKS 256000 # Maximum number of locks
BUFFERS 90000 # Maximum number of shared buffers
NUMAIOVPS 8 # Number of IO vps
PHYSBUFF 32 # Physical log buffer size (Kbytes)
LOGBUFF 32 # Logical log buffer size (Kbytes)
LOGSMAX 120 # Maximum number of logical log files
CLEANERS 8 # Number of buffer cleaner processes
SHMBASE 0x0 # Shared memory base address
SHMVIRTSIZE 196608 # initial virtual shared memory segment size
SHMADD 16384 # Size of new shared memory segments (Kbytes)
SHMTOTAL 0 # Total shared memory (Kbytes). 0=>unlimited
CKPTINTVL 900 # Check point interval (in sec)
LRUS 8 # Number of LRU queues
LRU_MAX_DIRTY 4 # LRU percent dirty begin cleaning limit
LRU_MIN_DIRTY 2 # LRU percent dirty end cleaning limit
TXTIMEOUT 0x12c # Transaction timeout (in sec)
STACKSIZE 64 # Stack size (Kbytes)
# Dynamic Logging
# DYNAMIC_LOGS:
# 2 : server automatically add a new logical log when necessary. (ON)
# 1 : notify DBA to add new logical logs when necessary. (ON)
# 0 : cannot add logical log on the fly. (OFF)
#
# When dynamic logging is on, we can have higher values for LTXHWM/LTXEHWM,
# because the server can add new logical logs during long transaction rollback.
# However, to limit the number of new logical logs being added, LTXHWM/LTXEHWM
# can be set to smaller values.
#
# If dynamic logging is off, LTXHWM/LTXEHWM need to be set to smaller values
# to avoid long transaction rollback hanging the server due to lack of logical
# log space, i.e. 50/60 or lower.
DYNAMIC_LOGS 0
LTXHWM 45
LTXEHWM 55
# System Page Size
# BUFFSIZE - OnLine no longer supports this configuration parameter.
# To determine the page size used by OnLine on your platform
# see the last line of output from the command, 'onstat -b'.
# Recovery Variables
# OFF_RECVRY_THREADS:
# Number of parallel worker threads during fast recovery or an offline restore.
# ON_RECVRY_THREADS:
# Number of parallel worker threads during an online restore.
OFF_RECVRY_THREADS 10 # Default number of offline worker threads
ON_RECVRY_THREADS 1 # Default number of online worker threads
# Data Replication Variables
DRINTERVAL 30 # DR max time between DR buffer flushes (in sec)
DRTIMEOUT 30 # DR network timeout (in sec)
DRLOSTFOUND /opt/informix/etc/dr.lostfound # DR lost+found file path
# CDR Variables
CDR_EVALTHREADS 1,2 # evaluator threads (per-cpu-vp,additional)
CDR_DSLOCKWAIT 5 # DS lockwait timeout (seconds)
CDR_QUEUEMEM 4096 # Maximum amount of memory for any CDR queue (Kbytes)
CDR_NIFCOMPRESS 0 # Link level compression (-1 never, 0 none, 9 max)
CDR_SERIAL 0 # Serial Column Sequence
CDR_DBSPACE # dbspace for syscdr database
CDR_QHDR_DBSPACE # CDR queue dbspace (default same as catalog)
CDR_QDATA_SBSPACE # List of CDR queue smart blob spaces
# CDR_MAX_DYNAMIC_LOGS
# -1 => unlimited
# 0 => disable dynamic log addition
# >0 => limit the no. of dynamic log additions with the specified value.
# Max dynamic log requests that CDR can make within one server session.
CDR_MAX_DYNAMIC_LOGS 0 # Dynamic log addition disabled by default
# Backup/Restore variables
BAR_ACT_LOG /tmp/bar_act.log # ON-Bar Log file - not in /tmp please
BAR_DEBUG_LOG /tmp/bar_dbug.log # ON-Bar Debug Log - not in /tmp please
BAR_MAX_BACKUP 0
BAR_RETRY 1
BAR_NB_XPORT_COUNT 10
BAR_XFER_BUF_SIZE 31
RESTARTABLE_RESTORE off
BAR_PROGRESS_FREQ 0
# Informix Storage Manager variables
ISM_DATA_POOL ISMData
ISM_LOG_POOL ISMLogs
# Read Ahead Variables
RA_PAGES # Number of pages to attempt to read ahead
RA_THRESHOLD # Number of pages left before next group
# DBSPACETEMP:
# OnLine equivalent of DBTEMP for SE. This is the list of dbspaces
# that the OnLine SQL Engine will use to create temp tables etc.
# If specified it must be a colon separated list of dbspaces that exist
# when the OnLine system is brought online. If not specified, or if
# all dbspaces specified are invalid, various ad hoc queries will create
# temporary files in /tmp instead.
DBSPACETEMP temp0:temp1 # Default temp dbspaces
# DUMP*:
# The following parameters control the type of diagnostics information which
# is preserved when an unanticipated error condition (assertion failure) occurs
# during OnLine operations.
# For DUMPSHMEM, DUMPGCORE and DUMPCORE 1 means Yes, 0 means No.
DUMPDIR /tmp # Preserve diagnostics in this directory
DUMPSHMEM 1 # Dump a copy of shared memory
DUMPGCORE 0 # Dump a core image using 'gcore'
DUMPCORE 0 # Dump a core image (Warning:this aborts OnLine)
DUMPCNT 1 # Number of shared memory or gcore dumps for
# a single user's session
FILLFACTOR 90 # Fill factor for building indexes
# method for OnLine to use when determining current time
USEOSTIME 0 # 0: use internal time(fast), 1: get time from OS(slow)
# Parallel Database Queries (pdq)
MAX_PDQPRIORITY 100 # Maximum allowed pdqpriority
DS_MAX_QUERIES # Maximum number of decision support queries
DS_TOTAL_MEMORY # Decision support memory (Kbytes)
DS_MAX_SCANS 1048576 # Maximum number of decision support scans
DATASKIP off # List of dbspaces to skip
# OPTCOMPIND
# 0 => Nested loop joins will be preferred (where
# possible) over sortmerge joins and hash joins.
# 1 => If the transaction isolation mode is not
# "repeatable read", optimizer behaves as in (2)
# below. Otherwise it behaves as in (0) above.
# 2 => Use costs regardless of the transaction isolation
# mode. Nested loop joins are not necessarily
# preferred. Optimizer bases its decision purely
# on costs.
OPTCOMPIND 0 # To hint the optimizer
DIRECTIVES 1 # Optimizer DIRECTIVES ON (1/Default) or OFF (0)
ONDBSPACEDOWN 2 # Dbspace down option: 0 = CONTINUE, 1 = ABORT, 2 = WAIT
LBU_PRESERVE 1 # Preserve last log for log backup
OPCACHEMAX 0 # Maximum optical cache size (Kbytes)
# HETERO_COMMIT (Gateway participation in distributed transactions)
# 1 => Heterogeneous Commit is enabled
# 0 (or any other value) => Heterogeneous Commit is disabled
HETERO_COMMIT 0
SBSPACENAME # Default smartblob space name - this is where blobs
# go if no sbspace is specified when the smartblob is
# created. It is also used by some datablades as
# the location to put their smartblobs.
SYSSBSPACENAME # Default smartblob space for use by the Informix
# Server. This is used primarily for Informix Server
# system statistics collection.
BLOCKTIMEOUT 3600 # Default timeout for system block
SYSALARMPROGRAM /opt/informix/etc/evidence.sh # System Alarm program path
# Optimization goal: -1 = ALL_ROWS(Default), 0 = FIRST_ROWS
OPT_GOAL -1
ALLOW_NEWLINE 0 # embedded newlines (Yes = 1, No = 0 or anything but 1)
#
# The following are default settings for enabling Java in the database.
# Replace all occurrences of /usr/informix with the value of /opt/informix.
#VPCLASS jvp,num=1 # Number of JVPs to start with
JVPJAVAHOME /opt/informix/extend/krakatoa/jre # JRE installation root directory
JVPHOME /opt/informix/extend/krakatoa # Krakatoa installation directory
JVPPROPFILE /opt/informix/extend/krakatoa/.jvpprops # JVP property file
JVPLOGFILE /opt/informix/Logs/jvp.log # JVP log file.
JDKVERSION 1.3 # JDK version supported by this server
# The path to the JRE libraries relative to JVPJAVAHOME
JVPJAVALIB /lib/PA_RISC
# The JRE libraries to use for the Java VM
JVPJAVAVM hpi:server:java:net:zip:jpeg
# use JVPARGS to change Java VM configuration
#To display jni call
#JVPARGS -verbose:jni
# Classpath to use upon Java VM start-up (use _g version for debugging)
#JVPCLASSPATH /opt/informix/extend/krakatoa/krakatoa_g.jar:/opt/informix/extend/krakatoa/jdbc_g.jar
JVPCLASSPATH /opt/informix/extend/krakatoa/krakatoa.jar:/opt/informix/extend/krakatoa/jdbc.jar