The workaround for the problem of a lock file being left behind is to place
the session's pid in the file. If you open the lock file and the pid in it
belongs to a live task (kill -0 <pid> returns an error if the pid does not
exist), then the lock is valid and you can safely exit. If the pid does not
exist, the lock file is a leftover and you can overwrite it with your own
pid, effectively locking other iterations out. There is a potential race
condition, but this operation should take VERY little time, and it is
unlikely that another log will fill and launch another copy of the alarm
program in that time.
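
A minimal sketch of that pid check, in plain sh. The lock file path and the
function names are made up for illustration, and it assumes every copy of
the alarm program runs as the same user (kill -0 also fails when you lack
permission to signal the process). The short race window mentioned above is
still there between the check and the write.

```shell
#!/bin/sh
# Hypothetical lock file location; not taken from any real alarm program.
LOCKFILE="${LOCKFILE:-/tmp/alarm_logbackup.lock}"

# acquire_lock: return 0 if we now own the lock, 1 if a live process owns it.
acquire_lock() {
    if [ -f "$LOCKFILE" ]; then
        oldpid=`cat "$LOCKFILE" 2>/dev/null`
        # kill -0 delivers no signal; it only reports whether the pid exists.
        if [ -n "$oldpid" ] && kill -0 "$oldpid" 2>/dev/null; then
            return 1            # lock is valid, another copy is running
        fi
        # Otherwise the file is a leftover from a dead run: fall through.
    fi
    # Race window: another copy could pass the same check right here.
    echo $$ > "$LOCKFILE"
    return 0
}

# release_lock: remove the lock file once the backup has finished.
release_lock() {
    rm -f "$LOCKFILE"
}
```

In the alarm program this would be used as "acquire_lock || exit 0" before
the ontape call and "release_lock" after it.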
Another thing you can do to prevent loss of log files is to follow these
steps in the alarm program:
- rm <LTAPEDEV>
- touch the new tapefile under the final name it should have, which includes
the first logical log number it will contain (taken from the args passed in)
- link the new tapefile to <LTAPEDEV>
- run ontape
In this way, if another copy of the alarm program starts, even if it happens
to get past everything you are doing to prevent it, it will simply delete
your link to <LTAPEDEV>. The actual archive file will be fine and safe, and
the first copy of ontape will still be able to write to it once it has
opened the file. The new copy of the alarm program should be passed the next
logical log number, so it will create a different file to link to <LTAPEDEV>
for its run. If the first iteration is still running, the second ontape will
error out, and either the first one will copy the next log file or some later
alarm program run will, so all is OK. That's how my alarm program in
utils4_ak works.
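
The steps above can be sketched roughly as follows. This is an illustration
only, not the code from utils4_ak: LTAPEDEV_PATH (the path configured as
LTAPEDEV), ARCHIVE_DIR, the "logbackup.<lognum>" file name and the
backup_log function are all hypothetical, and BACKUP_CMD stands for whatever
ontape invocation your alarm program already uses. It also assumes "link"
means a hard link, so both paths must be on the same filesystem.

```shell
#!/bin/sh
# Sketch of the rm / touch / link / ontape sequence.
# Caller is expected to set LTAPEDEV_PATH, ARCHIVE_DIR and BACKUP_CMD.

backup_log() {
    lognum="$1"                                  # first logical log number, from the args
    tapefile="$ARCHIVE_DIR/logbackup.$lognum"    # final name of the archive file

    rm -f "$LTAPEDEV_PATH"                       # drop whatever link is there now
    touch "$tapefile" || return 1                # create the real archive file
    ln "$tapefile" "$LTAPEDEV_PATH" || return 1  # hard link it to the LTAPEDEV path
    $BACKUP_CMD                                  # ontape writes through the link
}
```

Because the archive file has its own name, a second copy that removes the
<LTAPEDEV> link only removes one directory entry; the data written so far
stays under the final file name.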
Art
On Thu, Nov 6, 2008 at 10:27 AM, Martin Fuerderer <MARTINFU@de.ibm.com> wrote:
> Hi,
>
> one possible workaround is to implement
> a "lock file mechanism". This will then work more or less
> like a semaphore ...
>
> In the script (e.g. the alarm program section for ontape log backup),
> the first thing you would do is try to CREATE a file with a specific name.
> You have to do the create in a way that it will fail if the file already
> exists. For this you probably have to write a trivial program
> (e.g. in C using the system call "open()" with the correct flags).
>
> Only if the file creation was successful will the script continue to
> actually do the log file backup. After the log file backup is
> complete, in the script simply remove the file.
>
> If the file creation was not successful (because the file already
> exists), then do nothing (and exit). Here nothing needs to be done
> because the running log backup process will back up all eligible
> log files. And if it misses the last one (because it became full
> just between ending the log backup and removing the file), then
> the next log backup to be started will back it up, too.
>
> A concern with this approach is to make sure that the lock file
> is always removed properly. That means foremost no exit from the
> script without removing the lock file ...
> If (for whatever reason) it is left over (i.e. not removed correctly),
> then log backup will not happen because it would always think
> one is already in progress. Manually removing this leftover lock
> file will resolve that situation - but it would require manual
> interaction ... :-(
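
A possible shell-only version of the create-or-fail step described above, as
a sketch rather than a tested alarm program. It swaps the small C program for
the shell's noclobber option: with "set -C", a ">" redirection fails if the
file already exists, and common shells (bash, dash, ksh) implement that with
an exclusive create. The lock file path and the function name are made up.

```shell
#!/bin/sh
# Hypothetical lock file location.
LOCKFILE="${LOCKFILE:-/tmp/alarm_logbackup.lock}"

# try_create_lock: succeed only if the lock file did not exist before.
try_create_lock() {
    # noclobber (set -C) makes ">" refuse to overwrite an existing file.
    # The subshell keeps the option from leaking into the caller.
    ( set -C; : > "$LOCKFILE" ) 2>/dev/null
}

# Usage sketch inside the alarm program:
#   try_create_lock || exit 0        # a log backup is already in progress
#   trap 'rm -f "$LOCKFILE"' EXIT    # remove the lock on every normal exit path
#   $BACKUP_CMD
```

The trap covers the "no exit without removing the lock file" concern for
ordinary exits; a kill -9 or a machine crash would still leave the file
behind.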
>
> Regards,
> Martin
> --
> Martin Fuerderer
> IBM Informix Development Munich, Germany
> Information Management
>
> IBM Deutschland Research & Development GmbH
> Chairman of the Supervisory Board: Martin Jetter
> Board of Management: Erich Baier
> Corporate Seat: Boeblingen, Germany
> Reg.-Gericht: Amtsgericht Stuttgart, HRB 243294
>
> "VIKAS HIVARKAR" <vikas.hivarkar@gmail.com>
> Sent by: ids-bounces@iiug.org
> 06.11.2008 12:34
> Please respond to: ids@iiug.org
>
> To: ids@iiug.org
> cc:
> Subject: Re: Logical Log files MISSING [13926]
>
> Hello Martin,
>
> Your note: "this is not 100% error proof as there still is a gap between
> checking for a running "ontape -a -d" process and starting a new one."
> came true, and the server missed backing up one more logical log yesterday.
>
> I tried
>
> if ( `test x${BACKUPLOGS} = xY` ) then
>     ## $BACKUP_CMD 2>&1 >> /dev/null
>     COUNT=`ps -aef | grep "ontape -a -d" | grep -v grep | wc -l`
>     if [ $COUNT -eq 0 ]; then
>         $BACKUP_CMD 2>&1 | tee -a $LOG > $CPLOG
>         EXIT_STATUS=$?
>
> It worked well until yesterday, when we missed backing up yet another
> logical log; the description was written to my log file along with
> error = 2.
>
> Only 11.10 FC3 will help us when it arrives, and until then we have to
> live with this bug.
>
> Could you please explain what situation made your note come true?
> Yes, there was heavy load at the time when this happened.
>
> Is there any improvement I can make to the workaround in alarmprogram.sh?
>
> Regards
> Vikas
>
> ------------------------------------------------------------
> ----------------------------------------------------
> Re: Logical Log files MISSING
>
> Posted By: Martin Fuerderer
> Date: Monday, 13 October 2008, at 5:05 a.m.
>
> In Response To: Re: Logical Log files MISSING (Davis Kwong)
>
> Hi,
>
> most likely (> 98%), the problem is what Davis described
> in his answer (see below). If you want to be 100% sure,
> then you have to modify your alarm program script (or
> whatever script/program actually executes the ontape
> command) to capture the output of the ontape processes
> in a file. If you see messages indicating that some
> temporary file cannot be renamed and/or removed
> because it does not exist (errno 2), then this is the
> described problem.
>
> Besides taking a level-0 archive when a log file backup
> loss has occurred, you may consider the
> following to improve the situation:
>
> a) In your script/program, add some code to first
> check whether an "ontape -a -d" command is already
> being executed (e.g. using "ps ..." output). If so, then
> do not start a new "ontape -a -d". The one that is
> running will back up all the used log files anyway
> (except the current log file).
> Note: this is not 100% error proof, as there still is a
> gap between checking for a running "ontape -a -d"
> process and starting a new one.
>
> b) Capture all ontape output (stdout and stderr) to a
> file and put some additional code in your alarm program
> to check this file for the errors described above.
> If such an error is detected, write a warning (to online.log
> or whatever) because now surely it is time to do a
> level-0 archive ...
> Note: this may also not be 100% error proof. An error
> may still escape your attention. It highly depends on the
> implementation of the error checking/scanning ...
>
> c) The respective defect is: idsdb00165611 (for IDS 11.10).
> While in the upcoming 11.50.xC3 this is fixed already,
> you may contact IBM Informix Tech Support (check
> your contract) to get a fix of this problem for
> your IDS 11.10 ...
>
> Regards,
> Martin
>
> ids-bounces@iiug.org wrote on 10.10.2008 21:55:01:
>
> > ids-bounces@iiug.org wrote on 10/10/2008 10:01:44 AM:
> >
> > > Hello,
> > >
> > > IDS 11.10.FC2W4 on solaris 10
> > >
> > > Ontape is used for backing up logical log files to a directory via
> > > the alarmprogram.sh script.
> > >
> > > I see that many logical log files are missing from the logical log
> > > backup directory.
> > >
> > > -rw-rw---- 1 informix informix 20578304 Oct 7 23:11 proddb_1_Log0000020073
> > > -rw-rw---- 1 informix informix 20578304 Oct 7 23:11 proddb_1_Log0000020074
> > > -rw-rw---- 1 informix informix 20578304 Oct 7 23:11 proddb_1_Log0000020075
> > > -rw-rw---- 1 informix informix 20578304 Oct 7 23:11 proddb_1_Log0000020076
> > > -rw-rw---- 1 informix informix 20578304 Oct 7 23:11 proddb_1_Log0000020077
> > > -rw-rw---- 1 informix informix 20578304 Oct 7 23:11 proddb_1_Log0000020078
> > > -rw-rw---- 1 informix informix 20578304 Oct 7 23:12 proddb_1_Log0000020079
> > > -rw-rw---- 1 informix informix 20578304 Oct 7 23:12 proddb_1_Log0000020080
> > > -rw-rw---- 1 informix informix 20578304 Oct 7 23:12 proddb_1_Log0000020081
> > > -rw-rw---- 1 informix informix 20578304 Oct 7 23:12 proddb_1_Log0000020082
> > > -rw-rw---- 1 informix informix 20578304 Oct 7 23:12 proddb_1_Log0000020083
> > > -rw-rw---- 1 informix informix 20578304 Oct 7 23:12 proddb_1_Log0000020085
> > > -rw-rw---- 1 informix informix 20578304 Oct 7 23:12 proddb_1_Log0000020086
> > > -rw-rw---- 1 informix informix 20578304 Oct 7 23:12 proddb_1_Log0000020087
> > > -rw-rw---- 1 informix informix 20578304 Oct 7 23:12 proddb_1_Log0000020088
> > >
> > > I checked onstat -l and found the log by its uniqid:
> > >
> > > a3180d77e0 142 U-B---- 20066 5:1290053 10000 10000 100.00
> > > a3180d7848 143 U-B---- 20067 5:1300053 10000 10000 100.00
> > > a3180d78b0 144 U-B---- 20068 5:1310053 10000 10000 100.00
> > > a3180d7918 145 U-B---- 20069 5:1320053 10000 10000 100.00
> > > a3180d7980 146 U-B---- 20070 5:1330053 10000 10000 100.00
> > > a3180d79e8 147 U-B---- 20071 5:1340053 10000 10000 100.00
> > > a3180d7a50 148 U-B---- 20072 5:1350053 10000 10000 100.00
> > > a3180d7ab8 149 U-B---- 20073 5:1360053 10000 10000 100.00
> > > a3180d7b20 150 U-B---- 20074 5:1370053 10000 10000 100.00
> > > a3180d7b88 151 U-B---- 20075 5:1380053 10000 10000 100.00
> > > a3180d7bf0 152 U-B---- 20076 5:1390053 10000 10000 100.00
> > > a3180d7c58 153 U-B---- 20077 5:1400053 10000 10000 100.00
> > > a3180d7cc0 154 U-B---- 20078 5:1410053 10000 10000 100.00
> > > a3180d7d28 155 U-B---- 20079 5:1420053 10000 10000 100.00
> > > a3180d7d90 156 U-B---- 20080 5:1430053 10000 10000 100.00
> > > a3180d7df8 157 U-B---- 20081 5:1440053 10000 10000 100.00
> > > a3180d7e60 158 U-B---- 20082 5:1450053 10000 10000 100.00
> > > a3180d7ec8 159 U-B---- 20083 5:1460053 10000 10000 100.00
> > > a3180d7f30 160 U-B---- 20084 5:1470053 10000 10000 100.00
> > > a3180d7f98 161 U-B---- 20085 5:1480053 10000 10000 100.00
> > > a3180d8028 162 U-B---- 20086 5:1490053 10000 10000 100.00
> > > a3180d8090 163 U-B---- 20087 5:1500053 10000 10000 100.00
> > > a3180d80f8 164 U-B---- 20088 5:1510053 10000 10000 100.00
> > >
> > > So logical log 20084 is missing from the logical log backup directory,
> > > and there are many more.
> > > Please let me know what the reason could be and what else I have to
> > > look for.
> >
> > This is a bug that is fixed in 11.50.xC3.
> >
> > The problem is that when log files fill rapidly (in a matter of seconds)
> > and alarmprogram.sh is used to perform automatic ontape backup to a
> > directory, alarmprogram.sh may invoke ontape while another ontape is
> > already running and backing up a log file to the directory. This can
> > cause the loss of the log file that was being backed up by the ontape
> > that was already running.
> >
> > The current workaround is to take a level 0 archive when this is
> > detected. Since you are on 11.10, I'll see if we can backport the fix
> > to 11.10.
> >
> > Thanks. Davis.
> >
> > >
> > > The logical log directory is RWX for user and group informix and for
> > > no one other than the DBAs, so at first I don't think anyone would
> > > delete that log.
> > >
> > > Any idea what I should do about this?
> > >
> > >
> > >
>
>
>
> *******************************************************************************
>
> Forum Note: Use "Reply" to post a response in the discussion forum.
>
--
Art S. Kagel
Oninit (www.oninit.com)
IIUG Board of Directors (art@iiug.org)
Disclaimer: Please keep in mind that my own opinions are my own opinions and
do not reflect on my employer, Oninit, the IIUG, nor any other organization
with which I am associated either explicitly or implicitly. Neither do
those opinions reflect those of other individuals affiliated with any entity
with which I am affiliated nor those of the entities themselves.