
DataGuard Switchover Recipe MRC

Currently there is a status of UNRESOLVABLE GAP
on databases DBFSXD01 and DBFSXD02.

mrcche1de [DBFSXD021]> sql

SQL*Plus: Release 12.1.0.2.0 Production on Mon Jul 23 14:54:50 2018
Copyright (c) 1982, 2014, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.

Total System Global Area 8589934592 bytes
Fixed Size                  7663544 bytes
Variable Size            1862271048 bytes
Database Buffers         6710886400 bytes
Redo Buffers                9113600 bytes
ORA-01105: mount is incompatible with mounts by other instances
ORA-01677: standby file name conversion parameters differ from other instance

Listener status on this node:

Service "DBFSXD02.mrcconsulting.com" has 3 instance(s).
Instance "DBFSXD021", status BLOCKED, has 1 handler(s) for this service...
Instance "DBFSXD022", status READY, has 1 handler(s) for this service...
Instance "DBFSXD023", status READY, has 1 handler(s) for this service...
Note the BLOCKED status on instance DBFSXD021.

During rolling patching of the cluster:

First stop the service, then disable the service, then shut down the instance.

For non-rolling patching we will stop the whole database.

To stop service and instance on cluster

   srvctl stop service -d <DB_NAME> -s <SERVICE_NAME> -i <INSTANCE_NAME>
   srvctl stop service -d DBFSXB01 -s <SERVICE_NAME>  -i DBFSXB011
   srvctl disable service -d <DB_NAME> -s <SERVICE_NAME> -i <INSTANCE_NAME>
   srvctl stop instance -d <DB_NAME> -n <HOSTNAME>

To start service and instance on cluster

   srvctl start instance -d <DB_NAME> -n <HOSTNAME>
   srvctl enable service -d <DB_NAME> -s <SERVICE_NAME> -i <INSTANCE_NAME>
   srvctl start service -d <DB_NAME> -s <SERVICE_NAME> -i <INSTANCE_NAME>
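The rolling stop sequence above can be sketched as a small shell driver that assembles the commands as data first, so they can be reviewed (or dry-run) before anything is actually stopped. The service name dbfs_svc below is a placeholder, not taken from this environment; the DB, instance, and host names are examples from this cluster.

```shell
# Hedged sketch of the rolling stop sequence; review output before running.
# DB/INST/HOST are examples from this cluster, SVC is a placeholder.
DB=DBFSXB01
SVC=dbfs_svc
INST=DBFSXB011
HOST=mrcche1de

STOP_SEQ="srvctl stop service -d $DB -s $SVC -i $INST
srvctl disable service -d $DB -s $SVC -i $INST
srvctl stop instance -d $DB -n $HOST"

# Print for review; swap printf for eval on each line to actually run it.
printf '%s\n' "$STOP_SEQ"
```

The start sequence is the same idea in reverse order (start instance, enable service, start service).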

Then we stop the cluster as well.
This is for rolling patching only.

For non-rolling patching, the commands are:

Stop DB and service on the cluster

   srvctl stop service -d <DB_NAME>
   srvctl disable service -d <DB_NAME> -s <SERVICE_NAME>
   srvctl stop database -d <DB_NAME>

To start DB and instance on cluster

   srvctl start database -d <DB_NAME>
   srvctl enable service -d <DB_NAME> -s <SERVICE_NAME>
   srvctl start service -d <DB_NAME>

After stopping the database/service we can stop CRS (the cluster).
On the node running MRP:

   alter database recover managed standby database cancel;

To disconnect, don't run anything else; just stop the service and the database.

Data Guard is looking for thread 1 archive sequence 926, which was applied earlier and then deleted,
so we need to re-image the standby.
As we don't have a backup we need to restore the standby from the primary;
this database is only used for DBFS.

Also increase archive retention from 7 days to 15 days so the archives are kept longer.
Estimated time is about 2 hours if there are no issues.

   sqlplus "/as sysdba"
   startup nomount pfile='/oracle/admin/DBFSXB01/pfile/initstdby.ora'

and then:

   rman target sys/password@DBFSXD01 auxiliary sys/password@DBFSXB01
   duplicate target database for standby from active database;
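Expanded slightly, the duplicate step often looks like the sketch below. The explicit channels and NOFILENAMECHECK are common additions I am assuming here, not confirmed from this environment; NOFILENAMECHECK is needed when the standby reuses the primary's file paths on a different host.

```shell
# Sketch only: channel names are arbitrary, passwords are placeholders.
rman target sys/password@DBFSXD01 auxiliary sys/password@DBFSXB01 <<EOF
run {
  allocate channel prm1 type disk;
  allocate auxiliary channel stb1 type disk;
  duplicate target database for standby
    from active database
    nofilenamecheck;
}
EOF
```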

RMAN completed

Now both primary and standby are in sync.

Primary : DBFSXD01
Standby : DBFSXB01

We already recreated the standby for D01 to correct the issue,

rebuilding the standby from the primary using RMAN duplicate.

If we have a failure, how long will a duplicate from active take?
The actual copy took less than 30 minutes, as the database is very small.
Total elapsed time was around 2 hours.

I think that if we get the message we saw at first, we will have to rebuild that database.
To prevent it from happening again in the future
we will need to make sure we shut down and start up properly.

Increase archive retention to 15 days, so that we don't need to rebuild as long as archives are not deleted within that 15-day window.
Or we make sure they are not deleted before being applied on the standby side.
The current script already checks that an archive is applied and more than 7 days old;
in this case the archive was applied and then got deleted after 7 days.

Due to an instance crash, recovery was looking for an instance 1 archive while recovering instance 2 archives for crash recovery,
but the instance 1 archive had been deleted because it had already been applied.

Increasing archive retention to 15 days will avoid this issue.
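Alongside the longer retention, RMAN can be configured to never delete an archive log that has not yet been applied on the standby, which addresses the root cause directly. A hedged sketch, assuming RMAN drives the archive deletion here:

```shell
RMAN> CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;
```

With this policy in place, DELETE ARCHIVELOG commands and FRA space pressure will skip logs the standby has not yet applied.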

On node 1 of the VA cluster:

I should first take down the MRP process on the standby side.

And make sure there is no lag between primary and standby before the switchover.
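One way to confirm there is no lag is to query v$dataguard_stats on the standby (a sketch; the view and metric names are as in 11g/12c):

```sql
-- Run on the standby: both values should be +00 00:00:00 (or close)
-- before proceeding with the switchover.
SELECT name, value, time_computed
  FROM v$dataguard_stats
 WHERE name IN ('transport lag', 'apply lag');
```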

First we need to stop all instances except the first node on each side, both primary and standby; normally MRP runs on the standby's first node.

After stopping the other instances on both primary and standby:

Run these commands on the primary:

   SELECT DATABASE_ROLE from v$DATABASE;
   SELECT SWITCHOVER_STATUS FROM V$DATABASE;
   ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY WITH SESSION SHUTDOWN;

After this is done we need to run the commands below on the standby side:

   SELECT SWITCHOVER_STATUS FROM V$DATABASE;
   ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;
   alter database open;

Then modify the CRS configuration with SRVCTL to reflect the new primary and standby roles.

Run this command on the new primary:

   srvctl modify database -d <new_primary> -r primary -s open 

Now we need to start the instances on all nodes of the new primary with SRVCTL.
On the new standby we need to stop the existing instances and set the CRS configuration to the standby role using the command below:

   srvctl modify database -d <new_standby_db_name> -r physical_standby -s open

Now start the database with SRVCTL.

Recap again:

Unmount all DBFS filesystems first on each node:

fusermount -uz /dbfs_test01

First on the standby:

srvctl stop instance -d DBFSXB01 -i DBFSXB012,DBFSXB013

Second on the primary:

srvctl stop instance -d DBFSXD01 -i DBFSXD012,DBFSXD013

Once you have started all instances with SRVCTL,
we then need to start MRP on the standby.


srvctl modify database -d DBFSXB01 -r physical_standby -s open

Here we are modifying the CRS resource (the "shell") that still reflects the database's previous role.


srvctl modify database -d DBFSXD01 -r physical_standby -s open

D01 and B02

D01 primary for B01

D02 primary for B02

I had failed over D01.

D01 Primary for B01

We don't need to do anything with B01 on the TX side; we just need to make D01 the standby with this command.
Previously D01 was the primary database.


srvctl modify database -d DBFSXD01 -r physical_standby -s open

then


srvctl start database -d DBFSXD01

Then we need to start MRP on just one node.

And we need to open the new primary with the command below; once the DB is open, then stop it.


srvctl modify database -d DBFSXB01 -r primary -s open

Then we need to start the DB on all nodes with SRVCTL,
and then we need to start MRP on the standby side.

mrche1de [DBFSXB011]-> srvctl status database -d DBFSXB01
Instance DBFSXB011 is running on node mrche1de
Instance DBFSXB012 is running on node mrche1df
Instance DBFSXB013 is running on node mrche1dg

This is on the VA cluster.

Then we can start MRP on node 1 in TX on D01:


alter database recover managed standby database using current logfile disconnect from session;
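To confirm MRP is actually up and applying after this, a quick check on the standby (a sketch):

```sql
-- Expect one MRP0 row, typically with status APPLYING_LOG or WAIT_FOR_LOG.
SELECT inst_id, process, status, thread#, sequence#
  FROM gv$managed_standby
 WHERE process LIKE 'MRP%';
```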

DR_Mon.ksh


#!/bin/ksh
#set -x
Set_Env()
{
HOSTNAME=`hostname`
HR2SPOOL=/tmp/spool4drmonh2.rep;
MAXSPOOL=/tmp/spool4drmonmax.rep;
DRSPOOLFILE=/tmp/spool4dr.rep;
MREPORT="/oracle/admin/scripts/logs/dr_monitor_${ORA_SID}_at_`date +%H`.log"
rm -f $MREPORT
REMOTE_SID=DPODSB01
#NOTIFY2="mike.culp@gmail.com,test.culp@gmail.com"
NOTIFY2="mike.culp@gmail.com"
}

Get_Info()
{
sqlplus -s /<<EOF
set echo off verify off head off feed off pages 0 trimspool on;
spool $MAXSPOOL;
select   thread#, max(sequence#)
from     v\$archived_log val, v\$database vdb
where    val.resetlogs_change# = vdb.resetlogs_change#
group by thread# order by 1;
spool off;
set head on pages 100;
spool $HR2SPOOL;
select thread#, count(1) Count from v\$log_history where first_time>sysdate - 1 group by thread# order by 1;
spool off;
exit
EOF
}

Get_From_DR()
{
sqlplus -s sys/ora98dba@$REMOTE_SID as sysdba<<EOF
set echo off verify off head off feed off pages 0 trimspool on;
spool $DRSPOOLFILE;
select   thread#, max(sequence#)
from     v\$archived_log val, v\$database vdb
where    val.resetlogs_change# = vdb.resetlogs_change#
and      val.applied='YES'
group by thread# order by 1;
spool off;
exit
EOF
}

Compile_Data()
{
echo "" >> $MREPORT
echo "Following are the Current Log Numbers in Primary and DR" >> $MREPORT
echo "" >> $MREPORT
GTOTAL=0
echo "INST PRIMARY  DR  DIFF" >> $MREPORT
echo "~~~~~~~~~~~~~~~~~~~~~~~~" >> $MREPORT
cat $MAXSPOOL | while read TNO MNO
do
RMNO=`cat $DRSPOOLFILE | egrep -v "Lagging|Applied|Average" | grep " $TNO " | awk '{print $2}'`
let THEDIFF=$MNO-$RMNO
let GTOTAL=$GTOTAL+$THEDIFF;
echo "$TNO     $MNO    $RMNO   $THEDIFF" >> $MREPORT
done
echo "" >> $MREPORT
echo "DR is lagging behind $GTOTAL logs from Primary " >> $MREPORT
echo "" >> $MREPORT
echo "Following are Log Generation in last one day " >> $MREPORT
cat $HR2SPOOL >> $MREPORT
echo "" >> $MREPORT
}

###########
# M A I N #
###########
if [ -z "$1" ]
then
echo "

Usage : DR_Mon.ksh <DBNAME>

"
exit
fi

DBNAME=$1

if [ `ps -ef | grep -v grep | grep pmon | grep -c ${DBNAME}` -eq 0 ]
then
print "\n\nDon't see the database `tput smso`${DBNAME}`tput rmso` Running... It may be running on the local failover node. \n\n"
exit 1
else
export ORA_SID=`ps -ef|grep pmon|grep ${DBNAME}|awk '{print $8}'|sed 's/ora_pmon_//'`
fi

if [ -n "$ORA_SID" ]
then
export NEW_ORACLE_SID=${ORA_SID}; . ~/.profile   > /dev/null
fi

Set_Env;
Get_Info;
Get_From_DR;
Compile_Data;
cat $MREPORT | mailx -s "`date +%x_%X`:: $ORA_SID :DR Progress Detail" -r  dg.consultants@mrcconsulting.com "$NOTIFY2"
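The per-thread diff arithmetic in Compile_Data can be rehearsed standalone with sample spool data; the sketch below reproduces it without a database (the sequence numbers are made up, and the temp files stand in for $MAXSPOOL and $DRSPOOLFILE):

```shell
# Self-contained rehearsal of the Compile_Data lag arithmetic.
PRIM_FILE=$(mktemp); DR_FILE=$(mktemp)
printf '1 930\n2 915\n' > "$PRIM_FILE"   # primary max sequence per thread
printf '1 926\n2 915\n' > "$DR_FILE"     # max applied sequence on DR

GTOTAL=0
while read TNO MNO; do
  # look up the applied sequence for this thread on the DR side
  RMNO=$(awk -v t="$TNO" '$1 == t {print $2}' "$DR_FILE")
  THEDIFF=$((MNO - RMNO))
  GTOTAL=$((GTOTAL + THEDIFF))
  echo "thread $TNO: primary=$MNO dr=$RMNO diff=$THEDIFF"
done < "$PRIM_FILE"
echo "DR is lagging behind $GTOTAL logs"

rm -f "$PRIM_FILE" "$DR_FILE"
```

With the sample data above, thread 1 shows a diff of 4 and thread 2 a diff of 0, for a total lag of 4 logs.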

Dataguard Services Setup via SRVCTL

Starting with 11gR2, with the advent of the new Grid Infrastructure software, we have the ability to set up services in our databases that can now automatically be tied to the specific role that a database is playing in our Data Guard configuration.  Data Guard Broker must be configured and running to use this new feature.

This can be used for both RAC and single instance databases, as long as Grid Infrastructure has been installed.  The services must be set up using SRVCTL rather than DBMS_SERVICE.

The great advantage to this for administrators is to simplify the management of client connections to our databases.  We can now have clients connect to primary databases, logical standby databases, physical standby databases and snapshot databases.  The challenge is to ensure that a client always connects to the “right” database.  By connecting via services, and linking the service to only run in a database playing a specific role, we can simplify connection headaches.

The basic syntax for adding a service would be:

   srvctl add service -d <db_unique_name> -s <service_name>
     -l {PRIMARY | PHYSICAL_STANDBY | LOGICAL_STANDBY | SNAPSHOT_STANDBY}
     -y {AUTOMATIC | MANUAL}

The options for -l indicate which role the database must be in for the service to be running.  -y indicates whether the service should be started automatically when the database instance starts in the specified role.

However, if the database is already running, you would have to manually start the service using:

   srvctl start service -d <db_unique_name> -s <service_name>
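As a concrete illustration (the database and service names below are made-up examples, not from any real environment), a read-write service tied to the primary role and a read-only service tied to the physical standby role could be defined as:

```shell
srvctl add service -d boston -s sales_rw -l PRIMARY -y AUTOMATIC
srvctl add service -d boston -s sales_ro -l PHYSICAL_STANDBY -y AUTOMATIC
srvctl start service -d boston -s sales_rw
```

With the broker managing the configuration, only the services whose -l role matches the database's current role are started after a role transition, so clients of sales_rw always land on the primary.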

Oracle Dataguard Auto Block Recovery

In this post, I will demonstrate automatic block media recovery, introduced in 11g Data Guard. This feature can be used only if Active Data Guard is enabled. It enables the automatic repair of corrupt blocks transparently to the user and application. If corruption occurs on the primary database, block media recovery is performed automatically using a good copy of the block from the standby database, and vice versa.

OVERVIEW:
– Create and populate a test table on primary
– Corrupt two blocks containing table data
– Check that the blocks are corrupted using dbv
– Flush buffer cache and issue a query against the corrupt blocks
– The query succeeds as the blocks have been repaired automatically
– Verify automatic BMR using:
  . Alert log of primary
  . dbv utility

– IMPLEMENTATION –

– Create a test tablespace on primary


PRI> drop tablespace test including contents and datafiles;

create tablespace test datafile '/u01/app/oracle/oradata/orcl/test01.dbf' size 30M;

– CREATE A TEST USER ON PRIMARY

PRI> create user test 
     identified by test 
     default tablespace test;
grant connect, resource to test;

– CREATE AND POPULATE TABLE TEST_TAB IN TEST TABLESPACE ON PRIMARY
PRI> Create table test.test_tab as select * from hr.employees;
insert into test.test_tab select * from test.test_tab;
/
commit;

select table_name, 
       tablespace_name 
  from dba_tables 
 where table_name='TEST_TAB';
select count(*) 
  from test.test_tab;
– CHECK THE LEAST BLOCK OCCUPIED BY THE TABLE IN DATAFILE
PRI> select min(dbms_rowid.rowid_block_number(rowid))
       from test.test_tab;
MIN(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))
—————————————–
131
– corrupt two blocks
#dd if=/dev/zero of=/u01/app/oracle/oradata/orcl/test01.dbf bs=8192 conv=notrunc seek=136 count=2
– check if datafile corrupted

[oracle@host1 ~]$ dbv file=/u01/app/oracle/oradata/orcl/test01.dbf blocksize=8192
Page 136 is marked corrupt
Corrupt block relative dba: 0x01800088 (file 6, block 136)
Completely zero block found during dbv:
Page 137 is marked corrupt
Corrupt block relative dba: 0x01800089 (file 6, block 137)
Completely zero block found during dbv:
Total Pages Marked Corrupt : 2
– FLUSH BUFFER CACHE AND ISSUE QUERY ON CORRUPTED BLOCKS
– QUERY SUCCEEDS AS BLOCKS HAVE BEEN REPAIRED AUTOMATICALLY

PRI> alter system flush buffer_cache;
select count(*) from test.test_tab;
– CHECK THE ALERT LOG OF PRIMARY DATABASE
# tailf /u01/app/oracle/diag/rdbms/orcl/orcl/trace/alert_orcl.log
——————
Tue Dec 18 12:43:01 2012
ALTER SYSTEM: Flushing buffer cache
Tue Dec 18 12:43:26 2012
– BLOCK 136 IS FOUND CORRUPT AND IS DUMPED IN TRACE FILE
Hex dump of (file 6, block 136) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_12608.trc
Corrupt block relative dba: 0x01800088 (file 6, block 136)
Completely zero block found during multiblock buffer read
Reading datafile '/u01/app/oracle/oradata/orcl/test01.dbf' for corruption at rdba: 0x01800088 (file 6, block 136)
Reread (file 6, block 136) found same corrupt data
– REQUEST FOR AUTO BMR SENT FOR BLOCK 136
Requesting Auto BMR for (file# 6, block# 136)
– BLOCK 137 IS FOUND CORRUPT AND IS DUMPED IN TRACE FILE
Hex dump of (file 6, block 137) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_12608.trc
Corrupt block relative dba: 0x01800089 (file 6, block 137)
Completely zero block found during multiblock buffer read
Reading datafile '/u01/app/oracle/oradata/orcl/test01.dbf' for corruption at rdba: 0x01800089 (file 6, block 137)
Reread (file 6, block 137) found same corrupt data
– REQUEST FOR AUTO BMR SENT FOR BLOCK 137
Requesting Auto BMR for (file# 6, block# 137)
– AUTO BMR SUCCESSFUL FOR BLOCK 136
Waiting Auto BMR response for (file# 6, block# 136)
Auto BMR successful
– AUTO BMR SUCCESSFUL FOR BLOCK 137
Waiting Auto BMR response for (file# 6, block# 137)
Auto BMR successful
– CHECK THAT BLOCKS HAVE BEEN REPAIRED
[oracle@host1 ~]$ dbv file=/u01/app/oracle/oradata/orcl/test01.dbf blocksize=8192
Total Pages Marked Corrupt : 0
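The database's own corruption view can be checked as well; after a successful auto BMR it should be empty (a sketch):

```sql
-- Should return no rows once the blocks have been repaired.
SELECT file#, block#, blocks, corruption_type
  FROM v$database_block_corruption;
```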

Thanks for your time. Your comments and suggestions are welcome !!!

Data Guard Protection Modes

Maximum Availability

This protection mode provides the highest level of data protection that is possible without compromising the availability of a primary database. Under normal operations, transactions do not commit until all redo data needed to recover those transactions has been written to the online redo log AND based on user configuration, one of the following is true:

  • redo has been received at the standby, I/O to the standby redo log has been initiated, and acknowledgement sent back to primary
  • redo has been received and written to standby redo log at the standby and acknowledgement sent back to primary

If the primary does not receive acknowledgement from at least one synchronized standby, then it operates as if it were in maximum performance mode to preserve primary database availability until it is again able to write its redo stream to a synchronized standby database.

If the primary database fails, then this mode ensures no data loss occurs provided there is at least one synchronized standby in the Oracle Data Guard configuration. See Performance Versus Protection in Maximum Availability Mode for information about the redo transport settings necessary to support Maximum Availability and associated trade-offs.

Transactions on the primary are considered protected as soon as Oracle Data Guard has written the redo data to persistent storage in a standby redo log file. Once that is done, acknowledgment is quickly made back to the primary database so that it can proceed to the next transaction. This minimizes the impact of synchronous transport on primary database throughput and response time. To fully benefit from complete Oracle Data Guard validation at the standby database, be sure to operate in real-time apply mode so that redo changes are applied to the standby database as fast as they are received. Oracle Data Guard signals any corruptions that are detected so that immediate corrective action can be taken.

Maximum Protection

Maximum protection is similar to maximum availability but provides an additional level of data protection in the event of multiple failure events. Unlike maximum availability, which allows the primary to continue processing if it is unable to receive acknowledgement from a standby database, maximum protection shuts the primary database down rather than allowing it to continue processing transactions that are unprotected.

Because this data protection mode prioritizes data protection over primary database availability, Oracle recommends that a minimum of two standby databases be used to protect a primary database that runs in maximum protection mode to prevent a single standby database failure from causing the primary database to shut down.

Note:

Asynchronously committed transactions are not protected by Oracle Data Guard against loss until the redo generated by those transactions has been written to the standby redo log of at least one synchronized standby database.


Maximum Performance

This protection mode provides the highest level of data protection that is possible without affecting the performance of a primary database. This is accomplished by allowing transactions to commit as soon as all redo data generated by those transactions has been written to the online redo log. Redo data is also written to one or more standby databases, but this is done asynchronously with respect to transaction commitment, so primary database performance is unaffected by delays in writing redo data to the standby database(s).

This protection mode offers slightly less data protection than maximum availability mode and has minimal impact on primary database performance.

This is the default protection mode.
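For reference, the mode is set on the primary and pairs with redo transport attributes on the log_archive_dest_n parameter that points at the standby. The destination number and service name below are examples, not from a real configuration. SYNC AFFIRM waits for the standby redo log write to complete; SYNC NOAFFIRM (FastSync, 12c) acknowledges once redo is received and the write has been initiated; ASYNC is used for Maximum Performance.

```sql
-- Example only: point log_archive_dest_2 at the standby with SYNC AFFIRM,
-- then raise the protection mode.
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=stby_tns SYNC AFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=stbydb'
  SCOPE=BOTH;
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;
```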