Sunday, August 30, 2015

Tuning JVM settings correctly

Tuning JVM settings correctly to avoid full Garbage Collection. (Doc ID 1334842.1)

DISCONNECT - INFO  - conn=1317882 reason="other" msg="Exception caught while polling client connection LDAPS.X.Y.Z.W.64087 -- java.io.IOException: Connection reset by peer"

GC with old space cleanup

JVM space is separated in 3 different areas:
  • Young Space
  • Old Space
  • Permanent Space
The way GC works is that there are partial GC's to recycle Young Space and Full GC to recycle Old Space.

There is also a promotion mechanism from young space to old space and old space to permanent, but the goal of this article is not to make a detailed presentation of the JVM but to be able to understand it's internal behaviour to diagnose it.

Full GC Old space is extremely costly, and is the cause of DPS being unreachable for limited amount of time.


Using Jmap and Jstat
You need to collect information about JVM  heap. 

Modifying JVM settings:
Old JVM settings (causing DPS outage)
  • jvmArgs=-Xmx2000M  -Xms2000M -XX:NewRatio=2

 New JVM settings:
  • jvmArgs: -Xms2g -Xmx2g -Xmn1g -XX:SurvivorRatio=4 -XX:+UseParNewGC  -XX:+UseConcMarkSweepGC

Explanation of parameters:
  • Xms/Xmx: size of heap (2g, size at startup)
  • Xmn: size of young space (1g to maximize young space, to avoid old space promotion)
  • Garbage collation algorithm option choosen
    • -XX:+UseConcMarkSweepGC (concurrent low pause Collector)
    • -XX:+UseParNewGC To enable a parallel young generation GC with the concurrent GC
Results

Before applying JVM Setting Changes (system running since 6 days)

time: 06010018
 S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
 0.00  41.27  81.54  74.68  49.73    849   39.436     1    8.526   47.962
 0.00  41.27  81.54  74.68  49.73    849   39.436     1    8.526   47.962
 0.00  41.27  81.56  74.68  49.73    849   39.436     1    8.526   47.962
 0.00  41.27  81.58  74.68  49.73    849   39.436     1    8.526   47.962
 0.00  41.27  81.66  74.68  49.73    849   39.436     1    8.526   47.962


time: 06160918
 S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
56.63   0.00  86.07  73.87  49.96   2596  104.367     5   33.984  138.351
56.63   0.00  86.09  73.87  49.96   2596  104.367     5   33.984  138.351
56.63   0.00  86.11  73.87  49.96   2596  104.367     5   33.984  138.351
56.63   0.00  86.21  73.87  49.96   2596  104.367     5   33.984  138.351
56.63   0.00  86.43  73.87  49.96   2596  104.367     5   33.984  138.351

after applying JVM changes (system running after 11 days)

time: 06160948
 S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
 0.00  29.60  87.95   0.00  61.88      3    0.197     0    0.000    0.197
 0.00  29.60  87.95   0.00  61.88      3    0.197     0    0.000    0.197
 0.00  29.60  87.98   0.00  61.88      3    0.197     0    0.000    0.197
 0.00  29.60  88.06   0.00  61.88      3    0.197     0    0.000    0.197
 0.00  29.60  88.06   0.00  61.88      3    0.197     0    0.000    0.197

time: 06271918
 S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
 8.40   0.00  66.56  14.22  85.35   1134   38.757     0    0.000   38.757
 8.40   0.00  66.56  14.22  85.35   1134   38.757     0    0.000   38.757
 8.40   0.00  66.73  14.22  85.35   1134   38.757     0    0.000   38.757
 8.40   0.00  66.79  14.22  85.35   1134   38.757     0    0.000   38.757
 8.40   0.00  66.86  14.22  85.35   1134   38.757     0    0.000   38.757


Conclusion
The difference between them both is:

before: (6 days run time)
  • YGC=2596 YGCT=104.367     FGC=5   GCT=33.984  FCGT=138.351

after: (11days run time)
  • YGCT=1134   YGCT=38.757     FGC=0    GCT=0.000   FCGT=38.757


The major differences between before and after are:
  • Young generation garbage collection time reduced of 2/3. (38.757 versus 104.367)
  • No old space full garbage collection (0 compared to 5) (which is a killer for performance)
overall GC time was 38s in 11 days versus 140s in 6 days. Overall GC time reduced between 8 to 10.



Sunday, August 23, 2015

Data Guard Notes

Data Guard Modes: Oracle Data Guard can operate in 3 different modes:
Maximum Protection: Transactions are not allowed to commit until all redo data are written to the online redo logs and propagated to at least one synchronized secondary database. If for any reason, the primary database cannot propagate its redo stream to one secondary database, the primary will shutdown to ensure maximum protection mode.
Maximum Performance: Transactions are not allowed to commit as soon as the redo are written to the online redo logs. The redo stream is asynchronously propagated to the secondary databases to ensure maximum performance mode.
Maximum Availability: Transactions are not allowed to commit until all redo data are written to the online redo logs and propagated to at least one synchronized secondary database. If for any reason, the primary database cannot propagate its redo stream to one secondary database, the primary will NOT shutdown and operates as it were in maximum performance mode until issues are fixed.
  • LOG_ARCHIVE_DEST_n .- It controls different aspects of how redo transport services transfer redo data from primary database destination to a standby.
    This parameter has several attributes that are needed to setup your Dataguard environment, I will only mention the critical ones:
    • ASYNC .-This is the default, the redo data generated by a transaction need not have been received at a destination which has this attribute before that transaction can commit.
      or
    • SYNC .-The redo data generated by a transaction must have been received by every enabled destination that has this attribute before that transaction can commit.
    • AFFIRM and NOAFFIRM .- Control whether a redo transport destination acknowledges received redo data before or after writing it to the standby redo log. The default is NOAFFIRM.
    • DB_UNIQUE_NAME .- Specifies a unique name for the database at this destination. You must specify a name; there is no default value.
    • VALID_FOR .-Identifies when redo transport services can transmit redo data to destinations based on the following factors:
      • redo_log_type .-whether online redo log files, standby redo log files, or both are currently being archived on the database at this destination
      • database_role .-whether the database is currently running in the primary or the standby role
  • FAL_SERVER .-Specifies the FAL (fetch archive log) server for a standby database. The value is an Oracle Net service name.
  • FAL_CLIENT .-Specifies the FAL (fetch archive log) client name that is used by the FAL service, configured through the FAL_SERVER initialization parameter, to refer to the FAL client.
    The value is an Oracle Net service name, which is assumed to be configured properly on the FAL server system to point to the FAL client (standby database).
  • LOG_ARCHIVE_CONFIG .- Enables or disables the sending of redo logs to remote destinations and the receipt of remote redo logs.
    This parameter has several attributes, the most important for this exercise is below:
    • DG_CONFIG .- Specifies a list of up to 30 unique database names (defined with the DB_UNIQUE_NAME initialization parameter) for all of the databases in the Data Guard configuration.

db_name=GALAXY
db_block_size=8192
db_files=1000

db_cache_size=200M
java_pool_size=100M
large_pool_size=100M
shared_pool_size=250M
streams_pool_size=80M

pga_aggregate_target=400M
workarea_size_policy=AUTO

diagnostic_dest=/dwdata/test/diag

control_files='/dwdata/test/db/galaxy_control01.ctl','/dwdata/test//db/galaxy_control02.ctl'
compatible='10.2.0.3.0'

DB_UNIQUE_NAME= GALAXY
LOG_ARCHIVE_CONFIG= 'DG_CONFIG=(GALAXY,QUANTUM)'

log_archive_dest_1=         'LOCATION=/dwdata/test/arch VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=GALAXY'
log_archive_format=         galaxy_%t_%s_%r.arc
log_archive_dest_state_1=   enable

LOG_ARCHIVE_DEST_2='SERVICE=QUANTUM LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=QUANTUM'
LOG_ARCHIVE_DEST_STATE_2= enable
log_archive_dest_3 =       'LOCATION=/dwdata/test/arch2 VALID_FOR=(STANDBY_LOGFILES,STANDBY_ROLE) DB_UNIQUE_NAME=GALAXY'
LOG_ARCHIVE_DEST_STATE_3= enable

REMOTE_LOGIN_PASSWORDFILE=EXCLUSIVE

FAL_SERVER=QUANTUM
FAL_CLIENT=GALAXY

DB_FILE_NAME_CONVERT='/dwhr5_4/test/db','/dwdata/test/db'
LOG_FILE_NAME_CONVERT='/dwhr5_4/test/db','/dwdata/test/db'

STANDBY_FILE_MANAGEMENT=AUTO
AQ_TM_PROCESSES=3
#job_queue_processes=10

undo_management='AUTO'#
undo_retention=10800
undo_tablespace='UNDO'

nls_length_semantics='BYTE'

open_cursors=4000
session_cached_cursors=400
processes=600
sessions=600

parallel_max_servers=2
parallel_min_servers=0

max_dump_file_size='20480'# trace file size
timed_statistics=true

audit_file_dest=/dwdata/test/audit

PHYSICAL STANDBY CREATION STEP-BY-STEP
Enable Force Logging and Enable Archive log mode
SQL> select OPEN_MODE, DATABASE_ROLE, LOG_MODE, protection_mode, protection_level ,force_logging from v$database;

OPEN_MODE  DATABASE_ROLE    LOG_MODE     PROTECTION_MODE      PROTECTION_LEVEL     FOR
---------- ---------------- ------------ -------------------- -------------------- --------------------------------------
READ WRITE PRIMARY          ARCHIVELOG   MAXIMUM PERFORMANCE  MAXIMUM PERFORMANCE  YES

CREATE PASSWORD FILE
Create a password file if one does not already exist. Every database in a Data Guard
configuration must use a password file, and the password for the SYS user must be
identical on every system for redo data transmission to succeed
EX: orapwd file=$ORACLE_HOME/dbs/orapw$ORACLE_SID password=secret
Entries=15 force=y (OPTIONAL)
Or Copy from PROD & Rename it.

Configure a Standby Redo Log (Optional)

Step 1 Ensure log file sizes are identical on the primary and standby databases.
select group#, thread#, bytes, members from v$log;
select * from v$logfile;

Step 2 Determine the appropriate number of standby redo log file groups.
Step 3 Verify related database parameters and settings.
Step 4 Create standby redo log file groups.

SQL> ALTER DATABASE ADD STANDBY LOGFILE THREAD 5 ('/oracle/dbs/log1c.rdo','/oracle/dbs/log2c.rdo') SIZE 500M;
SQL> ALTER DATABASE ADD STANDBY LOGFILE GROUP 10 ('/oracle/dbs/log1c.rdo','/oracle/dbs/log2c.rdo') SIZE 500M;

Step 5 Verify the standby redo log file groups were created.
SQL> select group#, type, member from v$logfile order by group#, member;
SQL> SELECT GROUP#,THREAD#,SEQUENCE#,ARCHIVED,STATUS FROM V$STANDBY_LOG;

Set Primary Database Initialization Parameters

DB_NAME=PROD
DB_UNIQUE_NAME=GHX
LOG_ARCHIVE_CONFIG='DG_CONFIG=(PROD,PRODDR,GHX)'
CONTROL_FILES=
log_archive_dest_1='location=/ecdata/PROD/PRODarch valid_for=(all_logfiles,all_roles) db_unique_name=PROD'
log_archive_dest_2='service=PRODDR LGWR ASYNC valid_for=(online_logfiles,primary_role) db_unique_name=PRODDR'
LOG_ARCHIVE_DEST_3='SERVICE=GHX LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=GHX'

LOG_ARCHIVE_DEST_STATE_1=ENABLE
LOG_ARCHIVE_DEST_STATE_2=ENABLE
LOG_ARCHIVE_DEST_STATE_3=ENABLE
REMOTE_LOGIN_PASSWORDFILE=EXCLUSIVE
LOG_ARCHIVE_FORMAT=%t_%s_%r.arc
LOG_ARCHIVE_MAX_PROCESSES=2

Primary Database: Standby Role Initialization Parameters
FAL_SERVER=GHX
FAL_CLIENT=PROD
DB_FILE_NAME_CONVERT='GHX','PROD'
LOG_FILE_NAME_CONVERT='/arch1/ghx/','/arch1/prod/','/arch2/ghx/','/arch2/prod/'
STANDBY_FILE_MANAGEMENT=AUTO

Enable Archiving

SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP MOUNT;
SQL> ALTER DATABASE ARCHIVELOG;
SQL> ALTER DATABASE OPEN;

Step-by-Step Instructions for Creating a Physical Standby Database

Create a Backup Copy of the Primary Database Datafiles
You can use any backup copy of the primary database to create the physical standby database, as long as you have the necessary archived redo log files to completely
recover the database. Oracle recommends that you use the Recovery Manager utility (RMAN).

Create a Control File for the Standby Database
If the backup procedure required you to shut down the primary database, issue the following SQL*Plus statement to start the primary database:
SQL> STARTUP MOUNT;
Then, create the control file for the standby database, and open the primary database to user access, as shown in the following example:
SQL> ALTER DATABASE CREATE STANDBY CONTROLFILE AS '/tmp/GHX.ctl';
SQL> ALTER DATABASE OPEN;

Prepare an Initialization Parameter File for the Standby Database
Perform the following steps to create a standby initialization parameter file.
Step 1 Copy the primary database parameter file to the standby database.
SQL> CREATE PFILE='/tmp/initboston.ora' FROM SPFILE;

Step 2 Set initialization parameters on the physical standby database.

Modifying Initialization Parameters for a Physical Standby Database
.
.
.
DB_NAME=PROD
DB_UNIQUE_NAME=ghx
LOG_ARCHIVE_CONFIG='DG_CONFIG=(PROD,PRODDR,GHX)'
CONTROL_FILES='/arch1/ghx/control1.ctl', '/arch2/ghx/control2.ctl'
DB_FILE_NAME_CONVERT='PROD','GHX'
LOG_FILE_NAME_CONVERT='/arch1/PROD/','/arch1/boston/','/arch2/chicago/','/arch2/boston/'
LOG_ARCHIVE_FORMAT=log%t_%s_%r.arc
log_archive_dest_1=’location=/ecarch/PROD/PRODarch valid_for=(all_logfiles,all_roles) db_unique_name=GHX’
log_archive_dest_3=’service=PROD LGWR ASYNC valid_for=(online_logfiles,primary_role) db_unique_name=PROD’

LOG_ARCHIVE_DEST_STATE_1=ENABLE
LOG_ARCHIVE_DEST_STATE_2=ENABLE
REMOTE_LOGIN_PASSWORDFILE=EXCLUSIVE
STANDBY_FILE_MANAGEMENT=AUTO
FAL_SERVER=PROD
FAL_CLIENT=GHX

Copy Files from the Primary System to the Standby System
Use an operating system copy utility to copy the following binary files from the primary system to the standby system:
Backup data files
Standby control file
Initialization parameter file

Set Up the Environment to Support the Standby Database
Create a password file.

On platforms other than Windows, create a password file, and set the password for the SYS user to the same password used by the SYS user on the primary database.
The password for the SYS user on every database in a Data Guard configuration must be identical for redo transmission to succeed.

Configure listeners for the primary and standby databases.
On both the primary and standby sites, use Oracle Net Manager to configure a listener for the respective databases.
To restart the listeners (to pick up the new definitions), enter the following LSNRCTL utility commands on both the primary and standby systems:
% lsnrctl stop
% lsnrctl start

Create Oracle Net service names.
Create a server parameter file for the standby database

CREATE SPFILE FROM PFILE='initboston.ora';

Start the physical STANDBY Database

SQL> STARTUP MOUNT
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

The statement includes the DISCONNECT FROM SESSION option so that Redo Apply runs in a background session.

Test archival operations to the physical standby database.
SQL> ALTER SYSTEM SWITCH LOGFILE;

Identify the existing archived redo log files. On the standby database, query the V$ARCHIVED_LOG view to identify existing files
in the archived redo log. For example:
SQL> SELECT SEQUENCE#, FIRST_TIME, NEXT_TIME FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#;

Verify new archived redo log files were applied.
On the standby database, query the V$ARCHIVED_LOG view to verify the archived redo log files were applied.
SQL> SELECT SEQUENCE#,APPLIED FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#;
SEQUENCE# APP
--------- -------------
8 YES
9 YES
10 YES
11 YES
4 rows selected.

SQL> alter system set log_archive_dest_state_2=enable scope=both;
SQL> alter system archive log current;

Redo Transport

From the primary database, perform a log switch and then verify the transmissions of the archive redo log file was successful:

SQL> alter system switch logfile;
System altered.

SQL> select status, error from v$archive_dest where dest_id = 2;
STATUS    ERROR
--------- ---------------------------------------------------------
VALID

If the transmission was successful, the status of the destination will be VALID as shown above. If for any reason the transmission was unsuccessful, the status will be INVALID and the full text of the error message will be populated in the ERROR column which can be used to investigate and correct the issue.

Redo Apply

To verify Redo Apply, identify the existing archived redo logs on the standby, archive a log or two from the primary, and then check the standby database again. This test will ensure that redo data was shipped from the primary and then successfully received, archived, and applied to the standby.

First, identify the existing archived redo logs on the standby database:

[oracle@vmlinux2 ~]$ sqlplus sys/MySysPassword@ghx as sysdba

SQL> alter session set nls_date_format = 'DD-MON-YYYY HH24:MI:SS';
Session altered.

SQL> select sequence#, first_time, next_time, archived, applied from v$archived_log order by sequence#;

 SEQUENCE# FIRST_TIME           NEXT_TIME            ARCHIVED APPLIED
----- -------------------- -------------------- --------------------------- -------
       320 10-DEC-2010 08:34:06 10-DEC-2010 08:49:39 YES      YES
       321 10-DEC-2010 08:49:39 10-DEC-2010 09:05:15 YES      YES
       322 10-DEC-2010 09:05:15 10-DEC-2010 09:20:48 YES      YES
       323 10-DEC-2010 09:20:48 10-DEC-2010 09:36:20 YES      YES
       324 10-DEC-2010 09:36:20 10-DEC-2010 09:51:52 YES      YES
       325 10-DEC-2010 09:51:52 10-DEC-2010 10:07:27 YES      YES
       326 10-DEC-2010 10:07:27 10-DEC-2010 10:14:23 YES      YES
       327 10-DEC-2010 10:14:23 10-DEC-2010 10:14:49 YES      YES
       328 10-DEC-2010 10:14:49 10-DEC-2010 10:30:23 YES      YES

From the primary database, archive the current log using the following SQL statement:

SQL> alter system archive log current;
System altered.

Go back to the standby database and re-query the V$ARCHIVED_LOG view to verify redo data was shipped, received, archived, and applied:

SQL> select sequence#, first_time, next_time, archived, applied from v$archived_log order by sequence#;

 SEQUENCE# FIRST_TIME           NEXT_TIME            ARCHIVED APPLIED
---------- -------------------- -------------------- -------- -------------------------
       320 10-DEC-2010 08:34:06 10-DEC-2010 08:49:39 YES      YES
       321 10-DEC-2010 08:49:39 10-DEC-2010 09:05:15 YES      YES
       322 10-DEC-2010 09:05:15 10-DEC-2010 09:20:48 YES      YES
       323 10-DEC-2010 09:20:48 10-DEC-2010 09:36:20 YES      YES
       324 10-DEC-2010 09:36:20 10-DEC-2010 09:51:52 YES      YES
       325 10-DEC-2010 09:51:52 10-DEC-2010 10:07:27 YES      YES
       326 10-DEC-2010 10:07:27 10-DEC-2010 10:14:23 YES      YES
       327 10-DEC-2010 10:14:23 10-DEC-2010 10:14:49 YES      YES
       328 10-DEC-2010 10:14:49 10-DEC-2010 10:30:23 YES      YES
       329 10-DEC-2010 10:30:23 10-DEC-2010 10:39:56 YES      YES


Perform Failover Operation to a Physical Standby


Identify and resolve any archive log gaps that may exist on the target standby database.
Query the view V$ARCHIVE_GAP on the target standby database to determine if there are gaps in the archive sequence. This view contains the sequence numbers of any archived redo logs that are known to be missing for each thread. The data returned reflects the lowest and highest known gap in the archive sequence.

SQL> select * from v$archive_gap;

no rows selected
If the above query against V$ARCHIVE_GAP returns no rows, there are no known archive log gaps.
  1. If the above query against V$ARCHIVE_GAP returns a record, it will display the sequence numbers of the archived redo logs known to be missing from the standby database as illustrated in the following example:

SQL> select * from v$archive_gap;

   THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
---------- ------------- --------------
         1           854            859
  1. From the output above, the physical standby database is currently missing archived redo logs from sequence 854 to sequence 859 for thread 1. If possible locate and copy all of the identified missing archived redo logs to the target standby database from either the primary database (if available) or from other standbys. If working in a RAC environment, this would need to include missing archived redo logs from all threads. Once copied over, the missing archived redo logs will need to be registered with the target standby database:

SQL> alter database register logfile '/u03/flash_recovery_area/TURLOCK/archivelog/2010_12_16/o1_mf_1_854_6jmc5ngd_.arc';

Database altered.

SQL> alter database register logfile '/u03/flash_recovery_area/TURLOCK/archivelog/2010_12_16/o1_mf_1_855_6jmc5pht_.arc';

Database altered.

SQL> alter database register logfile '/u03/flash_recovery_area/TURLOCK/archivelog/2010_12_16/o1_mf_1_856_6jmc5t4t_.arc';

Database altered.

SQL> alter database register logfile '/u03/flash_recovery_area/TURLOCK/archivelog/2010_12_16/o1_mf_1_857_6jmc6jq9_.arc';

Database altered.

SQL> alter database register logfile '/u03/flash_recovery_area/TURLOCK/archivelog/2010_12_16/o1_mf_1_858_6jmc7z5o_.arc';

Database altered.

SQL> alter database register logfile '/u03/flash_recovery_area/TURLOCK/archivelog/2010_12_16/o1_mf_1_859_6jmcb9m0_.arc';

Database altered.
  1. Copy and register any other missing archived redo logs. Query the V$ARCHIVED_LOG view on all other available standby databases in the configuration (if any others exist) to obtain the highest log sequence number for each thread.

SQL> select unique thread# as "Thread", max(sequence#) over (partition by thread#) as "Last" from v$archived_log;

    Thread       Last
---------- ----------
         1        900
  1. If you find any archived redo logs that contain sequence numbers higher than the highest sequence number available on the target standby database, copy and register them to the target standby database. If the standby database is in managed recovery mode (and why wouldn't it be), the newly registered archived redo logs will be automatically applied to the standby. If the standby database is not in managed recovery mode, then manually recover the newly registered archived redo logs. This must be done for all threads if you are in a RAC environment.

SQL> alter database register logfile '<log_file_spec_n>';
http://www.idevelopment.info/images/popup_dialog_information_mark.gif 
About Partial Archived Redo Logs
It is possible to copy over and register what is known as a partial archived redo log file. A partial archived redo log file contains all of the primary database redo data received by the standby database when the primary database fails, but the archived redo log is not automatically registered in the standby database.
When you register a partial archived redo log, it prevents the recovery of the standby redo logs (if they exist). Therefore, whether or not you have registered a partial archived redo log determines which failover command will be necessary to run (next step).
You will know if you registered a partial archived redo log if you get receive the following message when attempting to register the archived redo log:
Register archivelog 'filespec1' was created due to a network disconnect;
archivelog contents are valid but missing subsequent data
  1. Perform terminal recovery on the target standby by issuing managed recovery mode with the FINISH keyword. If the physical standby database is configured with active standby redo logs AND you have not registered any partial archived redo log files, issue the following command to initiate the failover operation:
SQL> alter database recover managed standby database finish;

Database altered.
  1. If the standby database was not configured with standby redo logs, or they are not active, you must enter the following command instead:
SQL> alter database recover managed standby database finish skip standby logfile;

Database altered.
  1. The above SQL statement performs incomplete recovery until the last SCN included in the latest archived redo log available at the physical standby database.
  2. Once the terminal recovery command completes, convert the physical standby into a primary database using the following command:
SQL> alter database commit to switchover to primary;

Database altered.
  1. After the "COMMIT TO SWITCHOVER" command completes, you can no longer use this database as a standby database. Also, subsequent redo logs from the original primary database cannot be applied.
http://www.idevelopment.info/images/popup_dialog_exclamation_mark.gif 
The above SQL statement will only succeed if the correct "ALTER DATABASE RECOVER MANAGED STANDBY DATABASE ... FINISH;" statement was issued in the previous step. If you forgot the "... SKIP STANDBY LOGFILE" clause although you have no standby redo log files, the "COMMIT TO SWITCHOVER" will fail with the error that more media recovery is required.
If the "COMMIT TO SWITCHOVER" fails for any reason, you have to use the "ACTIVATE STANDBY DATABASE" SQL statement which forces the failover operation:
SQL> alter database activate standby database;
  1. To complete the failover operation, restart the new primary database in read/write mode using the appropriate initialization parameter file (or SPFILE) for the primary role:

SQL> shutdown immediate;
ORA-01109: database not open


Database dismounted.
ORACLE instance shut down.


SQL> startup;
ORACLE instance started.

Total System Global Area 1241513984 bytes
Fixed Size                  1273420 bytes
Variable Size             318767540 bytes
Database Buffers          905969664 bytes
Redo Buffers               15503360 bytes
Database mounted.
Database opened.
  1. Perform a full backup of the new primary database!
  2. Keep in mind that if this was the only standby database in the Data Guard configuration, then the database environment will be running unprotected until a new standby database can be setup and synchronized with the current primary. If Flashback Database was enabled on the original primary database before the failover and the original primary database is still available after the original primary machine comes back online, it is possible to restore the original primary database back into the Data Guard configuration as a physical standby database using the steps outlined in the following guide:
Flashing Back a Failed Primary Database into a Physical Standby Database
The steps outlined in this section assume the user has already performed a failover involving a physical standby database and Flashback Database was enabled on the old primary database before the failover:

SQL> select flashback_on from v$database;

FLASHBACK_ON
-------------
YES
The procedures documented in this section brings the old primary database back into the Data Guard configuration as a new physical standby database.
  1. Determine the SCN at which the old standby database became the primary database.
The first step in the process is to determine the point in which the standby database became the primary database. Given the fact that all redo is applied by a SCN, we need to determine the failover SCN from the new primary. After a failover (when the physical standby becomes the primary), Oracle writes the failover SCN to the control file and makes the value available through the STANDBY_BECAME_PRIMARY_SCN column of theV$DATABASE fixed view.
On the new primary database, issue the following query to determine the SCN at which the old standby database became the new primary database:

SQL> select to_char(standby_became_primary_scn) from v$database;

TO_CHAR(STANDBY_BECAME_PRIMARY_SCN)
----------------------------------------
3272186
  1. Flash back the failed primary database.
To create a new physical standby database, shut down the old primary database (if necessary), mount it, and flash it back to the value for STANDBY_BECAME_PRIMARY_SCN that was determined in the previous step.

SQL> shutdown immediate;

<old primary database was not started>


SQL> startup mount;
ORACLE instance started.

Total System Global Area 1241513984 bytes
Fixed Size                  1273420 bytes
Variable Size             318767540 bytes
Database Buffers          905969664 bytes
Redo Buffers               15503360 bytes
Database mounted.


SQL> flashback database to scn 3272186;

Flashback complete.
  1. Convert the database to a physical standby database.
Perform the following steps on the old primary database:
    1. Issue the following statement on the old primary database:

SQL> alter database convert to physical standby;

Database altered.
    1. This statement will dismount the database after successfully converting the control file to a standby control file.
    2. Shut down and restart the old primary database:

SQL> shutdown immediate;

ORA-01507: database not mounted


ORACLE instance shut down.


SQL> startup mount;
ORACLE instance started.

Total System Global Area 1241513984 bytes
Fixed Size                  1273420 bytes
Variable Size             318767540 bytes
Database Buffers          905969664 bytes
Redo Buffers               15503360 bytes
Database mounted.
  1. Restart transporting redo to the new physical standby database.
Before the new standby database was created, the new primary database probably stopped transmitting redo to the remote destination (due to a timeout for example). To restart redo transport services, perform the following steps on the new primary database. The primary / physical standby database used in this guide is using LOG_ARCHIVE_DEST_2.
    1. Issue the following query to see the current state of the archive destinations:

SQL> select dest_name, status, protection_mode, destination, error
  2  from v$archive_dest_status;

DEST_NAME            STATUS    PROTECTION_MODE      DESTINATION                 ERROR
-------------------- --------- -------------------- --------------------------- --------------
LOG_ARCHIVE_DEST_1   VALID     MAXIMUM PERFORMANCE
LOG_ARCHIVE_DEST_2   ERROR     MAXIMUM PERFORMANCE  turlock.idevelopment.info   ORA-03113:
                                                                                end-of-file on
                                                                                communication
                                                                                channel

LOG_ARCHIVE_DEST_3   VALID     MAXIMUM PERFORMANCE  /u04/oracle/oraarch/MODESTO
LOG_ARCHIVE_DEST_4   INACTIVE  MAXIMUM PERFORMANCE
LOG_ARCHIVE_DEST_5   INACTIVE  MAXIMUM PERFORMANCE
LOG_ARCHIVE_DEST_6   INACTIVE  MAXIMUM PERFORMANCE
LOG_ARCHIVE_DEST_7   INACTIVE  MAXIMUM PERFORMANCE
LOG_ARCHIVE_DEST_8   INACTIVE  MAXIMUM PERFORMANCE
LOG_ARCHIVE_DEST_9   INACTIVE  MAXIMUM PERFORMANCE
LOG_ARCHIVE_DEST_10  INACTIVE  MAXIMUM PERFORMANCE
    1. If necessary, enable the destination from the new primary primary database.

SQL> alter system set log_archive_dest_state_2='enable';

System altered.
    1. Perform a log switch from the new primary to ensure the standby database begins receiving redo data from the new primary database, and verify it was sent successfully. At the SQL prompt, enter the following statements:

SQL> alter system switch logfile;

System altered.

SQL> select dest_name, status, protection_mode, destination, error
  2  from v$archive_dest_status;

DEST_NAME            STATUS    PROTECTION_MODE      DESTINATION                 ERROR  
-------------------- --------- -------------------- --------------------------- --------------
LOG_ARCHIVE_DEST_1   VALID     MAXIMUM PERFORMANCE                                     
LOG_ARCHIVE_DEST_2   VALID     MAXIMUM PERFORMANCE  turlock.idevelopment.info
LOG_ARCHIVE_DEST_3   VALID     MAXIMUM PERFORMANCE  /u04/oracle/oraarch/MODESTO        
LOG_ARCHIVE_DEST_4   INACTIVE  MAXIMUM PERFORMANCE                                     
LOG_ARCHIVE_DEST_5   INACTIVE  MAXIMUM PERFORMANCE                                     
LOG_ARCHIVE_DEST_6   INACTIVE  MAXIMUM PERFORMANCE                                     
LOG_ARCHIVE_DEST_7   INACTIVE  MAXIMUM PERFORMANCE                                     
LOG_ARCHIVE_DEST_8   INACTIVE  MAXIMUM PERFORMANCE                                     
LOG_ARCHIVE_DEST_9   INACTIVE  MAXIMUM PERFORMANCE                                     
LOG_ARCHIVE_DEST_10  INACTIVE  MAXIMUM PERFORMANCE                                     
    1. On the new standby database, you may also need to change the LOG_ARCHIVE_DEST_n initialization parameters so that redo transport services do not transmit redo data to other databases. This step can be skipped if both the primary and standby database roles were set up with the VALID_FOR attribute in the server parameter file (SPFILE). By doing this, the Data Guard configuration operates properly through role transitions.
  1. Start Redo Apply.
Start Redo Apply or real-time apply on the new physical standby database:
    • To start Redo Apply:

SQL> alter database recover managed standby database disconnect;

Database altered.
    • To start real-time apply:

SQL> alter database recover managed standby database using current logfile disconnect;

Database altered.
Once the failed primary database is restored and is running in the standby role, you can optionally perform a switchover to transition the databases back to their original (pre-failure) roles. See "Switchovers Involving a Physical Standby Database" for more information.
  1. Restart Redo Apply on the new physical standby database after recovering the EOR marker.
The new physical standby database is now running at SCN 3272186, and the MRP will start to look for redo from that point on. As part of this process, the MRP will need to recover the redo that was contained in log sequence number 6666 since this is the exact point we were at on the original primary when the failover occurred (see alert.log information below). If you remember back to the actual failover process, we needed toperform terminal recovery on the target standby. After entering the FINISH command, Oracle wrote a special End Of Redo (EOR) marker in the current log file. This is unlike a role transition involving a switchover which forces a log switch followed by the EOR in the header of the next redo log. After the new physical standby database reads in the EOR marker, the MRP stops. But notice in the alert.log that it is telling us that is knows there is more redo beyond the current archived redo log containing the EOR marker:

Media Recovery Log /u04/oracle/oraarch/MODESTO/1_6666_736458474.dbf
Identified End-Of-Redo for thread 1 sequence 6666
Fri Feb 18 09:39:25 EST 2011
Media Recovery End-Of-Redo indicator encountered
Fri Feb 18 09:39:25 EST 2011
Media Recovery Applied until change 3272188
Fri Feb 18 09:39:25 EST 2011
MRP0: Media Recovery Complete: End-Of-REDO (modesto)
Begin: Standby Redo Logfile archival
End: Standby Redo Logfile archival
Media Recovery archivelogs detected beyond End-Of-REDO
Resetting standby activation ID 0 (0x0)
Fri Feb 18 09:39:27 EST 2011
MRP0: Background Media Recovery process shutdown (modesto)
After the new physical standby database reads in the EOR marker and the MRP stops, all we have to do is restart Redo Apply once again.

SQL> alter database recover managed standby database using current logfile disconnect;

Database altered.
Something interesting here that you may have noticed and is worth mentioning now. Similar to an OPEN RESET LOGS, the failover operation will reset the log sequence numbers for each thread back to 1. In Oracle9i, this would have broke the standby database and rendering it unusable. Not anymore with Oracle Database 10g. Since 10g, Data Guard is now able to process a reset like this and apply only what it needs from the old redo log stream and then switch over to the new redo log stream which began at 1 again.

Fri Feb 18 10:37:12 EST 2011
alter database recover managed standby database using current logfile disconnect
Fri Feb 18 10:37:12 EST 2011
Attempt to start background Managed Standby Recovery process (modesto)
MRP0 started with pid=24, OS id=30863
Fri Feb 18 10:37:12 EST 2011
MRP0: Background Managed Standby Recovery process started (modesto)
Managed Standby Recovery starting Real Time Apply
Media Recovery apply resetlogs offline range for datafile 1, incarnation : 0
Media Recovery apply resetlogs offline range for datafile 2, incarnation : 0
Media Recovery apply resetlogs offline range for datafile 3, incarnation : 0
Media Recovery apply resetlogs offline range for datafile 4, incarnation : 0
Media Recovery apply resetlogs offline range for datafile 5, incarnation : 0
Media Recovery apply resetlogs offline range for datafile 6, incarnation : 0
Media Recovery apply resetlogs offline range for datafile 7, incarnation : 0
Media Recovery apply resetlogs offline range for datafile 8, incarnation : 0
Media Recovery apply resetlogs offline range for datafile 9, incarnation : 0
 parallel recovery started with 2 processes
Fri Feb 18 10:37:18 EST 2011
Waiting for all non-current ORLs to be archived...
Media Recovery Log /u04/oracle/oraarch/MODESTO/1_1_743418834.dbf
Fri Feb 18 10:37:19 EST 2011
Completed: alter database recover managed standby database using current logfile disconnect
Fri Feb 18 10:37:20 EST 2011
Media Recovery Log /u04/oracle/oraarch/MODESTO/1_2_743418834.dbf
Media Recovery Log /u04/oracle/oraarch/MODESTO/1_3_743418834.dbf
Media Recovery Log /u04/oracle/oraarch/MODESTO/1_4_743418834.dbf
Media Recovery Log /u04/oracle/oraarch/MODESTO/1_5_743418834.dbf
Media Recovery Log /u04/oracle/oraarch/MODESTO/1_6_743418834.dbf
Media Recovery Log /u04/oracle/oraarch/MODESTO/1_7_743418834.dbf
Media Recovery Waiting for thread 1 sequence 8
Fri Feb 18 10:39:38 EST 2011
RFS[2]: Successfully opened standby log 4: '/u02/oradata/MODESTO/onlinelog/o1_mf_4_6hvg3qk9_.log'
Fri Feb 18 10:39:41 EST 2011
Media Recovery Log /u04/oracle/oraarch/MODESTO/1_8_743418834.dbf
Media Recovery Waiting for thread 1 sequence 9
Data Guard will now continue to resolve any gaps in the redo stream and apply any missing redo to the new physical standby database. Once the failed primary database is restored and is running in the standby role, you can optionally perform a switchover to transition the databases back to their original (pre-failure) roles. See "Switchovers Involving a Physical Standby Database" for more information.
What Is a Snapshot Standby Database?
A snapshot standby database is a fully updateable standby database created by converting a physical standby database into a snapshot standby database.
A snapshot standby database receives and archives, but does not apply, redo data from a primary database. The redo data received from the primary database is applied once a snapshot standby database is converted back into a physical standby database, after discarding all local updates to the snapshot standby database.
A snapshot standby can be created from Enterprise Manager, the Data Guard Broker command-line interface (DGMGRL) or from SQL*Plus.
Resolving Archive Gap Sequence on Physical Standby Databases

Once the archived logs required for gap resolution have been identified, the logs should be copied into the directory specified by the standby_archive_dest initialization parameter on the standby site. Also, if the log_archive_format on the standby and the primary database are not same, these files must be renamed to match the format specified by the log_archive_format parameter of the standby database.
The file can be renamed using the operating system utility. Since these logs were not transferred by the log transfer service, the managed recovery process will not have any information about these logs. These logs will need to be manually registered with the managed recovery process before they will be applied by the log apply service. To register the logs with the MRP, use the following statement:
ALTER DATABASE REGISTER LOGFILE ‘filespec’;
For example:
ALTER DATABASE REGISTER LOGFILE ‘/oracle/appsdb/arch/stdby_1_607.dbf’;
At this point, the managed recovery process will start applying this archive log file.

When the DBA queries the V$ARCHIVE_GAP view and has a record returned, this indicates a gap in the archived redo logs as illustrated below and may require manual intervention by the DBA:

SQL> select * from v$archive_gap;

THREAD#  LOW_SEQUENCE#  HIGH_SEQUENCE#
-------- -------------- --------------
       1            24              28
From the output above, the physical standby database is currently missing logs from sequence 24 to sequence 28 for thread 1. Note that this view only returns the next gap that is currently blocking managed recovery from continuing. After resolving the identified gap and starting managed recovery, the DBA should query the V$ARCHIVE_GAP view again on the physical standby database to determine the next (if any) gap sequence. This process should be repeated until there are no more gaps.

After identifying a gap (as shown above), the DBA will need to query the primary database to locate the archived redo logs on the primary database. The following query assumes the local archive destination on the primary database is LOG_ARCHIVE_DEST_1:

SELECT name FROM v$archived_log WHERE thread# = 1  AND dest_id = 1  AND sequence# BETWEEN 24 and 28;

NAME
--------------------------------------
/u02/oraarchive/TESTDB/arch_t1_s24.dbf
/u02/oraarchive/TESTDB/arch_t1_s25.dbf
/u02/oraarchive/TESTDB/arch_t1_s26.dbf
/u02/oraarchive/TESTDB/arch_t1_s27.dbf
/u02/oraarchive/TESTDB/arch_t1_s28.dbf
Copy the above redo log files to the physical standby database and register them using the ALTER DATABASE REGISTER LOGFILE ... SQL statement on the physical standby database. For example:

SQL> ALTER DATABASE REGISTER LOGFILE '/u02/oraarchive/TESTDB/arch_t1_s24.dbf';
SQL> ALTER DATABASE REGISTER LOGFILE '/u02/oraarchive/TESTDB/arch_t1_s25.dbf';
SQL> ALTER DATABASE REGISTER LOGFILE '/u02/oraarchive/TESTDB/arch_t1_s26.dbf';
SQL> ALTER DATABASE REGISTER LOGFILE '/u02/oraarchive/TESTDB/arch_t1_s27.dbf';
SQL> ALTER DATABASE REGISTER LOGFILE '/u02/oraarchive/TESTDB/arch_t1_s28.dbf';
After the redo logs have been registered on the physical standby database, the DBA can restart the managed recovery operations. For example, to put the physical standby database into automatic recovery managed mode:

SQL> alter database recover managed standby database disconnect from session

FAL_CLIENT
Specifies the TNS network services name for the standby database (which is sent to the FAL server process by log apply services) that should be used by the FAL server process to connect to the standby database. The
FAL_CLIENT=<net_service_name_of_standby_database>
FAL_SERVER
Specifies the TNS network service name that the standby database should use to connect to the FAL server process. The syntax would be:
FAL_SERVER=<net_service_name_of_primary_database>


 PHYSICAL STANDBY PARAMETERS
===========================
PARAMETERS:
===========
*.db_file_multiblock_read_count=8
*.db_name='PROD'
*.db_unique_name=PRODDR
*.log_archive_config='dg_config=(PROD,PRODDR)'
*.log_archive_dest_1='location=/ecarch/PROD/PRODarch valid_for=(all_logfiles,all_roles) db_unique_name=PRODDR'
*.log_archive_dest_2='service=PROD LGWR ASYNC valid_for=(online_logfiles,primary_role) db_unique_name=PROD'
*.log_archive_dest_state_1='ENABLE'
*.log_archive_dest_state_2='ENABLE'
*.standby_file_management=AUTO
*.standby_archive_dest='/ecarch/PROD/PRODarch'
*.fal_server=PROD
*.fal_client=PRODDR

SQL> select OPEN_MODE, DATABASE_ROLE, LOG_MODE, protection_mode, protection_level ,force_logging from v$database;

OPEN_MODE  DATABASE_ROLE    LOG_MODE     PROTECTION_MODE      PROTECTION_LEVEL     FOR
---------- ---------------- ------------ -------------------- -------------------- ---
MOUNTED    PHYSICAL STANDBY ARCHIVELOG   MAXIMUM PERFORMANCE  MAXIMUM PERFORMANCE  YES


PRIMARY DB PARAMETERS
======================
db_name=PROD
db_unique_name=PROD
log_archive_config="dg_config=(PROD", "PRODDR)"
log_archive_dest_1='location=/ecdata/PROD/PRODarch valid_for=(all_logfiles,all_roles) db_unique_name=PROD'
log_archive_dest_2='service=PRODDR LGWR ASYNC valid_for=(online_logfiles,primary_role) db_unique_name=PRODDR'
log_archive_dest_state_1='ENABLE'
log_archive_dest_state_2='ENABLE'
standby_archive_dest=?/dbs/arch
fal_client=PROD
fal_server=PRODDR
local_listener=LISTENER_PROD

SQL> select OPEN_MODE, DATABASE_ROLE, LOG_MODE, protection_mode, protection_level ,force_logging from v$database;

OPEN_MODE  DATABASE_ROLE    LOG_MODE     PROTECTION_MODE      PROTECTION_LEVEL     FOR
---------- ---------------- ------------ -------------------- -------------------- ---
READ WRITE PRIMARY          ARCHIVELOG   MAXIMUM PERFORMANCE  MAXIMUM PERFORMANCE  YES
==============
PRIMARY PARAMETERS
==================

log_archive_config="dg_config=(PROD", "PRODDR)"
log_archive_dest_1='location=/ecdata/PROD/PRODarch valid_for=(all_logfiles,all_roles) db_unique_name=PROD'
log_archive_dest_2='service=PRODDR LGWR ASYNC valid_for=(online_logfiles,primary_role) db_unique_name=PRODDR'
DB_FILE_NAME_CONVERT=no values
LOG_FILE_NAME_CONVERT=no values

log_archive_dest_state_1='ENABLE'
log_archive_dest_state_2='ENABLE'
db_unique_name=PROD
dml_locks=10000
fal_client=PROD
fal_server=PRODDR


STANDBY PARAMETERS
==================
log_archive_config="dg_config=(PROD", "PRODDR)"
log_archive_dest_1=’location=/ecarch/PROD/PRODarch  valid_for=(all_logfiles,all_roles) db_unique_name=PRODDR'
log_archive_dest_2=’service=PROD LGWR ASYNC valid_for=(online_logfiles,primary_role) db_unique_name=PROD'
DB_FILE_NAME_CONVERT=no values
LOG_FILE_NAME_CONVERT=no values

db_unique_name=PRODDR
dml_locks=10000