NODE EVICTION OVERVIEW
Oracle Clusterware is designed to perform a node eviction, removing one or more nodes from the cluster, if a critical problem is detected. A critical problem could be a node not responding via a network heartbeat, a node not responding via a disk heartbeat, a hung or severely degraded machine, or a hung ocssd.bin process. The purpose of this node eviction is to maintain the overall health of the cluster by removing bad members.
Starting in 11.2.0.2 RAC (or if you are on Exadata), a node eviction may not actually reboot the machine. This is called a rebootless restart. In this case, most of the clusterware stack is restarted to see if that fixes the unhealthy node.
1.0 - PROCESS ROLES FOR REBOOTS
OCSSD (aka CSS daemon) - This process is spawned by the cssdagent process. It runs in both vendor clusterware and non-vendor clusterware environments. OCSSD's primary job is internode health monitoring and RDBMS instance endpoint discovery. The health monitoring includes a network heartbeat and a disk heartbeat (to the voting files). OCSSD can also evict a node after escalation of a member kill from a client (such as a database LMON process). This is a multi-threaded process that runs at an elevated priority and runs as the Oracle user.
Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdagent --> ocssd --> ocssd.bin
CSSDAGENT - This process is spawned by OHASD and is responsible for spawning the OCSSD process, monitoring for node hangs (via oprocd functionality), monitoring the OCSSD process for hangs (via oclsomon functionality), and monitoring vendor clusterware (via vmon functionality). This is a multi-threaded process that runs at an elevated priority and runs as the root user.
Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdagent
CSSDMONITOR - This process also monitors for node hangs (via oprocd functionality), monitors the OCSSD process for hangs (via oclsomon functionality), and monitors vendor clusterware (via vmon functionality). This is a multi-threaded process that runs at an elevated priority and runs as the root user.
Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdmonitor
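A quick way to confirm these daemons are actually running on a node (a minimal check; the grep pattern is only illustrative):
$ ps -ef | egrep 'ocssd|cssdagent|cssdmonitor|ohasd' | grep -v grep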
2.0 - DETERMINING WHICH PROCESS IS RESPONSIBLE FOR A REBOOT
Important files to review:
- Clusterware alert log in <GRID_HOME>/log/<nodename>
- The cssdagent log(s) in <GRID_HOME>/log/<nodename>/agent/ohasd/oracssdagent_root
- The cssdmonitor log(s) in <GRID_HOME>/log/<nodename>/agent/ohasd/oracssdmonitor_root
- The ocssd log(s) in <GRID_HOME>/log/<nodename>/cssd
- The lastgasp log(s) in /etc/oracle/lastgasp or /var/opt/oracle/lastgasp
- IPD/OS or OS Watcher data
- 'opatch lsinventory -detail' output for the GRID home
- Messages files:
  * Linux: /var/log/messages
  * Sun: /var/adm/messages
  * HP-UX: /var/adm/syslog/syslog.log
  * IBM: /bin/errpt -a > messages.out
Please refer to the following document, which provides information on collecting most of the above files:
Document 1513912.1 - TFA Collector - Tool for Enhanced Diagnostic Gathering
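A hedged example of gathering most of these files in one pass with TFA Collector, assuming TFA is installed with the Grid home (the time window flag is illustrative):
$GRID_HOME/bin/tfactl diagcollect -since 4h
This collects clusterware alert logs, agent logs, ocssd logs, and OS messages from the cluster nodes into a single archive for upload.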
11.2 Clusterware evictions should, in most cases, have some kind of meaningful error in the clusterware alert log. This can be used to determine which process is responsible for the reboot. Example messages from a clusterware alert log:
[ohasd(11243)]CRS-8011:reboot advisory message from host: sta00129, component: cssagent, with timestamp: L-2009-05-05-10:03:25.340
[ohasd(11243)]CRS-8013:reboot advisory message text: Rebooting after limit 28500 exceeded; disk timeout 27630, network timeout 28500, last heartbeat from CSSD at epoch seconds 1241543005.340, 4294967295 milliseconds ago based on invariant clock value of 93235653
This particular eviction happened when we hit the network timeout. CSSD exited and the cssdagent took action to evict. The cssdagent knows the information in the error message from local heartbeats made by CSSD.
If no message is in the evicted node's clusterware alert log, check the lastgasp logs on the local node and/or the clusterware alert logs of other nodes.
3.0 - TROUBLESHOOTING OCSSD EVICTIONS
If you have encountered an OCSSD eviction, review the common causes in section 3.1 below.
3.1 - COMMON CAUSES OF OCSSD EVICTIONS
- Network failure or latency between nodes. It would take 30 consecutive missed checkins (by default - determined by the CSS misscount) to cause a node eviction. (The current misscount and disk timeout values can be checked as shown after this list.)
- Problems writing to or reading from the CSS voting disk. If the node cannot perform a disk heartbeat to the majority of its voting files, then the node will be evicted.
- A member kill escalation. For example, the database LMON process may request CSS to remove an instance from the cluster via the instance eviction mechanism. If this times out, it could escalate to a node kill.
- An unexpected failure or hang of the OCSSD process; this can be caused by any of the above issues or something else.
- An Oracle bug.
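Before digging into logs, it can help to confirm the CSS timeout values in effect on the cluster; these are standard crsctl options (a minimal sketch, run from the Grid home):
$ crsctl get css misscount
$ crsctl get css disktimeout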
3.2 - FILES TO REVIEW AND GATHER FOR OCSSD EVICTIONS
All files from section 2.0, from all cluster nodes. More data may be required.
Example of an eviction due to loss of voting disk:
CSS log:
2012-03-27 22:05:48.693: [ CSSD][1100548416](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 3 configured voting disks available, need 2
2012-03-27 22:05:48.693: [ CSSD][1100548416]###################################
2012-03-27 22:05:48.693: [ CSSD][1100548416]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread
OS messages:
Mar 27 22:03:58 choldbr132p kernel: Error:Mpx:All paths to Symm 000190104720 vol 0c71 are dead.
Mar 27 22:03:58 choldbr132p kernel: Error:Mpx:Symm 000190104720 vol 0c71 is dead.
Mar 27 22:03:58 choldbr132p kernel: Buffer I/O error on device sdbig, logical block 0
...
4.0 - TROUBLESHOOTING CSSDAGENT OR CSSDMONITOR EVICTIONS
If you have encountered a CSSDAGENT or CSSDMONITOR eviction, review the common causes in section 4.1 below.
4.1 - COMMON CAUSES OF CSSDAGENT OR CSSDMONITOR EVICTIONS
- An OS scheduler problem. For example, the OS is getting locked up in a driver or hardware, or there is an excessive amount of load on the machine (at or near 100% CPU utilization), preventing the scheduler from behaving reasonably. (A quick OS-level check is sketched after this list.)
- One or more threads within the CSS daemon hung.
- An Oracle bug.
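A quick, hedged OS-level sanity check for scheduler starvation around the eviction time (standard OS commands; compare the output with the OS Watcher or IPD/OS archives for the same window):
$ uptime
$ vmstat 5 3
A run queue far above the CPU count, or load averages near full CPU saturation, points to the OS scheduler problem described above rather than a clusterware fault.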
4.2 - FILES TO REVIEW AND GATHER FOR CSSDAGENT OR CSSDMONITOR EVICTIONS
All files from section 2.0, from all cluster nodes. More data may be required.
Importance of the master node in a cluster:
- The master node has the lowest node-id in the cluster. Node-ids are assigned to the nodes in the order in which they join the cluster. Hence, normally the node which joins the cluster first is the master node.
- The CRSd process on the master node is responsible for initiating the OCR backup as per the backup policy.
- The master node is also responsible for syncing the OCR cache across the nodes.
- The CRSd process on the master node reads from and writes to the OCR on disk.
- In case of node eviction, the cluster is divided into two sub-clusters. The sub-cluster containing the fewer number of nodes is evicted. If both sub-clusters have the same number of nodes, the sub-cluster containing the master node survives and the other sub-cluster is evicted.
- When the OCR master (crsd.bin process) stops or restarts for whatever reason, the crsd.bin on the surviving node with the lowest node number becomes the new OCR master.
The following methods can be used to find the OCR master:
1. By searching crsd.l* on all nodes:
grep "OCR MASTER" $ORA_CRS_HOME/log/$HOST/crsd/crsd.l*
2. By querying V$GES_RESOURCE to identify the master node.
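A hedged sketch of the V$GES_RESOURCE approach (the resource-name filter is an assumption and only illustrative; adjust it to the resource you are interested in):
SQL> select resource_name, master_node
     from v$ges_resource
     where resource_name like '%OCR%';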
How do you monitor block transfers across the interconnect between nodes in RAC?
The v$cache_transfer and v$file_cache_transfer views are used to examine RAC statistics.
The types of blocks that use the cluster interconnect in a RAC environment are monitored with the v$cache_transfer series of views:
v$cache_transfer: This view shows the types and classes of blocks that Oracle transfers over the cluster interconnect on a per-object basis.
The forced_reads and forced_writes columns can be used to determine the types of objects the RAC instances are sharing. Values in the forced_writes column show how often a certain block type is transferred out of a local buffer cache because the current version was requested by another instance.
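A hedged example of rolling these columns up per instance and object (GV$CACHE_TRANSFER is the cluster-wide version of the view; the grouping is only illustrative):
SQL> select inst_id, name,
            sum(forced_reads) f_reads,
            sum(forced_writes) f_writes
     from gv$cache_transfer
     group by inst_id, name
     order by f_writes desc;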
Querying the GV$CLUSTER_INTERCONNECTS view lists the interconnect used by all the participating instances of the RAC database:
SQL> select INST_ID, IP_ADDRESS from GV$CLUSTER_INTERCONNECTS;
   INST_ID IP_ADDRESS
---------- ----------------
         1 192.168.261.1
         2 192.168.261.2
New Features for Release 2 (11.2)
Oracle Automatic Storage Management and Oracle Clusterware Installation
With Oracle Grid Infrastructure 11g release 2 (11.2), Oracle Automatic Storage Management (Oracle ASM) and Oracle Clusterware are installed into a single home directory, which is referred to as the Grid Infrastructure home. Configuration assistants start after the installer interview process that configures Oracle ASM and Oracle Clusterware. The installation of the combined products is called Oracle Grid Infrastructure. However, Oracle Clusterware and Oracle Automatic Storage Management remain separate products.
Oracle Automatic Storage Management and Oracle Clusterware Files
With this release, Oracle Cluster Registry (OCR) and voting disks can be placed on Oracle Automatic Storage Management (Oracle ASM). This feature enables Oracle ASM to provide a unified storage solution, storing all the data for the clusterware and the database, without the need for third-party volume managers or cluster file systems. For new installations, OCR and voting disk files can be placed either on Oracle ASM, on a cluster file system, or on NFS. Installing Oracle Clusterware files on raw or block devices is no longer supported, unless an existing system is being upgraded.
Fix-up Scripts and Grid Infrastructure Checks
With Oracle Clusterware 11g release 2 (11.2), Oracle Universal Installer (OUI) detects when minimum requirements for installation are not met, and creates shell script programs, called fix-up scripts, to resolve many incomplete system configuration requirements. If OUI detects an incomplete task that is marked "fixable", then you can easily fix the issue by generating the fix-up script by clicking the Fix & Check Again button.
The fix-up script is generated during installation. You are prompted to run the script as root in a separate terminal session. When you run the script, it raises kernel values to required minimums, if necessary, and completes other operating system configuration tasks. You also can have the Cluster Verification Utility (CVU) generate fix-up scripts before installation.
Grid Plug and Play
In the past, adding or removing servers in a cluster required extensive manual preparation. With this release, you can continue to configure server nodes manually or use Grid Plug and Play to configure them dynamically as nodes are added or removed from the cluster.
Grid Plug and Play reduces the costs of installing, configuring, and managing server nodes by starting a grid naming service within the cluster to allow each node to perform the following tasks dynamically:
■ Negotiating appropriate network identities for itself
■ Acquiring additional information it needs to operate from a configuration profile
■ Configuring or reconfiguring itself using profile data, making host names and addresses resolvable on the network
Because servers perform these tasks dynamically, the number of steps required to add or delete nodes is minimized.
Improved Input/Output Fencing Processes
Oracle Clusterware 11g release 2 (11.2) replaces the oprocd and Hangcheck processes with the Cluster Synchronization Services daemon agent and monitor to provide more accurate recognition of hangs and to avoid false terminations.
Intelligent Platform Management Interface (IPMI) Integration
Intelligent Platform Management Interface (IPMI) is an industry-standard management protocol that is included with many servers today. IPMI operates independently of the operating system, and can operate even if the system is not powered on. Servers with IPMI contain a baseboard management controller (BMC), which is used to communicate with the server.
If IPMI is configured, then Oracle Clusterware uses IPMI when node fencing is required and the server is not responding.
SCAN for Simplified Client Access
With this release, the Single Client Access Name (SCAN) is the host name to provide for all clients connecting to the cluster. The SCAN is a domain name registered to at least one and up to three IP addresses, either in the Domain Name Service (DNS) or the Grid Naming Service (GNS).
The primary benefit of a Single Client Access Name (SCAN) is not having to update client connection information (such as TNSNAMES.ora) every time you add or remove nodes from an existing RAC cluster. Clients use a simple EZconnect string, and JDBC connections can use a JDBC thin URL to access the database, independently of the physical hosts that the database instances are running on. Additionally, SCAN automatically provides both failover and load balancing of connections, where a new connection is directed to the least busy instance in the cluster by default.
Note that because EZconnect is used with SCAN, the SQLNET.ora file should include EZconnect as one of the naming methods, for example:
NAMES.DIRECTORY_PATH=(tnsnames,ezconnect,ldap)
An EZconnect string would look like:
sqlplus user/pass@mydb-scan:1521/myservice
A JDBC thin URL would look like:
jdbc:oracle:thin:@//mydb-scan:1521/myservice
It is highly recommended that the clients are Oracle 11g R2 clients, to allow them to fully take advantage of failover with the SCAN settings.
The TNSNAMES.ora file would now reference the SCAN rather than the VIPs, as was done in previous versions. A TNSNAMES entry would look like this:
MYDB =
(DESCRIPTION=
(ADDRESS=(PROTOCOL=TCP)(HOST=mydb-scan.ORACLE.COM)(PORT=1521))
(CONNECT_DATA=(SERVICE_NAME=myservice.ORACLE.COM)))
There are two methods available for defining the SCAN: the first is to use your corporate DNS; the second is to use the Grid Naming Service.
Define SCAN Using DNS
To use the DNS method for defining your SCAN, the network administrator must create a single name that resolves to three separate IP addresses using a round-robin algorithm. Regardless of how many systems are part of your cluster, Oracle recommends that 3 IP addresses are configured to allow for failover and load balancing.
It is important that the IP addresses are on the same subnet as the public network for the server. The other two requirements are that the name (not including the domain suffix) is 15 characters or less in length and that the name can be resolved without using the domain suffix. Also, the IP addresses should not be specifically assigned to any of the nodes in the cluster.
You can test the DNS setup by running an nslookup on the SCAN name two or more times. Each time, the IP addresses should be returned in a different order:
Syntax: nslookup mydatabase-scan
Define SCAN Using Grid Naming Service (GNS)
Using GNS assumes that a DHCP server is running on the public network with enough available addresses to assign the required IP addresses and the SCAN VIPs. Only one static IP address is required to be configured, and it should be in the DNS domain.
Database Parameters for SCAN
The database will register each instance with the SCAN listeners using the REMOTE_LISTENER parameter in the spfile. Oracle 11g R2 RAC databases will only register with the SCAN listeners. Upgraded databases, however, will continue to register with the local listener as well as the SCAN listeners via the REMOTE_LISTENER parameter. The LOCAL_LISTENER parameter would be set to the node VIP for upgraded systems.
The REMOTE_LISTENER parameter, rather than being set to an alias that would be in a server-side TNSNAMES file (as it has been in previous versions), is set simply to the SCAN entry. The ALTER command would be:
ALTER SYSTEM SET REMOTE_LISTENER='mydb-scan.oracle.com:1521';
POINTS TO BE NOTED FOR SCAN LISTENER
An Oracle Database 11g release 2 (11.2) database service automatically registers with the listeners specified in the LOCAL_LISTENER and REMOTE_LISTENER parameters. During registration, PMON sends information such as the service name, instance names, and workload information to the listeners. This feature is called service registration.
Services coordinate their sessions by registering their workload, or the amount of work they are currently handling, with the local listener and the SCAN listeners. Clients are redirected by the SCAN listener to a local listener on the least-loaded node that is running an instance for a particular service. This feature is called load balancing. The local listener either directs the client to a dispatcher process (if the database was configured for shared server), or directs the client to a dedicated server process.
When a listener starts after the Oracle instance starts, and the listener is available for service registration, registration does not occur until the next time the Oracle Database process monitor (PMON) starts its discovery routine. By default, the PMON discovery routine runs every 60 seconds. To override the 60-second delay, use the SQL statement ALTER SYSTEM REGISTER. This statement forces PMON to register the service immediately.
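A hedged way to confirm that the instances have registered with the SCAN listener (the listener name LISTENER_SCAN1 is the usual 11.2 default but may differ in your environment):
$ srvctl status scan_listener
$ lsnrctl services LISTENER_SCAN1
SQL> alter system register;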
Local Listeners
- Starting with Oracle Database 11g release 2 (11.2), the local listener, or default listener, is located in the Grid home when you have Oracle Grid Infrastructure installed, in the Grid_home/network/admin directory.
- With Oracle Clusterware 11g release 2 and later, the listener associations no longer require tnsnames.ora file entries. The listener associations are configured as follows:
- DBCA no longer sets the LOCAL_LISTENER parameter. The Oracle Clusterware agent that starts the database sets the LOCAL_LISTENER parameter dynamically, and it sets it to the actual value, not an alias. So listener_alias entries are no longer needed in the tnsnames.ora file.
- The REMOTE_LISTENER parameter is configured by DBCA to reference the SCAN and SCAN port, without any need for a tnsnames.ora entry. Oracle Clusterware uses the Easy Connect naming method with scanname:scanport, so no listener associations for the REMOTE_LISTENER parameter are needed in the tnsnames.ora file.
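A quick check of what these parameters were actually set to on a running instance (standard dynamic view; nothing here is environment-specific):
SQL> select name, value from v$parameter
     where name in ('local_listener', 'remote_listener');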
Three SCAN addresses are configured for the cluster and allocated to servers. When a client issues a connection request using SCAN, the three SCAN addresses are returned to the client. If the first address fails, then the connection request to the SCAN name fails over to the next address. Using multiple addresses allows a client to connect to an instance of the database even if the initial instance has failed.
The net service name does not need to know the physical address of the server on which the database, database instance, or listener runs. SCAN is resolved by DNS, which returns three IP addresses to the client. The client then tries each address in succession until a connection is made.
Understanding SCAN
SCAN is a fully qualified name (hostname.domain_name) that is configured to resolve to all the addresses allocated for the SCAN listeners.
The default value for SCAN is cluster_name.GNS_sub_domain, or cluster_name-scan.domain_name if GNS is not used. For example, in a cluster that does not use GNS, if your cluster name is salesRAC and your domain is example.com, then the default SCAN address is salesRAC-scan.example.com:1521.
SCAN is configured in DNS to resolve to three IP addresses, and DNS should return the addresses using a round-robin algorithm. This means that when SCAN is resolved by DNS, the IP addresses are returned to the client in a different order each time.
Based on the environment, the following actions occur when you use SCAN to connect to an Oracle RAC database using a service name:
1. The PMON process of each instance registers the database services with the default listener on the local node and with each SCAN listener, which is specified by the REMOTE_LISTENER database parameter.
2. The listeners are dynamically updated on the amount of work being handled by the instances and dispatchers.
The client issues a database connection request using a connect descriptor of the form:
orausr/@scan_name:1521/sales.example.com
Note: If you use the Easy Connect naming method, then ensure the sqlnet.ora file on the client contains EZCONNECT in the list of naming methods specified by the NAMES.DIRECTORY_PATH parameter.
3. The client uses DNS to resolve scan_name. After DNS returns the three addresses assigned to SCAN, the client sends a connect request to the first IP address. If the connect request fails, then the client attempts to connect using the next IP address.
4. When the connect request is successful, the client connects to a SCAN listener for the cluster which hosts the sales database. The SCAN listener compares the workload of the instances sales1 and sales2 and the workload of the nodes on which they are running. Because node2 is less loaded than node1, the SCAN listener selects node2 and sends the address of the listener on that node back to the client.
The client connects to the local listener on node2. The local listener starts a dedicated server process for the connection to the database. The client connects directly to the dedicated server process on node2 and accesses the sales2 database instance.
SRVCTL Command Enhancements for Patching
With this release, you can use the server control utility SRVCTL to shut down all Oracle software running within an Oracle home, in preparation for patching. Oracle Grid Infrastructure patching is automated across all nodes, and patches can be applied in a multi-node, multi-patch fashion.
Typical Installation Option
To streamline cluster installations, especially for those customers who are new to clustering, Oracle introduces the Typical Installation path. Typical installation defaults as many options as possible to those recommended as best practices.
Voting Disk Backup Procedure Change
In prior releases, backing up the voting disks using a dd command was a required post-installation task. With Oracle Clusterware release 11.2 and later, backing up and restoring a voting disk using the dd command is not supported. Backing up voting disks manually is no longer required, because voting disk data is backed up automatically in the OCR as part of any configuration change. Voting disk data is automatically restored to any added voting disks.
Oracle RAC Background Processes
The GCS and GES processes and the GRD collaborate to enable Cache Fusion. The Oracle RAC processes and their identifiers are as follows:
■ ACMS: Atomic Controlfile to Memory Service (ACMS)
In an Oracle RAC environment, the ACMS per-instance process is an agent that contributes to ensuring that a distributed SGA memory update is either globally committed on success or globally aborted if a failure occurs.
■ GTX0-j: Global Transaction Process
The GTX0-j process provides transparent support for XA global transactions in an Oracle RAC environment. The database autotunes the number of these processes based on the workload of XA global transactions.
■ LMD: Global Enqueue Service Daemon (GES)
The LMD process manages incoming remote resource requests within each instance.
■ LMON: Global Enqueue Service Monitor (GES)
The LMON process manages the GES; it maintains consistency of the GCS memory structure in case of process death. It is also responsible for cluster reconfiguration and lock reconfiguration (node joining or leaving), checks for instance deaths, and listens for local messaging. A detailed log file is created that tracks any reconfigurations that have happened.
■ LMSn: Lock Manager Server process (GCS)
This is the Cache Fusion part and the most active process; it handles the consistent copies of blocks that are transferred between instances. It receives requests from LMD to perform lock requests. It rolls back any uncommitted transactions. There can be up to ten LMS processes running, and they can be started dynamically if demand requires it. They manage lock manager service requests for GCS resources and send them to a service queue to be handled by the LMSn process. LMS also handles global deadlock detection and monitors for lock conversion timeouts. As a performance gain, you can increase this process's priority to make sure CPU starvation does not occur. You can see the statistics of this daemon by looking at the view X$KJMSDP.
■ LCK0: Instance Enqueue Process
The LCK0 process manages non-Cache Fusion resource requests such as library and row cache requests.
■ RMSn: Oracle RAC Management Processes (RMSn)
The RMSn processes perform manageability tasks for Oracle RAC. Tasks accomplished by an RMSn process include creation of resources related to Oracle RAC when new instances are added to the clusters.
■ RSMN: Remote Slave Monitor
Manages background slave process creation and communication on remote instances. These background slave processes perform tasks on behalf of a coordinating process running in another instance.
■ DIAG: Diagnostic Daemon
This is a lightweight process; it uses the DIAG framework to monitor the health of the cluster. It captures information for later diagnosis in the event of failures. It will perform any necessary recovery if an operational hang is detected.
Top 5 issues for Instance Eviction (Doc ID 1374110.1)
srvctl status database -d HRPRD
Instance HRPRD1 is running on node hrprddb1
Instance HRPRD2 is running on node hrprddb2
============================================
$ crsctl query css votedisk
##  STATE    File Universal Id                File Name                 Disk group
--  -----    -----------------                ---------                 ---------
 1. ONLINE   8c377cc40f344f0dff8da3d9b0d7d610 (/psvoting01/hrprd_vote)  []
 2. ONLINE   a371af8108beffcfbfb43e12ff17d2c4 (/psvoting02/hrprd_vote)  []
 3. ONLINE   8b2ee6a3aabcefc6bf61114b6802cc48 (/psvoting03/hrprd_vote)  []
Located 3 voting disk(s).
Cluster Verification Utility (CVU) command to verify OCR integrity of all of the nodes in your cluster database:
cluvfy comp ocr -n all -verbose
List the nodes in your cluster by running the following command on one node:
olsnodes
Listing Backup Files
$ ocrconfig -showbackup
eldevdb2  2014/06/24 15:09:02  /oracrs/app/11.2.0.3/grid/cdata/hhcpsoft/backup00.ocr
eldevdb2  2014/06/24 11:09:00  /oracrs/app/11.2.0.3/grid/cdata/hhcpsoft/backup01.ocr
eldevdb2  2014/06/24 07:08:58  /oracrs/app/11.2.0.3/grid/cdata/hhcpsoft/backup02.ocr
eldevdb2  2014/06/23 03:08:44  /oracrs/app/11.2.0.3/grid/cdata/hhcpsoft/day.ocr
eldevdb2  2014/06/11 23:06:00  /oracrs/app/11.2.0.3/grid/cdata/hhcpsoft/week.ocr
hrdevdb2  2011/08/18 12:53:04  /oracrs/app/11.2.0/grid/cdata/hhcpsoft/backup_20110818_125304.ocr
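A hedged example of taking an on-demand OCR backup and relocating the automatic backups (both options appear in the ocrconfig synopsis below; the backup directory is illustrative and must already exist):
# ocrconfig -manualbackup
# ocrconfig -backuploc /oracrs/ocr_backups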
ocrconfig - Configuration tool for Oracle Cluster/Local Registry.
Synopsis:
ocrconfig [option]
option:
[-local] -export <filename>          - Export OCR/OLR contents to a file
[-local] -import <filename>          - Import OCR/OLR contents from a file
[-local] -upgrade [<user> [<group>]] - Upgrade OCR from previous version
-downgrade [-version <version string>] - Downgrade OCR to the specified version
[-local] -backuploc <dirname>        - Configure OCR/OLR backup location
[-local] -showbackup [auto|manual]   - Show OCR/OLR backup information
[-local] -manualbackup               - Perform OCR/OLR backup
[-local] -restore <filename>         - Restore OCR/OLR from physical backup
-replace <current filename> -replacement <new filename> - Replace an OCR device/file <filename1> with <filename2>
-add <filename>                      - Add a new OCR device/file
-delete <filename>                   - Remove an OCR device/file
-overwrite                           - Overwrite OCR configuration on disk
-repair -add <filename> | -delete <filename> | -replace <current filename> -replacement <new filename> - Repair OCR configuration on the local node
-help                                - Print out this help information
Run the following command to inspect the contents and verify the integrity of the backup file:
ocrdump -backupfile backup_file_name
Verify the integrity of OCR:
ocrcheck
Run ocrcheck and, if the command returns a failure message, then both the primary OCR and the OCR mirror have failed.
The OCRCHECK utility displays the version of the OCR's block format, total space available and used space, OCRID, and the OCR locations that you have configured. OCRCHECK performs a block-by-block checksum operation for all of the blocks in all of the OCRs that you have configured. It also returns an individual status for each file as well as a result for the overall OCR integrity check.
OCRCHECK creates a log file in the directory CRS_home/log/hostname/client. To change the amount of logging, edit the file CRS_home/srvm/admin/ocrlog.ini.
$ ./ocrcheck -local
Status of Oracle Local Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2584
         Available space (kbytes) :     259536
         ID                       :  814444380
         Device/File Name         : /oracrs/app/11.2.0/grid/cdata/hrprddb2.olr
                                    Device/File integrity check succeeded
         Local registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user
Run the following command to inspect the contents and verify the integrity of the backup file:
ocrdump -backupfile /oracrs/app/11.2.0.3/grid/cdata/hhcpsoft/backup01.ocr
The number of voting files you can store in a particular Oracle ASM disk group depends upon the redundancy of the disk group:
· External redundancy: A disk group with external redundancy can store only one voting disk
· Normal redundancy: A disk group with normal redundancy stores three voting disks
· High redundancy: A disk group with high redundancy stores five voting disks
To migrate voting disks to Oracle ASM, specify the Oracle ASM disk group name in the following command:
$ crsctl replace votedisk +asm_disk_group
Backing up Voting Disks
In Oracle Clusterware 11g release 2 (11.2), you no longer have to back up the voting disk. The voting disk data is automatically backed up in the OCR as part of any configuration change and is automatically restored to any voting disk added.
Restoring Voting Disks
If all of the voting disks are corrupted, then you can restore them, as follows:
I. Restore OCR: This step is necessary only if OCR is also corrupted or otherwise unavailable, such as if OCR is on Oracle ASM and the disk group is no longer available.
Restoring Oracle Cluster Registry
If a resource fails, then before attempting to restore OCR, restart the resource. As a definitive verification that OCR failed, run ocrcheck; if the command returns a failure message, then both the primary OCR and the OCR mirror have failed. Attempt to correct the problem using the OCR restoration procedure for your platform.
Use the following procedure to restore OCR on Linux or UNIX systems:
1. List the nodes in your cluster by running the following command on one node:
$ olsnodes
2. Stop Oracle Clusterware by running the following command as root on all of the nodes:
# crsctl stop crs
If the preceding command returns any error due to OCR corruption, stop Oracle Clusterware by running the following command as root on all of the nodes:
# crsctl stop crs -f
3. If you are restoring OCR to a cluster file system or network file system, then run the following command as root to restore OCR with an OCR backup that you can identify in "Listing Backup Files":
# ocrconfig -restore file_name
After you complete this step, proceed to step 10.
4. Start the Oracle Clusterware stack on one node in exclusive mode by running the following command as root:
# crsctl start crs -excl -nocrs
The -nocrs option ensures that the crsd process and OCR do not start with the rest of the Oracle Clusterware stack. Ignore any errors that display.
Check whether crsd is running. If it is, then stop it by running the following command as root:
# crsctl stop resource ora.crsd -init
Caution: Do not use the -init flag with any other command.
5. If you want to restore OCR to an Oracle ASM disk group, then you must first create a disk group using SQL*Plus that has the same name as the disk group you want to restore, and mount it on the local node.
If you cannot mount the disk group locally, then run the following SQL*Plus command:
SQL> drop diskgroup disk_group_name force including contents;
Optionally, if you want to restore OCR to a raw device, then you must run the ocrconfig -repair -replace command as root, assuming that you have all the necessary permissions on all nodes to do so and that OCR was not previously on Oracle ASM.
6. Restore OCR with an OCR backup that you can identify in "Listing Backup Files" by running the following command as root:
# ocrconfig -restore file_name
Notes: Ensure that the OCR devices that you specify in the OCR configuration exist and that these OCR devices are valid. If you configured OCR in an Oracle ASM disk group, then ensure that the Oracle ASM disk group exists and is mounted.
See Also: Oracle Grid Infrastructure Installation Guide for information about creating OCRs, and Oracle Automatic Storage Management Administrator's Guide for more information about Oracle ASM disk group management.
7. Verify the integrity of OCR:
# ocrcheck
8. Stop Oracle Clusterware on the node where it is running in exclusive mode:
# crsctl stop crs -f
9. Run the ocrconfig -repair -replace command as root on all the nodes in the cluster where you did not run the ocrconfig -restore command. For example, if you ran the ocrconfig -restore command on node 1 of a four-node cluster, then you must run the ocrconfig -repair -replace command on nodes 2, 3, and 4.
10. Begin to start Oracle Clusterware by running the following command as root on all of the nodes:
# crsctl start crs
11. Verify OCR integrity of all of the cluster nodes that are configured as part of your cluster by running the following CVU command:
$ cluvfy comp ocr -n all -verbose
II. Run the following command as root from only one node to start the Oracle Clusterware stack in exclusive mode, which does not require voting files to be present or usable:
# crsctl start crs -excl
III. Run the crsctl query css votedisk command to retrieve the list of voting files currently defined, similar to the following:
$ crsctl query css votedisk
##  STATE    File Universal Id                File Name    Disk group
--  -----    -----------------                ---------    ---------
 1. ONLINE   7c54856e98474f61bf349401e7c9fb95 (/dev/sdb1)  [DATA]
This list may be empty if all voting disks were corrupted, or may have entries that are marked as status 3 or OFF.
IV. Depending on where you store your voting files, do one of the following:
If the voting disks are stored in Oracle ASM, then run the following command to migrate the voting disks to the Oracle ASM disk group you specify:
crsctl replace votedisk +asm_disk_group
The Oracle ASM disk group to which you migrate the voting files must exist in Oracle ASM. You can use this command whether the voting disks were stored in Oracle ASM or some other storage device.
If you did not store voting disks in Oracle ASM, then run the following command using the File Universal Identifier (FUID) obtained in the previous step:
$ crsctl delete css votedisk FUID
Add a voting disk, as follows:
$ crsctl add css votedisk path_to_voting_disk
V. Stop the Oracle Clusterware stack as root:
# crsctl stop crs
Note: If the Oracle Clusterware stack is running in exclusive mode, then use the -f option to force the shutdown of the stack.
VI. Restart the Oracle Clusterware stack in normal mode as root:
# crsctl start crs
Voting Files stored in ASM - How many disks per disk group do I need?
If Voting Files are stored in ASM, the ASM disk group that hosts the Voting Files will place the appropriate number of Voting Files in accordance with the redundancy level. Once Voting Files are managed in ASM, a manual addition, deletion, or replacement of Voting Files will fail, since users are not allowed to manually manage Voting Files in ASM.
If the redundancy level of the disk group is set to "external", 1 Voting File is used.
If the redundancy level of the disk group is set to "normal", 3 Voting Files are used.
If the redundancy level of the disk group is set to "high", 5 Voting Files are used.
Note that Oracle Clusterware keeps track of which disks within the disk group hold the Voting Files; Oracle Clusterware does not rely on ASM to access the Voting Files.
In addition, note that there can be only one Voting File per failure group. In the above list of rules, it is assumed that each disk that is supposed to hold a Voting File resides in its own, dedicated failure group.
In other words, a disk group that is supposed to hold the above-mentioned number of Voting Files needs to have the respective number of failure groups with at least one disk (1 / 3 / 5 failure groups with at least one disk).
Consequently, a normal redundancy ASM disk group that is supposed to hold Voting Files requires 3 disks in separate failure groups, while a normal redundancy ASM disk group that is not used to store Voting Files requires only 2 disks in separate failure groups.
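A minimal sketch of creating such a disk group (the disk group name and disk paths are illustrative; the three explicit failure groups are what allow the group to hold three voting files):
SQL> create diskgroup VOTE normal redundancy
     failgroup fg1 disk '/dev/mapper/votedisk1'
     failgroup fg2 disk '/dev/mapper/votedisk2'
     failgroup fg3 disk '/dev/mapper/votedisk3'
     attribute 'compatible.asm' = '11.2';
$ crsctl replace votedisk +VOTE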
What happens if I lose my voting disk(s)?
If you lose half or more of all of your voting disks, then nodes get evicted from the cluster, or nodes kick themselves out of the cluster. It doesn't threaten database corruption. Alternatively, you can use external redundancy, which means you are providing redundancy at the storage level using RAID.
For this reason, when using Oracle for the redundancy of your voting disks, Oracle recommends that customers use 3 or more voting disks in Oracle RAC 10g Release 2. Note: For best availability, the 3 voting files should be on physically separate disks. It is recommended to use an odd number, as 4 disks will not be any more highly available than 3 disks: half of 3 is 1.5, rounded up to 2, and half of 4 is 2, so once we lose 2 disks the cluster will fail with either 4 voting disks or 3 voting disks.
Restoring corrupted voting disks is easy, since there isn't any significant persistent data stored in the voting disk. See the Oracle Clusterware Administration and Deployment Guide for information on backup and restore of voting disks.
An odd number of voting disks is required for a proper clusterware configuration. A node must be able to strictly access more than half of the voting disks at any time. So, in order to tolerate a failure of n voting disks, there must be at least 2n+1 configured (n=1 means 3 voting disks).
Why should we have an odd number of voting disks?
An odd number of voting disks should be configured to provide a method to determine who in the cluster should survive.
A node must be able to access more than half of the voting disks at any time. For example, take a two-node cluster with an even number of voting disks, say 2. Suppose Node1 is able to access only voting disk 1 and Node2 is able to access only voting disk 2. This means that there is no common file where the clusterware can check the heartbeat of both nodes. If we have 3 voting disks and both nodes are able to access more than half, i.e. 2 voting disks, there will be at least one disk accessible by both nodes, and the clusterware can use that disk to check the heartbeat of both nodes. Hence, each node should be able to access more than half the number of voting disks. A node not able to do so will have to be evicted from the cluster by another node that has access to more than half the voting disks, to maintain the integrity of the cluster. After the cause of the failure has been corrected and access to the voting disks has been restored, you can instruct Oracle Clusterware to recover the failed node and restore it to the cluster. Loss of more than half your voting disks will cause the entire cluster to fail!
HOW TO IDENTIFY THE MASTER NODE IN RAC?
The master node has the lowest node-id in the cluster. Node-ids are assigned to the nodes in the order in which they join the cluster. Hence, normally the node which joins the cluster first is the master node.
- The CRSd process on the master node is responsible for initiating the OCR backup as per the backup policy.
- The master node is also responsible for syncing the OCR cache across the nodes.
- The CRSd process on the master node reads from and writes to the OCR on disk.
- In case of node eviction, the cluster is divided into two sub-clusters. The sub-cluster containing the fewer number of nodes is evicted. If both sub-clusters have the same number of nodes, the sub-cluster containing the master node survives and the other sub-cluster is evicted.
Oracle Clusterware master information can be found:
- by scanning ocssd logs from various nodes
- by scanning crsd logs from various nodes
- by identifying the node which takes the backup of the OCR
If the master node gets evicted/rebooted, another node becomes the master.
HOW TO FIND THE RESOURCE MASTER?
In RAC, every data block is mastered by an instance. Mastering a block simply means that the master instance keeps track of the state of the block until the next reconfiguration event.
Remastering can be triggered:
- Manually
- By resource affinity
- By an instance crash
There are three methods to find the resource master:
- Method I gets info about the master node from v$gcspfmaster_info using data_object_id
- Method II gets info about the master node from v$dlm_ress and v$ges_enqueue using the resource name in hexadecimal format
- Method III gets info about the master node from x$kjbl joined with x$le using the resource name in hexadecimal format
CURRENT SCENARIO:
- 3 node setup
- name of the database - orcl
SETUP:
SYS@NODE1> create table scott.emp1 as select * from scott.emp;
Get the data_object_id for scott.emp1:
SQL> col owner for a10
SQL> col data_object_id for 9999999
SQL> col object_name for a15
SQL> select owner, data_object_id, object_name
     from dba_objects
     where owner = 'SCOTT' and object_name = 'EMP1';
OWNER      DATA_OBJECT_ID OBJECT_NAME
---------- -------------- ---------------
SCOTT               74652 EMP1
For Method II and Method III, we need to find the file_id and block_id, and hence the GCS resource name in hexadecimal format.
Get the file_id and the range of block_ids of the emp1 table. It can be seen that emp1 lies in block 523 of file 4.
SQL> select dbms_rowid.rowid_relative_fno(rowid) FILE_NO,
            min(dbms_rowid.rowid_block_number(rowid)) MIN_BLOCK_ID,
            max(dbms_rowid.rowid_block_number(rowid)) MAX_BLOCK_ID
     from scott.emp1
     group by dbms_rowid.rowid_relative_fno(rowid);
   FILE_NO MIN_BLOCK_ID MAX_BLOCK_ID
---------- ------------ ------------
         4          523          523
Find the GCS resource name to be used in the query, using the block_id and data_object_id retrieved above.
x$kjbl.kjblname = resource name in hexadecimal format ([id1],[id2],[type])
x$kjbl.kjblname2 = resource name in decimal format
The hex name will be used to query the resource master using v$dlm_ress, v$ges_enqueue, x$kjbl and x$le.
SQL> col hexname for a25
SQL> col resource_name for a15
SQL> select b.kjblname hexname, b.kjblname2 resource_name
     from x$le a, x$kjbl b
     where a.le_kjbl = b.kjbllockp
     and a.le_addr = (select le_addr
                      from x$bh
                      where dbablk = 523
                      and obj = 74652
                      and class = 1
                      and state <> 3);
HEXNAME                   RESOURCE_NAME
------------------------- ---------------
[0x20b][0x4],[BL]         523,4,BL
Manually master the EMP1 table to node1:
SYS@NODE1> oradebug lkdebug -m pkey <objectid>
SYS@NODE1> oradebug lkdebug -m pkey 74652
GET RESOURCE MASTER NAME
Method I gets info about the master node from v$gcspfmaster_info using data_object_id.
Note that the current master is node1 (node numbering starts from 0).
SYS@NODE1> col object_name for A10
SQL> select o.object_name, m.CURRENT_MASTER
     from dba_objects o, v$gcspfmaster_info m
     where o.data_object_id = 74652
     and m.data_object_id = 74652;
OBJECT_NAM CURRENT_MASTER
---------- --------------
EMP1                    0
Method II gets info about the master node from v$dlm_ress and v$ges_enqueue using the resource name in hexadecimal format.
Check that the master node is node1 (node numbering starts with 0).
SYS@NODE1> col resource_name for a22
SQL> select a.resource_name, a.master_node
     from v$dlm_ress a, v$ges_enqueue b
     where upper(a.resource_name) = upper(b.resource_name1)
     and a.resource_name like '%[0x20b][0x4],[BL]%';
RESOURCE_NAME          MASTER_NODE
---------------------- -----------
[0x20b][0x4],[BL]                0
Method III gets info about the master node from x$kjbl joined with x$le using the resource name in hexadecimal format.
This SQL joins x$kjbl with x$le to retrieve the resource master for a block. Note that the current master is node1 (MASTER=0).
SYS@NODE1> select kj.kjblname, kj.kjblname2, kj.kjblmaster master
     from (select kjblname, kjblname2, kjblowner, kjblmaster, kjbllockp
           from x$kjbl
           where kjblname = '[0x20b][0x4],[BL]'
          ) kj, x$le le
     where le.le_kjbl = kj.kjbllockp
     order by le.le_addr;
KJBLNAME                       KJBLNAME2                          MASTER
------------------------------ ------------------------------ ----------
[0x20b][0x4],[BL]              523,4,BL                                0
RAC Troubleshooting: RAC LOG FILES
Cluster Ready Services Daemon (crsd) log files: CRS_home/log/hostname/crsd
Oracle Cluster Registry (OCR) records:
The OCR tools (OCRDUMP, OCRCHECK, OCRCONFIG) record log information in the following location: CRS_home/log/hostname/client
To change the amount of logging, edit the CRS_home/srvm/admin/ocrlog.ini file.
The OCR server records log information in the following location: CRS_home/log/hostname/crsd
To change the amount of logging, edit the CRS_home/log/hostname/crsd/crsd.ini file.
Oracle Process Monitor Daemon (OPROCD):
The following path is specific to Linux: /etc/oracle/hostname.oprocd.log
This path is dependent upon the installed Linux or UNIX platform.
Cluster Synchronization Services (CSS): CRS_home/log/hostname/cssd
Event Manager (EVM) information generated by evmd: CRS_home/log/hostname/evmd
Oracle RAC RACG:
The Oracle RAC high availability trace files are located in the following two locations:
CRS_home/log/hostname/racg
$ORACLE_HOME/log/hostname/racg
Core files are in subdirectories of the log directory. Each RACG executable has a subdirectory assigned exclusively for that executable. The name of the RACG executable subdirectory is the same as the name of the executable.
Enable Threads for the remaining 3 nodes:
Alter database enable public thread 2;
Alter database enable public thread 3;
Alter database enable public thread 4;
Shut down the database on primary node vmohswort018.
Log in to the database nodes below as the root user and perform the following steps:
vmohswort022
vmohswort026
vmohswort029
cd /oracrs/oracle/product/112/bin
./crsctl start crs
./crsctl enable crs
Log in to the vmohswort022 server as the orpwor1i OS user:
SQL> startup mount;
alter system set undo_tablespace='APPS_UNDOTS2' scope=spfile sid='PWOR1I2';
alter system set undo_tablespace='APPS_UNDOTS3' scope=spfile sid='PWOR1I3';
alter system set undo_tablespace='APPS_UNDOTS4' scope=spfile sid='PWOR1I4';
alter database open;
SQL> alter database disable thread 1;
SQL> alter database enable public thread 1;
SQL> shutdown immediate;
Start the DB and DB listener on all RAC nodes.
RCONFIG TO CONVERT TO RAC
1. As the oracle user, navigate to the directory $ORACLE_HOME/assistants/rconfig/sampleXMLs, and open the file ConvertToRAC.xml using a text editor, such as vi.
2. Review the ConvertToRAC.xml file, and modify the parameters as required for your system. The XML sample file contains comment lines that provide instructions on how to configure the file. When you have completed making changes, save the file with the syntax filename.xml. Make a note of the name you select.
3. Navigate to the directory $ORACLE_HOME/bin, and use the following syntax to run the rconfig command: rconfig input.xml, where input.xml is the name of the XML input file you configured in step 2.
For example, if you create an input XML file called convert.xml, then enter the following command:
$ ./rconfig convert.xml
Note: The Convert verify option in the ConvertToRAC.xml file has three settings:
Convert verify="YES": rconfig performs checks to ensure that the prerequisites for single-instance to Oracle RAC conversion have been met before it starts the conversion.
Convert verify="NO": rconfig does not perform prerequisite checks, and starts the conversion.
Convert verify="ONLY": rconfig only performs prerequisite checks; it does not start the conversion after completing the prerequisite checks.
If the conversion fails, then use the following procedure to recover and reattempt the conversion:
· Attempt to delete the database using the DBCA delete database option.
· Restore the source database.
· Review the conversion log, and fix any problems it reports that may have caused the conversion failure. The rconfig log files are under the rconfig directory in $ORACLE_BASE/cfgtoollogs.
· Reattempt the conversion.
RESTORE VOTE DISK
root@hrdevdb2 /oracrs/app/11.2.0.3/grid/bin => crsctl query css votedisk
Start CRS in exclusive mode:
./crsctl start crs -excl
Query for voting disks:
./crsctl query css votedisk
If you did not store voting disks in Oracle ASM, then run the following command using the File Universal Identifier (FUID) obtained in the previous step:
$ crsctl delete css votedisk FUID
./crsctl delete css votedisk a4a849393fb14f4fbf92cbef0d2d215a
./crsctl delete css votedisk a8906deabdd3ff29ff0df2f382271ce0
Add a voting disk, as follows:
./crsctl add css votedisk /appvotingocrtie/vote/hhcvote1
./crsctl add css votedisk /apptmpctrltrcenv/vote/hhcvote2
./crsctl query css votedisk
root@hrdevdb2 /oracrs/app/11.2.0.3/grid/network/admin => /oracrs/app/11.2.0.3/grid/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name                          Disk group
--  -----    -----------------                ---------                          ---------
 1. ONLINE   70165ff16d50df98ff135ecded8bd82e (/appvotingocr/vote/hhcvote0)      []
 2. ONLINE   69e63cb962f84f1cbf3055292d79fe86 (/appvotingocrtie/vote/hhcvote1)   []
 3. ONLINE   8a844b35ed364f85bfb3ee3ab8262215 (/apptmpctrltrcenv/vote/hhcvote2)  []
root@eldevdb2 /apptmpctrltrcenv/vote => /oracrs/app/11.2.0.3/grid/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name                          Disk group
--  -----    -----------------                ---------                          ---------
 1. ONLINE   70165ff16d50df98ff135ecded8bd82e (/appvotingocr/vote/hhcvote0)      []
 2. ONLINE   69e63cb962f84f1cbf3055292d79fe86 (/appvotingocrtie/vote/hhcvote1)   []
 3. ONLINE   8a844b35ed364f85bfb3ee3ab8262215 (/apptmpctrltrcenv/vote/hhcvote2)  []
Located 3 voting disk(s).
root@eldevdb2 /apptmpctrltrcenv/vote => /oracrs/app/11.2.0.3/grid/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       8916
         Available space (kbytes) :     253204
         ID                       :   38393080
         Device/File Name         : /appvotingocr/ocr/hhcocr
                                    Device/File integrity check succeeded
         Device/File Name         : /appvotingocrtie/ocr/hhcmirrocr
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded
root@hrdevdb2 /oracrs/app/11.2.0.3/grid/network/admin => /oracrs/app/11.2.0.3/grid/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       8916
         Available space (kbytes) :     253204
         ID                       :   38393080
         Device/File Name         : /appvotingocr/ocr/hhcocr
                                    Device/File integrity check succeeded
         Device/File Name         : /appvotingocrtie/ocr/hhcmirrocr
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded
Current Read
A current read is one where a session reads the current value of the data block from another instance's Data Buffer Cache. This current value contains the most up-to-date committed data. A current read happens when a second instance needs a data block that has not been changed. This is often thought of as a read/read situation. A current read will be seen as any wait event that starts with gc current.
Consistent Read
A consistent read is needed when a particular block is being accessed/modified by transaction T1 and at the same time another transaction T2 tries to access/read the block. If T1 has not been committed, T2 needs a consistent-read copy of the block (consistent with the non-modified state of the database) to move ahead. A CR copy is created using the UNDO data for that block. A sample series of steps for a CR in a normal setup would be:
1. The process tries to read a data block.
2. It finds an active transaction in the block.
3. It then checks the UNDO segment to see if the transaction has been committed or not.
4. If the transaction has been committed, it creates the REDO records and reads the block.
5. If the transaction has not been committed, it creates a CR block for itself using the UNDO/ROLLBACK information.
6. Creating a CR image in RAC is a bit different and can come with some I/O overhead. This is because the UNDO could be spread across instances, and hence, to build a CR copy of the block, the instance might have to visit UNDO segments on other instances and perform extra I/O.
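A hedged way to see how much time an instance spends on these interconnect block transfers (standard wait-event view; the LIKE patterns simply pick up the gc current and gc cr event families):
SQL> select inst_id, event, total_waits, time_waited
     from gv$system_event
     where event like 'gc current%' or event like 'gc cr%'
     order by time_waited desc;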
For the full version, see: http://www.dba-oracle.com/t_gupta_oracle_rac_cache_fusion.htm
As you said, the Voting and OCR disks reside in ASM disk groups, but as per the startup sequence OCSSD starts before ASM; how is that possible? How does OCSSD start if the voting disk and OCR reside in ASM disk groups?
You might wonder how CSSD, which is required to start the clustered ASM instance, can be started if voting disks are stored in ASM. This sounds like a chicken-and-egg problem: without access to the voting disks there is no CSS, hence the node cannot join the cluster; but without being part of the cluster, CSSD cannot start the ASM instance. To solve this problem, the ASM disk headers have new metadata in 11.2: you can use kfed to read the header of an ASM disk containing a voting disk. The kfdhdb.vfstart and kfdhdb.vfend fields tell CSS where to find the voting file. This does not require the ASM instance to be up. Once the voting disks are located, CSS can access them and join the cluster.
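A minimal sketch of reading those header fields with kfed, which ships with the Grid Infrastructure home (the disk path is illustrative):
$ kfed read /dev/mapper/asm_ocrvote1 | grep -E 'vfstart|vfend'
Non-zero kfdhdb.vfstart and kfdhdb.vfend values indicate that the disk contains a voting file.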