Unix for the DBA
How to kill all similar processes with single command (in this case opmn)
ps -ef | grep opmn |grep -v grep | awk ‘{print $2}’ |xargs -i kill -9 {}
Locating Files under a particular directory
find . -print |grep -i test.sql
Using AWK in UNIX
To remove a specific column of output from a UNIX command – for example to determine the UNIX process Ids for all Oracle processes on server (second column)
ps -ef |grep -i oracle |awk '{ print $2 }'
Changing the standard prompt for Oracle Users
Edit the .profile for the oracle user
PS1="`hostname`*$ORACLE_SID:$PWD>"
Display top 10 CPU consumers using the ps command
/usr/ucb/ps auxgw | head -11
Show number of active Oracle dedicated connection users for a particular ORACLE_SID
ps -ef | grep $ORACLE_SID|grep -v grep|grep -v ora_|wc -l
Display the number of CPU’s in Solaris
psrinfo -v | grep "Status of processor"|wc -l
Display the number of CPU’s in AIX
lsdev -C | grep Process|wc -l
Display RAM Memory size on Solaris
prtconf |grep -i mem
Display RAM memory size on AIX
First determine name of memory device
lsdev -C |grep mem
then assuming the name of the memory device is ‘mem0’
lsattr -El mem0
Swap space allocation and usage
Solaris : swap -s or swap -lAix : lsps -a
Total number of semaphores held by all instances on server
ipcs -as | awk '{sum += $9} END {print sum}'
View allocated RAM memory segments
ipcs -pmb
Manually deallocate shared memeory segments
ipcrm -m ''
Show mount points for a disk in AIX
lspv -l hdisk13
Display amount of occupied space (in KB) for a file or collection of files in a directory or sub-directory
du -ks * | sort -n| tail
Display total file space in a directory
du -ks .
Cleanup any unwanted trace files more than seven days old
find . *.trc -mtime +7 -exec rm {} \;
Locate Oracle files that contain certain strings
find . -print | xargs grep rollback
Locate recently created UNIX files (in the past one day)
find . -mtime -1 -print
Finding large files on the server (more than 100MB in size)
find . -size +102400 -print
Crontab :
To submit a task every Tuesday (day 2) at 2:45PM
45 14 2 * * /opt/oracle/scripts/tr_listener.sh > /dev/null 2>&1
To submit a task to run every 15 minutes on weekdays (days 1-5)
15,30,45 * 1-5 * * /opt/oracle/scripts/tr_listener.sh > /dev/null 2>&1
To submit a task to run every hour at 15 minutes past the hour on weekends (days 6 and 0)
15 * 0,6 * * opt/oracle/scripts/tr_listener.sh > /dev/null 2>&1
SRVCTL:
srvctl command target [options]
commands: enable|disable|start|stop|relocate|status|add|remove|
modify|getenv|setenv|unsetenv|config
targets: database/db|instance/inst|service/serv|nodeapps|asm|listener
srvctl -help or srvctl -v
srvctl -V -- prints version
srvctl version: 10.2.0.0.0 (or) srvctl version: 11.0.0.0.0
srvctl -h -- print usage
srvctl status service –h
Database:
srvctl add database -d db_name -o ORACLE_HOME [-m domain_name][-p spfile] [-A name|ip/netmask]
[-r {PRIMARY|PHYSICAL_STANDBY|LOGICAL_STANDBY|SNAPSHOT_STANDBY}]
[-s start_options] [-n db_name] [-y {AUTOMATIC|MANUAL}]
srvctl remove database -d db_name [-f]
srvctl start database -d db_name [-o start_options] [-c connect_str | -q]
srvctl start database -d db_name [-o open]
srvctl start database -d db_name -o nomount
srvctl start database -d db_name -o mount
srvctl start db -d prod
srvctl stop database -d db_name [-o stop_options] [-c connect_str | -q]
srvctl stop database -d db_name [-o normal]
srvctl stop database -d db_name -o transactional
srvctl stop database -d db_name -o immediate
srvctl stop database -d db_name -o abort
srvctl status database -d db_name [-f] [-v] [-S level]
srvctl status database -d db_name -v service_name
srvctl status database -d hrms
srvctl enable database -d db_name
srvctl disable database -d db_name
srvctl config database
srvctl config database -d db_name [-a] [-t]
srvctl modify database -d db_name [-n db_name] [-o ORACLE_HOME] [-m domain_name] [-p spfile]
[-r {PRIMARY|PHYSICAL_STANDBY|LOGICAL_STANDBY|SNAPSHOT_STANDBY}] [-s start_options] [-y {AUTOMATIC|MANUAL}]
srvctl modify database -d hrms -r physical_standby
srvctl modify db -d RAC -p /u03/oradata/RAC/spfileRAC.ora -- moves parameter file
srvctl getenv database -d db_name [-t name_list]
srvctl setenv database -d db_name {-t name=val[,name=val,...]|-T name=val}
srvctl unsetenv database -d db_name [-t name_list]
Instance:
srvctl add instance –d db_name –i inst_name -n node_name
srvctl remove instance –d db_name –i inst_name [-f]
srvctl start instance -d db_name -i inst_names [-o start_options] [-c connect_str | -q]
srvctl start instance –d db_name –i inst_names [-o open]
srvctl start instance –d db_name –i inst_names -o nomount
srvctl start instance –d db_name –i inst_names -o mount
srvctl start instance –d prod -i prod3
srvctl stop instance -d db_name -i inst_names [-o stop_options] [-c connect_str | -q]
srvctl stop instance –d db_name –i inst_names [-o normal]
srvctl stop instance –d db_name –i inst_names -o transactional
srvctl stop instance –d db_name –i inst_names -o immediate
srvctl stop instance –d db_name –i inst_names -o abort
srvctl stop inst –d prod -i prod6
srvctl status instance –d db_name –i inst_names [-f] [-v] [-S level]
srvctl status inst –d racdb -i racdb2
srvctl enable instance –d db_name –i inst_names
srvctl disable instance –d db_name –i inst_names
srvctl modify instance -d db_name -i inst_name {-s asm_inst_name|-r} -- set a dependency of instance to ASM
srvctl modify instance -d db_name -i inst_name -n node_name -- move the instance
srvctl modify instance -d db_name -i inst_name -r -- remove the instance
srvctl getenv instance –d db_name –i inst_name [-t name_list]
srvctl setenv instance –d db_name [–i inst_name] {-t "name=val[,name=val,...]" | -T "name=val"}
srvctl unsetenv instance –d db_name [–i inst_name] [-t name_list]
Service:
srvctl add service -d db_name -s service_name -r pref_insts [-a avail_insts] [-P TAF_policy]
srvctl add service -d db_name -s service_name -u {-r "new_pref_inst" | -a "new_avail_inst"}
srvctl add service -d RAC -s PRD -r RAC01,RAC02 -a RAC03,RAC04
srvctl add serv -d CRM -s CRM -r CRM1 -a CRM3 -P basic
srvctl remove service -d db_name -s service_name [-i inst_name] [-f]
srvctl start service -d db_name [-s service_names [-i inst_name]] [-o start_options]
srvctl start service -d db_name -s service_names [-o open]
srvctl start service -d db_name -s service_names -o nomount
srvctl start service -d db_name -s service_names -o mount
srvctl stop service -d db_name [-s service_names [-i inst_name]] [-f]
srvctl status service -d db_name [-s service_names] [-f] [-v] [-S level]
srvctl enable service -d db_name -s service_names [–i inst_name]
srvctl disable service -d db_name -s service_names [–i inst_name]
srvctl config service -d db_name [-s service_name] [-a] [-S level]
srvctl config service -d db_name -a -- -a shows TAF configuration
srvctl config service -d WEBTST -s webtest PREF:WEBTST1 AVAIL:WEBTST2
srvctl modify service -d db_name -s service_name -i old_inst_name -t new_inst_name [-f]
srvctl modify service -d db_name -s service_name -i avail_inst_name -r [-f]
srvctl modify service -d db_name -s service_name -n -i preferred_list [-a available_list] [-f]
srvctl modify service -d db_name -s service_name -i old_inst_name -a avail_inst -P TAF_policy
srvctl modify serv -d PROD -s SDW -n -i I1,I2,I3,I4 -a I5,I6
srvctl relocate service -d db_name -s service_name –i old_inst_name -t target_inst [-f]
srvctl getenv service -d db_name -s service_name -t name_list
srvctl setenv service -d db_name [-s service_name] {-t "name=val[,name=val,...]" | -T "name=val"}
srvctl unsetenv service -d db_name -s service_name -t name_list
Nodeapps:
#srvctl add nodeapps -n node_name -o ORACLE_HOME -A name|ip/netmask[/if1[|if2|...]]
#srvctl add nodeapps -n lnx02 -o $ORACLE_HOME -A 192.168.0.151/255.255.0.0/eth0
#srvctl remove nodeapps -n node_names [-f]
#srvctl start nodeapps -n node_name -- Starts GSD, VIP, listener & ONS
#srvctl stop nodeapps -n node_name [-r] -- Stops GSD, VIP, listener & ONS
#srvctl status nodeapps -n node_name
#srvctl config nodeapps -n node_name [-a] [-g] [-o] [-s] [-l]
#srvctl modify nodeapps -n node_name [-A new_vip_address]
#srvctl modify nodeapps -n lnx06 -A 10.50.99.43/255.255.252.0/eth0
#srvctl getenv nodeapps -n node_name [-t name_list]
#srvctl setenv nodeapps -n node_name {-t "name=val[,name=val,...]"|-T "name=val"}
#srvctl unsetenv nodeapps -n node_name [-t name_list]
ASM:
srvctl add asm -n node_name -i asminstance -o ORACLE_HOME [-p spfile]
srvctl remove asm -n node_name [-i asminstance] [-f]
srvctl start asm -n node_name [-i asminstance] [-o start_options] [-c connect_str | -q]
srvctl start asm -n node_name -i asminstance [-o open]
srvctl start asm -n node_name -i asminstance -o nomount
srvctl start asm -n node_name -i asminstance -o mount
srvctl stop asm -n node_name [-i asminstance] [-o stop_options] [-c connect_str | -q]
srvctl stop asm -n node_name -i asminstance [-o normal]
srvctl stop asm -n node_name -i asminstance -o transactional
srvctl stop asm -n node_name -i asminstance -o immediate
srvctl stop asm -n node_name -i asminstance -o abort
srvctl status asm -n node_name
srvctl enable asm -n node_name [-i asminstance]
srvctl disable asm -n node_name [-i asminstance]
srvctl config asm -n node_name
srvctl modify asm -n node_name -i asminstance [-o ORACLE_HOME] [-p spfile]
Listener:
srvctl add listener -n node_name -o ORACLE_HOME [-l listener_name]
-- 11g command
srvctl remove listener -n node_name [-l listener_name] -- 11g command
srvctl start listener -n node_name [-l listener_names]
srvctl stop listener -n node_name [-l listener_names]
srvctl config listener -n node_name
Wednesday, October 6, 2010
Oracle Clusterware processes for 10g on Unix and Linux
What are Oracle Clusterware processes for 10g on Unix and Linux
Cluster Synchronization Services (ocssd) — Manages cluster node membership and runs as the oracle user; failure of this process results in cluster restart.
Cluster Ready Services (crsd) — The crs process manages cluster resources (which could be a database, an instance, a service, a Listener, a virtual IP (VIP) address, an application process, and so on) based on the resource's configuration information that is stored in the OCR. This includes start, stop, monitor and failover operations. This process runs as the root user
Event manager daemon (evmd) —A background process that publishes events that crs creates.
Process Monitor Daemon (OPROCD) —This process monitor the cluster and provide I/O fencing. OPROCD performs its check, stops running, and if the wake up is beyond the expected time, then OPROCD resets the processor and reboots the node. An OPROCD failure results in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux platforms.
RACG (racgmain, racgimon) —Extends clusterware to support Oracle-specific requirements and complex resources. Runs server callout scripts when FAN events occur.
What are Oracle database background processes specific to RAC
•LMS—Global Cache Service Process
•LMD—Global Enqueue Service Daemon
•LMON—Global Enqueue Service Monitor
•LCK0—Instance Enqueue Process
To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy a query or transaction, Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances.
What are Oracle Clusterware Components
Voting Disk — Oracle RAC uses the voting disk to manage cluster membership by way of a health check and arbitrates cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.
Oracle Cluster Registry (OCR) — Maintains cluster configuration information as well as configuration information about any cluster database within the cluster. The OCR must reside on shared disk that is accessible by all of the nodes in your cluster
How do you troubleshoot node reboot
Please check metalink ...
Note 265769.1 Troubleshooting CRS Reboots
Note.559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions.
How do you backup the OCR
There is an automatic backup mechanism for OCR. The default location is : $ORA_CRS_HOME\cdata\"clustername"\
To display backups :
#ocrconfig -showbackup
To restore a backup :
#ocrconfig -restore
With Oracle RAC 10g Release 2 or later, you can also use the export command:
#ocrconfig -export -s online, and use -import option to restore the contents back.
With Oracle RAC 11g Release 1, you can do a manaual backup of the OCR with the command:
# ocrconfig -manualbackup
How do you backup voting disk
#dd if=voting_disk_name of=backup_file_name
How do I identify the voting disk location
#crsctl query css votedisk
How do I identify the OCR file location
check /var/opt/oracle/ocr.loc or /etc/ocr.loc ( depends upon platform)
or
#ocrcheck
Is ssh required for normal Oracle RAC operation ?
"ssh" are not required for normal Oracle RAC operation. However "ssh" should be enabled for Oracle RAC and patchset installation.
What is SCAN?
Single Client Access Name (SCAN) is s a new Oracle Real Application Clusters (RAC) 11g Release 2 feature that provides a single name for clients to access an Oracle Database running in a cluster. The benefit is clients using SCAN do not need to change if you add or remove nodes in the cluster.
Click here for more details from Oracle
What is the purpose of Private Interconnect ?
Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the the clustered nodes. This communication is based on the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP). Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster.
Why do we have a Virtual IP (VIP) in Oracle RAC?
Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you don't really have a good HA solution without using VIPs.
When a node fails, the VIP associated with it is automatically failed over to some other node and new node re-arps the world indicating a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.
What do you do if you see GC CR BLOCK LOST in top 5 Timed Events in AWR Report?
This is most likely due to a fault in interconnect network.
Check netstat -s
if you see "fragments dropped" or "packet reassemblies failed" , Work with your system administrator find the fault with network.
How many nodes are supported in a RAC Database?
10g Release 2, support 100 nodes in a cluster using Oracle Clusterware, and 100 instances in a RAC database.
Srvctl cannot start instance, I get the following error PRKP-1001 CRS-0215, however sqlplus can start it on both nodes? How do you identify the problem?
Set the environmental variable SRVM_TRACE to true.. And start the instance with srvctl. Now you will get detailed error stack.
what is the purpose of the ONS daemon?
The Oracle Notification Service (ONS) daemon is an daemon started by the CRS clusterware as part of the nodeapps. There is one ons daemon started per clustered node.
The Oracle Notification Service daemon receive a subset of published clusterware events via the local evmd and racgimon clusterware daemons and forward those events to application subscribers and to the local listeners.
This in order to facilitate:
a. the FAN or Fast Application Notification feature or allowing applications to respond to database state changes.
b. the 10gR2 Load Balancing Advisory, the feature that permit load balancing accross different rac nodes dependent of the load on the different nodes. The rdbms MMON is creating an advisory for distribution of work every 30seconds and forward it via racgimon and ONS to listeners and applications.
Cluster Synchronization Services (ocssd) — Manages cluster node membership and runs as the oracle user; failure of this process results in cluster restart.
Cluster Ready Services (crsd) — The crs process manages cluster resources (which could be a database, an instance, a service, a Listener, a virtual IP (VIP) address, an application process, and so on) based on the resource's configuration information that is stored in the OCR. This includes start, stop, monitor and failover operations. This process runs as the root user
Event manager daemon (evmd) —A background process that publishes events that crs creates.
Process Monitor Daemon (OPROCD) —This process monitor the cluster and provide I/O fencing. OPROCD performs its check, stops running, and if the wake up is beyond the expected time, then OPROCD resets the processor and reboots the node. An OPROCD failure results in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux platforms.
RACG (racgmain, racgimon) —Extends clusterware to support Oracle-specific requirements and complex resources. Runs server callout scripts when FAN events occur.
What are Oracle database background processes specific to RAC
•LMS—Global Cache Service Process
•LMD—Global Enqueue Service Daemon
•LMON—Global Enqueue Service Monitor
•LCK0—Instance Enqueue Process
To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy a query or transaction, Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances.
What are Oracle Clusterware Components
Voting Disk — Oracle RAC uses the voting disk to manage cluster membership by way of a health check and arbitrates cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.
Oracle Cluster Registry (OCR) — Maintains cluster configuration information as well as configuration information about any cluster database within the cluster. The OCR must reside on shared disk that is accessible by all of the nodes in your cluster
How do you troubleshoot node reboot
Please check metalink ...
Note 265769.1 Troubleshooting CRS Reboots
Note.559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions.
How do you backup the OCR
There is an automatic backup mechanism for OCR. The default location is : $ORA_CRS_HOME\cdata\"clustername"\
To display backups :
#ocrconfig -showbackup
To restore a backup :
#ocrconfig -restore
With Oracle RAC 10g Release 2 or later, you can also use the export command:
#ocrconfig -export -s online, and use -import option to restore the contents back.
With Oracle RAC 11g Release 1, you can do a manaual backup of the OCR with the command:
# ocrconfig -manualbackup
How do you backup voting disk
#dd if=voting_disk_name of=backup_file_name
How do I identify the voting disk location
#crsctl query css votedisk
How do I identify the OCR file location
check /var/opt/oracle/ocr.loc or /etc/ocr.loc ( depends upon platform)
or
#ocrcheck
Is ssh required for normal Oracle RAC operation ?
"ssh" are not required for normal Oracle RAC operation. However "ssh" should be enabled for Oracle RAC and patchset installation.
What is SCAN?
Single Client Access Name (SCAN) is s a new Oracle Real Application Clusters (RAC) 11g Release 2 feature that provides a single name for clients to access an Oracle Database running in a cluster. The benefit is clients using SCAN do not need to change if you add or remove nodes in the cluster.
Click here for more details from Oracle
What is the purpose of Private Interconnect ?
Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the the clustered nodes. This communication is based on the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP). Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster.
Why do we have a Virtual IP (VIP) in Oracle RAC?
Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you don't really have a good HA solution without using VIPs.
When a node fails, the VIP associated with it is automatically failed over to some other node and new node re-arps the world indicating a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.
What do you do if you see GC CR BLOCK LOST in top 5 Timed Events in AWR Report?
This is most likely due to a fault in interconnect network.
Check netstat -s
if you see "fragments dropped" or "packet reassemblies failed" , Work with your system administrator find the fault with network.
How many nodes are supported in a RAC Database?
10g Release 2, support 100 nodes in a cluster using Oracle Clusterware, and 100 instances in a RAC database.
Srvctl cannot start instance, I get the following error PRKP-1001 CRS-0215, however sqlplus can start it on both nodes? How do you identify the problem?
Set the environmental variable SRVM_TRACE to true.. And start the instance with srvctl. Now you will get detailed error stack.
what is the purpose of the ONS daemon?
The Oracle Notification Service (ONS) daemon is an daemon started by the CRS clusterware as part of the nodeapps. There is one ons daemon started per clustered node.
The Oracle Notification Service daemon receive a subset of published clusterware events via the local evmd and racgimon clusterware daemons and forward those events to application subscribers and to the local listeners.
This in order to facilitate:
a. the FAN or Fast Application Notification feature or allowing applications to respond to database state changes.
b. the 10gR2 Load Balancing Advisory, the feature that permit load balancing accross different rac nodes dependent of the load on the different nodes. The rdbms MMON is creating an advisory for distribution of work every 30seconds and forward it via racgimon and ONS to listeners and applications.
Tuesday, October 5, 2010
ASM REBALANCING
Dynamic Storage Configuration
ASM enables you to change the storage configuration without having to take the database offline. It automatically rebalances—redistributes file data evenly across all the disks of the disk group—after you add disks to or drop disks from a disk group.
Should a disk failure occur, ASM automatically rebalances to restore full redundancy for files that had extents on the failed disk. When you replace the failed disk with a new disk, ASM rebalances the disk group to spread data evenly across all disks, including the replacement disk.
Tuning Rebalance Operations
The V$ASM_OPERATION view provides information that can be used for adjusting ASM_POWER_LIMIT and the resulting power of rebalance operations. The V$ASM_OPERATION view also gives an estimate in the EST_MINUTES column of the amount of time remaining for the rebalance operation to complete. You can see the effect of changing the rebalance power by observing the change in the time estimate.
Effects of Adding and Dropping Disks from a Disk Group
ASM automatically rebalances whenever disks are added or dropped. For a normal drop operation (without the FORCE option), a disk is not released from a disk group until data is moved off of the disk through rebalancing. Likewise, a newly added disk cannot support its share of the I/O workload until rebalancing completes. It is more efficient to add or drop multiple disks at the same time so that they are rebalanced as a single operation. This avoids unnecessary movement of data.
For a drop operation, when rebalance is complete, ASM takes the disk offline momentarily, and then drops it, setting disk header status to FORMER.
You can add or drop disks without shutting down the database. However, a performance impact on I/O activity may result.
ASM enables you to change the storage configuration without having to take the database offline. It automatically rebalances—redistributes file data evenly across all the disks of the disk group—after you add disks to or drop disks from a disk group.
Should a disk failure occur, ASM automatically rebalances to restore full redundancy for files that had extents on the failed disk. When you replace the failed disk with a new disk, ASM rebalances the disk group to spread data evenly across all disks, including the replacement disk.
Tuning Rebalance Operations
The V$ASM_OPERATION view provides information that can be used for adjusting ASM_POWER_LIMIT and the resulting power of rebalance operations. The V$ASM_OPERATION view also gives an estimate in the EST_MINUTES column of the amount of time remaining for the rebalance operation to complete. You can see the effect of changing the rebalance power by observing the change in the time estimate.
Effects of Adding and Dropping Disks from a Disk Group
ASM automatically rebalances whenever disks are added or dropped. For a normal drop operation (without the FORCE option), a disk is not released from a disk group until data is moved off of the disk through rebalancing. Likewise, a newly added disk cannot support its share of the I/O workload until rebalancing completes. It is more efficient to add or drop multiple disks at the same time so that they are rebalanced as a single operation. This avoids unnecessary movement of data.
For a drop operation, when rebalance is complete, ASM takes the disk offline momentarily, and then drops it, setting disk header status to FORMER.
You can add or drop disks without shutting down the database. However, a performance impact on I/O activity may result.
Testing the Fail over in Real Application Clusters Oracle10gR2 using service
Testing the Fail over in Real Application Clusters Oracle10gR2 using service:
How to test the Fail-over, if the Client is connected to a RAC database and one of the nodes goes down?
1. Client connected to Database from a SQL Plus session from a remote Workstation.
*********************************************
SQL> show user
USER is "SCOTT"
SQL> select INSTANCE_NAME, INSTANCE_NUMBER from v$INSTANCE;
INSTANCE_NAME INSTANCE_NUMBER
---------------- ---------------
devdb1 1
*********************************************
Checking the User /Client status from a DB Server
SQL> select USERNAME, INST_ID, MACHINE, TERMINAL, STATUS
2 from gv$session where USERNAME = 'SCOTT';
USERNAME INST_ID MACHINE TERMINAL STATUS
--------- ---------- ------------------------------ -------------- --------
SCOTT 1 WORKGROUP\PSDBA03-02 PSDBA03-02 INACTIVE
****************************************
3. On the DB Server, force /Kill the Instance devdb1.
4. The status of the DBInstance devdb1 is down.
Name Type Target State Host
------------------------------------------------------------
ora....SDEV.cs application ONLINE ONLINE rac2
ora....db1.srv application ONLINE OFFLINE
ora....db2.srv application ONLINE ONLINE rac2
ora.devdb.db application ONLINE ONLINE rac2
ora....b1.inst application OFFLINE OFFLINE
ora....b2.inst application ONLINE ONLINE rac2
ora....SM1.asm application ONLINE ONLINE rac1
ora....C1.lsnr application ONLINE ONLINE rac1
ora.rac1.gsd application ONLINE ONLINE rac1
ora.rac1.ons application ONLINE ONLINE rac1
ora.rac1.vip application ONLINE ONLINE rac1
ora....SM2.asm application ONLINE ONLINE rac2
ora....C2.lsnr application ONLINE ONLINE rac2
ora.rac2.gsd application ONLINE ONLINE rac2
ora.rac2.ons application ONLINE ONLINE rac2
ora.rac2.vip application ONLINE ONLINE rac2
****************************
5. You can see difference with Normal Service, the SQL plus connection throws a error, but with Client-Side Load balancing and TAF the user connection still exists. For User the fail-over is transparent and can go ahead with his session without the need to reconnect.
SQL> select INSTANCE_NAME from v$instance;
INSTANCE_NAME
----------------
devdb2
SQL> show user
USER is "SCOTT"
This is a simple test for fail-over, with a SQL Plus connections from a remote server.
How to test the Fail-over, if the Client is connected to a RAC database and one of the nodes goes down?
1. Client connected to Database from a SQL Plus session from a remote Workstation.
*********************************************
SQL> show user
USER is "SCOTT"
SQL> select INSTANCE_NAME, INSTANCE_NUMBER from v$INSTANCE;
INSTANCE_NAME INSTANCE_NUMBER
---------------- ---------------
devdb1 1
*********************************************
Checking the User /Client status from a DB Server
SQL> select USERNAME, INST_ID, MACHINE, TERMINAL, STATUS
2 from gv$session where USERNAME = 'SCOTT';
USERNAME INST_ID MACHINE TERMINAL STATUS
--------- ---------- ------------------------------ -------------- --------
SCOTT 1 WORKGROUP\PSDBA03-02 PSDBA03-02 INACTIVE
****************************************
3. On the DB Server, force /Kill the Instance devdb1.
4. The status of the DBInstance devdb1 is down.
Name Type Target State Host
------------------------------------------------------------
ora....SDEV.cs application ONLINE ONLINE rac2
ora....db1.srv application ONLINE OFFLINE
ora....db2.srv application ONLINE ONLINE rac2
ora.devdb.db application ONLINE ONLINE rac2
ora....b1.inst application OFFLINE OFFLINE
ora....b2.inst application ONLINE ONLINE rac2
ora....SM1.asm application ONLINE ONLINE rac1
ora....C1.lsnr application ONLINE ONLINE rac1
ora.rac1.gsd application ONLINE ONLINE rac1
ora.rac1.ons application ONLINE ONLINE rac1
ora.rac1.vip application ONLINE ONLINE rac1
ora....SM2.asm application ONLINE ONLINE rac2
ora....C2.lsnr application ONLINE ONLINE rac2
ora.rac2.gsd application ONLINE ONLINE rac2
ora.rac2.ons application ONLINE ONLINE rac2
ora.rac2.vip application ONLINE ONLINE rac2
****************************
5. You can see difference with Normal Service, the SQL plus connection throws a error, but with Client-Side Load balancing and TAF the user connection still exists. For User the fail-over is transparent and can go ahead with his session without the need to reconnect.
SQL> select INSTANCE_NAME from v$instance;
INSTANCE_NAME
----------------
devdb2
SQL> show user
USER is "SCOTT"
This is a simple test for fail-over, with a SQL Plus connections from a remote server.
Monday, October 4, 2010
RAC Switchover / Switchback
RAC Switchover / Switchback for 2 Node Primary with 2 Node Standby. Got from oracle forums.
DB Name DB Unique Name Host Name Instance Name
--------------------------------------------------------------------------------
live live linux1 live1
live live linux2 live2
live livestdby linux3 livestdby1
live livestdby linux4 livestdby2
Verify that each database is properly configured for the role it is about to assume and the standby database is in mounted state.
(Verify all Dataguard parameters on each node for Primary & Standby)
Like,
Log_archive_dest_1
Log_archive_dest_2
Log_archive_dest_state_1
Log_archive_dest_state_2
Fal_client
Fal_server
Local_listener
Remote_listener
Standby_archive_Dest
Standby_archive_management
service_names
db_unique_name
instance_name
db_file_name_convert
log_file_name_convert
Verify that both Primary RAC & Dataguard RAC are functioning properly and both are in Sync
On Primary,
Select thread#,max(sequence#) from v$archived_log group by thread#;
On Standby,
Select thread#,max(sequence#) from v$log_history group by thread#;
Before performing a switchover from a RAC primary shut down all but one primary instance (they can be restarted after the switchover has completed).
./srvctl stop instance –d live –i live1
Before performing a switchover or a failover to a RAC standby shut down all but one standby instance (they can be restarted after the role transition has completed).
./srvctl stop instance –d live –i livestdby1
On the primary database initiate the switchover:
alter database commit to switchover to physical standby with session shutdown;
Shutdown former Primary database & Startup in Mount State.
Shut immediate;
Startup mount;
select name,db_unique_name, log_mode,open_mode,controlfile_type,switchover_status,database_role from v$database;
Make log_Archive_Dest_state_2 to DEFER
alter system set log_archive_dest_state_2='DEFER' sid='*';
On the (old) standby database,
select name,log_mode,open_mode,controlfile_type,switchover_status,database_role from v$database;
On the (old) standby database switch to new primary role:
alter database commit to switchover to primary;
shut immediate;
startup;
On new Primary database,
select name,log_mode,open_mode,controlfile_type,switchover_status,database_role from v$database;
Make log_Archive_Dest_state_2 to ENABLE
alter system set log_archive_dest_state_2='ENABLE' sid='*';
Add tempfiles in New Primary database.
Do some archivelog switches on new primary database & verify that archives are getting transferred to Standby database.
On new primary,
select error from v$archive_Dest_status;
select max(sequence#) from v$archived_log;
On new Standby, Start Redo Apply
alter database recover managed standby database using current logfile disconnect;
Select max(sequence#) from v$log_history; (should be matching with Primary)
Now Start RAC databases services (both Primary – in open & Standby – in mount)
On new Primary Server.
./srvctl start instance –d live –i livestdby2
Verify using ./crs_stat –t
Check that database is opened in R/W mode.
On new Standby Server.
./srvctl start instance –d live –i live2 –o mount
Now add TAF services on new Primary (former Standby) Server.
By Command Prompt,
./srvctl add service -d live -s srvc_livestdby -r livestdby1,livestdby2 -P BASIC
OR
By GUI,
dbca -> Oracle Read Application Cluster database -> Service Management -> select database -> add services, details (Preferred / Available), TAF Policy (Basic / Preconnect) - > Finish
Start the services,
./srvctl start service -d live
Verify the same,
./crs_stat -t
Perform TAF testing, to make sure Load Balancing
DB Name DB Unique Name Host Name Instance Name
--------------------------------------------------------------------------------
live live linux1 live1
live live linux2 live2
live livestdby linux3 livestdby1
live livestdby linux4 livestdby2
Verify that each database is properly configured for the role it is about to assume and the standby database is in mounted state.
(Verify all Dataguard parameters on each node for Primary & Standby)
Like,
Log_archive_dest_1
Log_archive_dest_2
Log_archive_dest_state_1
Log_archive_dest_state_2
Fal_client
Fal_server
Local_listener
Remote_listener
Standby_archive_Dest
Standby_archive_management
service_names
db_unique_name
instance_name
db_file_name_convert
log_file_name_convert
Verify that both Primary RAC & Dataguard RAC are functioning properly and both are in Sync
On Primary,
Select thread#,max(sequence#) from v$archived_log group by thread#;
On Standby,
Select thread#,max(sequence#) from v$log_history group by thread#;
Before performing a switchover from a RAC primary shut down all but one primary instance (they can be restarted after the switchover has completed).
./srvctl stop instance –d live –i live1
Before performing a switchover or a failover to a RAC standby shut down all but one standby instance (they can be restarted after the role transition has completed).
./srvctl stop instance –d live –i livestdby1
On the primary database initiate the switchover:
alter database commit to switchover to physical standby with session shutdown;
Shutdown former Primary database & Startup in Mount State.
Shut immediate;
Startup mount;
select name,db_unique_name, log_mode,open_mode,controlfile_type,switchover_status,database_role from v$database;
Make log_Archive_Dest_state_2 to DEFER
alter system set log_archive_dest_state_2='DEFER' sid='*';
On the (old) standby database,
select name,log_mode,open_mode,controlfile_type,switchover_status,database_role from v$database;
On the (old) standby database switch to new primary role:
alter database commit to switchover to primary;
shut immediate;
startup;
On new Primary database,
select name,log_mode,open_mode,controlfile_type,switchover_status,database_role from v$database;
Make log_Archive_Dest_state_2 to ENABLE
alter system set log_archive_dest_state_2='ENABLE' sid='*';
Add tempfiles in New Primary database.
Do some archivelog switches on new primary database & verify that archives are getting transferred to Standby database.
On new primary,
select error from v$archive_Dest_status;
select max(sequence#) from v$archived_log;
On new Standby, Start Redo Apply
alter database recover managed standby database using current logfile disconnect;
Select max(sequence#) from v$log_history; (should be matching with Primary)
Now Start RAC databases services (both Primary – in open & Standby – in mount)
On new Primary Server.
./srvctl start instance –d live –i livestdby2
Verify using ./crs_stat –t
Check that database is opened in R/W mode.
On new Standby Server.
./srvctl start instance –d live –i live2 –o mount
Now add TAF services on new Primary (former Standby) Server.
By Command Prompt,
./srvctl add service -d live -s srvc_livestdby -r livestdby1,livestdby2 -P BASIC
OR
By GUI,
dbca -> Oracle Read Application Cluster database -> Service Management -> select database -> add services, details (Preferred / Available), TAF Policy (Basic / Preconnect) - > Finish
Start the services,
./srvctl start service -d live
Verify the same,
./crs_stat -t
Perform TAF testing, to make sure Load Balancing
OCR is automatically backed
OCR is automatically backed up physically:
Every four hours: CRS keeps the last three copies.
At the end of every day: CRS keeps the last two copies.
At the end of every week: CRS keeps the last two copies
[oracle@rac2 ~]$ ocrconfig -showbackup
rac2 2010/10/04 10:29:12 /u01/app/oracle/product/crs/cdata/crs
rac2 2010/10/04 06:29:11 /u01/app/oracle/product/crs/cdata/crs
rac2 2010/10/04 02:29:11 /u01/app/oracle/product/crs/cdata/crs
rac2 2010/10/03 02:29:07 /u01/app/oracle/product/crs/cdata/crs
dbrac1 2010/09/17 04:17:06 /u01/app/oracle/product/crs/cdata/crs
[oracle@rac2 ~]$ cd /u01/app/oracle/product/crs/cdata/crs
[oracle@rac2 crs]$ ls -ltr
total 27916
-rw-r--r-- 1 root root 4079616 Jun 13 02:16 week.ocr
-rw-r--r-- 1 root root 4079616 Sep 29 00:58 week_.ocr
-rw-r--r-- 1 root root 4079616 Oct 3 02:29 day.ocr
-rw-r--r-- 1 root root 4079616 Oct 4 02:29 day_.ocr
-rw-r--r-- 1 root root 4079616 Oct 4 02:29 backup02.ocr
-rw-r--r-- 1 root root 4079616 Oct 4 06:29 backup01.ocr
-rw-r--r-- 1 root root 4079616 Oct 4 10:29 backup00.ocr
Every four hours: CRS keeps the last three copies.
At the end of every day: CRS keeps the last two copies.
At the end of every week: CRS keeps the last two copies
[oracle@rac2 ~]$ ocrconfig -showbackup
rac2 2010/10/04 10:29:12 /u01/app/oracle/product/crs/cdata/crs
rac2 2010/10/04 06:29:11 /u01/app/oracle/product/crs/cdata/crs
rac2 2010/10/04 02:29:11 /u01/app/oracle/product/crs/cdata/crs
rac2 2010/10/03 02:29:07 /u01/app/oracle/product/crs/cdata/crs
dbrac1 2010/09/17 04:17:06 /u01/app/oracle/product/crs/cdata/crs
[oracle@rac2 ~]$ cd /u01/app/oracle/product/crs/cdata/crs
[oracle@rac2 crs]$ ls -ltr
total 27916
-rw-r--r-- 1 root root 4079616 Jun 13 02:16 week.ocr
-rw-r--r-- 1 root root 4079616 Sep 29 00:58 week_.ocr
-rw-r--r-- 1 root root 4079616 Oct 3 02:29 day.ocr
-rw-r--r-- 1 root root 4079616 Oct 4 02:29 day_.ocr
-rw-r--r-- 1 root root 4079616 Oct 4 02:29 backup02.ocr
-rw-r--r-- 1 root root 4079616 Oct 4 06:29 backup01.ocr
-rw-r--r-- 1 root root 4079616 Oct 4 10:29 backup00.ocr
Subscribe to:
Posts (Atom)