Installation problems with 11.2.0.2 Oracle Clusterware/RAC on Solaris 10

If you’re having headaches with Solaris 10 and Oracle Clusterware 11.2.0.x (even 11.2.0.2.2) and you see the following errors during root.sh:

CRS-2678: 'ora.asm' on 'RAC1' has experienced an unrecoverable failure
CRS-0267: Human intervention required to resume its availability.
CRS-5802: Unable to start the agent process
CRS-0215: Could not start resource 'ora.CRSDATA.dg'.
CRS-4000: Command Create failed, or completed with errors.
create diskgroup CRSDATA … failed
ACFS-9200: Supported
/opt/oracrs/product/11.2.0.2/bin/srvctl start nodeapps -n rac1 … failed
FirstNode configuration failed at /opt/oracrs/product/11.2.0.2/crs/install/crsconfig_lib.pm line 8374.
/opt/oracrs/product/11.2.0.2/perl/bin/perl -I/opt/oracrs/product/11.2.0.2/perl/lib -I/opt/oracrs/product/11.2.0.2/crs/install /opt/oracrs/product/11.2.0.2/crs/install/rootcrs.pl execution failed

There’s a bug, Bug 11834289: OHASD FAILED TO START TIMELY, that causes OHASD to time out due to problems handling a large number of file descriptors. It also causes increased CPU usage!

From Oracle support:

The problem in bug 11834289 is that we take the rlimit and close all potential file handles at process startup (even before logging in $GRID_HOME/log). Hence, if the stack is the same as before, the oraagent or orarootagent are still consuming CPU, and pfiles on those processes shows the rlimit as unlimited, then it is this problem. Those processes are furthermore special, i.e. they are agents started by ora.crsd that are failing (whereas all other services are working, e.g. the ones started by ohasd).
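
To confirm you’re actually hitting this, you can check the CRSD agents with standard Solaris tools. A quick sketch, run as root (the process-name pattern is an assumption for a typical install):

# list the CRSD agent processes and inspect their file-descriptor rlimit
for pid in `pgrep -f 'oraagent|orarootagent'`; do
  echo "=== PID $pid ==="
  pfiles $pid | head -2      # "Current rlimit: unlimited" matches the bug
  prstat -p $pid 1 1         # confirms the elevated CPU usage
done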

The workaround/fix I found for the problem is the following configuration in /etc/project:

system:0::::process.max-file-descriptor=(basic,1024,deny),(privileged,1024,deny)
user.root:1::::process.max-file-descriptor=(basic,1024,deny),(privileged,1024,deny)
noproject:2::::process.max-file-descriptor=(basic,1024,deny)
default:3::::process.max-file-descriptor=(basic,1024,deny),(privileged,1024,deny);process.max-sem-nsems=(privileged,16384,deny);project.max-sem-ids=(privileged,65536,deny);project.max-shm-ids=(privileged,65536,deny);project.max-shm-memory=(privileged,68719476736,deny)
user.oracrs:100::oracrs::process.max-file-descriptor=(basic,65535,deny),(privileged,65535,deny);process.max-sem-nsems=(privileged,16384,deny);project.max-sem-ids=(privileged,65536,deny);project.max-shm-ids=(privileged,65536,deny);project.max-shm-memory=(privileged,68719476736,deny)
user.oracle11:101::oracle11::process.max-file-descriptor=(basic,65535,deny),(privileged,65535,deny);process.max-sem-nsems=(privileged,16384,deny);project.max-sem-ids=(privileged,65536,deny);project.max-shm-ids=(privileged,65536,deny);project.max-shm-memory=(privileged,68719476736,deny)
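
The new limits only apply to sessions started after the change. A quick sanity check (the user and project names are the ones from my environment):

su - oracrs
id -p                                    # session should land in project user.oracrs
prctl -n process.max-file-descriptor $$  # should now report basic/privileged 65535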

I hope this makes your life easier! Oracle Support was able to help here. Thanks!


During rootupgrade.sh – what happens if the CRS cluster is down?

Imagine you want to do a rolling upgrade from 11.2.0.1 to 11.2.0.2…

First of all, apply patch 9706490 to fix a couple of nasty bugs hit during ASM rolling upgrades.
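
If you’re not sure whether the patch is already in place, OPatch can tell you – a minimal check, with $GRID_HOME standing in for your existing 11.2.0.1 grid home:

# lists applied interim patches; 9706490 should show up
$GRID_HOME/OPatch/opatch lsinventory | grep 9706490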

Well, let’s say you shut down the Clusterware services on a node (like I did, to clean up disk space) and forget to start them back up (well, it isn’t a rolling upgrade on this node if you shut it down 🙂).

What happens if you run rootupgrade.sh during the upgrade, after installing the new 11.2.0.2 binaries?

Well, nothing! This time rootupgrade.sh was smart! And no bugs hit in this environment after patching!

Just start up Clusterware (crsctl start crs) and rerun rootupgrade.sh, and you’re good to continue!
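
Before rerunning, it’s worth verifying the stack really came back up on the node – a standard crsctl check (the path is the grid home from this environment):

/opt/oracrs/product/11.2.0/bin/crsctl check crs    # CRS, CSS and EVM should all report online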

Root.sh bugs in 11.2.0.2 Clusterware

This is the error you get when you run it for the first time on the second node – it almost works:

CRS-2672: Attempting to start 'ora.cssd' on 'host2'
CRS-2672: Attempting to start 'ora.diskmon' on 'host2'
CRS-2676: Start of 'ora.diskmon' on 'host2' succeeded
CRS-2676: Start of 'ora.cssd' on 'host2' succeeded
Mounting Disk Group CRS_DATA failed with the following message:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "CRS_DATA" cannot be mounted
ORA-15003: diskgroup "CRS_DATA" already mounted in another lock name space
Configuration of ASM … failed
see asmca logs at /opt/oracrs/admin/cfgtoollogs/asmca for details
Did not succssfully configure and start ASM at /opt/oracrs/product/11.2.0/crs/install/crsconfig_lib.pm line 6465.
/opt/oracrs/product/11.2.0/perl/bin/perl -I/opt/oracrs/product/11.2.0/perl/lib -I/opt/oracrs/product/11.2.0/crs/install /opt/oracrs/product/11.2.0/crs/install/rootcrs.pl execution failed


root.sh fails if you run it from any directory other than $ORACLE_HOME on Solaris, because it can’t get the current directory – the getcwd() call fails.

For example, if you execute sh /opt/oracrs/product/11.2.0/root.sh as suggested by the installer, it will fail on the second cluster node when it tries to join the existing cluster.

FIX:

cd /opt/oracrs/product/11.2.0
./root.sh

and it works.
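
Once root.sh completes, a quick way to confirm the second node actually joined the cluster (a standard 11.2 command; the path is the grid home from this environment):

/opt/oracrs/product/11.2.0/bin/olsnodes -s    # both nodes should show as Active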