Installation problems with 11.2.0.2 Oracle Clusterware/RAC on Solaris 10

If you’re having headaches with Solaris 10 and Oracle Clusterware 11.2.0.x (even 11.2.0.2.2), and you get the following errors during root.sh:

CRS-2678: 'ora.asm' on 'RAC1' has experienced an unrecoverable failure
CRS-0267: Human intervention required to resume its availability.
CRS-5802: Unable to start the agent process
CRS-0215: Could not start resource 'ora.CRSDATA.dg'.
CRS-4000: Command Create failed, or completed with errors.
create diskgroup CRSDATA … failed
ACFS-9200: Supported
/opt/oracrs/product/11.2.0.2/bin/srvctl start nodeapps -n rac1 … failed
FirstNode configuration failed at /opt/oracrs/product/11.2.0.2/crs/install/crsconfig_lib.pm line 8374.
/opt/oracrs/product/11.2.0.2/perl/bin/perl -I/opt/oracrs/product/11.2.0.2/perl/lib -I/opt/oracrs/product/11.2.0.2/crs/install /opt/oracrs/product/11.2.0.2/crs/install/rootcrs.pl execution failed

There’s a bug – Bug 11834289: OHASD FAILED TO START TIMELY – that causes OHASD to time out because of problems handling a large number of file descriptors. It also causes increased CPU usage!

From Oracle support:

The problem in bug 11834289 is that we take the rlimit and close all potential file descriptors at process startup (even before logging in $GRID_HOME/log). Hence, if the stack is the same as before, the oraagent or orarootagent processes are still consuming CPU, and pfiles on those processes shows an rlimit of unlimited, then it is this problem. Those processes are furthermore special, i.e. they are agents started by ora.crsd, and they are the ones failing (whereas all other services are working, e.g. the ones started by ohasd).

The workaround/fix for the problem I found is the following configuration in /etc/project:

system:0::::process.max-file-descriptor=(basic,1024,deny),(privileged,1024,deny)
user.root:1::::process.max-file-descriptor=(basic,1024,deny),(privileged,1024,deny)
noproject:2::::process.max-file-descriptor=(basic,1024,deny)
default:3::::process.max-file-descriptor=(basic,1024,deny),(privileged,1024,deny);process.max-sem-nsems=(privileged,16384,deny);project.max-sem-ids=(privileged,65536,deny);project.max-shm-ids=(privileged,65536,deny);project.max-shm-memory=(privileged,68719476736,deny)
user.oracrs:100::oracrs::process.max-file-descriptor=(basic,65535,deny),(privileged,65535,deny);process.max-sem-nsems=(privileged,16384,deny);project.max-sem-ids=(privileged,65536,deny);project.max-shm-ids=(privileged,65536,deny);project.max-shm-memory=(privileged,68719476736,deny)
user.oracle11:101::oracle11::process.max-file-descriptor=(basic,65535,deny),(privileged,65535,deny);process.max-sem-nsems=(privileged,16384,deny);project.max-sem-ids=(privileged,65536,deny);project.max-shm-ids=(privileged,65536,deny);project.max-shm-memory=(privileged,68719476736,deny)
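
To confirm you’re hitting this, and to apply the caps without editing /etc/project by hand, something like the following should work on Solaris 10 (the PID is hypothetical, and already-running processes keep the rlimit they started with):

# Inspect the file descriptor resource control of a suspect agent
# (replace 12345 with the PID of the spinning oraagent/orarootagent)
prctl -n process.max-file-descriptor -i process 12345

# Add or replace the capped control on the oracrs project; processes
# started under the project after this pick up the new limit
projmod -s -K "process.max-file-descriptor=(basic,65535,deny),(privileged,65535,deny)" user.oracrs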

I hope this makes your life easier! Oracle Support was able to help here – thanks!

During rootupgrade.sh – what happens if the CRS cluster is down?

Imagine you want to do a rolling upgrade from 11.2.0.1 to 11.2.0.2…

First of all, apply patch 9706490 to fix a couple of nasty bugs that bite during ASM rolling upgrades.

Well, let’s say you shut down the Clusterware services on a node (like I did, to free up disk space) and forget to start them back up (it isn’t exactly a rolling upgrade on this node if you shut it down 🙂 ).

What happens if you then run rootupgrade.sh during the upgrade, after installing the new 11.2.0.2 binaries?

Well, nothing! This time rootupgrade.sh was smart! And no bugs in this environment after patching!

Just start up Clusterware (crsctl start crs) and rerun rootupgrade.sh and you’re good to continue!
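
Something along these lines (the home paths mirror the ones used elsewhere in this post – adjust to your environment):

# Start the stack from the still-active pre-upgrade Grid home
/opt/oracrs/product/11.2.0/bin/crsctl start crs

# Wait until CRS reports its daemons online
/opt/oracrs/product/11.2.0/bin/crsctl check crs

# Then rerun the upgrade script from the new 11.2.0.2 home
/opt/oracrs/product/11.2.0.2/rootupgrade.sh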

Root.sh bugs in 11.2.0.2 Clusterware

This is the error you get when you run it for the first time on the second node – it almost works:

CRS-2672: Attempting to start 'ora.cssd' on 'host2'
CRS-2672: Attempting to start 'ora.diskmon' on 'host2'
CRS-2676: Start of 'ora.diskmon' on 'host2' succeeded
CRS-2676: Start of 'ora.cssd' on 'host2' succeeded

Mounting Disk Group CRS_DATA failed with the following message:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "CRS_DATA" cannot be mounted
ORA-15003: diskgroup "CRS_DATA" already mounted in another lock name space

Configuration of ASM … failed
see asmca logs at /opt/oracrs/admin/cfgtoollogs/asmca for details
Did not succssfully configure and start ASM at /opt/oracrs/product/11.2.0/crs/install/crsconfig_lib.pm line 6465.
/opt/oracrs/product/11.2.0/perl/bin/perl -I/opt/oracrs/product/11.2.0/perl/lib -I/opt/oracrs/product/11.2.0/crs/install /opt/oracrs/product/11.2.0/crs/install/rootcrs.pl execution failed


root.sh fails if you run it from any directory other than $ORACLE_HOME on Solaris, because it can’t determine the current directory – the getcwd() call fails.

For example, if you execute sh /opt/oracrs/product/11.2.0/root.sh as suggested by the installer, it will fail on the second cluster node when it tries to join the existing cluster.

FIX – run it from the Grid home directory:

cd /opt/oracrs/product/11.2.0
./root.sh

Archived log repository – shipping redo logs over the network

You might have a need to ship redo logs over the SQL*Net connection to a remote host.

This is usually done in data guard setups.

If you only want to send redo logs over the network, you must have a standby instance running to be able to receive the logs. Otherwise, you will get warnings.

But it doesn’t need to have any datafiles!


Here’s something from the docs:

Archived redo log repository

This type of destination allows off-site archiving of redo data. An archive log repository is created by using a physical standby control file, starting the instance, and mounting the database. This database contains no datafiles and cannot be used for switchover or failover. This alternative is useful as a way of holding archived redo log files for a short period of time, perhaps a day, after which the log files can then be deleted. This avoids most of the storage and processing expense of another fully configured standby database.

Oracle recommends using an archived redo log repository for temporary storage of archived redo log files. This can be accomplished by configuring the repository destination for archiver-based transport (using the ARCH attribute on LOG_ARCHIVE_DEST_n parameter) in a Data Guard configuration running in maximum performance mode. For a no data loss environment, you should use a fully configured standby database using the LGWR, SYNC, and AFFIRM transport settings in a Data Guard configuration and running in either maximum protection mode or maximum availability mode.

This is a handy feature. It would be nice if a listener alone were enough, but no – you need a physical standby instance running!
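
As a hedged sketch of what the docs describe (the destination number and the ARCREPO TNS alias are made up; the repository instance is assumed to have been started from a physical standby control file):

-- On the repository instance: no datafiles needed, just mount the
-- database from the standby control file so it can receive redo
STARTUP MOUNT;

-- On the primary: archiver-based transport to the repository,
-- for a configuration running in maximum performance mode
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=ARCREPO ARCH VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)'
  SCOPE=BOTH;
ALTER SYSTEM SET log_archive_dest_state_2 = ENABLE SCOPE=BOTH;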

SecureFiles in 11.2 – yes or no?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter Parameter #1 <XDB_PASSWD>, password for XDB schema:
Enter Parameter #2 <TABLESPACE>, tablespace for XDB:
Enter Parameter #3 <TEMP_TABLESPACE>, temporary tablespace for XDB:
Enter Parameter #4 <SECURE_FILES_REPO>, YES/NO
                    If YES and compatibility is at least 11.2,
                    then XDB repository will be stored as secure files.
                    Otherwise, old LOBS are used
Enter value for 4: YES
“Oracle SecureFiles – replacement for LOBs that are faster than Unix files to read/write. Lots of potential benefit for OLAP analytic workspaces, as the LOBs used to hold AWs have historically been slower to write to than the old Express .db files.” – Mark Rittman

“SecureFiles are a huge improvement to BLOB data types. Faster, with compression, encryption.” – Laurent Schneider
YES for performance!

SecureFiles provides the following benefits:

* LOB compression
* LOB encryption
* Deduplication (detect duplicate LOB data and store only one copy)
* Faster access to LOB data

A SecureFile can only be created in an automatic segment space management (ASSM) tablespace.
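
If you’re not sure whether a tablespace qualifies, you can check it with a query like this (LOBTBS1 is just the tablespace name used in the example below):

-- ASSM tablespaces report AUTO here
SELECT tablespace_name, segment_space_management
  FROM dba_tablespaces
 WHERE tablespace_name = 'LOBTBS1';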

I put YES

Creating SecureFile LOBs

By default, LOBs are created as BASICFILE LOBs. To create a SecureFile LOB, specify the STORE AS SECUREFILE clause. Here is an example:

CREATE TABLE lob_tab1 (c1 NUMBER, l CLOB)
  LOB (l) STORE AS SECUREFILE secfile_segment_name (
    TABLESPACE lobtbs1
    RETENTION AUTO
    CACHE LOGGING
    STORAGE (MAXEXTENTS 5)
  );
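
And a sketch of switching on the compression and deduplication benefits listed earlier – table and column names are made up, and COMPRESS/DEDUPLICATE need the Advanced Compression option:

CREATE TABLE doc_tab (id NUMBER, doc BLOB)
  LOB (doc) STORE AS SECUREFILE (
    TABLESPACE lobtbs1  -- must be an ASSM tablespace
    COMPRESS HIGH       -- SecureFiles compression
    DEDUPLICATE         -- identical LOB content stored only once
    CACHE LOGGING
  );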

More information on SecureFiles

http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28393/adlob_smart.htm

Importance of monitoring

What could happen if you don’t monitor what’s happening?

CRASH.

Oracle 11.2.0.2+  new feature: if writes to a datafile fail, crash the database by default.

Go back to the old behaviour by setting the underscore parameter _datafile_write_errors_crash_instance = false (see the sketch after the note below).



This fix introduces a notable change in behaviour in that
from 11.2.0.2 onwards an I/O write error to a datafile will
now crash the instance.

Before this fix I/O errors to datafiles not in the system tablespace
offline the respective datafiles when the database is in archivelog mode.
This behavior is not always desirable. Some customers would prefer
that the instance crash due to a datafile write error.

This fix introduces a new hidden parameter to control if the instance
should crash on a write error or not:
  _datafile_write_errors_crash_instance 

With this fix:
 If _datafile_write_errors_crash_instance = TRUE (default) then
  any write to a datafile which fails due to an IO error causes
  an instance crash. 

 If _datafile_write_errors_crash_instance = FALSE then the behaviour
  reverts to the previous behaviour (before this fix) such that
  a write error to a datafile offlines the file (provided the DB is
  in archivelog mode and the file is not in SYSTEM tablespace in
  which case the instance is aborted)
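
Here’s a sketch of flipping it back; underscore parameters have to be double-quoted, and I’d set it in the spfile and bounce the instance rather than assume it can change on the fly:

-- Revert to the pre-11.2.0.2 behaviour: offline the affected datafile
-- on a write error instead of crashing the instance
ALTER SYSTEM SET "_datafile_write_errors_crash_instance" = FALSE
  SCOPE=SPFILE;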

Oracle unbreakable skeleton

It might be easier to believe Oracle is an unbreakable database when it runs and walks on one of the *unbreakable* Iron Man skeletons below 🙂

These exoskeletons are meant for the DBAs who handle the hard problems with Oracle databases. There are times when these power suits could come in handy…