What’s a HiperSocket?


“HiperSockets is an IBM technology for high-speed communications between partitions on a server with a hypervisor. The term is most commonly associated with zSeries, System z9 and System z10 mainframes which can provide in-memory TCP/IP connections between and among LPARs running several different operating systems, including z/OS, z/VM, and Linux on z Systems.”

I’m working on a project where HiperSockets deliver the underlying networking between zLinux virtual machines. It should be quite fast, since the traffic is handled entirely in memory.

I will do some benchmarking for the TCP/IP latency and throughput, and post results here.
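One way to run those latency and throughput measurements, assuming the qperf utility is installed on both guests (the hostname below is just a placeholder for the peer's address on the HiperSockets network):

```shell
# On one zLinux guest, start the qperf server (it listens and waits):
qperf

# On the other guest, measure TCP round-trip latency and bandwidth
# over the HiperSockets interface (replace lnxguest1 with the peer):
qperf lnxguest1 tcp_lat tcp_bw
```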


Why are you getting so many gigantic tracefiles with Oracle 12c?

40 gigs of tracefiles per day? No problem, Oracle 12c can generate that for you!

This might be because of the new “feature”

What a nice surprise!

You can see lines like this in the trace files:

----- Cursor Obsoletion Dump sql_id=490jdsadiwssa -----
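To see which trace files are affected, a small shell helper can grep for the dump header. The directory argument would be your diagnostic_dest; the path in the comment is only an example:

```shell
# List trace files under a directory that contain a cursor obsoletion dump.
# Usage: find_obsoletion_dumps /u01/app/oracle/diag   (example path)
find_obsoletion_dumps() {
  find "$1" -name '*.trc' -exec grep -l -e '----- Cursor Obsoletion Dump sql_id' {} +
}
```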

You can find a support note at My Oracle Support:

MOS Note 1955319.1:
Huge Trace Files Created Containing “----- Cursor Obsoletion Dump sql_id=%s -----”

Oracle introduced this as an enhancement (tracked as a bug) that dumps more information about obsoleted parent cursors and their child cursors after the parent cursor has been obsoleted.

There’s an underscore parameter to control this dumping.

alter system set "_kks_obsolete_dump_threshold" = <N>;

You can use the value 0 to disable cursor dumping entirely; otherwise N specifies after how many invalidations of the cursor a dump is generated.
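A minimal sketch of disabling the dump entirely, run as SYSDBA. As always, underscore parameters should only be changed on the advice of Oracle Support:

```shell
# Disable the cursor obsoletion dump (value 0 per MOS 1955319.1).
sqlplus / as sysdba <<'EOF'
alter system set "_kks_obsolete_dump_threshold" = 0;
exit
EOF
```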

Installation problems with Oracle Clusterware/RAC on Solaris 10

If you’re having headaches with Solaris 10 and Oracle Clusterware 11.2.0.x, and get the following errors during root.sh:

CRS-2678: 'ora.asm' on 'RAC1' has experienced an unrecoverable failure
CRS-0267: Human intervention required to resume its availability.
CRS-5802: Unable to start the agent process
CRS-0215: Could not start resource 'ora.CRSDATA.dg'.
CRS-4000: Command Create failed, or completed with errors.
create diskgroup CRSDATA … failed
ACFS-9200: Supported
/opt/oracrs/product/ start nodeapps -n rac1  … failed
FirstNode configuration failed at /opt/oracrs/product/ line 8374.
/opt/oracrs/product/ -I/opt/oracrs/product/ -I/opt/oracrs/product/ /opt/oracrs/product/ execution failed

There’s a bug, Bug 11834289: OHASD FAILED TO START TIMELY, that causes OHASD to time out due to problems handling a large number of file descriptors. It also causes increased CPU usage!

From Oracle support:

The problem of bug:11834289 is that we take the rlimit and close all potential file handlers at process startup (even before logging in $GRID_HOME/log).
Hence, in case the stack is the same as before and the oraagent or orarootagent are still consuming cpu and the pfiles of those processes show
a rlimit as unlimited, then it is the problem. Those processes are furthermore special, i.e. they are agents started by the ora.crsd, that are failing (whereas all
other services are working, e.g. the ones started by the ohasd).

The workaround/fix for the problem I found is the following resource control entry for the system project in /etc/project:

system:0::::process.max-file-descriptor=(basic,1024,deny),(privileged,1024,deny)
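On Solaris you can check whether an agent process is still running with an unlimited file-descriptor limit, which is the symptom Oracle Support describes above. The PID is just an example; substitute the PID of the oraagent or orarootagent process:

```shell
# Show the process.max-file-descriptor resource control for a running
# agent process (replace 12345 with the actual agent PID):
prctl -n process.max-file-descriptor -i pid 12345

# pfiles prints the current rlimit in its header; "unlimited" there
# means the process hits the close-all-descriptors startup loop:
pfiles 12345 | head -2
```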

I hope this makes your life easier! Oracle support was able to help here. Thanks

During rootupgrade.sh – what happens if the CRS cluster is down?

Imagine you want to do a rolling upgrade from … to …

First of all, apply patchset 9706490 to fix a couple of nasty bugs during ASM rolling upgrades.

Well, let’s say you shut down the Clusterware services (like I did, to clean up disk space) and forget to start them back up (it isn’t a rolling upgrade on that node if you shut it down 🙂).

What happens if you run rootupgrade.sh during the Upgrade, after installing the new binaries?

Well, nothing! This time rootupgrade.sh was smart! And no bugs in this environment after patching!

Just start up Clusterware (crsctl start crs) and rerun rootupgrade.sh and you’re good to continue!

Database Monitoring Automation starts from simple steps with Grid Control

Imagine you have hundreds of databases, or more. Each time you add a database to be monitored, you have to take an extra action to apply a monitoring template.

Why not apply a default template automatically each time a new database target is discovered by Grid Control, and avoid those extra steps?

Here’s how!

Multi-server Grid Control 11g challenge #1 – shared receive/loader directory for cluster

You’d think building up a multi-server Grid Control 11g setup is easy… well, definitely not in every case, especially when it comes to choosing a stable shared filesystem.

Just try choosing a stable shared/cluster filesystem for the Red Hat Linux OMS nodes, which is a requirement for multi-node uploaders. Good luck: there aren’t many great choices.

The officially supported choices from Oracle are:

  • NFS
  • OCFS2

Other people have also been testing ACFS (ASM Cluster File System) and GFS2 (Global File System 2). But these are not recommended by Oracle (no support).

However, if your environment generates thousands of XML files, that can be slow on NFS. NFS also has weaknesses of its own.

OCFS2: only version 1.4.x (the latest is 1.4.7) is currently supported by Oracle for Red Hat Linux. However, 1.4.7 has a fragmentation problem, which shows up especially if you create and delete lots of files.

There’s a workaround, but it is only temporary: it reduces the chance of fragmentation getting as bad as it would by default.

OCFS2 has, at least in the past, caused entire-cluster crashes, reportedly even with the latest versions, according to users on the mailing lists.

The problem is fixed in OCFS2 1.6, but the 1.6 series is only supported on Oracle Unbreakable Linux.

It is interesting to note that neither GFS2 nor ACFS (ASM FS) is recommended by Oracle, although both are often marketed as ‘general-purpose filesystems’.

In the end, NFS may be the easiest choice, even with its caveats. NFS does not normally crash the entire cluster, and the web service stays responsive for users, which is often what matters most.


The newest OCFS2 1.6 versions, with known bug fixes (including the fragmentation issue), are only available to Oracle Enterprise Linux subscribers with the Unbreakable Enterprise Kernel. See Oracle Cluster File System (OCFS2) Software Support and Update Policy for Red Hat Enterprise Linux Supported by Red Hat [ID 1253272.1], page 3:

    OCFS2 is not supported for general filesystem use on RHEL without an Oracle Linux support subscription. Oracle provides OCFS2 software and support for database file storage usage for customers who receive Red Hat Enterprise Linux (RHEL) operating system support from Red Hat and have a valid Oracle database support contract. [1]



   The available version 1.4.7 for Red Hat Enterprise Linux is known to be problematic, especially with fragmentation on filesystems with heavy file deletion/creation activity. Tuning can decrease the occurrence of this problem.

  GFS/GFS2 and ACFS are supported by Oracle as general-purpose filesystems

(see Using Red Hat Global File System (GFS) as shared storage for RAC [ID 329530.1])



OCFS2 software can be downloaded from the Oracle OCFS2 Downloads for Red Hat Enterprise Linux Server 5 page on OTN. Updates and bug fixes to OCFS2 are provided for Linux kernels that are part of the latest supported release of RHEL5 listed in the Enterprise Linux Supported Releases document and the three update releases prior to that. For example, if the latest supported release of RHEL is RHEL5, Update 5 (RHEL5.5), then updates to OCFS2 will be provided for the RHEL5.2, RHEL5.3, RHEL5.4 and RHEL5.5 kernels.


Supported Filesystems by Oracle/ Red Hat

Enterprise Linux: Linux, Filesystem & I/O Type Supportability [ID 279069.1]

1.1.1   Enterprise Linux 5 (EL5)

The following table applies both to Red Hat Enterprise Linux 5 and to Enterprise Linux 5 as provided by Oracle.

Linux Version  RDBMS  File System  I/O Type    Supported
EL5            10g    ext3         Async I/O   Yes
                      raw          Async I/O   Depr. (2)
                      block        Async I/O   Yes (3)
                      ASM (4)      Async I/O   Yes
                      GFS          Async I/O   Yes (5)
                      GFS2         Async I/O   Yes (6)
                      ext3         Direct I/O  Yes
                      raw          Direct I/O  Depr. (2)
                      block        Direct I/O  Yes (3)
                      ASM (4)      Direct I/O  Yes
                      NFS          Direct I/O  Yes (7)
               11g    ext3         Async I/O   Yes
                      raw          Async I/O   Depr. (2)
                      block        Async I/O   Yes (3)
                      ASM (4)      Async I/O   Yes
                      NFS          Async I/O   Yes (7)
                      GFS          Async I/O   Yes (5)
                      GFS2         Async I/O   Yes (6)
                      ext3         Direct I/O  Yes
                      raw          Direct I/O  Depr. (2)
                      block        Direct I/O  Yes (3)
                      ASM (4)      Direct I/O  Yes
  1. See http://oss.oracle.com/projects/ocfs2/ for the current certification of Oracle products with OCFS2
  2. Deprecated but still functional. With Linux kernel 2.6, raw devices are being phased out in favor of O_DIRECT access directly to the block devices. See Note 357492.1
  3. With Oracle RDBMS and higher. See Note 357492.1
  4. Automatic Storage Manager (ASM) and the ASMLib kernel driver. Note that ASM is inherently asynchronous and the behaviour depends on the DISK_ASYNCH_IO parameter setting. See also Note 47328.1 and Note 751463.1
  5. See Note 329530.1 and Note 423207.1 for GFS-related support
  6. GFS2 is in technical preview in OEL5U2/RHEL5U2 and fully supported since OEL5U3/RHEL5U3.
  7. DIRECTIO is required for database files on NAS (network attached storage)


  Write To Ocfs2 Volume gets No Space Error Despite Large Free Space Available [ID 1232702.1]



  Error “__ocfs2_file_aio_write:2251 EXIT: -5” in Syslog [ID 1129192.1]

Basically this is a known issue which is not a bug in OCFS2 itself; the actual bug lies in the Linux kernel. The cause was addressed in Oracle bug 8627756 and Red Hat bug 461100.

   OCFS2: Network service restart / interconnect down causes panic / reboot of other node [ID 394827.1]



This is expected OCFS2 behaviour.

   OCFS2 Kernel Panics on SAN Failover [ID 377616.1]


Testing cluster resilience to hardware failures.


The default OCFS2 heartbeat timeout is insufficient for multipath devices.

To check the current active O2CB_HEARTBEAT_THRESHOLD value, do:


# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold

The above information is taken from the “Heartbeat” section of



To implement the solution, please execute the following steps:


   1. Obtain the timeout value of the multipath system. Assume for now it is 60 seconds.

   2. Edit /etc/sysconfig/o2cb, add parameter: O2CB_HEARTBEAT_THRESHOLD = 31

   3. Repeat on all nodes.

   4. Shutdown the o2cb on all nodes

   5. Start the o2cb on all nodes

   6. Review also Q 61 of http://oss.oracle.com/projects/ocfs2/dist/documentation/v1.2/ocfs2_faq.html and action as required.

   7. Ensure that any proprietary device drivers are at the latest production release level
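The steps above can be sketched as a shell session, run on each node. The threshold of 31 assumes a multipath failover time of around 60 seconds; per the OCFS2 FAQ, the effective timeout is roughly (threshold - 1) * 2 seconds, so 31 corresponds to about 60 seconds:

```shell
# Steps 1-3: set the heartbeat threshold on every node.
echo 'O2CB_HEARTBEAT_THRESHOLD=31' >> /etc/sysconfig/o2cb

# Steps 4-5: restart o2cb on all nodes (unmount OCFS2 volumes first).
/etc/init.d/o2cb stop
/etc/init.d/o2cb start

# Verify the active value:
cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
```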


   Discussion from the mailing list



> As a general recommendation, I suggest you to use the Oracle’s

> Unbreakable Linux kernel on top of CentOS 5.4 (or RedHat EL 5.4) and

> OCFS2 1.6, also provided by Oracle, to avoid the known fragmentation

> bug and to be able to enable the indexed directories feature for

> improved performance.

  Problems with OCFS2 among users

Bug list



   [Ocfs2-users] Anomaly when writing to OCFS2

OK, now for the problem.  When an Oracle Interface/Data Load Program is started (this loads data into the db) through Concurrent Manager, we see the log file for this operation (which stores errors etc.) being created in the out directory under $APPLCSF/$APPLOUT, but it is always 0 KB.  We know this worked on one of our test systems (using NFS rather than OCFS2) so we know that the process works, we just don’t know why it doesn’t work under OCFS2.



>> So everything works fine as long as the temp file is being written to.

>> The out file is the issue.

> Not exactly, the temp file is always written to whether the temp file is located on OCFS2 or sym linked to the EXT3 partition.  The problem occurs right at the end of the process;

   Server hang with two servers


> On 04/08/2011 03:56 AM, Xavier Diumé wrote:

>> Hello,

>> Two servers are using iSCSI to mount LUNs from a SAN. The filesystem used is OCFS2 on Debian 6, with a multipath configuration.

>> The two servers are working well together, but something hangs one of the servers. Look at the dmesg output:



On 04/08/2011 09:25 AM, Mauro Parra wrote:

> Hello,

> I’m getting this error:

> $fsck.ocfs2  /dev/mapper/360a98000572d434e4e6f6335524b396f_part1

> fsck.ocfs2: File not found by ocfs2_lookup while locking down the cluster

> Any idea or hint?


This means it failed to lookup the journal in the system directory.


   [Ocfs2-users] OCFS2 crash

Fri Apr 15 01:12:51 PDT 2011

We’re running ocfs2-2.6.18-92.el5-1.4.7-1.el5, on a 12 node Red Hat 5.2

64bit cluster. We’ve now experienced the following crash twice:

   servers blocked on ocfs2


Last modified: 2011-01-06 23:28 oss.oracle.com Bugzilla Bug 1301


It is not necessary to say we are quite worried about this. We are seriously

near to give up on OCFS2 because the problem is affecting too much our services.

If you do not see a close solution to this, please tell us.

OCFS2 run slower and slower after 30 days of usage

oss.oracle.com Bugzilla Bug 1281             Last modified: 2010-12-01 23:42

(In reply to comment #86)

> I’m sorry we haven’t gotten to this sooner.  I’m not sure we can relax

> j_trans_barrier.  It is a good question why journal action blocks us so much.


We are in the process of deciding what to do with or without OCFS.


One question, if we change the behavior of OCFS usage, by storing only much much

less frequently change file/folder on OCFS, like storing only the code. Do you

think the same problem will arise in this mode?

   Bonnie++ file system benchmark tool crashes OCFS2


Last modified: 2010-05-05 11:25

I have a testcluster consisting of 2 nodes with a shared storage below them.

The ocfs2 filesystem was mounted (module version 1.50, ocfs2-tools version

1.4.3, Linux Debian system, 32 bit) and on both I ran the benchmarking tool


After a few minutes both systems crashed and rebooted.


The kernel message I got was:

thegate:/mnt/1a# bonnie++ -s 4000 -u root -r 2000

Using uid:0, gid:0.

Writing a byte at a time…

Message from syslogd@thegateap at May  5 16:21:12 …

 kernel: [56806.205914] Oops: 0000 [#1] SMP


Locking issues with shared logfiles


Description:  [reply]        Opened: 2009-04-20 22:10

I have a cluster with 5 nodes hosting web application. All web servers

save log info into shared access.log file. There is awstats log

analyzer on the first node. Sometimes this node fails with the

following messages (captured on another server)


Two nodes doing parallel umount hang cluster

Opened: 2011-02-04 17:00  oss.oracle.com Bugzilla Bug 1312

Two nodes doing parallel umount hang. dlm locking_state shows the same lockres

on both nodes. The lockres has no locks.


NULL pointer dereference in Linux: ocfs2_read_blocks when peer hangs

oss.oracle.com Bugzilla Bug 1286             Last modified: 2010-08-19 03:14

After node 2 was shut down electrically with ocfs2 mounted (test scenario), the

first node crashes …

We also often have to fsck -f the block device otherwise mount.ocfs2 crashes

when attempting to mount the device. It doesn’t work with only fsck (without -f)











Root.sh bugs in clusterware

This is the warning when you run it first time on the second node – it almost works:

CRS-2672: Attempting to start 'ora.cssd' on 'host2'

CRS-2672: Attempting to start 'ora.diskmon' on 'host2'

CRS-2676: Start of 'ora.diskmon' on 'host2' succeeded

CRS-2676: Start of 'ora.cssd' on 'host2' succeeded

Mounting Disk Group CRS_DATA failed with the following message:

ORA-15032: not all alterations performed

ORA-15017: diskgroup "CRS_DATA" cannot be mounted

ORA-15003: diskgroup "CRS_DATA" already mounted in another lock name space

Configuration of ASM … failed

see asmca logs at /opt/oracrs/admin/cfgtoollogs/asmca for details

Did not succssfully configure and start ASM at /opt/oracrs/product/11.2.0/crs/install/crsconfig_lib.pm line 6465.

/opt/oracrs/product/11.2.0/perl/bin/perl -I/opt/oracrs/product/11.2.0/perl/lib -I/opt/oracrs/product/11.2.0/crs/install /opt/oracrs/product/11.2.0/crs/install/rootcrs.pl execution failed


root.sh fails on Solaris if you run it from a directory other than $ORACLE_HOME, because it can’t determine the current directory (the getcwd() call fails).

For example, if you execute sh /opt/oracrs/product/11.2.0/root.sh as suggested by the installer, it will fail on the second cluster node when it tries to join the existing cluster.


cd /opt/oracrs/product/11.2.0
./root.sh

works.

Not Getting Alerts for ORA Errors for 11g Databases and Cannot View the alert.log Contents for 11g Databases From Grid Control [ID 947773.1]

Another nice feature of the agent when monitoring 11g databases.


Watch out!

1) Configure the “Incident” and “Incident Status” metrics to monitor for ORA- errors in the alert log (log.xml) on 11g databases.

2) After the EM agent installation, users need to manually set up a symbolic link of ojdl.jar in
$ORACLE_HOME/sysman/jlib pointing to $ORACLE_HOME/diagnostics/lib where
$ORACLE_HOME is the EM agent Oracle home. For example, on Linux and Unix
platforms, users need to type the following:
1) cd $ORACLE_HOME/sysman/jlib
2) ln -s ../../diagnostics/lib/ojdl.jar
On platforms where symbolic link is not supported, users need to manually
copy the ojdl.jar from $ORACLE_HOME/diagnostics/lib into
$ORACLE_HOME/sysman/jlib after the EM agent installation.
Users also need to ensure that the copy of ojdl.jar in
$ORACLE_HOME/sysman/jlib is updated if the ojdl.jar in
$ORACLE_HOME/diagnostics/lib is patched in the future.
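The link/copy step above can be sketched as a small function, where the argument is the EM agent Oracle home (the example path is hypothetical). The fallback copy covers platforms without symlink support:

```shell
# Create the ojdl.jar symlink in $ORACLE_HOME/sysman/jlib, falling back
# to a plain copy where symlinks are unsupported.
link_ojdl() {
  ( cd "$1/sysman/jlib" &&
    { ln -s ../../diagnostics/lib/ojdl.jar ojdl.jar 2>/dev/null ||
      cp "$1/diagnostics/lib/ojdl.jar" ojdl.jar; } )
}
# Example: link_ojdl /u01/app/oracle/agent11g   (path is hypothetical)
```

Remember that on the copy-based platforms the file must be refreshed whenever $ORACLE_HOME/diagnostics/lib/ojdl.jar is patched.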