You’d think building a multi-server Grid Control 11g setup is easy… but that is definitely not always the case once you have to choose a stable shared filesystem.
Multi-node uploaders require a shared or cluster filesystem on the Red Hat Linux OMS nodes, and picking a stable one is hard: there are not many great choices.
The officially supported choices from Oracle are NFS and OCFS2.
Other people have also been testing ASM ACFS (ASM Cluster Filesystem) and GFS2 (Global File System 2), but these are not recommended by Oracle (no support).
NFS has its weaknesses, however: if your environment generates thousands of XML files, NFS can be slow.
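One way to get a feel for how a candidate filesystem copes with many small files is to time a create/delete cycle on it. The following is only a rough sketch: the function name `churn_small_files`, the file naming and the payload are made up for illustration and do not reproduce what the OMS uploader actually writes.

```python
import os
import tempfile
import time


def churn_small_files(directory, count=1000, payload=b"<upload/>\n"):
    """Create and then delete `count` small files, timing each phase."""
    os.makedirs(directory, exist_ok=True)
    paths = [os.path.join(directory, f"probe_{i:06d}.xml") for i in range(count)]

    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # force the write through to storage
    create_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for path in paths:
        os.remove(path)
    delete_seconds = time.perf_counter() - start

    return {
        "files": count,
        "create_seconds": create_seconds,
        "delete_seconds": delete_seconds,
    }


if __name__ == "__main__":
    # A local temporary directory is used here; point `directory` at the
    # NFS or OCFS2 mount you want to compare instead.
    with tempfile.TemporaryDirectory() as scratch:
        print(churn_small_files(scratch, count=200))
```

Running the same call against a directory on each candidate mount gives comparable numbers, although a short run like this says nothing about long-term fragmentation.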
OCFS2: only version 1.4.x (latest is 1.4.7) is currently supported by Oracle for Red Hat Linux. However, 1.4.7 has a fragmentation problem that shows up especially when you create and delete lots of files.
There’s a workaround, but it is only temporary: it reduces the chance of fragmentation becoming as bad as it would with the defaults.
OCFS2 has, at least in the past, caused entire cluster crashes, and users on the mailing lists report the same even with the latest versions.
The fragmentation problem is fixed in OCFS2 1.6, but the 1.6 series is only supported on Oracle Unbreakable Linux.
It is interesting to note that neither GFS2 nor ACFS (ASM FS) is recommended by Oracle, although these are often marketed as ‘general-purpose filesystems’.
In the end, choosing NFS may be the easiest option, even with its caveats. NFS does not normally crash the entire cluster, and the web service stays responsive for users, which is often what matters most.
The newest OCFS2 1.6 versions with the known bug fixes (fragmentation issue) are only available to Oracle Enterprise Linux subscribers with the Unbreakable Enterprise Kernel. Source: Oracle Cluster File System (OCFS2) Software Support and Update Policy for Red Hat Enterprise Linux Supported by Red Hat [ID 1253272.1], page 3.
OCFS2 is not supported for general filesystem use on RHEL without an Oracle Linux support subscription. Oracle provides OCFS2 software and support for database file storage usage for customers who receive Red Hat Enterprise Linux (RHEL) operating system support from Red Hat and have a valid Oracle database support contract. 
The available version 1.4.7 for Red Hat Enterprise Linux is known to be problematic, especially regarding fragmentation on filesystems with heavy file creation and deletion activity. Tuning can decrease the occurrence of this problem.
GFS/GFS2 and ACFS are supported by Oracle as general-purpose filesystems
(see Using Redhat Global File System (GFS) as shared storage for RAC [ID 329530.1]).
OCFS2 software can be downloaded from the Oracle OCFS2 Downloads for Red Hat Enterprise Linux Server 5 page on OTN. Updates and bug fixes to OCFS2 are provided for Linux kernels that are part of the latest supported release of RHEL5 listed in the Enterprise Linux Supported Releases document and the three update releases prior to that. For example, if the latest supported release of RHEL is RHEL5, Update 5 (RHEL5.5), then updates to OCFS2 will be provided for RHEL5.2, RHEL5.3, RHEL5.4 and RHEL5.5 kernels.
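The update window described above (the latest supported update plus the three before it) is simple enough to spell out. This is only an illustrative sketch; the helper name `supported_updates` and its parameters are hypothetical and not part of any Oracle tooling.

```python
def supported_updates(latest_update, major=5, window=3):
    """List the RHEL releases covered by OCFS2 updates under the stated policy.

    The policy quoted above covers the latest supported update release
    plus the `window` update releases before it.
    """
    first = max(0, latest_update - window)
    return [f"RHEL{major}.{update}" for update in range(first, latest_update + 1)]


if __name__ == "__main__":
    # Reproduces the example from the policy text, where RHEL5.5 is the latest.
    print(supported_updates(5))
```

With RHEL5.5 as the latest release this yields RHEL5.2 through RHEL5.5, matching the example in the policy text.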
Supported Filesystems by Oracle/ Red Hat
Enterprise Linux: Linux, Filesystem & I/O Type Supportability [ID 279069.1]
1.1.1 Enterprise Linux 5 (EL5)
The following table applies to both Red Hat Enterprise Linux 5 and Enterprise Linux 5 provided by Oracle.
|Linux Version||RDBMS||File System||I/O Type||Supported|
- See http://oss.oracle.com/projects/ocfs2/ for the current certification of Oracle products with OCFS2
- Deprecated but still functional. With Linux Kernel 2.6 raw devices are being phased out in favor of O_DIRECT access directly to the block devices. See Note 357492.1
- With Oracle RDBMS 10.2.0.2 and higher. See Note 357492.1
- Automatic Storage Manager (ASM) and ASMLib kernel driver. Note that ASM is inherently asynchronous and the behaviour depends on the DISK_ASYNCH_IO parameter setting. See also Note 47328.1 and Note 751463.1
- See Note 329530.1 and Note 423207.1 for GFS related support
- GFS2 is in technical preview in OEL5U2/RHEL5U2 and fully supported since OEL5U3/RHEL5U3.
- DIRECTIO is required for database files on NAS (network attached storage)
OCFS2 BUGS and FEATURES
Write To Ocfs2 Volume gets No Space Error Despite Large Free Space Available [ID 1232702.1]
Error “__ocfs2_file_aio_write:2251 EXIT: -5” in Syslog [ID 1129192.1]
OCFS2: Network service restart / interconnect down causes panic / reboot of other node [ID 394827.1]
This is expected OCFS2 behaviour.
OCFS2 Kernel Panics on SAN Failover [ID 377616.1]
Testing cluster resilience to hardware failures.
The default OCFS2 heartbeat timeout is insufficient for multipath devices.
To check the current active O2CB_HEARTBEAT_THRESHOLD value, do:
# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
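The value read this way is a count of heartbeat iterations, not seconds. As a small sketch, assuming the relation given in the OCFS2 FAQ (threshold = timeout in seconds / 2 + 1, i.e. one heartbeat every 2 seconds) and assuming the proc file contains a single integer, the active value can be translated back into a timeout like this. The function names are made up for illustration.

```python
from pathlib import Path

# Path taken from the command above; it only exists on nodes running o2cb.
HB_DEAD_THRESHOLD = Path("/proc/fs/ocfs2_nodemanager/hb_dead_threshold")

# Assumption from the OCFS2 FAQ: one heartbeat iteration every 2 seconds.
HEARTBEAT_INTERVAL_SECONDS = 2


def read_active_threshold(path=HB_DEAD_THRESHOLD):
    """Return the active threshold as an integer (file assumed to hold one number)."""
    return int(Path(path).read_text().strip())


def threshold_to_seconds(threshold):
    """Convert a heartbeat threshold into the timeout it represents, in seconds."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECONDS


if __name__ == "__main__":
    if HB_DEAD_THRESHOLD.exists():
        active = read_active_threshold()
        print(f"threshold {active} = {threshold_to_seconds(active)} seconds")
    else:
        print("o2cb does not appear to be loaded on this node")
```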
The above information is taken from the “Heartbeat” section of
To implement the solution, please execute the following steps:
1. Obtain the timeout value of the multipath system. Assume for now it is 60 seconds.
2. Edit /etc/sysconfig/o2cb and add the parameter O2CB_HEARTBEAT_THRESHOLD=31, that is (60 / 2) + 1.
3. Repeat on all nodes.
4. Shut down o2cb on all nodes.
5. Start o2cb on all nodes.
6. Review also Q 61 of http://oss.oracle.com/projects/ocfs2/dist/documentation/v1.2/ocfs2_faq.html and action as required.
7. Ensure that any proprietary device drivers are at the latest production release level
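The value 31 in step 2 follows from the multipath timeout of 60 seconds. A minimal sketch of that arithmetic, assuming the FAQ formula threshold = (timeout in seconds / 2) + 1; the helper names are hypothetical, and rounding odd timeouts up is my own choice so the heartbeat timeout never ends up shorter than the multipath one.

```python
def heartbeat_threshold(timeout_seconds, interval_seconds=2):
    """Compute O2CB_HEARTBEAT_THRESHOLD for a given timeout in seconds.

    Assumes the FAQ formula: threshold = (timeout / interval) + 1.
    Timeouts that are not a multiple of the interval are rounded up.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    iterations = -(-timeout_seconds // interval_seconds)  # ceiling division
    return iterations + 1


def o2cb_config_line(timeout_seconds):
    """Render the line to add to /etc/sysconfig/o2cb (no spaces around '=')."""
    return f"O2CB_HEARTBEAT_THRESHOLD={heartbeat_threshold(timeout_seconds)}"


if __name__ == "__main__":
    # Multipath timeout of 60 seconds, as assumed in step 1.
    print(o2cb_config_line(60))
```

The same value has to be set on every node before o2cb is restarted, as the steps above describe.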
Discussion from the mailing list
Recommendation: use OCFS2 1.6 if possible to avoid fragmentation
> As a general recommendation, I suggest you to use the Oracle’s
> Unbreakable Linux kernel on top of CentOS 5.4 (or RedHat EL 5.4) and
> OCFS2 1.6, also provided by Oracle, to avoid the known fragmentation
> bug and to be able to enable the indexed directories feature for
> improved performance.
Problems with OCFS2 among users
[Ocfs2-users] Anomaly when writing to OCFS2
OK, Now for the problem. When an Oracle Interface/Data Load Program is started (this loads data into the db) through Concurrent Manager, we see the log file for this operation (which stores errors etc.) being create in the out directory under $APPLCSF/$APPLOUT, but it is always of 0Kb. We know this worked on one of our tests systems (using NFS rather than OCFS2) so we know that the process works, we just don’t know why it doesn’t work under OCFS2.
>> So everything works fine as long as the temp file is being written to.
>> The out file is the issue.
> Not exactly, the temp file is always written to whether the temp file is located on OCFS2 or sym linked to the EXT3 partition. The problem occurs right at the end of the process;
Server hang with two servers
> On 04/08/2011 03:56 AM, Xavier Diumé wrote:
>> Two servers are using iscsi to mount lun’s from a SAN. The filesystem used is ocfs2 with Debian 6, with a multipath configuration.
>> The two servers are working well together but something hangs one of the server. Look at dmesg outout:
On 04/08/2011 09:25 AM, Mauro Parra wrote:
> I’m getting this error:
> $fsck.ocfs2 /dev/mapper/360a98000572d434e4e6f6335524b396f_part1
> fsck.ocfs2: File not found by ocfs2_lookup while locking down the cluster
> Any idea or hint?
This means it failed to lookup the journal in the system directory.
[Ocfs2-users] OCFS2 crash
Fri Apr 15 01:12:51 PDT 2011
We’re running ocfs2-2.6.18-92.el5-1.4.7-1.el5, on a 12 node Red Hat 5.2
64bit cluster. We’ve now experienced the following crash twice:
servers blocked on ocfs2
Last modified: 2011-01-06 23:28 oss.oracle.com Bugzilla Bug 1301
It is not necessary to say we are quite worried about this. We are seriously
near to give up on OCFS2 because the problem is affecting too much our services.
If you do not see a close solution to this, please tell us.
OCFS2 run slower and slower after 30 days of usage
oss.oracle.com Bugzilla Bug 1281 Last modified: 2010-12-01 23:42
(In reply to comment #86)
> I’m sorry we haven’t gotten to this sooner. I’m not sure we can relax
> j_trans_barrier. It is a good question why journal action blocks us so much.
We are in the process of deciding what to do with or without OCFS.
One question, if we change the behavior of OCFS usage, by storing only much much
less frequently change file/folder on OCFS, like storing only the code. Do you
think the same problem will arise in this mode?
Bonnie++ file system benchmark tool crashes OCFS2
Last modified: 2010-05-05 11:25
I have a testcluster consisting of 2 nodes with a shared storage below them.
The ocfs2 filesystem was mounted (module version 1.50, ocfs2-tools version
1.4.3, Linux Debian system, 32 bit) and on both I ran the benchmarking tool
After a few minutes both systems crashed and rebooted.
The kernel message I got was:
thegate:/mnt/1a# bonnie++ -s 4000 -u root -r 2000
Using uid:0, gid:0.
Writing a byte at a time…
Message from syslogd@thegateap at May 5 16:21:12 …
kernel: [56806.205914] Oops: 0000 [#1] SMP
Opened: 2009-04-20 22:10
I have a cluster with 5 nodes hosting web application. All web servers
save log info into shared access.log file. There is awstats log
analyzer on the first node. Sometimes this node fails with the
following messages (captured on another server)
Two nodes doing parallel umount hang. dlm locking_state shows the same lockres
on both nodes. The lockres has no locks.
oss.oracle.com Bugzilla Bug 1286
NULL pointer dereference in Linux 18.104.22.168 : ocfs2_read_blocks when peer hang Last modified: 2010-08-19 03:14
After node 2 was shut down electrically with ocfs2 mounted (test scenario), the
first node crashes …
We also often have to fsck -f the block device otherwise mount.ocfs2 crashes
when attempting to mount the device. It doesn’t work with only fsck (without -f)