Multi-server Grid Control 11g challenge #1 – shared receive/loader directory for cluster

You’d think building a multi-server Grid Control 11g setup is easy… well, definitely not in every case – especially once you have to choose a stable shared filesystem.

Just choosing a stable shared/cluster filesystem for the Red Hat Linux OMS nodes – a requirement for multi-node upload – is a challenge in itself. Good luck: there are not many great choices.

The officially supported choices from Oracle are:

  • NFS
  • OCFS2

Other people have also been testing ASM ACFS (ASM Cluster File System) and GFS2 (Global File System 2), but these are not recommended by Oracle for this purpose (no support).

However, if your environment generates thousands of XML files, that workload can be slow on NFS, and NFS has other weaknesses of its own.

OCFS2 – only the 1.4.x series (latest is 1.4.7) is currently supported by Oracle for Red Hat Linux. Unfortunately, 1.4.7 has a fragmentation problem, which shows up especially when lots of files are created and deleted.

There is a workaround, but it is only temporary: it reduces how quickly fragmentation builds up compared to the defaults, it does not eliminate it.

OCFS2 has, at least in the past, caused entire cluster crashes – reportedly even with recent versions, according to users on the mailing lists.

The fragmentation problem is fixed in OCFS2 1.6, but the 1.6 series is only supported on Oracle Linux with the Unbreakable Enterprise Kernel.

It is interesting that neither GFS2 nor ACFS (ASM Cluster File System) is recommended by Oracle for this use, even though both are often marketed as ‘general-purpose filesystems’.

In the end, NFS is probably the easiest choice, even with its caveats. NFS does not normally crash the entire cluster, and the web service stays responsive for users, which is usually what matters most.
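For reference, a minimal sketch of mounting the shared receive/loader directory over NFS on each OMS node. The server name, export path and mount options here are assumptions for illustration only – check the Grid Control documentation and your NFS vendor’s recommendations for the exact options to use:

# Sketch only – nfs-server, the export path and the mount options are assumptions
mkdir -p /u01/app/oracle/gc_shared_loader
mount -t nfs -o rw,hard,intr,tcp,vers=3,rsize=32768,wsize=32768 \
      nfs-server:/export/gc_shared_loader /u01/app/oracle/gc_shared_loader

# Persist the mount in /etc/fstab so it survives reboots
echo "nfs-server:/export/gc_shared_loader /u01/app/oracle/gc_shared_loader nfs rw,hard,intr,tcp,vers=3,rsize=32768,wsize=32768 0 0" >> /etc/fstab

The same mount must of course be configured identically on every OMS node that uploads to the shared directory.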

REFERENCES

The newest OCFS2 1.6 versions with known bug fixes (including the fragmentation issue) are only available to Oracle Enterprise Linux subscribers running the Unbreakable Enterprise Kernel. See Oracle Cluster File System (OCFS2) Software Support and Update Policy for Red Hat Enterprise Linux Supported by Red Hat [ID 1253272.1], page 3.

    OCFS2 is not supported for general filesystem use on RHEL without an Oracle Linux support subscription. Oracle provides OCFS2 software and support for database file storage usage for customers who receive Red Hat Enterprise Linux (RHEL) operating system support from Red Hat and have a valid Oracle database support contract. [1]

Oracle Cluster File System (OCFS2) Software Support and Update Policy for Red Hat Enterprise Linux Supported by Red Hat [ID 1253272.1] on page 3

 

The available version 1.4.7 for Red Hat Enterprise Linux is known to be problematic, especially regarding fragmentation on filesystems with heavy file creation/deletion activity. Tuning can decrease the occurrence of this problem.

GFS/GFS2 and ACFS are supported by Oracle as general-purpose filesystems

(see Using Redhat Global File System (GFS) as shared storage for RAC [ID 329530.1])

 

 

OCFS2 software can be downloaded from the Oracle OCFS2 Downloads for Red Hat Enterprise Linux Server 5 page on OTN. Updates and bug fixes to OCFS2 are provided for Linux kernels that are part of the latest supported release of RHEL5 listed in the Enterprise Linux Supported Releases document and the three update releases prior to that. For example, if the latest supported release of RHEL is RHEL5, Update 5 (RHEL5.5), then updates to OCFS2 will be provided for RHEL5.2, RHEL5.3, RHEL5.4 and RHEL5.5 kernels.

http://oss.oracle.com/projects/ocfs2/files/RedHat/RHEL5/x86_64/

Supported Filesystems by Oracle/ Red Hat

Enterprise Linux: Linux, Filesystem & I/O Type Supportability [ID 279069.1]


Enterprise Linux 5 (EL5)

The following table applies to both Red Hat Enterprise Linux 5 and Enterprise Linux 5 provided by Oracle.

Linux Version  RDBMS  File System  I/O Type    Supported
EL5            10g    ext3         Async I/O   Yes
EL5            10g    raw          Async I/O   Depr. (2)
EL5            10g    block        Async I/O   Yes (3)
EL5            10g    ASM (4)      Async I/O   Yes
EL5            10g    OCFS2        Async I/O   Yes
EL5            10g    NFS          Async I/O   Yes
EL5            10g    GFS          Async I/O   Yes (5)
EL5            10g    GFS2         Async I/O   Yes (6)
EL5            10g    ext3         Direct I/O  Yes
EL5            10g    raw          Direct I/O  Depr. (2)
EL5            10g    block        Direct I/O  Yes (3)
EL5            10g    ASM (4)      Direct I/O  Yes
EL5            10g    OCFS2        Direct I/O  Yes
EL5            10g    NFS          Direct I/O  Yes (7)
EL5            11g    ext3         Async I/O   Yes
EL5            11g    raw          Async I/O   Depr. (2)
EL5            11g    block        Async I/O   Yes (3)
EL5            11g    ASM (4)      Async I/O   Yes
EL5            11g    OCFS2        Async I/O   Yes
EL5            11g    NFS          Async I/O   Yes (7)
EL5            11g    GFS          Async I/O   Yes (5)
EL5            11g    GFS2         Async I/O   Yes (6)
EL5            11g    ext3         Direct I/O  Yes
EL5            11g    raw          Direct I/O  Depr. (2)
EL5            11g    block        Direct I/O  Yes (3)
EL5            11g    ASM (4)      Direct I/O  Yes
EL5            11g    OCFS2        Direct I/O  Yes
EL5            11g    NFS          Direct I/O  Yes
  1. See http://oss.oracle.com/projects/ocfs2/ for the current certification of Oracle products with OCFS2
  2. Deprecated but still functional. With Linux kernel 2.6, raw devices are being phased out in favor of O_DIRECT access directly to the block devices. See Note 357492.1
  3. With Oracle RDBMS 10.2.0.2 and higher. See Note 357492.1
  4. Automatic Storage Manager (ASM) and ASMLib kernel driver. Note that ASM is inherently asynchronous and the behaviour depends on the DISK_ASYNCH_IO parameter setting. See also Note 47328.1 and Note 751463.1
  5. See Note 329530.1 and Note 423207.1 for GFS-related support
  6. GFS2 is in technical preview in OEL5U2/RHEL5U2 and fully supported since OEL5U3/RHEL5U3.
  7. DIRECTIO is required for database files on NAS (network attached storage)

OCFS2 BUGS AND FEATURES

  Write To Ocfs2 Volume gets No Space Error Despite Large Free Space Available [ID 1232702.1]

 

 

  Error “__ocfs2_file_aio_write:2251 EXIT: -5” in Syslog [ID 1129192.1]

Basically this is a known issue which is not a bug in OCFS2 itself; the actual bug lies in the Linux kernel. The cause was addressed in Oracle Bug 8627756 and Red Hat Bug 461100.

   OCFS2: Network service restart / interconnect down causes panic / reboot of other node [ID 394827.1]

 

Cause

This is expected OCFS2 behaviour.

   OCFS2 Kernel Panics on SAN Failover [ID 377616.1]

Changes

Testing cluster resilience to hardware failures.

Cause

The default OCFS2 heartbeat timeout is insufficient for multipath devices.

To check the current active O2CB_HEARTBEAT_THRESHOLD value, do:

 

# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold

The above information is taken from the “Heartbeat” section of

http://oss.oracle.com/projects/ocfs2/dist/documentation/v1.2/ocfs2_faq.html

Solution

To implement the solution, please execute the following steps:

 

   1. Obtain the timeout value of the multipath system. Assume for now it is 60.

   2. Edit /etc/sysconfig/o2cb and add the parameter O2CB_HEARTBEAT_THRESHOLD = 31 (following the rule of thumb from the OCFS2 FAQ, roughly (disk timeout in seconds / 2) + 1, so 60 / 2 + 1 = 31).

   3. Repeat on all nodes.

   4. Shutdown the o2cb on all nodes

   5. Start the o2cb on all nodes

   6. Review also Q 61 of http://oss.oracle.com/projects/ocfs2/dist/documentation/v1.2/ocfs2_faq.html and action as required.

   7. Ensure that any proprietary device drivers are at the latest production release level
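Put together, a minimal sketch of steps 2–5 on one node. The threshold value follows the steps above; the example mount point is an assumption, and you should verify which options your ocfs2-tools o2cb init script actually provides:

# Sketch only – if O2CB_HEARTBEAT_THRESHOLD already exists in /etc/sysconfig/o2cb, edit it instead of appending
echo "O2CB_HEARTBEAT_THRESHOLD=31" >> /etc/sysconfig/o2cb

# Repeat on every node; unmount OCFS2 filesystems first, then restart the cluster stack
umount /u01/gc_shared_loader        # example mount point – adjust to your environment
service o2cb stop
service o2cb start

# Verify the active value
cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold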

 

   Discussion from discussion list

 

   Recommendation – use OCFS2 1.6 if possible, to avoid fragmentation

> As a general recommendation, I suggest you to use the Oracle’s

> Unbreakable Linux kernel on top of CentOS 5.4 (or RedHat EL 5.4) and

> OCFS2 1.6, also provided by Oracle, to avoid the known fragmentation

> bug and to be able to enable the indexed directories feature for

> improved performance.

  Problems with OCFS2 among users

Bug list

http://oss.oracle.com/bugzilla/buglist.cgi?product=OCFS2&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=NEEDINFO&query_format=advanced&order=bugs.bug_id&query_based_on=

 

   [Ocfs2-users] Anomaly when writing to OCFS2

OK, Now for the problem.  When an Oracle Interface/Data Load Program is started  (this loads data into the db) through Concurrent Manager, we see the log file for this operation (which stores errors etc.) being create in the out directory under $APPLCSF/$APPLOUT, but it is always of 0Kb.  We know this worked on one of our tests systems (using NFS rather than OCFS2) so we know that the process works, we just don’t know why it doesn’t work under OCFS2.

 

 

>> So everything works fine as long as the temp file is being written to.

>> The out file is the issue.

> Not exactly, the temp file is always written to whether the temp file is located on OCFS2 or sym linked to the EXT3 partition.  The problem occurs right at the end of the process;

   Server hang with two servers

 

> On 04/08/2011 03:56 AM, Xavier Diumé wrote:

>> Hello,

>> Two servers are using iscsi to mount lun’s from a SAN. The filesystem used is ocfs2 with Debian 6, with a multipath configuration.

>> The two servers are working well together but something hangs one of the server. Look at  dmesg outout:

>> 

 

On 04/08/2011 09:25 AM, Mauro Parra wrote:

> Hello,

> I’m getting this error:

> $fsck.ocfs2  /dev/mapper/360a98000572d434e4e6f6335524b396f_part1

> fsck.ocfs2: File not found by ocfs2_lookup while locking down the cluster

> Any idea or hint?

 

This means it failed to lookup the journal in the system directory.

 

   [Ocfs2-users] OCFS2 crash

Fri Apr 15 01:12:51 PDT 2011

We’re running ocfs2-2.6.18-92.el5-1.4.7-1.el5 on a 12 node Red Hat 5.2 64bit cluster. We’ve now experienced the following crash twice:

   servers blocked on ocfs2

 

Last modified: 2011-01-06 23:28 oss.oracle.com Bugzilla Bug 1301

                             

It is not necessary to say we are quite worried about this. We are seriously near to giving up on OCFS2 because the problem is affecting our services too much. If you do not see a close solution to this, please tell us.

OCFS2 run slower and slower after 30 days of usage

oss.oracle.com Bugzilla Bug 1281             Last modified: 2010-12-01 23:42

(In reply to comment #86)

> I’m sorry we haven’t gotten to this sooner.  I’m not sure we can relax

> j_trans_barrier.  It is a good question why journal action blocks us so much.

 

We are in the process of deciding what to do with or without OCFS.

 

One question: if we change the way we use OCFS and store only files/folders that change much less frequently on it, like storing only the code, do you think the same problem will arise in this mode?

   Bonnie++ file system benchmark tool crashes OCFS2

 

Last modified: 2010-05-05 11:25

I have a test cluster consisting of 2 nodes with shared storage below them. The ocfs2 filesystem was mounted (module version 1.50, ocfs2-tools version 1.4.3, Linux Debian system, 32 bit) and on both I ran the benchmarking tool ‘bonnie++’. After a few minutes both systems crashed and rebooted.

 

The kernel message I got was:

thegate:/mnt/1a# bonnie++ -s 4000 -u root -r 2000

Using uid:0, gid:0.

Writing a byte at a time…

Message from syslogd@thegateap at May  5 16:21:12 …

 kernel: [56806.205914] Oops: 0000 [#1] SMP

 

Locking issues with shared logfiles

 

Description:        Opened: 2009-04-20 22:10

I have a cluster with 5 nodes hosting a web application. All web servers save log info into a shared access.log file. There is an awstats log analyzer on the first node. Sometimes this node fails with the following messages (captured on another server):

 

Two nodes doing parallel umount hang cluster

Opened: 2011-02-04 17:00  oss.oracle.com Bugzilla Bug 1312

Two nodes doing parallel umount hang. dlm locking_state shows the same lockres on both nodes. The lockres has no locks.

 

NULL pointer dereference in Linux 2.6.33.3: ocfs2_read_blocks when peer hangs

oss.oracle.com Bugzilla Bug 1286        Last modified: 2010-08-19 03:14

After node 2 was shut down electrically with ocfs2 mounted (test scenario), the first node crashes…

We also often have to fsck -f the block device, otherwise mount.ocfs2 crashes when attempting to mount the device. It doesn’t work with only fsck (without -f).

 


 



 




Root.sh bugs in 11.2.0.2 clusterware

This is the warning you get when you run it for the first time on the second node – it almost works:

CRS-2672: Attempting to start ‘ora.cssd’ on ‘host2’

CRS-2672: Attempting to start ‘ora.diskmon’ on ‘host2’

CRS-2676: Start of ‘ora.diskmon’ on ‘host2’ succeeded

CRS-2676: Start of ‘ora.cssd’ on ‘host2’ succeeded

Mounting Disk Group CRS_DATA failed with the following message:

ORA-15032: not all alterations performed

ORA-15017: diskgroup “CRS_DATA” cannot be mounted

ORA-15003: diskgroup “CRS_DATA” already mounted in another lock name space

Configuration of ASM … failed

see asmca logs at /opt/oracrs/admin/cfgtoollogs/asmca for details

Did not succssfully configure and start ASM at /opt/oracrs/product/11.2.0/crs/install/crsconfig_lib.pm line 6465.

/opt/oracrs/product/11.2.0/perl/bin/perl -I/opt/oracrs/product/11.2.0/perl/lib -I/opt/oracrs/product/11.2.0/crs/install /opt/oracrs/product/11.2.0/crs/install/rootcrs.pl execution failed

#

root.sh fails if you run it from a directory other than $ORACLE_HOME on Solaris, because it cannot determine the current directory (the getcwd() call fails).

For example, if you execute sh /opt/oracrs/product/11.2.0/root.sh as suggested by the installer, it will fail on the second cluster node when it tries to join the existing cluster.

FIX:

cd  /opt/oracrs/product/11.2.0

./root.sh works

Not Getting Alerts for ORA Errors for 11g Databases and Cannot View the alert.log Contents for 11g Databases From Grid Control 10.2.0.5 [ID 947773.1]

Another nice feature in 10.2.0.5 agent monitoring 11g databases.

 

Watch out!

1) Configure the “Incident” and “Incident Status” metrics to monitor for ORA- errors in the alert log (log.xml) on 11g databases.

2) After the EM agent installation, users need to manually set up a symbolic link of ojdl.jar in
$ORACLE_HOME/sysman/jlib pointing to $ORACLE_HOME/diagnostics/lib where
$ORACLE_HOME is the EM agent Oracle home. For example, on Linux and Unix
platforms, users need to type the following:
1) cd $ORACLE_HOME/sysman/jlib
2) ln -s ../../diagnostics/lib/ojdl.jar
.
On platforms where symbolic link is not supported, users need to manually
copy the ojdl.jar from $ORACLE_HOME/diagnostics/lib into
$ORACLE_HOME/sysman/jlib after the EM agent installation.
.
Users also need to ensure that the copy of ojdl.jar in
$ORACLE_HOME/sysman/jlib is updated if the ojdl.jar in
$ORACLE_HOME/diagnostics/lib is patched in the future.

Archived log repository – shipping redo logs over the network

You might have a need to ship redo logs over the SQL*Net connection to a remote host.

This is usually done in data guard setups.

If you only want to send redo logs over the network, you must have a standby instance running to receive the logs; otherwise you will get warnings.

But that standby instance does not need to have all the datafiles!


Here’s something from the docs:

Archived redo log repository

This type of destination allows off-site archiving of redo data. An archive log repository is created by using a physical standby control file, starting the instance, and mounting the database. This database  contains no datafiles and cannot be used for switchover or failover. This alternative is useful as a way of holding archived redo log files for a short period of time, perhaps a day, after which the log files can then be deleted. This avoids most of the storage and processing expense of another fully configured standby database.

 

 

Oracle recommends using an archived redo log repository for temporary storage of archived redo log files. This can be accomplished by configuring the repository destination for archiver-based transport  (using the ARCH attribute on LOG_ARCHIVE_DEST_n parameter) in a Data Guard configuration running in maximum performance mode. For a no data loss environment, you should use a fully configured standby database using the LGWR, SYNC, and AFFIRM transport settings in a Data Guard configuration and running in either maximum protection mode or maximum availability mode.

 

This is a handy feature. It would be nice if a listener alone were enough, but no – you need to have a physical standby instance running!
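As a rough illustration of the archiver-based transport mentioned in the quote above, here is a minimal sketch run on the primary. The destination number and the TNS alias LOGREPO (pointing to the archive log repository instance) are assumptions for the example:

# Sketch only – LOGREPO is an assumed TNS alias for the repository instance, dest_2 an arbitrary free destination
sqlplus -s / as sysdba <<'EOF'
ALTER SYSTEM SET log_archive_dest_2='SERVICE=LOGREPO ARCH NOAFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)' SCOPE=BOTH;
ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=BOTH;
EOF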

Golden Gate catch #1, performance in replication, when databases in replication are partial

 

Recently, in a quick Golden Gate test setup, there was a situation where the export/import of data for replication was only partial, just for prototyping.

In Golden Gate replication, online redo/archive logs are extracted and full SQL statements are formed.

Now, when you have only a partial export/import of the data and run simulation tests with real application load, you can end up in a situation where the application deletes data from tables that were not included in the export/import (or in the prototype database).

Golden Gate happily keeps applying the delete statements the application has issued to keep all tables synchronized, even when the tables do not exist on the target. Because of this, performance degrades during the catch-up phase on an active system.

 

For partial exports/imports, Golden Gate can be configured to skip delete operations for certain tables, or certain tables can be excluded from the replication altogether.

 

For this situation, there’s a Golden Gate configuration parameter.

 

 

GETDELETES | IGNOREDELETES

Valid for Extract and Replicat.

Use the GETDELETES and IGNOREDELETES parameters to control whether or not Oracle GoldenGate processes delete operations. These parameters are table-specific. One parameter remains in effect for all subsequent TABLE or MAP statements, until the other parameter is encountered.

Default: GETDELETES

Syntax: GETDELETES | IGNOREDELETES
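To show how these parameters are position-dependent, here is a minimal sketch of a Replicat parameter file written from the shell. The process name, login and table names are made up for the example:

# Sketch only – rprot, the ogg login and the fin.* tables are hypothetical names
cat > dirprm/rprot.prm <<'EOF'
REPLICAT rprot
USERID ogg, PASSWORD ogg
-- Skip deletes for a table that is only partially loaded in the prototype target
IGNOREDELETES
MAP fin.AUDIT_TRAIL, TARGET fin.AUDIT_TRAIL;
-- Switch back to the default for everything else
GETDELETES
MAP fin.*, TARGET fin.*;
EOF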

 

 

You can also skip tables:

TABLEEXCLUDE

Valid for Extract.

Use the TABLEEXCLUDE parameter with the TABLE and SEQUENCE parameters to explicitly exclude tables and sequences from a wildcard specification. TABLEEXCLUDE must precede all TABLE and SEQUENCE statements that contain the objects that you want to exclude.

Default: None

Syntax: TABLEEXCLUDE <exclude specification> [NORENAME]

Example: In the following example, the TABLE statement retrieves all tables except for the table named TEST.

TABLEEXCLUDE fin.TEST
TABLE fin.*;

SecureFiles in 11.2 yes or no?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Enter Parameter #1 <XDB_PASSWD>, password for XDB schema:
Enter Parameter #2 <TABLESPACE>, tablespace for XDB:
Enter Parameter #3 <TEMP_TABLESPACE>, temporary tablespace for XDB:
> Enter Parameter #4 <SECURE_FILES_REPO>, YES/NO
……………….If YES and compatibility is at least 11.2,
……………….then XDB repository will be stored as secure files.
……………….Otherwise, old LOBS are used
Enter value for 4: YES
“Oracle SecureFiles – a replacement for LOBs that is faster than Unix files to read/write. Lots of potential benefit for OLAP analytic workspaces, as the LOBs used to hold AWs have historically been slower to write to than the old Express .db files.” (Mark Rittman)

“SecureFiles are a huge improvement over BLOB data types. Faster, with compression and encryption.” (Source: Laurent Schneider)
YES for performance
Benefits provided
SecureFiles provides the following benefits:
* LOB compression
* LOB encryption
* Deduplication (detect duplicate LOB data and only store one copy)
* Faster access to LOB data

A SecureFile can only be created in an automatic segment space management (ASSM) tablespace.

I put YES

Creating SecureFile LOBS

By default lobs are created as BASICFILE LOBs. To create a SECUREFILE LOB, specify the STORE AS SECUREFILE clause. Here is an example:

CREATE TABLE lob_tab1 (c1 NUMBER, l CLOB)
   lob (l) STORE AS SECUREFILE secfile_segment_name
   (TABLESPACE lobtbs1
    RETENTION AUTO
        CACHE LOGGING
      STORAGE (MAXEXTENTS 5));
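As noted above, a SecureFile can only be created in an ASSM tablespace, so a quick sanity check before creating the table might look like this – a sketch only, using the lobtbs1 tablespace from the example above:

# Sketch only – checks that lobtbs1 (from the example above) uses automatic segment space management
sqlplus -s / as sysdba <<'EOF'
SELECT tablespace_name, segment_space_management
  FROM dba_tablespaces
 WHERE tablespace_name = 'LOBTBS1';
EOF
# SEGMENT_SPACE_MANAGEMENT should report AUTO for SecureFiles to work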

More information on SecureFiles

http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28393/adlob_smart.htm

Importance of monitoring

What could happen if you don’t monitor what’s happening?

CRASH.

Oracle 11.2.0.2+ new feature: if a write to a datafile fails, the instance crashes by default.

You can go back to the old behaviour by setting the underscore parameter _datafile_write_errors_crash_instance = FALSE.



This fix introduces a notable change in behaviour in that
from 11.2.0.2 onwards an I/O write error to a datafile will
now crash the instance.

Before this fix I/O errors to datafiles not in the system tablespace
offline the respective datafiles when the database is in archivelog mode.
This behavior is not always desirable. Some customers would prefer
that the instance crash due to a datafile write error.

This fix introduces a new hidden parameter to control if the instance
should crash on a write error or not:
  _datafile_write_errors_crash_instance 

With this fix:
 If _datafile_write_errors_crash_instance = TRUE (default) then
  any write to a datafile which fails due to an IO error causes
  an instance crash. 

 If _datafile_write_errors_crash_instance = FALSE then the behaviour
  reverts to the previous behaviour (before this fix) such that
  a write error to a datafile offlines the file (provided the DB is
  in archivelog mode and the file is not in SYSTEM tablespace in
  which case the instance is aborted)
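If you decide you want the pre-11.2.0.2 behaviour back, a minimal sketch would be something like the following – and, as always with underscore parameters, check with Oracle Support before changing it in production:

# Sketch only – reverts to the pre-11.2.0.2 behaviour described above
sqlplus -s / as sysdba <<'EOF'
-- Underscore parameters must be double-quoted; the change takes effect after an instance restart
ALTER SYSTEM SET "_datafile_write_errors_crash_instance" = FALSE SCOPE=SPFILE;
EOF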