Friday 29 May 2020

ORA-39511: Start of CRS resource for instance '223' failed with error:[CRS-5702: Resource 'ora.asm' is already running on 'xvm009'

Problem abstract: 

While trying to start up a new instance manually in order to begin duplicating a database, I was getting the error below:

SQL> startup nomount pfile=initxtd1.ora 
ORA-39511: Start of CRS resource for instance '223' failed with error:[CRS-5702: Resource 'ora.asm' is already running on 'xvm009'
CRS-0223: Resource 'ora.asm' has placement error.
clsr_start_resource:260 status:223
clsrapi_start_asm:start_asmdbs status:223


This is on a 19c database setup on a cluster environment with ASM.

Before issuing this command, I had manually dropped a database with the same name, xtd. I didn't use DBCA for this job because I had a problem logging in to the server with VNC, so I took the easy approach: I dropped the database manually and deleted all its files from the ASM disk groups.

Fix:

It's always recommended to drop a database using DBCA, especially in a cluster/Oracle Restart environment, as DBCA cleans up all the database resources on the server and avoids the above error.
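For reference, DBCA can also do the drop in silent mode, so a GUI/VNC session isn't needed. Something along these lines should do it (a sketch only; check dbca -help on your version, as the exact parameters may differ):

$ dbca -silent -deleteDatabase -sourceDB xtd -sysDBAUserName sys -sysDBAPassword <sys_password>

This is supposed to take care of the database files, the clusterware registration and the oratab entry in one shot.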

In my scenario, because the manual cleanup was not complete, I had to remove the remaining database clusterware resources manually using the following commands:

1- Remove the database and its instance resources from the OCR:
# srvctl remove instance -d xtd -i xtd1
# srvctl remove instance -d xtd -i xtd2
# srvctl remove database -d xtd


2- Remove all the database files inside ASM, including datafiles, redo log files, controlfiles, the password file and the spfile, from the asmcmd console (see the asmcmd sketch after this list).
3- Remove the database entry from /etc/oratab

4- Make sure no resources belonging to the dropped database are left behind:

# crsctl stat res -p
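For step 2, the files can be removed from the asmcmd console. A rough sketch (the +DATA and +FRA disk group names are just placeholders for whatever groups the database actually used, so list the contents and double-check before removing anything):

$ asmcmd ls +DATA
$ asmcmd rm -rf +DATA/XTD
$ asmcmd rm -rf +FRA/XTD

And for step 4, piping the output through grep makes any leftovers easier to spot, e.g. crsctl stat res -p | grep -i xtd.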


ORA-00600: internal error code, arguments: [krfg_mgen_coord2], [82], [2], [2], [], [], [], [], [], [], [], []

Problem: 

On one of our 19c Active Data Guard RAC databases, the recovery process kept failing with the error below:

ORA-00600: internal error code, arguments: [krfg_mgen_coord2], [82], [2], [2], [], [], [], [], [], [], [], []
ORA-10877: error signaled in parallel recovery slave
ORA-10877: error signaled in parallel recovery slave
ORA-10877: error signaled in parallel recovery slave
ORA-10877: error signaled in parallel recovery slave
Incident details in: /u01/oracle/diag/rdbms/sp/sp1/incident/incdir_1639057/sp1_mrp0_4991_i1639057.trc 


Fix:

It turned out to be Bug 12407536 | MRP PROCESS CRASHES DUE TO ORA-600 [KRFG_MGEN_COORD2]
The workaround is to disable the FLASHBACK feature, which was already enabled on that database:

SQL> ALTER DATABASE FLASHBACK OFF;
SQL> RECOVER MANAGED STANDBY DATABASE NODELAY DISCONNECT;
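To confirm the workaround took effect and the recovery is running again, a quick sanity check like the one below can help (just the columns I usually look at; v$managed_standby is still available on 19c, although Oracle is steering us towards v$dataguard_process):

SQL> SELECT flashback_on FROM v$database;
SQL> SELECT process, status, thread#, sequence# FROM v$managed_standby WHERE process LIKE 'MRP%';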

Sunday 12 April 2020

"No space left on device" Error while there are plenty of free space

One week back I came across the following clusterware error:

2020-04-09 05:17:38.726 [OCSSD(27899)]CRS-10000: CLSU-00100: operating system function: mkdir failed with error data: 28
CLSU-00101: operating system error message: No space left on device
CLSU-00103: error location: authprep6
CLSU-00104: additional error information: failed to make dir /u01/12.1.0/grid/auth/css/fzpson06pe1p/A1673086


When I checked the /u01 filesystem, I found it still had plenty of free space:


[oracle@fzpson06pe1p fzpson06pe1p]$ df -hP /u01
Filesystem                          Size  Used Avail Use% Mounted on
/dev/mapper/VolumeGroup00-LogVol07   99G   59G   36G  63% /u01


I remembered having a similar issue many years back, and it was due to inode exhaustion.
When I checked the inodes with df -i, they were completely exhausted:

[oracle@fzpson06pe1p ~]$ df -i /u01
Filesystem                         Inodes IUsed IFree IUse% Mounted on
/dev/mapper/VolumeGroup00-LogVol07   6.3M  6.3M   0  100% /u01


And yes, running out of inodes can paralyze the filesystem just like running out of space.

But what are inodes?
In short, inodes are the records that hold the metadata of the directories/files stored on the filesystem: metadata like which blocks the file occupies, its owner, its permissions, and so on.
Each Linux filesystem has a limited number of inodes, depending on the filesystem size. This inode limit also caps the number of directories/files that can be stored on the filesystem.
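If you want to see inodes in action, a couple of everyday commands expose them (the paths here are only examples):

$ ls -i /etc/oratab     # prints the inode number in front of the file name
$ stat /etc/oratab      # dumps the inode metadata: size, blocks, owner, permissions, timestamps
$ df -i /u01            # shows how many inodes the filesystem has and how many are in use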

Now, how do we fix this situation?
Just delete/clean up the unneeded files under the impacted filesystem. In my case the impacted filesystem was /u01, where the Grid Infrastructure and Oracle DB software were installed, so I cleaned up old audit logs and trace files to release the inodes.
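To figure out where the inodes are going, counting entries per directory usually points at the culprit quickly, and old audit/trace files are the usual suspects. A rough sketch (the adump path below is only a guess at a typical layout, adjust it to your own AUDIT_FILE_DEST):

$ for d in /u01/*/; do printf "%8d %s\n" "$(find "$d" -xdev 2>/dev/null | wc -l)" "$d"; done | sort -rn | head
$ find /u01/app/oracle/admin/*/adump -name "*.aud" -mtime +30 -delete

The first command lists the top-level directories under /u01 with the biggest number of entries first; the second is an example cleanup that removes audit files older than 30 days.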

Once I cleaned up the audit files alone, the inode utilization dropped dramatically:

[oracle@fzpson06pe1p ~]$ df -i /u01
Filesystem                         Inodes IUsed IFree IUse% Mounted on
/dev/mapper/VolumeGroup00-LogVol07   6.3M  1.6M  4.7M   26% /u01


Note: In this situation I'm not concerned with cleaning up big files; I'm concerned with deleting as many unneeded files as possible, since every file consumes an inode regardless of its size.

In the next few days I'll update the dbalarm script with a new feature to monitor inode utilization on the filesystems as well. Stay tuned, and download the latest version from here:
http://dba-tips.blogspot.com/2014/02/database-monitoring-script-for-ora-and.html

For more reading about inodes:
https://www.howtogeek.com/465350/everything-you-ever-wanted-to-know-about-inodes-on-linux/