Snapshot Controlfile Strikes Again

It’s not the first time the RMAN snapshot controlfile feature has caused problems. I blogged about this in an old post.

It starts with a funny message when you want to drop the obsolete backups:

RMAN-06207: WARNING: 1 objects could not be deleted for DISK channel(s) due
RMAN-06208:          to mismatched status.  Use CROSSCHECK command to fix status
RMAN-06210: List of Mismatched objects
RMAN-06211: ==========================
RMAN-06212:   Object Type   Filename/Handle
RMAN-06213: --------------- ---------------------------------------------------
RMAN-06214: Datafile Copy   /u01/app/oracle/product/11.2.0/db_11204/dbs/snapcf_volp1.f

In the case above, the message comes from a daily backup script which also takes care of deleting the obsolete backups. However, the target database is a RAC one, and the snapshot controlfile needs to sit in a shared location. So we fixed the issue by simply changing the location to ASM:

RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO '+DG_FRA/snapcf_volp.f';

old RMAN configuration parameters:
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/11.2.0/db_11204/dbs/snapcf_volp1.f';
new RMAN configuration parameters:
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '+DG_FRA/snapcf_volp.f';
new RMAN configuration parameters are successfully stored

Ok, now let’s get rid of that /u01/app/oracle/product/11.2.0/db_11204/dbs/snapcf_volp1.f file. By the way, have you noticed that it is considered to be a datafile copy instead of a controlfile copy? Whatever…

The first try was:

RMAN> delete expired copy;

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=252 instance=volp1 device type=DISK
specification does not match any datafile copy in the repository
specification does not match any archived log in the repository

List of Control File Copies
===========================

Key     S Completion Time     Ckp SCN    Ckp Time
------- - ------------------- ---------- -------------------
42      X 2016-05-14 16:48:48 3152170591 2016-05-14 16:48:48
        Name: /u01/app/oracle/product/11.2.0/db_11204/dbs/snapcf_volp1.f
        Tag: TAG20160514T164847

Do you really want to delete the above objects (enter YES or NO)? YES
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of delete command on ORA_DISK_1 channel at 05/25/2016 08:35:09
ORA-19606: Cannot copy or restore to snapshot control file

Oops, WTF? I don’t want to copy or restore anything! Whatever… Next try:

RMAN> crosscheck controlfilecopy '/u01/app/oracle/product/11.2.0/db_11204/dbs/snapcf_volp1.f';

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=252 instance=volp1 device type=DISK
validation failed for control file copy
control file copy file name=/u01/app/oracle/product/11.2.0/db_11204/dbs/snapcf_volp1.f RECID=42 STAMP=911839728
Crosschecked 1 objects

RMAN> delete expired controlfilecopy '/u01/app/oracle/product/11.2.0/db_11204/dbs/snapcf_volp1.f';


released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=497 instance=volp1 device type=DISK

List of Control File Copies
===========================

Key     S Completion Time     Ckp SCN    Ckp Time
------- - ------------------- ---------- -------------------
42      X 2016-05-14 16:48:48 3152170591 2016-05-14 16:48:48
        Name: /u01/app/oracle/product/11.2.0/db_11204/dbs/snapcf_volp1.f
        Tag: TAG20160514T164847


Do you really want to delete the above objects (enter YES or NO)? yes
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of delete command on ORA_DISK_1 channel at 05/25/2016 08:41:11
ORA-19606: Cannot copy or restore to snapshot control file

The same error! Even trying with the copy key wasn’t helpful:

RMAN> delete controlfilecopy 42;

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=252 instance=volp1 device type=DISK

List of Control File Copies
===========================

Key     S Completion Time     Ckp SCN    Ckp Time
------- - ------------------- ---------- -------------------
42      X 2016-05-14 16:48:48 3152170591 2016-05-14 16:48:48
        Name: /u01/app/oracle/product/11.2.0/db_11204/dbs/snapcf_volp1.f
        Tag: TAG20160514T164847

Do you really want to delete the above objects (enter YES or NO)? yes

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of delete command on ORA_DISK_1 channel at 05/25/2016 08:38:30
ORA-19606: Cannot copy or restore to snapshot control file

The fix was to restore the old location of the snapshot controlfile and then to delete the expired copy:

RMAN> configure snapshot controlfile name to '/u01/app/oracle/product/11.2.0/db_11204/dbs/snapcf_volp1.f';

old RMAN configuration parameters:
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '+dg_fra/snapcf_volp.f';
old RMAN configuration parameters:
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '+DG_FRA/snapcf_volp.f';
new RMAN configuration parameters:
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/11.2.0/db_11204/dbs/snapcf_volp1.f';
new RMAN configuration parameters are successfully stored


RMAN> delete expired copy;

allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=788 instance=volp1 device type=DISK
specification does not match any datafile copy in the repository
specification does not match any archived log in the repository

List of Control File Copies
===========================

Key     S Completion Time     Ckp SCN    Ckp Time
------- - ------------------- ---------- -------------------
42      X 2016-05-14 16:48:48 3152170591 2016-05-14 16:48:48
        Name: /u01/app/oracle/product/11.2.0/db_11204/dbs/snapcf_volp1.f
        Tag: TAG20160514T164847

Do you really want to delete the above objects (enter YES or NO)? yes

deleted control file copy
control file copy file name=/u01/app/oracle/product/11.2.0/db_11204/dbs/snapcf_volp1.f RECID=42 STAMP=911839728
Deleted 1 EXPIRED objects

Don’t ask me why it worked. Of course, after deleting the expired copy we set the snapshot controlfile location back to the shared disk.
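Condensed from the transcripts above, the whole workaround boils down to three RMAN commands (paths are the ones from this environment; adjust to your own):

```
# point the snapshot controlfile back at the stale local path
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/11.2.0/db_11204/dbs/snapcf_volp1.f';
# now the expired copy can actually be deleted
DELETE EXPIRED COPY;
# finally, move the snapshot controlfile to shared storage, as RAC requires
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '+DG_FRA/snapcf_volp.f';
```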

Stupid "root.sh" Crash when Installing GI 12.1.0.2

A funny thing happened today. I was asked to install a 12.1.0.2 RAC database. The hardware and OS prerequisites had, for the most part, already been addressed, so everything was ready for the installation. As you already know, Oracle is quite picky about the expected environment (and for a good reason) and checks a lot of things before starting the actual installation. Sometimes, though, it does this horribly. For example, take a look at the following screenshot:

shm prereq

Apparently this is due to unpublished Bug 19031737 and can be ignored if the parameter is correctly set at the OS level.

Everything went smoothly until I ran the root.sh script. I ended up with:

/u01/app/12.1.0/2_grid/bin/sqlplus -V ... failed rc=1 with message:
 Error 46 initializing SQL*Plus HTTP proxy setting has incorrect value SP2-1502: The HTTP proxy server specified by http_proxy is not accessible

2016/03/30 13:53:15 CLSRSC-177: Failed to add (property/value):('VERSION'/'') for checkpoint 'ROOTCRS_STACK' (error code 1)

2016/03/30 13:53:39 CLSRSC-143: Failed to create a peer profile for Oracle Cluster GPnP using 'gpnptool' (error code 1)

2016/03/30 13:53:39 CLSRSC-150: Creation of Oracle GPnP peer profile failed for host 'ucgprdora001'

Died at /u01/app/12.1.0/2_grid/crs/install/crsgpnp.pm line 1846.
The command '/u01/app/12.1.0/2_grid/perl/bin/perl -I/u01/app/12.1.0/2_grid/perl/lib -I/u01/app/12.1.0/2_grid/crs/install /u01/app/12.1.0/2_grid/crs/install/rootcrs.pl ' execution failed

Stupid! Our sysadmin set a global http_proxy variable which confused sqlplus and, by extension, the root.sh script.

Wouldn’t it be a good idea for cluvfy to also check whether this variable is correctly set (or unset)?

Until then, a note to myself: always unset the http_proxy variable before starting to install a new Oracle server.
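Until a cluvfy check exists, the guard is easy to script; a sketch (the helper name is mine, not an Oracle-provided step):

```shell
# strip_proxy_env: unset any proxy variables that could confuse sqlplus (and,
# by extension, root.sh) during installation. Hypothetical helper, adapt as needed.
strip_proxy_env() {
  for v in http_proxy https_proxy HTTP_PROXY HTTPS_PROXY; do
    if [ -n "$(printenv "$v" 2>/dev/null)" ]; then
      echo "unsetting $v"
      unset "$v"
    fi
  done
}
```

Call strip_proxy_env in the shell before launching the installer or root.sh.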

Two Surprises on Upgrading Oracle to 12.1.0.2

Yesterday we migrated one of our customers’ Oracle servers from 11.2 to 12.1.0.2. It was a single-instance server, the volume of data wasn’t very impressive, and it was a Standard Edition installation. So, nothing special, nothing scary. We thought it would be a simple task. Hmmm… Let’s see.

What Oracle Edition?

Licensing, editions, options… I’m starting to get annoyed by all these non-technical topics. But that’s life, we have to deal with them. In this particular case, the customer asked us to upgrade their Standard Edition database running on 11.2.0.4 to Oracle 12.1.0.2. Super, but guess what? There’s no plain Standard Edition or Standard Edition One released for 12.1.0.2, just for 12.1.0.1. The only edition available on 12.1.0.2 is Oracle SE2 (Standard Edition 2) and, apparently, that’s the only standard edition Oracle is committed to supporting in upcoming releases. For details, please see MOS Note 2027072.1.

DATA_PUMP_DIR Bug

The upgrade was a success, or at least that’s what we thought. Bad luck. In the morning the customer reported that a schema which is supposed to be replicated every night was not there. Oops! We analysed the logs and noticed that the impdp utility was complaining that the source dump couldn’t be accessed. What? The import was configured to use the DATA_PUMP_DIR Oracle directory, which pointed to a specific location on the database server. After the upgrade it had been changed to $ORACLE_HOME/rdbms/log. Great! I wasn’t aware of this. Most likely, we hit the following bug:

Bug 9006105 - DATA_PUMP_DIR is not remapped after database upgrade [ID 9006105.8]

It’s an unpublished bug, so I can’t see much besides what is already shown in the subject. However, in the note they claim that they managed to fix it in 11.2.0.2 and 12.1.0.1. Well, apparently not.

Conclusion

As always, Oracle upgrades are such a joy. Expect the unexpected… Anyway, I’m starting to have doubts that using the DATA_PUMP_DIR directory is such a good idea. In light of what happened, I think it’s always better to create and use your own Oracle directory. Not to mention that DATA_PUMP_DIR can’t be used from within a pluggable database, if you plan to use that option on a 12c database.
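For what it’s worth, a dedicated directory object is a one-liner to set up; a sketch (the directory name, path and grantee below are made up for illustration):

```sql
-- hypothetical names: pick your own path and grantee
CREATE OR REPLACE DIRECTORY repl_dump_dir AS '/u01/app/oracle/admin/dumps';
GRANT READ, WRITE ON DIRECTORY repl_dump_dir TO repl_user;
```

Unlike DATA_PUMP_DIR, the path is under your control and won’t be silently remapped by an upgrade.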

Flashback Tablespace

We want to easily flash back just a part of the database. We need this for our developers, who share the same database but each use just their allocated schema and tablespaces. They want to be able to set a restore point (at the schema level), run their workload, and then revert their schema to how it was when the restore point was taken. The other users of the database must not be disrupted in any way.

Of course, the flashback database feature is not good enough, because it reverts the entire database and can’t be used without downtime. On the other hand, a TSPITR (tablespace point-in-time recovery) on big datafiles is a PITA from the speed point of view, especially because we need to copy those big files when the transportable set is created and, again, as part of the restore, when we need to revert.

In addition, we want this to work even on an Oracle standard edition.

The Idea

Use a combination of TSPITR and a file system with snapshotting capabilities. As a proof of concept we’ll use btrfs, but it could just as well be zfs or another file system with snapshots.

Below is how the physical layout of the database should be designed.

As you can see, the non-application datafiles, or those which do not need any flashback functionality, may reside on a regular file system. However, the datafiles belonging to applications that need the “time machine” have to be created in a sub-volume of the btrfs pool. For each sub-volume we can create various snapshots, as shown in the figure above: app1-s1, app1-s2 etc.

To create a restore point for the application tablespaces we’ll do the following:

  1. put the tablespaces for the application in read-only mode (FAST, assuming no long-running/pending transactions are active)
  2. create a transportable set for those tablespaces, so we end up with a dump file containing the metadata of the above read-only tablespaces (pretty FAST, only segment definitions are exported)
  3. export all non-tablespace database objects (sequences, synonyms etc.) for the target schema (pretty FAST, no data, just definitions)
  4. take a snapshot of the sub-volume assigned to the application. That snapshot contains the read-only datafiles, the dump file from the transportable set and the dump file with all the non-tablespace definitions (FAST, because we rely on the snapshotting feature, which is fast by design)
  5. put the tablespaces back in read-write mode (FAST)

For a flashback/revert, the steps are:

  1. drop the application user and its tablespaces (pretty FAST; it depends on how many objects need to be dropped)
  2. switch to the snapshot (FAST, by design)
  3. reimport the user from the non-tablespace dumpfile (FAST)
  4. re-attach the read-only tablespaces from the transportable set (FAST)
  5. reimport the non-tablespace objects (FAST, just definitions)
  6. put the tablespaces back in read-write mode (FAST)

Playground Setup

On OL6.X we need to install this:

yum install btrfs-progs

Next, create the main volume:

[root@venus ~]# fdisk -l

Disk /dev/sda: 52.4 GB, 52428800000 bytes
255 heads, 63 sectors/track, 6374 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0003cb01

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        6375    51198976   83  Linux

Disk /dev/sdb: 12.9 GB, 12884901888 bytes
255 heads, 63 sectors/track, 1566 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000


Disk /dev/sdc: 12.9 GB, 12884901888 bytes
255 heads, 63 sectors/track, 1566 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

The disks we’re going to add are: sdb and sdc.

[root@venus ~]# mkfs.btrfs /dev/sdb /dev/sdc

WARNING! - Btrfs v0.20-rc1 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

failed to open /dev/btrfs-control skipping device registration: No such file or directory
adding device /dev/sdc id 2
failed to open /dev/btrfs-control skipping device registration: No such file or directory
fs created label (null) on /dev/sdb
        nodesize 4096 leafsize 4096 sectorsize 4096 size 24.00GB
Btrfs v0.20-rc1

Prepare the mount point:

[root@venus ~]# mkdir /oradb
[root@venus ~]# chown -R oracle:dba /oradb

Mount the btrfs volume:

[root@venus ~]# btrfs filesystem show /dev/sdb
Label: none  uuid: d30f9345-74bc-42df-a55d-f22e3a9c6e78
        Total devices 2 FS bytes used 28.00KB
        devid    2 size 12.00GB used 2.01GB path /dev/sdc
        devid    1 size 12.00GB used 2.03GB path /dev/sdb

[root@venus ~]# mount -t btrfs /dev/sdc /oradb
[root@venus ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              49G   39G  7.6G  84% /
tmpfs                 2.0G     0  2.0G   0% /dev/shm
bazar                 688G  426G  263G  62% /media/sf_bazar
/dev/sdc               24G   56K   22G   1% /oradb

Create a new sub-volume to host the datafile for a developer schema.

[root@venus /]# btrfs subvolume create /oradb/app1
Create subvolume '/oradb/app1'
[root@venus /]# chown -R oracle:dba /oradb/app1

In Oracle, create a new user with a tablespace in the above volume:

create tablespace app1_tbs
  datafile '/oradb/app1/app1_datafile_01.dbf' size 100M;

create user app1 identified by app1
default tablespace app1_tbs
quota unlimited on app1_tbs;

grant create session, create table, create sequence to app1;

We’ll also need to create a directory object pointing to the application folder. It is needed for expdp and impdp.

CREATE OR REPLACE DIRECTORY app1_dir AS '/oradb/app1';

Let’s create some database objects:

connect app1/app1
create sequence test1_seq;
select test1_seq.nextval from dual;
select test1_seq.nextval from dual;
create table test1 (c1 integer);
insert into test1 select level from dual connect by level <= 10;
commit;

Create a Restore Point

Now, we want to make a restore point. We’ll need to:

  1. put the tablespace in read-only mode:

     alter tablespace app1_tbs read only;
    
  2. export tablespace:

     [oracle@venus app1]$ expdp userid=talek/*** directory=app1_dir transport_tablespaces=app1_tbs dumpfile=app1_tbs_metadata.dmp logfile=app1_tbs_metadata.log
    
  3. export the other database objects which belong to the schema but are not stored in the tablespace:

     [oracle@venus app1]$ expdp userid=talek/*** directory=app1_dir schemas=app1 exclude=table dumpfile=app1_nontbs.dmp logfile=app_nontbs.log
    
  4. take a snapshot of the volume (this is very fast):

     [root@venus /]# btrfs subvolume list /oradb
     ID 258 gen 14 top level 5 path app1
     [root@venus app1]# btrfs subvolume snapshot /oradb/app1 /oradb/app1_restorepoint1
     Create a snapshot of '/oradb/app1' in '/oradb/app1_restorepoint1'
    
  5. we can get rid now of the dumps/logs:

     [root@venus app1]# rm -rf /oradb/app1/*.dmp
     [root@venus app1]# rm -rf /oradb/app1/*.log
     [root@venus app1]# ls -al /oradb/app1
     total 102412
     drwx------ 1 oracle dba             40 Nov 13 20:21 .
     drwxr-xr-x 1 root   root            44 Nov 13 20:02 ..
     -rw-r----- 1 oracle oinstall 104865792 Nov 13 20:13 app1_datafile_01.dbf
    
  6. Make the tablespace READ-WRITE again:

     alter tablespace app1_tbs read write;
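The restore-point steps above lend themselves to scripting. A minimal dry-run sketch (the function name and the “sqlplus:” labels are mine; it only prints what would be run):

```shell
# Dry-run sketch of the restore-point steps: prints the commands instead of
# executing them. Names (app1, app1_tbs, /oradb) follow this post's layout;
# the helper itself is hypothetical, not a supported tool.
create_restore_point() {
  app=$1; tbs=$2
  ts=$(date +%Y%m%d%H%M%S)   # timestamp the snapshot so several restore points can coexist
  cat <<EOF
sqlplus: alter tablespace $tbs read only;
expdp ... transport_tablespaces=$tbs dumpfile=${tbs}_metadata.dmp
expdp ... schemas=$app exclude=table dumpfile=${app}_nontbs.dmp
btrfs subvolume snapshot /oradb/$app /oradb/${app}_rp_$ts
rm -f /oradb/$app/*.dmp /oradb/$app/*.log
sqlplus: alter tablespace $tbs read write;
EOF
}
```

Running create_restore_point app1 app1_tbs just lists the commands in order; wire in real sqlplus/expdp calls to make it live.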
    

Simulate Workload

Now, let’s simulate running a deployment script or something. We’ll truncate the TEST1 table and we’ll create the TEST2 table. In addition, the sequence will be dropped.

20:25:04 SQL> select count(*) from test1;

COUNT(*)
--------
      10

20:25:04 SQL> truncate table test1;

Table truncated.

20:25:29 SQL> select count(*) from test1;

COUNT(*)
--------
       0

20:25:04 SQL> create table test2 (c1 integer, c2 varchar2(100));

Table created.

SQL> drop sequence test1_seq;

Sequence dropped.

As you can see, all the above are DDL statements which cannot be easily reverted.

Revert to the Restore Point

Ok, so we need to revert back, and fast. The steps are:

  1. drop the tablespace:

     drop tablespace app1_tbs including contents;
    
  2. drop the user:

     drop user app1 cascade;
    
  3. revert to snapshot on the FS level:

     [root@venus oradb]# btrfs subvolume delete app1
     Delete subvolume '/oradb/app1'
     [root@venus oradb]# btrfs subvolume list /oradb
     ID 259 gen 15 top level 5 path app1_snapshots
     [root@venus oradb]# btrfs subvolume snapshot /oradb/app1_restorepoint1/ /oradb/app1
     Create a snapshot of '/oradb/app1_restorepoint1/' in '/oradb/app1'
    
  4. recreate the user with its non-tablespace objects. Please note that we remap the tablespace APP1_TBS to USERS because APP1_TBS is not created yet. Likewise, we exclude the quotas for the same reason.

     [oracle@venus scripts]$ impdp userid=talek/*** directory=app1_dir schemas=app1 remap_tablespace=app1_tbs:users dumpfile=app1_nontbs.dmp nologfile=y exclude=tablespace_quota
    
  5. reimport the transportable set:

     [oracle@venus app1]$ impdp userid=talek/*** directory=app1_dir dumpfile=app1_tbs_metadata.dmp logfile=app1_tbs_metadata_imp.log transport_datafiles='/oradb/app1/app1_datafile_01.dbf'
    
  6. reimport quotas:

     [oracle@venus scripts]$ impdp userid=talek/*** directory=app1_dir schemas=app1 remap_tablespace=app1_tbs:users dumpfile=app1_nontbs.dmp nologfile=y include=tablespace_quota
    
  7. restore the old value for the default tablespace of the APP1 user:

     alter user app1 default tablespace APP1_TBS;
    
  8. make the application tablespace read write:

     alter tablespace app1_tbs read write;
    
  9. remove the /oradb/app1/*.dmp and /oradb/app1/*.log files. They are not needed anymore.
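The whole revert sequence is also easy to script; a minimal dry-run sketch (names follow this post; the function is hypothetical glue and only prints what would be run):

```shell
# Dry-run sketch of revert steps 1-8: prints the commands in order instead of
# executing them. Not a supported tool; adapt names and paths to your layout.
revert_to_restore_point() {
  app=$1; tbs=$2
  snap=/oradb/${app}_restorepoint1   # snapshot taken when the restore point was created
  cat <<EOF
sqlplus: drop tablespace $tbs including contents;
sqlplus: drop user $app cascade;
btrfs subvolume delete /oradb/$app
btrfs subvolume snapshot $snap /oradb/$app
impdp ... schemas=$app dumpfile=${app}_nontbs.dmp exclude=tablespace_quota
impdp ... transport_datafiles='/oradb/$app/${app}_datafile_01.dbf'
impdp ... schemas=$app dumpfile=${app}_nontbs.dmp include=tablespace_quota
sqlplus: alter user $app default tablespace $tbs;
sqlplus: alter tablespace $tbs read write;
EOF
}
```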

Test the Flashback

Now, if we connect with app1 we get:

22:53:14 SQL> select object_name, object_type from user_objects;

OBJECT_NAME OBJECT_TYPE
----------- -----------
TEST1_SEQ   SEQUENCE
TEST1       TABLE

22:53:48 SQL> select count(*) from test1;

COUNT(*)
--------
      10

We can see that we are back in the previous state, across DDLs. The records from the TEST1 table and our sequence are back.

Remove an Unnecessary Snapshot

If the snapshot is not needed anymore it can be deleted with:

[root@venus oradb]# btrfs subvolume delete /oradb/app1_restorepoint1
Delete subvolume '/oradb/app1_restorepoint1'

Pros & Cons

The main advantages for this solution are:

  • no need for Oracle Enterprise Edition. It works with Standard Edition without any problems.
  • no need to have the database in ARCHIVELOG mode.
  • no flashback logs needed.
  • schema level flashback.
  • the database is kept online without any disruption for the other schema users.
  • can be easily scripted/automated
  • fast

The disadvantages:

  • btrfs is not quite production ready, but it is suitable for test/dev environments. For production systems zfs might be a better alternative.

Multipath Playground

The vast majority of production Oracle database installations now take advantage of multipathing to ensure HA via redundant paths to the storage. However, if you want to see this feature in action and play with it, you may find it difficult without a testing SAN available. In this post I want to cover a simple setup for testing the default multipath facility provided by all modern Linux systems. There are other commercial multipath solutions available (e.g. PowerPath from Dell), but they won’t be covered here.

Setup Overview

Below is what we’d need for this playground:

Multipath Playground

jupiter and sun are both virtual machines running on VirtualBox. jupiter is an OEL 6.3 and sun is an openfiler.

Please note that these machines have been setup with two NICs in order to simulate the two paths to the same disk. The sun server has two disks attached (from VirtualBox): one for the OS and the other for creating disk volumes to be published to other hosts (in this case to jupiter).

Openfiler Setup

Just install the downloaded Openfiler image into a new VirtualBox host. Don’t forget to configure two NICs. As soon as the installation is finished you may access the web interface via:

https://192.168.56.6:446/

The default credentials are: openfiler/password.

The first step is to enable the “iSCSI target server” service.

iSCSI target

Then we need to grant jupiter access to the SAN.

ACL Network

Now we’re ready to prepare the storage. Go to “Volumes/Block Devices”:

Block Devices

The disk we want to use for publishing is /dev/sdb. Click on that link in order to partition it. Then create a big partition, spanning the entire disk.

Partition

Next, go to “Volumes/Volume Groups” and create a volume group called “oracle”.

VG Management

Now we can create the logical volumes. Go to “Volumes/Add Volume” and create a new volume of type “block”. I gave it 10G and called it jupiter1.

It’s now time to configure our Openfiler as an iSCSI target server. Go to “Volumes/iSCSI Targets/Target Configuration”. Add the following Target IQN: iqn.2006-01.com.openfiler:jupiter. Then go to “LUN Mapping” and map this volume. We also need to configure ACLs at the volume level using the “Network ACL” tab:

Volume ACL

Jupiter Setup

First of all, it is a good idea to upgrade to the latest version. Multipathing software may be affected by various bugs, so it’s better to run the latest version available.

iSCSI Configuration

The following packages are mandatory:

device-mapper-multipath
device-mapper-multipath-libs
iscsi-initiator-utils

Next, start the iSCSI initiator and enable it at boot:

service iscsid start
chkconfig iscsid on
chkconfig iscsi on

Ok, let’s import the iSCSI disks. The targets must first be discovered (iscsiadm -m discovery -t sendtargets -p <portal>); the -l flag then actually logs in to them:

iscsiadm -m node -T iqn.2006-01.com.openfiler:jupiter -p 192.168.56.6 -l
iscsiadm -m node -T iqn.2006-01.com.openfiler:jupiter -p 192.168.56.7 -l

If it’s working, then we can set those disks to be added on boot:

iscsiadm -m node -T iqn.2006-01.com.openfiler:jupiter -p 192.168.56.6 --op update -n node.startup -v automatic
iscsiadm -m node -T iqn.2006-01.com.openfiler:jupiter -p 192.168.56.7 --op update -n node.startup -v automatic

Now we should have two new disks:

[root@jupiter ~]# fdisk -l

Disk /dev/sda: 52.4 GB, 52428800000 bytes
255 heads, 63 sectors/track, 6374 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00058a16

  Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        6375    51198976   83  Linux

Disk /dev/sdc: 11.5 GB, 11475615744 bytes
255 heads, 63 sectors/track, 1395 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x30a7ae71

Disk /dev/sdb: 11.5 GB, 11475615744 bytes
255 heads, 63 sectors/track, 1395 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x30a7ae71

Multipath Configuration

Start with a simple /etc/multipath.conf configuration file:

blacklist {
  devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"  
  devnode "^hd[a-z]"
  devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}

defaults {
  find_multipaths yes
  user_friendly_names yes
  failback immediate
}

Now, let’s check if the multipath can figure out how to aggregate the paths under one single device:

[root@jupiter ~]# multipath -v2 -ll
mpathb (14f504e46494c4552674774717a782d7030585a2d31514b4d) dm-0 OPNFILER,VIRTUAL-DISK
size=11G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 3:0:0:0 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 2:0:0:0 sdb 8:16 active ready running

As shown above, multipath has aggregated the two disks under mpathb/dm-0. The WWID of those disks is 14f504e46494c4552674774717a782d7030585a2d31514b4d. This piece of information is useful for setting special properties at the disk level. For example, we can assign an alias to this disk by adding the following section to the configuration file:

multipaths {
  multipath {
    wwid                    14f504e46494c4552674774717a782d7030585a2d31514b4d
    alias                   oradisk
  }
}
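If you script this kind of setup, the WWID can be scraped from the multipath -ll output; a sketch (the sed pattern is mine and assumes the default map-line format shown above):

```shell
# extract_wwid: read `multipath -ll` output on stdin and print the WWID from
# the first map line, e.g. "mpathb (14f50...4b4d) dm-0 OPNFILER,VIRTUAL-DISK".
extract_wwid() {
  sed -n 's/^[^ ]* (\([0-9a-f]*\)) .*/\1/p' | head -n 1
}
```

Usage: multipath -ll | extract_wwid.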

Now, we can start the multipathd daemon:

service multipathd start

fdisk should now report the new multipath disk:

Disk /dev/mapper/oradisk: 11.5 GB, 11475615744 bytes
255 heads, 63 sectors/track, 1395 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x30a7ae71

You may partition it as you’d do with a regular disk.

Test the Multipath Configuration

The idea is to simulate a path failure. In this iSCSI configuration it’s easy to do: just shut down one network interface at a time. For example, log in to sun and shut down the eth0 interface:

ifdown eth0

In /var/log/messages of the jupiter server you should see:

Apr 22 16:48:06 jupiter kernel: connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4308454544, last ping 4308459552, now 4308464560
Apr 22 16:48:06 jupiter kernel: connection2:0: detected conn error (1011)
Apr 22 16:48:06 jupiter iscsid: Kernel reported iSCSI connection 2:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)

You can confirm that the path is down using:

[root@jupiter ~]# multipath -ll
oradisk (14f504e46494c4552674774717a782d7030585a2d31514b4d) dm-0 OPNFILER,VIRTUAL-DISK
size=11G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 3:0:0:0 sdc 8:32 active ready  running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 2:0:0:0 sdb 8:16 active faulty running

Note the faulty state for the 2:0:0:0 path.

Now, bring the eth0 interface back up:

ifup eth0

In /var/log/messages you should see:

Apr 22 16:54:55 jupiter iscsid: connection1:0 is operational after recovery (26 attempts)
Apr 22 16:54:56 jupiter multipathd: oradisk: sdb - directio checker reports path is up
Apr 22 16:54:56 jupiter multipathd: 8:16: reinstated
Apr 22 16:54:56 jupiter multipathd: oradisk: remaining active paths: 2

Learned Lessons

If another disk is added in Openfiler, the clients (jupiter, in our case) need to rescan their iSCSI disks:

iscsiadm -m node -R

Usually the multipath daemon, if running, will pick up the new disk right away. However, if you want to assign an alias, you’ll have to amend the configuration file and run:

service multipathd reload

According to my tests, adding a new disk can be done online. However, that doesn’t apply to resizing an existing disk. Note that Openfiler can only extend an existing disk; downsizing is not an option.

Even though Openfiler allows you to increase the disk size via its GUI, the new size is not seen by the initiators. Even when the initiator (jupiter) rescans its iSCSI disks, their size is not refreshed. What I found is that you need to restart the iSCSI target server service on Openfiler in order to publish the new size. This is quite an annoying limitation, considering that you shouldn’t do it while the published disks are in use. The conclusion: it is possible to add new disks online, but increasing the size of an existing disk can’t be done online.