Today I have been troubleshooting a very interesting issue. Our DBAs deployed a new Oracle RAC cluster with the ability to relocate the virtual IP (VIP) address between nodes. When a failover is triggered, the IP is deactivated on the failing node and brought up on the standby node. Unfortunately, for some reason the new node sends a proper gratuitous ARP with its new MAC, but shortly afterwards it sends a second gratuitous ARP with the MAC address of its default gateway.
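To make the symptom concrete, here is a minimal Python sketch of what a gratuitous ARP looks like on the wire and how the bad second announcement could be told apart from the good one. It assumes the standard RFC 826 ARP layout and the RFC 5227 announcement convention (sender and target IP both set to the VIP, target MAC zeroed); the helper names, the VIP 192.0.2.10 and both MAC addresses are hypothetical placeholders, not values from our cluster.

```python
import ipaddress
import struct
from typing import Optional, Tuple

ETH_BROADCAST = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
# ARP payload: htype, ptype, hlen, plen, opcode, sha, spa, tha, tpa (RFC 826)
ARP_FORMAT = "!HHBBH6s4s6s4s"


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_gratuitous_arp(vip: str, mac: str) -> bytes:
    """Build a broadcast Ethernet frame announcing `vip` at `mac`.

    Sender and target IP are both the VIP, so every host on the segment
    (including the default gateway) refreshes its cache entry VIP -> mac.
    Ethernet padding to the 60-byte minimum frame size is omitted.
    """
    src = mac_to_bytes(mac)
    ip = ipaddress.IPv4Address(vip).packed
    eth = ETH_BROADCAST + src + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack(
        ARP_FORMAT,
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6, 4,         # MAC and IPv4 address lengths
        1,            # opcode: request
        src, ip,      # sender MAC / sender IP (the VIP)
        b"\x00" * 6,  # target MAC: ignored, zeroed as in RFC 5227
        ip,           # target IP: the VIP again
    )
    return eth + arp


def parse_gratuitous_arp(frame: bytes) -> Optional[Tuple[str, str]]:
    """Return (sender MAC, VIP) if `frame` is a gratuitous ARP, else None."""
    if len(frame) < 42 or frame[12:14] != struct.pack("!H", ETHERTYPE_ARP):
        return None
    fields = struct.unpack(ARP_FORMAT, frame[14:42])
    sha, spa, tpa = fields[5], fields[6], fields[8]
    if spa != tpa:  # ordinary ARP request/reply, not an announcement
        return None
    mac = ":".join(f"{b:02x}" for b in sha)
    return mac, str(ipaddress.IPv4Address(spa))


if __name__ == "__main__":
    # Hypothetical values: documentation-range VIP, node MAC and gateway MAC.
    vip, node_mac, gw_mac = "192.0.2.10", "00:16:3e:00:00:01", "00:00:5e:00:01:01"
    # Replay the sequence seen after failover: a correct GARP, then the bad one.
    for frame in (build_gratuitous_arp(vip, node_mac), build_gratuitous_arp(vip, gw_mac)):
        sender, ip = parse_gratuitous_arp(frame)
        status = "BAD (gateway MAC)" if sender == gw_mac else "ok"
        print(f"GARP {ip} is-at {sender}: {status}")
```

Both frames are valid gratuitous ARPs; the only difference is the sender MAC, which is why the second one silently overwrites the correct cache entry on the neighbours that receive it.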
The two gratuitous ARPs are easy to spot in a packet capture:
It’s really weird behaviour, and we spent a lot of time digging into the bonding configuration and other Linux internals, but today we finally had a breakthrough: after some googling I found that this is a known Oracle bug. It is tracked as bug 13440962, with the following rather vague description:
"After upgrading to 220.127.116.11, after vip failover, the ip address is not pingable from a different subnet on Linux."
While this matches our situation, the bug is not easy to find from that description. Anyway, the short-term workaround is to run arping after the failover, so the node announces the VIP with its correct MAC again:
/sbin/arping -U -c 3 -I <public NIC for vip> <vip ip address>
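If you need to run the workaround repeatedly, it can be wrapped in a small helper so the flags are documented in one place. This Python sketch assumes the iputils arping installed as /sbin/arping (other arping implementations use different options); the helper names, the interface bond0 and the VIP 192.0.2.10 are hypothetical. It only builds and runs the command; it does not detect the failover by itself.

```python
import subprocess
from typing import List


def arping_command(interface: str, vip: str, count: int = 3) -> List[str]:
    """Build the workaround command line for iputils arping.

    -U  unsolicited ARP mode: announce the VIP so neighbours update their caches
    -c  number of ARP packets to send
    -I  interface that now carries the VIP
    """
    return ["/sbin/arping", "-U", "-c", str(count), "-I", interface, vip]


def announce_vip(interface: str, vip: str, count: int = 3) -> int:
    """Run arping (requires root) and return its exit status."""
    completed = subprocess.run(arping_command(interface, vip, count), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical interface and VIP; substitute the node's public NIC and VIP.
    print(" ".join(arping_command("bond0", "192.0.2.10")))
```

Calling announce_vip("bond0", "192.0.2.10") as root on the node that now holds the VIP would run the same command as the one-liner above.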
Obviously, running arping by hand is a poor workaround, because it requires manual intervention after every failover to get things rolling again. The long-term solution is to apply patch 13440962.