Friday, October 18, 2013

Ongoing troubleshooting and fixes for RANCID on Fedora

Keeping RANCID running is an ongoing process, and I'll document fixes here as I go. Hope it helps others!

Replacing or Updating Equipment can Break RANCID
RANCID is no longer updating and won't run anymore!
How to add a new switch or Router to RANCID


Replacing or Updating Equipment can Break RANCID

SSH host key checking is a security feature of SSH on Linux, and it can break RANCID's ability to check in with an IP when that IP now lives on a new device (hardware upgrade) or the device has had its crypto keys regenerated. The RANCID logs will read: "clogin error: Error: The host key for 10.10.10.1 has changed. Update the SSH known_hosts file accordingly."

Here's how to fix it:

su
#Enter super-user mode
su - rancid
#Switch the terminal to the local rancid user. Change "rancid" to your own local user if you run the rancid service under another account.
ssh-keygen -R 10.10.10.1
#Replace the IP with whichever IP is failing to check in, and repeat for each one

The locally saved SSH host key for the device is removed, and the next time RANCID reaches out to the device, the Linux environment will save the new key and remember it. This keeps the security feature in place and resets only the specific IP.
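
If several devices were replaced at once, it can help to pull the affected IPs straight out of the logs instead of hunting for them one by one. Here's a minimal sketch that does this, based on the error text quoted above. The function name changed_host_keys is my own, and the log directory in the commented usage lines is an assumption; point it at wherever your RANCID install writes its logs.

```shell
# Print each unique IP/hostname that the given RANCID log files report
# as having a changed SSH host key.
# Usage: changed_host_keys LOGFILE...
changed_host_keys() {
    # Keep only the address from lines like:
    #   clogin error: Error: The host key for 10.10.10.1 has changed. ...
    sed -n 's/.*The host key for \([^ ]*\) has changed.*/\1/p' "$@" | sort -u
}

# Example (run as the rancid user; the log path is an assumption):
#   changed_host_keys /usr/local/rancid/var/logs/* |
#   while read -r host; do ssh-keygen -R "$host"; done
```

Review the list before feeding it to ssh-keygen -R. A changed host key is only safe to reset when you know why it changed.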

RANCID is no longer updating and won't run anymore!

RANCID has a strange habit of occasionally not updating its databases. There are a few possible reasons for this, including ordinary permissions issues and wrong database credentials, but one issue is unique to RANCID: a lock file that blocks all instances of rancid from running.

The error message you'll see in your log files is self-explanatory: "Config diffs failed: /tmp/.networking.run.lock" You'll also see a manual run of rancid-run end immediately without writing any diffs to the cvs database. 

This file is left behind when rancid-run stops unexpectedly, for example if you interrupt a test run of rancid-run with Ctrl+C, like I did, or if the server reboots in the middle of a rancid-run.

The fix is to delete the lock file: 
su
rm /tmp/.networking.run.lock
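
If you'd rather not delete the lock file blindly, here's a rough sketch of a helper that removes it only when it is old enough that no run should still be in progress. The function name clear_stale_lock and the 120-minute default are my own assumptions, and file age is only a heuristic: it does not prove that rancid-run has actually stopped.

```shell
# Remove LOCKFILE only if it exists and was last modified more than
# MAX_AGE_MIN minutes ago (default 120, an arbitrary assumption).
# Returns 0 if a stale lock was removed, 1 otherwise.
# Usage: clear_stale_lock LOCKFILE [MAX_AGE_MIN]
clear_stale_lock() {
    lock=$1
    max_age_min=${2:-120}
    [ -f "$lock" ] || return 1
    # find prints the path only when the file is older than the threshold
    if [ -n "$(find "$lock" -mmin +"$max_age_min" 2>/dev/null)" ]; then
        rm -f -- "$lock" && return 0
    fi
    return 1
}

# Example, using the lock file from the error message above:
#   clear_stale_lock /tmp/.networking.run.lock 120
```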

RANCID will then run normally, and you can test it immediately with:
su
# To enter super-user mode. 
su - rancid
# To change the terminal to rancid user context. 
cd /usr/local/rancid/bin
./rancid-run

How to add a new switch or Router to RANCID

Once RANCID is fully set up and functioning, you'll inevitably need to add more devices to it. Here's how:
su
#Enter super-user mode, which lets you switch to other users
su - rancid
#Switch to the local rancid user, which runs the local rancid service
vi /usr/local/rancid/var/networking/router.db
#Open router.db in vi for editing
#Add the IP of the router/switch on its own line
:w
#Write the changes
:q
#Quit vi and return to the terminal
cd /usr/local/rancid/var/networking/configs
cvs update
#Tells the cvs change management tool to re-read the router.db list and start monitoring the new IP
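
If you add devices often, the vi step can be scripted. Here's a minimal sketch, not a definitive tool. It assumes router.db lines look like IP:type:up, which is the colon-separated layout I know from RANCID 2.x; newer releases use a different separator as far as I know, so check an existing line in your own router.db first. The function name add_device, the default type "cisco" and the separator argument are my own choices for illustration.

```shell
# Append "IP<sep>TYPE<sep>up" to ROUTER_DB unless a line for IP already exists.
# Assumed line format: IP:TYPE:up (pass another separator as the 4th argument).
# Usage: add_device ROUTER_DB IP [TYPE] [SEPARATOR]
add_device() {
    db=$1
    ip=$2
    type=${3:-cisco}
    sep=${4:-:}
    # A device counts as listed when some line starts with "IP<sep>"
    if [ -f "$db" ] && awk -v p="$ip$sep" \
        'index($0, p) == 1 { found = 1 } END { exit !found }' "$db"; then
        echo "$ip is already listed in $db" >&2
        return 1
    fi
    printf '%s%s%s%s%s\n' "$ip" "$sep" "$type" "$sep" up >> "$db"
}

# Example (run as the rancid user):
#   add_device /usr/local/rancid/var/networking/router.db 10.10.10.1 cisco
```

The duplicate check compares the start of each line as a plain string, not a regular expression, so 10.10.10.1 is not mistaken for 10.10.10.11.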
