Thursday, February 20, 2020

EIGRP Queries and Stuck in Active

I’d like to tell you a little bit more about the EIGRP query process and the stuck-in-active problem. This will help you understand why we use the EIGRP stub feature.


EIGRP is designed for large enterprise networks but having one big EIGRP network (5000+ prefixes and many hops) can lead to some problems:
  • Lots of EIGRP prefixes equal a large topology table and routing table.
  • Calculating the successor router will take longer if you have many EIGRP neighbors and different paths.
  • If there are many backup paths, EIGRP has to check whether there are one or more feasible successors, which takes longer.
  • More information means our EIGRP routers have to work harder to process everything.
  • When EIGRP loses a route and there is no feasible successor, the route goes from passive to active and the router starts sending queries to its neighbors.
  • EIGRP sends queries on all interfaces except the interface that led to the lost successor.
Let me describe the EIGRP query process in detail for you:
[Figure: big EIGRP topology with a failed link]
In the topology above we are running EIGRP on all of the routers. Router 1 has a link failure and as a result has lost its successor route to a particular network on the left side. There is no feasible successor, so the route goes from passive to active and router 1 sends a query to routers 2 and 3.
There are 2 things that can happen at this moment:
  • Router 2 or 3 has information about this particular route and sends it to router 1 in a reply. The query process is now over.
  • Routers 2 and 3 don’t know anything about this route and send queries of their own to their neighbors: routers 4, 5, 6 and 7. Routers 2 and 3 will not send a reply to router 1 until they have heard a response from all of their neighbors.
[Figure: big EIGRP topology with a failed link, queries being forwarded]
In our topology nobody has a clue about the network router 1 is looking for, so routers 2 and 3 forward the query to their own neighbors. The red arrows indicate the query packets.
[Figure: big EIGRP topology with a failed link, replies being sent back]
There are no other neighbors behind routers 4, 5, 6 and 7. They will send a reply to routers 2 and 3 to let them know they don’t know the answer. Routers 2 and 3 will then send a reply to router 1 to tell it they are sorry but this is it. That’s a lot of packets for just one route that was lost, right?
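To get a feeling for how many packets that is, here is a minimal Python sketch of the query process described above. It is not how IOS implements it. The `NEIGHBORS` map is my assumption of how the picture is wired (router 2 in front of routers 4 and 5, router 3 in front of routers 6 and 7), and the `query` function and the `knows_route` set are names I made up for the example. Acknowledgments are not counted.

```python
# Assumed wiring of the topology: R1 - {R2, R3}, R2 - {R4, R5}, R3 - {R6, R7}.
NEIGHBORS = {
    1: [2, 3],
    2: [1, 4, 5],
    3: [1, 6, 7],
    4: [2],
    5: [2],
    6: [3],
    7: [3],
}


def query(router, asked_by, knows_route, log):
    """Simulate one router handling a query; return True if a path was found.

    router      -- router that has to find the lost network
    asked_by    -- neighbor that sent the query (None for the originator)
    knows_route -- hypothetical set of routers that still have a path
    log         -- list that collects ("query" | "reply", sender, receiver)
    """
    if router in knows_route:
        found = True  # can answer right away, no need to ask further
    else:
        found = False
        for neighbor in NEIGHBORS[router]:
            if neighbor == asked_by:
                continue  # don't send the query back where it came from
            log.append(("query", router, neighbor))
            # The router waits for every neighbor before it answers itself.
            if query(neighbor, router, knows_route, log):
                found = True
    if asked_by is not None:
        log.append(("reply", router, asked_by))
    return found


if __name__ == "__main__":
    packets = []
    query(1, None, set(), packets)  # nobody knows the lost network
    for kind, sender, receiver in packets:
        print(f"{kind:5} R{sender} -> R{receiver}")
    print(len(packets), "packets")  # 6 queries + 6 replies = 12 packets
```

If you pass `{3}` as `knows_route`, router 3 replies immediately and routers 6 and 7 are never bothered, which is the first of the two cases above.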
[Figure: big EIGRP topology with a failed link, reply from router 2 lost]
Let’s make things even more interesting. Look at my picture above and you’ll see that the reply from router 2 never makes it back to router 1. EIGRP delivers queries reliably, and for each query a router sends to its neighbors it must get a reply within 3 minutes (the default active timer). If the router does not receive a reply to ALL of its outstanding queries in that time, it puts the route in the SIA (Stuck in Active) state and kills the adjacency with the neighbor that did not reply. By dropping the neighbor adjacency you lose all the routes you learned from this neighbor, which means the router starts sending queries for all those routes as well. Not a pretty sight, right?
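The "reply to ALL outstanding queries" part is the important bit, so here is a small Python sketch of that bookkeeping. It is only an illustration: the `ActiveRoute` class, its method names and the prefix `10.0.0.0/24` are made up for this example, and I assume that only the neighbors that stayed silent are the ones being reset.

```python
ACTIVE_TIMER_SECONDS = 180  # the 3 minutes mentioned above


class ActiveRoute:
    """Tracks which neighbors still owe us a reply for one active route."""

    def __init__(self, prefix, queried_neighbors):
        self.prefix = prefix
        self.waiting_on = set(queried_neighbors)

    def receive_reply(self, neighbor):
        # A reply from a neighbor removes it from the outstanding list.
        self.waiting_on.discard(neighbor)

    def check(self, seconds_since_query):
        """Return (state, neighbors_to_reset) at the given moment."""
        if not self.waiting_on:
            return ("passive", set())  # every reply is in
        if seconds_since_query >= ACTIVE_TIMER_SECONDS:
            # Assumption: only the silent neighbors lose their adjacency.
            return ("stuck-in-active", set(self.waiting_on))
        return ("active", set())  # still waiting, timer not expired


if __name__ == "__main__":
    route = ActiveRoute("10.0.0.0/24", ["R2", "R3"])  # hypothetical prefix
    route.receive_reply("R3")   # R3 answers, R2's reply gets lost
    print(route.check(60))      # still active after one minute
    print(route.check(180))     # stuck in active, R2 gets reset
```

One reply is not enough. As long as a single neighbor is missing from the list, the route stays active and the timer keeps running.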
How is it possible that a reply never makes it back?
  • The router that gets the query is overloaded because of memory problems or a CPU that’s too busy. It might not get the chance to process the incoming query or send a reply packet.
  • There are problems with the link between the neighbors so not all packets arrive.
  • You have a unidirectional link failure so packets only flow in one direction. This can happen with fiber links.
Since IOS 12.1, Cisco has changed the stuck-in-active process to reduce the number of unnecessarily dropped neighbor adjacencies. They introduced two new packets called SIA query and SIA reply.
[Figure: EIGRP SIA query and SIA reply between routers 1, 2 and 3]
Before Cisco introduced SIA query and SIA reply this would happen:
  1. Router 1 loses information about a network and has no feasible successor.
  2. Router 1 sends a query to router 2.
  3. Router 2 doesn’t know the answer, so it sends a query of its own to router 3.
  4. Router 3 doesn’t know the answer and sends a reply to router 2.
  5. Router 2 sends a reply to router 1 to let it know that router 2 has no clue about this network.
  6. Because of congestion the reply from router 2 never makes it back to router 1.
  7. After 3 minutes router 1 will drop the neighbor adjacency with router 2 including all the routes it learnt from router 2.
Now we have SIA query and SIA reply, and things work a little bit differently:
  1. Router 1 loses information about a network and has no feasible successors.
  2. Router 1 sends a query to router 2.
  3. Router 2 doesn’t know the answer, so it sends a query of its own to router 3.
  4. Router 3 doesn’t know the answer and sends a reply to router 2.
  5. Router 2 sends a reply to router 1 to let it know that router 2 has no clue about this network.
  6. Because of congestion the reply from router 2 never makes it back to router 1.
  7. After 1.5 minutes (half of the 3-minute timer) router 1 will send a SIA query to router 2 to ask for its status.
  8. Router 2 will respond with a SIA reply and the neighbor adjacency will not be dropped.
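The two step lists above can be put side by side in a short Python sketch. Treat it as a simplified model of router 1's point of view and not as the real state machine: the `timeline` function and its parameters are made up for this example, and the branch where no SIA reply comes back is my own assumption since the steps above only cover the case where router 2 answers.

```python
ACTIVE_TIMER = 180    # seconds, the 3 minutes from the old behavior
SIA_QUERY_AFTER = 90  # seconds, the 1.5 minutes from the new behavior


def timeline(reply_received, sia_supported, sia_reply_received=False):
    """Return the list of events router 1 sees after querying router 2."""
    events = ["0s: R1 sends query to R2"]
    if reply_received:
        events.append("R1 receives reply; query process is over")
        return events
    if not sia_supported:
        # Old behavior: nothing happens until the active timer runs out.
        events.append(f"{ACTIVE_TIMER}s: timer expires; R1 drops adjacency with R2")
        return events
    # New behavior: check on the neighbor halfway through.
    events.append(f"{SIA_QUERY_AFTER}s: R1 sends SIA query to R2")
    if sia_reply_received:
        events.append("R2 answers with SIA reply; adjacency stays up")
    else:
        # Assumption: a neighbor that stays completely silent is still reset.
        events.append("no SIA reply; R1 drops adjacency with R2")
    return events


if __name__ == "__main__":
    for line in timeline(reply_received=False, sia_supported=False):
        print("old:", line)
    for line in timeline(False, sia_supported=True, sia_reply_received=True):
        print("new:", line)
```

The difference is that a lost reply on its own no longer costs you the neighbor. Router 1 asks again halfway through, and a neighbor that is alive can say so.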
Does this make sense to you? Another method to deal with these queries is to configure EIGRP summarization or the EIGRP stub feature. I’ll save those for another lesson! If you have any questions please leave a comment.
