Thursday, February 20, 2020

OER (Optimize Edge Routing) Phases

If you are new to OER, I suggest you stop now and read my Introduction to OER first. This will give you an idea of what OER is and why you might want to use it. Reading about the different OER phases will be very confusing if you don’t know the basics. After reading the introduction, it’s best to start with a basic configuration so you can see how it works. Having said that, let me show you the different phases:
  • OER Profile Phase
  • OER Measure Phase
  • OER Apply Policy Phase
  • OER Control Phase
  • OER Verify Phase
These 5 phases run in a loop. OER starts with the profile phase and then moves on to the measure, apply policy, control and verify phases. After the verify phase it goes back to the profile phase, and the cycle keeps repeating. Now let’s take a closer look at the different phases:
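To make the loop concrete, here is a tiny Python sketch. It is only an illustration of the ordering described above; the names PHASES and phase_sequence are mine and have nothing to do with how OER is implemented.

```python
from itertools import cycle, islice

# The five OER phases in the order they run.
PHASES = ["profile", "measure", "apply policy", "control", "verify"]

def phase_sequence(steps):
    """Return the first `steps` phases of the endless OER loop."""
    return list(islice(cycle(PHASES), steps))

# After "verify" the sequence wraps around to "profile" again.
print(phase_sequence(7))
```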

OER Profile Phase

Depending on the size of your network you might have hundreds or thousands of routes in the RIB (Routing Information Base). Optimize edge routing means that we will prefer some traffic over other traffic, so we have to select a number of routes from the RIB that we want to optimize. There are a couple of ways to do this:
  • We can automatically learn traffic flows that experience performance issues.
  • It’s also possible to manually configure different traffic classes.
So what is a “traffic class” exactly? It can be something simple like a prefix, but it’s also possible to use a prefix in combination with a port number. Border routers will learn about traffic classes and report them to the master controller. These traffic classes are then stored in a special table called the Monitored Traffic Classes (MTC) table. The MTC has a limited capacity, so OER performs prefix aggregation: by default it aggregates (summarizes) all prefixes to a /24. The MTC will store 100 prefixes, but this is something we can change if we want to.
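Here is a small Python sketch of what aggregating to a /24 looks like. It uses the standard ipaddress module; the function name aggregate and its parameters are made up for this example, and the real MTC of course stores much more than a list of prefixes.

```python
import ipaddress

def aggregate(destinations, prefix_len=24, max_prefixes=100):
    """Summarize destination addresses to prefixes, keeping at most max_prefixes."""
    table = []
    for dst in destinations:
        # strict=False lets us pass a host address and get its covering network.
        net = ipaddress.ip_network(f"{dst}/{prefix_len}", strict=False)
        if net not in table and len(table) < max_prefixes:
            table.append(net)
    return [str(net) for net in table]

# Two hosts in 4.4.4.0/24 collapse into a single entry.
print(aggregate(["4.4.4.10", "4.4.4.200", "5.5.5.1"]))
```

Both hosts in 4.4.4.0/24 end up as one entry, which is why aggregation helps when the table has a limited size.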
Automatic learning of traffic classes is done by using the top talkers feature of NetFlow. You don’t have to configure NetFlow yourself; it is enabled for you automatically when you enable OER.
OER doesn’t look for traffic classes non-stop; it works on a schedule:
[Figure: OER learning cycle]
OER will learn during the monitor period, which is 5 minutes by default. Once this period is over it goes into “sleep mode” (120 minutes) and saves the information about the prefixes it has learned. These timers might be OK for production networks, but you might want to shorten them for your labs.
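If you want to picture the schedule, this Python sketch tells you whether OER would be learning or sleeping at a given minute. It assumes the cycle starts with a monitor period at minute 0 and simply repeats; learn_state is a hypothetical helper, not something you will find on a router.

```python
def learn_state(minute, monitor_period=5, sleep_period=120):
    """Return 'learning' or 'sleeping' for a minute counted from the cycle start."""
    position = minute % (monitor_period + sleep_period)
    return "learning" if position < monitor_period else "sleeping"

# With the timers from the text, one full cycle takes 125 minutes.
for minute in (0, 4, 5, 124, 125):
    print(minute, learn_state(minute))
```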

OER Measure Phase

Once we know which traffic classes we want to optimize (either by learning or configuring them ourselves), OER will measure the performance of each traffic class. There are two ways to do this:
  • Passive monitoring
  • Active monitoring
Passive monitoring uses NetFlow and interface counters on the border routers. OER will measure the following performance metrics using NetFlow:
  • Delay: The average delay of TCP flows for a prefix. It will keep track of TCP segments and the returning TCP ACKs to calculate the round-trip time (RTT).
  • Packet Loss: By keeping track of TCP sequence numbers OER can measure packet loss.
  • Reachability: OER will keep track of TCP SYNs that have been sent without receiving a TCP ACK response.
  • Throughput: The total number of bytes and packets for each traffic class in a certain amount of time. This is one of the metrics that can be used for non-TCP traffic.
Active monitoring uses IP SLA to emulate the traffic class and discover performance metrics. Our border routers will report their information to the master controller, which stores the performance metrics together with the traffic classes in the MTC. OER can collect the following performance metrics thanks to IP SLA:
  • Delay (same as above).
  • Reachability (same as above).
  • Jitter: A variation in delay causes jitter. OER will send multiple packets to the destination and measure the delay between them.
  • MOS: The Mean Opinion Score is used to represent voice quality on a scale from 1 to 5, where 1 = terrible voice quality and 5 = best voice quality.
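To give you a feeling for the jitter metric from the list above, here is a minimal Python sketch. It assumes jitter is the average absolute difference between consecutive delay samples, which is one common way to express it and not necessarily the exact calculation IP SLA uses. The function name mean_jitter is my own.

```python
def mean_jitter(delays_ms):
    """Average absolute change between consecutive delay samples (in ms)."""
    if len(delays_ms) < 2:
        return 0.0  # a single sample has no variation
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return sum(diffs) / len(diffs)

# Delays of 10, 12 and 11 ms differ by 2 ms and 1 ms, so this prints 1.5.
print(mean_jitter([10, 12, 11]))
```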
OER tracks each traffic class or interface that it monitors using one of the following states:
  • Default: Traffic classes in the default state are not controlled by OER. When a traffic class is added to the MTC it will be in the default state. You will see traffic classes move in and out of the default state depending on measurement results and the policy that you have configured.
  • Choose Exit: This is where OER compares performance metrics against the configured policy for the traffic class. OER will prefer to keep the current “exit path” for a traffic class, but when its measurements exceed the policy the master controller will start looking for another exit path.
  • Holddown: A traffic class will be in the holddown state when the master controller tells the border router to use active probes (IP SLA) to monitor the traffic class. Performance metrics are collected until the holddown timer expires.
  • In-Policy: Once the performance metrics have been compared against the policy and an exit path has been selected, the traffic class is in-policy. This means the traffic class doesn’t exceed our policy. The master controller will keep monitoring the traffic class, but no action is taken unless the periodic timer expires or the performance metrics exceed the policy.
  • Out-of-Policy (OOP): When there are no exit paths that conform to the policy, the traffic class will go out-of-policy. The backoff timer controls whether a traffic class can leave this state or not and every time a traffic class goes to the out-of-policy state this timer will increase. The backoff timer will reset when the traffic class goes to the in-policy state. When all exit paths are out-of-policy the master controller can select the best exit path available.
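The backoff timer behavior is easier to follow with a small sketch. The Python class below only models the “increase on every out-of-policy event, reset on in-policy” idea from the list above. The numbers (300 seconds minimum, 3000 seconds maximum, steps of 300 seconds) and the class itself are assumptions I picked for the example, so check your own configuration for the real values.

```python
class BackoffTimer:
    """Toy model of a backoff timer that grows while a class stays out-of-policy."""

    def __init__(self, minimum=300, maximum=3000, step=300):
        # All three values are illustrative assumptions, in seconds.
        self.minimum = minimum
        self.maximum = maximum
        self.step = step
        self.current = minimum

    def out_of_policy(self):
        """Traffic class went OOP: increase the timer, capped at the maximum."""
        self.current = min(self.current + self.step, self.maximum)
        return self.current

    def in_policy(self):
        """Traffic class is in-policy again: reset the timer."""
        self.current = self.minimum
        return self.current

timer = BackoffTimer()
print(timer.out_of_policy())  # 600
print(timer.out_of_policy())  # 900
print(timer.in_policy())      # 300
```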

OER Apply Policy Phase

On the master controller we will configure a policy with certain thresholds for our traffic classes to define “acceptable performance”. OER will compare the performance metrics with the policy that we created. When the performance metrics exceed the threshold in our policy the traffic class will go OOP (Out of Policy). OER will keep comparing these results to see if there are changes in traffic flows. There are two types of policies for OER:
  • Traffic class policies
  • Link policies
Traffic class policies are configured for prefixes or applications. Link policies are used for exit or entrance links at the edge of our network. For example, we can configure OER so that traffic towards destination 4.4.4.0/24 never has a delay higher than 150 ms. As soon as the delay is higher than 150 ms, OER will look for a better exit path so that we conform to the policy.
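Here is the 150 ms example as a rough Python sketch. It assumes we only look at delay, that we have one measurement per exit, and that OER sticks with the current exit as long as it conforms. The exit names BR1 and BR2 and the function choose_exit are invented for the example.

```python
def choose_exit(delay_by_exit_ms, current_exit, max_delay_ms=150):
    """Return (exit, state) for one traffic class based on a delay threshold."""
    # Keep the current exit while it conforms to the policy.
    if delay_by_exit_ms[current_exit] <= max_delay_ms:
        return current_exit, "in-policy"
    # Otherwise look for the lowest-delay exit that conforms.
    conforming = {e: d for e, d in delay_by_exit_ms.items() if d <= max_delay_ms}
    if conforming:
        return min(conforming, key=conforming.get), "in-policy"
    # Nothing conforms: fall back to the best exit available.
    return min(delay_by_exit_ms, key=delay_by_exit_ms.get), "out-of-policy"

# BR1 exceeds 150 ms, so traffic for 4.4.4.0/24 would move to BR2.
print(choose_exit({"BR1": 180, "BR2": 90}, current_exit="BR1"))
```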

OER Control Phase

The control phase is where the action happens. When OER has decided that some traffic classes do not conform to the policy, it will change routing on the border routers. It can do this by injecting static routes, injecting BGP routes, changing the BGP local preference, changing route metrics and/or using policy based routing. As a result some traffic classes will have a different exit path. OER will initiate a route change when any of the following occurs:
  • Traffic out-of-policy: when the performance metrics exceed the configured policy.
  • Exit link out of policy: when the link (interface) exceeds link utilization or loses connectivity.
  • Periodic timer expires: when OER is configured for “best mode” the master controller will start looking for the best exit path for the traffic class.
When a traffic class is defined only by a prefix, OER can use static route or BGP route injection. These changes are network-wide since they affect all routers in the network. Before injecting routes, OER will verify that you already have this route in your BGP table or configured as a static route…if not, it is impossible to inject the route. OER requires a parent route in order to inject routing information. This makes sense because OER is not a routing protocol…if it just sent traffic in a certain direction you might end up blackholing traffic or creating routing loops. Make sure you have configured a floating static route or have a valid entry in the BGP table.
OER is not a routing protocol…make sure you have a valid parent route if you want to send traffic another way.
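The parent route check can be pictured with a few lines of Python. This sketch treats a route as a “parent” when it is equal to or covers the prefix we want to inject, which is my simplification of the requirement described above. The function has_parent_route is hypothetical and it only handles IPv4 prefixes given as strings.

```python
import ipaddress

def has_parent_route(prefix, routing_table):
    """True if some route in routing_table equals or covers the given prefix."""
    target = ipaddress.ip_network(prefix)
    return any(
        target.subnet_of(ipaddress.ip_network(route)) for route in routing_table
    )

# A default route covers 4.4.4.0/24, so injection would be allowed here.
print(has_parent_route("4.4.4.0/24", ["0.0.0.0/0"]))
# No covering route exists, so there is nothing to inject against.
print(has_parent_route("4.4.4.0/24", ["10.0.0.0/8"]))
```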
Traffic classes that have a prefix AND a port number cannot be influenced by route injection. In this case the changes are not network-wide but device-specific, and OER will use policy based routing to influence routing.
When you use PBR to influence routing, the border routers have to be one hop away from each other, either physically or by using a GRE tunnel.
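The rule for picking a control method boils down to one question: does the traffic class have a port number or not? A minimal Python sketch, with a made-up function name and plain strings instead of real router actions:

```python
def control_method(prefix, port=None):
    """Pick how a traffic class is steered, based on how it is defined."""
    if port is None:
        # Prefix only: static or BGP route injection (network-wide).
        return "route injection"
    # Prefix plus port: policy based routing (device-specific).
    return "policy based routing"

print(control_method("4.4.4.0/24"))          # route injection
print(control_method("4.4.4.0/24", port=80)) # policy based routing
```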

OER Verify Phase

After making changes and sending traffic in different directions, OER will verify that the traffic is optimized and using a different exit interface. OER will collect the resulting statistics to verify that the changes bring the traffic classes “in policy”. The master controller verifies this by watching NetFlow information from the interface of the new exit path and ignoring NetFlow information from the old interface.
That’s all I have for now. I hope this helps you to understand OER better. If you have any questions just leave a comment!
