Network maintenance means doing whatever it takes to keep a network up and running. It includes a number of tasks:
- Troubleshooting network problems.
- Hardware and software installation/configuration.
- Monitoring and improving network performance.
- Planning for future network growth.
- Creating network documentation and keeping it up-to-date.
- Ensuring compliance with company policies.
- Ensuring compliance with legal regulations.
- Securing the network against all kinds of threats.
Of course this list could be different for each network you work on, and perhaps you are only responsible for some of these tasks. All of these tasks can be performed in one of two ways:
- Structured tasks.
- Interrupt-driven tasks.
Structured means you have a pre-defined plan for network maintenance that makes sure problems are solved before they occur. As a network engineer this will also make your life a whole lot easier. Interrupt-driven is more like the “fireman” approach: you wait for trouble to occur and then fix it as fast as you can. A structured approach, where you have a network maintenance strategy and plan, reduces downtime and is more cost-effective.
Of course you can never completely get rid of interrupt-driven tasks because sometimes things “just go wrong”, but a good plan will reduce the number of them.
You don’t have to think of a complete network maintenance model yourself; there are a number of well-known network maintenance models you can use. It’s best to pick the model that suits your organization and make adjustments if needed.
Choosing which network maintenance model you will use depends on your network and the business. You can also use them as a template to create your own network maintenance model.
To give you an idea what a network maintenance model is about and what it looks like, here’s an example for FCAPS:
- Fault management: We will configure our network devices (routers, switches, firewalls, servers, etc.) to capture logging messages and send them to an external server. Whenever an interface goes down or the CPU goes above 80% we want to receive an e-mail so we can see what is going on.
- Configuration management: Any changes made to the network have to be logged. We will use a change management process so relevant personnel will be notified of planned network changes. Changes to network devices have to be reported and acknowledged before they are implemented.
- Accounting management: We will charge (guest) users for usage of the wireless network, for example per 100 MB of data. It’s also commonly used to charge people for long distance VoIP calls.
- Performance management: Network performance will be monitored on all LAN and WAN links so we know when things go wrong. QoS (Quality of Service) will be configured on the appropriate interfaces.
- Security management: We will create a security policy and implement it by using firewalls, VPNs, intrusion prevention systems and AAA (Authentication, Authorization and Accounting) servers to validate user credentials. Network breaches have to be logged and an appropriate response has to be made.
You can see FCAPS is not just a “theoretical” method but it truly describes “what”, “how” and “when” we will do things.
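To make the fault management rule a bit more concrete, here’s a minimal Python sketch of the “interface down or CPU above 80%” check. The `DeviceStatus` class and the `fault_alerts` function are hypothetical names made up for this illustration; in a real network the data would come from your syslog or SNMP tooling and the messages would be handed to whatever sends your e-mail.

```python
from dataclasses import dataclass
from typing import List

CPU_ALERT_THRESHOLD = 80  # percent, taken from the FCAPS example above


@dataclass
class DeviceStatus:
    """Hypothetical snapshot of one device (assumed to come from syslog/SNMP)."""
    hostname: str
    cpu_percent: float
    interfaces_down: List[str]


def fault_alerts(status: DeviceStatus) -> List[str]:
    """Return one message per condition that should trigger an e-mail."""
    alerts = []
    if status.cpu_percent > CPU_ALERT_THRESHOLD:
        alerts.append(f"{status.hostname}: CPU at {status.cpu_percent:.0f}%")
    for name in status.interfaces_down:
        alerts.append(f"{status.hostname}: interface {name} is down")
    return alerts
```

A device at exactly 80% produces no alert here because the text says “above 80%”; adjust the comparison if your policy is different.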
Whatever network maintenance model you decide to use, there are always a number of routine maintenance tasks that should have listed procedures. Here are a couple of examples:
- Configuration changes: Businesses are never static; they change all the time. Sometimes you need to make changes to the network to allow access for guest users, or normal users might move from one office to another, so you’ll have to make changes to the network to facilitate this.
- Replacement of hardware: Older hardware has to be replaced with more modern equipment and it’s also possible that production hardware fails so we’ll have to replace it immediately.
- Backups: If we want to recover from network problems such as failing switches or routers then we need to make sure we have recent backups of configurations. Normally you will use scheduled backups so you will save the running-configuration each day, week, month or whatever you like.
- Software updates: We need to keep our network devices and operating systems up-to-date. Updates fix bugs, and they also make sure we don’t have devices running older software with security vulnerabilities.
- Monitoring: We need to collect and understand traffic statistics and bandwidth utilization so we can spot (future) network problems but also so we can plan for future network growth.
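As a small illustration of the monitoring task, bandwidth utilization can be estimated from two readings of an interface byte counter. This is only a sketch: the function name and parameters are made up for this example, and it assumes the counter did not wrap or reset between the two readings.

```python
def link_utilization(octets_start: int, octets_end: int,
                     interval_seconds: float, link_speed_bps: float) -> float:
    """Return average utilization in percent over the polling interval.

    Assumes the byte counter did not wrap or reset between the readings.
    """
    if interval_seconds <= 0 or link_speed_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    bits_transferred = (octets_end - octets_start) * 8
    return 100.0 * bits_transferred / (interval_seconds * link_speed_bps)
```

For example, 375,000,000 bytes in 5 minutes on a 100 Mbit/s link works out to 10% average utilization. Collecting these values over weeks or months is what lets you plan for growth.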
Normally you will create a list with the tasks that have to be done for your network. These tasks can be assigned a certain priority. If a certain access layer switch fails then you will likely want to replace it as fast as you can, but a failed distribution or core layer device will have a much higher priority since it impacts more users of the network. Other tasks like backups and software updates can be scheduled. You will probably want to install software updates outside of business operating hours, and backups can be scheduled to run each day after midnight, for example. The advantage of scheduling certain tasks is that network engineers are less likely to forget to do them.
Making changes to your network will sometimes impact the productivity of users who rely on network availability. Some changes have a huge impact: changes to firewalls or access-list rules might affect more users than you’d wish for. For example, you might install a new firewall and plan for a certain result, but forget about an application that uses random port numbers and end up troubleshooting the issue. Meanwhile some users are not able to use this application (and are shouting at you while you try to fix it…;).
Larger companies might have more than one IT department, each responsible for different network services. If you plan to replace a certain router tomorrow at 2 AM then you might want to warn the “Microsoft Windows” department because their servers will be unreachable. You can use change management for this: when you plan to make a certain change to the network, other departments are informed and can object if there is a conflict with their planning.
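The priority idea can be written down in a few lines of Python. This is just a sketch with hypothetical hostnames; ranking core above distribution is an assumption for this example, since the only thing stated above is that both rank higher than the access layer.

```python
# Lower number = handle first. Core before distribution is an assumption.
LAYER_PRIORITY = {"core": 0, "distribution": 1, "access": 2}


def order_by_priority(failures):
    """Sort (hostname, layer) tuples so the most impactful failures come first."""
    return sorted(failures, key=lambda failure: LAYER_PRIORITY[failure[1]])


failed = [("sw-acc-1", "access"), ("sw-core-1", "core"), ("sw-dist-1", "distribution")]
for hostname, layer in order_by_priority(failed):
    print(f"{layer:<12} {hostname}")
```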
When you want to implement change management you might want to think about the following:
- Who will be responsible for authorizing changes to the network?
- Which tasks will be performed during scheduled maintenance windows?
- What procedures have to be followed before making a change? (for example: doing a “copy run start” before making changes to a switch).
- How will you measure the success or failure of network changes? (for example: if you plan to change a number of IP addresses you will plan the time required to make this change. If it takes 5 minutes to reconfigure the IP addresses and you end up troubleshooting for 2 hours because something else is not working you might want to “rollback” to the previous configuration. How much time do you allow for troubleshooting? 5 minutes? 10 minutes? 1 hour?)
- How, when and by whom will the network change be added to the network documentation?
- How will you create a rollback plan so you can restore a configuration to the previous configuration in case of unexpected problems?
- What circumstances will allow change management policies to be overruled?
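The “how much time do you allow for troubleshooting?” question from the list above is easier to stick to when it is decided before the change starts. Here’s a minimal Python sketch of such a rule; the function name and the numbers are assumptions for illustration, not a recommended policy.

```python
from datetime import timedelta


def should_roll_back(elapsed: timedelta, planned: timedelta,
                     troubleshooting_allowance: timedelta) -> bool:
    """True once the change has used up its planned time plus the agreed allowance."""
    return elapsed > planned + troubleshooting_allowance


planned = timedelta(minutes=5)      # time needed to reconfigure the IP addresses
allowance = timedelta(minutes=10)   # agreed in advance, before the change starts

print(should_roll_back(timedelta(minutes=12), planned, allowance))
print(should_roll_back(timedelta(hours=2), planned, allowance))
```

With these numbers the first check says keep going and the second says roll back, so nobody has to make that call at 2 AM in the middle of troubleshooting.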
Another task we have to do is to create and update our network documentation. Whenever a new network is designed and created it should be documented.
The more challenging part is to keep it up-to-date in the future. There are a number of items that you should find in any network documentation:
- Physical topology diagram: This should show all the network devices and how they are physically connected to each other.
- Logical topology diagram: This should show how everything is logically connected to each other: protocols that are used, VLAN information, etc.
- Interconnections: It’s useful to have a diagram that shows which interfaces of one network device are connected to which interfaces of another network device.
- Inventory: You should have an inventory of all network equipment, vendor lists, product numbers, software versions and software license information, and each network device should have an organization asset tag number.
- IP Addresses: You should have a diagram that covers all the IP addresses in use on the network and on which interfaces they are configured.
- Configuration management: Before changing a configuration we should save the current running-configuration so it’s easy to restore to a previous (working) version. It’s even better to keep an archive of older configurations for future use.
- Design documents: Documents that were created during the original design of the network should be kept so you can always check why certain design decisions were made.
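If you keep an archive of older configurations, it’s handy to see exactly what changed between two versions. Here’s a small Python sketch that uses the standard `difflib` module for this; the `config_diff` function and the sample configurations are made up for the example.

```python
import difflib


def config_diff(previous: str, current: str) -> list:
    """Return unified-diff lines between two saved configurations."""
    return list(difflib.unified_diff(
        previous.splitlines(), current.splitlines(),
        fromfile="previous", tofile="current", lineterm="",
    ))


old = "interface FastEthernet0/1\nswitchport access vlan 2\nswitchport mode access"
new = "interface FastEthernet0/1\nswitchport access vlan 6\nswitchport mode access"
for line in config_diff(old, new):
    print(line)
```

Lines that start with `-` exist only in the previous configuration and lines that start with `+` only in the current one. Identical configurations produce no output at all.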
It’s also a good idea to work with step-by-step guidelines for troubleshooting and to use templates for certain configurations that all network engineers agree to use. Here are some examples to give you an idea:
interface FastEthernet0/1
description AccessPoint
switchport access vlan 2
switchport mode access
spanning-tree portfast
Here’s an example for access interfaces connected to wireless access points. Portfast has to be enabled for spanning-tree, the access points have to be in VLAN 2 and the switchport has to be changed to “access” manually.
interface FastEthernet0/2
description VOIP
interface FastEthernet0/2
description ClientComputer
switchport access vlan 6
switchport mode access
switchport port-security
switchport port-security violation shutdown
switchport port-security maximum 1
spanning-tree portfast
spanning-tree bpduguard enable
Here’s a template for interfaces that connect to client computers. The interface has to be configured for “access” mode manually. Port security has to be enabled so only 1 MAC address is allowed (the computer). The interface has to go into forwarding mode immediately so we configure spanning-tree portfast, and if we receive a BPDU the interface should go into the err-disabled state. Working with pre-defined templates like these will reduce the number of errors because everyone agrees on the same configuration. If you give each network engineer instructions to “protect the interface” you’ll probably end up with 10 different configurations…here’s one more example:
interface GigabitEthernet0/1
description TRUNK
switchport trunk encapsulation dot1q
switchport mode trunk
switchport nonegotiate
If you tell 2 network engineers to “configure a trunk” you might end up with one interface configured for 802.1Q encapsulation and the other one for ISL encapsulation. If one network engineer disables DTP and the other one configures the interface as “dynamic desirable”, the trunk will also fail to form. If you instruct them to configure a trunk according to a template then we’ll have the same configuration on both sides.
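Once everyone agrees on a template, you can also check configurations against it automatically. Here’s a minimal Python sketch that uses the client computer template from above. The `missing_template_lines` function is a hypothetical helper written for this example and it only does exact line matching, so abbreviated commands or a different VLAN number will show up as missing.

```python
CLIENT_TEMPLATE = [
    "switchport access vlan 6",
    "switchport mode access",
    "switchport port-security",
    "switchport port-security violation shutdown",
    "switchport port-security maximum 1",
    "spanning-tree portfast",
    "spanning-tree bpduguard enable",
]


def missing_template_lines(interface_config: str, template: list) -> list:
    """Return the template lines that do not appear in the interface configuration."""
    configured = {line.strip() for line in interface_config.splitlines()}
    return [line for line in template if line not in configured]


# An interface where port security and BPDU guard were forgotten.
config = """interface FastEthernet0/2
 description ClientComputer
 switchport access vlan 6
 switchport mode access
 spanning-tree portfast
"""
for line in missing_template_lines(config, CLIENT_TEMPLATE):
    print("missing:", line)
```

A check like this could run against your configuration backups, so an interface that drifts away from the template is spotted before it causes trouble.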
Hopefully this has given you an idea of what network maintenance is about. If you have any questions, feel free to leave a comment!