暗夜星空: 1.4 Serialization and Deserialization of Data

Serialization and deserialization may sound like unfamiliar terms at first, but you perform these two actions in your daily life. Take, for instance, a telephone call between two people. When you talk to another person over the phone, your words have to be transformed into a series of bits that are then sent over an electronic medium or a wireless signal. Your speech has to be converted into something that the medium it is traversing can carry. This process is called serialization. On the other side of that telephone call, a receiver has to perform the opposite process to extract the meaning from those bits and reconstruct your words. This process is called deserialization.

Serialization in computer science means converting a data structure or an object into a binary or textual format that can be stored and reconstructed later. For example, you may want to save an object constructed in your code to a file, giving it permanent storage. This way, you preserve the state of the object. Serialized files can be in YAML, JSON, XML, or any other textual data format, or in a binary format. From here on, you'll focus on textual formats because of their convenience.

The serialized file could now be transferred to any other system over the network. The receiving system is able to open that file and reconstruct it to the original state with all the objects defined inside. This process is the opposite of serialization and it is called deserialization.

Like saving data to files, APIs also require reliable serialization and deserialization to exchange data. Most programming languages, like Python, have existing tools for working with various data formats.

Here is a practical example that you could face as a network engineer working with network programmability. You have written some code in Python for configuring switches in your organization. You want to send the data held in your objects to the switch API in a format that it will understand unambiguously. You have to convert that Python data structure into a valid YAML, JSON, or XML format. To do that, you will use the serialization process.

Other times, you might want to get some information about a specific interface on your switch through that same API. You would receive the configuration of that switch from the API in YAML, JSON, or XML format. To interpret that data so that your Python code can understand it, you would use the deserialization process. Basically, you are extracting the details from a textual file and converting them into valid Python objects. This process is also called data or file parsing.
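The two directions described above can be sketched with the json library from the Python standard library. This is a minimal illustration only: the interface dictionary and its field names are hypothetical and do not come from any real switch API.

```python
import json

# Hypothetical interface settings held in a Python dictionary.
interface = {
    "name": "GigabitEthernet1/0/1",
    "description": "Uplink to core",
    "enabled": True,
    "vlans": [10, 20],
}

# Serialization: Python object -> JSON text that could be sent to an API.
payload = json.dumps(interface)

# Deserialization (parsing): JSON text received from an API -> Python object.
restored = json.loads(payload)

print(type(payload).__name__)  # str
print(restored == interface)   # True
```

The serialized payload is an ordinary string, which is what makes it suitable for storing in a file or sending over the network.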

Introduction to Python

The Python programming language was created by Guido van Rossum and was first released in 1991. It has been gaining ground among developers in recent years and now competes seriously in popularity with core programming languages such as Java or C. Many engineers adopted Python quickly because it is fast to write, powerful, platform independent, and open source. The syntax was designed to be clear and readable so that you can read through the actual code and understand what is going on, without the complexity seen in other core programming languages. The Python syntax structure makes it easy to learn for everyone, including network engineers trying to get into network programmability as quickly as possible.

Tim Peters, one of the core Python developers who has worked on the language since its early days, summed up its design philosophy in 19 guiding principles. That set of principles is known as The Zen of Python and is written in the style of a short poem. If you are not familiar with The Zen of Python, you can display it by executing the import this command from the Python interactive shell.

Python is widely available, whether you are on a Linux workstation, Windows machine, or Macintosh laptop. You might already have Python preinstalled if you are a Linux or Mac user, which makes it even easier to start using it. You will even find Python available on multiple routers and switches across most of the Cisco platforms today.

Note

To check if you already have Python installed on your workstation, try running the python --version or python -V (note the capital V) command from your console window. On systems where Python 3 is installed alongside Python 2, the command may be python3 --version instead. If Python has already been installed, the command will return the version present on your computer. Otherwise, you will have to install Python manually.

Note

At the time this course was written, the most recent release of Python was 3.7. All the examples featured in this course are based on this version.

Python Libraries

One of the things you will find out working with Python is that you do not have to write every last bit of code for your project yourself. You will discover many relevant code samples and training resources publicly available on the Internet to do some common activities.

You have already seen all the different data formats that you might have to work with, and for the common data formats Python provides the libraries to make it easier to work with them. A library is practically any code outside your script that you want to use. With their usage, you can write more efficient and consistent code, avoiding a lot of unnecessary programming in between. In many ways, your code becomes less error prone, and you are able to reuse the code that has been tested over and over again by a fast-growing community of developers. Just by importing a specific library into your Python code, you get a wide range of features that the library provides, without the need to develop them yourself.

>>> import datetime
>>> date_now = datetime.datetime.now()
>>> f"Current time is {date_now}"
'Current time is 2019-09-15 23:16:34.587930'


>>> import random
>>> random.randint(1,100)
32
>>> random.randint(1,100)
24
>>> random.randint(1,100)
86
>>> random.randint(1,100)
7

There are many ways to get access to libraries. If you are only starting with Python and want to use the functionalities of some of the existing libraries, you can check what is included with the Python standard library bundle. These functionalities are included with Python itself and differ a bit from version to version. If you want to check the list of libraries included with your chosen version of Python, check the website https://docs.python.org/3/library. Some common standard libraries are datetime, os, sys, and json for working with JSON files. Importing packages that are part of Python standard library is a pretty straightforward process with the import library command run inside your Python code (for instance, import json). You will see these typically at the top of a script. An import statement is a powerful tool for getting access to hundreds of existing Python libraries.

Sometimes, you might only need a specific resource inside a library. In this case, you would use the from library import resource statement.
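The difference between the two import forms can be sketched with the standard datetime library. The variable names are illustrative only.

```python
# Import the whole library, then reach resources through the library name.
import datetime

today_a = datetime.date.today()

# Import only one resource, then use it directly without the prefix.
from datetime import date

today_b = date.today()

# Both names refer to the same class inside the datetime library.
print(datetime.date is date)  # True
```

The second form keeps later lines shorter, while the first form makes it obvious which library each name comes from.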

In addition to the standard library, there is a growing collection of packages, libraries, and even entire applications available on the Python Package Index (PyPI). PyPI is one of the greatest resources available for Python packages. It is a repository of Python software that you can take advantage of simply by installing packages and importing them into your script. The pip install library command downloads the requested library directly from PyPI and installs it in your environment. After that, you can use all the features that the library provides inside your code.

Every developer can create a library that does certain things, package it, and publish it on PyPI. Consequently, the PyPI community grows rapidly each day.

Note

The package manager for Python (pip) is already installed on your workstation if you are using Python version 3.4 or newer. To confirm that pip is present, run the pip --version command from your console window.

Also, pip is the recommended way of installing new libraries onto your system, because it will always download the latest stable release.

If you are unsure about the package name or if a certain package even exists, you can always browse through all the packages available directly on the https://pypi.org webpage. When you locate the package you want to use, you have an option of downloading it manually or installing it simply through the pip install command.

If you are more skilled in programming, you may be familiar with the pip search command, which was used to search for a specific library from your console window instead of your browser. PyPI has since disabled the search interface that this command relies on, so browsing https://pypi.org is the dependable option.

PyPI repository
Command to install a library: pip install <libraryname>

Note

You should run the pip install command from your console window, not inside your Python script. On Windows, that means the Command Prompt. For Linux or Mac, use the Terminal application.

Another option when searching for a specific library is accessing the GitHub repository. From there, you can quickly see the popularity of a specific library and the last time it was updated. From that information, you can decide if a library is worth looking into and using for your next project. From https://github.com, you have an option of downloading the library manually and then importing it into your project.

There are numerous libraries available on the Internet that can help you deserialize or parse information from a wide palette of data formats into a Python data structure. You will use the same libraries to serialize objects back into the chosen data format if needed.

When you are searching for a specific library for parsing your chosen data format, you will notice that usually, there are multiple libraries available. It is up to you to choose which one to work with given your requirements. A good practice is to check the related documentation for a specific library to get you started working with it.

Some commonly used libraries to work with data formats are shown in this table:

Data Format	Library Name	Commands
YAML	PyYAML	`pip install PyYAML`, `import yaml`
YAML	ruamel.yaml	`pip install ruamel.yaml`, `from ruamel.yaml import YAML`
JSON	json	`import json`
XML	xmltodict	`pip install xmltodict`, `import xmltodict`
XML	untangle	`pip install untangle`, `import untangle`
XML	minidom	`import xml.dom.minidom`
XML	ElementTree	`import xml.etree.ElementTree`

For parsing YAML, there are two commonly used libraries from which you can choose. The newer one, ruamel.yaml, is a derivative of the original PyYAML library. For some time, the original was not maintained, which is why the new library was created by a community of Python developers. Later, the PyYAML library received an update, and now both libraries are actively maintained. The biggest difference when choosing one over the other is that they support different YAML specifications. PyYAML only supports the YAML 1.1 specification, whereas ruamel.yaml comes in handy if YAML 1.2 is required.

Today, pretty much every programming language has libraries for parsing and generating JSON data. With Python, the JSON library is part of the Python standard library and is therefore included by default. You do not have to run any pip install commands from your console; you can leverage JSON library features with only the import statement inside your Python code.

For working with XML in Python, several libraries are available, some included in the standard library (minidom and ElementTree) and some installed from PyPI (xmltodict and untangle). They all differ in some way, which lets you choose the one that fits your needs. If you need powerful manipulation of XML, the libraries shown in the table are worth looking at.

YAML File Manipulation with Python

When you parse data from data formats, you need to know how the actual conversion is going to happen inside your Python code. In this case, how are the strings from a YAML file going to be translated into Python objects? YAML probably is one of the closest data formats to Python itself because it natively maps the data into a Python dictionary, on which you can do all sorts of powerful manipulation later on.

---
user:
  name: john
  location:
    city: Austin
    state: TX
  roles:
    - admin
    - user

YAML	Python
object	dict
array	list
string	str
number (int)	int
number (real)	float
true	True
false	False
null	None

In the figure, there is a snippet from a YAML configuration file. The file hosts information about an application user. This user has a username, office location information, and assigned application roles.

How the translation from a YAML structure into a Python object is going to happen is shown in the table in this figure. By default, a YAML object is converted to a Python dictionary, an array is converted to a list, and so on.

Once you have your preferred library installed—in this case, PyYAML—you have to first import it into your script. You do that with a simple import yaml statement at the top of your code. Note that the PyYAML library does not come with Python natively, so you have to make sure that the library is installed on your system prior to importing it into your script and taking advantage of all the features the library provides. If that library has not already been installed on your system, simply run the pip install PyYAML command from your console window to install it.

The next action is to open the YAML file and load it into a Python object so that you can work with the data more easily. You do that by defining a variable in Python to parse all the information from that file. You are using the yaml.safe_load() method for that purpose.
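A minimal sketch of that step follows, assuming PyYAML is installed. The file name user.yaml and the temporary directory are assumptions made only so that the sketch runs on its own; in practice you would open your existing file.

```python
import os
import tempfile

import yaml  # third-party library: pip install PyYAML

# The YAML document from the example. It is written to a temporary file
# here only to make the sketch self-contained (file name is an assumption).
YAML_TEXT = """\
---
user:
  name: john
  location:
    city: Austin
    state: TX
  roles:
    - admin
    - user
"""

path = os.path.join(tempfile.mkdtemp(), "user.yaml")
with open(path, "w") as f:
    f.write(YAML_TEXT)

# Open the file and deserialize its contents into a Python object.
with open(path) as f:
    data = yaml.safe_load(f)
```

The safe_load() function only builds plain Python types such as dictionaries, lists, and strings, which is why it is generally preferred when reading files you did not create yourself.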

To print out the Python object into which you parsed the data from the YAML file, use this code: print(data). Your output should look something like this:

{'user': {'name': 'john', 'location': {'city': 'Austin', 'state': 'TX'}, 'roles': ['admin', 'user']}}

You can see that the output is not the cleanest; it is all kind of bunched together. It looks like a Python dictionary. To check the type of that data variable, use the following code:

print(type(data))
Output: <class 'dict'>

In this case, you can confirm that this is a valid Python dictionary. You can access elements by key names—for example, data['user'].
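Key access on the parsed data can be sketched as follows. The dictionary is written out literally, matching the printed output above, so that the sketch does not depend on any file or library.

```python
# The dictionary produced by parsing the example file, written out literally.
data = {
    "user": {
        "name": "john",
        "location": {"city": "Austin", "state": "TX"},
        "roles": ["admin", "user"],
    }
}

user = data["user"]                      # the nested dictionary for the user
city = data["user"]["location"]["city"]  # chain keys to reach deeper levels
first_role = data["user"]["roles"][0]    # lists are indexed by position

print(user["name"], city, first_role)    # john Austin admin
```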

From here on, all the manipulation you are doing is on the Python dictionary inside your script and does not affect the YAML file itself.

Next, you would like to traverse all the roles in your dictionary and print them out. You do that by creating a Python loop that goes through the list stored under the roles key.
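One way to write such a loop is sketched here, again starting from the parsed dictionary written out literally so that the sketch runs on its own.

```python
# The dictionary produced by parsing the example file, written out literally.
data = {
    "user": {
        "name": "john",
        "location": {"city": "Austin", "state": "TX"},
        "roles": ["admin", "user"],
    }
}

# The value under "roles" is a list, so a for loop visits each entry in order.
for role in data["user"]["roles"]:
    print(role)
```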

That loop in the example would give you the following output:

admin
user

You have now successfully opened a YAML file, parsed the information to a variable, traversed through the desired key, and printed out the desired data.

Suppose that you are faced with a task to change the location for a bunch of users as a result of moving the office of your organization to Dallas. Manually changing all of them would be very time consuming. For that purpose, you want to use your existing script to perform changes on those users.

The data in question is city, which is currently set to the value of "Austin". Changing it requires you to locate that key inside your Python dictionary and change its value to the new city name.
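A minimal sketch of that change, using the parsed dictionary written out literally:

```python
# The dictionary produced by parsing the example file, written out literally.
data = {
    "user": {
        "name": "john",
        "location": {"city": "Austin", "state": "TX"},
        "roles": ["admin", "user"],
    }
}

# Locate the nested key and assign the new value to it.
data["user"]["location"]["city"] = "Dallas"

print(data["user"]["location"])  # {'city': 'Dallas', 'state': 'TX'}
```

Only the in-memory dictionary changes at this point; the original file still contains the old city.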

Until now, all you were doing is changing the content of a Python dictionary inside your script. To save all the changes permanently, you have to serialize it back to the YAML file. In this example, you are creating a new YAML file for that purpose. That way, you can compare it later to the original file and see the effects made by your script.

As with opening the file, the PyYAML library has a method for saving information back to a file. Using the yaml.dump() method inside your script makes it very convenient to do so.
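Writing the changed dictionary to a new YAML file can be sketched as follows, assuming PyYAML is installed. The output file name user_updated.yaml and the temporary directory are assumptions used only to keep the sketch self-contained.

```python
import os
import tempfile

import yaml  # third-party library: pip install PyYAML

# The dictionary after the city was changed to Dallas.
data = {
    "user": {
        "name": "john",
        "location": {"city": "Dallas", "state": "TX"},
        "roles": ["admin", "user"],
    }
}

# Hypothetical name for the new file, placed in a temporary directory.
new_path = os.path.join(tempfile.mkdtemp(), "user_updated.yaml")

# Serialize the dictionary to YAML text in block style and write it out.
with open(new_path, "w") as f:
    yaml.dump(data, f, default_flow_style=False)

# Read the new file back to confirm that the change was saved.
with open(new_path) as f:
    reloaded = yaml.safe_load(f)

print(reloaded["user"]["location"]["city"])  # Dallas
```

By default, yaml.dump() writes keys in alphabetical order, so the new file may list location before name even though the data is the same.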

JSON File Manipulation with Python

Here, you have the same information about this user saved in JSON format. The syntax looks almost like a Python dictionary. The JSON file in the figure has a key called user, whose value is an object representing the user named john.

{
    "user": {
        "name": "john", 
        "location": {
            "city": "Austin", 
            "state": "TX"
        },
        "roles": [
            "admin",
            "user"
        ]
    }
}

JSON	Python
object	dict
array	list
string	str
number (int)	int
number (real)	float
true	True
false	False
null	None

As you can see in the conversion table in the figure, the same translations into Python elements are made as with the YAML data format. A JSON object is natively translated into a Python dictionary, which makes working with JSON largely similar to working with YAML.
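The conversions in the table can be checked with a short sketch that uses the standard json library. The JSON text and its field names are made up for illustration.

```python
import json

# Hypothetical JSON text with one value of each kind from the table.
text = """
{
    "name": "john",
    "age": 30,
    "score": 4.5,
    "active": true,
    "locked": false,
    "manager": null,
    "roles": ["admin", "user"],
    "location": {"city": "Austin"}
}
"""

parsed = json.loads(text)

# Print the Python type that each JSON value was converted to.
for key, value in parsed.items():
    print(key, type(value).__name__)
```

Running it lists str, int, float, bool, bool, NoneType, list, and dict, in that order.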

From the functionality perspective, JSON and YAML are very similar when writing your Python code.

First, you have to open the JSON file and parse it into a Python dictionary before you can do any manipulation of that data. For that purpose, you are using the method json.load(). After you are done working on that dictionary inside your script, you want to save the changes back to the file. For that, you are leveraging the method json.dump(), which is serializing the data back to a JSON file.
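The whole JSON workflow can be sketched as follows. The file names user.json and user_updated.json, and the temporary directory they live in, are assumptions made so that the sketch runs on its own.

```python
import json
import os
import tempfile

# The JSON document from the example, written to a temporary file only
# to make the sketch self-contained (file names are assumptions).
JSON_TEXT = """{
    "user": {
        "name": "john",
        "location": {"city": "Austin", "state": "TX"},
        "roles": ["admin", "user"]
    }
}"""

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "user.json")
dst = os.path.join(workdir, "user_updated.json")
with open(src, "w") as f:
    f.write(JSON_TEXT)

# Deserialize: open the file and parse it into a Python dictionary.
with open(src) as f:
    data = json.load(f)

# Work on the dictionary exactly as in the YAML example.
for role in data["user"]["roles"]:
    print(role)
data["user"]["location"]["city"] = "Dallas"

# Serialize: write the changed dictionary to a new JSON file.
with open(dst, "w") as f:
    json.dump(data, f, indent=4)
```

The indent argument is optional; it only makes the written file easier to read.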

If you look closely at the code in the figure, you may notice that the core code is the same as in the YAML example. The loop created for traversing through all the roles and printing them out is the same code. In both cases, the data is parsed into a dictionary so that you are working on the same Python element type.

XML File Manipulation with Python

In the figure, you can see the same data as in the previous two examples, represented also in an XML data format. Because of the XML syntax structure, parsing XML documents differs a bit from the other two shown.

<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <user>
    <name>john</name>
    <location>
      <city>Austin</city>
      <state>TX</state>
    </location>
    <roles>admin</roles>
    <roles>user</roles>
  </user>
</root>

XML Library	Python element	Example
minidom	DOM object	user.getElementsByTagName('name')[0].firstChild.data
ElementTree	Element Tree	user.find('name').text
xmltodict	Dictionary	user['name']
untangle	Object	user.name.cdata

Python offers numerous libraries for manipulating XML data. The main difference between them is the way that they parse an XML file into Python elements. You have the option of converting the XML data into a Python object represented as nodes and attributes using the untangle library, or if you are more familiar with the Document Object Model (DOM), you could use the minidom library, which converts XML data into DOM objects. For an experience similar to working with YAML or JSON files, you could use xmltodict, which converts an XML document into a Python dictionary. Another heavily used option is the ElementTree library, which represents XML data as a hierarchical tree of elements. It is often described as a hybrid between a list and a dictionary inside Python.
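The difference in style can be sketched with the two libraries that ship with Python, minidom and ElementTree. The shortened XML text is an assumption made to keep the comparison brief; xmltodict and untangle are left out because they need to be installed first.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# A shortened version of the example document.
XML_TEXT = """<root>
  <user>
    <name>john</name>
  </user>
</root>"""

# minidom: navigate DOM nodes and read the text node inside <name>.
dom = xml.dom.minidom.parseString(XML_TEXT)
dom_user = dom.getElementsByTagName("user")[0]
name_dom = dom_user.getElementsByTagName("name")[0].firstChild.data

# ElementTree: find the child element and read its text attribute.
root = ET.fromstring(XML_TEXT)
et_user = root.find("user")
name_et = et_user.find("name").text

print(name_dom, name_et)  # john john
```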

Because of its convenience, you decide to go with the ElementTree Python library for parsing the XML data. As always, you have to first include the library inside your script to use it. ElementTree comes with Python natively, so one import statement is sufficient to start using all the capabilities from it. Because the library has a longer name, you can import the entire library under an alternate name, ET, as shown in the code. That way, you can use the simplified name for further reference to it.

The first action you need to take, as with any other data format, is to open a file. Here, you are declaring two variables—first, to parse the read data into a tree structure by creating an ElementTree object, and second, to get the root element of that tree. Once you have access to the root element, you can traverse through the entire tree. You can imagine a tree structure as objects forming a connected graph.
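The import and the two variables described above can be sketched like this. The file name user.xml and the temporary directory are assumptions made so that the sketch runs on its own.

```python
import os
import tempfile
import xml.etree.ElementTree as ET  # import under the shorter name ET

# The XML document from the example, written to a temporary file only
# to make the sketch self-contained (file name is an assumption).
XML_TEXT = """<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <user>
    <name>john</name>
    <location>
      <city>Austin</city>
      <state>TX</state>
    </location>
    <roles>admin</roles>
    <roles>user</roles>
  </user>
</root>
"""

path = os.path.join(tempfile.mkdtemp(), "user.xml")
with open(path, "w", encoding="utf-8") as f:
    f.write(XML_TEXT)

tree = ET.parse(path)  # ElementTree object representing the whole document
root = tree.getroot()  # the top-level <root> element

print(root.tag)                       # root
print([child.tag for child in root])  # ['user']
```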

If you want to find all elements with a specific tag name, you can use the findall() method, which returns every matching child element. When you are searching for a single element with a specific tag name, you will use the find() method, which returns the first match. And lastly, if you want to access the value of a tag, you can make use of the text attribute. The latter also is used in situations where you want to change the value of a particular tag. In the code in the figure, that is changing the city of the user location.
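These three tools can be sketched together as follows. The XML is parsed from a string here, which is an assumption made for brevity; the same calls work on a tree loaded from a file.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<root>
  <user>
    <name>john</name>
    <location>
      <city>Austin</city>
      <state>TX</state>
    </location>
    <roles>admin</roles>
    <roles>user</roles>
  </user>
</root>"""

root = ET.fromstring(XML_TEXT)
user = root.find("user")  # first <user> child of <root>

# findall() returns every matching child; text holds each tag's value.
roles = [role.text for role in user.findall("roles")]
print(roles)  # ['admin', 'user']

# find() also accepts a simple path to reach a nested element.
city = user.find("location/city")
print(city.text)  # Austin

# Assigning to text changes the value inside the in-memory tree.
city.text = "Dallas"
print(user.find("location/city").text)  # Dallas
```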

To save all the changes you made to your ElementTree structure to a permanent location, use the write() method.
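Saving the modified tree can be sketched as follows. The output file name user_updated.xml and the temporary directory are assumptions, and the shortened XML text is parsed from a string to keep the sketch self-contained.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

XML_TEXT = """<root>
  <user>
    <name>john</name>
    <location>
      <city>Austin</city>
      <state>TX</state>
    </location>
  </user>
</root>"""

# Build the tree and apply the change from the example.
root = ET.fromstring(XML_TEXT)
root.find("user/location/city").text = "Dallas"
tree = ET.ElementTree(root)

# Hypothetical name for the new file, placed in a temporary directory.
new_path = os.path.join(tempfile.mkdtemp(), "user_updated.xml")

# write() serializes the tree; the declaration line is optional.
tree.write(new_path, encoding="UTF-8", xml_declaration=True)

# Parse the new file again to confirm that the change was saved.
reloaded = ET.parse(new_path).getroot()
print(reloaded.find("user/location/city").text)  # Dallas
```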

暗夜星空

Thursday, October 22, 2020
