Thursday, October 22, 2020

1.3 API Data Formats

Applications use many different data formats to communicate with the wide range of APIs available on the Internet. Each format defines a syntax for encoding data so that it can be read by another machine while remaining easy for humans to understand.

For instance, if you want to use an API to configure a Cisco router, you have to check which data formats that API supports. Then, you can write a request for the API to handle that changes your router configuration. The API server interprets your request and translates it into instructions that the router can process and act on.



You will most likely encounter these common data formats:

  • YAML Ain't Markup Language (YAML)

  • JavaScript Object Notation (JSON)

  • eXtensible Markup Language (XML)

Common Use Cases

Take a look at some of the most common use cases of XML, JSON, and YAML. The first thing you will notice is that each file extension matches the format name (.xml for XML, .json for JSON, and .yaml for YAML).

Data Format    Most common use

XML            Transformation with XSL; applying XML schemas

JSON           Communication between server and web page; configuration files

YAML           Configuration files

XML is often criticized as heavyweight and less human-readable than the other two formats. It is verbose, redundant, and complex to use. It is mostly used to exchange highly structured data between applications (machine-to-machine communication). XML is also widely used in Java programming.

JSON serves as an alternative to XML because it is often smaller and easier to read. It is mostly used for transmitting data between a server and a web page. JSON syntax is derived from the object notation of the JavaScript programming language; therefore, you can very easily convert JSON data into JavaScript objects. JSON syntax is also useful for YAML, because JSON is essentially a subset of YAML, so a YAML parser can usually read JSON files as well.

If you are building your first API and you are not a developer, YAML is a good choice of data format. YAML is designed to be easy for people to write from scratch. XML and JSON are oriented more toward machine readability, which is why YAML is often converted into one of these two formats. YAML is used in many configuration files today. Because it relies on indentation in a similar way, YAML feels familiar to people who know Python.

Common Characteristics

Data formats are the foundation of an API. They define the syntax and semantics, including the constraints, of working with the API.

A syntax is a way to represent a specific data format in textual form. You notice that some formats use curly braces or square brackets, and others have tags marking the beginning and end of an element. In some formats, quotation marks or commas are heavily used, while others do not require them at all. But no matter which syntax is used, each of these data formats has a concept of an object. You can think of an object as a packet of information, an element that has characteristics. An object can have one or more attributes attached to it.

Many characteristics break down to the key-value concept, with the key and value often separated by a colon. The key identifies a set of data and is usually positioned on the left side of the colon. The value is the actual data that you are trying to represent and usually appears on the right side of the colon.

To extract the meaning of the syntax, it is crucial that you recognize how keys and values are notated in a given data format. A key is typically a string (JSON requires it), while a value can be a string, a number, or a Boolean (for instance, true or false). A value can also be more complex, such as an array or an entirely new object that holds its own data.
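The key and value types described above can be observed with Python's standard json module. This is only a minimal sketch: the record below, including the age and admin fields, is hypothetical and was made up for illustration.

```python
import json

# Hypothetical record; the keys and values are illustrative only.
text = """
{
    "name": "john",
    "age": 30,
    "admin": true,
    "roles": ["admin", "user"],
    "location": {"city": "Austin"}
}
"""

record = json.loads(text)

# Map each key to the Python type name of its value.
value_types = {key: type(value).__name__ for key, value in record.items()}
print(value_types)

# Every key in a JSON object is a string.
keys_are_strings = all(isinstance(key, str) for key in record)
print(keys_are_strings)
```

A string, a number, and a Boolean map to simple Python types, while the array and the nested object become a list and a dict that carry their own data.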

Another thing to notice when looking at a particular data format is the importance of whitespace and case sensitivity. In some formats, whitespace and case are highly significant; in others, they carry no significance whatsoever, as the examples that follow show.

One of the main points about data formats that you should bear in mind is that you can represent any kind of data in any given format.



The figure shows the three previously mentioned common data formats: YAML, JSON, and XML. Each example provides details about a specific user, including their name, roles, and location.

You can quickly recognize that the same data is represented in all three formats, so it really comes down to two factors when considering which format to choose:

  • If the system you are working with prefers one format over the others, pick the format that the system uses.

  • If the system can support any of them, pick the format that you are more comfortable working with.

In other words, if the API that you are addressing uses one specific format or a handful of them, you will have to choose one of them. If the API supports any given format, it is up to you which one you prefer to use.

YAML

The first data format you will learn about is YAML which, as the name suggests, is not a markup language like XML. With its minimalistic syntax, it is designed above all to be easy for humans to write and read, yet it works the same way as the other data formats. In general, YAML is the most human-readable of the formats and is still easy for programs to use, which is why it is becoming increasingly popular among engineers working with programmability.

---
user:
  name: john
  location:
    city: Austin
    state: TX
  roles:
    - admin
    - user

Whitespace is significant in YAML because indentation defines the structure of a YAML file. All the data inside a particular object is at the same indentation level. In this example, the first indented item is "name", which is a child node of "user". All data at the same indentation level are attributes of the same object. The next level of indentation starts under the location property, which denotes an object that represents a location, with the properties city and state. Typically, YAML uses two spaces of indentation for every newly defined object, but you can use a different number of spaces as long as you are consistent.

Note

Tab indentations are not allowed in YAML because they are treated differently by different tools.

In YAML, keys and values are separated only by a colon and a space, which makes it very intuitive for humans to read and write. YAML also tries to infer which data type is intended for a value, so quotes are usually not necessary. For a value such as john, YAML assumes that it is a string; you do not have to be explicit with quotes. The same concept applies to numbers and other types of values.

Note that no commas end any of the values attached to a key. YAML automatically knows where a value ends. Also, the "-" (dash) syntax in YAML denotes a list item. Lists may feel as natural as writing a shopping list for your groceries: put a dash at the same indentation level in front of every element to add it to the same list.

JSON

Another heavily used data format is JSON. JSON was derived from the JavaScript programming language. Because of that historical background, JavaScript code can easily convert data from a JSON file into native JavaScript objects.

JSON syntax uses curly braces, square brackets, and quotes for its data representation. Typically, the very first character in a JSON file is a curly brace that opens a new object. Inside it, other objects can be defined in the same way, starting with the name of the object in quotes, followed by a colon and a curly brace. Inside those braces, you will find all the information about that object.

Note

There are also some exceptions regarding the very first character in a JSON file. You could come across some small JSON files containing only a single value, for instance "Hello World!" (including the double quotes), 100, or true. All three are valid JSON documents.
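Single-value documents like these can be checked with Python's standard json module. The sketch below also shows that a string value needs its double quotes to parse; the variable names are illustrative.

```python
import json

# Each string below is a complete JSON document consisting of a single value.
documents = ['"Hello World!"', "100", "true"]
parsed = [json.loads(doc) for doc in documents]
print(parsed)

# Without the double quotes, the text is not a JSON string and parsing fails.
try:
    json.loads("Hello World!")
    unquoted_is_valid = True
except json.JSONDecodeError:
    unquoted_is_valid = False
print(unquoted_is_valid)
```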

With YAML, whitespace is important, but that is not the case with JSON. All the whitespace that you see is only there for humans reading the data; it has nothing to do with how the JSON file is consumed by an application or script. You are free to choose whichever formatting style you want to use with JSON, as long as the other syntax rules remain satisfied.

Note

This is true for all whitespace that is not part of a value itself. The values "john" and "j o h n" are not considered the same because the whitespace is inside the quotation marks, where it is significant.
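The whitespace rule can be illustrated with Python's standard json module. In this minimal sketch, the compact and the loosely formatted documents are made-up variants of the user example and parse to equal data, while whitespace inside a quoted value changes the value.

```python
import json

# The same data, once without any formatting and once spread over several lines.
compact = '{"user":{"name":"john","roles":["admin","user"]}}'
pretty = """
{
    "user": {
        "name": "john",
        "roles": [ "admin",
                   "user" ]
    }
}
"""

same_data = json.loads(compact) == json.loads(pretty)
print(same_data)

# Whitespace inside the quotation marks is part of the value.
spaced = json.loads('{"name": "j o h n"}')
value_differs = spaced["name"] != "john"
print(value_differs)
```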

{
    "user": {
        "name": "john", 
        "location": {
            "city": "Austin", 
            "state": "TX"
        },
        "roles": [
            "admin",
            "user"
        ]
    }
}

You will notice that all the data in a JSON file is presented similarly to YAML, using a key-value notation. Every object starts and ends with a curly brace, and the only member of the main object in this example is user. That object holds all the information that you would like to configure for a user. You can see here that the user is given the name john and is assigned a location and a list of roles. You will also notice that the key-value pairs and list items are separated by commas. A comma is required after every item except the last one; JSON does not allow a trailing comma at the end of an object or list.
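As a rough sketch of how a script might consume this document, the user example can be loaded with Python's standard json module and navigated by key. The final check feeds the parser a deliberately broken, made-up fragment to show that a trailing comma is rejected.

```python
import json

document = """
{
    "user": {
        "name": "john",
        "location": {"city": "Austin", "state": "TX"},
        "roles": ["admin", "user"]
    }
}
"""

# Objects become dicts and arrays become lists, so values are reached by key.
user = json.loads(document)["user"]
name = user["name"]
city = user["location"]["city"]
roles = user["roles"]
print(name, city, roles)

# A comma after the last item is a syntax error in JSON.
try:
    json.loads('{"roles": ["admin", "user",]}')
    trailing_comma_ok = True
except json.JSONDecodeError:
    trailing_comma_ok = False
print(trailing_comma_ok)
```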

XML

Another data serialization format that is broadly used for exchanging information over the Internet between two machines is XML. XML dates back to the previous century; its first version was defined in 1998. It is very similar to HTML; both are markup languages, meaning that they indicate which parts of a document are present and not how the data is going to be shown in your system specifically.

For that purpose, XML code makes heavy use of <tags></tags> to surround elements in the form <key>value</key>. All the information about an object is defined between the opening <tag> and the closing </tag>, where the slash marks the closing tag. When using tags, make sure that the opening and closing tags match both in name and in letter case. For instance, <tag></TAG> would not work, whereas <tag></tag> or <TAG></TAG> would. Note that <tag> and <TAG> have the same name and differ only in case, yet each represents a completely different element.

Usually in XML, tag names are all written in lower case.
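The case rule for tag names can be checked with Python's standard xml.etree.ElementTree module. In this small sketch, is_well_formed is a hypothetical helper written for this example, not part of the library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Matching case parses; an opening and closing tag that differ in case do not.
samples = ["<tag></tag>", "<TAG></TAG>", "<tag></TAG>"]
results = {sample: is_well_formed(sample) for sample in samples}
print(results)
```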

Whitespace can be quite important in some formats and carry no significance in others.

XML is a combination of both. Significant whitespace is part of the document content and is always preserved. A good example is whitespace inside a value, between the opening and closing tag (<t1>John Wayne</t1> is not the same as <t1>JohnWayne</t1>). Whitespace that is only meant to make XML documents more human-readable is insignificant and is used between different tags (<t1><t2> is considered the same as <t1> <t2>).
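The difference can be sketched with Python's standard xml.etree.ElementTree module. One caveat of this particular parser: it still reports whitespace between tags as text, so it is the application that treats it as insignificant. The t1 and t2 documents below are made up to mirror the examples in the text.

```python
import xml.etree.ElementTree as ET

# Whitespace inside a value is part of the content.
with_space = ET.fromstring("<t1>John Wayne</t1>").text
without_space = ET.fromstring("<t1>JohnWayne</t1>").text
print(with_space != without_space)

# Whitespace between tags does not change the element structure.
tight = ET.fromstring("<t1><t2>x</t2></t1>")
loose = ET.fromstring("<t1> <t2>x</t2> </t1>")
tight_children = [child.tag for child in tight]
loose_children = [child.tag for child in loose]
print(tight_children == loose_children)
```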

<?xml version="1.0" encoding="UTF-8" ?>
<user>
  <name>john</name>
  <location>
    <city>Austin</city>
    <state>TX</state>
  </location>
  <roles>admin</roles>
  <roles>user</roles>
</user>

An object usually contains multiple other objects inside it, as shown in the example. The main object is <user>, which ends with the </user> tag at the end of the document. It is composed of other tags such as <name>, <location>, and <roles> that provide all the information needed about that specific user. An object can contain either basic information (such as a name) or more complicated data with tags nested inside it (such as location in this case).
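A minimal sketch of reading this user document with Python's standard xml.etree.ElementTree module follows. The XML declaration is left out for brevity, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

document = """<user>
  <name>john</name>
  <location>
    <city>Austin</city>
    <state>TX</state>
  </location>
  <roles>admin</roles>
  <roles>user</roles>
</user>"""

root = ET.fromstring(document)

# Simple child elements hold text; nested ones are reached with a path.
name = root.findtext("name")
city = root.findtext("location/city")

# Repeated <roles> elements play the part of a list.
roles = [element.text for element in root.findall("roles")]
print(root.tag, name, city, roles)
```

Unlike JSON and YAML, XML has no dedicated list syntax here, so the two roles are expressed by repeating the same tag.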

XML Namespaces

As an increasing number of XML files are exchanged over the Internet, you can quickly run into a situation where two or more applications use the same tag name to represent completely different objects. This causes a conflict when a system parses a tag whose content has a different hierarchy than the system expects. Solving that issue requires the use of namespaces and prefixes.



In this figure, you can see the same <table> element in two separate XML documents. Even though the starting tag name is the same, each element represents different information, which could cause a name conflict. The upper XML code carries HTML table information, whereas the lower one carries information about a table as a piece of furniture. You can easily avoid that by defining a prefix and a namespace for each of those two elements, as shown in the right side of the figure.

A prefix is an alphabetic character or a string put before the actual tag name followed by a colon (<a:tag_name> or <b:tag_name>). That way, you are defining an exact tag name for your application to parse correctly. When you are using prefixes in XML, you also have to define namespaces for those prefixes. The name of a namespace is a Uniform Resource Identifier (URI), which provides uniquely named elements and attributes in XML documents.

Namespaces are defined with the xmlns attribute in the starting tag of an element, using the following syntax: xmlns:prefix='URI'. A URI can be any arbitrary string as long as it is different from every other namespace URI. It can also be a URL linking to a page that defines the namespace. However, the URL never needs to be accessed. The only thing that matters is that the URI uniquely represents a logical namespace name.
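The effect of prefixes and namespaces can be sketched with Python's standard xml.etree.ElementTree module. The prefixes h and f, the example.com URIs, and the element contents are all hypothetical; nothing is ever fetched from those addresses.

```python
import xml.etree.ElementTree as ET

# Two different <table> elements, told apart by prefix and namespace URI.
document = """<root xmlns:h="http://example.com/html"
      xmlns:f="http://example.com/furniture">
  <h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>
  <f:table><f:name>Coffee table</f:name></f:table>
</root>"""

root = ET.fromstring(document)

# The parser replaces each prefix with its URI, so the names no longer clash.
tags = [child.tag for child in root]
print(tags)

# A prefix-to-URI mapping lets a search address exactly one of the tables.
namespaces = {"f": "http://example.com/furniture"}
furniture_name = root.findtext("f:table/f:name", namespaces=namespaces)
print(furniture_name)
```

The prefix itself is only shorthand; what the parser compares is the URI that the prefix is bound to.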

Similar to XML, both YAML and JSON can also use namespaces that define the syntax and semantics of a named element and in that way avoid element name conflicts. Take a look at the examples from each format.



In this figure, you can find the same namespace encoded in each of the formats. In general, YAML and JSON do not use namespaces as often as XML does. However, JSON has one exception, which is the Representational State Transfer Configuration Protocol (RESTCONF). RESTCONF requires a namespace. RESTCONF is an HTTP-based protocol that provides a subset of the functionality of the Network Configuration Protocol (NETCONF), which is an IETF network management protocol designed specifically for transaction-based network management. It allows you to configure network devices such as routers, switches, and so on.

Note

The namespace myapp is not a valid namespace name because it has to be in the correct URI format. This example is just for the sake of a simplified illustration.
