What is: YAML – its overview, basic data types, YAML vs JSON, and PyYAML

By | 03/14/2019

YAML is one of the most popular formats of the…

Well, actually, they don’t know the format of what…

Originally it was the «Yet Another Markup Language», later it became «YAML Ain’t Markup Language»:

Originally YAML was said to mean Yet Another Markup Language,[12] referencing its purpose as a markup language with the yet another construct, but it was then repurposed as YAML Ain’t Markup Language, a recursive acronym, to distinguish its purpose as data-oriented, rather than document markup.

The Russian Wikipedia even puts the word “friendly” in quotes, and I absolutely agree with that.

In fact, YAML is just another data serialization format. Since version 1.2 it is designed as a superset of JSON, with some additional abilities.

In a recent “YAML vs JSON” poll in the Ukrainian DevOps Community, YAML got about 90% of the votes.

As for me, JSON is still more convenient, but YAML is used in so many places that you have to work with it anyway.

In this post, we will take a closer look at YAML’s data types and briefly compare them with JSON.

YAML main principles

  • always use UTF-8 to avoid possible issues
  • never use TAB for indentation

YAML syntax validation

To check YAML syntax on Linux, you can use yamllint.

Install it:

[simterm]

$ sudo pacman -S yamllint

[/simterm]

And check a file:

[simterm]

$ yamllint monitoring.yml 
monitoring.yml
  1:1       warning  missing document start "---"  (document-start)
  20:34     error    trailing spaces  (trailing-spaces)
  22:32     error    trailing spaces  (trailing-spaces)
  23:37     error    trailing spaces  (trailing-spaces)
  33:7      error    wrong indentation: expected 8 but found 6  (indentation)
  35:9      error    wrong indentation: expected 10 but found 8  (indentation)
  36:11     error    wrong indentation: expected 12 but found 10  (indentation)

[/simterm]

Although Ansible uses this file without any problems, yamllint still finds some issues in its formatting.
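yamllint checks style as well as syntax. If you only need to know whether a file parses at all, a minimal sketch with PyYAML could look like this. The helper name check_yaml is hypothetical, and unlike yamllint it reports only parse errors, not formatting issues such as trailing spaces.

```python
import yaml


def check_yaml(text):
    """Hypothetical helper: return None if text parses as YAML, else the error message."""
    try:
        # safe_load_all() also handles files with several "---" documents
        list(yaml.safe_load_all(text))
    except yaml.YAMLError as err:
        return str(err)
    return None


print(check_yaml("key1: value1\nkey2: value2\n"))  # None
print(check_yaml("key3: %value3\n"))  # scanner error about the '%' character
```

To check a file, pass its content, e.g. check_yaml(open('monitoring.yml').read()).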

JSON validation

For comparison, here is how to validate a JSON document from the Linux console using Python’s json module:

[simterm]

$ python -m json.tool < json-example.json 
{
    "key1": "value1"
}

[/simterm]
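The same check can be done from a script. A minimal sketch, where json_error is a hypothetical helper name, shows that the json module rejects a trailing comma and reports where parsing stopped:

```python
import json


def json_error(text):
    """Hypothetical helper: return None for valid JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # msg, lineno and colno describe the position where parsing failed
        return f"{err.msg} (line {err.lineno}, column {err.colno})"
    return None


print(json_error('{"key1": "value1"}'))   # None
print(json_error('{"key1": "value1",}'))  # trailing comma is not allowed in JSON
```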

vim plugin

There is also the vim-yaml plugin for vim.

Add to your .vimrc:

...
" https://vimawesome.com/plugin/vim-yaml-all-too-well
Plug 'avakhov/vim-yaml'

" add yaml stuffs
au! BufNewFile,BufReadPost *.{yaml,yml} set filetype=yaml foldmethod=indent
autocmd FileType yaml setlocal ts=2 sts=2 sw=2 expandtab
...

Reload config and install it:

[simterm]

:source %
:PlugInstall

[/simterm]

PyYAML

To work with YAML from Python there is the PyYAML library.

Some examples below.
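PyYAML is installed with pip install pyyaml and imported as yaml. A minimal load/dump round trip could look like the sketch below. It uses yaml.safe_load() rather than plain yaml.load(): since PyYAML 5.1, calling yaml.load() without a Loader argument emits a warning, and since PyYAML 6.0 that argument is required. safe_load() also refuses to build arbitrary Python objects, which makes it the safer choice for files you do not fully trust.

```python
import yaml  # pip install pyyaml

text = """
name: somestring
tags:
  - yaml
  - json
"""

# Parse YAML text into plain Python objects (dict, list, str, int, ...)
data = yaml.safe_load(text)
print(data)  # {'name': 'somestring', 'tags': ['yaml', 'json']}

# Serialize back to YAML; sort_keys=False keeps the original key order
print(yaml.safe_dump(data, sort_keys=False))
```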

YAML formatting

Comments in YAML

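One of the few advantages of YAML is the ability to add comments to its files.

Comments use the usual # syntax.

A comment can take a whole line or follow a value at the end of a line (separated from it by whitespace), but a # inside a quoted string is just part of the string.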

Examples:

---
# I'm comment
- name: somestring
  value1: "# I'm not a comment!"
  value: anotherstring  # another comment
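Loading the example above with PyYAML is a quick way to see which # starts a comment: both comments are dropped, while the quoted value keeps its #. Note that PyYAML discards comments on load, so they are lost if the data is dumped back to a file.

```python
import yaml

doc = """
---
# I'm comment
- name: somestring
  value1: "# I'm not a comment!"
  value: anotherstring  # another comment
"""

data = yaml.safe_load(doc)
print(data)
# [{'name': 'somestring', 'value1': "# I'm not a comment!", 'value': 'anotherstring'}]
```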

Indentations

The main headache in YAML is indentation.

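Indentation is done with spaces only, never TABs!

Strictly speaking, a YAML parser only requires the items of one block (the keys of one mapping, the elements of one list) to be indented equally, and nested items to be indented deeper than their parent. Linters such as yamllint, however, by default also require the same indentation width across the whole file: if two spaces are used in one place, the whole file must use two spaces.

The common convention is two spaces, although any width works as long as it is the same everywhere.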

For example:

---
parent_key:
    key1: "value1"
    key2: "value2"
    key3: "%value3"

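This is valid, but the next example:

---
parent_key1:
    key1: "value1"
    key2: "value2"
    key3: "%value3"

parent_key2:
  key1: "value1"
  key2: "value2"
  key3: "%value3"

will fail yamllint’s default indentation check, because the two blocks use different widths, even though a YAML parser such as PyYAML will still load it.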

Python, which is often criticized for its dependency on whitespace, also allows such formatting, although it violates the PEP 8 style guide:

#!/usr/bin/env python

def a():
    print("A")
    
def b():
  print("B")
  
a()
b()

Results:

[simterm]

$ python spaces.py 
A
B

[/simterm]
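The Python analogy aside, you can check what a YAML parser itself accepts. In the minimal sketch below (assuming PyYAML), a document whose two parent blocks use 4 and 2 spaces loads without errors, while a mapping whose own keys are indented differently fails to parse. Linters are stricter: yamllint’s default indentation rule still expects one width per file.

```python
import yaml

# Different blocks use different widths: a parser accepts this
mixed = """
parent_key1:
    key1: "value1"
    key2: "value2"

parent_key2:
  key1: "value1"
  key2: "value2"
"""

# Keys of the same mapping are indented differently: a parse error
broken = """
parent_key:
    key1: "value1"
  key2: "value2"
"""

print(yaml.safe_load(mixed))

try:
    yaml.safe_load(broken)
except yaml.YAMLError as err:
    print("error:", err)
```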

Single-line YAML

Besides the standard block style with indentation, you can use a JSON-like flow style:

---
parent_key: {key1: "value1", key2: "value2"}
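Both styles describe the same data, which is easy to check with PyYAML. The sketch below also loads an equivalent JSON document as YAML; this works for simple documents like this one, although PyYAML implements YAML 1.1, so exotic JSON edge cases are not guaranteed to behave the same way.

```python
import json
import yaml

block = """
parent_key:
  key1: "value1"
  key2: "value2"
"""
flow = 'parent_key: {key1: "value1", key2: "value2"}'
json_text = '{"parent_key": {"key1": "value1", "key2": "value2"}}'

# Block style and flow style produce identical Python objects
assert yaml.safe_load(block) == yaml.safe_load(flow)

# The JSON text also loads as YAML and gives the same result
assert yaml.safe_load(json_text) == json.loads(json_text) == yaml.safe_load(flow)

print(yaml.safe_load(flow))  # {'parent_key': {'key1': 'value1', 'key2': 'value2'}}
```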

Literal Block Scalar

YAML supports multiline strings, written in three ways: a plain multiline scalar, a literal block scalar using “|”, and a folded block scalar using “>”.

The plain format looks like this:

---
string: This
    is
    some text
    without newlines

Result in the Python console:

[simterm]

>>> yaml.safe_load(open('yaml-example.yml'))
{'string': 'This is some text without newlines'}

[/simterm]

Using the | (literal style) keeps all newlines and trailing spaces, plus a single final newline:

---
string: |
    This
    is
    some text
    with newlines

Result is:

[simterm]

>>> yaml.safe_load(open('yaml-example.yml'))
{'string': 'This\nis\nsome text\nwith newlines\n'}

[/simterm]

And using the > (Folded style):

---
string: >
    This
    is
    some text
    without newlines

This returns the whole text on one line, plus a final newline:

[simterm]

>>> yaml.safe_load(open('yaml-example.yml'))
{'string': 'This is some text without newlines\n'}

[/simterm]

In all three cases, the continuation lines must still follow the same indentation rules.
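To compare the three styles side by side, here is a minimal sketch that loads them from one document with PyYAML. It adds one variant not covered above: the “-” chomping indicator in “|-”, which drops the final newline.

```python
import yaml

doc = """
plain: This
    is
    some text
literal: |
    This
    is
    some text
folded: >
    This
    is
    some text
literal_strip: |-
    This
    is
    some text
"""

for key, value in yaml.safe_load(doc).items():
    # repr() makes the newline characters visible
    print(f"{key}: {value!r}")
```

The plain and folded styles join the lines with spaces, the literal style keeps the line breaks, and only the folded and literal (clip) styles add a final newline.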

Also, check this great answer on StackOverflow:

There are 5 6 NINE (or 63*, depending how you count) different ways to write multi-line strings in YAML.

YAML basic data formats

YAML uses three main data types:

  • scalars: the simplest single values (strings, numbers, booleans), usually written as the value in a key: value pair
  • list/sequence: data ordered by indexes
  • dictionary/mapping: key: value pairs, where a value can be a scalar or nested data of other types

Scalars

The basic data type is the scalar: a single value, here assigned to a key much like a variable in a programming language:

---
key1: "value1"
key2: "value2"

Quoting string values is recommended to avoid issues with special characters:

[simterm]

$ cat example.yml 
---
key1: "value1"
key2: "value2"
key3: %value3

[/simterm]

Validate it:

[simterm]

$ yamllint example.yml
example.yml
  4:7       error    syntax error: found character '%' that cannot start any token

[/simterm]

Still, you can skip quotes for boolean true/false values and for integers; quoting them would turn them into strings.
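Whether a value is quoted changes the type you get back. A minimal PyYAML sketch shows this, plus one well-known trap: PyYAML follows YAML 1.1, where unquoted words such as yes/no and on/off are also booleans, so a value like no should be quoted if you mean the string.

```python
import yaml

doc = """
enabled: true
quoted_bool: "true"
port: 8080
quoted_port: "8080"
country: no
"""

data = yaml.safe_load(doc)
for key, value in data.items():
    # Show each parsed value together with its Python type
    print(key, repr(value), type(value).__name__)
```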

Scalars – YAML vs JSON

For example, a scalar in YAML:

---
key: "value"

And JSON:

{
    "key": "value"
}

Python

An example of YAML scalars in Python:

[simterm]

>>> import yaml
>>> yaml.safe_load("""
... key: "value"
... """)
{'key': 'value'}

[/simterm]

Or from a file:

[simterm]

>>> import yaml
>>> yaml.safe_load(open('yaml-example.yml'))
{'key': 'value'}

[/simterm]
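The YAML and JSON versions of this scalar produce the same Python dictionary. In the sketch below, the json module reads only the JSON text, while PyYAML reads both, since the JSON document is also valid YAML.

```python
import json
import yaml

yaml_text = 'key: "value"'
json_text = '{\n    "key": "value"\n}'

# Same data from three different parses
assert json.loads(json_text) == yaml.safe_load(yaml_text) == yaml.safe_load(json_text)

print(yaml.safe_load(json_text))  # {'key': 'value'}
```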


Lists in YAML

Lists (also called sequences or collections) represent ordered data, where each element can be accessed by its index.

For example:

# SIMPLE LIST
- element1
- element2

Nested lists in YAML

Lists can also include nested lists:

# SIMPLE LIST
- element1
- element2

# nested list
-
  - element1

Or a list can be the value of a named key:

---
itemname:
  - valuename

Lists can also contain key: value pairs or whole dictionaries as elements:

---
itemname:
  - valuename
  - scalar: "value"
  - dict: {item1: "value1", item2: "value2"}
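Loading this example with PyYAML shows what each element becomes: the plain item is a string, while both the “scalar: value” item and the flow-style item are dictionaries.

```python
import yaml

doc = """
---
itemname:
  - valuename
  - scalar: "value"
  - dict: {item1: "value1", item2: "value2"}
"""

items = yaml.safe_load(doc)["itemname"]
for index, item in enumerate(items):
    # Print the index, the Python type and the value of each list element
    print(index, type(item).__name__, item)
```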

Lists – YAML vs JSON

List in YAML:

---
- item1
- item2
- item3

List in JSON:

[
    "item1",
    "item2",
    "item3"
]

Nested list in YAML:

---
- item1
- item2
- item3
-
  - nested1

Nested list in JSON:

[
    "item1",
    "item2",
    "item3",
    [
        "nested1"
    ]
]

Python and YAML lists

Everything here is similar to the scalar example:

[simterm]

>>> yaml.safe_load(open('yaml-example.yml'))
['item1', 'item2', 'item3', ['nested1']]

>>> for i in yaml.safe_load(open('yaml-example.yml')):
...   print(i)
... 
item1
item2
item3
['nested1']

[/simterm]
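The nested YAML list and its JSON equivalent load into the same Python list, and elements are accessed by index as usual, including elements of the nested list. A minimal sketch:

```python
import json
import yaml

yaml_text = """
---
- item1
- item2
- item3
-
  - nested1
"""
json_text = '["item1", "item2", "item3", ["nested1"]]'

data = yaml.safe_load(yaml_text)
assert data == json.loads(json_text)

print(data[0])      # item1
print(data[-1][0])  # nested1, the first element of the nested list
```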


Dictionaries

Dictionaries, also called mappings, hold key: value pairs; unlike scalars, which are single basic values, a dictionary value can contain nested elements, for example, a list:

---
key1: "value1"
key2:
  - value2
  - value3

Or another nested dictionary:

---
key1: "value1"
key2:
  - value2
  - value3
    
key3:
  key4: "value4"
  key5: "value5"
  key6:
    key7: "value7"
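After loading, nested dictionaries and lists are reached by chaining keys and indexes. A minimal sketch with the example above:

```python
import yaml

doc = """
---
key1: "value1"
key2:
  - value2
  - value3

key3:
  key4: "value4"
  key5: "value5"
  key6:
    key7: "value7"
"""

data = yaml.safe_load(doc)
print(data["key3"]["key6"]["key7"])  # value7
print(data["key2"][1])               # value3
```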

Dictionary – JSON vs YAML

Dictionary in YAML:

---
key1: "value1"
key2:
  - value2
  - value3

Dictionary in JSON:

{
    "key1": "value1",
    "key2": [
        "value2",
        "value3"
    ]
}

Python

[simterm]

>>> yaml.safe_load(open('yaml-example.yml'))
{'key1': 'value1', 'key2': ['value2', 'value3']}

>>> type(yaml.safe_load(open('yaml-example.yml')))
<class 'dict'>

[/simterm]

And all the usual Python dictionary operations are supported:

[simterm]

>>> data = yaml.safe_load(open('yaml-example.yml'))
>>> type(data)
<class 'dict'>
>>> data.update({'key3':'value3'})
>>> print(data)
{'key1': 'value1', 'key2': ['value2', 'value3'], 'key3': 'value3'}

[/simterm]
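Going the other way, the modified dictionary can be written back out as either format. In this minimal sketch, yaml.safe_dump() with sort_keys=False keeps the insertion order, and json.dumps() with indent=4 matches the JSON layout used in this post.

```python
import json
import yaml

data = {"key1": "value1", "key2": ["value2", "value3"]}
data.update({"key3": "value3"})

# Back to YAML in block style, keeping the key order
yaml_text = yaml.safe_dump(data, sort_keys=False)
# And to JSON with 4-space indentation
json_text = json.dumps(data, indent=4)

print(yaml_text)
print(json_text)
```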


In general – that’s all.

Also, check these pages for more details: