Tag node syntax for VyOS 2.0
OpenPublic

Asked by dmbaturin on Jan 5 2017, 9:56 AM.

Shall we adopt the new (uniform) tag node syntax?

The old syntax: "ethernet eth0" or "vif 99" on the same line, "ethernet" and "vif" words are repeated if multiple nodes are present.

interfaces {
  ethernet eth0 {
    vif 99 {
      address 192.0.2.1/24
    }
    vif 101 {
      address 203.0.113.1/24
    }
  }
}

The new syntax: "eth0", "eth1", ... and "99", "101", ... are inside the "ethernet" or "vif" nodes respectively, inside the curly braces. The words "ethernet" or "vif" are not repeated.

interfaces {
  ethernet {
    eth0 {
      vif {
        99 {
          address 192.0.2.1/24;
        }
      101 {
        address 203.0.113.1/24;
      }
  }
}

Pros and cons:

  • The new syntax is more uniform
  • The old syntax is more concise, however
  • The new syntax is substantially easier to implement in the parser and renderer
  • The old syntax may be easier on eyes when reading nodes with multiple tag nodes inside

The main difficulty with the old syntax is that the parser gets to keep the state. "ethernet eth0" is really two nodes (eth0 being a child of ethernet), and "ethernet eth0; ethernet eth1" is three nodes, so the parser gets to track if "ethernet" node was already seen earlier and add a child to it instead of creating a new one. That's quite annoying to implement, though if people want it back, it's doable.

Also, while this change may look cosmetic, it's a big change for the grammar, and we will not be able to change it in a forward-compatible manner after stable release, so we'd better decide on it as early as possible.

Update: Added support for the old style syntax. T245 describes an edge case that requires the same code I wanted to avoid, so there are no technical reasons to not support the old syntax actually, only aesthetic ones.
Notes: There are common misconceptions about the issue that stem from my failure to communicate the details and from limitations of the old system.

  • This syntax has no effect on user's ability to do "edit interfaces ethernet" and similar, it's just a formatting issue
  • If we support the old syntax, it doesn't remove support for the new one since the old one is a strict superset of the new one, and the parser doesn't know if something is supposed to be a tag node or not, this distinction only becomes important at config valiation which happens after parsing is complete
  • The old system (VyOS 1.1.x/1.2.x) also allows "ethernet { eth0" and similar in config loading, for the same reasons

So, right now the real question if we should use old or new syntax in the config output ("save" and "show" that is).

dmbaturin created this poll.Jan 5 2017, 9:56 AM
dmbaturin added projects: VyConf, VyOS 2.0.x.
dsteinkopf added a subscriber: dsteinkopf.EditedJan 5 2017, 11:04 AM

Maybe it's a good idea to 1. use the new syntax but 2. generate less line breaks. e.g.

interfaces {
  ethernet { eth0 {
      vif {
          99 { address 192.0.2.1/24; }
          101 { address 203.0.113.1/24; }
      }
  } }
}

In this case the new syntax would be fine for me. (Details open for discussion.)

@dsteinkopf Not sure, we'll have to devise some rules regarding line breaks, and past some number of leaf nodes inside we are back to the original aesthetic issue (and then there can be non-leaf nodes inside too!
On a fresh look today, I'm convinced that the old tag node formatting is aesthetically superior, so myself as a user of my own project I'm probably voting no, though as a developer I want to see how many people also think it's worth it.

Merijn added a subscriber: Merijn.Jan 5 2017, 12:35 PM

A pro for me would be that i can do 'edit interfaces ethernet eth0 vif' and work with all virtual interfaces.

@Merijn Now that you remind me of it, I think "edit interfaces tunnel; copy tun10 to tun11" or similar should be possible regardless of the config syntax. No matter how it looks in the config, internally "tunnel" is a node with children "tun0", "tun1" and so on, and there's no reason why it shouldn't be possible to use it as edit level.

tmartinson added a subscriber: tmartinson.EditedJan 5 2017, 12:51 PM

Maybe something like this? We already know that it is an ethernet interface by the fact that it is eth0. And by adding the "vlan-id" portion we get a newer style of configuration but keep the read-ability of the configuration stanza.

interfaces {
           eth0 {
             vif {
                vlan-id 99 {
                   address 192.168.2.1/24;
                   }
                vlan-id 101 {
                   address 10.10.10.1/30;
                  }
           }
     }
}
dmbaturin added a comment.EditedJan 5 2017, 12:59 PM

@tmartinson No, "vlan-id 99" is the old style. And, at that stage we don't know if it's ethernet or not.

I guess I could have confused someone with this. This is specifically about the config grammar, not exactly about ethernet. The question is whether tag nodes (nodes that can have children with variable names) should be rendered "foo bar0 { baz "quux quux"; }" or "foo { bar0 { baz "quux quux"; }".

In the new style your example would be:

interfaces {
           eth0 {
             vif {
               vlan-id {
                  99 {
                     address 192.168.2.1/24;
                     }
                  101 {
                     address 10.10.10.1/30;
                    }
             }
       }
  }
}

I was thinking that the variable would actually be "vlan-id 99". That was written simply to make it easier to read. But if it will be the top of a node, then we end up with vif, vlan-id. Which is redundant, redundant. In that case I would drop the "vlan-id" portion all together. It is only there for esthetics.

Merijn added a comment.EditedJan 5 2017, 1:19 PM

In the blog post #7 i liked the address [ 192.168.2.1/24 10.10.10.1/30 ]; part. But since i work most of the time with mixed JunOS and Vyos environments a mostly the same syntax would be very nice :-)
However JunOS would be:

interfaces {
     eth0 {
          unit 99 {
               address 192.168.2.1/24;
               address 10.10.10.1/30;
          }
     }
}

So in JunOS there is also a lot of room for improvement.

rps added a subscriber: rps.EditedJan 5 2017, 1:21 PM

The XORP configuration syntax (which Vyatta initially built upon) solves the parsing issue with the simple introduction of a ":" as a delimiter between keys and values.

I'm not sure why Vyatta omitted it when they moved from XORP to the BASH-based CLI, but it created the problem we're trying to fix.

I think the least disruptive way to make the change is to introduce the ":" to the current configuration syntax and not make drastic changes.

interfaces {
    ethernet eth0 {
	    description: "Outside"
		address 100.64.255.2 {
		    prefix-length: 24
		}
		vif 10 {
		    address 100.64.10.1 {
			    prefix-length: 24
			}

		}
	}
}

EDIT: Example http://www.xorp.org/getting_started.html#configuring

@Merijn I'm still not sure why JunOS has that "unit" thing. To me it looks redundant, redundant ©. Though what we are discussing is "unit 0" vs "unit { 0" grammatic distinction, rather than specific syntax of ethernet interfaces.

@rps Oh, leaf nodes are fine. The "IDENTIFIER (STRING|IDENTIFIER) SEMICOLON" grammar turned out rather nice and unambiguous. I don't think colons would make things more readable either. It's the tag nodes we are talking about, your examples uses the old style "ethernet eth0" vs proposed "ethernet { eth0".

Merijn added a comment.Jan 5 2017, 1:27 PM

Well plain JSON would also be an option then :-)

{
	"interfaces": {
		"ethernet": {
			"eth0": {
				"description": "Outside",
				"address": [
					"192.168.1.1/24",
					"192.168.2.1/24"
				],
				"vif": {
					"10": {
						"address": "100.64.10.1/24"
					}
				}
			}
		}
	}
}
Merijn added a comment.Jan 5 2017, 1:29 PM

@dmbaturin I understand that the discussion is "unit 0" vs "unit { 0", what i meant was that i could be an option to keep following the JunOS style as much as possible to maybe enable more interoperability.

rps added a comment.Jan 5 2017, 1:44 PM

From a parsing perspective the only challenge tag nodes present is that you can't easily distinguish between "key value" and "key tag" without context. "key" and "key tag value" are fine. Using a ":" you get "key: value" vs "key tag" which removes the ambiguity.

Major syntax changes can be very disruptive to people and shouldn't be taken lightly. Perhaps I'm being more conservative here.

Merijn added a comment.Jan 5 2017, 1:57 PM

@rps this distinction also seems to be easy in the original proposed solution by @dmbaturin because key value pairs are not followed by '{' and the rest is.

@rps No, that's not the biggest challenge. Semicolon at the end of leaf nodes makes them unambiguous enough and easy to tell from tag nodes (this is especially bad with valueless nodes by the way, think "disable", colon wouldn't help there, but semicolon at the end does the job). The biggest challenge is that with "ethernet eth0" the parser must be fully stateful and capable of tracking which parent nodes it's already seen. "eth0", "eth1" etc. are really children of the same node called "ethernet", but in the config they appear separately. Consider this unusual but logically valid config:

interfaces {
  tunnel tun0 {
    ...
  }
  ethernet eth0 {
    ...
  }
  tunnel tun2 {
    ...
  }

The parser sees "interfaces", so it creates a node called "interfaces" in the tree. Then it goes to "tunnel tun0", recognizes it as a tag node, creates node called "tunnel" under "interfaces", then adds "tun0" node to the "tunnel" node. Then it repeats the procedure for "ethernet eth0". The next interface is tunnel again, and "tun2" should be added to the "tunnel" node it created earlier. It means we have to pass the nodes around and check if the parent node was already created and add to it, or create it first. If every level in the config corresponds to a single node, we can simply have every production return a node and call it a day.
While that config example exaggerates the issue for demonstration, configs where all tag nodes of the same parent are grouped sorted are no better really.

Also, don't forget to vote. This is a poll with a discussion attached to it, not the other way around. ;)

rps added a comment.EditedJan 5 2017, 2:44 PM

I haven't voted yet because I haven't decided ... It's a big change.

So the problem you're trying to resolve is in representing "ethernet eth0" as ["ethernet"]["eth0"] vs. ["ethernet eth0"]. What's the advantage of treating a node.tag as node.child instead of just the node identifier?

If you have something like "address 1.2.3.4" you now encounter a choice where you have to decide if you represent it as ["address"] => 1.2.3.4 or ["address"][0] => 1.2.3.4 because you have no way of knowing if multiple values are valid without the context of a schema reference. On the other hand if you represent it as ["address 1.2.3.4"] => true; then additional addresses are not a problem. The presence of the address keyword in this case (or things like "network-group") allows for some easier validation to be performed (e.g. does "network-group EXAMPLE" actually exist.

I've very recently written zero-knowledge parser that use both approaches and haven't decided on which one is better myself.

Elimination of node.tag would certainly make it easier but would also have a readability cost and that kind of syntax change would mean anyone who's done scripting against Vyatta or VyOS or EdgeOS would need to do some major refactoring in response (which is my biggest concern).

If I were doing things greenfield I might opt for something along the lines of:

interfaces {
    eth0 {
        ipv4 {
            address: 1.2.3.4/24;
            firewall {
                local: OUTSIDE-LOCAL;
                in: OUTSIDE-IN;
                out: OUTSIDE-OUT;
            }    
        }
        ipv6 {
            allow-autonomous: enable;
        }
    }
    eth0.10 {
...
    }
}

But to make that work nicely, there would need to be major schema changes; not just shuffling things around I think.

@rps An serious issue with "interfaces { eth0" is that when there is no parent subtree of all ethernet interfaces specifically, we don't know which script to call when something in "eth0" changes. We'd have to have one big script that handles the whole "interfaces" subtree, which is very problematic when it comes to adding new interface types. If eth* interfaces are children of the "ethernet" node and tun* interfaces are children of the "tunnel" node, it's easy to attach ethernet script to the "ethernet" node and "tunnel" script to the "tunnel" node, if we want to add "openvpn" later, we won't have to modify that large script to accomodate it

Making "ethernet eth0" a single identifier would also create a whole new type of "specialness" both in the config and reference trees. The reference tree largely mirrors a config tree with every possible path set, with exception for tag nodes, in the reference tree "duplex" or "speed" are children of "ethernet", and "ethernet" has a "name constraint" attached to it so when a path like "interfaces ethernet eth0 speed auto" is being validated, the validation function checks that "ethernet" is a tag node and checks "eth0" against a regex, then proceeds to check if "ethernet" has "speed" child and check "auto" against the "value constraint" regex. If they have no common parent other than "interfaces", then the constraint must be attached to "interfaces", which means adding new interface type requires modifying the one big schema too. [On a side note, while I'm not fond of that idea, the approach with parent "ethernet" node makes it possible to introduce custom interface names, since checking if something is ethernet or not becomes trivial]
Likewise, in the config API, getting a list of all ethernet interfaces or rules in a firewall ruleset would be no longer as simple as returning the list of children, it's a regex search. It also makes it impossible to use "interfaces ethernet" as an edit level, like @Merijn mentioned. It also means you cannot check if say "interfaces tunnels" path exists in the config because there is no such node. In other words, this decision would have a whole bunch of unfortunate consequences.

Any change that imparts simplicity for the coding ahead is worthwhile. Time saved in the parser's reduced complexity can be spent in other ways.

The fact that I personally find this format more intuitive is a bonus. I realize that it represents a change from what people are currently used to, but to a new user, this seems an easier scheme to grasp, and having it apply "universally" makes a great deal of sense.

systo added a subscriber: systo.EditedJan 8 2017, 8:57 AM

As an end user, I just keep coming back to the verbosity of the syntax, and the divergence from all the other established command syntax in this space. VyOS doesn't have the following to do it differently, as it adds another barrier to adoption. Its a subtle change, but it has a long reach, especially when luring former vyatta or EdgeOS converts that want to roll-their-own, vs buy MIPS hardware. While I understand it may save coding time in the end, I'm trying to avoid the verbosity that is pfsense, and awall/shorewall. I bet if you asked a room of non-vyos engineers, they would prefer the first syntax with a much higher percentage, but alas I digress.

@systo Just to make sure you are looking at it the right way, in the large it's actually less verbose than old syntax. The vif may not be the best example but firewall would make it apparent:

firewall {
  name Foo {
    rule 10 {
      destination port 22
      action accept
    }
    rule 20 {
      destination port 23
      action accept
    }
    rule 30 {
      destination port 80
      action accept
    }
    rule 40 {
      destination port 443
      action accept
    }
  }
}

vs.

firewall {
  name {
    Foo {
      rule {
        10 {
           action accept
           destination port 22
       }
        20 {
           action accept
           destination port 23
       }
        30 {
           action accept
           destination port 80
       }
        10 {
           action accept
           destination port 443
       }
  }
}

My aesthetic concern is that reading a 100 rule firewall could be harder to follow when the rules are not marked with the word "rule" (or whatever word we choose), what can be visual clutter for one person is an important navigation aid for another. I'm not sure myself, I'll probably need to live with it to decide.

Also, could you elaborate on pfSense? Last I checked it only had a GUI.

Asking a broader audience probably could be a good idea, by the way. The only question is how you get end users to vote when you can't get them to read the blog even. ;)

If we get complains about this syntax after public alphas appear, we can change it, after stable release it will be set in stone.

In the example above, I vote that the first example where name Foo and rule 10 are on the same line. It is much easier to read, and shortens up the output on the display. Sometimes with long configurations, it is easier when you can see more information on the same screen without scrolling.

rps added a comment.Jan 8 2017, 3:08 PM

I keep coming back to a sense that dramatic syntax changes are very damaging and disruptive to users. My fear is that we'll be spending years explaining to people that they're looking at old documentation or examples and that they don't have their curly braces in the right place. Or that we'll alienate a segment of our user base that is averse to change.

With that in mind my question is really this:

What is the development impact of not making the change? Are we talking about the ability to save a few lines of code in the config encode/decode functions and have a "cleaner" more elegant solution, or are we talking about trying to solve a problem that has design implications throughout the system? My sense is the former, but if I'm wrong about that I need to know.

@tmartinson Well, you should change your vote then (votes are not final here, for the better I guess).

And everyone else who's concerned, just in case I confused the hell out of people with this question, "yes" means "I want it to be 'firewall { name { Foo { rule { 10'" and "no" means "I want it to be 'firewall { name Foo { rule 10'".

@rps Well, it's more than a few lines, though it's not some non-trivial algorithmic task either. Old documentation is going to be completely useless anyway since a lot of cruft cannot be cleaned up without complete syntax redesign, so I guess we can judge the options on their own merits. Aesthetically, I'm for "rule 10" on the same line.

systo added a comment.Jan 8 2017, 6:12 PM

@dmbaturin, Im with you on the aesthetics, and the readability. In the firewall ruleset example I still feel that the first is easier read than the second. Are we talking hundreds of lines to parse the former vs the latter? It seems like the later, across a whole config would at 10-20 lines if not more depending on the complexity. I for one am interested in seeing as much of the config on one screen, vs needlessly needing to scroll. As for your Q on pfSense, I've had to edit the xml configuration file by hand based on how pfSense sorts VLANs based on their add date vs numerical value.

when I look at other options in this space outside of JunOS and IOS, having them on the same line seems to be the one common factor across all.
zebra -- http://oob.freeshell.org/nzwireless/routing.html
bird -- http://bird.network.cz/?get_doc&f=bird-3.html
securityrouter.org -- http://securityrouter.org/wiki/Configuration_file
bsdrp -- https://bsdrp.net/documentation/examples/simple_bgp-rip-ospf_lab

rps added a comment.Jan 8 2017, 6:46 PM

With respect to the concerns I mentioned above, I've voted no.

I think creating the code for import and export of JSON or XML representation of the configuration in more of the unified format being proposed would be a way to provide access to the configuration to other applications (e.g. future remote API) without changing the user-facing syntax and this would be my preference. In terms of human readability the current syntax is much easier to process. Humans don't deal with nesting well.

dmbaturin updated the description for this poll. (Show Details)Jan 9 2017, 5:34 AM
adestis added a subscriber: adestis.Jan 9 2017, 7:13 AM

The suggestion from @rps (XORP style) seems to be the best way from my point of view:
https://phabricator.vyos.net/V3#51

mickvav added a subscriber: mickvav.Jan 9 2017, 7:54 AM

Well, my vote is "No", because if for small configs it's OK to have just intent-expressed syntax, if you have huge one, e.g. several pages - if you omit prefix before, say, 55, you will have to guess from context, if it is a vlan or preffix list entry, or VRRP group or whatever.

Anyway, if the default behavior will change we need to maintain a clean and simple migration path for old configs. May be a good solution is to have something like

show configuration verbose

which shows all ommited prefixes, default values, etc.

I'm a "NO" as a network engineer with a bunch of different brands already XORP style, or as close to JunOS as you can get it the best. Yet another (Similar) config style would be way too much frustration for most of my peers to even consider.

Also, we have to be able to produce configs that are diff-able in config tools, and adding extra line breaks is hard to infer context when all we get from the tools is a diff of what changed.