PIGBARF: Parser Implementation Generator for BinARy Formats

A PIGBARF specification is an XML file with a <spec> element as its root. The <spec> element contains subelements that describe the binary data format and how it is laid out byte-by-byte (or perhaps bit-by-bit). The following is a detailed description of the subelements in a PIGBARF specification.

The headers tag
The types tag
The rules tag
The byte-order tag
The bit-order tag

The <headers> tag:

The headers tag takes no attributes and may contain only text and CDATA. It contains raw, uninterpreted Java code that is included at the top of the generated parser code. This will generally include package declarations, import statements, and non-public class definitions. The concatenation of all text and CDATA (in document order) defines the Java code that is inserted, completely unmodified, at the top of the generated file.

Example:

          <headers>
             package com.awesome.very;

             import java.util.*;
          </headers>
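To illustrate what "the concatenation of all text and CDATA (in document order)" means, here is a minimal sketch that uses the standard Java DOM API. It is not taken from the PIGBARF source; the class name HeadersTextSketch and the method headersText are hypothetical, and the real generator may read the tag differently.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper, for illustration only (not part of PIGBARF).
public class HeadersTextSketch {
    /** Returns the text and CDATA content of the first headers element, joined in document order. */
    static String headersText(String specXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(specXml.getBytes(StandardCharsets.UTF_8)));
            Node headers = doc.getElementsByTagName("headers").item(0);
            // getTextContent() joins text and CDATA children in document order.
            return headers.getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read specification", e);
        }
    }

    public static void main(String[] args) {
        // CDATA lets Java code containing '<' appear without XML escaping.
        String spec = "<spec><headers>package com.awesome.very;\n"
                + "<![CDATA[import java.util.List; // e.g. List<String>\n]]>"
                + "</headers></spec>";
        System.out.print(headersText(spec));
    }
}
```

The sketch shows that a plain text run followed by a CDATA section yields one continuous block of Java source, which matches the description of the headers tag above.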

The <types> tag:

The types tag takes no attributes and may contain only text and CDATA. This text defines the type of each nonterminal symbol defined inside the rules tag. A nonterminal's type may be either simple or functional. A simple type is used for nonterminals that take no parameters and return a value. A functional type is used for nonterminals that take one or more parameters and return a value. The type of a rule in the types tag must match the definition of the rule itself in the rules tag. That is, if a nonterminal in the rules tag takes 3 formal parameters, then the matching entry in the types tag must have a functional type with 3 parameter types listed. The syntax of the text in the types tag is as follows:

Syntax:

types ::= typedecl+

typedecl ::= type ':' rulename ( ',' rulename )* ';'

type ::= simpletype | functiontype

simpletype ::= a string of uninterpreted Java code that defines a valid Java type (allows generics)

functiontype ::= '(' simpletype ( ',' simpletype )* ')' simpletype

Example:

          <types>
             int : magicnumber;
             (int) String : list_of_strings;
          </types>
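The distinction between simple and functional types can be sketched as a small reader for the typedecl syntax above. This is only an illustration under stated assumptions: the class TypesSketch and its methods are hypothetical, the real PIGBARF parser is not shown in this document, and the sketch assumes that commas inside generic angle brackets do not separate parameter types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only (not part of PIGBARF).
public class TypesSketch {
    /** Counts the parameter types in a type string; 0 means a simple type. */
    static int arity(String type) {
        String t = type.trim();
        if (!t.startsWith("(")) {
            return 0; // simple type: no parameter list
        }
        int params = 1;
        int angleDepth = 0;
        for (int i = 1; i < t.length(); i++) {
            char c = t.charAt(i);
            if (c == '<') {
                angleDepth++;
            } else if (c == '>') {
                angleDepth--;
            } else if (c == ',' && angleDepth == 0) {
                params++; // a comma outside generics separates two parameter types
            } else if (c == ')') {
                break; // end of the parameter list
            }
        }
        return params;
    }

    /** Maps each rule name in the text of a types tag to its parameter count. */
    static Map<String, Integer> ruleArities(String typesText) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String decl : typesText.split(";")) {
            if (decl.trim().isEmpty()) {
                continue;
            }
            int colon = decl.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("missing ':' in: " + decl.trim());
            }
            String type = decl.substring(0, colon);
            for (String name : decl.substring(colon + 1).split(",")) {
                result.put(name.trim(), arity(type));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "int : magicnumber;\n(int) String : list_of_strings;";
        System.out.println(ruleArities(text)); // prints {magicnumber=0, list_of_strings=1}
    }
}
```

A parameter count obtained this way could then be compared with the number of formals declared for the same rule in the rules tag, which is the consistency requirement described above.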

The <rules> tag:

This tag defines the "grammar" of the binary data format and contains definitions of nonterminal rules for parsing it. The rules tag has one required attribute, which is the name of the top-level rule that will begin the parsing. The top-level rule MUST have a simple type (i.e. it cannot declare any parameters). The rules tag contains only text and CDATA, the syntax of which is described below.

Syntax:

rules ::= rule+

rule ::= ident [ formals ] '::=' rhs ';;'

rhs ::= option ( '|' option )*

option ::= ruleitem+

formals ::= '(' ident (',' ident)* ')'

ruleitem ::= value | predicate | code | align | eof

eof ::= 'EOF'

align ::= 'align' '(' code ')'

value ::= nonterminal | primitive | list | switch | subparse | expression

expression ::= '[:[' typestr ']:[' javacode ']:]'

typestr ::= uninterpreted Java string that specifies a type (anything without ']:[')

subparse ::= 'subparse' '(' ( inputpair ';' typepair | typepair ';' inputpair ) ')'

inputpair ::= 'input' '=' code

switch ::= 'switch' code '{' case+ [ default_case ] '}'

case ::= code '->' value ';;'

default_case ::= 'default' '->' value ';;'

list ::= 'list' '(' ( lengthpair ';' typepair | typepair ';' lengthpair ) ')'

nonterminal ::= ident [ actuals ]

actuals ::= '(' code (';' code)* ')'

primitive ::=

namedargs ::= '(' keyvaluepair (';' keyvaluepair)* ')'

keyvaluepair ::= lengthpair | 'value' '=' code | 'cast' '=' ident

lengthpair ::= 'length' '=' code

typepair ::= 'type' '=' value

predicate ::= '{{' javapred '}}'

javapred ::= uninterpreted Java boolean expression (anything without '}}')

code ::= '[[' javacode ']]'

javacode ::= uninterpreted Java code (anything without ']]')

Details about the usage of the various parts of the rules tag can be found here.
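The grammar defines javacode as "anything without ']]'" and javapred as "anything without '}}'". One reading of that wording is that an embedded fragment ends at the first closing delimiter. The sketch below illustrates that reading only; the class CodeTokenSketch and the method delimited are hypothetical, and the actual PIGBARF tokenizer may behave differently.

```java
// Hypothetical helper, for illustration only (not part of PIGBARF).
public class CodeTokenSketch {
    /** Returns the text between the first open delimiter and the first close after it, or null. */
    static String delimited(String input, String open, String close) {
        int start = input.indexOf(open);
        if (start < 0) {
            return null;
        }
        int bodyStart = start + open.length();
        int end = input.indexOf(close, bodyStart);
        if (end < 0) {
            return null;
        }
        return input.substring(bodyStart, end);
    }

    public static void main(String[] args) {
        // A code item as used by align, and a predicate item.
        System.out.println("'" + delimited("align([[ 4 ]])", "[[", "]]") + "'");
        System.out.println("'" + delimited("{{ n > 0 }}", "{{", "}}") + "'");
        // Nested array indexing contains "]]", so the fragment is cut short.
        System.out.println("'" + delimited("[[ a[b[0]] ]]", "[[", "]]") + "'");
    }
}
```

Under this reading, Java code that itself contains ]] (for example nested array indexing) would have to be written with a space between the brackets, such as a[b[0] ].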

The <byte-order> tag:

This tag defines the byte-ordering to be used for parsing multi-byte values. The two options are big-endian and little-endian. In a big-endian byte ordering, a multi-byte value will use the first byte read from the stream as its most-significant byte. In a little-endian byte ordering, a multi-byte value will use the first byte read from the stream as its least-significant byte. This tag defines the byte ordering through its single required attribute, value, which can have the value "big" or "little". This tag does not allow any contents.

Example:

          <byte-order value="big"/>
             or
          <byte-order value="little"/>
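The two byte orderings can be sketched in a few lines of Java. This is an illustration of the definition above rather than code from the generated parser; the class ByteOrderSketch and the method combine are hypothetical names.

```java
// Hypothetical helper, for illustration only (not part of PIGBARF).
public class ByteOrderSketch {
    /** Combines up to 8 bytes, given in stream order, into one unsigned value. */
    static long combine(int[] bytesInStreamOrder, boolean bigEndian) {
        long result = 0;
        for (int i = 0; i < bytesInStreamOrder.length; i++) {
            long b = bytesInStreamOrder[i] & 0xFF;
            if (bigEndian) {
                // Earlier bytes are shifted up, so the first byte ends up most significant.
                result = (result << 8) | b;
            } else {
                // Each later byte is placed above the previous ones, so the first byte stays least significant.
                result |= b << (8 * i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] stream = {0x12, 0x34};
        System.out.println(Long.toHexString(combine(stream, true)));  // prints 1234
        System.out.println(Long.toHexString(combine(stream, false))); // prints 3412
    }
}
```

For the two bytes 0x12 and 0x34 read in that order, value="big" corresponds to 0x1234 and value="little" to 0x3412.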

The <bit-order> tag:

Much like the byte-order tag, this tag defines the order in which bits are used when reading a multi-bit value. This tag only affects the result of the bits primitive value item. If the bit ordering is set to big-endian, then the first bit read will be the most significant bit used in the result. If the bit ordering is little-endian, then the first bit read will be the least significant bit used in the result. To be very clear, this tag only affects the order in which bits are packaged into a multi-bit value; it DOES NOT affect the order in which bits are read from the stream. Within a byte, bits are always read from the stream starting at the least-significant bit and progressing to the most-significant bit. For example, if you had the two bytes [240, 202] in the stream, then the next 16 bits in order would be: 0 0 0 0 1 1 1 1 0 1 0 1 0 0 1 1. This tag defines the bit ordering through its single required attribute, value, which can have the value "big" or "little". This tag does not allow any contents.

Example:

          <bit-order value="big"/>
             or
          <bit-order value="little"/>
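The difference between the order in which bits are read and the order in which they are packaged can be sketched as follows. This is a minimal illustration of the description above, not the generated parser's code; the class BitOrderSketch and its methods are hypothetical names.

```java
// Hypothetical helper, for illustration only (not part of PIGBARF).
public class BitOrderSketch {
    /** Returns the bit at the given stream position, reading each byte from its least-significant bit upward. */
    static int bitAt(int[] bytes, int index) {
        return (bytes[index / 8] >> (index % 8)) & 1;
    }

    /** Packages count bits, starting at stream position start, into one value. */
    static long readBits(int[] bytes, int start, int count, boolean bigEndian) {
        long result = 0;
        for (int i = 0; i < count; i++) {
            long bit = bitAt(bytes, start + i);
            if (bigEndian) {
                result = (result << 1) | bit; // first bit read becomes most significant
            } else {
                result |= bit << i;           // first bit read becomes least significant
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] stream = {240, 202};
        StringBuilder bits = new StringBuilder();
        for (int i = 0; i < 16; i++) {
            bits.append(bitAt(stream, i)).append(i < 15 ? " " : "");
        }
        System.out.println(bits); // prints 0 0 0 0 1 1 1 1 0 1 0 1 0 0 1 1
        // The four bits at stream positions 8..11 are 0 1 0 1.
        System.out.println(readBits(stream, 8, 4, true));  // prints 5  (binary 0101)
        System.out.println(readBits(stream, 8, 4, false)); // prints 10 (binary 1010)
    }
}
```

The stream order of the bits is the same in both calls; only the way the four bits are assembled into a number changes.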
