=encoding utf-8

=head1 NAME

YATT::Lite::LRXML::Syntax - Loose but Recursive XML (LRXML) format.

=for code perl

=head1 SYNOPSIS

  require YATT::Lite::LRXML;
  my $container = YATT::Lite::LRXML->load_from(string => <<'END');
  <!yatt:args x y>
  <h2>&yatt:x;</h2>
  &yatt:y;

  <!yatt:widget foo id x>
  <div id="&yatt:id;">
    &yatt:x;
  </div>
  END

=head1 DESCRIPTION

Loose but Recursive XML (B<LRXML>), which I define here,
is an XML-like template format. LRXML was first used in
my template engine L<YATT> and was then extended in
my latest template engine L<YATT::Lite>.

The LRXML format consists of B<3 layers> of syntax definitions:
L<"LRXML multipart container"|/LRXML-multipart-container>
(or simply I<container>),
L<"LRXML template"|/LRXML-template> (I<template>)
and L<"LRXML entity reference"|/LRXML-entity-reference> (I<entref>).
A container can carry multiple parts.
Each part can have a boundary (header), which can carry meta information
(usually used as a declaration) for the body of the part.
The body of each part can be a template or another type of text payload.
Entrefs can appear in templates and in other text payloads.

The LRXML format only defines syntax and doesn't touch semantics,
much like S-expressions in Lisp.
The current implementation of the L<LRXML parser|YATT::Lite::LRXML>
does determine the type of each part by (predefined) I<declaration keywords>
(such as I<"widget">, I<"page">, I<"action">...),
but the declaration keywords are B<not> part of this LRXML format specification.
They are left open to each user of the LRXML format.

=head2 XXX: Brief introduction of LRXML


=head1 FORMAT SPECIFICATION
X<FORMAT> X<SYNTAX>

=head2 Syntax Notation (ABNF with negative-match)
X<syntax-notation> X<BNF>

In this document, I (roughly) use L<ABNF|https://tools.ietf.org/html/rfc5234>,
with some modifications/extensions.

=over 4

=item C<[..]> means a character set, as in Perl 5 regular expressions.

In the original ABNF, C<[..]> means an optional element.

=item The operator "C<?>" is equivalent to C<*1> and indicates an I<optional element>.

For optional elements, I chose C<< ?<elem> >> instead of C<< [<elem>] >>.

=item The operator "C< ¬ >" preceding an element indicates I<negative-match>.

If an element is written like:

   ¬ elem

then this pattern matches the I<longest> possible character sequence
that does not match C<elem>. This operator helps define a customizable namespace.

=item A rule can take parameters.

If the left-hand side of a rule definition consists of two or more words,
it is a parametric rule. A parametric rule is used like C<< <rule Param> >>.

   group C          =  *term C

   ...other rule... =   <group ")">


=back
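
The negative-match operator can be approximated with a Perl regular
expression. The following is a minimal sketch, assuming that "does not match
C<elem>" means "C<elem> does not start to match at any position inside the
sequence". The helper name C<negative_match> is hypothetical and is not part of
L<YATT::Lite>.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of YATT::Lite): return the longest prefix
# of $text in which $elem_re does not start to match at any position.
sub negative_match {
    my ($text, $elem_re) = @_;
    # Consume one character at a time, but only while $elem_re
    # cannot match at the current position.
    my ($prefix) = $text =~ /\A((?:(?!$elem_re).)*)/s;
    return $prefix;
}

# The payload stops right before the first boundary leader.
my $payload = negative_match("<h1>Hello</h1>\n<!yatt:widget foo>\n", qr/<!yatt:/);
print $payload;
```

The negative lookahead is re-evaluated at each character, so this sketch favors
clarity over speed.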

=head3 Customizable namespace qualifier

In LRXML, every top-level construct is marked by a I<namespace qualifier>
(or simply I<namespace>).
The namespace can be customized to an arbitrary set of words.
For simplicity, in this document, I use a "sample" definition of
the customizable namespace rule C<CNS>:

  CNS             = ("yatt")

But every implementation of an LRXML parser should allow overriding this rule,
for example like the following:

  CNS             = ("yatt" / "js" / "perl")
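
One way a parser could make C<CNS> overridable is to build the alternation from
a list of namespace words at run time. This is only a sketch; the function name
C<make_cns_re> is an assumption for illustration and does not describe how
L<YATT::Lite> builds its patterns.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex for the CNS rule from namespace words.
sub make_cns_re {
    my (@namespaces) = @_;
    # Try longer words first so that one namespace cannot shadow
    # another namespace that merely starts with the same letters.
    my $alt = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } @namespaces;
    return qr/(?:$alt)/;
}

my $cns = make_cns_re(qw(yatt js perl));
print "boundary leader found\n" if '<!js:widget foo>' =~ /\A<!$cns:/;
```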

=head2 BNF of LRXML multipart container
X<LRXML-multipart-container>

  lrxml-container = ?(lrxml-payload) *( lrxml-boundary lrxml-payload
                                      / lrxml-comment )

  lrxml-boundary  = "<!" CNS ":" NSNAME decl-attlist ">" EOL

  lrxml-comment   = "<!--#" CNS *comment-payload "-->"

  lrxml-payload   = ¬("<!" (CNS ":" / "--#" CNS))

  decl-attlist    = *(1*WS / inline-comment / att-pair / decl-macro)

  inline-comment  = "--" comment-payload "--"

  comment-payload = *([^-] / "-" [^-])

  decl-macro      = "%" NAME *[0-9A-Za-z_:\.\-=\[\]\{\}\(,\)] ";"

  att-pair        = ?(NSNAME "=") att-value

  att-value       = squoted-att / dquoted-att / nested-att / bare-att

  squoted-att     = ['] *[^'] [']

  dquoted-att     = ["] *[^"] ["]

  nested-att      = "[" decl-attlist "]"

  bare-att        = 1*[^'"\[\]\ \t\n<>/=]

  NSNAME          = NAME *(":" NAME)

  NAME            = 1*[0-9A-Za-z_]

  WS              = [\ \t\n]

  EOL             = ?[\r] [\n]
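
To make the C<decl-attlist> rules more concrete, here is a minimal tokenizer
sketch for a I<flat> attribute list. It covers C<WS>, C<inline-comment>,
C<squoted-att>, C<dquoted-att> and C<bare-att>, and deliberately leaves out
C<nested-att> and C<decl-macro>. The function name C<parse_attlist> and the
returned C<[name, value]> pairs are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical tokenizer for a flat decl-attlist
# (no nested-att, no decl-macro).
sub parse_attlist {
    my ($src) = @_;
    my $NAME   = qr/[0-9A-Za-z_]+/;
    my $NSNAME = qr/$NAME(?::$NAME)*/;
    my @atts;
    pos($src) = 0;
    while (pos($src) < length($src)) {
        if ($src =~ /\G[ \t\n]+/gc) {
            next;                                  # 1*WS
        }
        elsif ($src =~ /\G--(?:[^-]|-[^-])*--/gc) {
            next;                                  # inline-comment
        }
        elsif ($src =~ /\G(?:($NSNAME)=)?(?:'([^']*)'|"([^"]*)"|([^'"\[\] \t\n<>\/=]+))/gc) {
            # att-pair: optional name, then a quoted or bare value
            my $value = defined($2) ? $2 : defined($3) ? $3 : $4;
            push @atts, [$1, $value];
        }
        else {
            die "Cannot parse attlist at offset " . pos($src) . "\n";
        }
    }
    return @atts;
}

for my $att (parse_attlist(q{foo id="main" x=3 -- a note -- y='a b'})) {
    my ($name, $value) = @$att;
    printf "%s => %s\n", defined($name) ? $name : '(positional)', $value;
}
```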


Some notes on current spec and future changes:

=over 4

=item NAME may be allowed to contain Unicode word characters.
X<unicode-name>

In the current YATT::Lite, C<NAME> can contain C<\w> in Perl's Unicode semantics.

=back
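
The container layer can be sketched as a simple splitter that cuts the text at
each C<lrxml-boundary>. This is a rough illustration under stated
simplifications: the declaration is taken as "anything up to the next C<< > >>",
so quoted attributes containing C<< > >> and C<lrxml-comment> are not handled.
The function name C<split_container> and the hash keys are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split container text into parts at boundary lines.
# Simplification: the declaration may not contain ">".
sub split_container {
    my ($text, @namespaces) = @_;
    my $cns  = join '|', map { quotemeta($_) } @namespaces;
    my $name = qr/[0-9A-Za-z_]+/;
    my $boundary = qr/<!(?:$cns):($name(?::$name)*)([^>]*)>\r?\n/;

    # The first part has no boundary (the optional leading payload).
    my @parts = ({ kind => undef, decl => '', body => '' });
    my $pos = 0;
    while ($text =~ /$boundary/g) {
        # Text before this boundary belongs to the previous part.
        $parts[-1]{body} .= substr($text, $pos, $-[0] - $pos);
        push @parts, { kind => $1, decl => $2, body => '' };
        $pos = $+[0];
    }
    $parts[-1]{body} .= substr($text, $pos);
    return @parts;
}

my @parts = split_container(<<'END', 'yatt');
<!yatt:args x y>
<h2>&yatt:x;</h2>
<!yatt:widget foo id x>
<div id="&yatt:id;">&yatt:x;</div>
END
printf "%s:%s\n", $_->{kind} // '(default)', $_->{decl} for @parts;
```

Note that C<kind> is only the first name after the namespace; deciding what
C<"args"> or C<"widget"> means is left to the user of the format.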

=head2 BNF of LRXML template syntax
X<LRXML-template>

  lrxml-template   = ?(template-payload) *( (template-tag / lrxml-entref )
                                           ?(template-payload) )

  template-payload = ¬( tag-leader / ent-leader )

  tag-leader       = "<" ( CNS ":"
                         / "?" CNS
                         )

  ent-leader       = "&" ( CNS (":" / lcmsg )
                         / special-entity
                         )

  template-tag     = element / pi

  element          = "<" (single-elem / open-tag / close-tag) ">"

  pi               = "<?" CNS ?NSNAME pi-payload "?>"

  single-elem      = CNS NSNAME elem-attlist "/"

  open-tag         = CNS NSNAME elem-attlist

  close-tag        =  "/" CNS NSNAME *WS

  elem-attlist     = *(1*WS / inline-comment / att-pair)

  pi-payload       = *([^?] / "?" [^>])
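
As an illustration of how C<template-payload> alternates with tags and entity
references, the sketch below splits a template into chunks. It assumes that
elements are written with a colon after the namespace (as in C<tag-leader>),
that attribute values contain no C<< > >>, and that entity references have no
arguments. The function name C<split_template> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a template into payload and token chunks.
# Odd-numbered elements of the result are tags, PIs or entrefs.
sub split_template {
    my ($text, @namespaces) = @_;
    my $cns = join '|', map { quotemeta($_) } @namespaces;
    my $token = qr{
        < /? (?:$cns) : [^>]* >          # element: open, close or empty
      | <\? (?:$cns) \b .*? \?>          # processing instruction
      | & (?:$cns) : [0-9A-Za-z_:]+ ;    # entity reference without arguments
    }xs;
    # Capturing parentheses make split() return the separators too.
    return split /($token)/, $text;
}

my @chunks = split_template(q{<h2>&yatt:x;</h2><yatt:foo id="a"/>tail}, 'yatt');
print "[$_]\n" for @chunks;
```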

=head2 BNF of LRXML entity reference syntax
X<LRXML-entity-reference> X<LRXML-entref>

  lrxml-entref     = "&" ( CNS (pipeline / lcmsg)
                         / special-entity "(" <group ")">
                         )
                     ";"

  pipeline         = 1*( ":" NAME ?( "(" <group ")">)
                       / "[" <group "]">
                       / "{" <group "}">
                       )

  group CLO        = *ent-term CLO

  ent-term         = ( ","
                     / ( etext / pipeline ) ?[,:]
                     )

  etext            = etext-head *etext-body

  etext-head       = ( ETEXT *( ETEXT / ":" )
                     / paren-quote
                     )

  etext-body       = ( ETEXT *( ETEXT / ":" )
                     / paren-quote
                     / etext-any-group
                     )

  etext-any-group  = ( "(" <etext-group ")">
                     / "{" <etext-group "}">
                     / "[" <etext-group "]">
                     )

  etext-group CLO  = *( ETEXT / [:,] ) *etext-any-group CLO

  paren-quote      = "(" *( [^()] / paren-quote ) ")"

  lcmsg            = lcmsg-open / lcmsg-sep / lcmsg-close

  lcmsg-open       = ?("#" NAME) 2*"["

  lcmsg-sep        = 2*"|"

  lcmsg-close      = 2*"]"

  special-entity   = SPECIAL_ENTNAME

  ETEXT            = [^\ \t\n,;:(){}\[\]]
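
The following sketch parses only a small subset of C<pipeline>: a chain of
C<":" NAME> steps, each optionally followed by a non-nested, comma-separated
argument list in parentheses. Bracket and brace groups, nested groups and
C<lcmsg> are out of scope. The function name C<parse_simple_entref> and the
C<var> / C<call> labels are assumptions for illustration; the specification
itself assigns no meaning to the steps.

```perl
use strict;
use warnings;

# Hypothetical parser for a subset of lrxml-entref:
#   "&" CNS 1*( ":" NAME ?( "(" args ")" ) ) ";"
# where args contains no nested parentheses.
sub parse_simple_entref {
    my ($src, @namespaces) = @_;
    my $cns  = join '|', map { quotemeta($_) } @namespaces;
    my $name = qr/[0-9A-Za-z_]+/;
    $src =~ /\A&(?:$cns)((?::$name(?:\([^()]*\))?)+);\z/
        or return;
    my $pipeline = $1;
    my @steps;
    while ($pipeline =~ /:($name)(?:\(([^()]*)\))?/g) {
        if (defined($2)) {
            push @steps, [call => $1, split(/,/, $2)];
        }
        else {
            push @steps, [var => $1];
        }
    }
    return @steps;
}

print join(' ', @$_), "\n" for parse_simple_entref('&yatt:foo:bar(1,2);', 'yatt');
```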

=head3 Special entity name

A I<special entity> is another customizable syntax element.
For example, it is usually defined like:

  SPECIAL_ENTNAME  = ("HTML")

And then you can write C<&HTML(:var);>.

But every implementation of an LRXML parser should allow overriding this rule,
for example like the following:

  SPECIAL_ENTNAME  = ("HTML" / "JSON" / "DUMP")
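
A configurable C<SPECIAL_ENTNAME> can be sketched in the same way as C<CNS>.
The example below builds a pattern from a list of names and uses a recursive
regular expression to require balanced parentheses, which only approximates the
C<< <group ")"> >> rule. The function name C<make_special_entity_re> is
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex matching "&NAME( ... );" where NAME is
# one of the given special entity names and the parentheses are balanced.
sub make_special_entity_re {
    my (@names) = @_;
    my $alt = join '|', map { quotemeta($_) } @names;
    # Group 1: entity name. Group 2: balanced parenthesized text;
    # (?2) recurses into group 2 for nested parentheses.
    return qr/&($alt)(\((?:[^()]++|(?2))*\));/;
}

my $re = make_special_entity_re(qw(HTML JSON DUMP));
if ('<p>&HTML(:var);</p>' =~ $re) {
    print "entity=$1 args=$2\n";
}
```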

=head1 AUTHOR

"KOBAYASI, Hiroaki" <hkoba@cpan.org>

=head1 LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.