=head1 NAME

JSON::Relaxed -- An extension of JSON that allows for better human-readability.

=head1 TAKEOVER

New maintainer,

The minimal perl requirement is raised to 5.26 for future features. If
this fails the takeover will be cancelled and new features will be
implemented in a different module.

=head1 SYNOPSIS

 my ($rjson, $hash, $parser);
 
 # raw RJSON code
 $rjson = <<'(RAW)';
 /* Javascript-like comments are allowed */
 {
   // single or double quotes allowed
   a : 'Larry',
   b : "Curly",
   
   // nested structures allowed like in JSON
   c: [
      {a:1, b:2},
   ],
   
   // like Perl, trailing commas are allowed
   d: "more stuff",
 }
 (RAW)
 
 # subroutine parsing
 $hash = from_rjson($rjson);
 
 # object-oriented parsing
 $parser = JSON::Relaxed::Parser->new();
 $hash = $parser->parse($rjson);


=head1 INSTALLATION

JSON::Relaxed can be installed with the usual routine:

 perl Makefile.PL
 make
 make test
 make install

=head1 DESCRIPTION

JSON::Relaxed is a lightweight parser for an extension of JSON called Relaxed
JSON (RJSON).  The intent of RJSON is to provide a format that is more
human-readable and human-editable than JSON. Most notably, RJSON allows the
use of JavaScript-like comments, so configuration files and other human-edited
files can include comments that explain the intent of each setting.

JSON::Relaxed is currently only a parser that reads in RJSON code and produces
a data structure. JSON::Relaxed does not currently encode data structures into
JSON/RJSON. That feature is planned.

=head2 Why Relaxed JSON?

There's been increasing support for the idea of expanding JSON to improve
human-readability.  "Relaxed" JSON is a term that has been used to describe a
JSON-ish format that has some features that JSON doesn't.  Although there isn't
yet any kind of official specification, descriptions of Relaxed JSON generally
include the following extensions to JSON:

=over 4

=item * comments

RJSON supports JavaScript-like comments:

 /* inline comments */
 // line-based comments

=item * trailing commas

Like Perl, RJSON treats commas as separators.  If nothing is before,
after, or between commas, those commas are just ignored:

 [
    , // nothing before this comma
    "data",
    , // nothing after this comma
 ]

=item * single quotes, double quotes, no quotes

Strings can be quoted with either single or double quotes.  Space-less strings
are also parsed as strings. So, the following data items are equivalent:

 [
    "Starflower",
    'Starflower',
    Starflower
 ]

Note that unquoted boolean values are still treated as boolean values, so the
following are NOT the same:

 [
    "true",  // string
    true,    // boolean true
    
    "false", // string
    false,   // boolean false
    
    "null", // string
    null, // what Perl programmers call undef
 ]

Because of this ambiguity, unquoted non-boolean strings should be considered
sloppy and not something you do in polite company.

=item * documents that are just a single string

Early versions of JSON required that a JSON document contain either a single
hash or a single array.  Later versions also allow a single string.  RJSON
follows that later rule, so the following is a valid RJSON document:

 "Hello world"

=item * hash keys without values

A hash in RJSON can have a key that is followed by a comma or a closing C<}>
without a specified value.  In that case the hash element is simply assigned
the undefined value.  So, in the following example, C<a> is assigned C<1>,
C<b> is assigned C<2>, and C<c> is assigned C<undef>:

 {
    a: 1,
    b: 2,
    c
 }

=back

=head2 from_rjson()

C<from_rjson()> is the simple way to quickly parse an RJSON string. Currently
C<from_rjson()> only takes a single parameter, the string itself. So in the
following example, C<from_rjson()> parses and returns the structure defined in
C<$rjson>.

 $structure = from_rjson($rjson);
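
As a minimal sketch, the same call can be written with its fully qualified
name, which works whether or not C<from_rjson> has been imported into the
current package (how the function is exported is not covered above, so the
qualified form is used here to avoid guessing). The input string and variable
names are purely illustrative.

 use strict;
 use warnings;
 use JSON::Relaxed;

 # Illustrative input: unquoted keys, a comment, and trailing commas.
 my $rjson = q|{ name: 'Larry', /* a comment */ tags: ["a", "b",], }|;

 my $structure = JSON::Relaxed::from_rjson($rjson);

 # from_rjson() returns undef when the string cannot be parsed.
 if (defined $structure) {
     print "name: $structure->{name}\n";
     print "tags: @{ $structure->{tags} }\n";
 }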

=head2 Object-oriented parsing

To parse using an object, create a C<JSON::Relaxed::Parser> object, like this:

 $parser = JSON::Relaxed::Parser->new();

Then call the parser's C<parse> method, passing in the RJSON string:

 $structure = $parser->parse($rjson);

B<Methods>

=over 4

=item * $parser->extra_tokens_ok()

C<extra_tokens_ok()> sets/gets the C<extra_tokens_ok> property. By default,
C<extra_tokens_ok> is false.  If C<extra_tokens_ok> is true then the
C<multiple-structures> error isn't triggered and the parser returns the first
structure it finds.  So, for example, the following code would return undef and
set the C<multiple-structures> error:

 $parser = JSON::Relaxed::Parser->new();
 $structure = $parser->parse('{"x":1} []');

However, by setting C<extra_tokens_ok> to true, a hash structure is
returned, the extra code after that first hash is ignored, and no error is set:

 $parser = JSON::Relaxed::Parser->new();
 $parser->extra_tokens_ok(1);
 $structure = $parser->parse('{"x":1} []');

=back

=head2 Error codes

When JSON::Relaxed encounters a parsing error it returns C<undef> and sets two
global variables: 

=over 4

=item * $JSON::Relaxed::err_id

C<$err_id> is a unique code for a specific error.  Every code is set in only
one place in JSON::Relaxed.

=item * $JSON::Relaxed::err_msg

C<$err_msg> is an English description of the code.  It would be cool to migrate
towards multi-language support for C<$err_msg>.

=back
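
A minimal sketch of checking for a failed parse follows. It relies only on
the behaviour described above (an C<undef> return value plus the two global
variables); the input is deliberately broken, and the wording of the warning
is just an illustration.

 use strict;
 use warnings;
 use JSON::Relaxed;

 my $parser = JSON::Relaxed::Parser->new();

 # An unclosed hash, used here only to provoke an error.
 my $structure = $parser->parse('{x:1');

 if (!defined $structure) {
     # Both globals are set by the failed parse.
     warn "RJSON error [$JSON::Relaxed::err_id]: $JSON::Relaxed::err_msg\n";
 }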

Following is a list of all error codes in JSON::Relaxed:

=over 4

=item * C<missing-parameter>

The string to be parsed was not sent to $parser->parse(). For example:

 $parser->parse()

=item * C<undefined-input>

The string to be parsed is undefined. For example:

 $parser->parse(undef)

=item * C<zero-length-input>

The string to be parsed is zero-length. For example:

 $parser->parse('')

=item * C<space-only-input>

The string to be parsed has no content beside space characters. For example:

 $parser->parse('   ')

=item * C<no-content>

The string to be parsed has no content. This error is slightly different than
C<space-only-input> in that it is triggered when the input contains only
comments, like this:

 $parser->parse('/* whatever */')


=item * C<unclosed-inline-comment>

A comment was started with /* but was never closed. For example:

 $parser->parse('/*')

=item * C<invalid-structure-opening-character>

The document opens with an invalid structural character like a comma or colon.
The following examples would trigger this error.

 $parser->parse(':')
 $parser->parse(',')
 $parser->parse('}')
 $parser->parse(']')

=item * C<multiple-structures>

The document has multiple structures. JSON and RJSON only allow a document to
consist of a single hash, a single array, or a single string. The following
examples would trigger this error.

 $parser->parse('{}[]')
 $parser->parse('{} "whatever"')
 $parser->parse('"abc" "def"')

=item * C<unknown-token-after-key>

A hash key may only be followed by the closing hash brace or a colon. Anything
else triggers C<unknown-token-after-key>. So, the following examples would
trigger this error.

 $parser->parse("{a [ }")
 $parser->parse("{a b")

=item * C<unknown-token-for-hash-key>

The parser encountered something besides a string where a hash key should be.
The following are examples of code that would trigger this error.

 $parser->parse('{{}}')
 $parser->parse('{[]}')
 $parser->parse('{]}')
 $parser->parse('{:}')

=item * C<unclosed-hash-brace>

A hash has an opening brace but no closing brace. For example:

 $parser->parse('{x:1')

=item * C<unclosed-array-brace>

An array has an opening brace but not a closing brace. For example:

 $parser->parse('["x", "y"')

=item * C<unexpected-token-after-colon>

In a hash, a colon must be followed by a value. Anything else triggers this
error. For example:

 $parser->parse('{"a":,}')
 $parser->parse('{"a":}')

=item * C<missing-comma-between-array-elements>

In an array, a value must be followed by a comma or the closing array brace.
Anything else triggers this error. For example:

 $parser->parse('[ "x" "y" ]')
 $parser->parse('[ "x" : ]')

=item * C<unknown-array-token>

This error exists just in case there's an invalid token in an array that
somehow wasn't caught by C<missing-comma-between-array-elements>. This error
shouldn't ever be triggered.  If it is, please L<let me know|/AUTHOR>.

=item * C<unclosed-quote>

This error is triggered when a quote isn't closed. For example:

 $parser->parse("'whatever")
 $parser->parse('"whatever')

=back


=head1 INTERNALS

The following documentation is for those who want to edit the code of
JSON::Relaxed itself.

=head2 JSON::Relaxed

C<JSON::Relaxed> is the parent package. Not a lot actually happens in
C<JSON::Relaxed>; it mostly contains L<from_rjson()|/from_rjson()> and
definitions of various structures.

=over 4

=item Special character and string definitions

The following hashes provide information about characters and strings that have
special meaning in RJSON.

=over 4

=item * Escape characters

The C<%esc> hash defines the six escape characters in RJSON that are
changed to single characters. C<%esc> is defined as follows.

 our %esc = (
   'b'   => "\b",    #  Backspace
   'f'   => "\f",    #  Form feed
   'n'   => "\n",    #  New line
   'r'   => "\r",    #  Carriage return
   't'   => "\t",    #  Tab
   'v'   => chr(11), #  Vertical tab
 );

=item * Structural characters

The C<%structural> hash defines the six characters in RJSON that define
the structure of the data object. The structural characters are defined as
follows.

 our %structural = (
   '[' => 1, # beginning of array
   ']' => 1, # end of array
   '{' => 1, # beginning of hash
   '}' => 1, # end of hash
   ':' => 1, # delimiter between name and value of hash element
   ',' => 1, # separator between elements in hashes and arrays
 );

=item * Quotes

The C<%quotes> hash defines the two types of quotes recognized by RJSON: single
and double quotes. JSON only allows the use of double quotes to define strings.
Relaxed also allows single quotes.  C<%quotes> is defined as follows.

 our %quotes = (
   '"' => 1,
   "'" => 1,
 );

=item * End of line characters

The C<%newlines> hash defines the three ways a line can end in an RJSON
document. Lines in Windows text files end with carriage-return newline
("\r\n").  Lines in Unixish text files end with newline ("\n"). Lines in some
operating systems end with just carriage returns ("\r"). C<%newlines> is
defined as follows.

 our %newlines = (
   "\r\n" => 1,
   "\r" => 1,
   "\n" => 1,
 );

=item * Boolean

The C<%boolean> hash defines strings that are boolean values: true, false, and
null. (OK, 'null' isn't B<just> a boolean value, but I couldn't think of what
else to call this hash.) C<%boolean> is defined as follows.

 our %boolean = (
   'null' => 1,
   'true' => 1,
   'false' => 1,
 );

=back

=back
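
To illustrate how a lookup table such as C<%esc> can be used, the following
self-contained sketch resolves backslash escapes in a string. It works on its
own copy of the table, and the C<unescape()> helper is hypothetical; it is not
the routine JSON::Relaxed uses internally. The sketch also assumes that an
escaped character missing from the table simply stands for itself.

 use strict;
 use warnings;

 # Local copy of the escape table shown above.
 my %esc = (
     'b' => "\b", 'f' => "\f", 'n' => "\n",
     'r' => "\r", 't' => "\t", 'v' => chr(11),
 );

 # Hypothetical helper: replace each backslash couplet with the
 # character it stands for.
 sub unescape {
     my ($str) = @_;
     $str =~ s/\\(.)/exists $esc{$1} ? $esc{$1} : $1/gse;
     return $str;
 }

 my $out = unescape('a\tb\"c');
 print $out eq "a\tb\"c" ? "ok\n" : "not ok\n";

A hash lookup keeps the substitution to a single pass over the string, and
supporting a further escape would only require one more entry in the table.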

=head2 JSON::Relaxed::Parser

A C<JSON::Relaxed::Parser> object parses the raw RJSON string. You don't
need to instantiate a parser if you just want to use the default settings.
In that case just use L<from_rjson()|/from_rjson()>.

You would create a C<JSON::Relaxed::Parser> object if you want to customize how
the string is parsed.  I say "would" because there isn't actually any
customization in these early releases. When there is you'll use a parser
object.

To parse in an object oriented manner, create the parser, then parse.

 $parser = JSON::Relaxed::Parser->new();
 $structure = $parser->parse($string);

=over 4

=item new

C<JSON::Relaxed::Parser-E<gt>new()> creates a parser object. Its simplest and most
common use is without any parameters.

 my $parser = JSON::Relaxed::Parser->new();

=over 4

=item B<option:> unknown

The C<unknown> option sets the character which creates the
L<unknown object|/"JSON::Relaxed::Parser::Token::Unknown">. The unknown object
exists only for testing JSON::Relaxed. It has no purpose in production use.

 my $parser = JSON::Relaxed::Parser->new(unknown=>'~');

=back

=item Parser "is" methods

The following methods indicate if a token has some specific property, such as
being a string object or a structural character.

=over 4

=item * is_string()

Returns true if the token is a string object, i.e. in the class
C<JSON::Relaxed::Parser::Token::String>.

=item * is_struct_char()

Returns true if the token is one of the structural characters of JSON, i.e.
one of the following:

 { } [ ] : ,

=item * is_unknown_char()

Returns true if the token is the
L<unknown character|/"JSON::Relaxed::Parser::Token::Unknown">.

=item * is_list_opener()

Returns true if the token is the opening character for a hash or an array,
i.e. it is one of the following two characters:

 { [

=item * is_comment_opener()

Returns true if the token is the opening character for a comment,
i.e. it is one of the following two couplets:

 /*
 //

=back

=item parse()

C<parse()> is the method that does the work of parsing the RJSON string.
It returns the data structure that is defined in the RJSON string.
A typical usage would be as follows.

 my $parser = JSON::Relaxed::Parser->new();
 my $structure = $parser->parse('["hello world"]');

C<parse()> does not take any options.

=item parse_chars()

C<parse_chars()> parses the RJSON string into either individual characters
or two-character couplets. This method returns an array. The only input is the
raw RJSON string. So, for example, the following string:

 $raw = qq|/*x*/["y"]|;
 @chars = $parser->parse_chars($raw);

would be parsed into the following array:

 ( "/*", "x", "*/", "[", "\"", "y", "\"", "]" )

Most of the elements in the array are single characters. However, comment
delimiters, escaped characters, and Windows-style newlines are parsed as
two-character couplets:

=over 4

=item * C<\> followed by any character

=item * C<\r\n>

=item * C<//>

=item * C</*>

=item * C<*/>

=back

C<parse_chars()> should not produce any fatal errors.
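
The splitting rule can be sketched without the module. The C<split_chars()>
function below is a hypothetical stand-in written for illustration, not the
code behind C<parse_chars()>; it assumes that trying the couplets listed above
before falling back to a single character is enough to reproduce the
behaviour described here.

 use strict;
 use warnings;

 # Hypothetical stand-in for parse_chars(): couplets are tried first,
 # then any single character (including newlines, thanks to /s).
 sub split_chars {
     my ($raw) = @_;
     return $raw =~ m{\\.|\r\n|//|/\*|\*/|.}sg;
 }

 my @chars = split_chars(qq|/*x*/["y"]|);
 print join(' ', @chars), "\n";    # prints: /* x */ [ " y " ]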

=item tokenize()

C<tokenize()> organizes the characters from
C<L<parse_chars()|/"parse_chars()">> into tokens. Those tokens can then be
organized into a data structure with
C<L<structure()|/"structure()">>.

Each token represents an item that is recognized by JSON. Those items include
structural characters such as C<{> or C<}>, or strings such as
C<"hello world">. Comments and insignificant whitespace are filtered out
by C<tokenize()>.

For example, this code:

 $parser = JSON::Relaxed::Parser->new();
 $raw = qq|/*x*/ ["y"]|;
 @chars = $parser->parse_chars($raw);
 @tokens = $parser->tokenize(\@chars);

would produce an array like this:

 (
   '[',
   JSON::Relaxed::Parser::Token::String::Quoted=HASH(0x20bf0e8),
   ']'
 )

Strings are tokenized into string objects.  When the parsing is complete they
are returned as scalar variables, not objects.

C<tokenize()> should not produce any fatal errors.
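
As a rough sketch, the token list can be inspected by separating objects from
plain scalars. This assumes, based on the example above, that structural
characters come back as plain strings and strings come back as objects;
C<blessed> is taken from the core C<Scalar::Util> module.

 use strict;
 use warnings;
 use Scalar::Util 'blessed';
 use JSON::Relaxed;

 my $parser = JSON::Relaxed::Parser->new();
 my @chars  = $parser->parse_chars(q|/*x*/ ["y", z]|);
 my @tokens = $parser->tokenize(\@chars);

 foreach my $token (@tokens) {
     if (blessed $token) {
         # String tokens are objects; show only their class.
         print 'object: ', ref($token), "\n";
     }
     else {
         # Structural characters are plain scalars.
         print "char:   $token\n";
     }
 }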

=item structure()

C<$parser-E<gt>structure()> organizes the tokens from
C<L<tokenize()|/"tokenize()">> into a data structure.
C<$parser-E<gt>structure()> returns a single string, a single array
reference, a single hash reference, or (if there are errors) undef.

=back

=head2 JSON::Relaxed::Parser::Structure::Hash

This package parses Relaxed into hash structures. It is a static package, i.e.
it is not instantiated.

=over 4

=item build()

This static method accepts the array of tokens and works through them building
the hash reference that they represent. When C<build()> reaches the closing
curly brace (C<}>) it returns the hash reference.

=item get_value

This static method gets the value of a hash element. This method is called
after a hash key is followed by a colon. A colon must be followed by a value.
It may not be followed by the end of the tokens, a comma, or a closing brace.

=back

=head2 JSON::Relaxed::Parser::Structure::Array

This package parses Relaxed into array structures. It is a static package, i.e.
it is not instantiated.

=over 4

=item build()

This static method accepts the array of tokens and works through them building
the array reference that they represent. When C<build()> reaches the closing
square brace (C<]>) it returns the array reference.

=item missing_comma()

This static method builds the C<missing-comma-between-array-elements> error
message.

=item invalid_array_token()

This static method builds the C<unknown-array-token> error message.

=back

=head2 JSON::Relaxed::Parser::Token::String

Base class. Nothing actually happens in this package; it's just a base class
for JSON::Relaxed::Parser::Token::String::Quoted and
JSON::Relaxed::Parser::Token::String::Unquoted.

=head2 JSON::Relaxed::Parser::Token::String::Quoted

A C<JSON::Relaxed::Parser::Token::String::Quoted> object represents a string
in the document that is delimited with single or double quotes.  In the
following example, I<Larry> and I<Curly> would be represented by C<Quoted>
objects but I<Moe> would not.

 [
    "Larry",
    'Curly',
    Moe
 ]

C<Quoted> objects are created by C<$parser-E<gt>tokenize()> when it works
through the array of characters in the document.

=over 4

=item * C<new()>

C<new()> instantiates a C<JSON::Relaxed::Parser::Token::String::Quoted> object
and slurps in all the characters in the characters array until it gets to the
closing quote.  Then it returns the new C<Quoted> object.

A C<Quoted> object has the following two properties:

C<raw>: the string that is inside the quotes.  If the string contained any
escape characters then the escapes are processed and the unescaped characters
are in C<raw>. So, for example, C<\n> would become an actual newline.

C<quote>: the delimiting quote, i.e. either a single quote or a double quote.


=item * C<as_perl()>

C<as_perl()> returns the string that was in quotes (without the quotes).

=back

=head2 JSON::Relaxed::Parser::Token::String::Unquoted

A C<JSON::Relaxed::Parser::Token::String::Unquoted> object represents a string
in the document that was not delimited by quotes.  In the following example,
I<Moe> would be represented by an C<Unquoted> object, but I<Larry> and I<Curly>
would not.

 [
    "Larry",
    'Curly',
    Moe
 ]

C<Unquoted> objects are created by C<$parser-E<gt>tokenize()> when it works
through the array of characters in the document.

An C<Unquoted> object has one property, C<raw>, which is the string. Escaped
characters are resolved in C<raw>.

=over 4

=item * C<new()>

C<new()> instantiates a C<JSON::Relaxed::Parser::Token::String::Unquoted>
object and slurps in all the characters in the characters array until it gets
to a space character, a comment, or one of the structural characters such as
C<{> or C<:>.

=item * C<as_perl()>

C<as_perl()> returns the unquoted string or a boolean value, depending on how
it is called.

If the string is a boolean value, i.e. I<true>, I<false>, or I<null>, then
C<as_perl()> returns 1 (for true), 0 (for false) or undef (for null),
B<unless> the C<always_string> option is sent, in which case the string itself
is returned. If the string does not represent a boolean value then it is
returned as-is.

C<$parser-E<gt>structure()> sends the C<always_string> option when the token is a key
in a hash. The following example should clarify how C<always_string> is used:

 {
    // key: the literal string "larry"
    // value: 1
    larry : true,
    
    // key: the literal string "true"
    // value: 'x'
    true : 'x',
    
    // key: the literal string "null"
    // value: 'y'
    null : 'y',
    
    // key: the literal string "z"
    // value: undef
    z : null,
 }

=back
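
The key/value distinction can be checked from the caller's side with a short
sketch like the one below. Going by the description above, the keys C<larry>,
C<true>, and C<z> should map to C<1>, C<'x'>, and C<undef> respectively; the
input string is illustrative, and the function is called by its fully
qualified name to avoid assuming how it is exported.

 use strict;
 use warnings;
 use JSON::Relaxed;

 my $hash = JSON::Relaxed::from_rjson(q|{ larry: true, true: 'x', z: null }|);

 if (defined $hash) {
     foreach my $key (sort keys %$hash) {
         # Show undefined values explicitly instead of an empty string.
         my $value = defined $hash->{$key} ? $hash->{$key} : 'undef';
         print "$key => $value\n";
     }
 }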

=head2 JSON::Relaxed::Parser::Token::Unknown

This class is just used for development of JSON::Relaxed. It has no use in
production. This class allows testing for when a token is an unknown object.

To implement this class, add the 'unknown' option to JSON::Relaxed::Parser->new(). The
value of the option should be the character that creates an unknown object.
For example, the following option sets the tilde (~) as an unknown object.

 my $parser = JSON::Relaxed::Parser->new(unknown=>'~');

The "unknown" character must not be inside quotes or inside an unquoted string.

=head1 TERMS AND CONDITIONS

Copyright (c) 2014 by Miko O'Sullivan.  All rights reserved.  This program is 
free software; you can redistribute it and/or modify it under the same terms 
as Perl itself. This software comes with B<NO WARRANTY> of any kind.

=head1 AUTHOR

Miko O'Sullivan
F<miko@idocs.com>

=head1 VERSION

Version: 0.04

=head1 HISTORY

=over 4

=item Version 0.01    Nov 30, 2014

Initial version.

=item Version 0.02    Dec 3, 2014

Fixed test.t so that it can load lib.pm when it runs.

Added $parser->extra_tokens_ok(). Removed error code
C<invalid-structure-opening-string> and allowed that error to fall through to
C<multiple-structures>.

Cleaned up documentation.

=item Version 0.03    Dec 6, 2014

Modified test for parse_chars to normalize newlines.  Apparently the way Perl
on Windows handles newline is different than what I expected, but as long as
it's recognizing newlines and/or carriage returns then the test should pass.

=item Version 0.04 Apr 28, 2016

Fixed bug in which end of line did not terminate some line comments.

Minor cleanups of documentation.

Cleaned up test.pl.

=item Version 0.05 Apr 30, 2016

Fixed bug: Test::Most was not added to the prerequisite list. No changes
to the functionality of the module itself.

=back