Marpa-R3/pod/External/Basic.pod
# Marpa::R3 is Copyright (C) 2018, Jeffrey Kegler.
#
# This module is free software; you can redistribute it and/or modify it
# under the same terms as Perl 5.10.1. For more details, see the full text
# of the licenses in the directory LICENSES.
#
# This program is distributed in the hope that it will be
# useful, but it is provided "as is" and without any express
# or implied warranties. For details, see the full text of
# of the licenses in the directory LICENSES.
=head1 Name
Marpa::R3::External::Basic - External scanning, basics
=head1 Synopsis
=for Marpa::R3::Display
name: recognizer read/resume synopsis
partial: 1
normalize-whitespace: 1
my @pause_location;
my $recce = Marpa::R3::Recognizer->new(
{
grammar => $parser->{grammar},
event_handlers => {
'before lstring' => sub () {
( undef, undef, undef, @pause_location ) = @_;
'pause';
},
}
}
);
my $length = length $string;
for (
my $pos = $recce->read( \$string );
$pos < $length;
$pos = $recce->resume()
)
{
my $start = $pause_location[1];
my $length = $pause_location[2];
my $value = substr $string, $start + 1, $length - 2;
$value = decode_string($value) if -1 != index $value, '\\';
$recce->lexeme_read_block( 'lstring', $value, undef, $start, $length ) // die;
} ## end for ( my $pos = $recce->read( \$string ); $pos < $length...)
my $per_parse_arg = bless {}, 'MarpaX::JSON::Actions';
my $value_ref = $recce->value($per_parse_arg);
return ${$value_ref};
=for Marpa::R3::Display::End
=head1 About this document
This page describes B<external scanning>.
By default, Marpa::R3 scans based on the L0 grammar in
its DSL.
This DSL-driven scanning is called B<internal scanning>.
But
many applications find it useful or necessary to
do their own scanning in procedural code.
In Marpa::R3 this is called B<external scanning>.
External scanning can be used as a replacement for internal scanning.
Marpa::R3 also allows applications to switch back and
forth between internal and external scanning.
=head1 Lexemes
In external scanning, the app controls tokenization directly.
External scanning might also be called one-by-one scanning because,
in external scanning, the app feeds lexemes to Marpa::R3 one-by-one.
This differs from internal scanning -- in internal scanning
Marpa::R3 tokenizes a string for the app.
Every lexeme has three things associated with it:
=over 4
=item 1.
A B<symbol name>, which is required.
The symbol name must be the name of a lexeme in
both the L0 and G1 grammars.
The symbol name
tells the parser which symbol represents this lexeme to
the Marpa semantics.
The symbol name, in other words,
connects the lexeme to the grammar.
=item 2.
A B<symbol value> or B<value>, which may be undefined.
The
value of the lexeme is also seen by the semantics.
=item 3.
A B<literal equivalent>, which is required
and must be a span in the input.
The literal equivalent is needed for
the messages produced by
tracing, debugging, error reporting, etc.
If more than one lexeme ends at the same G1 location --
which can happen if lexemes are ambiguous --
all of the lexemes must have the same literal equivalent.
=back
=head1 High level and low level methods
In scanning externally, you can use high level or
low level methods.
The simpler methods, and the ones which most
users will want,
are the high level methods.
They are described in this document.
The low level external scanning methods are
described in
L<Marpa::R3::External::Low>.
=head1 High level methods in general
The high levels scanning methods
have almost all of their behaviors in common.
For convenience, therefore, the usual behaviors
of the completion methods are described in the section,
and exceptions to these behaviors are noted
in the descriptions of the individual methods.
Every call of an external scanning method
is made with a specified block span.
How that the block span is specified varies by method.
Unless the call results in
a hard failure,
on return it leaves a valid current block span.
For the purposes of this section, let the specified block
span
be C<< <$block_id, $offset, $length> >>.
Also for the purposes of this section,
we will define B<eolexeme>,
or "end of lexeme",
as C<$offset> + C<$length>.
External scanning can succeed or fail.
If external scanning fails, the failure may be hard
or soft.
The only soft failure that can occur in external scanning
is the rejection of a lexeme.
=head2 Successful reads
An external scanning methods reads a lexeme
if it is successful and no pre-lexeme event occurs.
In the case of a successful read,
the current block is set to C<$block_id>;
the offset of the current block is set to C<eolexeme>;
and the eoread of the current block will
not be changed.
Also in the case of a successful read,
the current G1 location is advanced by one.
The lexeme just read will start at the previous G1 location
and end at the new current G1 location.
When we speak simply of the G1 location of a lexeme,
we refer to its end location, so that
the G1 location of the lexeme is considered to be
the new current G1 location.
A
L<parse event|Marpa::R3::Event> may occur
during a successful read.
The parse event may call an event handler.
The event handler will see
the block and G1 locations as just described.
=head2 Pre-lexeme events
An external scanning method may succeed with a
pre-lexeme event.
In that case,
the current block is set to C<$block_id>.
The offset of the current block is set to the event location,
which will be the same as C<$offset>.
The eoread of the current block will
not be changed.
If a pre-lexeme event does occur,
no lexeme is read.
The current G1 location will remain where it was
before external scanning.
The pre-lexeme
L<parse event|Marpa::R3::Event>
may call an event handler.
The event handler will see
the block and G1 locations as just described.
=head2 Soft failure
If a high level external scanning method rejects a lexeme,
then that method results in a soft failure.
In this case the current block data
remains unchanged.
On soft failure,
no lexeme is read.
The current G1 location will remain where it was
before the method call.
=head2 Hard failure
Any failure in external scanning completion,
other than lexeme rejection,
is a hard failure.
In the case of a hard failure,
no guarantee is made about the current block data,
or about the current G1 location.
On hard failure,
Marpa::R3 will attempt to leave
the block location at
an "error location" -- a location
as relevant as possible to the error.
Marpa::R3 will attempt to leave the current G1 location valid and unchanged.
=head1 High-level mutators
Most applications doing external scanning
will want to use the high-level methods.
The L<C<< $recce->lexeme_read_string() >> method|/"lexeme_read_string()">
allows the reading of a string, where the string is both
the literal equivalent of the input, and its value for semantics.
The L<C<< $recce->lexeme_read_literal() >> method|/"lexeme_read_literal()">
is similar, but the string is specified as a block span.
The L<C<< $recce->lexeme_read_block() >> method|/"lexeme_read_block()">
is the most general of the high-level external scanning methods.
C<lexeme_read_block()> allows the app to specify the literal equivalent
and the value separately.
=head2 lexeme_read_block()
=for Marpa::R3::Display
name: recognizer lexeme_read_block() synopsis
partial: 1
normalize-whitespace: 1
my $ok = $recce->lexeme_read_block($symbol_name, $value,
$main_block, $start_of_lexeme, $lexeme_length);
die qq{Parser rejected lexeme "$long_name" at position $start_of_lexeme, before "},
$recce->literal( $main_block, $start_of_lexeme, 40 ), q{"}
if not defined $ok;
=for Marpa::R3::Display::End
C<lexeme_read_block()> is the basic method for external scanning.
It takes five arguments, only the first of which is required.
Call them, in order,
C<$symbol_name>,
C<$value>,
C<$block_id>,
C<$offset>,
and
C<$length>.
The C<$symbol_name> argument is the symbol name of the lexeme
to scan.
The C<$value> argument will be the value of the lexeme.
If C<$value> is missing or undefined,
the value of the lexeme will be a Perl C<undef>.
The C<$block_id>, C<$offset>, and C<$length> arguments
are the literal equivalent of the lexeme, as a
L<block span|/"Block spans">.
C<lexeme_read_block()> is a high level method
and details of its behavior are
L<as described above|"High level methods in general">.
B<Return values>:
On success, C<lexeme_read_block()> returns the new current offset.
Soft failure occurs if and only if
the lexeme was rejected.
On soft failure,
C<lexeme_read_block()> returns a Perl C<undef>.
Other failures are thrown as exceptions.
=head2 lexeme_read_literal()
=for Marpa::R3::Display
name: recognizer lexeme_read_literal() synopsis
partial: 1
normalize-whitespace: 1
my $ok = $recce->lexeme_read_literal($symbol_name, $main_block, $start_of_lexeme, $lexeme_length);
die qq{Parser rejected lexeme "$long_name" at position $start_of_lexeme, before "},
$recce->literal( $main_block, $start_of_lexeme, 40 ), q{"}
if not defined $ok;
=for Marpa::R3::Display::End
C<lexeme_read_literal()>
takes four arguments, only the first of which is required.
Call them, in order,
C<$symbol_name>,
C<$block_id>,
C<$offset>,
and
C<$length>.
The C<$symbol_name> argument is the symbol name of the lexeme
to scan.
The C<$block_id>, C<$offset>, and C<$length> arguments
are the literal equivalent of the lexeme, as a
L<block span|/"Block spans">.
The value of the lexeme will be the same
as its literal equivalent.
C<lexeme_read_literal()> is a high level method
and details of its behavior are
L<as described above|"High level methods in general">.
=for Marpa::R3::Display
ignore: 1
$recce->lexeme_read_literal($symbol, $start, $length, $value)
=for Marpa::R3::Display::End
is roughly equivalent to
=for Marpa::R3::Display
name: recognizer lexeme_read_literal() high-level equivalent
normalize-whitespace: 1
sub read_literal_equivalent_hi {
my ( $recce, $symbol_name, $block_id, $offset, $length ) = @_;
my $value = $recce->literal( $block_id, $offset, $length );
return $recce->lexeme_read_block( $symbol_name, $value, $block_id, $offset, $length );
}
=for Marpa::R3::Display::End
B<Return values>:
On success, C<lexeme_read_literal()> returns the new current offset.
Soft failure occurs if and only if
the lexeme was rejected.
On soft failure,
C<lexeme_read_literal()> returns a Perl C<undef>.
Other failures are thrown as exceptions.
=head2 lexeme_read_string()
=for Marpa::R3::Display
name: recognizer lexeme_read_string() synopsis
partial: 1
normalize-whitespace: 1
my $ok = $recce->lexeme_read_string( $symbol_name, $lexeme );
die qq{Parser rejected lexeme "$long_name" at position $start_of_lexeme, before "},
$recce->literal( $main_block, $start_of_lexeme, 40 ), q{"}
if not defined $ok;
=for Marpa::R3::Display::End
The C<lexeme_read_string()> method takes 2 arguments, both required.
Call them, in order, C<$symbol_name> and C<$string>.
C<$symbol_name>
is the symbol name of the lexeme
to be read.
C<$string>
is a string which becomes both the value of the lexeme
and its literal equivalent.
C<lexeme_read_string()> is a high level
method and, with two important exceptions,
the details of its behavior are
L<as described above|"High level methods in general">.
The first difference is that,
on success,
C<lexeme_read_string()>
creates a new input text block,
using C<$string> as its text.
We'll call this block the "per-string block".
The literal equivalent of the lexeme
will be the per-string block, starting at offset 0 and ending at eoblock.
The second difference is that,
after a successful call to C<lexeme_read_string()>,
the per-string block does B<not> become the new current block.
The current block data after a call to
C<lexeme_read_string()>
will be the same as it was before
the call to C<lexeme_read_string()>.
For most purposes, then,
the per-string block
is invisible to the
app that called
C<lexeme_read_string()>.
Apps which trace or keep track of the details of the input text blocks
may notice the additional block.
Also, event handlers
which
trigger during the C<lexeme_read_string()> method
will see the per-string block.
=for Marpa::R3::Display
ignore: 1
$recce->lexeme_read_string($symbol, $string)
=for Marpa::R3::Display::End
is roughly equivalent to
=for Marpa::R3::Display
name: recognizer lexeme_read_string() high-level equivalent
normalize-whitespace: 1
sub read_string_equivalent_hi {
my ( $recce, $symbol_name, $string ) = @_;
my ($save_block) = $recce->block_progress();
my $new_block = $recce->block_new( \$string );
my $return_value = $recce->lexeme_read_literal( $symbol_name, $new_block );
$recce->block_set($save_block);
return $return_value;
}
=for Marpa::R3::Display::End
C<lexeme_read_string()> is not designed for
very long values of C<$string>.
For efficiency with long strings,
use the equivalent in terms of C<lexeme_read_literal()>, as just shown.
C<lexeme_read_literal()> sets the value of the lexeme to a span
of an input text block,
while C<lexeme_read_string()> sets the value of the lexeme to a string.
Marpa::R3 optimizes lexeme values when they are literals in its
input text blocks.
B<Return values>:
On success, C<lexeme_read_string()> returns the new current offset.
Soft failure occurs if and only if
the lexeme was rejected.
On soft failure,
C<lexeme_read_string()> returns a Perl C<undef>.
Other failures are thrown as exceptions.
=head1 COPYRIGHT AND LICENSE
=for Marpa::R3::Display
ignore: 1
Marpa::R3 is Copyright (C) 2018, Jeffrey Kegler.
This module is free software; you can redistribute it and/or modify it
under the same terms as Perl 5.10.1. For more details, see the full text
of the licenses in the directory LICENSES.
This program is distributed in the hope that it will be
useful, but without any warranty; without even the implied
warranty of merchantability or fitness for a particular purpose.
=for Marpa::R3::Display::End
=cut
# vim: expandtab shiftwidth=4: