Yet Another CPAN Grep

Table-Readable/lib/Table/Readable.pod


=encoding UTF-8

=head1 NAME

Table::Readable - Simple-to-edit tables of data

=head1 SYNOPSIS

    use FindBin '$Bin';
    use Table::Readable qw/read_table/;
    my @list = read_table ("$Bin/file.txt");
    for my $entry (@list) {
        for my $k (keys %$entry) {
            print "$k $entry->{$k}\n";
        }
    }
    


produces output

    en Residual Current Device
    ja 配線用遮断器
    de Fehlerstrom-Schutzschalter


(This example is included as L<F<synopsis.pl>|https://fastapi.metacpan.org/source/BKB/Table-Readable-0.05/examples/synopsis.pl> in the distribution.)


=head1 VERSION

This documents Table::Readable version 0.05
corresponding to git commit L<c5bf0b4cfb273eaa66a9d14c56a59527a86c3e11|https://github.com/benkasminbullock/Table-Readable/commit/c5bf0b4cfb273eaa66a9d14c56a59527a86c3e11> released on Sun Feb 7 09:47:05 2021 +0900.

=head1 DESCRIPTION

Table::Readable provides a format for human-editable tables of
information which a computer can read. By design, the format does not
support any kind of nesting, and can only be text in UTF-8 encoding.

=head1 FUNCTIONS

=head2 read_table

    my @table = read_table ("list_file.txt");

Read one table of information from the specified file. Each row of
information is stored as an anonymous hash. The return value is an
array. It dies if not called in array context.

Each row of the table consists of key/value pairs. The key/value pairs
are given in the form

    key: value

If the key has spaces

    key with spaces: value

then it is turned into C<key_with_spaces> in the anonymous hash.

Rows are separated by a blank line.

So, for example

    row: first
    data: some information

    row: second
    data: more information
    gubbins: guff here

defines two rows, the first one gets a hash reference with entries
C<row> and C<data>, and the second one is a hash reference with
entries C<row> and C<data> and C<gubbins>, each containing the
information on the right of the colon.

If the key begins with two percentage symbols,

    %%key:

then it marks the beginning of a multiline value which continues until
the next line which begins with two percentage symbols. Thus

    %%key:

    this is the value

    %%

assigns "this is the value" to "key".

If the key contains spaces, these are replaced by underscores. For example,

    this key: value

becomes C<this_key> in the output. Whitespace before the colon is also
converted, so

    this key : value

becomes C<this_key_> in the output, with an underscore at the end.

Comments can be added to the table using lines with # as the first
character.

The file is assumed to be in the UTF-8 encoding.

=head3 Read from a scalar

    my @table = read_table ($stuff, scalar => 1);

Read from a scalar in C<$stuff>.

=head2 read_table_hash

    my $hash = read_table_hash ('table.txt', 'id');

    my ($hash, $order) = read_table_hash ('table.txt', 'id');
    for (@$order) {
        print $hash->{$_}{value}, "\n";
    }

This reads the table specified in the first argument, then creates a
hash reference using the key specified as the second argument. If some
entries of the table do not have the specified key, or if some entries
have the same value for the key, warnings are printed.

=head2 write_table

    write_table (\@table, 'file.txt');

Write the table in C<@table> to F<file.txt>. It insists on an array
reference containing hash references, each of which has simple scalars
as values.

This does not convert underscores in the keys into spaces.

If the name of the file is omitted, it prints to STDOUT.

    write_table (\@table);

If the caller asks for a return value, it returns the table as a
string rather than printing it.

    my $table = write_table (\@table);

If the width of an output line exceeds a maximum length, the entry is
written using the L<multiline format|/Multiline values>. This maximum length is
available as the global variable C<$Table::Readable::maxlen>. The
default value is 75.

=head1 TABLE FORMAT

This section gives exact details of the format of the tables.

The table takes the format

    key1: value
    key2: value

    key1: another value
    key2: yet more values

where rows of the table are separated by a blank line, and the columns
of each row are defined by giving the name of the column, followed by
a colon, followed by the value.

=head2 Blank lines

A blank line may contain spaces (something which matches C<\s>).

=head2 Comments

Lines containing a hash character '#' at the beginning of the line are
ignored. However, lines containing a hash character '#' within
multiline entries are considered part of the entry, not comments. Hash
characters at positions other than the start of a line are not
considered comments, and are not ignored.

Comments are not considered blank lines for the purpose of separating
table rows.

    
    use Table::Readable 'read_table';
    my $table = <<EOF;
    row: 1
    # comment
    some: thing
    
    row: 2
    EOF
    my @rows = read_table ($table, scalar => 1);
    print scalar (@rows), "\n";
    


produces output

    2


(This example is included as L<F<comment-not-row.pl>|https://fastapi.metacpan.org/source/BKB/Table-Readable-0.05/examples/comment-not-row.pl> in the distribution.)


=head2 Encoding

The file must be encoded in the UTF-8 encoding.

=head2 Unparseable lines

Lines which are not part of a multiline value, are not comments, and
do not contain a key, are discarded and a warning is printed.

=head2 Values

=head3 Multiline values

    %%key1:

    value goes here.

    %%

Multiline values begin and end with two percent characters at the
beginning of the line. Between the two percent characters there may be
any number of blank lines. Whitespace (anything matching C<\s>) is
stripped from the beginning and end of the value. 

There is no way to have double percent characters at the beginning of
a line within a multiline value, so if you need double percents, you
must use a different syntax and then post-process the entry to convert
your syntax to double percent characters.

=head3 Whitespace

Whitespace (anything matching C<\s>) is stripped from the beginning
and end of the value.  Leading and trailing whitespace can be preserved by preceding it
with a backslash character:

    
    use utf8;
    use Table::Readable 'read_table';
    my $table =<<'EOF';
    a: \  b     
    %%c:
    \
    
    d
    
    %%
    %%e:
    
    f
    
    !
    
    
    \
    %%
    EOF
    my @entries = read_table ($table, scalar => 1);
    for my $k (keys %{$entries[0]}) {
        my $v = $entries[0]{$k};
        $v =~ s/!$//;
        print "'$k' = '$v'\n";
    }
    


produces output

    'a' = '  b'
    'e' = 'f
    
    !
    
    
    '
    'c' = '
    
    d'


(This example is included as L<F<slash.pl>|https://fastapi.metacpan.org/source/BKB/Table-Readable-0.05/examples/slash.pl> in the distribution.)


If you actually need a backslash at the start or end of your string,
use a double backslash, C<\\>. In parts of the string other than the
first or the last position, double backslashes are not treated
specially.

Alternatively you could use your own syntax such
as the following.

    
    use Table::Readable 'read_table';
    my $table =<<EOF;
    a: b     
    %%c:
    
    d
    
    %%
    %%e:
    
    f
    
    !
    %%
    EOF
    my @entries = read_table ($table, scalar => 1);
    for my $k (keys %{$entries[0]}) {
        my $v = $entries[0]{$k};
        $v =~ s/!$//;
        print "'$k' = '$v'\n";
    }
    


produces output

    'e' = 'f
    
    '
    'c' = 'd'
    'a' = 'b'


(This example is included as L<F<whitespace.pl>|https://fastapi.metacpan.org/source/BKB/Table-Readable-0.05/examples/whitespace.pl> in the distribution.)


=head3 Empty values

Keys without values, like

    key:

are permitted within the table. A key with no value results in the
value for that key being an empty string, rather than the undefined
value.

=head2 Keys

=head3 Key syntax

A key is a series of one or more of any characters whatsoever except
for colons. In regular expression language, a key matches $2 in the
following:

    ^(%%)?([^:]+)

Keys cannot contain colons, so if you need to have colons in your
keys, invent your own escape sequence, such as substituting semicolons
or @ marks for colons.

=head3 Consistency of keys

There is no requirement that the keys in one row of the table have to
be the same as the keys in the subsequent row. Each row of the table
may have completely inconsistent keys. If you need consistent keys,
add a post-processor of your own.

=head3 Uniqueness of keys

Keys within a single row must be unique. A duplicate key within a row
causes a fatal error.

=head2 Design and motivation

This module and the associated format were born out of exasperation
with various complicated file formats, and the associated complicated
parser software. In particular I originally made this module and
format as an alternative to using the TMX format for translation
memory files, and also out of frustration with the L<AppConfig>
module. I currently use this to store translations, such as
L<http://kanji.sljfaq.org/translations.txt>, and files of tabular
information, such as
L<https://www.lemoda.net/unix/troff-dictionary/dictionary.txt>.

This format is designed to reduce the amount of mental effort
necessary to type in a machine-readable table of information. By
design, it adds only the most minimal possible interpretations to
characters. There are only five significant characters, the newline,
the colon, the hash character #, the percent character %, and the
backslash \. The hash character and the percent character are only
significant either when they come immediately after a new line or when
they are the first byte in the file, and the backslash is only
significant in conjunction with leading or trailing white space. The
multiline escape sequence is two percents at the beginning of a line,
a sequence which rarely occurs in normal text.

The minimalism of this module is intentional; I will never, ever, add
new syntax, extra escape characters, comments not at the end of lines,
nested tables, or multiple tables in one file to this format, and I
would gladly remove anything from it, if there was anything that could
possibly be removed. The reason for that is that every time one adds a
new facility, it adds yet another meaning to some sequence of
characters, which not only has to be remembered, but also has to be
programmed around by adding yet another escape. Let's say that I added
comments like this:

     key: value # this is a comment

then I would have to add yet another escape for the case where I
actually wanted to put a hash character inside a value, yet another
annoying bit of syntax to remember like

     key: value \# not a comment

The more one adds these kinds of meaningful characters, the more the
complexity, the more the bugs, the more the workarounds, the more the
fixes, and the more the number of things to remember, and the more the
headaches. No thanks!

=head1 OTHER

There is a Go mode and an Emacs mode for this format as well as the
CPAN Perl distribution. They are all part of the same github
repository, so you can report issues or make pull requests at the same
place as this.

=head2 Emacs mode

There is an Emacs mode for the format called F<table-readable-mode.el>
in the top directory of the CPAN distribution or L<in the github
repository|https://github.com/benkasminbullock/Table-Readable/blob/master/table-readable-mode.el>.

This includes highlighting of comments and makes it easier to format
paragraphs of multiline text.

At the moment it is restricted to lower-case alphabetical keys,
although that is not part of the format specification.

=head2 Go mode

There is L<a reader and writer of the format in Go in the github
repository|https://github.com/benkasminbullock/Table-Readable/blob/master/go> including tests. I'm not
currently making that much use of this code at the moment, since
string manipulation in Go is a nuisance compared to Perl, and I've
found it's usually easier to convert the tables to JSON in Perl then
read the JSON into a C<map[string]string> in Go.

=head1 DEPENDENCIES

=over

=item Carp

L<Carp> is used for printing error messages.

=back

=head1 EXPORTS

Nothing is exported by default. All functions can be exported on
request. A tag ":all" exports all the functions:

    use Table::Readable ':all';



=head1 AUTHOR

Ben Bullock, <bkb@cpan.org>

=head1 COPYRIGHT & LICENCE

This package and associated files are copyright (C) 
2010-2021
Ben Bullock.

You can use, copy, modify and redistribute this package and associated
files under the Perl Artistic Licence or the GNU General Public
Licence.
Maintained by Kenichi Ishigaki <ishigaki@cpan.org>. If you find anything, submit it on GitHub.