DiaColloDB/DiaColloDB/Corpus/Compiled.pod
##========================================================================
## POD DOCUMENTATION, auto-generated by podextract.perl
##========================================================================
## NAME
=pod
=head1 NAME
DiaColloDB::Corpus::Compiled - collocation db, source corpus (pre-compiled)
=cut
##========================================================================
## SYNOPSIS
=pod
=head1 SYNOPSIS
##========================================================================
## PRELIMINARIES
use DiaColloDB::Corpus::Compiled;
##========================================================================
## Constructors etc.
$corpus = $CLASS_OR_OBJECT->new(%args);
##========================================================================
## Persistent API
@keys = $obj->headerKeys();
@files = $obj->diskFiles();
$bool = $obj->unlink(%opts);
##========================================================================
## Corpus API
##-- Corpus API: open/close
$bool = $corpus->open([$dbdir], %opts); ##-- compat;
$bool = $corpus->close();
##-- Corpus API: iteration
$nfiles = $corpus->size();
$bool = $corpus->iok();
$label = $corpus->ifile();
$doc_or_undef = $corpus->idocument();
##========================================================================
## Compiled API
$ccorpus = $CLASS_OR_OBJECT->create($src_corpus, %opts);
$ccorpus = $CLASS_OR_OBJECT->union(\@sources, %opts);
##========================================================================
## Convenience Methods
$bool = $corpus->opened();
$bool = $corpus->flush();
$corpus = $corpus->reopen(%opts);
$dirname = $corpus->datadir();
$bool = $corpus->truncate();
$filters = $ccorpus->filters();
=cut
##========================================================================
## DESCRIPTION
=pod
=head1 DESCRIPTION
DiaColloDB::Corpus::Compiled is an intermediate abstraction layer
for storing pre-filtered corpus data in a format suitable for fast I/O.
It should not be necessary for end users to use this class directly,
since the L<DiaColloDB::create()|DiaColloDB::compile/create> method should
implicitly create a (temporary) C<DiaColloDB::Corpus::Compiled> object
whenever required.
=cut
##----------------------------------------------------------------
## DESCRIPTION: DiaColloDB::Corpus::Compiled: Globals & Constants
=pod
=head2 Globals & Constants
=over 4
=item Variable: @ISA
C<DiaColloDB::Corpus::Compiled>
inherits from L<DiaColloDB::Corpus|DiaColloDB::Corpus>
and supports all L<DiaColloDB::Corpus|DiaColloDB::Corpus> methods.
=back
=cut
##----------------------------------------------------------------
## DESCRIPTION: DiaColloDB::Corpus::Compiled: Constructors etc.
=pod
=head2 Constructors etc.
=over 4
=item new
$corpus = $CLASS_OR_OBJECT->new(%args);
%args, object structure:
(
##-- NEW in DiaColloDB::Corpus::Compiled
dbdir => $dbdir, ##-- data directory for compiled corpus
flags => $flags, ##-- open mode flags (fcntl flags or perl-style; default='r')
filters => \%filters, ##-- corpus filters ( DiaColloDB::Corpus::Filters object or HASH-ref )
njobs => $njobs, ##-- number of parallel worker jobs for create(); default=-1 (= nCores)
temp => $bool, ##-- implicitly unlink() on exit?
logThreads => $level ##-- log-level for thread stuff (default='off')
##
##-- INHERITED from DiaColloDB::Corpus
#files => \@files, ##-- source files (OVERRIDE: unused)
#dclass => $dclass, ##-- DiaColloDB::Document subclass for loading (OVERRIDE forces 'DiaColloDB::Document::JSON')
dopts => \%opts, ##-- options for $dclass->fromFile() (override default={})
cur => $i, ##-- index of current file
logOpen => $level, ##-- log-level for open(); default='info'
)
Implicitly calls the L<open()|open> method if the C<dbdir> property is defined.
=item DESTROY
Destructor implicitly calls the L<close()|close> method,
and may also implicitly call L<unlink()|unlink> if the C<temp> property
is true.
=back
=cut
##----------------------------------------------------------------
## DESCRIPTION: DiaColloDB::Corpus::Compiled: Persistent API
=pod
=head2 Persistent API
=over 4
=item headerKeys
@keys = $obj->headerKeys();
Override filters out more object-specific keys.
=item diskFiles
@files = $obj->diskFiles();
Returns disk storage files; override returns the singleton list
C<$obj-E<gt>{dbdir}>.
=item unlink
$bool = $obj->unlink(%opts);
Removes all disk file(s) associated with the object.
Override accepts additional %opts:
close => $bool, ##-- call $obj->close() before unlinking? (default=1)
=back
=cut
##----------------------------------------------------------------
## DESCRIPTION: DiaColloDB::Corpus::Compiled: Corpus API
##----------------------------------------------------------
## Corpus API: open/close
=pod
=head2 Corpus API: open/close
=over 4
=item open
$bool = $corpus->open([$dbdir], %opts); ##-- compat
$bool = $corpus->open($dbdir, %opts); ##-- new
Opens compiled corpus directory C<$dbdir>,
which must be specified as either a simple scalar or a singleton
ARRAY-ref, or must already be defined as C<$corpus-E<gt>{dbdir}> or C<$opts{dbdir}>.
Superclass %opts accepted by L<DiaColloDB::Corpus|DiaColloDB::Corpus>:
compiled => $bool, ##-- implicitly true here
glob => $bool, ##-- (ignored here) whether to glob arguments
list => $bool, ##-- (ignored here) whether arguments are file-lists
=item close
$bool = $corpus->close();
Closes the currently opened corpus, if any.
Override implicitly calls L<$corpus-E<gt>flush()|flush>
if C<$corpus> is opened in write-mode.
=back
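
The ways in which C<$dbdir> can reach C<open()> can be pictured with a small stand-alone sketch.
The helper C<resolve_dbdir()> is hypothetical and not part of DiaColloDB, and the precedence it uses
(explicit argument, then C<$opts{dbdir}>, then C<$corpus-E<gt>{dbdir}>) is an assumption made for
illustration only; the actual method may resolve these differently.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of DiaColloDB): choose the directory open() would use.
# ASSUMPTION: explicit argument first, then $opts{dbdir}, then $corpus->{dbdir}.
sub resolve_dbdir {
  my ($corpus, $dbdir, %opts) = @_;
  if (ref($dbdir) eq 'ARRAY') {
    # "compat" calling form: a singleton ARRAY-ref holding the directory name
    die "expected exactly one directory" if @$dbdir != 1;
    $dbdir = $dbdir->[0];
  }
  # fall back to the option, then to the value stored in the object
  return $dbdir // $opts{dbdir} // $corpus->{dbdir};
}
```

With this sketch, C<resolve_dbdir($corpus, ['corpus.d'])> and C<resolve_dbdir($corpus, 'corpus.d')>
both yield the same directory name.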
=cut
##----------------------------------------------------------
## Corpus API: iteration
=pod
=head2 Corpus API: iteration
=over 4
=item size
$nfiles = $corpus->size();
Returns total number of file(s) in the corpus (constant time).
=item iok
$bool = $corpus->iok();
True if corpus file-iterator is valid.
=item ifile
$label = $corpus->ifile();
$label = $corpus->ifile($pos);
Get current iterator filename (first form),
or filename at index C<$pos> (second form).
Override always returns filenames of the form
C<"$corpus-E<gt>{dbdir}/$pos.json">.
=item idocument
$doc_or_undef = $corpus->idocument();
$doc_or_undef = $corpus->idocument($pos);
Gets current document (first form)
or document at index C<$pos> (second form).
=back
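
As a rough usage sketch, the iteration methods above can be combined to list the labels of all
documents in a compiled corpus. The function name C<compiled_labels()> is hypothetical, and the
assumption that valid C<$pos> values run from 0 to C<size()-1> is not confirmed by this document.
The sketch returns an empty list if DiaColloDB is not installed or the corpus cannot be opened.

```perl
use strict;
use warnings;

# Sketch only: collect the label of every document in a compiled corpus directory.
# ASSUMPTION: $pos indices run from 0 to size()-1.
sub compiled_labels {
  my ($dbdir) = @_;
  # DiaColloDB may not be installed; return an empty list in that case
  return () unless eval { require DiaColloDB::Corpus::Compiled; 1 };
  # new() implicitly calls open() when dbdir is defined; trap any failure
  my $corpus = eval { DiaColloDB::Corpus::Compiled->new(dbdir => $dbdir, flags => 'r') }
    or return ();
  return () unless $corpus->opened();
  my @labels = map { $corpus->ifile($_) } 0 .. $corpus->size() - 1;
  $corpus->close();
  return @labels;
}
```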
=cut
##----------------------------------------------------------------
## DESCRIPTION: DiaColloDB::Corpus::Compiled: Corpus::Compiled API
=pod
=head2 Corpus::Compiled API
=over 4
=item create
$ccorpus = $CLASS->create($src_corpus, %opts);
$ccorpus = $ccorpus->create($src_corpus, %opts);
Compile or append a single C<$src_corpus> to the compiled corpus directory C<$opts{dbdir}>.
If specified, C<%opts> overrides C<%$ccorpus> properties.
Returns a (possibly new) DiaColloDB::Corpus::Compiled object $ccorpus.
Honors perl- or fcntl-style C<$opts{flags}> for append and truncate.
Parses all document file(s) from C<$src_corpus>, applies
the corpus content filters specified by the HASH-ref or
L<DiaColloDB::Corpus::Filters> object specified by C<$ccorpus-E<gt>{filters}>,
and saves the compiled data to the compiled corpus directory C<$ccorpus-E<gt>{dbdir}>.
If the L<threads|threads> module is available, compilation may
use multiple parallel threads as specified by the C<$DiaColloDB::NJOBS> variable;
see L<DiaColloDB::Utils::nJobs()|DiaColloDB::Utils/nJobs> for details.
=item union
$ccorpus = $CLASS->union(\@sources, %opts);
$ccorpus = $ccorpus->union(\@sources, %opts);
Merges pre-compiled corpora C<\@sources> to the output directory C<$opts{dbdir}>.
If specified, C<%opts> overrides C<%$ccorpus> properties.
Returns a (possibly new) DiaColloDB::Corpus::Compiled object $ccorpus
representing the union over C<@sources>.
Honors C<$ccorpus-E<gt>{flags}> for append and truncate.
Each $src in \@sources is either a DiaColloDB::Corpus::Compiled object or a simple scalar
(which is interpreted as the C<dbdir> of a DiaColloDB::Corpus::Compiled object).
No content filters are applied, and output data files are created as
links to the input data-files from @sources (hard-links if possible, otherwise symbolic links).
=back
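
The linking strategy described for C<union()> (hard-links if possible, otherwise symbolic links)
can be sketched with Perl's built-in C<link()> and C<symlink()> functions. The helper
C<link_or_symlink()> is hypothetical and is not the code used by this module; it only
illustrates the fallback idea.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of DiaColloDB): make $dst refer to the same data
# as $src, preferring a hard link and falling back to a symbolic link.
# Pass an absolute $src, since a relative symlink target is resolved
# relative to the directory containing $dst.
sub link_or_symlink {
  my ($src, $dst) = @_;
  return 'hard' if link($src, $dst);
  # symlink() raises an exception on platforms that lack it, hence the eval
  return 'symbolic' if eval { symlink($src, $dst) };
  die "cannot link '$src' to '$dst': $!";
}
```

Because no data is copied in either case, a union built this way stays cheap even for large
source corpora, but the output remains tied to the source files when symbolic links are used.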
=cut
##----------------------------------------------------------------
## DESCRIPTION: DiaColloDB::Corpus::Compiled: Convenience Methods
=pod
=head2 Convenience Methods: disk files etc.
=over 4
=item datadir
$dirname = $corpus->datadir();
$dirname = $corpus->datadir($dir);
Wrapper for C<$corpus-E<gt>{dbdir}>.
=item truncate
$bool = $corpus->truncate();
Removes all disk data (including header) and resets C<$corpus-E<gt>{size}> to 0 (zero).
=item filters
$filters = $ccorpus->filters();
Return corpus content filters as a L<DiaColloDB::Corpus::Filters|DiaColloDB::Corpus::Filters> object.
=back
=cut
##----------------------------------------------------------------
## DESCRIPTION: DiaColloDB::Corpus::Compiled: Compiled API: open/close
=pod
=head2 Convenience Methods: open/close
=over 4
=item opened
$bool = $corpus->opened();
Returns true iff $corpus is currently opened.
=item flush
$bool = $corpus->flush();
Writes any pending corpus data (e.g. header) to disk.
=item reopen
$corpus = $corpus->reopen(%opts);
Closes and re-opens the corpus, e.g. with different C<flags>.
=back
=cut
##========================================================================
## END POD DOCUMENTATION, auto-generated by podextract.perl
##======================================================================
## Footer
##======================================================================
=pod
=head1 AUTHOR
Bryan Jurish E<lt>moocow@cpan.orgE<gt>
=head1 COPYRIGHT AND LICENSE
Copyright (C) 2015-2020 by Bryan Jurish
This package is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.14.2 or,
at your option, any later version of Perl 5 you may have available.
=head1 SEE ALSO
L<dcdb-corpus-compile.perl(1)|dcdb-corpus-compile.perl>,
L<dcdb-create.perl(1)|dcdb-create.perl>,
L<DiaColloDB::Corpus::Filters(3pm)|DiaColloDB::Corpus::Filters>,
L<DiaColloDB::Corpus(3pm)|DiaColloDB::Corpus>,
L<DiaColloDB(3pm)|DiaColloDB>,
L<perl(1)|perl>,
...
=cut