Yet Another CPAN Grep

Test-Against-Dev/lib/Test/Against/Build.pm

package Test::Against::Build;
use strict;
use 5.14.0;
our $VERSION = '0.14';
use Carp;
use Cwd;
use File::Basename;
use File::Path ( qw| make_path | );
use File::Spec;
use File::Temp ( qw| tempdir tempfile | );
use Archive::Tar;
use CPAN::cpanminus::reporter::RetainReports;
use Data::Dump ( qw| dd pp | );
use JSON;
use Path::Tiny;
use Text::CSV_XS;

=head1 NAME

Test::Against::Build - Test CPAN modules against specific Perl build

=head1 SYNOPSIS

    my $self = Test::Against::Build->new( {
        build_tree      => '/path/to/top/of/build_tree',
        results_tree    => '/path/to/top/of/results_tree',
        verbose => 1,
    } );

    my $gzipped_build_log = $self->run_cpanm( {
        module_file => '/path/to/cpan-river-file.txt',
        title       => 'cpan-river-1000',
        verbose     => 1,
    } );

    $ranalysis_dir = $self->analyze_cpanm_build_logs( { verbose => 1 } );

    $fcdvfile = $self->analyze_json_logs( { verbose => 1, sep_char => '|' } );

=head1 DESCRIPTION

=head2 Who Should Use This Library?

This library should be used by anyone who wishes to assess the impact of a
given build of the Perl 5 core distribution on the installability of libraries
found on the Comprehensive Perl Archive Network (CPAN).

=head2 The Problem to Be Addressed

=head3 The Perl Annual Development Cycle

Perl 5 undergoes an annual development cycle whose components typically include:

=over 4

=item * Individual commits to the Perl 5 F<git> repository

These commits may be identified by commit IDs (SHAs), branches or tags.

=item * Release tarballs

=over 4

=item * Monthly development release tarballs

Whose version numbers follow the convention of C<5.27.0>, C<5.27.1>,
etc., where the middle digits are always odd numbers.

=item * Release Candidate (RC) tarballs

Whose version numbers follow the convention of C<5.28.0-RC1>, C<5.28.0-RC2>,
C<5.28.1-RC1>.

=item * Production release tarballs

Whose version numbers follow the convention of C<5.28.0> (new release);
C<5.28.1>, C<5.28.2>, etc. (maintenance releases).

=back

=back

=head3 Measuring the Impact of Changes in Core on CPAN Modules

You can configure, build and install a F<perl> executable starting from any of
the above components and you can install CPAN modules against any such F<perl>
executable.  Given a list of specific CPAN modules, you may want to be able to
compare the results you get from trying to install that list against different
F<perl> executables built from different commits or releases at various points
in the development cycle.  To make such comparisons, you will need to have
data generated and recorded in a consistent format.  This library provides
methods for that data generation and recording.

=head2 High-Level View of What the Module Does

=head3 Tree Structure

For any particular attempt to build a F<perl> executable from any of the
starting points described above, F<Test::Against::Build> guarantees that there
exists on disk B<two> directory trees:

=over 4

=item 1 The build tree

The build tree is a directory beneath which F<perl>, other executables and
libraries will be installed (or already are installed).  As such, the
structure of this tree will look like this:

    top_of_build_tree/bin/
                      bin/perl
                      bin/perldoc
                      bin/cpan
                      bin/cpanm
    ...
    top_of_build_tree/lib/
                      lib/perl5/
                      lib/perl5/5.29.0/
                      lib/perl5/site_perl/
    ...
    top_of_build_tree/.cpanm/
    top_of_build_tree/.cpanreporter/

F<Test::Against::Build> presumes that you will be using Miyagawa's F<cpanm>
utility to install modules from CPAN.  The F<.cpanm> and F<.cpanreporter>
directories will be the locations where data concerning attempts to install CPAN
modules are recorded.

=item 2 The results tree

The results tree is a directory beneath which data parsed from the F<.cpanm>
directory is formatted and stored.  Its format looks like this:

    top_of_results_tree/analysis/
                        buildlogs/
                        storage/

=back

The names of the top-level directories are arbitrary; the names of their
subdirectories are not.  The top-level directories may be located anywhere
writable on disk and need not share a common parent directory.  It is the
F<Test::Against::Build> object which will establish a relationship between the
two trees.

=head3 Installation of F<perl> and F<cpanm>

F<Test::Against::Build> does B<not> provide you with methods to build or
install these executables.  It presumes that you know how to build F<perl>
from source, whether that be from a specific F<git> checkout or from a release
tarball.  It further presumes that you know how to install F<cpanm> against
that F<perl>.  It does provide a method to identify the directory you should
use as the value of the C<-Dprefix=> option to F<Configure>.  It also provides
methods to determine that you have installed F<perl> and F<cpanm> in the
expected locations.  Once that determination has been made, it provides you
with methods to run F<cpanm> against a specific list of modules, parse the
results into files in JSON format and then summarize those results in a
delimiter-separated-values file (such as a pipe-separated-values (C<.psv>)
file).

Why, you may ask, does F<Test::Against::Build> B<not> provide methods to
install these executables?  There are a number of reasons why not.

=over 4

=item * F<perl> and F<cpanm> already installed

You may already have on disk one or more F<perl>s built from specific commits
or release tarballs and have no need to re-install them.

=item * Starting from F<git> commit versus starting from a tarball

You can build F<perl> either way, but there's no need to have code in this
package to express both ways.

=item * Many ways to configure F<perl>

F<perl> configuration is a matter of taste.  The only thing which this package
needs to provide you is a value for the C<-Dprefix=> option.  It should go
without saying that if want to measure the impact on CPAN modules of two
different builds of F<perl>, you should call F<Configure> with exactly the
same set of options for each.

=back

The examples below will provide guidance.

=head1 METHODS

=head2 C<new()>

=over 4

=item * Purpose

Test::Against::Build constructor.  Guarantees that the build tree and results
tree have the expected directory structure.  Determines whether F<perl> and
F<cpanm> have already been installed or not.

=item * Arguments

    my $self = Test::Against::Build->new( {
        build_tree      => '/path/to/top/of/build_tree',
        results_tree    => '/path/to/top/of/results_tree',
        verbose => 1,
    } );

=item * Return Value

Test::Against::Build object.

=item * Comment

=back

=cut

sub new {
    my ($class, $args) = @_;

    croak "Argument to constructor must be hashref"
        unless ref($args) eq 'HASH';
    my $verbose = delete $args->{verbose} || '';
    for my $d (qw| build_tree results_tree |) {
        croak "Hash ref must contain '$d' element"
            unless $args->{$d};
        unless (-d $args->{$d}) {
            croak "Could not locate directory '$args->{$d}' for '$d'";
        }
        else {
            say "Located directory '$args->{$d}' for '$d'" if $verbose;
        }
    }
    # Crude test of difference of directories;
    # need to take into account, e.g., symlinks, relative paths
    croak "Arguments for 'build_tree' and 'results_tree' must be different directories"
        unless $args->{build_tree} ne $args->{results_tree};

    my $data;
    for my $k (keys %{$args}) {
        $data->{$k} = $args->{$k};
    }

    for my $subdir ( 'bin', 'lib' ) {
        my $dir = File::Spec->catdir($data->{build_tree}, $subdir);
        my $key = "${subdir}_dir";
        $data->{$key} = (-d $dir) ? $dir : undef;
    }
    for my $subdir ( '.cpanm', '.cpanreporter' ) {
        my $dir = File::Spec->catdir($data->{build_tree}, $subdir);
        my $key = "${subdir}_dir";
        $key =~ s{^\.(.*)}{$1};
        $data->{$key} = (-d $dir) ? $dir : undef;
    }
	$data->{PERL_CPANM_HOME} = $data->{cpanm_dir};
	$data->{PERL_CPAN_REPORTER_DIR} = $data->{cpanreporter_dir};

    for my $subdir ( qw| analysis buildlogs storage | ) {
        my $dir = File::Spec->catdir($data->{results_tree}, $subdir);
        unless (-d $dir) {
            my @created = make_path($dir, { mode => 0711 })
                or croak "Unable to make_path '$dir'";
        }
        my $key = "${subdir}_dir";
        $data->{$key} = $dir;
    }

    my $expected_perl = File::Spec->catfile($data->{bin_dir}, 'perl');
    $data->{this_perl} = (-e $expected_perl) ? $expected_perl : '';

    my $expected_cpanm = File::Spec->catfile($data->{bin_dir}, 'cpanm');
    $data->{this_cpanm} = (-e $expected_cpanm) ? $expected_cpanm : '';

    return bless $data, $class;
}

=head2 Accessors

The following accessors return the absolute path to the directories in their names:

=over 4

=item * C<get_build_tree()>

=item * C<get_bin_dir()>

=item * C<get_lib_dir()>

=item * C<get_cpanm_dir()>

=item * C<get_cpanreporter_dir()>

=item * C<get_results_tree()>

=item * C<get_analysis_dir()>

=item * C<get_buildlogs_dir()>

=item * C<get_storage_dir()>

=back

=cut

sub get_build_tree { my $self = shift; return $self->{build_tree}; }
sub get_bin_dir { my $self = shift; return $self->{bin_dir}; }
sub get_lib_dir { my $self = shift; return $self->{lib_dir}; }
sub get_cpanm_dir { my $self = shift; return $self->{cpanm_dir}; }
sub get_cpanreporter_dir { my $self = shift; return $self->{cpanreporter_dir}; }
sub get_results_tree { my $self = shift; return $self->{results_tree}; }
sub get_analysis_dir { my $self = shift; return $self->{analysis_dir}; }
sub get_buildlogs_dir { my $self = shift; return $self->{buildlogs_dir}; }
sub get_storage_dir { my $self = shift; return $self->{storage_dir}; }

sub get_this_perl {
    my $self = shift;
    if (! $self->{this_perl}) {
        croak "perl has not yet been installed; configure, build and install it";
    }
    else {
        return $self->{this_perl};
    }
}

=head2 C<is_perl_built()>

=over 4

=item * Purpose

Determines whether a F<perl> executable has actually been installed in the
directory returned by C<get_bin_dir()>.

=item * Arguments

None.

=item * Return Value

C<1> for yes; C<0> for no.

=back

=cut

sub is_perl_built {
    my $self = shift;
    if (! $self->{this_perl}) {
        my $expected_perl = File::Spec->catfile($self->get_bin_dir, 'perl');
        $self->{this_perl} = (-e $expected_perl) ? $expected_perl : '';
    }
    return ($self->{this_perl}) ? 1 : 0;
}

sub get_this_cpanm {
    my $self = shift;
    if (! $self->{this_cpanm}) {
        croak "cpanm has not yet been installed; configure, build and install it";
    }
    else {
        return $self->{this_cpanm};
    }
}


=head2 C<is_cpanm_built()>

=over 4

=item * Purpose

Determines whether a F<cpanm> executable has actually been installed in the
directory returned by C<get_bin_dir()>.

=item * Arguments

=item * Return Value

C<1> for yes; C<0> for no.

=back

=cut

sub is_cpanm_built {
    my $self = shift;
    if (! $self->{this_cpanm}) {
        my $expected_cpanm = File::Spec->catfile($self->get_bin_dir, 'cpanm');
        $self->{this_cpanm} = (-e $expected_cpanm) ? $expected_cpanm : '';
    }
    return ($self->{this_cpanm}) ? 1 : 0;
}

=head2 C<run_cpanm()>

=over 4

=item * Purpose

Use F<cpanm> to install selected Perl modules against the F<perl> built for
testing purposes.

=item * Arguments

Two mutually exclusive interfaces:

=over 4

=item * Modules provided in a list

    $gzipped_build_log = $self->run_cpanm( {
        module_list => [ 'DateTime', 'AnyEvent' ],
        title       => 'two-important-libraries',
        verbose     => 1,
    } );

=item * Modules listed in a file

    $gzipped_build_log = $self->run_cpanm( {
        module_file => '/path/to/cpan-river-file.txt',
        title       => 'cpan-river-1000',
        verbose     => 1,
    } );

=back

Each interface takes a hash reference with the following elements:

=over 4

=item * C<module_list> B<OR> C<module_file>

Mutually exclusive; must use one or the other but not both.

The value of C<module_list> must be an array reference holding a list of
modules for which you wish to track the impact of changes in the Perl 5 core
distribution over time.  In either case the module names are spelled in
C<Some::Module> format -- I<i.e.>, double-colons -- rather than in
C<Some-Module> format (hyphens).

=item * C<title>

String which will be used to compose the name of project-specific output
files.  Required.

=item * C<verbose>

Extra information provided on STDOUT.  Optional; defaults to being off;
provide a Perl-true value to turn it on.  Scope is limited to this method.

=back

=back

=cut

sub run_cpanm {
    my ($self, $args) = @_;
    croak "run_cpanm: Must supply hash ref as argument"
        unless ( ( defined $args ) and ( ref($args) eq 'HASH' ) );
    my $verbose = delete $args->{verbose} || '';
    my %eligible_args = map { $_ => 1 } ( qw|
        module_file module_list title
    | );
    for my $k (keys %$args) {
        croak "run_cpanm: '$k' is not a valid element"
            unless $eligible_args{$k};
    }
    if (exists $args->{module_file} and exists $args->{module_list}) {
        croak "run_cpanm: Supply either a file for 'module_file' or an array ref for 'module_list' but not both";
    }
    if ($args->{module_file}) {
        croak "run_cpanm: Could not locate '$args->{module_file}'"
            unless (-f $args->{module_file});
    }
    if ($args->{module_list}) {
        croak "run_cpanm: Must supply array ref for 'module_list'"
            unless ref($args->{module_list}) eq 'ARRAY';
    }

    unless (defined $args->{title} and length $args->{title}) {
        croak "Must supply value for 'title' element";
    }
    $self->{title} = $args->{title};

    # At this point, we must have real .cpanm and .cpanreporter directories

    for my $subdir ( '.cpanm', '.cpanreporter' ) {
        my $dir = File::Spec->catdir($self->get_build_tree(), $subdir);
        unless (-d $dir) {
            make_path($dir, { mode => 0711 })
                or croak "Unable to make_path $dir";
        }
        my $key = "${subdir}_dir";
        $key =~ s{^\.(.*)}{$1};
        $self->{$key} = $dir;
    }
	$self->{PERL_CPANM_HOME} = $self->{cpanm_dir};
	$self->{PERL_CPAN_REPORTER_DIR} = $self->{cpanreporter_dir};

    say "cpanm_dir: ", $self->get_cpanm_dir() if $verbose;
    local $ENV{PERL_CPANM_HOME} = $self->{PERL_CPANM_HOME};
    local $ENV{PERL_CPAN_REPORTER_DIR} = $self->{PERL_CPAN_REPORTER_DIR};

    my @modules = ();
    if ($args->{module_list}) {
        @modules = @{$args->{module_list}};
    }
    elsif ($args->{module_file}) {
        @modules = path($args->{module_file})->lines({ chomp => 1 });
    }
    my @cmd = (
        $self->get_this_perl,
        "-I$self->get_lib_dir",
        $self->get_this_cpanm,
        @modules,
    );
    eval {
        local $@;
        my $rv = system(@cmd);
        say "<$@>" if $@;
        if ($verbose) {
            say $self->get_this_cpanm(), " exited with ", $rv >> 8;
        }
    };
    my $gzipped_build_log = $self->gzip_cpanm_build_log();
    say "See gzipped build.log in $gzipped_build_log" if $verbose;

    return $gzipped_build_log;

}

sub gzip_cpanm_build_log {
    my ($self) = @_;
    my $build_log_link = File::Spec->catfile($self->get_cpanm_dir, 'build.log');
    croak "Did not find symlink for build.log at $build_log_link"
        unless (-l $build_log_link);
    my $real_log = readlink($build_log_link);
    $self->{timestamp} = (File::Spec->splitdir(dirname($real_log)))[-1] || '';

    my $gzipped_build_log_filename = join('.' => (
        $self->{title},
        #(File::Spec->splitdir(dirname($real_log)))[-1],
        $self->{timestamp},
        'build',
        'log',
        'gz'
    ) );
    my $gzlog = File::Spec->catfile(
        $self->get_buildlogs_dir,
        $gzipped_build_log_filename,
    );
    system(qq| gzip -c $real_log > $gzlog |)
        and croak "Unable to gzip $real_log to $gzlog";
    $self->{gzlog} = $gzlog;
}

=head2 C<analyze_cpanm_build_logs()>

=over 4

=item * Purpose

Parse the F<build.log> created by running C<run_cpanm()>, creating JSON files
which log the results of attempting to install each module in the list or
file.

=item * Arguments

    $ranalysis_dir = $self->analyze_cpanm_build_logs( { verbose => 1 } );

Hash reference which, at the present time, can only take one element:
C<verbose>.  Optional.

=item * Return Value

String holding absolute path to the directory holding F<.log.json> files for a
particular run of C<run_cpanm()>.

=item * Comment

=back

=cut

sub analyze_cpanm_build_logs {
    my ($self, $args) = @_;
    croak "analyze_cpanm_build_logs: Must supply hash ref as argument"
        unless ( ( defined $args ) and ( ref($args) eq 'HASH' ) );
    my $verbose = delete $args->{verbose} || '';

    my $gzlog = $self->{gzlog};
    my ($fh, $working_log) = tempfile('acbl_XXXXX', UNLINK => 1);
    system(qq|gunzip -c $gzlog > $working_log|)
        and croak "Unable to gunzip $gzlog to $working_log";

    my $reporter = CPAN::cpanminus::reporter::RetainReports->new(
      force => 1, # ignore mtime check on build.log
      build_logfile => $working_log,
      build_dir => $self->get_cpanm_dir,
      'ignore-versions' => 1,
    );
    croak "Unable to create new reporter for $working_log"
        unless defined $reporter;
    no warnings 'redefine';
    local *CPAN::cpanminus::reporter::RetainReports::_check_cpantesters_config_data = sub { 1 };
    #$reporter->set_report_dir($ranalysis_dir);
    my $ranalysis_dir = $self->get_analysis_dir;
    $reporter->set_report_dir($ranalysis_dir);
    $reporter->run;
    say "See results in $ranalysis_dir" if $verbose;

    return $ranalysis_dir;
}


=head2 C<analyze_json_logs()>

=over 4

=item * Purpose

Tabulate the grades (C<PASS>, C<FAIL>, etc.) assigned to each CPAN
distribution analyzed in C<analyze_cpanm_build_logs()> and write to a
separator-delimited file.

=item * Arguments

    $fcdvfile = $self->analyze_json_logs( { verbose => 1, sep_char => '|' } );

Hash reference which, at the present time, can only take only two elements:

=over 4

=item * C<verbose>

Extra information provided on STDOUT.  Optional; defaults to being off;
provide a Perl-true value to turn it on.  Scope is limited to this method.

=item * C<sep_char>

The separator character used to delimit columns in the output file.  Optional;
two possibilities:

=over 4

=item * C<|>

Pipe -- in which case the file extension will be C<.psv> (default).

=item * C<,>

Comma -- in which case the file extension will be C<.csv>.

=back

=back

=item * Return Value

String holding absolute path to the separator-delimited file created by this
method.  This file will be placed in the F<storage/> directory in the results
tree as described above.

=back

=cut

sub analyze_json_logs {
    my ($self, $args) = @_;
    croak "analyze_json_logs: Must supply hash ref as argument"
        unless ( ( defined $args ) and ( ref($args) eq 'HASH' ) );
    my $verbose     = delete $args->{verbose}   || '';
    my $sep_char    = delete $args->{sep_char}  || '|';
    croak "analyze_json_logs: Currently only pipe ('|') and comma (',') are supported as delimiter characters"
        unless ($sep_char eq '|' or $sep_char eq ',');

    # As a precaution, we archive the log.json files.

    my $output = join('.' => (
        $self->{title},
        $self->{timestamp},
        'log',
        'json',
        'tar',
        'gz'
    ) );
    my $foutput = File::Spec->catfile($self->get_storage_dir(), $output);
    say "Output will be: $foutput" if $verbose;

    my $vranalysis_dir = $self->get_analysis_dir;
    opendir my $DIRH, $vranalysis_dir or croak "Unable to open $vranalysis_dir for reading";
    my @json_log_files = sort map { File::Spec->catfile('analysis', $_) }
        grep { m/\.log\.json$/ } readdir $DIRH;
    closedir $DIRH or croak "Unable to close $vranalysis_dir after reading";
    dd(\@json_log_files) if $verbose;

    my $versioned_results_dir = $self->get_results_tree;
    chdir $versioned_results_dir or croak "Unable to chdir to $versioned_results_dir";
    my $cwd = cwd();
    say "Now in $cwd" if $verbose;

    my $tar = Archive::Tar->new;
    $tar->add_files(@json_log_files);
    no strict 'subs';
    $tar->write($foutput, COMPRESS_GZIP);
    use strict;
    croak "$foutput not created" unless (-f $foutput);
    say "Created $foutput" if $verbose;

    # Having archived our log.json files, we now proceed to read them and to
    # write a pipe- (or comma-) separated-values file summarizing the run.

    my %data = ();
    for my $log (@json_log_files) {
        my $flog = File::Spec->catfile($cwd, $log);
        my %this = ();
        my $f = Path::Tiny::path($flog);
        my $decoded;
        {
            local $@;
            eval { $decoded = decode_json($f->slurp_utf8); };
            if ($@) {
                say STDERR "JSON decoding problem in $flog: <$@>";
                eval { $decoded = JSON->new->decode($f->slurp_utf8); };
            }
        }
        map { $this{$_} = $decoded->{$_} } ( qw| author dist distname distversion grade | );
        $data{$decoded->{dist}} = \%this;
    }
    #pp(\%data);

    my $cdvfile = join('.' => (
        $self->{title},
        $self->{timestamp},
        (($sep_char eq ',') ? 'csv' : 'psv'),
    ) );

    my $fcdvfile = File::Spec->catfile($self->get_storage_dir(), $cdvfile);
    say "Output will be: $fcdvfile" if $verbose;

    my @fields = ( qw| author distname distversion grade | );
    my $columns = [
        'dist',
        map { "$self->{title}.$_" } @fields,
    ];
    my $psv = Text::CSV_XS->new({ binary => 1, auto_diag => 1, sep_char => $sep_char, eol => $/ });
    open my $OUT, ">:encoding(utf8)", $fcdvfile
        or croak "Unable to open $fcdvfile for writing";
    $psv->print($OUT, $columns), "\n" or $psv->error_diag;
    for my $dist (sort keys %data) {
        $psv->print($OUT, [
           $dist,
           @{$data{$dist}}{@fields},
        ]) or $psv->error_diag;
    }
    close $OUT or croak "Unable to close $fcdvfile after writing";
    croak "$fcdvfile not created" unless (-f $fcdvfile);
    say "Examine ", (($sep_char eq ',') ? 'comma' : 'pipe'), "-separated values in $fcdvfile" if $verbose;

    return $fcdvfile;
}

1;

__END__