Yet Another CPAN Grep

DBIx-DataModel/lib/DBIx/DataModel/Doc/Cookbook.pod

=encoding ISO8859-1

=head1 NAME

DBIx::DataModel::Doc::Cookbook - Helpful recipes


=head1 DOCUMENTATION CONTEXT

This chapter is part of the C<DBIx::DataModel> manual.

=over

=item *

L<SYNOPSIS AND DESCRIPTION|DBIx::DataModel>

=item *

L<DESIGN|DBIx::DataModel::Doc::Design>

=item *

L<QUICKSTART|DBIx::DataModel::Doc::Quickstart>

=item *

L<REFERENCE|DBIx::DataModel::Doc::Reference>

=item *

COOKBOOK

=item *

L<INTERNALS|DBIx::DataModel::Doc::Internals>

=item *

L<GLOSSARY|DBIx::DataModel::Doc::Glossary>

=back


=head1 DESCRIPTION

This chapter provides some recipes for common ORM tasks.

=head1 SCHEMA DECLARATION

=head2 Automatically generate a schema

A schema skeleton can be produced automatically from
the following external sources : 
a C<DBI> connection, a L<SQL::Translator> parser, or a
C<DBIx::Class> schema. 
See L<DBIx::DataModel::Schema::Generator|DBIx::DataModel::Schema::Generator>.
That schema skeleton contains enough information to be immediately
usable with minimal functionalities; but it is usually a good idea
to enrich the schema with additional specifications, like for example
types and column definitions.



=head2 Add custom methods into a generated table class

Defining methods in any Perl class does not require to have a I<file>
corresponding to that class; it suffices to define the method within
the appropriate I<package>. So the easiest way to add methods into
tables is to first let C<DBIx::DataModel> create the schema and table
classes, and then switch to those packages, all in the same file :


  # define schema, tables, associations (current package doesn't matter)
  DBIx::DataModel->Schema('Some::Schema')
    ->Table(qw/Foo foo foo_id/)
    ->Table(...)
    ->Association(...)
    ->...;

  # add a method into table 'Foo'
  package Some::Schema::Foo;
  sub my_added_method {
    my $self = shift;
    ...
  }

  # go back to main package
  package main;
  ...
   

Another way to achieve the same result is to use C<DBIx::DataModel>'s 
L<internal utility method|DBIx::DataModel::Meta::Utils/define_method>
for injecting methods into classes :

   use DBIx::DataModel::Meta::Utils qw/define_method/;
   define_method(
      class => 'Some::Schema::Foo',
      name  => 'my_added_method',
      body  => sub {my $self = shift; ...},
    );
  


=head2 Views within the ORM


C<define_table()> declarations usually map directly to database tables
or database views; but it is also possible to map to an SQL query,
possibly with a predefined C<where> clause :

  $schema->metadm->define_table(
    class       => 'View_example',
    db_name     => 'Foo INNER JOIN Bar ON Foo.fk=Bar.pk',
    where       => {col => $special_filter},
    primary_key => [qw/some_foo_col some_bar_col/],
    parents     => [map {$schema->metadm->table($_)} qw/Foo Bar/],
  );

The same can be declared through the front-end C<View()> method :

  $schema->View('View_example', '*',
                'Foo INNER JOIN Bar ON Foo.fk=Bar.pk',
                {col => $special_filter}, [qw/Foo Bar/],
                {primary_key => [qw/some_foo_col some_bar_col/],
                 parents     => [map {$schema->metadm->table($_)} qw/Foo Bar/]},
                );


This is exactly the same idea as a database view, except that it is
implemented within the ORM, not within the database. Such views can
join several tables, or can specify WHERE clauses to filter the
data. ORM views are useful to implement application-specific or
short-lived requests, that would not be worth registering persistently
within the database model. They can also be useful if you have no
administration rights in the database.


=head2 Object inflation/deflation

The term "object inflation" means that a scalar value read from a
column in the database is transformed into an Perl object in memory,
and is transformed back into a scalar value when writing into the
database. The standard example for such situations is the handling
of dates, because Perl programs often need to perform operations
on dates that are not possible with a plain scalar format.

Here is an example of automatic inflation/deflation of date columns
to Perl objects of class L<Date::Simple> :

  # declare column type
  use Date::Simple;
  $schema->Type(Date_simple => 
    from_DB => sub {Date::Simple->new($_[0]) if $_[0] },
    to_DB   => sub {$_[0] = $_[0]->as_str    if $_[0] },
  );
  
  # apply column type to columns
  My::Table1->metadm->define_column_type(Date_simple => qw/d_start d_end/);
  My::Table2->metadm->define_column_type(Date_simple => qw/d_birth/);

With this automatic conversion, all functionalities of L<Date::Simple>
can be applied to date columns within rows of C<Table1> and C<Table2> :
comparisons, date arithmetics, etc.


B<Caveat>: the C<from_DB> / C<to_DB> functions do not apply
automatically within C<-where> conditions. So the following
would not work :

  use Date::Simple qw/today/;
  my $rows = $schema->table($name)->select(
    -where => {d_end => {'<' => today()}},  # BOGUS
  );

because C<today()> returns a C<Date::Simple> object that will
not be understood by L<SQL::Abstract|SQL::Abstract> when
generating the SQL query. C<DBIx::DataModel> is not clever
enough to inspect the C<-where> conditions and decide
which column types to apply, so you have to do it yourself :

  my $today = today()->as_str;
  my $rows = $schema->table($name)->select(
    -where => {d_end => {'<' => $today}},
  );


=head2 SQL Types

At places where a plain value is expected, you can put an arrayref of
2 elements, where the first element is a type specification, and the
second element is the value. This is convenient when the DBD driver
needs additional information about the values used in the statement.
See L<SQL::Abstract::More/"BIND VALUES WITH TYPES"> for explanations.

  my $rows = $source->select(
    -where => {col => [{sql_type => 'some_type'}, $val]}
  );
  $source->insert(
    {key => $pk, some_col => [{sql_type => 'some_type'}, $val]}
  );
  $record->update(
    {some_col => [{sql_type => 'some_type'}, $val]}
  );

This can also be automated within a C<to_DB> handler :

  # adding type information for the DBD handler to inform Oracle about XML data
  $schema->Type(XML => 
     to_DB  => sub {$_[0] = [{dbd_attrs => {ora_type => ORA_XMLTYPE}}, $_[0]]
                        if $_[0]},
    );

=head2 Quoting table and column names

By default, table or column names are inserted "as is" in the generated SQL;
but sometimes this could cause conflicts with SQL reserved words.
The solution is to quote table and column names, by activating the
C<quote_char> option of L<SQL::Abstract>, inherited through
L<SQL::Abstract::More>. Here is an example :

  # define the schema
  DBIx::DataModel->Schema('SCH', {sql_abstract_args => [quote_char => "`"]});

  # define a table
  SCH->Table(qw/Config CONFIG KEY/);

  # produce SQL with quoted table and column names
  my ($sql, @bind) = SCH::Config->select(
    -columns   => [qw/KEY VALUE/],
    -where     => {KEY => 123},
    -result_as => 'sql',
   );

  print $sql; # SELECT `KEY`, `VALUE` FROM `CONFIG` WHERE ( `KEY` = ? )


=head2 Self-referential associations

Associations can be self-referential, i.e. describing tree
structures :

  $schema->Association([qw/OrganisationalUnit parent   1 ou_id       /],
                       [qw/OrganisationalUnit children * parent_ou_id/],

However, when there are several self-referential associations,
we might get into problems : consider

  $schema->Association([qw/Person mother   1 pers_id  /],
                       [qw/Person children * mother_id/])
         ->Association([qw/Person father   1 pers_id  /],
                       [qw/Person children * father_id/]); # BUG: children

This does not work because there are two definitions  of the "children"
role name in the same class "Person".
One solution is to distinguish these
roles, and then write by hand a general "children" role :

  $schema->Association([qw/Person mother          1 pers_id  /],
                       [qw/Person mother_children * mother_id/])
         ->Association([qw/Person father          1 pers_id  /],
                       [qw/Person father_children * father_id/]);
  
  package MySchema::Person;
  sub children {
    my $self = shift;
    my $id = $self->{pers_id};
    my $sql = "SELECT * FROM Person WHERE mother_id = $id OR father_id = $id";
    my $children = $self->dbh->selectall_arrayref($sql, {Slice => {}});
    MySchema::Person->bless_from_DB($_) foreach @$children;
    return $children;
  }

Alternatively, since rolenames C<mother_children> and C<father_children>
are most probably useless, we might just specify unidirectional
associations : 

  $schema->Association([qw/Person mother  1 pers_id  /],
                       [qw/Person ---     * mother_id/])
         ->Association([qw/Person father  1 pers_id  /],
                       [qw/Person ---     * father_id/]);

And here is a more sophisticated way to define the "children" method,
that will accept additional "where" criteria, like every regular method.

  package MySchema::Person;
  sub children {
    my $self      = shift; # remaining args in @_ will be passed to select()
    my $class     = ref $self;
    my $id        = $self->{pers_id};
    my $statement = $self->schema->table($class)->select(
      -where => [mother_id => $id, 
                 father_id => $id],
      -result_as => 'statement'
    );
    return $statement->select(@_);
  }

This definition forces the join on C<mother_id> or 
C<father_id>, while leaving open the possibility for the caller
to specify additional criteria. For example, all female children 
of a person (either father or mother) can now be retrieved through

  $person->children(-where => {gender => 'F'})

Observe that C<mother_id> and C<father_id> are inside an arrayref
instead of a hashref, so that L<SQL::Abstract> will generate an SQL 'OR'.


=head2 Schema versioning

Currently C<DBIx::DataModel> has no specific support
for schema versioning. See CPAN module L<DBIx::VersionedSchema>,
or switch to the L<DBIx::Class> ORM, that has good support for
schema versioning.


=head1 DATA RETRIEVAL

=head2 Database functions

Use normal SQL syntax for database functions, and give them
column aliases (with a vertical bar C<|>) in order to retrieve the results.

  my $row = $source->select(-columns   => [qw/MAX(col1)|max_col1
                                              AVG(col2)|avg_col2
                                              COUNT(DISTINCT(col3))|n_col3/],
                            -where     => ...,
                            -result_as => 'firstrow');
  print "max is : $row->{max_col1}, average is $row->{avg_col2}";

Or you can dispense with column aliases, and retrieve the results
directly into an arrayref, using C<< -result_as => 'flat_arrayref' >> :

  my $array_ref = $source->select(-columns   => [qw/MAX(col1)
                                                   AVG(col2)
                                                   COUNT(DISTINCT(col3))/],
                                  -where     => ...,
                                  -result_as => 'flat_arrayref');
  my ($max_col1, $avg_col2, $count_col3) = @$array_ref;

B<Caveat>: C<from_DB> handlers do not apply to 
database functions. So if the result needs any transformation,
you have to specify a column type for it at the statement level :

  my $row = $source->select(
    -columns      => [qw/MAX(d_begin)|max_d_begin MIN(d_end)|min_d_end .../],
    -where        => ...,
    -column_types => {Date_simple => [qw/max_d_begin min_d_end/],
    -result_as    => 'firstrow'
  );




=head2 Conditions on functions with special syntax

Some database systems have SQL functions with special syntax.
For example a fulltext search in Oracle is expressed as

  ... WHERE CONTAINS(fulltext_field, 'word') > 0

This does not fit well in a hashref to be passed as a C<-where> condition for 
L<SQL::Abstract::More>, because the name of the field and the bind value
are lost within the SQL syntax. To make it easier, we define a I<special operator>
for L<SQL::Abstract::More> :

  # define the schema
  DBIx::DataModel->Schema('SCH',
                          {sql_abstract_args => [sql_dialect => "Oracle12c",
                                                 special_ops => [{regex   => qr/^contains(:?_all|_any)?$/i,
                                                                  handler => \&_fulltext_contains_for_Oracle}]]});
  
  sub _fulltext_contains_for_Oracle {
    my ($self, $field, $op, $arg) = @_;

    my $sql = "CONTAINS($field, ?) > 0";
    my @bind;

    # Oracle connector for words : default '&', but '|' if op is -contains_any
    my $connector = ($op =~ /any$/) ? ' | ' : ' & ';

    # words to be passed to the CONTAINS function
    my @words = ref $arg ? @$arg : ($arg);
    @words = map { split /\s+/ } grep {$_} @words;

    @bind = (join $connector, @words);
    return ($sql, @bind);
  }

Now fulltext queries can be expressed easily as

  my $results = SCH->table('Table1')->select(
    -where => {fulltext_field1 => {-contains_all => ['ab', 'cd']},
               fulltext_field2 => {-contains_any => ['ef', 'gh', 'ij']},
              },
   );


=head2 Nested queries

For inserting a nested query within a basic query, we need to
pass the SQL and bind values of the nested query to L<SQL::Abstract>;
the syntax for this is a reference to an
arrayref (in other words a double reference), as explained in
L<SQL::Abstract/"Literal SQL with placeholders and bind values (subqueries)">.

C<DBIx::DataModel> has a feature to produce exactly this datastructure :

  my $subquery = $source1->select(..., -result_as => 'subquery');

Then it is easy to insert the subquery within another query.

  my $rows = $source2->select(
      -columns => ...,
      -where   => {foo => 123, bar => {-not_in => $subquery}},
   );



=head2 "Hashref inflation"

Unlike other ORMs, there is no need here to transform results into
hashrefs, because rows returned by a C<select()> can be used directly
as hashrefs.  For example here is a loop that prints a hash slice from
each row :

  my $rows       = $schema->table($name)->select(...);
  my @print_cols = qw/col3 col6 col7/;
  foreach my $row (@$rows) {
    print @{$row}{@print_cols};
  }

The only differences between row objects and plain Perl hashrefs are
that :

=over

=item *

they are blessed into a source class

=item *

they may contain an additional key C<< $row->{__schema}  >>
if C<DBIx::DataModel> is used in
L<multi-schema mode|DBIx::DataModel::Doc::Glossary/"multi-schema mode">.

=back

Those differences can often be ignored; but nevertheless they can be a problem
with some external modules like L<JSON> that croak when encoding
a blessed reference. In that case you can use the C<unbless()> function
which removes both the blessing and the C<__schema> key. Unblessing is
recursively applied to nested datastructures :

  $schema->unbless($rows);
  my $json = JSON->new->encode($rows);


=head2 Common table expressions (WITH RECURSIVE)

The SQL syntax for I<common table expressions> (CTEs),
introduced in L<SQL 1999|https://en.wikipedia.org/wiki/SQL:1999>,
defines a temporary name corresponding to a simple query,
so that this name can be used in a more general SQL statement :

  WITH [RECURSIVE] <tmp_table_name> (<col1>, ...) AS (<simple_query>)
  SELECT <main_query>

This is useful in two situations :

=over

=item *

when the I<tmp_table_name> is needed at several places within the main query

=item *

for expressing queries that willl I<recursively> traverse a graph of related nodes.
See SQLite examples at L<https://sqlite.org/lang_with.html>; but many other database
management systems also support CTEs. possibly with some slight variations.

=back

For using CTEs within C<DBIx::DataModel>, the first step is encapsulate the WITH query
as a new instance of L<SQL::Abstract::More>, through the 
L<SQL::Abstract::More/with_recursive> method.
Then that instance can be passed to C<DBIx::DataModel> statements through the 
C<-with> argument. Here is an example borrowed from L<https://sqlite.org/lang_with.html> : 

=over

=item *

suppose an initial table like this :

  CREATE TABLE family(name, mom, dad, bord, died)

The declaration within C<DBIx::DataModel> looks like this :

  my $schema = DBIx::DataModel->Schema('CTE_example');
  $schema->Table(qw/Family family name/); 

=item *

Encapsulate a C<descendant_of> common table expression as a new instance of L<SQL::Abstract::More> :

  sub sqla_with_CTE_descendant_of {
    my ($schema, $ancestor) = @_;

    return $schema->sql_abstract->with_recursive(
      [ -table     => 'parent_of',
        -columns   => [qw/name parent/],
        -as_select => {-columns => [qw/name mom/],
                       -from    => 'family',
                       -union   => [-columns => [qw/name dad/]]},
       ],
      [ -table     => 'descendant_of',
        -columns   => [qw/name/],
        -as_select => {-columns   => [qw/name/],
                       -from      => 'parent_of',
                       -where     => {parent => $ancestor},
                       -union_all => [-columns => [qw/parent_of.name/],
                                      -from    => [qw/-join parent_of {parent=name} descendant_of/]],
                   },
       ],
      );
  }

Note: this is defined at the level of C<SQL::Abstract::More>, not C<DBIx::DataModel>,
so the syntax for the join is C<< -from => [qw/-join parent_of {parent=name} descendant_of/] >>,
following the specification in L<SQL::Abstract::More/join>.


=item *

The CTE table C<descendant_of> will recursively find all descendants of any given ancestor.
This can be used as a subquery for selecting family members who are descendants :

  my $subquery    = \ ["SELECT name FROM descendant_of"];
  my $descendants = $schema->table('Family')->select(
    -with     => sqla_with_CTE_descendant_of($schema, $ancestor),
    -columns  => [qw/name born died/],
    -where    => {name => {-in => $subquery }},
    -order_by => 'born',
  );
  
=back

Another approach would be to declare C<ancestor_of> as a new table, and add a new association
with the C<family> table. This approach is displayed below; but it is not
recommanded because it creates I<permanent> metada within the schema, while CTEs are meant to be
used as I<temporary> constructs for building complex queries. Here is the example :

  $schema->Table(qw/Descendant_of descendant_of name/)
         ->Association([qw/Descendant_of descendants *  name/],
                       [qw/Family        family      1  name/]);
  
  my $descendants = $schema->join(qw/Descendant_of family/)->select(
    -with     => sqla_with_CTE_descendant_of($schema, $ancestor),
    -columns  => [qw/family.name born died/],
    -order_by => 'born',
  );


=head1 DATA UPDATE

=head2 Transaction

  # anonymous sub containing the work to do
  my $to_do = sub {
    $table1->insert(...);
    $table2->delete(...);
  };
  # so far nothing has happened in the database
  
  # now do the transaction
  $schema->do_transaction($to_do);

=head2 Nested transaction

  $schema->do_transaction(sub {
    do_something();
    $schema->do_transaction(sub { some_nested_code();       });
    $schema->do_transaction(sub { some_other_nested_code(); });
  });

=head2 Nested transaction involving another database

  $schema->dbh($initial_dbh);
  $schema->do_transaction(sub {

    # start working in $initial_dbh
    do_something();

    # now some work in $other_dbh
    $schema->do_transaction(sub { some_nested_code();       }, $other_dbh);

    # here, implicitly we are back in $initial_dbh
    $schema->do_transaction(sub { some_other_nested_code(); });
  });
  # commits in both $initial_dbh and $other_dbh are performed here


=head2 Generating primary keys

Most database systems have mechanisms to generate primary keys
automatically, generally as a sequence of natural numbers; however,
there may be situations where one would like primary keys to be generated
under other algorithms, like for example taking a random number, or taking
the next "free slot" in a sparse sequence of numbers. Algorithmic
generation of keys can be implemented in the ORM layer by overriding the
L<_singleInsert()|DBIx::DataModel::Doc::Internals/"_singleInsert"> method.
Here is an example :

  sub insert_with_random_key {
    my ($self) = @_;
    my $class = ref $self;
    my ($key_column) = $class->primary_key;
  
    for (1..$MAX_ATTEMPTS) {
      my $random_key = int(rand($MAX_RANDOM));
  
        $self->{$key_column} = $random_key;
        eval {$self->_rawInsert; 1} 
          and return $random_key;   # SUCCESS

        # if duplication error, try again; otherwise die
        last unless $DBI::errstr =~ $DUPLICATE_ERROR;
     }
     croak "cannot generate a random key for $class: $@";
  }
  
  foreach my $class (@tables_with_random_keys) {
    define_method(
      class          => $schema->metadm->table($class)->class,
      name           => '_singleInsert',
      body           => \&insert_with_random_key,
    );
  }


=head2 Cascaded operations

Some database systems support cascaded operations : for example
a constraint definition with a clause like C<ON DELETE CASCADE>
will automatically delete child rows (rows containing foreign keys)
when the parent row (the row containing the primary key) is deleted.

C<DBIx::DataModel> does not know about such cascaded operations in the
database; but it can perform some cascaded operations at the ORM level,
when tables are associated through a 
L<composition|DBIx::DataModel::Doc::Glossary/"composition">.
In that case, the C<insert()> method can accept a data tree as argument,
and will automatically perform recursive inserts in the children tables;
an example is given in the
L<quickstart tutorial|DBIx::DataModel::Doc::Quickstart/"Cascaded inserts">.
Cascaded deletes are also supported : 

  my $bach = HR->table('Employee')->fetch($bach_id); 
  $bach->expand('activities');
  $bach->delete; # deletes the Employee together with its Activities

The C<expand> operations retrieve related records and add them
into a tree in memory. Then C<delete> removes from the database
all records found in the tree.

Observe that this is not a "true" cascaded 
delete, because the client code is responsible for fetching the
related records first. 



=head2 Timestamp validation

Suppose we want to sure that the record was not touched between the time
it was presented to the user in a display form and the time
the user wants to update or delete that record. 

In order to do this, we will suppose that every record in every
table has a timestamp field C<TS_MODIF>, updated automatically by
a trigger within the database. When defining the schema, we
register an I<auto_update> callback on that column; such callbacks
are called automatically both on C<update()> and C<insert()> calls :

  DBIx::DataModel->define_schema(
   class               => 'My::Schema',
   auto_update_columns => {TS_MODIF => \&_check_time_stamp},
  );

The body of the callback looks like this : 

  sub _check_time_stamp {
    my ($record, $table, $where) = @_;
    if ($where) { # this is an update, not an insert

      my $displayed_timestamp = delete $record->{TS_MODIF};
      my $db_record  = $record->schema->table($table)->select(
        -columns   => 'TS_MODIF',
        -where     => $where,
        -for       => 'update', # optional, depends on your RDBMS
        -result_as => 'firstrow',
      )
        or croak "fetch timestamp: could not find record "
               . join(" / ", %$where);
     my $db_timestamp = $db_record->{TS_MODIF};
     $db_timestamp == $displayed_timestamp
       or croak "record in $table was modified by somebody else; please "
              . "refresh your screen and try again";
     }
  }

=head1 DATA CONVERSION

=head2 JSON

  use JSON;
  my $json_converter = JSON->new->convert_blessed(1);
  my $json_text      = $json_converter->encode($data_row);

By default, the L<JSON> module refuses to convert any object into JSON;
however, the L<JSON/convert_blessed> option will accept to convert objects
provided they possess a C<TO_JSON> method. Such a method is implemented in 
the L<DBIx::DataModel::Source/DBIx::DataModel::Source> class, so 
any data row can be converted into JSON.