SQL*Loader Preserving Hierarchy

13 June, 2014

Every time I go to build a SQL*Loader control file, I find out that it can do things I didn't consider and can't do things that seemed obvious that it should. Somewhere between removing tufts of hair with fists and jumping up and down with joy, things finally get done. I find the lack of examples and documentation to be particularly frustrating. Or maybe I'm awful at Google. With peer assistance and Internetting, I finally got things to do my bidding. Anyway, I thought I'd put down some learnings in wordings to help prevent some headaches for others and myself when I forget things.

Background

So my task was to get a delimited, hierarchical file loaded into a set of tables. I wanted to avoid just dumping everything into a table and running a complicated stored procedure over the top of it, and the file was way to big to map into memory. And I didn't want to make a complicated process to break it into smaller chunks for tracking and time purposes. I'll mock up some completely contrived examples for the problem and solution explanation.

Example Data

Mind that this data is made up and simpler than my actual probably, but it is structured in a way that is on purpose and will illustrate the point without being excessively complex.

There are records for each person as indicated by the first field PERSON. Under each person are groupings of records that tie to the person: PERSON_PHONE, and PERSON_ADDRESS. Notice that there are more than one of these records. In theory you would be able to have any number of phone numbers or addresses for each person, and you can have any number of people.

PERSON|123|Robert||Robertson|
PERSON_PHONE|123|1115557777|CELLULAR
PERSON_PHONE|123|1115557772|WORK
PERSON_ADDRESS|123|555 Robin Ln||St. Robert|CA|88888|HOME
PERSON_ADDRESS|123|222 Bobbypin Circle|Suite 211|St. Robert|CA|88888|WORK
PERSON|123|William|W|Williams|Jr.
PERSON_PHONE|123|1115556666|HOME
PERSON_PHONE|123|1115556622|WORK
PERSON_ADDRESS|123|212 Wiley St. E||St. Robert|CA|88888-8888|WORK
PERSON_ADDRESS|123|444 Willie St|Apt 222|St. Robert|CA|88888|HOME

Data Model

These are the made-up tables for demonstration. Again, this is totally contrived, so please just ignore that I am being design and syntax-lazy.

CREATE TABLE PERSON (
    PERSON_ID NUMBER NOT NULL PRIMARY KEY,
    FILE_ID NUMBER,
    FIRST_NAME VARCHAR2,
    MIDDLE_NAME VARCHAR2,
    LAST_NAME VARCHAR2,
);

CREATE TABLE PERSON_PHONE (
    PERSON_PHONE_ID NUMBER NOT NULL PRIMARY KEY,
    PERSON_ID NUMBER FOREIGN KEY REFERENCES PERSON (PERSON_ID),
    FILE_ID NUMBER,
    PHONE_NUMBER VARCHAR2,
    PHONE_TYPE VARCHAR2
);

CREATE TABLE PERSON_ADDRESS (
    PERSON_ADDRESS_ID NUMBER NOT NULL PRIMARY KEY,
    PERSON_ID NUMBER FOREIGN KEY REFERENCES PERSON (PERSON_ID),
    FILE_ID NUMBER,
    STREET VARCHAR2,
    SUITE VARCHAR2,
    CITY VARCHAR2,
    STATE VARCHAR2,
    ZIP VARCHAR2
);

Failed First Attempt

Disclaimer: This does not work.

The first thing I tried was to use sequences. Bang-pow, done. Shut down everything and go home.

Control File:

LOAD DATA
CHARACTERSET WE8MSWIN1252

APPEND

INTO TABLE PERSON
WHERE (1) = 'PERSON'
FIELDS TERMINATED BY '|' TRAILING NULLCOLS
(
    RECORD_TYPE FILLER,
    FILE_ID,
    FIRST_NAME,
    MIDDLE_NAME,
    LAST_NAME,
    PERSON_ID "PERSON_SEQUENCE.NEXTVAL"
)

INTO TABLE PERSON_PHONE
WHERE (1) = 'PERSON_PHONE'
FIELDS TERMINATED BY '|' TRAILING NULLCOLS
(
    RECORD_TYPE FILLER,
    FILE_ID,
    PHONE_NUMBER,
    PHONE_TYPE,
    PERSON_PHONE_ID "PERSON_PHONE_SEQUENCE.NEXTVAL",
    PERSON_ID "PERSON.CURRVAL"
)

INTO TABLE PERSON_ADDRESS
WHERE (1) = 'PERSON_ADDRESS'
FIELDS TERMINATED BY '|' TRAILING NULLCOLS
(
    RECORD_TYPE FILLER,
    FILE_ID,
    STREET,
    SUITE,
    CITY,
    STATE,
    ZIP,
    PERSON_ADDRESS_ID "PERSON_ADDRESS_SEQUENCE.NEXTVAL",
    PERSON_ID "PERSON.CURRVAL"
)

This did not work for a number of reasons. I'll list them off.

Every time the SQL*Loader reads a record, it has some sort of positional index thing that dictates the position in the file to read. When the first field of each record is read, the default behavior is to take a NEXT. I don't claim to completely understand why it doesn't start at the beginning, but I suspect it's something to do with the WHEN clause eating position 1.
The WHEN clause doesn't care about the terminator. Notice how the "FIELDS TERMINATED BY" part comes after the match? Yeah. Because all the tables start with PERSON, all of them matched for the PERSON table. Ooops.
The SQL*Loader behavior does each "INTO TABLE" block through for the entire file before going to the next. And it commits between. This means that all the matches for PERSON are done first, then all the PERSON_PHONE, then PERSON_ADDRESS. As a result, using a sequence CURRVAL is not going to work. What happens is all the PERSON records get loaded first. Then the sequence is sitting on the most recent insert record. When the PERSON_PHONE records or PERSON_ADDRESS records come along, all of them get the PERSON_ID of the last inserted PERSON record. Not cool.

Successful Attempt

To say that this was attempt 2 would be a blatant lie. It would be more like attempt 12. Anyway, this is where I landed to achieve level 9001.

Luckily, I had complete control over the data file spec (I put it together from an even uglier file) and over the data model (at least enough to add a column). I could bake the hierarchy into the file and use some queries. Since my process created the hierarchy, I could trust it. I did try some nonsense using key values--for example, including the name in each phone and address record--but that would not guarantee a unique result, so I had to toss it out with the moldy peaches.

New Data File:

PERSON|123|1|Robert||Robertson|
PERSON_PHONE|123|1|1115557777|CELLULAR
PERSON_PHONE|123|1|1115557772|WORK
PERSON_ADDRESS|123|1|555 Robin Ln||St. Robert|CA|88888|HOME
PERSON_ADDRESS|123|1|222 Bobbypin Circle|Suite 211|St. Robert|CA|88888|WORK
PERSON|123|2|William|W|Williams|Jr.
PERSON_PHONE|123|2|1115556666|HOME
PERSON_PHONE|123|2|1115556622|WORK
PERSON_ADDRESS|123|2|212 Wiley St. E||St. Robert|CA|88888-8888|WORK
PERSON_ADDRESS|123|2|444 Willie St|Apt 222|St. Robert|CA|88888|HOME

Data Model:

CREATE TABLE PERSON (
    PERSON_ID NUMBER NOT NULL PRIMARY KEY,
    FILE_ID NUMBER,
    SEQUENCE_ID NUMBER,
    FIRST_NAME VARCHAR2,
    MIDDLE_NAME VARCHAR2,
    LAST_NAME VARCHAR2,
);

CREATE TABLE PERSON_PHONE (
    PERSON_PHONE_ID NUMBER NOT NULL PRIMARY KEY,
    PERSON_ID NUMBER FOREIGN KEY REFERENCES PERSON (PERSON_ID),
    FILE_ID NUMBER,
    PHONE_NUMBER VARCHAR2,
    PHONE_TYPE VARCHAR2
);

CREATE TABLE PERSON_ADDRESS (
    PERSON_ADDRESS_ID NUMBER NOT NULL PRIMARY KEY,
    PERSON_ID NUMBER FOREIGN KEY REFERENCES PERSON (PERSON_ID),
    FILE_ID NUMBER,
    STREET VARCHAR2,
    SUITE VARCHAR2,
    CITY VARCHAR2,
    STATE VARCHAR2,
    ZIP VARCHAR2
);

New Control File:

LOAD DATA
CHARACTERSET WE8MSWIN1252

APPEND

INTO TABLE PERSON
WHERE (1) = 'PERSON' AND (1) != 'PERSON_'
FIELDS TERMINATED BY '|' TRAILING NULLCOLS
(
    RECORD_TYPE FILLER POSITION(1),
    FILE_ID,
    SEQUENCE_ID,
    FIRST_NAME,
    MIDDLE_NAME,
    LAST_NAME,
    PERSON_ID "PERSON_SEQUENCE.NEXTVAL"
)

INTO TABLE PERSON_PHONE
WHERE (1) = 'PERSON_PHONE'
FIELDS TERMINATED BY '|' TRAILING NULLCOLS
(
    RECORD_TYPE FILLER POSITION(1),
    SEQUENCE_ID BOUNDFILLER,
    FILE_ID,
    PHONE_NUMBER,
    PHONE_TYPE,
    PERSON_PHONE_ID "PERSON_PHONE_SEQUENCE.NEXTVAL",
    PERSON_ID "(SELECT PERSON_ID FROM PERSON WHERE FILE_ID = :FILE_ID AND SEQUENCE_ID = :SEQUENCE_ID)"
)

INTO TABLE PERSON_ADDRESS
WHERE (1) = 'PERSON_ADDRESS'
FIELDS TERMINATED BY '|' TRAILING NULLCOLS
(
    RECORD_TYPE FILLER POSITION(1),
    SEQUENCE_ID BOUNDFILLER,
    FILE_ID,
    STREET,
    SUITE,
    CITY,
    STATE,
    ZIP,
    PERSON_ADDRESS_ID "PERSON_ADDRESS_SEQUENCE.NEXTVAL",
    PERSON_ID "(SELECT PERSON_ID FROM PERSON WHERE FILE_ID = :FILE_ID AND SEQUENCE_ID = :SEQUENCE_ID)"
)

Notice a few things.

The first field in each table has a POSITION(1) flag on it to fix the problem of grabbing a NEXT instead of the first column.
The PERSON table now has an AND in the WHEN clause for not matching to other tables.
Now there is a SQL query using the sequence number for referencing the right row.
1. The SEQUENCE_ID in the address and phone tables don't have a column, so we had to flag it is a FILLER, but not just any filler; a BOUNDFILLER. This allows the column to be available in the query for PERSON_ID. Otherwise no dice.
2. The select query is in parenthesis. If you don't do that, you can't use a select query. This is because you are only supposed to access functions in the control file. Apparently putting a select query in parenthesis makes it a function.

So there it is. Peace at last.

Mock Testing DAOs in JPA-style Hibernate

25 January, 2014

Looking for how to do a decent mock test on a DAO layer using JPA-style hibernate, I found nothing that perfectly fit what we were doing, so I thought I'd put down an example and help add to the world of knowledge.

The point here is that all of the code is executed right up to actually trying to make a database call. The query itself is mocked up, and so there is no external dependency. This is useful if--for some reason or another--the team that controls your automated builds puts up a firewall between your build server and your database. Also useful if you can't rely on the data in your database to be stable. Not so useful if you are getting failures calling out to the database, but you can always write non-mock tests for that purpose. That's easier anyway.

I don't feel especially like explaining all this since I'm lazy. That, and the links at the bottom include some decent explanations. So have a look over there, and then hopefully the example will speak for itself.

Note that all of this code is simplified from real, working examples, but I haven't actually compiled it--hopefully it's good. Anyway, onto some classes.

SillyTable.java

package com.intents.model;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.NamedQueries;
import javax.persistence.NamedQuery;
import javax.persistence.SequenceGenerator;
import javax.persistence.Table;@SuppressWarnings("serial")

@Entity
@NamedQueries({
    @NamedQuery(name = "SillyTable.findByPrimaryKey", query = "select st from SillyTable st where st.primaryKey = :primaryKey")
})
@Table(name = "SILLY_TABLE")
public class SillyTable {
    @Id
    @GeneratedValue(generator = "primaryKeySeq")
    @SequenceGenerator(name = "primaryKeySeq", sequenceName = "PRIMARY_KEY_SEQ")
    @Column(name = "PRIMARY_KEY", nullable = false)
    private Long primaryKey;
}

SillyTableDao.java

package com.intents.dao;

import com.intents.model.SillyTable;

public interface SillyTableDao {
    String FIND_BY_PRIMARY_KEY_QUERY = "SillyTable.findByPrimaryKey";
    String PRIMARY_KEY_PARAMETER = "primaryKey";

    SillyTable findByPrimaryKey(Long primaryKey) throws Exception;
}

SillyTableDaoImpl.java

package com.intents.dao.impl;

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import javax.persistence.Query;
import org.springframework.stereotype.Repository;
import com.intents.dao.SillyTableDao;
import com.intents.model.SillyTable;

@Repository
public class SillyTableDaoImpl implements SillyTableDao {
    
    @PersistenceContext(unitName = "framework")
    private EntityManager entityManager;
    
    public void setEntityManager(EntityManager entityManager) {
        this.entityManager = entityManager;
    }

    public EntityManager getEntityManager() {
        return entityManager;
    }

    @Override
    public SillyTable findByPrimaryKey(final Long primaryKey) throws Exception {
        SillyTable result = null;
        Query query;
        try {
            query = getEntityManager().createNamedQuery(FIND_BY_PRIMARY_KEY_QUERY);
            query.setParameter(PRIMARY_KEY_PARAMETER, primaryKey);
            result = (SillyTable) query.getSingleResult();
        } catch (Exception e) {
            throw new Exception("Failed lookup! ZMOG!", e);
        }
        return result;
    }
}

SillyTableDaoImplMockTest.java

package com.intents.dao.impl;

import static org.junit.Assert.assertSame;
import static org.mockito.Matchers.any;
import static org.mockito.Mockito.when;
import javax.persistence.EntityManager;
import javax.persistence.NoResultException;
import javax.persistence.Query;
import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.mockito.Mock;
import org.mockito.runners.MockitoJUnitRunner;
import org.springframework.test.annotation.Rollback;
import org.springframework.transaction.annotation.Transactional;
import com.intents.dao.impl.SillyTableDaoImpl;
import com.intents.model.SillyTable;

@RunWith(MockitoJUnitRunner.class)
@Transactional
public class SillyTableDaoImplMockTest  {
    private static final String FIND_BY_PRIMARY_KEY_QUERY = "SillyTable.findByPrimaryKey";
    private static final Long PRIMARY_KEY = 9001L;

    @Mock
    EntityManager entityManager;

    @Mock
    Query query;

    SillyTableDaoImpl sillyTableDao = new SillyTableDaoImpl();

    @Before
    public void setup() {
        when(entityManager.find((Class<?>) any(), any())).thenReturn(null);
        sillyTableDao.setEntityManager(entityManager);
    }

    @Test
    @Rollback(true)
    public void testFindByPrimaryKeyReturnsResult() {
        SillyTable expectedSillyTable = new SillyTable();
        when(entityManager.createNamedQuery(FIND_BY_PRIMARY_KEY_QUERY)).thenReturn(query);
        when(query.getSingleResult()).thenReturn(expectedSillyTable);
        SillyTable returnedSillyTable = sillyTableDao.findByPrimaryKey(PRIMARY_KEY);
        assertSame(expectedSillyTable, returnedSillyTable);
    }

    @Test(expected = Exception.class)
    @Rollback(true)
    public void testFindByPrimaryKeyNoResultsThrowsException() throws Exception {
        when(entityManager.createNamedQuery(FIND_BY_PRIMARY_KEY_QUERY)).thenReturn(query);
        when(query.getSingleResult()).thenThrow(new NoResultException());
        sillyTableDao.findByPrimaryKey(PRIMARY_KEY);
    }
}

References Used

http://bitbybitblog.com/unit-testing-data-access-components/

http://www.luckyryan.com/2013/06/28/unit-testing-with-mockito/

http://api.nhindirect.org/java/site/direct-msg-monitor/1.1.3/xref-test/org/nhindirect/monitor/dao/impl/AggregationDAOImpl_confirmAggregationTest.html

Perl Is not Java: Quick and Dirty Unit Testing in Perl

10 November, 2013

Background

At work I recently needed to edit a file-manipulation script written in Perl. The script had no unit testing. It had no documentation. It had all of one subroutine; the rest was just straight script in loops and if blocks. It worked perfectly. However, it was time to implement additional functionality. Left with a raw script and only a basic working knowledge of Perl, I embarked on a quest to make the script a little more robust through unit testing.

Most of my coding experience is in Java, SQL, and proprietary IBM™ tools. I also dabble in a fair amount of PHP outside of work. Not to say I've never used anything else--just that I haven't done a lot of it. Since the Perl script in question is actually a fairly important component in our software, I really didn't want to screw it up. Of course, that means test-driven development, right? I didn't have a lot of time to spend, so I wanted to use tools that were fairly familiar. I also wanted anyone on the project to be able to quickly and easily pick up the script and not feel like they need to learn a whole new language to muddle through. So reuse of existing tools and only adding very basic components was important.

Also, since this was a work project, the code is confidential. If I'm being vague, it's because I can't tell your the secrets! Otherwise I'd probably give better examples.

Running Perl in Windows

The first thing I figured was to be able to actually run a Perl script in Windows. What I found was a thing called Strawberry. It worked with no fuss. I have no comment on whether it is the best or whatever.

The IDE

The best IDE I found was the EPIC plugin for Eclipse. This is with heavy bias because we use Eclipse for Java development, and remember: familiarity and not expanding the tool set is important in this case. Here is a link on how to install it: http://www.epic-ide.org/download.php

Side-node: Padre looks kinda cool.

Unit Test Package

There are quite a number of unit testing tools for Perl. Actually, they are quite good. They output in TAP (Test Anything Protocol). The most like JUnit I could find was Test::Unit. Here's a fairly decent example of how to use it: http://twoalpha.blogspot.com/2005/11/unit-testing-in-perl-with-testunit.html

Unfortunately, Test::Unit was not suitable for my purposes because it kind of assumed that my Perl code was object-oriented and neatly organized into modules. Also, it didn't seem to be part of the basic install of Strawberry. Keeping to the simple principle, Test::Unit was out...for now.

I finally landed on Test::More. It was fairly simple to use and part of the packages installed with Strawberry.

Organizing the Project

I spent way too much time trying to figure out how a project should be organized. In the end I completely ignored what others were explaining because it over-complicated for how simple this project was. This is what I ended up with:

Project Root
 |-scripts
 |  |-actualScript.pl
 |-testData
 |  |-testFile1.txt
 |  |-expectedOutput1.txt
 |-tests
 |  |-testScript.pl
 |  |-testResults.txt

One thing to note here is that my test is a simple script. I didn't need any speical .t file to make it a test case. To run the test script, I just right-clicked on the file and went to Run As>Perl Local. I didn't need any configuration settings, and it was nearly as easy to run as a JUnit test.

The Test Cases

Admittedly, I ended up kind of running some scenarios rather than truly writing unit tests. But it worked fine if you think the script as a whole is the unit.... The approach was to run the script on various files and compare the output against expected results. Then the test files become the test cases. It's maybe more like automated quality assurance, but since the script really only did some basic file reading and writing sort of stuff, it was ugly to write proper unit tests.

To run the test case, I needed to run the script from my test, which required input arguments. As it happens, that is not terribly difficult.

system("perl ../scripts/actualScript.pl \"$inputFileName\" \"$outputFileName\"");

I found the module File::Compare useful for doing the comparison of expected result to actual result. My tests looked something like this:

is(compare($expectedOutputFileName, $inputFileName), 0, "Test 1");

The Test Output

What I really wanted was a summarized output with red and green colors. Who doesn't, right? I didn't get it, though.

There is a module Test::Harness that may have been useful here, but I was too lazy to look into it completely; it looked overly complicated, and I didn't have that much time.

I did, however, manage to output the test cases to a flat file, which I found to be slightly better than scrolling through the console output with all the logging and such. I put this up at the beginning of the test script:

my $builder =Test::More->builder->output('testResults.txt');

On problem with this approach is that you don't see anything but pass or fail in this file. So the details about the failures are still in the console output, as is the summary information about how many tests were run and how many failed.

Putting It All Together

So here is the final test script, in its "entirety" (edited so that it doesn't give away so many trade secrets):

#!/usr/bin/perl

use File::Compare;
use Test::More tests => 3;

Test::More->builder->output('testResults.txt');
my $self = shift;

@testFileNameArray = (
    "TestCase1.txt",
    "TestCase2.txt",
    "TestCase3.txt"
);

foreach $testFileName (@testFileNameArray) {
    runScenario("../testData/" . $testFileName);
}

sub runScenario() {
    local ($inputFileName);
    ($inputFileName) = ( $_[0] );
    $expectedResultFileName = replace($inputFileName, "[.]txt", ".out");
    system("perl ../scripts/actualScript.pl \"$inputFileName\" \"$inputFileName.tmp\"" );
    is( compare( $expectedResultFileName, $inputFileName . ".tmp" ), 0, $inputFileName );
    unlink( $inputFileName . ".tmp" );
}

sub replace {
    my ( $string, $textToFind, $replacementText ) = @_;
    $string =~ s/$textToFind/$replacementText/ig;
    return $string;
}

To add a test case, one would just add a new file name into the array and bump up the number on line 4.

Here's an example of the output result file on success:

ok 1 - ../testData/TestCase1.txt
ok 2 - ../testData/TestCase2.txt
ok 3 - ../testData/TestCase3.txt

And here's where the second test failed:

ok 1 - ../testData/TestCase1.txt
not ok 2 - ../testData/TestCase2.txt
ok 3 - ../testData/TestCase3.txt

Conclusion

So that's it. I'm sure there is a better way, but this got me by for now. I am dissatisfied with the lack of testing tools available, though. Maybe that would be a good project....

Categories

Last posts

Pages

SQL*Loader Preserving Hierarchy

Background

Example Data

Data Model

Failed First Attempt

Successful Attempt

Mock Testing DAOs in JPA-style Hibernate

SillyTable.java

SillyTableDao.java

SillyTableDaoImpl.java

SillyTableDaoImplMockTest.java

References Used

Perl Is not Java: Quick and Dirty Unit Testing in Perl

Background

Running Perl in Windows

The IDE

Unit Test Package

Organizing the Project

The Test Cases

The Test Output

Putting It All Together

Conclusion