Thursday, April 4, 2013

Improving bulk inserts on SQLite

One of these days I decided that I needed a way to transport data from a Siebel SQL Anywhere local database to something that would let me review the information without installing any Siebel application. My choices were SQLite as the target and Perl to extract the data.

SQLite is far more "transportable" than SQL Anywhere, especially if you don't need all the fancy features that SQL Anywhere offers over SQLite.

First I checked whether SQL::Translator would do the job, but it seems SQL Anywhere syntax is not supported yet. Then, after playing around with Perl DBI, DBD::ODBC (yes, Siebel local databases still use ODBC, don't ask me why) and DBD::SQLite, I got something working that could read the DDL of every table in the SIEBEL schema, create the same tables in a SQLite database and populate them with data. In the end I had something like a "data dumper".
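For the curious, the two connections look something like the sketch below. This is a minimal sketch only: the ODBC DSN name, the credentials and the SQLite file name are placeholders, not the actual values from a Siebel installation.

use strict;
use warnings;
use DBI;

# source: the SQL Anywhere local database, reached through ODBC
# ("SiebelLocal" is a placeholder DSN name)
my $source_dbh = DBI->connect( 'dbi:ODBC:SiebelLocal',
    'SADMIN', 'SADMIN', { RaiseError => 1 } );

# destination: a single SQLite file, easy to carry around
my $sqlite_dbh = DBI->connect( 'dbi:SQLite:dbname=siebel_dump.db',
    '', '', { RaiseError => 1 } );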

Unfortunately, performance was not that good. On the computer I used for this (an old notebook with an encrypted hard disk), I/O was the performance hog: the perl interpreter was not even using 20% of the available processor power, but the insert steps were really slow.

The first thing to check when doing bulk inserts with DBI is to make sure you are using bind parameters, which in fact I was already doing:
# build an INSERT with one placeholder per column
my $dest_insert =
    'INSERT INTO '
  . $table . ' ('
  . join( ', ', @quoted_columns )
  . ') VALUES ('
  . join( ', ', ( map { '?' } @quoted_columns ) ) . ')';

my $insert_sth = $sqlite_dbh->prepare($dest_insert);
$select_sth->execute();

while ( my $row = $select_sth->fetchrow_arrayref() ) {

    # bind each value with its proper SQL type (@columns holds
    # the column metadata read from the source schema)
    for ( my $i = 0 ; $i < scalar(@columns) ; $i++ ) {
        $insert_sth->bind_param( ( $i + 1 ),
            $row->[$i], $columns[$i]->sql_type() );
    }

    $insert_sth->execute();
    # and code goes on
}

With that homework done, I moved on to another basic step: checking the storage back-end.

The thing is, the notebook was "plagued" with disk encryption: everything read from or written to that disk must go through the overhead of being decrypted or encrypted. While quite secure, doing I/O under these conditions is really bad for bulk inserts.

I then decided to go with a RAM disk: such solutions are uncommon in the Microsoft realm, but after some configuration I got a good improvement. The problem is, the notebook didn't have enough RAM for a RAM disk big enough to hold the complete database file while keeping everything else running... I ended up exhausting all the available RAM.

The next step was to take a look at SQLite features that could help with that... and the little database didn't let me down: a few PRAGMAs needed to be turned on (see the sketch below).
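As a sketch, assuming the usual PRAGMA suspects for write-heavy loads (the exact values are illustrative, and all of them trade crash safety for speed):

# skip the fsync() after every write
$sqlite_dbh->do('PRAGMA synchronous = OFF');

# keep the rollback journal in memory instead of on disk
$sqlite_dbh->do('PRAGMA journal_mode = MEMORY');

# use a larger page cache (value is in pages)
$sqlite_dbh->do('PRAGMA cache_size = 100000');

# keep temporary tables and indexes in memory
$sqlite_dbh->do('PRAGMA temp_store = MEMORY');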
By enabling those features I was able to find a middle ground between good performance and not exhausting the notebook's resources.

One observation: COMMIT control in SQLite is really weird. I ended up combining two things: disabling AutoCommit when connecting to the database, and disabling it again after each COMMIT was executed.
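Something like the sketch below, with the batch size of 1000 being an arbitrary choice of mine (and the per-column type binding from the earlier snippet omitted for brevity):

# take transaction control away from DBI
$sqlite_dbh->{AutoCommit} = 0;

my $counter = 0;

while ( my $row = $select_sth->fetchrow_arrayref() ) {
    $insert_sth->execute( @{$row} );

    if ( ( ++$counter % 1000 ) == 0 ) {
        $sqlite_dbh->commit();
        # make sure AutoCommit stays off for the next batch
        $sqlite_dbh->{AutoCommit} = 0;
    }
}

$sqlite_dbh->commit();    # flush whatever is left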
