In our continuing series on acquainting you with less obvious features in Extrata, I bring you our episode on database writeback. The big selling feature of Extrata is that we can dump data in Excel format. There’s a reason for this. Most people in business and academia have some familiarity with Excel. Also, advanced users of Excel find wonderful ways of making solutions that can find squares into round holes. It’s quite amazing what people can do with Excel.
How The Feature Came About
Not too long into Extrata’s life cycle, I added the ability to use some databases (MySQL, ODBC, Postgresql and SQLite) as a data source. While the cloud API service probably has a wider appeal, I figured there were probably some instances where people would want to dump the contents of SQL statements into an Excel without having to constantly harass the IT department.
Once the capability to read from a database was wired in to Extrata, the next logical step to me seemed to be the addition of database writeback capability. After some discussion, Vince and I figured this was more of a niche feature, so we restrict the writeback capability to the enterprise edition of Extrata.
I made a judgment call and still force users to save data in a file format before I write into a database. My reasoning is that a lot of data sources in Extrata are API calls, and some services are metered. I would rather users have an extra copy of the API data in file format than none at all in the event of a database write error.
Extrata offers two ways to write to a database. The first way is to dump each record into a JSON field (for the databases that support it). The second way is to dump each record into a traditional table, where each table column name has to match the corresponding data attribute name from the source query.
Since the data that Extrata gets from data sources can be unstructured, the first method is my preferred way to write back to a database, especially if your data source is a cloud API. Cloud APIs can change without notice, and this will ultimately cause problems in the event that an attribute has been renamed, added or deleted.
This is why I prefer to stage my data loads into a table that has a single JSON attribute and use a query or view to format the data in the way I want.
How To Use Database Writeback
As I mentioned earlier, there are two types of database writeback. The first and easiest way is to write to a JSON column. The second way is to write to a predefined table.
One important thing to note is that in the current version of Extrata, it will only insert records into a database table. There is no option to clear all records prior to a load (this will be remedied in a future version).
Create A Database
If you don’t already have one, create a new database on a platform supported by Extrata– MySQL, Postgresql or SQLite. If your database supports ODBC, that will work too. Create the database and a corresponding ODBC DSN for that database.
Create A Database Source
To write to a database, Extrata needs to know what that database is. So in your Sources view, create a new source that points to the database you created in the last step. Note that any SQL sample code I provide here is for Postgresql. You may need to make adjustments for the database platform that you are using.
Create Your Database Table
Option 1: JSON Column
If you are using a JSON friendly database like Postgresql, you just need to create a simple table using code similar to below:
create table my_staging_table ( id bigserial, -- this column is optional but handy json_value json )
The id column is not required, but I like having a serialized ID to help me differentiate multiple inserts.
Option 2: Predefined Table
If you want to create a predefined table, you’re going to have to dry run a normal task against your source to generate an Excel or CSV file. The header names in the file are the names you should use as your table’s column names. Here’s an example of a predefined table for Calendarific exports:
create table my_staging_table ( id bigserial, -- this column is optional but handy name varchar(255), -- when setting varchar sizes, you should be familiar with the data first description text, type varchar(1024), country_code varchar(2), year integer, month integer, day integer, date_id integer )
Predefined tables are a little more “fragile” than using a JSON column, but depending on what database engine you’re using, a JSON column might not be an option.
The reason why predefined tables are more fragile is that if your data source output headers change, you’re probably going to encounter errors. Not only do the column names need to match, the data types do as well.
Enable Writeback On Your Task
In the Output Options of your task, go to the DB Output tab.
In the Database Destination drop down, select the source that you created earlier.
In the Table Name field, enter the name of your destination table.
In the Table Type drop down, select either “Write to a predefined table” or “Write to a JSON column” depending on the type of table you created earlier.
Save your task and then Run it.
If all goes well, you should find that the table you created earlier will be populated with data!
Doesn’t the database writeback feature make Extrata an ETL tool? Sorta, I guess…
Let’s set the record straight, Extrata is not designed to replace an industrial strength ETL tool. Having said that, it can do some lightweight ETL. If you run a task with database writeback with the command line interface, it can integrate nicely with many ETL tools. In fact, we’ve got a client who runs Extrata on a server to stage data for more complicated loads with Pentaho Data Integration.
If you care to be creative, you can do multiple passes on the same data with the database writeback feature. You can use one task to do the initial extraction into a database table. You can then create another task to extract the extracted data into yet another database table, ad infinitum. But just because you can, doesn’t mean you should.
It’s not for me to tell you how to use or not to use Extrata. Just use it in the way that maximizes value for you.
Other articles by this author:
- Easy Database Version Control (a.k.a. migrations) with Flyway
- Get To Know Extrata, Episode 1: The Command Line Interface
- Microsoft Excel Rules The World
- Quick Hit: Don’t Do This In Postgresql
- Postgresql Tip: Getting rid of redundant rows using DELETE USING