ETL and SODA (Service Oriented Data Architecture)

Over the last couple of days I have been racking my brain over the post I did about API vs ODBC. One of the things it brought back to me was the idea of Service Oriented Data Architecture (SODA).

I wrote about it before in the context of getting closely involved with the enterprise platform's evolution to SOA.

So in my thinking I decided to go back to the why. I started with my main drivers: cost and speed. The solutions that Project X Ltd likes to work on deliver value to the end customer faster, so that they can start getting corporate value out of what was developed sooner.

In the definition of SOA (see the definitions below), the main element is loosely coupled services. These services can then be published for wider re-use, without consumers needing wider knowledge of the underlying components of the service. I think this is great from a technology point of view, but let’s go a little deeper in terms of ETL.

As I evolve to SOA (or not), I may not always want to convert or open up every component I build as a service. So how do I get the best out of this concept? This is where the underlying concepts of SOA become very interesting when applied more generally. The three main guiding principles according to Wikipedia (there is no single agreed definition) are:

  1. Reuse, granularity, modularity, composability, componentization, interoperability
  2. Compliance to open standards (common and industry)
  3. Service identification and categorization

So I hope you can see why I fell upon this thinking in our API vs ODBC example. First, I am not opening my ETL process as a service, so the third item above does not hold, although I will use metadata and other tools to make the ETL process known within the organization. But the other two are really key to me.

  • Re-use – when we build any frameworks, systems or platforms we try to build them for re-use, so that next time we do not have to build completely from scratch. Good old object oriented programming. On its own this is not enough in SODA, though.
  • Granularity – this one is tough for me to leverage. In the SOA case it means having both fine and coarse grained services. By allowing the two types we can control how they are componentized and used: fine grained services may be private while coarse grained ones are open. In SODA this may be the use of base layer tables (data marts) rather than the actual tables. The tables may not have the right context on their own, but the aggregates do.
  • Modularity – this one is easy: separating functions allows for specialization and reuse. In ETL and SODA this means looking for the ability to create components for re-use, for example a file transfer gateway, a zipping utility or a cleansing tool. Process steps within the ETL tool could be counted here too, but there you are generally copying the process to another project as opposed to using the same one over again.
  • Composability – the ability to build with it. I am going to skip this for now, other than to say that you need to be able to build new items out of these components.
  • Componentization – for me, see modularity and composability above.
  • Interoperability – allow it to operate and connect across disparate systems and platforms. In the ETL world this was key to me, but I will leave that to the next item.
  • Compliance to Open Standards – OK.  This is the point of the post…
    In ETL you build a special purpose piece of code to connect one data element to another. When connecting you can use open standard interfaces or specialized ones built by the manufacturers of the different systems. In the case from earlier this week (or late last week) it was the DataStage ODBC driver vs the Teradata API. The ODBC driver is the open standard; the API is not. So in SODA we would be aiming to stay away from the vendor API.
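
To make the modularity and composability points a little more concrete, here is a minimal sketch in Python. It is not taken from any particular ETL tool: the step names (cleanse, drop_incomplete) and the compose helper are hypothetical, and records are assumed to be plain dictionaries. The idea is only that each step is a small reusable component, and a new pipeline is built by combining existing steps rather than copying them into each project.

```python
from typing import Callable, Iterable, Iterator, Sequence

Record = dict
Step = Callable[[Iterable[Record]], Iterator[Record]]


def cleanse(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical reusable step: trim whitespace from string values."""
    for record in records:
        yield {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }


def drop_incomplete(required: Sequence[str]) -> Step:
    """Hypothetical step factory: drop records missing any required field."""
    def step(records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            if all(record.get(field) not in (None, "") for field in required):
                yield record
    return step


def compose(*steps: Step) -> Step:
    """Build a new pipeline out of existing steps (composability)."""
    def pipeline(records: Iterable[Record]) -> Iterator[Record]:
        for step in steps:
            records = step(records)
        return iter(records)
    return pipeline


# A new "item" built purely from existing components.
customer_pipeline = compose(cleanse, drop_incomplete(["id", "name"]))
```

Another project could reuse cleanse unchanged and pair it with a different filter, which is the difference between using a component again and copying a process step.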

Now a couple of things to remember: the above are guiding principles and not rules. So if performance is a problem and it is killing the ability to meet customer requirements (say when using ODBC or W3C standards), then you will need to look at the API solution for ETL. But this means that you are tightly coupled to the end platform.
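
One way to limit that coupling is to keep the choice behind a small interface, so that only one place in the ETL code knows which connection is in use. The sketch below is an illustration under stated assumptions, not how DataStage or Teradata actually work: OdbcLoader and NativeApiLoader are stand-ins that only collect rows in memory, and the row-count threshold is a made-up number standing in for whatever performance requirement applies.

```python
from typing import Iterable, List, Protocol, Tuple


class Loader(Protocol):
    """The only contract the rest of the ETL process depends on."""
    def load(self, table: str, rows: Iterable[dict]) -> int: ...


class OdbcLoader:
    """Stand-in for a loader built on the open standard (ODBC) interface."""
    def __init__(self) -> None:
        self.loaded: List[Tuple[str, dict]] = []

    def load(self, table: str, rows: Iterable[dict]) -> int:
        batch = list(rows)
        self.loaded.extend((table, row) for row in batch)
        return len(batch)


class NativeApiLoader:
    """Stand-in for a loader built on a vendor-specific API."""
    def __init__(self) -> None:
        self.loaded: List[Tuple[str, dict]] = []

    def load(self, table: str, rows: Iterable[dict]) -> int:
        batch = list(rows)
        self.loaded.extend((table, row) for row in batch)
        return len(batch)


def choose_loader(expected_rows: int, odbc_row_limit: int = 1_000_000) -> Loader:
    """Prefer the open standard; fall back to the vendor API only when
    the (hypothetical) volume threshold says ODBC will not keep up."""
    if expected_rows <= odbc_row_limit:
        return OdbcLoader()
    return NativeApiLoader()


def run_load(loader: Loader, table: str, rows: Iterable[dict]) -> int:
    """Calling code never needs to know which loader it was given."""
    return loader.load(table, rows)
```

The tight coupling to the end platform is still there when the vendor API is chosen, but it is confined to one class instead of being spread through the process.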

Some musings from the weekend.

Definitions (for Jim):

ETL – Extract Transform and Load
SOA – Service Oriented Architecture – wikipedia definition
Loosely Coupled – For me this is about the ability to interconnect different systems in some sort of exchange where the connection is open enough that the systems do not need to be of the same platform or understand what is going on – wikipedia definition
Granularity – size and frequency of the communication (fine and coarse) – wikipedia definition

API – Application Programming Interface – wikipedia definition
