DataStage to Excel

I have a technical question and want to see if others have ideas…

We were asked by a client to build a DataStage ETL feed that outputs an excel file with multiple excel worksheets in the workbook.

Anyone out there have any ideas?

  1. Vincent McBurney Reply

    @Stephen, I like the trick of writing to html and still storing the data in different tables. The XML is harder, I’ve tried importing an Excel xml file format into DataStage and it’s just too complex to do easily. It would be a great trick if you could do it as you could produce fully formatted Excel files from scratch within DataStage on any platform. The problem is all the extra Excel formatting and organisation XML around the data.
    One other trick would be writing to SQL Server from Unix and having an Excel file use the table as a data source, refresh the data when you open the sheet. Given the low volumes you could use the free SQL Server version.

  2. Jacob Reply

    Another simple solution: save the Excel file as .xml format, and then use XML stages to process the .xml file.

  3. Stephen Hayward Reply

    I will give this a try. It seems very straightforward.
    Thanks and as they say, the cheque is in the mail
    🙂

  4. Morgan Goeller Reply

    There is a solution for this problem that I learned from my time as a web programmer.
    In a pinch, Excel will automatically try to import data from any format that it already knows. If it recognizes the format it will not even ask if you want to import, it will just do it for you.
    You can use this to your advantage by supplying the data in a format that you know is correct (like HTML) but isn’t Excel’s proprietary format. Your customer will get the data they want and never know the difference.
    To do this you will need to:
    1. Have DataStage store the data as an HTML page with a TABLE element inside. This page should be fully formed HTML and all the rows, columns, headers, and formatting should be present.
    2. Have DataStage output the HTML to a file with an extension that will be recognized by Excel (most likely .xls).
    3. Have your customer open the file just like any other spreadsheet. They don’t have to follow any prompts or click through any menu selections. It will even keep the formatting of the web page, so you can do pretty colors and outlines and headers if you wish.
    4. Look like a genius.
    5. Send me a check 😉
    I have used this method on a corporate intranet to allow users to download Excel files from web pages and it works very well. All you have to do is use the .xls extension on an HTML table and Excel will do the rest.
    BTW, you can try it yourself by creating your own HTML page with a table inside, giving the file an .xls extension, then trying to open it up with Excel.
    Good luck!

  5. Stephen Hayward Reply

    Vince,
    As always good to hear from you. Our challenge is that sadly we are on a Unix environment without connectivity to a Windows server.
    We knew we could easily build a VB script to convert or in worst case write a simple Macro to pull in the files into a work book. So I agree with your thoughts. Wasn’t sure if I had somehow missed something in DataStage on Unix.

  6. Vincent McBurney Reply

    It’s easy to do if you have a DataStage server on Windows. You just create an ODBC driver for Excel and use it in your DataStage job as a target stage. If you write to it from multiple concurrent jobs you should get the multiple worksheets where an ODBC table name represents a worksheet name.
    Not sure whether you need to create an empty Excel document with all the worksheets first or whether DataStage can create it.
    If you are Unix or Linux than you have a bigger challenge as Microsoft documents are a foreign concept and the Microsoft ODBC drivers are not present. You need some type of Unix to Windows bridging software that exposes ODBC connections on a Windows box to your Unix/Linux server. A cheaper option is to write to csv and let Excel convert it. You could write a master Excel document that runs a macro that opens a set of csv files and turns them into worksheets in a single document.
    regards
    Vincent from ITToolbox

Leave a Reply to Stephen Hayward Cancel reply

captcha *