Nerds Central

Nature Podcast

Peter Day's World of Business

It's a COBOL world!

Tech Oriented Search

Enterprise Relational Data Management With DBC – “how easy can it be?”

Data Base Connectors from Micro Focus make COBOL data a first class citizen of the enterprise architectural armoury

Data becomes enterprise class when you can:

  • Access it from multiple application platforms
  • Analyse and mine it
  • Centralise it and manage it
  • Update it, secure it and obfuscate it

Traditional VSAM COBOL data is phenomenally efficient but fails all of the above to a greater or lesser extent. It fails because it can only be accessed through COBOL and hence all manipulation of it has to be through COBOL.

In One Step DBC Fixes All Of This


To illustrate, let me tell you a story

Two years ago I was involved with a blue chip client who had an ticking system managing IT error reports and upgrade requests. It was run out of kuala lumpur.

  • Each morning a not very well formatted CSV 'dump' of all the open tickets was sent to UK based project managers from KL.
  • A horrid, slow and complex VBScript (I wrote) then parsed this data and sent it to tables I had defined in SQL Server.
  • The data was then linked into Excel as a pivot table.

The managers of the project (to enhance ticket response times) absolutely loved the availability or data this way. However, the VBScript behind this was horrid and the need to carefully craft the database tables at the same time as write the script was a pain.

  1. It would be lovely to have a language which did the data definition and data manipulation all at the same time.
  2. It would be fabulous to have a language which was very easy to code for data processing but which runs very fast indeed.
  3. It would be just great to have a language which would parse the CSV file and write to the database at the same time.

ADD 1 TO CSHARP GIVING COBOL

With DBC - Micro Focus COBOL does all of the above!

Even with LINQ, C# does not tightly bind the programming language to write enabled relational data. This is exactly what DBC does. So, why not go over an example program which performs something a bit like the tasks I was doing with VBScript previously?

My example project is written in Micro Focus Studio Enterprise Edition. It consist of four programs. The core program being readData and this calls ioData,reReadData and obfuscator in sequence; each one of these sub-programs illustrates one or more of the

I will use an export file from ProcessMonitor showing about nine thousand network events which happened on my laptop. This is just data, but it is the kind of ragged data that enterprises produce like a fire-hose but which often prove rather hard to analyse.

      $set sourceformat(variable)
       program-id. ReadDataProgram as "DBC_Rollup.ReadDataProgram" IS INITIAL PROGRAM.

       environment division.
           file-control.
           select infile assign to "logfile.csv" organization is line sequential.

Here we can see that the core program just reads the log file line sequential. I read the whole file into memory and then pass it to ioData. The data is passed as a COBOL group. This is where COBOL just smashes most other languages, grouping and passing data around is just so easy,

       01 inputLines.
           03 inputLine pic x(1024) occurs 10240.
           03 lengths   binary-long occurs 10240.
           03 numbLines binary-long.

ioData uses a simple tokenizing parser to break the data into pieces and store it in a COBOL record structure. Each time it tokenizes a row from the input data, it stores it in the following file record structure:

       program-id. ioDataProgram as "DBC_Rollup.ioDataProgram".
      $SET CREATEXFD
      $SET CALLFH(ACUFH)
       environment division.
       input-output section.
       file-control.
       
        select ioDataFile assign to "ioData"
          organization indexed
          access mode dynamic
          record key is keyField.
        fd ioDataFile.
        
        01 accessRecord.
           03 keyField       pic x(16).
           03 appData.
             05 appName      pic x(16).
             05 appPidx      pic  x(6).
             05 appPid redefines appPidX pic 9(6).
           03 comsData.
             05 comsType     pic x(16).
             05 fromAddr     pic x(50).
             05 toAddr       pic x(50).
             05 result       pic  x(8).
             05 packetData   pic x(50).

I have chosen to store my data in the above structure. Now - spy the area in pink - that does the magic. These compiler directives make all standard file handling into DBC file handling for the sub-program. So, my parser will write to a database. In this case, I have pointed it to MS SQL Sever.

Here is a couple of lines from the log file to show what was being parsed.

"13:42:19.6855784","OUTLOOK.EXE","2964","TCP Send","nwb-alextlap.microfocus.com:56647 -> nwb-exchange.microfocus.com:1293","SUCCESS","Length: 253, startime: 159419, endtime: 159419, seqnum: 0, connid: 0"
"13:42:19.6855833","OUTLOOK.EXE","2964","TCP Receive","nwb-alextlap.microfocus.com:56647 -> nwb-exchange.microfocus.com:1293","SUCCESS","Length: 205, seqnum: 0, connid: 0"

Let me drive this home:
I HAVE NOT CREATED A TABLE FOR THIS DATA IN MS-SQL Server


Lets do the "ADD 1 TO CSHARP GIVING COBOL" thing and show this table being created on the fly

This is MS SQL Server before I run my COBOL program

Now my demo program has parsed all 9302 lines from the log file and stored them in the database!

DBC has automatically created the table from me - based on the COBOL data group description - no extra work on my part. What is more, the data is human readable!

Push the data to Excel

To prove the point, how about now reading and summarising the data from Excel. This is the last step in the story I told at the start. To do this we can click on the 'Data' tab on Excel and then 'Connections' and from that follow the sequence below:

Here is the finished result - a pivot chart in Excel which analyses the data in a way management can use.

Access it from multiple application platforms - DONE! What next?

The next thing my program does is to create a new table - using DBC again - from which we can read summarised data. Here is the file and group description from which the table is created automatically:

       select summaryFile assign to "summaryFile"
          organization indexed
          access mode dynamic
          record key is appName
          .
        fd summaryFile.
        
        01 accessRecord.
           03 appName       pic x(16).
           03 countOfType   binary-long.
Next, here is the jaw droppingly simple bit of SQL to populate the table which DBC has kindly created for me:
begin transaction
insert into
    dbo.summaryFile
select
    appname,
    count(appname) as count
from
    dbo.ioData
group by appname
commit
Indeed, the demo program displays this SQL for us and then waits for us to tell it that we have run it:

Above is MS SQL Server showing the new table DBC has automatically created for us and the data I just loaded into it using SQL. For the next trick - DBC can read that summary data DBC will read that summary data back into COBOL

So we can analyse it, mine it and (because it is just plain old relational data) share it with other software and platforms - DONE! What next?

The clue is in the above image, for the last trick the demo program will do a simple obfuscation on the data. To be truthful - this could be seen as 'showing off'. Because our data is now in a relational database, we could obfuscate with with a tool like Data Express. However, just to drive home how brain numbingly simple DBC usage is - let's just go for it!

My approach is to write a second file out with the process names obfuscated. The second file is keyed by the same keys of the main file. Actually, neither of these are files at all, they are tables in the database. But, I do not need to create this second table - so it is very easy to work this way. I know I keep banging on about this - but the productivity impact of automatic mapping from COBOL groups to SQL tables is profound.

           open input ioDataFile
           open output tmpFile
               perform until 1 < 0
                   read ioDataFile next
                       at end exit perform
                   end-read
                   move appName(1:1) to tmpData
                   move "**********" to tmpData(2:10)
                   move keyField to tmpKey
                   write tmpRecord
               end-perform
           close tmpFile
           close ioDataFile

           display "--Entering phase 2 of obfuscation"

           open input tmpFile
           open i-o ioDataFile
               perform until 1 < 0
                   read tmpFile next
                       at end exit perform
                   end-read
                   
                   move tmpKey to keyField
                   read ioDataFile
                   
                   move tmpData to appName
                   rewrite accessRecord
               end-perform
           close tmpFile
           close ioDataFile

Once the obfuscation sub-program has run, right clicking on the Excel pivot table and choosing refresh shows how the data in the database has been updated to hide the names of the applications.


Make COBOL data a first class enterprise data citizen - DONE!

Use COBOL group structures to define, write, update and read SQL Tables (more than even LINQ can do) - DONE!

Show that data clean up in COBOL/DBC is really easy (IMHO easier than C#) - DONE!


So What Now - Bringing DBC Into Systems Architecture

Well, apparently I have made this harder than it needs to be (thanks for pointing that out to Randy Zack - after I had done the example code). DBC has dynamic file reassignment so that individual files can be mapped to the database rather than at the level of a whole sub-program. What is more, which files are re-directed this way is changeable at runtime. All that is required is the file descriptor xml entities created by the Micro Focus compiler (there is an example of one of these at the end of the post). This means that we can start to make some sophisticated architectural topologies and then optimize them without touching any code.

Perhaps the simplest topology is to re-direct all the files from multiple nodes running COBOL to a central database. This is the sort of architecture which makes sense in an on line transaction processing system (OLTP) like CICS or a web based system.

However, anyone who has seriously worked with relational databases will know just precious a resource they can be. In a lot of OLTP systems, and even in some OLAP (on line analytical processing) ones, temporary data is created which is associates with sessions or intermediate calculations. These need not be sent to the database but use raw file access local to the particular node.

A similar topology could be used where branch offices process data locally and synchronize to a central business intelligence server overnight.

Powerful, dynamic topology management - DONE! Anything else?

Imagination is the limit. Here is an idea I came up with. Imagine two organizations have merged. One organization uses SQL Server and the other Oracle. COBOL/DBC could be used as a data translation batch processor to allow transfer and sharing of data between the two heterogeneous systems. Even though the two RDBMS' have different data types and locking semantics, DBC translates these to the common definitions in COBOL.


The COBOL demo program and xfd example

Please note that this is only 'demo grade' code and has some bad coding practices in it. For example, I allocate a 10M array to move data around rather than calling the sub-program for every few lines read from the line sequential reader and there are some issues with the timing code: This source is only here to illustrate a proof of concept!

      $set sourceformat(variable)
       program-id. ReadDataProgram as "DBC_Rollup.ReadDataProgram" IS INITIAL PROGRAM.

       environment division.
           file-control.
           select infile assign to "logfile.csv" organization is line sequential.

       configuration section.
       repository.
       
       data division.
           fd infile record is varying from 1 to 1024 depending on rec-len.
           01  inrec pic x(1024).

       working-storage section.
       01 rec-len binary-long. 
       01 inputLines.
           03 inputLine pic x(1024) occurs 10240.
           03 lengths   binary-long occurs 10240.
           03 numbLines binary-long.
       01 numbLinesV pic 9999.
       01 i binary-long.
       01 j binary-long.
       01 ts pic S9(6)V99.
       01 te pic S9(6)V99.
       01 tout pic 9(6).99.
       procedure division.
           open input infile
           display " "
           display "Start translation"
           display "================="
           accept ts from time
           perform varying i from 1 by 1 until i > 10250
               read infile
                at end exit perform
               end-read
               if i greater than 1 then
               subtract 1 from i giving j
               move inrec to inputLine(j)
               move rec-len to lengths(j)
               end-if
           end-perform
           close infile
           subtract 2 from i giving numbLines
           move numbLines to numbLinesV
           display "--Read " numbLinesV " lines from log"
           call "ioDataProgram" using by reference inputLines
           accept te from time
           subtract ts from te giving te
           divide te by 100 giving tout
           display "Total execution time (seconds) " tout
           
           display " "
           display "Entering Secondary Processing"
           display "============================="
           
           call "reReadProgram"
           
           call "obfuscator"    
           
           display " "
           display "ALL DONE!"
           goback.
       end program ReadDataProgram.
      $set sourceformat(variable)
       program-id. ioDataProgram as "DBC_Rollup.ioDataProgram".
      $SET CREATEXFD
      $SET CALLFH(ACUFH)
       environment division.
       input-output section.
       file-control.
       
        select ioDataFile assign to "ioData"
          organization indexed
          access mode dynamic
          record key is keyField.
        fd ioDataFile.
        
        01 accessRecord.
           03 keyField       pic x(16).
           03 appData.
             05 appName      pic x(16).
             05 appPidx      pic  x(6).
             05 appPid redefines appPidX pic 9(6).
           03 comsData.
             05 comsType     pic x(16).
             05 fromAddr     pic x(50).
             05 toAddr       pic x(50).
             05 result       pic  x(8).
             05 packetData   pic x(50).

        working-storage section.
        01 i binary-long.
        01 s binary-long.
        01 ss binary-long.
        01 e binary-long.
        01 l binary-long.
        01 s1 binary-long.
        01 e1 binary-long.
        01 l1 binary-long.
        01 token binary-long.
        01 inputLines.
            03 inputLine pic x(1024) occurs 10240.
            03 lengths   binary-long occurs 10240.
            03 numbLines binary-long.
        01 ts pic S9(6)V99.
        01 te pic S9(6)V99.
        01 tout pic 9(6).99.
        01 numbLinesV pic 9999.
        procedure division using 
               by reference inputLines.
            
            display "--Start writing lines"
            accept ts from time
            open output ioDataFile
            accept te from time
            subtract ts from te giving te
            divide te by 100 giving tout
            display "--File open/connect time (seconds) " tout
            
            accept ts from time
            perform varying i from 1 by 1 until i > numbLines
               move 1 to s
               move 0 to token
               perform varying e from 1 by 1 until e > lengths(i)
                   if inputLine(i)(e:1) equals "," then
                      subtract s from e giving l
                      add 1 to token
                      move s to ss
                      move e to s
                      if token equals 1 then
                          add 1 to ss
                          subtract 1 from l
                          move inputLine(i)(ss:l) to keyField
                      else if token equals 2 then
                          add 2 to ss
                          subtract 3 from l
                          move inputLine(i)(ss:l) to appName
                      else if token equals 3 then
                          move inputLine(i)(ss:l) to appPid
                      else if token equals 4 then
                          add 2 to ss
                          subtract 3 from l
                          move inputLine(i)(ss:l) to comsType
                      else if token equals 5 then
                          add 2 to ss
                          perform varying e1 from ss by 1 until e1 > e
                              if inputLine(i)(e1:2) equals "->"
                                   subtract ss from e1 giving l1
                                   move inputLine(i)(ss:l1) to fromAddr
                                   exit perform
                              end-if
                          end-perform
                          *> Remove -> space and the end "
                          add 3 to e1
                          subtract e1 from e giving l1
                          subtract 1 from l1
                          move inputLine(i)(e1:l1) to toAddr
                      else if token equals 6 then
                          add 2 to ss
                          subtract 3 from l
                          move inputLine(i)(ss:l) to result
                      else
                          add 2 to ss
                          subtract 2 from l
                          subtract ss from lengths(i) giving l
                          move inputLine(i)(ss:l) to packetData
                          exit perform
                      end-if
                   end-if
               end-perform

               write accessRecord

            end-perform

            close ioDataFile
            accept te from time
            subtract ts from te giving te
            divide te by 100 giving tout
            move numbLines to numbLinesV
            display "--Wrote " numbLinesV " lines to file/database in " tout " (seconds)"
            
            goback.
         end program ioDataProgram.
       program-id. reReadProgram as "DBC_Rollup.reReadProgram".
      $SET CREATEXFD
      $SET CALLFH(ACUFH)
       environment division.
       file-control.
       select summaryFile assign to "summaryFile"
          organization indexed
          access mode dynamic
          record key is appName
          .
        fd summaryFile.
        
        01 accessRecord.
           03 appName       pic x(16).
           03 countOfType   binary-long.
           
                       
       working-storage section.
           01 nullData pic x.
           01 countOfTypeV pic 9(6).
           01 appNameV     pic A(16).

           
       procedure division.
       display "Enter to continue (create summary table)"
       accept nullData
       open output summaryFile
       *> This just creates the table structure if it is not
       *> there already
       close summaryFile
       
       display "Please run this SQL:"
       display "begin transaction" 
       display "insert into "
       display " dbo.summaryFile"
       display "select "
       display " appname,"
       display " count(appname) as count"
       display "from"
       display " dbo.ioData"
       display "group by appname"
       display "commit"

       display "Enter to continue (read summary table)"
       accept nullData
       
       
       display "Application Name | Count Of Records"
       display "-----------------------------------"
       open input summaryFile
       perform until exit
       
           read summaryFile next
               at end
               exit perform
           end-read
           
           move countOfType to countOfTypeV
           move appName     to appNameV
           display appNameV " | " countOfTypeV
       
       end-perform
       goback.
       end program reReadProgram.
       program-id. obfuscator as "DBC_Rollup.obfuscator".
      $SET CREATEXFD
      $SET CALLFH(ACUFH)
       environment division.
       input-output section.
       file-control.
       
        select tmpFile assign to "tmpFile"
          organization indexed
          access mode dynamic
          record key is tmpKey.
        
        select ioDataFile assign to "ioData"
          organization indexed
          access mode dynamic
          record key is keyField.
          
        data division.
        file section.

        fd ioDataFile.        
        01 accessRecord.
           03 keyField       pic x(16).
           03 appData.
             05 appName      pic x(16).
             05 appPidx      pic  x(6).
             05 appPid redefines appPidX pic 9(6).
           03 comsData.
             05 comsType     pic x(16).
             05 fromAddr     pic x(50).
             05 toAddr       pic x(50).
             05 result       pic  x(8).
             05 packetData   pic x(50).     
             
       fd tmpFile.
           01 tmpRecord.
               03 tmpKey  pic x(16).
               03 tmpData pic x(16).

       working-storage section.
       01 ts pic S9(6)V99.
       01 te pic S9(6)V99.
       01 tout pic 9(6).99.
       01 nullData pic x.
       procedure division.
           display "Enter to continue (obfuscate)"
           accept nullData
           
           display "--Starting obfuscation"
           accept ts from time
           open input ioDataFile
           open output tmpFile
               perform until exit
                   read ioDataFile next
                       at end exit perform
                   end-read
                   move appName(1:1) to tmpData
                   move "**********" to tmpData(2:10)
                   move keyField to tmpKey
                   write tmpRecord
               end-perform
           close tmpFile
           close ioDataFile

           display "--Entering phase 2 of obfuscation"
           open input tmpFile
           open i-o ioDataFile
               perform until exit
                   read tmpFile next
                       at end exit perform
                   end-read
                   
                   move tmpKey to keyField
                   read ioDataFile
                   
                   move tmpData to appName
                   rewrite accessRecord
               end-perform
           close tmpFile
           close ioDataFile
           
           accept te from time
           subtract ts from te giving te
           divide te by 100 giving tout
           display "--File obfuscate time (seconds) " tout
           goback.
       end program obfuscator.

And for interest, here is the xml created by the XFD tool:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!-- tmpfile.xfd - generated on 26-May-09 14:29
     Micro Focus Net Express            V6.0 revision 000
     AXCGG/AA0/00000  GNR-040066000AE
 -->
<xfd:xfdfile
     xmlns:xfd="http://www.microfocus.com">
  <xfd:identification
       xfd:version="7">
    <xfd:select-name>TMPFILE</xfd:select-name>
    <xfd:maximum-numeric-digits>31</xfd:maximum-numeric-digits>
    <xfd:alphabet>None</xfd:alphabet>
    <xfd:period-character>.</xfd:period-character>
    <xfd:comma-character>,</xfd:comma-character>
    <xfd:file-organization>Indexed</xfd:file-organization>
    <xfd:table-name>TMPFILE</xfd:table-name>
    <xfd:number-of-keys>1</xfd:number-of-keys>
    <xfd:minimum-record-size>32</xfd:minimum-record-size>
    <xfd:maximum-record-size>32</xfd:maximum-record-size>
    <xfd:sign-compatibility>8</xfd:sign-compatibility>
  </xfd:identification>
  <xfd:keys>
    <xfd:key
         xfd:segment-count="1"
         xfd:duplicates-allowed="false">
      <xfd:segments>
        <xfd:segment
             xfd:segment-size="16"
             xfd:segment-offset="0"/>
      </xfd:segments>
      <xfd:key-columns
           xfd:key-column-count="1">
        <xfd:key-column
             xfd:key-column-name="TMPKEY"/>
      </xfd:key-columns>
    </xfd:key>
  </xfd:keys>
  <xfd:conditions
       xfd:condition-count="0"/>
  <xfd:fields
       xfd:elementary-items="2"
       xfd:elementary-items-with-occurs="2"
       xfd:total-items="3"
       xfd:total-items-with-occurs="3">
    <xfd:field
         xfd:field-level="1"
         xfd:field-name="TMPRECORD"
         xfd:field-condition="999"
         xfd:field-offset="0"
         xfd:field-bytes="32"
         xfd:field-type="152"
         xfd:field-length="32"
         xfd:field-scale="0"
         xfd:field-user-flags="0"/>
    <xfd:field
         xfd:field-level="3"
         xfd:field-name="TMPKEY"
         xfd:field-condition="0"
         xfd:field-offset="0"
         xfd:field-bytes="16"
         xfd:field-type="152"
         xfd:field-length="16"
         xfd:field-scale="0"
         xfd:field-user-flags="0"/>
    <xfd:field
         xfd:field-level="3"
         xfd:field-name="TMPDATA"
         xfd:field-condition="0"
         xfd:field-offset="16"
         xfd:field-bytes="16"
         xfd:field-type="152"
         xfd:field-length="16"
         xfd:field-scale="0"
         xfd:field-user-flags="0"/>
  </xfd:fields>
</xfd:xfdfile>

0 comments:

Post a Comment

Please feel free to leave comments or thoughts.

Recent Comments

Blog Archive