Enterprise Relational Data Management With DBC – “how easy can it be?”
Data Base Connectors from Micro Focus make COBOL data a first class citizen of the enterprise architectural armoury
Data becomes enterprise class when you can:
- Access it from multiple application platforms
- Analyse and mine it
- Centralise it and manage it
- Update it, secure it and obfuscate it
Traditional VSAM COBOL data is phenomenally efficient but fails all of the above to a greater or lesser extent. It fails because it can only be accessed through COBOL and hence all manipulation of it has to be through COBOL.
In One Step DBC Fixes All Of This
To illustrate, let me tell you a story
Two years ago I was involved with a blue chip client who had an ticking system managing IT error reports and upgrade requests. It was run out of kuala lumpur.
- Each morning a not very well formatted CSV 'dump' of all the open tickets was sent to UK based project managers from KL.
- A horrid, slow and complex VBScript (I wrote) then parsed this data and sent it to tables I had defined in SQL Server.
- The data was then linked into Excel as a pivot table.
The managers of the project (to enhance ticket response times) absolutely loved the availability or data this way. However, the VBScript behind this was horrid and the need to carefully craft the database tables at the same time as write the script was a pain.
- It would be lovely to have a language which did the data definition and data manipulation all at the same time.
- It would be fabulous to have a language which was very easy to code for data processing but which runs very fast indeed.
- It would be just great to have a language which would parse the CSV file and write to the database at the same time.
ADD 1 TO CSHARP GIVING COBOL
With DBC - Micro Focus COBOL does all of the above!
Even with LINQ, C# does not tightly bind the programming language to write enabled relational data. This is exactly what DBC does. So, why not go over an example program which performs something a bit like the tasks I was doing with VBScript previously?
My example project is written in Micro Focus Studio Enterprise Edition. It consist of four programs. The core program being readData and this calls ioData,reReadData and obfuscator in sequence; each one of these sub-programs illustrates one or more of the
I will use an export file from ProcessMonitor showing about nine thousand network events which happened on my laptop. This is just data, but it is the kind of ragged data that enterprises produce like a fire-hose but which often prove rather hard to analyse.
$set sourceformat(variable)
program-id. ReadDataProgram as "DBC_Rollup.ReadDataProgram" IS INITIAL PROGRAM.
environment division.
file-control.
select infile assign to "logfile.csv" organization is line sequential.
Here we can see that the core program just reads the log file line sequential. I read the whole file into memory and then pass it to ioData. The data is passed as a COBOL group. This is where COBOL just smashes most other languages, grouping and passing data around is just so easy,
01 inputLines.
03 inputLine pic x(1024) occurs 10240.
03 lengths binary-long occurs 10240.
03 numbLines binary-long.
ioData uses a simple tokenizing parser to break the data into pieces and store it in a COBOL record structure. Each time it tokenizes a row from the input data, it stores it in the following file record structure:
program-id. ioDataProgram as "DBC_Rollup.ioDataProgram".
$SET CREATEXFD
$SET CALLFH(ACUFH)
environment division.
input-output section.
file-control.
select ioDataFile assign to "ioData"
organization indexed
access mode dynamic
record key is keyField.
fd ioDataFile.
01 accessRecord.
03 keyField pic x(16).
03 appData.
05 appName pic x(16).
05 appPidx pic x(6).
05 appPid redefines appPidX pic 9(6).
03 comsData.
05 comsType pic x(16).
05 fromAddr pic x(50).
05 toAddr pic x(50).
05 result pic x(8).
05 packetData pic x(50).
I have chosen to store my data in the above structure. Now - spy the area in pink - that does the magic. These compiler directives make all standard file handling into DBC file handling for the sub-program. So, my parser will write to a database. In this case, I have pointed it to MS SQL Sever.
Here is a couple of lines from the log file to show what was being parsed.
"13:42:19.6855784","OUTLOOK.EXE","2964","TCP Send","nwb-alextlap.microfocus.com:56647 -> nwb-exchange.microfocus.com:1293","SUCCESS","Length: 253, startime: 159419, endtime: 159419, seqnum: 0, connid: 0" "13:42:19.6855833","OUTLOOK.EXE","2964","TCP Receive","nwb-alextlap.microfocus.com:56647 -> nwb-exchange.microfocus.com:1293","SUCCESS","Length: 205, seqnum: 0, connid: 0"
Let me drive this home:
I HAVE NOT CREATED A TABLE FOR THIS DATA IN MS-SQL Server
Lets do the "ADD 1 TO CSHARP GIVING COBOL" thing and show this table being created on the fly
This is MS SQL Server before I run my COBOL program
Now my demo program has parsed all 9302 lines from the log file and stored them in the database!
DBC has automatically created the table from me - based on the COBOL data group description - no extra work on my part. What is more, the data is human readable!
Push the data to Excel
To prove the point, how about now reading and summarising the data from Excel. This is the last step in the story I told at the start. To do this we can click on the 'Data' tab on Excel and then 'Connections' and from that follow the sequence below:
Here is the finished result - a pivot chart in Excel which analyses the data in a way management can use.
Access it from multiple application platforms - DONE! What next?
The next thing my program does is to create a new table - using DBC again - from which we can read summarised data. Here is the file and group description from which the table is created automatically:
select summaryFile assign to "summaryFile"
organization indexed
access mode dynamic
record key is appName
.
fd summaryFile.
01 accessRecord.
03 appName pic x(16).
03 countOfType binary-long.
Next, here is the jaw droppingly simple bit of SQL to populate the table which DBC has kindly created for me:
begin transaction
insert into
dbo.summaryFile
select
appname,
count(appname) as count
from
dbo.ioData
group by appname
commit
Indeed, the demo program displays this SQL for us and then waits for us to tell it that we have run it:
Above is MS SQL Server showing the new table DBC has automatically created for us and the data I just loaded into it using SQL. For the next trick - DBC can read that summary data DBC will read that summary data back into COBOL
So we can analyse it, mine it and (because it is just plain old relational data) share it with other software and platforms - DONE! What next?
The clue is in the above image, for the last trick the demo program will do a simple obfuscation on the data. To be truthful - this could be seen as 'showing off'. Because our data is now in a relational database, we could obfuscate with with a tool like Data Express. However, just to drive home how brain numbingly simple DBC usage is - let's just go for it!
My approach is to write a second file out with the process names obfuscated. The second file is keyed by the same keys of the main file. Actually, neither of these are files at all, they are tables in the database. But, I do not need to create this second table - so it is very easy to work this way. I know I keep banging on about this - but the productivity impact of automatic mapping from COBOL groups to SQL tables is profound.
open input ioDataFile
open output tmpFile
perform until 1 < 0
read ioDataFile next
at end exit perform
end-read
move appName(1:1) to tmpData
move "**********" to tmpData(2:10)
move keyField to tmpKey
write tmpRecord
end-perform
close tmpFile
close ioDataFile
display "--Entering phase 2 of obfuscation"
open input tmpFile
open i-o ioDataFile
perform until 1 < 0
read tmpFile next
at end exit perform
end-read
move tmpKey to keyField
read ioDataFile
move tmpData to appName
rewrite accessRecord
end-perform
close tmpFile
close ioDataFile
Once the obfuscation sub-program has run, right clicking on the Excel pivot table and choosing refresh shows how the data in the database has been updated to hide the names of the applications.
Make COBOL data a first class enterprise data citizen - DONE!
Use COBOL group structures to define, write, update and read SQL Tables (more than even LINQ can do) - DONE!
Show that data clean up in COBOL/DBC is really easy (IMHO easier than C#) - DONE!
So What Now - Bringing DBC Into Systems Architecture
Well, apparently I have made this harder than it needs to be (thanks for pointing that out to Randy Zack - after I had done the example code). DBC has dynamic file reassignment so that individual files can be mapped to the database rather than at the level of a whole sub-program. What is more, which files are re-directed this way is changeable at runtime. All that is required is the file descriptor xml entities created by the Micro Focus compiler (there is an example of one of these at the end of the post). This means that we can start to make some sophisticated architectural topologies and then optimize them without touching any code.
Perhaps the simplest topology is to re-direct all the files from multiple nodes running COBOL to a central database. This is the sort of architecture which makes sense in an on line transaction processing system (OLTP) like CICS or a web based system.
However, anyone who has seriously worked with relational databases will know just precious a resource they can be. In a lot of OLTP systems, and even in some OLAP (on line analytical processing) ones, temporary data is created which is associates with sessions or intermediate calculations. These need not be sent to the database but use raw file access local to the particular node.
A similar topology could be used where branch offices process data locally and synchronize to a central business intelligence server overnight.
Powerful, dynamic topology management - DONE! Anything else?
Imagination is the limit. Here is an idea I came up with. Imagine two organizations have merged. One organization uses SQL Server and the other Oracle. COBOL/DBC could be used as a data translation batch processor to allow transfer and sharing of data between the two heterogeneous systems. Even though the two RDBMS' have different data types and locking semantics, DBC translates these to the common definitions in COBOL.
The COBOL demo program and xfd example
Please note that this is only 'demo grade' code and has some bad coding practices in it. For example, I allocate a 10M array to move data around rather than calling the sub-program for every few lines read from the line sequential reader and there are some issues with the timing code: This source is only here to illustrate a proof of concept!
$set sourceformat(variable)
program-id. ReadDataProgram as "DBC_Rollup.ReadDataProgram" IS INITIAL PROGRAM.
environment division.
file-control.
select infile assign to "logfile.csv" organization is line sequential.
configuration section.
repository.
data division.
fd infile record is varying from 1 to 1024 depending on rec-len.
01 inrec pic x(1024).
working-storage section.
01 rec-len binary-long.
01 inputLines.
03 inputLine pic x(1024) occurs 10240.
03 lengths binary-long occurs 10240.
03 numbLines binary-long.
01 numbLinesV pic 9999.
01 i binary-long.
01 j binary-long.
01 ts pic S9(6)V99.
01 te pic S9(6)V99.
01 tout pic 9(6).99.
procedure division.
open input infile
display " "
display "Start translation"
display "================="
accept ts from time
perform varying i from 1 by 1 until i > 10250
read infile
at end exit perform
end-read
if i greater than 1 then
subtract 1 from i giving j
move inrec to inputLine(j)
move rec-len to lengths(j)
end-if
end-perform
close infile
subtract 2 from i giving numbLines
move numbLines to numbLinesV
display "--Read " numbLinesV " lines from log"
call "ioDataProgram" using by reference inputLines
accept te from time
subtract ts from te giving te
divide te by 100 giving tout
display "Total execution time (seconds) " tout
display " "
display "Entering Secondary Processing"
display "============================="
call "reReadProgram"
call "obfuscator"
display " "
display "ALL DONE!"
goback.
end program ReadDataProgram.
$set sourceformat(variable)
program-id. ioDataProgram as "DBC_Rollup.ioDataProgram".
$SET CREATEXFD
$SET CALLFH(ACUFH)
environment division.
input-output section.
file-control.
select ioDataFile assign to "ioData"
organization indexed
access mode dynamic
record key is keyField.
fd ioDataFile.
01 accessRecord.
03 keyField pic x(16).
03 appData.
05 appName pic x(16).
05 appPidx pic x(6).
05 appPid redefines appPidX pic 9(6).
03 comsData.
05 comsType pic x(16).
05 fromAddr pic x(50).
05 toAddr pic x(50).
05 result pic x(8).
05 packetData pic x(50).
working-storage section.
01 i binary-long.
01 s binary-long.
01 ss binary-long.
01 e binary-long.
01 l binary-long.
01 s1 binary-long.
01 e1 binary-long.
01 l1 binary-long.
01 token binary-long.
01 inputLines.
03 inputLine pic x(1024) occurs 10240.
03 lengths binary-long occurs 10240.
03 numbLines binary-long.
01 ts pic S9(6)V99.
01 te pic S9(6)V99.
01 tout pic 9(6).99.
01 numbLinesV pic 9999.
procedure division using
by reference inputLines.
display "--Start writing lines"
accept ts from time
open output ioDataFile
accept te from time
subtract ts from te giving te
divide te by 100 giving tout
display "--File open/connect time (seconds) " tout
accept ts from time
perform varying i from 1 by 1 until i > numbLines
move 1 to s
move 0 to token
perform varying e from 1 by 1 until e > lengths(i)
if inputLine(i)(e:1) equals "," then
subtract s from e giving l
add 1 to token
move s to ss
move e to s
if token equals 1 then
add 1 to ss
subtract 1 from l
move inputLine(i)(ss:l) to keyField
else if token equals 2 then
add 2 to ss
subtract 3 from l
move inputLine(i)(ss:l) to appName
else if token equals 3 then
move inputLine(i)(ss:l) to appPid
else if token equals 4 then
add 2 to ss
subtract 3 from l
move inputLine(i)(ss:l) to comsType
else if token equals 5 then
add 2 to ss
perform varying e1 from ss by 1 until e1 > e
if inputLine(i)(e1:2) equals "->"
subtract ss from e1 giving l1
move inputLine(i)(ss:l1) to fromAddr
exit perform
end-if
end-perform
*> Remove -> space and the end "
add 3 to e1
subtract e1 from e giving l1
subtract 1 from l1
move inputLine(i)(e1:l1) to toAddr
else if token equals 6 then
add 2 to ss
subtract 3 from l
move inputLine(i)(ss:l) to result
else
add 2 to ss
subtract 2 from l
subtract ss from lengths(i) giving l
move inputLine(i)(ss:l) to packetData
exit perform
end-if
end-if
end-perform
write accessRecord
end-perform
close ioDataFile
accept te from time
subtract ts from te giving te
divide te by 100 giving tout
move numbLines to numbLinesV
display "--Wrote " numbLinesV " lines to file/database in " tout " (seconds)"
goback.
end program ioDataProgram.
program-id. reReadProgram as "DBC_Rollup.reReadProgram".
$SET CREATEXFD
$SET CALLFH(ACUFH)
environment division.
file-control.
select summaryFile assign to "summaryFile"
organization indexed
access mode dynamic
record key is appName
.
fd summaryFile.
01 accessRecord.
03 appName pic x(16).
03 countOfType binary-long.
working-storage section.
01 nullData pic x.
01 countOfTypeV pic 9(6).
01 appNameV pic A(16).
procedure division.
display "Enter to continue (create summary table)"
accept nullData
open output summaryFile
*> This just creates the table structure if it is not
*> there already
close summaryFile
display "Please run this SQL:"
display "begin transaction"
display "insert into "
display " dbo.summaryFile"
display "select "
display " appname,"
display " count(appname) as count"
display "from"
display " dbo.ioData"
display "group by appname"
display "commit"
display "Enter to continue (read summary table)"
accept nullData
display "Application Name | Count Of Records"
display "-----------------------------------"
open input summaryFile
perform until exit
read summaryFile next
at end
exit perform
end-read
move countOfType to countOfTypeV
move appName to appNameV
display appNameV " | " countOfTypeV
end-perform
goback.
end program reReadProgram.
program-id. obfuscator as "DBC_Rollup.obfuscator".
$SET CREATEXFD
$SET CALLFH(ACUFH)
environment division.
input-output section.
file-control.
select tmpFile assign to "tmpFile"
organization indexed
access mode dynamic
record key is tmpKey.
select ioDataFile assign to "ioData"
organization indexed
access mode dynamic
record key is keyField.
data division.
file section.
fd ioDataFile.
01 accessRecord.
03 keyField pic x(16).
03 appData.
05 appName pic x(16).
05 appPidx pic x(6).
05 appPid redefines appPidX pic 9(6).
03 comsData.
05 comsType pic x(16).
05 fromAddr pic x(50).
05 toAddr pic x(50).
05 result pic x(8).
05 packetData pic x(50).
fd tmpFile.
01 tmpRecord.
03 tmpKey pic x(16).
03 tmpData pic x(16).
working-storage section.
01 ts pic S9(6)V99.
01 te pic S9(6)V99.
01 tout pic 9(6).99.
01 nullData pic x.
procedure division.
display "Enter to continue (obfuscate)"
accept nullData
display "--Starting obfuscation"
accept ts from time
open input ioDataFile
open output tmpFile
perform until exit
read ioDataFile next
at end exit perform
end-read
move appName(1:1) to tmpData
move "**********" to tmpData(2:10)
move keyField to tmpKey
write tmpRecord
end-perform
close tmpFile
close ioDataFile
display "--Entering phase 2 of obfuscation"
open input tmpFile
open i-o ioDataFile
perform until exit
read tmpFile next
at end exit perform
end-read
move tmpKey to keyField
read ioDataFile
move tmpData to appName
rewrite accessRecord
end-perform
close tmpFile
close ioDataFile
accept te from time
subtract ts from te giving te
divide te by 100 giving tout
display "--File obfuscate time (seconds) " tout
goback.
end program obfuscator.
And for interest, here is the xml created by the XFD tool:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!-- tmpfile.xfd - generated on 26-May-09 14:29
Micro Focus Net Express V6.0 revision 000
AXCGG/AA0/00000 GNR-040066000AE
-->
<xfd:xfdfile
xmlns:xfd="http://www.microfocus.com">
<xfd:identification
xfd:version="7">
<xfd:select-name>TMPFILE</xfd:select-name>
<xfd:maximum-numeric-digits>31</xfd:maximum-numeric-digits>
<xfd:alphabet>None</xfd:alphabet>
<xfd:period-character>.</xfd:period-character>
<xfd:comma-character>,</xfd:comma-character>
<xfd:file-organization>Indexed</xfd:file-organization>
<xfd:table-name>TMPFILE</xfd:table-name>
<xfd:number-of-keys>1</xfd:number-of-keys>
<xfd:minimum-record-size>32</xfd:minimum-record-size>
<xfd:maximum-record-size>32</xfd:maximum-record-size>
<xfd:sign-compatibility>8</xfd:sign-compatibility>
</xfd:identification>
<xfd:keys>
<xfd:key
xfd:segment-count="1"
xfd:duplicates-allowed="false">
<xfd:segments>
<xfd:segment
xfd:segment-size="16"
xfd:segment-offset="0"/>
</xfd:segments>
<xfd:key-columns
xfd:key-column-count="1">
<xfd:key-column
xfd:key-column-name="TMPKEY"/>
</xfd:key-columns>
</xfd:key>
</xfd:keys>
<xfd:conditions
xfd:condition-count="0"/>
<xfd:fields
xfd:elementary-items="2"
xfd:elementary-items-with-occurs="2"
xfd:total-items="3"
xfd:total-items-with-occurs="3">
<xfd:field
xfd:field-level="1"
xfd:field-name="TMPRECORD"
xfd:field-condition="999"
xfd:field-offset="0"
xfd:field-bytes="32"
xfd:field-type="152"
xfd:field-length="32"
xfd:field-scale="0"
xfd:field-user-flags="0"/>
<xfd:field
xfd:field-level="3"
xfd:field-name="TMPKEY"
xfd:field-condition="0"
xfd:field-offset="0"
xfd:field-bytes="16"
xfd:field-type="152"
xfd:field-length="16"
xfd:field-scale="0"
xfd:field-user-flags="0"/>
<xfd:field
xfd:field-level="3"
xfd:field-name="TMPDATA"
xfd:field-condition="0"
xfd:field-offset="16"
xfd:field-bytes="16"
xfd:field-type="152"
xfd:field-length="16"
xfd:field-scale="0"
xfd:field-user-flags="0"/>
</xfd:fields>
</xfd:xfdfile>
0 comments:
Post a Comment
Please feel free to leave comments or thoughts.