From: Alistair on
On Feb 3, 4:16 pm, SomeGuy <jimgr...(a)nc.rr.com> wrote:
> On Feb 3, 7:01 am, Alistair <alist...(a)ld50macca.demon.co.uk> wrote:
>
> > On Feb 2, 9:11 pm, SomeGuy <jimgr...(a)nc.rr.com> wrote:
>
> > > On Feb 2, 7:18 am, Alistair <alist...(a)ld50macca.demon.co.uk> wrote:
>
> > > > On Feb 1, 9:43 pm, Richard <rip...(a)Azonic.co.nz> wrote:
>
> > > > > On Feb 2, 6:11 am, SomeGuy <jimgr...(a)nc.rr.com> wrote:
>
> > > > > > Need to identify some database files used by a PC COBOL program
> > > > > > written in the mid-90's.  The extensions are .DB and .IDX.  Given the
> > > > > > date, language and OS, are there any candidates you can think of?  I
> > > > > > can send a sample of the files if that would help.
>
> > > > > > Thanks,
> > > > > > Jim
>
> > > > > The .DB is probably a user choice. The .IDX is most likely an index
> > > > > file for the .DB. If the first two bytes of the .IDX are 0xFE53 then it
> > > > > is probable that these are Micro Focus Level II/C-ISAM format indexed
> > > > > files.
>
> > > > > The first block of the .IDX should have further information giving
> > > > > record length and key information (size and start position).
>
> > > > > If the files are LevelII/CISAM then the data records in the .DB will
> > > > > be fixed length with CR/LF record terminators. Other formats may have
> > > > > variable length records with record headers and/or may have compressed
> > > > > data.
>
> > > > > Without an FD entry you are unlikely to be able to know what the data
> > > > > fields are or even where they start/end within the record.
>
> > > > Am I the only person who remembers dBase files which (IIRC) were
> > > > suffixed .DB?
> > > > Since the .DB files are unlikely to contain COBOL-specific data items,
> > > > importing the flat files into MS Access would be an option. It
> > > > would require some understanding of data formats and intelligent
> > > > guessing of layouts. Not too difficult; even I have done that in the
> > > > past.
>
> > > How would one go about guessing the layout of a COBOL-generated file
> > > for which you know next to nothing about the layout?  Note that,
> > > unlike DBase files, I can discern no field descriptors (name, type,
> > > start, length, etc...) in the file.
>
> > > Thanks.
>
> > The same way that I went about guessing the contents of a VDF (Visual
> > Data Flex) file: look at the reports or screens produced/used and tie
> > their contents to the data in the file. It takes a bit of brain
> > processing and is not guaranteed to be 100% foolproof especially if
> > you don't know the data formats available to the database.
>
> > If you don't have the file layouts, and you probably don't have reports
> > or screen shots, then you probably won't be able to resolve the issue.
>
> We do have screen shots and output, but I can't imagine that approach
> would be economical for this project.
>

It worked for me on a project with about 8 database files. It was a
right pain as the compression of the data resulted in rubbish
characters in the data stream. I was faced with a lot of manual
editing.
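The "tie screen values to file bytes" approach can be partly mechanised. Below is a minimal Python sketch. It assumes a value you can read off a screen or report appears in the file in one of three forms: as plain ASCII text, as unsigned DISPLAY digits, or as standard COMP-3 packed decimal with a 0xC/0xD sign nibble. The helpers pack_comp3, find_all and locate_known_value are hypothetical illustrations and are not part of any tool mentioned in this thread.

```python
def pack_comp3(value: int, digits: int) -> bytes:
    """Encode an unscaled integer as COMP-3 (packed decimal).

    Assumes the common convention: one decimal digit per nibble, a trailing
    sign nibble of 0xC (positive) or 0xD (negative), and a leading zero
    nibble when needed to fill whole bytes. `digits` is the total digit
    count of the PIC clause, e.g. 6 for s9(04)v9(02); 123.45 is passed
    as 12345.
    """
    sign = 0xD if value < 0 else 0xC
    text = str(abs(value)).rjust(digits, "0")
    if len(text) % 2 == 0:  # digits plus sign nibble must be an even count
        text = "0" + text
    nibbles = [int(ch) for ch in text] + [sign]
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))


def find_all(data: bytes, needle: bytes) -> list[int]:
    """Return every offset at which `needle` occurs in `data`."""
    hits, start = [], 0
    while (pos := data.find(needle, start)) != -1:
        hits.append(pos)
        start = pos + 1
    return hits


def locate_known_value(data: bytes, text: str, amount: int, digits: int) -> dict:
    """Search raw file bytes for values copied from a screen or report.

    `text` is a string seen on screen (e.g. a customer name). `amount` is
    an unscaled number (e.g. 12345 for 123.45), tried both as unsigned
    DISPLAY digits and as signed COMP-3.
    """
    return {
        "text": find_all(data, text.encode("ascii")),
        "display": find_all(data, str(abs(amount)).rjust(digits, "0").encode("ascii")),
        "comp3": find_all(data, pack_comp3(amount, digits)),
    }
```

As an example, read a hypothetical CUSTOMER.DB into data. If a screen shows a customer "Encana" with a weekly sales figure of 123.45, calling locate_known_value(data, "Encana", 12345, 6) lists the candidate offsets. Repeat this for values from several records, and the differences between hit offsets may hint at the record length. Compressed files, such as the ones described above, will defeat this kind of search.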

From: James J. Gavan on
>>SomeGuy wrote:
>>
>>
>>pic 9(04)v9(02) comp-3 (contains 012345) shows as '0123.45' in the
>>display dialog. BUT it doesn't give you a column-header description, nor
>>when you look at the size does it indicate whether or not the field was
>>specified as :-
>>
>>- pic 9(04).9(02), pic 9(04)v9(02) or pic 9(04)v9(02) comp-3, and
>>depending upon the compiler, other variations on the 'comp-3', such as
>>comp-1, comp-5.
>>
>>The only thing it specifically does, in the case of ISAM, is define the
>>positioning of the PrimeKey and any Alternate Keys, such as :-
>>
>>- PrimeKey (1:20) Alt-Key-1 (30:40) Alt-Key-2 (78:20)
>>
>>I drafted something, but I don't think I sent it. Either the end-user
>>has got to show you what they have from reports (which will still
>>entail you doing a lot of messing about to extract the data), or you bite
>>the bullet and, for an acceptable fee, get the file formats from the
>>original developers - *IF* they will sell them to you!
>>
>>Did you google for COBOL data conversions, or look at the COBOL FAQ for
>>help?
>>
>
> To be honest, never having used COBOL, I didn't really follow
> everything you posted. I looked at the Net Express website but
> couldn't find information about the DFE. I did try the Siber Systems
> Data Viewer. It manages to report a lot of columns, many of them with
> legitimate-looking values.

Doesn't surprise me; no doubt you could clearly see text fields such as :-

01 CustomerRecord.
   05 CustomerKey     pic x(05).
   05 CustomerName    pic x(40).
      ---> 'Encana Construction Ltd..................'
or Usage Display values :-

   05 GSTorVatNumber  pic 9(10).
      ----> '8282233344'

Your gibberish will be fields such as these (which account for "GUESSED-535-1"):-

   05 SalesThisWeek   pic s9(04)v9(02) comp-3.
   05 SalesYTD        pic s9(10)v9(02) comp-3.

> But the rest have basically gibberish and
> all have only generated names (like "GUESSED-535-1").
>
> Another thought: is there a tool that will scan COBOL source and
> produce a report (copybook? FDD?) with the layout? If so, perhaps I
> can give the tool to the client to run for me (assuming they have
> source).
>
No, there just aren't such tools. As previously indicated, there is a
limited amount of information in the File Header record, primarily to do
with sizing and access, but the real answer is in the record formats.

Even assuming you are very proficient at 'bit-fiddling', and given you
had the record layouts, you've still got to locate the fields by size
and translate them into numeric values that you want. (Seeing as Bob has
just recently done a low-key sales pitch :-), check out Flexus.com. They
have a document from Michael Mattias, a contributor here, explaining
COBOL binary fields).
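Here is a minimal Python sketch of the translation step for COMP-3 fields like SalesThisWeek above. It assumes standard packed decimal: one digit per nibble, the sign in the last nibble, and the implied decimal point taken from the V in the PIC clause. Sign-nibble conventions can differ between compilers, so the mapping below (0xB/0xD negative, other values 0xA to 0xF positive) is an assumption. Check it against a few known values.

```python
from decimal import Decimal


def comp3_length(digits: int) -> int:
    """Bytes occupied by a COMP-3 field of `digits` digits plus a sign nibble."""
    return digits // 2 + 1


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode COMP-3 (packed decimal) bytes into a Decimal.

    `scale` is the number of digits after the implied decimal point, i.e.
    the count of 9s after the V in the PIC clause. Assumed sign nibbles:
    0xB/0xD negative; 0xA/0xC/0xE/0xF positive (0xF is typical for
    unsigned fields).
    """
    nibbles = []
    for byte in raw:
        nibbles.extend((byte >> 4, byte & 0x0F))
    sign = nibbles.pop()
    if sign < 0xA or any(n > 9 for n in nibbles):
        raise ValueError(f"not valid packed decimal: {raw.hex()}")
    value = Decimal(int("".join(map(str, nibbles)))).scaleb(-scale)
    return -value if sign in (0xB, 0xD) else value
```

Under these assumptions, pic s9(04)v9(02) comp-3 occupies comp3_length(6) == 4 bytes and decodes with scale=2, while pic s9(10)v9(02) comp-3 occupies 7 bytes. A ValueError on a supposed COMP-3 column is a useful sign that the guessed offset or length is wrong.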

I can assure you that, GIVEN a COBOL programmer had BOTH the record layouts
and the appropriate compiler (we're still assuming it's Micro Focus), it
wouldn't take too much time to knock out a conversion per file (which was
what DD was alluding to). The steps are :-

1 - create a COBOL source that contains the copyfiles for the FD and
Record layouts, which Alistair and I mentioned.
2 - the above additionally includes a record layout for the CSV file.
3 - open your file as input and the CSV file as output.
4 - read records sequentially, move the input fields to the text and
non-binary (edited numeric) fields of the CSV record as appropriate, and
write each CSV record.

01 CSV-Record.
   05 CustomerKey     pic x(05).
   05                 pic x value ",".
   05 CustomerName    pic x(40).
   05                 pic x value ",".
   05 GSTorVatNumber  pic 9(10).
   05                 pic x value ",".
   05 SalesThisWeek   pic -9999.99.        *> '-' if negative, space if positive
   05                 pic x value ",".
   05 SalesYTD        pic -9999999999.99.  *> '-' if negative, space if positive
   05                 pic x value ",".
   05 etc....
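For readers without a COBOL compiler, steps 3 and 4 can be sketched in Python. This sketch assumes the record layout is already known, the file is plain fixed-length records with no per-record headers and no compression, and text is single-byte. The LAYOUT table is a hypothetical transcription of the CustomerRecord example above, not a real file's layout. Micro Focus indexed or variable-length files may need more handling than this.

```python
import csv
import sys
from decimal import Decimal

# Hypothetical layout mirroring the CustomerRecord example. Offsets are
# 0-based; lengths are storage bytes (a COMP-3 length is digits // 2 + 1).
# (name, offset, length, kind, implied decimal places)
LAYOUT = [
    ("CustomerKey", 0, 5, "text", 0),          # pic x(05)
    ("CustomerName", 5, 40, "text", 0),        # pic x(40)
    ("GSTorVatNumber", 45, 10, "display", 0),  # pic 9(10)
    ("SalesThisWeek", 55, 4, "comp3", 2),      # pic s9(04)v9(02) comp-3
    ("SalesYTD", 59, 7, "comp3", 2),           # pic s9(10)v9(02) comp-3
]
RECORD_LEN = 66  # sum of field lengths; use 68 if each record ends in CR/LF


def comp3(raw: bytes, scale: int) -> Decimal:
    """Minimal COMP-3 decoder: digits in all nibbles but the last, sign in the last."""
    text = raw.hex()
    value = Decimal(int(text[:-1])).scaleb(-scale)  # int() rejects nibbles a-f
    return -value if text[-1] in "bd" else value


def decode_records(data: bytes, record_len: int = RECORD_LEN):
    """Read fixed-length records in order and yield one CSV row per record."""
    for pos in range(0, len(data) - record_len + 1, record_len):
        record = data[pos:pos + record_len]
        row = []
        for _name, offset, length, kind, scale in LAYOUT:
            field = record[offset:offset + length]
            if kind == "comp3":
                row.append(str(comp3(field, scale)))
            elif kind == "display":  # unsigned DISPLAY digits only
                row.append(str(Decimal(field.decode("ascii")).scaleb(-scale)))
            else:
                row.append(field.decode("latin-1").rstrip())
        yield row


def write_csv(data: bytes, out=None) -> None:
    """Write a header row plus one row per record; csv quotes embedded commas."""
    writer = csv.writer(out or sys.stdout)
    writer.writerow([name for name, *_ in LAYOUT])
    writer.writerows(decode_records(data))


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        write_csv(f.read())
```

Unlike the COBOL CSV-Record above, csv.writer quotes any field that contains a comma, and it trims nothing except the trailing spaces removed by rstrip().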

It is not a challenging exercise; I've done it moving RM/COBOL data
files to Micro Focus, but in my case I additionally had the advantage of
bit routines from Micro Focus to convert RM binaries to M/F format.
Why my two-step approach? Without going into details, I took the
opportunity to enhance the application, so I wrote to the output CSVs,
which meant incoming Record-A might finish up as output Records B and C,
particularly when I got into (R)DBMS and SQL.

Bear in mind I had the compiler for RM/COBOL as well, and MOST
IMPORTANTLY, having programmed the application in RM, I also had the RM
RECORD FORMATS !

Jimmy, Calgary AB
From: Michael Wojcik on
Alistair wrote:
>
> The same way that I went about guessing the contents of a VDF (Visual
> Data Flex) file: look at the reports or screens produced/used and tie
> their contents to the data in the file. It takes a bit of brain
> processing and is not guaranteed to be 100% foolproof especially if
> you don't know the data formats available to the database.
>
> If you don't have the file layouts, and you probably don't have reports
> or screen shots, then you probably won't be able to resolve the issue.

In other words, this is a forensic exercise. It's impossible to
reconstruct the data format with guaranteed complete accuracy in the
general case, and difficult in many specific cases. You'd need to
perform a cost/benefit analysis to determine how much effort is
reasonable to expend on it.

--
Michael Wojcik
Micro Focus
Rhetoric & Writing, Michigan State University
From: Michael Wojcik on
SomeGuy wrote:
> On Feb 2, 8:23 am, Fred Mobach <f...(a)mobach.nl> wrote:
>> Did you already try the file command? See: http://www.darwinsys.com/file/
>
> Never heard of it before, but just tried online at http://swoag.webhop.org/
> (which per Wikipedia uses it internally). Reports both the DB and IDX
> as "data". Thanks.

The standard Unix file command does not contain information about file
types. Instead, it uses a side file, /etc/magic, which describes
heuristic identifiers for various kinds of files.

Originally, /etc/magic was just a list of "magic cookie" values that
appeared as the first few bytes of a handful of specific file types on
various Unix implementations - executables, shell scripts, archive
libraries, etc. Later implementations of the file command and
/etc/magic are more sophisticated, and entries in /etc/magic can be
fairly complicated rules (along the lines of search for this regular
expression, then get the value of this byte at this offset from the
match, and so on).
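To illustrate what the simplest kind of rule does (compare bytes at a fixed offset), here is a minimal Python sketch. It uses the 0xFE53 signature suggested earlier in the thread for Level II/C-ISAM index files. The sketch does not use the file command's real magic syntax. The RULES table and function names are hypothetical, and since it is unclear whether MF ISAM files can be distinguished reliably at all, a match is only a hint.

```python
import sys

# Hypothetical rule table in the spirit of /etc/magic:
# (byte offset, expected bytes, description reported on a match).
RULES = [
    (0, bytes.fromhex("fe53"), "possibly a Micro Focus Level II/C-ISAM index file"),
]


def identify(header: bytes) -> str:
    """Return the first matching rule's description, else "data" as file(1) does."""
    for offset, magic, description in RULES:
        if header[offset:offset + len(magic)] == magic:
            return description
    return "data"


def identify_file(path: str, probe: int = 512) -> str:
    """Read the first `probe` bytes of `path` and apply the rules."""
    with open(path, "rb") as f:
        return identify(f.read(probe))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: {identify_file(path)}")
```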

So just using some random implementation of the file command doesn't
guarantee that you're using one with a very comprehensive /etc/magic
file. And many ISVs add entries to /etc/magic as part of product
installation, to recognize the particular file types their code generates.

Since http://swoag.webhop.org/ provides no information (that I could
find) about what implementation it's using, who knows if it's any good?

Of course, this won't help you identify the files in question, unless
you want to go around trying different file implementations (which you
probably don't).

Incidentally, file implementations are available for Windows, for
example as part of Cygwin. (For the record, the Cygwin /etc/magic
doesn't recognize MF ISAM files or their index files as anything but
"data". I'm not sure there *is* anything in MF ISAM files that can be
used to distinguish them.) While they may not help in this particular
case, they can be useful in others.

--
Michael Wojcik
Micro Focus
Rhetoric & Writing, Michigan State University
From: Alistair on
On Feb 4, 4:02 pm, Michael Wojcik <mwoj...(a)newsguy.com> wrote:
> Alistair wrote:
>
> > The same way that I went about guessing the contents of a VDF (Visual
> > Data Flex) file: look at the reports or screens produced/used and tie
> > their contents to the data in the file. It takes a bit of brain
> > processing and is not guaranteed to be 100% foolproof especially if
> > you don't know the data formats available to the database.
>
> > If you don't have the file layouts, and you probably don't have reports
> > or screen shots, then you probably won't be able to resolve the issue.
>
> In other words, this is a forensic exercise. It's impossible to
> reconstruct the data format with guaranteed complete accuracy in the
> general case, and difficult in many specific cases. You'd need to
> perform a cost/benefit analysis to determine how much effort is
> reasonable to expend on it.
>

The application of a cost benefit analysis is quite a good idea as I
found the effort excessive (but I had very little choice in the
matter).

I think SOMEGUY is banging his head against a brick wall (se taper la
tête contre le mur, as they say in Germany) without the copylibs.