From: cardan on
Hello, I have a large database issue I hope someone can help me with.
It is hard to explain so I have tried to be as descriptive as possible
without losing sight of my problem.

OVERALL QUESTION
Is there a way to have my Excel workbook search a very large text
database outside the Excel file?
I am working with an extremely large data set, and my Excel file has
grown so large (20 MB) that I need to figure out a way to reduce file
size (and maybe increase functionality and flexibility).

QUICK BACKGROUND
Every bank in the US is required to file its financial statements
quarterly, in what are called Call Reports. This information is
available free online on the FDIC website. About 7,500 banks
nationally report. Each report contains maybe 1,000 different
numbers, each given its own code (e.g. Total Assets is RCON2170; I'll
call them RCON #'s for short), and every bank uses the same template.

I am able to download this data in bulk as a zipped folder of text
files, which I then extract. The folder contains about 40 different
text files, each named for a section of the report (e.g. RCCI is the
balance sheet section). In each file, the bank's unique identifier is
listed in column A and the number code (RCON) is listed in row 1.
(Each bank identifier and number code is unique.) The whole unzipped
folder is approximately 65 MB.

CURRENT SETUP
Of the 40 sections mentioned, I need about 6 in my Excel file. Right
now I convert the sections I need into Excel format (importing them
as tab-delimited text) and put them into my model. I have template
tabs that use INDEX-MATCH formulas to find the right number based on
the RCON code and the bank's unique identifier. This setup allows for
comparisons among different banks. All the user has to do is input a
bank's unique identifier, and the template returns the appropriate
numbers based on the RCON codes. (Sometimes we will do 10 banks side
by side.) The INDEX-MATCH approach works very well.

THE PROBLEM
The issue is that each tab represents a section of the report. Each
section lists every bank (7,500) and contains approximately 100 RCON
numbers (a data set of 750,000 fields per tab). My file for a single
quarterly report is approximately 20 MB. Is there a way to have my
Excel workbook search the text files for the appropriate RCON number
and bank identifier? That way I can keep my file size down and may be
able to include previous Call Reports for trend analysis.

Any help would be extremely appreciated. (Also let me know if this
format is acceptable to understand my issues :)


From: JLatham on
Short answer: yes. VBA can deal with this type of thing pretty well, and
actually with pretty good speed.

The process would go something like this:
You enter the RCON number into a cell or as a response to an InputBox$()
prompt. Then the code would open the text files and look for the RCON
number in each row of input from them, and as the RCON is encountered, would
then extract the information you need from the row and put it into the Excel
sheet.
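
A rough sketch of that lookup, written in Python rather than VBA just to
keep it short (a VBA version would follow the same steps using Open,
Line Input # and Split). It assumes the layout described in the question:
tab-delimited text, RCON codes in row 1, the bank's identifier in column A.
So it finds the RCON's column from the header row first, then reads down
the rows until it hits the bank. The function name, the BANK_ID header,
the RCON9999 code and all the sample values are made up for illustration.

```python
import csv

def lookup_value(lines, bank_id, rcon_code):
    """Return the value for one bank and one RCON code, or None if absent.

    `lines` is any iterable of tab-delimited text lines, e.g. an open file.
    Assumed layout: row 1 holds the RCON codes, column A the bank identifier.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader, None)
    if header is None or rcon_code not in header:
        return None
    col = header.index(rcon_code)          # column that holds this RCON code
    for row in reader:                     # read one line at a time
        if row and row[0] == bank_id:      # column A is the bank identifier
            return row[col] if col < len(row) else None
    return None

# Made-up sample data standing in for one section's text file.
sample = [
    "BANK_ID\tRCON2170\tRCON9999",
    "1001\t500000\t450000",
    "1002\t750000\t700000",
]

print(lookup_value(sample, "1002", "RCON2170"))  # prints 750000
```

With a real file you would pass an open file object instead of the list, so
only one line is held in memory at a time and nothing has to be stored in
the workbook except the values you asked for.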

In order to accomplish this, a very good analysis/understanding of the
content and format of the lines of the text files is required. You usually
have to write custom code to "parse" the input data from the text file to get
just what you want from it and pull it into Excel.
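
One way to start on that analysis is to have a few lines of code report
what a file actually looks like before writing the real parser. This is
only a sketch, again in Python as a stand-in for VBA: it assumes the files
are tab-delimited with a header row, and the function name and sample
values are hypothetical.

```python
def describe_layout(lines, delimiter="\t"):
    """Summarise a delimited text file: header names, row count, ragged rows."""
    it = iter(lines)
    first = next(it, None)
    if first is None:
        return {"columns": [], "rows": 0, "ragged_rows": []}
    header = first.rstrip("\r\n").split(delimiter)
    rows = 0
    ragged = []                      # line numbers whose field count is off
    for line_no, line in enumerate(it, start=2):
        fields = line.rstrip("\r\n").split(delimiter)
        rows += 1
        if len(fields) != len(header):
            ragged.append(line_no)
    return {"columns": header, "rows": rows, "ragged_rows": ragged}

# Made-up sample: the last line is missing a field on purpose.
sample_lines = [
    "BANK_ID\tRCON2170\tRCON9999\n",
    "1001\t500000\t450000\n",
    "1002\t750000\n",
]

info = describe_layout(sample_lines)
print(info["columns"])      # header names found in row 1
print(info["rows"])         # number of data rows
print(info["ragged_rows"])  # lines whose field count differs from the header
```

If the ragged list comes back empty for every file, a simple split on the
delimiter is probably enough; if not, those lines show where the custom
parsing has to be more careful.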

As far as whether or not this format is acceptable to understand your
issues, I think so - at least I think I understand your needs. But I could
be wrong - that's been known to happen from time to time (usually with rather
short intervals between the misunderstandings).

From: Gary Keramidas on
i actually automated a specific call report for a credit union. it was a
report that they needed to fill out. they would retrieve the gl income and
balance data from their server and then i would populate the report with the
correct amounts.

doesn't really help. just thought i'd mention it since you mentioned a call
report.
--


Gary Keramidas
Excel 2003
