Extract words from a .dita file [VBScript]


From: Bob Barrows on 2 Apr 2010 15:38

There does not have to be a node called <indexterms> for this to work. The
"//" part of the xpath query strings tells it to find the specified tag name
anywhere in the document. So "//keyword" tells it to find all the
<keyword>...</keyword> elements (nodes) wherever they may be in the
document, and "//indexterm" does likewise. The way I did it is a little
risky in that if the nodes that are found contain nested nodes
(<keyword>word<subkeyword>...</subkeyword></keyword>), the text property
will return all the nested text as well. But if that were the problem, you
wouldn't be telling me the spreadsheet contained _no_ data, would you?
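
To make the "//" behaviour and the nested-text caveat concrete, here is a
minimal sketch. The inline XML is a made-up sample (the <subkeyword> tag is
only the hypothetical one from the paragraph above), and reading the first
child text node via "text()" is just one possible way to skip nested text,
not what the original script does:

```vbscript
Option Explicit
Dim xmldoc, nodes, node, ownText

Set xmldoc = CreateObject("Msxml2.DOMDocument")
xmldoc.async = False
xmldoc.setProperty "SelectionLanguage", "XPath"

' Hypothetical sample data; the real script loads .dita files from disk.
xmldoc.loadXML "<topic><keywords>" & _
    "<keyword>logbook<subkeyword>records</subkeyword></keyword>" & _
    "<indexterm>complications</indexterm>" & _
    "</keywords></topic>"

' "//keyword" matches every <keyword> element at any depth; no
' <indexterms> or other wrapper element is needed for it to match.
Set nodes = xmldoc.selectNodes("//keyword")
WScript.Echo "keyword nodes found: " & nodes.length

For Each node In nodes
    ' .text includes the text of nested elements as well.
    WScript.Echo "text property:  " & node.text

    ' The first direct text child holds only the element's own text.
    Set ownText = node.selectSingleNode("text()")
    If Not ownText Is Nothing Then
        WScript.Echo "own text only:  " & ownText.nodeValue
    End If
Next
```

Run it with cscript.exe so the WScript.Echo lines go to the console rather
than to message boxes.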

Anyway, I don't know if I can be of any more help. Using the sample data
you gave me pasted into two files called test1.dita and test2.dita, my
script works perfectly. Here is what the output looks like:

*****************************************************************
Keywords        Index Terms
complication    complications
complications   logbook and records, complications
data            complications
health          logbook and records, complications
info
information
logbook
logbooks
my
complication
complications
data
health
info
information
logbook
logbooks
my

*****************************************************************

I really don't know why it's not working for you. How big is the smallest of
those files? Small enough for you to paste the entire contents of one of
them into your reply?

Huber57 wrote:
> Bob,
>
> I checked the files and they are all lowercase. I did notice one
> line of code:
> set nodes = xmldoc.selectnodes("//indexterm")
>
> I am not exactly sure what this is but the only opening/closing tag
> for the section is <keywords> and then </keywords>. Within these are
> lots of <keyword>test</keyword> and <indexterm>another
> test</indexterm>. But there is no:
> <indexterms> and then </indexterms>.
>
> Does selectnodes refer to the <keywords></keywords> or the
> <keyword></keyword> tags?
>
> Regardless, still no output in the spreadsheet.
>
> I do appreciate your help.
>
> "Bob Barrows" wrote:
>
>> XML is case sensitive. If the structure you provided me is not
>> really the structure in your files, then nothing will be found. For
>> example, if the tags are named <Keywords> instead of <keywords>,
>> then the script will not find them.
>>
>> Huber57 wrote:
>>> We are on a path toward victory! I created the dita folder and the
>>> script ran. The script created the .xls file along with the headers
>>> "Keywords" and "Index Terms" in A1 and B1 (In Sheet1) respectively.
>>> Unfortunately, it did not strip any of the key words or index terms
>>> out of the dita files and place them in the spreadsheet.
>>>
>>
>>

--
Microsoft MVP - ASP/ASP.NET - 2004-2007
Please reply to the newsgroup. This email account is my spam trap so I
don't check it very often. If you must reply off-line, then remove the
"NO SPAM"

From: Bob Barrows on 2 Apr 2010 15:42

I was thinking that the script would return an error if there were
parseErrors, but of course that is not the case. I've never heard of the
DITA thing before. I should have thought of googling it. You probably have
put your finger on the problem.

ekkehard.horner wrote:
> I did some tests with Bob's code and sample files
> from the DITA Open Toolkit. One problem with the
> files is their referring to external DTDs; Bob's code
> should have included a test for parseErrors.
>
> I'm fairly confident that setting suitable properties
> on the xmldoc
>
> set xmldoc=createobject("msxml2.domdocument")
> ' ---- adds
> xmldoc.setProperty "SelectionLanguage", "XPath"
> xmldoc.async = False
> xmldoc.resolveExternals = False
> xmldoc.validateOnParse = False
> ' ----
> set xl=createobject("excel.application")
>
> will help.
>
> The XPath expressions are valid for the DITA-OT samples.
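
For reference, a parseError test along the lines suggested above might look
like the sketch below. It is not the original script: the relative path
dita\test1.dita is an assumption, and the property settings are simply the
ones quoted above. The point is that load() returns False without raising a
script error when parsing fails, so the result has to be checked explicitly:

```vbscript
Option Explicit
Dim xmldoc, path

' Assumed location of one of the sample files; adjust to your folder.
path = "dita\test1.dita"

Set xmldoc = CreateObject("Msxml2.DOMDocument")
xmldoc.setProperty "SelectionLanguage", "XPath"
xmldoc.async = False
' Do not fetch or validate against the external DTDs the files refer to.
xmldoc.resolveExternals = False
xmldoc.validateOnParse = False

If xmldoc.load(path) Then
    WScript.Echo "Loaded " & path & ", keyword nodes: " & _
        xmldoc.selectNodes("//keyword").length
Else
    ' load() failed quietly; parseError says why and where.
    With xmldoc.parseError
        WScript.Echo "Could not parse " & path & ": " & .reason & _
            " (line " & .line & ", position " & .linepos & ")"
    End With
End If
```

Without a check like this, a file that fails to load leaves an empty
document, every selectNodes call returns zero nodes, and the spreadsheet
ends up with headers but no data.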


From: Huber57 on 2 Apr 2010 16:26

PERFECT!!!!

Thanks both of you. You spent a lot of time but you saved me HOURS AND
HOURS!!!

