From: Dr J R Stockton on
In comp.lang.javascript message <53e60b5e-ddec-4b97-aa14-64c31f883159(a)j1
9g2000yqk.googlegroups.com>, Sat, 31 Oct 2009 10:08:13, VK
<schools_ring(a)yahoo.com> posted:
>Dr J R Stockton wrote:
>> >> >> In effect, I want to read the file, HTML or TXT, as it exists on disc.
>
>VK wrote:
>> >> >You cannot do it for the reason explained at
>> >> >http://groups.google.com/group/comp.lang.javascript/msg/d9f3f6724bada573
>
>Dr J R Stockton wrote:
>> >> Unconvincing, because I *am* doing it,
>
>VK wrote:
>> >You don't, it is your delusion.
>> >I don't know how and why are you doing it, but it was stated that "I
>> >want to read the file, HTML or TXT, as it exists on disc." As long as
>> >you are not using AJAX calls - and you don't - you are not able and
>> >you are not reading any files "HTML or TXT, as it exists on disc" -
>> >however wide the definition "as it exists on disc" would be taken.
>
>Dr J R Stockton wrote:
>> Give or take irrelevant questions of character coding and newline
>> representation, I have been getting, by using innerHTML and by using
>> innerText, a string which agrees visually with the content of a TXT file
>> on disc, as would be shown by Notepad.
>
>It should be expected in many (but not all) situations.
>Contrary to popular belief, browsers are *not* able to open text
>or graphics files. What they are able to do - as part of their extended
>functionality - is to recognize some file types other than HTML and to
>wrap them on the fly into predefined HTML templates so as to display them
>in the browser window. In particular, for text/plain files they
>use the template
> <HTML>
> <HEAD></HEAD>
> <BODY>
> <PRE> text file content goes here </PRE>
> </BODY>
> </HTML>
>with the exact tags' case (upper or lower) being browser dependent.

I wrote "getting, ..., a string", not "saw in a window".

Fram being a reference to an iframe recently loaded from a simple *.txt
file, the code

DIR = Fram.contentDocument.body
DIR = DIR.textContent || DIR.innerText // is latter needed? Yes, IE8
alert(DIR) // for VK

directly shows in the alert window plain text, not preceded by anything
using angle-brackets, for MS IE 8, Firefox 3.0.15, Opera 10.01, Safari
4.0.3, and Chrome 3.0. The <localhost> shown by Opera, and the
JavaScript shown by Safari, are parts of the alerts, not of their
contents.
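
A minimal sketch of the same read wrapped as a function, assuming the
frame is same-origin and has finished loading. The name frameBodyText
is hypothetical. It tests typeof rather than using ||, so that an empty
file yields "" instead of falling through to innerText.

```javascript
// Hypothetical helper: return the text of a loaded iframe's body.
// Assumes the frame is same-origin and fully loaded.
function frameBodyText(frame) {
  // Fall back to contentWindow.document where contentDocument is unavailable.
  var doc = frame.contentDocument || frame.contentWindow.document;
  var body = doc.body;
  // Prefer the DOM textContent; fall back to innerText (needed for IE8).
  return typeof body.textContent == "string" ? body.textContent
                                             : body.innerText;
}
```

Usage would be along the lines of alert(frameBodyText(Fram)).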

>This way the text you "see" is in effect the content of a single <pre>
>element necessarily altered from the "as it is on disc" to be placed
>into this tag. For instance all less-than and greater-than signs will
>be converted to the corresponding named HTML entities. The fact that
>you were getting so far "by using innerHTML ..., a string which agrees
>visually with the content of a TXT file" suggests that so far you were
>lucky by not having any problematic characters in your .txt files,

"getting by using innerHTML" is not the same as "getting directly as
innerHTML". IIRC, most browsers wrapped with <pre> and one put rather
more at the top. When I was using innerHTML, I easily removed those by
RegExp.
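
A sketch of that kind of RegExp clean-up, not the code actually used:
it assumes the browser's generated wrapper contains exactly one <pre>
element and that only the &lt; &gt; and &amp; entities need decoding.
The name textFromWrappedHtml is hypothetical.

```javascript
// Hypothetical helper: recover plain text from the innerHTML of a body
// that a browser generated around a text/plain file.
function textFromWrappedHtml(html) {
  return html
    .replace(/^[\s\S]*?<pre[^>]*>/i, "")  // drop everything up to and including <pre ...>
    .replace(/<\/pre>[\s\S]*$/i, "")      // drop </pre> and anything after it
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");              // decode &amp; last, so "&amp;lt;" becomes "&lt;"
}
```

Input with no <pre> wrapper passes through with only the entities
decoded.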




>> Associated query : I have read a TXT file from disc, getting a matching
>> string. It consists of many lines containing words separated by
>> punctuation. They all start with the same sequence of words and
>> punctuation (improbably, zero length), but after that there is always
>> non-zero length. No two lines completely agree. What is the nicest way
>> of determining the common part AND obtaining in sequence strings for the
>> varying parts? Think of it as like a representation of a directory
>> tree.
>
>This is OT to the discussed FAQ topic but an interesting problem per
>se. I am thinking to move it into separate thread or you may do it
>yourself. I have a rather close request for ggNoSpam, in order to give
>users an ability to adjust the regexp spam filter even with zero
>knowledge of regular expressions. The abstract task description would
>be:
>"Given an array of strings with the minimum 2 and the maximum 10
>elements, find the shortest common word in these strings. If no such
>common character sequence found, then try to find the biggest subset
>of strings having a common word".
>
>"word" is understood in regexp terms. To avoid "rush answers" with
>common words like "a" or "the" articles let's define that the shortest
>common word must be no shorter than 4 characters.

I've changed my mind about whether, for the present, I want to do that.
It would certainly increase efficiency, though perhaps not noticeably.
But doing that and the changes which would necessarily be associated
with it would be an impediment to extending capability in a direction
which may be possible and useful.

If such a thread is started, I'll participate, if anything worth writing
occurs to me.
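
For the associated query quoted above (common leading part plus the
varying remainders), one possible approach is sketched here. It is an
illustration only: it assumes the common part should end on a token
boundary, where a token is a run of word characters or a run of
non-word characters. The name splitCommonPrefix is hypothetical.

```javascript
// Hypothetical helper: split lines into their shared leading part and
// the differing remainders, comparing whole tokens (runs of word
// characters or runs of non-word characters), not single characters.
function splitCommonPrefix(lines) {
  var toks = [], i, n = 0, rest = [];
  for (i = 0; i < lines.length; i++) {
    toks[i] = lines[i].match(/\w+|\W+/g) || [];
  }
  if (!toks.length) return { common: "", rest: rest };
  // Advance n while every line has the same token at position n.
  scan: for (; n < toks[0].length; n++) {
    for (i = 1; i < toks.length; i++) {
      if (toks[i][n] !== toks[0][n]) break scan;
    }
  }
  for (i = 0; i < toks.length; i++) {
    rest[i] = toks[i].slice(n).join("");
  }
  return { common: toks[0].slice(0, n).join(""), rest: rest };
}
```

Comparing tokens rather than characters means "abc def" and "abc dxy"
share "abc " and not "abc d".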


var AoS = ["aaa bbb ccc ddd ccc bbb ddd", "bbb zzz ggg", "banana"]

var J, A, K, T, Obj = {}, Z = 0, Word   // Word declared, not an implicit global

J = AoS.length
while (J--) {                            // for each string
  T = {}
  A = AoS[J].split(/\W+/)                // make array of words
  K = A.length
  while (K--) if (A[K]) T[A[K]] = 1      // no internal dupes; skip empty strings
  for (K in T)                           // if entry exists, increment,
    Obj[K] = (Obj.hasOwnProperty(K) ? Obj[K] : 0) + 1  // else create entry value 1
}

for (K in Obj) if (Obj[K] > Z) { Z = Obj[K] ; Word = [K, Z] }


Then Word[0] appears in Word[1] of the strings, and no word appears in
more than Word[1] of them.

The last line can be amended by making the test >= and complicating what
follows, to list all words of the highest popularity and not just the
first one found.
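
A sketch of that amendment, under the assumption that the counts are
held in an object mapping each word to the number of strings containing
it, as Obj is above. It uses > and == where the text suggests >=, which
comes to the same thing. The name mostPopular is hypothetical.

```javascript
// Hypothetical helper: given an object mapping word -> count,
// return the highest count and every word that reaches it.
function mostPopular(counts) {
  var best = 0, words = [], k;
  for (k in counts) {
    if (!Object.prototype.hasOwnProperty.call(counts, k)) continue;
    if (counts[k] > best) { best = counts[k]; words = [k]; } // new maximum: restart list
    else if (counts[k] == best) words.push(k);               // tie: extend list
  }
  return { count: best, words: words };
}
```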

By using another variable

VERY SLIGHTLY TESTED (uses technique of LINXCHEK).

--
(c) John Stockton, nr London UK. ?@merlyn.demon.co.uk DOS 3.3, 6.20; WinXP.
Web <URL:http://www.merlyn.demon.co.uk/> - FAQqish topics, acronyms & links.
PAS EXE TXT ZIP via <URL:http://www.merlyn.demon.co.uk/programs/00index.htm>
My DOS <URL:http://www.merlyn.demon.co.uk/batfiles.htm> - also batprogs.htm.
From: Garrett Smith on
Dr J R Stockton wrote:
> In comp.lang.javascript message <hcdshu$ds2$1(a)news.eternal-
> september.org>, Thu, 29 Oct 2009 22:11:19, Garrett Smith
> <dhtmlkitchen(a)gmail.com> posted:
>> As long as I have something to say about it, the entry will correctly
>> explain how to access the window object of the IFRAME.
>>
>
> You are supposed, as FAQ maintainer, to be sustaining something useful
> to the ordinary questioners, especially those who are not full-time
> professional JavaScript programmers.
>
> However, you appear entirely unable to understand their positions and
> points of view. FAQ maintaining is a task for the sympathetic
> communicator; not for the nerd.
>

A lot of the complaints about the FAQ are that it is too verbose, too long.

The FAQ should not be too much of a chore to read. It should be easy
to understand.

Once the document is found, the next step is to do something with that,
right? That is what DOM and Forms section is for.

Things about the document seem more appropriate for "DOM and Forms", not
"window and frames".

Perhaps worth mentioning:-
| The frame must be fully loaded before its content can be accessed.
|
| fwin.document;// the document
| fwin.document.documentElement; // root element.
|
| See the section on DOM and Forms: #domRef

Perhaps worth another entry:-

How can I know when an iframe has loaded?

First, I'd rather edit/shorten the entry on #getWindowSize.

It is a long entry and explains a workaround for older versions of
Opera. It seems worth removing that workaround and its explanation.
That should shorten the entry considerably.

Less is more, here.
--
Garrett
comp.lang.javascript FAQ: http://jibbering.com/faq/
From: Garrett Smith on
Dr J R Stockton wrote:
> In comp.lang.javascript message <53e60b5e-ddec-4b97-aa14-64c31f883159(a)j1
> 9g2000yqk.googlegroups.com>, Sat, 31 Oct 2009 10:08:13, VK
> <schools_ring(a)yahoo.com> posted:
>> Dr J R Stockton wrote:
>>>>>>> In effect, I want to read the file, HTML or TXT, as it exists on disc.
>> VK wrote:
>>>>>> You cannot do it for the reason explained at
>>>>>> http://groups.google.com/group/comp.lang.javascript/msg/d9f3f6724bada573
>> Dr J R Stockton wrote:
>>>>> Unconvincing, because I *am* doing it,
>> VK wrote:

[a lot of context]

>
>> This way the text you "see" is in effect the content of a single <pre>
>> element necessarily altered from the "as it is on disc" to be placed
>> into this tag. For instance all less-than and greater-than signs will
>> be converted to the corresponding named HTML entities. The fact that
>> you were getting so far "by using innerHTML ..., a string which agrees
>> visually with the content of a TXT file" suggests that so far you were
>> lucky by not having any problematic characters in your .txt files,
>
> "getting by using innerHTML" is not the same as "getting directly as
> innerHTML". IIRC, most browsers wrapped with <pre> and one put rather
> more at the top. When I was using innerHTML, I easily removed those by
> RegExp.
>
>

You might try reading from the PRE element:

var fdoc = frames[0].document;
var pre = fdoc.getElementsByTagName("pre")[0];
var htmlString = pre.innerHTML;
var textString = (typeof pre.textContent == "string" ?
pre.textContent : pre.innerText);
--
Garrett
comp.lang.javascript FAQ: http://jibbering.com/faq/
From: Dr J R Stockton on
In comp.lang.javascript message <hcjdqb$joa$1(a)news.eternal-
september.org>, Sat, 31 Oct 2009 23:36:41, Garrett Smith
<dhtmlkitchen(a)gmail.com> posted:

>You might try reading from the PRE element:
>
>var fdoc = frames[0].document;
>var pre = fdoc.getElementsByTagName("pre")[0];
>var htmlString = pre.innerHTML;
>var textString = (typeof pre.textContent == "string" ?
> pre.textContent : pre.innerText);

As regards what I wanted to do, success is a matter of history. Fram is
the frame :

DIR = Fram.contentDocument.body
DIR = DIR.textContent || DIR.innerText // is latter needed? Yes, IE8

gives the content of the disc file; viewing alert(DIR) shows an exact
match to viewing the file in Notepad. That alert, commented out and
annotated VK, is in <URL:http://www.merlyn.demon.co.uk/linxchek.htm>.

OTOH, I cannot say what happens with non-Windows (indeed, with non-
XPsp3) systems - UNIX or Mac, for example ; and it would be helpful to
be able to give the necessary UNIX or Mac input to generate a suitable
file. Would anyone like to try it on Mac or UNIX?

--
(c) John Stockton, nr London UK. ?@merlyn.demon.co.uk BP7, Delphi 3 & 2006.
<URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/&c., FAQqy topics & links;
<URL:http://www.bancoems.com/CompLangPascalDelphiMisc-MiniFAQ.htm> clpdmFAQ;
NOT <URL:http://support.codegear.com/newsgroups/>: news:borland.* Guidelines
From: Dr J R Stockton on
In comp.lang.javascript message <hcj02m$62n$1(a)news.eternal-
september.org>, Sat, 31 Oct 2009 19:42:12, Garrett Smith
<dhtmlkitchen(a)gmail.com> posted:
>Dr J R Stockton wrote:
>> In comp.lang.javascript message <hcdshu$ds2$1(a)news.eternal-
>> september.org>, Thu, 29 Oct 2009 22:11:19, Garrett Smith
>> <dhtmlkitchen(a)gmail.com> posted:
>>> As long as I have something to say about it, the entry will correctly
>>> explain how to access the window object of the IFRAME.
>>>
>> You are supposed, as FAQ maintainer, to be sustaining something
>>useful
>> to the ordinary questioners, especially those who are not full-time
>> professional JavaScript programmers.
>> However, you appear entirely unable to understand their positions
>>and
>> points of view. FAQ maintaining is a task for the sympathetic
>> communicator; not for the nerd.
>>
>
>A lot of the complaints about the FAQ are that it is too verbose, too long.
>
>The FAQ should not be too much of a chore to read. It should be easy
>to understand.
>
>Once the document is found, the next step is to do something with that,
>right? That is what DOM and Forms section is for.
>
>Things about the document seem more appropriate for "DOM and Forms", not
>"window and frames".

For the FAQ to be useful to its intended readership, its Subjects (as
seen at the beginning) must be structured ENTIRELY from the point of
view of the questioner, without any consideration of the structure of
the answers.

Otherwise, you're writing a nerdy document much like the majority of the
big Flamingo book.

You MUST learn how the ordinary FAQ reader will think, when seeking an
answer.



>Perhaps worth mentioning:-
>| The frame must be fully loaded before its content can be accessed.

I am incompletely convinced of that. When reading by timeout, I thought
I saw signs of gaining access to an only partially-filled links array.
Certainly they might have been deceiving signs; but the point should be
checked in several actual browsers.

If your sentence were strictly true, one could issue a frame load
directly followed by a frame read, and the system would wait until
loaded before executing the read.

But it would be safe to write :-
Access to frame content should not be attempted before the frame is
fully loaded.


>How can I know when an iframe has loaded?

Yes. I have success with the Fram.onLoad event in FF Op Sf Cr, but not
IE, where I seem to need to use a timeout. OTOH I suspect IE8 of lying
to me, or of being confused [*].

Flamingo wrote of a readyState property, which might be pollable.
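
A sketch of what such polling might look like, untested in the browsers
named above. Assumptions: the frame's document exposes readyState and
eventually reports "complete" (where the property is absent, the sketch
gives up after 200 polls and passes null); and polling starts after the
new load has begun, since a frame's previous or initial document may
already report "complete". The name whenFrameComplete and the schedule
parameter are hypothetical.

```javascript
// Hypothetical helper: call callback(doc) once the frame's current
// document reports readyState "complete", or callback(null) on giving up.
// schedule defaults to setTimeout; it is a parameter only so that the
// polling can be exercised without real timers.
function whenFrameComplete(frame, callback, interval, schedule) {
  var polls = 0;
  schedule = schedule || setTimeout;
  function check() {
    // Re-read the document on every poll: a frame gets a new document
    // object when it navigates.
    var doc = frame.contentDocument ||
              (frame.contentWindow && frame.contentWindow.document);
    if (doc && doc.readyState == "complete") callback(doc);
    else if (++polls < 200) schedule(check, interval || 50);
    else callback(null);                 // gave up after 200 polls
  }
  check();
}
```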

[*] I get reports of many unused anchors in a file not containing those
anchors, in IE.

--
(c) John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)