From: Robert Klemme on
On 13.05.2010 16:34, Une Bévue wrote:
> Robert Klemme<shortcutter(a)googlemail.com> wrote:
>
>> There's also the flip flop operator:
>>
>> File.foreach "myfile" do |line|
>>   if /pattern/ =~ line .. false
>>     puts line
>>   end
>> end
>>
>> The trick I am using is that the FF operator starts to return true if
>> the first expression returns true and stays true until the last
>> expression returns true - in this case never since you want to read
>> until the end of the file.
>
> could that trick be used for start and stop tags? like:
>
> File.foreach "myfile" do |line|
>   if /<body/ =~ line .. /<\/body/ =~ line
>     puts line
>   end
> end
>
> if true, that's clever !

Yes, that could be done. However, I would not use this for languages
from the SGML family (XML, HTML) because there are no guarantees as to
how many tags you'll find on a single line of text. There are better
tools to deal with that (REXML, Nokogiri...).
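
To illustrate both the start/stop use and the caveat, here is a minimal
sketch. The helper name lines_between and the sample data are made up for
illustration. The lines are passed in as an array so the snippet runs
without a file.

```ruby
# Hypothetical helper (not from the thread): collect every line from the
# first one matching start_re through the next one matching stop_re.
def lines_between(lines, start_re, stop_re)
  selected = []
  lines.each do |line|
    # Flip-flop: becomes true when start_re matches and stays true up to
    # and including the line on which stop_re matches.
    selected << line if (start_re =~ line)..(stop_re =~ line)
  end
  selected
end

tidy = ["junk\n", "<body>\n", "text\n", "</body>\n", "trailer\n"]
p lines_between(tidy, /<body/, /<\/body/)
# prints ["<body>\n", "text\n", "</body>\n"]

# The caveat: when both tags sit on one line, the whole line is selected,
# including the markup outside the body element.
crowded = ["<p>before</p><body>x</body><p>after</p>\n"]
p lines_between(crowded, /<body/, /<\/body/)
# prints ["<p>before</p><body>x</body><p>after</p>\n"]
```

The second call shows why line-oriented matching is a poor fit for markup:
the flip-flop can only select whole lines, never the part between the tags.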

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

From: Une Bévue on
Robert Klemme <shortcutter(a)googlemail.com> wrote:

> Yes, that could be done. However, I would not use this for languages
> from the SGML family (XML, HTML) because there are no guarantees as to
> how many tags you'll find on a single line of text. There are better
> tools to deal with that (REXML, Nokogiri...).

Right, but REXML does not work with badly balanced tags.
I did some tests of Nokogiri today; it works even better than tidy for
the first step, cleaning up unbalanced tags.

The only question I have about Nokogiri is how to avoid the DOCTYPE,
because it outputs:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">

even if I'm using #to_xhtml:

then, the DOCTYPE is wrong...
--
« La vie ne se comprend que par un retour en arrière,
mais on ne la vit qu'en avant. »
(Søren Kierkegaard)

From: Rick DeNatale on
On Wed, May 12, 2010 at 1:20 PM, Vandana <nairvan(a)gmail.com> wrote:
> Hello All,
>
> I would like to read a file in Ruby. It is a 2G file, but it
> contains useless data in the beginning portion of the file.
>
> There is a particular pattern towards the middle of the file after
> which useful data begins. Is there a way to grep for this pattern and
> then read every line henceforth, but ignore all lines before the line
> on which the pattern is found?

Grep is going to have to read the file to find that pattern anyway.


--
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Github: http://github.com/rubyredrick
Twitter: @RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

From: Roger Pack on

> There is a particular pattern towards the middle of the file after
> which useful data begins. Is there a way to grep for this pattern and
> then read every line henceforth, but ignore all lines before the line
> on which the pattern is found?

If you don't know where it is, then you'll probably have to parse each
line until you reach it, then continue on.
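
Something along these lines might do it. This is only a sketch: the helper
name each_line_from and the /START/ pattern are made up, and a StringIO
stands in for the real 2G file so the snippet runs on its own. It reads one
line at a time, so the whole file never has to fit in memory.

```ruby
require "stringio"

# Hypothetical helper (not from the thread): yield every line of io from
# the first line matching pattern onward, skipping everything before it.
def each_line_from(io, pattern)
  found = false
  io.each_line do |line|
    # Once the pattern has been seen, stop testing and pass lines through.
    found ||= pattern.match?(line)
    yield line if found
  end
end

sample = StringIO.new("noise\nmore noise\nSTART of data\nrow 1\nrow 2\n")
each_line_from(sample, /START/) { |line| puts line }
# prints:
# START of data
# row 1
# row 2
```

For a real file, File.open("myfile") { |f| each_line_from(f, /pattern/) { ... } }
should work the same way, since File also responds to each_line.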

-rp