From: Albert Schlef on
Hello.

I want to download an HTML page and save the images it contains along
with it. I was thinking of saving it as an MHT file, which would make my
life easier because I wouldn't have to handle the separate files. I've
checked both my browsers (Firefox and Opera), but neither has a
command-line switch for saving a URL as an MHT file. I also searched the
net for a Ruby library, but the only one I found seems to work only on
Windows (it ships with a DLL), which is no good for me because I'm using
Ubuntu.

So, my question is:

Given a URL, how can I save this page as MHT?

(My program is in Ruby, but I don't mind delegating this part to a
command-line utility.)
--
Posted via http://www.ruby-forum.com/.

From: Nicholas Orr on

According to http://en.wikipedia.org/wiki/MHTML, pursuing the MHT file
format seems like a lot of effort for not much gain...

On Mon, Jul 19, 2010 at 9:53 AM, Albert Schlef <albertschlef(a)gmail.com> wrote:

> Given a URL, how can I save this page as MHT?

From: Colin Bartlett on

On Mon, Jul 19, 2010 at 12:53 AM, Albert Schlef <albertschlef(a)gmail.com> wrote:

> Given a URL, how can I save this page as MHT?

Although another post cites Wikipedia as implying that pursuing the MHT
file format is a lot of effort for not much gain, I have found it useful
to save web pages (including images) as MHT (from Opera, Firefox and
Internet Explorer alike), and then extract what I want (including images)
from the MHT file.

That said, once a web page has been saved (using plugins if necessary) as
MHT, as an HTML file with its images etc. in a subdirectory, or as a zip
archive, it should be fairly easy to extract what you want from whichever
format you saved it in.
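
To illustrate the MHT case: an MHT file is a MIME multipart/related
message, so pulling the page and its images back out is mostly a matter of
splitting on the boundary and decoding each part. The following is only a
minimal sketch using the Ruby standard library. The method names
(parse_headers, decode_body, mht_parts) are made up for this example, and
it assumes a well-formed file with no nested multiparts and only base64 or
quoted-printable encodings.

```ruby
require "base64"

# Turn a MIME header block into a hash keyed by lower-cased header name.
# Folded (continuation) lines are joined onto the header they belong to.
def parse_headers(block)
  unfolded = block.gsub(/\r?\n[ \t]+/, " ")
  unfolded.split(/\r?\n/).each_with_object({}) do |line, headers|
    name, value = line.split(":", 2)
    headers[name.strip.downcase] = value.strip if value
  end
end

# Decode a part body according to its Content-Transfer-Encoding.
# Anything other than base64 or quoted-printable is returned unchanged.
def decode_body(body, encoding)
  case encoding.to_s.downcase
  when "base64" then Base64.decode64(body)
  when "quoted-printable" then body.unpack1("M")
  else body
  end
end

# Return one hash per part: { location:, type:, data: }.
# Assumption: the boundary string never occurs inside a part body.
def mht_parts(text)
  head, body = text.split(/\r?\n\r?\n/, 2)
  boundary = parse_headers(head.to_s)["content-type"].to_s[/boundary="?([^";\s]+)/i, 1]
  raise ArgumentError, "no multipart boundary found" unless boundary && body

  # The first chunk is the preamble; the chunk starting with "--" is the
  # closing delimiter (plus any epilogue).
  chunks = body.split("--#{boundary}")[1..] || []
  chunks.reject { |chunk| chunk.start_with?("--") }.map do |chunk|
    part_head, part_body = chunk.sub(/\A\r?\n/, "").split(/\r?\n\r?\n/, 2)
    headers = parse_headers(part_head.to_s)
    raw = part_body.to_s.sub(/\r?\n\z/, "")
    { location: headers["content-location"],
      type: headers["content-type"],
      data: decode_body(raw, headers["content-transfer-encoding"]) }
  end
end
```

Given the parts, the images are those whose :type starts with "image/",
and each one can be written out with File.binwrite under a file name of
your choosing (for example one derived from :location).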

So: is the problem saving as MHT from the command line, or one of saving
anything - MHT or HTML+Images - from the command line?
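
If it is the latter, one option that has not been tried in this thread is
to delegate the download to GNU wget, whose --page-requisites option also
fetches the images a page needs. The sketch below shows how that might be
driven from Ruby. The method names (wget_command, save_page_with_images)
are hypothetical, and it assumes wget is installed and on the PATH. Note
that this produces an HTML file plus image files, not an MHT file.

```ruby
# Build the argument list for GNU wget. The options used are:
#   --page-requisites   also fetch images, stylesheets etc. the page needs
#   --convert-links     rewrite links in the saved HTML to the local copies
#   --span-hosts        allow requisites that live on other hosts
#   --directory-prefix  directory the downloaded files are placed under
def wget_command(url, dest_dir)
  ["wget", "--page-requisites", "--convert-links", "--span-hosts",
   "--directory-prefix=#{dest_dir}", url]
end

# Run wget and return true on success, false or nil otherwise.
# Passing separate arguments (not one string) keeps the URL away from
# the shell, so characters such as & and ? need no quoting.
def save_page_with_images(url, dest_dir)
  system(*wget_command(url, dest_dir))
end
```

For example, save_page_with_images("http://example.com/", "saved") would
leave the page and its images somewhere under the "saved" directory.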

Can you use Watir, or http://watij.com with JRuby? From a quick look at
their websites these may work, but I haven't tried them yet because the
initial learning curve looks a bit steep, and because at the moment (on
Microsoft Windows) I can use AutoIt with Ruby to (programmatically) switch
from a Ruby DOS box to the browser and send keystrokes to save the page as
MHT, plain HTML or whatever. It's not exactly elegant, but it does
(mostly!) work. If all else fails, can you do something similar on Linux?

If you find a reasonably elegant solution, then I'd be very interested.
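
One possible direction, offered only as an untested-in-the-wild sketch
rather than a known-good solution, is to skip the browser and assemble the
MHT file directly in Ruby: fetch the HTML, fetch each image it references,
and write everything out as one multipart/related message. All method
names below (build_mht, mht_part, image_urls, save_as_mht) and the default
boundary string are invented for this example. It assumes images are
referenced through plain <img src="..."> attributes, does not follow HTTP
redirects, and ignores stylesheets and scripts; whether a given browser
renders the result correctly has not been checked here.

```ruby
require "base64"
require "net/http"
require "uri"

# Format one MIME part. Every part is base64-encoded for simplicity.
def mht_part(location, type, data)
  [
    "Content-Type: #{type}",
    "Content-Transfer-Encoding: base64",
    "Content-Location: #{location}",
    "",
    Base64.encode64(data).gsub("\n", "\r\n").chomp
  ].join("\r\n")
end

# Assemble an MHT document from an HTML string and already-downloaded
# resources. `resources` maps absolute URL => { type:, data: }.
# The boundary is an arbitrary string assumed not to occur in the parts.
def build_mht(page_url, html, resources, boundary: "----=_mht_sketch_boundary")
  # No charset is declared for the HTML part; add one if you know it.
  parts = [mht_part(page_url, "text/html", html)]
  resources.each { |url, res| parts << mht_part(url, res[:type], res[:data]) }

  header = [
    "MIME-Version: 1.0",
    %(Content-Type: multipart/related; type="text/html"; boundary="#{boundary}"),
    ""
  ].join("\r\n")
  body = parts.map { |part| "--#{boundary}\r\n#{part}\r\n" }.join
  header + "\r\n" + body + "--#{boundary}--\r\n"
end

# Absolute URLs of the images referenced by <img src="..."> in the HTML.
# A regex is a crude stand-in for a real HTML parser.
def image_urls(page_url, html)
  html.scan(/<img[^>]+src=["']([^"']+)["']/i).flatten
      .map { |src| URI.join(page_url, src).to_s }.uniq
end

# Download the page and its images, then write them to `path` as MHT.
def save_as_mht(page_url, path)
  html = Net::HTTP.get(URI(page_url))
  resources = image_urls(page_url, html).to_h do |url|
    response = Net::HTTP.get_response(URI(url))
    type = response["content-type"] || "application/octet-stream"
    [url, { type: type, data: response.body.to_s }]
  end
  File.binwrite(path, build_mht(page_url, html, resources))
end
```

With this in place, save_as_mht("http://example.com/", "page.mht") would
be the whole call. Keeping build_mht separate from the downloading means
the assembly step can be exercised without network access.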