From: Stephane CHAZELAS on
2008-04-16, 04:29(+00), Rahul:
> Is there a way to convert a html snippet "sensibly" to ascii plain-text.
> I just want to display a no-frills version of this google translate
> query quickly from the command-line:
>
> curl -s
> 'http://translate.google.com/translate_dict?q=cat&hl=en&langpair=en%7Cde'
[...]

See elinks or w3m. In the old days you would have used lynx,
but it handles tables and frames poorly.

Compare:

elinks -no-references -no-numbering -dump \
'http://translate.google.com/translate_dict?q=cat&hl=en&langpair=en%7Cde'

w3m -dump \
'http://translate.google.com/translate_dict?q=cat&hl=en&langpair=en%7Cde'

lynx -dump -nolist \
'http://translate.google.com/translate_dict?q=cat&hl=en&langpair=en%7Cde'
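
For repeated lookups, any of the commands above can be wrapped in a
small shell function. This is only a minimal sketch, assuming w3m is
installed. The names translate_url and translate are made up for
illustration, and the word is pasted into the URL as-is, so it only
suits plain ASCII words without spaces:

```shell
# Hypothetical helper: build the lookup URL for one word.
# Assumption: the word needs no URL-encoding (plain ASCII, no spaces).
translate_url() {
    printf 'http://translate.google.com/translate_dict?q=%s&hl=en&langpair=en%%7Cde\n' "$1"
}

# Hypothetical wrapper: dump the rendered page as plain text with w3m.
translate() {
    w3m -dump "$(translate_url "$1")"
}
```

After that, "translate cat" or "translate beer" should print the page as
text. Swapping the w3m line for the elinks or lynx invocation makes it
easy to compare the three.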

--
Stéphane
From: Dave Uhring on
On Wed, 16 Apr 2008 16:43:34 +0000, Rahul wrote:

> I like these options much better. Thanks Stephane! I only have to solve
> some font issues now. Seems to be a problem with all three.
>
> dünne Eisschicht --> dÃŒnne Eisschicht
> Kätzin --> KÀtzin
> Hühner --> HÃŒhner
>
> Seems like something to do with umlaut rendering in my font set. Any
> ideas?

View the output in xterm or similar. You should see dünne, but maybe not
in that windows POS you are using. There is no need to set any special
locale.
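
If it is unclear whether the dump tool or the terminal is mangling the
umlauts, looking at the raw bytes can help. A rough sketch, not a
general detector: umlaut_encoding is a made-up name, and it only looks
for the letter ü on standard input:

```shell
# Hypothetical helper: report how the letter "ü" is encoded on stdin.
# od prints one hex value per byte; tr joins the lines with single spaces
# so that byte pairs can be matched without false hits across digits.
umlaut_encoding() {
    hex=$(od -An -v -tx1 | tr -s ' \n' ' ')
    case $hex in
        *" c3 bc "*) echo UTF-8 ;;        # ü as the two bytes C3 BC
        *" fc "*)    echo ISO-8859-1 ;;   # ü as the single byte FC
        *)           echo none ;;         # no ü found in either form
    esac
}
```

Pipe the output of one of the dump commands into it. If it reports
UTF-8 while "locale charmap" (or the terminal emulator's own charset
setting) says something else, the terminal side is the more likely
culprit.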
From: Enrique Perez-Terron on
On Wed, 16 Apr 2008 04:29:27 +0000, Rahul wrote:

> Is there a way to convert a html snippet "sensibly" to ascii plain-text.
> I just want to display a no-frills version of this google translate
> query quickly from the command-line:
>
> curl -s
> 'http://translate.google.com/translate_dict?q=cat&hl=en&langpair=en%7Cde'
>
> "cat" could be replaced by "dog" "beer" whatever and lo and behold I've
> a German translation on the command line (I wish!). This snippet throws
> a load of html at me. Is there an easy way to convert it to a
> "displayable" format? Basically just column-formatting or at most using
> bold etc. that my xterm-color console can support. html has all this
> info. embedded in its tags, right? So looks possible in theory; just
> wondering what's the best tool for the job.
>
> I have no intention of browsing further from that page so lynx seems an
> overkill.

I just tried
elinks -dump 'http:....'

in my gnome-terminal, and it displayed the text just fine, including some
umlauts, like "dünne Eisschicht". However, in the table, the next entry
was shifted one column to the left, as elinks got confused about the
number of characters in the word "dünne".

My locale is en_US.UTF-8.

To further investigate, I created the following file /tmp/test.html:

<HTML><HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
</HEAD><BODY>
Trallala hopp’sann. Æh, bæ.
</BODY></HTML>

The special characters are U+2019 (&rsquo;, the right single quotation
mark) and the Æ and æ ligatures.
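
Mail and news software may re-encode those characters on the way, so
here is a sketch that writes the same file with the UTF-8 bytes spelled
out as octal escapes. make_test_page is a made-up helper; its first
argument is the charset to declare in the META tag, the second is the
output file:

```shell
# Hypothetical helper: write the test page with explicit UTF-8 bytes.
#   \342\200\231 = U+2019 (right single quotation mark)
#   \303\206     = U+00C6 (AE ligature)
#   \303\246     = U+00E6 (ae ligature)
# $1: charset to declare in the META tag, $2: output file
make_test_page() {
    printf '<HTML><HEAD>\n<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=%s">\n</HEAD><BODY>\nTrallala hopp\342\200\231sann. \303\206h, b\303\246.\n</BODY></HTML>\n' "$1" > "$2"
}
```

The body bytes are always UTF-8. Only the declared charset changes, so
calling it with UTF-8 gives a correctly labelled page and calling it
with ISO-8859-1 gives a mislabelled one.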


Then I ran elinks -dump file:///tmp/test.html, and it printed perfectly:

Trallala hopp’sann. Æh, bæ.

Then I changed the charset=UTF-8 to charset=ISO-8859-1, and the output
became

Trallala hoppâ**sann. Ã*h, bæ.

I think that points to the likely problem: UTF-8 text is being
interpreted as a single-byte charset such as ISO-8859-1 somewhere
along the way.
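
The same effect can be reproduced without a browser. A small sketch
using iconv, assuming it is installed. The names misread_as_latin1 and
undo_misread are made up. The first treats every byte of its UTF-8
input as one ISO-8859-1 character, which is roughly what a mislabelled
page or a mismatched terminal ends up doing:

```shell
# Hypothetical helper: pretend the UTF-8 bytes on stdin are ISO-8859-1
# and re-encode the result as UTF-8 so a UTF-8 terminal can show it.
misread_as_latin1() {
    iconv -f ISO-8859-1 -t UTF-8
}

# Hypothetical helper: reverse the step above, recovering the original
# UTF-8 bytes from text that was misread exactly once.
undo_misread() {
    iconv -f UTF-8 -t ISO-8859-1
}
```

For example, printf 'd\303\274nne\n' | misread_as_latin1 shows the ü
as two characters, Ã followed by ¼, on a UTF-8 terminal. Piping that
through undo_misread gives back "dünne".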

Regards