From: Dr.Ruud on
Peter J. Holzer wrote:
> On 2010-06-10 07:02, Chris Nehren <apeiron(a)invalid.isuckatdomains.net> wrote:
>> On 2010-06-09, Peter J. Holzer scribbled these curious markings:
>>> On 2010-06-08 07:57, Dr.Ruud <rvtol+usenet(a)xs4all.nl> wrote:

>>>> On the related subject of creating nice PDFs:
>>>> we have been using webkit for that for the last few years,
>>>> we create many, many thousands a day,
>>>> and we are very happy with the results.
>>>
>>> Sounds interesting. Which perl module do you use (there are several on
>>> CPAN, but the descriptions don't look promising)?
>>
>> Not a module, per se, but I've had success with wkhtmltopdf. See
>> http://code.google.com/p/wkhtmltopdf/ for more info.
>
> Thanks, but after playing with it for a bit I found two problems:
>
> 1) It pretends to be a screen device, not a printing device (so for a
> stylesheet which contains both @media print and @media screen sections
> it chooses the wrong ones).
> 2) It sometimes makes a pagebreak in the middle of a line (so the upper
> half of the line is on page 1 and the lower half of the line is on
> page 2).
>
> It looks like the tool renders the page the same way as a browser on
> screen and then cuts the result into pages.

This should help:
--print-media-type
"page-break-inside: avoid;"
http://www.smashingmagazine.com/2007/02/21/printing-the-web-solutions-and-techniques/
http://code.google.com/p/wkhtmltopdf/issues/detail?id=9
http://code.google.com/p/wkhtmltopdf/issues/detail?id=57
http://search.cpan.org/~tbr/WKHTMLTOPDF-0.02/lib/WKHTMLTOPDF.pm
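
A minimal Python sketch of the two suggestions above. It assumes the wkhtmltopdf binary is on PATH. The helper names (build_command, add_print_css, convert) and the CSS selector list are hypothetical. Only --print-media-type and the standard --page-size option are passed to the tool. According to the manual quoted later in this thread, the page-break-inside rule only has an effect with the patched Qt build.

```python
import subprocess
from typing import List, Optional

# Hypothetical print-only CSS: ask the renderer not to break inside these elements.
PRINT_CSS = "@media print { p, li, img { page-break-inside: avoid; } }"


def add_print_css(html: str) -> str:
    """Insert PRINT_CSS just before </head> (assumes the document has a head)."""
    if "</head>" not in html:
        raise ValueError("no </head> found")
    style = "<style>" + PRINT_CSS + "</style>"
    return html.replace("</head>", style + "</head>", 1)


def build_command(src: str, dst: str, page_size: Optional[str] = None) -> List[str]:
    """Build a wkhtmltopdf argv that makes it use the @media print rules."""
    cmd = ["wkhtmltopdf", "--print-media-type"]
    if page_size is not None:
        cmd += ["--page-size", page_size]
    cmd += [src, dst]
    return cmd


def convert(src: str, dst: str, page_size: Optional[str] = None) -> None:
    # check=True raises CalledProcessError if wkhtmltopdf exits non-zero.
    subprocess.run(build_command(src, dst, page_size), check=True)
```

For example, build_command("in.html", "out.pdf") yields ["wkhtmltopdf", "--print-media-type", "in.html", "out.pdf"]. Building the argv as a list, rather than as a shell string, avoids quoting problems with odd file names.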

--
Ruud
From: Peter J. Holzer on
On 2010-06-11 23:58, Dr.Ruud <rvtol+usenet(a)xs4all.nl> wrote:
> Peter J. Holzer wrote:
>> On 2010-06-10 07:02, Chris Nehren <apeiron(a)invalid.isuckatdomains.net> wrote:
>>> Not a module, per se, but I've had success with wkhtmltopdf. See
>>> http://code.google.com/p/wkhtmltopdf/ for more info.
>>
>> Thanks, but after playing with it for a bit I found two problems:
>>
>> 1) It pretends to be a screen device, not a printing device (so for a
>> stylesheet which contains both @media print and @media screen sections
>> it chooses the wrong ones).
>> 2) It sometimes makes a pagebreak in the middle of a line (so the upper
>> half of the line is on page 1 and the lower half of the line is on
>> page 2).
>>
>> It looks like the tool renders the page the same way as a browser on
>> screen and then cuts the result into pages.
>
> This should help:
> --print-media-type

That was the option I was looking for. I guess I didn't expect to find
an option which I consider extremely important (in fact, I think it
should be the default) to be hidden under "less common command
switches".


> "page-break-inside: avoid;"

I see that I wasn't clear enough about what I meant by "a pagebreak in the
middle of a line", so some screenshots may help:

http://www.hjp.at/junk/ss-wkhtmltopdf1.png
http://www.hjp.at/junk/ss-wkhtmltopdf2.png

As you can see, the last line of the page is split *horizontally*
slightly above the baseline in both cases - the descenders appear at the
top of the next page. That's clearly a bug and not something
"page-break-inside: avoid;" is supposed to fix. "page-break-inside:
avoid;" avoids pagebreaks within an element, e.g. a paragraph, but that
isn't the problem here.


> http://www.smashingmagazine.com/2007/02/21/printing-the-web-solutions-and-techniques/

Nice collection of links, although I'm not sure why you mention them.

> http://code.google.com/p/wkhtmltopdf/issues/detail?id=9

Yup, my problem number 2 is mentioned in comment 4 here. I had already
found that before posting.

> http://code.google.com/p/wkhtmltopdf/issues/detail?id=57

Different problem.

> http://search.cpan.org/~tbr/WKHTMLTOPDF-0.02/lib/WKHTMLTOPDF.pm

Ouch! My eyes! Couldn't he have named the thing WkHTMLtoPDF or
WkHtmlToPdf, or something? ;-).

hp

From: Dr.Ruud on
Peter J. Holzer wrote:
> On 2010-06-11 23:58, Dr.Ruud <rvtol+usenet(a)xs4all.nl> wrote:

>> This should help:
>> --print-media-type
>
> That was the option I was looking for. I guess I didn't expect to find
> an option which I consider extremely important (in fact, I think it
> should be the default) to be hidden under "less common command
> switches".

Yes, I don't understand why "they" did it like that either; it makes
everything unnecessarily harder to understand.
But it still all works reasonably well, we create many thousands of
unique PDFs daily with it.


>> "page-break-inside: avoid;"
>
> I see that I wasn't clear enough about what I meant by "a pagebreak in the
> middle of a line" [...]
> the last line of the page is split *horizontally*
> slightly above the baseline

That's what I understood, and I assumed that you could prevent that by
giving the element that CSS property. BTW, the default page size is A4.


The manual says:

<quote>
Page Breaking

The current page breaking algorithm of WebKit leaves much to be
desired. Basically webkit will render everything into one long page,
and then cut it up into pages. This means that if you have two columns
of text where one is vertically shifted by half a line, then webkit
will cut a line into two pieces, displaying the top half on one page and
the bottom half on another page. It will also break images in two and so
on. If you are using the patched version of QT you can use the CSS
page-break-inside property to remedy this somewhat. There is no easy
solution to this problem; until this is solved, try organising your HTML
documents such that they contain many lines on which pages can be cut
cleanly.

See also:
<http://code.google.com/p/wkhtmltopdf/issues/detail?id=9>,
<http://code.google.com/p/wkhtmltopdf/issues/detail?id=33> and
<http://code.google.com/p/wkhtmltopdf/issues/detail?id=57>.
</quote>
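
To make the manual's description concrete, here is a toy Python model (not wkhtmltopdf's or WebKit's actual code) of "render everything into one long page, then cut it up". Each rendered text line is assumed to be a (top, bottom) box in pixels. straddling, naive_cuts and clean_cut are hypothetical helpers. The example uses two columns, one shifted by half a line, as in the manual.

```python
from typing import List, Optional, Tuple

# Hypothetical model: each rendered text line is a (top, bottom) box in px.
Box = Tuple[float, float]


def straddling(lines: List[Box], cut: float) -> List[Box]:
    """Return the lines a horizontal cut at y == cut would slice in two."""
    return [(top, bottom) for (top, bottom) in lines if top < cut < bottom]


def naive_cuts(total_height: float, page_height: float) -> List[float]:
    """Cut positions when one long page is sliced into fixed-height pages."""
    cuts = []
    y = page_height
    while y < total_height:
        cuts.append(y)
        y += page_height
    return cuts


def clean_cut(lines: List[Box], page_start: float, page_height: float) -> Optional[float]:
    """Largest y in (page_start, page_start + page_height] that slices no line.

    The answer is either the page limit itself or an edge of some line,
    so only those positions need to be tested. Returns None if none exists.
    """
    limit = page_start + page_height
    candidates = [limit] + [y for box in lines for y in box if page_start < y <= limit]
    clean = [y for y in candidates if not straddling(lines, y)]
    return max(clean) if clean else None
```

With A = [(20 * i, 20 * i + 20) for i in range(10)] and B = [(10 + 20 * i, 30 + 20 * i) for i in range(9)], the single naive cut at y = 100 slices the B line (90, 110) in two. clean_cut(A, 0, 100) returns 100, but clean_cut(A + B, 0, 100) returns None. Every candidate position falls inside some line of one column or the other, which is why the manual says there is no easy solution.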

Fonts (and Qt's QPrinter::ScreenResolution) can also cause issues:
http://code.google.com/p/wkhtmltopdf/issues/detail?id=72

--
Ruud