Downloading only a particular page/selective pages of a document [Linux Networking]

From: Joe Beanfish on 22 Jan 2010 12:47

karthikbalaguru wrote:
> On Jan 19, 2:31 pm, Rob Robason <r...(a)robason.net> wrote:
>> karthikbalaguru wrote:
>>> On Jan 14, 1:15 pm, David Brown <da...(a)westcontrol.removethisbit.com>
>>> wrote:
>>>> Dusko Savatovic wrote:
>>>>> "karthikbalaguru" <karthikbalagur...(a)gmail.com> wrote in message
>>>>> news:19dbd3b7-1b3c-41d4-9ea2-4559c156ca1b(a)l30g2000yqb.googlegroups.com...
>>>>>> Hi,
>>>>>> Many tools have the feature of
>>>>>> printing a particular page or
>>>>>> a continuous range of selective
>>>>>> pages.
>>>>>> Similarly,
>>>>>> Is there a free tool/protocol that
>>>>>> will help in downloading a particular
>>>>>> page of the document rather than
>>>>>> the whole document(PDF or Word
>>>>>> document or Linux based document
>>>>>> files or PPT file or Excel file)?
>>>>> Hi Karthik Balaguru,
>>>>> Your question is not related to server, networking and many other
>>>>> newsgroups.
>>>>> However, the short answer to your question is no.
>>>> For a slightly longer answer as to /why/ the answer is no, you have to
>>>> look at the file formats. It is not necessarily impossible to view
>>>> early pages of a document before the end of the file is read in (acrobat
>>>> reader can do this with pdf files, for example), but later parts of the
>>>> files generally refer back to the earlier parts. This is immediately
>>>> obvious with any compressed file format (such as pdfs, or open document
>>>> files), and with any binary-format file where the file reader
>>>> application cannot interpret later parts without having first read
>>>> earlier parts.
>>>> You could always try asking the question in relevant newsgroups instead
>>>> of the random collection you've picked here, but you won't get any other
>>>> short answer, and I doubt if you'll get a long answer that helps you
>>>> much more.
>>> I have checked the same at the link
>>> below:
>>> http://serverfault.com/questions/102608/tool-to-download-only-selecti...
>>> Maybe I might even check with the
>>> file-format-specific groups, like
>>> Acrobat (pdf), Windows (doc/ppt/xls),
>>> openoffice.org and others.
>> Karthik,
>>
>> I think you're missing some important constraints. The format of the
>> file doesn't matter if the server only knows how to deliver it whole.
>> First, any solution to the question you're asking would require a
>> client/server dialog just for you to figure out which pages you want.
>> But who knows - before they view a document - which pages they want?
>
> True !
> Just wanted to clarify that I am considering
> scenarios in which everyone using a
> particular specification / standard might
> already know the section number and page
> numbers on which they have a query or
> debate, so that only that particular page
> or those selected pages can be downloaded.
> The scenario is applicable only to
> documents that are well known to users.
>
>> Anyway, second: document structure only comes into play if you have a
>> server that knows how to deliver parts of that structure - meaning
>> that it has to have foreknowledge of the structure. Even highly
>> structured files like XML are delivered whole - and left for the
>> receiver to figure out.
>>
>> Given the variety of file formats out there, and the limited number of
>> these that have any consistent structure to actually support the kind
>> of feature you're suggesting, it seems unlikely to me that you'd end
>> up with any capability of value.
>>
>> It seems what you're really describing is, in simple terms, a database
>> query - to an infinitely flexible and intelligent database engine that
>> isn't dependent on, nay - even care about, the data structure. Given
>> that we can't even get computers and software smart enough to
>> interpret natural languages well, it seems a stretch to hope for them
>> to figure out all the convoluted ways people may put information
>> together in a document.
>>
>
> So it is not straightforward to find the
> total number of pages, the corresponding page
> addresses, or the first page address and the
> offsets between pages directly from the
> file format's header info/descriptors?
>
> Karthik Balaguru

I've mentioned previously that Adobe Acrobat Reader can do exactly that
for PDF documents that have been "optimized for web" and are hosted on
a web server that supports "byte range serving". The reader first
downloads a certain amount of the document to get the table of contents
as well as global formatting info, and then downloads the selected page.
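A minimal client-side sketch of "byte range serving", assuming a plain HTTP server: the client asks for a byte span with a `Range` header, and a server that supports it replies `206 Partial Content` with a `Content-Range` header giving the span and the total file size. The function names are hypothetical, and working out *which* bytes hold a given page is left to the PDF reader; that is what the "optimized for web" layout makes possible.

```python
import urllib.request


def build_range_request(url, start, end):
    """Build a GET request for bytes start..end (inclusive) of url."""
    req = urllib.request.Request(url)
    req.add_header("Range", f"bytes={start}-{end}")
    return req


def parse_content_range(value):
    """Split a header such as 'bytes 0-1023/146515' into (start, end, total).

    total is None when the server reports the full size as unknown ('*').
    """
    unit, _, rest = value.partition(" ")
    if unit != "bytes":
        raise ValueError(f"unsupported range unit: {unit!r}")
    span, _, total = rest.partition("/")
    first, _, last = span.partition("-")
    return int(first), int(last), (None if total == "*" else int(total))


def fetch_range(url, start, end):
    """Download only the requested bytes; fail if the server sends the whole file."""
    with urllib.request.urlopen(build_range_request(url, start, end)) as resp:
        if resp.status != 206:
            # A 200 reply means the Range header was ignored and the full document is coming
            raise RuntimeError(f"no byte-range support (HTTP {resp.status})")
        span = parse_content_range(resp.headers.get("Content-Range", ""))
        return resp.read(), span
```

For example, `fetch_range(url, 0, 1023)` would return just the first kilobyte of a document plus its total size, which tells the reader how far into the file later requests may reach.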

It's theoretically possible to do something similar for just about
any file format, but I'm not aware that anyone has written any readers
capable of doing it. The documents would also have to be similarly
optimized for web so the reader could download just the TOC and global
info. I'm fairly certain there are no other formats that support that.

So, yes, it *can* be done with a cooperative, web-optimized file format
and a client-side reader. Otherwise, forget about it.
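A cooperating client first has to know whether a PDF is "optimized for web" at all. In PDF terms that means linearized, and the PDF specification requires the linearization parameter dictionary to lie entirely within the first 1024 bytes of the file, so a rough check needs only that prefix (which a single range request can fetch). The function name is hypothetical and this is a heuristic, not a PDF parser: it simply looks for the `/Linearized` key in that window.

```python
# Per the PDF spec, the linearization dictionary must fit in the first 1024 bytes
LINEARIZATION_WINDOW = 1024


def looks_linearized(head: bytes) -> bool:
    """Heuristic: does this file prefix look like a web-optimized (linearized) PDF?"""
    if not head.startswith(b"%PDF-"):
        return False  # no PDF header, so not a PDF we can reason about
    # Only search the window where the spec allows the dictionary to appear
    return b"/Linearized" in head[:LINEARIZATION_WINDOW]
```

If this returns False, a reader has no shortcut: the cross-reference data it needs is normally at the end of the file, so it would have to fetch more (or all) of the document.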

If you have any control over the situation, you could put each page into
a database record and set up a search/lookup that could deliver any
chosen page. Or place each page in a separate file with a known naming
convention and deliver them directly.
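If you control the server, the page-per-record idea can be sketched with Python's built-in `sqlite3` module: each page is stored as its own row keyed by document name and page number, so a lookup returns only the pages asked for. The table layout, the function names and the `<doc>-page-NNNN.pdf` naming convention are all hypothetical, and splitting the original document into per-page files is not shown.

```python
from __future__ import annotations

import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a page store with one row per (document, page)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " doc TEXT NOT NULL,"
        " page INTEGER NOT NULL,"
        " content BLOB NOT NULL,"
        " PRIMARY KEY (doc, page))"
    )
    return conn


def store_page(conn: sqlite3.Connection, doc: str, page: int, content: bytes) -> None:
    """Insert or overwrite a single page of a document."""
    conn.execute(
        "INSERT OR REPLACE INTO pages (doc, page, content) VALUES (?, ?, ?)",
        (doc, page, content),
    )
    conn.commit()


def get_pages(conn: sqlite3.Connection, doc: str, first: int, last: int) -> list[tuple[int, bytes]]:
    """Return only pages first..last (inclusive) of doc, in page order."""
    rows = conn.execute(
        "SELECT page, content FROM pages"
        " WHERE doc = ? AND page BETWEEN ? AND ? ORDER BY page",
        (doc, first, last),
    )
    return rows.fetchall()


def page_filename(doc: str, page: int, ext: str = "pdf") -> str:
    """Alternative: a predictable per-page file name a web server can serve directly."""
    return f"{doc}-page-{page:04d}.{ext}"
```

A zero-padded page number keeps the per-page files sorted correctly in directory listings, which makes the file-based variant easy to browse as well as to fetch by URL.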
