From: joblack on
All right I found the web link. It's an improvement to the pdf miner
project (adds pdf dump methods).

http://pastebin.com/P8SWj5YK

From: Cameron Simpson on
On 13Jul2010 05:56, joblack <johannes.black(a)gmail.com> wrote:
| Thanks for the answers so far. It's not my code I'm just curious how
| that could happen:
|
| Starting point:
| ...
|         self.status['text'] = 'Processing ...'
|         try:
|             cli_main(argv)
|         except Exception, e:
|             self.status['text'] = 'Error: ' + str(e)
|             return
| ...
| cli_main:
|
|     keypath, inpath, outpath = argv[1:]
| ...
|     with open(inpath, 'rb') as inf:
|         serializer = PDFSerializer(inf, keypath)
|         with open(outpath, 'wb') as outf:
|             filenr = outf.fileno()
|             serializer.dump(outf)
|     return 0
|
| PDFSerializer.dump:
|
|     def dump(self, outf):
|         self.outf = outf
| ...


See that you set serializer.outf to the outf you open in cli_main?
Any attempt to use serializer _after_ exiting the "with open(outpath,
'wb') as outf" will use serializer.outf, but the outf is now closed.
And thus its file descriptor is invalid.
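
To illustrate the point, here is a minimal sketch. The Serializer class
and its late_write method are made up for the example (they are not the
real PDFSerializer), and an in-memory file stands in for the output
file. It is written for a current Python 3:

```python
import io


class Serializer:
    """Hypothetical stand-in for PDFSerializer; only the outf handling matters."""

    def dump(self, outf):
        self.outf = outf            # keeps a reference that outlives the call
        outf.write(b"data")

    def late_write(self):
        # Uses the stored file object, which may already be closed by now.
        self.outf.write(b"more")


serializer = Serializer()
with io.BytesIO() as outf:          # closed on leaving the block, like a real file
    serializer.dump(outf)

closed_after_with = serializer.outf.closed

try:
    serializer.late_write()
    late_error = None
except ValueError as e:             # a closed Python file object raises ValueError
    late_error = e
```

Note that the Python-level file object reports this as a ValueError.
An "[Errno 9] Bad file descriptor" more typically comes from the OS
when a raw descriptor number is used after it has been closed.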

BTW, by catching Exception in the starting point code you prevent
yourself from seeing exactly which line throws the error. It is usually
a bad idea to catch broad things like "Exception". It is normally better
to place try/except around very small pieces of code and to catch very
specific things. That way you know exactly what's going wrong and don't
quietly catch all sorts of unplanned stuff.
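
A rough sketch of what "small and specific" can look like, in current
Python 3 syntax. The read_key helper and the file names are invented
for the example and are not part of the code under discussion:

```python
import os
import tempfile


def read_key(keypath):
    """Hypothetical helper: read a key file, handling only the expected failure."""
    try:
        with open(keypath, 'rb') as f:
            return f.read()
    except FileNotFoundError:
        # Only the case we planned for is handled here.
        # Any other error propagates with its full traceback.
        return None


tmpdir = tempfile.mkdtemp()
missing = read_key(os.path.join(tmpdir, 'missing.key'))   # planned case: None
```

Anything unexpected (for instance passing a directory instead of a
file) still surfaces as an exception pointing at the offending line.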

Cheers,
--
Cameron Simpson <cs(a)zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

When buying and selling are controlled by legislation, the first things
bought and sold are the legislators. - P.J. O'Rourke
From: joblack on
> |
> | Starting point:
> | ...
> |         self.status['text'] = 'Processing ...'
> |         try:
> |             cli_main(argv)
> |         except Exception, e:
> |             self.status['text'] = 'Error: ' + str(e)
> |             return
> | ...
> | cli_main:
> |
> |     keypath, inpath, outpath = argv[1:]
> | ...
> |     with open(inpath, 'rb') as inf:
> |         serializer = PDFSerializer(inf, keypath)
> |         with open(outpath, 'wb') as outf:
> |             filenr = outf.fileno()
> |             serializer.dump(outf)
> |     return 0
> |
> | PDFSerializer.dump:
> |
> |     def dump(self, outf):
> |         self.outf = outf
> | ...
>
> See that you set serializer.outf to the outf you open in cli_main?
> Any attempt to use serializer _after_ exiting the "with open(outpath,
> 'wb') as outf" will use serializer.outf, but the outf is now closed.
> And thus its file descriptor is invalid.

Okay, I changed it to a try: ... finally: block where I open the file
and in finally I close it. Nothing has changed. The error still
occurs.

Doesn't the

with open(outpath, 'wb') as outf:

clause have to wait until the PDFSerializer.dump method has finished
anyway? IMHO it can't just call it and immediately close it.

At least the try: finally: construct should work? Or does it do the same
(call the method and immediately jump to the finally close)?

Would it work if I wrote:

with closing(outpath, 'wb') as outf: ?

I'm a little bit confused about Python's strange processing ...
From: Thomas Jollans on
On 07/14/2010 01:21 PM, joblack wrote:
>> |
>> | Starting point:
>> | ...
>> |         self.status['text'] = 'Processing ...'
>> |         try:
>> |             cli_main(argv)
>> |         except Exception, e:
>> |             self.status['text'] = 'Error: ' + str(e)
>> |             return
>> | ...
>> | cli_main:
>> |
>> |     keypath, inpath, outpath = argv[1:]
>> | ...
>> |     with open(inpath, 'rb') as inf:
>> |         serializer = PDFSerializer(inf, keypath)
>> |         with open(outpath, 'wb') as outf:
>> |             filenr = outf.fileno()
>> |             serializer.dump(outf)
>> |     return 0
>> |
>> | PDFSerializer.dump:
>> |
>> |     def dump(self, outf):
>> |         self.outf = outf
>> | ...
>>
>> See that you set serializer.outf to the outf you open in cli_main?
>> Any attempt to use serializer _after_ exiting the "with open(outpath,
>> 'wb') as outf" will use serializer.outf, but the outf is now closed.
>> And thus its file descriptor is invalid.
>
> Okay, I changed it to a try: ... finally: block where I open the file
> and in finally I close it. Nothing has changed. The error still
> occurs.

Where does the error occur? If Cameron is right, it occurs somewhere
completely different, when serializer.dump() is already long done, when
some unsuspecting fool tries to do something with serializer.outf (such
as closing it).

>
> Doesn't the
>
> with open(outpath, 'wb') as outf:
>
> clause have to wait until the PDFSerializer.dump method has finished
> anyway? IMHO it can't just call it and immediately close it.
>
> At least the try: finally: construct should work? Or does it do the same
> (call the method and immediately jump to the finally close)?
>
> Would it work if I wrote:
>
> with closing(outpath, 'wb') as outf: ?
>
> I'm a little bit confused about Python's strange processing ...
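
On the ordering question: the body of a with block runs to completion
before the file is closed, and try/finally behaves the same way. A
small sketch in current Python 3 that records the order of events (the
Recorder class and the dump function are invented for the
illustration):

```python
import io
from contextlib import closing

events = []


class Recorder(io.BytesIO):
    """In-memory file that records when it is closed (illustration only)."""

    def close(self):
        if not self.closed:
            events.append('close')
        super().close()


def dump(outf):
    events.append('dump start')
    outf.write(b'payload')
    events.append('dump end')


with Recorder() as outf:
    dump(outf)                      # the whole body runs first
# close() happens only here, when the block is left

# closing() wraps an object that is already open and has a close() method;
# it does not take a path and a mode.
with closing(Recorder()) as outf2:
    dump(outf2)
```

So closing(outpath, 'wb') would not work as written; the equivalent
spelling is closing(open(outpath, 'wb')), which gains nothing over a
plain with open(...) for regular files.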

From: joblack on
> Where does the error occur? If Cameron is right, it occurs somewhere
> completely different, when serializer.dump() is already long done, when
> some unsuspecting fool tries to do something with serializer.outf (such
> as closing it)

I have found it, but I still don't get it.

The traceback looks like this:

....

  File "C: ... ineptpdf8.4.20.pyw", line 1266, in initialize
    return self.initialize_fopn(docid, param)
  File "C: ... ineptpdf8.4.20.pyw", line 1411, in initialize_fopn
    print buildurl

IOError: [Errno 9] Bad file descriptor

Two strange things:

1st: buildurl contains only the URL (as a string) to open (something
like http://www.serverblaba.com/asdfasdf?wdfasdf=4&awfwf=34 ...)
2nd: it works if I start it in IDLE, but it doesn't work if I
double-click on the .pyw file.

How can printing out a string throw an IOError exception? I'm quite
puzzled.
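
One plausible explanation, which nobody in this thread has confirmed:
a .pyw file started by double-click normally runs under pythonw.exe,
which has no console, so there may be no usable standard output for
print to write to, whereas IDLE supplies its own. Under that
assumption, a defensive sketch in current Python 3 could look like
this (safe_print is a made-up helper name):

```python
import io


def safe_print(text, stream):
    """Write text plus a newline to stream.

    Returns False instead of raising when the stream is unusable.
    """
    if stream is None:                  # no stream attached at all
        return False
    try:
        stream.write(text + '\n')
        stream.flush()
        return True
    except (OSError, ValueError):       # bad descriptor or closed file object
        return False


buf = io.StringIO()
ok = safe_print('http://www.example.com/', buf)
```

In the GUI program one would call it as safe_print(buildurl,
sys.stdout), or send such diagnostics to a log file instead of
printing them.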