From: Gabriel Genellina on
En Wed, 24 Mar 2010 13:17:16 -0300, <python(a)bdurham.com> escribió:

> Is there a way to programmatically discover the encoding types
> supported by the codecs module?
>
> For example, the following link shows a table with Codec,
> Aliases, and Language columns.
> http://docs.python.org/library/codecs.html#standard-encodings
>
> I'm looking for a way to programmatically generate this table
> through some form of module introspection.

After looking at how things are done in codecs.c and encodings/__init__.py
I think you should enumerate all modules in the encodings package that
define a getregentry function.
Aliases come from encodings.aliases.aliases.
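
A minimal sketch of that enumeration (assuming Python 2.x; pkgutil is just one
of several ways to walk the package, and the names here are illustrative):

import pkgutil
import encodings

# Collect the names of encodings.* modules that define getregentry(),
# i.e. the ones that actually register a codec.
codec_modules = set()
for importer, name, ispkg in pkgutil.iter_modules(encodings.__path__):
    try:
        mod = __import__('encodings.' + name, fromlist=[name])
    except ImportError:
        continue  # some modules may not import on every platform
    if hasattr(mod, 'getregentry'):
        codec_modules.add(name)

print sorted(codec_modules)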

--
Gabriel Genellina

From: python on
Benjamin,

> According to my brief messing around with the REPL, encodings.aliases.aliases is a good place to start. I don't know of any way to get the Language column, but at the very least that will give you most of the supported encodings and any aliases they have.

Thank you - that's exactly the type of information I was looking for.

I'm including the following for anyone browsing the mailing list
archives in the future.

Here's the snippet we're using to dynamically generate codec
documentation similar to the table posted on the docs.python.org website.

import encodings

# Invert the alias table: map each codec name to the list of its aliases.
encodingDict = encodings.aliases.aliases
encodingType = dict()
for key, value in encodingDict.items():
    if value not in encodingType:
        encodingType[ value ] = list()
    encodingType[ value ].append( key )

# Print each codec followed by its comma-separated aliases.
for key in sorted( encodingType.keys() ):
    aliases = sorted( encodingType[ key ] )
    aliases = ', '.join( aliases )
    print '%-20s%s' % ( key, aliases )
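
A related sanity check (just a sketch, assuming Python 2.x like the snippet
above) is to confirm that every codec name coming out of the aliases table
actually resolves via codecs.lookup(), since the table could in principle
point at a codec that isn't registered:

import codecs
import encodings

# Warn about any alias target that the codec registry doesn't recognize.
for name in sorted( set( encodings.aliases.aliases.values() ) ):
    try:
        codecs.lookup( name )
    except LookupError:
        print 'no codec registered for %r' % name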

Regards,
Malcolm
From: Gabriel Genellina on
En Wed, 24 Mar 2010 14:58:47 -0300, <python(a)bdurham.com> escribió:

>> After looking at how things are done in codecs.c and
>> encodings/__init__.py I think you should enumerate all modules in the
>> encodings package that define a getregentry function. Aliases come from
>> encodings.aliases.aliases.
>
> Thanks for looking into this for me. Benjamin Kaplan made a similar
> observation. My reply to him included the snippet of code we're using to
> generate the actual list of encodings that our software will support
> (thanks to Python's codecs and encodings modules).

I was curious whether both methods would give the same results:

py> import os, glob, encodings
py> modules = set()
py> for name in glob.glob(os.path.join(encodings.__path__[0], "*.py")):
....     name = os.path.basename(name)[:-3]
....     try: mod = __import__("encodings."+name,
....                           fromlist=['ilovepythonbutsometimesihateit'])
....     except ImportError: continue
....     if hasattr(mod, 'getregentry'):
....         modules.add(name)
....
py> fromalias = set(encodings.aliases.aliases.values())
py> fromalias - modules
set(['tactis'])
py> modules - fromalias
set(['charmap',
'cp1006',
'cp737',
'cp856',
'cp874',
'cp875',
'idna',
'iso8859_1',
'koi8_u',
'mac_arabic',
'mac_centeuro',
'mac_croatian',
'mac_farsi',
'mac_romanian',
'palmos',
'punycode',
'raw_unicode_escape',
'string_escape',
'undefined',
'unicode_escape',
'unicode_internal',
'utf_8_sig'])

So the aliases table references a 'tactis' encoding that has no module (?),
and about twenty codec modules have no alias at all.
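
Assuming the modules and fromalias sets from the session above, a sketch that
merges the two views and drops anything codecs.lookup() rejects would be:

import codecs

# Union of module names and alias targets, filtered through the registry,
# gives the list of usable codecs (the stray 'tactis' entry drops out).
usable = set()
for name in modules | fromalias:
    try:
        codecs.lookup(name)
        usable.add(name)
    except LookupError:
        pass
print len(usable), 'usable codecs'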

--
Gabriel Genellina