From: gmagklaras on 1 Jul 2008 07:03

This question is with reference to RFC 4648 (http://tools.ietf.org/html/rfc4648#section-3.4), which addresses the canonical encoding format. What is the common practice for sorting base64 numbers? In theory, one could construct a comparator function as part of a standard sort procedure, based on the values of the base64 alphabet, which lists the valid symbols in this order:

A to Z, a to z, 0 to 9, + /

However, if one wanted to implement alphabetical (asciibetical) order, ASCII assigns a different order to the same symbols:

+ /, 0 to 9, A to Z, a to z

Is there any preference or reason to stick to one or the other sorting method when dealing with base64 encoded values? References would be greatly appreciated.

Thanks.
GM
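The difference between the two orderings in the question can be made concrete with a small sketch. It takes the 64-symbol alphabet in RFC 4648 index order and re-sorts it by ASCII code. The names `B64_ALPHABET` and `b64_alphabet_in_ascii_order` are made up for this illustration and do not come from the thread.

```c
#include <stdlib.h>
#include <string.h>

/* The base64 alphabet in RFC 4648 index order (index 0 is 'A', index 63 is '/'). */
static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* qsort comparator: orders single characters by their character code. */
static int cmp_char(const void *a, const void *b)
{
    return (int)*(const unsigned char *)a - (int)*(const unsigned char *)b;
}

/* Hypothetical helper: copies the alphabet into out (64 symbols plus the
   terminating NUL) and sorts the 64 symbols by ASCII code. */
void b64_alphabet_in_ascii_order(char out[65])
{
    memcpy(out, B64_ALPHABET, sizeof B64_ALPHABET);
    qsort(out, 64, 1, cmp_char);
}
```

On an ASCII system the sorted copy starts with `+` and `/`, followed by the digits, the uppercase letters, and the lowercase letters, so the two orderings disagree for most pairs of symbols from different groups.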
From: [Jongware] on 1 Jul 2008 10:22

gmagklaras(a)gmail.com wrote:
> This question is with reference to RFC 4648
> (http://tools.ietf.org/html/rfc4648#section-3.4) addressing the
> canonical encoding format. What's the common practice for sorting
> base64 numbers? One could in theory construct a comparator function
> as part of a standard sort procedure, according to the values of the
> base64 alphabet which could briefly have the valid symbols in order:
>
> A to Z, a to z, 0 to 9, + /
>
> However, if one wanted to implement alphabetical (asciibetical) order,
> ASCII assigns a different order value to the above symbols:
>
> + /, 0 to 9, A to Z, a to z
>
> Is there any preference or reason to stick to one or the other sorting
> method based on the priority order when dealing with base64 encoded
> values?

Why would one want to sort numbers in /any/ base alphabetically? Consider the list (5, 6, 40, 41, 201, 202). Sorted by ASCII codes it would come out as (201, 202, 40, 41, 5, 6), which has no numeric meaning at all.

To sort numbers (or, more generally, to compare two numbers in any base), you have to compare numbers with the same number of digits. Besides that, for any non-digit character, you'll have to consider its assigned numerical value, i.e., in your case that string "A to Z, a to z, 0 to 9, + /".

I'm a bit surprised about your ordering string -- I'd guess it should be "0 to 9, A to Z, a to z, + /", so digits lower than a real value of 10 are displayed in their familiar form "-1, 0, 1, 2, 3", as opposed to what you state -- "-A, ..?, A, B, C".

[Exactly one wiki later:]

Aha, "[..] indices into the string: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"" -- without regarding any numerical value. From reading the wiki, it's not correct to consider this a base64 *number*; it's at best a base64 *string* (as the resulting value has no numerical meaning).

Still, for any sorting you should stick to the definition: the index order in the predefined string.

> References would be greatly appreciated.

http://en.wikipedia.org/wiki/Base64 mentions a number of useful links.

[Jongware]
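The decimal example in the post above can be reproduced with a short sketch. It assumes non-negative integers written as digit strings without leading zeros, so that a shorter string is always the smaller number and equal-length strings can be compared character by character. The function names are invented for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Compare two non-negative decimal numbers given as digit strings.
   Assumption: no leading zeros, so fewer digits means a smaller value. */
int cmp_decimal_strings(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    if (la != lb)
        return (la < lb) ? -1 : 1;
    return strcmp(a, b); /* same length: digit-by-digit order is numeric order */
}

/* qsort adapters for an array of const char * elements. */
int cmp_decimal_qsort(const void *pa, const void *pb)
{
    return cmp_decimal_strings(*(const char *const *)pa,
                               *(const char *const *)pb);
}

int cmp_ascii_qsort(const void *pa, const void *pb)
{
    return strcmp(*(const char *const *)pa, *(const char *const *)pb);
}
```

Sorting the strings "5", "6", "40", "41", "201", "202" with `cmp_ascii_qsort` gives the order 201, 202, 40, 41, 5, 6 from the post, while `cmp_decimal_qsort` restores the numeric order.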
From: Greg Herlihy on 1 Jul 2008 19:07

On Jul 1, 4:03 am, "gmagkla...(a)gmail.com" <gmagkla...(a)gmail.com> wrote:
> This question is with reference to RFC 4648
> (http://tools.ietf.org/html/rfc4648#section-3.4) addressing the
> canonical encoding format. What's the common practice for sorting
> base64 numbers?

Presumably one sorts base-64 numbers as one would sort numbers of any other base - lowest to highest. This question seems to have nothing to do with RFC 4648 (which describes a Base64 data -encoding- protocol).

> One could in theory construct a comparator function as part of a
> standard sort procedure, according to the values of the base64
> alphabet which could briefly have the valid symbols in order:
>
> A to Z, a to z, 0 to 9, + /
>
> However, if one wanted to implement alphabetical (asciibetical) order,
> ASCII assigns a different order value to the above symbols:
>
> + /, 0 to 9, A to Z, a to z
>
> Is there any preference or reason to stick to one or the other sorting
> method based on the priority order when dealing with base64 encoded
> values? References would be greatly appreciated.

Let's try answering these questions by sorting some actual, base64-encoded data.

For the sample data, I have created (and base64-encoded) a list with the names of ten common fruits. So, the task here is to sort the fruits on my list, alphabetically by (base64-encoded) name. Here is the (base64-encoded) list to sort:

begin-base64 644 fruits.txt
b3JhbmdlCmFwcGxlCnBlYWNoCmdyYXBlZnJ1aXQKcGVhcgpncmFwZQphcHJpY290CmxlbW9uCm5l
Y3RhcmluZQp0YW5nZXJpbmU=
====

Now, it strikes me that none of the proposed base64 sorting schemes is likely to be at all effective at sorting the fruit names on my list. In fact, this list is not even recognizable as such until it has been decoded - at which point the names in the list can be sorted quite easily.

So the answer is that there is no convention for sorting base64 encoded data, because there is no way to sort the data without first decoding it. After all, data encoding is simply a protocol for mapping a set of data values to another set of corresponding, encoded values (and back again), solely for the purpose of transporting the data safely. And as long as the data is specified by the set of encoded data values, the data itself is not accessible. Therefore, it makes little sense to discuss ways of sorting or searching or otherwise doing anything with the encoded data - other than decoding it.

Greg
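The "decode first, then sort" approach described above could look roughly like the following sketch. It is not a hardened decoder: it silently skips any character outside the alphabet (such as line breaks), which is a simplification chosen for this example, and it stops at the first `=`. The names `b64_decode` and `split_and_sort_lines` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Index of c in the RFC 4648 alphabet, or -1 if c is not a base64 symbol. */
static int b64_index(char c)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const char *p = (c != '\0') ? strchr(alphabet, c) : NULL;
    return p ? (int)(p - alphabet) : -1;
}

/* Decode base64 text into out and NUL-terminate it. The caller must provide
   at least 3 * strlen(in) / 4 + 1 bytes. Returns the decoded length. */
size_t b64_decode(const char *in, char *out)
{
    unsigned int acc = 0; /* bit accumulator; old high bits simply fall off */
    int bits = 0;         /* number of not-yet-emitted bits held in acc */
    size_t n = 0;

    for (; *in != '\0' && *in != '='; ++in) {
        int v = b64_index(*in);
        if (v < 0)
            continue;     /* simplification: ignore non-alphabet characters */
        acc = (acc << 6) | (unsigned int)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out[n++] = (char)((acc >> bits) & 0xFF);
        }
    }
    out[n] = '\0';
    return n;
}

static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Split text in place at newlines, store up to max line pointers in lines,
   and sort them with strcmp. Returns the number of lines stored. */
size_t split_and_sort_lines(char *text, char **lines, size_t max)
{
    size_t n = 0;
    for (char *tok = strtok(text, "\n"); tok != NULL && n < max;
         tok = strtok(NULL, "\n"))
        lines[n++] = tok;
    qsort(lines, n, sizeof lines[0], cmp_str);
    return n;
}
```

Feeding the two payload lines of fruits.txt through `b64_decode` and then `split_and_sort_lines` yields the ten fruit names in alphabetical order, starting with "apple" and ending with "tangerine".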
From: gmagklaras on 2 Jul 2008 04:59

On 2 Jul, 01:07, Greg Herlihy <gre...(a)mac.com> wrote:
> On Jul 1, 4:03 am, "gmagkla...(a)gmail.com" <gmagkla...(a)gmail.com>
> wrote:
>
> > This question is with reference to RFC 4648
> > (http://tools.ietf.org/html/rfc4648#section-3.4) addressing the
> > canonical encoding format. What's the common practice for sorting
> > base64 numbers?
>
> Presumably one sorts base-64 numbers as one would sort numbers of any
> other base - lowest to highest. This question seems to have nothing to
> do with RFC 4648 (which describes a Base64 data -encoding- protocol).
>
> > One could in theory construct a comparator function as part of a
> > standard sort procedure, according to the values of the base64
> > alphabet which could briefly have the valid symbols in order:
>
> > A to Z, a to z, 0 to 9, + /
>
> > However, if one wanted to implement alphabetical (asciibetical) order,
> > ASCII assigns a different order value to the above symbols:
>
> > + /, 0 to 9, A to Z, a to z
>
> > Is there any preference or reason to stick to one or the other sorting
> > method based on the priority order when dealing with base64 encoded
> > values? References would be greatly appreciated.
>
> Let's try answering these questions by sorting some actual,
> base64-encoded data.
>
> For the sample data, I have created (and base64-encoded) a list with
> the names of ten common fruits. So, the task here is to sort the
> fruits on my list, alphabetically by (base64-encoded) name. Here is
> the (base64-encoded) list to sort:
>
> begin-base64 644 fruits.txt
> b3JhbmdlCmFwcGxlCnBlYWNoCmdyYXBlZnJ1aXQKcGVhcgpncmFwZQphcHJpY290CmxlbW9uCm5l
> Y3RhcmluZQp0YW5nZXJpbmU=
> ====
>
> Now, it strikes me that none of the proposed base64 sorting schemes
> are likely to be at all effective at sorting the fruit names on my
> list. In fact, this list is not even recognizable as such until it has
> been decoded - at which point the names in the list can be sorted
> quite easily.
>

Greg, thanks for your answer. You assume that I encode/encapsulate data in Base64. I should have said that this is not the case. In fact, what we have are SHA-1 digest values produced to uniquely identify protein sequences, as part of a bioinformatics project. The digest values are 27-character-long digests (without the padding), as specified here:

http://bioinformatics.anl.gov/seguid/overview.aspx

As part of an index generation process, we need to sort a list/array of these values and produce a new identifier. Thus my question about the practice of sorting base64 values. I should of course have been more specific.

GM
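One property that may be useful for this use case: 27 unpadded base64 characters carry 162 bits, that is, the 160 bits of a SHA-1 digest followed by two zero bits. Comparing two such strings symbol by symbol in alphabet-index order therefore gives the same result as comparing the underlying 20 digest bytes. The sketch below shows that route by decoding and calling `memcmp`. The helper names are hypothetical and the code assumes well-formed input.

```c
#include <string.h>

/* Index of c in the RFC 4648 alphabet, or -1 if c is not a base64 symbol. */
static int b64_index(char c)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const char *p = (c != '\0') ? strchr(alphabet, c) : NULL;
    return p ? (int)(p - alphabet) : -1;
}

/* Decode a 27-character unpadded base64 digest into 20 raw bytes.
   Returns 0 on success, -1 if the string is too short or has a bad symbol. */
int seguid_to_bytes(const char *s, unsigned char out[20])
{
    unsigned int acc = 0; /* bit accumulator; old high bits simply fall off */
    int bits = 0;         /* number of not-yet-emitted bits held in acc */
    size_t n = 0;

    for (int i = 0; i < 27; ++i) {
        int v = b64_index(s[i]);
        if (v < 0)
            return -1;
        acc = (acc << 6) | (unsigned int)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out[n++] = (unsigned char)((acc >> bits) & 0xFF);
        }
    }
    return 0; /* exactly 20 bytes were written; 2 padding bits are left over */
}

/* Compare two digests by their decoded bytes. Stores <0, 0 or >0 in *cmp.
   Returns 0 on success, -1 if either input could not be decoded. */
int seguid_cmp_decoded(const char *a, const char *b, int *cmp)
{
    unsigned char da[20], db[20];
    if (seguid_to_bytes(a, da) != 0 || seguid_to_bytes(b, db) != 0)
        return -1;
    *cmp = memcmp(da, db, sizeof da);
    return 0;
}
```

For two digests that differ only in a first symbol of `a` versus `0`, this comparison puts the `a` string first (index 26 before index 52), whereas a plain `strcmp` puts the `0` string first. Either rule gives a consistent total order; what matters for a derived identifier is that every producer applies the same one.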
From: Sigmund Lappegård Lahn on 7 Jul 2008 05:49

gmagklaras(a)gmail.com wrote:
> On 2 Jul, 01:07, Greg Herlihy <gre...(a)mac.com> wrote:
>> On Jul 1, 4:03 am, "gmagkla...(a)gmail.com" <gmagkla...(a)gmail.com>
>> wrote:
>> (snip------------------)
>>
> Greg thanks for your answer. You assume that I encode/encapsulate data
> in Base64. I should have said that this is not the case. In fact, what
> we have is SHA-1 digest values produced to identify uniquely protein
> sequences, as part of a bioinformatics project. The digest values are
> 27 character long digests (without the padding) as specified here:
>
> http://bioinformatics.anl.gov/seguid/overview.aspx
>
> As part of an index generation process, we need to sort a list/array
> of these values and produce a new identifier. Thus, my question about
> the practice of sorting base64 values. I should of course had been
> more specific.
>
> GM

Here is a sketch of a compare function for two base64 strings of length 27. Haven't actually tried it, but I think you get my drift.

/* Map a base64 character to its index (0..63) in the base64 alphabet. */
int base64_charvalue(const char c)
{
    if (c >= 'A' && c <= 'Z')
        return c - 'A';             /* 0..25 */
    else if (c >= 'a' && c <= 'z')
        return 26 + (c - 'a');      /* 26..51 */
    else if (c >= '0' && c <= '9')
        return 52 + (c - '0');      /* 52..61 */
    else /* must be '+' or '/' now */
        return (c == '+') ? 62 : 63;
}

/* Compare two 27-character base64 strings in alphabet-index order.
   Returns <0, 0 or >0, like strcmp. */
int base64_cmp(const char* stra, const char* strb)
{
    int i = 0, a, b;
    while (i < 27 && stra[i] == strb[i]) {
        ++i;
    }
    if (i == 27)
        return 0; /* strings are equal */
    a = base64_charvalue(stra[i]);
    b = base64_charvalue(strb[i]);
    return a - b;
}

- Sigmund
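To sort a whole array of identifiers with this kind of comparison, the comparator has to be wrapped for `qsort`, which passes pointers to the array elements rather than the strings themselves. The following is a minimal sketch under the assumption that the identifiers are stored as an array of `const char *`. All names (`SEGUID_LEN`, `b64_rank`, `seguid_cmp`, `seguid_sort`) are invented for this example, and the alphabet lookup uses `strchr` instead of range checks.

```c
#include <stdlib.h>
#include <string.h>

#define SEGUID_LEN 27 /* unpadded base64 length of a SHA-1 digest */

/* Rank of c in the RFC 4648 alphabet (0..63). Anything else, including the
   string terminator, gets rank 64 so that it sorts after all valid symbols. */
static int b64_rank(char c)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const char *p = (c != '\0') ? strchr(alphabet, c) : NULL;
    return p ? (int)(p - alphabet) : 64;
}

/* Compare two identifiers in alphabet-index order; returns <0, 0 or >0. */
int seguid_cmp(const char *a, const char *b)
{
    for (int i = 0; i < SEGUID_LEN; ++i) {
        if (a[i] != b[i])
            return b64_rank(a[i]) - b64_rank(b[i]);
        if (a[i] == '\0')
            break; /* both strings ended early and are equal so far */
    }
    return 0;
}

/* qsort adapter: the arguments point at elements of a const char * array. */
static int seguid_cmp_qsort(const void *pa, const void *pb)
{
    return seguid_cmp(*(const char *const *)pa, *(const char *const *)pb);
}

/* Sort an array of n identifier strings in place. */
void seguid_sort(const char **ids, size_t n)
{
    qsort(ids, n, sizeof ids[0], seguid_cmp_qsort);
}
```

A call such as `seguid_sort(ids, count)` then reorders the pointers so that identifiers starting with uppercase letters come first, followed by lowercase letters, digits, `+` and `/`.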