Filenames often contain numbers, e.g., picture17.png. If you apply the usual string comparison, two-digit numbers will be mixed with one-digit numbers:
picture 1.png picture 10.png picture 100.png picture 11.png picture 2.png picture 21.png picture 2_10.png picture 2_9.png picture 3.png picture 3b.png picture A.png
The “natural” order for alphanumeric data would be:
picture 1.png picture 2.png picture 2_9.png picture 2_10.png picture 3.png picture 3b.png picture 10.png picture 11.png picture 21.png picture 100.png picture A.png
Windows Explorer and other file managers sort your filenames in this order.
Here is a simple comparison function to use with qsort:
int compare_naturally1(const void* a_ptr, const void* b_ptr) {
const char * a = *(const char**)a_ptr;
const char * b = *(const char**)b_ptr;
while (*a == *b) {
if(*a == '\0')
return 0;
a++, b++;
}
int diff = strspn(a, "0123456789") - strspn(b, "0123456789");
if (diff)
return diff;
return *a - *b;
}
When the function finds the first difference, it counts the digits with strspn. The number with more digits is always larger (e.g., 100 > 2, even though '1' < '2'). If the count of digits is the same (e.g., 10 and 21), the values of characters are checked.
The problem is that the function undesirably sorts words before numbers (picture A < picture 1), because the words have zero digits. You cannot fix it by comparing only the items with digits, because this would break associativity (digits and letters would be compared incorrectly). A more complex algorithm is needed:
int count_digits(const char* s) { const char * p = s; while (isdigit(*p)) p++; return (int)(p - s); } int compare_naturally(const void* a_ptr, const void* b_ptr) { const char * a = *(const char**)a_ptr; const char * b = *(const char**)b_ptr; for (;;) { if (isdigit(*a) && isdigit(*b)) { int a_count = count_digits(a); int diff = a_count - count_digits(b); if (diff) return diff; diff = memcmp(a, b, a_count); if (diff) return diff; a += a_count; b += a_count; } if (*a != *b) return *a - *b; if (*a == '\0') return 0; a++, b++; } }
When the function finds a number in both strings, it counts the digits. Strspn is replaced here with a more effective custom function. If both numbers have an equal count of digits, their values are compared using memcmp.
Other solutions
- Martin Pool published a more complex comparison function, which works for decimal fractions and leading spaces. It's included in PHP as strnatcmp; he also provides links to similar functions in other languages.
- Jeff Atwood shows a short (but not efficient) implementation in Python by Ned Batchelder and Toothy.
- Zzspectrez dissects Perl implementation by Tye McQueen.
Leave your comment