A while back I noticed something fishy about Perl's built-in sort when dealing with Unicode text: by default it orders strings by code point, not by any language's alphabetical rules. Some research eventually led me to the Unicode Collation Algorithm (UCA), its Perl implementation in Unicode::Collate, and the very useful Unicode::Collate::Locale. Thanks a lot