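Apparently, under Windows, Python (a narrow, UTF-16 build) compares two strings UTF-16 code unit by code unit. A character outside the Basic Multilingual Plane is stored as a surrogate pair, so it compares as its leading surrogate (here U+D835) rather than as its code point:

    >>> u'\ud700' < u'\U0001d41a'
    True
    >>> u'\ue000' < u'\U0001d41a'
    False

The second result is wrong, since U+E000 is less than U+1D41A. When this happens, fix it by encoding both strings as UTF-32 big-endian before comparing them: big-endian UTF-32 byte strings sort in code-point order.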
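A minimal sketch of the idea, written for Python 3, whose `str` already compares by code point. The narrow-build behaviour is therefore only simulated by comparing UTF-16-BE bytes, which order the same way as UTF-16 code units. The helper names `utf16_less` and `codepoint_less` are hypothetical and are not taken from the actual patch.

```python
def utf16_less(a: str, b: str) -> bool:
    """Simulate a narrow build: compare by UTF-16 code units.

    Big-endian UTF-16 bytes sort exactly like the sequence of 16-bit
    code units, so non-BMP characters compare via their surrogates.
    """
    return a.encode("utf-16-be") < b.encode("utf-16-be")


def codepoint_less(a: str, b: str) -> bool:
    """Compare by code point, as the fix does.

    Each code point becomes four big-endian bytes, so byte-wise
    comparison of the encodings matches code-point order.
    """
    return a.encode("utf-32-be") < b.encode("utf-32-be")


if __name__ == "__main__":
    non_bmp = "\U0001d41a"  # outside the BMP; UTF-16 stores it as \ud835\udc1a
    print(utf16_less("\ud700", non_bmp))      # True
    print(utf16_less("\ue000", non_bmp))      # False (code-unit order)
    print(codepoint_less("\ue000", non_bmp))  # True (code-point order)
```

Encoding costs an extra copy per comparison, which is why the commit only takes this path when the code-unit comparison can disagree with code-point order.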