Garbage In, Awesome Out
08.21.14 · python
Ctrl-C and Ctrl-V may be the world’s best-known keyboard shortcuts, but they’re fraught with deceit. Copying and pasting creates a false sense of security that what you see is always what you’re going to get. It is not.
Try copy-pasting text from a PDF, special characters from MS Word, or logs from your console. At some point, you’ll run into gobbledygook like �, Ã¶, or åŒ. That’s called mojibake: text that was decoded with the wrong character encoding. It’s your computer’s way of saying, “I have no idea what this means.”
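To see where a string like Ã¶ comes from, here is a minimal sketch using only Python's standard library. It assumes the common case where text was saved as UTF-8 but read back as Latin-1; the variable names are just for illustration.

```python
# How mojibake happens: bytes written in one encoding, read in another.
original = "ö"

# Saved as UTF-8, "ö" becomes two bytes: 0xC3 0xB6.
utf8_bytes = original.encode("utf-8")

# Read back as Latin-1, each byte turns into its own character.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # Ã¶

# The reverse mistake: Latin-1 bytes read as UTF-8 are invalid,
# so the decoder substitutes the replacement character U+FFFD.
replaced = original.encode("latin-1").decode("utf-8", errors="replace")
print(replaced)  # �

# If you know exactly which mistake was made, you can undo it by hand.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # ö
```

The catch is the last step: undoing the damage by hand only works when you already know which two encodings got mixed up, and in practice you usually don't.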
To avoid these embarrassing situations, run your text through ftfy, a Python library that detects and repairs most mojibake and hands back clean Unicode text, ready to be saved as UTF-8 (the de facto web standard).
Created by Rob Speer at Luminoso, ftfy can also uncurl quote marks, strip out control/color sequences, and convert HTML entities back into their original text. Basically, it’s a magical Unicode unicorn.
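Here is a rough sketch of what that looks like in practice. It relies on ftfy's `fix_text` function with its default settings. The `clean` helper and the sample strings are my own illustration, not part of ftfy, and the import is guarded so the snippet simply returns text unchanged if ftfy isn't installed (`pip install ftfy`).

```python
try:
    import ftfy  # third-party package: pip install ftfy
except ImportError:
    ftfy = None


def clean(text):
    """Hypothetical helper: repair text with ftfy, or return it as-is."""
    if ftfy is None:
        return text
    return ftfy.fix_text(text)


samples = {
    # UTF-8 bytes for a check mark, wrongly decoded as Windows-1252
    "mojibake": "✔ No problems".encode("utf-8").decode("windows-1252"),
    # Curly quote marks, as pasted from a word processor
    "quotes": "“uncurl me”",
    # Terminal color escape sequences, as found in console logs
    "color": "\x1b[36;44mblue text\x1b[0m",
    # HTML entities left over from a web page
    "html": "Q&amp;A",
}

for name, text in samples.items():
    print(f"{name}: {text!r} -> {clean(text)!r}")
```

Each sample lines up with one of the tricks mentioned above: encoding repair, uncurled quotes, stripped color sequences, and decoded HTML entities.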