Garbage In, Awesome Out

Ctrl-C and Ctrl-V may be the world’s best-known keyboard shortcuts, but they’re fraught with deceit. Copying & pasting creates a false sense of security that what you see is always what you’re going to get. It is not.

Try copy-pasting text from a PDF, special characters from MS Word, or logs from your console. At some point, you’ll run into gobbledygook like �, ö, or åŒ. That’s called mojibake, and it’s your computer’s way of saying, “I have no idea what this means.”
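
To see where mojibake like Ã¶ comes from, here is a minimal sketch using only Python’s standard library. It assumes the most common mix-up: UTF-8 bytes decoded as Windows-1252 (cp1252). The function names are made up for illustration.

```python
# Minimal sketch: how mojibake is born and reversed, assuming the text was
# UTF-8 that got decoded as Windows-1252 (cp1252).

def make_mojibake(text: str) -> str:
    """Encode correctly as UTF-8, then decode with the wrong codec."""
    return text.encode("utf-8").decode("cp1252")


def undo_mojibake(garbled: str) -> str:
    """Re-encode with the wrong codec to recover the original bytes,
    then decode those bytes as the UTF-8 they always were."""
    return garbled.encode("cp1252").decode("utf-8")


original = "Schrödinger’s café"
garbled = make_mojibake(original)
print(garbled)                 # SchrÃ¶dingerâ€™s cafÃ©
print(undo_mojibake(garbled))  # Schrödinger’s café
```

This only works when you already know which two encodings were mixed up. Python’s strict cp1252 codec also rejects a few byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D), so a character like “Á” would raise UnicodeDecodeError here. Guessing the mix-up and coping with those edge cases automatically is the hard part a library has to handle.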

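To avoid these embarrassing situations, run your text through ftfy, a Python library that repairs mojibake and other Unicode glitches, most often text that was written as UTF-8 (the de facto web standard) but read back with the wrong encoding, such as Windows-1252.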

Created by Rob Speer at Luminoso, ftfy can also uncurl quote marks, strip out terminal control and color codes, and convert HTML entities such as &amp; back into the characters they stand for. Basically, it’s a magical Unicode unicorn.

