Fix for UnicodeDecodeError #12
alexjj wants to merge 1 commit into OpenTechSchool:gh-pages from alexjj:patch-1
|
Hi alexjj, Thanks for sending this PR! Encodings are a nightmare, and I think you're right that this is the right way to do it. I think UTF-8 should be the default Python 3 encoding on OS X and Linux if the locale is set to UTF-8 (it usually is). But Windows, some Linux configs, and maybe Python 2 will probably all get errors like you did. Yay! The only qualm I have about merging the change is that we should probably explain in an aside what's happening here, at least pointing out that we've added this encoding argument, roughly what Unicode is, and why we have to care. Are you able to add something like that to the PR? Cheers, Angus |
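
For anyone following along, here is a minimal sketch of what passing an explicit encoding looks like in Python 3. The `read_rows` helper and the sample data are hypothetical stand-ins, not the tutorial's actual code or file; the sketch only assumes the CSV is UTF-8.

```python
import csv
import locale
import os
import tempfile

# The encoding open() falls back to when none is given; it depends on the platform and locale.
default_encoding = locale.getpreferredencoding(False)


def read_rows(path, encoding="utf-8"):
    """Hypothetical helper: read a CSV file into a list of rows with an explicit encoding."""
    # newline="" is what the csv module documentation recommends for files given to csv.reader.
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f))


# Made-up sample data standing in for the tutorial's CSV file, written as UTF-8 bytes.
sample = 'name,city\n"Z\u00fcrich Airport","Z\u00fcrich"\n'
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
    tmp_path = tmp.name

rows = read_rows(tmp_path)
os.remove(tmp_path)

print(default_encoding)
# ascii() keeps the output printable even on a console that cannot show non-ASCII text.
print(ascii(rows))
```

Because the encoding is spelled out, the result no longer depends on whatever `default_encoding` happens to be on the learner's machine.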
|
Is there any way we could dodge the topic of encodings here? I think encodings are too important to squeeze in here. This chapter is about CSV techniques, so any explanation is going to be either too short to do encodings justice or so long that it confuses learners. And I'm generally not a friend of "here is this boilerplate, copy it and everything will be happy sparkles." One possible way is rehosting the CSV files with all non-ASCII characters stripped, so that they work in almost any encoding by accident. (Fun fact: OpenFlights.org claims the file is "ISO 8859-1 (Latin-1) encoded," so any reasonable learner might be doubly confused by Python 2 users will just get another error with this fix by the way, since its open function does not accept the |
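
The rehosting idea above could be sketched roughly as follows. This assumes the source file is UTF-8; `strip_non_ascii` is a hypothetical helper name, and the sample bytes are made up for illustration.

```python
def strip_non_ascii(data, source_encoding="utf-8"):
    """Hypothetical helper: re-encode bytes as ASCII, dropping characters that do not fit."""
    text = data.decode(source_encoding)
    # errors="ignore" silently discards every character outside the ASCII range.
    return text.encode("ascii", errors="ignore")


# Made-up sample line; the u-umlaut is the only non-ASCII character.
original = "Z\u00fcrich,ZRH\n".encode("utf-8")
stripped = strip_non_ascii(original)
print(stripped)  # b'Zrich,ZRH\n'

# ASCII bytes decode identically under any ASCII-compatible encoding.
decoded = {enc: stripped.decode(enc) for enc in ("utf-8", "latin-1", "cp1252")}
print(decoded)
```

The trade-off is visible in the output: the file now reads the same everywhere, but the data itself is mangled ("Zrich").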
|
Good points. For me on Windows with Python 3 I needed to specify the |
|
ISO 8859-1 "works" only in the sense that it doesn't throw an exception. The file is actually encoded in UTF-8, so you will get scrambled, erroneous output. I have filed jpatokal/openflights#405 to fix the OpenFlights docs, but my other points remain. |
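
A small illustration of that point, using a made-up sample string rather than the actual OpenFlights data: Latin-1 assigns a character to every possible byte, so decoding UTF-8 bytes with it never fails, it just produces the wrong text.

```python
# Made-up sample: "Zurich" with a u-umlaut, encoded as UTF-8 (the umlaut becomes two bytes).
raw = "Z\u00fcrich".encode("utf-8")

wrong = raw.decode("latin-1")  # no exception, but each UTF-8 byte becomes its own character
right = raw.decode("utf-8")

# ascii() makes the difference visible on any console.
print(ascii(wrong))  # 'Z\xc3\xbcrich'
print(ascii(right))  # 'Z\xfcrich'
print(len(wrong), len(right))  # 7 6
```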
I was getting a UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 error, and this resolved it.
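
An error of this shape can be reproduced with a short sketch. It assumes the default encoding on the affected machine was cp1252, which is common on Windows but is not stated in this thread; the sample character is also made up.

```python
# U+00C1 (A with acute accent) is encoded in UTF-8 as the two bytes 0xC3 0x81.
raw = "\u00c1".encode("utf-8")

message = None
try:
    # Assumption: cp1252 stands in for the platform default; it has no character at 0x81.
    raw.decode("cp1252")
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)

# Decoding the same bytes as UTF-8 succeeds.
print(ascii(raw.decode("utf-8")))  # '\xc1'
```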