[Paramiko] Encoding error when using readline on a file opened over SFTP
2020. 2. 7. 09:21
Paramiko lets you use SFTP from Python.
A UnicodeDecodeError can occur when the remote machine and the machine running the program use different operating systems or different string encodings.
I hit a UnicodeDecodeError while reading a remote txt file one line at a time, calling readline on a file opened through Paramiko.
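The failure can be reproduced without any network connection. The following is a minimal sketch, assuming the remote file was saved as EUC-KR; the sample text is made up for illustration.

```python
# Bytes as they would arrive from a remote file saved in EUC-KR (assumed sample).
raw = "안녕하세요\n".encode("euc-kr")

try:
    raw.decode("utf8")  # what a hard-coded UTF-8 decode would attempt
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print("decode failed:", exc.reason)

# Decoding with the encoding the file was actually written in succeeds.
text = raw.decode("euc-kr")
print(text, end="")
```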
readline takes no parameter for the encoding, so I modified the following Python files inside the installed Paramiko package.
py3compat.py
Change def u() to:

    def u(s, encoding="utf8"):  # NOQA
        """cast bytes or unicode to unicode"""
        if isinstance(s, str):
            return s.decode(encoding)
        elif isinstance(s, unicode):  # NOQA
            return s
        elif isinstance(s, buffer):  # NOQA
            return s.decode(encoding)
        else:
            raise TypeError("Expected unicode or bytes, got {!r}".format(s))

The change adds a parameter named encoding and uses it in place of the hard-coded "utf8".
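The listing above uses Python 2 names (unicode, buffer). On Python 3 the same idea might look like the sketch below. It is a standalone illustration of the encoding parameter, not a copy of Paramiko's source, and the sample string is an assumption.

```python
def u(s, encoding="utf8"):
    """Cast bytes or str to str, decoding bytes with the given encoding."""
    if isinstance(s, bytes):
        return s.decode(encoding)
    elif isinstance(s, str):
        return s
    else:
        raise TypeError("Expected unicode or bytes, got {!r}".format(s))


# EUC-KR bytes decode correctly once the caller can name the encoding.
sample = "한글".encode("euc-kr")
decoded = u(sample, encoding="euc-kr")
print(decoded)
```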
file.py
Change def readline(self) to:

    def readline(self, size=None, encoding="utf8"):
        """
        Read one entire line from the file.  A trailing newline character is
        kept in the string (but may be absent when a file ends with an
        incomplete line).  If the size argument is present and non-negative, it
        is a maximum byte count (including the trailing newline) and an
        incomplete line may be returned.  An empty string is returned only when
        EOF is encountered immediately.

        .. note::
            Unlike stdio's ``fgets``, the returned string contains null
            characters (``'\\0'``) if they occurred in the input.

        :param int size: maximum length of returned string.
        :param str encoding: encoding used to decode text-mode lines.
        :returns:
            next line of the file, or an empty string if the end of the
            file has been reached.

            If the file was opened in binary (``'b'``) mode: bytes are returned
            Else: the line is decoded with ``encoding`` (UTF-8 by default) and
            character strings (`str`) are returned
        """
        # it's almost silly how complex this function is.
        if self._closed:
            raise IOError("File is closed")
        if not (self._flags & self.FLAG_READ):
            raise IOError("File not open for reading")
        line = self._rbuffer
        truncated = False
        while True:
            if (
                self._at_trailing_cr
                and self._flags & self.FLAG_UNIVERSAL_NEWLINE
                and len(line) > 0
            ):
                # edge case: the newline may be '\r\n' and we may have read
                # only the first '\r' last time.
                if line[0] == linefeed_byte_value:
                    line = line[1:]
                    self._record_newline(crlf)
                else:
                    self._record_newline(cr_byte)
                self._at_trailing_cr = False
            # check size before looking for a linefeed, in case we already have
            # enough.
            if (size is not None) and (size >= 0):
                if len(line) >= size:
                    # truncate line
                    self._rbuffer = line[size:]
                    line = line[:size]
                    truncated = True
                    break
                n = size - len(line)
            else:
                n = self._bufsize
            if linefeed_byte in line or (
                self._flags & self.FLAG_UNIVERSAL_NEWLINE and cr_byte in line
            ):
                break
            try:
                new_data = self._read(n)
            except EOFError:
                new_data = None
            if (new_data is None) or (len(new_data) == 0):
                self._rbuffer = bytes()
                self._pos += len(line)
                # pass the encoding here too, so a final line without a
                # trailing newline is decoded the same way
                return (
                    line
                    if self._flags & self.FLAG_BINARY
                    else u(line, encoding=encoding)
                )
            line += new_data
            self._realpos += len(new_data)
        # find the newline
        pos = line.find(linefeed_byte)
        if self._flags & self.FLAG_UNIVERSAL_NEWLINE:
            rpos = line.find(cr_byte)
            if (rpos >= 0) and (rpos < pos or pos < 0):
                pos = rpos
        if pos == -1:
            # we couldn't find a newline in the truncated string, return it
            self._pos += len(line)
            return (
                line
                if self._flags & self.FLAG_BINARY
                else u(line, encoding=encoding)
            )
        xpos = pos + 1
        if (
            line[pos] == cr_byte_value
            and xpos < len(line)
            and line[xpos] == linefeed_byte_value
        ):
            xpos += 1
        # if the string was truncated, _rbuffer needs to have the string after
        # the newline character plus the truncated part of the line we stored
        # earlier in _rbuffer
        if truncated:
            self._rbuffer = line[xpos:] + self._rbuffer
        else:
            self._rbuffer = line[xpos:]

        lf = line[pos:xpos]
        line = line[:pos] + linefeed_byte
        if (len(self._rbuffer) == 0) and (lf == cr_byte):
            # we could read the line up to a '\r' and there could still be a
            # '\n' following that we read next time.  note that and eat it.
            self._at_trailing_cr = True
        else:
            self._record_newline(lf)
        self._pos += len(line)
        return line if self._flags & self.FLAG_BINARY else u(line, encoding=encoding)

The only changes are the new encoding parameter and passing it on to the u() function modified above at each text-mode return.
With this in place, passing the desired encoding, as in readline(encoding='euc-kr'), resolves the encoding errors that can occur when using readline.
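As the readline docstring notes, a file opened in binary ('b') mode returns bytes, so another option is to leave the installed package untouched and decode each line yourself. The sketch below is an alternative to the patch, not part of it. The helper name iter_decoded_lines is hypothetical, and io.BytesIO stands in for the remote file so the example runs without a server. With Paramiko, the file object would instead come from something like sftp.open(path, "rb").

```python
import io


def iter_decoded_lines(fileobj, encoding="utf8", errors="strict"):
    """Yield text lines from a file object whose readline() returns bytes."""
    while True:
        raw = fileobj.readline()
        if not raw:  # empty bytes means EOF
            break
        yield raw.decode(encoding, errors)


# Stand-in for a remote EUC-KR file (assumed sample content).
remote = io.BytesIO("안녕하세요\n반갑습니다".encode("euc-kr"))
lines = list(iter_decoded_lines(remote, encoding="euc-kr"))
print(lines)
```

This keeps the encoding choice in your own code, so it survives a Paramiko upgrade that would overwrite edited package files.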