  • [Paramiko] Encoding error when calling readline on a file opened over SFTP
    Software Development/Python 2020. 2. 7. 09:21

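    Paramiko lets you use SFTP from Python.

    When the remote machine and the machine running the program use different operating systems or different string encodings, you may run into a UnicodeDecodeError.

    I hit a UnicodeDecodeError while reading a remote txt file one line at a time, calling readline on a file opened through Paramiko. In text mode, readline assumes the file is UTF-8, so a file saved in another encoding such as EUC-KR fails to decode.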
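
    The failure itself can be reproduced without any SFTP connection. The following is a minimal sketch, assuming the remote file was saved as EUC-KR; the sample text and variable names are made up for illustration.

```python
# Bytes as they might arrive from a remote file saved in EUC-KR
# (the sample text is an illustrative assumption).
raw_line = "안녕하세요\n".encode("euc-kr")

# Decoding as UTF-8 fails because these bytes are not valid UTF-8.
try:
    raw_line.decode("utf8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Decoding with the encoding the file was actually written in succeeds.
decoded = raw_line.decode("euc-kr")
print(utf8_failed, repr(decoded))
```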

     

    readline has no parameter for specifying an encoding,

    but you can add one by editing the following Python files in the installed Paramiko package.

    py3compat.py

    Change def u() to

        def u(s, encoding="utf8"):  # NOQA
            """cast bytes or unicode to unicode"""
            if isinstance(s, str):
                return s.decode(encoding)
            elif isinstance(s, unicode):  # NOQA
                return s
            elif isinstance(s, buffer):  # NOQA
                return s.decode(encoding)
            else:
                raise TypeError("Expected unicode or bytes, got {!r}".format(s))

    as shown above. I added a parameter named encoding and used it in place of the hard-coded "utf8".
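
    The listing above uses Python 2 type names (str, unicode, buffer). On Python 3 the same idea is expressed with bytes and str. The following is a standalone sketch of that variant, not a copy of any particular Paramiko release; the sample text is an assumption.

```python
def u(s, encoding="utf8"):
    """Cast bytes or str to str, decoding bytes with the given encoding."""
    if isinstance(s, bytes):
        return s.decode(encoding)
    elif isinstance(s, str):
        return s
    else:
        raise TypeError("Expected unicode or bytes, got {!r}".format(s))


# EUC-KR bytes decode correctly once the encoding is passed through.
sample = "한글".encode("euc-kr")
print(u(sample, encoding="euc-kr"))
# Text that is already str is returned unchanged.
print(u("already text"))
```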

     

    file.py

    Change def readline(self, size=None) to

        def readline(self, size=None, encoding="utf8"):
            """
            Read one entire line from the file.  A trailing newline character is
            kept in the string (but may be absent when a file ends with an
            incomplete line).  If the size argument is present and non-negative, it
            is a maximum byte count (including the trailing newline) and an
            incomplete line may be returned.  An empty string is returned only when
            EOF is encountered immediately.
    
            .. note::
                Unlike stdio's ``fgets``, the returned string contains null
                characters (``'\\0'``) if they occurred in the input.
    
            :param int size: maximum length of returned string.
            :param str encoding:
                encoding used to decode the line in text mode (default UTF-8).
            :returns:
                next line of the file, or an empty string if the end of the
                file has been reached.
    
                If the file was opened in binary (``'b'``) mode: bytes are returned
                Else: the line is decoded using ``encoding`` (UTF-8 by default)
                and character strings (`str`) are returned
            """
            # it's almost silly how complex this function is.
            if self._closed:
                raise IOError("File is closed")
            if not (self._flags & self.FLAG_READ):
                raise IOError("File not open for reading")
            line = self._rbuffer
            truncated = False
            while True:
                if (
                    self._at_trailing_cr
                    and self._flags & self.FLAG_UNIVERSAL_NEWLINE
                    and len(line) > 0
                ):
                    # edge case: the newline may be '\r\n' and we may have read
                    # only the first '\r' last time.
                    if line[0] == linefeed_byte_value:
                        line = line[1:]
                        self._record_newline(crlf)
                    else:
                        self._record_newline(cr_byte)
                    self._at_trailing_cr = False
                # check size before looking for a linefeed, in case we already have
                # enough.
                if (size is not None) and (size >= 0):
                    if len(line) >= size:
                        # truncate line
                        self._rbuffer = line[size:]
                        line = line[:size]
                        truncated = True
                        break
                    n = size - len(line)
                else:
                    n = self._bufsize
                if linefeed_byte in line or (
                    self._flags & self.FLAG_UNIVERSAL_NEWLINE and cr_byte in line
                ):
                    break
                try:
                    new_data = self._read(n)
                except EOFError:
                    new_data = None
                if (new_data is None) or (len(new_data) == 0):
                    self._rbuffer = bytes()
                    self._pos += len(line)
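                    # pass encoding here too, so a final line without a
                    # trailing newline is decoded the same way
                    return line if self._flags & self.FLAG_BINARY else u(line, encoding=encoding)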
                line += new_data
                self._realpos += len(new_data)
            # find the newline
            pos = line.find(linefeed_byte)
            if self._flags & self.FLAG_UNIVERSAL_NEWLINE:
                rpos = line.find(cr_byte)
                if (rpos >= 0) and (rpos < pos or pos < 0):
                    pos = rpos
            if pos == -1:
                # we couldn't find a newline in the truncated string, return it
                self._pos += len(line)
                return line if self._flags & self.FLAG_BINARY else u(line, encoding=encoding)
            xpos = pos + 1
            if (
                line[pos] == cr_byte_value
                and xpos < len(line)
                and line[xpos] == linefeed_byte_value
            ):
                xpos += 1
            # if the string was truncated, _rbuffer needs to have the string after
            # the newline character plus the truncated part of the line we stored
            # earlier in _rbuffer
            if truncated:
                self._rbuffer = line[xpos:] + self._rbuffer
            else:
                self._rbuffer = line[xpos:]
    
            lf = line[pos:xpos]
            line = line[:pos] + linefeed_byte
            if (len(self._rbuffer) == 0) and (lf == cr_byte):
                # we could read the line up to a '\r' and there could still be a
                # '\n' following that we read next time.  note that and eat it.
                self._at_trailing_cr = True
            else:
                self._record_newline(lf)
            self._pos += len(line)
            return line if self._flags & self.FLAG_BINARY else u(line, encoding=encoding)

    as shown above. The only change is passing the new parameter through to the u() function modified earlier.

     

    Passing the encoding you want, as in "readline(encoding='euc-kr')",

    resolves the encoding errors that can occur when calling readline.
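
    Edits made inside the installed package are lost whenever Paramiko is reinstalled or upgraded. As a different technique from the patch above, the docstring quoted earlier notes that a file opened in binary ('b') mode returns bytes, so the decoding can be done in your own code instead. The following is a sketch under that assumption; iter_decoded_lines is a hypothetical helper, and io.BytesIO stands in for the remote file so the example runs without a server.

```python
import io


def iter_decoded_lines(binary_file, encoding="euc-kr", errors="strict"):
    """Yield text lines from a file-like object whose readline() returns bytes."""
    while True:
        raw = binary_file.readline()
        if not raw:  # empty bytes means end of file
            break
        yield raw.decode(encoding, errors)


# Stand-in for a remote file. With Paramiko this would be something like
# sftp.open(path, "rb"), where sftp and path are placeholders.
fake_remote = io.BytesIO("안녕하세요\n감사합니다".encode("euc-kr"))
lines = list(iter_decoded_lines(fake_remote, encoding="euc-kr"))
print(lines)
```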
