Test failures when cchardet-2.1.7 and chardet are installed #318

Open
mgorny opened this issue Jul 2, 2022 · 0 comments

When cchardet-2.1.7 and chardet-5.0.0 are both installed, the following tests fail.

From what I can see, two of them fail because of encoding-name case mismatches (the expected name is lower- or mixed-case, while the detected value is uppercase: windows-1255 vs. WINDOWS-1255, Big5 vs. BIG5), and the other two are detected as a superset of the expected encoding (EUC-KR as UHC, and GB2312 as GB18030).
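
To illustrate both kinds of mismatch, here is a minimal sketch of a more tolerant comparison. It is not feedparser's code: `canonical`, `encodings_match` and the `SUPERSETS` table are hypothetical names made up for this example, and it assumes that normalizing through Python's standard `codecs.lookup()` is acceptable for the tests.

```python
import codecs

# Hypothetical table (an assumption for this sketch): canonical codec
# name -> codecs that are supersets of it.
SUPERSETS = {
    "gb2312": {"gbk", "gb18030"},
    "euc_kr": {"cp949"},  # Python treats "UHC" as an alias of cp949
}


def canonical(name):
    """Return Python's canonical codec name, resolving case and aliases."""
    return codecs.lookup(name).name


def encodings_match(expected, detected):
    """True if detected equals expected or is a listed superset of it."""
    exp = canonical(expected)
    det = canonical(detected)
    return exp == det or det in SUPERSETS.get(exp, set())


# The four (expected, detected) pairs from the failures in this report.
for expected, detected in [
    ("windows-1255", "WINDOWS-1255"),
    ("Big5", "BIG5"),
    ("EUC-KR", "UHC"),
    ("GB2312", "GB18030"),
]:
    print(expected, detected, encodings_match(expected, detected))
```

Whether the tests should accept a superset encoding, or only ignore case, is a separate design question; the sketch just shows that the four failures fall into these two groups.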


======================================================================
FAIL: test_001742 (__main__.TestCase)
./tests/illformed/chardet/windows1255.xml: windows-1255 with no encoding information
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/feedparser/tests/runtests.py", line 1191, in fn
    self.fail_unless_eval(xmlfile, eval_string)
  File "/tmp/feedparser/tests/runtests.py", line 177, in fail_unless_eval
    raise self.failureException(failure)
AssertionError: not eval(b"bozo and encoding == 'windows-1255'") 
WITH env({'bozo': True,
 'bozo_exception': CharacterEncodingOverride('document declared as utf-8, but parsed as WINDOWS-1255'),
 'content-type': '',
 'encoding': 'WINDOWS-1255',
 'entries': [{'summary': 'האם תדפיס נייר של אתר אינטרנט שמוצג על מסך משתמש הוא '
                         'העתק נאמן למקור של אתר האינטרנט? רבים יגידו שכן, '
                         'ולפעמים גם בתי המשפט יצטרפו אליהם שיקבלו פלט מאתר '
                         'אינטרנט כראיה קבילה. אבל, זה ממש לא כך. ויש אפילו '
                         'הוכחה מדהימה.',
              'summary_detail': {'base': '',
                                 'language': None,
                                 'type': 'text/html',
                                 'value': 'האם תדפיס נייר של אתר אינטרנט שמוצג '
                                          'על מסך משתמש הוא העתק נאמן למקור של '
                                          'אתר האינטרנט? רבים יגידו שכן, '
                                          'ולפעמים גם בתי המשפט יצטרפו אליהם '
                                          'שיקבלו פלט מאתר אינטרנט כראיה '
                                          'קבילה. אבל, זה ממש לא כך. ויש אפילו '
                                          'הוכחה מדהימה.'}}],
 'feed': {},
 'headers': {},
 'namespaces': {},
 'version': 'rss'})

======================================================================
FAIL: test_001746 (__main__.TestCase)
./tests/illformed/chardet/gb2312.xml: GB2312 with no encoding information
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/feedparser/tests/runtests.py", line 1191, in fn
    self.fail_unless_eval(xmlfile, eval_string)
  File "/tmp/feedparser/tests/runtests.py", line 177, in fail_unless_eval
    raise self.failureException(failure)
AssertionError: not eval(b"bozo and encoding == 'GB2312'") 
WITH env({'bozo': True,
 'bozo_exception': CharacterEncodingOverride('document declared as utf-8, but parsed as GB18030'),
 'content-type': '',
 'encoding': 'GB18030',
 'entries': [{'title': '不归移民漫画系列:专业工作',
              'title_detail': {'base': '',
                               'language': None,
                               'type': 'text/plain',
                               'value': '不归移民漫画系列:专业工作'}}],
 'feed': {},
 'headers': {},
 'namespaces': {},
 'version': 'rss'})

======================================================================
FAIL: test_001747 (__main__.TestCase)
./tests/illformed/chardet/euckr.xml: EUC-KR with no encoding information
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/feedparser/tests/runtests.py", line 1191, in fn
    self.fail_unless_eval(xmlfile, eval_string)
  File "/tmp/feedparser/tests/runtests.py", line 177, in fail_unless_eval
    raise self.failureException(failure)
AssertionError: not eval(b"bozo and encoding == 'EUC-KR'") 
WITH env({'bozo': True,
 'bozo_exception': CharacterEncodingOverride('document declared as utf-8, but parsed as UHC'),
 'content-type': '',
 'encoding': 'UHC',
 'entries': [{'summary': 'TypeKey 시스템이 UTF-8로 돌아가는데, 거기서 한글로 된 닉네임을 정할 경우에, '
                         'EUC-KR로 된 무버블타입 블록에선 리다이렉트되어 전송되어오는 닉네임이 UTF라 당연히 '
                         '깨어져 나타난다. 실제 블록 등에서 사용하는 필명 내지는 닉네임은 한글로 사용하는 많은 분들도 '
                         '타입키에서의 닉네임은 이런 문제때문에 울며겨자먹기로 영어로 짓고 있다....',
              'summary_detail': {'base': '',
                                 'language': None,
                                 'type': 'text/html',
                                 'value': 'TypeKey 시스템이 UTF-8로 돌아가는데, 거기서 한글로 '
                                          '된 닉네임을 정할 경우에, EUC-KR로 된 무버블타입 블록에선 '
                                          '리다이렉트되어 전송되어오는 닉네임이 UTF라 당연히 깨어져 '
                                          '나타난다. 실제 블록 등에서 사용하는 필명 내지는 닉네임은 '
                                          '한글로 사용하는 많은 분들도 타입키에서의 닉네임은 이런 '
                                          '문제때문에 울며겨자먹기로 영어로 짓고 있다....'},
              'title': 'EUC-KR 에서 TypeKey 한글닉네임 표시하기',
              'title_detail': {'base': '',
                               'language': None,
                               'type': 'text/plain',
                               'value': 'EUC-KR 에서 TypeKey 한글닉네임 표시하기'}}],
 'feed': {},
 'headers': {},
 'namespaces': {},
 'version': 'rss'})

======================================================================
FAIL: test_001749 (__main__.TestCase)
./tests/illformed/chardet/big5.xml: Big5 with no encoding information
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/feedparser/tests/runtests.py", line 1191, in fn
    self.fail_unless_eval(xmlfile, eval_string)
  File "/tmp/feedparser/tests/runtests.py", line 177, in fail_unless_eval
    raise self.failureException(failure)
AssertionError: not eval(b"bozo and encoding == 'Big5'") 
WITH env({'bozo': True,
 'bozo_exception': CharacterEncodingOverride('document declared as utf-8, but parsed as BIG5'),
 'content-type': '',
 'encoding': 'BIG5',
 'entries': [],
 'feed': {'title': '我希望??很容易?其翻?成中文,并有助于改??件。 感?您??本文。',
          'title_detail': {'base': '',
                           'language': None,
                           'type': 'text/plain',
                           'value': '我希望??很容易?其翻?成中文,并有助于改??件。 感?您??本文。'}},
 'headers': {},
 'namespaces': {'': 'http://www.w3.org/2005/Atom'},
 'version': 'atom10'})

----------------------------------------------------------------------
Ran 4354 tests in 4.892s

FAILED (failures=4)