Ticket #1904 (assigned defect)

Opened 6 months ago

Last modified 6 months ago

CPS RSS-1.0.0 | UnicodeEncodeError: 'latin-1' codec can't encode characters ...

Reported by: tracguest Assigned to: madarche (accepted)
Priority: P2 Milestone: CPS 3.4.7
Component: CPSRSS Version: TRUNK
Severity: critical Keywords:
Cc: r.mahoney@iconz.co.nz

Description

Environment:

CPS_RSS-1.0.0

CPS-3.4.6

feedparser.py 3.2 (default) & 4.1 (latest)

SunOS proliant 5.10 Generic_118855-19 i86pc i386 i86pc (Solaris 10) & SunOS z12522AA 5.11 snv_62 i86pc i386 i86pc (OpenSolaris?)

Issue:

Refreshing the following Japanese feed in the RSS Tool:

http://blogs.dion.ne.jp/sanskrit/index.rdf

Results in:

UnicodeEncodeError: 'latin-1' codec can't encode characters ...

[see the full error log attached as cps-latin-error.log]
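The error message itself is standard Python behaviour. The `latin-1` codec only covers code points U+0000 to U+00FF, so encoding any Japanese character with it fails. Below is a minimal sketch, assuming (the ticket does not say where this happens) that somewhere in CPS RSS the feed's unicode text is encoded to latin-1 before being stored or rendered. `try_latin1` and `encode_for_latin1_page` are hypothetical helpers, not CPS functions. The second one shows one possible workaround: the standard `xmlcharrefreplace` error handler turns characters outside latin-1 into numeric character references instead of raising. The sketch is written for Python 3. CPS 3.4 runs on Python 2.4, where the same codec behaviour applies to `unicode` objects.

```python
# Sketch only: assumes the feed text reaches an encode('latin-1') call
# somewhere in the rendering path; the actual CPS RSS code path is not shown.

title = "\u65e5\u672c\u8a9e"  # "日本語", Japanese text like the reported feed's titles


def try_latin1(text):
    """Return the latin-1 bytes, or the error message if encoding fails."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError as exc:
        return str(exc)


def encode_for_latin1_page(text):
    """Hypothetical workaround: keep a latin-1 page, but turn characters
    outside latin-1 into &#NNNN; references that browsers still display."""
    return text.encode("latin-1", "xmlcharrefreplace")


print(try_latin1(title))              # the "'latin-1' codec can't encode characters ..." message
print(encode_for_latin1_page(title))  # b'&#26085;&#26412;&#35486;'
```

Characters that latin-1 can represent (e.g. "café") still encode normally. Only the out-of-range ones are replaced.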

After that, the RSS Tool becomes completely unusable. (The only way I could get the RSS Tool running again was to restore the whole CPS site from a backup.)

The problematic feed should render as follows (using SPIP):

Indica et Buddhica - Tabulae :: Kataoka, Kei http://tabulae.indica-et-buddhica.org/rubrique.php3?id_rubrique=261

I'm not sure whether this issue with Japanese characters is related to another problem: the following feed renders Latin diacritics incorrectly, including many commonly used in Romanised Sanskrit transliteration (e.g. ā, ū and ī with macron, Ś with acute, ṇ with under-dot):

http://www.informaworld.com/ampp/rss~content=t713405669

Incorrect (using CPS RSS):

Indica et Buddhica - Recently Published issues of Asian Philosophy http://indica-et-buddhica.org/sections/tabulae/periodica/a/asian-philosophy/asp-recently-published

Correct (using SPIP):

Indica et Buddhica - Tabulae :: Asian Philosophy - Recently Published http://tabulae.indica-et-buddhica.org/rubrique.php3?id_rubrique=238
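One common way Latin diacritics get garbled is UTF-8 bytes being decoded as latin-1 somewhere between the feed and the page. This is offered only as a hypothesis, since the ticket does not identify the cause. The sketch below reproduces that failure with the characters mentioned above and shows that a single mis-decoding is reversible. `show_mojibake` and `repair_mojibake` are illustrative names, not CPS functions, and the code is written for Python 3.

```python
# Hypothesis only: UTF-8 feed bytes mis-decoded as latin-1.
sample = "\u0101 \u016b \u012b \u015a \u1e47"  # ā ū ī Ś ṇ


def show_mojibake(text):
    """Simulate the suspected bug: UTF-8 bytes read back as latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(garbled):
    """Undo exactly one UTF-8 -> latin-1 mis-decoding."""
    return garbled.encode("latin-1").decode("utf-8")


garbled = show_mojibake(sample)
print(ascii(garbled))                     # '\xc4\x81 \xc5\xab \xc4\xab \xc5\x9a \xe1\xb9\x87'
print(repair_mojibake(garbled) == sample)  # True
```

Each accented character turns into two or three latin-1 characters, which matches the general look of such rendering errors. Whether CPS RSS does this at any specific point would still need to be confirmed against its code.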

I'd be very happy to receive any thoughts on how these issues might be resolved.

Kind regards,

Richard MAHONEY

-- Richard MAHONEY | internet: http://indica-et-buddhica.org/

Attachments

cps-latin-error.log (4.9 kB) - added by tracguest on 04/21/08 11:02:01.
CPS RSS error log

Change History

04/21/08 11:02:01 changed by tracguest

  • attachment cps-latin-error.log added.

CPS RSS error log

(follow-up: ↓ 3 ) 04/21/08 16:18:44 changed by madarche

  • owner changed from trac to madarche.
  • status changed from new to assigned.

I can confirm that the reported bug is reproducible.

Is this bug a regression? Was this bug happening on your portal before you switched to CPS 3.4.6, or is this simply the first time you have tried to use Japanese feeds in CPS?

04/21/08 16:25:41 changed by madarche

The problem doesn't come from feedparser 3.2 (the default). The following command runs without any error:

$ python2.4 feedparser.py http://blogs.dion.ne.jp/sanskrit/index.rdf

(in reply to: ↑ 1 ) 04/22/08 02:10:27 changed by tracguest

Replying to madarche:

> I confirm the reproducibility of the reported bug. Is this bug a regression? Was this bug happening on your portal before you switched to CPS 3.4.6 or is it simply the first time you have tried to use Japanese feeds in CPS?

I've only tried Japanese feeds with 3.4.6. Unfortunately, neither my test server nor my production server still holds an instance of 3.4.5.

If I recall correctly, the incorrect rendering of Latin diacritics did occur with the previous version of CPS RSS under 3.4.5.

-- Richard MAHONEY

04/24/08 11:39:49 changed by madarche

  • priority changed from P1 to P2.