
RSS in UTF-8 format fails parsing.

Posted: Sat Feb 04, 2012 9:50 pm
by jhb50
All the RSS feeds I have created to date have been ANSI encoded with this header, and they work fine:
<?xml version="1.0" encoding="UTF-8" ?>
Today I created an RSS feed containing Chinese characters and saved it in UTF-8 encoding, but when I added it to Serviio the log shows that parsing fails:
2012-02-04 16:12:25,120 DEBUG [OnlineLibraryManager] Resource https://sites.google.com/site/serviiors ... tional.rss not in cache yet, loading it
2012-02-04 16:12:25,120 DEBUG [FeedParser] Parsing feed 'https://sites.google.com/site/serviiorss/LiveFeeds_China_National.rss'
2012-02-04 16:12:26,244 DEBUG [FeedParser] Unexpected error during url extractor plugin matching (LiveFeeds): Content is not allowed in prolog.
2012-02-04 16:12:26,245 DEBUG [FeedParser] Skipping feed item '综合频道 CCTV-1 General Channel' because it's not of type VIDEO

Java throws this exception if there are any characters before the <?xml declaration and, sure enough, the UTF-8 file has a leading EF BB BF hex signature (the byte order mark, or BOM).

How do I get Serviio to accept an RSS feed with Unicode (Chinese) characters?

Re: RSS in UTF-8 format fails parsing.

Posted: Sat Feb 04, 2012 11:14 pm
by zip
This looks more like a bug in the plugin (I recall you're parsing the feed again in the matches() method). It looks like Serviio actually parses the feed but cannot invoke the matches() method.

Re: RSS in UTF-8 format fails parsing.

Posted: Sun Feb 05, 2012 6:05 am
by jhb50
You are right on! I stripped the leading signature bytes from the GET response before parsing and it works fine. It looks to me like anyone parsing an RSS feed in UTF-8 would have to do this. I guess you are already doing it yourself in the default RSS handler.
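
For anyone hitting the same error, here is a minimal Java sketch of that fix. It assumes the plugin fetches the feed as text and hands the string to the XML parser; in that case the BOM survives decoding as the single character U+FEFF in front of <?xml, which is what triggers "Content is not allowed in prolog". The class and method names (BomSafeFeedParser, stripBom, parseFeed) are made up for illustration and are not part of Serviio or the LiveFeeds plugin.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of Serviio or any plugin.
final class BomSafeFeedParser {

    // A UTF-8 BOM (bytes EF BB BF) decodes to the single character U+FEFF.
    private static final char BOM = '\uFEFF';

    /** Returns the text without a leading BOM character, or unchanged if there is none. */
    static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    /** Parses feed text that was already downloaded and decoded into a String. */
    static Document parseFeed(String feedText) {
        try {
            // Strip the BOM first so that "<?xml" is the very first thing the parser sees.
            InputSource source = new InputSource(new StringReader(stripBom(feedText)));
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Feed is not well-formed XML", e);
        }
    }
}
```

Calling BomSafeFeedParser.parseFeed(text) on the downloaded feed text should then behave the same whether or not the file was saved with a BOM. Feeds without a BOM pass through stripBom untouched.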

Strange thing though... a number of the Chinese characters just show as boxes on the TV even though they display correctly on the PC.

Re: RSS in UTF-8 format fails parsing.

Posted: Sun Feb 05, 2012 11:23 am
by zip
UTF-8 is normally OK. It might be the BOM of the feed.

Re: RSS in UTF-8 format fails parsing.

Posted: Sun Feb 05, 2012 2:17 pm
by jhb50
I guess it depends on how the RSS is generated. If I create one with "Notepad" and save in any Unicode format, the file will have a BOM header.
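
To check whether a given feed file was saved with a BOM, something like the following Java sketch could be run against its first few bytes. The byte patterns are the standard signatures (EF BB BF for UTF-8, FF FE for UTF-16 little endian, FE FF for UTF-16 big endian); the class name BomDetector and its method are hypothetical. It is deliberately simplified: it does not tell UTF-32 apart from UTF-16, which should not matter for feeds saved from a plain text editor.

```java
// Hypothetical helper for illustration; reports which BOM, if any, a file starts with.
final class BomDetector {

    /** Returns "UTF-8", "UTF-16LE", "UTF-16BE", or "none" for the given leading bytes. */
    static String detectBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return "UTF-8";
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return "UTF-16LE";
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return "UTF-16BE";
        }
        return "none";
    }

    // Compares unsigned byte values, since Java bytes are signed.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Passing in the first bytes of the file (for example, the start of the array returned by java.nio.file.Files.readAllBytes) is enough; only the first three bytes are ever inspected.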

Re: RSS in UTF-8 format fails parsing.

Posted: Mon Feb 06, 2012 2:01 pm
by ylee
Hi, I'm new to the forums, but I'm having exactly the same trouble getting a stream in an RSS feed to work.

I've built my own RSS feed and hosted it locally (I just copied a working item out of the jhb50 list, like ESPN America).
This is what I get:

  Code:
2012-02-06 14:56:18,519 DEBUG [FeedParser] Parsing feed 'http://localhost/rss/tv_germany.rss'
2012-02-06 14:56:18,523 DEBUG [FeedParser] Skipping feed item 'ESPN America HD' because it's not of type VIDEO


I've tried different ways to save the file, but it won't work. Any clues?

/EDIT:
Sorry, my fault. It seems I missed that the RSS feed filename has to follow a special naming convention.
It works now.