Faster file rescan feature

Posted: Sat Feb 11, 2012 6:27 pm
by galmok
I like Serviio, but one thing really annoys me: the detection of newly added files. When I click the force refresh button, Serviio takes 10+ minutes to find the one file I added to my library. This really should take no more than 30 seconds (+10 seconds for each disk that needs to wake up). The auto-refresh every 5 minutes won't do, as my disks would then never go to sleep.

So, pretty please, with sugar on top, make the file rescan much faster please. :)

Re: Faster file rescan feature

Posted: Sat Feb 11, 2012 6:30 pm
by will
Java 7 supports notifications of new files, so Serviio can listen and be told when a new file is added, rather than having to check everything manually. Once Java 7 is stable and available for all platforms (it's ready for Windows now), it will be included, which should then speed up library refreshes.
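
For illustration only, here is a minimal sketch of what such a listener could look like with the Java 7 java.nio.file.WatchService API. The class name NewFileWatcher and its methods are hypothetical and are not taken from Serviio's code. It watches a single folder only, so a real library would need one registration per shared folder.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not Serviio code): reports files created in one folder.
final class NewFileWatcher implements AutoCloseable {
    private final Path dir;
    private final WatchService watcher;

    NewFileWatcher(Path dir) throws IOException {
        this.dir = dir;
        this.watcher = dir.getFileSystem().newWatchService();
        // Ask the OS to notify us about newly created entries in this folder only.
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
    }

    // Waits up to the given timeout and returns the paths created since the last call.
    List<Path> poll(long timeout, TimeUnit unit) throws InterruptedException {
        List<Path> created = new ArrayList<>();
        WatchKey key = watcher.poll(timeout, unit);
        if (key == null) {
            return created; // nothing happened within the timeout
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            // Skip OVERFLOW events, which carry no file name.
            if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                created.add(dir.resolve((Path) event.context()));
            }
        }
        key.reset(); // re-arm the key so later events are delivered
        return created;
    }

    @Override
    public void close() throws IOException {
        watcher.close();
    }
}
```

With something like this, the server would only react when the OS reports a change instead of walking every folder. How quickly events arrive depends on the platform.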

Assuming you're not adding new files every 5 minutes but instead want them available 'instantly', you can force Serviio to run a check by running a command that uses its REST API. See viewtopic.php?f=17&t=1572 for more information.

Re: Faster file rescan feature

Posted: Sat Feb 11, 2012 6:59 pm
by galmok
Thank you for the link. But isn't it doing the same as "Force Refresh" in the Console UI?

And will the Serviio service scan for new files when it is started?

I have a file that simply will not appear in my list. I wonder if that is because it isn't available on the website (the one providing the metadata). But it also isn't available in the folder view, which I expect to show every media file with its full name (which it seems to do, except for my new file). This leads me to believe that Serviio is simply slow.

Update: It is slow. It took over 6 minutes to find the file.

Even without Java 7, a faster file scan should be easy to implement. A folder/file traversal algorithm has got to be easy to make. :-P I am sure Serviio is slow for a reason, but I would argue that this is an issue, especially because the Status window in the UI never shows anything. It could at least show how far the scan has progressed or which directory/file it has checked or is checking right now. Having no information frustrates me, as I have to guess whether it is working or not. I tried to monitor the process with Process Explorer and in the log file, but they are worthless as well (at the default log level).

Re: Faster file rescan feature

Posted: Sat Feb 11, 2012 7:05 pm
by will
Yes it is, but the point is that you can make it refresh automatically when you need it, e.g. some DVD burning software supports running a command when it finishes.

If it doesn't appear in the folder view then something is wrong. Are you sure it finished indexing everything?

It's as fast as it can be, as Serviio has to check every file, and the Windows file system is especially slow. Try unselecting "Search for updates of currently shared files" if you only add new files rather than add and change files.

You cannot show how far through you are until you know how many things you have to scan, and you have to scan everything to find that out :)

Re: Faster file rescan feature

Posted: Sat Feb 11, 2012 7:12 pm
by moltra
I am running the Linux version of Serviio and it takes at most 5 minutes for a new file to show up. If I add multiple files at the same time, it takes a little longer to index them all.

The next time you add a new file, can you attach your serviio.log file after it shows up in Serviio? This will allow us to see if something is causing it to slow down. Also, if you can, note the system time at which the copying/moving to the Serviio folder completed, so we can compare the times.

Re: Faster file rescan feature

Posted: Sat Feb 11, 2012 7:43 pm
by zip
It depends on the number of shared files. Serviio has to traverse all the folders and check whether every single file already exists in the database. It doesn't know where the new file is, so it is possible that the folder containing it is checked last, hence the delay.

Re: Faster file rescan feature

Posted: Sat Feb 11, 2012 9:45 pm
by galmok
I have now disabled "Search for updates of currently shared files" as I normally know that I only add files. It may help.

The status could (as I wrote) show which file it is currently visiting. This is much better info than not showing anything at all. A progress bar isn't necessary. Anything that shows the user that Serviio is actually doing something would do. As I said, I simply didn't know if Serviio was working properly or not, as I got _zero_ feedback from it.

But as the search depends (heavily) on the number of shared files, there seems to be some room for a better algorithm to see if the file already exists in the database. Have you considered using hashing and/or making the filename (I assume you have to search for the filename) a primary key, possibly in its own table? Hashing would greatly reduce the number of filename comparisons, but if the filename were a primary key, this would probably already happen automatically (depending on the database used). If the filename is not a primary key and you make a db search for the filename, you are forcing the database to make a full table scan to find the filename. This of course is a very expensive way of doing it. Maybe creating an index on the filename is enough? Then no code change would have to be made (I think). If all this is already done, the whole process has to be reworked: load the stored file list into memory, hash the filenames, compare hashes first and then compare filenames on matching hashes. This should be fast. Possibly also sort on the hashes and use a binary search for added speed.
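
To make the in-memory idea concrete, here is a rough sketch. It assumes the known paths have already been loaded from the database into a Set, and the names NewFileFinder and findNewFiles are made up for illustration. A HashSet does the "compare hashes first, then filenames" step internally, so each lookup is roughly constant time and no sorting or binary search is needed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, not Serviio's implementation.
final class NewFileFinder {
    // Returns every regular file under root that is not in the known set.
    // Paths in 'known' must be in the same form (e.g. all absolute) as those
    // produced by walking 'root', otherwise they will not compare equal.
    static List<Path> findNewFiles(Path root, Set<Path> known) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .filter(p -> !known.contains(p)) // hash lookup, no table scan
                       .sorted()
                       .collect(Collectors.toList());
        }
    }
}
```

This only addresses the lookup cost. The time spent walking the folders on disk stays the same.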

This is my log (I manually started the refresh and check for updates was enabled at that time):

2012-02-11 19:47:08,031 INFO [LibraryAdditionsCheckerThread] Started looking for newly added files
2012-02-11 19:47:08,032 INFO [LibraryUpdatesCheckerThread] Started looking for updates to currently shared files
2012-02-11 19:47:08,442 INFO [FeedUpdaterThread] Started looking for information about online resources
2012-02-11 19:53:03,708 INFO [LibraryAdditionsCheckerThread] Added file '<thenewfile>.mkv' (title: '<thisisthetitle>) to Library
2012-02-11 19:53:49,262 INFO [LibraryAdditionsCheckerThread] Finished looking for newly added files
2012-02-11 20:03:39,716 INFO [LibraryUpdatesCheckerThread] Finished looking for updates to currently shared files

I hope we can improve upon the existing code. This issue is kind of an annoying spot in otherwise great software. :)

Re: Faster file rescan feature

Posted: Sun Feb 12, 2012 1:58 am
by zip
It looks like even traversing the filesystem (in Java) takes a lot of CPU, so there is actually a bit of a pause between each file (100 ms or so) to keep the CPU usage down. It's hard to rework the whole mechanism now when Java 7 is coming soon and there are other pressing things to implement.

Nevertheless, you'll see the last added file in the console together with a green dot when the search is going on, so it's not like you're not getting ANY feedback.

Re: Faster file rescan feature

Posted: Sun Feb 12, 2012 8:48 am
by galmok
zip wrote: It looks like even traversing the filesystem (in Java) takes a lot of CPU, so there is actually a bit of a pause between each file (100 ms or so) to keep the CPU usage down. It's hard to rework the whole mechanism now when Java 7 is coming soon and there are other pressing things to implement.

Nevertheless, you'll see the last added file in the console together with a green dot when the search is going on, so it's not like you're not getting ANY feedback.

Hmm, it can't be 100 ms, because I think I share about 25000 files in only 1600 folders. 25000*0.1 = 2500 sec, which is about 42 minutes. About 15 ms per file seems more correct. This is incidentally about the same time as a random seek on a hard disk. If this is the case, do you actually open each file (which would explain the seek) or perhaps cause some other access that would cause seeks? If you only used the information available in the folder structure, you would only have to make 1600 seeks, which means you could make a complete scan in about 23 seconds.
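
The estimate above is just the number of disk accesses times the cost per access. As a small sketch (ScanEstimate is a made-up name and 15 ms is the rounded guess from above, not a measurement):

```java
// Back-of-the-envelope scan time: accesses * milliseconds per access.
final class ScanEstimate {
    static double scanSeconds(int accesses, double msPerAccess) {
        return accesses * msPerAccess / 1000.0;
    }

    public static void main(String[] args) {
        // 25000 files with a 100 ms pause each
        System.out.println(scanSeconds(25000, 100));
        // 25000 files at roughly one seek (15 ms) each
        System.out.println(scanSeconds(25000, 15));
        // one seek per folder, 1600 folders
        System.out.println(scanSeconds(1600, 15));
    }
}
```

With the rounded 15 ms figure, the per-file case comes to a bit over 6 minutes, which matches the log, and the per-folder case comes to about 24 seconds, in line with the estimate above.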

The issue with the feedback is that I do not get any feedback at all until Serviio finds the new file (or no feedback except in the log file when no change is detected). As a user, this really bugs me. :-/

Re: Faster file rescan feature

Posted: Sun Feb 12, 2012 12:26 pm
by zip
I have to check every file's last modified date. Not sure how it's done at the low-level OS layer via Java... it will be different for each OS.

Re: Faster file rescan feature

Posted: Sun Feb 12, 2012 2:43 pm
by galmok
I made a quick check with Google and yes, Java is inefficient when checking a file's last modified date. But there are solutions that do not require platform-specific implementations:

http://stackoverflow.com/questions/4305448/file-lastmodified-painfully-slow

Basically, you should query more than one file at a time. The reason is that Java cannot optimise the requests itself, but when more requests are issued at the same time, the OS can reorder them for higher performance.

The OP in the mentioned thread resorted to using a thread pool to make bulk queries, and it made the process much faster.
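
A minimal sketch of that thread pool idea, using only standard java.util.concurrent classes. BulkLastModified and lastModifiedAll are illustrative names, the right pool size is a guess that would need measuring, and this is not the code from the linked thread.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: query many last-modified dates concurrently.
final class BulkLastModified {
    static Map<File, Long> lastModifiedAll(List<File> files, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One task per file, so several requests are outstanding at once
            // and the OS is free to reorder them.
            List<Callable<Long>> tasks = new ArrayList<>();
            for (File f : files) {
                tasks.add(f::lastModified);
            }
            // invokeAll blocks until all tasks finish and keeps the input order.
            List<Future<Long>> results = pool.invokeAll(tasks);
            Map<File, Long> modified = new LinkedHashMap<>();
            for (int i = 0; i < files.size(); i++) {
                modified.put(files.get(i), results.get(i).get());
            }
            return modified;
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this actually helps depends on the disk and the OS, so it would have to be tried against a real library.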