View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0003040 | MMW v4 | Playlist / Search | public | 2007-05-09 12:06 | 2022-04-29 01:31 |
Reporter | jiri | Assigned To | |||
Priority | immediate | Severity | minor | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Product Version | 3.1 | ||||
Fixed in Version | 3.1 | ||||
Summary | 0003040: Implement proper full-text search | ||||
Description | Currently users can use our search bar to quickly find data in their collections. However, there are several problems: - The search can be rather slow for large collections because it needs to process pretty much the whole database sequentially. - The search syntax could be enhanced, e.g. to support 'xxx NOT yyy'. We could implement our own full-text search engine or use the one currently being developed for SQLite (in beta stage), but both approaches involve quite a lot of work and have some problems. An alternative approach would be to use desktop search engines already installed on users' machines, like Windows Desktop Search (WDS), which is the default on Vista and can be downloaded for XP as well. We could write a handler that would supply data to the WDS engine (i.e. index all audio files supported by MM) and then use the engine to implement our Quick search. | ||||
Additional Information | Some technical links related to WDS: Main MSDN page: http://msdn2.microsoft.com/en-us/library/aa965362.aspx Adding data to index: http://msdn2.microsoft.com/en-us/library/bb231248.aspx Querying the index: http://msdn2.microsoft.com/en-us/library/bb266517.aspx SDK: http://www.microsoft.com/downloads/details.aspx?familyid=645300ae-5e7a-4ce7-95f0-49793f8f76e8&displaylang=en Searchable file types: http://www.microsoft.com/windows/desktopsearch/technical/searchtype.mspx | ||||
Tags | todoc-help | ||||
Fixed in build | 1198 | ||||
related to | 0000885 | new | MMW v4 | Use 'madman' fuzzy search for getting track details from amazon | |
parent of | 0005134 | closed | Ludek | MMW v4 | Searching for special chars like '.' no longer works with new full-text search engine |
parent of | 0005280 | closed | Ludek | MMW v4 | Boolean Search: Add support for 'NOT' (in addition to '-') |
parent of | 0005104 | closed | Ludek | MMW v4 | Search bar: Boolean 'OR' should be localizable |
parent of | 0005133 | closed | jiri | MMW v4 | Improved context help for Search bar |
parent of | 0005186 | closed | Ludek | MMW v4 | Quick Search with "" (quoted strings) doesn't work as expected |
parent of | 0005004 | closed | Ludek | MMW v4 | Full-text-search: characters such as '.' should be removed from the query |
parent of | 0005179 | closed | Ludek | MMW v4 | Quick search for : fails to respect field selection |
parent of | 0005147 | closed | Ludek | MMW v4 | Search for exact text containing special characters doesn't always work |
parent of | 0005565 | closed | Ludek | MMW 5 | Full-text search only finds prefixes (gives poor results in Oriental languages) |
has duplicate | 0001337 | closed | rusty | MMW v4 | Google style fuzzy searching in the toolbar |
has duplicate | 0004503 | closed | Ludek | MMW v4 | Search for exact word is no longer possible via search bar (if the word is not quoted) |
related to | 0003464 | feedback | jiri | MMW v4 | Make Search bar options more clear |
related to | 0002570 | closed | petr | MMW v4 | Library node should show all tracks in Library |
related to | 0004395 | closed | Ludek | MMW v4 | Configurability re. which fields the searchbar searches |
related to | 0003754 | closed | Ludek | MMW v4 | Search bar: Configureable search mode on a per root node basis |
related to | 0003223 | feedback | rusty | MMW v4 | search box actions should be available in hot key |
related to | 0004088 | closed | Ludek | MMW v4 | Search Results node is filled with non-sensical searches --> Optimize timer |
related to | 0003760 | closed | Ludek | MMW v4 | Searchbar returns different results for 'Search selected' vs 'entire library' |
related to | 0004147 | closed | MMW v4 | Unclear search mechanism for multiple-value tags: ';' can represent OR or AND | |
related to | 0005639 | closed | Ludek | MMW v4 | Full text search yields poor results when searching for Drive paths |
|
Assigning to Rusty for a review - will discuss over IM. |
|
I'm all for improved searching. As to which approach to use, I'd say that we can leave this as a technical decision, assuming that:
- search would work as is in the absence of a search engine
- any of the various underlying search engines would result in roughly similar performance
The only other approach that might be worth considering is to use an open source desktop search engine instead of an SQL one. See: http://www.searchtools.com/tools/tools-opensource.html
Other key requirements:
- The tool should support 'fuzzy' searches (e.g. if the user replaces an accented e with a plain e).
- Lastly, can this be implemented as a plugin so that it can be done post release (e.g. script another entry to be added to the quicksearch bar)?
--------------
Jiri's response: I was thinking about WDS because it would also let users reach MM results from WDS itself (i.e. a search for ABBA in WDS would list not only HTML pages, but also MP3, OGG, ... tracks by ABBA). I'm currently raising this only as a topic for discussion; there are many unclear things, e.g.:
- What to do when WDS isn't installed? Some supplementary search (as we have now?) would have to be present.
- WDS doesn't currently seem to handle accents properly; at least in my test it didn't find what it should in Czech.
Maybe Google Desktop Search could be checked out and even distributed with MM?
---------------------------
The problem with bundling any external desktop search engine is that the user may already have another one running. However, it's unrealistic to create plugins for each engine unless we are able to modify the framework so that users could create such plugins. That leaves us with 2 real options:
a) Bundle GDS or WDS (a pro for GDS is that it involves revenue; cons are that it isn't installed by default on any systems, and Google may include competitive products in future Google Packs).
b) Use an internal indexer.
I suspect that option b) might be the best approach from a usability perspective, but that a) with GDS could also be a decent approach. |
|
Deferred past 3.0. |
|
I pretty much agree with the spec at: http://www.jirihajek.net/MMwiki/index.php/Full-text_search
The only additions I would suggest are:
1) It should be possible to search by other fields in addition to those described in the spec, e.g. in addition to artist:word(s) and year:, the following should be supported: Album:word(s) genre: rating:x..y grouping: composer: conductor:
2) There should be a shorthand means of supporting these, e.g.: ar: y: al: g: r: g: cp: cd: I'm not sure of the best way to define these shortcuts though...
3) There should be a way of supporting multiple attributes for Artists, Genres, Composers. E.g. artist:U2 King should return tracks for which there are multiple artists including U2 AND B.B. King (this was probably already planned, I just want to make certain).
4) Is there a need to support a 'Does not contain' operator?
Note: I'm also marking several issues as related to this bug, since they should probably be modified at the same time. |
|
1-3) is now added to the spec. 4) I'm sure that users will find some good uses for this feature, I can imagine e.g. 'beatle -beatles', or more specifically 'beatle -artist:beatles'. Or 'rock -hard'. |
|
Using '-' for NOT might be a problem since many tracks have '-' somewhere within. Any ideas how to get around this problem? E.g. 'Iron Maiden-The Best of' could be a problem. Or perhaps we just agree that if the format is " -word" then it's ok to use '-' as NOT. |
|
'-' will mean NOT only if it directly precedes a word (or phrase), i.e.: -word -"a phrase" Any other usage of '-' doesn't make much sense in full-text search, since it's one of the delimiters (like '.', ',', etc.) that are ignored for search and indexing purposes. For example, the search string 'AC/DC' will find both 'AC DC' and 'AC-DC'. |
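The rule above can be sketched in a few lines of Python. This is only an illustration, not the MM implementation: `parse_query` and the `DELIMITERS` set are hypothetical, and the exact delimiter list used by the indexer is an assumption.

```python
import re

# Characters treated as delimiters and ignored for matching.
# ASSUMPTION: the real delimiter set is not listed in this issue.
DELIMITERS = r"[-./,]"

# A term is an optional leading '-', then a quoted phrase or a run of non-space characters.
TERM_RE = re.compile(r'(-?)(?:"([^"]*)"|(\S+))')


def parse_query(query):
    """Split a query into (included, excluded) lists of normalized terms."""
    included, excluded = [], []
    for match in TERM_RE.finditer(query):
        negated = match.group(1) == "-"
        is_phrase = match.group(2) is not None
        raw = match.group(2) if is_phrase else match.group(3)
        # Replace delimiters with spaces, lowercase, and split into words.
        parts = re.sub(DELIMITERS, " ", raw).lower().split()
        if not parts:
            continue  # a lone '-' carries no term yet, so it is ignored
        # A quoted phrase stays one term; an unquoted token contributes its words.
        terms = [" ".join(parts)] if is_phrase else parts
        (excluded if negated else included).extend(terms)
    return included, excluded
```

With this sketch, 'rock -hard' excludes 'hard', while the '-' inside 'Iron Maiden-The Best of' is just a delimiter, and 'AC/DC', 'AC-DC' and 'AC DC' all normalize to the same two words.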
|
I implemented the core of this feature, i.e. SQLite support, full-text index maintenance, etc. It's all in the SVN branch https://svn1.cvsdude.com/jirik/MediaMonkey/Branches/Full-text Search, where we should finish the feature before it's merged to the main branch. An updated SQLite.dll is needed; I uploaded it to FTP, folder SQLite-FullText. Some details of the implementation are described in the wiki: http://www.jirihajek.net/MMwiki/index.php/Full-text_search |
|
I implemented everything mentioned in the wiki article http://www.jirihajek.net/MMwiki/index.php/Full-text_search It is all in the SVN branch ../Branches/Full-text Search. I also added support for localized strings (e.g. in Czech 'interpret:abba' = 'artist:abba') and several DUnit regression tests. As has already been mentioned, we should review some field names, e.g. I would replace the current 'Track#:1..3' with 'Track:1..3', 'Disc#:1..3' with 'Disc:1..3', 'OrigDate:2005..' with 'OrigYear:2005..', etc. Fields like 'Album Artist' need to be written as e.g. 'AlbumArtist:Bjork'. The functionality can be tested/reviewed by using the MM_FTS.exe and SQLite3.dll uploaded to our FTP, folder SQLite-FullText. |
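For reference, the range values used in the field examples above ('1..3', '2005..') could be parsed roughly as follows. This is a minimal Python sketch rather than the MM code; `parse_range` is a hypothetical helper, and both the '..hi' form and the treatment of a bare number as a single-value range are assumptions not confirmed by the spec.

```python
def parse_range(value):
    """Parse 'lo..hi', 'lo..', '..hi' or a single number into (lo, hi).

    None stands for an open end of the range.
    """
    lo_text, sep, hi_text = value.partition("..")
    if not sep:
        # ASSUMPTION: a bare number means "exactly this value".
        number = float(value)
        return number, number
    lo = float(lo_text) if lo_text else None
    hi = float(hi_text) if hi_text else None
    return lo, hi
```

Because the split happens on the first '..', a decimal lower bound such as '3.5..5' is handled without confusing the decimal point with the range separator.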
|
Maybe we could ignore non-alphanumeric characters in field names, i.e. no matter whether user enters 'Track' or 'Track#', the result would always be the same. |
|
I think that for the purpose of Searching, all fields that are entered in the searchbar should be entered exactly as they appear. e.g. Album Artist:Abba should be valid--otherwise it's unclear to users what they have to enter. BUT, if we must create 'aliases' for the fields, then we might as well create shorter Aliases as well. e.g. Album Artist:= AlbumArtist: or alar: Though as Jiri pointed out, it may be too complex to do this in a localizable fashion. |
|
Ok, I enhanced full-text searching so that both 'track:5' and 'track#:5' work. Re: searching for 'Album Artist:Abba': the problem is that a space acts as the AND operator, so according to the FTS3 engine this should mean: any text field contains the word 'Album' and the artist field contains the word 'Abba'. But I see your point of view and I agree that this should be overridden in such cases. I fixed it, so 'Album Artist:Abba' now means that the albumartist field contains the word 'Abba'. The functionality can be tested/reviewed by using the MM_FTS.exe and SQLite3.dll uploaded to our FTP, folder SQLite-FullText. |
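One way to get this behaviour is to compare field names with all non-alphanumeric characters removed. The Python sketch below only illustrates the idea; `KNOWN_FIELDS`, `field_key` and `split_field_query` are hypothetical names, the field list is an assumed subset, and the sketch only handles a query that begins with a field name.

```python
# ASSUMPTION: a small illustrative subset of searchable fields, not the real list.
KNOWN_FIELDS = {"artist", "album", "albumartist", "track", "disc", "year", "genre", "rating"}


def field_key(name):
    """Lowercase a field name and drop everything that is not a letter or digit."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def split_field_query(query):
    """Return (field, value) if the query starts with a known field name, else (None, query)."""
    name, sep, value = query.partition(":")
    key = field_key(name)
    if sep and key in KNOWN_FIELDS:
        return key, value.strip()
    return None, query
```

Here 'Track#:5' and 'track:5' resolve to the same field, and 'Album Artist:Abba' resolves to albumartist instead of being read as the word 'Album' AND artist:Abba.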
|
I tried to do a quick test of this however I found that: 1) the FTS build doesn't permit MM 3.0.4 libraries to be scanned or imported 2) the FTS build regularly experiences SQL errors when attempting to scan a fresh DB |
|
Sorry about that, I probably uploaded a bad SQLite3.dll that caused this. I also forgot to tell you that you should back up your DB first :-( Nevertheless, I have now uploaded the right files. I tested them and they work fine:
1. Installed 1185
2. Created DB (added some tracks)
3. Downloaded MM_FTS.exe and SQLite3.dll to the same directory.
4. Ran MM_FTS.exe -> DB is updated, works fine
5. Optionally added other tracks to the DB
So re-download the files and it should work fine for you too. |
|
It's working now. There are a couple of search results that I'm not sure about:
a) Search for: dargent _doesn't_ find d'argent
b) i) Search for: hurricane _doesn't find_ 'Hurricane'. Shouldn't it?
ii) Search for: hurr _doesn't find_ Hurricane. Shouldn't it? Also, the search results don't update dynamically as the user is typing.
c) Search for: come with me _finds_ come away with me. I think this is intentional--just want to confirm.
d) Search for: rating:3.5..5 doesn't work.
e) Adding - for NOT gets rid of all current search results. It should only take effect after the user types something after the - |
|
RE: a) Search for: dargent _doesn't_ find d'argent
-> I think that this works fine: the ' is not a delimiter like - or /, so the word "dargent" differs from "d'argent".
RE: b)i) Search for: hurricane _doesn't find_ 'Hurricane'. Shouldn't it?
-> I cannot reproduce this; it works fine for me, and searching is not case sensitive in my case. Could you send me the file or find steps to reproduce?
RE: b)ii) Search for: hurr _doesn't find_ Hurricane. Shouldn't it? Also, the search results don't update dynamically as the user is typing.
-> This is per the spec, see http://www.jirihajek.net/MMwiki/index.php/Full-text_search and the 'Searching of full words' paragraph; it is all about whether MM should automatically add '*'.
RE: c) Search for: come with me _finds_ come away with me. I think this is intentional--just want to confirm.
-> Yes, now it really works like full-text search, i.e. a search for 'come away with me' can match when e.g. the words 'come' and 'away' are in the title, the word 'with' is in the path and the word 'me' is in the album field. Is this clear?
RE: d) Search for: rating:3.5..5 doesn't work.
-> Fixed, new MM_FTS.exe uploaded.
RE: e) Adding - for NOT gets rid of all current search results. It should only take effect after the user types something after the -
-> Fixed, new MM_FTS.exe uploaded. |
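The behaviour confirmed in c) can be modelled as "every query word must occur in some indexed field, in any order and in any field". The Python sketch below is a simplified model of that rule only, not the FTS3 engine; `matches_all_words` and the sample track are hypothetical.

```python
import re


def matches_all_words(query, track):
    """True if every query word occurs as a whole word in at least one field of the track."""
    indexed = set()
    for value in track.values():
        indexed.update(re.findall(r"\w+", value.lower()))
    return all(word in indexed for word in re.findall(r"\w+", query.lower()))


# Hypothetical sample record for illustration only.
track = {
    "title": "Come Away With Me",
    "artist": "Norah Jones",
    "path": "D:\\Music\\Norah Jones\\Come Away With Me.mp3",
}
```

Under this model 'come with me' matches the sample track even though the words are not adjacent, and so would 'norah come', where the words come from different fields.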
|
The changes made in SVN revision 6440 prevent MM from starting on my PC; it reports an SQLite3.dll error. I see that Petr has uploaded my settings to FTP; if you need the whole Library I could upload it as well (18MB compressed). |
|
The related DB update issue has been fixed in 3.0.1.1192. Another issue (not related to this) is tracked in 0004926. RE: b)ii) Search for: hurr _doesn't find_ Hurricane: As Petr suggested, we should rather add a config entry, a checkbox for whether to search only full words (like Google) or not. Otherwise I would expect users to assume that searching for 'an' will find 'another', as it has so far. |
|
On Tools > Options > Search, we can add the following to the top: [ ] Search whole words only Tooltip: Causes a search for 'an' to not show tracks containing 'another' Note: do you propose that this works for all searches, or only for the searchbar? |
|
I'm not sure whether this really needs to be an option, users will need it to act differently in different cases anyway. I'd propose to wait for more feedback on the feature. If you think we really have to add it, then ok, but it should be applied to Search bar only. |
|
Jiri, the reason why I think it IS required right away is that currently (1195) MM restricts searches to whole words (unlike previous versions of MM). I personally don't like this behaviour, and expect that a lot of users will find it annoying as well. |
|
Ok, we certainly can add it (even though I'd like to avoid adding too many settings to the Options dialog). Note, however, that due to the design of FTS in SQLite, it only finds words _starting_ with the entered characters; there currently isn't a way to use full-text search to find a term in the middle of a word. |
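The three matching modes discussed in this thread differ as sketched below. This is a plain Python model for comparison only; it does not call SQLite, and the function names are hypothetical.

```python
import re


def words(text):
    """Lowercased words of a text, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def whole_word_match(term, text):
    """'Search whole words only': the term must equal a complete word."""
    return term.lower() in words(text)


def prefix_match(term, text):
    """Prefix search (the term followed by '*'): some word must start with the term."""
    term = term.lower()
    return any(word.startswith(term) for word in words(text))


def substring_match(term, text):
    """Sequential scan: the term may appear anywhere, including mid-word."""
    return term.lower() in text.lower()
```

In this model 'hurr' finds 'Hurricane' only in prefix or substring mode, and 'bb' finds 'ABBA' only in substring mode, which is the case a word-based index cannot serve.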
|
I think that's fine. Note: I think that 'Search whole words only' should be disabled by default, unless we find that it doesn't work as well as I expect. |
|
Fixed in 1198. Added the '[ ] Search whole words only' checkbox, unchecked by default. |
|
Search for contained string does not seem to work. Per previous example, "bb" to find "ABBA". Tried e.g. *bb. No luck. http://www.mediamonkey.com/forum/viewtopic.php?f=6&t=34552 |
|
See Jiri's last note (15235). That is why it doesn't work in the 'Entire Library' quick search. However, searching for '*bb' should work in 'Current selection' mode, which doesn't use SQLite's FTS3 engine. |
|
Thanks. Tested it. "bb" finds, e.g. "B.B. King" but not Abba, "*bb" finds Abba, etc. |
|
Verified 1201. |