Better display of search results?
Hey all,
When I conduct a search (say 'India') across all my documents, I get a huge list of everything that mentions the word, which is great. However, since my collection has over 1,000 items, I want a relatively quick way to figure out which items are important and which aren't.
Is it possible to see snippets of the phrases in which the word is used (like a Google search) in the search results, rather than just a list of matching files?
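For what it's worth, the Google-style snippets being asked for are usually called keyword-in-context (KWIC) snippets. A minimal Python sketch, assuming an attachment's full text is already available as a plain string (how Zotero stores or exposes its full-text index isn't covered here); the function name `snippets` and its parameters are hypothetical:

```python
import re


def snippets(text: str, term: str, width: int = 30, limit: int = 3) -> list:
    """Return up to `limit` keyword-in-context snippets for `term` in `text`.

    Matching is whole-word and case-insensitive; `width` is the number of
    characters kept on each side of a match.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    found = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        # Collapse line breaks and runs of spaces (common in PDF text) into single spaces.
        body = " ".join(text[start:end].split())
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        found.append(prefix + body + suffix)
        if len(found) == limit:
            break
    return found


text = "Trade with India grew quickly. Later, india's exports doubled."
for s in snippets(text, "India", width=10):
    print(s)
```

Both "India" and "india's" match, so this prints two short snippets, each cut off with "..." where the context was trimmed.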
Additionally, is there a way to display and sort by some relatively basic measures, like how many times a word appears in a document, or 'relevance' or some such?
I've looked everywhere for a setting or a plugin to deal with the deluge of information and can't find one, and I'm assuming there must be a solution since it's such a basic feature. I watched a friend use Mendeley, and its search results highlighted instances of the phrase in each document. I really hope this is possible in Zotero, since it's the feature I'd find most useful, and I really don't want to migrate to Mendeley, since it's not open source.
Thanks =)
My recommendation in the meantime would be to just construct "better" searches, which Zotero can do much more powerfully than Mendeley, since Advanced Search lets you combine multiple conditions on different fields.
But since I'm pretty bad at this, I'd probably appreciate someone else doing a proper job of it =P
In any case, to approach this, you would probably want to get familiar with JavaScript, XUL, and SQL first, which is a good amount of reading.
But the other issue is that showing context for metadata matches would be considerably more complicated, both technically and in terms of presentation. Having snippets just for attachment content seems kind of bizarre.
In any case, definitely not as simple as it might seem. I've always assumed that, if we were to do this, the middle pane rows would have to switch to double-height, and every other row would have to show the snippet. Unfortunately I don't think we can actually have rich text (bold or highlighting) within those lines, which sort of defeats the purpose.
I guess an alternative would be to build an actual HTML search results page with clickable zotero://select links back to items in the middle pane. That would give us a huge amount of flexibility in terms of presentation, but it also feels kind of clumsy. And now that I mention it, I suppose this is exactly what Thunderbird does with filtering vs. searching (and for the exact same technical reason, presumably): the former just filters the main XUL tree, while the latter opens a new tab with snippets and other fancy stuff. For what it's worth, I essentially never use the latter.
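A rough Python sketch of the HTML-results-page idea: it builds a standalone page where each result links back to its item and the search term is wrapped in `<mark>` inside the snippet, i.e. the kind of rich text the tree probably can't show. The `zotero://select/library/items/<key>` link format, the item key, and the `(key, title, snippet)` input tuples are assumptions for illustration, not a description of how Zotero would actually implement this:

```python
import re
from html import escape


def highlight(snippet: str, term: str) -> str:
    """Wrap case-insensitive matches of `term` in <mark>, escaping everything else."""
    if not term:
        return escape(snippet)
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    parts, last = [], 0
    # Match on the raw text and escape each piece separately, so a term like
    # "amp" can never land inside an HTML entity such as &amp;.
    for m in pattern.finditer(snippet):
        parts.append(escape(snippet[last:m.start()]))
        parts.append("<mark>" + escape(m.group(0)) + "</mark>")
        last = m.end()
    parts.append(escape(snippet[last:]))
    return "".join(parts)


def results_page(results, term: str) -> str:
    """Build a standalone results page from (item_key, title, snippet) tuples."""
    rows = []
    for key, title, snippet in results:
        # Assumed link format for selecting an item in the personal library.
        href = "zotero://select/library/items/" + key
        rows.append(
            '<li><a href="{}">{}</a><p>{}</p></li>'.format(
                escape(href), escape(title), highlight(snippet, term)
            )
        )
    return "<!DOCTYPE html>\n<html><body><ol>\n" + "\n".join(rows) + "\n</ol></body></html>"


page = results_page([("ABCD2345", "Trade & Empire", "...exports to India rose...")], "india")
print(page)
```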
A hit count seems of limited utility: if you're searching for a given word, it seems more important to know that it's in the title than that it appears a few times. A ranking algorithm that weighted, say, the title highly, followed by creators, followed by number of occurrences, etc., seems more useful, but obviously more difficult to implement.
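One way to read "title highly, followed by creators, followed by number of occurrences" is as a tiered sort rather than a numeric score: any title match beats any creator match, which beats any number of full-text hits. A minimal Python sketch under that interpretation; the `Item` class and its fields are hypothetical, and matching is plain case-insensitive substring matching (so "india" would also match "Indiana"):

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    creators: list
    fulltext: str = ""


def rank_key(item: Item, term: str) -> tuple:
    """Tiered key: title match first, then creator match, then full-text hit count."""
    t = term.lower()
    in_title = t in item.title.lower()
    in_creators = any(t in name.lower() for name in item.creators)
    hits = item.fulltext.lower().count(t)
    # Tuples compare element by element, so a title match outweighs any hit count.
    return (in_title, in_creators, hits)


def rank(items, term: str) -> list:
    return sorted(items, key=lambda item: rank_key(item, term), reverse=True)


items = [
    Item("Colonial trade", ["Smith, A."], "India " * 5),
    Item("India after 1947", ["Jones, B."], "India"),
    Item("Monsoon studies", ["Smith, A."], "India India"),
]
for item in rank(items, "india"):
    print(item.title)
```

Here "India after 1947" comes first despite having only one full-text hit, because its title matches.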
The use case I can see is searching in attachments (notes, HTML, PDF text). I think this was also the starting point for the discussion here (documents = attachments?):
When I conduct a search (say 'India') across all my documents,
I guess one could really do a lot with the full text of all attachments (proximity search, term frequency within a document, snippets in search results, data mining and natural language processing methods...). Such tools can be very handy, but maybe not all of them are the main focus here.
I doubt that we need a ranking algorithm for searching the bibliographic metadata of our own collected literature. Up to a point, I should know what is inside my collection, and not need an "I'm Feeling Lucky" search.
A ranking algorithm that weighted, say, a title highly, followed by creators, followed by number of occurrences, etc., seems more useful, but obviously more difficult to implement.
Yes, and you would also have to quantify your preferences (e.g. title factor 10, creator factor 8, etc.). That way you would rank literature about Shakespeare (with "Shakespeare" in the title) above literature by Shakespeare. Is this what a user would expect? Moreover, multiple hits have to be combined somehow, and in a search with several terms we have to add these together somehow too (and maybe boost some fields at the end...).
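For comparison, here is the additive alternative as a small Python sketch: each occurrence of each term contributes its field's weight, and the per-field breakdown is returned alongside the total so a score can at least be explained. The title and creator factors (10 and 8) come from the post; the full-text weight of 1, the dict-based item format, and plain substring counting are assumptions:

```python
# Title and creator factors from the post; the full-text weight of 1 is an assumption.
WEIGHTS = {"title": 10, "creators": 8, "fulltext": 1}


def score(item: dict, terms: list) -> tuple:
    """Sum weighted occurrence counts over all terms; return (total, per-field breakdown)."""
    breakdown = {field: 0 for field in WEIGHTS}
    for term in terms:
        t = term.lower()
        breakdown["title"] += WEIGHTS["title"] * item["title"].lower().count(t)
        breakdown["creators"] += WEIGHTS["creators"] * sum(
            name.lower().count(t) for name in item["creators"]
        )
        breakdown["fulltext"] += WEIGHTS["fulltext"] * item["fulltext"].lower().count(t)
    return sum(breakdown.values()), breakdown


about = {"title": "Reading Shakespeare", "creators": ["Bloom, Harold"], "fulltext": ""}
by = {"title": "Hamlet", "creators": ["Shakespeare, William"], "fulltext": ""}
print(score(about, ["shakespeare"]))  # (10, {'title': 10, 'creators': 0, 'fulltext': 0})
print(score(by, ["shakespeare"]))     # (8, {'title': 0, 'creators': 8, 'fulltext': 0})
```

With these weights a book about Shakespeare (title match, 10) outranks a book by Shakespeare (creator match, 8), which is exactly the behaviour being questioned.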
I use discovery services (e.g. Primo) quite a lot, and they have a relevance ranking based on the bibliographic metadata. The ranking is sometimes good and sometimes bad. A major drawback IMO is that it is always a black box, i.e., you cannot explain to a human why the ranking is exactly the way it is. The lack of transparency, together with sometimes poorly ranked results, makes for a bad user experience. Moreover, these discovery services are designed for millions of entries, and I doubt that we have that many in a Zotero collection.
Definitely talking about attachments here. The way I personally use Zotero is to organise content I already have, not to search for more; that's where I was coming from.
Of course, a proper ranking algorithm should weight things accordingly. I was just noting that a hit count would be really easy to implement and would make a good stopgap plugin until the questions of how to properly aggregate results are debated, settled, and integrated into Zotero proper.
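The hit-count stopgap really is small. A Python sketch, assuming each attachment's extracted text is available as a string keyed by file name (a hypothetical input format); it counts whole-word, case-insensitive matches and sorts by count, dropping documents with no hits:

```python
import re


def hit_counts(docs: dict, term: str) -> list:
    """Return (name, count) pairs for documents containing `term`, most hits first.

    `docs` maps a document name to its extracted plain text; matching is
    whole-word and case-insensitive.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = [(name, len(pattern.findall(text))) for name, text in docs.items()]
    # Drop non-matching documents; break ties by name so the order is stable.
    return sorted(
        ((name, n) for name, n in counts if n > 0),
        key=lambda pair: (-pair[1], pair[0]),
    )


docs = {
    "a.pdf": "India and india, but not Indiana.",
    "b.pdf": "Nothing relevant here.",
    "c.pdf": "India.",
}
print(hit_counts(docs, "India"))  # [('a.pdf', 2), ('c.pdf', 1)]
```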
The difference for Google, though, is that if you enter multiple keywords it will try to show you the context where those keywords are close together. If that's the case for Zotero searches as well, then maybe showing a single hit is OK.
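Google-style "keywords close together" context can be approximated by finding the shortest run of words that contains every keyword, a standard sliding-window technique. A Python sketch, assuming plain-text attachment content and single-word keywords matched whole-word and case-insensitively; `best_window` is a hypothetical name:

```python
import re
from collections import Counter
from typing import Optional


def best_window(text: str, keywords: list) -> Optional[str]:
    """Return the shortest stretch of `text` (in words) containing every keyword."""
    wanted = {k.lower() for k in keywords}
    if not wanted:
        return None
    words = list(re.finditer(r"\w+", text))
    have = Counter()  # keyword -> occurrences inside the current window
    best = None  # (first_word_index, last_word_index)
    left = 0
    for right, word in enumerate(words):
        w = word.group().lower()
        if w in wanted:
            have[w] += 1
        # While the window covers every keyword, record it and try to shrink it from the left.
        while len(have) == len(wanted):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            dropped = words[left].group().lower()
            if dropped in wanted:
                have[dropped] -= 1
                if have[dropped] == 0:
                    del have[dropped]
            left += 1
    if best is None:
        return None
    # Slice the original text so punctuation inside the window is preserved.
    return text[words[best[0]].start():words[best[1]].end()]


text = "India is large. Trade with China grew; later India and China signed a pact."
print(best_window(text, ["India", "China"]))  # India and China
```

The returned window could then be padded with a few words on each side and used as the single snippet for that item.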
As for multiple hits, if the "hits" pane is scrollable, all of the hits could be shown.
But again, I don't think we can highlight/bold individual words within the lines, which may rule out using the tree at all. This would avoid the above issues, but it of course would mean that you'd have to individually go through each match to see the results, which sort of seems to negate most advantages of having these snippets in the first place.
In response to aurimas, there are essentially two goals: (a) to determine at a glance which attachments are important, and (b) to understand more fully what a particular attachment contains. I think goal (a) should have priority: other existing programs make (b) relatively easy once you have located an important file, whereas (a) is not easy elsewhere, and it fits Zotero's role as an information manager.
Turning the middle pane into a separate search results page with rich text formatting would probably be the best solution as a user.
Failing that, multiple lines with one snippet per line is still definitely a good idea, even WITHOUT highlighting/bold. Could probably capitalise search terms as a workaround...
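The capitalisation workaround is easy to sketch in Python; it upper-cases whole-word, case-insensitive matches of each search term in a plain-text snippet (the function name is hypothetical):

```python
import re


def emphasize_plain(snippet: str, terms: list) -> str:
    """Upper-case whole-word, case-insensitive matches of each term in a plain-text snippet."""
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        snippet = pattern.sub(lambda m: m.group(0).upper(), snippet)
    return snippet


print(emphasize_plain("...trade with India and china grew...", ["india", "China"]))
# ...trade with INDIA and CHINA grew...
```

One drawback: words that are already in capitals (acronyms, headings extracted from PDFs) become indistinguishable from matches.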
Also agreed that the right-hand panel (I'm assuming this is what you're calling the information panel?) would be a good place to display all/more hits once you click on an item, if that's desirable and not too resource- or development-time-intensive. I'm essentially keen on the fastest useful implementation =P
In my imagination, the search results page would just replace the middle pane and be activated when a search occurred. You'd 'get out' of it by clicking an x in the search bar.
Clicking a tag would be a filter on the search results (I imagine that would require a recreation of the search results page unless it was created with an ability to selectively display items based on tags)
I reckon you could probably have a 'sort by' drop-down somewhere, possibly as a context menu (right click) or at the top. Otherwise, people would complain about the loss of functionality.
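If the results page keeps the full result list in memory, tag filtering and a 'sort by' choice can both be applied to that list without re-running the search. A Python sketch; the `Result` fields, the tag names, and the three sort options are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Result:
    key: str
    title: str
    year: int
    hits: int
    tags: set = field(default_factory=set)


# Hypothetical 'sort by' options; hits sort descending, the others ascending.
SORT_KEYS = {
    "title": lambda r: r.title.lower(),
    "year": lambda r: r.year,
    "hits": lambda r: -r.hits,
}


def view(results: list, tag: Optional[str] = None, sort_by: str = "hits") -> list:
    """Filter already-computed results by tag and sort them, without searching again."""
    shown = [r for r in results if tag is None or tag in r.tags]
    return sorted(shown, key=SORT_KEYS[sort_by])


results = [
    Result("A", "Beta", 2001, 3, {"economics"}),
    Result("B", "Alpha", 1999, 7, {"history"}),
    Result("C", "Gamma", 2010, 1, {"economics", "history"}),
]
print([r.key for r in view(results)])                                   # ['B', 'A', 'C']
print([r.key for r in view(results, tag="economics", sort_by="title")])  # ['A', 'C']
```

Clicking a tag or changing the sort order then only re-renders the page from `view(...)`, which is cheap even for a few thousand results.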
I'm curious to understand, Dan, what do you imagine is going to be jarring?
I work with scientific articles and this feature would be really useful. I know a little bit of coding, but I feel this project is too much to take on by myself. So far, implementing something separate, as Thunderbird does, seems the best choice.
Has anybody managed to get anywhere with this? Honestly, I wouldn't even know where to start looking.
I hope we can all keep making these open-source alternatives more and more useful and reliable.
In the interim, my solution is to just install Mendeley, have it 'watch' my Zotero PDF folder, and do this kind of quick search in Mendeley. For everything other than 'quick search', Zotero is still better, though.
Mendeley's Zotero integration (or import) functionality sucks, BTW. Subfolders are not taken into account when you use Zotero integration, and with all the import functions, the PDFs are not imported (or are wrongly referenced). So don't bother trying those; just use 'watch folder', it will save you some time.
Nevertheless, a highlighting function would be a nice addition to Zotero IMHO. So I was wondering if there has been any progress on this.