Better display of search results?

ashwinthomas · April 17, 2014

Hey all,

When I conduct a search (say 'India') across all my documents, I get a huge list of everything that contains any mention of the word, which is great. However, given that my collection is over 1,000 items, I want to be able to figure out relatively quickly which items are important and which aren't.

Is it possible to see pointers to some phrases in which the word is used (like a Google search) in the results of the search, rather than just a list of files that match?

Additionally, is there a way to display and sort by some relatively basic things - like how many times a word shows up in a document, or 'relevance' or some such?

I've looked around everywhere for a setting or a plugin to deal with the deluge of information, and I can't seem to find it, and I'm assuming there must be a solution since it's such a basic feature. I was watching my friend use Mendeley, and they had the search results highlight instances of the phrase in each document. I really hope this is possible in Zotero, since it is the most useful feature, and I really don't want to migrate to Mendeley, since it's not open source.

Thanks =)

aurimas · April 17, 2014

Unfortunately, no, none of that is available in Zotero. Though I don't think anyone would object if this was implemented.

adamsmith · April 17, 2014

Not just that: Dan has mentioned that this has been planned from the beginning, so it'd certainly be welcome. But I'm not aware of anyone working on an implementation, so without a patch it's probably not going to happen super soon.

My recommendation in the meantime would be to just construct "better" searches, which Zotero can do a lot more powerfully than Mendeley, since you can string multiple search queries of different fields under advanced search.

ashwinthomas · April 17, 2014

Isn't it strange that we have all sorts of people working on something like visual mapping of data, but none on something so simple and powerful as this? I think even just a counter of how many times a term appears in a document would add so much and be reasonably easy to code.
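For illustration, the kind of per-document counter described above might look like the following minimal sketch. It is written in Python purely as an example (Zotero itself is not a Python program), and it assumes the extracted text of each attachment is already available as a plain string. The function names and sample file names are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def count_hits(text: str, term: str) -> int:
    """Count case-insensitive whole-word occurrences of term in text."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


def rank_by_hits(documents: Dict[str, str], term: str) -> List[Tuple[str, int]]:
    """Return (name, hit count) pairs for matching documents, most hits first."""
    counts = [(name, count_hits(text, term)) for name, text in documents.items()]
    matching = [pair for pair in counts if pair[1] > 0]
    return sorted(matching, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data standing in for extracted attachment text.
documents = {
    "a.pdf": "India and india; Indian",
    "b.pdf": "nothing relevant here",
    "c.pdf": "India India India",
}
results = rank_by_hits(documents, "India")
```

Because of the whole-word match, "Indian" is not counted as a hit for "India"; whether that is the desired behaviour is a design choice.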

ashwinthomas · April 17, 2014

Actually, speaking of which, do you think someone like me, with some experience coding macros in Excel Visual Basic and simple C programs, could hack a solution together, at least for the search count? I've never written anything for something like Zotero. Where would I go to learn?

But since I'm pretty bad at this, I'd probably appreciate someone else doing a proper job of it =P

aurimas · April 17, 2014

This may not be too difficult to add, but it's not exactly clear where this data is supposed to be displayed (not so much the hit counts, but the hit context). I think displaying keywords and surrounding text would go well with the "preview" pane proposed here. Adding a dynamic column that appears and disappears for different views (i.e. browsing vs searching) may be less trivial given how Zotero functions currently (though I haven't looked too carefully). Also, what about hits within metadata? Does that get highlighted as well (not sure how we would implement that one)?

In any case, to approach this, you would probably want to get familiar with JavaScript, XUL, and SQL first, which is a good amount of reading.

dstillman · April 18, 2014

Also, what about hits within metadata?

Right, I've said this before, but the search system actually has always returned rudimentary snippets for attachment content searches — they're just not displayed anywhere, and the main problem has always been that we just don't have a good place to show them.

But the other issue is that showing context for metadata matches would be considerably more complicated, both technically and in terms of presentation. Having snippets just for attachment content seems kind of bizarre.

In any case, definitely not as simple as it might seem.

Adding a dynamic column that appears and disappears for different views (i.e. browsing vs searching)

I've always assumed that, if we were to do this, the middle pane rows would have to switch to double-height, and every other row would have to show the snippet. Unfortunately I don't think we can actually have rich text (bold or highlighting) within those lines, which sort of defeats the purpose.

I guess an alternative would be to build an actual HTML search results page with clickable zotero://select links back to items in the middle pane. That would give us a huge amount of flexibility in terms of presentation, but it also feels kind of clumsy. And now that I mention it, I suppose this is exactly what Thunderbird does with filtering vs. searching (and for the exact same technical reason, presumably): the former just filters the main XUL tree, while the latter opens a new tab with snippets and other fancy stuff. For what it's worth, I essentially never use the latter.

A hit count seems of limited utility — if you're searching for a given word, it seems more important to know that it's in the title than that it appeared a few times. A ranking algorithm that weighted, say, a title highly, followed by creators, followed by number of occurrences, etc., seems more useful, but obviously more difficult to implement.

zuphilip · April 18, 2014

I am not sure if these functions are useful for a search in the bibliographic metadata. The entries in Zotero normally represent literature which I already know or have found somewhere and want to cite in the end. Thus, if I remember an author's name or some words in the title, the normal search is fine to find the corresponding entry in Zotero.

The use case I can see is a search in attachments (notes, HTML, PDF text). I think this was also the starting point for the discussion here (documents =? attachments):

When I conduct a search (say 'India') across all my documents,

I guess one could really do a lot on the fulltext of all attachments (proximity search, term frequency within a document, snippets in search results, data mining and natural language processing methods...). Such tools can be very handy, but maybe not all of them are in the main focus here.

I doubt that we need a ranking algorithm for the search in bibliographic metadata of our own collected literature. I should, up to some point, know what is inside my collection, and not need an "I'm feeling lucky" search.

 A ranking algorithm that weighted, say, a title highly, followed by creators, followed by number of occurrences, etc., seems more useful, but obviously more difficult to implement.

Yes, and you would also have to quantify your preferences (e.g. title factor 10, creator factor 8, etc.). In this way you would prefer literature about Shakespeare (having "Shakespeare" in the title) over literature by Shakespeare. Is this what a user would expect? Moreover, multiple hits have to be added up somehow, and in a search with various terms the per-term scores have to be combined somehow (maybe boosting some fields in the end...).
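To make the Shakespeare example concrete, here is a minimal sketch of such a field-weighted score, in Python for illustration only. The title and creator factors are the example values from this post; the full-text weight, the field names, and the sample items are assumptions, and none of this reflects how Zotero actually searches.

```python
import re
from typing import Dict, List, Tuple

# Title 10 and creator 8 are the example factors from the post above;
# the full-text weight of 1 is an assumption added for illustration.
FIELD_WEIGHTS = {"title": 10, "creators": 8, "fulltext": 1}


def score_item(item: Dict[str, str], term: str) -> int:
    """Sum weight * (number of case-insensitive hits) over all weighted fields."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    score = 0
    for field, weight in FIELD_WEIGHTS.items():
        hits = len(pattern.findall(item.get(field, "")))
        score += weight * hits
    return score


def rank_items(items: List[Dict[str, str]], term: str) -> List[Tuple[str, int]]:
    """Return (title, score) pairs for matching items, highest score first."""
    scored = [(item.get("title", ""), score_item(item, term)) for item in items]
    matching = [pair for pair in scored if pair[1] > 0]
    return sorted(matching, key=lambda pair: pair[1], reverse=True)


# Hypothetical library: one work by Shakespeare, one work about him.
library = [
    {"title": "Hamlet", "creators": "Shakespeare, William", "fulltext": ""},
    {"title": "Shakespeare and the Stage", "creators": "Smith, Jane",
     "fulltext": "Shakespeare wrote plays."},
]
ranking = rank_items(library, "Shakespeare")
```

With these weights the book about Shakespeare scores 11 (one title hit plus one full-text hit) and Hamlet scores 8, so the work about him outranks the work by him, which is exactly the questionable outcome described above.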

I am using discovery services (e.g. Primo) quite a lot and they have a relevance ranking based on the bibliographic metadata. The ranking is sometimes good and sometimes bad. A major drawback IMO is that it is always a black box, i.e., you cannot explain to a human why the ranking is exactly how it is. The lack of transparency, together with sometimes badly ranked results, will give a bad user experience. Moreover, these discovery services are designed for millions of entries, and I doubt that we have that many in a Zotero collection.

ashwinthomas · April 18, 2014

In terms of display space, Mendeley does it by putting two/three lines underneath each item (I'm trying to get my friend to send me a screenshot). It seems pretty elegant.

Definitely talking about attachments here. The way I personally use Zotero is to organise content I already have, not to search for more; that was where I was coming from here.

Of course, a proper ranking algorithm should weight things accordingly. I was just noting that a hit count would be really easy to implement and would be a good stopgap plugin until these issues around how to properly aggregate results are debated, fixed and integrated into Zotero proper.

aurimas · April 18, 2014

In terms of display space, Mendeley does it by putting two/three lines underneath each item (I'm trying to get my friend to send me a screenshot). It seems pretty elegant.

That's pretty limited though. You could only show a single hit. What if there are 10 hits within a document?

adamsmith · April 18, 2014

That's pretty limited though. You could only show a single hit. What if there are 10 hits within a document?

I think that's fine - I like what Mendeley does. As noted above, it's what Google does, too, and I think it works nicely. I think showing every hit (or so) is massive overkill.

aurimas · April 18, 2014

I think showing every hit (or so) is massive overkill.

I would disagree. I think showing context for the first hit would be almost as useless as not showing any context at all. Odds of the first hit being what I'm looking for are not that high, so it would be just as good as telling me that the paper has a hit.

The difference for Google, though, is that if you enter multiple keywords it will try to show you the context where those keywords are close together. If that's the case for Zotero searches as well, then maybe showing a single hit is OK.

bwiernik · April 18, 2014

What about showing the hits in the information panel? For items, it is relatively easy to see where in the metadata the hit is coming from. For attachments, essentially no information is shown other than that the attachment name is in solid text in the center pane. At present, the information panel for attachments is mostly empty unless notes have been entered. It would be efficient, I think, to cut the note space in half when displaying search results and show a pane with the context snippets that the search already finds.

As for multiple hits, if the "hits" pane is scrollable, all of the hits could be shown.

dstillman · April 18, 2014

You could only show a single hit. What if there are 10 hits within a document?

I'd say you could just show multiple snippets, with ellipses in between, but unfortunately I don't think we can do auto-flow between multiple lines (e.g., and just have as many matches as will fit in one or two lines). The alternative would be to just have multiple lines with one snippet per line, with some maximum number of lines.
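As a rough illustration of the "one snippet per line, with some maximum number of lines" idea, here is a minimal Python sketch. It is not Zotero code and does not use the snippets the search system already returns; the function name, the context width, and the default limit are all assumptions.

```python
import re
from typing import List


def snippets(text: str, term: str, context: int = 30, max_snippets: int = 3) -> List[str]:
    """Return up to max_snippets excerpts around case-insensitive hits of term.

    Each excerpt keeps `context` characters on either side of the hit and is
    marked with "..." wherever it was cut off from the surrounding text.
    """
    result: List[str] = []
    for match in re.finditer(re.escape(term), text, re.IGNORECASE):
        if len(result) >= max_snippets:
            break
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        result.append(prefix + text[start:end].strip() + suffix)
    return result


# Hypothetical attachment text; one snippet per line, at most two lines.
sample = "India is large. " * 5
lines = snippets(sample, "india", context=5, max_snippets=2)
display = "\n".join(lines)
```

Overlapping excerpts are not merged here, so hits that sit close together would repeat some text; a real implementation would probably want to combine them.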

But again, I don't think we can highlight/bold individual words within the lines, which may rule out using the tree at all.

What about showing the hits in the information panel?

This would avoid the above issues, but it of course would mean that you'd have to individually go through each match to see the results, which sort of seems to negate most advantages of having these snippets in the first place.

aurimas · April 18, 2014

I guess an alternative would be to build an actual HTML search results page with clickable zotero://select links back to items in the middle pane. That would give us a huge amount of flexibility in terms of presentation, but it also feels kind of clumsy. And now that I mention it, I suppose this is exactly what Thunderbird does with filtering vs. searching (and for the exact same technical reason, presumably): the former just filters the main XUL tree, while the latter opens a new tab with snippets and other fancy stuff. For what it's worth, I essentially never use the latter.

This would be embedded where the current item tree is, right?

dstillman · April 18, 2014

Oh, I was thinking a separate window as in Thunderbird, but the middle pane might be less awkward. (Part of the reason I never use the TB search is that it takes you completely out of the context you're in.) I'm not sure how you'd switch between browse/filter mode and search results mode — in TB it's a completely separate search box, which is really clumsy.

aurimas · April 18, 2014

Looks like https://bugzilla.mozilla.org/show_bug.cgi?id=441414 would be exactly what we need. Too bad the project has completely stalled (has a bounty though :-) ).

ashwinthomas · April 19, 2014

Here's a screenshot of Mendeley's implementation for people who haven't seen it - http://i.imgur.com/JlSQhxD.jpg Mendeley highlights hits in key metadata and text (only displaying non-title metadata if there's a hit).

In response to aurimas, there are essentially 2 goals: (a) to determine at a quick glance which attachments are important and (b) to understand more fully what any particular attachment contains. I think goal (a) should have priority, given that it's relatively easy with other existing programs to do (b) once you have located an important file, whereas (a) is not, and would make sense with Zotero's role as an information manager.

Turning the middle pane into a separate search results page with rich text formatting would probably be the best solution as a user.

Failing that, multiple lines with one snippet per line is still definitely a good idea, even WITHOUT highlighting/bold. Could probably capitalise search terms as a workaround...
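The capitalisation workaround is simple enough to sketch. This is a minimal Python illustration with a hypothetical function name, not anything that exists in Zotero:

```python
import re


def capitalise_hits(snippet: str, term: str) -> str:
    """Upper-case every case-insensitive occurrence of term in a plain-text snippet."""
    return re.sub(
        re.escape(term),
        lambda match: match.group(0).upper(),
        snippet,
        flags=re.IGNORECASE,
    )


marked = capitalise_hits("trade with India and indian ports", "india")
```

Note that this matches substrings, so "indian" becomes "INDIAn"; matching whole words only would avoid that.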

Also agreed that the right-hand side panel (I'm assuming this is what you're calling the information panel?) would be good for displaying all/more hit results once you click on the item, if that's desirable and not too resource-intensive or development-time-intensive. I'm essentially keen for the fastest useful implementation =P

zuphilip · April 19, 2014

Here is a screenshot of a tool which could fall into (b), though it seems to work only on text files, not PDFs: http://www.antlab.sci.waseda.ac.jp/antconc_screenshots.html

dstillman · April 19, 2014

Could probably capitalise search terms as a workaround...

Ugh. That's a good idea, but I think it's just too gross.

dstillman · April 19, 2014

Before deciding to go with a search results page, we'd have to figure out exactly how you'd get into and out of it, how it would interact with the right-hand pane, what would happen if you clicked on a tag in the tag selector, what exactly you'd click on in the search results and what exactly would happen when you did, etc. I'm concerned it's just going to look too weird and be too jarring to go back and forth between that and the items list.

ashwinthomas · April 25, 2014

Semi-returned from Easter holidays and adding to this again.

In my imagination the search results page would just replace the middle pane and be activated when a search occurred. You'd 'get out' of it by clicking an x in the search bar.

Clicking a tag would be a filter on the search results (I imagine that would require a recreation of the search results page unless it was created with an ability to selectively display items based on tags)

I reckon you could probably have a 'sort by' drop down somewhere, possibly as a context menu (right click) or on the top. Otherwise, people would complain about the loss of functionality.

I'm curious to understand, Dan, what do you imagine is going to be jarring?

dstillman · April 25, 2014

That doesn't work. The items list is how you interact with items in Zotero — to perform any operation on them — so you need to retain the ability to filter it the way you can now using the search bar and tag selector. So if a results view existed, it would need to be brought up explicitly. That's what I mean by switching back and forth.

thisisjoe · July 7, 2014

I totally want this too!

cjpoor · January 7, 2016

Highlighting of hits, at least initially in Notes and Info, would be very useful. Perhaps, if the search covered PDFs, the PDFs in the items view could be highlighted (when there are hits in those PDFs).

canozp · May 2, 2016

For what it's worth, I really like Thunderbird's search function (the ability to refine the results, and the fact that it opens each new search in a new tab). It would be great to see something similar in Zotero.

diegodlh · August 2, 2017

Hello all. This looks like the most complete thread I've found online about this feature request. For the past five years or so I have been constantly trying Zotero and falling back to Mendeley, but I have just taken the final decision to migrate everything to Zotero at last.
I work with scientific articles and this feature would be really useful. I know a very little bit of coding, but I feel this project is too much to take on by myself. So far, it looks like implementing something separate, as Thunderbird does, is the best choice.
Has anybody managed to get somewhere with this? Honestly I wouldn't even know where to start looking.
I hope we can all keep making these open source alternatives more and more useful and reliable.

thematrix · January 31, 2018

I'd also like this feature! My friend who uses Mendeley told me about it, and I think it's a much nicer (and quicker) way to search literature. Zotero's search always seems a bit clunky if you just want to do a quick search.

In the interim, my solution is to just install Mendeley, have it 'watch' my Zotero PDF folder, and do this kind of quick search in Mendeley. For everything other than 'quick search', Zotero is still better though.

Mendeley's Zotero integration functionality (or import) sucks BTW. Sub-folders are not taken into account when you use Zotero integration, and with all import functionalities, the PDFs are not imported (or are wrongly referenced). So don't bother trying those and just use 'watch folder' - it will save you some time.

bwiernik · January 31, 2018

An improved search results interface is planned when Zotero migrates to a new development platform in an upcoming version. Right now, Zotero is limited in how it can display search results for technical reasons.

ilc168 · February 20, 2020

Adding to ashwinthomas' suggestions: I'd love to see a feature for displaying more fields in advanced search results (e.g. year and item type), and being able to sort by them.

CJAB · April 15, 2020

I am currently weighing the pros and cons of Mendeley (which I just started to use) and Zotero (which I would really like to use due to it being open-source, Elsevier-independent and more versatile) for my colleagues. The current pros of Mendeley are the watch folder functionality and the search function with highlights. A part that is not mentioned here but which also makes the Mendeley search function pretty convenient is that you can open the PDFs in Mendeley after your search and your search query is highlighted throughout the document. So you can very quickly scan whether the document is what you are looking for, which is obviously quicker than opening the PDF in an external viewer, pressing CTRL+F, and typing the query again, as we would have to do now in Zotero.

Nevertheless, a highlighting function would be a nice addition to Zotero imho. So I was wondering if there was any progress on this.