Huge number of automatically added tags

First, I apologize if this topic has come up before; I searched and couldn't find it.

For some sites, especially Arabic ones, Zotero converts a huge chunk of text into automatic tags. This makes the tags useless, or you have to delete all of them by hand for every entry added to the library, which is a big waste of time.
Is there a solution that doesn't mean losing automatic tagging altogether by turning the feature off?
  • Not really. You could let us know which site, but if it's Arabic, it's almost certainly handled by a generic translator and we can't do anything about it.

    I suspect toggling the pref to add (or not add) automatic tags, which it sounds like you're aware of, requires a restart, but you could test that.
  • This site is an example:
    http://o-t.tv/3Dr
    Automatic tagging is very useful and important; losing it would be painful.
  • Yes, as I thought: if you look at the page source (Ctrl+U in Firefox/Chrome), you'll see meta name="keywords" followed by a huge list right at the top. That's where we get the automatic tags from, and I don't see a way around that. So figuring out whether you can enable/disable this on a per-import basis is the only thing I can think of.
  • I suppose it would be possible, in a plugin, to add a blacklist feature for automatic tag imports. It would need a database to maintain the list, and some mechanism for displaying the blacklist and removing items from it: a significant amount of work. Whether it would be worthwhile for someone (else) to undertake would depend on how many sites out there set masses of keywords like this.
  • We might also consider disabling tag imports via Embedded Metadata directly. They are far too often generic tags targeted at the website itself rather than at the article in question. I feel like the mess EM creates with tags is a big reason people turn off automatic tags, and that takes away from other translators that do a much better job of assigning correct tags.
  • I think that makes sense, but could we add a flag in the translator that could be set when it's called from other translators? If you think of some of the journal publishers where we are using the EM translator, the keywords there are quite good. Obviously it would also be possible to grab them separately, but that seems like a drag.
  • We can check whether there's a parent translator from within EM. No need to explicitly set a flag. See RIS for an example.
  • I guess that's not very helpful seeing how large RIS is. The relevant code is here: https://github.com/zotero/translators/blob/master/RIS.js#L1341
  • Thanks, I actually remember that option. I think that'd be a good solution. I'm not going to be able to do it much before Thanksgiving, though, so it will either have to wait or someone else will have to do it.
  • Done.

    @hamed_fcs, you can update your translators via Zotero Preferences -> General -> Update Now. Restart your browser. You should see fewer websites importing automatic tags now.
  • @aurimas I just did that, and the same huge auto tag list is still there.
    No changes.
  • If I am reading the source correctly (which is not guaranteed), it might be something simple.

    @aurimas I've made a note in the commit.
  • @hamed - try again, same instructions as above.
  • @adamsmith Yes, it's OK now, but I don't know if everyone will agree with this, because I tested it on two other news and magazine sites and it couldn't retrieve the auto tags.
    The sites are:
    http://www.aljazeera.net/news/arabic/2014/11/10/%D8%A5%D8%AE%D9%88%D8%A7%D9%86-%D8%B3%D9%88%D8%B1%D9%8A%D8%A7-%D9%84%D8%A7-%D9%81%D8%B1%D9%82-%D8%A8%D9%8A%D9%86-%D8%A7%D9%84%D9%86%D8%B8%D8%A7%D9%85-%D9%88%D8%AA%D9%86%D8%B8%D9%8A%D9%85-%D8%A7%D9%84%D8%AF%D9%88%D9%84%D8%A9

    and

    http://www.thedailybeast.com/articles/2014/11/05/darkness-at-noon-prayers-inside-the-islamic-police-state.print.html#

    On the first one, Zotero didn't retrieve any tags.
    On the second, Zotero retrieved 4 new, different tags.
  • Yeah, so what we're doing is importing keywords only from sites that have dedicated Zotero support (i.e., someone wrote a translator for them). You can check that by hovering over the URL bar icon. You'll see "Embedded Metadata" for Al Jazeera (i.e., a generic translator, so no automatic tags) and "Daily Beast" for the second (which is why automatic tags are imported).

    It's not going to be perfect, but overall that seems like the most reasonable solution we can come up with.
  • It's a good strategy. Could I suggest a future development?
    Why not let users choose whether to enable metadata tags, and allow a site blacklist?

    Then Zotero's default would be auto tags disabled for all sites except those that have support,
    but users could enable auto tagging by hand and create their own blacklist.
  • Because it's too much complexity for functionality that is relatively marginal for most users. If anyone wants to write an add-on, that wouldn't be terribly hard, but I'm pretty sure it shouldn't be in regular Zotero.
  • Hi again,
    If I want to modify a translator for another website, where should I start?
    I'm not good at JavaScript, but I could modify a single file.
    Many things are still fuzzy to me.
  • Start a new thread and provide more detail (which translator, what modification). In general, the translators are located in the <Zotero data directory>/translators folder. Most file names match translator names.
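To make the keyword source described in the thread concrete, here is a minimal Python sketch that pulls tags out of a page's `<meta name="keywords">` element. It is an illustration only, not Zotero's Embedded Metadata code. The names `KeywordsMetaParser` and `extract_keyword_tags` are hypothetical, and splitting on the Arabic comma (U+060C) as well as the ASCII comma is an assumption about how such pages separate keywords.

```python
import re
from html.parser import HTMLParser

# Assumed separators: ASCII comma plus the Arabic comma (U+060C).
KEYWORD_SEPARATORS = re.compile(r"[,\u060C]")


class KeywordsMetaParser(HTMLParser):
    """Collects the content attribute of every <meta name="keywords"> tag."""

    def __init__(self):
        super().__init__()
        self.raw_values = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "keywords":
            self.raw_values.append(attr_map.get("content", ""))


def extract_keyword_tags(html):
    """Split the keywords meta content into unique, trimmed tags in page order."""
    parser = KeywordsMetaParser()
    parser.feed(html)
    parser.close()
    tags, seen = [], set()
    for raw in parser.raw_values:
        for part in KEYWORD_SEPARATORS.split(raw):
            tag = part.strip()
            if tag and tag not in seen:
                seen.add(tag)
                tags.append(tag)
    return tags


if __name__ == "__main__":
    page = '<head><meta name="keywords" content="Syria, Aleppo, news, Syria"></head>'
    print(extract_keyword_tags(page))  # ['Syria', 'Aleppo', 'news']
```

Whatever a page puts in that element becomes tags, which is why a site-wide keyword list ends up attached to every item saved from that site.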
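The behavior adopted in the thread (keep keywords only when a site-specific translator calls Embedded Metadata), together with the opt-in blacklist proposed as a possible add-on, can be sketched as one decision function. This is a hypothetical Python illustration: `tags_to_import`, its parameters, and the host-matching rule are assumptions, not Zotero's API. The real parent-translator check lives in the JavaScript translators, as in the RIS.js code linked in the thread.

```python
from typing import Iterable, List, Optional
from urllib.parse import urlparse


def tags_to_import(
    keywords: Iterable[str],
    page_url: str,
    parent_translator: Optional[str],
    import_generic: bool = False,
    blacklist: Iterable[str] = (),
) -> List[str]:
    """Hypothetical decision rule for which automatic tags to keep."""
    if parent_translator is not None:
        # A site-specific translator (such as the "Daily Beast" one mentioned
        # in the thread) invoked Embedded Metadata, so keep its keywords.
        return list(keywords)
    if not import_generic:
        # Default described in the thread: Embedded Metadata on its own
        # (a generic page) contributes no automatic tags.
        return []
    # Proposed add-on behavior, not Zotero's: the user opted in to tags from
    # generic pages, but blacklisted hosts and their subdomains stay excluded.
    host = (urlparse(page_url).hostname or "").lower()
    blocked = {entry.lower() for entry in blacklist}
    if host in blocked or any(host.endswith("." + entry) for entry in blocked):
        return []
    return list(keywords)


if __name__ == "__main__":
    print(tags_to_import(["Syria"], "http://www.aljazeera.net/news", None))  # []
```

Keeping the default at "no tags from generic pages" matches the reasoning in the thread that site-wide keywords are usually noise, while the blacklist only matters for users who explicitly opt in.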