MetaData Tools

I searched for "MetaData" because I have PDFs whose metadata may exist on the internet but not in the document itself. I want to use a metadata lookup to add that metadata to my homemade PDFs.

This happens when I have to copy the text from a website, paste it into Word, and then create a PDF. The website offered a PDF, but it was full of junk. That PDF may have metadata, and I want to add it to my own PDF.

I have a program that lets me add metadata to a PDF (BeCyPDFMetaEdit). What data do these metadata servers require? Can I just use whatever DOI or ISBN the original PDF uses, or do I need the title, authors, etc.? By that point I might as well have entered the metadata myself.

I also encounter a lot of PDFs from non-academic quarters (government, business and non-profits) that don't even bother to get a DOI. Who issues these things anyway, and can we request that DOIs be assigned? I'm going to guess it costs money, but these organizations must know that important white papers, reports and position papers need to be registered so they can be cited in papers. I guess we're not going to get the Library of Congress to write a rule this year...

Since there is so much diversity in this environment, we're all just trying to do the best we can. I would find it helpful to have a widget that scans the text of a PDF for metadata, or that lets me copy the part of a web page with the name of the publisher or organization and drop it into the widget to fill in the fields. The HTML behind the page could guide which field each piece goes into, and the user should expect to correct errors.

In my ideal version of Zotero, Zotero would write to the PDF whatever we enter into the metadata fields: since Zotero already reads PDFs, it could also write to them. Any PDF dropped into Zotero would automatically get a "new item" sidebar with unfilled fields. Entering a DOI might help track down metadata for a corresponding published PDF; otherwise the user would fill in the fields manually or from the widget.

As an optional feature, Zotero would have a "Request registration" check box for any PDF lacking a DOI or ISBN.

Enough about metadata. Look for my new topic, "merge".
  • In order to mint a DOI, a publisher (private or public, one-person or multinational corporation) must:

    1) have a working contract with Crossref, DataCite, or possibly another DOI registration organization, which involves some kind of payment;
    2) provide the necessary metadata to describe the digital object.

    So there can't be a DOI without the metadata and some money. Who is supposed to supply that, in your ideal situation?

    And PDFs don't actually contain a lot of structured metadata, so it is difficult to estimate how reliable an automatic extraction would be. Title: probably. Authors: I'm not so sure. Publisher: maybe. Number of pages: OK, that one should be easy :-). Document type: forget it. Etc. As for storing structured metadata in a PDF, there are certainly ways to do it in principle, but I'm not sure there is a standard choice that you can expect to be recognized by widely used software (such as Zotero or others).
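One of the "ways to do it in principle" is an XMP packet, the XML metadata block a PDF can carry alongside its older Info dictionary. The sketch below only builds the core XMP XML with Python's standard library: title and authors in Dublin Core (`dc:title`, `dc:creator`) and, as an assumption, the DOI in PRISM's `prism:doi`. The function names are hypothetical, the `<?xpacket?>` wrapper is omitted, and embedding the result into a PDF is a separate step for a PDF tool or library. As noted above, there is no guarantee that a given reader (Zotero included) will pick any of these fields up.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used in XMP. Dublin Core (dc) is the usual home for title
# and authors; putting the DOI in PRISM's prism:doi is an assumption that
# some, but not all, PDF readers will honour.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)


def q(prefix, local):
    """Return an ElementTree qualified name such as '{uri}title'."""
    return f"{{{NS[prefix]}}}{local}"


def build_xmp(title, authors, doi=None):
    """Return an XMP metadata tree (without the xpacket wrapper) as a string."""
    root = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(root, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title is a language alternative: an rdf:Alt with an x-default entry
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = title

    # dc:creator is an ordered list: an rdf:Seq with one rdf:li per author
    seq = ET.SubElement(ET.SubElement(desc, q("dc", "creator")), q("rdf", "Seq"))
    for name in authors:
        ET.SubElement(seq, q("rdf", "li")).text = name

    # Only emit the (assumed) DOI property when a DOI is known
    if doi:
        ET.SubElement(desc, q("prism", "doi")).text = doi
    return ET.tostring(root, encoding="unicode")
```

For example, `build_xmp("Annual Report", ["Doe, Jane"])` returns a small XML string with the title and one creator; a PDF editor would still have to store it in the file.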
  • Well, maybe the rules need to be relaxed or streamlined so that publishers can pay $20 for a single paper and register online? That's the ideal world, not the world we have, but what you describe certainly sounds clunky. No button in Zotero anyway!!

    Otherwise, yes, that basic data would be important and useful. Often enough, though, they do have a DOI in the document. The DOI is metadata, right?

    Anyway, I know you are busy, and I appreciate that you take time to answer questions in the forum. Right now I need to go weed my Zotero library.
  • If there's a DOI in a paper, Zotero should be able to automatically grab the metadata when you drag the file to Zotero.

    (And DOI registration has to require a membership because organizations issuing DOIs need some sort of plan to keep the identified object accessible, e.g. by updating a URL if an item or site moves. The actual cost, especially of Crossref DOIs, is quite low: https://www.crossref.org/fees/#annual-membership-fees )
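A DOI is enough to look up the rest of the metadata. One public route is DOI content negotiation: request `https://doi.org/<DOI>` with an `Accept` header asking for CSL JSON, which works for Crossref and DataCite DOIs. The sketch below shows only that route; Zotero's own lookup may use different services, and the function names are hypothetical. Building the request needs no network; `fetch_doi_metadata` does.

```python
import json
import urllib.parse
import urllib.request

# CSL JSON media type accepted by doi.org content negotiation
CSL_JSON = "application/vnd.citationstyles.csl+json"


def doi_metadata_request(doi):
    """Build (but do not send) a GET request for a DOI's CSL JSON metadata."""
    doi = doi.strip()
    # Accept DOIs pasted with a resolver URL or a "doi:" prefix
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Percent-encode unusual characters but keep the "/" separating prefix and suffix
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": CSL_JSON})


def fetch_doi_metadata(doi, timeout=10):
    """Send the request and parse the JSON reply (requires network access)."""
    with urllib.request.urlopen(doi_metadata_request(doi), timeout=timeout) as resp:
        return json.load(resp)
```

The returned dictionary uses CSL variable names such as `title`, `author` and `publisher`, which map fairly directly onto the fields a reference manager or a PDF metadata editor asks for.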
  • It's fairly easy to detect a DOI in a string of text automatically: it has a precise structure, which a computer can match with sufficient confidence. That's why it works.

    And yes, there are commitments beyond just paying a few bucks for a DOI (which is actually the right order of magnitude for the price).
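To illustrate the "precise structure" point: a modern DOI is `10.`, a 4 to 9 digit registrant code, a slash, and a suffix. The sketch below scans text with a regular expression modelled on the pattern Crossref has suggested for this; it is a heuristic, not Zotero's actual code, and `find_dois` is a hypothetical name. The main practical wrinkle is punctuation from the surrounding sentence, so a trailing period or an unbalanced closing parenthesis is trimmed.

```python
import re

# "10." + 4-9 digit registrant code + "/" + suffix characters
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_dois(text):
    """Return the distinct DOI-like strings in text, in order of appearance."""
    found = []
    for m in DOI_RE.finditer(text):
        # Drop sentence punctuation that the pattern swallowed
        doi = m.group(0).rstrip(".,;:")
        # Drop a ")" that closes the surrounding prose rather than the DOI
        while doi.endswith(")") and doi.count(")") > doi.count("("):
            doi = doi[:-1].rstrip(".,;:")
        if doi not in found:
            found.append(doi)
    return found
```

Run on a sentence like `See doi:10.1000/xyz123. Also (https://doi.org/10.5555/abc.def).`, it returns the two bare DOIs without the trailing punctuation.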
  • @adamsmith: "If there's a DOI in a paper, Zotero should be able to automatically grab the metadata when you drag the file to Zotero."

    Even if it is printed in the text of the document? That makes it easier. I think that means that if I include the putative DOI in my PDF, Zotero will identify the document I am trying to store.
  • Yes, Zotero looks for ISBNs and DOIs on the first couple of pages of PDFs.
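ISBNs are similarly recognizable, with the added help of a check digit. The sketch below finds 10- or 13-digit candidates (optionally split by hyphens or spaces) and keeps only those whose checksum is valid: ISBN-10 uses weights 10 down to 1 modulo 11, with `X` standing for 10, and ISBN-13 uses alternating weights 1 and 3 modulo 10. This is an illustration under those assumptions, not how Zotero scans its first pages, and the function names are hypothetical.

```python
import re

# 10 or 13 digits with optional "-" or " " separators; an ISBN-10 may end in X
ISBN_RE = re.compile(r"\b(?:97[89][- ]?)?(?:\d[- ]?){9}[\dXx]\b", re.ASCII)


def _ascii_digits(s):
    return s.isascii() and s.isdigit()


def isbn_is_valid(raw):
    """Check the ISBN-10 or ISBN-13 check digit of raw (separators allowed)."""
    s = re.sub(r"[- ]", "", raw).upper()
    if len(s) == 10 and _ascii_digits(s[:9]) and (_ascii_digits(s[9]) or s[9] == "X"):
        digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        return sum((10 - i) * d for i, d in enumerate(digits)) % 11 == 0
    if len(s) == 13 and _ascii_digits(s):
        return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(s)) % 10 == 0
    return False


def find_isbns(text):
    """Return distinct, checksum-valid ISBNs in text, without separators."""
    found = []
    for m in ISBN_RE.finditer(text):
        if isbn_is_valid(m.group(0)):
            isbn = re.sub(r"[- ]", "", m.group(0)).upper()
            if isbn not in found:
                found.append(isbn)
    return found
```

The checksum is what filters out look-alikes: a phone number such as `555-123-4567` matches the digit pattern but fails the ISBN-10 check.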
  • The problem is that too many don't have a number at all. No help for that in this topic. Thank you for educating me!
  • The point of this last link was just information - so that you understand the current functionality, and some important differences with your suggestions.
  • Of course, thank you.