The Quality of Embedded Metadata in PDFs (Jan 2013)

Discussion on the poor quality of embedded metadata in PDFs. The discussion took place from 4-8 January 2013 following a blog post by @briankelly, which itself was inspired by a post by @rmounce. NOTE: anybody whose tweet has been included who wishes the content to be removed should contact me.

  1. Why the 'inside-out/outside-in' repository manager should address the poor quality of metadata embedded in PDFs: bit.ly/VCbHsH
  2. RT @briankelly: Why the 'inside-out/outside-in' repository manager should address the poor quality of metadata embedded in PDFs: http://bit.ly/VCbHsH
  3. RT @briankelly: Why the 'inside-out/outside-in' repository manager should address the poor quality of metadata embedded in PDFs: http://bit.ly/VCbHsH
  4. RT @briankelly: Why the 'inside-out/outside-in' repository manager should address the poor quality of metadata embedded in PDFs: http://bit.ly/VCbHsH
  5. **Comment by @mrnick:**
  6. @briankelly Have spent a lot of time manually adding metadata to PDFs. Definitely improves SEO
  7. @mrnick Do you add cover pages to the PDFs? What does this do to the metadata?
  8. **Comment by @neilstewart (and exchange of ideas with @briankelly):**
  9. @mrnick @briankelly this is an interesting post. There is another issue not covered: we do work for academics in converting .doc to .pdf...
  10. @mrnick @briankelly ...might this also distort or destroy metadata added to the original Word docs?
  11. @neilstewart Interesting. I'd assumed that authors created their own PDFs & sent 1 to publisher & other to IR.
  12. @briankelly if only! We tend to get tranches of back-files often in Word, sometimes in PDF, occasionally in exotic formats (Wordperfect etc)
  13. RT @briankelly: Why the 'inside-out/outside-in' repository manager should address the poor quality of metadata embedded in PDFs: http://bit.ly/VCbHsH
  14. **Comment by @mrnick (on workflow for creating cover pages):**
  15. @neilstewart interested in workflow for converting .doc to .pdf. Do this for stuff uploaded via @symplectic_uk? See my comment @briankelly
  16. RT @briankelly: Why the 'inside-out/outside-in' repository manager should address the poor quality of metadata embedded in PDFs: http://bit.ly/VCbHsH
  17. Read: Embedded Metadata in PDFs Hosted in Institutional Repositories: An Inside-Out… goo.gl/fb/3DmIt
  18. Embedded Metadata in PDFs Hosted in Institutional Repositories: An Inside-Out & Outside-In View | @scoopit sco.lt/7vDPzV
  19. @mrnick @neilstewart If PDFs in IRs have cover pages added manually, surely there'll be a huge backlog in fixing embedded metadata
  20. **Comment by @rmounce (and discussion with @neilstewart) on the value of cover pages:**
  21. why do IRs need 2 slap on cover page anyway? Perhaps they should just embed additional provenance metadata @briankelly @mrnick @neilstewart
  22. @rmounce @briankelly @mrnick viewed as a way of advertising provenance (proper citation), branding as from home inst but agreed!
  23. @rmounce @briankelly @mrnick we're in the fortunate position of having them auto-generated, so could switch off some or all tomorrow
  24. Read: Embedded Metadata in PDFs Hosted in Institutional Repositories: An Inside-Out… goo.gl/fb/yITEy

Brian Kelly

UK Web Focus, based at UKOLN, University of Bath, UK.