Store Document Data in files (i.e. Metadata support)

Feature Requests for PaperScan.
jfcarbel
Posts: 20
Joined: Mon Sep 26, 2011 6:34 am

Store Document Data in files (i.e. Metadata support)

Post by jfcarbel » Tue Apr 10, 2012 8:09 pm

Thank you for adding the PDF properties feature, but I noticed that you only have these in the save options menu, which means they get applied to every scanned document. While it is great to have some forced defaults, not all properties will be the same for all scans. For example, I may have several scans, and each one will have a different PDF title or subject property. So it makes sense to have this as part of the save dialog. Perhaps when PDF is selected in the drop-down, the dialog could show these fields.

I think it would be beneficial for Paperscan to start saving some key document details to the metadata area of the file. With your upcoming document management tool, Paperlight, this becomes even more important, so that the scanned documents your Paperscan users are creating today are ready for the powerful search features of a document management tool.

The key data should be saved into the file itself, either as PDF XMP metadata or as JPEG IPTC/XMP metadata. Tools in Paperscan should be able to display each file's metadata so that the user can verify that the data was written to the file, but any document management features should be reserved for an external program. Paperscan should focus on what it does best, which is to scan and save documents along with any metadata. Using this document data should be left to a document management tool. Since the data is stored in the file itself, you give your users the option to choose any document management tool they prefer, whether that is your upcoming Paperlight product or another vendor's tool.

Here are some ideas of key metadata to save and details on each one.

1) Creation Date
2) Title and Subject
3) Author
4) Keywords
5) Document Type
6) Document Index Date
7) Document Amount (not sure if this is the correct name to give it, but see the details below for what it is)

1) Creation Date

This is the date the document is created, i.e. scanned. It could be populated automatically by the program without any user interaction.

2) Title and Subject

Self-explanatory. XMP provides for multiple entries; however, for a first release, supporting just one entry to keep the implementation simpler would be fine.

It would also be a great feature if this field remembered the last 10 or 20 typed entries and displayed them in a drop-down selection field. This way the most recent entries could be reused quickly, which would help when the user is batch scanning similar documents and make the workflow more efficient.
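To illustrate the "remember the last 10 or 20 entries" idea, here is a minimal most-recently-used list in Python. The `RecentEntries` class is hypothetical (PaperScan exposes nothing like it); it simply shows that re-typing an old value moves it to the top and that the oldest entry drops off once the list is full.

```python
from collections import OrderedDict


class RecentEntries:
    """Hypothetical MRU list of typed values for one metadata field."""

    def __init__(self, capacity=20):
        self.capacity = capacity
        self._items = OrderedDict()  # keys only; insertion order = recency

    def add(self, value):
        value = value.strip()
        if not value:
            return  # ignore empty input
        # Re-adding an existing entry moves it to the most-recent position.
        self._items.pop(value, None)
        self._items[value] = None
        # Drop the oldest entries once the list exceeds its capacity.
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)

    def suggestions(self):
        """Entries for the drop-down, most recent first."""
        return list(reversed(self._items))


titles = RecentEntries(capacity=3)
for t in ["Bank statement", "Electric bill", "Receipt", "Electric bill", "Insurance"]:
    titles.add(t)
print(titles.suggestions())  # ['Insurance', 'Electric bill', 'Receipt']
```

A small capacity is used here only to make the eviction visible; the post suggests 10 or 20.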

3) Author

Self-explanatory. XMP provides for multiple entries; however, for a first release, supporting just one entry to keep the implementation simpler would be fine.

4) Keywords

A delimited list of common tags or keywords associated with a document. The beauty of tags versus folders is this: if you store a document in a physical folder on the hard drive, it can exist only in that one folder unless you duplicate the file, which is bad practice. With tags/keywords, a document can have multiple keywords, so a personal document could carry a "medical" keyword in addition to the "personal" keyword. A document management tool could then show the document in two keyword folders, once under personal and once under medical. The physical location of the document does not matter, yet you get the benefit of this multiple organization.
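The "one file, many keyword folders" idea amounts to an inverted index from keyword to files. A minimal sketch in Python, assuming a `;`-delimited keyword field and hypothetical file names; a real document manager would read the field from each file's metadata instead of a hard-coded dictionary.

```python
from collections import defaultdict


def parse_keywords(keyword_field, sep=";"):
    """Split a delimited keyword field into a set of normalised (lower-case) tags."""
    return {k.strip().lower() for k in keyword_field.split(sep) if k.strip()}


def build_keyword_index(documents):
    """Map each keyword to the set of files carrying it: a 'virtual folder' per tag."""
    index = defaultdict(set)
    for path, keyword_field in documents.items():
        for keyword in parse_keywords(keyword_field):
            index[keyword].add(path)
    return index


# Hypothetical files and keyword fields; each file exists only once on disk.
docs = {
    "scans/2012-04-02.pdf": "personal; medical",
    "scans/2012-04-05.pdf": "personal",
    "scans/2012-04-07.pdf": "Medical; receipt",
}
index = build_keyword_index(docs)
print(sorted(index["medical"]))   # ['scans/2012-04-02.pdf', 'scans/2012-04-07.pdf']
print(sorted(index["personal"]))  # ['scans/2012-04-02.pdf', 'scans/2012-04-05.pdf']
```

The first file shows up under both "personal" and "medical" without being copied.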

It would also be a great feature if this field remembered the last 10 or 20 typed entries and displayed them in a drop-down selection field, so the most recent entries could be reused quickly when batch scanning similar documents. Since the keyword field stores multiple entries, this would need to be a multi-selection drop-down.

5) Document Type

One very important XMP field to maintain would be "Document Type", with values such as: bank statement, bill, receipt, medical, invoice, contract, manual, letter, misc, etc.
As a start, I would provide the common ones in a drop-down selection list, and in a future update allow users to add their own.

6) Document Index Date

One other very powerful XMP metadata field that could be added is "Document Index Date", which would store a date that is meaningful to the document. This could be anything from an invoice date, receipt date, or billing date to a warranty expiration date.

7) Document Amount

This field would represent a key monetary amount associated with the document. For a receipt or invoice, it might be the total dollar amount. For an inventory item in a home, it might be the value of that item.

Search examples:
Thanks to this metadata, I could ask Paperlight to show me all medical documents (Document Type = medical) with a service date (Document Index Date) in the last 90 days and a billing amount (Document Amount) greater than $500.
Show me all documents with a document type of misc, with the keywords "inventory", "home" and "tools", and with a Document Amount over $200. In other words, show me all the tools I own that are worth over $200.
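Once the fields are stored in each file, both searches reduce to simple filters. The sketch below assumes a document manager has already read each file's metadata into a hypothetical `DocRecord`; the field names mirror the proposal, but nothing here reflects how Paperlight actually queries data.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta
from decimal import Decimal


@dataclass
class DocRecord:
    """Hypothetical record built from one file's metadata."""
    path: str
    doc_type: str                       # "Document Type"
    index_date: date | None = None      # "Document Index Date"
    amount: Decimal | None = None       # "Document Amount" (Decimal avoids float rounding)
    keywords: set[str] = field(default_factory=set)


def recent_medical_over(records, today, days=90, minimum=Decimal("500")):
    """Medical documents dated within the last `days` days with an amount above `minimum`."""
    start = today - timedelta(days=days)
    return [
        r for r in records
        if r.doc_type == "medical"
        and r.index_date is not None and start <= r.index_date <= today
        and r.amount is not None and r.amount > minimum
    ]


def tools_worth_over(records, minimum=Decimal("200")):
    """Misc documents tagged inventory + home + tools with an amount above `minimum`."""
    required = {"inventory", "home", "tools"}
    return [
        r for r in records
        if r.doc_type == "misc"
        and required <= r.keywords          # all required keywords present
        and r.amount is not None and r.amount > minimum
    ]


records = [
    DocRecord("a.pdf", "medical", date(2013, 2, 1), Decimal("650.00")),
    DocRecord("b.pdf", "medical", date(2012, 10, 1), Decimal("900.00")),
    DocRecord("c.pdf", "misc", None, Decimal("350.00"), {"inventory", "home", "tools"}),
    DocRecord("d.pdf", "misc", None, Decimal("120.00"), {"inventory", "home", "tools"}),
]
today = date(2013, 3, 10)
print([r.path for r in recent_medical_over(records, today)])  # ['a.pdf']
print([r.path for r in tools_worth_over(records)])            # ['c.pdf']
```

"b.pdf" is excluded because its index date is older than 90 days; "d.pdf" because its amount is below $200.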

Technical Implementation and compatibility details:

In addition to PDF, I suggest you also allow document metadata to be added to JPEG, since some users do not use PDF for document storage and use JPEG instead. Also, the information for both PDF and JPEG should be written to both the legacy metadata area and the newer, more powerful XML (XMP) metadata area. See the articles below.

JPEG/TIF metadata:
http://en.wikipedia.org/wiki/IPTC_Infor ... ange_Model
"...most image manipulation programs keep the XMP and non-XMP IPTC attributes synchronize"
http://www.organizepictures.com/2009/11 ... data-going

PDF Metadata:
http://www.pragmaticpdf.com/2009/07/inf ... adata.html

The point of this article is that the Information Dictionary is the legacy way to store metadata. While this is true, I disagree with the article's advice not to support both. Many programs, including Adobe Reader, display the document properties from the PDF Information Dictionary. So, although it is some extra work, I believe the developer should sync this data to both the legacy Information Dictionary and XMP.
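Keeping the two areas in sync is easiest if both are generated from one set of values. The sketch below uses only the Python standard library to build the Info dictionary entries and a matching XMP RDF fragment, using the usual property mapping from Adobe's XMP specification (Title to dc:title, Author to dc:creator, Subject to dc:description, Keywords to pdf:Keywords, CreationDate to xmp:CreateDate). The `; ` keyword delimiter and the sample values are assumptions, and the `<?xpacket?>` wrapper and the actual embedding into the PDF (which a PDF library would handle) are left out.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Standard XMP namespaces (Dublin Core, XMP basic, Adobe PDF schema).
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def q(prefix, tag):
    """Qualified ElementTree tag name, e.g. q("dc", "title")."""
    return "{%s}%s" % (NS[prefix], tag)


def to_utc(dt):
    # Naive datetimes are interpreted as local time by astimezone().
    return dt.astimezone(timezone.utc)


def build_info_dict(meta):
    """Legacy PDF Information Dictionary entries."""
    return {
        "/Title": meta["title"],
        "/Author": meta["author"],
        "/Subject": meta["subject"],
        "/Keywords": "; ".join(meta["keywords"]),  # delimiter is an assumption
        "/CreationDate": to_utc(meta["created"]).strftime("D:%Y%m%d%H%M%SZ"),
    }


def add_lang_alt(parent, prefix, tag, text):
    """Language-alternative property (used by dc:title and dc:description)."""
    alt = ET.SubElement(ET.SubElement(parent, q(prefix, tag)), q("rdf", "Alt"))
    li = ET.SubElement(alt, q("rdf", "li"))
    li.set(XML_LANG, "x-default")
    li.text = text


def build_xmp(meta):
    """XMP (RDF/XML) carrying the same values as the Info dictionary."""
    root = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(root, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"))
    desc.set(q("rdf", "about"), "")
    add_lang_alt(desc, "dc", "title", meta["title"])
    add_lang_alt(desc, "dc", "description", meta["subject"])  # Info /Subject
    creators = ET.SubElement(ET.SubElement(desc, q("dc", "creator")), q("rdf", "Seq"))
    ET.SubElement(creators, q("rdf", "li")).text = meta["author"]
    ET.SubElement(desc, q("pdf", "Keywords")).text = "; ".join(meta["keywords"])
    ET.SubElement(desc, q("xmp", "CreateDate")).text = to_utc(meta["created"]).isoformat()
    return ET.tostring(root, encoding="unicode")


# One source of truth, written to both places.
meta = {
    "title": "Electric bill",
    "author": "J. Smith",
    "subject": "Utilities",
    "keywords": ["personal", "bills"],
    "created": datetime(2012, 4, 10, 20, 9, 0, tzinfo=timezone.utc),
}
info = build_info_dict(meta)
xmp = build_xmp(meta)
print(info["/CreationDate"])  # D:20120410200900Z
```

Because both outputs come from the same `meta` values, a later edit only has to change one place for the Info dictionary and XMP to stay consistent.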

jfcarbel
Posts: 20
Joined: Mon Sep 26, 2011 6:34 am

Re: Store Document Data in files (i.e. Metadata support)

Post by jfcarbel » Sun Mar 10, 2013 12:18 pm

Are there any plans on implementing some metadata improvements in PDF saving?

At the very least, support for per-document PDF properties (likely in the save dialog), and automatically setting the creation date property to the current date.

Better yet would be adding support for keywords, with a drop-down selector of the most recently used or popular ones to choose from.

Next would be custom XMP document-management fields (Document-Type, Document-Index-Date, Document-Amount). These could be in preparation for PaperLight, so that users of Paperscan would be creating documents that PaperLight could index and search by metadata.

A roadmap would be nice to see, if your team has one. I think document metadata features would be a good direction for Paperscan in light of your upcoming Paperlight product. BTW, I signed up as a beta tester for Paperlight almost a year ago; has development on that planned product stopped?

Loïc
Site Admin
Posts: 96
Joined: Fri Nov 05, 2010 12:38 pm

Re: Store Document Data in files (i.e. Metadata support)

Post by Loïc » Wed Apr 10, 2013 9:29 pm

Hello Jeff,

I am sorry for the delay. We are intensively focused on the next major releases of our main products and on our areas of research. I can't provide an official roadmap, since we prefer to keep such things confidential. Also, we've considerably enlarged our team, which requires some training time.

That said, we will seriously consider all the feature requests you have made. The ones which have not been implemented yet require a lot of work and some refactoring of our code. PaperScan V2 should be released in the second half of this year, including a lot of new capabilities.

Hope this sheds some light :)

Cheers,

Loïc

jfcarbel
Posts: 20
Joined: Mon Sep 26, 2011 6:34 am

Re: Store Document Data in files (i.e. Metadata support)

Post by jfcarbel » Wed Apr 10, 2013 11:33 pm

>> PaperScan V2 should be released on the 2nd half of this year, including lot of new capabilities.

I understand, thanks for keeping these suggestions in mind.
I am looking forward to this and hope it includes some basic meta-data support.

jersy123
Posts: 4
Joined: Thu Nov 16, 2017 9:44 am

Re: Store Document Data in files (i.e. Metadata support)

Post by jersy123 » Wed May 23, 2018 1:11 pm

Thanks
