I have merged many PDFs into a single PDF and added metadata that includes the two fields "Created" and "Modified", but these fields still do not display any information. Here's my source code:
import re
import os
import fitz
from datetime import datetime

def importMetaData(path):
    regex = r"^r20ut(\d+)ej(\d+)$"
    r_UM = re.compile(regex)
    extension = [".pdf"]
    now = datetime.now()  # current date and time
    date_time = now.strftime("%m/%d/%Y %H:%M:%S %p")
    print("date and time:", date_time)
    Number = ""
    for root, dirs, files in os.walk(path):
        for file in files:
            ext = os.path.splitext(file)[-1].lower()
            f_name = os.path.splitext(file)[0]
            if ext in extension:
                if r_UM.search(f_name) is not None:
                    if root.endswith("thuan1"):
                        Number = dictRNumber["code1"]
                    elif root.endswith("thuan2"):
                        Number = dictRNumber["code2"]
                    else:
                        continue
                    inforPDF = fitz.open(os.path.join(root, file))
                    inforPDF.set_metadata({})
                    inforPDF.set_metadata(
                        {
                            "producer": "Microsoft® Word for Office 365",
                            "author": "Thuan",
                            "modDate": date_time,
                            "title": "Data Analysis",
                            "creationDate": date_time,
                            "creator": "Microsoft® Word for Office 365",
                            "subject": Number
                        })
                    inforPDF.save(os.path.join(root, f_name + ".pdf"))
Expected behavior: the two fields "Created" and "Modified" should display the date and time.
I have also posted this question on Stack Overflow:
https://stackoverflow.com/questions/66027402/fields-created-and-modified-in-document-properties-pdf-were-not-displayed
You used an illegal date/time format - see the documentation:
* If the date fields contain valid data (which need not be the case at all!), they are strings in the PDF-specific timestamp format "D:<TS><TZ>", where
- <TS> is the 14 character ISO timestamp YYYYMMDDhhmmss (YYYY - year, MM - month, DD - day, hh - hour, mm - minute, ss - second), and
- <TZ> is a time zone value (time interval relative to GMT) containing a sign ('+' or '-'), the hour (hh), and the minute ('mm', note the apostrophes!).
* A Paraguayan value might hence look like D:20150415131602-04'00', which corresponds to the timestamp April 15, 2015, at 1:16:02 pm local time Asuncion.
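If you need a timestamp for a moment other than "now", the format described above can be built with the standard library alone. This is only a minimal sketch: `to_pdf_timestamp` is a hypothetical helper name, not part of PyMuPDF, and it assumes that a datetime without time zone information means local time.

```python
from datetime import datetime, timedelta, timezone


def to_pdf_timestamp(dt):
    """Return dt as a PDF date string: D:YYYYMMDDhhmmss+hh'mm' (hypothetical helper)."""
    if dt.tzinfo is None:
        # Assumption: a naive datetime is interpreted as local time.
        dt = dt.astimezone()
    total_minutes = int(dt.utcoffset().total_seconds() // 60)
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"D:{dt:%Y%m%d%H%M%S}{sign}{hours:02d}'{minutes:02d}'"


# The Asuncion example from the documentation quoted above (GMT-4).
asuncion = timezone(timedelta(hours=-4))
print(to_pdf_timestamp(datetime(2015, 4, 15, 13, 16, 2, tzinfo=asuncion)))
# prints D:20150415131602-04'00'
```

The returned string could then be used as the value of "creationDate" and "modDate" in the dictionary passed to set_metadata().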
If you put in a self-invented format, PDF viewer applications may or may not understand it.
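To catch such values before they are written, a quick check can be run on the string first. This is a rough sketch: `looks_like_pdf_date` is a hypothetical name, and the pattern deliberately accepts only the full "D:<TS><TZ>" form quoted above, so it is stricter than what viewers may tolerate.

```python
import re

# Full form only: "D:", 14 digits, a sign, hh, apostrophe, mm, apostrophe.
PDF_DATE = re.compile(r"D:\d{14}[+-]\d{2}'\d{2}'")


def looks_like_pdf_date(value):
    """Return True if value has the shape D:YYYYMMDDhhmmss+hh'mm' (hypothetical helper)."""
    return PDF_DATE.fullmatch(value) is not None


print(looks_like_pdf_date("D:20210207070439-03'00'"))  # prints True
print(looks_like_pdf_date("02/07/2021 07:04:39 AM"))   # prints False
```

The second string has the shape produced by the strftime pattern "%m/%d/%Y %H:%M:%S %p" in the code from the question, which is why it is rejected. Note that the check only looks at the shape of the string, not at whether the digits form a real calendar date.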
Why don't you use fitz.getPDFnow() for the current timestamp?
>>> import fitz
>>> print(fitz.getPDFnow())
D:20210207070439-03'00'
>>>
BTW you do not need to first empty the metadata via inforPDF.set_metadata({})
Dear JorjMcKie-san,
Thank you so much for your support.
I got it.
In the future, please do not hesitate or wait to ask. I am always trying to help people using the package as soon as I can.
You may also want to post a question under "Discussions" (top menu item). Apart from myself, other people may be there to answer ... or learn from your postings.