Read PDF Metadata

When you view a PDF, you can get information about it, such as the title, the fonts used, and security settings. Some of this information is set by the person who created the document, and some is generated automatically.

In Acrobat, you can change any information that can be set by the document creator, unless the file has been saved with security settings that prevent changes.


The Description tab shows basic information about the document. The title, author, subject, and keywords may have been set by the person who created the document in the source application, such as Word or InDesign, or by the person who created the PDF. You can search for these description items to find particular documents. The Keywords section can be particularly useful for narrowing searches.

Note that many search engines use the title to describe the document in their search results list. If a PDF does not have a title, the filename appears in the results list instead. A file’s title is not necessarily the same as its filename.
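The fallback described in the note can be sketched in a few lines of Python. This is only an illustration of the rule as stated here; the function name `display_name` is hypothetical, and real search engines may apply further heuristics.

```python
from pathlib import PurePath
from typing import Optional


def display_name(title: Optional[str], path: str) -> str:
    """Return the document title, or the filename when no title is set."""
    if title and title.strip():
        return title.strip()
    # No usable title: fall back to the last component of the path.
    return PurePath(path).name
```

A PDF saved as docs/report_v3.pdf with the title "Annual Report" would be listed under its title; without one, it would be listed as report_v3.pdf.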

All PDF documents contain information about the properties of the file. This includes who created the document, when it was created, what software was used to create it, what restrictions are in place, and more.

The Advanced area shows the PDF version, the page size, number of pages, whether the document is tagged, and if it’s enabled for Fast Web View. (The size of the first page is reported in PDFs or PDF Portfolios that contain multiple page sizes.) This information is generated automatically and cannot be modified.

The Security tab describes what changes and functionality are allowed within the PDF. If a password, certificate, or security policy has been applied to the PDF, the method is listed here.

The Fonts tab lists the fonts and the font types used in the original document, and the fonts, font types, and encoding used to display the original fonts.

If substitute fonts are used and you aren’t satisfied with their appearance, you may want to install the original fonts on your system or ask the document creator to re-create the document with the original fonts embedded in it.

The Initial View tab describes how the PDF appears when it’s opened. This includes the initial window size, the opening page number and magnification level, and whether bookmarks, thumbnails, the toolbar, and the menu bar are displayed. You can change any of these settings to control how the document appears the next time it is opened. You can also create JavaScript that runs when a page is viewed, a document is opened, and more.

The Custom tab lets you add document properties to your document.

The Advanced tab lists PDF settings, print dialog presets, and reading options for the document.

In the PDF settings for Acrobat, you can set a base Uniform Resource Locator (URL) for web links in the document. Specifying a base URL makes it easy for you to manage web links to other websites. If the URL to the other site changes, you can simply edit the base URL and not have to edit each individual web link that refers to that site. The base URL is not used if a link contains a complete URL address.
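The base URL behaviour can be illustrated with Python's standard `urllib.parse.urljoin`, which follows the generic URL resolution rules. This is a sketch of the idea only: the example URLs are made up, and Acrobat's own resolution logic may differ in edge cases.

```python
from urllib.parse import urljoin


def resolve_link(base_url: str, link: str) -> str:
    """Resolve a web link against the document's base URL.

    A link that is already a complete URL is returned unchanged,
    so the base URL only affects relative links.
    """
    return urljoin(base_url, link)


# Hypothetical values for illustration.
BASE = "https://example.com/docs/"
relative = resolve_link(BASE, "guide/intro.html")
absolute = resolve_link(BASE, "https://other.example.org/page")
```

If the target site moves, changing only `BASE` changes where every relative link points, while links that carry a complete URL are unaffected.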

You can also associate a catalog index file (PDX) with the PDF. When the PDF is searched with the Search PDF window, all of the PDFs that are indexed by the specified PDX file are also searched.

You can include prepress information, such as trapping, for the document. You can define print presets for a document, which prepopulate the Print dialog box with document-specific values. You can also set reading options that determine how the PDF is read by a screen reader or other assistive device.

You can add keywords to the document properties of a PDF that other people might use in a search utility to locate the PDF.

Click the Description tab, and type the author’s name, subject, and keywords.

(Optional) Click Additional Metadata to add other descriptive information, such as copyright information.

You can add custom document properties that store specific types of metadata, such as the version number or company name, in a PDF. Properties you create appear in the Document Properties dialog box. Properties you create must have unique names that do not appear in the other tabs in the Document Properties dialog box.

To add a property, type the name and value, and then click Add.

To change the properties, do any of the following, and then click OK:

To edit a property, select it, change the Value, and then click Change.
To delete a property, select it and click Delete.

To change the name of a custom property, delete the property and create a new custom property with the name you want.
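The add, change, delete, and rename rules above can be modelled as a small in-memory store. This is an illustrative sketch only: the class and method names are hypothetical, and the reserved-name set is an assumption based on the standard PDF document information keys, not a list taken from Acrobat.

```python
from typing import Dict

# Assumption: names already used by the built-in properties.
RESERVED_NAMES = {
    "Title", "Author", "Subject", "Keywords", "Creator",
    "Producer", "CreationDate", "ModDate", "Trapped",
}


class CustomProperties:
    """Hypothetical model of the custom property list."""

    def __init__(self) -> None:
        self._props: Dict[str, str] = {}

    def _check_new_name(self, name: str) -> None:
        if name in RESERVED_NAMES:
            raise ValueError(f"{name!r} is a built-in property name")
        if name in self._props:
            raise ValueError(f"{name!r} already exists")

    def add(self, name: str, value: str) -> None:
        self._check_new_name(name)
        self._props[name] = value

    def change(self, name: str, value: str) -> None:
        if name not in self._props:
            raise KeyError(name)
        self._props[name] = value

    def delete(self, name: str) -> str:
        return self._props.pop(name)

    def rename(self, old: str, new: str) -> None:
        # There is no in-place rename: delete the old property, add a new one.
        self._check_new_name(new)
        self.add(new, self.delete(old))

    def as_dict(self) -> Dict[str, str]:
        return dict(self._props)
```

Validating the new name before deleting the old property means a rejected rename leaves the existing property untouched.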

PDF documents created in Acrobat 5.0 or later contain document metadata in XML format. Metadata includes information about the document and its contents, such as the author’s name, keywords, and copyright information, that can be used by search utilities. The document metadata contains (but is not limited to) information that also appears in the Description tab of the Document Properties dialog box. Document metadata can be extended and modified using third-party products.

The Extensible Metadata Platform (XMP) provides Adobe applications with a common XML framework that standardizes the creation, processing, and interchange of document metadata across publishing workflows. You can save and import the document metadata XML source code in XMP format, making it easy to share metadata among different documents. You can also save document metadata to a metadata template that you can reuse in Acrobat.
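Because XMP is plain XML, a saved XMP file can be read with any XML parser. The following Python sketch pulls the title, creators, and keywords out of a hand-written sample packet using only the standard library. The sample content and the function name `read_xmp` are illustrative; real packets usually carry many more schemas and may structure some values differently.

```python
import xml.etree.ElementTree as ET

# Namespaces commonly used in XMP packets.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}

# Hand-written sample packet (an assumption, not output from Acrobat).
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Quarterly Report</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>A. Author</rdf:li></rdf:Seq></dc:creator>
   <pdf:Keywords>finance; q3</pdf:Keywords>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def read_xmp(xml_text: str) -> dict:
    """Extract a few descriptive fields from an XMP packet."""
    root = ET.fromstring(xml_text)

    def first_text(path: str):
        node = root.find(path, NS)
        return node.text if node is not None else None

    return {
        "title": first_text(".//dc:title/rdf:Alt/rdf:li"),
        "creators": [li.text for li in root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)],
        "keywords": first_text(".//pdf:Keywords"),
    }
```

Calling `read_xmp(SAMPLE_XMP)` returns a dictionary with the title "Quarterly Report", one creator, and the keyword string.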

Choose File > Properties, and click the Additional Metadata button in the Description tab.

Click Advanced to display all the metadata embedded in the document. (Metadata is displayed by schema, that is, in predefined groups of related information.) Display or hide the information in schemas by schema name. If a schema doesn’t have a recognized name, it is listed as Unknown. The XML namespace is contained in parentheses after the schema name.

Choose File > Properties, click the Description tab, and then click Additional Metadata.

To edit the metadata, do any of the following, and then click OK.

To add previously saved information, click Append, select an XMP or FFO file, and click Open.
To add new information and replace the current metadata with information stored in an XMP file, click Replace, select a saved XMP or FFO file, and click Open. New properties are added, existing properties that are also specified in the new file are replaced, and existing properties that are not in the replacement file remain in the metadata.
To delete an XML schema, select it and click Delete.
To append the current metadata with metadata from a template, hold down Ctrl (Windows) or Command (Mac OS) and choose a template name from the dialog box menu in the upper right corner.

Note:

You must save a metadata template before you can import metadata from a template.

To replace the current metadata with a template of metadata, choose a template file (XMP) from the dialog box menu in the upper right corner.
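The Replace behaviour described above amounts to a key-by-key merge, sketched below in Python with metadata modelled as a flat dictionary. The function names are hypothetical. The Append function assumes that existing values are kept and only missing properties are added; the text does not spell this out, so treat that part as an assumption.

```python
def replace_metadata(current: dict, incoming: dict) -> dict:
    """Replace: incoming values win; properties absent from incoming remain."""
    merged = dict(current)
    merged.update(incoming)
    return merged


def append_metadata(current: dict, incoming: dict) -> dict:
    """Append (assumed semantics): add new properties, keep existing values."""
    merged = dict(incoming)
    merged.update(current)
    return merged


# Illustrative values only.
current = {"Author": "A. Author", "Subject": "Draft"}
incoming = {"Subject": "Final", "Copyright": "2024 Example Corp"}
```

With these values, Replace yields the subject "Final" while Append keeps "Draft"; both keep the author and add the copyright entry.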

Choose File > Properties, click the Description tab, and then click Additional Metadata.

To save the metadata to an external file, click Save and name the file. The metadata is stored as a file in XMP format. (To use the saved metadata in another PDF, open the document and use these instructions to replace or append metadata in the document.)
To save the metadata as a template, choose Save Metadata Template from the dialog box menu in the upper right corner, and name the file.

You can view the metadata information of certain objects, tags, and images within a PDF. You can edit and export metadata for Visio objects only.

Use the Object Data tool to view object grouping and object data.

Select an object, right-click the selection, and choose Show Metadata. (If Show Metadata is unavailable, the image has no metadata associated with it.)

Double-click an object on the page to show its metadata.

The Model Tree opens and shows a hierarchical list of all structural elements. The selected object’s metadata appears as editable properties and values at the bottom of the Model Tree.

Note:

The selected object is highlighted on the page. Use the Highlight Color menu at the top of the Model Tree to choose a different color.

To edit the metadata, type in the boxes at the bottom of the Model Tree.

To export object metadata, from the options menu, choose Export As XML > Whole Tree to export all objects in the Model Tree, or choose Export As XML > Current Node to export only the selected object and its children. Name and save the file.

Double-click an object on the page to show its metadata.

From the options menu, choose one of the following:

Choose Export As XML > Whole Tree to export all objects.
Choose Export As XML > Current Node to export only the selected object and its children.


I'm trying to read metadata attached to arbitrary PDFs: title, author, subject, and keywords.

Is there a PHP library, preferably open-source, that can read PDF metadata? If so, or if there isn't, how would one use the library (or lack thereof) to extract the metadata?

To be clear, I'm not interested in creating or modifying PDFs or their metadata, and I don't care about the PDF bodies. I've looked at a number of libraries, including FPDF (which everyone seems to recommend), but it appears only to be for PDF creation, not metadata extraction.

user113292

6 Answers

The Zend framework includes Zend_Pdf, which makes this really easy:

Limitations: this works only on unencrypted files smaller than 16 MB.



I don't know of any libraries, but a simple way to achieve the same result might be to fopen the file and parse everything that comes after the last 'endstream'.

Try opening a PDF in a text editor; a parser shouldn't take more than five lines.
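A rough sketch of that idea, written in Python rather than PHP, is shown below. It is deliberately naive and rests on several assumptions: the file is unencrypted, the information dictionary sits after the last 'endstream', and the values are simple literal strings without nested or escaped parentheses. Many real PDFs break at least one of these assumptions, so a proper parser is the safer choice. The function name `read_info_naive` is made up for this example.

```python
import re

FIELDS = ("Title", "Author", "Subject", "Keywords")


def read_info_naive(pdf_bytes: bytes) -> dict:
    """Scan the tail of a PDF for simple /Key (value) pairs."""
    tail_start = pdf_bytes.rfind(b"endstream")
    # If no stream is present, fall back to scanning the whole file.
    tail = pdf_bytes[tail_start:] if tail_start != -1 else pdf_bytes
    info = {}
    for field in FIELDS:
        pattern = rb"/" + field.encode("ascii") + rb"\s*\(([^()]*)\)"
        match = re.search(pattern, tail)
        if match:
            info[field] = match.group(1).decode("latin-1")
    return info


# Tiny synthetic file body for illustration; not a complete, valid PDF.
SAMPLE = (
    b"%PDF-1.4\n1 0 obj\n<< /Length 3 >>\nstream\nabc\nendstream\nendobj\n"
    b"2 0 obj\n<< /Title (Sample Doc) /Author (Jane Roe) >>\nendobj\n"
    b"trailer\n<< /Info 2 0 R >>\n%%EOF\n"
)
```

For a real file, the bytes would come from `open(path, "rb").read()`.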

cbrandolino

PDF Parser does exactly what you want and it's pretty straightforward to use:

You can try it in the demo page.

Alessandro Cosentino

I was looking for the same thing today. And I came across a small PHP class over at http://de77.com/ that offers a quick and dirty solution. You can download the class directly. Output is UTF-8 encoded.

The creator says:

Here’s a PHP class I wrote which can be used to get title & author and a number of pages of any PDF file. It does not use any external application - just pure PHP.

For me, it works! All thanks go solely to the creator of the class ... well, maybe just a little bit of thanks to me too for finding the class ;)

maxpower9000

You may use PDFtk to extract the page count:

If ImageMagick is available you may also use:

Execute in PHP via shell_exec():
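As a sketch of the same shell-out approach in Python rather than PHP, the code below runs `pdftk <file> dump_data` and parses its text report. It assumes pdftk is installed and on the PATH, and that the report uses `InfoKey:` / `InfoValue:` pairs and a `NumberOfPages:` line. The function names are hypothetical.

```python
import subprocess


def parse_dump_data(text: str) -> dict:
    """Parse the text report produced by `pdftk <file> dump_data`."""
    result = {"info": {}, "pages": None}
    key = None
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue  # lines such as "InfoBegin" carry no value
        if name == "InfoKey":
            key = value
        elif name == "InfoValue" and key is not None:
            result["info"][key] = value
            key = None
        elif name == "NumberOfPages":
            result["pages"] = int(value)
    return result


def dump_data(pdf_path: str) -> dict:
    """Run pdftk on a file and return the parsed report."""
    completed = subprocess.run(
        ["pdftk", pdf_path, "dump_data"],
        capture_output=True, text=True, check=True,
    )
    return parse_dump_data(completed.stdout)
```

Passing the arguments as a list avoids shell quoting problems with unusual filenames, which is a common pitfall when building a command string by hand.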

maxpower9000
