by Thom Parker of WindJack Solutions.
Copyright © 2004 by WindJack Solutions
Explore Annotations in a PDF Document
by Thom Parker
This article discusses how you can use PDF CanOpener to find and identify the parts of an Annotation Object in a PDF Document.
PDF File Used in this article:
WJAnnotExample1.pdf
General:
The PDF Content, the stuff in the Page Content Stream, is static; it's just like a picture. With PDF 1.5, Adobe added Optional Content Groups, which gave the content some minor interactivity, but it's the Annotations that really liven up PDF Documents. Form Fields, Links, Multimedia Objects, Notes, Highlights, Stamps, and anything else on a page that responds to the user is an Annotation. Annotations are primarily a way of extending the PDF with location-oriented, user-interactive functionality. To a lesser extent they are also used as local storage objects. An Annotation is a live object only if an appropriate Annotation Handler is available in the viewer application. The Handler is a set of functions for responding to mouse, keyboard, and drawing events.
The PDF 1.5 Spec. defines a rich set of Annotations, and the handlers for most of them are built into Acrobat. If you want to make your own, the Acrobat SDK provides facilities for creating custom Annotation Handlers. This Article will look at the "Object Views" of four different Annotations, one of which will be a simple custom Annotation.
Where is that Annot?
Every Annotation is associated with a specific page and a specific location on that page, even if it has no graphical representation. As such, each Page that has Annotations on it has its own "Annots" Array in the Page's Dictionary. The Annotation Objects in this array are unique: they don't (or shouldn't, anyway) appear in any other "Annots" array on any other page. So the first place to look for an Annotation is always this array on the page of interest.
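The uniqueness rule above is easy to check mechanically. The sketch below is a minimal, hypothetical illustration, not PDF CanOpener's or Acrobat's code: it assumes the document has already been parsed into plain Python dictionaries, with Python object identity standing in for PDF indirect references, and it reports any Annotation that shows up in more than one page's "Annots" array.

```python
from collections import Counter

# Hypothetical model: each page is a dict; its optional "Annots" list holds
# annotation dicts. Object identity (id) stands in for an indirect reference.
def shared_annotations(pages):
    """Return annotations listed in the Annots arrays of more than one page."""
    page_count = Counter()
    by_id = {}
    for page in pages:
        # Collapse repeats within a single page; only cross-page sharing counts.
        unique_on_page = {id(a): a for a in page.get("Annots", [])}
        for key, annot in unique_on_page.items():
            page_count[key] += 1
            by_id[key] = annot
    return [by_id[key] for key, n in page_count.items() if n > 1]


link = {"Subtype": "Link", "Rect": [72, 700, 200, 720]}
note = {"Subtype": "Text", "Rect": [300, 500, 320, 520]}
pages = [{"Annots": [link, note]}, {"Annots": [note]}, {}]
print([a["Subtype"] for a in shared_annotations(pages)])  # ['Text']
```

The sample page data is made up for the illustration.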
PDF CanOpener offers several different ways to find Annotations.
Tree Walking:
- Open the sample document in Acrobat and activate PDF CanOpener.
The PDF CanOpener display will show the root node of the
document's CosObj tree.
- Open the "Pages" entry.
- Open the "Kids" entry. This is the list of pages for
this document. There is only one page.
- Open the Page Object.
- The "Annots" array entry in the Page Dictionary is the
list of Annotations contained on this page. Open it.
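The same walk can be sketched in code. This hypothetical example again uses plain dictionaries for Cos objects; it descends the page tree through "Kids" (a larger document may nest intermediate "Pages" nodes, so the walk is recursive) and collects each page's "Annots" array. Parsing the file and resolving indirect references are left out.

```python
def iter_pages(node):
    """Yield leaf Page dictionaries of a page tree in document order."""
    if node.get("Type") == "Pages":
        # Intermediate node: visit its children in order.
        for kid in node.get("Kids", []):
            yield from iter_pages(kid)
    else:
        # Leaf node: an actual Page dictionary.
        yield node


def annotations_by_page(catalog):
    """Map each zero-based page index to that page's Annots list (may be empty)."""
    return {index: page.get("Annots", [])
            for index, page in enumerate(iter_pages(catalog["Pages"]))}


page = {"Type": "Page", "Annots": [{"Subtype": "Link"}, {"Subtype": "Highlight"}]}
catalog = {"Type": "Catalog",
           "Pages": {"Type": "Pages", "Kids": [page], "Count": 1}}
for index, annots in annotations_by_page(catalog).items():
    print(index, [a["Subtype"] for a in annots])  # 0 ['Link', 'Highlight']
```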
Snap To and Highlight Annotations in the Document Page View:
- Right-click anywhere in the tree view portion of the PDF CanOpener Display to bring up the popup context menu. Make sure both the "Highlight Selected" and "Snap to Selected" options are checked.
- Click on any one of the Annotation Objects in the "Annots" array. The Page View will scroll to make the selected Annotation visible, and the Annotation will be highlighted with a blue rectangle. This is an easy way to see which Annotation in the object tree is which Annotation on the page.
(Screen shot of the sample PDF Document.)
Object Selector Tool:
In the above image three methods are shown for activating the Selector Tool. The Selector Tool is the easiest way to locate a specific Annotation in the COS Object tree from the Page View.
- After you activate the Selector Tool, the Page View cursor changes to the Selector cursor.
- As the cursor passes over an Annotation, the Annotation is highlighted.
- Click on the Annotation to navigate to its representation in the Object Tree. It's that easy.
- Click on it a second time and the Annotation dictionary expands.
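Behind the highlighting, a tool like this has to decide which Annotations lie under the cursor. A minimal sketch of that test compares a point against each Annotation's "Rect". It assumes the cursor position has already been converted to the page's user-space coordinates (that conversion depends on zoom, scrolling, and page rotation and is omitted), and it is an illustration of the idea, not how PDF CanOpener is implemented.

```python
def normalize_rect(rect):
    """Return (llx, lly, urx, ury); a PDF Rect may list its corners in either order."""
    x1, y1, x2, y2 = rect
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


def annotations_at(annots, x, y):
    """Return the annotations whose Rect contains the user-space point (x, y)."""
    hits = []
    for annot in annots:
        llx, lly, urx, ury = normalize_rect(annot["Rect"])
        if llx <= x <= urx and lly <= y <= ury:
            hits.append(annot)
    return hits


annots = [{"Subtype": "Link", "Rect": [100, 700, 200, 720]},
          {"Subtype": "Highlight", "Rect": [300, 660, 50, 650]}]  # corners reversed
print([a["Subtype"] for a in annotations_at(annots, 150, 710)])  # ['Link']
print([a["Subtype"] for a in annotations_at(annots, 60, 655)])   # ['Highlight']
```

Normalizing first matters because the spec permits any two diagonally opposite corners in a rectangle, not only lower-left and upper-right.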
Simple Link Annotation:
Shown below are the contents of a Link Annotation Dictionary. This Annotation Type is one of the simplest and most utilized of the built-in Annotations.
This Link Annotation has no static visual representation, so it is missing some entries that are common to most Annotation types; we'll get to those later. The Info Window, immediately below the CosObj tree display, shows us that the Link Annotation Object is really an indirect reference to a Cos Dictionary. All Annotations are indirect objects so they can be referenced from other locations. Some of the entries shown are specific to the "Link" Annot and some are common to all Annots.
Common Annotation Entries in the Link Object:
"Type" - Optional entry. Cannot be relied on for programmatically identifying Annotations.
"Subtype" - Required. Identifies the Annotation type, here "Link".
"Rect" - Required. Location on the page in User Coordinates.
"Border" - Border array. The original PDF implementation of border styling. Redundant here, but needed for backward compatibility.
"BS" - Border Style dictionary, introduced in PDF 1.2.
"A" - Action to take when the Annotation is clicked on. Common to all types, but really only used in a few.
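Because "Type" is optional, code that needs to recognize an Annotation is better off keying on "Subtype" and "Rect". The sketch below shows that check, plus one reasonable way to resolve the "Border"/"BS" redundancy: prefer "BS" when present (the spec allows "Border" to be ignored in favor of "BS" in PDF 1.2 and later), otherwise read the older "Border" array, with the spec defaults (a "BS" width of 1, a "Border" of [0 0 1]) filling in missing values. The dictionary model, function names, and sample values are illustrative assumptions.

```python
def annotation_kind(obj):
    """Return the Subtype of a usable annotation dictionary, else None."""
    if obj.get("Type", "Annot") != "Annot":
        return None  # Type is optional, but if present it must be Annot.
    if "Subtype" not in obj or "Rect" not in obj:
        return None  # Both entries are required.
    return obj["Subtype"]


def border_width(annot):
    """Effective border width, preferring BS (PDF 1.2) over Border."""
    bs = annot.get("BS")
    if bs is not None:
        return bs.get("W", 1)                 # BS W defaults to 1
    return annot.get("Border", [0, 0, 1])[2]  # Border defaults to [0 0 1]


link = {"Subtype": "Link", "Rect": [72, 700, 200, 720],
        "Border": [0, 0, 0], "BS": {"W": 0, "S": "S"}}
print(annotation_kind(link), border_width(link))  # Link 0
```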
The "H" entry is the only one shown here that is specific to the Link Annotation; it selects the type of highlight the Handler draws when the link is clicked (while the mouse button is pressed inside it). Another entry specific to the Link Annotation, but not present here, is the "Dest" entry, which holds a Destination Object. In this Annotation the "A" entry is taking its place.
The Link Annotation will even work with fewer entries. Delete the "BS", "Border", and "H" entries, and the Annotation Handler will supply default values for them.
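What "supply default values" amounts to can be sketched as a copy with the omitted entries filled in. The values below are the defaults given in the PDF Reference: "I" (invert) for "H", and [0 0 1] for "Border", which describes a solid border 1 unit wide. The function name and dictionary model are hypothetical; this sketch only returns a filled-in copy and leaves the original dictionary untouched.

```python
# Defaults from the PDF Reference for two optional Link entries.
LINK_DEFAULTS = {
    "H": "I",             # highlighting mode: invert the annotation rectangle
    "Border": [0, 0, 1],  # no corner radius, solid border 1 unit wide
}


def with_link_defaults(link):
    """Return a copy of a Link annotation with missing H and Border filled in."""
    filled = dict(link)
    for key, value in LINK_DEFAULTS.items():
        if key not in filled:
            # Copy lists so callers can't mutate the shared default.
            filled[key] = list(value) if isinstance(value, list) else value
    return filled


bare_link = {"Subtype": "Link", "Rect": [72, 700, 200, 720]}
print(with_link_defaults(bare_link))
# {'Subtype': 'Link', 'Rect': [72, 700, 200, 720], 'H': 'I', 'Border': [0, 0, 1]}
```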
A more complex Annotation: the Highlight
The Highlight Annotation features a visual representation on the page and a text input box. There are a lot more entries in this Annotation than in the Link example. These entries are common to a large group of built-in Annotations used for text markup and commenting.
"F" (Flags) - Determines how Acrobat (not necessarily the Handler) treats the Annotation. It sets characteristics like visibility and printability. A very common entry for all Annotations that have a static appearance on the page.
"Subj" - User-entered string, from the "Subject" field in the Properties Dialog.
"CreationDate" - Check the PDF Spec, section 3.8.3, for format details.
"NM" - The Annotation's Name, auto-generated by Acrobat. This string is the name you pass, along with the page number, to the JavaScript method "doc.getAnnot()".
"C" - This array is the fill color used by the Annotation Handler whenever it needs to draw something, like the popup text box. For example, the Highlight in the sample PDF is yellow, and by default the popup is also drawn in yellow. So the three numbers in the "C" array are the RGB values for yellow, [1,1,0]. If you change these to [1,0,0] for red, the highlight color is unchanged, but the popup is now drawn in red.
"M" - Modified date.
"P" - Reference to the Page Object on which the Annotation appears. Typically present in every Annotation with an appearance. It is an aid to navigation, and it fixes the Annotation's association with the page.
"T" - User-entered string, from the "Author" field in the Properties Dialog.
"StructParent" - Associates this Annotation with a node somewhere in the logical structure tree ("StructTreeRoot") in the document catalog.
"QuadPoints" - This entry is specific to the set of Text Markup Annotations (highlights, underlines, and strikeouts). For a single highlighted region like this one, it's an array of 4 points (8 numbers) giving the vertices of the quadrilateral that bounds the affected text on the page; markup spanning several lines holds one such set of 4 points per quadrilateral. Using 4 points, rather than a 2-corner rectangle, allows the representation of rotated rectangles.
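Two of the entries above are compact encodings worth unpacking: "F" is a set of bit flags, and "QuadPoints" is a flat list of coordinates. The sketch below decodes the flag names defined through PDF 1.5 and computes the overall bounding box of all the quadrilaterals in a "QuadPoints" array. Taking the minimum and maximum coordinates does not depend on the order of the four points, so it works regardless of the order in which a producer lists them. The function names and sample values are hypothetical.

```python
# Annotation flag bits (1-based positions) defined through PDF 1.5.
ANNOT_FLAGS = {1: "Invisible", 2: "Hidden", 3: "Print", 4: "NoZoom",
               5: "NoRotate", 6: "NoView", 7: "ReadOnly", 8: "Locked",
               9: "ToggleNoView"}


def decode_flags(f):
    """List the names of the flags that are set in an annotation's F value."""
    return [name for bit, name in ANNOT_FLAGS.items() if f & (1 << (bit - 1))]


def quad_bounds(quad_points):
    """Bounding box (llx, lly, urx, ury) of all quadrilaterals in QuadPoints."""
    if not quad_points or len(quad_points) % 8 != 0:
        raise ValueError("QuadPoints must contain 8 x n numbers")
    xs = quad_points[0::2]  # even positions are x coordinates
    ys = quad_points[1::2]  # odd positions are y coordinates
    return min(xs), min(ys), max(xs), max(ys)


print(decode_flags(4))  # ['Print']
# One quadrilateral covering a single line of highlighted text.
print(quad_bounds([72, 720, 300, 720, 72, 706, 300, 706]))  # (72, 706, 300, 720)
```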
The two most important entries of the Highlight Annotation for this discussion are "AP" and "Popup". The "AP", or Appearance Dictionary, contains the Annotation's graphical appearance in a Form XObject. This Annotation has only one appearance, but an Annotation may have many, each representing a different visual state. The PDF Spec divides these appearances into three broad categories: Normal, Down, and Rollover. In this example the Highlight Annotation has only one appearance, so by default it must be the "N" (Normal) entry. The Down appearance is used when the Annotation is clicked on, and the Rollover appearance is used when the cursor passes over the Annotation. If any of these categories has more than one appearance, then the corresponding "N", "D", or "R" entry becomes a Dictionary Object.
The Appearance Streams in these dictionaries are named according to visual states defined by the Annotation Handler. The Annotation dictionary must then also contain an "AS" (Appearance State) entry that tells the Annotation Handler, or Acrobat if the handler is missing, which appearance to draw on the page. We'll see this later in the discussions of both Form Fields and the Custom Annotation.
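The lookup just described, choosing a stream from "N", "D", or "R" and consulting "AS" when the entry is a dictionary of states, can be sketched as follows. The "FormXObject" class is a hypothetical stand-in for a parsed Form XObject stream, and the state and label names in the sample data are made up apart from "Sel1" and "Off". Per the spec, a missing "D" or "R" entry falls back to the "N" appearance.

```python
from dataclasses import dataclass


@dataclass
class FormXObject:
    """Hypothetical stand-in for a parsed Form XObject appearance stream."""
    label: str


def select_appearance(annot, mode="N"):
    """Return the appearance stream for mode "N", "D", or "R", or None."""
    ap = annot.get("AP")
    if ap is None:
        return None
    entry = ap.get(mode, ap.get("N"))  # D and R default to the N appearance
    if isinstance(entry, dict):
        # A dictionary of named states: AS picks one of them.
        return entry.get(annot.get("AS"))
    return entry  # a single stream (or None if N is missing too)


highlight = {"AP": {"N": FormXObject("yellow box")}}
radio = {"AS": "Sel1",
         "AP": {"N": {"Sel1": FormXObject("dot"), "Off": FormXObject("empty")},
                "D": {"Sel1": FormXObject("dot, pressed"),
                      "Off": FormXObject("empty, pressed")}}}
print(select_appearance(highlight, "R").label)  # yellow box
print(select_appearance(radio, "D").label)      # dot, pressed
```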
The "Popup" entry is another Annotation. In this case it is used as an input window. If you look inside it you'll see it does not have an "AP" entry; the Annotation Handler is responsible for drawing it on the page and processing the user input. The Popup is connected to the Highlight Annotation through the "Parent" entry of its dictionary. Popup Annotations are used as a kind of local storage for the Annotation. One use is to store a history of "Comment Status" changes made to an Annotation. In this case, the "Popup" Annotation is just a way of keeping track of the last position and state of the input box.
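The two-way link between a markup Annotation and its Popup can be verified with a short check: the markup's "Popup" entry should lead to a "Popup" Annotation whose "Parent" entry leads back to the same markup object. In this hypothetical sketch, Python object identity again stands in for indirect references, and the coordinates are made up.

```python
def popup_is_linked(markup):
    """True if markup["Popup"] is a Popup annotation whose Parent is markup."""
    popup = markup.get("Popup")
    return (popup is not None
            and popup.get("Subtype") == "Popup"
            and popup.get("Parent") is markup)


highlight = {"Subtype": "Highlight", "Rect": [72, 706, 300, 720]}
popup = {"Subtype": "Popup", "Rect": [310, 600, 490, 720],
         "Parent": highlight, "Open": False}
highlight["Popup"] = popup
print(popup_is_linked(highlight))  # True
```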
Form Fields:
Form Field Objects are all kept in the "AcroForm" entry of the Document Catalog, so they have global, or Document, Scope, meaning they have the same value everywhere in the document. You cannot have two Form Fields with the same name in the same document. But, you say, Form Fields have a graphical appearance on specific pages in the document. And furthermore, you can copy them all over the place, so you can have lots of Form Fields with the same name. Well, not really, as I'll explain a little later.
The screen shot below shows the "AcroForm" dictionary for the document displayed on the right. This document has 4 Form Fields: one text field and 3 radio buttons. The "Fields" entry of the AcroForm Dictionary contains a list of all the Form Fields in this document. It only contains 2 entries, and one of those isn't even a Form Object; it's an Annotation. What gives?
The Widget Annotation is the Form Field's graphical representation on a specific page. Each Form Field has one Widget Annotation for each place it is used in the document. If there is only one location in a document where a Form Field is used, Acrobat combines the Form Field Dictionary and the Widget Annotation into a single object, like the Text Field in this example. If a Form Field is used more than once, Acrobat puts it in a proper Form Field Object, and the location-specific Widget Annotations are stuffed into its "Kids" entry. See the Radio Button entries in the above example.
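Finding every widget of every field therefore means handling both layouts: a field with a "Kids" array has separate Widget Annotations, while a field without one is assumed to be merged with its single widget. The hypothetical sketch below walks the "Fields" array that way and also builds the fully qualified field name, joining the partial "T" names of nested fields with periods; kids without a "T" entry are treated as widgets of their parent field. The field names "Name" and "Choice" are made up.

```python
def iter_widgets(field, parent_name=""):
    """Yield (fully_qualified_name, widget) pairs for a field and its kids."""
    partial = field.get("T")
    if partial is None:
        name = parent_name                 # a widget kid: no name of its own
    elif parent_name:
        name = f"{parent_name}.{partial}"  # nested field: "parent.child"
    else:
        name = partial                     # top-level field
    kids = field.get("Kids")
    if not kids:
        # No Kids: assume the field and its single widget are merged.
        yield name, field
        return
    for kid in kids:
        yield from iter_widgets(kid, name)


text_field = {"T": "Name", "FT": "Tx", "Subtype": "Widget", "Rect": [72, 700, 272, 720]}
buttons = [{"Subtype": "Widget", "AS": "Off"} for _ in range(3)]
radio_field = {"T": "Choice", "FT": "Btn", "Kids": buttons}
acroform = {"Fields": [text_field, radio_field]}
for field in acroform["Fields"]:
    for name, widget in iter_widgets(field):
        print(name, widget["Subtype"])
# Name Widget
# Choice Widget
# Choice Widget
# Choice Widget
```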
For the Form Field to be visible on a page, it must have a
Widget Annotation in the page's "Annots" Array. Here is a
view of the same Radio Button Widget as above inside the "Annots"
Array of a Page Dictionary.
There aren't too many things here that are different from the other Annotation Types. The big difference between this Annotation and the earlier examples is the "AP" entry. This one has 4 Appearance Streams, two each in the normal appearance entry "N" and the down appearance entry "D": one appearance for each of the two states this radio button can have, "Off" or "Sel1". "Sel1" is the export value of this button, set by the user. The Annotation's "AS" entry tells Acrobat which of these XObjects to use when drawing it on the page. It is the Annotation Handler's responsibility to set this value.
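Selecting a radio button therefore comes down to rewriting "AS" values. A simplified sketch of what a form handler might do when one button in the group is clicked: turn on the chosen widget by setting its "AS" to its own non-"Off" appearance name, switch every other widget to "Off", and record the chosen name as the field's value "V". Only "Sel1" comes from the example; "Sel2" and "Sel3" are hypothetical export values, appearance streams are reduced to placeholder strings, and real handlers also honor field flags such as NoToggleToOff, which this sketch ignores.

```python
def on_state(widget):
    """Return the widget's 'on' appearance name: the N state that isn't Off."""
    return next((s for s in widget["AP"]["N"] if s != "Off"), None)


def select_radio(field, chosen):
    """Turn on one kid widget of a radio button field and turn the rest Off."""
    for kid in field["Kids"]:
        kid["AS"] = on_state(kid) if kid is chosen else "Off"
    field["V"] = on_state(chosen)  # the field value names the selected state


def make_button(export_value):
    # Appearance streams are reduced to placeholder strings for this sketch.
    return {"Subtype": "Widget", "AS": "Off",
            "AP": {"N": {export_value: "on stream", "Off": "off stream"}}}


kids = [make_button("Sel1"), make_button("Sel2"), make_button("Sel3")]
choice = {"T": "Choice", "FT": "Btn", "Kids": kids}
select_radio(choice, kids[1])
print(choice["V"], [k["AS"] for k in kids])  # Sel2 ['Off', 'Sel2', 'Off']
```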
Form Fields are highly interactive. The Field is "Active"
whenever it has the mouse and keyboard focus, for example, when
the user enters text into a Text Field or clicks on a button.
In this state the Form Handler is responsible for drawing
the Widget Annotation's Appearance. The Form Field becomes
"Inactive" when it loses focus. At this point the Form Field's static
appearance (the Widget's Appearance Stream,
or "AP" entry) needs to be changed to reflect the user's
changes.
If this change requires regenerating the Appearance Stream, then the "MK" entry provides the Handler with hints on how to do this. The entries in the "MK" Dictionary are set by the user in the Properties Dialog for the Form Field. The one in this example contains only two entries: "BC" for the border color and "BG" for the background color. If you set more properties in the dialog, more entries will appear in this dictionary. For the Radio Button Field, all the Appearance Streams it will ever need are created when these properties are set. So user interaction causes the Handler to set the value of the "AS" (Appearance State) entry, rather than regenerate the Appearance Stream.
Custom Annotation:
No Handler was written for this example. The Annotation was designed simply to be visible on the page.
For an Annotation to be displayable it needs to:
- Be in the "Annots" array of the Page Dictionary.
- Have a "Rect" entry that places it on the page.
- Have a "Subtype" entry, so Acrobat can identify the correct Handler.
- Have an "AP" entry with at least an "N" entry that is a Form XObject.
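The four requirements translate directly into a check. The sketch below is a hypothetical illustration using plain dictionaries: "FormXObject" is a stand-in class for a parsed Form XObject, "WJCustom" is a made-up Subtype, and "Red"/"Blue" are made-up state names. It also covers the case, described earlier, where "N" is a dictionary of states, in which case "AS" must name one of them.

```python
from dataclasses import dataclass


@dataclass
class FormXObject:
    """Hypothetical stand-in for a parsed Form XObject stream."""
    label: str


def is_displayable(annot, page):
    """Check the minimum requirements for Acrobat to draw an annotation."""
    if not any(a is annot for a in page.get("Annots", [])):
        return False                          # must be in the page's Annots array
    rect = annot.get("Rect")
    if not isinstance(rect, (list, tuple)) or len(rect) != 4:
        return False                          # must have a Rect with 4 entries
    if "Subtype" not in annot:
        return False                          # needed to find the Handler
    normal = annot.get("AP", {}).get("N")
    if isinstance(normal, dict):
        normal = normal.get(annot.get("AS"))  # dictionary of states: AS selects
    return isinstance(normal, FormXObject)


custom = {"Subtype": "WJCustom", "Rect": [72, 72, 144, 108],
          "AP": {"N": {"Red": FormXObject("red box"), "Blue": FormXObject("blue box")}},
          "AS": "Blue"}
page = {"Type": "Page", "Annots": [custom]}
print(is_displayable(custom, page))              # True
print(is_displayable(custom, {"Type": "Page"}))  # False
```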
That's all; any other entries in the Annotation Dictionary are parameters for the Handler. The screen shot below shows a custom Annotation in a section of the Page's Annots array. This one is a little more complex than the minimum. It has 2 Appearance Streams and an "AS" entry that selects which one Acrobat displays. The only way to change the "AS" entry is to either write an Annotation Handler or get PDF CanOpener.
We hope this material was helpful to you.
If you have any questions or comments for us or want more info
on PDF CanOpener, please send email to
info@windjack.com.
Check back regularly for new articles.