The PdfWriter Class
- class pypdf.PdfWriter(fileobj: str | IO[Any] = '', clone_from: None | PdfReader | str | IO[Any] | Path = None)[source]
Bases:
PdfDocCommon
Write a PDF file out, given pages produced by another class or through cloning a PDF file during initialization.
Typically data is added from a PdfReader.
- property is_encrypted: bool
Read-only boolean property showing whether this PDF file is encrypted.
Note that this property, if true, will remain true even after the
decrypt()
method is called.
- flattened_pages: List[PageObject] | None = None
- property root_object: DictionaryObject
Provide direct access to Pdf Structure.
Note
Recommended to be used only for read access.
- property xmp_metadata: XmpInformation | None
XMP (Extensible Metadata Platform) data.
- property pdf_header: str
Read/write property: the header of the PDF document that is written.
This should be something like '%PDF-1.5'. It is recommended to set the lowest version that supports all features which are used within the PDF file.
Note: pdf_header returns a string but accepts bytes or str for writing.
- get_object(indirect_reference: int | IndirectObject) PdfObject [source]
- set_need_appearances_writer(state: bool = True) None [source]
Sets the “NeedAppearances” flag in the PDF writer.
The “NeedAppearances” flag indicates whether the appearance dictionary for form fields should be automatically generated by the PDF viewer or if the embedded appearance should be used.
- Parameters:
state – The actual value of the NeedAppearances flag.
- Returns:
None
- create_viewer_preferences() ViewerPreferences [source]
- add_page(page: PageObject, excluded_keys: Iterable[str] = ()) PageObject [source]
Add a page to this PDF file.
Recommended for advanced usage, including passing suitable excluded_keys.
The page is usually acquired from a PdfReader instance.
- Parameters:
page – The page to add to the document. Should be an instance of PageObject.
excluded_keys
- Returns:
The added PageObject.
- insert_page(page: PageObject, index: int = 0, excluded_keys: Iterable[str] = ()) PageObject [source]
Insert a page in this PDF file. The page is usually acquired from a PdfReader instance.
- Parameters:
page – The page to add to the document.
index – Position at which the page will be inserted.
excluded_keys
- Returns:
The added PageObject.
- add_blank_page(width: float | None = None, height: float | None = None) PageObject [source]
Append a blank page to this PDF file and return it.
If no page size is specified, use the size of the last page.
- Parameters:
width – The width of the new page expressed in default user space units.
height – The height of the new page expressed in default user space units.
- Returns:
The newly appended page
- Raises:
PageSizeNotDefinedError – if width and height are not defined and previous page does not exist.
- insert_blank_page(width: float | Decimal | None = None, height: float | Decimal | None = None, index: int = 0) PageObject [source]
Insert a blank page into this PDF file and return it.
If no page size is specified, use the size of the last page.
- Parameters:
width – The width of the new page expressed in default user space units.
height – The height of the new page expressed in default user space units.
index – Position to add the page.
- Returns:
The newly inserted page.
- Raises:
PageSizeNotDefinedError – if width and height are not defined and previous page does not exist.
- property open_destination: None | Destination | TextStringObject | ByteStringObject
- add_js(javascript: str) None [source]
Add JavaScript which will launch upon opening this PDF.
- Parameters:
javascript – Your JavaScript.
>>> output.add_js("this.print({bUI:true,bSilent:false,bShrinkToFit:true});") # Example: This will launch the print window when the PDF is opened.
- add_attachment(filename: str, data: str | bytes) None [source]
Embed a file inside the PDF.
Reference: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf Section 7.11.3
- Parameters:
filename – The filename to display.
data – The data in the file.
- append_pages_from_reader(reader: PdfReader, after_page_append: Callable[[PageObject], None] | None = None) None [source]
Copy pages from reader to writer. Includes an optional callback parameter which is invoked after pages are appended to the writer.
append should be preferred.
- Parameters:
reader – a PdfReader object from which to copy page annotations to this writer object. The writer’s annots will then be updated.
after_page_append – Callback function that is invoked after each page is appended to the writer. Signature includes a reference to the appended page (delegates to append_pages_from_reader). The single parameter of the callback is a reference to the page just appended to the document.
- update_page_form_field_values(page: PageObject | List[PageObject] | None, fields: Dict[str, Any], flags: FieldFlag = <FieldFlag: 0>, auto_regenerate: bool | None = True) None [source]
Update the form field values for a given page from a fields dictionary.
Copy field texts and values from fields to page. If the field links to a parent object, add the information to the parent.
- Parameters:
page – PageObject: the PDF writer’s page where the annotations and field data will be updated. List[PageObject]: list of pages to be processed. None: all pages.
fields – a Python dictionary of field names (/T) and text values (/V).
flags – An integer (0 to 7). The first bit sets ReadOnly, the second bit sets Required, the third bit sets NoExport. See PDF Reference Table 8.70 for details.
auto_regenerate – Set/unset the need_appearances flag; the flag is unchanged if auto_regenerate is None.
- reattach_fields(page: PageObject | None = None) List[DictionaryObject] [source]
Parse annotations within the page looking for orphan fields and reattach them into the Fields Structure.
- Parameters:
page – page to analyze. If none is provided, all pages will be analyzed.
- Returns:
list of reattached fields.
- clone_reader_document_root(reader: PdfReader) None [source]
Copy the reader document root to the writer and all sub elements, including pages, threads, outlines,… For partial insertion, append should be considered.
- Parameters:
reader – PdfReader from which the document root should be copied.
- clone_document_from_reader(reader: PdfReader, after_page_append: Callable[[PageObject], None] | None = None) None [source]
Create a copy (clone) of a document from a PDF file reader, cloning the ‘/Root’, ‘/Info’ and ‘/ID’ sections of the PDF.
- Parameters:
reader – PDF file reader instance from which the clone should be created.
after_page_append – Callback function that is invoked after each page is appended to the writer. Signature includes a reference to the appended page (delegates to append_pages_from_reader). The single parameter of the callback is a reference to the page just appended to the document.
- generate_file_identifiers() None [source]
Generate an identifier for the PDF that will be written.
The only point of this is ensuring uniqueness. Reproducibility is not required. When a file is first written, both identifiers shall be set to the same value. If both identifiers match when a file reference is resolved, it is very likely that the correct and unchanged file has been found. If only the first identifier matches, a different version of the correct file has been found. See 14.4 “File Identifiers”.
- encrypt(user_password: str, owner_password: str | None = None, use_128bit: bool = True, permissions_flag: UserAccessPermissions = <UserAccessPermissions.PRINT|MODIFY|EXTRACT|ADD_OR_MODIFY|R7|R8|FILL_FORM_FIELDS|EXTRACT_TEXT_AND_GRAPHICS|ASSEMBLE_DOC|PRINT_TO_REPRESENTATION|R13|R14|R15|R16|R17|R18|R19|R20|R21|R22|R23|R24|R25|R26|R27|R28|R29|R30|R31|R32: 4294967292>, *, algorithm: str | None = None) None [source]
Encrypt this PDF file with the PDF Standard encryption handler.
- Parameters:
user_password – The password which allows for opening and reading the PDF file with the restrictions provided.
owner_password – The password which allows for opening the PDF files without any restrictions. By default, the owner password is the same as the user password.
use_128bit – flag as to whether to use 128bit encryption. When false, 40bit encryption will be used. By default, this flag is on.
permissions_flag – permissions as described in TABLE 3.20 of the PDF 1.7 specification. A bit value of 1 means the permission is granted. Hence an integer value of -1 will set all flags. Bit position 3 is for printing, 4 is for modifying content, 5 and 6 control annotations, 9 for form fields, 10 for extraction of text and graphics.
algorithm – encrypt algorithm. Values may be one of “RC4-40”, “RC4-128”, “AES-128”, “AES-256-R5”, “AES-256”. If it is valid, use_128bit will be ignored.
- write(stream: Path | str | IO[Any]) Tuple[bool, IO[Any]] [source]
Write the collection of pages added to this object out as a PDF file.
- Parameters:
stream – An object to write the file to. It can be a file-like object supporting the write and tell methods, or a file path, just like fileobj; the name stream is kept for compatibility with existing workflows.
- Returns:
A tuple (bool, IO)
- add_metadata(infos: Dict[str, Any]) None [source]
Add custom metadata to the output.
- Parameters:
infos – a Python dictionary where each key is a field and each value is your new metadata.
- get_reference(obj: PdfObject) IndirectObject [source]
- get_outline_root() TreeObject [source]
- get_threads_root() ArrayObject [source]
The list of threads.
See §12.4.3 of the PDF 1.7 or PDF 2.0 specification.
- Returns:
An array (possibly empty) of dictionaries with /F and /I properties.
- property threads: ArrayObject
Read-only property for the list of threads.
See §8.3.2 from PDF 1.7 spec.
Each element is a dictionary with /F and /I keys.
- add_outline_item_destination(page_destination: IndirectObject | PageObject | TreeObject, parent: None | TreeObject | IndirectObject = None, before: None | TreeObject | IndirectObject = None, is_open: bool = True) IndirectObject [source]
- add_outline_item_dict(outline_item: OutlineItem | Destination, parent: None | TreeObject | IndirectObject = None, before: None | TreeObject | IndirectObject = None, is_open: bool = True) IndirectObject [source]
- add_outline_item(title: str, page_number: None | PageObject | IndirectObject | int, parent: None | TreeObject | IndirectObject = None, before: None | TreeObject | IndirectObject = None, color: Tuple[float, float, float] | str | None = None, bold: bool = False, italic: bool = False, fit: Fit = <pypdf.generic._fit.Fit object>, is_open: bool = True) IndirectObject [source]
Add an outline item (commonly referred to as a “Bookmark”) to the PDF file.
- Parameters:
title – Title to use for this outline item.
page_number – Page number this outline item will point to.
parent – A reference to a parent outline item to create nested outline items.
before
color – Color of the outline item’s font as a red, green, blue tuple from 0.0 to 1.0 or as a Hex String (#RRGGBB)
bold – Outline item font is bold
italic – Outline item font is italic
fit – The fit of the destination page.
- Returns:
The added outline item as an indirect object.
- add_named_destination_array(title: TextStringObject, destination: IndirectObject | ArrayObject) None [source]
- add_named_destination_object(page_destination: PdfObject) IndirectObject [source]
- add_named_destination(title: str, page_number: int) IndirectObject [source]
- remove_annotations(subtypes: Literal['/Text', '/Link', '/FreeText', '/Line', '/Square', '/Circle', '/Polygon', '/PolyLine', '/Highlight', '/Unterline', '/Squiggly', '/StrikeOut', '/Stamp', '/Caret', '/Ink', '/Popup', '/FileAttachment', '/Sound', '/Movie', '/Widget', '/Screen', '/PrinterMark', '/TrapNet', '/Watermark', '/3D', '/Redact'] | Iterable[Literal['/Text', '/Link', '/FreeText', '/Line', '/Square', '/Circle', '/Polygon', '/PolyLine', '/Highlight', '/Unterline', '/Squiggly', '/StrikeOut', '/Stamp', '/Caret', '/Ink', '/Popup', '/FileAttachment', '/Sound', '/Movie', '/Widget', '/Screen', '/PrinterMark', '/TrapNet', '/Watermark', '/3D', '/Redact']] | None) None [source]
Remove annotations by annotation subtype.
- Parameters:
subtypes – SubType or list of SubTypes to be removed. Examples are: “/Link”, “/FileAttachment”, “/Sound”, “/Movie”, “/Screen”, … If you want to remove all annotations, use subtypes=None.
- remove_objects_from_page(page: PageObject | DictionaryObject, to_delete: ObjectDeletionFlag | Iterable[ObjectDeletionFlag]) None [source]
Remove objects specified by to_delete from the given page.
- Parameters:
page – Page object to clean up.
to_delete – Objects to be deleted; can be an ObjectDeletionFlag or a list of ObjectDeletionFlag.
- remove_images(to_delete: ImageType = <ImageType.ALL: 7>) None [source]
Remove images from this output.
- Parameters:
to_delete – The type of images to be deleted (default = all images types)
- add_uri(page_number: int, uri: str, rect: RectangleObject, border: ArrayObject | None = None) None [source]
Add a URI from a rectangular area to the specified page.
- Parameters:
page_number – index of the page on which to place the URI action.
uri – URI of resource to link to.
rect – RectangleObject or array of four integers specifying the clickable rectangular area [xLL, yLL, xUR, yUR], or string in the form "[ xLL yLL xUR yUR ]".
border – if provided, an array describing border-drawing properties. See the PDF spec for details. No border will be drawn if this argument is omitted.
- set_page_layout(layout: Literal['/NoLayout', '/SinglePage', '/OneColumn', '/TwoColumnLeft', '/TwoColumnRight', '/TwoPageLeft', '/TwoPageRight']) None [source]
Set the page layout.
- Parameters:
layout – The page layout to be used
Valid layout arguments:
/NoLayout – Layout explicitly not specified
/SinglePage – Show one page at a time
/OneColumn – Show one column at a time
/TwoColumnLeft – Show pages in two columns, odd-numbered pages on the left
/TwoColumnRight – Show pages in two columns, odd-numbered pages on the right
/TwoPageLeft – Show two pages at a time, odd-numbered pages on the left
/TwoPageRight – Show two pages at a time, odd-numbered pages on the right
- property page_layout: Literal['/NoLayout', '/SinglePage', '/OneColumn', '/TwoColumnLeft', '/TwoColumnRight', '/TwoPageLeft', '/TwoPageRight'] | None
Page layout property.
Valid layout values:
/NoLayout – Layout explicitly not specified
/SinglePage – Show one page at a time
/OneColumn – Show one column at a time
/TwoColumnLeft – Show pages in two columns, odd-numbered pages on the left
/TwoColumnRight – Show pages in two columns, odd-numbered pages on the right
/TwoPageLeft – Show two pages at a time, odd-numbered pages on the left
/TwoPageRight – Show two pages at a time, odd-numbered pages on the right
- property page_mode: Literal['/UseNone', '/UseOutlines', '/UseThumbs', '/FullScreen', '/UseOC', '/UseAttachments'] | None
Page mode property.
Valid mode values:
/UseNone – Do not show outline or thumbnails panels
/UseOutlines – Show outline (aka bookmarks) panel
/UseThumbs – Show page thumbnails panel
/FullScreen – Fullscreen view
/UseOC – Show Optional Content Group (OCG) panel
/UseAttachments – Show attachments panel
- add_annotation(page_number: int | PageObject, annotation: Dict[str, Any]) DictionaryObject [source]
Add a single annotation to the page. The added annotation must be a new annotation. It cannot be recycled.
- Parameters:
page_number – PageObject or page index.
annotation – Annotation to be added (created with annotation).
- Returns:
The inserted object. This can be used for pop-up creation, for example.
- clean_page(page: PageObject | IndirectObject) PageObject [source]
Perform some cleanup on the page. Currently: convert NameObject named destinations to TextStringObject (required for the names/dests list).
- Parameters:
page
- Returns:
The cleaned PageObject
- append(fileobj: str | IO[Any] | PdfReader | Path, outline_item: str | None | PageRange | Tuple[int, int] | Tuple[int, int, int] | List[int] = None, pages: None | PageRange | Tuple[int, int] | Tuple[int, int, int] | List[int] | List[PageObject] = None, import_outline: bool = True, excluded_fields: List[str] | Tuple[str, ...] | None = None) None [source]
Identical to the merge() method, but assumes you want to concatenate all pages onto the end of the file instead of specifying a position.
- Parameters:
fileobj – A File Object or an object that supports the standard read and seek methods similar to a File Object. Could also be a string representing a path to a PDF file.
outline_item – Optionally, you may specify a string to build an outline (aka ‘bookmark’) to identify the beginning of the included file.
pages – Can be a PageRange, a (start, stop[, step]) tuple, or a list of pages to be processed, to merge only the specified range of pages from the source document into the output document.
import_outline – You may prevent the source document’s outline (collection of outline items, previously referred to as ‘bookmarks’) from being imported by specifying this as False.
excluded_fields – Provide the list of fields/keys to be ignored. If /Annots is part of the list, the annotations will be ignored; if /B is part of the list, the articles will be ignored.
- merge(position: int | None, fileobj: Path | str | IO[Any] | PdfReader, outline_item: str | None = None, pages: str | PageRange | Tuple[int, int] | Tuple[int, int, int] | List[int] | List[PageObject] | None = None, import_outline: bool = True, excluded_fields: List[str] | Tuple[str, ...] | None = ()) None [source]
Merge the pages from the given file into the output file at the specified page number.
- Parameters:
position – The page number to insert this file. File will be inserted after the given number.
fileobj – A File Object or an object that supports the standard read and seek methods similar to a File Object. Could also be a string representing a path to a PDF file.
outline_item – Optionally, you may specify a string to build an outline (aka ‘bookmark’) to identify the beginning of the included file.
pages – Can be a PageRange, a (start, stop[, step]) tuple, or a list of pages to be processed, to merge only the specified range of pages from the source document into the output document.
import_outline – You may prevent the source document’s outline (collection of outline items, previously referred to as ‘bookmarks’) from being imported by specifying this as False.
excluded_fields – Provide the list of fields/keys to be ignored. If /Annots is part of the list, the annotations will be ignored; if /B is part of the list, the articles will be ignored.
- Raises:
TypeError – The pages attribute is not configured properly
- add_filtered_articles(fltr: Pattern[Any] | str, pages: Dict[int, PageObject], reader: PdfReader) None [source]
Add articles matching the defined criteria.
- Parameters:
fltr
pages
reader
- decode_permissions(permissions_code: int) Dict[str, bool]
Take the permissions as an integer, return the allowed access.
- get_destination_page_number(destination: Destination) int | None
Retrieve page number of a given Destination object.
- Parameters:
destination – The destination to get page number.
- Returns:
The page number or None if page is not found
- get_fields(tree: TreeObject | None = None, retval: Dict[Any, Any] | None = None, fileobj: Any | None = None) Dict[str, Any] | None
Extract field data if this PDF contains interactive form fields.
The tree and retval parameters are for recursive use.
- Parameters:
tree
retval
fileobj – A file object (usually a text file) to write a report to on all interactive form fields found.
- Returns:
A dictionary where each key is a field name, and each value is a Field object. By default, the mapping name is used for keys. None if form data could not be located.
- get_form_text_fields(full_qualified_name: bool = False) Dict[str, Any]
Retrieve form fields from the document with textual data.
- Parameters:
full_qualified_name – to get full name
- Returns:
A dictionary. The key is the name of the form field, the value is the content of the field.
If the document contains multiple form fields with the same name, the second and following will get the suffix .2, .3, …
- get_named_dest_root() ArrayObject
- get_num_pages() int
Calculate the number of pages in this PDF file.
- Returns:
The number of pages of the parsed PDF file
- Raises:
PdfReadError – if file is encrypted and restrictions prevent this action.
- get_page(page_number: int) PageObject
Retrieve a page by number from this PDF file. Most of the time `.pages[page_number]` is preferred.
- Parameters:
page_number – The page number to retrieve (pages begin at zero)
- Returns:
A
PageObject
instance.
- get_page_number(page: PageObject) int | None
Retrieve page number of a given PageObject.
- Parameters:
page – The page to get page number. Should be an instance of
PageObject
- Returns:
The page number or None if page is not found
- get_pages_showing_field(field: Field | PdfObject | IndirectObject) List[PageObject]
Provide the list of pages on which the field is shown.
- Parameters:
field – Field Object, PdfObject or IndirectObject referencing a Field
- Returns:
List of pages –
- Empty list:
The field has no widgets attached (either hidden field or ancestor field).
- Single page list:
Page where the widget is present (most common).
- Multi-page list:
Field with multiple kids widgets (example: radio buttons, field repeated on multiple pages).
- property metadata: DocumentInformation | None
Retrieve the PDF file’s document information dictionary, if it exists.
Note that some PDF files use metadata streams instead of document information dictionaries, and these metadata streams will not be accessed by this function.
- property named_destinations: Dict[str, Any]
A read-only dictionary which maps names to
Destinations
- property outline: List[Destination | List[Destination | List[Destination]]]
Read-only property for the outline present in the document.
(i.e., a collection of ‘outline items’ which are also known as ‘bookmarks’)
- property page_labels: List[str]
A list of labels for the pages in this document.
This property is read-only. The labels are in the order that the pages appear in the document.
- property pages: List[PageObject]
Property that emulates a list of PageObject. This property allows getting a page or a range of pages.
For PdfWriter only: it also provides the capability to remove a page or a range of pages from the list (through the del operator). Note: only the page entry is removed, as the objects beneath can be used somewhere else. A solution to completely remove them, if they are not used anywhere, is to write to a buffer/temporary file and to load it into a new PdfWriter object afterwards.
- remove_page(page: int | PageObject | IndirectObject, clean: bool = False) None
Remove page from pages list.
- Parameters:
page – The page to remove, given as an int, a PageObject, or an IndirectObject:
PageObject: page to be removed. If the page appears many times, only the first one will be removed.
IndirectObject: reference to the page to be removed.
int: number of the page to be removed.
clean – Replace the PageObject with a NullObject to prevent destinations and annotations from referencing a detached page.
- property user_access_permissions: UserAccessPermissions | None
Get the user access permissions for encrypted documents. Returns None if not encrypted.
- property viewer_preferences: ViewerPreferences | None
Returns the existing ViewerPreferences as an overloaded dictionary.
- find_outline_item(outline_item: Dict[str, Any], root: List[Destination | List[Destination | List[Destination]]] | None = None) List[int] | None [source]
- find_bookmark(outline_item: Dict[str, Any], root: List[Destination | List[Destination | List[Destination]]] | None = None) List[int] | None [source]
Deprecated since version 2.9.0: Use
find_outline_item()
instead.
- reset_translation(reader: None | PdfReader | IndirectObject = None) None [source]
Reset the translation table between reader and the writer object.
Late cloning will create new independent objects.
- Parameters:
reader – PdfReader or IndirectObject referencing a PdfReader object. If set to None or omitted, all tables will be reset.
- set_page_label(page_index_from: int, page_index_to: int, style: PageLabelStyle | None = None, prefix: str | None = None, start: int | None = 0) None [source]
Set a page label to a range of pages.
Page indexes must be given starting from 0. Labels must have a style, a prefix or both. If a range is not assigned any page label, a decimal label starting from 1 is applied.
- Parameters:
page_index_from – page index of the beginning of the range starting from 0
page_index_to – page index of the end of the range, starting from 0
style – The numbering style to be used for the numeric portion of each page label:
/D – Decimal Arabic numerals
/R – Uppercase Roman numerals
/r – Lowercase Roman numerals
/A – Uppercase letters (A to Z for the first 26 pages, AA to ZZ for the next 26, and so on)
/a – Lowercase letters (a to z for the first 26 pages, aa to zz for the next 26, and so on)
prefix – The label prefix for page labels in this range.
start – The value of the numeric portion for the first page label in the range. Subsequent pages are numbered sequentially from this value, which must be greater than or equal to 1. Default value: 1.
- class pypdf.ObjectDeletionFlag(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
IntFlag
- NONE = 0
- TEXT = 1
- LINKS = 2
- ATTACHMENTS = 4
- OBJECTS_3D = 8
- ALL_ANNOTATIONS = 16
- XOBJECT_IMAGES = 32
- INLINE_IMAGES = 64
- DRAWING_IMAGES = 128
- IMAGES = 224