By nik

2011-05-15 20:42:43 8 Comments

I'm tasked with converting tons of .doc files to .pdf, and the only way my supervisor wants me to do this is through MS Word 2010. I know I should be able to automate this with Python COM automation. The only problem is that I don't know how or where to start. I tried searching for tutorials but was not able to find any (maybe I have, but I don't know what I'm looking for).

Right now I'm reading through this. Don't know how useful this is going to be.


@abdelhedi hlel 2020-02-28 20:05:01

I have tested many solutions, but none of them works efficiently on a Linux distribution.

I recommend this solution:

import sys
import subprocess
import re

def convert_to(folder, source, timeout=None):
    args = [libreoffice_exec(), '--headless', '--convert-to', 'pdf', '--outdir', folder, source]

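    process = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=timeout)
    filename = re.search('-> (.*?) using filter', process.stdout.decode())

    # return the output path LibreOffice reports, or None if the pattern is not found
    return filename.group(1) if filename else None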


def libreoffice_exec():
    # TODO: Provide support for more platforms
    if sys.platform == 'darwin':
        return '/Applications/'
    return 'libreoffice'

Then call the function:

result = convert_to('TEMP Directory',  'Your File', timeout=15)
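For a whole folder, a minimal sketch along the same lines might look like the following. The helper names `parse_converted_path` and `convert_folder` are made up for illustration, and the sketch assumes the `libreoffice` executable is on the PATH and prints the same "-> ... using filter" line the function above relies on.

```python
import re
import subprocess
from pathlib import Path


def parse_converted_path(stdout_text):
    """Return the output path LibreOffice reports, or None if it is absent."""
    match = re.search(r'-> (.*?) using filter', stdout_text)
    return match.group(1) if match else None


def convert_folder(src_dir, out_dir, executable='libreoffice', timeout=60):
    """Convert every .doc/.docx file directly inside src_dir to PDF.

    Returns one entry per converted file: the reported PDF path, or None
    when LibreOffice did not print the expected line.
    """
    results = []
    for source in sorted(Path(src_dir).iterdir()):
        if source.suffix.lower() not in ('.doc', '.docx'):
            continue  # skip sub-directories and unrelated files
        args = [executable, '--headless', '--convert-to', 'pdf',
                '--outdir', str(out_dir), str(source)]
        process = subprocess.run(args, stdout=subprocess.PIPE,
                                 stderr=subprocess.PIPE, timeout=timeout)
        results.append(parse_converted_path(process.stdout.decode()))
    return results


# Example call (directories are placeholders):
# pdfs = convert_folder('/path/to/docs', '/path/to/pdfs')
```

Each file is converted in its own LibreOffice process, so a `timeout` that is too small for large documents raises `subprocess.TimeoutExpired`.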


@John Paul Lemmon 2020-02-17 18:28:25

I was working with this solution but I needed to search all .docx, .dotm, .docm, .odt, .doc or .rtf and then turn them all to .pdf (python 3.7.5). Hope it works...

import os
import win32com.client

wdFormatPDF = 17

for root, dirs, files in os.walk(r'your directory here'):
    for f in files:
        in_file = os.path.join(root, f)  # full path of the file to open

        if f.endswith(".doc") or f.endswith(".odt") or f.endswith(".rtf"):
            try:
                word = win32com.client.Dispatch('Word.Application')
                word.Visible = False
                doc = word.Documents.Open(in_file)
                # f[:-4] drops the 4-character extension from the file name
                doc.SaveAs(os.path.join(root, f[:-4]), FileFormat=wdFormatPDF)
                doc.Close()
                word.Visible = True
                print('done')
            except Exception:
                print('could not open')
                # os.remove(os.path.join(root,f))
        elif f.endswith(".docx") or f.endswith(".dotm") or f.endswith(".docm"):
            try:
                word = win32com.client.Dispatch('Word.Application')
                word.Visible = False
                doc = word.Documents.Open(in_file)
                # f[:-5] drops the 5-character extension from the file name
                doc.SaveAs(os.path.join(root, f[:-5]), FileFormat=wdFormatPDF)
                doc.Close()
                word.Visible = True
                print('done')
            except Exception:
                print('could not open')
                # os.remove(os.path.join(root,f))

The try and except is for documents that cannot be read: they are skipped, and the script does not exit until the last document has been processed.
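The `f[:-4]` and `f[:-5]` slices depend on the length of each extension. As a sketch of an alternative, `os.path.splitext` can work out the output name for any extension; the helper `pdf_targets` and the constant `WORD_EXTENSIONS` are names invented for this illustration.

```python
import os

# Extensions taken from the answer above
WORD_EXTENSIONS = ('.doc', '.docx', '.docm', '.dotm', '.odt', '.rtf')


def pdf_targets(filenames, root=''):
    """Pair each convertible file with the PDF path it should produce."""
    pairs = []
    for name in filenames:
        stem, ext = os.path.splitext(name)
        if ext.lower() in WORD_EXTENSIONS:
            pairs.append((os.path.join(root, name),
                          os.path.join(root, stem + '.pdf')))
    return pairs


# Inside the os.walk loop this could be used as:
# for in_file, out_file in pdf_targets(files, root):
#     ...open in_file with Word and save it as out_file...
```

Comparing the lower-cased extension also picks up files named like `REPORT.DOCX`, which the `endswith` checks would miss.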

@Al Johri 2019-12-24 08:04:16

You can use the docx2pdf python package to bulk convert docx to pdf. It can be used as both a CLI and a python library. It requires Microsoft Office to be installed and uses COM on Windows and AppleScript (JXA) on macOS.

Install:

pip install docx2pdf

Command line:

docx2pdf input.docx output.pdf

Python:

from docx2pdf import convert

convert("input.docx", "output.pdf")

Disclaimer: I wrote the docx2pdf package.

@Todd 2020-02-07 00:04:42

Just used your package to print my .docx file. It worked like a charm! Couldn't have been simpler to use. Great job!

@Al Johri 2020-02-07 05:17:12

Thanks @Todd! Give the repo a star when you get a chance.

@abdelhedi hlel 2020-02-28 10:45:37

It's not working on Linux.

@Al Johri 2020-03-03 06:48:56

@Abdelhedihlel Unfortunately, it requires Microsoft Office to be installed and thus only works on Windows and macOS.

@abdelhedi hlel 2020-03-03 08:24:51

@AlJohri take a look here; this solution works on both Windows and Linux. Running on Linux is a must because most deployment servers use Linux.

@patrick 2018-05-24 19:06:17

As an alternative to the SaveAs function, you could also use ExportAsFixedFormat which gives you access to the PDF options dialog you would normally see in Word. With this you can specify bookmarks and other document properties.

doc.ExportAsFixedFormat(
    OutputFileName=pdf_file,  # pdf_file: placeholder for the output path
    ExportFormat=17, #17 = PDF output, 18=XPS output
    OptimizeFor=0,  #0=Print (higher res), 1=Screen (lower res)
    CreateBookmarks=1, #0=No bookmarks, 1=Heading bookmarks only, 2=bookmarks match word bookmarks
)

The full list of function arguments is: 'OutputFileName', 'ExportFormat', 'OpenAfterExport', 'OptimizeFor', 'Range', 'From', 'To', 'Item', 'IncludeDocProps', 'KeepIRM', 'CreateBookmarks', 'DocStructureTags', 'BitmapMissingFonts', 'UseISO19005_1', 'FixedFormatExtClassPtr'
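As a rough sketch, the call can be wrapped in a small helper that takes an already-open document. The name `export_pdf` and its parameters are hypothetical; `doc` is assumed to be the object returned by `word.Documents.Open(...)`, and only argument names from the list above are used.

```python
def export_pdf(doc, pdf_file, for_screen=False, bookmarks=1):
    """Export an open Word document to PDF via ExportAsFixedFormat.

    doc        -- Word Document COM object (assumed to be open already)
    pdf_file   -- absolute path of the PDF to write
    for_screen -- True selects the lower-resolution "screen" setting
    bookmarks  -- 0 = none, 1 = heading bookmarks, 2 = Word bookmarks
    """
    doc.ExportAsFixedFormat(
        OutputFileName=pdf_file,
        ExportFormat=17,                     # 17 = PDF, 18 = XPS
        OpenAfterExport=False,               # do not open a viewer afterwards
        OptimizeFor=1 if for_screen else 0,  # 0 = print, 1 = screen
        CreateBookmarks=bookmarks,
    )


# Usage on Windows with pywin32 installed (paths are placeholders):
# import win32com.client
# word = win32com.client.Dispatch('Word.Application')
# doc = word.Documents.Open(r'C:\docs\report.docx')
# export_pdf(doc, r'C:\docs\report.pdf')
# doc.Close()
# word.Quit()
```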

@user2921789 2017-07-01 12:37:04

I tried the accepted answer but wasn't particularly keen on the bloated PDFs Word was producing, which were usually an order of magnitude bigger than expected. After looking into how to disable the dialogs when using a virtual PDF printer, I came across Bullzip PDF Printer and I've been rather impressed with its features. It has now replaced the other virtual printers I used previously. You'll find a "free community edition" on their download page.

The COM API can be found here and a list of the usable settings can be found here. The settings are written to a "runonce" file which is used for one print job only and then removed automatically. When printing multiple PDFs we need to make sure one print job completes before starting another to ensure the settings are used correctly for each file.

import os, re, time, datetime, win32com.client

def print_to_Bullzip(file):
    util = win32com.client.Dispatch("Bullzip.PDFUtil")
    settings = win32com.client.Dispatch("Bullzip.PDFSettings")
    settings.PrinterName = util.DefaultPrinterName      # make sure we're controlling the right PDF printer

    # raw strings keep the regex backslash from being read as a string escape
    outputFile = re.sub(r"\.[^.]+$", ".pdf", file)
    statusFile = re.sub(r"\.[^.]+$", ".status", file)

    settings.SetValue("Output", outputFile)
    settings.SetValue("ConfirmOverwrite", "no")
    settings.SetValue("ShowSaveAS", "never")
    settings.SetValue("ShowSettings", "never")
    settings.SetValue("ShowPDF", "no")
    settings.SetValue("ShowProgress", "no")
    settings.SetValue("ShowProgressFinished", "no")     # disable balloon tip
    settings.SetValue("StatusFile", statusFile)         # created after print job
    settings.WriteSettings(True)                        # write settings to the runonce.ini
    util.PrintFile(file, util.DefaultPrinterName)       # send to Bullzip virtual printer

    # wait until print job completes before continuing
    # otherwise settings for the next job may not be used
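    timestamp = datetime.datetime.now()
    while (datetime.datetime.now() - timestamp).seconds < 10:
        if os.path.exists(statusFile) and os.path.isfile(statusFile):
            error = util.ReadIniString(statusFile, "Status", "Errors", '')
            if error != "0":
                raise IOError("PDF was created with errors")
            os.remove(statusFile)   # remove the status file so it cannot be mistaken for a later job's
            return
        time.sleep(0.1)             # poll without busy-waiting
    raise IOError("PDF creation timed out")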
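The "wait for the status file" step is a general polling pattern and can be pulled out into its own function. The following is only a sketch with a made-up name, `wait_for_file`; it knows nothing about Bullzip and simply waits for a path to appear.

```python
import os
import time


def wait_for_file(path, timeout=10.0, poll_interval=0.1):
    """Wait until `path` exists as a regular file.

    Returns True as soon as the file is there, or False if `timeout`
    seconds pass without it appearing.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.isfile(path):
            return True
        time.sleep(poll_interval)  # avoid spinning at 100% CPU
    return os.path.isfile(path)    # one last check at the deadline


# In print_to_Bullzip this could replace the hand-written loop:
# if not wait_for_file(statusFile, timeout=10):
#     raise IOError("PDF creation timed out")
```

`time.monotonic()` is used instead of wall-clock time so that a system clock adjustment during the wait does not shorten or lengthen the timeout.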

@Yang 2016-12-13 09:48:44

I worked on this problem for half a day, so I think I should share some of my experience. Steven's answer is right, but it fails on my computer. There are two key points to fix it:

(1). When I first create the 'Word.Application' object, I have to make it (the Word app) visible before opening any documents. (Actually, I cannot explain why this works. If I do not do this on my computer, the program crashes when I try to open a document in invisible mode, and then the 'Word.Application' object is deleted by the OS.)

(2). After doing (1), the program sometimes works but still fails often. The error "COMError: (-2147418111, 'Call was rejected by callee.', (None, None, None, 0, None))" means that the COM server may not be able to respond quickly enough, so I added a delay before trying to open a document.

After these two steps, the program works on my machine with no more failures. The demo code is below. If you have encountered the same problems, try following these two steps. Hope it helps.

    import os
    import comtypes.client
    import time

    wdFormatPDF = 17

    # absolute path is needed
    # be careful about the slash '\', use '\\' or '/' or raw string r"..."
    in_file=r'absolute path of input docx file 1'
    out_file=r'absolute path of output pdf file 1'

    in_file2=r'absolute path of input docx file 2'
    out_file2=r'absolute path of outputpdf file 2'

    # print out filenames
    print(in_file)
    print(out_file)
    print(in_file2)
    print(out_file2)

    # create COM object
    word = comtypes.client.CreateObject('Word.Application')
    # key point 1: make word visible before open a new document
    word.Visible = True
    # key point 2: wait for the COM Server to prepare well.
    time.sleep(3)  # the delay length here is a guess; adjust it for your machine

    # convert docx file 1 to pdf file 1
    doc=word.Documents.Open(in_file) # open docx file 1
    doc.SaveAs(out_file, FileFormat=wdFormatPDF) # conversion
    doc.Close() # close docx file 1
    word.Visible = False
    # convert docx file 2 to pdf file 2
    doc = word.Documents.Open(in_file2) # open docx file 2
    doc.SaveAs(out_file2, FileFormat=wdFormatPDF) # conversion
    doc.Close() # close docx file 2   
    word.Quit() # close Word Application 
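A fixed delay is one way to deal with "Call was rejected by callee". Another option, sketched here under the assumption that the failure is transient, is to retry the call a few times with a pause in between. `call_with_retry` is a hypothetical helper, not part of comtypes.

```python
import time


def call_with_retry(func, attempts=5, delay=1.0, exceptions=(Exception,)):
    """Call func() and return its result.

    If func raises one of `exceptions`, wait `delay` seconds and try
    again, up to `attempts` calls in total. The last failure is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)


# With the demo code above this could be used as:
# doc = call_with_retry(lambda: word.Documents.Open(in_file))
```

Passing a narrower exception type through `exceptions` keeps unrelated errors, such as a wrong file path, from being retried pointlessly.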

@lxx 2014-10-15 22:15:28

unoconv (written in Python) and OpenOffice running as a headless daemon.

Works very nicely for doc, docx, ppt, pptx, xls and xlsx. Very useful if you need to convert docs or save/convert to certain formats on a server.

@Basj 2020-04-25 13:07:52

Can you include a sample code to show how to do it from a python script (import unoconv unoconv.dosomething(...))? The documentation only shows how to do it from command line.
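unoconv is normally run as a command-line program, so one way to use it from a script is through `subprocess`. This is a minimal sketch that assumes the `unoconv` executable is on the PATH; the function names are invented for illustration, and only the `-f` (format) and `-o` (output) options are used.

```python
import subprocess


def unoconv_command(source, output=None, fmt='pdf'):
    """Build the unoconv command line for converting one file."""
    args = ['unoconv', '-f', fmt]
    if output is not None:
        args += ['-o', output]
    args.append(source)
    return args


def convert_with_unoconv(source, output=None, timeout=60):
    """Run unoconv; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(unoconv_command(source, output), check=True, timeout=timeout)


# Example call (file names are placeholders):
# convert_with_unoconv('report.docx', 'report.pdf')
```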

@James N 2013-05-15 07:44:54

It's worth noting that Steven's answer works, but if you use a for loop to export multiple files, make sure to place the CreateObject or Dispatch statement before the loop, because the application object only needs to be created once. See my problem: Python win32com.client.Dispatch looping through Word documents and export to PDF; fails when next loop occurs
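A sketch of that structure: the Word object is created once by the caller and passed in, and only the documents are opened and closed inside the loop. `convert_many` is a hypothetical helper name, and the paths in the usage comment are placeholders.

```python
WD_FORMAT_PDF = 17  # same value as wdFormatPDF in the other answers


def convert_many(word, jobs):
    """Convert (in_file, out_file) pairs using one shared Word instance.

    `word` is assumed to be a Word.Application COM object created once,
    outside this function. Returns the list of PDF paths written.
    """
    converted = []
    for in_file, out_file in jobs:
        doc = word.Documents.Open(in_file)
        try:
            doc.SaveAs(out_file, FileFormat=WD_FORMAT_PDF)
            converted.append(out_file)
        finally:
            doc.Close()  # always close the document, even if SaveAs fails
    return converted


# Usage on Windows with pywin32 installed:
# import win32com.client
# word = win32com.client.Dispatch('Word.Application')
# try:
#     convert_many(word, [(r'C:\docs\a.docx', r'C:\docs\a.pdf')])
# finally:
#     word.Quit()
```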

@Steven 2011-05-16 13:19:36

A simple example using comtypes, converting a single file, input and output filenames given as commandline arguments:

import sys
import os
import comtypes.client

wdFormatPDF = 17

in_file = os.path.abspath(sys.argv[1])
out_file = os.path.abspath(sys.argv[2])

word = comtypes.client.CreateObject('Word.Application')
doc = word.Documents.Open(in_file)
doc.SaveAs(out_file, FileFormat=wdFormatPDF)
doc.Close()  # close the document
word.Quit()  # shut down the Word instance started above

You could also use pywin32, which would be the same except for:

import win32com.client

and then:

word = win32com.client.Dispatch('Word.Application')

@nik 2011-05-17 21:02:18

This is exactly what I was looking for. Thanks :)

@ecoe 2014-04-10 14:09:48

For many files, consider setting: word.Visible = False to save time and processing of the word files (MS word will not display this way, code will run in background essentially)

@Snorfalorpagus 2015-03-25 13:39:22

I've managed to get this working for powerpoint documents. Use Powerpoint.Application, Presentations.Open and FileFormat=32.
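Spelled out, that could look roughly like the sketch below. `pptx_to_pdf` is a made-up helper name; the application object is assumed to be created by the caller, and the format value 32 is the one quoted in this comment.

```python
PP_SAVE_AS_PDF = 32  # FileFormat value quoted in the comment above


def pptx_to_pdf(app, in_file, out_file):
    """Save one presentation as PDF using an existing PowerPoint COM object."""
    deck = app.Presentations.Open(in_file)
    try:
        # the format is passed positionally here rather than as FileFormat=
        deck.SaveAs(out_file, PP_SAVE_AS_PDF)
    finally:
        deck.Close()


# Usage on Windows with comtypes installed (paths are placeholders):
# import comtypes.client
# app = comtypes.client.CreateObject('Powerpoint.Application')
# pptx_to_pdf(app, r'C:\slides\talk.pptx', r'C:\slides\talk.pdf')
# app.Quit()
```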

@slaveCoder 2016-11-03 06:18:20

I am using a Linux server and these libraries don't work on Linux. Is there any other way to make it work on Linux?

@Aman Gautam 2017-03-14 07:05:47

8 years into development and it's this answer that made me feel bad that I am not on Windows!

@user3732708 2017-04-21 05:05:17

When I run this, I get an error: File "", line 7, in <module> in_file = os.path.abspath(sys.argv[1]) IndexError: list index out of range

@Peter Wood 2017-06-10 06:16:15

@user3732708 argv[1] and argv[2] will be the names of the input and output files. You get that error if you don't specify the files on the command line.
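If a clearer message than IndexError is wanted, the two file names can be read with `argparse`, which prints a usage line when an argument is missing. A small sketch; the function name `parse_args` and the help strings are illustrative only.

```python
import argparse
import os


def parse_args(argv=None):
    """Return absolute (in_file, out_file) paths from the command line.

    argv=None makes argparse read sys.argv[1:], as a script normally would.
    """
    parser = argparse.ArgumentParser(
        description='Convert a Word document to PDF.')
    parser.add_argument('in_file', help='path of the document to convert')
    parser.add_argument('out_file', help='path of the PDF to write')
    args = parser.parse_args(argv)
    return os.path.abspath(args.in_file), os.path.abspath(args.out_file)


# In the script this would replace the two sys.argv lines:
# in_file, out_file = parse_args()
```

Running the script without both arguments then exits with a usage message naming the missing one.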

@asetniop 2018-07-12 19:11:31

When running the doc.SaveAs() command I got an error and had to drop the "FileFormat=" prefix, and then it worked fine.

@abdelhedi hlel 2020-02-28 10:22:27

It only works on Windows.

@Basj 2020-04-25 13:09:45

Do you have an example working with LibreOffice? word = comtypes.client.CreateObject('LibreWriter.Application') doesn't work.

@Bas Bossink 2011-05-15 20:53:39

If you don't mind using PowerShell, have a look at this Hey, Scripting Guy! article. The code presented could be adapted to use the wdFormatPDF enumeration value of WdSaveFormat (see here). This blog article presents a different implementation of the same idea.

@nik 2011-05-15 21:44:11

I'm a Linux/Unix user and more inclined towards Python. But the PS script looks pretty simple and is exactly what I was looking for. Thanks :)

@c-smile 2011-05-15 20:54:07

You should start by investigating so-called virtual PDF print drivers. As soon as you find one, you should be able to write a batch file that prints your DOC files to PDF files. You can probably do this in Python too (set up the printer driver output and issue the document/print command in MS Word; the latter can be done from the command line, AFAIR).

@mikerobi 2011-05-15 20:53:58

I would suggest ignoring your supervisor and using OpenOffice, which has a Python API. OpenOffice has built-in support for Python, and someone created a library specifically for this purpose (PyODConverter).

If he isn't happy with the output, tell him it could take you weeks to do it with word.
