Monday, September 30, 2019

DOC to PDF with MS Word and Vbscript


Dear readers,

Before we get into the subject:
It has been some time since I last updated this blog. One of the demotivating reasons is that much of the content, such as the assembly language code and the related tutorials, is very old and was originally hosted on GeoCities; I just dumped the archive here. I have been wanting to reorganize the blog a little. Some of my entries are newer, though; they just happen to use older technologies.

There are two reasons for this: (1) it simplifies the learning, and (2) old is still gold. Hence today I am going to present a simple scripting task using a tool of a "bygone era" called VBScript. The words "tool of a bygone era" are not mine. The other day a colleague and I were discussing how many new technologies, especially programmer tools, are available now, and how much value they really add, especially when we need to port some of our old code to the new technology.

I have spent quite some time with VB, VBA and VBScript in the past, and I was always thrilled by the "automation power" the VB family had. I could make the system talk with just a line of code in VBScript. I could bring up the "old MS assistant", MS Agent, in a few lines of code, animate it and make it show some text. Those were the good old golden days.

Now we have newer technologies and newer ways of doing things. Yet, thanks to Microsoft, these older technologies are not simply dumped. Microsoft does consider the "cost" involved for any business in moving to a newer technology, and also the developer's pain in learning a new paradigm. Of course, some low-level technologies have changed over time, and that has indeed caused problems for developers. But as most of you know, it is just crazy right now how many technologies we have and how many of them businesses are using together (mixed programming).

Anyway, let us get down to business. The task is this: you have a lot of .doc files in a directory that you want to convert to .pdf. There are tools available now as part of some professional editions of "PDF readers", and there are probably some scripts available too. Frankly, I didn't do much research. I just wanted to brush up on my VB, so I went straight to the coding. There are at least two ways that I could quickly think of to do this task. One is to load the files and then "print them to PDF". The other is to use MS Word's built-in "Export to PDF" function. Below is the code that I came up with using the latter option.

If not for this script, you would normally open each .doc file and export it to PDF by hand. Again, some "PDF creators" may have a built-in option for this, or integration with MS Word; I haven't done much research on this.

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

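' FileSystemObject lists the files; Word does the actual conversion.
Set objFSO = CreateObject("Scripting.FileSystemObject")
currentpath = objFSO.GetParentFolderName(WScript.ScriptFullName)

Set wordapp = CreateObject("Word.Application")
wordapp.Visible = False

' "." is the current directory; run the script from its own folder so that
' "." and currentpath refer to the same place.
objStartFolder = "."

Set objFolder = objFSO.GetFolder(objStartFolder)

Set colFiles = objFolder.Files

For Each objFile In colFiles

    ' Match on the extension only, so that earlier output such as
    ' "name.doc.pdf" and Word's "~$" lock files are skipped.
    If LCase(objFSO.GetExtensionName(objFile.Name)) = "doc" And Left(objFile.Name, 2) <> "~$" Then
        iname = currentpath & "\" & objFile.Name
        oname = currentpath & "\" & objFile.Name & ".pdf"

        WScript.Echo "Converting " & objFile.Name & "..."
        Set doc = wordapp.Documents.Open(iname)
        doc.ExportAsFixedFormat oname, &H11   ' &H11 = 17 = PDF

        doc.Close False   ' close without saving changes
    End If
Next

wordapp.Quit
Set wordapp = Nothing
Set objFSO = Nothing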

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

Just save the above script as doc2pdf.vbs (or any name you like) in the directory where you have the .doc files. Open a command prompt, make sure you are in that directory, and run the script as "cscript doc2pdf.vbs".

How this works: first we use the automation power of VBScript to instantiate two objects, one that deals with the "filesystem operations" and the other that deals with "MS Word". Of course, you need to have MS Word installed on the system for this to work.

We create a "Word application" first. We then use the "File system object" to get the "dir effect", that is, a listing of all the files in a given folder. Note that I am using "." to make the script use the current directory. I leave it to the reader to modify the script to take any path and convert the files in that path. I also haven't dealt with a lot of other use cases. For example, you may want to convert .docx rather than the old .doc format, or you may want to skip .docm, and so on. Again, these tasks are left to the readers.
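For readers who want a head start on those exercises, here is a rough sketch of one way to go about it. It is only an illustration and not part of the script above: the optional folder argument and the ShouldConvert helper are names I made up, and the loop merely lists what would be converted instead of calling Word.

```vbscript
' Sketch only: ShouldConvert and the optional folder argument are
' illustrative additions, not part of the original script.
Option Explicit

Dim objFSO, startFolder, objFile
Set objFSO = CreateObject("Scripting.FileSystemObject")

' Use the first command-line argument as the folder, else the current directory.
If WScript.Arguments.Count > 0 Then
    startFolder = objFSO.GetAbsolutePathName(WScript.Arguments(0))
Else
    startFolder = objFSO.GetAbsolutePathName(".")
End If

If Not objFSO.FolderExists(startFolder) Then
    WScript.Echo "Folder not found: " & startFolder
    WScript.Quit 1
End If

' True for .doc and .docx; False for everything else (including .docm)
' and for the "~$" lock files Word creates while a document is open.
Function ShouldConvert(fileName)
    Dim ext
    ext = LCase(objFSO.GetExtensionName(fileName))
    ShouldConvert = (ext = "doc" Or ext = "docx") And Left(fileName, 2) <> "~$"
End Function

' Dry run: only report the files that would be handed to Word.
For Each objFile In objFSO.GetFolder(startFolder).Files
    If ShouldConvert(objFile.Name) Then
        WScript.Echo "Would convert: " & objFile.Path
    End If
Next
```

Saved as, say, listdocs.vbs, it could be run as "cscript listdocs.vbs C:\some\folder" or without an argument to use the current directory. Replacing the Echo line with the open/export/close calls from the main script would turn the dry run into a real conversion.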

So here we just traverse the directory, list all the files, and check whether each one is a ".doc" file. If it is, we use the Word application instance to invoke the "export" option: we open each document one by one, then call the method ExportAsFixedFormat and pass it the name of the output file. Here I just append .pdf to the actual .doc file name, so if your doc name is Sreejith.doc it gets exported to Sreejith.doc.pdf. As the second argument I pass the value &H11, which is hexadecimal for decimal 17, the value of Word's wdExportFormatPDF constant. This argument is important: it tells Word that I want a PDF file as output. Once a document is done, we close it and loop all over again, loading a new document, and this continues for as long as the script "finds a .doc file".
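If you would rather get Sreejith.pdf than Sreejith.doc.pdf, the output name can be built from the base name of the file. The snippet below is just a sketch of that idea; PdfNameFor is a hypothetical helper of mine and the sample path is made up. It also shows the magic number 17 given a readable name, since VBScript does not know Word's constants unless you declare them yourself.

```vbscript
' Sketch: PdfNameFor is a hypothetical helper, not part of the original script.
Option Explicit

Const wdExportFormatPDF = 17   ' same value as &H11

Dim objFSO
Set objFSO = CreateObject("Scripting.FileSystemObject")

' Build "<folder>\<base name>.pdf" from the full path of a document.
' Only the path string is examined; the file does not need to exist.
Function PdfNameFor(docPath)
    PdfNameFor = objFSO.BuildPath(objFSO.GetParentFolderName(docPath), _
                                  objFSO.GetBaseName(docPath) & ".pdf")
End Function

' Made-up sample path, for illustration only.
WScript.Echo PdfNameFor("C:\docs\Sreejith.doc")
WScript.Echo "Export format value: " & wdExportFormatPDF
```

One thing to keep in mind with this naming: if the same folder has both Sreejith.doc and Sreejith.docx, they would map to the same PDF name and one would overwrite the other. Appending .pdf to the full name, as the main script does, avoids that.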

That's it for today :)

Long live VB (hopefully VB.Net holds the fort)!

Happy scripting!