Mac OS X, Word and the quest for the unbloated PDF

Hard-science scholars are strange people, who insist on using TeX because it “typesets beautifully”, but then forget to check badness warnings, letting lines spill beyond the right margin. I resisted TeX as much as I could, until I finally caved to peer pressure. Still, I only use it for collaborative work: when working on my own, I want something, let’s say it frankly, less Jurassic.

Yet I am forced to envy my frozen-in-1975 colleagues when I find out that saving to PDF, an operation the industry should have gotten right by now, turns my 1 MB Microsoft Word file into an 80 MB PDF-zilla.

I spent a good part of my morning on that problem, considering both the official solution and more independent alternatives. The official solution flunked when I found out that Adobe offered no trial of Acrobat for Mac (am I really willing to spend US$ 500 just to find out whether it will do what I want?). I tried a PDF compression tool, PDF Shrink, which reduced my PDF… from 89 MB to 87 MB, while horribly mangling all the images: not exactly worth the US$ 35. I also tried recreating the PDF from scratch, but PDF Studio, at US$ 125, simply refused to open the Word file, giving a cryptic ‘error reading’ message. I was glad both were trial versions.

In despair, I kept searching the Web. Lots of users crying “Large PDF!”, “Word PDF too big!”, “Huge PDFs on Mac!”, but very few answers. Industry, why u no listen?

They say that we should never attribute to malice that which can be explained by incompetence. But Hanlon’s razor notwithstanding, I couldn’t help drifting into conspiracy theories. What if that horrible implementation of PDF conversion was not completely accidental?

Conspiracy theories are unfalsifiable, of course, but I’ll tell you what finally solved the problem, and you tell me whether it doesn’t make you itsy bitsy suspicious:

  1. In Word, instead of saving to PDF, save to PostScript: use File… Print…, then click the PDF button in the lower-left corner of the print dialog; Save as PostScript is one of the options;
  2. Open the PostScript file (double-click its icon) and let Preview do the automatic conversion;
  3. In Preview, save the file as PDF (File… Export…, or File… Save As… on older OS X versions).

And that’s all. Now let’s check the sizes:

  • Original PDF file (using Save as PDF or Print to PDF from Word): 89 MB
  • PostScript file (using Print to PostScript): 94 MB
  • Final PDF file (using the steps above): 5 MB

In other words, using only tools already present in OS X and three small steps, I got a file almost 18 times smaller. At the risk of joining the ranks of the ‘moon hoax’ lunatics, I smell something rotten in the current state of PDF conversion as implemented by Word and OS X.
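The sizes listed above can be used to check the “almost 18 times” figure. Here is the arithmetic as a small Python sketch. The numbers come from this post, and the function name reduction_factor is only illustrative:

```python
# File sizes reported above, in MB.
sizes_mb = {
    "Word Save as PDF": 89,
    "PostScript (intermediate)": 94,
    "Preview-exported PDF": 5,
}


def reduction_factor(before_mb: float, after_mb: float) -> float:
    """How many times smaller the 'after' file is than the 'before' file."""
    return before_mb / after_mb


factor = reduction_factor(sizes_mb["Word Save as PDF"], sizes_mb["Preview-exported PDF"])
print(f"{factor:.1f}x smaller")  # 17.8x smaller
```

The intermediate PostScript file is even larger than Word's PDF, at 94 MB. All of the savings therefore come from Preview's PostScript-to-PDF conversion.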

* * *

Incidentally, I also found out how to do something else I needed: password-protect PDFs. I was ready to buy a solution, but it turned out to be unnecessary.

When creating a new PDF on OS X, you can use the “Security Options” of the “Export As” (“Save As” in older versions) dialog. How come I had never noticed that one?

If the PDF already exists, open it in Preview, go to File… Export… (File… Save As… in older OS X versions), and check the “Encrypt” box. Two text boxes below it let you enter a password. Save the file, and it will only be viewable after the password is entered.
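You can confirm that the exported file really is password-protected by looking for the /Encrypt entry. The PDF format places this entry in the trailer (or cross-reference stream) dictionary of every encrypted file. The Python sketch below is a crude heuristic: it searches the raw bytes and does not parse the PDF. The function name looks_encrypted is made up for this example:

```python
import re
import sys

# Encrypted PDFs carry an /Encrypt key in the trailer (or cross-reference
# stream) dictionary. Dictionary keys are never encrypted, so the key is
# visible in the raw bytes. The \b stops it matching e.g. /EncryptMetadata.
ENCRYPT_KEY = re.compile(rb"/Encrypt\b")


def looks_encrypted(data: bytes) -> bool:
    """Heuristic: True if the raw PDF bytes contain an /Encrypt key."""
    return ENCRYPT_KEY.search(data) is not None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print("encrypted" if looks_encrypted(f.read()) else "not encrypted")
```

Run it as, for example, python3 check_encrypt.py file.pdf (the script name is arbitrary). A file that merely mentions the word “/Encrypt” in uncompressed text could give a false positive.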

EDIT 28/02/12: I am finding out that the method above is by no means foolproof, i.e., it doesn’t work for every kind of PDF. In particular, I tested it on PDFs generated by PowerPoint, and it backfired (the PostScript conversion produced a much bigger file). For image-heavy PDFs from PowerPoint, unlike the mainly textual ones from Word, I’m finding that the usual tip of using a Quartz Filter works quite well: open the PDF file in Preview, choose File… Export… (File… Save As… in older OS X versions), then select “Reduce File Size” in the Quartz Filter field of the dialog.

EDIT 23/07/13: I never dreamed this blog entry would become my most popular one. (Apple and Microsoft, aren’t you listening?) I’ve edited the procedure above to reflect the change introduced in OS X Lion, when Save As… became Export…

58 thoughts on “Mac OS X, Word and the quest for the unbloated PDF”

    • Yes, it seems that the results depend heavily on each individual document: back to square one. What is worse: even Acrobat Pro does not seem to *consistently* reduce PDF files (the results seem to depend on the application used to create them, on whether they use a lot of images, and on whether they use only common fonts or more “exotic” ones).

  1. Worked perfectly for my situation. A 5 MB .docx is now a 5 MB .pdf, instead of a 140 MB .pdf! Simple workaround. I’m glad I wasn’t dissuaded by the first 15 pages I read about this problem that asserted “that’s just the way PDFs are.” Thank you!

  2. I find that the document needs to be in .docx format; after that, the conversion is normal. I had a problem when the document was saved in .doc format: the resulting PDF was too large.

  3. Thanks. I have been trying to convert a 3 MB PPT to PDF using Acrobat 9.5.2 on a Mac and ended up with a 12 MB PDF. This PPT had some images, so quality was important to me. I tried the same on Windows with Acrobat and it gave me a 2.7 MB file. I tried the same file on Windows with PDF-XChange Pro (an alternative to Acrobat, but Windows-only software) and got a 1 MB file with good resolution. After trying a lot of different ways, this is the workaround I came up with for creating PDFs on a Mac.

    1) For most files, the method suggested by Eduardo works great. However, for PPTs, I found that step 2 works.

    2) Save the PPT file as a PostScript file (Save as PostScript). Then open the PS file with Acrobat Distiller; this worked fine for me. I got a 2.7 MB file with good resolution.

    3) Also, as an add-on to step 1, I found a bunch of Quartz filters (75 dpi, 110 dpi, 150 dpi) on the Apple forum, which can be imported as additional Quartz filters for use in Preview. When you choose Save As in Preview and open the Quartz Filter drop-down, you will see the extra filter options you imported. Here is the link to the Quartz filters:
    https://github.com/joshcarr/Apple-Quartz-Filters
    Credit goes to Jerome.

    Thanks

  4. Hi!
    Thank you so much! It definitely worked for me but with small adjustments!
    I had to save my file as .docx first. Then I used the function File – Reduce file size.
    And afterwards I did what you suggested (save to PostScript, open the PostScript file, and save it as PDF).

    It got me from 295 MB to a 4.8 MB PDF file. THANK YOU SO MUCH! It was driving me crazy….

  5. Hi there, I am trying to convert a 15 MB Word document (.docx format) to PDF. It’s a newsletter with loads of photos that I have compressed; it needs to be a PDF, and smaller, so that I can distribute it. I followed all of the steps, but when I get to the Preview stage after saving as PostScript, there is no Save As option.

    Help! I’m going crazy and the publication date is due!

    • Sorry: on OS X Lion and Mountain Lion (and probably future versions), the File… Save As… step has changed to the almost equivalent File… Export… command. The other steps remain the same.

      Good luck!

  6. What worked for me was saving the .docx file as .doc (Word 97-2004), then printing to PDF.
    I was able to create a 459 KB file for a 72-page Word document, instead of the 49 MB file produced by printing the .docx to PDF. As I must file the document online with a federal court where the file size limit is 5 MB, the problem was much more than a matter of convenience for me.

    Thank you for creating this thread as it led me to a solution. None of the other discussions on the internet led me there.

  7. Thank you, sir! The PostScript step was just what I needed. I have Acrobat Distiller, so I used that to convert to PDF. A 300+ MB PDF was reduced to 1 MB via a 600 MB PS file (!)

  8. I save as PDF from Word, then open the file in Preview and export it with the “Reduce File Size” Quartz filter. This has not failed me once and reduces the size by orders of magnitude, from megabytes to kilobytes. (In Preview, do not use “Export as PDF”; use “Export”, then select PDF as the file format. This is the only way you can select a Quartz filter.)
    Still, it is a bloody shame that Microsoft has not fixed this issue yet, as the problem also existed in previous Word versions. I keep hoping a one-step solution will arrive soon.

  9. Hello. Thank you for posting this. I was able to reduce my print-quality file from 50.4 MB to a web-quality file of only 4 MB by 1) saving in .doc format, 2) opening it in OpenOffice, and 3) exporting to PDF. I haven’t proofed the whole thing yet, but so far the quality seems acceptable. Thanks again for providing this information. I was at my wits’ end.

  10. Hello again,
    Unfortunately, my celebration was premature. There were overprints and all sorts of registration errors in the file. Some of these were resolved when I changed the font to Times New Roman, but it still was not a good representation of the source document. After trying a bunch of different schemes, I tried selecting Print to Adobe PDF in the Services menu of the print dialog in Skim (a PDF reader). Another dialog popped up asking what type of Adobe PDF I wanted to create. I was able to select Smallest File Size (which didn’t look that great, so I went with Standard) and got a regular-sized PDF. I saw the Automator icon in the Dock while doing this, so I opened Automator to see if I could find the actual program. It is listed under Actions -> PDF -> Save As Adobe PDF. A Get Info query shows that it is version 9.3, ©2008-2009 Adobe Systems, and the file name is Save As Adobe PDF.action. I don’t know if this is bundled software or if I downloaded it. I was able to do the same from within Word and generate a normal-sized, normal-looking PDF. I am using Word 2008 and running Snow Leopard on a 2011 laptop, so this might not work for those running later software and hardware, but hopefully it will help someone 🙂. Thanks again.

  11. Word for Mac often saves a TON of ‘kerning’ data for fonts. If it’s a long text document, you might have 300 KB of content and 50 MB of kerning info. Before saving the Word file to PDF, highlight all the text, dig deep into the Font menu, and uncheck the “Kerning” box. I find that this often reduces PDF file size by a substantial margin.
    Your ‘save to PostScript’ step might just be knocking out the kerning info (just a guess).

  12. I found it still made a large file, but I have discovered a new way:

    1) With the Word file open, go to File – Print.
    2) Click the “PDF” button at the bottom left and, from the drop-down, choose “Open PDF in Preview”.
    3) When it opens in Preview, go to File – “Export” (not “Export as PDF”).
    4) Select PDF as the format, then set the Quartz Filter to “Reduce File Size”.

    This makes the file about 10 times smaller than the method above…

    • Yes. For some PDFs the “Reduce File Size” Quartz filter seems to work best, yet for others the PostScript technique is much more effective. I have yet to find the exact reason, but the rule of thumb seems to be that picture-heavy PDFs shrink better with the “Reduce File Size” Quartz filter, and typography-heavy PDFs shrink better with PostScript.

  14. I’ve been struggling with the same issue for years, but mostly with smaller documents where the extra MB or two it was producing wasn’t a big deal. Then I recently prepared a 150+ page handout for a seminar that used commercial fonts for branding purposes, so I had to send the PDF to our company’s small print shop (we’re by no means a design company), and the PDF was so large that they could not get the whole thing to print. The PostScript suggestion from your post did not work in this case.

    The first thing I noticed is that it is somehow font-related, but it isn’t as simple as you might think. I tried saving my first document in one of the two commercial fonts we use, which gave the 100+ MB PDF. I then tried Segoe UI, one of Microsoft’s fonts (used for the interface in Windows Vista and later), since it looked similar enough for the draft version… the PDF was about 4 MB… then, just for testing, I tried Calibri (Microsoft’s default font for Word documents on Windows)… back to a 100 MB PDF… then I tried another commercial font we purchased… 6 MB. So it’s not necessarily an issue of built-in Microsoft font vs. built-in Apple font vs. other font.

    After investigating the PDFs of the two larger files, I found that Word wasn’t saving the text in the PDFs as bitmaps… but it was embedding the fonts literally hundreds of times, instead of embedding each face (regular, italic, bold, and bold italic) once. I didn’t have enough time to dig further, as this happened recently, but I wonder whether the number of times a font is embedded in a PDF correlates with how many times it is used in the file, almost as if Word were mistakenly treating each place where that font is used as a new font.

    Anyway, I tried the suggestion a commenter on this post made to save as a DOC, which worked and gave me an expected-sized PDF… Interestingly, I resaved it as a DOCX, and that worked fine too. So it appears that the DOCX does something strange with formatting (presumably fonts) that the DOC strips, and the file starts over as “new” when resaved as a DOCX.

    I checked what happened if I simply took the DOCX and did a Save As to another DOCX (over the original or as a new file), and neither worked: the PDFs stayed at the same obnoxious size. But the DOCX->DOC->DOCX method has worked with every document I have had this issue with. I’ve noticed that the documents giving me the issue were all files I have been editing for years. For example, that handout was originally a different handout somebody else gave me; I started from it and cleared out the existing 60 pages because that made it easier to keep all of the formatting. And most of the other files are exams, syllabi, etc. for courses I have been teaching at a local community college for several years, which I edit at the start of every term to update as needed.

    • Interesting! One thing that changed in Microsoft Word is the treatment of styles: Microsoft introduced dynamic styles to reflect changes in the document, things like “Normal + Centered” or “Heading 2 + Bold”, etc. I have never understood the logic behind this and always found it a classic case of “the previous version was an improvement over the new version”. I wonder if this explosion of styles has something to do with the explosion of fonts. I mean: is Word embedding the fonts once per registered style?

      • That could very well be it. The handout, ironically, was for a seminar on how to use Microsoft Office for Windows, and I used a lot of bold and italic text for the names of buttons, icons, etc. to click, to distinguish them from regular text.
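You can check whether a bloated PDF shows the symptom described in this thread, one font embedded hundreds of times, by counting font descriptors per font name. The Python sketch below is a rough heuristic, not a PDF parser. It assumes the FontDescriptor dictionaries are stored uncompressed in the file, and files that pack objects into compressed object streams will be undercounted. The function name font_descriptor_counts is hypothetical:

```python
import re
import sys
from collections import Counter

# /FontName entries live in FontDescriptor dictionaries, one per font or
# font subset, e.g. "/FontName /ABCDEF+Calibri".
FONT_NAME = re.compile(rb"/FontName\s*/([^\s/\[\]<>(){}%]+)")
# Embedded subsets are prefixed with a six-capital-letter tag plus '+'.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def font_descriptor_counts(data: bytes) -> Counter:
    """Count FontDescriptor entries per base font name in raw PDF bytes.

    Heuristic only: descriptors inside compressed object streams are missed.
    """
    counts = Counter()
    for match in FONT_NAME.finditer(data):
        name = match.group(1).decode("latin-1")
        counts[SUBSET_TAG.sub("", name)] += 1  # fold subsets onto base name
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for name, n in font_descriptor_counts(f.read()).most_common():
            print(f"{n:6d}  {name}")
```

In a well-behaved file, each face (regular, italic, bold, bold italic) should appear roughly once. Counts in the hundreds for a single font would be consistent with the behavior reported above.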

  15. I was struggling with this too, but the fix of saving as PostScript wasn’t working for me: I was getting an empty file. I found another fix somewhere (I forget where now) that lets you save straight from Word.
    Go to Word -> Preferences -> Compatibility and tick the checkbox “Disable Opentype Font Formatting Features”.
    Works great for me.

  16. Thank you ever so much. Your solution still works in 2015, and maybe it will work in the next century. I found the method of disabling OpenType font features first, and although it reduced the file size drastically, I wasn’t satisfied, because it did away with all ligatures and old-style numerals. But your solution was perfect for me: it actually produced a smaller file than disabling OpenType did and, far more importantly, it preserved everything. Thanks very much indeed.

      • I finally found this site and the recommendation has worked well for me.
        But I am here after upgrading to Office 2016, which, in addition to crashing frequently, also came with the feature of making my saved PDFs balloon in size. Using the .ps to .pdf trick is working well for that, at least.
        I was thinking Office 2016 was Microsoft’s way of forcing me to use Office for Windows just to create PDFs. Conspiracies indeed!

  17. You are my hero! As others have mentioned, Word 2011 embeds a ton of font data when kerning is enabled.
    Thanks to your solution I can have both decent typography and small file sizes!

  18. The issue is clearly OpenType-related, at least for me, because I tend to use serif fonts and change the number style, and that’s when the problem pops up. Saving as .doc instead of .docx works because .doc doesn’t support OpenType features. So saving as .doc doesn’t help me, because the number-style setting is lost, and I just don’t like the old-style numerals of serif fonts.

  19. Worked for me, but I had to select “Reduce File Size” in the Quartz Filter pull-down below the PDF format selection in Preview’s Save As. My 5 MB Word file ballooned to a 73 MB PostScript file, then went down to a 500 KB PDF. Understandably, the images aren’t as crisp as in the 73 MB version, but they’re good enough! Thanks for the solution.

  20. Hi! I’m French, and this problem was driving me crazy! I couldn’t find anything about it on French sites, so I tried searching in English, and there you are, my savior!!!! Thank you so much!!!! I was unable to upload my résumé to job-seeking sites because it was toooooooo big! You saved me! Really! Thank you, thank you, thank you again!
