Saturday, December 8, 2012

Create PDF invoices with HTML5 and PhantomJS

Creating invoices in PDF is always a bit tricky: there are many libraries to create PDF documents with PHP, but most can't handle complex layouts and require a lot of memory and CPU time. Things like Unicode characters and line/page breaks are often difficult to program and the source of many bugs and memory problems. Using logos as vector graphics or embedding TrueType fonts is often required, but not possible with most libraries.

The good thing is that all these features are included in HTML5. Since most companies also offer their invoices in HTML, it would be great to convert them directly to PDF. Tools like html2ps and ps2pdf can produce PDFs with 200+ pages without problems, but they only support HTML4 and very limited CSS. Also, the layout is never the same as in a web browser.

So we need a real web browser to convert HTML pages to PDFs. From testing web applications, we know that PhantomJS is a headless WebKit browser that runs on the command line and can automate things very easily. The great thing is that it can also create screenshots and PDFs with selectable text and embedded fonts! So we can use all kinds of CSS (TrueType fonts, rounded borders, etc.), SVG images, DIVs and tables to create great-looking invoices.

Here is a PhantomJS script that converts an HTML file into a PDF:

// html2pdf.js
// usage: phantomjs html2pdf.js <input.html> <output.pdf>
var page = require("webpage").create();
var system = require("system");
// set the paper size to Letter, add margins
// and a footer callback showing page numbers
page.paperSize = {
  format: "Letter",
  orientation: "portrait",
  margin: {left:"2.5cm", right:"2.5cm", top:"1cm", bottom:"1cm"},
  footer: {
    height: "0.9cm",
    contents: phantom.callback(function(pageNum, numPages) {
      return "<div style='text-align:center;'><small>" + pageNum +
        " / " + numPages + "</small></div>";
    })
  }
};
page.zoomFactor = 1.5;
// system.args[1] is the input file, system.args[2] is the output file
page.open(system.args[1], function (status) {
  if (status !== "success") {
    // the page could not be loaded, so there is nothing to render
    console.log("Could not load " + system.args[1]);
    phantom.exit(1);
  } else {
    // export to target (the file extension selects PNG, JPG or PDF)
    page.render(system.args[2]);
    phantom.exit();
  }
});
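
The footer markup is the only part of the script that builds a string, so it can help to pull it into a plain function that can be checked without PhantomJS. This is a minimal sketch; the name footerMarkup is a hypothetical helper and not part of PhantomJS. It returns the same string as the footer callback above:

```javascript
// footerMarkup is a hypothetical helper, not a PhantomJS API.
// It builds the same centered "page / total" footer as the callback above.
function footerMarkup(pageNum, numPages) {
  return "<div style='text-align:center;'><small>" + pageNum +
    " / " + numPages + "</small></div>";
}

// Inside html2pdf.js it could then be passed to the footer like this:
//   contents: phantom.callback(footerMarkup)
```

Keeping the function free of outside variables means it behaves the same whether it is called directly or through the footer callback.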

Let's start the conversion:

time phantomjs html2pdf.js invoice.html invoice.pdf
real    0m0.189s
user    0m0.096s
sys     0m0.020s

time html2ps invoice.html | ps2pdf - invoice.pdf
...
real    0m3.521s
user    0m0.656s
sys     0m0.208s
(phantomjs 1.6.1, html2ps 1.0b7, 3.4 GHz single core, QEMU)
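
To start the same conversion from another program instead of a shell, the arguments can be passed as an array. This is only a sketch under assumptions: it assumes Node.js is available, that phantomjs is on the PATH, and that html2pdf.js is in the working directory. The names phantomArgs and convert are hypothetical:

```javascript
var execFile = require("child_process").execFile;

// Hypothetical helper: the argument list for
// "phantomjs html2pdf.js <input> <output>".
function phantomArgs(htmlPath, pdfPath) {
  return ["html2pdf.js", htmlPath, pdfPath];
}

// Hypothetical wrapper: runs phantomjs (assumed to be on the PATH) and
// reports success or failure through a Node-style callback.
function convert(htmlPath, pdfPath, done) {
  execFile("phantomjs", phantomArgs(htmlPath, pdfPath), function (err) {
    done(err || null);
  });
}

// Example call (not executed here):
//   convert("invoice.html", "invoice.pdf", function (err) {
//     if (err) { console.error("conversion failed", err); }
//   });
```

execFile does not go through a shell, so file names with spaces need no extra quoting.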

Here is the example invoice.html containing an SVG logo, tables, a TrueType font and CSS styles:

The result is a clean two-page PDF, produced in less than 1 second:

The size of the PDF is 65 KB. Using LibreOffice for the conversion gives 130 KB; html2ps produces 8 KB, but without the logo and the fonts.

The source files and the target PDF can be downloaded here.

Note: manual page breaks can be added with:
<div style="page-break-before:always;"></div>
or
<div style="page-break-after:always;"></div>
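
If the invoice HTML is assembled from several fragments, the page-break element can be placed between them in code. A minimal sketch; joinWithPageBreaks is a hypothetical helper name and the two fragments are made-up sample input:

```javascript
// The same page-break markup as in the note above.
var PAGE_BREAK = '<div style="page-break-before:always;"></div>';

// joinWithPageBreaks is a hypothetical helper: it takes an array of HTML
// fragments and puts a forced page break between consecutive fragments.
function joinWithPageBreaks(sections) {
  return sections.join(PAGE_BREAK);
}

// Sample input: two fragments that should start on separate pages.
var body = joinWithPageBreaks(["<h1>Invoice</h1>", "<h1>Terms</h1>"]);
```

A single fragment is returned unchanged, so no empty page is added at the start or the end.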

Note: Support for the Web Open Font Format (WOFF) will be available in future releases, so currently only TrueType fonts are supported.

8 comments:

  1. thanks for the article, it was incredibly helpful, and phantomjs rocks when it comes to making pdf files.

  2. Nice article, it really helped to create a nice pdf invoice.
    www.peoplesinnovation.com

  3. How are you getting vector output here? Is it toggled only by the file extension? When I specify *.PDF as the file name, I get a PDF with bitmap content, not a vector in sight. Any ideas?

  4. https://github.com/ariya/phantomjs/issues/10373

    Looks like an OS-X issue...

  5. Thanks for the nice blog. It was very useful for me. Keep sharing such ideas in the future as well. This was actually what I was looking for, and I am glad to came here! Thanks for sharing the such information with us.
    html5

  6. Hi
    thanks for posting this.
    I currently have a php page with a button that when click will convert html2pdf using www.html2pdf.fr
    How can I accomplish the same "Click to download PDF" using phantom js and your script?
    I have installed and configured phantomjs on my linux server already.
    Thanks again.

  7. Hi. Are you able to use html2pdf with images embedded in the .html file? I'm having trouble getting them to display properly

    Replies
    1. I did not test embedded images, but local images should work fine.

