html, security 

Let’s talk about security. Sometimes you will need to generate a PDF from highly classified documents and send them securely to the HTML PDF API service.

There are a few techniques you can use depending on the level of security you want to achieve.

This article covers general security concepts for sending data to a remote API; they are not specific to the HTML PDF API service.

Secret URL

Security level none/low

You have seen this type of URL many times in online applications. It is a very primitive type of security and should be used only when you want to hide content that can nevertheless remain publicly available.

Example of URL: http://example.com/e58526b98fe3df274fc0e6fa4247d692

For that purpose you can use the following digest algorithms:

  • MD5 (Message Digest Algorithm)
  • SHA2 (Secure Hash Algorithm)

Let’s say you want to convert a user CV to PDF.

Pseudo code:

user.id = 43
user.username = 'luke_skywalker'
md5(user.id + user.username) //md5('43luke_skywalker')
"4036a9adcc389d4244c983981f68d956" //output of md5

Final URL could look something like this:
http://example.com/users/cv/4036a9adcc389d4244c983981f68d956
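
As a minimal Python sketch of the pseudo code above, the digest can be computed with the standard hashlib module. The secret_cv_url helper and its base URL are hypothetical names used only for illustration. SHA-256 (a member of the SHA-2 family) is used here; hashlib.md5 could be swapped in the same way.

```python
import hashlib


def secret_cv_url(user_id: int, username: str,
                  base: str = "http://example.com/users/cv/") -> str:
    """Build a hard-to-guess (but still public) URL from user attributes."""
    # Same input as the pseudo code: the id concatenated with the username.
    raw = f"{user_id}{username}".encode("utf-8")
    # hexdigest() returns the digest as a lowercase hexadecimal string.
    digest = hashlib.sha256(raw).hexdigest()
    return base + digest


url = secret_cv_url(43, "luke_skywalker")
print(url)
```

Because the digest is derived from values that someone else may be able to guess, this only hides the URL; it does not protect the document behind it.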

Pros:
Simple and fast

Cons:
Not secured, just hidden.

Usage:
You just want to keep the URL out of public view, and it does not really matter if the document is exposed.

URL with Authentication

Security level medium

If you have some kind of REST API that requires users to authenticate before fetching private data, you could use this technique. Using the HTTPS protocol on your side is strongly preferred in this case.

Types:

  • Token based URL
  • Basic authentication

Token based URL

Example of Token based URL:
Pattern: /api/:token/users/:user_id/cv

URL: https://example.com/api/f6dfcda3fd5d4414b5155e9f297e97a0/users/22/cv

Basic authentication

Example of URL with basic authentication:
URL: https://username:password@example.com/users/22/cv

… or you could send the following request to the HTML PDF API service when generating the PDF:

curl -H 'Authentication: Token <your token>' \
-d 'url=http://htmlpdfapi.com/examples/example.html' \
-d 'username=luke' \
-d 'password=secret' \
'https://htmlpdfapi.com/api/v1/pdf' > result.pdf
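
For readers who prefer a script to the command line, here is a rough Python equivalent of the curl command, using only the standard library. The header and field names are taken from the command above. The build_url_conversion_request helper is a hypothetical name, and the request is only prepared, not sent, since sending needs a real token.

```python
import urllib.parse
import urllib.request

API_URL = "https://htmlpdfapi.com/api/v1/pdf"


def build_url_conversion_request(token: str, page_url: str,
                                 username: str, password: str) -> urllib.request.Request:
    """Prepare the same POST request as the curl command, without sending it."""
    # Form-encode the fields the same way curl's -d options do.
    body = urllib.parse.urlencode({
        "url": page_url,
        "username": username,
        "password": password,
    }).encode("ascii")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authentication": f"Token {token}"},
        method="POST",
    )


request = build_url_conversion_request(
    "<your token>", "http://htmlpdfapi.com/examples/example.html", "luke", "secret")

# Sending is left commented out because it needs a real token:
# with urllib.request.urlopen(request) as response, open("result.pdf", "wb") as out:
#     out.write(response.read())
```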

Hey, how is this different from a secret URL? Well, if the pattern behind a secret URL is broken, every URL is available to the attacker, whereas if you accidentally expose your token or basic auth credentials you can always generate new ones.
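
The difference can be sketched in Python: a token is random rather than derived from user data, so replacing it invalidates the leaked value without changing how URLs are built. The helper names below are hypothetical, and how your application stores and revokes tokens is not shown.

```python
import secrets


def new_api_token() -> str:
    """Return a fresh random token: 16 random bytes as 32 hex characters."""
    return secrets.token_hex(16)


def token_url(token: str, user_id: int) -> str:
    """Fill in the /api/:token/users/:user_id/cv pattern shown earlier."""
    return f"https://example.com/api/{token}/users/{user_id}/cv"


old_token = new_api_token()
new_token = new_api_token()  # issued to replace a token that was exposed
print(token_url(new_token, 22))
```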

Pros:
You can secure your HTML content and assets.

Cons:
Assets must be embedded to achieve a higher level of security resulting in a bigger HTML file.

Usage:
When the text content of a generated PDF needs to be confidential. Assets are part of the design and either have no confidential value or are small enough to embed.
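
Embedding an asset means replacing its link with a data URI. The following Python sketch shows one way to do that with the standard library. The to_data_uri helper and the placeholder image bytes are assumptions made for illustration.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw asset bytes as a data: URI usable in src or href attributes."""
    # Guess the MIME type from the file extension; fall back to a generic one.
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Placeholder bytes stand in for a real image read with open(path, "rb").
uri = to_data_uri(b"\x89PNG placeholder", "logo.png")
img_tag = f'<img src="{uri}" alt="logo">'
print(img_tag)
```

Base64 encoding grows the data by roughly a third, which is where the bigger HTML file comes from.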

Upload HTML or post it

Security level high, very high.

You need a higher level of security and want to generate HTML on the fly just for conversion to PDF and discard it immediately. While using this technique you don’t have to worry about some URL that could get exposed.

Let’s describe it with pseudo code:

//fetch user from database
user = User.find(43)

//render template to variable
user_top_secret_cv_html = render_template_as_string('cv', user)

response = SomeRestClient('https://htmlpdfapi.com/api/v1/pdf')
.headers({'Authentication': 'Token <your token>'})
.parameters({
  html: user_top_secret_cv_html
})

pdf = response.body

As you can see, the generated HTML, and therefore the PDF document, is only available in memory in that process. You can email the PDF to the user, send the file to the user’s browser, or save it to a private directory on your server.
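
A possible Python version of the pseudo code, using only the standard library, might look like the sketch below. The template, the render_cv and build_html_conversion_request helpers, and the sample user data are hypothetical. The html parameter and the Authentication header come from the pseudo code. The request is prepared but not sent, since sending needs a real token.

```python
import html
import urllib.parse
import urllib.request
from string import Template

API_URL = "https://htmlpdfapi.com/api/v1/pdf"
CV_TEMPLATE = Template("<html><body><h1>$name</h1><p>$summary</p></body></html>")


def render_cv(name: str, summary: str) -> str:
    """Render the CV to a string; nothing is written to disk."""
    # Escape user data so it cannot inject markup into the document.
    return CV_TEMPLATE.substitute(name=html.escape(name),
                                  summary=html.escape(summary))


def build_html_conversion_request(token: str, html_source: str) -> urllib.request.Request:
    """Prepare a POST request that carries the HTML in the request body."""
    body = urllib.parse.urlencode({"html": html_source}).encode("ascii")
    return urllib.request.Request(
        API_URL, data=body,
        headers={"Authentication": f"Token {token}"}, method="POST")


cv_html = render_cv("Luke Skywalker", "Pilot & Jedi")
request = build_html_conversion_request("<your token>", cv_html)

# With a real token, the PDF bytes also stay in memory:
# with urllib.request.urlopen(request) as response:
#     pdf = response.read()
```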

Pros:
There is no URL. HTML content is only available in memory in your code. Assets which are part of design can be hosted on HTML PDF API service. Faster conversion due to less downloading/uploading to the HTML PDF API service.

Cons:
Content images, such as a user picture hosted on your server, still need to be public or embedded.

Usage:
Similar to “URL with Authentication”

Compressed file

Security level very high, ultra high

You need the maximum level of security and want to be in control of every bit of data.

First the directory structure:

- top_secret_report/
  - index.html
  - images/
    - subject/
      - luke_skywalker.jpg
    - enemies/
      - darth_vader.jpg
    - associates/
      - princess_leia.jpg
      - chewbacca.jpg
      - han_solo.jpg
  - fonts/
    - confidential.ttf

… and now the pseudo code:

zip_file = Zip.create_from_directory('./top_secret_report')

response = SomeRestClient('https://htmlpdfapi.com/api/v1/pdf')
.headers({'Authentication': 'Token <your token>'})
.parameters({
  file: zip_file
})

pdf = response.body
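
The zip step can be sketched in Python with the standard library, again keeping everything in memory. The helper names are hypothetical, and two things are assumptions rather than documented behaviour: that index.html should sit at the top level of the archive, and that the file parameter is sent as a multipart/form-data upload.

```python
import io
import uuid
import zipfile
from pathlib import Path


def zip_directory(root: str) -> bytes:
    """Zip every file under root into an in-memory archive."""
    buffer = io.BytesIO()
    root_path = Path(root)
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root_path.rglob("*")):
            if path.is_file():
                # Store paths relative to root (assumed layout: index.html on top).
                archive.write(path, path.relative_to(root_path).as_posix())
    return buffer.getvalue()


def multipart_file_body(field: str, filename: str, payload: bytes):
    """Return (content_type, body) for a single-file multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/zip\r\n\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return f"multipart/form-data; boundary={boundary}", head + payload + tail


# Usage, assuming ./top_secret_report exists:
# content_type, body = multipart_file_body(
#     "file", "top_secret_report.zip", zip_directory("./top_secret_report"))
# Send body as a POST with the Authentication and Content-Type headers set.
```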

Pros:
All of the data is secured. It can be combined with assets hosted on HTML PDF API service.

Cons:
Bigger file size.

Usage:
You need to securely send a lot of content images while converting to PDF and want both the HTML and the assets secured.

Conclusion:

In the end it all depends on how sensitive your data is, so here is a table that can help you decide.

Technique                 Type   Security                HTML   Links   Data-URI   Hosted   In file
Secret URL                URL    none, low               P      P       P          -        -
URL with Authentication   URL    medium, high            S      P       S          -        -
HTML in post body         HTML   high, very high         S      P       S          S        -
Compressed file           FILE   very high, ultra high   S      P       S          S        S

Legend:

  • Type - Conversion type
  • Security - Level of security
  • Assets:
    • Links - Assets are linked normally in the HTML document
    • Data-URI - Assets are embedded using data-uri
    • Hosted - Assets are hosted on the HTML PDF API service
    • In file - Assets are sent in a compressed file once per conversion and deleted after the conversion. You are in full control of your assets.
  • S Secured
  • P Public
  • - Not applicable

PRO TIP:

Whatever level of security you are using, always use an HTTPS URL when sending data to the HTML PDF API service: https://htmlpdfapi.com/api/v1/

Using the HTTPS protocol on your side can also lift the level of security.