Preparing the HTML file, part 1

Part 12: add paragraph and heading tags to your ebook, and replace punctuation and symbols with HTML entity codes.
Notepad++ is a lightweight but robust text editor, and it is well suited to generating HTML files for conversion into Kindle ebooks.

Preparing your ebook file

1. Open Notepad++. Select your entire manuscript (CTRL A) in your word processor, copy it, and paste it into a new Notepad++ document.
2. Save this as bookname.html and keep the filename short – you’ll save yourself some typing later, when you generate the ebook.
3. You’ll notice that long paragraphs will trail off the right of the screen. You can keep all the text on one screen width (while maintaining one paragraph per line – see the grey line numbers on the left of the window) by selecting view → word wrap.
4. Some characters need to be replaced with HTML entity codes. An entity code represents a particular character, symbol, or item of punctuation, and it is used because (1) the original character may be misinterpreted as an HTML command by the Kindle, and (2) some non-standard punctuation won’t be displayed correctly. The Kindle renders these unrecognised characters as squares or question marks, killing the professional finish of your ebook. The good news is that entity codes can be easily added to your ebook, either ‘manually’, or by an ‘automatic’ cut and paste via an encoding website.

Manually replacing characters with entity codes

By using find and replace → replace all, you can change these potentially troublesome characters into entities in a matter of minutes. Of course, this makes it difficult to do further proofreading of your manuscript, but you can easily solve this by opening a blank tab in Firefox or Chrome and dragging your HTML file into it, which will display your entity codes as the characters they represent.

List of common characters and their entity codes

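Replace & with &amp; (do this one first, otherwise it will also convert the & at the start of every entity code you have already inserted)
Replace “ with &ldquo;
Replace ‘ with &lsquo;
Replace ” with &rdquo;
Replace ’ with &rsquo;
Replace — with &mdash;
Replace – with &ndash;
Replace … with &hellip;
Replace © with &copy;
Replace * with &#42;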
A full list of entity codes, grouped by function, is available here. If you are using mathematical symbols and accented characters from foreign languages, I would definitely recommend that you replace these with entity codes. Before I replaced the en dashes in my manuscript with their entity code, they were refusing to render on my Kindle 4.
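If you would rather script the find and replace passes than run them by hand, the idea can be sketched in a few lines of Python. This is only a rough illustration, not part of the Notepad++ workflow: the function name and the sample sentence are my own inventions, and the table covers most of the characters listed above rather than every entity you might need.

```python
import html

# Character -> entity pairs from the list above.
# The ampersand comes first on purpose: if it ran last, it would
# re-encode the "&" at the start of every entity already inserted.
REPLACEMENTS = [
    ("&", "&amp;"),
    ("\u201c", "&ldquo;"),   # left double quotation mark
    ("\u2018", "&lsquo;"),   # left single quotation mark
    ("\u201d", "&rdquo;"),   # right double quotation mark
    ("\u2019", "&rsquo;"),   # right single quotation mark / apostrophe
    ("\u2014", "&mdash;"),   # em dash
    ("\u2013", "&ndash;"),   # en dash
    ("\u2026", "&hellip;"),  # ellipsis
    ("\u00a9", "&copy;"),    # copyright sign
]


def replace_with_entities(text: str) -> str:
    """Run each find-and-replace pass in order (hypothetical helper)."""
    for char, entity in REPLACEMENTS:
        text = text.replace(char, entity)
    return text


if __name__ == "__main__":
    sample = "\u201cFish & chips\u201d \u2013 it\u2019s \u00a9 2013\u2026"
    encoded = replace_with_entities(sample)
    print(encoded)
    # html.unescape turns entities back into characters, much as a
    # browser does when you drag the file in to proofread it.
    print(html.unescape(encoded) == sample)
```

The ordering matters more than anything else here: swap the ampersand to the bottom of the table and every entity comes out double-encoded.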

Automatically replacing characters with entity codes

If you are going to use an automatic method to replace characters with entity codes, you must do this before you add any HTML <em>italic</em> and <strong>bold</strong> tags to the manuscript file (otherwise the ‘greater than’ and ‘less than’ signs in your HTML tags will be converted along with the characters you actually want converted). Select your entire HTML document, cut it, and paste it into the input field of this website. Then select the output text, copy it and paste it back into your HTML file.
Notepad++ also has an entity encoder plugin called HTMLTag, available at Plugin Central. To choose the right download, check which version of Notepad++ you are running (Unicode or ASCII) by pressing F1 within the Notepad++ window. Then navigate to the HTMLTag download link, follow the simple installation procedure outlined in HTMLTag-readme.txt (located in the ZIP file’s Doc folder), and restart Notepad++. From there, highlight the text you wish to encode (i.e. all of it) and press CTRL E to encode the characters within.
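To give a feel for what an automatic encoder does, here is a minimal Python sketch. It is not how the encoding website or HTMLTag actually work (I haven’t seen their code); it simply assumes that every non-ASCII character, plus &, < and >, should become an entity, and it leans on the standard library’s HTML 4 entity table. The function name is hypothetical.

```python
from html.entities import codepoint2name


def encode_entities(text: str) -> str:
    """Encode &, <, > and all non-ASCII characters as entities.

    Uses a named entity where the HTML 4 table has one, and falls
    back to a numeric code (e.g. &#257;) where it does not.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "&<>" or code > 127:
            name = codepoint2name.get(code)
            out.append(f"&{name};" if name else f"&#{code};")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(encode_entities("caf\u00e9 \u2013 fish & chips"))
    # Tags get mangled too, which is why encoding must come
    # before you add any <em> or <strong> tags.
    print(encode_entities("<em>italic</em>"))
```

The second call shows the trap described above: the angle brackets of a tag are converted along with everything else, so the tag stops being a tag.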
5. Now we’ll add paragraph HTML tags, which tell browsers and ebook readers when a paragraph starts and ends.
a) Place paragraph opening tags at the start of each paragraph: CTRL F → replace → search mode = regular expression → find what = ^ → replace with = <p> → replace all.
b) Place paragraph closing tags at the end of each paragraph: CTRL F → replace → search mode = regular expression → find what = $ → replace with = </p> → replace all.
6. Your manuscript now has paragraph tags on all lines. You will find that lines which were empty (your section break spacers) now contain the tags <p></p>.
7. Replace these empty paragraphs with properly formatted breaks: CTRL F → replace → search mode = normal → find what = <p></p> → replace with = <br /> → replace all.
8. Now, you need to hand code each of your chapter headings. This is as simple as replacing the ‘p’ in the opening and closing tags with a heading value (h1 – h6, with the higher numbers representing proportionally smaller headings). The image below will give you a better idea of heading sizes as they are displayed by a Kindle. I prefer to use h3 for my chapter headings. Remember to close your <h3>heading tags</h3>.

[Image: heading examples on the Kindle Paperwhite, via Kindle Previewer]

9. There is a page break command specific to the Amazon Kindle, which is <mbp:pagebreak />. You can manually code one of these between each chapter of your novel (i.e. between the last paragraph of the first chapter and the heading for the second chapter). When the Kindle reaches the end of the first chapter, it will then start the second on a fresh page, even if the last page of the first chapter was only a sentence long.
10. Note that you can enter blank lines, with no text in them at all, to help prevent your code from becoming cluttered and difficult to edit. I like to insert them after each heading, section break, chapter and pagebreak command. In the next article, you will be able to download an ebook template where this layout is employed as an example.
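Steps 5 to 9 can also be sketched as a short Python script, for anyone who prefers that to find and replace. Treat it as an illustration under some loud assumptions: the function names are made up, the rule for spotting a chapter heading (a line starting with ‘Chapter ’) is purely hypothetical and will need adapting to your own manuscript, and unlike the Notepad++ method it trims stray spaces from each line.

```python
def looks_like_chapter(text: str) -> bool:
    # Hypothetical rule: adapt this to however your headings are written.
    return text.startswith("Chapter ")


def tag_manuscript(lines, is_heading=looks_like_chapter):
    """Wrap one-paragraph-per-line text in HTML tags."""
    out = []
    seen_heading = False
    for line in lines:
        text = line.strip()
        if not text:
            # Empty line = section break spacer (steps 6 and 7).
            out.append("<br />")
        elif is_heading(text):
            if seen_heading:
                # Start every chapter after the first on a fresh page (step 9).
                out.append("<mbp:pagebreak />")
            out.append(f"<h3>{text}</h3>")  # step 8
            seen_heading = True
        else:
            out.append(f"<p>{text}</p>")    # steps 5a and 5b
    return out


if __name__ == "__main__":
    manuscript = [
        "Chapter 1",
        "First paragraph.",
        "",
        "Second paragraph.",
        "Chapter 2",
        "Third paragraph.",
    ]
    print("\n".join(tag_manuscript(manuscript)))
```

I would still check the headings by eye afterwards, since any rule for detecting them automatically is bound to miss something.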

Coding errors

By now, you’ll notice that Notepad++ colours the text to indicate the function of each piece of HTML. Unfortunately, it won’t actually show you if you’ve left out a quotation mark or forgotten to close a tag. However, when you compile via KindleGen, these errors will be reported line by line, allowing you to go back and fix them.
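If you want to catch unclosed tags before you get as far as KindleGen, a rough pre-check can be sketched with Python’s built-in HTML parser. This is not what KindleGen does, and the class and function names are my own. It assumes every tag is either closed explicitly or written in self-closing form (<br />, <mbp:pagebreak />), so a bare <br> would be flagged as unclosed.

```python
from html.parser import HTMLParser


class TagChecker(HTMLParser):
    """Track opening and closing tags with a simple stack."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag, line number)
        self.problems = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> need no closing partner.
        pass

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1][0] == tag:
            self.open_tags.pop()
        else:
            line = self.getpos()[0]
            self.problems.append(f"line {line}: unexpected </{tag}>")


def check_tags(source: str):
    """Return a list of human-readable problems (empty if none found)."""
    checker = TagChecker()
    checker.feed(source)
    checker.close()
    for tag, line in checker.open_tags:
        checker.problems.append(f"line {line}: <{tag}> is never closed")
    return checker.problems


if __name__ == "__main__":
    for problem in check_tags("<p>One</p>\n<p>Two"):
        print(problem)
```

It is deliberately crude: one mismatched tag can produce two messages, but the line numbers should point you to the right part of the file.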

A word on quotation marks

‘Smart’ quotation marks (or “smart quotes”) are not just anything that tumbled out of Winston Churchill’s mouth; they are also quotation marks which curve to show whether they are opening or closing dialogue. Compare them to the colloquially termed "dumb quotes", which are straight up and down. I always have them enabled in Writer (tools → autocorrect options → localized options → check both ‘replace’ boxes), because I prefer the look of them to dumb quotes.
If you have used smart quotes and wish to change them to dumb, then you can do so via a simple find and replace. Globally changing dumb quotes to smart ones is a bit trickier, because of the difference between opening and closing quotes. There must be a workaround to replace dumb quotes with their smart cousins via regular expressions, but unfortunately this is beyond the scope of this article.
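For the curious, one rough way to attempt it is sketched below in Python. It rests on a simple assumption of mine, not a tested rule: a quote at the start of the text, or after a space or opening bracket, is an opening quote, and everything else is a closing quote or apostrophe. The function name is hypothetical, and it should be run on the plain manuscript before any HTML tags go in.

```python
import re


def smarten_quotes(text: str) -> str:
    """Convert straight quotes to curly ones using a position heuristic."""
    # Double quotes: opening if at the start or after whitespace/a bracket.
    text = re.sub(r'(^|[\s(\[{])"', "\\1\u201c", text)
    # Any double quote left over must be a closing one.
    text = text.replace('"', "\u201d")
    # Single quotes: same rule, also allowing one straight after an
    # opening double quote (for quotes nested inside dialogue).
    text = re.sub(r"(^|[\s(\[{\u201c])'", "\\1\u2018", text)
    # Whatever remains is a closing quote or an apostrophe.
    text = text.replace("'", "\u2019")
    return text


if __name__ == "__main__":
    print(smarten_quotes('"Hello," she said. "It\'s fine."'))
```

The heuristic gets words with a leading apostrophe wrong (’tis comes out with an opening quote), so the result still needs proofreading.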

Kindle custom HTML

Note that certain Kindle-specific commands, such as <mbp:pagebreak />, will not display in Firefox or Chrome browser windows as they are not considered regular HTML. I can see you shaking your head, and all I can say is welcome to coding. In case you were wondering about the Kindle custom HTML which is not supported by Firefox and Chrome (I know I stay up at night worrying about strange things, too), a list is available here.

In the next article, we’ll use an ebook template to drop head, chapter, front matter and back matter around your formatted HTML.

Proceed to Part 13: ‘Preparing the HTML file, part 2’, or return to the article index.
While I’ve endeavoured to provide you with accurate information, what is considered ‘accurate’ will change over time. If I’m wrong, or you’d like to ask a question or share your thoughts, I’d love to hear your take on things.

About Rhys

Teacher, writer, editor, cook: a bit like that nursery rhyme, really.