August 21, 2011

EPUB and KindleGen Tutorial - eBook Formatting


Thank you for visiting this eBook design tutorial. We now have an eBook design startup, BB eBooks, dedicated to helping independent authors and small presses get their eBooks formatted, converted, and ready for sale at all the major online retailers (e.g., Amazon's Kindle Store, Barnes & Noble's NOOK, iBookstore, and Smashwords). Please contact us for a no-obligation quote. For those writers, editors, and publishers looking to go the DIY route for eBook production (you probably are if you visited this page), we offer free online tutorials and apps to help you professionally design your eBook. Please visit our Developers page and let’s work together to improve the overall standards of eBooks. Also, please sign up for the mailing list for promotions, design & marketing tips, plus eBook industry news.



Looking for a complete guide on eBook design and development? Please consider The eBook Design and Development Guide, which contains everything you need to know about HTML, CSS, EPUB, and MOBI/KF8 to make an eBook like a pro. Pick it up at Amazon for $6.99 today.

**Warning**

This tutorial is out-of-date due to changing requirements from Amazon. Please view a more up-to-date and better EPUB and KindleGen tutorial at the BB eBooks website.

***

The full eBook formatting series includes a basic XHTML tutorial, a tutorial for converting your manuscript into XHTML, and a Calibre tutorial for converting XHTML into eBooks. For those looking for something more advanced, you can also peruse the Regular Expressions tutorial, as well as the EPUB and KindleGen tutorial. Templates for XHTML and EPUB are also available for your formatting arsenal. Additionally, there is a tutorial with some helpful hints for formatting for Smashwords.

Table of Contents for EPUB and KindleGen Tutorial
Introduction to Advanced Formatting
Prerequisites for Manually Coding EPUB and MOBI
EPUB Structure
The mimetype and container.xml Files
The content.opf File
The toc.ncx File
Compressing the EPUB File
Creating Your Own EPUB Files and Prepping for MOBI Conversion
Adding Embedded Fonts
Running KindleGen to Create MOBI
Troubleshooting KindleGen
Video Tutorials

Introduction to Advanced Formatting
For many self-publishers, using Calibre to convert your XHTML source code into eBooks will provide a professional reading experience for your customers. However, if you are a purist, this section will discuss the actual layout of an EPUB file and how to convert EPUB files into MOBI with the Amazon.com KindleGen program. This lets you construct your eBooks entirely from the ground up without relying on Calibre. You can also use the knowledge in this section to tweak an EPUB produced by Calibre until it is exactly the way you want it.

The EPUB format is defined by the International Digital Publishing Forum (IDPF) in its EPUB 2.01 standard. The full EPUB 2.01 specification is available here. It is not a very exciting read, and the instructions are geared towards software developers rather than self-publishers. This guide will give you a basic understanding of the EPUB format so that you can construct your own EPUBs by hand. Sometime down the road, EPUB 3.0 will be released. It promises good support for embedding audio and video files, which is currently lacking in EPUB 2.01. A good understanding of EPUB 2.01 will certainly prepare you for the release of EPUB 3.0.

The MOBI format, by contrast, is not an IDPF standard. However, Amazon.com's KindleGen, a free command-line program available from Amazon.com, accepts EPUB files as input, so you can produce your MOBI by converting your EPUB with KindleGen.

This advanced section might be a little challenging, and not every self-publisher needs to understand it. However, once you understand the principles of the EPUB format, you will have a solid foundation for formatting eBooks in any format.

Prerequisites for Manually Coding EPUB and MOBI
In addition to the tools utilized in the previous sections, you need the following software for this chapter: a command-line Zip program (zip.exe), Amazon's KindleGen, and Kindle Previewer.
If you are using Windows, make sure file extensions are not hidden in your folders (i.e., the file appears as "mybook.epub" in your directory and not as "mybook"). This is essential for changing file extensions. A tutorial on how to do this in Windows 7 is available from the How-To Geek website.

EPUB Structure
The EPUB file is actually a compressed ZIP file that contains all of the information an eReader needs to display the eBook. Inside the EPUB archive are the following files:
  • Your XHTML content (required)
  • Your cover and content images (required)
  • A file called "toc.ncx", which is the NCX Table of Contents (required)
  • A file called "content.opf", which dictates how the EPUB is structured and exactly which files are in the EPUB package (required)
  • A file called "container.xml", which tells the eReader where the content.opf file is located (required)
  • A file called "mimetype", which identifies the file as an EPUB packaged as a ZIP archive (required)
  • One or more CSS files (optional)
The files are organized into the following directory structure:
mimetype (must be in the root folder)
META-INF/container.xml (must be in this folder)
OEPBS/content.opf (can be in any folder, but OEPBS is recommended)
OEPBS/toc.ncx
OEPBS/title.html
OEPBS/content.html files
OEPBS/image files
OEPBS/CSS stylesheets
(Figure: Display of EPUB File Structure)
A sample EPUB developed by the author of this guide (referred to throughout this tutorial as the "EPUB standard") can be downloaded here for your convenience.

The mimetype and container.xml Files
The mimetype file has no extension, and the content must read exactly as follows:
application/epub+zip
The mimetype file must not contain any spaces or line breaks, not even a newline at the end. It must be exactly 20 bytes.
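If you build your EPUBs with a script, one way to guarantee these 20 bytes is to write them as raw bytes, so that neither your text editor nor Windows newline handling can add anything. The following is a minimal Python sketch, not part of the original workflow; write_mimetype is a hypothetical helper name, and the folder passed to it is assumed to be your EPUB's root folder.

```python
import tempfile
from pathlib import Path

MIMETYPE = b"application/epub+zip"  # exactly 20 bytes, no trailing newline


def write_mimetype(book_dir: Path) -> Path:
    """Write the mimetype file into the EPUB root folder as raw bytes.

    Writing bytes (not text) avoids Windows newline translation and any
    byte-order mark that an editor might add.
    """
    path = book_dir / "mimetype"
    path.write_bytes(MIMETYPE)
    return path


# Usage: write the file into a scratch folder and confirm its size.
with tempfile.TemporaryDirectory() as tmp:
    written = write_mimetype(Path(tmp))
    print(written.stat().st_size)  # 20
```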

The container.xml file says where the content.opf file is located, and it looks as follows:
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEPBS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
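The only thing you ever need to adjust is the "full-path=" attribute, which must point to where the content.opf file is located (the path is relative to the root of the EPUB). It is not necessary to keep the content in a separate directory, but doing so is the recommended convention. The conventional name for that directory is OEBPS (Open eBook Publication Structure); the OEPBS spelling used in this tutorial's examples works just as well, as long as every path matches it.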
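To double-check which OPF file an eReader will be sent to, you can read full-path out of container.xml with a few lines of Python. This is a sketch using the standard library's xml.etree.ElementTree; find_opf_path is a hypothetical helper name, and it assumes a single <rootfile> entry like the one above.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <container> element of META-INF/container.xml.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(container_xml: str) -> str:
    """Return the full-path of the first <rootfile> listed in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no <rootfile> element")
    return rootfile.attrib["full-path"]


SAMPLE = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEPBS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

print(find_opf_path(SAMPLE))  # OEPBS/content.opf
```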

The content.opf File
This file contains all of the metadata about the eBook (the Metadata section), a list of all the files used in the eBook (the Manifest section), linear instructions on how the eBook is ordered (the Spine section), and links to the cover, the beginning of the text, and the traditional HTML Table of Contents (the Guide section).

The basic package format for the content.opf file is as follows:
<?xml version="1.0" ?>
<package version="2.0" xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId">
  Metadata section
  Manifest section
  Spine section
  Guide section
</package>

Metadata section: This section lets you enter metadata on everything from the name of the author to the recommended cost of the book. A full list of everything you can define in the Metadata section is available in the IDPF's OPF standard. It is important that your eBook have a unique identifier (a series of digits and/or letters). A universally unique identifier (or UUID) is a randomly generated series of numbers and letters with an astronomically small chance that the same one will be generated twice. You can obtain one online here. You can also use an ISBN, if you prefer to go that route.

The Amazon.com Kindle Store and Barnes & Noble NOOK have certain requirements for what is in the Metadata section, so this guide will follow those, since they are more stringent than the OPF standard. However, it is possible to add as much metadata as you like beyond what is required.
The required metadata is as follows:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<!--Required metadata-->
<dc:title>Your eBook's Title</dc:title>
<dc:language>en-us</dc:language>
<dc:identifier id="BookId" opf:scheme="uuid">urn:uuid:You Must Obtain</dc:identifier>
<dc:creator>Your Name Here</dc:creator>
<dc:publisher>Publisher here</dc:publisher>
<dc:date>YYYY-MM-DD</dc:date>
<meta name="cover" content="My_Cover_ID" />
</metadata>
Language meta: This guide uses US English (en-us), but you can see the full list of language codes here.

Identifier meta: For a UUID, use the 8-4-4-4-12 hexadecimal format after the urn:uuid: prefix (e.g., urn:uuid:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx). You can use an ISBN if you prefer; just change the attribute to opf:scheme="ISBN" and put the ISBN in place of the UUID. You can obtain a randomly generated unique identifier online here.
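If you would rather not rely on a website, Python's standard uuid module can generate a random UUID in the 8-4-4-4-12 form. This sketch (new_book_identifier is a hypothetical helper) only prints a dc:identifier line; the assumption is that you run it once per book and then reuse the same value, since the identifier should not change every time you rebuild the EPUB.

```python
import uuid


def new_book_identifier() -> str:
    """Return a random (version 4) UUID with the urn:uuid: prefix.

    str(uuid.uuid4()) is already in the 8-4-4-4-12 hexadecimal form.
    """
    return "urn:uuid:" + str(uuid.uuid4())


# Usage: generate once, then paste the same value into content.opf and toc.ncx.
book_id = new_book_identifier()
print(f'<dc:identifier id="BookId" opf:scheme="uuid">{book_id}</dc:identifier>')
```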

Meta creator: This is the author of the book.

Meta publisher: This is the name of your publisher or yourself if you are self-publishing.

Meta date: This is the publishing date (a valid example would be 2001-01-30).

Meta cover: Amazon.com requires this information. "My_Cover_ID" refers to the ID that you will define for the cover image in the Manifest section.

An example of some optional, but useful metadata is as follows:
<dc:description>Book's Description</dc:description>
<dc:subject>Keyword 1</dc:subject>
<dc:subject>Keyword 2</dc:subject>
<dc:subject>Keyword 3</dc:subject>
<dc:contributor opf:role="edt">Editor's Name</dc:contributor>
Meta description: This is the back-jacket description. Do not put any HTML Entity codes or Special Characters in this meta description.

Meta subject: These are the keywords or tags that indicate the themes of your book (e.g., thriller, fiction, crime). You can add as many as you like.

Meta contributor: There are a wide variety of contributors you can specify, even relatively inane ones like “sponsor”. The OPF standard has the full list.

The Manifest Section: This section is where you specify all the XHTML, images, CSS files, NCX Table of Contents and everything else that is utilized in your eBook.
The following is required:
<manifest>
<item href="cover.jpg" id="My_Cover_ID" media-type="image/jpeg" />
<item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml" />
<item href="content1.html" id="html1" media-type="application/xhtml+xml" />
<item href="content2.html" id="html2" media-type="application/xhtml+xml" />
</manifest>
The id names are arbitrary, and this guide uses the above ones for convenience. You must not repeat any id declarations. The "My_Cover_ID" must be the same id name as what is in the meta cover specified in the metadata (this is per the Amazon.com KindleGen instructions).

Some other items in the manifest which are recommended, but not required, are as follows:
<item href="extra.jpg" id="extraimg" media-type="image/jpeg" />
<item href="stylesheet.css" id="css" media-type="text/css" />
<item href="coverpage.html" id="coverpg" media-type="application/xhtml+xml" />
<item href="titlepage.html" id="titlepg" media-type="application/xhtml+xml" />
<item href="toc.html" id="htmltoc" media-type="application/xhtml+xml" />
You must include everything in the Manifest section that you will use in the eBook. Do not duplicate item entries.
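Duplicate entries are easy to miss in a long manifest. As a hedged illustration, the Python sketch below (Python 3.9 or later; manifest_problems is a hypothetical helper) parses a content.opf and reports any id or href that appears more than once. It only checks for duplicates; it does not confirm that every file in your folders is listed.

```python
import xml.etree.ElementTree as ET
from collections import Counter

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def manifest_problems(opf_xml: str) -> list[str]:
    """Report manifest ids or hrefs that appear more than once."""
    root = ET.fromstring(opf_xml)
    items = root.findall("opf:manifest/opf:item", OPF_NS)
    problems = []
    for attr in ("id", "href"):
        counts = Counter(item.get(attr) for item in items)
        for value, n in counts.items():
            if n > 1:
                problems.append(f"duplicate {attr}: {value} ({n} times)")
    return problems


# Here content2.html was accidentally given the same id as content1.html.
SAMPLE_OPF = """<package version="2.0" xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId">
  <manifest>
    <item href="cover.jpg" id="My_Cover_ID" media-type="image/jpeg" />
    <item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml" />
    <item href="content1.html" id="html1" media-type="application/xhtml+xml" />
    <item href="content2.html" id="html1" media-type="application/xhtml+xml" />
  </manifest>
</package>"""

print(manifest_problems(SAMPLE_OPF))  # ['duplicate id: html1 (2 times)']
```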

Media type declarations in the Manifest: Use the following media type declarations when constructing your Manifest section. Note that everything needs to be lowercase:
  • "application/xhtml+xml" - XHTML content files
  • "application/x-dtbncx+xml" - NCX Table of Contents
  • "text/css" - CSS files
  • "image/jpeg" - JPEG images
  • "image/gif" - GIF images
  • "image/png" - PNG Images
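If you generate your manifest with a script, the list above translates directly into a lookup table. In this Python sketch, MEDIA_TYPES and manifest_item are hypothetical names, and treating the .xhtml and .jpeg extensions like .html and .jpg is an assumption added for convenience.

```python
from pathlib import PurePosixPath

# File extension -> media type, following the list above (all lowercase).
MEDIA_TYPES = {
    ".html": "application/xhtml+xml",
    ".xhtml": "application/xhtml+xml",
    ".ncx": "application/x-dtbncx+xml",
    ".css": "text/css",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".png": "image/png",
}


def manifest_item(href: str, item_id: str) -> str:
    """Build one <item> line for the Manifest section."""
    ext = PurePosixPath(href).suffix.lower()
    if ext not in MEDIA_TYPES:
        raise ValueError(f"no media type known for {href!r}")
    return f'<item href="{href}" id="{item_id}" media-type="{MEDIA_TYPES[ext]}" />'


print(manifest_item("cover.jpg", "My_Cover_ID"))
# <item href="cover.jpg" id="My_Cover_ID" media-type="image/jpeg" />
```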
The Spine Section: This section specifies the exact linear order of the eBook. eReaders automatically insert a page break when moving from one spine item to the next.

A sample Spine section looks as follows:
<spine toc="ncx">
  <itemref idref="coverpg" />
  <itemref idref="titlepg" />
  <itemref idref="html1"/>
  <itemref idref="html2"/>
  <itemref idref="htmltoc"/>
</spine>
The toc="ncx" attribute on the opening <spine> tag is required and identifies the NCX Table of Contents for this eBook (its value is the id of the toc.ncx item in the Manifest section). Each idref value refers to the id of an item in the Manifest section.

In this example of a Spine section, the eBook will flow from coverpage.html -> titlepage.html -> content1.html -> content2.html -> toc.html. You can define this linear order however you want.
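You can verify a spine by resolving each idref against the manifest, which also prints the reading order. This is a minimal Python 3.9+ sketch with a hypothetical reading_order helper; it raises an error if the toc attribute or any idref does not match a manifest id.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(opf_xml: str) -> list[str]:
    """Return the manifest hrefs in spine order, checking every reference."""
    root = ET.fromstring(opf_xml)
    # Map each manifest id to its file name.
    hrefs = {item.get("id"): item.get("href")
             for item in root.findall("opf:manifest/opf:item", OPF_NS)}
    spine = root.find("opf:spine", OPF_NS)
    if spine is None or spine.get("toc") not in hrefs:
        raise ValueError('<spine toc="..."> must name a manifest item (the NCX)')
    order = []
    for ref in spine.findall("opf:itemref", OPF_NS):
        idref = ref.get("idref")
        if idref not in hrefs:
            raise ValueError(f"spine idref {idref!r} is not in the manifest")
        order.append(hrefs[idref])
    return order


SAMPLE_OPF = """<package version="2.0" xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId">
  <manifest>
    <item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml" />
    <item href="coverpage.html" id="coverpg" media-type="application/xhtml+xml" />
    <item href="titlepage.html" id="titlepg" media-type="application/xhtml+xml" />
    <item href="content1.html" id="html1" media-type="application/xhtml+xml" />
    <item href="content2.html" id="html2" media-type="application/xhtml+xml" />
    <item href="toc.html" id="htmltoc" media-type="application/xhtml+xml" />
  </manifest>
  <spine toc="ncx">
    <itemref idref="coverpg" />
    <itemref idref="titlepg" />
    <itemref idref="html1"/>
    <itemref idref="html2"/>
    <itemref idref="htmltoc"/>
  </spine>
</package>"""

print(" -> ".join(reading_order(SAMPLE_OPF)))
# coverpage.html -> titlepage.html -> content1.html -> content2.html -> toc.html
```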

The Guide section: This is where you define the target locations for the extra buttons that are available on eReaders such as "Cover" or "Beginning".
<guide>
  <reference href="coverpage.html" type="cover" title="Cover" />
  <reference href="toc.html" type="toc" title="Table of Contents" />
  <reference href="content1.html" type="text" title="Beginning" />
</guide>
A full list of the types you can define in the guide is available in the OPF standard. However, the Amazon.com guidelines recommend limiting your Guide section to Cover, Table of Contents, and Beginning to avoid confusing the reader. In this example of the Guide section, clicking "Cover" in the eReader would go to coverpage.html, clicking "Table of Contents" would go to toc.html, and clicking "Beginning" would go to the first part of the story (content1.html). Also, on the Kindle, opening the MOBI file will automatically start at the location defined as "Beginning".

An example of a complete content.opf file is available in the EPUB standard sample.

The toc.ncx File
The toc.ncx file is how you construct the NCX Table of Contents. Calibre's powerful conversion tools automatically generate this based on XPath expressions; however, it may be advantageous for you to manually code your own. This can be time-consuming, but it will guarantee that it looks exactly the way you want.

The standard for the toc.ncx is available on the IDPF website. The toc.ncx file defines the text, target, and play order for each entry in the NCX Table of Contents.

A simple NCX is divided into three sections, as follows:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN" "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="en">
  Metadata section
  Title and Author section
  Navigation Map section
</ncx>
Metadata section: This is where you define how many tiers are in the NCX Table of Contents. This guide recommends using just one tier, because the Barnes & Noble NOOK only supports one.
Sample code of the Metadata section is as follows:
<head>
  <meta content="urn:uuid:Exact Same as Content.OPF" name="dtb:uid" />
  <meta content="1" name="dtb:depth" />
  <meta content="0" name="dtb:totalPageCount" />
  <meta content="0" name="dtb:maxPageNumber" />
</head>
Meta dtb:uid: The value in the content attribute must be exactly the same as the unique identifier (the dc:identifier with id="BookId") defined in the content.opf metadata.

Meta dtb:depth: This value must be a positive integer. A value of "1" means a single-tiered Table of Contents (recommended because of the NOOK restriction).

Other meta: Ensure that the page count and page number meta values are set to 0.
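Because a mismatched dtb:uid is easy to introduce when you copy files between projects, a quick comparison can help. The Python sketch below is an illustration only: opf_identifier and ncx_uid are hypothetical helpers, the sample files are trimmed to the relevant elements, and the UUID in them is made up.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf",
          "dc": "http://purl.org/dc/elements/1.1/"}
NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def opf_identifier(opf_xml: str) -> str:
    """Return the dc:identifier that the package's unique-identifier points to."""
    root = ET.fromstring(opf_xml)
    wanted = root.get("unique-identifier")
    for ident in root.findall("opf:metadata/dc:identifier", OPF_NS):
        if ident.get("id") == wanted:
            return (ident.text or "").strip()
    raise ValueError(f"no dc:identifier with id={wanted!r}")


def ncx_uid(ncx_xml: str) -> str:
    """Return the content of <meta name="dtb:uid"> in toc.ncx."""
    root = ET.fromstring(ncx_xml)
    for meta in root.findall("ncx:head/ncx:meta", NCX_NS):
        if meta.get("name") == "dtb:uid":
            return meta.get("content", "").strip()
    raise ValueError("toc.ncx has no dtb:uid meta")


# Hypothetical sample files sharing one made-up UUID.
OPF = """<package version="2.0" xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:identifier id="BookId" opf:scheme="uuid">urn:uuid:12345678-1234-4abc-8def-123456789abc</dc:identifier>
  </metadata>
</package>"""
NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="en">
  <head>
    <meta content="urn:uuid:12345678-1234-4abc-8def-123456789abc" name="dtb:uid" />
    <meta content="1" name="dtb:depth" />
  </head>
</ncx>"""

print(opf_identifier(OPF) == ncx_uid(NCX))  # True
```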

Title and Author section: This will appear in the NCX section of most eReader devices. You simply add the title of your eBook and the name of the author. The code is as follows:
<docTitle>
  <text>Title</text>
</docTitle>
<docAuthor>
  <text>Author Name</text>
</docAuthor>
Navigation Map section: This is where you define each individual link, the text as it appears to the reader, and the play order for the NCX Table of Contents. The play order controls navigation when you click the Next/Previous Section button on an eReader.

Sample Navigation Map code looks as follows:
<navMap>
  <navPoint id="NCX_Cover" playOrder="1">
    <navLabel>
      <text>Cover</text>
    </navLabel>
    <content src="coverpage.html" />
  </navPoint>
  <navPoint id="NCX_Title" playOrder="2">
    <navLabel>
      <text>Title Page</text>
    </navLabel>
    <content src="titlepage.html" />
  </navPoint>
  <navPoint id="NCX_Chapter1" playOrder="3">
    <navLabel>
      <text>Chapter 1 - The Beginning</text>
    </navLabel>
    <content src="content1.html" />
  </navPoint>
  <navPoint id="NCX_Chapter2" playOrder="4">
    <navLabel>
      <text>Chapter 2 - The Climax</text>
    </navLabel>
    <content src="content1.html#c2" />
  </navPoint>
  <navPoint id="NCX_Chapter3" playOrder="5">
    <navLabel>
     <text>Chapter 3 - The Ending</text>
    </navLabel>
    <content src="content2.html" />
  </navPoint>
  <navPoint id="NCX_AuthorNotes" playOrder="6">
    <navLabel>
      <text>Author's Notes</text>
    </navLabel>
    <content src="content2.html#a1" />
  </navPoint>
  <navPoint id="NCX_HTMLTOC" playOrder="7">
    <navLabel>
      <text>Table of Contents</text>
    </navLabel>
    <content src="toc.html" />
  </navPoint>
</navMap>
Each individual <navPoint> wraps a unique link in the NCX. The id attribute of a <navPoint> can be anything, as long as no two <navPoint> elements share the same id. <text> defines how the reader will see the link, and <content> points to the link's target. If you want a link to go to the middle of an XHTML file, you can add an anchor inside your content XHTML and point to it with a fragment, as in content1.html#c2 above.
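Keeping playOrder numbers in sequence by hand is tedious when you add or remove chapters. As one possible approach, this Python 3.9+ sketch builds a single-tier <navMap> from a list of (id, label, src) tuples, numbering playOrder automatically and escaping characters such as & in labels. nav_map is a hypothetical helper name; its output would go after the <docAuthor> element, before the closing </ncx> tag.

```python
from xml.sax.saxutils import escape, quoteattr


def nav_map(entries: list[tuple[str, str, str]]) -> str:
    """Build a single-tier <navMap> from (id, label, src) tuples.

    playOrder is numbered 1, 2, 3, ... in list order, so the list order
    is also the Next/Previous Section order.
    """
    ids = [point_id for point_id, _, _ in entries]
    if len(ids) != len(set(ids)):
        raise ValueError("navPoint ids must not be duplicated")
    lines = ["<navMap>"]
    for order, (point_id, label, src) in enumerate(entries, start=1):
        lines += [
            f'  <navPoint id={quoteattr(point_id)} playOrder="{order}">',
            "    <navLabel>",
            f"      <text>{escape(label)}</text>",
            "    </navLabel>",
            f"    <content src={quoteattr(src)} />",
            "  </navPoint>",
        ]
    lines.append("</navMap>")
    return "\n".join(lines)


print(nav_map([
    ("NCX_Cover", "Cover", "coverpage.html"),
    ("NCX_Title", "Title Page", "titlepage.html"),
    ("NCX_Chapter1", "Chapter 1 - The Beginning", "content1.html"),
    ("NCX_Chapter2", "Chapter 2 - The Climax", "content1.html#c2"),
]))
```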

A full example toc.ncx file can be viewed in the EPUB Standard.

Compressing the EPUB File
Unfortunately, simply compressing all these files in Windows will cause your EPUB file to fail validation. It is necessary to have the mimetype file first in the zip file, and also to have it "stored" (i.e. uncompressed). Trying to use 7-Zip in Windows is troublesome, so you will have to use a command line program. This EPUB compression issue has been discussed on MobileRead forums and there is a proper way to zip these files.

For those of you whippersnappers too young to remember MS-DOS, command line prompts were how operating systems on PCs worked before Windows 95 came around. You need to place the zip.exe file in a folder called "zip" (this guide uses C:\zip, as in the commands below). Zip.exe can be downloaded here.

Let's travel back in time and perform the following steps in the command line:
  1. Access the command line by typing "cmd" in the search/run window on Windows
  2. Access your root directory with all your EPUB files by typing cd\rootdirectory
  3. Verify you are in your c:\rootdirectory>
  4. Type "c:\zip\zip mybook.epub -DX0 mimetype"
  5. Type "c:\zip\zip mybook.epub -rDX9 META-INF OEPBS"

(Figure: Using zip.exe in the Command Prompt, Steps 1-3)
Step 4 adds the mimetype file first into the new EPUB file. In "-DX0", -0 stores the file uncompressed, while -D and -X leave out directory entries and extra file attributes. Step 5 adds both directories to the same EPUB file: -r recurses into the folders so that the directory structure is maintained in the EPUB file, -9 applies maximum compression, and -D and -X again leave out directory entries and extra attributes that may corrupt your file.

Important Note: These line commands are all case sensitive, and it is a zero in "-DX0".
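If you have Python installed, its standard zipfile module can produce the same layout without zip.exe. The sketch below is an alternative to, not part of, the command-line method above: build_epub is a hypothetical helper that writes mimetype first with ZIP_STORED (like -0), then compresses every file inside each subfolder of your book folder (such as META-INF and OEPBS) with ZIP_DEFLATED, adding no separate entries for folders (like -D). Loose files at the top level other than mimetype are skipped.

```python
import tempfile
import zipfile
from pathlib import Path


def build_epub(book_dir: Path, epub_path: Path) -> None:
    """Zip an unpacked EPUB folder: mimetype first and stored, the rest deflated."""
    with zipfile.ZipFile(epub_path, "w") as zf:
        # Like step 4 ("-DX0"): mimetype goes in first, uncompressed.
        zf.write(book_dir / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        # Like step 5 ("-rDX9"): every file under each subfolder (META-INF,
        # OEPBS, ...), compressed, keeping relative paths; no folder entries.
        for folder in sorted(p for p in book_dir.iterdir() if p.is_dir()):
            for path in sorted(folder.rglob("*")):
                if path.is_file():
                    arcname = path.relative_to(book_dir).as_posix()
                    zf.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)


# Usage with a throwaway folder containing placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "mybook"
    (root / "META-INF").mkdir(parents=True)
    (root / "OEPBS").mkdir()
    (root / "mimetype").write_bytes(b"application/epub+zip")
    (root / "META-INF" / "container.xml").write_text("<container/>")
    (root / "OEPBS" / "content.opf").write_text("<package/>")
    epub = Path(tmp) / "mybook.epub"
    build_epub(root, epub)
    with zipfile.ZipFile(epub) as zf:
        print(zf.namelist())  # ['mimetype', 'META-INF/container.xml', 'OEPBS/content.opf']
```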

Creating Your Own EPUB Files and Prepping for MOBI Conversion
Now that you understand what's under the hood of an EPUB file, you can either modify the Calibre conversion or create your own. Calibre adds a lot of unnecessary metadata to the EPUB files that you can delete. It also generates its own CSS stylesheet based on the XHTML code you put through the conversion. If you've been following along with this guide, you have all the knowledge you need to construct your EPUB file. Please feel free to download the EPUB standard used by this tutorial and adjust it for your own purposes.

Keep a few of these notes in mind as you work with the raw EPUB format:

Body vs. @page margins: If you want a 5.0pt left and right margin around the text, you could define a style in your CSS for the body as follows:
body.epub_margins { margin-left: 5.0pt; margin-right: 5.0pt; text-align: justify; }
To apply it in the XHTML, you would write the body tag as <body class="epub_margins">. Unfortunately, if you try to apply a top and bottom margin this way, the margins would appear only before the first paragraph and after the last paragraph of the XHTML file rather than on each page of the eReader.

To apply a top and bottom margin on every page of the eReader, you can use the @page rule. The @page rule is allowed in EPUB CSS, but not in MOBI. It applies styles to the page being viewed in the eReader, so you can use it to add top and bottom margins between the text and the edge of the eReader's screen on every page.
An example is as follows:
@page { margin-bottom: 5.0pt; margin-top: 5.0pt; }
Parsing and Pagebreaks: The "page-break-before: always;" style can be used in EPUB, but it can cause problems with chapter headings: if a heading has a margin-top value, that margin will be ignored after the page break.
(Figure: Top Margin Lost during Page Break)
A way around this is file parsing: splitting your XHTML code into numerous small XHTML files wherever you want a page break. Make sure to update the Manifest and Spine sections of the content.opf file after you parse your XHTML. In the Spine section, a page break is automatically inserted wherever one XHTML file leads to the next.
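Keeping the Manifest and Spine in step after parsing is mostly bookkeeping, so it can be scripted. This Python 3.9+ sketch is a hypothetical helper (parsed_entries) that takes your split file names in reading order and returns matching <item> and <itemref> lines with generated ids; the chapter file names in the usage example are made up.

```python
def parsed_entries(filenames: list[str], id_prefix: str = "part") -> tuple[list[str], list[str]]:
    """Return Manifest <item> lines and Spine <itemref> lines for split XHTML files.

    Ids are generated as id_prefix + 1, 2, 3, ... so they never repeat,
    and the Spine order follows the order of `filenames`.
    """
    items, itemrefs = [], []
    for n, name in enumerate(filenames, start=1):
        item_id = f"{id_prefix}{n}"
        items.append(f'<item href="{name}" id="{item_id}" media-type="application/xhtml+xml" />')
        itemrefs.append(f'<itemref idref="{item_id}" />')
    return items, itemrefs


# Hypothetical file names for a book split at every chapter.
manifest_lines, spine_lines = parsed_entries(["chapter01.html", "chapter02.html", "chapter03.html"])
print("\n".join(manifest_lines))
print("\n".join(spine_lines))
```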

Cover Page: A cover page is simply an XHTML page displaying your eBook cover, placed first in the linear order of the Spine section. Some self-publishers may prefer to make the cover available only through the meta cover. In that case, your title page (the title and copyright info) would probably be the first element in the linear order of the Spine section. Note that Calibre automatically makes the first page a cover page.
Here is some suitable XHTML code for a coverpage.html file:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
    <title>Cover</title>
    <style type="text/css">
      @page {padding: 0pt; margin: 0pt;}
      body {text-align: center; padding: 0pt; margin: 0pt;}
      div {padding: 0pt; margin: 0pt;}
      img {padding: 0pt; margin: 0pt;}
    </style>
  </head>
  <body>
    <div>
      <img src="cover.jpg" alt="The Book's Cover" style="height: 100%;" />
    </div>
  </body>
</html>
The "height: 100%;" value in the <img> tag forces the image to fill up the height of the entire eReader device.

Adding Embedded Fonts
Fonts are a way to spruce up the text if you feel that it is boring. Unfortunately, embedding fonts is not allowed in the Amazon.com Kindle Store and it is discouraged at the Barnes & Noble NOOK store due to confusion over licensing issues. However, you can still try it out anyway. For foreign language eBook shops, it is essential to embed fonts to support special characters. Here is a tutorial on embedding fonts and working with foreign languages.

Running KindleGen to Create MOBI
You will have to travel back in time to the days of the command prompt to run KindleGen, since it has no GUI support. Download KindleGen from the Amazon.com website and install it in a folder. This guide installed the program in the folder C:\Kindlegen.

Once you have a clean and validated EPUB, perform the following steps to convert an EPUB into MOBI.
  1. Access the command prompt by typing "cmd" in the find/search window
  2. Go to the directory where your EPUB file is located
  3. Type "C:\Kindlegen\Kindlegen filename.epub -c1 -verbose"
(Figure: Working with KindleGen in the Command Prompt)
You can see the options KindleGen accepts by typing just "C:\kindlegen\kindlegen". The "-c1" option applies standard compression and "-verbose" prints detailed progress messages; in this guide's experience, this combination causes the fewest bugs during conversion.
The program will spit out a lot of information, but as long as the last line says "MOBI File successfully generated!", you should be good to go.
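If you run KindleGen often, a small wrapper can run it and check for the success line. This Python sketch rests on assumptions: KindleGen is installed at C:\Kindlegen as in this guide, and the success text matches the message quoted above, which may be worded differently in other KindleGen versions (the comparison ignores case for that reason). run_kindlegen and kindlegen_succeeded are hypothetical names.

```python
import subprocess

KINDLEGEN = r"C:\Kindlegen\Kindlegen"  # install folder used in this tutorial
SUCCESS_TEXT = "MOBI File successfully generated!"  # wording may vary by version


def kindlegen_succeeded(output: str) -> bool:
    """True if the last non-blank line of KindleGen's output contains SUCCESS_TEXT."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return bool(lines) and SUCCESS_TEXT.lower() in lines[-1].lower()


def run_kindlegen(epub_path: str) -> bool:
    """Run KindleGen with this tutorial's -c1 -verbose options and check the log."""
    result = subprocess.run([KINDLEGEN, epub_path, "-c1", "-verbose"],
                            capture_output=True, text=True)
    print(result.stdout)
    return kindlegen_succeeded(result.stdout)


# Usage (requires KindleGen installed at the path above):
# ok = run_kindlegen("filename.epub")
```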

Open up Kindle Previewer to see how it looks. Cycle through the sections to make sure that your NCX Table of Contents play order is correct. Click on the "go to" button and try going to "Cover", "Beginning", and "Table of Contents". These were all specified in the Guide section of the content.opf.

Troubleshooting KindleGen
The rules for the Amazon.com Kindle Store are a bit different from those of the outlets that distribute eBooks in the EPUB format. Here is some guidance:

No @ rules in the CSS: Any @page rules that set margins in the EPUB (e.g., @page { margin-left: 5em; margin-right: 5em;}) need to be removed from everywhere in the EPUB being converted to MOBI, including the stylesheets and any <style> blocks inside your XHTML files.
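If you keep one set of stylesheets for both formats, you can strip @page rules from a copy before running KindleGen. This Python sketch (strip_at_page is a hypothetical helper) handles only @page rules with simple bodies, including forms like @page :first; other @ rules are left untouched, and it is a regular-expression shortcut rather than a real CSS parser.

```python
import re

# An @page rule (optionally with a :first/:left/:right selector) whose body
# contains no nested braces, plus any whitespace that follows it.
AT_PAGE_RULE = re.compile(r"@page\s*(?::[a-z-]+\s*)?\{[^{}]*\}\s*", re.IGNORECASE)


def strip_at_page(css: str) -> str:
    """Remove @page rules from CSS text before converting the EPUB to MOBI."""
    return AT_PAGE_RULE.sub("", css)


css = "@page { margin-bottom: 5.0pt; margin-top: 5.0pt; }\nbody { margin: 0; }\n"
print(strip_at_page(css))  # body { margin: 0; }
```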

Strange margins on Ordered/Unordered Lists: The MOBI format does not function very well when margins are assigned to the <ol> and <ul> tags. Try taking ol, ul, and li out of the reset code in the CSS stylesheet and do not specify any margins for them. Leaving them at their defaults seems to work okay.
(Figure: Margin Errors in MOBI for Ordered and Unordered Lists)
Need to specify meta cover: Unlike the NOOK store, Amazon.com requires a meta cover. Make sure you have followed the content.opf guidance in this tutorial.
This code needs to be in the Metadata section:
<meta name="cover" content="My_Cover_ID" />
And this code needs to be in the Manifest section:
<item href="cover.jpg" id="My_Cover_ID" media-type="image/jpeg" />
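To confirm that the meta cover and the manifest item agree before running KindleGen, you could use a check like the following Python sketch. cover_problem is a hypothetical helper; it returns a short description of the first problem it finds, or None if the meta cover points to an image item in the manifest.

```python
import xml.etree.ElementTree as ET
from typing import Optional

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def cover_problem(opf_xml: str) -> Optional[str]:
    """Describe what is wrong with the meta cover, or return None if it looks fine."""
    root = ET.fromstring(opf_xml)
    cover_id = None
    for meta in root.findall("opf:metadata/opf:meta", OPF_NS):
        if meta.get("name") == "cover":
            cover_id = meta.get("content")
    if cover_id is None:
        return 'missing <meta name="cover" content="..." /> in the Metadata section'
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        if item.get("id") == cover_id:
            if not (item.get("media-type") or "").startswith("image/"):
                return f"manifest item {cover_id!r} is not an image"
            return None
    return f"no manifest item has id={cover_id!r}"


GOOD_OPF = """<package version="2.0" xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <meta name="cover" content="My_Cover_ID" />
  </metadata>
  <manifest>
    <item href="cover.jpg" id="My_Cover_ID" media-type="image/jpeg" />
  </manifest>
</package>"""

print(cover_problem(GOOD_OPF))  # None
print(cover_problem(GOOD_OPF.replace('id="My_Cover_ID"', 'id="cover"')))
# no manifest item has id='My_Cover_ID'
```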

Hyperlinks to headings changing them into the base font: There is a bug in the MOBI format where, if you click on a hyperlink to a heading with an anchor, the anchored text will sometimes lose its formatting. The anchor tag (i.e., <div id="anchorname">) needs to be wrapped around the entire heading.
Sample code of proper wrapping would look like this:
<div id="c1"><p class="chapter">Chapter 1</p></div>
In some cases, it is a lot easier to convert an EPUB to MOBI with KindleGen than to use the Calibre options. You now have the knowledge to make eBooks yourself without using Calibre.

Video Tutorials



5 comments:

Bjorn Backlund said...

Thanks for a great tutorial! We just released two tools to automate using the KindleGen to create eBooks. Check out http://www.backspinsoftware.com/site/KindleGen/Kindle_Overview.aspx for more details. Feedback more than welcome.
bjorn@backspinsoftware.com

Anonymous said...

Is it OEBPS or OEPBS?

Paul Salvette said...

Yes, it is supposed to be OEBPS by convention, but it really doesn't matter. This was the mega facepalm portion of these posts. I'm updating to OEBPS for the new Kindle Fire tutorials. Thank you.

colombian writer said...

Thank you for all your help and your kindness. I have never read your books, but I am sure they are full of knowledge, truth, and wisdom.
My books are in Spanish and I'm trying to use all your tutorials. Wish me luck.

Maddy said...

Thanks Bjorn & Paul


Mahendran
JMinfosys

Mail jminfosys08@gmail.com