Honza Pokorný

A personal blog


Read it later the hard way

There is a seemingly endless stream of articles we come across daily, and little time to read them. A quick glance at Hacker News in between tasks can yield half a dozen open tabs with articles to read at some point later. There are many services that help with this problem by maintaining a queue of these articles for you. You know the type: things like Instapaper or Pocket.

In this article, we will build a service like that. The hard way.

I’d like to spend less time at the computer, and read in a more unhurried, offline manner. For me, this means reading these articles on my Kindle. I would also like to introduce a bigger gap between collecting an article and reading it, to correct for any recency bias. If it sits in the queue for two weeks, maybe I will realize I don’t care to read it?

Let’s call this project sa2k, short for send articles to Kindle. We will not be using Amazon’s “email a file to your Kindle” feature. This will be a Golang project, so let’s start with:

$ mkdir sa2k
$ cd sa2k
$ go mod init github.com/honza/sa2k

Let’s create a simple CLI project using Cobra, in main.go:

package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

var rootCmd = &cobra.Command{
	Use:   "sa2k",
	Short: "sa2k --- send articles to kindle",
}

func main() {
	if err := rootCmd.Execute(); err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
}

And the simplest action will be adding a new article:

var addCmd = &cobra.Command{
	Use:   "add",
	Short: "Add an article to the queue",
	RunE: func(cmd *cobra.Command, args []string) error {
		return Add(config, Title, args)
	},
}

Next, hook it up to the Cobra mechanism, in main():

rootCmd.AddCommand(addCmd)

Now for the implementation of Add(). We will accept a config struct which will tell us where to store articles, etc. The Title is a CLI flag that we can use to override whatever the website’s <title> is, and args is the list of articles to enqueue.

The Config struct looks like this:

type Config struct {
	// Where to store epubs of articles we want to read
	EpubDir string `json:"epub_dir"`
	// Once we send an article to the Kindle, where should we archive it?
	ArchiveDir string `json:"archive_dir"`
	// Where is the Kindle going to be mounted?
	KindleDocDir string `json:"kindle_doc_dir"`
}

We probably want a way to initialize this config, so let’s add an init command.

var initCmd = &cobra.Command{
	Use:   "init",
	Short: "Init sa2k dir",
	RunE: func(cmd *cobra.Command, args []string) error {
		configDir := path.Join(xdg.ConfigHome, "sa2k")

		if _, err := os.Stat(configDir); !os.IsNotExist(err) {
			fmt.Println("Already configured")
			return nil
		}

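		// MkdirAll also creates the parent config directory if it is missing.
		if err := os.MkdirAll(configDir, 0744); err != nil {
			return err
		}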

		config := Config{
			EpubDir:      "epubs",
			ArchiveDir:   "archive",
			KindleDocDir: "",
		}

		b, err := json.MarshalIndent(config, "", "  ")

		if err != nil {
			return err
		}

		configFile := path.Join(configDir, "config.json")
		// Report a failed write instead of silently ignoring it.
		return ioutil.WriteFile(configFile, b, 0644)
	},
}

And import the required libraries:

import (
	"encoding/json"
	"fmt"
	"io/ioutil"
	"os"
	"path"

	"github.com/adrg/xdg"
	"github.com/spf13/cobra"
)

Next, hook it up to the Cobra mechanism, in main():

rootCmd.AddCommand(addCmd)
rootCmd.AddCommand(initCmd)

Add the Title flag at top level:

var Title string

And hook it up to Cobra:

	rootCmd.PersistentFlags().StringVar(&Title, "title", "", "")

And a global config:

// Global config
var config Config

which we can populate at the top of main():

func main() {
	var err error
	config, err = GetConfig()
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
}

The GetConfig function is nothing special:

func GetConfig() (Config, error) {
	var config Config

	configPath := path.Join(xdg.ConfigHome, "sa2k", "config.json")

	if _, err := os.Stat(configPath); os.IsNotExist(err) {
		return config, fmt.Errorf("No config file present.  Run init first.")
	}

	contents, err := ioutil.ReadFile(configPath)

	if err != nil {
		return config, err
	}

	err = json.Unmarshal(contents, &config)

	if err != nil {
		return config, err
	}

	if !path.IsAbs(config.EpubDir) {
		config.EpubDir = path.Join(xdg.ConfigHome, "sa2k", config.EpubDir)
	}

	if !path.IsAbs(config.ArchiveDir) {
		config.ArchiveDir = path.Join(xdg.ConfigHome, "sa2k", config.ArchiveDir)
	}

	if !path.IsAbs(config.KindleDocDir) {
		return config, fmt.Errorf("Kindle Doc Dir should be absolute")
	}

	return config, nil
}

Alright, with that housekeeping out of the way, let’s get back to Add. Here is the signature:

func Add(config Config, title string, urls []string) error

We will use readability to download each of the urls, and simplify the HTML structure. Then, we will create an epub file based on that HTML content.

func Add(config Config, title string, urls []string) error {
	for _, url := range urls {
		article, err := PrepareArticle(config, url)
		if err != nil {
			return err
		}

		title := title
		if title == "" {
			title = article.Title
		}

		err = createKfx(config, url, title, article.Byline, article.Content)

		if err != nil {
			return fmt.Errorf("epub error: %w", err)
		}
	}

	return nil
}

The PrepareArticle function does the network parts:

func PrepareArticle(config Config, url string) (*readability.Article, error) {
	var article readability.Article
	var err error

	if strings.HasPrefix(url, "http") {
		client := &http.Client{Timeout: 30 * time.Second}

		// Declare resp on its own so that err below is the outer err,
		// not a shadowed copy that disappears at the end of this block.
		var resp *http.Response
		resp, err = client.Get(url)
		if err != nil {
			return nil, err
		}
		defer resp.Body.Close()

		if resp.StatusCode > 399 {
			return nil, fmt.Errorf("Status code: %d for %s", resp.StatusCode, url)
		}

		article, err = readability.FromReader(resp.Body, nil)
	} else {
		// Anything that doesn't look like a URL is treated as a local file.
		var f *os.File
		f, err = os.Open(url)
		if err != nil {
			return nil, err
		}
		defer f.Close()

		article, err = readability.FromReader(f, nil)
	}

	if err != nil {
		return nil, err
	}

	return &article, nil
}

And import readability:

readability "github.com/go-shiori/go-readability"
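
One wrinkle: checking for an "http" prefix treats anything that starts with those four letters as a remote URL, including a local file called http-notes.html. If that ever bites, a stricter check could look something like the sketch below. isRemote is a made-up helper for illustration, not part of sa2k.

```go
package main

import (
	"fmt"
	"net/url"
)

// isRemote (hypothetical helper) reports whether s parses as an absolute
// http or https URL. Anything else is treated as a path to a local file.
func isRemote(s string) bool {
	u, err := url.Parse(s)
	if err != nil {
		return false
	}
	return (u.Scheme == "http" || u.Scheme == "https") && u.Host != ""
}

func main() {
	inputs := []string{"https://example.com/post", "http-notes.html", "/tmp/article.html"}
	for _, s := range inputs {
		fmt.Printf("%-28s remote=%v\n", s, isRemote(s))
	}
}
```

Swapping this in for the prefix check would be a one-line change in PrepareArticle.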

What on earth is KFX? It’s the latest, and greatest ebook format for Kindles. It features excellent typography which is why I like it. Once we have the epub file ready, we will use the amazing Calibre KFX Output plugin. It can be used via CLI so no need to bother with the Calibre GUI. The plugin is available from the link above, and it’s trivial to install. You will need to download Amazon Kindle Previewer, and if you are on Linux, you will need to install it via wine. Once installed, everything is seamless.

In createKfx, we will write the HTML and some metadata to disk, and use Pandoc to create the epub file. Then we will convert it to KFX. This might look like a mouthful but it’s pretty simple:

func createKfx(config Config, url string, title string, author string, content string) error {
	slug := slugify.Slugify(title)

	now := time.Now()
	ts := now.Format("20060102-150405")

	metaFilename := fmt.Sprintf("%s-%s.txt", ts, slug)
	epubFilename := fmt.Sprintf("%s-%s.epub", ts, slug)
	kfxFilename := fmt.Sprintf("%s-%s.kfx", ts, slug)
	htmlFilename := fmt.Sprintf("%s-%s.html", ts, slug)
	coverFilename := fmt.Sprintf("%s-%s.png", ts, slug)

	if config.EpubDir != "" {
		if err := os.MkdirAll(config.EpubDir, 0744); err != nil {
			return err
		}
		metaFilename = filepath.Join(config.EpubDir, metaFilename)
		epubFilename = filepath.Join(config.EpubDir, epubFilename)
		kfxFilename = filepath.Join(config.EpubDir, kfxFilename)
		htmlFilename = filepath.Join(config.EpubDir, htmlFilename)
		coverFilename = filepath.Join(config.EpubDir, coverFilename)
	}

	err := ioutil.WriteFile(htmlFilename, []byte(content), 0644)
	if err != nil {
		return err
	}

	title = strings.Trim(title, "\"“”")

	meta := fmt.Sprintf(`---
title: >
  %s
author: >
  %s
date: %s
...`, title, author, now.Format("2006-01-02"))
	err = ioutil.WriteFile(metaFilename, []byte(meta), 0644)
	if err != nil {
		return err
	}

	err = GenerateCover(title, coverFilename)
	if err != nil {
		return fmt.Errorf("failed to generate cover: %w", err)
	}

	pandocCmd := fmt.Sprintf("pandoc -o %s --epub-cover-image %s %s %s",
		epubFilename, coverFilename, metaFilename, htmlFilename)
	pandocOutput, err := runShellCommand(pandocCmd)

	if err != nil {
		return fmt.Errorf("pandoc error: %s - %w", pandocOutput, err)
	}

	convertCmd := fmt.Sprintf("ebook-convert %s %s", epubFilename, kfxFilename)
	convertOutput, err := runShellCommand(convertCmd)

	if err != nil {
		return fmt.Errorf("kfx error: %s - %w", convertOutput, err)
	}

	err = os.Remove(metaFilename)
	if err != nil {
		return err
	}
	err = os.Remove(htmlFilename)
	if err != nil {
		return err
	}

	err = os.Remove(coverFilename)
	if err != nil {
		return err
	}

	return nil
}

We want our articles to look nice on the Kindle, so we generate a cover:

import (
	"embed"
	"github.com/fogleman/gg"
	"github.com/golang/freetype/truetype"
	"golang.org/x/image/font"
)
//go:embed EBGaramondSC12-Regular.ttf
var fontFS embed.FS

func LoadFontFace(path string, points float64) (font.Face, error) {
	fontBytes, err := fontFS.ReadFile(path)
	if err != nil {
		return nil, err
	}
	f, err := truetype.Parse(fontBytes)
	if err != nil {
		return nil, err
	}
	face := truetype.NewFace(f, &truetype.Options{
		Size: points,
	})
	return face, nil
}

func GenerateCover(text string, outputFilename string) error {
	const W = 1600
	const H = 2560
	dc := gg.NewContext(W, H)
	dc.SetRGB(1, 1, 1)
	dc.Clear()
	dc.SetRGB(0, 0, 0)
	fontFace, err := LoadFontFace("EBGaramondSC12-Regular.ttf", 140)
	if err != nil {
		return err
	}
	dc.SetFontFace(fontFace)
	const h = 180

	y := H/2 - h/2
	dc.DrawStringWrapped(text, 800, float64(y), 0.5, 0.5, 1400.0, 1.3, gg.AlignCenter)

	// SavePNG returns an error, so pass it along instead of dropping it.
	return dc.SavePNG(outputFilename)
}

And runShellCommand is a cheeky helper:

func runShellCommand(cmd string) (string, error) {
	output, err := exec.Command(
		"bash",
		"-c",
		cmd,
	).CombinedOutput()
	if err != nil {
		return string(output), err
	}

	return string(output), nil
}
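
Going through bash works here because the filenames are built from a timestamp and a slug, so there is nothing to quote. If you would rather skip the shell entirely, exec.Command takes the arguments as a list. Here is a rough sketch of that variant. runCommand and pandocArgs are names I made up for this example, and the pandoc flags are the same ones used in createKfx.

```go
package main

import (
	"fmt"
	"os/exec"
)

// pandocArgs (hypothetical helper) builds the pandoc argument list,
// one slice element per argument, so no shell quoting is involved.
func pandocArgs(epub, cover, meta, html string) []string {
	return []string{"-o", epub, "--epub-cover-image", cover, meta, html}
}

// runCommand (hypothetical helper) runs a program directly, without bash,
// and returns its combined stdout and stderr.
func runCommand(name string, args ...string) (string, error) {
	output, err := exec.Command(name, args...).CombinedOutput()
	return string(output), err
}

func main() {
	args := pandocArgs("out.epub", "cover.png", "meta.txt", "article.html")
	fmt.Println("pandoc", args)

	// This only succeeds if pandoc is installed and the input files exist.
	if out, err := runCommand("pandoc", args...); err != nil {
		fmt.Println("pandoc failed:", err, out)
	}
}
```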

We can finally collect articles with:

$ go run main.go add "https://..."

Next, we will want to sync any queued articles to the Kindle. Let’s add a sync command:

var syncCmd = &cobra.Command{
	Use:   "sync",
	Short: "sync articles to kindle",
	RunE: func(cmd *cobra.Command, args []string) error {
		return Sync(config)
	},
}

// and in main()

rootCmd.AddCommand(addCmd)
rootCmd.AddCommand(initCmd)
rootCmd.AddCommand(syncCmd)

And the implementation copies any KFX files to the Kindle, and moves epubs to the archive in case we ever need them in the future:

func Sync(config Config) error {
	if _, err := os.Stat(config.KindleDocDir); os.IsNotExist(err) {
		return fmt.Errorf("Kindle not connected")
	}

	epubs, err := ioutil.ReadDir(config.EpubDir)
	if err != nil {
		return err
	}

	// MkdirAll is a no-op when the directory already exists.
	if err := os.MkdirAll(config.ArchiveDir, 0744); err != nil {
		return err
	}

	for _, epub := range epubs {
		src := path.Join(config.EpubDir, epub.Name())
		kindle := path.Join(config.KindleDocDir, epub.Name())
		archive := path.Join(config.ArchiveDir, epub.Name())

		if strings.HasSuffix(src, "epub") {
			os.Rename(src, archive)
			continue
		}

		err := copy(src, kindle)
		if err != nil {
			return fmt.Errorf("Failed to copy file to Kindle: %w", err)
		}

		os.Remove(src)
	}

	return nil
}
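
The copy helper that Sync calls isn’t shown above. Assuming it is a plain byte-for-byte file copy, a minimal sketch could look like this. I have called it copyFile here so it doesn’t get confused with Go’s builtin copy, and the final Sync call is my own addition because the destination is a USB device that may get unplugged soon after.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// copyFile (hypothetical helper) copies src to dst, overwriting dst if it
// already exists.
func copyFile(src, dst string) (err error) {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	// Closing can surface a delayed write error, so keep it if nothing
	// else has failed yet.
	defer func() {
		if cerr := out.Close(); err == nil {
			err = cerr
		}
	}()

	if _, err = io.Copy(out, in); err != nil {
		return err
	}

	// Flush to the device before reporting success.
	return out.Sync()
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: copyfile SRC DST")
		return
	}
	if err := copyFile(os.Args[1], os.Args[2]); err != nil {
		fmt.Println("copy failed:", err)
	}
}
```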

OK! We are getting somewhere.

I also want to make it easy to collect articles. So, let’s make a Chrome extension.

Here are the requirements:

  • Show a simple button in the Chrome UI
  • This button is always visible
  • Clicking the button will add the URL of the current tab to the queue of articles to be read later

Create a chrome directory, and create some files:

$ mkdir chrome
$ touch chrome/manifest.json
$ touch chrome/background.js

In manifest.json, let’s add:

{
  "manifest_version": 2,
  "name": "sa2k",
  "version": "0.1.0",
  "description": "desc",
  "browser_action": {},
  "background": {
    "scripts": [
      "background.js"
    ]
  },
  "permissions": ["activeTab", "nativeMessaging", "notifications"]
}

In background.js, we will create a port, and set up some listeners on that port. The port will start our CLI Go program, and keep it running. Chrome will send messages to the process over its stdin, and read replies from its stdout.

var port = chrome.runtime.connectNative('sa2k.host');

chrome.browserAction.onClicked.addListener(function(tab) {
  port.postMessage({"url": tab.url});
});

When we click the Chrome UI button, the extension sends a message to the sa2k process with the URL of the current tab.

Before Chrome is willing to run some random program on your computer, it needs to know that this is safe.

Create this file:

~/.config/google-chrome/NativeMessagingHosts/sa2k.host.json

… with the following contents:

{
  "name": "sa2k.host",
  "description": "sa2k",
  "path": "<absolute path to your GOPATH>/bin/sa2k",
  "type": "stdio",
  "allowed_origins": ["chrome-extension://<your ext id>/"]
}

You can find the ID of the extension on the extension configuration page in Chrome.

We will also need to install our Go program so that the binary ends up at that path:

$ go install .

OK. Chrome will start our program with a single argument, and it’s the string "chrome-extension://<your ID>". So, let’s modify our main():

func main() {
	var err error
	config, err = GetConfig()
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}

	logFilename := path.Join(xdg.ConfigHome, "sa2k", "log")

	file, err := openLogFile(logFilename)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}

	log.SetOutput(file)
	log.SetFlags(log.LstdFlags | log.Lshortfile | log.Lmicroseconds)

	if len(os.Args) > 1 {
		if strings.HasPrefix(os.Args[1], "chrome-extension://") {
			err = Receive(config)
			if err != nil {
				log.Println("Receive failed:", err)
				os.Exit(1)
			}
			return
		}
	}

	rootCmd.PersistentFlags().StringVar(&Title, "title", "", "")

	rootCmd.AddCommand(addCmd)
	rootCmd.AddCommand(initCmd)
	rootCmd.AddCommand(syncCmd)

	if err := rootCmd.Execute(); err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
}
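
Why a log file? Once Chrome starts the program, stdout belongs to the native messaging protocol, so stray debug prints there would corrupt the message stream. The openLogFile helper isn’t shown above. A minimal sketch, assuming it simply opens the file for appending and creates it if needed:

```go
package main

import (
	"log"
	"os"
	"path/filepath"
)

// openLogFile (sketched here, not shown in the article) opens filename for
// appending and creates it if it does not exist yet.
func openLogFile(filename string) (*os.File, error) {
	return os.OpenFile(filename, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
}

func main() {
	// The temp dir is only used for this example; sa2k logs to its config dir.
	f, err := openLogFile(filepath.Join(os.TempDir(), "sa2k-example.log"))
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	log.SetOutput(f)
	log.Println("this line goes to the log file")
}
```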

And Receive is where the magic happens. The native messaging protocol encodes each message as JSON, and prefixes it with the length of the message as a 32-bit integer in native byte order (little-endian on most machines). The length is sent as four raw bytes, not as text.

We will need a shovel for reading the header:

func readHeader(reader io.Reader) (uint32, error) {
	// Read message length.
	var length uint32

	if err := binary.Read(reader, binary.LittleEndian, &length); err != nil {
		if err == io.EOF {
			return 0, fmt.Errorf("EOF")
		}

		return length, err
	}

	return length, nil
}

You may recall that os.Stdin implements the io.Reader interface, so we can pass it straight in. readHeader returns the size of the message body in bytes.
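
To make the framing a bit more concrete, here is a small in-memory sketch that builds one framed message and reads it back: four length bytes, then exactly that many bytes of JSON. readFrame and decodeFrame are illustrative names, and the sketch assumes little-endian like the rest of the code.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// readFrame (hypothetical helper) reads one length-prefixed message from r
// and returns its body.
func readFrame(r io.Reader) ([]byte, error) {
	var length uint32
	if err := binary.Read(r, binary.LittleEndian, &length); err != nil {
		return nil, err
	}

	body := make([]byte, length)
	// ReadFull fails if fewer than length bytes are available.
	if _, err := io.ReadFull(r, body); err != nil {
		return nil, err
	}
	return body, nil
}

// decodeFrame is a convenience wrapper for a frame that is already in memory.
func decodeFrame(b []byte) (string, error) {
	body, err := readFrame(bytes.NewReader(b))
	return string(body), err
}

func main() {
	payload := []byte(`{"url":"https://example.com"}`)

	// Build the frame the way Chrome would send it: header, then body.
	var buf bytes.Buffer
	binary.Write(&buf, binary.LittleEndian, uint32(len(payload)))
	buf.Write(payload)

	fmt.Printf("header bytes: % x\n", buf.Bytes()[:4])

	body, err := decodeFrame(buf.Bytes())
	fmt.Println(body, err)
}
```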

Next, let’s create a message type:

type IncomingMessage struct {
	URL string `json:"url"`
}

Now we can start implementing Receive:

func Receive(config Config) error {
	log.Println("Receiving now...")
	for {
		length, err := readHeader(os.Stdin)
		if err != nil {
			log.Println("failed to read header", err)
			return err
		}

		if length == 0 {
			log.Println("length is zero")
			return nil
		}

		var message IncomingMessage

		// Read message body.
		if err := json.NewDecoder(io.LimitReader(os.Stdin, int64(length))).Decode(&message); err != nil {
			log.Println("failed to parse body")
			return err
		}

		go func() {
			err := Add(config, "", []string{message.URL})
			if err != nil {
				log.Println("Add error", err)
				return
			}
			log.Println("Add done, sending message")
			SendMessage(os.Stdout, &OutgoingMessage{Type: Ready})
		}()

		log.Println("sending success accept message")
		SendMessage(os.Stdout, &OutgoingMessage{Type: Accepted})
	}

	return nil
}

We are looping forever, decoding each message, using our Add function to enqueue new URLs, and then sending messages back to Chrome when we have status updates. Of course, the long Add process happens asynchronously in a goroutine.

Let’s fill out the message sending parts. First the types:

type MessageType int

const (
	Accepted MessageType = 1
	Ready    MessageType = 2
)

type OutgoingMessage struct {
	Type MessageType `json:"type"`
}

And then the actual sending:

func writeHeader(writer io.Writer, length int) error {
	header := make([]byte, 4)
	binary.LittleEndian.PutUint32(header, (uint32)(length))

	if n, err := writer.Write(header); err != nil || n != len(header) {
		return err
	}

	return nil
}

func SendMessage(writer io.Writer, v interface{}) error {
	message, err := json.Marshal(v)
	if err != nil {
		return err
	}

	length := len(message)

	if err := writeHeader(writer, length); err != nil {
		return err
	}

	// Write message body.
	if n, err := writer.Write(message); err != nil || n != length {
		return err
	}

	return nil

}
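
One thing to keep in mind: Receive sends from both the main loop and the Add goroutine, and each message is written as two separate writes (header, then body). In principle two messages could interleave on stdout. I haven’t seen it happen, but if it worries you, one option is to build the whole frame first and guard the write with a mutex. A sketch of that idea, with made-up names (encodeMessage, lockedSender):

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"io"
	"os"
	"sync"
)

// encodeMessage (hypothetical helper) returns the 4-byte little-endian
// length header followed by the JSON encoding of v, as one byte slice.
func encodeMessage(v interface{}) ([]byte, error) {
	body, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}

	frame := make([]byte, 4+len(body))
	binary.LittleEndian.PutUint32(frame, uint32(len(body)))
	copy(frame[4:], body)
	return frame, nil
}

// lockedSender (hypothetical type) lets several goroutines share one writer
// without mixing up their frames.
type lockedSender struct {
	mu sync.Mutex
	w  io.Writer
}

// Send writes header and body in a single Write while holding the lock.
func (s *lockedSender) Send(v interface{}) error {
	frame, err := encodeMessage(v)
	if err != nil {
		return err
	}

	s.mu.Lock()
	defer s.mu.Unlock()
	_, err = s.w.Write(frame)
	return err
}

func main() {
	sender := &lockedSender{w: os.Stdout}

	// Two goroutines sending at once, like the loop and Add in Receive.
	var wg sync.WaitGroup
	for i := 1; i <= 2; i++ {
		wg.Add(1)
		go func(t int) {
			defer wg.Done()
			sender.Send(map[string]int{"type": t})
		}(i)
	}
	wg.Wait()
}
```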

Now we can go back, and update the background.js file to accept these messages:

port.onMessage.addListener(function (response) {
  if (response.type === 1) {
    var opt = {
      type: "basic",
      title: "Added!",
      message: "URL added to the list, processing now...",
      iconUrl: "icon.png"
    };
    chrome.notifications.create(null, opt);
  }

  if (response.type === 2) {
    var opt = {
      type: "basic",
      title: "Ready!",
      message: "Processed, and ready to sync",
      iconUrl: "icon.png"
    };
    chrome.notifications.create(null, opt);
  }
});

That’s it! Now we can click a Chrome extension button, and the article at the current URL will be turned into an ebook in the background. When you connect your Kindle to your computer, you can run sa2k sync to grab new articles. Magic.


This article was first published on April 21, 2023. As you can see, there are no comments. I invite you to email me with your comments, criticisms, and other suggestions. Even better, write your own article as a response. Blogging is awesome.