Saturday, April 09, 2016

The danger in Golang infinite for-loop

This is a blog post about how my colleagues and I got bitten by a simple-looking infinite (forever, endless) for-loop in Golang.

We were implementing a server app that waits for a client, receives data and stores it in a DB. It is as simple as it sounds. The implementation goes like this:

...
func main() {
    s := Server{Handler: handler}
    s.Start()
    // Spin forever so that main never returns.
    for {
    }
}

func handler(c *Client) {
    for {
        buf := make([]byte, 1024*1024)
        n, err := c.Read(buf)
        if err != nil {
            break
        }
        // process buf[:n]
    }
}

The infinite for-loop in the main function keeps the program from terminating. I knew it was busy waiting and kept the CPU active, but we didn't give it much thought.

The problem with the above implementation is that the server freezes after receiving N buffers from the client. Initially we suspected the client, the network and buffered IO, but after adding some logs we realized that the program freezes when it tries to allocate the buffer. Yes, at the call to "make([]byte, ...)". If you don't believe me, try the following snippet and see for yourself.

func main() {
    go work()

    for {
    }
}

func work() {
    println("work started")
    var r int64
    for i := 0; i < 1000; i++ {
        buf := make([]byte, 1024*1024)
        r = r + int64(buf[0]) // just to make sure the allocation is not optimized away
    }
    println("work completed")
}

The program freezes after printing "work started" and never prints "work completed". You have to terminate it forcefully. I tried with "GOMAXPROCS > 1" but had no luck. I don't know the exact reason for this behaviour, but I have a guess. I suppose the garbage collector (GC) kicks in after N allocations (i.e. calls to make) and tries to pause every running goroutine. The goroutine spinning in the empty for-loop never reaches a point where the runtime can pause it, so the GC waits forever and the program freezes.

We can avoid the freeze by changing the program so that the runtime is able to pause the infinite for-loop for the GC. There are different ways to do this:
  • Replacing the infinite for-loop with an infinite select loop (for example an empty "select {}"). This seems to be the recommended way of blocking execution in Golang. It also keeps CPU utilization low.
  • Sleeping within the for-loop. The sleep duration can be as small as zero.
  • Yielding execution to the runtime by calling runtime.Gosched() inside the loop.
I've observed the above behaviour with Go 1.5.1 on OSX. I am not sure whether it differs on other versions or platforms.
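
As a rough sketch of the last two options, the snippet below reuses the work function from above and idles in a loop that pauses on every iteration, either with runtime.Gosched() or with a short sleep. The names run, yield, nap and the finished flag are made up for this example, and the one millisecond sleep is an arbitrary choice.

```go
package main

import (
	"runtime"
	"sync/atomic"
	"time"
)

// work mirrors the snippet above: 1000 allocations of 1 MiB each.
// It sets *finished to 1 once all allocations are done.
func work(finished *int32) {
	var r int64
	for i := 0; i < 1000; i++ {
		buf := make([]byte, 1024*1024)
		r += int64(buf[0]) // keep the allocation from being optimized away
	}
	_ = r
	atomic.StoreInt32(finished, 1)
}

// run starts work in a goroutine and waits for it in a loop that calls
// pause on every iteration. It returns the final value of the flag.
func run(pause func()) int32 {
	var finished int32
	go work(&finished)
	for atomic.LoadInt32(&finished) == 0 {
		pause()
	}
	return atomic.LoadInt32(&finished)
}

// yield hands the processor back to the scheduler.
func yield() { runtime.Gosched() }

// nap sleeps briefly; the duration is arbitrary.
func nap() { time.Sleep(time.Millisecond) }

func main() {
	run(yield)
	println("work completed (Gosched)")
	run(nap)
	println("work completed (Sleep)")
}
```

For the first option, a server whose work happens entirely in other goroutines can end main with an empty "select {}", which blocks without burning CPU as long as some other goroutine is still alive.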

Saturday, April 02, 2016

Embedding DB migrations through Golang build tags

In this post I am going to explain how to embed database migration sqls within the application binary, and how we can use Golang build tags to maintain both embedded and non-embedded versions of the database migrations.

I am developing a system application in Golang that uses SQLite as its database. It is destined to run on the user's machine. The application runs database migrations at every startup. In a typical web application the migration sql files are usually deployed along with the application so that it can migrate the database the next time it is launched. But in my case, as the application is distributed to users, I don't want to ship the migrations as separate sql files. So I decided to embed the migration sqls in the application binary. I also took advantage of Golang build tags to maintain both embedded and non-embedded versions.

Embedding Migration SQLs

I've used the goose and migrate libraries in earlier projects, but neither of them supported embedding the migration sqls within the application binary. A quick search revealed that the sql-migrate library can do both non-embedded and embedded migrations. We need to convert the migration sqls into Go source files using go-bindata and instruct sql-migrate to use the embedded sqls.

The following command converts all migration sqls from the "db/migrations" directory into a Go source file named "bindata.go" with the package name "myapp".
$ go-bindata -pkg myapp -o bindata.go db/migrations/
The "bindata.go" file exports two functions named "Asset" and "AssetDir". These functions are used to retrieve an embedded file and the file list of an embedded directory, respectively.
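
To make the shape of these two functions concrete, here is a hand-written stand-in with the same signatures that go-bindata generates (Asset(name string) ([]byte, error) and AssetDir(name string) ([]string, error)). It is only a sketch: the map, the file names and the SQL are invented for illustration, and the real generated code stores the data differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// embedded is a hypothetical stand-in for the data go-bindata would generate.
var embedded = map[string][]byte{
	"db/migrations/1_init.sql":      []byte("-- +migrate Up\nCREATE TABLE users (id INTEGER);\n"),
	"db/migrations/2_add_email.sql": []byte("-- +migrate Up\nALTER TABLE users ADD COLUMN email TEXT;\n"),
}

// Asset returns the content of one embedded file.
func Asset(name string) ([]byte, error) {
	data, ok := embedded[name]
	if !ok {
		return nil, fmt.Errorf("asset %s not found", name)
	}
	return data, nil
}

// AssetDir returns the names of the files directly inside an embedded directory.
func AssetDir(name string) ([]string, error) {
	prefix := strings.TrimSuffix(name, "/") + "/"
	var files []string
	for path := range embedded {
		rest := strings.TrimPrefix(path, prefix)
		if rest != path && !strings.Contains(rest, "/") {
			files = append(files, rest)
		}
	}
	if len(files) == 0 {
		return nil, fmt.Errorf("asset dir %s not found", name)
	}
	sort.Strings(files) // map iteration order is random
	return files, nil
}

func main() {
	files, err := AssetDir("db/migrations")
	if err != nil {
		panic(err)
	}
	for _, f := range files {
		data, err := Asset("db/migrations/" + f)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %d bytes\n", f, len(data))
	}
}
```

A migration library only needs these two entry points: one call to list the directory, then one call per file to read its content.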

The following code snippet wires up these functions with sql-migrate so that it can retrieve the embedded migration sqls.
...
migrations := &migrate.AssetMigrationSource{
    Asset:    Asset,
    AssetDir: AssetDir,
    Dir:      "db/migrations",
}
...

Embedding with Build Tags

I want to embed the migrations only for production builds. For development builds I want to use non-embedded migrations, to avoid the additional step of running go-bindata every time a migration sql changes. I achieved this workflow with the help of Go build tags.

Go build tags were originally created to select platform-specific source files while building multi-platform applications/libraries. The mechanism can be used for other purposes too, like my usage of embedding migration sqls. All build-related go commands (e.g. go build, go test) support specifying build tags.
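
As a small illustration of the mechanism, the hypothetical file below is compiled only when the tag "embedded_migrations" is not passed to the go command. The file content and the function name migrationMode are made up for this example. The "//go:build" line is the spelling that Go 1.17 and later prefer; the "// +build" line is the older spelling used in this post, and older toolchains read only that one.

```go
//go:build !embedded_migrations
// +build !embedded_migrations

package main

import "fmt"

// migrationMode reports which variant was compiled in. A sibling file
// constrained with "embedded_migrations" would define the same function
// and return "embedded" instead.
func migrationMode() string {
	return "files on disk"
}

func main() {
	fmt.Println("migrations are read from:", migrationMode())
}
```

With a plain "go build" this file is compiled; with "go build -tags embedded_migrations" it is left out and the sibling file would take its place.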

So to achieve the above workflow I've created two migration source files, one for the non-embedded version and another for the embedded version.

migrations_non_embedded.go
// +build !embedded_migrations

package db

import (
        migrate "github.com/rubenv/sql-migrate"
)

// Migrations reads the migration sqls from disk (development builds).
func Migrations() migrate.MigrationSource {
        return &migrate.FileMigrationSource{
                Dir: "db/migrations",
        }
}

migrations_embedded.go
// +build embedded_migrations

package db

import (
        migrate "github.com/rubenv/sql-migrate"
        // plus the import of the package that holds the generated bindata.go (myapp)
)

// Migrations reads the migration sqls embedded by go-bindata (production builds).
func Migrations() migrate.MigrationSource {
        return &migrate.AssetMigrationSource{
                Asset:    myapp.Asset,
                AssetDir: myapp.AssetDir,
                Dir:      "db/migrations",
        }
}
In the migration source files, the build tag "+build !embedded_migrations" instructs the build tool to use the source if the build tag "embedded_migrations" is not specified. Similarly, the build tag "+build embedded_migrations" instructs the build tool to use the source if the build tag "embedded_migrations" is specified. Basically only one of the source files is used by the build tool, based on the presence/absence of the build tag. Note that the "+build" comment must be followed by a blank line, otherwise it is treated as an ordinary package comment and both files get compiled.

Regardless of whether the migration sqls are embedded or not, we can retrieve them by simply calling the "db.Migrations()" function and upgrade the database like below.

main.go
...
// conn is the already opened *sql.DB (naming it db would shadow the package).
n, err := migrate.Exec(conn, "sqlite3", db.Migrations(), migrate.Up)
if err != nil {
    log.Fatal("db migrations failed: ", err)
}
...
Now to build the application with non-embedded migrations we can run the following command:
$ go build
To build with embedded migrations for production/deployment we can run the following commands:
$ go-bindata -pkg myapp -o bindata.go db/migrations/
$ go build -tags embedded_migrations