Sunday, December 04, 2016

Roll-call - A microservices pattern to find service instances and their metadata

I am working on a product whose back-end is implemented as a collection of microservices. We had a requirement to implement an API that returns information about the running instances of these services and their metadata. As the services are independent of each other and already communicate through message queues, we decided to implement this through message passing. The rest of the post explains how we implemented it and the challenges we faced.

One of the back-end services (Service-A) is exposed to the internet through HTTP APIs. The front-end communicates with the back-end through this service. Initially we thought of making this service aware of all running service instances through a service-registry-like mechanism. But we felt this complicated the Service-A implementation and added unnecessary coupling with the other services: every time we add a new service, we would need to change Service-A to make it aware of the new service's instances. So we dropped this plan.


The brainstorming continued, one idea led to another, and suddenly we remembered how roll-calls happened in our schools and colleges. The teacher announces that he/she is going to take roll-call, then each student announces their presence by saying their name/roll-no. While the students announce their presence, the teacher notes down the names/roll-nos of the students who are present and absent.

We realized a similar approach could be used to implement the requirement. We made Service-A broadcast a "roll-call" message to all running service instances. Upon receiving this message, each service instance responds with an "alive" message carrying metadata like name, IP, instance-id, etc. After broadcasting the "roll-call" message, Service-A waits for a fixed amount of time and collects all "alive" messages received during this period. The whole process is triggered by an API call, and the collected metadata about the service instances is returned as the response.
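For illustration, the waiting-and-collecting part of Service-A boils down to something like the following sketch. It is not our exact implementation; it assumes the "alive" payloads arrive on a Go channel, and the package and function names are just for the example.

package rollcall

import "time"

// collectAlive gathers every "alive" payload that arrives on the channel
// within the given window and returns whatever was collected when the
// window expires.
func collectAlive(alive <-chan string, window time.Duration) []string {
    var instances []string
    deadline := time.After(window)
    for {
        select {
        case msg := <-alive:
            instances = append(instances, msg)
        case <-deadline:
            return instances
        }
    }
}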


The advantage of this approach is that we don't need to make Service-A aware of all running service instances. We can add new services or new instances at any time, and the same API will fetch information about the newly added service/instance.

There was one caveat we encountered during the implementation. As the API takes a few seconds (i.e. the fixed wait time) to respond, if the same API is called again during this period, we might miss or get duplicate metadata in either of the API calls. We solved this problem by creating request-specific temporary queues and sending the queue name along with the "roll-call" message. The services then send their "alive" messages to the queue specified in the "roll-call" message. As we use RabbitMQ for our message-based communication, creating a disposable, temporary queue is quite easy (see the sketch below). The problem could also be solved by including a "request-id" in the "roll-call" message and collecting only the "alive" messages carrying that specific "request-id".
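For reference, declaring such a disposable reply queue and broadcasting the "roll-call" with its name takes only a few lines with a Go AMQP client such as github.com/streadway/amqp. This is a rough sketch, not our actual code; the fanout exchange name "roll-call" and the error handling are illustrative.

...
ch, err := conn.Channel() // conn is an already established *amqp.Connection
if err != nil {
    return err
}

// A server-named, exclusive, auto-delete queue that lives only for this request.
replyQ, err := ch.QueueDeclare("", false, true, true, false, nil)
if err != nil {
    return err
}

// Broadcast the roll-call on a fanout exchange and tell the services where to
// send their "alive" messages via the ReplyTo property.
err = ch.Publish("roll-call", "", false, false, amqp.Publishing{
    ContentType: "text/plain",
    ReplyTo:     replyQ.Name,
    Body:        []byte("roll-call"),
})
...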

Please let me know if you find this pattern useful. I would love to get your feedback and improve it if you find any shortfalls.

Saturday, April 09, 2016

The danger in Golang infinite for-loop

This is a blog post about how my colleagues and I got bitten by a simple-looking infinite/forever/endless for-loop in Golang.

We were implementing a server app that waits for a client, receives data, and stores it in a DB. As simple as it sounds. The implementation goes like this:

...
func main() {
    s := Server{Handler: handler}
    s.Start()
    for {
    }
}

func handler(c *Client){
    for {
        buf := make([]byte, 1024*1024)
        n, err := c.Read(buf)
        if err != nil {
            break
        }
        // process buf
    }
}

The infinite for-loop in the main function is used to prevent it from terminating. I knew it was busy-waiting and would keep a CPU core busy, but we didn't give it much thought.

The problem with the above implementation is that the server freezes after receiving N buffers from the client. Initially we suspected the client, the network, and buffered IO, but after adding some logs we realized that the program freezes when it tries to allocate the buffer. Yes, at the call to "make([]byte, ...)". Don't believe me? Try the following snippet and see for yourself.

func main() {
    go work()

    for {
    }
}

func work() {
    println("work started")
    var r int64
    for i := 0; i < 1000; i++ {
        buf := make([]byte, 1024*1024)
        r = r + int64(buf[0]) // just to make sure the allocation is not optimized away
    }
    println("work completed")
}

The program freezes after printing "work started" and never prints "work completed". You have to terminate it forcefully. I tried with "GOMAXPROCS > 1" but no luck. I don't know the exact reason for this behaviour, but I have a guess. I suppose the garbage collector (GC) kicks in after N allocations (i.e. calls to make) and tries to stop the world. In this version of Go a goroutine can only be paused at certain safe points, such as function calls, and the tight for-loop in main never reaches one, so the GC waits for it forever and the whole program freezes. We can avoid the freeze by changing the program so that the runtime gets a chance to pause the busy loop for the GC. There are different ways to do this:
  • Replacing the infinite for-loop with an empty select statement (sketched below). This seems to be the recommended way of blocking execution in Golang, and it also keeps the CPU utilization low.
  • Sleeping within the for-loop. The sleep duration could be as small as zero.
  • Yielding execution to the runtime by calling runtime.Gosched() inside the loop.
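For example, here is the first option applied to the snippet above; only the main function changes:

func main() {
    go work()

    // A select with no cases blocks forever without spinning the CPU, and a
    // blocked goroutine can be paused whenever the GC needs to stop the world.
    select {}
}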
I've observed the above behaviour with Go 1.5.1 on OS X. I am not sure whether it changes with a different Go version or platform.

Saturday, April 02, 2016

Embedding DB migrations through Golang build tags

In this post I am going to explain how to embed database migration SQLs within the application binary and how we can utilize Golang build tags to maintain both embedded and non-embedded versions of database migrations.

I am developing a system application in Golang that uses SQLite as its database and is destined to run on users' machines. The application runs database migrations at every startup. In a typical web application the migration SQL files are deployed along with the application so that it can migrate the database the next time it is launched. But in my case, as the application is distributed to users, I don't want to ship the migrations as separate SQL files. So I decided to embed the migration SQLs in the application binary. I also took advantage of Golang build tags to maintain both embedded and non-embedded versions.

Embedding Migration SQLs

I've used the goose and migrate libraries in earlier projects, but neither library supported embedding the migration SQLs within the application binary. A quick search revealed that the sql-migrate library can do both non-embedded and embedded migrations. We need to convert the migration SQLs into a Go source file using go-bindata and instruct sql-migrate to use the embedded SQLs.

The following command converts all migration SQLs from the "db/migrations" directory into a Go source file named "bindata.go" with the package name "myapp":
$ go-bindata -pkg myapp -o bindata.go db/migrations/
The generated "bindata.go" exports two functions named "Asset" and "AssetDir", which are used to retrieve an embedded file and the file list of an embedded directory.
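Their signatures are "func Asset(name string) ([]byte, error)" and "func AssetDir(name string) ([]string, error)", so the embedded migrations can be listed and read like this (the directory layout matches the go-bindata command above):

...
names, err := AssetDir("db/migrations")
if err != nil {
    log.Fatal(err)
}
for _, name := range names {
    data, err := Asset("db/migrations/" + name)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s: %d bytes\n", name, len(data))
}
...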

The following code snippet wires up these functions with sql-migrate so that it can retrieve the embedded migration SQLs:
...
migrations := &migrate.AssetMigrationSource{
    Asset:    Asset,
    AssetDir: AssetDir,
    Dir:      "db/migrations",
}
...

Embedding with Build Tags

I want to embed the migrations only for production builds and use non-embedded migrations for development builds, to avoid the additional step of running go-bindata every time a migration SQL changes. I achieved this workflow with the help of Go build tags.

Go build tags were originally created to select platform-specific source files while building multi-platform applications/libraries. The mechanism can be used for other purposes too, like my usage here of embedding migration SQLs. All build-related go commands (e.g. go build, go test) support specifying build tags.

So to achieve the above workflow I've created two migration source files, one for the non-embedded version and another for the embedded version.

migrations_non_embedded.go
// +build !embedded_migrations

package db
import (
        "github.com/rubenv/sql-migrate"
)

func migrations() migrate.MigrationSource {
        return &migrate.FileMigrationSource{
                Dir: "db/migrations",
        }
}

migrations_embedded.go
// +build embedded_migrations

package db

import (
        "github.com/rubenv/sql-migrate"
        // Also import the package generated by go-bindata so that myapp.Asset and
        // myapp.AssetDir below resolve; the import path depends on your project layout.
)

func migrations() migrate.MigrationSource {
        return &migrate.AssetMigrationSource{
                Asset:    myapp.Asset,
                AssetDir: myapp.AssetDir,
                Dir:      "db/migrations",
        }
}
In the migration source files, the build constraint "+build !embedded_migrations" instructs the build tool to use the file only if the build tag "embedded_migrations" is not specified, and "+build embedded_migrations" instructs it to use the file only if that tag is specified. Basically, only one of the two source files is used by the build tool, based on the presence/absence of the build tag. Note that the "+build" comment must be followed by a blank line before the package clause, otherwise it is treated as a package comment and ignored.

Regardless of whether the migration SQLs are embedded or not, we can retrieve them by simply calling the "db.migrations()" function and upgrade the database as below.

main.go
...
// dbConn is the *sql.DB handle; it can't be named "db" here because that
// would shadow the db package that provides migrations().
n, err := migrate.Exec(dbConn, "sqlite3", db.migrations(), migrate.Up)
if err != nil {
    log.Fatal("db migrations failed: ", err)
}
...
Now, to build the application with non-embedded migrations we can run the following command:
$ go build
and to build with embedded migrations for production/deployment we can run the following commands:
$ go-bindata -pkg myapp -o bindata.go db/migrations/
$ go build -tags embedded_migrations

Wednesday, October 21, 2015

Cross compiling libs3 for Raspberry Pi

Recently I needed to upload files from a Raspberry Pi to an Amazon S3 bucket. There are at least two ways to do this: uploading through the REST API or using the libs3 C library. The REST API approach requires HMAC-SHA1 signature authentication, and calculating the signature is not trivial considering the various rules for constructing the signature payload. So I chose the libs3 approach. This post explains how I cross compiled libs3 for Raspberry Pi/armhf.

Getting Source

The libs3 source is hosted on GitHub at https://github.com/bji/libs3. Clone the repository and make sure it is accessible from your build environment.

Getting Dependencies

We need the header and library files of libcurl and libxml2 to build libs3. We also need the library files of zlib and liblzma when building the host application that uses libs3.

The easy way to get these files is to take your OS distribution's packages of these libraries and extract the required files. My Raspberry Pi 2 runs Raspbian, which is based on Debian Wheezy, so I downloaded the Wheezy packages of these libraries for the "armhf" architecture from here and extracted the files as explained here.

The hard way is to get the source of all these libraries, cross compile them for the Raspberry Pi, and use the resulting libraries. It is a time-consuming and unnecessary exercise as the required files are readily available in the OS distribution packages, so I didn't go this way.

libs3 can be built as a static or a shared library; the default build configuration builds both versions. When you build the libs3 static library you need the .a files of these dependencies, and when you build the shared library you need the .so files.

Build Environment 

Get your Raspberry Pi cross compiling environment ready as explained here. Copy the dependencies' header files (i.e. the "curl" and "libxml" directories) into "$HOME/raspberrypi/rootfs/usr/include". Copy the dependencies' library files into "$HOME/raspberrypi/rootfs/usr/lib/arm-linux-gnueabihf".

Building The Library

Navigate to the libs3 source directory and issue the command below to build the libs3 library.

$ make -f GNUmakefile \
    CC=arm-linux-gnueabihf-gcc \
    CURL_LIBS=$HOME/raspberrypi/rootfs/usr/lib/arm-linux-gnueabihf/libcurl.a \
    CURL_CFLAGS=-I$HOME/raspberrypi/rootfs/usr/include \
    LIBXML2_LIBS=$HOME/raspberrypi/rootfs/usr/lib/arm-linux-gnueabihf/libxml2.a \
    LIBXML2_CFLAGS=-I$HOME/raspberrypi/rootfs/usr/include

It would be enough to specify the include directory "-I$HOME/raspberrypi/rootfs/usr/include" once, in either CURL_CFLAGS or LIBXML2_CFLAGS, but if either variable is left unset the makefile attempts to invoke the programs "curl-config" or "xml2-config", which we don't have in the cross-compile environment. So we specify it in both.

After the build succeeds, the libs3 static and shared libraries will be available under the "build/lib" directory of the libs3 source tree, and the "inc" directory contains the corresponding libs3 header files. Now we can either point our application's build system to consume libs3 from here, or copy the header and library files into the corresponding directories of "$HOME/raspberrypi/rootfs/usr" to consume them automatically via the pi.cmake toolchain file. Make sure you link the zlib and lzma libraries with your application when you consume libs3, otherwise linking will fail with undefined symbol errors.

I hope the information helps you, thanks for reading.

Thursday, December 18, 2014

Pure CMake based bin2h implementation

I had been writing a CMake build script for a project that requires converting a few files into C/C++ headers so that the content of those files can be embedded in the output binary.

I could have done this by executing an external bin2h program to do the conversion, but as it is a cross-platform application I would need to depend on platform-specific bin2h programs. A widely used approach is to include the bin2h source in the project code base, compile it on the target platform, and then use the resulting executable for the conversion.

I didn't want to include the bin2h source in my project and compile it on demand. I was thinking there should be a CMake command/function/module to do this in a cross-platform way, but I couldn't find anything.

So I started writing a pure CMake based bin2h implementation and ended up with the following. The module provides a CMake function "BIN2H" that can be used to convert any file into a C/C++ header file. We need to specify the source file, the header file, and the C/C++ variable name that points to the raw bytes of the source file's content. We can also specify the optional parameter 'APPEND' to append to the header file instead of overwriting it, and the optional parameter 'NULL_TERMINATE' to terminate the raw bytes array with a null byte.


I hope it will be useful to others.

Monday, July 07, 2014

RSA OAEP padding with SHA512 hash algorithm

Recently I wanted to encrypt a message with RSA using OAEP padding. I also wanted to use SHA512 instead of SHA1 as the hashing algorithm and mask generation function (MGF) in the OAEP padding. But it looks like this is not possible with OpenSSL/libcrypto, as the SHA1 hash algorithm is hard coded in the OAEP padding implementation. This is confirmed by this thread in the OpenSSL forum. Though the forum thread was written around 2012, I still couldn't find a way to use either SHA256 or SHA512 as the hashing algorithm and MGF in OAEP padding.

As suggested by "Dr Stephen N. Henson" (a core developer of OpenSSL) in the forum thread, I took the implementation of RSA OAEP padding and modified it to use SHA512 instead of SHA1. It is mostly just finding EVP_sha1 and replacing it with EVP_sha512. We also need to change the SHA_DIGEST_LENGTH macro to SHA512_DIGEST_LENGTH to reflect the output length of the SHA512 hash. Below is the modified RSA OAEP padding implementation which uses the SHA512 algorithm. Hope it helps, cheers.


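As an aside, outside of OpenSSL entirely, Go's standard library already exposes the OAEP hash as a parameter and uses the same hash for MGF1. A minimal sketch with SHA512 (the key size and message are just examples):

package main

import (
    "crypto/rand"
    "crypto/rsa"
    "crypto/sha512"
    "fmt"
)

func main() {
    // A 2048-bit key generated on the fly just for the demo.
    key, err := rsa.GenerateKey(rand.Reader, 2048)
    if err != nil {
        panic(err)
    }

    msg := []byte("hello, OAEP with SHA512")

    // EncryptOAEP uses the supplied hash for both the label hash and MGF1.
    ciphertext, err := rsa.EncryptOAEP(sha512.New(), rand.Reader, &key.PublicKey, msg, nil)
    if err != nil {
        panic(err)
    }

    plaintext, err := rsa.DecryptOAEP(sha512.New(), rand.Reader, key, ciphertext, nil)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(plaintext))
}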

Wednesday, July 02, 2014

Viewing raw RGB bitmaps with ImageMagick

I've been looking for a tool to view raw RGB/RGBA bitmaps but couldn't find one. Such a tool is useful when we want to troubleshoot what bitmaps we are feeding to a video encoder or what frames are being produced by a video decoder. A raw RGB bitmap contains only raw image data and doesn't carry any header info like width, height, etc. For example, if the image is 100x100 at 32 bits per pixel (RGBA), then the file will be 100 * 100 * 4 = 40,000 bytes. Each pixel is represented by four bytes, which hold the values of the Red, Green, Blue and Alpha channels respectively.

Very recently I realized I can use ImageMagick to convert these raw bitmaps to a viewable image format like PNG or JPG and then view them with any standard image viewer. The following command uses ImageMagick's convert utility to convert a raw RGBA image to a PNG image, which can be viewed with any image viewer application:
convert.exe -depth 8 -size 2048x858 image.rgba image.png
As you can see, we specify the bit depth (how many bits per pixel component) and the size, without which the utility doesn't know how to interpret the image data of the source image. We can also specify the source and destination color spaces with the '-colorspace' option to convert the image from one color space to another, and the option '-alpha off' can be used if we want to ignore the alpha channel of the source image.
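If you ever need to do the same conversion programmatically, a small Go program can do it as well. This is a rough sketch; the width, height, and file names are assumptions, and the data is assumed to be non-premultiplied RGBA in R, G, B, A byte order.

package main

import (
    "image"
    "image/png"
    "io/ioutil"
    "log"
    "os"
)

func main() {
    const width, height = 2048, 858 // must match the source bitmap

    raw, err := ioutil.ReadFile("image.rgba")
    if err != nil {
        log.Fatal(err)
    }
    if len(raw) != width*height*4 {
        log.Fatalf("unexpected file size %d, want %d", len(raw), width*height*4)
    }

    // Interpret the raw bytes directly as an image; image.NRGBA stores pixels
    // in exactly this R, G, B, A order, so no copying or conversion is needed.
    img := &image.NRGBA{
        Pix:    raw,
        Stride: width * 4,
        Rect:   image.Rect(0, 0, width, height),
    }

    out, err := os.Create("image.png")
    if err != nil {
        log.Fatal(err)
    }
    defer out.Close()

    if err := png.Encode(out, img); err != nil {
        log.Fatal(err)
    }
}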