Sunday, December 13, 2009

Using heap as stack and finding stack usage

Recently I learned how a memory allocated from heap can be used as stack and how we can find the stack usage of particular function invocation. The following is a sample code utilizing this technique and also it detects the stack usage of function call.
#include <stdlib.h>
#include <string.h>

static int depth = 0;
static void *prevESP;

void func()
{
    if (depth++ < 10)  // change the 10 to increase/decrease stack usage
        func();
}

int main(int argc, _TCHAR* argv[])
{
    const int STACK_SIZE = 1024 * 1024 * 4; // we allocate 4MB for our heap stack
    unsigned char *heapStack = (unsigned char*)malloc(STACK_SIZE);
    unsigned char *heapStackBottom = heapStack + STACK_SIZE; // move to bottom of the heap stack

    // fill the stack with 0xaa byte so that we can detect the stack
    // usage by scanning non 0xaa byte
    memset(heapStack, 0xaa, STACK_SIZE);

    __asm
    {
        mov prevESP, ESP          // take backup of current stack pointer
        mov ESP, heapStackBottom  // store our heap stack bottom as current stack pointer  
    }

    func();

    __asm
    {
        mov ESP, prevESP  // restore the original stack pointer
    }

    // lets scan for a byte which is not 0xaa. this reveals the
    // last dirty byte from which we can calculate the stack usage
    heapStackBottom = heapStack;
    while (*heapStackBottom++ == 0xaa);

    // the stack started from bottom, subtract it from top and the total size to find the stack usage
    printf ("Heap stack usage %d\n", STACK_SIZE - (heapStackBottom - heapStack));

    free(heapStack);

    return 0;
}
So in the above code we allocate 4MB heap space to be used as stack(lets call this as "heap stack"). Now we need to store the bottom(x86 stack grows downwards, e.g. 2000 -> 1000) of heap stack into ESP register. Before modifying the ESP we need to take the current ESP value which we will restore after function invocation. These operations are done in inlined assembly code. To the stack usage of a function invocation, we fill the allocated heap area with some known pattern(here 0xaa -> 0b10101010) and call the function. When the function returns we start scanning for a byte which is not 0xaa. This will be the last dirty byte from heap stack bottom. We subtract this offset from heap stack top which results the no. of bytes not used. Again subtracting it with total heap stack size we will end up with actual stack usage. Though the above code is writtend for VC++ compiler it can be ported to Linux by changing the inlined assembly with its GCC equivalent.
I thought about the above technique when I was reading setcontext/getcontext/swapcontext POSIX functions and how it could be implemented. As per C99 standard setjmp/longjmp is not guaranteed to work when we call longjmp of a function which is already completed its execution(i.e. returned). This is because the stack is not guaranteed to be same when setjmp was called. The POSIX solves this by using user allocated memory for stack. At this pointed I was wondering how could it be implemented and came up with the above code.

Tuesday, December 08, 2009

A simple TCP redirector in Python

A very simple TCP redirector written in python as an experiment. It redirects any data sent to local port's to a target host, port. It acts as a bridge between local port and the target host, port pair. You can run it by specifying a local port, target host and target host's port. For example the following will redirect all HTTP request to local port 8080 into Google web server
$ ./SimpleTCPRedirector localhost 8080 www.google.com 80
#!/usr/bin/env python

import socket
import threading
import select
import sys

terminateAll = False

class ClientThread(threading.Thread):
 def __init__(self, clientSocket, targetHost, targetPort):
  threading.Thread.__init__(self)
  self.__clientSocket = clientSocket
  self.__targetHost = targetHost
  self.__targetPort = targetPort
  
 def run(self):
  print "Client Thread started"
  
  self.__clientSocket.setblocking(0)
  
  targetHostSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  targetHostSocket.connect((self.__targetHost, self.__targetPort))
  targetHostSocket.setblocking(0)
  
  clientData = ""
  targetHostData = ""
  terminate = False
  while not terminate and not terminateAll:
   inputs = [self.__clientSocket, targetHostSocket]
   outputs = []
   
   if len(clientData) > 0:
    outputs.append(self.__clientSocket)
    
   if len(targetHostData) > 0:
    outputs.append(targetHostSocket)
   
   try:
    inputsReady, outputsReady, errorsReady = select.select(inputs, outputs, [], 1.0)
   except Exception, e:
    print e
    break
    
   for inp in inputsReady:
    if inp == self.__clientSocket:
     try:
      data = self.__clientSocket.recv(4096)
     except Exception, e:
      print e
     
     if data != None:
      if len(data) > 0:
       targetHostData += data
      else:
       terminate = True
    elif inp == targetHostSocket:
     try:
      data = targetHostSocket.recv(4096)
     except Exception, e:
      print e
      
     if data != None:
      if len(data) > 0:
       clientData += data
      else:
       terminate = True
      
   for out in outputsReady:
    if out == self.__clientSocket and len(clientData) > 0:
     bytesWritten = self.__clientSocket.send(clientData)
     if bytesWritten > 0:
      clientData = clientData[bytesWritten:]
    elif out == targetHostSocket and len(targetHostData) > 0:
     bytesWritten = targetHostSocket.send(targetHostData)
     if bytesWritten > 0:
      targetHostData = targetHostData[bytesWritten:]
  
  self.__clientSocket.close()
  targetHostSocket.close()
  print "ClienThread terminating"

if __name__ == '__main__':
 if len(sys.argv) != 5:
  print 'Usage:\n\tpython SimpleTCPRedirector    '
  print 'Example:\n\tpython SimpleTCPRedirector localhost 8080 www.google.com 80'
  sys.exit(0)  
 
 localHost = sys.argv[1]
 localPort = int(sys.argv[2])
 targetHost = sys.argv[3]
 targetPort = int(sys.argv[4])
  
 serverSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
 serverSocket.bind((localHost, localPort))
 serverSocket.listen(5)
 print "Waiting for client..."
 while True:
  try:
   clientSocket, address = serverSocket.accept()
  except KeyboardInterrupt:
   print "\nTerminating..."
   terminateAll = True
   break
  ClientThread(clientSocket, targetHost, targetPort).start()
  
 serverSocket.close()

Wednesday, April 22, 2009

FreeType 2 Caching

Recently I found that FreeType 2 has support for caching. Previously I used to cache myself in custom data structure in order to avoid rendering glyph every time. But this new caching mechanism provides robust way of caching rendered glyph. It also helps us to cache other things like char map lookups, face sizes and etc. It also flushes the cache whenever there is no room for new one. Now I am going to explain some of component of FreeType 2 cache sub-system and how to use it. Please refer http://freetype.sourceforge.net/freetype2/docs/reference/ft2-cache_subsystem.html for more information.
FTC_Manager
This is the main component in cache sub-system that is responsible for managing all kind of cache. We have to create a FTC_Manager before doing anything with cache sub-system. The function for creating a new FTC_Manager is


FTC_Manager_New(FT_Library library,
                FT_UInt max_faces,
                FT_UInt max_sizes,
                FT_ULong max_bytes,
                FTC_Face_Requester requester,
                FT_Pointer req_data,
                FTC_Manager *amanager);
here "max_faces", "max_sizes" determines the max no. of faces and sizes to cache. The "max_bytes" is the max amount of memory that can be used for caching. The "requester" and "req_data" is callback mechanism for converting face id into face. When we use the caching functions we don't pass FT_Face directly, instead we pass a face id which will be mapped into corresponding FT_Face by this callback function. The face id is typedef-ed as FT_Pointer so you can use any integer or pointer value as face id. If you are going to use only one face then you can simply return the FT_Face for any face id or if you are going to use more than one faces then probably you can pass the filename via "req_data" then load and return FT_Face inside "requester" callback. Note, the "requester" callback function will be called only once for the first lookup(Face, Size, CMap, Image, SBit) of the face id. When the "requester" callback returns a FT_Face for given face id the FTC_Manager stores the pair for future lookups. So don't expect the "requester" callback to be called for every lookup.

Once we done with caching we can call FTC_Manager_Done which will free the memory used for caching.

FTC_Node
All objects(Face, Size, CMap, SBit and Image) in caching sub-system associated with FTC_Node. FTC_Manager manipulates all type of objects in the form of FTC_Node and each FTC_Node is reference counted. We have to manually de-reference(FTC_Node_Unref) the node when we no longer use it. All the caching lookup function accepts a pointer to FTC_Node, if we provide an FTC_Node pointer then the FTC_Node of the lookup result will be returned. The returned FTC_Node will have reference count incremented by one. So its our responsibility to de-referencing the FTC_Node when we no longer need it. We can also pass NULL as address of FTC_Node pointer if we don't want to manually manage the FTC_Node, in that case the lookup result which we get will be freed in the next lookup.

FTC_SBitCache
A rendered glyph can be cached using FTC_SBitCache(SBit stands for Small Bitmap). First we need to create the SBitCache using FTC_SBitCacheNew, then use the returned FTC_SBitCache for any SBit lookup. The SBit lookup can be done using the function FTC_SBitCacheLookup

FTC_SBitCache_Lookup(FTC_SBitCache cache,
                     FTC_ImageType type,
                     FT_UInt gindex,
                     FTC_SBit *sbit,
                     FTC_Node *anode);

here the parameter "type" holds the face id, width, height and rendering flags(i.e. FT_LOAD_RENDER). "gindex" is the glyph index we want to render as bitmap and the resultant FTC_SBit is stored in "sbit". A FTC_SBit holds all the necessary information to paint the bitmap on a surface/canvas. The "buffer" field of FTC_SBit nothing but the "buffer" field of FT_Bitmap which holds the raw bitmap data.

Sample
The following code snippet shows how typical usage of FreeType 2 Caching. I didn't show handling the return value of the FreeType 2 API to make code readable, but don't do that in your real code.

int main(int argc, char **argv)
{
  FT_Library lib;
  FTC_Manager ftcManager;
  FTC_SBitCache ftcSBitCache;
  FTC_SBit ftcSBit;
  int ret;

  ret = FT_Init_FreeType(&lib);
  assert(ret == 0);

  ret = FTC_Manager_New(lib, 1, 1, 1024 * 1024, ftcFaceRequester, fontFilename, &ftcManager);
  assert(ret == 0);
  ret = FTC_CMapCache_New(ftcManager, &ftcCMapCache);
  assert(ret == 0);
  ret = FTC_SBitCache_New(ftcManager, &ftcSBitCache);
  assert(ret == 0);

  ftcImageType.face_id = 0;
  ftcImageType.width = 16;
  ftcImageType.height = 16;
  ftcImageType.flags = FT_LOAD_DEFAULT | FT_LOAD_RENDER;

  glyphIndex = FTC_CMapCache_Lookup(ftcCMapCache, 0, 0, 'a');
  assert(glyphIndex > 0);
  ret = FTC_SBitCache_Lookup(ftcSBitCache, &ftcImageType, glyphIndex, &ftcSBit, &ftcNode);
  assert(ret == 0);

  /* render the glyph */
  FTC_Node_Unref(ftcNode, ftcManager);
  FTC_Manager_Done(ftcManager);
}

FT_Error FtcFaceRequester(FTC_FaceID faceID, FT_Library lib, FT_Pointer reqData, FT_Face *face)
{
  int ret = FT_New_Face(lib, (char *)reqData, 0, face);
  assert(ret == 0);
  return 0;
}