OpenCV is multithreaded, and you can set the number of threads it uses with cvSetNumThreads(N). The Haar cascades used for face and other feature detection will fully utilize all the threads you give them. Even so, driving the detection loop with glib.timeout_add( milliseconds, self.loop ) and 4 threads you will still get unacceptable lag doing Haar detection on a 320x240 image; the GUI will become unresponsive because glib never gets a chance to process its own events if the timeout callback is always fully loaded.
The solution is splitting the work into two processes, so that pygtk can have its own core and keep updating without being blocked by the OpenCV/Pygame process. One approach is the subprocess module, serializing data from the GUI to the OpenCV/Pygame process over a pipe - but this is rather slow. The fastest approach available in Python is the multiprocessing module, which uses ctypes-wrapped shared memory. If the shared memory is only read from the OpenCV/Pygame process then there is no data-copy overhead, giving us the maximum possible speed. (Copy-on-write is implemented in hardware and supported by modern Linux kernels.)
Ctypes restricts what types can be shared, so it's not possible to share a dictionary or a class instance. Luckily, ctypes has a simple interface for C-level structs, and all the data from the GUI can easily be reformatted from a dictionary into an array of structs.
import ctypes
import multiprocessing.sharedctypes
import cv  # old-style OpenCV Python binding

_cfg_ubytes = 'active alpha blur athresh_block_size thresh_min thresh_max'.split()
_cfg_ubytes += 'FXstencil FXblur FXsobel FXathresh FXthresh FXdetect'.split()

class LayerConfig( ctypes.Structure ):
    _fields_ = [ ('colorspace', ctypes.c_int) ]
    _fields_ += [ (tag, ctypes.c_ubyte) for tag in _cfg_ubytes ]

# lock=False skips the synchronization wrapper, so reads and writes
# go straight to the shared memory.
mysharedmemory = multiprocessing.sharedctypes.Array( LayerConfig, [ (cv.CV_BGR2RGB,) ], lock=False )