- optimize translated cache chaining (DLL/PLT-like system)
- improve 16-bit support
- optimize inverse flags propagation (easy by generating an intermediate
  micro operation array).
- make it self-runnable (use the same trick as ld.so: include its own relocator and libc)
- fix FPU exceptions (in particular: gen_op_fpush must not come before the memory load)
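
A rough sketch of the inverse flags propagation item, assuming each micro operation records which flags it reads and which it writes. The MicroOp type, the F_* bits and propagate_flags() are hypothetical names made up for illustration, not existing code. The idea shown is a single backward pass over the intermediate array that marks which flag results are actually consumed later, so the others need not be computed:

```c
#include <stddef.h>

/* Hypothetical flag bits; an assumption for this sketch only. */
enum { F_C = 1, F_Z = 2, F_S = 4, F_O = 8, F_ALL = 15 };

/* Hypothetical micro operation record; not an existing type. */
typedef struct {
    unsigned reads;   /* flags consumed by this op */
    unsigned writes;  /* flags fully redefined by this op */
    unsigned needed;  /* output: subset of writes that must be computed */
} MicroOp;

/*
 * Walk the block from the last op to the first, tracking which flags
 * are still live. live_out is the set of flags assumed live at the end
 * of the block (F_ALL is the conservative choice).
 */
static void propagate_flags(MicroOp *ops, size_t n, unsigned live_out)
{
    unsigned live = live_out;

    for (size_t i = n; i-- > 0; ) {
        /* only flags that a later op still reads must be produced */
        ops[i].needed = ops[i].writes & live;
        /* written flags are killed, read flags become live */
        live = (live & ~ops[i].writes) | ops[i].reads;
    }
}
```

With an add, a sub and a conditional jump on Z in sequence, and nothing live after the block, the pass leaves the add with no flags to compute and the sub with only Z. Passing F_ALL as live_out keeps every flag of the last writer, which is the safe default when the next block is unknown.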