Utilize movd mmx->gpr so return values don't go through memory
Convert predict_mmx from NASM to inline assembly in C (from Gert Vervoort)
fix instruction operands for mmx/sse assembly
get rid of SSE quant_non_intra (which was broken, and abusing
fix -M 0 (don't initialize Despatcher if parallelism==0)
Add sse version of fdct based on daan (more accurate, but 50% slower)
fix bug in conversion to C
minor improvements -- use mov instructions instead of shuf or extra adds wherever possible; some reordering to improve out-of-order execution