cs267_matrix_optimization

CS267 HW1: optimization of square matrix multiplication using loop unrolling, loop reordering (reusing variables), AVX vectorization, and compiler optimization analysis.
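
To illustrate two of the techniques named above, here is a minimal sketch of loop reordering with variable reuse and 4-way loop unrolling. It is not the repository's actual code: the function names `matmul_naive` and `matmul_reordered`, the row-major layout, and the unroll factor of 4 are all assumptions made for the example. AVX intrinsics are left out so that the sketch compiles without special flags.

```c
#include <stddef.h>

/* Baseline: C += A * B for n x n matrices, i-j-k loop order.
 * Assumes row-major storage (an assumption of this sketch). */
void matmul_naive(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = C[i * n + j];
            for (size_t k = 0; k < n; k++) {
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
}

/* Reordered to i-k-j so the inner loop walks rows of B and C with unit
 * stride. A[i][k] is loaded once into a local variable and reused, and
 * the inner loop is unrolled by 4 with a scalar remainder loop. */
void matmul_reordered(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i++) {
        double *c = &C[i * n];
        for (size_t k = 0; k < n; k++) {
            const double a = A[i * n + k];   /* reused across the whole j loop */
            const double *b = &B[k * n];
            size_t j = 0;
            for (; j + 4 <= n; j += 4) {     /* unrolled body */
                c[j]     += a * b[j];
                c[j + 1] += a * b[j + 1];
                c[j + 2] += a * b[j + 2];
                c[j + 3] += a * b[j + 3];
            }
            for (; j < n; j++) {             /* remainder when n % 4 != 0 */
                c[j] += a * b[j];
            }
        }
    }
}
```

Both functions accumulate into `C`, so the caller is expected to zero it first if a plain product is wanted. The unit-stride inner loop in the reordered version is also the shape a compiler can usually auto-vectorize, which is one reason to compare the two variants at different optimization levels.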