Exploring Task Performance with Interpretable Models via Sparse Auto-Encoders