AutoTune-TCU: Hardware-Software Co-Design of a Learning-Based Self-Optimizing TCU for HPC