shradhasehgal a day ago

Super interesting work. Wild that AF3 launched 100x more kernels. The training results at 768-token length seem cool.