7b55d41
Open

Rewrote flash attention to use BF16 and transpose k and v, rewrote the task distribution, increased parallelism on decode, and used double the registers for the core of flash attention. #835
