I am trying to overlap computation and memory operations in the HuggingFace SwitchTransformer using CUDA streams.
Here's a detailed explanation.
```python
s_0 = torch.cuda.Stream()  # Create a new stream.
s_1 = torch.cuda.Stream()  # Create a new stream.

with torch.cuda.stream(s_0):
    this_gate_info = router_mask, router_probs, router_logits
    router_mask = router_mask.bool()
    idx_mask = router_mask.transpose(1, 2)
    idx_mask = torch.cat(torch.split(idx_mask, 1, dim=0), dim=2)
    idx_mask = idx_mask.sum(dim=2)
    idx_mask = idx_mask.squeeze()

    if next_blk is not None:
        active_idx = …
```
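For context, here is a minimal self-contained sketch of the pattern I am going for: the router-mask reduction runs on a side stream while the default stream stays free for other work, with `wait_stream` calls ordering the two streams. The function name `overlapped_mask_reduce` is my own; it is not part of the HuggingFace API, and the CPU fallback is only there so the snippet runs without a GPU.

```python
import torch

def overlapped_mask_reduce(router_mask):
    """Sketch: reduce the router mask to a per-expert token count on a
    side CUDA stream. `overlapped_mask_reduce` is a hypothetical helper,
    not a HuggingFace function."""
    if not torch.cuda.is_available():
        # CPU fallback: same math, no stream overlap.
        idx_mask = router_mask.bool().transpose(1, 2)
        idx_mask = torch.cat(torch.split(idx_mask, 1, dim=0), dim=2)
        return idx_mask.sum(dim=2).squeeze()

    router_mask = router_mask.to("cuda", non_blocking=True)
    side = torch.cuda.Stream()
    # The side stream must wait for work already queued on the default
    # stream (e.g. the kernel that produced router_mask).
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        idx_mask = router_mask.bool().transpose(1, 2)
        idx_mask = torch.cat(torch.split(idx_mask, 1, dim=0), dim=2)
        idx_mask = idx_mask.sum(dim=2).squeeze()
    # Before the default stream consumes idx_mask, make it wait for the
    # side stream, and tell the caching allocator the tensor is used there.
    torch.cuda.current_stream().wait_stream(side)
    idx_mask.record_stream(torch.cuda.current_stream())
    return idx_mask

# router_mask: (batch=2, seq_len=8, num_experts=4)
mask = (torch.rand(2, 8, 4) > 0.5).int()
counts = overlapped_mask_reduce(mask)
print(tuple(counts.shape))  # one count per expert → (4,)
```

The key ordering rule is symmetric: `side.wait_stream(default)` before launching the side-stream kernels, and `default.wait_stream(side)` before the default stream reads the result; without the second call the overlap becomes a race.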