Tag archive: python

ALiBi Positional Encoding

320 views

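The original Transformer model is itself permutation-invariant. In other words, it is insensitive to the order of the words in the input sequence. Without positional information, the model would treat the sentences “The cat chases the mouse” and “The mouse...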
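The order-insensitivity described in this excerpt can be checked numerically. The sketch below is a minimal single-head self-attention in NumPy with no positional information. The function name `self_attention`, the random weights, and the sizes are illustrative assumptions, not code from the post. Strictly speaking, shuffling the input rows shuffles the output rows in the same way, and any order-independent pooling of the outputs (such as the mean) is unchanged.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention without positional encoding."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Subtract the row-wise max before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 5, 8                      # hypothetical sequence length and model width
X = rng.normal(size=(n, d))      # stand-in token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

perm = rng.permutation(n)        # shuffle the token order
out = self_attention(X, Wq, Wk, Wv)
out_perm = self_attention(X[perm], Wq, Wk, Wv)

# Each token's output is the same; only the row order changes.
rows_follow_permutation = np.allclose(out_perm, out[perm])
# A mean-pooled sentence representation cannot tell the two orders apart.
pooled_is_identical = np.allclose(out_perm.mean(axis=0), out.mean(axis=0))
```

This is why a positional signal such as ALiBi has to be added before the model can distinguish word orders.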

Linear Attention from Scratch

274 views

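Reordered computation: change (Q·K^T)·V to Q·(K^T·V), which avoids explicitly computing the attention matrix

Lower complexity: from O(n²d) down to O(nd²); when the sequence length n > the feature dimension d...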
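The reordering above can be sketched in NumPy. Softmax attention itself cannot be reordered this way, so linear attention replaces the softmax with a positive feature map. The `elu(x) + 1` map used here is a common choice and is an assumption, as are all function names; the post's own implementation is not visible in this excerpt. Both functions compute the same non-causal result, but only the first one builds the n × n matrix.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, which is x + 1 for x > 0 and exp(x) otherwise (assumed choice).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def attention_quadratic(Q, K, V):
    """(phi(Q) phi(K)^T) V with row normalization: materializes an (n, n) matrix."""
    scores = feature_map(Q) @ feature_map(K).T          # (n, n), O(n^2 d)
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ V

def attention_linear(Q, K, V):
    """phi(Q) (phi(K)^T V) with the same normalization: never forms (n, n)."""
    q, k = feature_map(Q), feature_map(K)
    kv = k.T @ V                                        # (d, d_v), O(n d^2)
    z = k.sum(axis=0)                                   # (d,), normalizer terms
    return (q @ kv) / (q @ z)[:, None]

rng = np.random.default_rng(0)
n, d = 16, 4                     # hypothetical sizes with n > d
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
same_result = np.allclose(attention_quadratic(Q, K, V), attention_linear(Q, K, V))
```

The two forms agree because matrix multiplication is associative; the saving comes purely from which product is evaluated first.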

346 views

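The paper addresses how to improve the reasoning ability and accuracy of large language models (LLMs) on Text-to-SQL tasks through the following methods: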

377 views

def _preprocess(
        self,
        images: Union[ImageInput, VideoInput],
      ...

459 views

from torch import nn
import torch.nn.functional as F
import torch
import math

class MoELayer(nn....

298 views

import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import GPT2L...

576 views

Implement the strStr() function.

Given a haystack string and a needle string, find the first position (0-indexed) at which needle occurs in haystack. If it does not exist...
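A minimal brute-force sketch of the problem stated above. The excerpt is cut off before it says what to return when needle is absent, so returning -1 (and 0 for an empty needle) is an assumption here, chosen to match Python's built-in `str.find`. The name `str_str` is illustrative.

```python
def str_str(haystack: str, needle: str) -> int:
    """Return the 0-based index of the first occurrence of needle in haystack."""
    if not needle:
        return 0                      # assumed convention, same as "".find
    n, m = len(haystack), len(needle)
    # Try every start position where needle could still fit.
    for i in range(n - m + 1):
        if haystack[i:i + m] == needle:
            return i
    return -1                         # assumed "not found" value


first_hit = str_str("hello", "ll")
missing = str_str("aaaaa", "bba")
```

This runs in O(n·m) time in the worst case; KMP or similar algorithms bring it down to O(n + m) at the cost of a preprocessing step.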

291 views

# softmax

import torch

# X = torch.tensor([-0.3, 0.2, 0.5, 0.7, 0.1, 0.8])
# X_exp_sum = X.exp(...

717 views

from torch import nn
import torch.nn.functional as F
import torch
import math


class SelfAttenti...

619 views