Fine-tuning notes: Python 3.10.16, torch 2.4.0, CUDA 12.1 under WSL

Posted on 2025-12-06 Edited on 2026-01-26


0. Hide the Windows environment variables in WSL

In the WSL Ubuntu distribution, edit /etc/wsl.conf and add:

[interop]
enabled = false
appendWindowsPath = false

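Here enabled = false turns off launching Windows programs from inside WSL, and appendWindowsPath = false keeps the Windows PATH entries out of the Linux PATH.

After saving and exiting, WSL has to be restarted. In cmd, run: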

wsl --shutdown
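
After the restart, one way to confirm the setting took effect is to look for leftover Windows directories in PATH. The following is a minimal Python sketch, assuming Windows drives are mounted under the default /mnt/ root; the helper name windows_path_entries is hypothetical, and a non-Windows mount under /mnt/ would also be reported.

```python
import os

def windows_path_entries(path_value, mount_root="/mnt/"):
    """Return the PATH entries that point into a Windows drive mount."""
    entries = [p for p in path_value.split(":") if p]
    return [p for p in entries if p.startswith(mount_root)]

if __name__ == "__main__":
    leftover = windows_path_entries(os.environ.get("PATH", ""))
    if leftover:
        print("Windows paths still present:", leftover)
    else:
        print("No Windows paths found in PATH")
```

An empty result suggests the Windows PATH is no longer being appended.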

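1. Environment setup

1.1 Install the NVIDIA driver; the latest version is fine

Go to https://www.nvidia.cn/geforce/drivers/ and select your GPU model. This time I installed the NVIDIA Studio Driver - WHQL.

Driver version: 572.60 - Release date: 2025-2-27

After installation, run nvidia-smi. The driver is installed on the Windows side, and nvidia-smi then also works inside WSL.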

(u2) zk@baize:~/ai$ whereis nvidia-smi
nvidia-smi: /usr/bin/nvidia-smi /usr/lib/wsl/lib/nvidia-smi /usr/share/man/man1/nvidia-smi.1.gz

(u2) zk@baize:~/ai$ nvidia-smi
Wed Mar 12 10:53:45 2025
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 572.60       CUDA Version: 12.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| 41%   46C    P8    38W / 420W |   1003MiB / 24576MiB |      9%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
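
The CUDA Version in the nvidia-smi header is the highest CUDA version the driver supports, not the installed toolkit, so it only needs to be at least as new as the CUDA build of torch (12.1 in these notes). Below is a small sketch that parses the header line and makes that comparison. The function names are hypothetical, and the regular expressions assume the header layout shown above.

```python
import re

def parse_smi_header(line):
    """Extract the driver version and max supported CUDA version from the header."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", line)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", line)
    if not driver or not cuda:
        raise ValueError("unrecognized nvidia-smi header line")
    return driver.group(1), cuda.group(1)

def version_tuple(text):
    """Convert '12.8' into (12, 8) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def driver_supports(max_cuda, wanted_cuda):
    """True if the driver's max CUDA version covers the wanted one."""
    return version_tuple(max_cuda) >= version_tuple(wanted_cuda)

header = "| NVIDIA-SMI 525.105.17   Driver Version: 572.60       CUDA Version: 12.8     |"
driver, max_cuda = parse_smi_header(header)
print(driver, max_cuda, driver_supports(max_cuda, "12.1"))
```

With the header from this machine, the sketch prints 572.60 12.8 True.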

1.2 Create a conda virtual environment

conda create --name u2 \
	python=3.10 \
	-y

1.3 Install xformers 0.0.27.post1

pip install -U xformers==0.0.27.post1

For Python 3.10, xformers 0.0.27.post1 pulls in torch 2.4.0, and the CUDA 12.1 runtime libraries are installed along with it.

Check the xformers status:

(u2) zk@baize:~/ai$ python -m xformers.info
is_triton_available:                               True
pytorch.version:                                   2.4.0+cu121
pytorch.cuda:                                      available
gpu.compute_capability:                            8.6
gpu.name:                                          NVIDIA GeForce RTX 3090
dcgm_profiler:                                     unavailable
build.info:                                        available
build.cuda_version:                                1201
build.hip_version:                                 None
build.python_version:                              3.10.14
build.torch_version:                               2.4.0+cu121
build.env.TORCH_CUDA_ARCH_LIST:                    6.0+PTX 7.0 7.5 8.0+PTX
build.env.PYTORCH_ROCM_ARCH:                       None
build.env.XFORMERS_BUILD_TYPE:                     Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   wheel-v0.0.27.post1
build.nvcc_version:                                12.1.66
source.privacy:                                    open source

Here pytorch.version is 2.4.0+cu121 and build.torch_version is 2.4.0+cu121. These two values must match; in my earlier attempts they came out mismatched several times.
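
To avoid comparing the two fields by eye, the report can be parsed. This is a minimal sketch, assuming the "key: value" layout shown above. The function names are hypothetical, and the sample text is a shortened copy of the output from this machine.

```python
def parse_xformers_info(text):
    """Turn the 'key: value' lines of `python -m xformers.info` into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

def torch_versions_match(info):
    """True if the installed torch equals the torch that xformers was built for."""
    runtime = info.get("pytorch.version")
    built_for = info.get("build.torch_version")
    return runtime is not None and runtime == built_for

sample = """\
pytorch.version:                                   2.4.0+cu121
pytorch.cuda:                                      available
build.torch_version:                               2.4.0+cu121
"""
print(torch_versions_match(parse_xformers_info(sample)))
```

In practice the text could come from capturing the output of python -m xformers.info instead of the sample string.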

1.4 Check the CUDA installation with nvcc

(u2) zk@baize:~/ai$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0

1.5 Check that CUDA is usable from torch

(u2) zk@baize:~/ai$ python
Python 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
2.4.0+cu121
>>> print(torch.cuda.is_available())
True
>>> print(torch.cuda.get_device_name(torch.cuda.current_device()))
NVIDIA GeForce RTX 3090
>>> print(torch.cuda.device_count())
1
>>> print(torch.cuda.get_device_properties(torch.cuda.current_device()))
_CudaDeviceProperties(name='NVIDIA GeForce RTX 3090', major=8, minor=6, total_memory=24575MB, multi_processor_count=82)
>>> print(torch.version.cuda)
12.1
>>> print(torch.backends.cudnn.version())
90100
>>> print(torch.cuda.get_arch_list())
['sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90']
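
The last call, get_arch_list(), lists the GPU architectures this torch build was compiled for. The RTX 3090 reports major=8, minor=6, so the entry to look for is sm_86. Below is a small sketch of that lookup; the helper name is hypothetical, and it only tests for an exact match, so a False result does not by itself prove the GPU is unusable.

```python
def arch_in_build(major, minor, arch_list):
    """True if the GPU's compute capability has a matching sm_XY entry."""
    return f"sm_{major}{minor}" in arch_list

# Values copied from the session above; with torch installed they could come from
# torch.cuda.get_device_capability() and torch.cuda.get_arch_list().
arch_list = ['sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90']
print(arch_in_build(8, 6, arch_list))
```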

1.6 Install unsloth

These are the examples from the official unsloth site; pick the one that matches your CUDA and torch versions:

pip install "unsloth[cu121-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-torch240] @ git+https://github.com/unslothai/unsloth.git"

pip install "unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-ampere-torch230] @ git+https://github.com/unslothai/unsloth.git"

pip install "unsloth[cu121-torch250] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu124-ampere-torch250] @ git+https://github.com/unslothai/unsloth.git"
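
The extras names in this list appear to follow one pattern: the CUDA tag, an optional ampere marker, then the torch version without dots. Here is a sketch that derives the name from torch.__version__. The helper is hypothetical and the pattern is only inferred from the examples listed here, so the result should be checked against the official list, since not every combination exists.

```python
def unsloth_extra(torch_version, ampere=False):
    """Build an extras name such as 'cu121-torch240' from '2.4.0+cu121'."""
    release, sep, local = torch_version.partition("+")
    if not sep or not local.startswith("cu"):
        raise ValueError("expected a version like '2.4.0+cu121'")
    torch_tag = "torch" + release.replace(".", "")
    parts = [local, "ampere", torch_tag] if ampere else [local, torch_tag]
    return "-".join(parts)

print(unsloth_extra("2.4.0+cu121"))
print(unsloth_extra("2.4.0+cu121", ampere=True))
```

For the environment in these notes, this yields cu121-torch240 and cu121-ampere-torch240.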

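Watch out for a pitfall when installing. The command pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git" matches the versions above, but during the download I saw it fetch xformers-0.0.28.post1, which would change the torch and CUDA versions again. The --no-deps flag is needed to prevent that.

pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git" --no-deps  # --no-deps also skips unsloth_zoo, so install it separately

Install the other dependencies unsloth needs: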

pip install --no-deps trl peft accelerate bitsandbytes

pip install unsloth_zoo  # unsloth_zoo did not change the torch or CUDA version
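
Since any of these installs can silently replace torch or xformers, it may be worth re-checking the pinned versions afterwards. This is a minimal sketch using importlib.metadata from the standard library. The EXPECTED table and the helper names are assumptions for illustration, and a local suffix such as +cu121 is ignored in the comparison.

```python
from importlib import metadata

EXPECTED = {"torch": "2.4.0", "xformers": "0.0.27.post1"}

def base_version(version):
    """Drop a local suffix such as '+cu121' before comparing."""
    return version.split("+", 1)[0]

def find_drift(expected, installed):
    """Return {package: (expected, found)} for every mismatch or missing package."""
    drift = {}
    for name, wanted in expected.items():
        found = installed.get(name)
        if found is None or base_version(found) != wanted:
            drift[name] = (wanted, found)
    return drift

def installed_versions(names):
    """Look up the installed version of each package, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

if __name__ == "__main__":
    print(find_drift(EXPECTED, installed_versions(EXPECTED)) or "versions unchanged")
```

Running it inside the u2 environment after each pip install shows immediately whether a dependency was swapped.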

1.7 Build llama.cpp in advance

Build llama.cpp with GPU (CUDA) support:

# Official build documentation: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release

After the build, /home/zk/ai/llama.cpp/build/bin should contain the two main executables, llama-quantize and llama-cli.
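
A quick way to confirm the build produced both executables is sketched below. The helper name missing_binaries is hypothetical; the directory is the one used in these notes.

```python
from pathlib import Path

def missing_binaries(bin_dir, names=("llama-quantize", "llama-cli")):
    """Return the expected llama.cpp executables that are absent from bin_dir."""
    bin_dir = Path(bin_dir)
    return [name for name in names if not (bin_dir / name).is_file()]

if __name__ == "__main__":
    missing = missing_binaries("/home/zk/ai/llama.cpp/build/bin")
    print("missing:", missing if missing else "none")
```

If anything is reported missing, the GGUF steps at the end of the fine-tuning script will fail, so it is cheaper to find out before training.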

2. Fine-tuning

2.1 Main fine-tuning script

Here both the base model and the dataset are loaded from local paths.

import os
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the model
max_seq_length = 2048
dtype = None
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "/home/zk/ai/base_model/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

# Prepare the training data
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token  # EOS_TOKEN must be appended to every sample
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs = examples["input"]
    outputs = examples["output"]
    texts = []
    # input_text avoids shadowing the built-in input(), which is used later in this script
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        # Without EOS_TOKEN, generation never stops
        text = alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

# Local dataset
dataset = load_dataset("json", data_files = "/home/zk/ai/dataset/caishui_2011_100hao.json", split="train")
dataset = dataset.map(formatting_prompts_func, batched = True)

# Configure the LoRA and training parameters
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj", ],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = False,
    loftq_config = None,
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    tokenizer = tokenizer,
    args = TrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 4,
        warmup_steps = 2,
        max_steps = 20,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        learning_rate = 2e-5,
    ),
)

# Start training
trainer.train()

# Save the fine-tuned LoRA adapter
model.save_pretrained("lora_model")

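# Optional: save as a merged 16-bit HF model
# "outputs" is a relative path; the GGUF step reads /home/zk/ai/outputs, so run this script from /home/zk/ai
save_16bit = input("Save as a 16-bit HF model? (y/n): ")
if save_16bit.lower() == "y":
    model.save_pretrained_merged("outputs", tokenizer, save_method="merged_16bit")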

# Optional: convert to a GGUF model
save_gguf = input("Save as a GGUF model? (y/n): ")
if save_gguf.lower() == "y":
    os.system("python /home/zk/ai/llama.cpp/convert_hf_to_gguf.py --outfile /home/zk/ai/gguf_model/lm38b_tax_jzjt.gguf /home/zk/ai/outputs")

# Optional: quantize to a 4-bit GGUF model
quantize_4bit = input("Quantize to a 4-bit GGUF model? (y/n): ")
if quantize_4bit.lower() == "y":
    # The input file must be the GGUF written by the conversion step above
    os.system("/home/zk/ai/llama.cpp/build/bin/llama-quantize /home/zk/ai/gguf_model/lm38b_tax_jzjt.gguf /home/zk/ai/gguf_model/lm38b_tax_jzjt-Q4_K_M.gguf Q4_K_M")
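
The os.system calls return an exit status that the script ignores, so a failed conversion would still be followed by the quantization step, and the GGUF file name has to be typed consistently in two places. Below is a hedged alternative using subprocess.run with check=True, with both file names derived from a single name. The constants and function names are assumptions for illustration; the paths and command arguments are the ones used in the script.

```python
import subprocess

LLAMA_CPP = "/home/zk/ai/llama.cpp"
GGUF_DIR = "/home/zk/ai/gguf_model"

def conversion_commands(hf_dir, name, quant="Q4_K_M"):
    """Build the convert and quantize commands so both use the same file name."""
    gguf_path = f"{GGUF_DIR}/{name}.gguf"
    quant_path = f"{GGUF_DIR}/{name}-{quant}.gguf"
    convert = ["python", f"{LLAMA_CPP}/convert_hf_to_gguf.py",
               "--outfile", gguf_path, hf_dir]
    quantize = [f"{LLAMA_CPP}/build/bin/llama-quantize",
                gguf_path, quant_path, quant]
    return convert, quantize

def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so quantization never runs after a failed conversion.
        subprocess.run(command, check=True)

convert, quantize = conversion_commands("/home/zk/ai/outputs", "lm38b_tax_jzjt")
print(" ".join(convert))
print(" ".join(quantize))
# run_all([convert, quantize])  # uncomment to actually execute both steps
```

Passing the commands as lists also avoids shell quoting problems if a path ever contains spaces.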