好多的鱼


Setting up a static website served by Nginx

Posted on 2025-11-24

Guide to deploying a static website with Nginx

Overview

This guide shows how to deploy a static website with Nginx (listening on port 23456 in the examples below), a more secure and higher-performance alternative to Python's built-in server.
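
For reference, the quick-and-dirty approach this replaces is Python's built-in static file server. A minimal sketch, with the port and directory chosen only to match the rest of this guide:

```python
# Minimal sketch of the approach this guide replaces: serve /var/www/mysite
# on port 23456 with Python's built-in http.server (single-threaded, no TLS, no caching).
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

handler = partial(SimpleHTTPRequestHandler, directory="/var/www/mysite")
HTTPServer(("0.0.0.0", 23456), handler).serve_forever()
```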

Install Nginx

```bash
# Ubuntu/Debian
sudo apt update
sudo apt install nginx -y
```

Configure the site

Create the site directory:

```bash
sudo mkdir -p /var/www/mysite
sudo chown -R $USER:$USER /var/www/mysite
```

Add the site files
Copy your index.html and the other static files into /var/www/mysite/.

Create the Nginx config file

```bash
sudo nano /etc/nginx/sites-available/mysite
```

Add the following configuration:

```nginx
server {
    listen 23456;
    server_name _;
    root /var/www/mysite;
    index index.html;

    location / {
        try_files $uri $uri/ =404;
    }
}
```

Enable the site

```bash
sudo ln -s /etc/nginx/sites-available/mysite /etc/nginx/sites-enabled/
sudo nginx -t                  # test the configuration
sudo systemctl reload nginx    # reload
```

Firewall settings

```bash
# open port 23456
sudo ufw allow 23456/tcp
sudo ufw reload
```

Verify the service

```bash
curl -I http://localhost:23456
```

Visit http://<your-server-ip>:23456 to view the site.

Installing and running TRT (TensorRT-LLM) on WSL

Posted on 2025-11-17

The installation guide is at https://nvidia.github.io/TensorRT-LLM/latest/installation/linux.html

```bash
conda create -n trt python=3.12
conda activate trt
```

Note that installing CUDA on WSL has its own dedicated steps:

```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.9.1/local_installers/cuda-repo-wsl-ubuntu-12-9-local_12.9.1-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-9-local_12.9.1-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-9-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-9
```

Then add the following to ~/.bashrc in your home directory:

```bash
export PATH=/usr/local/cuda-12/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda-12

conda activate trt
```

Now the installation proper. First, PyTorch and the required libraries; this is a download of well over a gigabyte:

```bash
pip3 install torch==2.7.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

sudo apt-get -y install libopenmpi-dev

# Optional step: Only required for disagg-serving
sudo apt-get -y install libzmq3-dev
```


Then install TensorRT-LLM itself:

```bash
pip3 install --upgrade pip setuptools && pip3 install tensorrt_llm
```

It hung....

```
pip3 install tensorrt_llm
Collecting tensorrt_llm
  Using cached tensorrt_llm-1.0.0.tar.gz (1.6 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... -
```

After a good ten-plus minutes of being stuck, it resumed scrolling on its own, and then got stuck again on this:

```
Collecting tensorrt_cu12_libs==10.11.0.33 (from tensorrt_cu12==10.11.0.33->tensorrt~=10.11.0->tensorrt_llm)
  Downloading tensorrt_cu12_libs-10.11.0.33.tar.gz (709 bytes)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... -
```

After about ten minutes it continued. In the end it finished quite smoothly after installing this whole pile of packages:

```
      Successfully uninstalled fsspec-2025.9.0
Successfully installed StrEnum-0.4.15 accelerate-1.11.0 aenum-3.1.16 aiohappyeyeballs-2.6.1 aiohttp-3.13.2 aiosignal-1.4.0 annotated-types-0.7.0 antlr4-python3-runtime-4.9.3 anyio-4.11.0 attrs-25.4.0 backoff-2.2.1 blake3-1.0.8 blobfile-3.1.0 build-1.3.0 certifi-2025.10.5 cffi-2.0.0 charset_normalizer-3.4.4 click-8.3.0 click_option_group-0.5.9 colored-2.3.1 contourpy-1.3.3 cuda-bindings-12.9.4 cuda-pathfinder-1.3.2 cuda-python-12.9.4 cycler-0.12.1 datasets-3.1.0 diffusers-0.35.2 dill-0.3.8 distro-1.9.0 einops-0.8.1 etcd3-0.12.0 evaluate-0.4.6 fastapi-0.115.4 flashinfer-python-0.2.5 fonttools-4.60.1 frozenlist-1.8.0 fsspec-2024.9.0 grpcio-1.76.0 h11-0.16.0 h5py-3.12.1 hf-xet-1.2.0 httpcore-1.0.9 httpx-0.28.1 huggingface-hub-0.36.0 idna-3.11 importlib_metadata-8.7.0 jiter-0.11.1 kiwisolver-1.4.9 lark-1.3.1 llguidance-0.7.29 lxml-6.0.2 markdown-it-py-4.0.0 matplotlib-3.10.7 mdurl-0.1.2 meson-1.9.1 ml_dtypes-0.5.3 mpi4py-4.1.1 multidict-6.7.0 multiprocess-0.70.16 ninja-1.13.0 numpy-1.26.4 nvidia-ml-py-12.575.51 nvidia-modelopt-0.33.1 nvidia-modelopt-core-0.33.1 nvtx-0.2.13 omegaconf-2.3.0 onnx-1.19.1 onnx_graphsurgeon-0.5.8 openai-2.7.1 opencv-python-headless-4.11.0.86 optimum-2.0.0 ordered-set-4.1.0 packaging-25.0 pandas-2.3.3 peft-0.17.1 pillow-10.3.0 polygraphy-0.49.26 propcache-0.4.1 protobuf-6.33.0 psutil-7.1.3 pulp-3.3.0 pyarrow-22.0.0 pycparser-2.23 pycryptodomex-3.23.0 pydantic-2.12.4 pydantic-core-2.41.5 pydantic-settings-2.11.0 pygments-2.19.2 pynvml-12.0.0 pyparsing-3.2.5 pyproject_hooks-1.2.0 python-dateutil-2.9.0.post0 python-dotenv-1.2.1 pytz-2025.2 pyyaml-6.0.3 pyzmq-27.1.0 regex-2025.11.3 requests-2.32.5 rich-14.2.0 safetensors-0.6.2 scipy-1.16.3 sentencepiece-0.2.1 setuptools-79.0.1 six-1.17.0 sniffio-1.3.1 soundfile-0.13.1 starlette-0.41.3 tenacity-9.1.2 tensorrt-10.11.0.33 tensorrt_cu12-10.11.0.33 tensorrt_cu12_bindings-10.11.0.33 tensorrt_cu12_libs-10.11.0.33 tensorrt_llm-1.0.0 tiktoken-0.12.0 tokenizers-0.21.4 torchprofile-0.0.4 tqdm-4.67.1 transformers-4.53.1 typing-inspection-0.4.2 tzdata-2025.2 urllib3-2.5.0 uvicorn-0.38.0 xgrammar-0.1.21 xxhash-3.6.0 yarl-1.22.0 zipp-3.23.0

```

Now write a program to try it out, using the example that TensorRT-LLM provides:
```python
from tensorrt_llm import LLM, SamplingParams


def main():
    # Model could accept HF model name, a path to local HF model,
    # or TensorRT Model Optimizer's quantized checkpoints like nvidia/Llama-3.1-8B-Instruct-FP8 on HF.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    # Sample prompts.
    prompts = [
        "Hello, my name is",
        "The capital of France is",
        "The future of AI is",
    ]

    # Create sampling params.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    for output in llm.generate(prompts, sampling_params):
        print(
            f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}"
        )

    # Got output like
    # Prompt: 'Hello, my name is', Generated text: '\n\nJane Smith. I am a student pursuing my degree in Computer Science at [university]. I enjoy learning new things, especially technology and programming'
    # Prompt: 'The president of the United States is', Generated text: 'likely to nominate a new Supreme Court justice to fill the seat vacated by the death of Antonin Scalia. The Senate should vote to confirm the'
    # Prompt: 'The capital of France is', Generated text: 'Paris.'
    # Prompt: 'The future of AI is', Generated text: 'an exciting time for us. We are constantly researching, developing, and improving our platform to create the most advanced and efficient model available. We are'


if __name__ == '__main__':
    main()
```

After running it and a long wait, it reported an error: it could not reach huggingface.co.


```
Traceback (most recent call last):
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/urllib3/connection.py", line 198, in _new_conn
sock = connection.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
OSError: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/urllib3/connectionpool.py", line 787, in urlopen
response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/urllib3/connectionpool.py", line 488, in _make_request
raise new_e
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/urllib3/connectionpool.py", line 464, in _make_request
self._validate_conn(conn)
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/urllib3/connectionpool.py", line 1093, in _validate_conn
conn.connect()
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/urllib3/connection.py", line 753, in connect
self.sock = sock = self._new_conn()
^^^^^^^^^^^^^^^^
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/urllib3/connection.py", line 213, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x716f0fd40b30>: Failed to establish a new connection: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/requests/adapters.py", line 644, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/urllib3/connectionpool.py", line 841, in urlopen
retries = retries.increment(
^^^^^^^^^^^^^^^^^^
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/urllib3/util/retry.py", line 519, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/TinyLlama/TinyLlama-1.1B-Chat-v1.0/revision/main (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x716f0fd40b30>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
```
There was a lot more after that, but it can be ignored.
To use Hugging Face from inside China, I ran

export HF_ENDPOINT=https://hf-mirror.com

but rerunning still failed. Following the hint, I upgraded transformers:

pip install transformers -U

and added

export HF_HUB_BASE_URL=https://hf-mirror.com

but it still reported errors when run.

After a round of fiddling, I found that

trtllm-serve "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

actually managed to download the model. So I immediately went back to the test program:
```
(trt) bobo@DESKTOP-K65EUBR:~/test_trt$ python test_trt.py
<frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cuda module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.driver module instead.
<frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
[2025-11-06 21:27:24] INFO config.py:54: PyTorch version 2.7.1+cu128 available.
/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/modelopt/torch/utils/import_utils.py:32: UserWarning: Failed to import huggingface plugin due to: AttributeError("module 'transformers.modeling_utils' has no attribute 'Conv1D'"). You may ignore this warning if you do not need this plugin.
warnings.warn(
/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/modelopt/torch/__init__.py:36: UserWarning: transformers version 4.57.1 is incompatible with nvidia-modelopt and may cause issues. Please install recommended version with `pip install nvidia-modelopt[hf]` if working with HF models.
_warnings.warn(
2025-11-06 21:27:28,299 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT LLM version: 1.0.0
[11/06/2025-21:27:28] [TRT-LLM] [I] Using LLM with PyTorch backend
[11/06/2025-21:27:28] [TRT-LLM] [W] Using default gpus_per_node: 1
[11/06/2025-21:27:28] [TRT-LLM] [I] Set nccl_plugin to None.
[11/06/2025-21:27:28] [TRT-LLM] [I] neither checkpoint_format nor checkpoint_loader were provided, checkpoint_format will be set to HF.
/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/torch/utils/cpp_extension.py:2356: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
rank 0 using MpiPoolSession to spawn MPI processes
<frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cuda module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.driver module instead.
<frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
[2025-11-06 21:27:36] INFO config.py:54: PyTorch version 2.7.1+cu128 available.
Multiple distributions found for package optimum. Picked distribution: optimum
/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/modelopt/torch/utils/import_utils.py:32: UserWarning: Failed to import huggingface plugin due to: AttributeError("module 'transformers.modeling_utils' has no attribute 'Conv1D'"). You may ignore this warning if you do not need this plugin.
warnings.warn(
/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/modelopt/torch/__init__.py:36: UserWarning: transformers version 4.57.1 is incompatible with nvidia-modelopt and may cause issues. Please install recommended version with `pip install nvidia-modelopt[hf]` if working with HF models.
_warnings.warn(
2025-11-06 21:27:41,023 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT LLM version: 1.0.0
/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/torch/utils/cpp_extension.py:2356: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
[TensorRT-LLM][INFO] Refreshed the MPI local session
`torch_dtype` is deprecated! Use `dtype` instead!
Loading safetensors weights in parallel: 100%|██████████| 1/1 [00:00<00:00, 104.27it/s]
Loading weights: 100%|██████████| 449/449 [00:00<00:00, 572.52it/s]
Model init total -- 3.57s
[TensorRT-LLM][INFO] Number of tokens per block: 32.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 0.18 GiB for max tokens in paged KV cache (8352).
2025-11-06 21:27:47,676 - INFO - flashinfer.jit: Loading JIT ops: norm
/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/torch/utils/cpp_extension.py:2356: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
^CTraceback (most recent call last):
File "/home/bobo/test_trt/test_trt.py", line 33, in <module>
main()
File "/home/bobo/test_trt/test_trt.py", line 8, in main
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/tensorrt_llm/llmapi/llm.py", line 1125, in __init__
super().__init__(model, tokenizer, tokenizer_mode, skip_tokenizer_init,
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/tensorrt_llm/llmapi/llm.py", line 942, in __init__
super().__init__(model,
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/tensorrt_llm/llmapi/llm.py", line 214, in __init__
self._build_model()
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/tensorrt_llm/llmapi/llm.py", line 1072, in _build_model
self._executor = self._executor_cls.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/tensorrt_llm/executor/executor.py", line 423, in create
return GenerationExecutorProxy(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/tensorrt_llm/executor/proxy.py", line 105, in __init__
self._start_executor_workers(worker_kwargs)
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/tensorrt_llm/executor/proxy.py", line 319, in _start_executor_workers
if self.worker_init_status_queue.poll(1):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/tensorrt_llm/executor/ipc.py", line 110, in poll
events = dict(self.poller.poll(timeout=timeout * 1000))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/zmq/sugar/poll.py", line 106, in poll
return zmq_poll(self.sockets, timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "zmq/backend/cython/_zmq.py", line 1680, in zmq.backend.cython._zmq.zmq_poll
File "zmq/backend/cython/_zmq.py", line 179, in zmq.backend.cython._zmq._check_rc
KeyboardInterrupt
^CException ignored in: <module 'threading' from '/home/bobo/miniforge3/envs/trt/lib/python3.12/threading.py'>
Traceback (most recent call last):
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/threading.py", line 1594, in _shutdown
atexit_call()
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/mpi4py/futures/_core.py", line 172, in join_threads
thread.join()
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/threading.py", line 1149, in join
self._wait_for_tstate_lock()
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/threading.py", line 1169, in _wait_for_tstate_lock
if lock.acquire(block, timeout):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt:
[11/06/2025-21:30:34] [TRT-LLM] [E] Failed to send object: None
^C^C^CException ignored in atexit callback: <function shutdown_compile_workers at 0x7707940725c0>
Traceback (most recent call last):
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/torch/_inductor/async_compile.py", line 113, in shutdown_compile_workers
pool.shutdown()
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 239, in shutdown
self.process.wait(300)
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/subprocess.py", line 1277, in wait
self._wait(timeout=sigint_timeout)
File "/home/bobo/miniforge3/envs/trt/lib/python3.12/subprocess.py", line 2047, in _wait
time.sleep(delay)
KeyboardInterrupt:
```
Seeing that it was stuck, I added another environment variable:

export TORCH_CUDA_ARCH_LIST="8.6;8.9"

The value comes from:

python -c "import torch; print(torch.cuda.get_device_capability())"

which returned (8, 9), so 8.9 was appended to the list; then I ran it again.

```
(trt) bobo@DESKTOP-K65EUBR:~/test_trt$ python test_trt.py
<frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cuda module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.driver module instead.
<frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
[2025-11-06 21:30:46] INFO config.py:54: PyTorch version 2.7.1+cu128 available.
/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/modelopt/torch/utils/import_utils.py:32: UserWarning: Failed to import huggingface plugin due to: AttributeError("module 'transformers.modeling_utils' has no attribute 'Conv1D'"). You may ignore this warning if you do not need this plugin.
warnings.warn(
/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/modelopt/torch/__init__.py:36: UserWarning: transformers version 4.57.1 is incompatible with nvidia-modelopt and may cause issues. Please install recommended version with `pip install nvidia-modelopt[hf]` if working with HF models.
_warnings.warn(
2025-11-06 21:30:50,629 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT LLM version: 1.0.0
[11/06/2025-21:30:50] [TRT-LLM] [I] Using LLM with PyTorch backend
[11/06/2025-21:30:50] [TRT-LLM] [W] Using default gpus_per_node: 1
[11/06/2025-21:30:50] [TRT-LLM] [I] Set nccl_plugin to None.
[11/06/2025-21:30:50] [TRT-LLM] [I] neither checkpoint_format nor checkpoint_loader were provided, checkpoint_format will be set to HF.
rank 0 using MpiPoolSession to spawn MPI processes
:1301: FutureWarning: The cuda.cuda module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.driver module instead.
:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
[2025-11-06 21:31:03] INFO config.py:54: PyTorch version 2.7.1+cu128 available.
Multiple distributions found for package optimum. Picked distribution: optimum
/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/modelopt/torch/utils/import_utils.py:32: UserWarning: Failed to import huggingface plugin due to: AttributeError("module 'transformers.modeling_utils' has no attribute 'Conv1D'"). You may ignore this warning if you do not need this plugin.
warnings.warn(
/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/modelopt/torch/__init__.py:36: UserWarning: transformers version 4.57.1 is incompatible with nvidia-modelopt and may cause issues. Please install recommended version with `pip install nvidia-modelopt[hf]` if working with HF models.
_warnings.warn(
2025-11-06 21:31:07,496 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT LLM version: 1.0.0
[TensorRT-LLM][INFO] Refreshed the MPI local session
torch_dtype is deprecated! Use dtype instead!
Loading safetensors weights in parallel: 100%|██████████| 1/1 [00:00<00:00, 61.60it/s]
Loading weights: 100%|██████████| 449/449 [00:00<00:00, 557.09it/s]
Model init total -- 2.23s
[TensorRT-LLM][INFO] Number of tokens per block: 32.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 0.18 GiB for max tokens in paged KV cache (8352).
2025-11-06 21:31:12,625 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-11-06 21:31:48,328 - INFO - flashinfer.jit: Finished loading JIT ops: norm
[TensorRT-LLM][INFO] Number of tokens per block: 32.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 17.72 GiB for max tokens in paged KV cache (844512).
Processed requests: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 3.15it/s]
Prompt: 'Hello, my name is', Generated text: '[Your Name] and I am a [Your Position] at [Your Company]. I am writing to express my interest in the [Job Title] position at'
Prompt: 'The capital of France is', Generated text: 'Paris.\n\n2. B. C. The capital of Canada is Ottawa.\n\n3. A. C. The capital of Australia is Can'
Prompt: 'The future of AI is', Generated text: "bright, and it's not just for big companies. Small businesses can also benefit from AI technology. Here are some ways:\n\n1."

```

It ran through successfully.

Let's try running it as a service:

```bash
trtllm-serve "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
```

Wait until it shows:

```
INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
/home/bobo/miniforge3/envs/trt/lib/python3.12/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'validation_alias' attribute with value 'max_tokens' was provided to the Field() function, which has no effect in the context it was used. 'validation_alias' is field-specific metadata, and can only be attached to a model field using Annotated metadata or by assignment. This may have happened because an Annotated type alias using the type statement was used, or if the Field() function was attached to a single member of a union type.
warnings.warn(
INFO: 127.0.0.1:54364 - "POST /v1/chat/completions HTTP/1.1" 200 OK
```

Then open another WSL bash and enter:

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "messages": [{"role": "system", "content": "You are a helpful assistant."},
                 {"role": "user", "content": "Where is New York? Tell me in a single sentence."}],
    "max_tokens": 32,
    "temperature": 0
  }'
```

The reply: "New York is a city in the northeastern United States, located on the eastern coast of the state of New York."

```
{"id":"chatcmpl-31b02f6ab4854863909850ab9688d8b1","object":"chat.completion","created":1762436547,"model":"TinyLlama/TinyLlama-1.1B-Chat-v1.0","choices":[{"index":0,"message":{"role":"assistant","content":"New York is a city in the northeastern United States, located on the eastern coast of the state of New York.","reasoning_content":null,"tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null,"disaggregated_params":null}],"usage":{"prompt_tokens":43,"total_tokens":70,"completion_tokens":27},"prompt_token_ids":null}(trt)
```
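
Since trtllm-serve exposes an OpenAI-compatible API, the same request can also be made from application code. Below is a minimal sketch using the openai Python client (it was pulled in by the tensorrt_llm install); the base_url and the dummy API key are assumptions for a local, unauthenticated server:

```python
# Minimal sketch: talk to the local trtllm-serve endpoint via the OpenAI-compatible API.
# Assumes the server above is listening on http://localhost:8000 and needs no real key.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Where is New York? Tell me in a single sentence."},
    ],
    max_tokens=32,
    temperature=0,
)
print(resp.choices[0].message.content)
```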

I tried a few more prompts. It can answer, but its comprehension is limited and it cannot speak Chinese.
Switch! Let's switch to DeepSeek-R1-Distill-Qwen-1.5B:

```bash
trtllm-serve "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
```

It fell over, printing a pile of hints about how to control GPU memory usage (a hedged sketch of applying one of these options follows the list below):
Please refer to the TensorRT LLM documentation for information on how to control the memory usage through TensorRT LLM configuration options. Possible options include:
Model: reduce max_num_tokens and/or shard the model weights across GPUs by enabling pipeline and/or tensor parallelism
Sampler: reduce max_seq_len and/or max_attention_window_size
Initial KV cache (temporary for KV cache size estimation): reduce max_num_tokens
Drafter: reduce max_seq_len and/or max_draft_len
Additional executor resources (temporary for KV cache size estimation): reduce max_num_tokens
Model resources created during usage: reduce max_num_tokens
KV cache: reduce free_gpu_memory_fraction
Additional executor resources: reduce max_num_tokens
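
For example, the KV cache share can be lowered through the LLM API. This is only a sketch under the assumption that KvCacheConfig and free_gpu_memory_fraction behave as described in the TensorRT-LLM docs; I did not verify it on this model:

```python
# Hedged sketch: try to fit DeepSeek-R1-Distill-Qwen-1.5B by giving the paged KV cache
# a smaller fraction of free GPU memory. Assumes tensorrt_llm.llmapi.KvCacheConfig
# exposes free_gpu_memory_fraction as documented.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import KvCacheConfig

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    kv_cache_config=KvCacheConfig(free_gpu_memory_fraction=0.5),
)
print(llm.generate(["Hello"])[0].outputs[0].text)
```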

```bash
trtllm-serve "meta-llama/Llama-3.2-1B"
```

This one requires an HF access token, so I gave up.

Verification complete.

Backing up and migrating conda environments

Posted on 2025-11-06

Using the test environment as the example.

Add a new directory for environments:
conda config --add envs_dirs D:\conda\envs
Check it:
conda config --show envs_dirs

Export:
conda env export > test_env.yml

Deactivate the environment, then delete it:
conda remove -n test --all

Recreate the environment:
conda env create -n test -f test_env.yml
Verify:
conda info --envs

Migrating the package cache
Edit the .condarc file, or run directly:
conda config --add pkgs_dirs D:\conda\pkgs
conda config --remove pkgs_dirs C:\Users\xxxx\.conda\pkgs
The file contents after the change:

```yaml
channels:
  - conda-forge
envs_dirs:
  - D:\conda\envs
pkgs_dirs:
  - D:\conda\pkgs
```

Adding a proxy (registry mirrors) for Docker

Posted on 2025-05-17

Create a daemon.json file under /etc/docker with the following content:

```json
{
    "dns": ["8.8.8.8", "8.8.4.4"],
    "registry-mirrors": [
        "https://docker.m.daocloud.io/",
        "https://huecker.io/",
        "https://dockerhub.timeweb.cloud",
        "https://noohub.ru/",
        "https://dockerproxy.com",
        "https://docker.mirrors.ustc.edu.cn",
        "https://docker.nju.edu.cn",
        "https://xx4bwyg2.mirror.aliyuncs.com",
        "http://f1361db2.m.daocloud.io",
        "https://registry.docker-cn.com",
        "http://hub-mirror.c.163.com"
    ],
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```


Then run: sudo systemctl restart docker

Test it:
docker pull busybox
It works.

33. An AI's answer on which AI algorithms are obsolete

Posted on 2025-04-04

Q: If time is limited and some outdated algorithms have to be dropped, please rank all the algorithms you know by which should be given up first.

In a time-limited situation, you can drop or postpone the following algorithms in this order of priority. The ranking is based on how often they are applied in practice, how well newer methods replace them, and how foundational they are (from drop-first to keep-if-at-all-possible):

First priority: algorithms you can drop

1. Completely obsolete algorithms with strong replacements
Single-layer perceptron: superseded by the multi-layer perceptron (MLP) and deep learning; of limited value beyond teaching.
Apriori (association rule mining): computationally inefficient; FP-Growth or deep learning (e.g., sequence models) is more efficient.
LVQ (Learning Vector Quantization): a prototype-based clustering method that sees very little practical use; K-Means is more general.
ID3 decision trees: supports only discrete features; superseded by C4.5 and CART (which handle continuous features).

2. Algorithms with narrow application scenarios
SOM (Self-Organizing Map): an unsupervised neural network used mainly for visualization; t-SNE and UMAP are more common today.
Traditional Bayesian network structure learning: computationally complex and dependent on domain knowledge; in practice, probabilistic graphical model toolkits (such as PyMC3) are used instead.
K-Medoids: similar to K-Means but with a higher computational cost; learn K-Means first.

Second priority: drop selectively, depending on your field

3. Algorithms with strong replacements in specific domains
LDA (Linear Discriminant Analysis): for dimensionality reduction and classification, PCA is more general, unless you need class separation to guide the reduction.
Traditional genetic algorithms: for optimization problems, gradient descent or Bayesian optimization is more efficient, unless you study evolutionary computation.
HMM (Hidden Markov Model): replaced by RNNs/Transformers in speech recognition, but still used in bioinformatics.
Plain Markov chains: for simple sequence modeling, RNNs/LSTMs are more powerful.

4. Complex and rarely used algorithms
SMO (the original SVM training algorithm): in practice you call a library (e.g., sklearn.svm.SVC) rather than implement it by hand.
Traditional kernel methods (e.g., custom kernel functions): the importance of the kernel trick has declined since the rise of deep learning.

Core algorithms to keep

5. Fundamental and widely used algorithms
Logistic regression: the foundation of classification tasks, highly interpretable.
Linear SVM (LinearSVC): efficient for high-dimensional classification, common in industry.
Decision trees (CART): the basis of random forests and GBDT.
KNN: the representative lazy learner, suited to real-time scenarios.
Random forests & gradient boosting (XGBoost/LightGBM): the mainstream models for structured data.
K-Means & PCA: the basic tools of unsupervised learning.
Neural network basics (MLP, CNN, RNN): the core of deep learning.

6. Key algorithms for specific scenarios
Naive Bayes: efficient for text classification and small-data settings.
Collaborative filtering: the classic approach in recommender systems.
ARIMA: time-series analysis (keep it if you work in that area).

Summary of recommendations
Drop: single-layer perceptron, Apriori, LVQ, ID3, SOM.
Drop selectively: LDA, traditional genetic algorithms, HMM (unless your domain needs it), traditional kernel methods.
Keep the core: logistic regression, SVM, decision trees, ensembles (RF/GBDT), KNN, PCA, neural networks.
Practice first: apply algorithms quickly via libraries (such as scikit-learn), then dig into the theory.

Note: even for the algorithms you drop, you should still understand their basic ideas (e.g., "association rule mining" or "evolutionary computation") so you can catch up quickly when needed.

SIP debugging tools

Posted on 2025-03-17

sipp
wireshark

Essential tools for a fresh Linux install

Posted on 2025-03-08

Applications:
sudo apt install conky guake terminator
Environment:
VS Code, WeChat, WPS

Adding DeepSeek integration to a system

Posted on 2025-02-21

Last time we covered deploying with ollama; now let's wire DeepSeek into a system by calling its HTTP API.

First, start the server: ollama serve

Then make the call:

```bash
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-r1:1.5b","prompt":"天空为什么是蓝色的?"}'
```
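
The same endpoint can be called from application code. A minimal sketch using Python's requests library; it assumes deepseek-r1:1.5b has already been pulled, and it disables streaming so the whole answer comes back as a single JSON object:

```python
# Minimal sketch: call the local Ollama generate API with streaming disabled,
# so the complete answer is returned in one JSON response.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:1.5b",
        "prompt": "Why is the sky blue?",
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```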

Notes on installing boost and using log4cpp

Posted on 2025-01-14

Install boost with:
sudo apt install libboost-all-dev

Install log4cpp:
sudo apt install liblog4cpp5-dev

Install docker:
sudo apt install docker.io

Upload my_mongo.tar, then:
docker load < my_mongo.tar

Requirements and process for running DeepSeek with ollama (to be continued)

Posted on 2025-01-14

First, the disk space requirements. The Qwen variants are Qwen models distilled with DeepSeek, and the Llama variants are Llama models distilled with DeepSeek.

Model install commands

1.5B Qwen DeepSeek R1

Requires roughly 1.1 GB of disk space.

ollama run deepseek-r1:1.5b

7B Qwen DeepSeek R1

Requires roughly 4.7 GB.

ollama run deepseek-r1:7b

8B Llama DeepSeek R1

Requires roughly 4.9 GB.

ollama run deepseek-r1:8b

14B Qwen DeepSeek R1

Requires roughly 9 GB.

ollama run deepseek-r1:14b

32B Qwen DeepSeek R1

Requires roughly 20 GB.

ollama run deepseek-r1:32b

70B Llama DeepSeek R1

Requires roughly 43 GB.

ollama run deepseek-r1:70b

671B DeepSeek R1 (the full model rather than a distill)

Requires roughly 404 GB.

ollama run deepseek-r1:671b

Environment variable on Windows to listen on all interfaces:

Set OLLAMA_HOST to 0.0.0.0

Start command:

ollama serve

You can then connect to it with Chatbox.

After downloading ollama and a local model file, import the local model like this:
Create a modelfile.mf file containing:
FROM ./xxxx.gguf
Then run ollama create deepseek-r1_local:7b -f modelfile.mf
The newly imported model will show up in ollama ls, and
ollama run <model name> runs it.
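
If you are scripting against ollama rather than using the CLI, the same check can be done over its HTTP API. A small sketch, assuming the default port and that ollama serve is running:

```python
# Small sketch: list locally available models over the Ollama HTTP API,
# the programmatic equivalent of `ollama ls`.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=30)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])
```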
