Update README.md

Zeng Aohan 2023-03-13 17:19:13 +00:00 committed by huggingface-web
parent 7424d1fead
commit 0772a2f0c8
1 changed files with 7 additions and 3 deletions

@@ -56,10 +56,14 @@ print(history)
 model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
 ```
-Replace with
+Replace with (8-bit quantization):
 ```python
-model = AutoModel.from_pretrained("THUDM/chatglm-6b", device_map="auto", load_in_8bit=True, trust_remote_code=True)
+model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().quantize(8).cuda()
 ```
+or (4-bit quantization):
+```python
+model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().quantize(4).cuda()
+```
+After 2 to 3 rounds of dialogue, 8-bit quantization uses about 10GB of GPU memory, while 4-bit quantization needs only about 6GB. GPU memory consumption grows as the number of dialogue rounds increases.
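The memory figures above can be sanity-checked with simple arithmetic: weights stored at `b` bits take roughly `num_params * b / 8` bytes. The sketch below assumes ChatGLM-6B has about 6.2 billion parameters (an approximation, not taken from this diff) and counts weights only. Activations, the growing dialogue history, the CUDA context, and any layers the quantizer may leave in FP16 are ignored, which likely explains why the reported 10GB and 6GB are higher than these estimates.

```python
def weight_memory_gb(num_params: float, bits: int) -> float:
    """Approximate memory in GB (10^9 bytes) needed to store the weights alone."""
    return num_params * bits / 8 / 1e9


# Assumption: approximate parameter count of ChatGLM-6B.
NUM_PARAMS = 6.2e9

for bits in (16, 8, 4):
    # Weights only; runtime overhead is not included.
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions, 8-bit weights come to about 6.2 GB and 4-bit weights to about 3.1 GB. The remaining few GB in the reported figures would be runtime overhead.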
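The two loading options in the diff differ only in the argument passed to `.quantize()`. The sketch below picks that argument from the available GPU memory. `pick_quantization_bits` and `load_chatglm` are hypothetical helper names. The thresholds reuse the README's approximate figures (about 10GB for 8-bit and 6GB for 4-bit after 2 to 3 rounds), so longer conversations may need more headroom. `.quantize()` comes from the model's remote code, exactly as in the README snippet, and is not a general `transformers` API.

```python
def pick_quantization_bits(free_gb: float) -> int:
    """Choose 8- or 4-bit quantization from free GPU memory (hypothetical helper)."""
    # Thresholds follow the README's rough figures for 2-3 dialogue rounds.
    if free_gb >= 10:
        return 8
    if free_gb >= 6:
        return 4
    raise ValueError(
        f"{free_gb:.1f} GB of GPU memory is below the ~6 GB needed for 4-bit quantization"
    )


def load_chatglm(bits: int):
    """Load ChatGLM-6B with the chosen quantization (hypothetical helper)."""
    # Deferred import so pick_quantization_bits works without transformers installed.
    from transformers import AutoModel

    model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
    # Same chain as in the README: FP16, then quantize, then move to the GPU.
    return model.half().quantize(bits).cuda()
```

With PyTorch installed, one possible usage is `load_chatglm(pick_quantization_bits(torch.cuda.mem_get_info()[0] / 1e9))`. `torch.cuda.mem_get_info()` returns free and total device memory in bytes.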