baichuan-7B is an open-source, large-scale pre-trained language model developed by Baichuan Intelligent Technology. Built on the Transformer architecture, it has 7 billion parameters and was trained on approximately 1.2 trillion tokens. It supports both Chinese and English and has a context window of 4096 tokens. It achieves the best performance among models of its size on the standard authoritative Chinese and English benchmarks (C-EVAL and MMLU).
If you wish to use baichuan-7B (for inference, finetuning, etc.), we recommend using the accompanying code library [baichuan-7B](https://github.com/baichuan-inc/baichuan-7B).
The following example performs 1-shot inference with baichuan-7B. Given the title of a work, the model outputs its author. The correct output is "One Hundred Years of Solitude->Gabriel Garcia Marquez".
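To make the 1-shot format concrete, here is a minimal, model-free sketch of how such a prompt might be assembled and how the answer might be read back from the generated text. The demonstration pair `Hamlet->Shakespeare`, the helper names `build_one_shot_prompt` and `extract_answer`, and the mock generation string are hypothetical illustrations and are not part of the original example. The only thing taken from the source is the `work->author` format of the expected output.

```python
def build_one_shot_prompt(demo: tuple, query: str, sep: str = "->") -> str:
    """Return one solved demonstration followed by the unsolved query.

    The "->" separator follows the "work->author" format of the expected
    output; the demonstration pair itself is supplied by the caller.
    """
    work, author = demo
    return f"{work}{sep}{author}\n{query}{sep}"


def extract_answer(generated: str, prompt: str) -> str:
    """Return the first line of the model's continuation after the prompt."""
    # The decoded output of a decoder-only model typically begins with the prompt itself.
    continuation = generated[len(prompt):] if generated.startswith(prompt) else generated
    return continuation.split("\n", 1)[0].strip()


# Hypothetical demonstration pair, chosen only for illustration.
prompt = build_one_shot_prompt(("Hamlet", "Shakespeare"), "One Hundred Years of Solitude")
print(repr(prompt))  # 'Hamlet->Shakespeare\nOne Hundred Years of Solitude->'

# Mock generation standing in for real model output (hypothetical).
mock_generated = prompt + "Gabriel Garcia Marquez\nThe Old Man and the Sea->"
print(extract_answer(mock_generated, prompt))  # Gabriel Garcia Marquez
```

Cutting the continuation at the first newline matters because a base model will often keep producing further `work->author` lines after answering.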
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
```
The overall model follows the standard Transformer architecture and adopts the same model design as LLaMA:
- Position Embedding: We use rotary embedding (RoPE), the position-encoding scheme adopted by most current models, which has excellent extrapolation capabilities.
- Feedforward Layer: We use SwiGLU. The feedforward hidden size is approximately (8/3) times the hidden size (4096 × 8/3 ≈ 10923), giving an intermediate size of 11008.
- Layer Normalization: Pre-Normalization based on [RMSNorm](https://arxiv.org/abs/1910.07467).
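The three components listed above can be sketched in NumPy as follows. This is an illustrative reimplementation, not the baichuan-7B code. The rounding rule in `ffn_hidden_size` (round (8/3)·d_model up to a multiple of 256) is the LLaMA convention and is assumed here because it reproduces 11008. `apply_rope` uses the interleaved-pair formulation from the RoPE paper; some implementations instead pair dimension `i` with `i + head_dim/2`. The placeholder attention in the demo stands in for real multi-head attention, and all function names are hypothetical.

```python
import numpy as np


def ffn_hidden_size(d_model: int, multiple_of: int = 256) -> int:
    """(8/3) * d_model, rounded up to a multiple of `multiple_of` (LLaMA-style rounding, assumed)."""
    hidden = int(2 * (4 * d_model) / 3)
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm: divide by the root mean square of the last axis, then scale (no mean, no bias)."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feedforward: (SiLU(x @ W_gate) * (x @ W_up)) @ W_down."""
    gate = x @ w_gate
    silu = gate / (1.0 + np.exp(-gate))  # SiLU(z) = z * sigmoid(z)
    return (silu * (x @ w_up)) @ w_down


def apply_rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate each (even, odd) feature pair of x[pos] by angle pos * base**(-2i/head_dim).

    x has shape (seq_len, head_dim) with head_dim even. In LLaMA-style models this is
    applied per head to queries and keys (head_dim = 4096 / 32 = 128 here).
    """
    seq_len, head_dim = x.shape
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)  # (head_dim // 2,)
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]    # (seq_len, head_dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


def pre_norm_block(x, attn, ffn, norm1_w, norm2_w):
    """Pre-normalization: normalize the input of each sub-layer, then add the residual."""
    x = x + attn(rms_norm(x, norm1_w))
    return x + ffn(rms_norm(x, norm2_w))


print(ffn_hidden_size(4096))  # 11008

# Toy sizes for a quick check; the real model uses d_model = 4096.
rng = np.random.default_rng(0)
d, h = 8, ffn_hidden_size(8, multiple_of=4)
w_gate, w_up, w_down = rng.standard_normal((d, h)), rng.standard_normal((d, h)), rng.standard_normal((h, d))
x = rng.standard_normal((5, d))
q = apply_rope(x)
out = pre_norm_block(
    x,
    attn=lambda z: 0.0 * z,  # placeholder: attention is omitted in this sketch
    ffn=lambda z: swiglu_ffn(z, w_gate, w_up, w_down),
    norm1_w=np.ones(d),
    norm2_w=np.ones(d),
)
```

Because RoPE is a pure rotation, it leaves vector norms unchanged and acts as the identity at position 0. Pre-normalization keeps an un-normalized residual path through every layer.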
The specific parameters are as follows:
| Hyperparameter | Value |
|----------------|-------|
|n_parameters | 7000559616 |
|n_layers | 32 |
| n_heads | 32 |
| d_model | 4096 |
| vocab size | 64000 |
| sequence length | 4096 |
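The `n_parameters` value can be reproduced from the other hyperparameters. The sketch below assumes a LLaMA-style layout:

- an untied input embedding and output head
- bias-free attention projections (Q, K, V, O)
- bias-free SwiGLU projections (gate, up, down) with an intermediate size of 11008
- two RMSNorm weight vectors per layer, plus one final RMSNorm

Under these assumptions the total matches the table exactly. `n_heads` does not appear because the heads only partition `d_model`. The function name and the `tied_embeddings` flag are hypothetical.

```python
def llama_style_param_count(vocab_size, d_model, n_layers, d_ffn, tied_embeddings=False):
    """Count weights of a bias-free, LLaMA-style decoder (assumed layout)."""
    embed = vocab_size * d_model
    lm_head = 0 if tied_embeddings else vocab_size * d_model
    attn = 4 * d_model * d_model  # W_q, W_k, W_v, W_o
    mlp = 3 * d_model * d_ffn     # gate, up and down projections of SwiGLU
    norms = 2 * d_model           # RMSNorm weights before attention and before the FFN
    final_norm = d_model
    return embed + n_layers * (attn + mlp + norms) + final_norm + lm_head


total = llama_style_param_count(vocab_size=64000, d_model=4096, n_layers=32, d_ffn=11008)
print(total)  # 7000559616
```

Most of the budget sits in the 32 transformer layers (about 6.48B). The two 64000 × 4096 embedding matrices add about 0.52B.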
## Uses
### Downstream Use
We have also open-sourced the training code that accompanies this model, which enables efficient finetuning on downstream tasks. For more details, please refer to [baichuan-7B](https://github.com/baichuan-inc/baichuan-7B).
### Out-of-Scope Use
Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.
## Bias, Risks, and Limitations
baichuan-7B can produce factually incorrect output and should not be relied on to produce factually accurate information. It was trained on various public datasets. Although great effort has gone into cleaning the pretraining data, the model may still generate lewd, biased, or otherwise offensive output.
In addition to Chinese benchmarks, we also evaluated the model's performance on English benchmarks.
#### MMLU
[MMLU](https://arxiv.org/abs/2009.03300) is an English evaluation dataset of 57 multiple-choice tasks covering elementary mathematics, US history, computer science, law, and more. Its difficulty ranges from elementary level to advanced professional level, and it is one of the mainstream benchmarks for evaluating LLMs.
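Our understanding of few-shot MMLU evaluation in the hendrycks/test harness is as follows. Each prompt opens with a subject header, followed by k solved examples from that subject's dev split, and ends with the unanswered test question. The prediction is the choice letter (A to D) that the model assigns the highest next-token probability. The sketch below is modeled on that format but is not the harness's code: the function names are hypothetical, whitespace details may differ, and the toy questions are made up rather than taken from MMLU.

```python
from typing import Optional

CHOICES = ["A", "B", "C", "D"]


def format_example(question: str, options: list, answer: Optional[str] = None) -> str:
    """Render one question; append the answer letter only for demonstration shots."""
    text = question
    for label, option in zip(CHOICES, options):
        text += f"\n{label}. {option}"
    text += "\nAnswer:"
    if answer is not None:
        text += f" {answer}\n\n"
    return text


def build_few_shot_prompt(subject, dev_examples, question, options, k=5):
    """Subject header + first k solved dev examples + the unanswered test question."""
    header = ("The following are multiple choice questions (with answers) about "
              f"{subject.replace('_', ' ')}.\n\n")
    shots = "".join(format_example(q, opts, ans) for q, opts, ans in dev_examples[:k])
    return header + shots + format_example(question, options)


def pick_answer(letter_scores: dict) -> str:
    """Predict the choice letter that the model scores highest as the next token."""
    return max(CHOICES, key=lambda c: letter_scores[c])


# Toy, made-up items (not MMLU data); a real 5-shot run would pass 5 dev examples.
dev = [("What is 2 + 2?", ["3", "4", "5", "6"], "B")]
prompt = build_few_shot_prompt("elementary_mathematics", dev, "What is 3 + 3?", ["5", "6", "7", "8"])
print(prompt)
# Hypothetical log-probabilities standing in for real model scores.
print(pick_answer({"A": -2.0, "B": -0.1, "C": -3.0, "D": -1.5}))  # B
```

Restricting the prediction to the four letter tokens avoids having to parse free-form generations.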
We adopted the [open-source](https://github.com/hendrycks/test) evaluation scheme, and the final 5-shot results are as follows:
| Model | Humanities | Social Sciences | STEM | Other | Average |