llmcompressor.entrypoints.oneshot
Oneshot
Class responsible for carrying out one-shot calibration on a pretrained model.
This class handles the entire lifecycle of one-shot calibration, including preprocessing (model and tokenizer/processor initialization), model optimization (quantization or sparsification), and postprocessing (saving outputs). The instructions for model optimization can be specified by using a recipe.
Input Keyword Arguments:

`kwargs` are parsed into:

- `model_args`: Arguments for loading and configuring a pretrained model (e.g., `AutoModelForCausalLM`).
- `dataset_args`: Arguments for dataset-related configurations, such as calibration dataloaders.
- `recipe_args`: Arguments for defining and configuring recipes that specify optimization actions.

Parsers are defined in `src/llmcompressor/args/`.
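To make the argument split concrete, here is a minimal sketch of routing a flat `**kwargs` dict into grouped argument containers. The dataclasses and `parse_kwargs` helper below are illustrative stand-ins, not the real parsers in `src/llmcompressor/args/`.

```python
# Hypothetical sketch: split one flat **kwargs dict into grouped argument
# containers, mirroring the model_args/dataset_args/recipe_args split above.
from dataclasses import dataclass, fields


@dataclass
class ModelArguments:
    model: str = None
    precision: str = "auto"


@dataclass
class DatasetArguments:
    dataset: str = None
    num_calibration_samples: int = 512


@dataclass
class RecipeArguments:
    recipe: str = None


def parse_kwargs(**kwargs):
    """Route each keyword to the argument group that declares it."""
    groups = []
    for cls in (ModelArguments, DatasetArguments, RecipeArguments):
        names = {f.name for f in fields(cls)}
        groups.append(cls(**{k: v for k, v in kwargs.items() if k in names}))
    return tuple(groups)


model_args, dataset_args, recipe_args = parse_kwargs(
    model="my/model", dataset="my_dataset", recipe="recipe.yaml"
)
```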
Lifecycle Overview:

The oneshot calibration lifecycle consists of three steps:

- Preprocessing:
    - Instantiates a pretrained model and tokenizer/processor.
    - Ensures input and output embedding layers are untied if they share tensors.
    - Patches the model to include additional functionality for saving with quantization configurations.
- Oneshot Calibration:
    - Optimizes the model using a global `CompressionSession` and applies recipe-defined modifiers (e.g., `GPTQModifier`, `SparseGPTModifier`).
- Postprocessing:
    - Saves the model, tokenizer/processor, and configuration to the specified `output_dir`.
Usage:

Methods:

- `__init__(**kwargs)`: Initializes the `Oneshot` object by parsing input arguments, performing preprocessing, and setting instance attributes.
- `__call__(**kwargs)`: Performs the one-shot calibration process by preparing a calibration dataloader, applying recipe modifiers to the model, and executing postprocessing steps.
- `save()`: Saves the calibrated model and tokenizer/processor to the specified `output_dir`. Supports saving in compressed formats based on model arguments.
- `apply_recipe_modifiers(calibration_dataloader, **kwargs)`: Applies lifecycle actions (e.g., `initialize`, `finalize`) using modifiers defined in the recipe. Each action is executed via the global `CompressionSession`.
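The initialize/calibrate/finalize lifecycle that `apply_recipe_modifiers` runs through the global session can be sketched with a toy session object. All names below (`ToySession`, `ScaleModifier`) are hypothetical stand-ins, not llmcompressor APIs.

```python
# Toy sketch of the initialize -> calibrate -> finalize lifecycle that
# recipe modifiers go through during a oneshot run.
class ScaleModifier:
    """Halves the model's 'weight' once per calibration batch."""

    def initialize(self, model):
        self.seen = 0

    def calibrate(self, model, batch):
        self.seen += 1
        model["weight"] *= 0.5

    def finalize(self, model):
        model["finalized"] = True


class ToySession:
    def __init__(self, modifiers):
        self.modifiers = modifiers

    def run(self, model, dataloader):
        for m in self.modifiers:          # lifecycle action: initialize
            m.initialize(model)
        for batch in dataloader:          # one-shot calibration pass
            for m in self.modifiers:
                m.calibrate(model, batch)
        for m in self.modifiers:          # lifecycle action: finalize
            m.finalize(model)
        return model


model = {"weight": 8.0}
session = ToySession([ScaleModifier()])
session.run(model, dataloader=[1, 2, 3])  # three calibration batches
```

After three batches the toy weight has been halved three times, and finalization has stamped the model; the real session orchestrates modifiers such as `GPTQModifier` in the same initialize/apply/finalize order.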
Source code in llmcompressor/entrypoints/oneshot.py
__call__()
Performs one-shot calibration.
This method prepares a calibration dataloader using dataset arguments and applies recipe-based modifiers to optimize the model. The lifecycle actions are executed sequentially, and the modified model is saved during postprocessing.
Source code in llmcompressor/entrypoints/oneshot.py
__init__(log_dir='sparse_logs', **kwargs)

Initializes the `Oneshot` class with the provided arguments.

Parses the input keyword arguments into `model_args`, `dataset_args`, and `recipe_args`. Performs preprocessing to initialize the model and tokenizer/processor.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_args` | | `ModelArguments` parameters, responsible for controlling model loading and saving logic | required |
| `dataset_args` | | `DatasetArguments` parameters, responsible for controlling dataset loading, preprocessing, and dataloader loading | required |
| `recipe_args` | | `RecipeArguments` parameters, responsible for containing recipe-related parameters | required |
| `output_dir` | | Path to save the output model after carrying out oneshot | required |
| `log_dir` | `Optional[str]` | Path to save logs during the oneshot run. Nothing is logged to file if None. | `'sparse_logs'` |
Source code in llmcompressor/entrypoints/oneshot.py
apply_recipe_modifiers(calibration_dataloader, recipe_stage=None)

Applies recipe modifiers to the model during the lifecycle.

The modifiers are defined in the recipe and executed via lifecycle actions (`initialize`, `finalize`) through the global `CompressionSession`.
Source code in llmcompressor/entrypoints/oneshot.py
```
oneshot(model, distill_teacher=None, config_name=None, tokenizer=None, processor=None, cache_dir=None, use_auth_token=False, precision='auto', tie_word_embeddings=False, trust_remote_code_model=False, save_compressed=True, model_revision='main', recipe=None, recipe_args=None, clear_sparse_session=False, stage=None, dataset=None, dataset_config_name=None, dataset_path=None, num_calibration_samples=512, shuffle_calibration_samples=True, max_seq_length=384, pad_to_max_length=True, text_column='text', concatenate_data=False, streaming=False, overwrite_cache=False, preprocessing_num_workers=None, min_tokens_per_module=None, calibrate_moe_context=False, output_dir=None, log_dir='sparse_logs', **kwargs)
```
Performs oneshot calibration on a model.
Parameters:

Model arguments:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Union[str, PreTrainedModel]` | A pretrained model identifier from huggingface.co/models or a path to a local model. Required parameter. | required |
| `distill_teacher` | `Optional[str]` | Teacher model (a trained text generation model) for distillation. | `None` |
| `config_name` | `Optional[str]` | Pretrained config name or path if not the same as model_name. | `None` |
| `tokenizer` | `Optional[Union[str, PreTrainedTokenizerBase]]` | Pretrained tokenizer name or path if not the same as model_name. | `None` |
| `processor` | `Optional[Union[str, ProcessorMixin]]` | Pretrained processor name or path if not the same as model_name. | `None` |
| `cache_dir` | `Optional[str]` | Where to store the pretrained data from huggingface.co. | `None` |
| `use_auth_token` | `bool` | Whether to use Hugging Face auth token for private models. | `False` |
| `precision` | `str` | Precision to cast model weights to; defaults to auto. | `'auto'` |
| `tie_word_embeddings` | `bool` | Whether the model's input and output word embeddings should be tied. | `False` |
| `trust_remote_code_model` | `bool` | Whether to allow custom models to execute their own modeling files. | `False` |
| `save_compressed` | `bool` | Whether to compress sparse models during save. | `True` |
| `model_revision` | `str` | The specific model version to use (can be a branch name, tag, or commit id). | `'main'` |

Recipe arguments:

| Name | Type | Description | Default |
|---|---|---|---|
| `recipe` | `Optional[Union[str, List[str]]]` | Path to a LLM Compressor sparsification recipe. | `None` |
| `recipe_args` | `Optional[List[str]]` | List of recipe arguments to evaluate, in the format "key1=value1", "key2=value2". | `None` |
| `clear_sparse_session` | `bool` | Whether to clear CompressionSession/CompressionLifecycle data between runs. | `False` |
| `stage` | `Optional[str]` | The stage of the recipe to use for oneshot. | `None` |

Dataset arguments:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `Optional[Union[str, Dataset, DatasetDict]]` | The name of the dataset to use (via the datasets library). | `None` |
| `dataset_config_name` | `Optional[str]` | The configuration name of the dataset to use. | `None` |
| `dataset_path` | `Optional[str]` | Path to a custom dataset. Supports json, csv, dvc. | `None` |
| `num_calibration_samples` | `int` | Number of samples to use for one-shot calibration. | `512` |
| `shuffle_calibration_samples` | `bool` | Whether to shuffle the dataset before calibration. | `True` |
| `max_seq_length` | `int` | Maximum total input sequence length after tokenization. | `384` |
| `pad_to_max_length` | `bool` | Whether to pad all samples to `max_seq_length`. | `True` |
| `text_column` | `str` | Key to use as the text input column in the dataset. | `'text'` |
| `concatenate_data` | `bool` | Whether to concatenate datapoints to fill max_seq_length. | `False` |
| `streaming` | `bool` | True to stream data from a cloud dataset. | `False` |
| `overwrite_cache` | `bool` | Whether to overwrite the cached preprocessed datasets. | `False` |
| `preprocessing_num_workers` | `Optional[int]` | Number of processes for preprocessing. | `None` |
| `min_tokens_per_module` | `Optional[float]` | Minimum percentage of tokens per module, relevant for MoE models. | `None` |

Miscellaneous arguments:

| Name | Type | Description | Default |
|---|---|---|---|
| `output_dir` | `Optional[str]` | Path to save the output model after calibration. Nothing is saved if None. | `None` |
| `log_dir` | `Optional[str]` | Path to save logs during the oneshot run. Nothing is logged to file if None. | `'sparse_logs'` |
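The `recipe_args` overrides described above are plain `"key=value"` strings; a minimal parser for that format might look like the following (a hypothetical helper, not the library's own):

```python
def parse_recipe_args(pairs):
    """Split 'key=value' strings into a dict of recipe overrides."""
    overrides = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected 'key=value', got {pair!r}")
        overrides[key.strip()] = value.strip()
    return overrides


overrides = parse_recipe_args(["sparsity=0.5", "targets=Linear"])
```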
Returns:

| Type | Description |
|---|---|
| `PreTrainedModel` | The calibrated `PreTrainedModel` |
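A typical invocation of this entrypoint, sketched under the assumption that llmcompressor is installed and that the example model and dataset identifiers are reachable from your environment (swap in your own):

```python
# Usage sketch for the oneshot() entrypoint. The model and dataset names are
# examples only; calling this function downloads weights and calibration data.
def run_oneshot_example():
    from llmcompressor import oneshot
    from llmcompressor.modifiers.quantization import GPTQModifier

    return oneshot(
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        dataset="open_platypus",
        recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
        max_seq_length=384,
        num_calibration_samples=512,
        output_dir="./tinyllama-w4a16-oneshot",
    )
```

Passing a modifier object directly as the `recipe` is an alternative to a YAML recipe path; either form is applied through the same `CompressionSession` lifecycle.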
Source code in llmcompressor/entrypoints/oneshot.py