llmcompressor.observers.min_max
MinMaxObserver
Bases: Observer
Implements a quantization observer that calculates the scale and zero point from the minimum and maximum values of the observed tensor. If averaging_constant is specified, the observed min and max (and therefore the scales) are updated using a moving average.
Source code in llmcompressor/observers/min_max.py
calculate_gparam(observed)
Generate a global scale using the observed min and max.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
observed | Tensor | observed tensor to calculate quantization parameters for | required |
Returns:
Type | Description |
---|---|
Tensor | updated global scale derived from the observed tensor |
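The exact formula for the global scale is not given on this page. As a rough sketch only, assuming an FP4-style scheme in which the global scale maps the largest observed magnitude onto the product of the FP8 E4M3 maximum (448) and the FP4 E2M1 maximum (6), the computation might look like the following. The helper name `sketch_global_scale` and the choice of constants are assumptions for illustration, not part of the llmcompressor API.

```python
FP8_E4M3_MAX = 448.0  # largest finite FP8 E4M3 value
FP4_E2M1_MAX = 6.0    # largest FP4 E2M1 value


def sketch_global_scale(values):
    """Hypothetical helper: derive one global scale from a flat list of floats.

    Assumption: global_scale = FP8_E4M3_MAX * FP4_E2M1_MAX / abs_max,
    where abs_max is the largest magnitude among the observed min and max.
    """
    # Extend the range to include zero, then take the largest magnitude.
    min_val = min(min(values), 0.0)
    max_val = max(max(values), 0.0)
    abs_max = max(abs(min_val), abs(max_val))
    if abs_max == 0.0:
        raise ValueError("an all-zero input has no finite global scale")
    return FP8_E4M3_MAX * FP4_E2M1_MAX / abs_max


global_scale = sketch_global_scale([-1.5, 0.25, 3.0])
print(global_scale)
```

Raising on an all-zero input is a choice made for this sketch; a real implementation may clamp instead.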
calculate_qparams(observed, reduce_dims=None, tensor_id=None, global_scale=None)
Generate a scale and zero-point using the observed min and max.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
observed | Tensor | observed tensor to calculate quantization parameters for | required |
reduce_dims | Optional[Tuple[int]] | optional tuple of dimensions to reduce along, returned scale and zero point will be shaped (1,) along the reduced dimensions | None |
tensor_id | Optional[Any] | Optional id if different ranges of observed tensors are passed, useful for sharding tensors by group_size | None |
global_scale | Optional[Tensor] | optional scale to further scale local quantization scales | None |
Returns:
Type | Description |
---|---|
Tuple[FloatTensor, IntTensor] | tuple of scale and zero point derived from the observed tensor |
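To make the min/max approach concrete, here is a minimal pure-Python sketch of the standard asymmetric formula for a signed integer range. It is not the library's implementation: the helper name `sketch_minmax_qparams`, the `num_bits` parameter, and the fallback for a zero range are assumptions made for illustration.

```python
def sketch_minmax_qparams(values, num_bits=8):
    """Hypothetical helper: asymmetric scale and zero point from min and max."""
    q_min = -(2 ** (num_bits - 1))
    q_max = 2 ** (num_bits - 1) - 1

    # Extend the observed range to include zero so that 0.0 maps to an integer.
    min_val = min(min(values), 0.0)
    max_val = max(max(values), 0.0)

    scale = (max_val - min_val) / (q_max - q_min)
    if scale == 0.0:
        scale = 1.0  # arbitrary fallback for an all-zero input (sketch only)

    # Choose the zero point so that min_val quantizes to q_min.
    zero_point = q_min - round(min_val / scale)
    zero_point = max(q_min, min(q_max, zero_point))
    return scale, zero_point


scale, zero_point = sketch_minmax_qparams([-1.0, 0.5, 3.0])
print(scale, zero_point)
```

With this choice of zero point, the observed minimum maps to the lowest integer level and the observed maximum to the highest.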
calculate_updated_min_max(observed, reduce_dims=None, tensor_id=None)
Updates the observed min and max using a moving average smoothed by the averaging_constant. Set the averaging_constant to 1.0 to disable averaging.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
observed | Tensor | observed tensor whose min and max are used to update the running values | required |
reduce_dims | Optional[Tuple[int]] | optional tuple of dimensions to reduce along, returned min and max will have size 1 along the reduced dimensions | None |
tensor_id | Optional[Any] | Optional id if different ranges of observed tensors are passed, useful for sharding tensors by group_size | None |
Returns:
Type | Description |
---|---|
(not specified) | updated min and max values |
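The moving-average update described above can be sketched in plain Python as follows. This is an illustration under assumptions, not the library's code: the helper name `sketch_update_min_max` and the convention of passing `None` for the first observation are hypothetical.

```python
def sketch_update_min_max(running, observed, averaging_constant):
    """Hypothetical helper: blend new min/max into the running min/max.

    running is a (min, max) tuple, or None before the first observation.
    """
    new_min, new_max = min(observed), max(observed)
    if running is None:
        # Nothing to average against yet, so take the observed values as-is.
        return new_min, new_max

    run_min, run_max = running
    # Move each running value toward the new one by the averaging constant.
    run_min = run_min + averaging_constant * (new_min - run_min)
    run_max = run_max + averaging_constant * (new_max - run_max)
    return run_min, run_max


state = sketch_update_min_max(None, [-1.0, 2.0], averaging_constant=0.5)
state = sketch_update_min_max(state, [-3.0, 4.0], averaging_constant=0.5)

# With averaging_constant=1.0 the running values are simply replaced.
latest = sketch_update_min_max(state, [-0.5, 0.5], averaging_constant=1.0)
print(state, latest)
```

The last call shows why a constant of 1.0 disables averaging: the update term cancels the previous running value entirely.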
get_qparams_along_dim(observed, dim, tensor_id=None, global_scale=None)
Calculate quantization parameters along the specified dimension.
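One plausible reading, stated here as an assumption, is that parameters are computed separately for each index along `dim` by reducing over every other dimension. The sketch below illustrates that idea with nested lists; the helper name `sketch_reduce_dims` is hypothetical and not part of the llmcompressor API.

```python
def sketch_reduce_dims(ndim, dim):
    """Hypothetical helper: dimensions to reduce so that only `dim` is kept."""
    keep = {dim} if isinstance(dim, int) else set(dim)
    return tuple(i for i in range(ndim) if i not in keep)


# A 2-D example: keeping dim 0 means reducing over dim 1,
# which gives one (min, max) pair per row.
rows = [[-1.0, 2.0, 0.5], [0.0, 4.0, -3.0]]
reduce_dims = sketch_reduce_dims(ndim=2, dim=0)
per_row_min_max = [(min(row), max(row)) for row in rows]
print(reduce_dims, per_row_min_max)
```

Each per-row pair could then be turned into its own scale and zero point, which is the usual meaning of per-channel quantization.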