Quantization and Pruning APIs#
Create Pruning Strategies API#
Endpoint: /optimization/prune
Method: POST
Description: The Create Pruning Strategies API allows users to create pruning strategies for model optimization.
Parameters:
model_name (string, query, required): Model name
inputs (string, query, required): Shapes of the input tensors (comma-separated)
outputs (string, query, required): Shapes of the output tensors (comma-separated)
strategy (string, query, required): Pruning strategy
compression (integer, query): Compression factor (Default value: 1)
X-Fields (string, header): An optional fields mask
Responses:
200 OK: Success
List All Pruning Strategies API#
Endpoint: /optimization/prune/strategies
Method: GET
Description: The List All Pruning Strategies API retrieves a list of all pruning strategies.
Responses:
200 OK: Success
Create Quantization Strategies API#
Endpoint: /optimization/quantize
Method: POST
Description: The Create Quantization Strategies API allows users to create quantization strategies for model optimization.
Parameters:
model_name (string, query, required): Model name
strategy (string, query, required): Quantization strategy
quant_type (string, query): Quantization type
precision (integer, query, required): Quantization precision
leaf_node (boolean, query): Quantize leaf nodes only (Default value: false)
X-Fields (string, header): An optional fields mask
Responses:
200 OK: Success
List All Quantization Strategies API#
Endpoint: /optimization/quantize/strategies
Method: GET
Description: The List All Quantization Strategies API retrieves a list of all quantization strategies.
Responses:
200 OK: Success