This repository was archived by the owner on Jun 3, 2025. It is now read-only.

Request V2 files from graphql #361

Merged
merged 9 commits into from
Sep 7, 2023
Conversation

horheynm
Contributor

@horheynm horheynm commented Aug 23, 2023

Using this branch on deepsparse:

(.venv) george@quad-mle-2:~/nm/deepsparse$ deepsparse.benchmark zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none
2023-09-05 14:56:20 deepsparse.benchmark.helpers INFO     Thread pinning to cores enabled
Downloading (…)A8/model.onnx.tar.gz: 100%|████████████████████████████████████████████████████| 1.01G/1.01G [01:33<00:00, 11.6MB/s]
2023-09-05 14:58:29 deepsparse.benchmark.benchmark_model INFO     Found model with KV cache support. Benchmarking the autoregressive model with input_ids_length: 1 and sequence length: 512.
2023-09-05 14:58:30 deepsparse.transformers.utils.helpers INFO     Overwriting in-place the input shapes of the transformer model at /home/george/.cache/sparsezoo/neuralmagic/opt-1.3b-opt_pretrain-pruned50_quantW8A8/model.onnx
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20230815 COMMUNITY | (134dba40) (release) (optimized) (system=avx2, binary=avx2)
[7f1214ded700 >WARN<  operator() ./src/include/wand/utility/warnings.hpp:14] Generating emulated code for quantized (INT8) operations since no VNNI instructions were detected. Set NM_FAST_VNNI_EMULATION=1 to increase performance at the expense of accuracy.
2023-09-05 15:00:33 deepsparse.benchmark.benchmark_model INFO     deepsparse.engine.Engine:
...
2023-09-05 15:00:33 deepsparse.utils.onnx INFO     Generating input 'causal_mask', type = int64, shape = [1, 1, 1, 512]
2023-09-05 15:00:33 deepsparse.benchmark.benchmark_model INFO     Starting 'singlestream' performance measurements for 10 seconds
Original Model Path: zoo:nlg/text_generation/opt-1.3b/pytorch/huggingface/opt_pretrain/pruned50_quantW8A8-none
Batch Size: 1
Sequence Length: 512
Input IDs Length: 1
Scenario: sync
Throughput (items/sec): 17.4194
Latency Mean (ms/batch): 57.3149
Latency Median (ms/batch): 57.0588
Latency Std (ms/batch): 1.6795
Iterations: 175
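
As a sanity check on the numbers above: in the sync (single-stream) scenario with batch size 1, throughput should come out to roughly 1000 divided by the mean latency in ms. A minimal sketch, using the figures reported in this benchmark run:

```python
# Sanity check: for a sync (single-stream) run at batch size 1,
# throughput (items/sec) ~= 1000 / mean latency (ms/batch).
mean_latency_ms = 57.3149      # Latency Mean from the output above
reported_throughput = 17.4194  # Throughput from the output above

derived_throughput = 1000.0 / mean_latency_ms
print(f"derived: {derived_throughput:.4f} items/sec")

# The two agree to well under 1%, as expected for single-stream timing.
assert abs(derived_throughput - reported_throughput) / reported_throughput < 0.01
```

The small gap between the derived and reported values is expected, since per-iteration overhead outside the measured forward pass slightly lowers the reported throughput.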

@horheynm horheynm marked this pull request as ready for review September 5, 2023 14:20
@horheynm horheynm merged commit 05e55e5 into main Sep 7, 2023
@horheynm horheynm deleted the request-v2-files branch September 7, 2023 20:36

3 participants