Verification Guild: Forums |
|
| Author |
Message |
Newsletter Original Contribution

Joined: Dec 08, 2003 Posts: 1107
|
Posted: Sun Aug 04, 2002 11:00 pm Post subject: Expected performance of C/C++ BFMs through PLI |
|
|
(Originally from Issue 3.12, Item 5.0)
From: Kanwarpreet Grewal
I am creating a BFM in C/C++ that is supposed to work through a
verilog wrapper and supposed to work with a verilog simulator.
What are the kinds of speeds I should expect in terms of simulation
cycles per second?
I am getting a speed of 100s of cycles per second... Is this too slow?
What is expected of similar BFMs that use a lot of PLI calls? |
|
|
 |
|
Posted: Sun Sep 15, 2002 11:00 pm Post subject: Expected performance of C/C++ BFMs through PLI |
|
|
(Originally from Issue 3.13, Item 5.0)
From: Bernard Deadman
This reminds me of the famous question "how long is a piece of
string"? The problem here is we don't know if "100's of cycles per
second" relates to the whole model, or just to the BFM on its own
(how the heck do you run a BFM in isolation....)? What simulator is
he using? What platform is it on? What kind of protocol? Is it
simple like ARM APB (2 clock cycles per transaction), or does it
require 1,000's of cycles per transaction like serial Ethernet?
As many of your correspondents will know by now, we're working on
automatically generating bus interfaces for a variety of environments.
Performance is a critical and commercially sensitive issue for us, but
let me illustrate this with an example and "order of magnitude"
results.
First, what exactly is a BFM? We build transactors or transducers
that convert high-level representations of transactions to low level
signal activity. An example might be a data structure that contains a
number of parameters about a burst-mode transaction, and a pointer to
the actual data. It's our task to deliver this to the simulated bus
according to a protocol, or alternatively to monitor the bus and to
recognize and record transactions taking place. Do we have the same
concept of a "BFM"? I ask because adding functionality can
substantially change performance.
As an example, we run tests where we connect an ARM AHB master to an
ARM AHB slave, add decode and arbitration logic and a non-invasive AHB
monitor. To the nearest order of magnitude, this configuration takes
~100 seconds to move 1,000,000 pseudo randomly generated transactions
between master and slave on a ~1GHz Pentium III using SystemC compiled
with optimization under MS Visual C++. It simulates over 10,000,000
clock cycles when you allow for waits and split transactions. If we
do the same thing in pure Verilog or pure VHDL the performance is in
the same ballpark.
Our benchmark is therefore of the order of 10,000 transactions or
100,000 bus cycles per second through a combination of 3 (master,
slave & monitor) transactors, though remember we don't have much
overhead due to a model. As anyone familiar with AHB knows we are
decoding and routing signals for multiple masters and slaves under a
non-tri-state scheme, therefore, as a guide, we are handling 500-1,000
signals at the top level of the testbench.
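The arithmetic behind these order-of-magnitude figures can be checked
in a few lines. This is only a sketch that restates the numbers quoted
above (100 seconds, 1,000,000 transactions, 10,000,000 clock cycles);
the variable names are illustrative and nothing here is a new
measurement.

```python
# Figures quoted in the post, to the nearest order of magnitude.
wall_clock_seconds = 100
transactions = 1_000_000
clock_cycles = 10_000_000  # "over 10,000,000", including waits and splits

transactions_per_second = transactions / wall_clock_seconds
cycles_per_second = clock_cycles / wall_clock_seconds
cycles_per_transaction = clock_cycles / transactions

print(f"{transactions_per_second:,.0f} transactions/s")
print(f"{cycles_per_second:,.0f} bus cycles/s")
print(f"{cycles_per_transaction:.0f} cycles per transaction on average")
```

The average of about 10 bus cycles per transaction is what links the
two headline rates.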
If I try now to relate our experience to the PLI question, there are
two basic architectures you can adopt:
1) Create transactions in C/C++, and pass the whole transaction to
Verilog for simulation. This is efficient because there is only
one PLI call per transaction, *but* you can run into significant
problems passing blocks of data for burst mode transactions, and
data/address values wider than 32 bits through the calls. If you
solve the argument problems I would not expect to see the
performance drop by more than an order of magnitude in our
application.
2) Manipulate all of the signals within C/C++. This solves the
argument passing issues, but has serious performance implications
because you need to make a PLI call at least for every clock edge,
and sometimes for every signal change. If you have a complex bus
and factor in a significant amount of time to process a design
model (DUV if you prefer) it's easy to drop the speed to "100's of
cycles per second" or less.
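A rough way to compare the two architectures is to count how often
the simulation crosses the C/Verilog boundary. The sketch below is a
toy count, not a measurement: the function name `pli_calls` is
hypothetical, the figure of 10 cycles per transaction is borrowed from
the AHB benchmark above, and two clock edges per cycle is an
assumption. Architecture 2 is a lower bound, since it may also need a
call for every signal change.

```python
def pli_calls(transactions, cycles_per_transaction, edges_per_cycle=2):
    """Return (calls for architecture 1, minimum calls for architecture 2)."""
    # Architecture 1: the whole transaction is handed over in one call.
    per_transaction = transactions
    # Architecture 2: at least one call on every clock edge.
    per_clock_edge = transactions * cycles_per_transaction * edges_per_cycle
    return per_transaction, per_clock_edge


arch1, arch2 = pli_calls(transactions=1_000_000, cycles_per_transaction=10)
print(f"architecture 1: {arch1:,} calls")
print(f"architecture 2: at least {arch2:,} calls")
print(f"ratio: at least {arch2 // arch1}x more boundary crossings")
```

The ratio grows with the number of cycles per transaction, so a long
serial protocol would widen the gap far more than a short APB-style
transfer.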
I hope this gives some guidance to the possible length of a piece of
string. |
|
|
 |
|
Posted: Mon Oct 07, 2002 11:00 pm Post subject: Expected performance of C/C++ BFMs through PLI |
|
|
(Originally from Issue 3.14, Item 6.0)
From: Jeff Li
According to the numbers in
http://www.research.ibm.com/journal/rd/414/mullen.html#table1, your
performance is comparable to hardware accelerators and cycle-based
simulators if your model is that large. However, your model size is
probably comparable to Bernard's case.
Bernard's comparison of 2 basic architectures is well supported by
many people's experiences. One of them is
http://janick.bergeron.com/guild/3-09.html#Item_06, where the C code
takes 14% of CPU time though its architecture is not clear. Another
one is the example on page 12 of
http://www.synopsys.com/news/pubs/veritb/q302/verification_ave5.pdf,
where PLI takes more than 86% of CPU time. At least, these 2 examples
show how the efficiencies can vary.
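To get a feel for what those two percentages imply, one can apply
Amdahl's law: if a fraction f of CPU time is spent in the C/PLI code,
removing that cost entirely speeds the run up by at most 1/(1 - f).
The sketch below assumes, optimistically, that the replacement HDL
code costs nothing, so the results are upper bounds only; the function
name `max_speedup` is illustrative.

```python
def max_speedup(pli_fraction):
    """Upper bound on speedup if the PLI share of CPU time dropped to zero."""
    if not 0.0 <= pli_fraction < 1.0:
        raise ValueError("pli_fraction must be in [0, 1)")
    return 1.0 / (1.0 - pli_fraction)


# The two CPU-time shares quoted in the cited examples.
for fraction in (0.14, 0.86):
    print(f"PLI share {fraction:.0%}: at most {max_speedup(fraction):.2f}x faster")
```

With a 14% share there is little to gain, while an 86% share leaves
room for roughly a sevenfold improvement.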
Bernard's comparison of 2 basic architectures also applies to HVL
coding because HVLs are all PLI applications!!! It is surprising that
HVL experts do not tell people about this. The basic idea is "write
BFM in HDL only!".
It is said that Vera-VCS should be better because the connection
between them is better than the normal PLI. How much better? According
to a Synopsys internal experiment, the simulation speed of an example
from a Vera training class is 5 times faster when switching between
Bernard's 2 basic architectures!!!
So, when you code or use a BFM in an HVL, please check whether your
overnight simulation can be done at lunch time if the BFM is just in
HDL. Your simulator's profiling feature should provide some hints!
However, BFMs in HVLs are still very helpful if you have lots of
computers and simulator licenses. They are your best friends if you
enjoy the breaks while the simulator is getting to where you can start
debugging... It is even better if your boss is responsible for having
these breaks:) |
|