Sunday, March 27, 2011

SciMark2 and Delphi

Your reviewer was surprised by the results of the SciMark test ported to Delphi that Mr. Phillip Goh posted.

On a Core i7-2600K, the C version achieved:

**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:         1885.62
FFT             Mflops:  1426.20    (N=1024)
SOR             Mflops:  1146.66    (100 x 100)
MonteCarlo:     Mflops:   515.48
Sparse matmult  Mflops:  2096.31    (N=1000, nz=5000)
LU              Mflops:  4243.46    (M=100, N=100)

The same benchmark with the Delphi version:

**                                                               **
** SciMark2a Numeric Benchmark, see http://math.nist.gov/scimark **
**                                                               **
** Delphi Port, see http://code.google.com/p/scimark-delphi/     **
**                                                               **
Mininum running time = 2.00 seconds
Composite Score MFlops:   576.03
FFT             Mflops:   363.08    (N=1024)
SOR             Mflops:   935.87    (100 x 100)
MonteCarlo:     Mflops:   181.13
Sparse matmult  Mflops:   488.35    (N=1000, nz=5000)
LU              Mflops:   911.74    (M=100, N=100)

Readers are welcome to submit their own tests and draw their own conclusions.
The C++ version is about 3.3 times faster than the Delphi version (1885.62 vs. 576.03).

Readers can download the test at http://code.google.com/p/scimark-delphi and run it on their own computers.


There are deep implications beyond MFlops, FFT, LU and other metric tests.

Suppose you tell an organization to "get the latest hardware" to run your Delphi apps faster.

This benchmark was run on an Intel Core i7-2600K at 3.4 GHz (Turbo Boost to 3.7 GHz). That is quite expensive for consumers. Most corporations would probably buy a Core i3 or Core i5 for their staff. If the program is not optimized to take advantage of the latest hardware, then no matter how expensive the hardware you give it, you get very little performance gain.

Suppose you're the developer creating these wonderful apps that sell for money, but for some reason there's one part that takes incredibly long: generating reports, crunching numbers for a sales report, drawing that graph (licensed from TeeChart), or some computationally intensive operation (like posting to a proprietary Delphi database). It takes hours, or maybe a whole day, to generate the report.

Your competitor uses C# (or JavaScript or C++) and then makes a competing app, which seems to be 30 or maybe 50 times quicker than your app. What happens next?

You can foresee trouble that starts to brew.

Update 1:
The Delphi benchmark is from Mr. Phillip Goh's Google Code site. The download linked from that site comes with 3 EXE files. Judging from the binary, the C version was built with GCC.

51 comments:

Anonymous said...

Many years ago the keepers of Delphi made it plain they were not working very hard to maintain their compiler. There was a time it seemed so sweet that they could just go DotNet and not worry about the native side anymore. It also seemed quaint that they could neglect their C++ compiler yet remain relevant in the C++ arena. It crushed me when I realized I had to move to C++/Qt to get the performance I needed. The Delphi forums were always full of *excusers* who never held the Delphi compiler team to any high standard. "It's fast enough" is their motto. The fact is, their compilers simply do not receive enough attention. They have fallen behind in performance and they have insufficient desire to EVER compete performance-wise, because in their minds their resultant executables are "fast enough". I have cried the last tear and sighed the last sigh; that is just how things are.

Anonymous said...

Are there any tests done with C++Builder?

Ciprian Khlud said...

Whenever a benchmark appears, someone will ask: what about Java/C#/whatever? What about this or that optimization (I know, Delphi does not have that many)? Yet this post really makes me want to stay away, as I fail to see the advantages of using such an aging platform.

Anonymous said...

Okay, I tested it myself:

I know the configuration is not top of the line, but it
is common as a dev machine in my country
and it gives comparable results.

Pentium 4 CPU 3.2 2 GB RAM
WinXP SP3 x86

The tests were downloaded from
http://scimark-delphi.googlecode.com/files/Scimark-binary.7z
(binaries: Delphi XE, GCC and C#) and just run as they are :-)


The sources from
http://math.nist.gov/scimark2/scimark2_1c.zip
were compiled with (MSC, MSC .NET, BCC32 631, BCC32 551)
as release builds with /O2

Headers are stripped from the post to fit 4000 chars...

** **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov) **
** **


T:\Scimark-binary>scimark-c.exe
Using 2.00 seconds min time per kenel.
Composite Score: 461.99
FFT Mflops: 395.88 (N=1024)
SOR Mflops: 422.39 (100 x 100)
MonteCarlo: Mflops: 64.59
Sparse matmult Mflops: 660.48 (N=1000, nz=5000)
LU Mflops: 766.61 (M=100, N=100)

T:\Scimark-binary>Scimark-delphi.exe
Mininum running time = 2.00 seconds
Composite Score MFlops: 215.75
FFT Mflops: 104.92 (N=1024)
SOR Mflops: 419.45 (100 x 100)
MonteCarlo: Mflops: 49.09
Sparse matmult Mflops: 142.66 (N=1000, nz=5000)
LU Mflops: 362.64 (M=100, N=100)

T:\Scimark-binary>SciMark-csharp.exe
Mininum running time = 2 seconds

Composite Score: 293.12 MFlops
FFT : 286.24 - (1024)
SOR : 400.41 - (100x100)
Monte Carlo : 22.37
Sparse MatMult : 401.37 - (N=1000, nz=5000)
LU : 355.21 - (100x100)


T:\SCIMARK>sciamrk2_bcc_551_O2_686.exe
Using 2.00 seconds min time per kenel.
Composite Score: 211.73
FFT Mflops: 84.54 (N=1024)
SOR Mflops: 411.04 (100 x 100)
MonteCarlo: Mflops: 56.13
Sparse matmult Mflops: 139.80 (N=1000, nz=5000)
LU Mflops: 367.12 (M=100, N=100)

T:\SCIMARK>scimark_bcc32_631_O2_686.exe
Using 2.00 seconds min time per kenel.
Composite Score: 246.04
FFT Mflops: 106.51 (N=1024)
SOR Mflops: 422.39 (100 x 100)
MonteCarlo: Mflops: 55.42
Sparse matmult Mflops: 140.76 (N=1000, nz=5000)
LU Mflops: 505.12 (M=100, N=100)

T:\SCIM>scimark2_msc_16_O2.exe
Using 2.00 seconds min time per kenel.
Composite Score: 568.55
FFT Mflops: 432.60 (N=1024)
SOR Mflops: 622.80 (100 x 100)
MonteCarlo: Mflops: 66.58
Sparse matmult Mflops: 655.36 (N=1000, nz=5000)
LU Mflops: 1065.42 (M=100, N=100)

T:\SCIM>scimark2_msc_16_clrpure_O2_TP.exe
Using 2.00 seconds min time per kenel.
Composite Score: 486.91
FFT Mflops: 436.06 (N=1024)
SOR Mflops: 425.38 (100 x 100)
MonteCarlo: Mflops: 64.59
Sparse matmult Mflops: 625.94 (N=1000, nz=5000)
LU Mflops: 882.57 (M=100, N=100)


From fastest to slowest: (only Composite score)

MS C++ 16 - 568.55
MS C++ 16 CLR PURE .NET 4 - 486.91 NOTICE COMPILED AS CPP
GCC 4.5.1 - 461.99
C# .NET 4 - 293.12
BCC32 631 XE - 246.04
DELPHI XE - 215.75
BCC32 551 - 211.73

Wow, C++Builder XE reaches only about 84% of the C# score.
And a nice improvement from 2000 (BCC32 551) to 2011 (BCC32 631): a whole 16%.

Žilvinas Ledas said...

Keep in mind that this Delphi port is not a good port. It was ported with some "de-optimizing" modifications.

For example, comparing LU.c and LU.pas, we can clearly see that we can:
1. inline SwapRow();
2. use Inc() instead of "something := something + X";
3. translate this:
---
int ii;
for (ii=j+1; ii<M; ii++)
{
    double *Aii = A[ii];
    double *Aj = A[j];
    double AiiJ = Aii[j];
    int jj;
    for (jj=j+1; jj<N; jj++)
        Aii[jj] -= AiiJ * Aj[jj];
}
---
to this:
---
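for ii := j + 1 to M - 1 do begin   // upper bound matches C's "ii < M"
  Aii := A[ii];
  Aj := A[j];
  AiiJ := Aii[j];
  for jj := j + 1 to N - 1 do       // upper bound matches C's "jj < N"
    Aii[jj] -= AiiJ * Aj[jj];       // "-=" needs FPC's {$COPERATORS ON} (or -Sc)
end;
---
instead of the current, suboptimal translation.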

These modifications, together with the latest (trunk) Free Pascal Compiler, increased the LU score by ~25% on my computer.

Anonymous said...

These tests look like complete bullshit:

The original ANSI C version of the tests uses neither generics nor OOP features.

Just plain C code.

What are we comparing? Apples and boxes?

Or the "qualification" of the Delphi developer who did the port?

Also, the attached Delphi binaries are unoptimized. I recompiled them using D2009 in Release mode and got a +15% boost.

Also I've compared results with Java SE and C#.

Overall score:

C - 903
C# (.NET 4.0) - 678
Java SE6 - 589,7
Delphi 2009 - 472,11

(Core 2 Duo E8500 W7/x64 was used).

So even the Delphi port is quite fast as far as I'm concerned (but I'm sure it can still be optimized by rewriting the code).

Delphi Haters said...

Hi,
You missed the point.

If you would be so kind, post your update to a public Google Code or SourceForge project with both your *.pas files and the *.EXE. I'll re-run the test and post the results again.

The C source code is unoptimized. But it seems the Sandy-Bridge Core i7-2600K runs the code significantly faster.

The term "quite fast" is relative. Did you notice your Delphi score is the LOWEST, BELOW Java, C++ and C#?

You talk to component vendors. Do you know of ANY code from ANY component vendor which is optimized for Generics and better OOP features?

Frankly, they will not be interested in doing any optimizations because their "base" or profit center is Delphi 7, Delphi 2007, which is before generics came out.

Delphi Haters said...

See original score posted.

1885 vs 576.

OK, suppose you raise performance by 15%. So you get around 662. Maybe someone who's an expert can get another 10% bump (25% in total). That's 720.

So the C version gives you 1885, the Delphi version 720.

Put that in percentage. The performance is BELOW 50% of the C version.

I was hoping the Delphi version would at least get RESPECTABLE performance, like 1000+ or maybe 1500+ on a Core i7-2600K.

Oh wait, the C# and Java version with SSE2, SSE3 performance would beat Delphi performance.

That can mean two things. Either Delphi EXEs are fat, bloated and lazy.

Or maybe the people at Sun, Microsoft, and open-source developers who write GCC are making better progress than the Delphi compiler.

Who has shame now?
If you want to talk big, want to make bold claims, go ahead.

Go ahead and say Delphi is better. Good luck if you can get a well paying Delphi job.

Wait... there's plenty of PHP, Java and NET jobs which pay well. Wonder why?

PG said...

Hi,

@ Žilvinas Ledas
1) Shouldn't the compiler automatically inline any functions it thinks are inlinable? GCC turns on this optimization by default at -O3.

2) something := something + 1, is ugly but I was looking for ++something. Even so, shouldn't the compiler be smart enough to know that Inc(something) is equivalent to something := something + 1?

3) There are loads of C#/C++ constructs that I found difficult to port to Delphi. This is particularly visible in the for loops of the FFT class. It is common to increment the loop counter by more than one, or increment multiple variables at each iteration. I had to resort to using a while loop instead.

@Anonymous
The Delphi version hardly uses OOP, and generics should have a compile time cost but no runtime cost. This wasn't done to intentionally hobble Delphi.

The generated Delphi binary was compiled with optimizations, though it was built with Delphi XE Starter, so perhaps Embarcadero disabled some optimizations despite what their marketing literature claims?

At the end of the day, these are the facts about the Scimark Delphi port:
1) I am a C++/C# developer, and (unfortunately) will approach Delphi with that mindset. Things I take for granted in C++ may cost a fortune in Delphi.

2) If you think you can optimize it better, change the code and check it in. It's hosted on Google code for this very reason :)

PG said...

C - 903
C# (.NET 4.0) - 678
Java SE6 - 589,7
Delphi 2009 - 472,11


The Delphi version appears to be about 20% slower than Java SE6 (472.11 vs. 589.7). Looking at those results, it looks like you're comparing against the client VM. If you rerun the tests using the server VM (by passing -server as a command line option), the Java performance should be even closer to C.

Žilvinas Ledas said...

Sorry for the negative comment - I had a rough day yesterday so it reflected a bit in the comment too. And sorry for the accidental double post!

As I said - I used Free Pascal for my speed experiments so it can be a little behind with some optimizations (but looking at the general results - it seems that FPC is ~the same as Delphi).

I'm afraid it can be a little problematic for me to commit my changes back to the repository as, while experimenting, I have changed quite a bit: starting with formatting, ending with dynamic array removal for some parts (this did not increase speed, at least in the FFT test). And I do not have any Delphi version so my used syntax and other things can be a little incompatible and I can break your code unintentionally...

As I already said - the biggest speed increases were from "caching arrays" as it is shown in my first comment.
Other changes gave less than 5% increase.

Answer to an answer to my comments @Philip Goh:
1. Actually I do not know. To my mind - it should not :) Yes, in C# there is no "inline" keyword and when I desire something to be inlined I'm so frustrated that I can not do it explicitly, as it usually does not inline the things I want to be inlined :)
2. I think it should (but I don't know if there are some problems with it [e.g. when properties come into the mix]). BUT the important thing is that in the C code those parts were optimized by hand as well :) e.g. in FFT.c, line 132:
---
/*j = j - k ; */
j -= k;
---
so maybe even c compiler does not do that? ;)
3. Yes, some constructs are different in some languages, but the part I am talking about is straightforward translation :) At least in FPC the code I pasted in previous comment works perfectly.

P. S. I have not used Delphi for about 4 or 5 years now. I moved to Free Pascal (with Lazarus) and I am happy not having Delphi :) FPC+Lazarus accomplishes what was the main goal of Delphi (clean Object Pascal syntax, faster compilation than C, RAD) and I have the ability to use the SAME code cross-platform effortlessly, so why should I look at Delphi any more?

Žilvinas Ledas said...

@Michael Bunny,

yes, I agree with you ;) general purpose applications do not suffer from being JIT compiled.

Anonymous said...

The Next Generation Framework (at least the part of the GUI based on VGScene, if that is not just speculation) needs to be optimized for floating-point calculations (many of them cannot be done by the GPU, only by the CPU). Perhaps the Delphi/C++ compilers will be ready for something like that...

Anonymous said...

The Delphi compiler is not optimized for the latest ASM instructions, but

INC(var,x), var+=x and var = var + x are quite different in Delphi. INC(var,x) is the fastest, since it uses fewer registers than the other constructs. You need to look at the compiled asm code to see the difference.

MF said...

Can someone try compiling the same code with only FastMM enabled in Delphi?

PG said...

http://en.wikipedia.org/wiki/FastCode#FastMM4_Memory_Manager

Doesn't Delphi automatically use the FastMM4 memory manager?

Anonymous said...

I think the port is not correct. Looking at the source for the LU bench, you can see that in the C code new_Array2D_double() allocates the memory (outside the timer) and Array2D_double_copy copies the data; in the Delphi port, CopyArray2D (inside the timer) allocates AND copies. So the main difference in performance is probably here: you measure the memory allocation time in Delphi and not in C. Remember that a SetLength() in Delphi is a memory allocation. If you write the port like the C code, in plain old static Pascal, you will probably get the same performance as C.

Regards

PG said...

I agree that the copy array should not be allocating memory. I've changed it in the latest to make it more equivalent to the C# and C versions.

However, I didn't bother to change it in the original code. Know why? It isn't a hot spot, and wasn't registered on the profiler at all. I'm using the excellent Sampling Profiler which is free, so there is no reason to make random claims about performance when it comes to Delphi.

Making the change to only allocate memory for the array outside of the timer improved the Delphi execution performance of the LU code by roughly 3.3%.

In contrast, the C version is still over 3x faster than the Delphi version (219.75% faster, to be precise). I'm not posting concrete numbers (you can thank Embarcadero's license for that). But you can download the latest sources and run it yourself.

MF said...

Hi,

I have converted the whole C program to Delphi line by line. Now let's see the results:

http://rapidshare.com/files/456987841/scimark2delphi.rar


regards...

Delphi Haters said...

Mehmet Fide,

These are results again:

** **
** SciMark2a Numeric Benchmark, see http://math.nist.gov/scimark **
** **
** Delphi Port, see http://code.google.com/p/scimark-delphi/ **
** **
Mininum running time = 2.00 seconds
Composite Score MFlops: 753.13
FFT Mflops: 328.85 (N=1024)
SOR Mflops: 1069.64 (100 x 100)
MonteCarlo: Mflops: 167.88
Sparse matmult Mflops: 482.95 (N=1000, nz=5000)
LU Mflops: 1716.32 (M=100, N=100)

Good optimization, but nowhere near the C++ value of 1885.62...

PG said...

Original as published:
Mininum running time = 2.00 seconds
Composite Score MFlops: 412.46
FFT Mflops: 291.19 (N=1024)
SOR Mflops: 802.23 (100 x 100)
MonteCarlo: Mflops: 116.26
Sparse matmult Mflops: 363.79 (N=1000, nz=5000)
LU Mflops: 488.84 (M=100, N=100)

Mehmet's binary:
Mininum running time = 2.00 seconds
Composite Score MFlops: 517.35
FFT Mflops: 251.84 (N=1024)
SOR Mflops: 791.68 (100 x 100)
MonteCarlo: Mflops: 117.07
Sparse matmult Mflops: 359.10 (N=1000, nz=5000)
LU Mflops: 1067.08 (M=100, N=100)

A significant improvement in the LU performance, but a minor performance decrease in most of the other kernels.

Still nowhere near the C results or even C#. Good effort though, especially with regard to the LU code. I'll need to profile it to see where the differences are.

NOTE: You can download the other binaries from the Google code project page so you can run them yourself.

PG said...

If you need to write C style code (arrays of pointers and bit shifting) to get better performance in Delphi and yet trail behind C by a significant amount, perhaps it's a valid question to ask whether it's better to just write the code in C?

Additionally notice that with both the C and Delphi versions, we needed to implement our own 2D array handling code while in C# we could just rely on standard arrays. Guess who is going to be more productive? What if you also want 2D arrays of ints, floats, n types? Generics in Delphi have huge performance penalties as I discovered in earlier attempts at the scimark port, so this is not a solution. Given that the C# version still performs better than the C-style Delphi while providing better high level abstractions, is it any surprise which is the more popular language?

Like I've said in an earlier post. This started as an investigation into Delphi. I think I'm satisfied with my findings, and have moved on. While I have invested only some time and the cost of a starter edition, I can understand the pain and anguish some developers on this blog must feel.

Good luck guys, and hopefully one day Embarcadero will listen to your requests and bring Delphi up to scratch.

Anonymous said...

The only thing I can say now: I get double the FFT results in 64-bit Delphi (due to SSE2 for floating point).

Java and C# can optimize for 586 (and higher) and for 64-bit (thanks to the JIT), whereas 32-bit Delphi generates 386-level, 32-bit code.

Delphi Haters said...

So can you post a Delphi 64-bit EXE to prove it?

At end of day, the question will be:
- 64 bit, when?

develdevil said...

Just my 2 cents:

0) you should have used the "fastcode" library (http://fastcode.sourceforge.net/)
1) the "Delphi" code is by far not optimized
2) the compiler is not configured correctly (e.g. false")
3) the config of FastMM4 is "unknown" i.e. are you using the optimized code?
4) Using "GetTickCount" isn't ok. Use "QueryPerformanceCounter" or at least "GetTickCount64" instead
5) No OOP at all here.
6) Pure "mathematic" isn't the purpose of Delphi: consider to compare the languages using a "real case/common user" application
7) ...
Cheers ;)

MF said...

Pulsar Beta 8 outputs vs MS VS2010
XE2 64 bit exe outputs:
Code:
Mininum running time = 2,00 seconds
Composite Score MFlops: 684,72
FFT Mflops: 450,82 (N=1024)
SOR Mflops: 772,02 (100 x 100)
MonteCarlo: Mflops: 130,37
Sparse matmult Mflops: 763,82 (N=1000, nz=5000)
LU Mflops: 1306,54 (M=100, N=100)

XE2 32 bit exe outputs:
Code:
Mininum running time = 2,00 seconds
Composite Score MFlops: 524,05
FFT Mflops: 237,89 (N=1024)
SOR Mflops: 762,24 (100 x 100)
MonteCarlo: Mflops: 113,99
Sparse matmult Mflops: 488,35 (N=1000, nz=5000)
LU Mflops: 1017,77 (M=100, N=100)

and Visual Studio 2010 C 32 bit , fully optimized for speed with Streaming SIMD Extensions 2:
Code:
Composite Score: 896.67
FFT Mflops: 826.65 (N=1024)
SOR Mflops: 1034.05 (100 x 100)
MonteCarlo: Mflops: 157.63
Sparse matmult Mflops: 756.77 (N=1000, nz=5000)
LU Mflops: 1708.27 (M=100, N=100)

and Visual Studio 2010 C 64 bit , fully optimized for speed with Streaming SIMD Extensions 2:
Code:
Composite Score: 981.36
FFT Mflops: 881.82 (N=1024)
SOR Mflops: 1031.56 (100 x 100)
MonteCarlo: Mflops: 138.73
Sparse matmult Mflops: 1061.74 (N=1000, nz=5000)
LU Mflops: 1792.95 (M=100, N=100)

Delphi Haters said...

Hi,
can I please know your CPU type?

MF said...

Tested on i7 Q720, 4GB RAM and Win7 x64.

Maziar Navahan said...

Same as you wrote: Delphi 64 up to now (beta 8)

has half the speed of VC++ for 64-bit!

Maziar Navahan said...

The same with SciMark2 on beta 8 of Delphi XE2:

Delphi still has half the speed of VC++ for 64-bit in math functions!

snorkel said...

Who cares? These benchmarks are meaningless. I would take an app coded with Delphi over C# any day of the week.
The difference between C# and Delphi would not even be noticeable to a user!!!! These benchmarks are complete bull......

PG said...

The Scimark C# sources are available from http://code.google.com/p/scimark-csharp/

The port was originally done by the Rotor project, but since that project is now defunct and the license was open source compatible, I took the liberty of putting the code up on Google Code for posterity.

The Delphi version is available at http://code.google.com/p/scimark-delphi/
Be sure to check the latest revision (tagged v2.0) of the source code which contains Mehmet Fide's direct C translation which is the fastest version yet, but has the obvious downside of coding in the style of C. It's still slower than C#, and it's significantly slower than C.

If you think you can write a faster version, please do and either contribute back to the repository or send me a patch for it to be added to the repository. Before you do, I highly suggest profiling your code so that you do not make the mistake other Delphi developers have made in this thread, e.g. making wild claims that I should have used X instead of Y when Y clearly is *NOT* the bottleneck. So before you do that, please profile the code to get an understanding of where the hot spots are. A good sampling profiler that is free can be found at http://delphitools.info/samplingprofiler/

There's not much more for me to say that I haven't already said in this thread. So good luck, and I'm sorry that performance reports of the upcoming XE3 are lacklustre at best.

PG said...

I find overzealous Delphi developers tiresome.

If you have some magic sauce that you can apply to the code to make it faster, please be my guest (and submit a patch to the project so that it can be incorporated). If you can't, maybe it's time to accept that Delphi isn't as awesome as you'd like it to be. At the very least, run the Delphi binary under a profiler so at least you learn something about where Delphi is slow and avoid those pitfalls in the future. Perhaps by moving all floating point operations into highly optimized C++ binaries.

Delphi Haters said...

Hi Phillip,


The article is due for a revision later next month with updated benchmarks for XE2/32, XE2/64, VC++/64 with multi-core threading, VC++/64 with multi-core threading and C#/32, C#/64 and GCC with multi-core threading and Java.


Going by the last posts, the guy above said that 64-bit speed is now 50% slower than Visual C++. Last time, it was 80% slower.

MF said...

Beta 9 is out, the difference is now only 26% :

Pulsar XE2 Beta 9 64 bit output:
Mininum running time = 2,00 seconds
Composite Score MFlops: 747,47
FFT Mflops: 461,90 (N=1024)
SOR Mflops: 806,21 (100 x 100)
MonteCarlo: Mflops: 139,62
Sparse matmult Mflops: 879,92 (N=1000, nz=5000)
LU Mflops: 1449,70 (M=100, N=100)

Visual Studio 2010 C 64 bit output:
Composite Score: 942.01
FFT Mflops: 865.20 (N=1024)
SOR Mflops: 1076.51 (100 x 100)
MonteCarlo: Mflops: 176.08
Sparse matmult Mflops: 791.02 (N=1000, nz=5000)
LU Mflops: 1801.23 (M=100, N=100)

PG said...

That's pretty good for the Beta 2 compiler. It's starting to look like XE3 might be generating FPU code that's twice as fast as XE2.

Do you think the guys at Embarcadero are paying attention to this blog? :)

Delphi Haters said...

> ...

Do you also know that one of their employees is moderating a warez site? LOL

Maziar Navahan said...

i pm to DavidI about it :-)

Maziar Navahan said...

But a 26% difference is not small!

gina said...

Can someone post a comparison of Visual Studio C without SIMD extensions vs. Delphi?

cognos23 said...

There are lies, damn lies, and benchmarks

See another one here :

http://dada.perl.it/shootout/

You will notice that you can take this one a little more seriously. It is transparent, and you can compare each test's results and source code across about 50 languages, e.g. Matrix Multiplication (http://dada.perl.it/shootout/matrix.html); click on the language links on the left to see the source code.

I think this is a little more reliable than a freak posting a port of yet another benchmark and commenting on it with: "hey guys, did a port of scibench and delphi suckz ass... he he".

Don't forget to see the final scorecard here:

http://dada.perl.it/shootout/craps.html

Delphi is leading there (by average). But that's another lie, am I right?

Smike said...

My studies show that Java is faster than C# and approximately 2 times faster than Delphi!

http://imageshack.us/f/526/benchmarky.png/

NMAD said...

@cognos23
The benchmark you posted only uses integers. Delphi is said to be slow with floating point operations.

Delphi Haters said...

Delphi XE2 is available. You can do it again with Floats and post results.

PG said...

I find it hilarious that this thread is still going, and aside from a few posters early on (like Zilvinas and Mehmet) nobody has bothered to address fundamental issues. Instead, they've chosen to bring up nonsensical optimizations (e.g. use performance counters) without measuring the impact themselves or they take issue with the guy who did the port.

The funniest post was one that dismissed the current Scimark benchmark and authoritatively linked to a list of 9 year old benchmarks, testing a 14 year old C++ compiler (Visual C++ 6) which is from Delphi's Golden era. This might be news to those in Delphi land, but every release of Visual C++ has brought performance improvements. Guys, I'm sorry to say the world has moved on in the last 10 years.

Delphi Haters said...

Open to fair suggestions for a fair C++, C# and Delphi performance comparison.


Another fair comparison is the job market.
How many paying jobs are there for Delphi vs. C# vs. Java?


Surely, if Delphi is RAD, IDE, there would be tens of thousands of jobs for millions of Delphi developers. Right? ...

Yudi Purwanto said...

Excuse me, sorry if my comment is off topic.

No matter what the benchmark result is... and I don't care about your ignorance of Delphi,

in real applications I'm still happy using Delphi/mORMot for my web applications and services, because it is very fast,
like what you posted in this article http://delphihaters.blogspot.com/2012/11/delphi-servers-worst-performance.html

A single benchmark doesn't reflect the real, whole-application performance the customer wants.
Of course the primary goal of the customer is to get maximum profit from the service they provide.
And Delphi is the perfect solution for building fast and resource-friendly applications.

fast & tiny application ==> low hardware cost (CPU and Memory) & operational cost (Electric power & Bandwidth) ==> more profit