VBA-M Forums

Full Version: [FIXED]HQ3x/4x ASM filters border
HQ3x/4x ASM implementation produces wrong interpolation on the image's border. Look at the attached screenshot. It is obvious at the top of the "VIOLET CITY" text.
[attachment=1]


Hints to fix it:
This bug has already been fixed in the C version; look at hq_base.h / line 343 - 372.
The ASM version most likely only has something like a single skipLine instead of separate skipLinePlus and skipLineMinus values, both of which are necessary for it to work correctly.
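
To illustrate the idea, here is a minimal sketch. It is not the actual hq_base.h code. It assumes skipLinePlus and skipLineMinus are the offsets from the current source row to the next and previous row, and that both have to collapse to 0 on the border rows so the filter never reads outside the image. The names RowOffsets, rowOffsets and pitchInPixels are made up for this example.

```cpp
#include <cstddef>

// Hypothetical helper, not taken from hq_base.h.
// Offsets (in pixels) from the current row to its vertical neighbours.
struct RowOffsets {
    std::ptrdiff_t minus; // offset to the previous row, 0 on the first row
    std::ptrdiff_t plus;  // offset to the next row, 0 on the last row
};

// Clamp the neighbour offsets so that border rows reuse themselves
// instead of reading a row above or below the bitmap.
inline RowOffsets rowOffsets(int y, int height, std::ptrdiff_t pitchInPixels) {
    RowOffsets o;
    o.minus = (y > 0) ? -pitchInPixels : 0;
    o.plus  = (y < height - 1) ? pitchInPixels : 0;
    return o;
}
```

With a single skipLine value, only one of the two borders can be handled, which would match the wrong pixels seen at the edge of the image.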

This bug is most probably the reason why only the HQ3x/4x filters cause an exception with my new multi-threaded filter execution.
I get the following error message:
Quote:Unhandled exception at 0x00511ba9 in VisualBoyAdvance.exe: 0xC0000005: Access violation writing location 0x02c21000.
The debugger points me to hq4x_32.asm line 960.

The asm version of the HQ2x filter does not have this bug.

I would really appreciate it if anyone knowledgeable in ASM could try to figure out a fix for it. Until this bug is fixed, I can't really submit my multi-threading patch. :(
spacy51 Wrote:HQ3x/4x ASM implementation produces wrong interpolation on the image's border. Look at the attached screenshot. It is obvious at the top of the "VIOLET CITY" text.
Thanks for the patch. :)
Thank you so much for the patch.

It looks perfect now: [attachment=6]



Unfortunately, multi-threading still doesn't work with the HQ3x/4x filters, which means there has to be another cause.
This means there is still a faulty piece of code in the filter that accesses memory outside of the source or destination bitmap. This causes access conflicts when that memory area is currently in use by another CPU core.
A mutex on the whole image area is not an option, because the whole thing would then be no faster than on a single core. Moreover, the current state may have unforeseen side effects even on single cores.
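
For context, a rough sketch of why out-of-bounds access matters here. It assumes the image is divided into horizontal bands, one per thread; how the patch actually divides the work is not shown in this thread. Band and splitIntoBands are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical description of the rows one worker thread processes.
struct Band {
    int firstRow; // first source row of this band
    int rowCount; // number of source rows in this band
};

// Split `height` rows into at most `threads` contiguous, non-overlapping bands.
// Left-over rows are handed out one each to the first bands.
std::vector<Band> splitIntoBands(int height, int threads) {
    std::vector<Band> bands;
    if (height <= 0 || threads <= 0) return bands;
    threads = std::min(threads, height);
    const int base = height / threads;
    const int extra = height % threads;
    int row = 0;
    for (int i = 0; i < threads; ++i) {
        const int count = base + (i < extra ? 1 : 0);
        bands.push_back({row, count});
        row += count;
    }
    return bands;
}
```

As long as every thread only writes destination rows that belong to its own band, no locking is needed. A filter that writes even one row past its band ends up in memory another core is working on, or past the end of the bitmap for the last band.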
Quote:Unfortunately, Multi-Threading still doesn't work with the HQ3x/4x filters, which means there has to be another cause.
This means, there is still a faulty piece of code in the filter, which accesses memory outside of the source or destination bitmap. This causes access conflicts when the other memory area is currently in use by another CPU core.

Gotta love multithreading and thread-safety, hey?

And it's funny how I see on Ngemu that noobs think it's easy... They should try it before they say such things.
spacy51 Wrote:Multi-Threading still doesn't work with the HQ3x/4x filters, which means there has to be another cause.
ASM filters are not thread-safe ;)
chrono Wrote:
spacy51 Wrote:Multi-Threading still doesn't work with the HQ3x/4x filters, which means there has to be another cause.
ASM filters are not thread-safe ;)


But the 2xSaI, Super 2xSaI and Super Eagle ASM filters work without changes.

You did a great job on the hq4x_32 filter. It works perfectly now.

I get a speed increase from about 200% (1 core) to 250% (2 cores) now.


It looks like you had to change almost every line in the .asm file. Could you maybe provide me with some info on what was the problem with the code?

Would you be so kind as to do the same magic on the other 3 versions?
HQ3x_32 (has top priority)
HQ4x_16
HQ3x_16

I'll make sure you get your name into the about box.


EDIT:
Uploaded changes, SVN469
thread-safe filters: bilinear.cpp, hq3x_16.asm, hq3x_32.asm, hq4x_16.asm, hq4x_32.asm
Which are faster, ASM or C filters?

Secondly, can a 4xSaI filter be done in ASM?
I committed chrono's fixes and added an option to set the maximum number of threads to create. The option is not yet exposed in the GUI, but can be changed in the ini. If the "maxCpuCores" option does not exist or is invalid, the value will be auto-detected using the CPUID instruction.
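
The fallback logic can be sketched roughly like this. It is not the committed code: resolveMaxThreads and detectCores are hypothetical names, "invalid" is assumed to mean any value below 1, and the detection uses the standard std::thread::hardware_concurrency() in place of the CPUID instruction.

```cpp
#include <thread>

// Stand-in for the CPUID based detection (assumption: the real code differs).
// hardware_concurrency() may return 0 when the count cannot be determined.
unsigned detectCores() {
    return std::thread::hardware_concurrency();
}

// iniValue: parsed "maxCpuCores" value, or a value below 1 when the
// option is missing or invalid. Always returns at least 1.
unsigned resolveMaxThreads(int iniValue, unsigned detected) {
    if (iniValue >= 1) return static_cast<unsigned>(iniValue);
    return detected >= 1 ? detected : 1u;
}
```

A caller would use something like resolveMaxThreads(valueFromIni, detectCores()) once at startup and create that many filter threads.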

Somehow my speed results vary. I probably have to add a mechanism to explicitly run each thread on an individual core instead of letting Windows guess.