In a Discord discussion, we agreed that the HLE Database should be moved into a separate project and used as a submodule. However, the name "HLE Database" is misleading. It should be called a symbol table database, since it finds as many of the symbol functions as possible.
Basic Task Concept:
It's a big project to work on; hopefully it will be easy to do.
Edit: Just to be clear, the scanning code API will be included. However, the patching code will not.
EDIT2: Links to the XbSymbolDatabase and XbSymbolDBMerge branches being worked on are available to view progress.
Also, @LukeUsher stated he will work on Cxbx-Reloaded's changes to work with the XbSymbolDatabase library.
Why C and not C++? There's no reason why an external library must be plain C, providing the interface is well defined. The current approach of using a std::map internally wouldn't work with plain C, so it would require a significant rework of the scanning code.
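For reference, one common plain-C stand-in for a `std::map` keyed by name is a sorted array searched with the standard `bsearch`. The sketch below is only an illustration of that idea, not the project's code: the `SymbolEntry` type and the `symbol_table_*` helpers are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry type: one symbol name mapped to its address. */
typedef struct {
    const char *name;
    uint32_t    address;
} SymbolEntry;

/* Comparator shared by qsort and bsearch: orders entries by name. */
static int compare_by_name(const void *a, const void *b)
{
    const SymbolEntry *lhs = (const SymbolEntry *)a;
    const SymbolEntry *rhs = (const SymbolEntry *)b;
    return strcmp(lhs->name, rhs->name);
}

/* Sort once after scanning so later lookups can binary-search. */
static void symbol_table_sort(SymbolEntry *table, size_t count)
{
    qsort(table, count, sizeof(table[0]), compare_by_name);
}

/* Returns the matching entry, or NULL if the name is not in the table. */
static const SymbolEntry *symbol_table_find(const SymbolEntry *table,
                                            size_t count, const char *name)
{
    SymbolEntry key;
    key.name = name;
    key.address = 0; /* ignored by the comparator */
    return (const SymbolEntry *)bsearch(&key, table, count,
                                        sizeof(table[0]), compare_by_name);
}
```

Lookups stay O(log n) as with `std::map`, but insertion after the sort is more awkward, so this fits best when the table is filled once and then only read.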
I say this because the external library should be more than just the symbol table, it should also contain the scanning (but not patching) code too. This is so that other projects can make use of the symbol scanning code: For example, we could write a plugin for IDA Pro that calls into this library to identify functions.
Basically, the symbol scanning library should have an Init function, a function to fetch an address of a symbol by name, or the name of a symbol from an address, as well as a function to return a complete symbol table.
Cxbx-Reloaded can then use these APIs to cross-reference functions to its patches, so that the patching code (correctly) stays emulator specific, but the scanning and symbol table code stays project agnostic.
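To make the proposed API shape concrete, here is a rough sketch in C. Every name in it (`symdb_init`, `symdb_register`, `symdb_address_of`, `symdb_name_at`, `symdb_table`, `EmuPatch`, `emu_resolve_patches`) is made up for illustration and is not the actual library interface. `symdb_register` stands in for the scanner reporting a hit, since the real scan is out of scope here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SYMDB_MAX_SYMBOLS 64

/* Hypothetical symbol record exposed by the library. */
typedef struct {
    const char *name;    /* not copied; caller keeps the string alive */
    uint32_t    address;
} SymdbEntry;

static SymdbEntry g_table[SYMDB_MAX_SYMBOLS];
static size_t     g_count;

/* Reset the table; a real init would also run the scan over an image. */
void symdb_init(void) { g_count = 0; }

/* Stand-in for the scanner reporting a hit. Returns 0, or -1 if full. */
int symdb_register(const char *name, uint32_t address)
{
    if (g_count >= SYMDB_MAX_SYMBOLS) return -1;
    g_table[g_count].name = name;
    g_table[g_count].address = address;
    g_count++;
    return 0;
}

/* Address of a symbol by name, or 0 if it was not found. */
uint32_t symdb_address_of(const char *name)
{
    size_t i;
    for (i = 0; i < g_count; i++)
        if (strcmp(g_table[i].name, name) == 0) return g_table[i].address;
    return 0;
}

/* Name of the symbol at an address, or NULL if there is none. */
const char *symdb_name_at(uint32_t address)
{
    size_t i;
    for (i = 0; i < g_count; i++)
        if (g_table[i].address == address) return g_table[i].name;
    return NULL;
}

/* Complete table; *count receives the number of entries. */
const SymdbEntry *symdb_table(size_t *count)
{
    *count = g_count;
    return g_table;
}

/* Emulator side: a patch only knows the symbol name it replaces. */
typedef struct {
    const char *symbol_name;
    uint32_t    resolved_address; /* 0 until resolved */
} EmuPatch;

/* Fill in addresses for the patches; returns how many were resolved. */
size_t emu_resolve_patches(EmuPatch *patches, size_t n)
{
    size_t i, resolved = 0;
    for (i = 0; i < n; i++) {
        patches[i].resolved_address = symdb_address_of(patches[i].symbol_name);
        if (patches[i].resolved_address != 0) resolved++;
    }
    return resolved;
}
```

The point of the split is that `EmuPatch` and `emu_resolve_patches` would live in the emulator, while everything prefixed `symdb_` would live in the shared library.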
For one thing, both QEMU and XQEMU are written in C, not C++. If the header file is written in C++, it will not help XQEMU developers.
If you noticed, both the Windows kernel and its drivers (WDK, in other words system drivers) are written in C. It's more logical to keep the header file written in C. The symbol table library itself can be compiled as C++ unless stated otherwise.
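A C header with a C or C++ implementation behind it is usually done with `extern "C"` guards and an opaque handle. The sketch below assumes hypothetical names (`SymapiContext`, `symapi_create`, `symapi_destroy`, `symapi_version`) and shows header and implementation in one listing for brevity.

```c
/* --- symapi.h (hypothetical public header, valid in C and C++) --- */
#include <stdint.h>

#ifdef __cplusplus
extern "C" {   /* C++ callers see unmangled C names */
#endif

/* Opaque handle: callers never see the layout, so the implementation
   is free to use C++ containers internally. */
typedef struct SymapiContext SymapiContext;

SymapiContext *symapi_create(void);
void           symapi_destroy(SymapiContext *ctx);
uint32_t       symapi_version(const SymapiContext *ctx);

#ifdef __cplusplus
}
#endif

/* --- symapi.c (implementation; could equally be a .cpp file) --- */
#include <stdlib.h>

struct SymapiContext {
    uint32_t version;
};

/* Allocates a context; returns NULL if allocation fails. */
SymapiContext *symapi_create(void)
{
    SymapiContext *ctx = (SymapiContext *)calloc(1, sizeof(*ctx));
    if (ctx != NULL) ctx->version = 1;
    return ctx;
}

/* Frees a context; passing NULL is allowed. */
void symapi_destroy(SymapiContext *ctx)
{
    free(ctx);
}

uint32_t symapi_version(const SymapiContext *ctx)
{
    return ctx->version;
}
```

Because only plain C types cross the boundary, a C project such as XQEMU can include the header directly, regardless of which language the library body is compiled as.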
Another reason to use C: it's a faster design model than C++. If you really want the fastest, then use assembly language.
Hello, I'm just randomly checking this interesting project but I can't help myself and I have to comment on your last paragraph.
Another reason to use C: it's a faster design model than C++.
I have used various versions of the C language (even C89), C++, and even assembly (NASM for x86, and Atmel 8/16-bit architectures).
A binary compiled from C isn't any faster than one compiled from C++. The only thing "slower" in C++ is of course virtual methods in a robust OOP class design. And even this can be resolved by optimizing the OOP design to the point where it's not a problem at all. Other "slow" features such as lambdas and shared pointers aren't mandatory for daily C++ usage. In the end, you can of course not use OOP features in C++ at all.
If you really want the fastest, then use assembly language.
No, on the contrary, using ASM usually results in slower performance. This is because today's compilers (gcc, clang, etc.) can in many cases optimize code much better than a human programmer.
Are you using loops instead of copy-pasting instructions? If the compiler sees a very short loop, it will unroll it without a hitch. Are you pipelining your loops? Compilers sure do. Can you reliably use advanced instruction sets such as SSE?
The final problem is that you don't ever want premature optimization, and of course you write either compatible code or optimized code. If you want to use those really fast techniques with the AVX/AVX2 instruction sets, you use libraries such as OpenMP or whatever the Microsoft counterpart is.
Windows kernels, drivers, and APIs are written in C mostly for historical reasons. The only real question is: will the people who use your library use C or C++? Using a C++ library in C projects is of course a pain in the ass, but it can be done.
Is there a need to make the database reusable? Most of the patterns were identified from IDA scripts and ported into the project (or so I thought).
The legacy signatures still exist in this repo (see here https://github.com/Cxbx-Reloaded/Cxbx-Reloaded/tree/master/doc)
The headers are:
Symbol,Version,Lib,Length,StartPattern,CRCLength,CRCValue,TrailingPattern,References
They could easily be exported by a script run on the source files.
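If the signatures were exported as comma-separated rows in that column order, reading them back in C needs little more than a field splitter. This is a minimal sketch under that assumption: `split_csv_line` and `SIG_FIELD_COUNT` are hypothetical names, and quoted fields containing commas are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Number of columns in the header row listed above. */
#define SIG_FIELD_COUNT 9

/* Splits `line` in place on commas and stores a pointer to each field.
   Empty fields are kept. Returns the number of fields stored, which is
   at most max_fields. */
size_t split_csv_line(char *line, char **fields, size_t max_fields)
{
    size_t count = 0;
    char *cursor = line;

    while (cursor != NULL && count < max_fields) {
        char *comma = strchr(cursor, ',');
        fields[count++] = cursor;
        if (comma != NULL) {
            *comma = '\0';      /* terminate the current field */
            cursor = comma + 1; /* continue after the comma */
        } else {
            cursor = NULL;      /* last field on the line */
        }
    }
    return count;
}
```

Each field is left as a string here, because the thread does not say what type each column holds.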
using ASM usually results in slower performance.
That's only true for someone with a beginner or advanced skill set who isn't using SSE instructions. If someone has expert skill with assembly language (including its numerous types of instructions), then it will not be slower. Not all compilers are able to optimize everything, by the way. Even a human being can find a way to improve the asm code after it is output to an asm file.
There are several websites that publish results comparing ASM, C, C++, etc. to determine which language is faster. Usually C comes out ahead of C++. (Somehow one site has outdated info, and the others can't be found for some reason.)
Windows kernels, drivers, and APIs are written in C mostly for historical reasons.
Not true. There are numerous other languages that can't communicate directly with C++. When that happens, support for those other languages will not be there.
C++ uses different CRT versions and different linkage references. Try compiling two different projects with different Visual Studio versions, with one project linking the API from the other. The end result is always a failure and an incompatibility. This is the main reason why Windows' APIs are in C.
P.S. Suppose Windows were fully written in C++ with no C APIs, and your program used the OS's C++ APIs. Once a new Windows version came out, your old program would not work on it, which would force developers to recompile in order for their program to work on the newer Windows OS. This is the other reason.
@x1nixmzeng LukeUsher talked about making a symbol table plugin for IDA, to be used for research, finding bugs, and other purposes.
Edit:
Just to clarify this quote:
Using a C++ library in C projects is of course a pain in the ass, but it can be done.
It can only be done with C APIs, not C++ APIs.
Edit2:
it should also contain the scanning (but not patching) code too.
Sorry I didn't reply about it being more than a symbol table; yes, it will have the scanning code included. (I'm aware the patching code will not be moved into the separate project. 😋 )
I have been working on a proof of concept for the xbSymbolDatabase library written in C. It is working well, excluding some external references outside the HLEDatabase files. I also compared the old and new HLECache files to see whether any address differs for the same symbol. Both look good too, excluding D3D8's external references...
At the moment, not all of the scanning functions have been moved yet. More coming soon™...
Nice work! Could you share a link to some code already?
It's only day 2 of coding this so far. 😛 I also want to make a second branch for Cxbx-Reloaded as a WIP merge branch. Then we can all contribute there before we're ready to merge the final changes.
Of course, I mainly need to implement the D3D8 offsite code. I believe #1119, Refactor HLE symbol detection, will hopefully progress in the WIP merge branch as well?
@LukeUsher Can we merge the WIP branch into master, or do you want to finalize more of the internals and names before the merge?
If it works and doesn't cause any regressions, then by all means, go ahead!