Freelancer Community Network
November. 22, 2017




Crash at 0x635d32b (Common.dll), dump attached
Starport Admin
Joined:
2009/2/21 21:42
Group:
Webmasters
Registered Users
Posts: 3457
Offline
Hey all,

So I've been trying to find the source of a particular crash that seems to happen occasionally when switching systems. Now, it's worth mentioning of course that we use a modified system for jumping which is based off Cannon's hyperdrive code.

Anyway, the crash is at 0x635d32b, which is in common.dll, but I've been unable to find any mention of this one yet. I attempted to trace into the instruction and got rather conflicting results: it seems to be called both by ios_base and by a physics function?

Since this is a server crash, I'd really like to know what the hell is going on with it, so I've attached the dump file that FLHook produces right before crashing. I hope someone's already encountered this or may be able to glean more insight from it. IDA doesn't even want to load the dump over here, which is great, and Visual Studio typically produces garbage when it doesn't have a PDB to work off of.

Attach file:


zip flserver_06.09.2016_19.03.26-1.dmp.zip Size: 24.71 KB; Hits: 81

Posted on: 2016/9/7 17:50
"Cynicism is not realistic and tough. It's unrealistic and kind of cowardly because it means you don't have to try."
-Peggy Noonan
Re: Crash at 0x635d32b (Common.dll), dump attached
Home away from home
Joined:
2009/8/16 2:58
From Qld, Aus.
Group:
Registered Users
FLServer Admins
Trusted Speciality Developers
Senior Members
Posts: 1805
Offline
Seems to be related to the Common.dll 0x0635C376 crash, as it's pointing to the same data. You could try nopping out the six bytes at 0FD31E (635D31E in the debugger), or alternatively AND with -1 instead. Either way edx is left as -1 rather than 0xFFFF, so comparing it with eax as -1 will work (which, given the registers in the dump, seems to be the problem).
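To make the register comparison above concrete, here is a minimal sketch of why a value masked down to 0xFFFF never compares equal to a 32-bit -1. The actual instructions in Common.dll are not shown in the thread, so the 16-bit AND mask and the function name below are assumptions for illustration only.

```python
MASK32 = 0xFFFFFFFF


def to_u32(value):
    """Return the 32-bit register pattern for a Python integer."""
    return value & MASK32


def registers_match(edx, eax, mask_edx_to_16_bits):
    """Compare two simulated 32-bit registers (hypothetical helper).

    mask_edx_to_16_bits=True models an assumed `and edx, 0FFFFh` before the
    compare; False models the instruction being nopped out (or ANDed with -1).
    """
    if mask_edx_to_16_bits:
        edx = edx & 0xFFFF
    return to_u32(edx) == to_u32(eax)


if __name__ == "__main__":
    # With the mask, -1 becomes 0x0000FFFF and no longer equals 0xFFFFFFFF.
    print(registers_match(-1, -1, mask_edx_to_16_bits=True))
    # Without the mask, both registers hold 0xFFFFFFFF.
    print(registers_match(-1, -1, mask_edx_to_16_bits=False))
```

If that reading of the dump is right, the "not found" value of -1 would slip past the comparison only while the mask is in place.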

Posted on: 2016/9/8 3:05
Re: Crash at 0x635d32b (Common.dll), dump attached
Starport Admin
Hmm... I'll try to do that and report if I run into anything. At least it hasn't crashed when I tried running around and jumping, so that's a start.

Posted on: 2016/9/8 3:56
Re: Crash at 0x635d32b (Common.dll), dump attached
Starport Admin
So, to report back: since applying the patch I've had one crash at 0x635d32b, on top of a bunch more at Common.dll+0xf24a0 (which I've mentioned having trouble with in the past).

I'm pretty sure the two issues are related, but at this point I'm completely unable to say what would cause them. I've already sanitized positions and orientations obtained from players (which were a large source of these errors), but it seems like bad values are still being generated somewhere.

To perhaps provide some context, here are six dump files of those two crashes.

Attach file:


zip dumps.zip Size: 112.92 KB; Hits: 77

Posted on: 2016/9/14 4:52
Re: Crash at 0x635d32b (Common.dll), dump attached
Home away from home
This may catch the crash at F24A0, causing it at F23CD instead, so another dump may help narrow it down. You could also try replacing the final 88 00 with C2 08 00 to see if that works around it. Don't know about the other issues, though.

Code:
File: Common.dll
0E4432: 8F [ 9A ]
0E44B0: 11 [ 1C ]
0E453E: 83 [ 8E ]
0F23C5: 8B 44 E4 04 85 C0 75 03 88 00 [ 90 90 90 90 90 90 90 90 90 90 ]
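For anyone applying a list like this by script rather than in a hex editor, a rough sketch follows. It assumes each line reads "offset: new bytes [ original bytes ]"; that reading is inferred from the post (0F23C5 + 8 is F23CD, where the 88 00 sits), not stated in it. The function names are made up for this example.

```python
import re

# Assumed line format: OFFSET: NEW BYTES [ ORIGINAL BYTES ]
PATCH_LINE = re.compile(
    r"^\s*([0-9A-Fa-f]+):\s*((?:[0-9A-Fa-f]{2}\s+)+)"
    r"\[\s*((?:[0-9A-Fa-f]{2}\s*)+)\]\s*$"
)


def parse_patches(text):
    """Turn patch lines into (offset, new_bytes, original_bytes) tuples."""
    patches = []
    for line in text.splitlines():
        match = PATCH_LINE.match(line)
        if match is None:
            continue  # skips headers such as "File: Common.dll"
        offset = int(match.group(1), 16)
        new = bytes.fromhex("".join(match.group(2).split()))
        old = bytes.fromhex("".join(match.group(3).split()))
        if len(new) != len(old):
            raise ValueError(f"length mismatch at {offset:06X}")
        patches.append((offset, new, old))
    return patches


def apply_patches(image, patches):
    """Apply patches in place to a bytearray, checking the original bytes."""
    for offset, new, old in patches:
        current = bytes(image[offset:offset + len(old)])
        if current == new:
            continue  # already patched
        if current != old:
            raise ValueError(f"unexpected bytes at {offset:06X}: {current.hex()}")
        image[offset:offset + len(new)] = new
    return image
```

Typical use would be to read a copy of Common.dll into a bytearray, call apply_patches on it, and write the result to a new file so the original stays untouched. Refusing to write when the existing bytes match neither side guards against patching a different build of the DLL.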

Posted on: 2016/9/15 6:04
Re: Crash at 0x635d32b (Common.dll), dump attached
Starport Admin
I'll definitely try that! How do you load those dump files by the way? Visual Studio works well with them, but since all those crashes are in files without PDBs it's pretty much useless, and IDA doesn't seem to be able to do anything useful with them, let alone give me some kind of call stack or crash info.

Posted on: 2016/9/15 14:37
Re: Crash at 0x635d32b (Common.dll), dump attached
Starport Admin
I've tried applying all previous fixes with the C2 08 00 workaround and got a bunch more crash logs in different locations. Interestingly, related logs are always 10 seconds apart.

They're all at the same offset except the one at 19.22.25, which has three cascading points of failure, starting at 0x0635d32b. All seem related to jumping across systems.

Also, I can confirm that they all trace down to PhySys::Update rather than another of the many functions that end up calling the crashing code.

Attach file:


zip dumps2.zip Size: 188.28 KB; Hits: 74

Posted on: 2016/9/16 1:15
Re: Crash at 0x635d32b (Common.dll), dump attached
Home away from home
I use cdb (from the debugging tools, the command line version of WinDbg) to read the dump ((for %j in (*.dmp) do cdb -ses -z %j -c .ecxr;k;q) |tde).
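The same loop can be scripted outside the command prompt. This is only a sketch: it assumes cdb is on the PATH, reuses the flags and debugger commands from the one-liner above (-ses, -z, and -c with .ecxr;k;q), and collects the output in a dict instead of piping it to an editor. The function names are made up for this example.

```python
import glob
import subprocess


def build_cdb_command(dump_path):
    """Build the cdb invocation for one dump: exception context, stack, quit."""
    return ["cdb", "-ses", "-z", dump_path, "-c", ".ecxr;k;q"]


def dump_call_stacks(pattern="*.dmp"):
    """Run cdb over every dump matching the pattern; map file name to output."""
    reports = {}
    for dump_path in sorted(glob.glob(pattern)):
        result = subprocess.run(
            build_cdb_command(dump_path),
            capture_output=True,
            text=True,
            check=False,  # keep going even if cdb exits non-zero on one dump
        )
        reports[dump_path] = result.stdout
    return reports


if __name__ == "__main__":
    for name, output in dump_call_stacks().items():
        print(f"===== {name} =====")
        print(output)
```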

I wanted dumps with the new crash address so I could trace back what was writing the zero in the first place (assuming a zero was being inserted into the hash table, not just replacing an existing entry; given that it's crashing at a different spot, I guess it is inserted). I could hazard a guess at a PhySys::CreatePhantom constructor.

Posted on: 2016/9/16 4:10
Re: Crash at 0x635d32b (Common.dll), dump attached
Starport Admin
I'm tracking down the bug a little further (it's not quite reproducible, but it's sufficiently common to force it) and one thing I've noticed is that a lot of the time, there's at least one player ship with NaNs for position and orientation at the time of the crash. Not all the time, though, which is odd. What's even more odd, however, is that as I've mentioned in the past, we're blocking NaNs coming from the client, so I don't see where they could even be coming from.

I'll try to hook the CreatePhantom calls and see if I get something.

Posted on: 2016/9/16 4:43
Re: Crash at 0x635d32b (Common.dll), dump attached
Starport Admin
So I have conclusions (partial, anyway):

1) CreatePhantom didn't trigger any exception, so I don't think the error originates from that.

2) The root cause of the issue remains NaN positions and/or orientations. The way it happened this time around was a lot trickier though, which is why it took me so long to find it. The rotation quaternion the client sent back after jumping was all NaNs because the rotation matrix was slightly denormalized, causing the conversion to fail. However, the quaternion was "sanitized" somewhere along the chain and became all zeros, so it didn't get caught by the SPObjUpdate patch I added (it checks for NaNs only). A zero quaternion still doesn't describe a valid rotation, though, so when it was converted back to a matrix it became full of NaNs again and would crash the server on some occasions.

I've tracked down and fixed the quaternion conversion on the client (I can't really do anything about rotation matrices becoming slightly denormalized, that's just normal), but I've also added another check to SPObjUpdate in FLHook which requires the quaternion to be mostly normalized, so it'll catch erroneous values like this in the future.
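As a rough illustration of the kind of check described above (not the actual FLHook code, which isn't shown in this thread), the sketch below rejects a quaternion if any component is non-finite or if its squared length is too far from 1. The function name, the (w, x, y, z) ordering, and the tolerance are assumptions.

```python
import math


def is_valid_rotation_quaternion(q, tolerance=1e-2):
    """Return True if q = (w, x, y, z) is finite and approximately unit length.

    The tolerance is an arbitrary illustrative value; a real server would tune
    it to what legitimate clients actually send.
    """
    if len(q) != 4:
        return False
    if not all(math.isfinite(c) for c in q):
        return False  # catches NaN and infinities
    norm_squared = sum(c * c for c in q)
    return abs(norm_squared - 1.0) <= tolerance


if __name__ == "__main__":
    print(is_valid_rotation_quaternion((1.0, 0.0, 0.0, 0.0)))  # identity
    print(is_valid_rotation_quaternion((0.0, 0.0, 0.0, 0.0)))  # all zeros
    print(is_valid_rotation_quaternion((math.nan, 0.0, 0.0, 0.0)))
```

The all-zero case is the one a NaN-only check lets through: every component is finite, but the length is 0, so it can't be turned back into a rotation matrix.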

Posted on: 2016/9/17 14:58