[FEATURE REQUEST] Save and flush to disk after important events (including acquirements generation)

(TL;DR: Skip the immediately following paragraph; from there, the more you read, the less important it is.)

Hello, I am playing on underhound.eu. The server hanged up or crashed yesterday during my game, just a few moments after I picked a really good randart Ranged Weapon from a scroll of acquirement, exactly what I needed (I trained Ranged Weapons to almost 14). Now after reloading the game, not only did it roll back to before reading the scroll of acquirement (so no Ranged Weapon), it also replaced the acquirements completely with totally useless shit. I never won this game and this run had potential, I even survived being cast to the Abyss and I beat lots of lethal monsters I never thought I could beat, all thanks to Uskayaw (first time trying Uskayaw). So I ragequit this run because it’s unfairly borked. (Actually I want to add: The “current” damage per time was about 30-something, and the artifact that was unfairly taken away from me was a Ranged Weapon of speed with 50-something damage per time and (as far as I remember) at least the same or better accuracy taking into account the base accuracy and enchantment, but it was also Dex+5 (if I’m not mistaken) so when wielded it increased to 60-something (67?) damage per time, roughly doubling my damage per time. The artifact also had a short, cool name, something like “Zhera” or “Zarya” or “Zoara” or “Rhaza” or something like that. I don’t know why it’s not mentioned in the morgue file, but I also got a manual of Fire Magic from killing that player ghost, which is why I started training Fire Magic in the first place, as I hadn’t intended to be a spellcaster. I want to add: I was forced to make a choice between continuing to play and maybe even winning but having a bitter feeling that control was taken from me unfairly, or abandoning the character and (later) starting anew out of a conscious decision. DCSS feels fair to play because most of my deaths are caused by my own mistakes, oversights, or overconfidence, and when they’re caused because of a bad seed, I just know in advance that bad seeds can happen and I consciously choose to play anyway (same thing goes for Xom (including from a faded altar), if I’ll ever decide to try it); in contrast, here I shouldn’t have lost my acquirement, and I feel like control was taken away from me. This reminds me of The Matrix movies.)

FEATURE REQUEST: Please make the game save its progress and flush its files (e.g. using fsync(2) system call on Unix-like systems) immediately after major events like generating acquirements (maybe also before showing the acquirements to the player, so nobody will ever be disappointed if the game crashes during this and the acquirements would have changed). (Alternatively, make the random generation less sensitive to small details, or make it more deterministic, so such save-and-flush would likely be unnecessary.) Also:
(1) Nonetheless, if the game doesn’t flush its data when the player saves & exits, it would be good to add to the save function a system call to flush.
(2) There could be a screen, similar to the dungeon generation screen that’s especially visible with the ‘pregen_dungeon = full’ configuration option, that shows the duration of the effect(s) (e.g. acquirement generation and/or game saving) and the wait for the operating system to flush afterwards together with one progress bar. This is useful to give an indication to the player of what’s happening instead of the game suddenly freezing for what could be a few long seconds (or minutes in rare cases) and to indicate that the acquirements/loot presented to the player is ready to be seen by the player, and also maybe to increase the tension, kinda like how in reality shows on TV they tense the audience before they announce the winner/loser (although here it wouldn’t be artificial). I imagine there could also be such a screen when the character dies, but only after the player sees the message “You die…” with less than 1 HP in the stats at the side. (It’s impossible to see less than 1 HP at the side before seeing the message, right??)
(3) I imagine this could also be useful when creating, loading, or killing/deleting player ghosts? I have no idea how DCSS handles player ghosts and whether there could be race conditions in their handling.

Thank you in advance. I really gotta stop editing this

1 Thank

I’ve had similar issues in the past.

It does feel bad when you lose a good ?requirement due to a crash

For what it’s worth, crawl already flushes saves to the best of its abilities (although recovery / saving after crashes is an art, not a science). CUE based on reports I’ve been getting unfortunately seems to be having some sort of hard-to-diagnose os-level issue that I expect preempted anything the server/crawl instance does to try to ensure saves. fsync needs I/O to actually be working in order to do anything…

the wait for the operating system to flush afterwards together with one progress bar

I don’t think this would actually be viable for normal operations, crawl saves a lot, and it normally takes on the order of ms; any case where it’s taking longer is a case where something has gone so wrong that the crawl process is probably completely blocked (as seems to be the case with whatever is going on with CUE).

1 Thank

Honestly, I think it’s not the first time either that this happens to me on this server (the only server I’ve ever played on). Which server did you experience this sort of thing on?

certainly with bcrawl running on my own computer, but I’m not certain that was the only time it happened

That’s good to know. There’s two questions that remain to me:
(1) Does it flush synchronously or asynchronously? The former might be better UX because DCSS would only continue/exit if the data has arrived to the disk, which is a good assurance.
(2) Does DCSS save during acquirement generation? Or does it just have a loop that makes it save & flush every few seconds? Even a few seconds could be stretched to long times on taxed computers/servers, so that’s an important difference.

That makes a lot of sense, since if it saves frequently, there’s less data to write to disk every time, and that’s excellent because it reduces the chance of data loss like what happened to me. Also DCSS doesn’t write that much data I imagine, it just has to track the player’s progress mainly. So not even megabytes I guess. So I agree, there’s no need to show a progress bar for saving nor flushing in this case.

Okay so I’m not a programmer (although I do have knowledge), but I delved a bit into the source code and found some interesting things (to my best interpretation, as I don’t know C++):
(1) In files.cc, there’s a function called save_game(). If it’s called with false, it does an intermediary save mid-game, and with true it saves & exits. When saving, it calls a function called commit() on the game data.
(2) In package.cc, there’s a function called commit(), which checks if DO_FSYNC is set, and if it is, it calls fdatasync(). Normally, fdatasync() is a system call on Unix-like systems (fdatasync(2), which is similar to fsync(2) but doesn’t call the OS to flush filesystem metadata), but see the next paragraph.
(3) In syscalls.cc, there’s a function called fdatasync(). It calls a Windows-specific system call or system library function that tells Windows to flush unwritten data from RAM to disk.
(4) In package.h, there’s the definition of DO_FSYNC. It’s only set if neither DGAMELAUNCH nor some debugging-related constant are set.

I have a few thoughts from this, assuming I got this right. When commit() executes, does it call the fdatasync() system call or the seemingly–Windows-only wrapper in syscalls.cc? (If it’s always this or that, then DCSS only flushes to disk on one kind of operating system and not both kinds. Or maybe on Unix-like systems the system call definition takes precedence over the wrapper so it works on both?) And is it possible that the WebTiles interface on the underhound.eu server runs DCSS with either DGAMELAUNCH or debugging enabled, so that flushing is skipped? Also I didn’t see anything to suggest that the code is async, so it’s probably synchronous, which I see as a good thing.

Furthermore, I saw that the game calls save_game(false) in ouch.cc, but saw no call for that in acquire.cc. I don’t know your codebase but I seem to have an identical intention to the person that added this save_game() call in ouch.cc — saving (and flushing to disk) during major game events.

It sounds like it might solve the issue if you add save_game(false) to the appropriate place in acquire.cc
But you’d have to experiment with implementing it and manually crashing the game in the middle of an acquirement to test whether it has the intended effect. If implemented incorrectly it could lead to you losing the scroll without getting the item (?)

If you care a lot you could put in a pull request. It sounds like you should be capable. I can’t make any promises on behalf of the devs though.

1 Thank

First, it’s exactly what I’m requesting here; but second, it wouldn’t be effective if the game doesn’t flush to disk. So I want it to have this function call in acquire.cc anyway, but it might not be enough to mitigate this issue. Furthermore, merely fixing flushing (assuming it’s broken) would decrease the chance of this issue occuring, which is also good, even though it doesn’t prevent the issue completely like a synchronous save & flush before displaying the result. And also it might be a good idea to add these save calls not just to acquirement/gift generation but also in other places, like maybe when vaults are generated in the dungeon or something.

I don’t think it’s necessary. Saving isn’t a complex or infinite function call, it’s just one thing (conceptually at least) that’s done once and returns, and it’s also an extremely common and relatively simple operation, so not a high likelihood for surprises. It’s inherently a side-effect–producing function (it’s called solely for the effect it does and not for its (non-existent?) return value), but if the developers think it’s safe to call it in that context, it should be fine.

Honestly I’d rather not. I have a GitHub account but I’d rather not use it for various reasons. I’m curious what @advil or the other devs (would) think. I think it’s not the first time I’ve experienced this issue. I’m a bit hesitating to start a new game and have it destroyed again. So I’ll wait for now to see what they say.

It’s theoretically possible that if you save the game after you’ve read the ?acquirement but before you chose the gift, that in the game state that is saved you have neither the ?acquirement nor the gift. And so when you load that game state you’ve lost the ?acquirement for nothing.

I’m pretty sure I’ve lost ?acquirement due to this mechanism in brcawl. By manually saving (and quiting) the game in the middle of reading an ?acquirment. (it was a tricky choice and I wanted to think about it.)

It’s possible that it can’t happen anymore in mainline since the acquirement mechanism got improved, but you’d have to have a deeper understanding of the underlying mechanisms than I have to guarantee that.
Saving the game in the middle of a player action is inherently dangerous.

What you’re saying is that the operation of replacing the unread scroll of acquirement with its read version is non-atomic (i.e. it’s not an all-or-nothing-happened operation). (Fyi, the UI dialog of scrolls of acquirement can safely be closed; once read, a scroll of acquirement has its offers set.) This is simply a matter of putting the function call at the right place after the operation is done.

As for whether it can still happen, we need the devs to say if this is an atomic operation or not. What you’re describing with manually saving sounds like a bug — atomic or not, if the game hasn’t crashed (e.g. from computer power loss), it should do its best to complete what it started.

EDIT: Actually I’ve been wondering what would happen if the game would crash in the middle of a non-cancelable dialog without a trivial fallback. Like in the middle of advancing to level 12 when it asks you which attribute to improve.