Right, the server had some issues over the weekend - on looking, it had run out of disk space, so it couldn't save data etc.
We have a backup of world and player data from the 9th - it's looking likely that we'll have to restore from it, as it sounds like some players' data is screwed - inventory problems, potential for item dupes, claims missing, etc.
I'll be continuing to look at it later on when I get back in.
I've asked SMC (again) to set up a port forward so I can monitor disk space and other metrics via Nagios; had this been in place, I'd have known about it in time to do something and none of this would have happened.
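For the curious, the sort of disk-space check Nagios runs can be sketched roughly as below. This is only an illustration, not the actual check on the box: the function names and the 80%/90% thresholds are made up for the example. The one real convention it relies on is that Nagios plugins report status via exit code (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
import shutil

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(used_pct, warn=80.0, crit=90.0):
    """Map a used-space percentage to a Nagios status code.

    The default thresholds are assumptions for this sketch.
    """
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (status_code, one-line message) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total
    status = classify(used_pct, warn, crit)
    return status, f"DISK {LABELS[status]} - {path} is {used_pct:.1f}% used"


def main(argv):
    """Print the status line and return the exit code for the given path."""
    path = argv[1] if len(argv) > 1 else "/"
    status, message = check_disk(path)
    print(message)
    return status
```

To run it as a plugin you'd finish the script with `import sys` and `sys.exit(main(sys.argv))`, so that Nagios sees the exit code and raises an alert before the disk actually fills.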
I'll be talking to SMC to get another drive in the box so we can do daily backups, and to get the monitoring in place.
Feel free to subscribe to notifications on this forum post for updates.
Downtime, possible rollback
-
- Site Admin
- Posts: 584
- Joined: Sun Aug 17, 2014 3:23 pm
- Location: UK
- Contact:
Re: Downtime, possible rollback
UPDATE: right, the extra monitoring I wanted is now in place - so at least this can't happen again.
I've shipped the dynmap tiles (a big fuckton of rendered images, basically) off to the additional external drive, and symlinked them back in place; this frees up a fair amount of space on the main drive. Dynmap *should* work as normal, just potentially slightly slower.
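The move-and-symlink trick is roughly the following, sketched in Python. The function name and any paths you'd pass in are hypothetical; it only shows the general technique, not the exact commands run on the server.

```python
import os
import shutil


def relocate_and_symlink(src, dest):
    """Move directory `src` to `dest`, then leave a symlink at `src`.

    Anything that reads or writes the old path carries on working,
    but the data now lives on whichever drive holds `dest`.
    """
    if os.path.islink(src):
        raise ValueError(f"{src} is already a symlink")
    if os.path.exists(dest):
        raise FileExistsError(f"{dest} already exists")
    dest = os.path.abspath(dest)
    # shutil.move copes with crossing filesystems (copy, then delete)
    shutil.move(src, dest)
    os.symlink(dest, src, target_is_directory=True)
```

The server should be stopped (or at least not writing tiles) while the move runs, otherwise files written mid-move may be left behind or lost.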
I'm in the process of backing up the world & player data state as it currently stands, *just in case* we need to refer back to it for any reason. I'll then restore the most recent world backups we have.
I've been made aware of at least one person whose bonus claim blocks got reset (I'm not even sure how/why that would happen...), so we'll have to decide what we do for anyone affected by that. On the plus side, I *should* be able to extract that information from the logs, as claim block adjustments log both the increase and the new total - so I should be able to write a script to parse the logs and extract the correct bonus claim blocks figure for each player and re-apply it if needed.
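As a rough idea of what such a script might look like, here's a sketch. The log line format in the regex is entirely HYPOTHETICAL (the real plugin's wording will differ), and the function names are made up; the only thing taken from the above is that each adjustment logs both the increase and the new total. It assumes the lines are fed in chronological order.

```python
import re

# HYPOTHETICAL log format, e.g.:
#   [11:00:00] Adjusted Alice's bonus claim blocks by 100. New total: 200
# The real log wording will differ; adjust the pattern to match.
PATTERN = re.compile(
    r"Adjusted (?P<player>\w+)'s bonus claim blocks by (?P<delta>-?\d+)\. "
    r"New total: (?P<total>\d+)"
)


def summarise(lines):
    """Per player: the sum of all logged increases and the last logged total."""
    summary = {}
    for line in lines:
        match = PATTERN.search(line)
        if not match:
            continue  # not a claim block adjustment line
        entry = summary.setdefault(
            match.group("player"), {"sum_of_increases": 0, "last_total": 0}
        )
        entry["sum_of_increases"] += int(match.group("delta"))
        entry["last_total"] = int(match.group("total"))
    return summary


def possible_resets(summary):
    """Players whose last logged total is lower than their summed increases."""
    return sorted(
        player
        for player, entry in summary.items()
        if entry["last_total"] < entry["sum_of_increases"]
    )
```

Comparing the summed increases against the last logged total is one way to spot a reset: if the total suddenly drops below what the increases add up to, something zeroed it along the way. That only holds if the logs go back far enough to cover every adjustment for that player.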
Re: Downtime, possible rollback
UPDATE2: Right, I have backed up the fucked-up world & player data as it was, then restored the last good backup - sadly, that's from the 7th, not the 9th as I'd thought before.
I've also backed up all plugin data, which would include claims, etc.
I've brought the server back up now to see how things are. I'm going to have to go off to bed shortly as I feel like crap, but feel free to give it a whirl overnight and report any problems you might see.
I'd caution you not to do anything too important overnight; if something turns out to be really broken, I may have to restore from a backup again, so just test stuff out / arse about, but don't put too much effort into stuff just in case!
Tomorrow, I'll whip up a script to extract bonus claim block counts from the logs, so we can fix anyone who had their bonus claim blocks reset after voting.
Really, really sorry this shit has happened. If only it had been properly monitored, and if only Mr Murphy hadn't decided that it should happen over the end of a week/weekend where my boy has been ill, and I came down with a cold myself...
Anyway, monitoring of disk space (and other metrics) is now in place, and over the next few days I'll be setting up a much better backup routine, so that if we ever have future problems that need a restore from backups, we'll have something recent enough!
Re: Downtime, possible rollback
Further update: all *seems* to be as good as can be expected - things are working.
I've started a new backup method which will give us a full daily backup of the world & player data and, crucially, the various plugins' config etc., which wasn't being backed up regularly before.
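For anyone wondering what a daily full backup boils down to, here's a minimal sketch. It is not the actual backup method now running; the function name, archive naming, and directory layout are assumptions for illustration. Each run produces one date-stamped compressed archive containing every source directory.

```python
import datetime
import os
import tarfile


def make_backup(sources, backup_dir, today=None):
    """Write all `sources` into one date-stamped .tar.gz under `backup_dir`.

    `sources` would be things like the world directory and the plugins
    directory (hypothetical layout). Returns the path of the new archive.
    """
    today = today or datetime.date.today()
    os.makedirs(backup_dir, exist_ok=True)
    archive = os.path.join(backup_dir, f"backup-{today.isoformat()}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        for src in sources:
            # Store each source under its own directory name, not its full path
            name = os.path.basename(os.path.normpath(src))
            tar.add(src, arcname=name)
    return archive
```

Run once a day from cron (or similar), with `backup_dir` on a different drive from the data, this gives a restore point that's never more than a day old. Ideally the world is flushed to disk and saving paused while the archive is written, so the snapshot is consistent; old archives also need pruning eventually, or the backup drive fills up in turn.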
Those who lost bonus claim blocks should have them back now, after some scripted log-schleppage. If you think your claim block count is still incorrect, please let me know and I'll see what I can do.
Dynmap tiles will be showing the state of the world as it was before the rollback, but will re-generate when anything happens in those chunks.
So, at this point, I think we're good, albeit having lost a few days, and it should be safe to resume playing as normal.