OpenComputers: OC is haunted (or something about my setup)

Created on 10 Oct 2016 · 34 Comments · Source: MightyPirates/OpenComputers

We were running a server with a broken backup script (it never triggered save-on), and weird stuff started to happen:

Disks are lost (their UUID is lost)
Cables are haunted: nodes do weird stuff, e.g. a cable running from a case to nothing reports 255 components. This likely applies only to nodes created while save-off was already in place.

Asie says it may be caused by not calling markDirty on some entities properly.

bug-major

magik6k

👍1

All 34 comments

It happens not only with save-off: new cases/racks lose their tags (items, state, etc.), cables do the weird stuff mentioned above, and so on. To reproduce on multiplayer: place a computer, insert items, turn it on, go far enough away to let the chunks unload, come back, and look at your newly empty case.

This is with the newest Jenkins build for 1.10. It didn't happen with RC1, so the cause is somewhere in this range: https://github.com/MightyPirates/OpenComputers/compare/7179fcc4f181f494c6c908ab879cec46aa8be215...OC1.6-MC1.10

magik6k on 11 Oct 2016

I have a 70-block-long cable that is connected to 24 adapters. I've been seeing a problem where after leaving my base and coming back, the computer claims to have more than 300 (!) components connected and refuses to boot. Breaking and replacing all of the cable will solve this until the next time it happens.

OpenComputers-MC1.10.2-1.6.0.3-rc.1.jar

codewarrior0 on 24 Oct 2016

I saw this during the btm convention -- we never could find the root cause, and I never was able to reproduce the bug.

However, I was able to determine that the bad behavior occurred only in one specific chunk.

Is this hosted? Can you isolate the problem to one chunk?

payonel on 24 Oct 2016

It only happens on a server for me, in places where chunks may load/unload frequently. We got rid of the ghost nodes issue somehow (I have to remind myself how), but disks still change UUID sometimes.

magik6k on 24 Oct 2016

I am also having a similar issue. Today, I had a hard drive (tier 1) change its UUID, losing all information on the disk. I made a backup of the Lua code on a floppy, and the floppy has not changed its UUID. This only occurs when the chunk the computer resides in unloads and then loads again, whether through traveling away from the chunk or logging out of the server.

The computer can't handle chunk loads for some reason. It always has issues, whether it be the hard drive changing UUIDs, the main screen freezing, or phantom components adding themselves to the system.

Server is a creeperhost server
Modpack is FTB Infinity Lite for 1.10.2
OC version is 1.6.0.4

GB-Hijacker on 2 Jan 2017

Can anyone reproduce this using the latest version of OpenComputers (1.6.1)?

Vexatos on 2 Jan 2017

UUID rewrites aren't limited to hard drives, apparently. I had recorded my IE diesel generator's UUID when I first added an adapter to it, so that if I ever ran two I could call them by address instead of using the quick-and-dirty getPrimary that is good enough for a solo diesel. Today, I ran by the diesel room and checked the UUID of the adapter, and it had changed.

I chunk loaded the base as well today and all issues of UUID resets have gone away. I'll consider force loading 1.6.1 to my private server to see if this issue is still relevant on the up to date version.

GB-Hijacker on 3 Jan 2017

👍2

Update from my end. I ran a mod-by-mod install to see if I could locate the cause. I was able to consistently replicate a computer freeze issue and associate it with Aroma1997's Dimensional World, but I was unable to replicate the UUID reset.

I did update my private server to FTB Inf lite 1.5.1 with OC 1.6.1.6. The UUID rewrite still exists. I noted the UUID rewrite was restricted to adapters and hard drives. I have three separate computers running in my base, a central server, and two user interface terminals. All three had reset hard drives that necessitated OpenOS reinstalls and a rewrite of code (thankfully I backed up all code to pastebin). I put signs on all adapters before I tested the chunk unload/reload error, and all adapters changed UUIDs. Redstone I/O blocks did not reset.

http://imgur.com/a/O1NLH

Here are a bunch of screenshots from before and after the chunk load test.

Unfortunately, I haven't been able to replicate this on single player. It's only been something I can replicate on multiplayer, and trying to do a mod by mod test on my multiplayer server is not something I am able to do right now.

GB-Hijacker on 15 Jan 2017

👍3

Great find! My world is located in an RFTools dimension, so it may be related. Have you done the test in the overworld or inside the modded world?

magik6k on 15 Jan 2017

The only true tests I've run are in single player with FTB Infinity Lite for 1.10.2, v1.5.1. They were performed in a creative flat world at the original spawn point. I moved the spawn point to a random location so that the original spawn chunks could unload. I would then teleport to three random spots on the map, teleport back to the original chunk, inspect the computer, and check the UUID of the screen, the hard drive, and an adapter I had placed. I did load Immersive Engineering early in the test to force the adapter to have a UUID. However, I did not see a UUID rewrite during my tests. Doing this on my private server would be more than a pain in the butt.

I did check to see if the UUID rewrite was based around multiblocks reloading from disk to memory. To test this, I broke my crusher and rebuilt it. The UUID of the adapter didn't change. This was also replicated on my private server. My theory was the UUID of the adapter changed because the multiblock may have been reconstituted from the structure being reloaded to memory. However, forcing the multiblock to reform didn't change the UUID of the adapter. So I'm kind of back to square one on why this behavior is happening during a reload to memory.

GB-Hijacker on 15 Jan 2017

Can you reproduce it with blocks that are _not_ multiblocks placed against an adapter? Maybe a Vanilla furnace, or any EnderIO machine if you happen to have that mod along with Computronics installed. Mekanism machines should work as well. Just try a few blocks until you find one that the computer recognizes.

Vexatos on 15 Jan 2017

I couldn't replicate the UUID reset of the adapter on the vanilla furnace. Also of note, my IE arc furnace hasn't reset its UUID either. So the adapter issue might be relegated to an IE bug. It doesn't explain the hard drive rewrite, however.

GB-Hijacker on 15 Jan 2017

It might indeed. From what it looks like, IE only produces a computer component if the structure is a valid multiblock, meaning if, after a chunk reload, the multiblock is not valid even for a single tick, it will produce no component; if it then produces a component afterwards, it will be considered a new one and thus have a new UUID. Calling @BluSunrize.

It is likely completely unrelated to the Hard Drive issue.
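
The hypothesis above can be illustrated with a toy model. This is only a sketch in Python, not IE or OpenComputers code: the `Adapter` class, its `tick` method, and the use of a random UUID as the component address are all assumptions made for illustration.

```python
import uuid


class Adapter:
    """Hypothetical adapter that exposes a component only while the multiblock is valid."""

    def __init__(self):
        self.component_address = None

    def tick(self, multiblock_valid):
        if not multiblock_valid:
            # The structure is not (yet) valid: no component is exposed.
            self.component_address = None
        elif self.component_address is None:
            # A component appears where there was none: it counts as new.
            self.component_address = str(uuid.uuid4())
        return self.component_address


adapter = Adapter()
before = adapter.tick(True)   # multiblock formed, component created
steady = adapter.tick(True)   # stays valid, address is unchanged
adapter.tick(False)           # a single invalid tick after a chunk reload
after = adapter.tick(True)    # valid again, but now with a fresh address
print(before == steady, before == after)
```

If the real integration behaves like this, a single invalid tick after a reload would be enough to change the adapter's address, while breaking and rebuilding the multiblock by hand could behave differently depending on how the address is persisted.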

Vexatos on 15 Jan 2017

I'll update later today. I'm going to de-chunk-load my base, grab some UUID info, do some teleporting, and compare UUIDs for multiblocks, vanilla blocks, and OC components. I'll upload logs and maybe take a video if possible. My biggest concern is the hard drive reset, as it's the most damaging to the entire setup.

GB-Hijacker on 16 Jan 2017

Two videos for the price of one! At the beginning of each video, I'll make some screen selections to show the system is working at the beginning of a test. For the server, it'll flash Tx or Rx on the screen to show if it's sending or receiving.

Video 1: https://youtu.be/GWehU-iEcQ8

The first video shows the HDD corruption issue. Out of my three running computers, it hit my diesel room interface rig. This required an OpenOS reinstall and then a download of my interface software from my pastebin backup. I also tagged each multiblock adapter with an analyzer and made a video note of the hard drive UUID. I also want to note that the squeezer and day tank 2 had to be recoded as their respective blocks had UUID resets and they are called address specific (Day tank 1 and the fermenter are also address specific, but they didn't reset). For some reason I was disconnected a couple times from the server after I returned to the base. This video only shows one of the disconnects. I turned off recording after the second disconnect, but I already had the BSOD on camera.

I apologize for the lack of audio in this run. There was too much ambient noise in the background and I hadn't turned off my mic. So rather than inundate you with the sounds of my wife making lunch and my obnoxious voice replying, I muted the video.

Server log is here:

http://pastebin.com/y1T63wqV

Client log is here:

http://pastebin.com/TKDAK4Nf

The big winner here is the tier 2 HDD in the diesel room.

HDD UUID before starts with 9d34212a and after the reload, it became 9ec68939

Video 2: https://youtu.be/SotdskpBz9A

On the second video, I ran the test again after having reinstalled all software. I cut out a section where I tried to teleport away, but the TP didn't result in any errors. I ended up walking to the built village near my town again. This time, both interface computers had turned off, and the server had a BSOD due to too many components connected.

Server log: http://pastebin.com/4hbJnkXN

My client didn't produce a log for this session.

I hope some of this helps! I'm not seeing anything glaring on top that is causing this to happen, but all I can do is pull as much data together as possible and shotgun it to you guys.

UPDATE: I logged back in to fix up the server and get all my components back in order, and my diesel room computer had another BSOD due to a HDD UUID reset.

GB-Hijacker on 16 Jan 2017

👍2

Thank you for all the testing you do; unfortunately, I still have not been able to reproduce an HDD UUID reset myself. Another thing to test would be whether EEPROMs or floppy disks inside of running computers can get reset as well or whether it applies only to HDDs. It also might be worth checking whether it applies to a RAID block containing HDDs. Floppies, HDDs and RAIDs all share a certain part of code so finding the issue appear on floppies and RAIDs would narrow it down a bit.

Vexatos on 16 Jan 2017

I have the same bug here. My HDDs' ID got changed. Here is my modpack: http://pastebin.com/dycgvAYB

merithes on 16 Jan 2017

@Vexatos I'm afraid I can't assist at all here. I have had no hand in the computer support in IE, all of that was contributed by @malte0811.
You got any ideas on this one, Malte?

BluSunrize on 16 Jan 2017

@Vexatos, I haven't noticed floppies resetting, but that's not to say it couldn't happen. As far as EEPROM goes, I can check those as well. Components can change UUIDs from my testing. I had to update modem addresses recently when testing these resets. EEPROMs could go unnoticed because they would still reset into a LUA EEPROM since they're a different entity than a standard EEPROM.

I have two floppies I keep in storage in my test lab, both of them have managed to go through all this testing without a reset. I'll mount one to my tier 3 rig and see if it resets this evening.

GB-Hijacker on 17 Jan 2017

https://youtu.be/8HiFmSLwJVE

I took a floppy with some info I didn't care to keep and tossed it in my tier 2 case. The UUID reset and I lost the two programs I had backed up on it. There's some running around in this to access the floppy drive of the diesel room comp as I keep it hidden. Also, I didn't note this in the video, but the modem's address also reset for that computer. So, this reset seems to be able to affect any OC component.

The first three characters of the floppy's address changed from c30 to 43b.

Server log: http://pastebin.com/CrdPmfJ1

Client log: http://pastebin.com/xJcfz7sN

I'm thinking of going back mod by mod in single player and instead of teleporting away, running a required number of chunks away to see if that's what causes this to happen. I'm thinking my original test of teleporting wasn't the proper way of going about this.

GB-Hijacker on 17 Jan 2017

👍2

I'm afraid I just can't seem to reproduce this (in dev env at least, where it'd actually help :P).

However, I found out that a hack for an oddity in 1.8.9 is no longer needed (and actually probably breaking things now), so I removed that. I honestly don't see how this would cause this, but hey, worth a shot -- could you please see if build 69 is less broken for you?

Thanks a ton for the time you've put into tracking this down and documenting it! I really appreciate it.

Edit: FWIW, it's also nothing so obvious as markDirty not being called, sadly.

fnuecke on 29 Jan 2017

@GB-Hijacker Any news on this? We are apparently both unable to reproduce this issue ourselves...

Vexatos on 3 Feb 2017

It keeps happening to me, but I'm not active enough on my server to know the exact moment it resets.

merithes on 3 Feb 2017

I've been pretty busy with work, and haven't had a chance to run any tests. I'll try to make some time this upcoming week.

GB-Hijacker on 3 Feb 2017

Tried to repro it in FTB Infinity Lite 1.6.0 with no success :/ If anyone who runs into this could upload the world it's happening in that'd be great, thanks.

fnuecke on 5 Feb 2017

@fnuecke, would a world based on FTB Skyfactory 3.0.6 (MC 1.10.2, OC 1.6.1.6) with a haunted OC computer be usable for you?
Alternatively, I've loaded it in FTB Infinity Lite 1.6.0 and added a bridge between both bases.

P.S.
Components change UUID (and OC computer hangs) in northern base after going to the other base and back.

icestorm972 on 22 Feb 2017

Thanks a bunch! I'll give them a shot this weekend, let's hope something turns up.

fnuecke on 23 Feb 2017

First good news: got a repro with the SF save! Now to see if it still appears with the debugger attached, and then to see if I can find the why... so thanks again a lot for the save!

fnuecke on 24 Feb 2017

All right, so after being able to reproduce it, and a long time stepping through with the debugger, I realized that it was indeed fixed with build 69 .-. So yeah, been fixed for a month, but nobody tested? Anyway, new version incoming.

fnuecke on 25 Feb 2017

Uhm. I tried to test it with dev 79 before I posted the world. But as my OC computer hung up on boot/02_os.lua (too long without yielding) I thought I did something wrong on updating the mod in the pack...
and as I checked the UUIDs by typing "=components.list()" in lua on the OC computer before, I probably didn't check if the UUIDs in the tooltip were staying the same...

icestorm972 on 25 Feb 2017

Ah, hah. Well. I should've just tested with the dev version sooner, so never mind ;)

For anyone interested in the technicalities, here's what happened:

In 1.8.9 there was a change in behavior where writeToNBT suddenly got called after onChunkUnload. Which is dumb, because then cleanup runs before saving, leading to stuff being saved as stopped when it was actually running. So the hacky fix was to set a flag in onChunkUnload and, if that flag was set, clean up in writeToNBT.
Now this is already super broken, which I didn't think of at the time, because in theory writeToNBT might never be called (if the chunk isn't dirty). But never mind that.
Now the order is correct again, so cleanup didn't properly happen anymore, because the flag was not yet set when writeToNBT ran. This is kinda bad, but not too bad. However!
If a node network crosses a chunk border, and the node in the unloaded chunk isn't cleaned up, it doesn't tell its neighbor it's gone now... see where this is going?
Now if the chunk is loaded again without the neighbor chunk also having been unloaded, the nodes with the old addresses are still present, so to avoid duplicates the nodes get a new address when loading. That was the bug.
But! To reproduce it, you have to walk exactly far enough away that the one chunk is unloaded but the other isn't. That's pretty hard to do unless you know exactly what you're looking for.
What made this a ton easier to reproduce, though, was the EIO conduits, which ghostload the chunks they run through (by design), so no matter how far away you went, the neighboring chunk got reloaded right away, with no cleanup happening.
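
To make the ordering problem concrete, here is a minimal sketch of the flag workaround as described above. It is a Python toy model, not the mod's actual code: `Tile`, `unload`, and the `chunk_dirty` parameter are hypothetical names, and the two methods only mirror the roles of onChunkUnload and writeToNBT.

```python
class Tile:
    """Hypothetical stand-in for a tile entity that owns a network node."""

    def __init__(self):
        self.unloading = False   # flag set by the workaround
        self.cleaned_up = False  # True once the node has left its network

    def on_chunk_unload(self):
        # Workaround: only mark the tile, so the save still sees running state.
        self.unloading = True

    def write_to_nbt(self):
        # Cleanup happens here, and only if the flag is already set.
        if self.unloading:
            self.cleaned_up = True


def unload(tile, save_first, chunk_dirty=True):
    """Run both callbacks in the given order and report whether cleanup ran."""
    if save_first:
        if chunk_dirty:
            tile.write_to_nbt()
        tile.on_chunk_unload()
    else:
        tile.on_chunk_unload()
        if chunk_dirty:
            tile.write_to_nbt()
    return tile.cleaned_up


old_order = unload(Tile(), save_first=False)   # unload, then save (1.8.9 behavior)
new_order = unload(Tile(), save_first=True)    # save, then unload (order restored)
clean_chunk = unload(Tile(), save_first=False, chunk_dirty=False)
print(old_order, new_order, clean_chunk)
```

In this model cleanup only runs in the first case. With the restored order the flag is set too late, and with a chunk that is not dirty the save callback never runs at all, which matches the two failure modes described in the comment.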

So yeah. Fun times. Just wanted to get this off my chest :P

fnuecke on 25 Feb 2017

👍4 🎉3

Sorry I haven't been as helpful the past month. My new job has been a bit more taxing than I gave it credit, and we've had the flu swirling around the house for a couple weeks now.

I'm not running EIO conduits, so I'm curious what I could be running that's causing similar activity. If need be, I can grab my instance off my server and upload it. I'm running IE wires and conveyors and OC cables for energy/info/item movements.

GB-Hijacker on 26 Feb 2017

@GB-Hijacker fnuecke wasn't saying EIO conduits are necessary for a repro, just that they make a repro more likely.

payonel on 26 Feb 2017

That. Also, they're by far not the only tile entity in the world of modding that does ghostloading!
Edit: also, number two: if you have an OC network running across multiple chunks (i.e. long-running cables), the likelihood increases that at least one of them is still loaded when you walk away and come back, which also makes this issue easier to reproduce.
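
The cross-chunk effect can be sketched the same way. Again this is a Python toy model under stated assumptions, not OpenComputers code: the `Network` class, `reload_chunk`, and the `"disk-1"` address are invented for illustration, and the duplicate check is only a guess at how the address reassignment described in this thread could work.

```python
import uuid


class Network:
    """Hypothetical address registry kept alive by the still-loaded neighbor chunk."""

    def __init__(self):
        self.nodes = {}

    def join(self, address, owner):
        # An address that is still registered would be a duplicate: pick a fresh one.
        if address in self.nodes:
            address = str(uuid.uuid4())
        self.nodes[address] = owner
        return address

    def leave(self, address):
        self.nodes.pop(address, None)


def reload_chunk(network, saved_address, cleanup_ran):
    """Unload and reload one chunk; return the address the node ends up with."""
    if cleanup_ran:
        network.leave(saved_address)
    return network.join(saved_address, "hdd")


good = Network()
kept = reload_chunk(good, good.join("disk-1", "hdd"), cleanup_ran=True)

bad = Network()
changed = reload_chunk(bad, bad.join("disk-1", "hdd"), cleanup_ran=False)
print(kept, changed != "disk-1", len(bad.nodes))
```

Without cleanup the reloaded node comes back under a new address and the stale entry stays registered. That would be consistent with both the changed UUIDs and the growing component counts reported earlier, although the thread itself only confirms the address change.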

fnuecke on 26 Feb 2017
