Wp-calypso: Import: Images are sometimes not imported or backfilled

Created on 14 Aug 2017 · 79 Comments · Source: Automattic/wp-calypso

Sometimes, when an import wraps up, not all of its images have been imported. Other times, the images appear to be imported, but references to their original paths are not updated to the new paths on WordPress.com. When the source site is taken offline, this can result in broken images, and even in irrevocably lost images if the content owner doesn't have backups of them.

Historically, we've had a series of scripts we could run to fix up as many of these cases as possible, but we've never properly solved the root cause. This is more of an API issue than a Calypso issue, and it affects wp-admin imports too. It might make sense to move it to Trac, but my worry is that Trac isn't as visible as this issue tracker, so I'm starting here.
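None of the actual fix-up scripts appear in this thread, so the following is only a hypothetical Python sketch of what a reference backfill pass might look like for the "images imported, but paths not updated" case. It assumes a mapping from original image URLs to their new WordPress.com URLs already exists (for example, built from the imported attachments); `backfill_srcs`, `url_map`, and `old_prefixes` are illustrative names, not part of any WordPress.com API. It rewrites mapped `src` attributes and reports any URL that still points at the old site but has no mapping, which corresponds to the other failure mode: media that was never imported.

```python
import re
from typing import Dict, List, Tuple

# Matches src="..." or src='...' (but not data-src=...) in post HTML.
SRC_ATTR = re.compile(r'''(?<![\w-])src=(["'])(.*?)\1''')


def backfill_srcs(
    content: str, url_map: Dict[str, str], old_prefixes: Tuple[str, ...]
) -> Tuple[str, List[str]]:
    """Hypothetical backfill: rewrite src URLs found in url_map (old -> new).

    Returns the rewritten content plus a list of src URLs that still start
    with one of old_prefixes but have no mapping (likely never imported).
    """
    missing: List[str] = []

    def replace(match: re.Match) -> str:
        quote, url = match.group(1), match.group(2)
        if url in url_map:
            # Known attachment: point the reference at its new location.
            return f"src={quote}{url_map[url]}{quote}"
        if url.startswith(old_prefixes):
            # Still references the old site, but nothing was imported for it.
            missing.append(url)
        return match.group(0)  # leave unmapped references untouched

    return SRC_ATTR.sub(replace, content), missing
```

For example, given post HTML referencing `/wp-content/uploads/2017/08/a.jpg` and `/wp-content/uploads/2017/08/b.jpg`, and a map entry only for `a.jpg` (pointing at a hypothetical `https://example.files.wordpress.com/2017/08/a.jpg`), only the first `src` is rewritten and `b.jpg` is reported as missing. A regex keeps the sketch short; a real script would more likely parse the post HTML properly.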

Unfortunately, the issue can be caused by a number of things, from transient failures in our async jobs system, to downtime on the source site, to intentional throttling by source service providers. I'm skeptical that we'll be able to develop consistent reproduction steps. This issue is therefore meant to collect reports so that we can at least get a sense of how frequently the problem comes up, and hopefully get it escalated.

Some related background: p195om-3uN-p2

Some recent reports:

  • p2UL9c-3xX-p2
  • p2UL9c-3xr-p2
  • p2UL9c-3vQ-p2
Import [Type] Bug

All 79 comments

Hey there! Rachel recommended that I mention this issue here as it looks like what my user is experiencing. Any tips would be greatly appreciated so I can give the user some type of reply soon! Thank you!

p1503466102000019-slack-triage

Just to expand on a few more of the details here... The images down the home page of [redacted] are not appearing, such as under Services. This seems to be related to an issue David N. brought to my attention regarding Photon requests pointing to the incorrect domain and path.

We received a user report in 709826-zd-woothemes. The img src referenced the old URL after import; the issue was fixed manually with a script in this case (ref: p1506710605000247-slack-triage).

ticket is 767612-zd-woothemes

The user moved from no-tar-sands.org to the WordPress.com site notarsandsblog.wordpress.com and is waiting to use the no-tar-sands.org domain here, since many images are still at the old host, such as the first image here:
https://notarsandsblog.wordpress.com/2016/12/09/stay-vigilant-keep-standing-with-standing-rock/

Do we have scripts to both make sure all media gets moved here (user said "there were 430 items in the media library on the old site and only 173 imported to the new one" ) and then update all media links in their content itself?

@rachelmcr seems like you are the best person to ping - thanks for any help!

Adding a note for the above from @spncrb: the customer is eager to change the name servers over ASAP to keep email online. I've asked for more detail about what is happening today that might affect email, in case that would also take the images they need offline.

Adding another note to @spncrb's comment: their hosting expires next Tuesday and they need this fixed before then.
I have asked them to re-import only the posts from 2016 and check whether the image on post https://notarsandsblog.wordpress.com/2016/12/09/stay-vigilant-keep-standing-with-standing-rock/ was moved over, in case splitting up the import helps.

Adding another note to 767612-zd. User is feeling stressed with their hosting running out tomorrow. Are we able to help before the hosting expires? @dllh Is there someone else we can ping, as I think these types of issues are handled by a different team now? Thanks

There's not another team handling the one-off fixes (we're not doing them anymore). We've shifted investigation of the underlying cause away from HG but I don't believe a team has picked it up yet.

Adding this ticket: 789482-zen

This came up again in 44039-hc.

Discussion in p195om-3BU#comment-15747 and p1511181091000127-slack-triage.

884861-f

This came up again in 824290-zen.

Another report in p2EDhh-j6-p2

Another one where it seems not all the images got imported 825044-zen

Another ticket here, from a WordPress.com-to-WordPress.com import. 813456-zen

Another ticket here in 825715-zen. As a workaround for this user, I clicked the blank image in the editor, and the image displayed in Edit mode. After I clicked Update, the image was added back to the editor, so even though the reference to the image was wrong, the image had been imported after all.

This issue occurred in 843085-zen. User downgraded from Atomic to Simple and the uppercase letters in the image paths were not converted to lowercase. I was able to resolve this issue with the post-import-backfill-attachments.php script.

This issue occurred again during a downgrade from Atomic. p8yzl4-14y-p2

Updating from @nellofn's comment above, 825715-zen is now here: 885286-zen

Two questions:

  1. @kevmarsden are you able to use your super script powers on this one?
  2. When this import issue is fixed, do we expect it will retroactively backfill the sites that have already been affected? If not, could/should we have someone (or a list of someones) with a sandbox in place to monitor and fix these up as they come in for now?

I worry that we're collecting customer tickets here, and those customers are waiting for a fix that might not actually restore their images if it only addresses future imports.

@chad1008

  1. I ran a script to update the image srcs that were pointing to ../wp-content/uploads, but there are still 100+ images that didn't get imported. I posted about it on p8yzl4-17p-p2.
  2. We probably won't be able to retroactively backfill images, so it's best not to make any promises. In the meantime, I'm trying to figure out a temporary workflow.

A user reported image URLs not backfilled correctly after an import when reverting from an Atomic to Simple site. It's resolved for this user; reporting here to track frequency.

Internal refs: 886658-zen, 1462823-hc, p2EDhh-jN-p2

Another issue with an Atomic revert. I was able to fix it with a script from my sandbox. p8yzl4-18o-p2

Another issue here 885591-zen

Every media file failed to import, while everything else moved over. The original site is still live. @kevmarsden is this something you might be able to help with?

Another site had an issue after reverting from Atomic. I was able to fix it with import scripts. Ticket: 903221-zen

User reports that they imported three weeks of content from one WP.com site into another, existing WP.com site. The posts came over, but the media did not, and the images embedded in the imported posts are loading from the original site, which is still live. The user can manually upload what they say are 3,800 media items but would still need the links updated. @kevmarsden, could you take a look?

928324-zen & 1662819-hc

The user's media seems to have been imported but is showing up empty in blog posts.

931967-zen

Another user came in chat and here is a follow up ticket: 950411-zen

The images are still pointing to a previous host, and some images are not coming through as featured images.

950276-zen

Images in posts were still referencing the self-hosted site rather than being pulled into WordPress.com and attached to the posts.

Again in 952255-zen and p2EDhh-l1-p2.

The images were imported but were referencing the older site. (This has been fixed now but adding to make sure it's tracked.)

1084856-zen

User is trying to migrate content from hoochiekoochie.skynetblogs.be (not a WordPress website) to hoochiekoochie.blog. They successfully imported the content but images did not import. I attempted a few other imports into a test website with no luck. The user's original website account will be shut down at the end of the year and they want to save the images.

@kevmarsden, would you be willing to take a look?

@benchilcote I was able to import the images and update the references using a script from my sandbox (image-import.php). I updated the user in 1084856-zen

Another issue was reported here: 1126731-zen
Replacing the URLs manually works, as the images do seem to have been imported, but that might be quite a bit of work for the user. Is there a way we could update this for them using the script @kevmarsden ran for Ben's case above?

Another report in https://en.forums.wordpress.com/topic/some-photos-missing/ - images imported to library, but src URLs not rewritten.

For posterity: I've helped with the last two issues, but wasn't able to help with this one due to the way the image paths are formatted. ^

Another one reported in chat here: 3989459-hc Can we please try to run the script to clean this up for the user?

Sent follow-up here: 1192901-zen
Site: ischerzo.com

1192637-zen, 3987868-hc
MP3 file paths did not update on an import from HostGator to WordPress.com for https://wealthisnotmoney.wordpress.com. The file paths were being used in the audio shortcode. @kevmarsden was able to repair them.

Issue reported in 1213806-t

The user has imported a site to WordPress.com. Gallery images are missing from blog posts:

Example Post on WordPress.org:
https://www.whatevernevermind.com/shrinkwrapped-whatkatiedoes-neverending/

Example Post on WordPress.com: https://whatevernevermind339609687.wordpress.com/2009/01/27/shrinkwrapped-whatkatiedoes-neverending/

The image exists in the image library but has not been added to the gallery in the post.

@kevmarsden could you please take a look?

Update July 9, 2018: The user has deleted old posts on their site and is interested in moving back to DreamHost, so the issue no longer exists.

While doing an Atomic revert, I needed to use the image-import.php script to import the images. 1263011-zen

This came up in p2EDhh-se-p2 too. The user moved from .org to .com and, although the images were imported to the Media Library, the images still used the URL of the old site. I was able to run a script to update the URLs.

Another case in p2EDhh-si-p2.

Another issue in p2EDhh-ss-p2. This one was for an export/import between WordPress.com sites. The image URLs still referenced the old WordPress.com site.

Another issue with an import between WordPress sites - 1276607-zen.

Another: 1297249-zen

Another 294817-h

User reports missing photos after import (which she was able to re-import from Google Photos), and none of the photos are appropriately attached to her imported posts.

Missing some photos after import, and image URLs in post reflect the old site: 1316685-zen

Another report: 1336432-zen
Some photos are missing from the import, but that's likely because, for some reason, they weren't included in the XML file that was originally exported.

Another report: 1354776-zen

The media library is missing 11 of the original site's 79 images (possibly they were just unattached images).

In addition, the src URLs are referencing the original site instead of the wp.com file locations.

Note: the domain name has been moved over, but there is a temporary domain active on the original self hosted site (please see the linked ticket for those URLs)

Another report: 5959271-hc

Their images are showing up in their media library, but the image links are all pointed to their old (no longer available) .org file format - /wp-content/uploads/

As a note, do we have an easy resolution for this on simple sites? @kevmarsden, seeing you as a go-to ping here, so... Anything that can be done here?

Followup: 1371530-zen

Another report: 1380258-zen

Image paths in blog posts are appearing as /wp-content/uploads instead of the WPcom path format.
The images themselves appear to have been imported.

Update on the previous: 1380258-zen ; Not all images were imported - some are missing.
Images that were in a sized format (cropped, etc.) didn't come over, which makes it harder to manually replace the image paths from the WordPress.org format to the WordPress.com format.

Another report: 6069173-hc, 1381646-zen

Image paths in blog posts are appearing as /wp-content/uploads instead of the WPcom path format.
The images themselves appear to have been imported.

Another report: #6048425-hc, #1379807-zen, #6048425-hc

All from the same user. Content exported from an AT site and imported into a simple site. Images are still referencing back to the reverted AT site.

Another report: #1517810-zen (created from #7534292-hc)

Content imported from self-hosted. Images were all imported, but img srcs in blog posts stayed as /wp-content/uploads instead of the WPcom path format.

Another report:

1536032-zen

Images are imported, but all srcs in posts still point to the self-hosted WordPress.org site.

Another example where image URLs were not updated in post content during import from a self-hosted site. 1538854-zen

Another example where images weren't updated correctly in 1543139-z.

Another example in 1548127-z.

I just made a request for 1549815-zen

Another example in 1585987-z.

Another example in 1590119-zen

6648934-hc

Another one 1562445-zen

8465016-hc

Also came up in p2EDhh-BG-p2.

9102509-hc

1701671-zen

1721601-zen

Came up again in p2EDhh-EL-p2.

A couple more cases:

  • p2EDhh-EP-p2
  • p2EDhh-EN-p2

This came up again in 1783827-zen.

Again in 1832421-zen after a user's Atomic site was reverted.

Also came up in these P2 posts:

  • p2EDhh-Gz-p2
  • p2EDhh-GC-p2

We know the pain incomplete media imports can cause people. As @dllh says in the intro, there can be several different reasons why they happen. WordPress.com has a team of developers taking a close look at the whole process in the back end, trying to figure out the most common causes and address them.

Our aim is to make media imports generally more robust and reliable. Along the way, we're fixing some of the issues people have noticed, like when the link between posts and media files is sometimes lost, or when featured images go missing. Our work in this area continues, and we hope people will soon start to see the results in Calypso!

p3Ex-3qr-p2

Other cases:

  • p2EDhh-It-p2
  • p2EDhh-Ip-p2
  • p2EDhh-Ir-p2
  • p2EDhh-Jc-p2
  • p2EDhh-IP-p2

Another case here: p2EDhh-L2-p2

Another one here: p2EDhh-L9-p2

This came up again in 2146804-zen and discussed in p2EDhh-OZ-p2.

I have broken links in the hard-reverted Atomic site after importing from the backup site: 2191218-hc
