Turicreate: Kernel dies when trying to train object detection model with Radeon 560 GPU

Created on 5 Dec 2018 · 27 Comments · Source: apple/turicreate

I have a 2018 MacBook Pro with a Radeon 560 GPU. I can train an object detection model on the CPU using turicreate, and I can use my Radeon GPU to train other models using "plaid", but every time I train with turicreate and it chooses my AMD Radeon 560 GPU, my kernel dies before training starts and I get this message:

/BuildRoot/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MetalPerformanceShaders-121.1.1/MPSCore/Utility/MPSLibrary.mm:218: failed assertion `MPSKernel MTLComputePipelineStateCache unable to load function cnnConv_Update_32x64.
Compiler encountered an internal error: (null)

I'm on turicreate 5.2.

Thanks

object detection p1 toolkits

All 27 comments

@HugoLamarreFTC I am getting the same error. Are you using Mojave? If so, what version?

I am on 10.14.2
turicreate 5.2
same machine as above.

10.14.2 Beta (18C52a)

I have the same issue too. 2018 MBP 15" with touchbar, Radeon Pro 560X. Turicreate 5.2.

I am almost certain it is an issue with 10.14.2. My training Python script ran fine on the GPU before; once I updated to 10.14.2, it started failing with the above message.

Version 10.14.3 Beta (18D21c) does not fix it. Maybe I'll try going back to an earlier version...

Experiencing the same issue. For now I've reverted to CPU processing.

Same here. I tried going back to Mojave 10.14 but can't download it from Apple, and I don't want to go back to High Sierra, so I guess it's a waiting game for now. It's disappointing, having bought this new 2018 MacBook Pro because it could run turicreate with a lot of success. Wait... wait... wait... for a fix.

The same issue here!

>>> model = tc.object_detector.create(train_data)
Using 'image' as feature column
Using 'annotations' as annotations column
Downloading https://docs-assets.developer.apple.com/turicreate/models/darknet.params
Download completed: /var/folders/b0/s13jxsdx11j2l61qfbh3lkn40000gp/T/model_cache/darknet.params
Setting 'batch_size' to 32
Using GPU to create model (AMD Radeon RX Vega 56)
Setting 'max_iterations' to 1000
/BuildRoot/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MetalPerformanceShaders-121.1.1/MPSCore/Utility/MPSLibrary.mm:218: failed assertion `MPSKernel MTLComputePipelineStateCache unable to load function cnnConv_Update_32x32.
    Compiler encountered an internal error: (null)
'
Abort trap: 6

Mojave 10.14.2 (18C54), MacPro5,1 (Mid 2012) with a third-party Radeon RX Vega 56 graphics card hits the issue above.

Third-party graphics cards for Mac Pro

But it works fine again when I use another third-party graphics card, a Radeon R9 280X, on Mojave 10.14.2.

I guess it's a bug that affects the latest GCN architecture graphics cards (Polaris / Vega). Hope it can be fixed soon.

@HugoLamarreFTC I assume you mean you have Radeon 560X? I believe you either have a 2018 MBP with Radeon 560X or a 2017 MBP with Radeon 560.

I've seen issues with the 560X but not with the Radeon 560 (2017 MBP) or Radeon Pro Vega 64 (2017 iMac Pro). We're investigating, but at this time our best guess is that this is something at the Metal layer or lower, so a fix may require a future macOS release.

Thanks for your efforts. Depending on where you check on the Mac, the Radeon info differs! But I finally got the right info: I have a 2018 MacBook Pro 15-inch with a Radeon Pro 555X (4GB of GDDR5 memory). I'll keep you informed when future macOS updates come out; I install all betas to do development.

Same error here on the Radeon 560X with macOS 10.14.2

Same error.

Materializing SFrame
Creating model...
Setting 'batch_size' to 16
Using GPU to create model (AMD Radeon Pro 455)
Setting 'max_iterations' to 3000
/BuildRoot/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MetalPerformanceShaders-121.1.1/MPSCore/Utility/MPSLibrary.mm:218: failed assertion `MPSKernel MTLComputePipelineStateCache unable to load function cnnConv_Update_32x64.
    Compiler encountered an internal error: (null)
'
Abort trap: 6

MacBook Pro (15-inch, 2016)
2.7 GHz Intel Core i7
16 GB 2133 MHz LPDDR3
Radeon Pro 455 2048 MB
Intel HD Graphics 530 1536 MB

Same error here. I'm using a dual SAPPHIRE AMD Radeon RX Vega 64 eGPU setup.
Any good news on this issue so far? @HugoLamarreFTC

Not really... I just tested it with 10.14.3 Beta (18D38a) and I'm still having the same problem... Guess I'll keep trying at every OS release. Keep posting info here.

Well, I got my hands on a 2018 MBP with Radeon Pro 560X and I can definitely repro this 100% of the time. With the exact same data, I have no problem on my 2017 MBP with Radeon Pro 560 (non-X). I'll be following up to see that this gets fixed, probably somewhere beneath us....

Also have the same problem.

macOS 10.14.2
MacBook Pro (15-inch, 2018)
Radeon Pro 555X 4096 MB

Getting the same error with an MBP, and it is showing the correct type. I am on turicreate 5.2.
If I set the number of GPUs to zero, it works fine.
Here is the output:

Using 'image' as feature column
Using 'annotations' as annotations column
Setting 'batch_size' to 32
Using GPU to create model (AMD Radeon Pro 460)
Setting 'max_iterations' to 2000
/BuildRoot/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MetalPerformanceShaders-121.1.1/MPSCore/Utility/MPSLibrary.mm:218: failed assertion `MPSKernel MTLComputePipelineStateCache unable to load function cnnConv_Update_32x64.
Compiler encountered an internal error: (null)
'
/Users/turbogeek/.anaconda/navigator/a.tool: line 1: 91466 Abort trap: 6 /Users/turbogeek/anaconda3/envs/turicreate/bin/ipython -i

Same issue here:

Radeon Pro 580 8192MB
Python 3.6.5.1
turicreate 5.2.1
macOS 10.14.2

@HugoLamarreFTC Let me know if you still see this issue after upgrading to 10.14.4 beta!

Works using 10.14.4 Beta (18E174f) !!!
Problem solved for my situation.

Is there a solution for those of us who aren't on the macOS 10.14.4 beta?

It returned to work in 10.14.4 Beta (18E174f)

Radeon RX Vega 56
Python 3.6.5
turicreate 5.2.1

Great!

I am running 10.14.3 with turicreate 5.2.1 and it is still crashing
AMD Radeon Pro 460

@turbogeek Yes, I think the relevant macOS-side fixes only went in with 10.14.14

@elbowdonkey As I understand it, the issues are at the AMD driver level, so I don't see a reasonable way for us to work around these issues on our end while still using the GPU. Of course, you should always be able to train on CPU using turicreate.config.set_num_gpus(0), but that will obviously be slower.
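
For anyone who wants to script the CPU fallback, here is a minimal sketch. It assumes the affected releases are 10.14.2 and 10.14.3, which is only what people have reported in this thread, not an official list. The helper names (parse_version, is_affected_macos, configure_turicreate) are made up for illustration; the only turicreate call used is config.set_num_gpus.

```python
import platform


def parse_version(version):
    """Turn a dotted version string such as '10.14.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def is_affected_macos(version, first_bad=(10, 14, 2), first_fixed=(10, 14, 4)):
    """Return True if version falls in the range reported as crashing.

    ASSUMPTION: the bounds come from user reports in this thread
    (crashes on 10.14.2 and 10.14.3, working again on the 10.14.4 beta).
    An empty string (non-macOS platform) is treated as not affected.
    """
    parsed = parse_version(version)
    if not parsed:
        return False
    return first_bad <= parsed < first_fixed


def configure_turicreate():
    """Force CPU training on affected macOS releases; otherwise change nothing."""
    try:
        import turicreate as tc
    except ImportError:
        return False  # turicreate is not installed, nothing to configure
    if is_affected_macos(platform.mac_ver()[0]):
        tc.config.set_num_gpus(0)  # train on CPU only
        return True
    return False


if __name__ == "__main__":
    if configure_turicreate():
        print("Affected macOS release detected; training will use the CPU.")
```

Call configure_turicreate() before tc.object_detector.create(...). The check looks only at the macOS version, not at the GPU model, so it will also fall back to CPU on machines whose GPU was never affected.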

Please do let us know if anyone sees this crash on 10.14.14 or later (note what GPU you have)! (And of course, let us know if you encounter any other issues running on 10.14.14, especially while it's still in preview and there's hope of fixing any issues specific to that macOS release!)

@nickjong I believe you mean this bug is fixed in 10.14.4, right?

@dmcgloin Yes, I believe this bug is fixed in 10.14.4. Let me know if you experience otherwise! (I don't know precisely what GPUs were affected or fixed. Certainly Radeon Pro 560X.)

Encountered the same issue with an AMD Radeon Pro 460 on 10.14.3. Can confirm issue is resolved with 10.14.4 beta.

Encountered the same issue with an AMD Radeon Pro 455 on 10.14.3. Can confirm issue is resolved with 10.14.4 beta.
