Turicreate: How to solve MemoryError: std::bad_alloc?

Created on 26 Feb 2019 · 5 Comments · Source: apple/turicreate

Training runs normally on about 50 million rows, but fails on about 20 million rows with MemoryError: std::bad_alloc.

Output of the failing run:
Inferred types from first 100 line(s) of file as
column_type_hints=[int,int,float]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
Unable to parse line "5161,\N,1.0"
Unable to parse line "1025,\N,1.0"
Unable to parse line "1025,\N,1.0"
Unable to parse line "1025,\N,1.0"
Unable to parse line "1025,\N,1.0"
Unable to parse line "13922,\N,1.0"
Unable to parse line "13922,\N,1.0"
Unable to parse line "13922,\N,1.0"
Unable to parse line "14095,\N,1.0"
Unable to parse line "14095,\N,1.0"
Read 3391871 lines. Lines per second: 2.48496e+06
32 lines failed to parse correctly
Finished parsing file /media/dataset/conjunction.csv
Parsing completed. Parsed 22357436 lines in 4.42243 secs.
Finished parsing file /media/dataset/videos.csv
Parsing completed. Parsed 100 lines in 0.035012 secs.
Finished parsing file /media/dataset/videos.csv
Parsing completed. Parsed 17735 lines in 0.03144 secs.
Split trainingSet and testSet success!
TrainSet = 22327983
TestSet = 29453
+---------+--------+--------+--------+
| movieId | userId | rating | tname |
+---------+--------+--------+--------+
| 2 | 28 | 1.0 | Makeup |
| 2 | 28 | 1.0 | Makeup |
| 2 | 122 | 1.0 | Makeup |
| 2 | 3 | 1.0 | Makeup |
| 2 | 308 | 1.0 | Makeup |
| 2 | 67 | 2.5 | Makeup |
| 2 | 414 | 1.0 | Makeup |
| 2 | 225 | 1.0 | Makeup |
| 2 | 12 | 2.5 | Makeup |
| 2 | 70 | 1.0 | Makeup |
+---------+--------+--------+--------+
[22327983 rows x 4 columns]

Preparing data set.
Data has 22327983 observations with 154554 users and 60973 items.
Data prepared in: 14.2716s
Training model from provided data.
Gathering per-item and per-user statistics.
+--------------------------------+------------+
| Elapsed Time (Item Statistics) | % Complete |
+--------------------------------+------------+
| 2.454ms | 0.5 |
| 435.039ms | 100 |
+--------------------------------+------------+
Setting up lookup tables.
Processing data in 2 passes using dense lookup tables.
Traceback (most recent call last):
File "cosmodel.py", line 221, in <module>
tr.train('/media/dataset/','cosmodel/')
File "cosmodel.py", line 109, in train
self.calc_sim_matrix()
File "cosmodel.py", line 79, in calc_sim_matrix
self.item_sim_model = tc.item_similarity_recommender.create(self.trainSet, user_id='userId', item_id='movieId',target="rating",similarity_type='jaccard')
File "/usr/local/anaconda3/envs/turi/lib/python3.6/site-packages/turicreate/toolkits/recommender/item_similarity_recommender.py", line 257, in create
model_proxy.train(observation_data, user_data, item_data, opts, extra_data)
File "/usr/local/anaconda3/envs/turi/lib/python3.6/site-packages/turicreate/extensions.py", line 290, in <lambda>
ret = lambda *args, **kwargs: self.__run_class_function(name, args, kwargs)
File "/usr/local/anaconda3/envs/turi/lib/python3.6/site-packages/turicreate/extensions.py", line 274, in __run_class_function
ret = self._tkclass.call_function(fnname, argument_dict)
File "turicreate/cython/cy_model.pyx", line 35, in turicreate.cython.cy_model.UnityModel.call_function
File "turicreate/cython/cy_model.pyx", line 40, in turicreate.cython.cy_model.UnityModel.call_function
MemoryError: std::bad_alloc
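The 32 "Unable to parse line" warnings above are separate from the memory error: they cover 32 of roughly 22 million lines, and the parser simply drops them. Those rows contain `\N` in the movieId column, which is how some database exports write NULL (MySQL's SELECT ... INTO OUTFILE does this; that the file came from such an export is an assumption). The second run's log below also shows the header line repeated inside conjunction.csv. If you want to handle both explicitly before calling read_csv, here is a minimal pure-Python sketch. The helpers `clean_rows` and `clean_csv_file` are hypothetical, and dropping NULL rows rather than imputing them is an illustrative choice.

```python
import csv
from typing import Iterable, List

NULL_MARKER = r"\N"  # assumed NULL marker written by the database export


def clean_rows(lines: Iterable[str]) -> List[List[str]]:
    """Parse CSV lines, keep the first header, drop repeated headers and NULL rows.

    Hypothetical helper: dropping rows with a NULL field mirrors what the
    parser already does silently, but makes it explicit.
    """
    rows: List[List[str]] = []
    header = None
    for row in csv.reader(lines):
        if header is None:
            header = row      # first row is the real header
            rows.append(row)
        elif row == header:
            continue          # header repeated mid-file (e.g. concatenated exports)
        elif NULL_MARKER in row:
            continue          # some field is exactly \N, e.g. a missing movieId
        else:
            rows.append(row)
    return rows


def clean_csv_file(src_path: str, dst_path: str) -> int:
    """Write the cleaned rows to dst_path and return the number of data rows kept."""
    with open(src_path, newline="") as src:
        rows = clean_rows(src)
    with open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(rows)
    return max(len(rows) - 1, 0)  # exclude the header
```

The cleaned file can then be loaded with read_csv as before; this only removes the parse warnings and does not change memory usage.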

Output of the successful run:
(turi) root@APP-SF-01:/media/dataset# python isview.py
Finished parsing file /media/dataset/conjunction.csv
Parsing completed. Parsed 100 lines in 0.720085 secs.
Inferred types from first 100 line(s) of file as
column_type_hints=[int,int,int,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
Unable to parse line "userId,movieId,isViewed(rating),timestamp"
Unable to parse line "userId,movieId,isViewed(rating),timestamp"
Unable to parse line "userId,movieId,isViewed(rating),timestamp"
Unable to parse line "userId,movieId,isViewed(rating),timestamp"
Read 2067120 lines. Lines per second: 1.8101e+06
Unable to parse line "userId,movieId,isViewed(rating),timestamp"
Unable to parse line "userId,movieId,isViewed(rating),timestamp"
Unable to parse line "userId,movieId,isViewed(rating),timestamp"
Unable to parse line "userId,movieId,isViewed(rating),timestamp"
Unable to parse line "userId,movieId,isViewed(rating),timestamp"
Unable to parse line "userId,movieId,isViewed(rating),timestamp"
Read 22334431 lines. Lines per second: 3.58344e+06
Read 42600526 lines. Lines per second: 3.78682e+06
85 lines failed to parse correctly
Finished parsing file /media/dataset/conjunction.csv
Parsing completed. Parsed 53691947 lines in 13.7157 secs.
Finished parsing file /media/dataset/videos.csv
Parsing completed. Parsed 100 lines in 0.034708 secs.
Finished parsing file /media/dataset/videos.csv
Parsing completed. Parsed 17735 lines in 0.031793 secs.
Split trainingSet and testSet success!
TrainSet = 53532199
TestSet = 159748
+--------+---------+------------------+------------+--------+
| userId | movieId | isViewed(rating) | timestamp | tname |
+--------+---------+------------------+------------+--------+
| 71 | 131 | 1 | 1527424458 | None |
| 71 | 166 | 1 | 1527424555 | Makeup |
| 71 | 384 | 1 | 1527424819 | Talent |
| 71 | 94 | 1 | 1527424880 | Sport |
| 4 | 96 | 1 | 1527427424 | Funny |
| 4 | 44 | 1 | 1527427427 | Music |
| 4 | 44 | 1 | 1527427428 | Music |
| 4 | 99 | 1 | 1527427431 | Game |
| 105 | 563 | 1 | 1527427609 | Talent |
| 4 | 99 | 1 | 1527427642 | Game |
+--------+---------+------------------+------------+--------+
[53532199 rows x 5 columns]

Warning: Ignoring columns timestamp;
To use these columns in scoring predictions, use a model that allows the use of additional features.
Preparing data set.
Data has 53532199 observations with 69776 users and 19908 items.
Data prepared in: 31.8395s
Training model from provided data.
Gathering per-item and per-user statistics.
+--------------------------------+------------+
| Elapsed Time (Item Statistics) | % Complete |
+--------------------------------+------------+
| 28.15ms | 1.25 |
| 65.866ms | 100 |
+--------------------------------+------------+
Setting up lookup tables.
Processing data in one pass using dense lookup tables.
+-------------------------------------+------------------+-----------------+
| Elapsed Time (Constructing Lookups) | Total % Complete | Items Processed |
+-------------------------------------+------------------+-----------------+
| 787.358ms | 0 | 0 |
| 3.82s | 11.75 | 2377 |
| 6.80s | 23 | 4605 |
| 9.79s | 34.25 | 6830 |
| 12.79s | 45.5 | 9072 |
| 15.81s | 57 | 11373 |
| 18.80s | 68.5 | 13637 |
| 21.80s | 79.5 | 15874 |
| 24.79s | 91.75 | 18308 |
| 29.69s | 100 | 19908 |
+-------------------------------------+------------------+-----------------+
Finalizing lookup tables.
Generating candidate set for working with new users.
Finished training in 32.4772s
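Comparing the two logs suggests why the smaller dataset is the one that fails: the item similarity lookup tables seem to scale with the number of items, not the number of rows. The failing run has 60,973 items and reports "Processing data in 2 passes", while the successful run has 19,908 items and finishes in "one pass". The sketch below is a back-of-the-envelope estimate only. It assumes, without confirmation from the Turi Create source, a dense table with one 8-byte value per item pair and an 8 GiB budget; under those assumptions it reproduces the pass counts in both logs. The helpers `item_pairs` and `estimate_passes` are hypothetical.

```python
import math

GIB = 1024 ** 3


def item_pairs(num_items: int) -> int:
    """Number of unordered item pairs (upper triangle of an items x items matrix)."""
    return num_items * (num_items - 1) // 2


def estimate_passes(num_items: int, budget_bytes: int, bytes_per_pair: int = 8) -> int:
    """Rough number of passes if the dense pair table must fit in budget_bytes.

    bytes_per_pair=8 is an assumption, not a documented Turi Create constant.
    """
    table_bytes = item_pairs(num_items) * bytes_per_pair
    return max(1, math.ceil(table_bytes / budget_bytes))


for label, items in [("failing run", 60973), ("successful run", 19908)]:
    size_gib = item_pairs(items) * 8 / GIB
    print(f"{label}: {items} items, ~{size_gib:.1f} GiB table, "
          f"{estimate_passes(items, 8 * GIB)} pass(es) at 8 GiB, "
          f"{estimate_passes(items, 2 * GIB)} pass(es) at 2 GiB")
```

Under these assumptions the failing run needs about 13.8 GiB of table, so each of its 2 passes would try to use close to the full 8 GiB budget, which would explain std::bad_alloc on a machine with less free memory than that. A smaller budget should trade memory for more passes and a longer training time.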


Most helpful comment

There's a parameter in the item similarity model's create method called target_memory_usage. By default it's 8GB; try setting it to 2GB or something lower.
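As a sketch of the suggested fix, the create() call from cosmodel.py (line 79 in the traceback) can pass target_memory_usage explicitly. The value is given in bytes, so 2 GB is 2 * 1024**3. The helper names `memory_budget_bytes` and `train_item_similarity` are hypothetical; the other create() arguments are copied from the traceback, and turicreate is imported inside the function so the module loads even where Turi Create is not installed.

```python
GIB = 1024 ** 3  # target_memory_usage is given in bytes


def memory_budget_bytes(gib: float) -> int:
    """Convert a budget in GiB to the byte count target_memory_usage expects."""
    return int(gib * GIB)


def train_item_similarity(train_sframe, budget_gib: float = 2):
    """Hypothetical wrapper around the call from cosmodel.py with a smaller memory budget."""
    import turicreate as tc  # imported lazily; requires Turi Create to be installed

    return tc.item_similarity_recommender.create(
        train_sframe,
        user_id="userId",
        item_id="movieId",
        target="rating",
        similarity_type="jaccard",
        target_memory_usage=memory_budget_bytes(budget_gib),  # default is 8 GiB
    )
```

If training still runs out of memory, lowering budget_gib further should help, at the cost of more passes over the data and a longer run.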

All 5 comments



Thanks a lot, I'll try it now.


It works! Thank you!

This is a great thread. Thanks! I was dealing with the same issue.

Just posting the link to the line of code so it's easy for people to see if needed:

https://github.com/apple/turicreate/blob/46ca01b6baace808b32c3b7f67bdba6ca494efde/src/python/turicreate/toolkits/recommender/item_similarity_recommender.py#L26
