Describe the bug
Calling DataFrame.as_gpu_matrix(order='C') is far slower than calling DataFrame.as_gpu_matrix() and then applying transpose() twice, even though both produce the same C-contiguous array. In the run below, the first approach takes about 248 s and the second about 0.52 s.
Steps/Code to reproduce bug
import cudf
import numpy as np
import time
n = 1000000
m = 50
gdf = cudf.DataFrame()
for i in range(m):
    gdf[i] = np.random.random_sample(n)
for c in gdf.columns:
    gdf[c] = gdf[c].astype(np.float32)
st = time.time()
gmat = gdf.as_gpu_matrix(order='C')
print("time taken", time.time()-st)
time taken 247.8072156906128
print(gmat.flags)
{'F_CONTIGUOUS': False, 'C_CONTIGUOUS': True}
st = time.time()
mat = gdf.as_gpu_matrix()
tmat = mat.transpose().transpose()
print("time taken", time.time()-st)
time taken 0.520819902420044
print(tmat.flags)
{'F_CONTIGUOUS': False, 'C_CONTIGUOUS': True}
#check the values
np.array_equal(gmat.copy_to_host(), tmat.copy_to_host())
True
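For anyone who wants to see the layout equivalence without a GPU, here is a minimal host-side sketch using NumPy only. It is not cudf code: the small Fortran-ordered float32 matrix `f_mat` is a hypothetical stand-in for the default as_gpu_matrix() result, and `np.ascontiguousarray` stands in for the C-order conversion.

```python
import numpy as np

# Hypothetical stand-in for the column-major matrix (not a cudf object).
rng = np.random.default_rng(0)
f_mat = np.asfortranarray(rng.random((1000, 5), dtype=np.float32))

# Convert to row-major (C order); this copies the data into a new buffer.
c_mat = np.ascontiguousarray(f_mat)

# Layout flags differ between the two arrays...
print(f_mat.flags['F_CONTIGUOUS'], f_mat.flags['C_CONTIGUOUS'])
print(c_mat.flags['F_CONTIGUOUS'], c_mat.flags['C_CONTIGUOUS'])

# ...but the values compare equal element by element.
print(np.array_equal(f_mat, c_mat))
```

This only illustrates that a change of memory order leaves the values unchanged; it says nothing about the GPU timings reported above.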
Environment details
<details><summary>Click here to see environment details</summary><pre>
***git***
Not inside a git repository
***OS Information***
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Linux a293a33e114b 4.4.0-154-generic #181-Ubuntu SMP Tue Jun 25 05:29:03 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
***GPU Information***
Fri Aug 2 03:45:22 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-SXM2... On | 00000000:05:00.0 Off | 0 |
| N/A 36C P0 43W / 300W | 2507MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-SXM2... On | 00000000:06:00.0 Off | 0 |
| N/A 33C P0 44W / 300W | 293MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P100-SXM2... On | 00000000:84:00.0 Off | 0 |
| N/A 28C P0 35W / 300W | 10MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P100-SXM2... On | 00000000:85:00.0 Off | 0 |
| N/A 35C P0 44W / 300W | 5890MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
***CPU***
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 56
On-line CPU(s) list: 0-55
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2650L v4 @ 1.70GHz
Stepping: 1
CPU MHz: 1999.957
CPU max MHz: 2500.0000
CPU min MHz: 1200.0000
BogoMIPS: 3402.97
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 35840K
NUMA node0 CPU(s): 0-13,28-41
NUMA node1 CPU(s): 14-27,42-55
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt ssbd ibrs ibpb stibp kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d
***CMake***
/conda/bin/cmake
cmake version 3.14.5
CMake suite maintained and supported by Kitware (kitware.com/cmake).
***g++***
/usr/bin/g++
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
***nvcc***
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
***Python***
/conda/bin/python
Python 3.7.3
***Environment Variables***
PATH : /conda/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LD_LIBRARY_PATH : :/usr/local/cuda/lib64:/usr/local/lib:/conda/lib:/usr/local/cuda/lib64:/usr/local/lib
NUMBAPRO_NVVM : /usr/local/cuda/nvvm/lib64/libnvvm.so
NUMBAPRO_LIBDEVICE : /usr/local/cuda/nvvm/libdevice/
CONDA_PREFIX :
PYTHON_PATH :
***conda packages***
/conda/bin/conda
# packages in environment at /conda:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
absl-py 0.7.1 pypi_0 pypi
arrow-cpp 0.12.1 py37h0e61e49_0 conda-forge
asn1crypto 0.24.0 py37_0
astor 0.8.0 pypi_0 pypi
atomicwrites 1.3.0 py_0 conda-forge
attrs 19.1.0 py_0 conda-forge
backcall 0.1.0 py_0 conda-forge
blas 1.0 mkl
bleach 3.1.0 py_0 conda-forge
bokeh 1.2.0 py37_0
boost 1.68.0 py37h8619c78_1001 conda-forge
boost-cpp 1.68.0 h11c811c_1000 conda-forge
boto 2.49.0 pypi_0 pypi
boto3 1.9.162 py_0
botocore 1.12.163 py_0
bzip2 1.0.7 h7b6447c_0
ca-certificates 2019.5.15 0
certifi 2019.6.16 py37_0
cffi 1.12.3 py37h8022711_0 conda-forge
chardet 3.0.4 py37_1
clangdev 8.0.0 hc9558a2_2 conda-forge
click 7.0 pypi_0 pypi
cloudpickle 1.2.1 py_0 conda-forge
cmake 3.14.5 hf94ab9c_0 conda-forge
cmake-setuptools 0.1.3 pypi_0 pypi
conda 4.7.10 py37_0
conda-package-handling 1.3.11 py37_0
cryptography 2.6.1 py37h1ba5d50_0
cudatoolkit 10.0.130 0
cudf 0.8.0 pypi_0 pypi
cuml 0.8.0 pypi_0 pypi
curl 7.62.0 hbc83047_0
cycler 0.10.0 pypi_0 pypi
cython 0.29.12 py37he1b5a44_0 conda-forge
cytoolz 0.10.0 py37h516909a_0 conda-forge
dask-core 2.1.0 py_0 conda-forge
dask-cudf 0.0.0.dev0 pypi_0 pypi
dask-cuml 0.8.0 pypi_0 pypi
decorator 4.4.0 py_0 conda-forge
defusedxml 0.5.0 py_1 conda-forge
distributed 2.1.0 py_0 conda-forge
docutils 0.14 py37_0
entrypoints 0.3 py37_1000 conda-forge
expat 2.2.5 he1b5a44_1003 conda-forge
freetype 2.9.1 h8a8886c_1
gast 0.2.2 pypi_0 pypi
gensim 3.8.0 pypi_0 pypi
google-pasta 0.1.7 pypi_0 pypi
grpcio 1.22.0 pypi_0 pypi
h5py 2.9.0 pypi_0 pypi
heapdict 1.0.0 py37_1000 conda-forge
icu 58.2 hf484d3e_1000 conda-forge
idna 2.8 py37_0
importlib_metadata 0.18 py37_0 conda-forge
intel-openmp 2019.4 243
ipykernel 5.1.1 py37h24bf2e0_0 conda-forge
ipython 7.6.1 py37h5ca1d4c_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
jedi 0.14.1 py37_0 conda-forge
jinja2 2.10.1 py_0 conda-forge
jmespath 0.9.4 py_0
joblib 0.13.2 py37_0
jpeg 9b h024ee3a_2
jsonschema 3.0.1 py37_0 conda-forge
jupyter_client 5.3.1 py_0 conda-forge
jupyter_core 4.4.0 py_0 conda-forge
jupyterlab 0.35.5 py37hf63ae98_0
jupyterlab_server 0.2.0 py37_0
keras-applications 1.0.8 pypi_0 pypi
keras-preprocessing 1.1.0 pypi_0 pypi
kiwisolver 1.1.0 pypi_0 pypi
libarchive 3.3.3 h5d8350f_5
libcurl 7.62.0 h20c2e04_0
libedit 3.1.20181209 hc058e9b_0
libffi 3.2.1 hd88cf55_4
libgcc-ng 8.2.0 hdf63c60_1
libgfortran-ng 7.3.0 hdf63c60_0
libpng 1.6.37 hbc83047_0
libprotobuf 3.6.1 hdbcaa40_1001 conda-forge
libsodium 1.0.17 h516909a_0 conda-forge
libssh2 1.8.2 h22169c7_2 conda-forge
libstdcxx-ng 8.2.0 hdf63c60_1
libtiff 4.0.10 h2733197_2
libuv 1.30.1 h516909a_0 conda-forge
libxml2 2.9.9 hea5a465_1
llvmlite 0.29.0 py37hf484d3e_0 numba
lz4-c 1.8.1.2 h14c3975_0
lzo 2.10 h49e0be7_2
markdown 3.1.1 pypi_0 pypi
markupsafe 1.1.1 py37h14c3975_0 conda-forge
matplotlib 3.1.1 pypi_0 pypi
mistune 0.8.4 py37h14c3975_1000 conda-forge
mkl 2019.4 243
mkl-service 2.0.2 py37h7b6447c_0
mkl_fft 1.0.13 py37h516909a_1 conda-forge
mkl_random 1.0.4 py37hf2d7682_0 conda-forge
more-itertools 7.1.0 py_0 conda-forge
msgpack-python 0.6.1 py37h6bb024c_0 conda-forge
nbconvert 5.5.0 py_0 conda-forge
nbformat 4.4.0 py_1 conda-forge
ncurses 6.1 he6710b0_1
ninja 1.9.0 py37hfd86e86_0
notebook 5.7.8 py37_1 conda-forge
numba 0.43.0 pypi_0 pypi
numpy 1.16.4 py37h7e9f1db_0
numpy-base 1.16.4 py37hde5b4d6_0
nvgraph 0.1.0.dev0 cuda10.0_9 nvidia/label/cuda10.0
nvstrings-cuda100 0.0.0.dev0 pypi_0 pypi
olefile 0.46 py37_0
openssl 1.1.1c h7b6447c_1
opt-einsum 2.3.2 pypi_0 pypi
packaging 19.0 py_0 conda-forge
pandas 0.23.4 py37h637b7d7_1000 conda-forge
pandoc 2.7.3 0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
parquet-cpp 1.5.1 4 conda-forge
parso 0.5.0 py_0 conda-forge
pexpect 4.7.0 py37_0 conda-forge
pickleshare 0.7.5 py37_1000 conda-forge
pillow 6.0.0 py37h34e0f95_0
pip 19.0.3 py37_0
pluggy 0.12.0 py_0 conda-forge
prometheus_client 0.7.1 py_0 conda-forge
prompt_toolkit 2.0.9 py_0 conda-forge
protobuf 3.9.0 pypi_0 pypi
psutil 5.6.3 py37h516909a_0 conda-forge
ptyprocess 0.6.0 py_1001 conda-forge
py 1.8.0 py_0 conda-forge
pyarrow 0.12.1 py37hbbcf98d_0 conda-forge
pycosat 0.6.3 py37h14c3975_0
pycparser 2.19 py37_0
pygments 2.4.2 py_0 conda-forge
pyopenssl 19.0.0 py37_0
pyparsing 2.4.0 py_0 conda-forge
pyrsistent 0.15.3 py37h516909a_0 conda-forge
pysocks 1.6.8 py37_0
pytest 5.0.1 py37_0 conda-forge
python 3.7.3 h0371630_0
python-dateutil 2.8.0 py_0 conda-forge
python-libarchive-c 2.8 py37_10
pytorch 1.1.0 py3.7_cuda10.0.130_cudnn7.5.1_0 pytorch
pytz 2019.1 py_0 conda-forge
pyyaml 5.1.1 py37h516909a_0 conda-forge
pyzmq 18.0.2 py37hc4ba49a_1 conda-forge
readline 7.0 h7b6447c_5
requests 2.21.0 py37_0
rhash 1.3.6 h14c3975_1001 conda-forge
ruamel_yaml 0.15.46 py37h14c3975_0
s3fs 0.2.1 py_0
s3transfer 0.2.0 py37_0
scikit-learn 0.21.2 py37hd81dba3_0
scipy 1.2.1 py37h7c811a0_0
send2trash 1.5.0 py_0 conda-forge
setuptools 41.0.1 py37_0
six 1.12.0 py37_0
smart-open 1.8.4 pypi_0 pypi
snakeviz 2.0.1 pypi_0 pypi
sortedcontainers 2.1.0 py_0 conda-forge
sqlite 3.27.2 h7b6447c_0
tb-nightly 1.15.0a20190714 pypi_0 pypi
tblib 1.4.0 py_0 conda-forge
termcolor 1.1.0 pypi_0 pypi
terminado 0.8.2 py37_0 conda-forge
testpath 0.4.2 py_1001 conda-forge
tf-estimator-nightly 1.14.0.dev2019071001 pypi_0 pypi
tf-nightly 1.15.0.dev20190715 pypi_0 pypi
tf-nightly-gpu 1.15.0.dev20190715 pypi_0 pypi
thrift-cpp 0.12.0 h0a07b25_1002 conda-forge
tk 8.6.8 hbc83047_0
toolz 0.10.0 py_0 conda-forge
torchvision 0.3.0 py37_cu10.0.130_1 pytorch
tornado 6.0.3 py37h516909a_0 conda-forge
tqdm 4.32.1 py_0
traitlets 4.3.2 py37_1000 conda-forge
urllib3 1.24.1 py37_0
wcwidth 0.1.7 py_1 conda-forge
webencodings 0.5.1 py_1 conda-forge
werkzeug 0.15.4 pypi_0 pypi
wheel 0.33.1 py37_0
wrapt 1.11.2 pypi_0 pypi
xz 5.2.4 h14c3975_4
yaml 0.1.7 had09818_2
zeromq 4.3.2 he1b5a44_2 conda-forge
zict 1.0.0 pypi_0 pypi
zipp 0.5.1 py_0 conda-forge
zlib 1.2.11 h7b6447c_3
zstd 1.3.7 h0b5b093_0
</pre></details>
@AK-ayush The plan is to replace the implementation of this function entirely with #1898, which will be much more performant.
Given the plan is to move this to a libcudf function anyway, I'm going to close this as a duplicate.