The --profile option in jupyter appears to be ignored now when it's run with the notebook command. The usage for it still lists:
ipython notebook # start the notebook
ipython notebook --profile=sympy # use the sympy profile
ipython notebook --certfile=mycert.pem # use SSL/TLS certificate
Sorry, missed that in the examples. Fixed by #310.
Out of curiosity, and to possibly clear up some confusion that I have seen on Stack Overflow and such, how would one now specify startup/initialization-type options for Jupyter?
A specific scenario I am thinking of is with pySpark.
See the ML discussion:
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/jupyter/7q02jjksvFU
but you shouldn't need a profile for that. PySpark can just be a kernel, if you really want it to be.
A kernel is basically just a way of launching a process, so it can use different envs, or a different location.
Some national labs have named kernels depending on which physical machine the process will run on, for example.
Hi. I am trying to create a profile for pyspark too. Could you please tell me how to proceed? Thanks
There is no notion of profile in Jupyter or for the notebook.
It's roughly like asking to dual-boot a computer because you want to use vim and emacs,
and getting 2 hard drives just to set your $EDITOR differently.
As stated in the mailing list thread, you can if you like; it would be something like
$ JUPYTER_CONFIG_DIR=~/jupyter_pyspark_foo jupyter notebook
It should auto-create the needed files in ~/jupyter_pyspark_foo, but it is likely not what you want.
You most likely just want a separate kernel, or just to import pySpark as a library. Still, without knowing more of what you want to do, it's hard to give you an answer...
I would like to use pySpark in the ipython notebook, either by calling it as a library or by creating a profile/kernel/etc.
Ok, here is what I just did during the last half hour, for me on OS X:
Install apache-spark ($ brew install apache-spark)
Install findspark (pip install -e . after cloning https://github.com/minrk/findspark, and cd findspark)
Install java (from here)
Fire up a notebook (jupyter notebook)
Enter the following:
import findspark
import os
findspark.init()
import pyspark
sc = pyspark.SparkContext()
lines = sc.textFile(os.path.expanduser('~/dev/ipython/setup.py'))
lines_nonempty = lines.filter( lambda x: len(x) > 0 )
lines_nonempty.count()
Execute, and you get the line count (221 for me).
Yayyyyy!
(Note, installing/downloading java took 20 minutes)
After running:
import findspark
import os
findspark.init()
import pyspark
sc = pyspark.SparkContext()
I get this error:
Exception Traceback (most recent call last)
<ipython-input-1-0e2dcc62fef1> in <module>()
4
5 import pyspark
----> 6 sc = pyspark.SparkContext()
/Users/victor/Downloads/spark-1.4.1/python/pyspark/context.pyc in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
108 """
109 self._callsite = first_spark_call() or CallSite(None, None, None)
--> 110 SparkContext._ensure_initialized(self, gateway=gateway)
111 try:
112 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
/Users/victor/Downloads/spark-1.4.1/python/pyspark/context.pyc in _ensure_initialized(cls, instance, gateway)
227 with SparkContext._lock:
228 if not SparkContext._gateway:
--> 229 SparkContext._gateway = gateway or launch_gateway()
230 SparkContext._jvm = SparkContext._gateway.jvm
231
/Users/victor/Downloads/spark-1.4.1/python/pyspark/java_gateway.pyc in launch_gateway()
87 callback_socket.close()
88 if gateway_port is None:
---> 89 raise Exception("Java gateway process exited before sending the driver its port number")
90
91 # In Windows, ensure the Java child processes do not linger after Python has exited.
Exception: Java gateway process exited before sending the driver its port number
You get this error if you installed the wrong Java (the ~60 MB download instead of the 200+ MB one).
I actually got jdk-8u60-macosx-x64.dmg, which is 238.1 MB. Maybe I should restart the machine.
Hum, I did not have to restart, IIRC.
Does the following work?
$ java -version
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
Python 2.7 or 3?
It works now:
In [2]: sc
Out[2]: <pyspark.context.SparkContext at 0x106296cd0>
Thanks a lot for your help. I've spent a loooong time trying to fix this
:cake: :cocktail: :tada: !
Happy Sparking !
@vherasme What did you do to make it work in the end? Thanks!
I followed the steps @Carreau recommends above:
.....
enter the following:
import findspark
findspark.init()
import pyspark
sc = pyspark.SparkContext()
I also had these two in .bash_profile:
export SPARK_HOME="/Users/victor/Downloads/spark-1.4.1"
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
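If you prefer not to touch .bash_profile, the same two variables can also be set from Python before findspark.init() runs; a minimal sketch, reusing the values above (the paths are machine-specific and need adjusting for your own install):
import os
# Values copied from the .bash_profile lines above; adjust to your Spark location.
os.environ["SPARK_HOME"] = "/Users/victor/Downloads/spark-1.4.1"
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2] pyspark-shell"
import findspark
findspark.init()              # puts $SPARK_HOME/python (and py4j) on sys.path
import pyspark
sc = pyspark.SparkContext()   # the launcher reads PYSPARK_SUBMIT_ARGS when starting the JVM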
I hope this change of policy about profiles is mentioned (more explicitly?) in the docs. I tried to put up a server on an Amazon EC2 image but following the instructions on the ipython docs didn't work because with ipython==4.0 it no longer accepted the --profile option.
IPython 4.0 still has profiles; you are just mistaking the Notebook for IPython.
If you want a different configuration for the notebook, you need to set the JUPYTER_CONFIG_DIR environment variable; if you want a profile for your kernel, you can set it in your kernelspec.
I tried both ipython notebook --profile=xxx and jupyter notebook --profile=xxx and both give the same error. (Meanwhile, the --help output for both still makes the erroneous suggestion that --profile works.)
I think a separate tutorial for setting up a Jupyter remote server would help, since I'm sure people currently just go look at the IPython docs and get confused like I was. At least note in the IPython docs that this is now different for Jupyter.
How did you get the help output to give you hints about profiles?
And again, --profile does not work with the notebook application, only with ipython/ipython kernel.
If you want a profile for your kernel you need to modify your kernelspec. Use jupyter kernelspec list --debug to see where your kernelspecs are.
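To make that concrete, a kernelspec's kernel.json accepts an "env" mapping of environment variables that are applied when the kernel starts, so a dedicated PySpark kernel could carry the Spark settings. A minimal Python sketch; the kernelspec path is hypothetical (take it from jupyter kernelspec list --debug) and the Spark values are just the ones used elsewhere in this thread:
import json, os
# Hypothetical kernelspec location -- use the path reported by `jupyter kernelspec list --debug`.
spec_path = os.path.expanduser("~/Library/Jupyter/kernels/pyspark/kernel.json")
with open(spec_path) as f:
    spec = json.load(f)
# kernel.json supports an "env" mapping set in the kernel's environment at launch.
spec.setdefault("env", {}).update({
    "SPARK_HOME": "/Users/victor/Downloads/spark-1.4.1",        # assumed Spark install location
    "PYSPARK_SUBMIT_ARGS": "--master local[2] pyspark-shell",   # same value as used earlier in the thread
})
with open(spec_path, "w") as f:
    json.dump(spec, f, indent=2)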
I did ipython notebook --help and jupyter notebook --help and both gave the same thing. The examples OP posted are listed at the end of the help output.
Right, I get it now that --profile no longer works with the notebook, but I'm saying the docs should be made clearer so that in the future, people switching from lower versions of IPython don't have to look far to get an answer.
For example, if I google 'set up remote server jupyter' the first result is http://ipython.org/ipython-doc/1/interactive/public_server.html, and nowhere in there does it say that --profile no longer works for ipython/jupyter 4. Indeed, one of the instructions is
"You can then start the notebook and access it later by pointing your browser to https://your.host.com:9999 with ipython notebook --profile=nbserver."
Other top results are about jupyter hub, which requires python3. I don't think I saw a single mention that the --profile option no longer works for ipython/jupyter 4 among them.
Maybe you guys wrote a doc, but Google is just being dumb for the moment. Nevertheless, I never found it, and I searched for a long time before finding this issue posted here.
O_o do you have both IPython 4.x and notebook 4.x?
Well, it's hard to bias Google. For whatever reason people are still referencing the docs for 1.0 and Google puts them on top. We'll try to find a solution.
I had ipython 4 initially but that kept giving errors as I said, so I
installed jupyter, but that didn't solve anything.
Is there a way to avoid typing the following code in each notebook:
import findspark
findspark.init()
import pyspark
sc = pyspark.SparkContext()
and just make sure that whenever you launch the notebook, Spark is already hooked up?
It isn't too hard, but it feels like jury-rigging, which I hate.
You can add it to a startup file, e.g. ~/.ipython/profile_default/startup/initspark.py
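For reference, such a startup file could contain exactly the snippet from earlier in this thread (a sketch, assuming findspark is installed and SPARK_HOME is set):
# ~/.ipython/profile_default/startup/initspark.py
# Runs automatically when the kernel starts, so `sc` exists in every notebook.
import findspark
findspark.init()
import pyspark
sc = pyspark.SparkContext()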
I got the same issue, and the steps from @vherasme didn't work.
Python 2.7.10
Spark 1.4.1
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)

@wlsherica, I had that same issue. For me, this was being caused by a bad spark configuration.
Specifically, I had:
export PYSPARK_SUBMIT_ARGS="--master local[2]"
So I just removed that.
@Carreau amazing work. thanks so much.
@Carreau thanks for your answer
Thanks @Carreau for the step-by-step instructions! Stumbled upon this issue when following instructions for IPython 3.x
In case anyone wants more detailed instructions and explanation, I have written http://flummox-engineering.blogspot.com/2016/01/how-to-configure-ipython4-for-apache-spark.html
Using the findspark setup, are you able to use jars which are added via SparkConf spark.jars?
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext
conf = SparkConf().set("spark.jars","/usr/local/opt/spark-csv_2.10-1.3.0.jar")
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)
The jar gets loaded when the SparkContext is started:
16/02/03 11:03:47 INFO SparkContext: Added JAR /usr/local/opt/spark-csv_2.10-1.3.0.jar at http://1.2.3.4:49318/jars/spark-csv_2.10-1.3.0.jar with timestamp 1454526227905
but reading the CSV still fails:
df = sqlContext.read.format('com.databricks.spark.csv')\
.options(header='true', delimiter=',', inferschema=True)\
.load(csvpath)
Py4JJavaError: An error occurred while calling o247.load.
: java.lang.ClassNotFoundException: Failed to load class for data source: com.databricks.spark.csv.
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:67)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:87)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.csv.DefaultSource
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:60)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:60)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:60)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:60)
at scala.util.Try.orElse(Try.scala:82)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scal
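Not answered in this thread, but a commonly suggested workaround is to let spark-submit resolve the spark-csv package before the JVM is launched (via PYSPARK_SUBMIT_ARGS) rather than adding the jar through spark.jars afterwards. A hedged sketch; the package coordinates are assumed to match the 1.3.0 jar above, and the CSV path is hypothetical:
import os
# Assumption: --packages pulls com.databricks.spark.csv onto both the driver and executor classpaths.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages com.databricks:spark-csv_2.10:1.3.0 pyspark-shell"
)
import findspark
findspark.init()
import pyspark
from pyspark.sql import HiveContext
sc = pyspark.SparkContext()
sqlContext = HiveContext(sc)
df = sqlContext.read.format('com.databricks.spark.csv') \
    .options(header='true', delimiter=',', inferSchema='true') \
    .load('/path/to/file.csv')   # hypothetical path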
Using the shell from the Spark tutorial is also a good solution to the issue.
I am running Spark jobs in a Hadoop cluster, triggered from a Jupyter notebook. The problem is that each cell of code consumes the configured number of executors, but they are never released, so after a number of executed cells all the resources of the cluster are blocked.
Has anyone had this problem?
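Not from this thread, but two hedged guesses that may help: stop the SparkContext when a notebook is done with it, and/or enable Spark's dynamic allocation so idle executors are handed back to YARN (which also requires the external shuffle service):
from pyspark import SparkConf, SparkContext
# Assumption: these are plain Spark settings, nothing Jupyter-specific.
conf = (SparkConf()
        .set("spark.dynamicAllocation.enabled", "true")   # release idle executors
        .set("spark.shuffle.service.enabled", "true"))    # required for dynamic allocation on YARN
sc = SparkContext(conf=conf)
# ... run jobs ...
sc.stop()   # frees the executors held by this notebook's context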