The --profile option in jupyter appears to be ignored now when it's run with the notebook command. The usage for it still lists:
ipython notebook # start the notebook
ipython notebook --profile=sympy # use the sympy profile
ipython notebook --certfile=mycert.pem # use SSL/TLS certificate
Sorry, missed that in the examples. Fixed by #310.
Out of curiosity, and to possibly clear up some confusion that I have seen on Stack Overflow and such, how would one now specify startup/initialization-type options for Jupyter?
A specific scenario I am thinking of is with pySpark.
See the ML discussion:
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/jupyter/7q02jjksvFU
but you shouldn't need a profile for that. PySpark can just be a kernel, if you really want it to be.
A kernel is basically just a way of launching a process, so it can use different envs, or a different location.
Some national labs have named kernels depending on which physical machine the process will run on, for example.
Hi. I am trying to create a profile for pyspark too. Could you please tell me how to proceed? Thanks
There is no notion of profile in Jupyter or for the notebook.
It's roughly like asking to dual-boot a computer because you want to use vim and emacs,
and getting 2 hard drives just to set your $EDITOR differently.
As stated in the mailing list thread, you can if you like; it would be something like
$ JUPYTER_CONFIG_DIR=~/jupyter_pyspark_foo jupyter notebook
It should auto-create the needed files in ~/jupyter_pyspark_foo, but it is likely not what you want.
You most likely just want a separate kernel, or just to import pySpark as a library. Still, without knowing more of what you want to do, it's hard to give you an answer...
I would like to use pySpark in the ipython notebook, either by calling it as a library or by creating a profile/kernel/etc.
Ok, here is what I just did during the last half hour, for me on OS X:
Install apache-spark ($ brew install apache-spark)
Install findspark (pip install -e . after cloning https://github.com/minrk/findspark, and cd findspark)
Install java (from here)
Fire up a notebook (jupyter notebook)
Enter the following:
import findspark
import os
findspark.init()
import pyspark
sc = pyspark.SparkContext()
lines = sc.textFile(os.path.expanduser('~/dev/ipython/setup.py'))
lines_nonempty = lines.filter( lambda x: len(x) > 0 )
lines_nonempty.count()
Execute, and you get the line count (221 for me).
Yayyyyy!
(Note, installing/downloading java took 20 minutes)
After running:
import findspark
import os
findspark.init()
import pyspark
sc = pyspark.SparkContext()
I get this error:
Exception Traceback (most recent call last)
<ipython-input-1-0e2dcc62fef1> in <module>()
4
5 import pyspark
----> 6 sc = pyspark.SparkContext()
/Users/victor/Downloads/spark-1.4.1/python/pyspark/context.pyc in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
108 """
109 self._callsite = first_spark_call() or CallSite(None, None, None)
--> 110 SparkContext._ensure_initialized(self, gateway=gateway)
111 try:
112 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
/Users/victor/Downloads/spark-1.4.1/python/pyspark/context.pyc in _ensure_initialized(cls, instance, gateway)
227 with SparkContext._lock:
228 if not SparkContext._gateway:
--> 229 SparkContext._gateway = gateway or launch_gateway()
230 SparkContext._jvm = SparkContext._gateway.jvm
231
/Users/victor/Downloads/spark-1.4.1/python/pyspark/java_gateway.pyc in launch_gateway()
87 callback_socket.close()
88 if gateway_port is None:
---> 89 raise Exception("Java gateway process exited before sending the driver its port number")
90
91 # In Windows, ensure the Java child processes do not linger after Python has exited.
Exception: Java gateway process exited before sending the driver its port number
You get this error if you installed the wrong Java (the ~60 MB download instead of the 200+ MB one).
I actually got jdk-8u60-macosx-x64.dmg, which is 238.1 MB. Maybe I should restart the machine.
Hum, I did not have to restart, IIRC.
Does the following work?
$ java -version
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
Python 2.7 or 3?
It works now:
In [2]: sc
Out[2]: <pyspark.context.SparkContext at 0x106296cd0>
Thanks a lot for your help. I've spent a loooong time trying to fix this
:cake: :cocktail: :tada: !
Happy Sparking !
@vherasme What did you do to make it work in the end? Thanks!
I followed the steps @Carreau recommends above:
.....
enter the following:
import findspark
findspark.init()
import pyspark
sc = pyspark.SparkContext()
I also had these two in .bash_profile:
export SPARK_HOME="/Users/victor/Downloads/spark-1.4.1"
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
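If you prefer not to touch .bash_profile, the same two variables can also be set from Python before findspark.init() runs; a minimal sketch, reusing the values above (the paths are machine-specific and need adjusting for your own install):
import os
# Values copied from the .bash_profile lines above; adjust to your Spark location.
os.environ["SPARK_HOME"] = "/Users/victor/Downloads/spark-1.4.1"
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2] pyspark-shell"
import findspark
findspark.init()              # puts $SPARK_HOME/python (and py4j) on sys.path
import pyspark
sc = pyspark.SparkContext()   # the launcher reads PYSPARK_SUBMIT_ARGS when starting the JVM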
I hope this change of policy about profiles is mentioned (more explicitly?) in the docs. I tried to put up a server on an Amazon EC2 image but following the instructions on the ipython docs didn't work because with ipython==4.0 it no longer accepted the --profile option.
IPython 4.0 still has profiles; you are just mistaking the Notebook for IPython.
If you want a different configuration for the notebook, you need to set the JUPYTER_CONFIG_DIR environment variable; if you want a profile for your kernel, you can set it in your kernelspec.
I tried both ipython notebook --profile=xxx and jupyter notebook --profile=xxx and both give the same error. (Meanwhile, the --help output for both still makes the erroneous suggestion that --profile works.)
I think a separate tutorial for setting up a Jupyter remote server would help, since I'm sure people currently just go look at the IPython docs and get confused like I was. At least note in the IPython docs that this is now different for Jupyter.
How did you get the help output to give you hints about profiles?
And again, --profile does not work with the notebook application, only with ipython/ipython kernel.
If you want a profile for your kernel you need to modify your kernelspec. Use jupyter kernelspec list --debug to see where your kernelspecs are.
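To make that concrete, a kernelspec's kernel.json accepts an "env" mapping of environment variables that are applied when the kernel starts, so a dedicated PySpark kernel could carry the Spark settings. A minimal Python sketch; the kernelspec path is hypothetical (take it from jupyter kernelspec list --debug) and the Spark values are just the ones used elsewhere in this thread:
import json, os
# Hypothetical kernelspec location -- use the path reported by `jupyter kernelspec list --debug`.
spec_path = os.path.expanduser("~/Library/Jupyter/kernels/pyspark/kernel.json")
with open(spec_path) as f:
    spec = json.load(f)
# kernel.json supports an "env" mapping set in the kernel's environment at launch.
spec.setdefault("env", {}).update({
    "SPARK_HOME": "/Users/victor/Downloads/spark-1.4.1",        # assumed Spark install location
    "PYSPARK_SUBMIT_ARGS": "--master local[2] pyspark-shell",   # same value as used earlier in the thread
})
with open(spec_path, "w") as f:
    json.dump(spec, f, indent=2)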
I did ipython notebook --help and jupyter notebook --help and both gave the same thing. The examples OP posted are listed at the end of the help output.
Right, I get it now that --profile no longer works with the notebook, but I'm saying the docs should be made clearer so that in the future, people switching from lower versions of IPython don't have to look far to get an answer.
For example, if I google 'set up remote server jupyter' the first result is http://ipython.org/ipython-doc/1/interactive/public_server.html, and nowhere in there does it say that --profile no longer works for ipython/jupyter 4. Indeed, one of the instructions is
"You can then start the notebook and access it later by pointing your browser to https://your.host.com:9999 with ipython notebook --profile=nbserver."
Other top results are about jupyter hub, which requires python3. I don't think I saw a single mention that the --profile option no longer works for ipython/jupyter 4 among them.
Maybe you guys wrote a doc, but Google is just being dumb for the moment. Nevertheless, I never found it, and I searched for a long time before finding this issue posted here.
O_o do you have both IPython 4.x and notebook 4.x?
Well, it's hard to bias Google. For whatever reason people are still referencing the docs for 1.0 and Google puts them on top. We'll try to find a solution.
I had ipython 4 initially but that kept giving errors as I said, so I
installed jupyter, but that didn't solve anything.
Is there a way to avoid typing the following code in each notebook:
import findspark
findspark.init()
import pyspark
sc = pyspark.SparkContext()
and just make sure that whenever you launch the notebook, Spark is already hooked up?
It isn't too hard, but it feels like jury-rigging, which I hate.
You can add it to a startup file, e.g. ~/.ipython/profile_default/startup/initspark.py
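For reference, such a startup file could contain exactly the snippet from earlier in this thread (a sketch, assuming findspark is installed and SPARK_HOME is set):
# ~/.ipython/profile_default/startup/initspark.py
# Runs automatically when the kernel starts, so `sc` exists in every notebook.
import findspark
findspark.init()
import pyspark
sc = pyspark.SparkContext()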
I got the same issue, and the steps from @vherasme didn't work.
Python 2.7.10
Spark 1.4.1
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)

@wlsherica, I had that same issue. For me, this was being caused by a bad spark configuration.
Specifically, I had:
export PYSPARK_SUBMIT_ARGS="--master local[2]"
So I just removed that.
@Carreau amazing work. thanks so much.
@Carreau thanks for your answer
Thanks @Carreau for the step-by-step instructions! Stumbled upon this issue when following instructions for IPython 3.x
In case anyone wants more detailed instructions and explanation, I have written http://flummox-engineering.blogspot.com/2016/01/how-to-configure-ipython4-for-apache-spark.html
Using the findspark setup, are you able to use jars which are added via SparkConf spark.jars?
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext
conf = SparkConf().set("spark.jars","/usr/local/opt/spark-csv_2.10-1.3.0.jar")
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)
The jar gets loaded when the SparkContext is started:
16/02/03 11:03:47 INFO SparkContext: Added JAR /usr/local/opt/spark-csv_2.10-1.3.0.jar at http://1.2.3.4:49318/jars/spark-csv_2.10-1.3.0.jar with timestamp 1454526227905
but reading the CSV still fails:
df = sqlContext.read.format('com.databricks.spark.csv')\
.options(header='true', delimiter=',', inferschema=True)\
.load(csvpath)
Py4JJavaError: An error occurred while calling o247.load.
: java.lang.ClassNotFoundException: Failed to load class for data source: com.databricks.spark.csv.
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:67)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:87)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.csv.DefaultSource
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:60)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:60)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:60)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:60)
at scala.util.Try.orElse(Try.scala:82)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scal
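Not answered in this thread, but a commonly suggested workaround is to let spark-submit resolve the spark-csv package before the JVM is launched (via PYSPARK_SUBMIT_ARGS) rather than adding the jar through spark.jars afterwards. A hedged sketch; the package coordinates are assumed to match the 1.3.0 jar above, and the CSV path is hypothetical:
import os
# Assumption: --packages pulls com.databricks.spark.csv onto both the driver and executor classpaths.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages com.databricks:spark-csv_2.10:1.3.0 pyspark-shell"
)
import findspark
findspark.init()
import pyspark
from pyspark.sql import HiveContext
sc = pyspark.SparkContext()
sqlContext = HiveContext(sc)
df = sqlContext.read.format('com.databricks.spark.csv') \
    .options(header='true', delimiter=',', inferSchema='true') \
    .load('/path/to/file.csv')   # hypothetical path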
Using the shell from the Spark tutorial is also a good solution to the issue.
I am running Spark jobs in a Hadoop cluster, triggered from a Jupyter notebook. The problem is that each cell of code consumes the configured number of executors, but they are never released, so after a number of executed cells all the resources of the cluster are blocked.
Has anyone had this problem?
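Not from this thread, but two hedged guesses that may help: stop the SparkContext when a notebook is done with it, and/or enable Spark's dynamic allocation so idle executors are handed back to YARN (which also requires the external shuffle service):
from pyspark import SparkConf, SparkContext
# Assumption: these are plain Spark settings, nothing Jupyter-specific.
conf = (SparkConf()
        .set("spark.dynamicAllocation.enabled", "true")   # release idle executors
        .set("spark.shuffle.service.enabled", "true"))    # required for dynamic allocation on YARN
sc = SparkContext(conf=conf)
# ... run jobs ...
sc.stop()   # frees the executors held by this notebook's context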