Hydrogen: How to pass variables between code blocks with different kernels

Created on 18 Apr 2017 · 12 Comments · Source: nteract/hydrogen

Is it possible to pass variables between cells with different interpreters/kernels as in ipython/jupyter?

I might not be familiar with what you are asking. Do you mean connecting another "frontend" (e.g. notebook, qtconsole) to an existing kernel so that you have access to the same variables across more than one frontend?

If so, check out this section of README.md. The example focuses on connecting to a remote kernel, but you can connect to local kernels as well.

Hi @BenRussert, what I mean is the equivalent of this in Jupyter notebooks.

Yes, Hydrogen can run the same kernels the Jupyter notebook does:

[screenshot from 2017-04-18 22-58-39]

Sorry I'm coming off as dumb. Yes, I see that it runs different kernels in different cells. But being able to pass variables between code blocks with different kernels goes beyond that. Is there some documentation on this?

In the example above, how does hydrogen know that in the space between echo "a + b = $(($1 + $2))" and print(c) we've transitioned from the bash kernel to python? If just an empty line, that seems pretty inconvenient and dangerous.

What about when using fenced code blocks as in R markdown? What would be the syntax in that case?

So, I tried running something like what you have above. But when I step through it with shift-enter, it always balks when I land on a line starting a code block with a magic command like %%R. I get

"UsageError: %%R is a cell magic, but the cell body is empty. Did you mean the line magic %R (single %)?"

[screenshot: rmagic_empty]

However, if I select the entire block, it will run properly.

So this confirms exactly what I suspected: The current hydrogen syntax does not currently include a way to implicitly identify code cells. I think this is a serious shortcoming.

I think this syntax is very important.

While I love Python, I think the R-markdown system is really fabulous. I think it would be great if hydrogen would expand on this so that executable code blocks were in fences with metadata to show which language, and which variables were passing in and out.

```(language="python", push=['a','b'])
a=10

b=20
```

```(language="R", pull=['a','b'], push=['a','b','m'])
m = 1/2*(a + b)
```

```(language="Python", pull=['m'])
print("m = " + str(m))
```

It would be nice if there were shortcuts for the code block metadata.

But at the very least, we need some syntax to show the start of a code block:

#%%Python -o a b
a=10

b=20

#%%R -i a b -o m
m = 1/2*(a + b)

#%%Python -i m
print("m = " + str(m))

Did you see this in the README?

Using that method of separating code blocks, how do you specify

  1. Which kernel to use for the code block, and
  2. How to pass variables in and out?

And this gives yet another way (making five) to delineate code blocks. This method does allow specifying the kernel, but there is still no way to share variables.

I love what hydrogen is trying to do. But syntax-wise, it is to an extent reinventing the wheel. R-markdown already specifies a way to perform literate programming in a file that can be either run in notebook format or as a script. That syntax is widely used, widely understood and matches the markdown formatting that is almost ubiquitously used by coders.

However, R-markdown only handles R. What the world needs is a version that can pipeline multiple kernels. I'd love to see hydrogen become that. But right now, there are too many ways to do the same thing, and none of them do this thing completely.

Oh, and I just found another supporting argument. In the section on run cells the docs say "just replace # with the comment symbol for your desired language." Well, there are many of these: //, %, #, and maybe more. With code blocks in fences, there is no confusion at all.

Since I have a bit of free time, I thought I'd write a bit about hydrogen's support of cells and how it has developed over time.

To begin, there are two views of how to do literate programming:

  • Programming language as outermost grammar, annotations inside comments
  • Text as outermost grammar, code inside fenced blocks

Both of these have their strengths. The programming-language-outermost approach allows running the code as a script with a stock interpreter (this is great for collaboration with people who have a different workflow that doesn't involve hydrogen). On the other hand, the text-outermost approach (e.g. as in R markdown) is good for documentation and exploratory coding.

Hydrogen supports both of these. The run-cells feature is aimed at the first, while the rich-document approach targets the second. The syntax of the two is necessarily different. The different variations %%, <codecell>, and In [] are for compatibility with existing tools, such as the "Export" functionality of Jupyter Notebook.

The rich-document functionality was only implemented recently, and it doesn't have any provisions for variable sharing. It's also not very clear how to implement that, since Hydrogen is entirely language-agnostic, and the Jupyter protocols aren't designed for variable sharing across kernels.

So the way to do variable-sharing today is to use the code-cells functionality inside a Python file, and rely on the existing magics that you used in your notebook example. For your example the equivalent file is:

#%%
%load_ext rpy2.ipython

#%%
a = !echo hello
print(a)

#%%
import numpy as np
x = np.array([2.3,-4.5,10.01])
x.mean()

#%%
%%R -i x -o m 
m = mean(x)

#%%
print(m)

#%%
a=10
b=20

#%%
%%bash -s $a $b --out c
echo "this is displayed"
echo "the last thing printed to stdout will be passed back in the variable c"
echo "a + b = $(($1+$2))"

#%%
print(c)

You would use the "Run Cell" command to run each of the fenced sections. The image by @n-riesco didn't use cell separator comments, which I agree is not a fit for everyone in terms of workflow (I believe the lack of separators forces you to select the different-language sections with your mouse and then run only the selected code).

On 19/04/17 01:15, Ariel Balter wrote:

Sorry I'm coming off as dumb. Yes, I see that it runs different kernels in different cells. [...]

Sorry, it wasn't my intention. In fact, the example you link is key to understanding your question.

That example only runs the Python kernel. All those % magics are run by the Python kernel, and under the hood they import the rpy2 module to run R requests.
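To make that concrete, here is a rough sketch of what a cell like `%%bash -s $a $b --out c` amounts to from the Python side: the Python process starts bash as a subprocess, feeds it the cell body, and stores the captured output in an ordinary Python variable. The helper name `run_bash` is made up for illustration and this is not IPython's actual implementation. Note that, as far as I know, `--out` captures everything written to stdout, not only the last line.

```python
import subprocess

def run_bash(script, *args):
    """Run `script` in bash with args as $1, $2, ... and return its stdout.

    Hypothetical helper: a simplified stand-in for what the %%bash magic does.
    """
    result = subprocess.run(
        ["bash", "-s", "--", *map(str, args)],  # -s: read the script from stdin
        input=script,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

a, b = 10, 20
c = run_bash('echo "a + b = $(($1+$2))"', a, b)
print(c)
```

No second kernel is involved at any point: bash is just a child process of the Python kernel, which is why the result can land in a Python variable.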

As far as I know, the Jupyter messaging protocol doesn't provide any means for inter-kernel communication. And considering that the current notebook forces by design one kernel per notebook, this is unlikely to change any time soon.

Also note that sharing results between kernels isn't straightforward (different languages have different types; e.g. Python has int and float numeric types, but JavaScript only has a number type).

On 19/04/17 04:20, Ariel Balter wrote:

So, I tried running something like what you have above. But when I step through it with shift-enter, it always balks when I land on a line starting a code block with a magic command like %%R. I get

"UsageError: %%R is a cell magic, but the cell body is empty. Did you mean the line magic %R (single %)?"

rmagic_empty https://cloud.githubusercontent.com/assets/5349876/25161836/034edc56-2473-11e7-80e5-0e21d130b5bc.png

However, if I select the entire block, it will run properly.

So this confirms exactly what I suspected: The current hydrogen syntax does not currently include a way to implicitly identify code cells. I think this is a serious shortcoming.

Although I didn't define any cells in the screencast I posted, Hydrogen does provide syntax for marking code cells.

See the documentation linked by @BenRussert https://github.com/nteract/hydrogen/issues/723#issuecomment-295059507

And see @nikitakit 's discussion for an example with code cells: https://github.com/nteract/hydrogen/issues/723#issuecomment-295088962

I think this syntax is very important.

While I love Python, I think the R-markdown system is really fabulous. I think it would be great if hydrogen would expand on this so that executable code blocks were in fences with metadata to show which language, and which variables were passing in and out.

I agree with your opinion on markdown, but as I said in my previous email, different languages have different types, and we would need to define an exchange format for sharing variables (json?).
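As a minimal sketch of what a JSON exchange format would imply, the Python snippet below serialises a few named variables and reads them back. The functions `export_vars` and `import_vars` are hypothetical names chosen for illustration; nothing like them exists in Hydrogen or in the Jupyter protocol.

```python
import json

def export_vars(namespace, names):
    """Serialise the named variables from `namespace` into a JSON string."""
    return json.dumps({name: namespace[name] for name in names})

def import_vars(payload):
    """Decode a JSON payload back into a dict of variables."""
    return json.loads(payload)

# Variables as they might exist in the sending kernel.
sender = {"a": 10, "x": [2.3, -4.5, 10.01], "pair": (1, 2)}

payload = export_vars(sender, ["a", "x", "pair"])
received = import_vars(payload)

print(received)
print(type(received["pair"]).__name__)  # the tuple does not survive the trip
```

Even within a single language the round trip is lossy (the tuple comes back as a list), and anything JSON cannot represent directly, such as a numpy array, would need an agreed-upon conversion on both sides.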

On 19/04/17 05:00, Ariel Balter wrote:

I love what hydrogen is trying to do. But syntax-wise, it is to an extent reinventing the wheel. R-markdown already specifies a way to perform literate programming in a file that can be either run in notebook format or as a script. It is widely used, widely understood and matches the markdown formatting that is almost ubiquitously used by coders.

I think after you read @nikitakit 's comment https://github.com/nteract/hydrogen/issues/723#issuecomment-295088962
you'll see how unfair the comment above is.

I want to add to what @nikitakit said that, in my opinion, one of the main contributions Hydrogen has made to the Jupyter framework is the convention to mark code cells in a cross-language manner (i.e. without any % magics).
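To illustrate why a comment-based marker works across languages, here is a small Python sketch that splits a source file into cells at lines such as `#%%` or `// %%`, given the comment symbol of the language. The function `split_cells` is a hypothetical illustration and not Hydrogen's actual cell-detection code.

```python
import re

def split_cells(source, comment="#"):
    """Split `source` into cells at marker lines like '#%%' or '# %%'.

    `comment` is the line-comment symbol of the file's language.
    """
    marker = re.compile(r"^\s*" + re.escape(comment) + r"\s*%%")
    cells, current = [], []
    for line in source.splitlines():
        if marker.match(line):
            # A marker line closes the previous cell and starts a new one.
            if current:
                cells.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    if current:
        cells.append("\n".join(current).strip())
    return [cell for cell in cells if cell]

python_source = "#%%\na=10\nb=20\n\n#%%\n%%R -i a b -o m\nm = 1/2*(a + b)\n"
js_source = "// %%\nlet x = 1;\n// %%\nconsole.log(x);\n"

print(split_cells(python_source))
print(split_cells(js_source, comment="//"))
```

Because the marker lives in a comment, the file still runs unchanged as a plain script, and a cell magic such as `%%R` stays on the first line of its own cell.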
