
3 years ago

清, 扬


I wanted to run the Python code from a Jupyter notebook on a different machine, so I used the Jupyter API to extract the code and save it to a file.

Environment

jupyter/scipy-notebook:19db0c85c56d

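Finding the Jupyter token

Operating Jupyter from outside requires a token, so the first step is to find it.

You probably entered it the first time you started Jupyter, but most people will have forgotten it by then, so check in this order: find the Jupyter container, display its logs, then read the token.

Find the Jupyter container
In the example below, the ID is 022298a2039a and the container name is jupyter.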

$ docker ps
CONTAINER ID   IMAGE                           COMMAND                  CREATED       STATUS      PORTS                    NAMES
022298a2039a   jupyter/scipy-notebook:latest   "tini -g -- start-no…"   11 days ago   Up 2 days   0.0.0.0:8888->8888/tcp   jupyter

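Display the logs
Specify either the ID or the container name to display the logs.

# By container ID
$ docker logs 022298a2039a
# By container name
$ docker logs jupyter
# With docker-compose
$ docker-compose logs

Check the token
The token usually appears near the end of the log.
Failed API calls also write error entries to the log, so after some trial and error the token lines get pushed far up.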

jupyter | Or copy and paste one of these URLs:
jupyter | http://022298a2039a:8888/?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxx
jupyter | or http://127.0.0.1:8888/?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxx
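
Once the log grows, finding the token by eye gets tedious. The following is a minimal sketch of pulling it out of the log text with a regular expression. The function name extract_token and the sample token abc123 are made up for illustration, and the pattern assumes the token consists only of letters and digits.

```python
import re

# Matches the "token=..." query parameter printed in the Jupyter startup log.
# Assumption: the token is made up of ASCII letters and digits only.
TOKEN_PATTERN = re.compile(r"[?&]token=([0-9A-Za-z]+)")


def extract_token(log_text):
    """Return the last token found in the log text, or None if there is none."""
    tokens = TOKEN_PATTERN.findall(log_text)
    return tokens[-1] if tokens else None


# Sample log lines in the same shape as the output above (the token is fake).
sample_log = """\
jupyter | Or copy and paste one of these URLs:
jupyter | http://022298a2039a:8888/?token=abc123
jupyter | or http://127.0.0.1:8888/?token=abc123
"""

print(extract_token(sample_log))  # abc123
```

The text passed in could be the captured output of docker logs. Jupyter may write its log to standard error rather than standard output, so both streams would probably need to be captured.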

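Calling the Jupyter API from outside

Notebook state

Environment for the API client

The API is called from Python, so the requests package is required. Since requests is the only dependency, Docker is not strictly necessary, but I used a container anyway because it should make it easier to turn this into a service later.
The environment I used is the following Dockerfile.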

FROM python:3.6-buster
RUN pip install requests

Build the image.

$ docker build -t jupyter-api .

Start the container.
The mount (-v) is optional, but I mounted the current directory so that the generated file can be retrieved.

$ docker run -it --net host -v $(pwd):/tmp jupyter-api bash

Trying the API

First, check that the Jupyter API is reachable.

import requests
url_api = 'http://localhost:8888/api'
response = requests.get(url_api)
print(response.status_code, response.text)  # 200 {"version": "6.0.3"}

The API responds, so the next step is to fetch the notebook.

notebook_path = '/work/test1.ipynb'
url_file = url_api + '/contents' + notebook_path

token = 'xxxxxxxxxxxxxxxxx'   # set your token here
headers =  {'Authorization': 'token ' + token}

response = requests.get(url_file, headers=headers)
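
The URL above is built by plain string concatenation, which works for this path but would break on a notebook name that contains spaces or other special characters. As a rough sketch, the URL and header construction could be wrapped in small helpers. The names contents_url and auth_headers are hypothetical, not part of requests or Jupyter.

```python
from urllib.parse import quote


def contents_url(base_url, notebook_path):
    """Build a Contents API URL for a notebook path, percent-encoding the path."""
    # Strip slashes so '/work/a.ipynb' and 'work/a.ipynb' give the same URL.
    # quote() leaves '/' untouched by default, so the directory structure is kept.
    encoded_path = quote(notebook_path.strip("/"))
    return base_url.rstrip("/") + "/api/contents/" + encoded_path


def auth_headers(token):
    """Return the Authorization header in the 'token <value>' form used above."""
    return {"Authorization": "token " + token}


print(contents_url("http://localhost:8888", "/work/my notebook.ipynb"))
# http://localhost:8888/api/contents/work/my%20notebook.ipynb
```

The two results would then be passed to requests.get(url, headers=headers) in the same way as above.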

response.text contains the notebook information in JSON format.
Below is the output of print(response.text), formatted for readability.

{
  "name": "test1.ipynb",
  "path": "work/test1.ipynb",
  "last_modified": "2020-03-07T21:08:58.897321Z",
  "created": "2020-03-07T21:08:57.594298Z",
  "content": {
    "cells": [
      {
        "cell_type": "code",
        "execution_count": 1,
        "metadata": {
            "trusted": true
        },
        "outputs": [
          {
            "data": {
              "text/plain": "array([0.83709745, 0.46685874, 0.94285637, 0.03938868, 0.79617107,\n       0.98784776, 0.27798577, 0.96118447, 0.5253161 , 0.0690074 ])"
            },
            "execution_count": 1,
            "metadata": {},
            "output_type": "execute_result"
          }
        ],
        "source": "import numpy as np\nnp.random.random(10)"
      }
    ],
    "metadata": {
      "kernelspec": {
        "display_name": "Python 3",
        "language": "python",
        "name": "python3"
      },
      "language_info": {
        "codemirror_mode": {
          "name": "ipython",
          "version": 3
        },
        "file_extension": ".py",
        "mimetype": "text/x-python",
        "name": "python",
        "nbconvert_exporter": "python",
        "pygments_lexer": "ipython3",
        "version": "3.7.4"
      }
    },
    "nbformat": 4,
    "nbformat_minor": 4
  },
  "format": "json",
  "mimetype": null,
  "size": 921,
  "writable": true,
  "type": "notebook"
}
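
Before digging into the cells, it may be worth checking that the response really is a notebook. The sketch below takes the status code and body text of a response and returns the parsed model. The function name parse_notebook_response is made up, and the comment about which status codes to expect is an assumption rather than something verified here.

```python
import json


def parse_notebook_response(status_code, text):
    """Turn a Contents API response into a notebook model, or raise an error."""
    if status_code != 200:
        # Assumption: a missing or wrong token typically shows up as 403,
        # and a wrong notebook path as 404.
        raise RuntimeError("request failed with HTTP status %d" % status_code)
    model = json.loads(text)
    # The "type" field is "notebook" in the response shown above.
    if model.get("type") != "notebook":
        raise ValueError("expected a notebook, got type %r" % model.get("type"))
    return model


# A cut-down response body with the same top-level fields as the one above.
sample_text = '{"name": "test1.ipynb", "type": "notebook", "content": {"cells": []}}'
model = parse_notebook_response(200, sample_text)
print(model["name"], len(model["content"]["cells"]))  # test1.ipynb 0
```

With a real response, the call would be parse_notebook_response(response.status_code, response.text).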

The goal here is to extract the code from the notebook, so it is enough to pull the source field out of every entry in content.cells whose cell_type is 'code'.
For Markdown cells, cell_type is 'markdown'.

import json
json_src = json.loads(response.text)
src = [cell['source'] for cell in json_src['content']['cells'] if cell['cell_type'] == 'code']

src is a list holding the code of each cell. This notebook has only one code cell, so the list has a single element.

['import numpy as np\nnp.random.random(10)']
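
In the response above, source is a single string. In .ipynb files stored on disk, source can also be a list of lines, so a slightly more defensive version of the extraction might handle both shapes. This is only a sketch: extract_code_cells is a hypothetical helper and the sample model is invented for illustration.

```python
def extract_code_cells(model):
    """Return the source of every code cell in a notebook model as strings."""
    sources = []
    for cell in model["content"]["cells"]:
        if cell.get("cell_type") != "code":
            continue  # skip markdown and any other non-code cells
        source = cell.get("source", "")
        # 'source' may be a list of lines instead of one string; join them.
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources


# Invented sample: one markdown cell and two code cells (string and list form).
sample_model = {
    "content": {
        "cells": [
            {"cell_type": "markdown", "source": "# Title"},
            {"cell_type": "code", "source": "import numpy as np\nnp.random.random(10)"},
            {"cell_type": "code", "source": ["x = 1\n", "print(x)"]},
        ]
    }
}

print(extract_code_cells(sample_model))
# ['import numpy as np\nnp.random.random(10)', 'x = 1\nprint(x)']
```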

Write it to a file. For good measure, the output starts with a UTF-8 encoding declaration.

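# Open with an explicit encoding so the file really matches the declaration below
with open('/tmp/test1.py', 'w', encoding='utf-8') as f:
    print("# -*- coding: utf-8 -*-", file=f)
    for block in src:
        print(block, file=f)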

Check the output file. Looks good.

# -*- coding: utf-8 -*-
import numpy as np
np.random.random(10)
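
One thing to watch for when running the extracted file outside Jupyter: notebook cells often contain IPython magics such as %matplotlib inline or shell escapes starting with !, which plain Python cannot execute. The following sketch comments those lines out while joining the cells. The function name cells_to_script is hypothetical, and the check is deliberately naive: it looks only at the first non-blank character of each line, so it would also comment out a continuation line that happens to start with %.

```python
def cells_to_script(sources):
    """Join code cells into one script, commenting out IPython-only lines."""
    lines = ["# -*- coding: utf-8 -*-"]
    for source in sources:
        for line in source.splitlines():
            # Lines starting with % or ! are treated as IPython magics or
            # shell escapes and kept only as comments.
            if line.lstrip().startswith(("%", "!")):
                line = "# " + line
            lines.append(line)
        lines.append("")  # blank line between cells
    return "\n".join(lines)


# Invented sample cells, one of which contains a magic command.
script = cells_to_script(["%matplotlib inline\nimport math", "print(math.pi)"])
print(script)
```

The resulting string can be written out with open(path, 'w', encoding='utf-8'), in the same way as the file output above.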

The code was extracted from the notebook successfully, so it can now be run elsewhere or processed however you like.

References

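Command for checking the token after starting jupyter with docker
Used as a reference for how to find the token.

Stack Overflow: Interact with Jupyter Notebooks via API
Used as a reference for the syntax.

Jupyter Notebook – Contents API
Consulted for the data structure.

Jupyter Notebook Server API
Consulted for the API itself.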