Looking at the Breakdown of Manga Data

Preparation

Import

Variables

Functions

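import pandas as pd
import plotly.graph_objects as go
from typing import List

# OKABE_ITO (the Okabe-Ito colour palette, a list of colour strings) is expected
# to be defined in the Variables section before this function is defined.


def create_mosaicplot(
    df: pd.DataFrame,
    x: str,
    y: str,
    color: str,
    width: str,
    text: str,
    color_discrete_sequence: List[str] = OKABE_ITO,
) -> go.Figure:
    """
    Create a mosaic plot from the given DataFrame.

    Parameters
    ----------
    df : pd.DataFrame
        DataFrame containing the data to plot
    x : str
        Name of the column shown on the x-axis
    y : str
        Name of the column shown on the y-axis
    color : str
        Name of the column used to split the data into groups (one colour each)
    width : str
        Name of the column giving the width of each bar
    text : str
        Name of the column whose values are shown as text on each bar
    color_discrete_sequence : List[str], optional
        List of colours to use. Defaults to the OKABE_ITO palette.

    Returns
    -------
    go.Figure
        Figure object containing the mosaic plot
    """

    # Create an empty Figure object
    fig = go.Figure()

    # Map each unique value in the color column to a colour
    unique_keys = df[color].unique()
    color_map = {
        name: c for name, c in zip(unique_keys, color_discrete_sequence)
    }

    # Filter the DataFrame for each unique value in the color column
    for i, name in enumerate(unique_keys):
        df_tmp = df[df[color] == name].reset_index(drop=True)
        # Bar widths taken from the width column
        widths = df_tmp[width]

        # Compute the bar positions and add the bars to the plot.
        # With offset=0, x is the left edge of each bar, so bars of
        # different widths are laid out side by side without gaps.
        fig.add_trace(
            go.Bar(
                name=name,
                x=widths.cumsum() - widths,
                y=df_tmp[y],
                text=df_tmp[text],
                width=widths,
                offset=0,
                marker_color=color_map[name],
            )
        )

        # Use the first group to build the x-axis tick settings
        if i == 0:
            # Place each tick at the centre of its column
            tickvals = df_tmp[width].cumsum() - df_tmp[width] / 2
            ticktext = df_tmp[x].unique()
            # Total width, used to set the x-axis display range
            x_max = df_tmp[width].sum()

    # Set the x-axis tick positions, tick labels, and display range.
    # Taking the total column width as 1, leave a margin of 0.1 on each side.
    fig.update_xaxes(
        tickvals=tickvals, ticktext=ticktext, title=x, range=[-x_max * 0.1, x_max * 1.1]
    )

    # Set the y-axis title
    fig.update_yaxes(title=y)

    # Stack the bars and use the color column name as the legend title
    fig.update_layout(barmode="stack", legend_title=color)

    return fig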
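The x-axis arithmetic in create_mosaicplot can be checked in isolation. The sketch below repeats it with pandas only, using the four count values printed for df_sbar in the stacked bar chart section as bar widths; treating count as the width column is an assumption about how the function is called.

```python
import pandas as pd

# Assumed bar widths: the count values (number of works per group) printed for df_sbar
widths = pd.Series([578, 594, 617, 606])

# Left edge of each column: running total minus the column's own width
# (the same expression passed as x to go.Bar together with offset=0)
left_edges = widths.cumsum() - widths

# Tick positions at the centre of each column (the same expression as tickvals)
tick_centres = widths.cumsum() - widths / 2

# Total width and the x-axis range with a 10% margin on each side (as with x_max)
x_max = widths.sum()
x_range = [-x_max * 0.1, x_max * 1.1]

print(left_edges.tolist())    # [0, 578, 1172, 1789]
print(tick_centres.tolist())  # [289.0, 875.0, 1480.5, 2092.0]
print(x_range)
```

Because each column starts where the previous one ends, the columns tile the axis exactly, and their widths stay proportional to the number of works in each group.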

Visualization examples

Pie chart

# Check the DataFrame to be visualized
df_pie.head()

	連載化	マンガ作者数
0	済	355
1	未	960
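The pie chart turns these two counts into proportions. A minimal sketch of that arithmetic with pandas, rebuilding df_pie from the output above; the share column is a hypothetical name added only for illustration.

```python
import pandas as pd

# Rebuild the two rows of df_pie printed above
df_pie = pd.DataFrame({"連載化": ["済", "未"], "マンガ作者数": [355, 960]})

# Hypothetical "share" column: each slice's fraction of all creators (1315 in total)
df_pie["share"] = df_pie["マンガ作者数"] / df_pie["マンガ作者数"].sum()
print(df_pie)
```

plotly.express.pie(df_pie, names="連載化", values="マンガ作者数") computes the same percentages on its own, so the extra column is only useful if you also want the numbers in a table.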

# Extract the record whose creator name (crtname) is exactly 遠藤達哉 (Tatsuya Endō)
df_crt_oneshot[df_crt_oneshot["crtname"] == "遠藤達哉"].T

	1214
crtid	CCRT03052
crtname	遠藤達哉
oneshot_first_date	2000-12-04
oneshot_ccid	C88617
oneshot_ccname	月華美人
oneshot_mcname	週刊少年ジャンプ
series_first_date	NaN
series_ccid	NaN
series_ccname	NaN
series_mcname	NaN

# Extract the record whose creator name (crtname) is exactly 山本崇一朗 (Sōichirō Yamamoto)
df_crt_oneshot[df_crt_oneshot["crtname"] == "山本崇一朗"].T

	809
crtid	CCRT01585
crtname	山本崇一朗
oneshot_first_date	2014-06-11
oneshot_ccid	C92613
oneshot_ccname	からかい上手の
oneshot_mcname	週刊少年サンデー
series_first_date	NaN
series_ccid	NaN
series_ccname	NaN
series_mcname	NaN

# Extract records whose creator name (crtname) contains 三浦糀 (Kōji Miura)
df_crt_oneshot[df_crt_oneshot["crtname"].str.contains("三浦糀")].T

	540
crtid	CCRT00629
crtname	三浦糀
oneshot_first_date	2017-06-07
oneshot_ccid	C115558
oneshot_ccname	先生、好きです。
oneshot_mcname	週刊少年マガジン
series_first_date	NaN
series_ccid	NaN
series_ccname	NaN
series_mcname	NaN
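All three lookups return creators whose series_* fields are NaN. Assuming the 連載化 column of df_pie is derived from whether a series work is recorded (series_ccid not null), the counts could be produced as in this sketch. The first three rows copy the records printed above; the fourth row (作者X, CCRT99999, C00000) is hypothetical and exists only so that both categories appear.

```python
import numpy as np
import pandas as pd

# Toy frame shaped like df_crt_oneshot (only the columns needed here).
# Row 4 is a HYPOTHETICAL creator with a recorded series work.
df = pd.DataFrame({
    "crtid": ["CCRT03052", "CCRT01585", "CCRT00629", "CCRT99999"],
    "crtname": ["遠藤達哉", "山本崇一朗", "三浦糀", "作者X"],
    "series_ccid": [np.nan, np.nan, np.nan, "C00000"],
})

# Assumption: a creator counts as serialized (済) when a series work is recorded
df["連載化"] = np.where(df["series_ccid"].notna(), "済", "未")

# Count distinct creators per category, giving the same shape as df_pie
counts = df.groupby("連載化")["crtid"].nunique().rename("マンガ作者数").reset_index()
print(counts)
```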

Stacked bar chart

# Check the DataFrame to be visualized
df_sbar.head(10)

	グループ名	8話目までの合計各話数	count	掲載形態	8話目までの平均話数	text
0	第1群（合計8-16話）	569	578	4色カラー	0.984429	約0.98話
1	第1群（合計8-16話）	4055	578	モノクロ	7.015571	約7.0話
2	第2群（合計17-31話）	742	594	4色カラー	1.249158	約1.2話
3	第2群（合計17-31話）	4010	594	モノクロ	6.750842	約6.8話
4	第3群（合計32-81話）	814	617	4色カラー	1.319287	約1.3話
5	第3群（合計32-81話）	4122	617	モノクロ	6.680713	約6.7話
6	第4群（合計82-1968話）	830	606	4色カラー	1.369637	約1.4話
7	第4群（合計82-1968話）	4018	606	モノクロ	6.630363	約6.6話
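Each row of df_sbar pairs a total with a count, and the average column is their ratio (569 / 578 ≈ 0.984). The sketch below recomputes it for the first two groups. The label format `#.2g` (two significant digits, trailing zero kept) is an assumption that happens to reproduce the text column shown, not necessarily the author's code.

```python
import pandas as pd

# First two groups of df_sbar, copied from the output above
df = pd.DataFrame({
    "グループ名": ["第1群（合計8-16話）"] * 2 + ["第2群（合計17-31話）"] * 2,
    "8話目までの合計各話数": [569, 4055, 742, 4010],
    "count": [578, 578, 594, 594],
    "掲載形態": ["4色カラー", "モノクロ"] * 2,
})

# Average number of episodes (out of the first 8) per work in each format
df["8話目までの平均話数"] = df["8話目までの合計各話数"] / df["count"]

# Assumed label format: 2 significant digits; "#" keeps the trailing zero in "7.0"
df["text"] = df["8話目までの平均話数"].map(lambda v: f"約{v:#.2g}話")
print(df)
```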

# Check the DataFrame to be visualized
df_sbar2.head()

	マンガ雑誌名	グループ名	8話目までの合計各話数	count	掲載形態	8話目までの平均話数	text
0	週刊少年サンデー	第1群（合計8-16話）	57	84	4色カラー	0.678571	約0.68話
1	週刊少年サンデー	第1群（合計8-16話）	615	84	モノクロ	7.321429	約7.3話
2	週刊少年サンデー	第2群（合計17-31話）	98	100	4色カラー	0.980000	約0.98話
3	週刊少年サンデー	第2群（合計17-31話）	702	100	モノクロ	7.020000	約7.0話
4	週刊少年サンデー	第3群（合計32-81話）	241	177	4色カラー	1.361582	約1.4話
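df_sbar2 splits the same figures by magazine. If every work contributes exactly its first 8 episodes, the 4色カラー and モノクロ averages within one magazine and group should add up to 8. This sketch checks that consistency on the four complete rows printed above.

```python
import pandas as pd

# Rows 0-3 of df_sbar2 (週刊少年サンデー, groups 1 and 2)
df = pd.DataFrame({
    "マンガ雑誌名": ["週刊少年サンデー"] * 4,
    "グループ名": ["第1群（合計8-16話）"] * 2 + ["第2群（合計17-31話）"] * 2,
    "8話目までの合計各話数": [57, 615, 98, 702],
    "count": [84, 84, 100, 100],
    "掲載形態": ["4色カラー", "モノクロ"] * 2,
})
df["8話目までの平均話数"] = df["8話目までの合計各話数"] / df["count"]

# Per magazine and group, the two formats together should cover all 8 episodes
totals = df.groupby(["マンガ雑誌名", "グループ名"])["8話目までの平均話数"].sum()
print(totals)
```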

Mosaic plot

# Check the DataFrame to be visualized
df_mos.head()

	グループ名	8話目までの合計各話数	count	掲載形態	8話目までの平均話数	text
0	第1群（合計8-16話）	569	578	4色カラー	0.984429	約0.98話
1	第1群（合計8-16話）	4055	578	モノクロ	7.015571	約7.0話
2	第2群（合計17-31話）	742	594	4色カラー	1.249158	約1.2話
3	第2群（合計17-31話）	4010	594	モノクロ	6.750842	約6.8話
4	第3群（合計32-81話）	814	617	4色カラー	1.319287	約1.3話
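Assuming create_mosaicplot is called on df_mos with width="count" and y="8話目までの平均話数", each rectangle's area is count × average, which recovers the total in 8話目までの合計各話数. In other words, the mosaic plot encodes total episodes as area. A quick pandas check on the five rows shown above:

```python
import pandas as pd

# The five df_mos rows printed above
df_mos = pd.DataFrame({
    "グループ名": ["第1群（合計8-16話）"] * 2 + ["第2群（合計17-31話）"] * 2 + ["第3群（合計32-81話）"],
    "8話目までの合計各話数": [569, 4055, 742, 4010, 814],
    "count": [578, 578, 594, 594, 617],
    "8話目までの平均話数": [0.984429, 7.015571, 1.249158, 6.750842, 1.319287],
})

# Assumed mapping: width = count (works), height = average episodes,
# so width * height should give back the total number of episodes
area = df_mos["count"] * df_mos["8話目までの平均話数"]
print(area.round().astype(int).tolist())  # [569, 4055, 742, 4010, 814]
```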

# Check the DataFrame to be visualized
df_mos2.head()

	マンガ雑誌名	グループ名	8話目までの合計各話数	count	掲載形態	8話目までの平均話数	text
8	週刊少年ジャンプ	第1群（合計8-16話）	291	225	4色カラー	1.293333	約1.3話
9	週刊少年ジャンプ	第1群（合計8-16話）	1509	225	モノクロ	6.706667	約6.7話
10	週刊少年ジャンプ	第2群（合計17-31話）	296	166	4色カラー	1.783133	約1.8話
11	週刊少年ジャンプ	第2群（合計17-31話）	1032	166	モノクロ	6.216867	約6.2話
12	週刊少年ジャンプ	第3群（合計32-81話）	147	91	4色カラー	1.615385	約1.6話
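When df_mos2 is drawn per magazine, each group appears once per format, so summing count over all rows double-counts the works. This sketch uses the five 週刊少年ジャンプ rows printed above; these rows cover only groups 1 to 3, so the totals illustrate the pitfall rather than the magazine's real total.

```python
import pandas as pd

# The five df_mos2 rows printed above
df = pd.DataFrame({
    "マンガ雑誌名": ["週刊少年ジャンプ"] * 5,
    "グループ名": ["第1群（合計8-16話）"] * 2 + ["第2群（合計17-31話）"] * 2 + ["第3群（合計32-81話）"],
    "count": [225, 225, 166, 166, 91],
    "掲載形態": ["4色カラー", "モノクロ", "4色カラー", "モノクロ", "4色カラー"],
})

# Naive total: every group is counted once per format
naive_total = df.groupby("マンガ雑誌名")["count"].sum()

# Keep one row per (magazine, group) before adding up the column widths
per_group = df.drop_duplicates(["マンガ雑誌名", "グループ名"])
total = per_group.groupby("マンガ雑誌名")["count"].sum()
print(naive_total["週刊少年ジャンプ"], total["週刊少年ジャンプ"])  # 873 482
```

create_mosaicplot avoids this by reading x_max from the first colour group only (i == 0), which presumes that every group has a row for that first category.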

Stacked density plot[1]

# Check the DataFrame to be visualized
df_area.head()

	マンガ作品名	ccid	掲載年	各話数	count	mean	掲載形態
0	ドカベン	C95127	1972	22.0	34.0	0.647059	4色カラー
1	ドカベン	C95127	1972	12.0	34.0	0.352941	モノクロ
2	ドカベン	C95127	1973	21.0	49.0	0.428571	4色カラー
3	ドカベン	C95127	1973	28.0	49.0	0.571429	モノクロ
4	ドカベン	C95127	1974	19.0	49.0	0.387755	4色カラー
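In df_area, count is the sum of 各話数 over both formats for a work and year (22 + 12 = 34 in 1972), and mean = 各話数 / count is each format's share, so the stacked shares reach 1. A pandas sketch reproducing the four complete rows above; the groupby keys are an assumption about how the author aggregated.

```python
import pandas as pd

# Rows 0-3 of df_area (ドカベン, 1972 and 1973)
df = pd.DataFrame({
    "マンガ作品名": ["ドカベン"] * 4,
    "ccid": ["C95127"] * 4,
    "掲載年": [1972, 1972, 1973, 1973],
    "各話数": [22.0, 12.0, 21.0, 28.0],
    "掲載形態": ["4色カラー", "モノクロ", "4色カラー", "モノクロ"],
})

# count: episodes of the work published in that year, summed over both formats
df["count"] = df.groupby(["ccid", "掲載年"])["各話数"].transform("sum")

# mean: share of that year's episodes in each format (stacks to 1 per year)
df["mean"] = df["各話数"] / df["count"]
print(df)
```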

Treemap

# Check the DataFrame to be visualized
df_tree.head()

	mcname	ccname	cename	four_colored
0	週刊少年マガジン	ダイヤのA	第238話/この世代	True
1	週刊少年マガジン	我間乱～GAMARAN～	第94話	True
2	週刊少年マガジン	ファイ・ブレイン最期のパズル	第1話クラシック同好会	True
3	週刊少年マガジン	かってに改蔵	特別番外編「損して得とれない」	True
4	週刊少年マガジン	FAIRY TAIL	第231話終わらせる者	True
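A treemap needs one size per node. Assuming each row of df_tree is one four-colour episode (cename looks like an episode title), the sizes for a magazine/work hierarchy are simply row counts, as in this sketch on the five rows shown.

```python
import pandas as pd

# The five df_tree rows printed above
df_tree = pd.DataFrame({
    "mcname": ["週刊少年マガジン"] * 5,
    "ccname": ["ダイヤのA", "我間乱～GAMARAN～", "ファイ・ブレイン最期のパズル", "かってに改蔵", "FAIRY TAIL"],
    "cename": ["第238話/この世代", "第94話", "第1話クラシック同好会", "特別番外編「損して得とれない」", "第231話終わらせる者"],
    "four_colored": [True] * 5,
})

# Assumption: one row = one four-colour episode; count them per magazine and work
sizes = (
    df_tree[df_tree["four_colored"]]
    .groupby(["mcname", "ccname"])
    .size()
    .rename("episodes")
    .reset_index()
)
print(sizes)
```

plotly.express.treemap(df_tree, path=["mcname", "ccname"]) should perform this counting itself when no values argument is given; the explicit version is mainly useful for checking the numbers.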

Parallel sets chart

# Check the DataFrame to be visualized
df_par.head()

	雑誌名	年代	発売曜日
0	週刊少年サンデー	1970	月
1	週刊少年サンデー	1970	月
2	週刊少年ジャンプ	1970	月
3	週刊少年ジャンプ	1970	月
4	週刊少年ジャンプ	1970	月
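A parallel sets chart draws one band per combination of categories, with thickness proportional to the number of rows in that combination. This sketch counts those combinations for the five df_par rows shown; the count column name is hypothetical.

```python
import pandas as pd

# The five df_par rows printed above
df_par = pd.DataFrame({
    "雑誌名": ["週刊少年サンデー"] * 2 + ["週刊少年ジャンプ"] * 3,
    "年代": [1970] * 5,
    "発売曜日": ["月"] * 5,
})

# One row per (magazine, decade, release weekday) with the number of records in it
flows = df_par.groupby(["雑誌名", "年代", "発売曜日"]).size().rename("count").reset_index()
print(flows)
```

plotly.express.parallel_categories(df_par, dimensions=["雑誌名", "年代", "発売曜日"]) does this aggregation internally, so the table above is only a way to inspect the band sizes.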
