Published Date : 2021年8月18日3:51

016 Pythonで遊ぶ (シャー256 6)
016 Playing with Python (sha256 6)


This blog has an English translation


ニコニコ動画にアップした動画のまとめ記事です。

This is a summary blog post about a video I uploaded to NicoNico.

細かい部分は動画を参考にしてください。

Please refer to the video for details.


目次

Table of Contents




① 動画の説明
① Video Description



前回の続きです。

Continued from last time.

では前回の問題点である大きすぎるデータと画像等のバイナリデータへの対処を施し、sha256のスクリプトを完成させましょう。

Let's take care of the previous problem of oversized data and binary data such as images and complete the sha256 script.

まずは今まで理解がしやすいといった理由で二進数表記を使用してきましたが、今回はバイト列を扱うので16進数表記を使っていきます。

We've been using binary notation for reasons that are easy to understand, but this time we'll deal with bytes, so we will use the hexadecimal.

なので今回は16進数についてPythonを使い簡単に理解してから、sha256のスクリプトの続きを行っていきましょう。

So, this time, We'll use python to quickly understand hexadecimal and then continue with the script for sha256.

16進数について詳しいことはwikipedia等を参照してください。

See wikipedia and other sites for more information on hexadecimal.

In [1]: import codecs
   ...: byte_str = codecs.decode('7e5139722f481aaed1e654e468e0b8ce9756fab5fdf04401f068d7f93642495e', 'hex')

In [2]: type(byte_str)
Out[2]: bytes

In [3]: byte_str
Out[3]: b'~Q9r/H\x1a\xae\xd1\xe6T\xe4h\xe0\xb8\xce\x97V\xfa\xb5\xfd\xf0D\x01\xf0h\xd7\xf96BI^'

In [4]: byte_str.hex()
Out[4]: '7e5139722f481aaed1e654e468e0b8ce9756fab5fdf04401f068d7f93642495e'

In [5]: byte_array = bytearray(byte_str)

In [6]: byte_array
Out[6]: bytearray(b'~Q9r/H\x1a\xae\xd1\xe6T\xe4h\xe0\xb8\xce\x97V\xfa\xb5\xfd\xf0D\x01\xf0h\xd7\xf96BI^')

In [7]: byte_array[0]
Out[7]: 126

In [8]: hex(byte_array[0])
Out[8]: '0x7e'

私達が普段使っている十進数は10で数字の桁数が上がります。

The decimal number we usually use is 10, which increases the number of digits.

In [9]: print("base 10")
   ...: for i in range(11):
   ...:     print(i, "->", i)
   ...:
base 10
0 -> 0
1 -> 1
2 -> 2
3 -> 3
4 -> 4
5 -> 5
6 -> 6
7 -> 7
8 -> 8
9 -> 9
10 -> 10

つまり言い換えると、各列は10の累乗を表しているとも言えます。

In other words, each column represents a power of 10.

In [10]: for i in range(10):
    ...:     print(f"10 to the power of {i} is {10**i}.")
    ...:
10 to the power of 0 is 1.
10 to the power of 1 is 10.
10 to the power of 2 is 100.
10 to the power of 3 is 1000.
10 to the power of 4 is 10000.
10 to the power of 5 is 100000.
10 to the power of 6 is 1000000.
10 to the power of 7 is 10000000.
10 to the power of 8 is 100000000.
10 to the power of 9 is 1000000000.

同様に二進数の各列は2の累乗を表し、16進数の各列は16の累乗を表します。

Similarly, each binary column represents a power of 2, and each hexadecimal column represents a power of 16.

In [11]: print("base 2")
    ...: for i in range(11):
    ...:     print(i, "->", bin(i))
    ...:
base 2
0 -> 0b0
1 -> 0b1
2 -> 0b10
3 -> 0b11
4 -> 0b100
5 -> 0b101
6 -> 0b110
7 -> 0b111
8 -> 0b1000
9 -> 0b1001
10 -> 0b1010

In [12]: for i in range(10):
    ...:     print(f"2 to the power of {i} is {2**i}. ({bin(2**i)})")
    ...:
2 to the power of 0 is 1. (0b1)
2 to the power of 1 is 2. (0b10)
2 to the power of 2 is 4. (0b100)
2 to the power of 3 is 8. (0b1000)
2 to the power of 4 is 16. (0b10000)
2 to the power of 5 is 32. (0b100000)
2 to the power of 6 is 64. (0b1000000)
2 to the power of 7 is 128. (0b10000000)
2 to the power of 8 is 256. (0b100000000)
2 to the power of 9 is 512. (0b1000000000)

つまり、16進数は0から9までの10個の数字(0から9までの数は今までと意味は一緒です)と、AからFまでの6つのアルファベット(Aが10になりFが15になります)を使って16桁の数値を表現しています。

That means there are 16 possible digits used to represent numbers. 0 to 9 those values still represent the same value you're used to. the remaining six digits are represented by A to F, which values of 10 to 15.

In [13]: print("base 16")
    ...: for i in range(17):
    ...:     print(i, "->", hex(i))
    ...:
base 16
0 -> 0x0
1 -> 0x1
2 -> 0x2
3 -> 0x3
4 -> 0x4
5 -> 0x5
6 -> 0x6
7 -> 0x7
8 -> 0x8
9 -> 0x9
10 -> 0xa
11 -> 0xb
12 -> 0xc
13 -> 0xd
14 -> 0xe
15 -> 0xf
16 -> 0x10

In [14]: for i in range(10):
    ...:     print(f"16 to the power of {i} is {16**i}. ({hex(16**i)})")
    ...:
16 to the power of 0 is 1. (0x1)
16 to the power of 1 is 16. (0x10)
16 to the power of 2 is 256. (0x100)
16 to the power of 3 is 4096. (0x1000)
16 to the power of 4 is 65536. (0x10000)
16 to the power of 5 is 1048576. (0x100000)
16 to the power of 6 is 16777216. (0x1000000)
16 to the power of 7 is 268435456. (0x10000000)
16 to the power of 8 is 4294967296. (0x100000000)
16 to the power of 9 is 68719476736. (0x1000000000)

二進数でビットの9つ目が1で下位のビットが全て0の場合は2の8乗で256、ビットの8番目から下位ビットの全てが1の状態なら255です。これを16進数で表すと100とFFとなります。

In binary, if the 9th bit is 1 and the lower bits are all 0, it is 2 to the power of 8 which is 256. If the 8th to lower bits are all 1, it is 255. This is expressed in hex as 100 and FF.

In [15]: bin(256)
Out[15]: '0b100000000'

In [16]: bin(255)
Out[16]: '0b11111111'

In [17]: hex(256)
Out[17]: '0x100'

In [18]: hex(255)
Out[18]: '0xff'

そして8ビット長は1バイト長として表現されます。

And the 8-bit length is then represented as the one-byte length.

print("eight-bit length ->", format(0, '08b'), "to",  format((2**8)-1, '08b'))
eight-bit length -> 00000000 to 11111111
print("one-byte length ->", format(0, '02x'), "to",  format((2**8)-1, '02x'))
one-byte length -> 00 to ff

つまり16進数表現なら短い表記で見やすくなります。

In other words, the hex representation is easy to see with short notation.

ちなみに2の0乗や16の0乗等、ある数に0乗すると1になる理由は以下の通り計算すればその理屈が分かります。

By the way, the reason why a number raised to the power of 0 is 1 (such as 2 raised to the 0 power or 16 raised to the 0 power) can be explained by the following calculation.

In [21]: for i in range(10):
    ...:     print(f"10 to the power of {i} is {10**i}.")
    ...:
10 to the power of 0 is 1.
10 to the power of 1 is 10.
10 to the power of 2 is 100.
10 to the power of 3 is 1000.
10 to the power of 4 is 10000.
10 to the power of 5 is 100000.
10 to the power of 6 is 1000000.
10 to the power of 7 is 10000000.
10 to the power of 8 is 100000000.
10 to the power of 9 is 1000000000.

In [22]: for i in range(10, 0, -1):
    ...:     print(f"{10**i} divided by 10 equals {10**(i-1)}. (10 to the {i}-1 power is 10 to the {i-1} power).")
    ...:
10000000000 divided by 10 equals 1000000000. (10 to the 10-1 power is 10 to the 9 power).
1000000000 divided by 10 equals 100000000. (10 to the 9-1 power is 10 to the 8 power).
100000000 divided by 10 equals 10000000. (10 to the 8-1 power is 10 to the 7 power).
10000000 divided by 10 equals 1000000. (10 to the 7-1 power is 10 to the 6 power).
1000000 divided by 10 equals 100000. (10 to the 6-1 power is 10 to the 5 power).
100000 divided by 10 equals 10000. (10 to the 5-1 power is 10 to the 4 power).
10000 divided by 10 equals 1000. (10 to the 4-1 power is 10 to the 3 power).
1000 divided by 10 equals 100. (10 to the 3-1 power is 10 to the 2 power).
100 divided by 10 equals 10. (10 to the 2-1 power is 10 to the 1 power).
10 divided by 10 equals 1. (10 to the 1-1 power is 10 to the 0 power).

In [23]: for i in range(10):
    ...:     print(f"16 to the power of {i} is {16**i}.")
    ...:
    ...:
16 to the power of 0 is 1.
16 to the power of 1 is 16.
16 to the power of 2 is 256.
16 to the power of 3 is 4096.
16 to the power of 4 is 65536.
16 to the power of 5 is 1048576.
16 to the power of 6 is 16777216.
16 to the power of 7 is 268435456.
16 to the power of 8 is 4294967296.
16 to the power of 9 is 68719476736.

In [24]: for i in range(10, 0, -1):
    ...:     print(f"{16**i} divided by 16 equals {16**(i-1)}. (16 to the {i}-1 power is 16 to the {i-1} power).")
    ...:
    ...:
1099511627776 divided by 16 equals 68719476736. (16 to the 10-1 power is 16 to the 9 power).
68719476736 divided by 16 equals 4294967296. (16 to the 9-1 power is 16 to the 8 power).
4294967296 divided by 16 equals 268435456. (16 to the 8-1 power is 16 to the 7 power).
268435456 divided by 16 equals 16777216. (16 to the 7-1 power is 16 to the 6 power).
16777216 divided by 16 equals 1048576. (16 to the 6-1 power is 16 to the 5 power).
1048576 divided by 16 equals 65536. (16 to the 5-1 power is 16 to the 4 power).
65536 divided by 16 equals 4096. (16 to the 4-1 power is 16 to the 3 power).
4096 divided by 16 equals 256. (16 to the 3-1 power is 16 to the 2 power).
256 divided by 16 equals 16. (16 to the 2-1 power is 16 to the 1 power).
16 divided by 16 equals 1. (16 to the 1-1 power is 16 to the 0 power).

では前回の動画で使用したhello worldという文字列をバイト列でメッセージブロックにするとどうなるか見てみましょう。

Now, let's see what happens when the string [hello world] used in the previous video is turned into a message block in bytes.

後でアペンドメソッドを使って末尾に1ビットを加えたりできるようにbytearrayメソッドを使ってバイト列の配列に変換します。

Use the bytearray method to convert it to an array of bytes so that you can later add one bit to the end using the append method.

ちなみにこの0x80は整数で表すと128(2**7)、二進数なら0b10000000で、これでインプットメッセージの末尾に1ビット加えた状態になります。

Note that this 0x80 is 128 (2**7) in integer and 0b10000000 in binary, this adds one bit to the end of the input message.

これで気が付いたと思いますが、二進数の時は1ビットの後のゼロの数を気にしていましたが、16進数の場合は1バイトずつ作成すればいいので、それらを気にする必要がないというメリットもあります。

As you may have noticed, in binary, we were concerned about the number of zeros after 1 bit, but in hex, we only have to create 1 byte at a time, so we don't have to worry about them. This is also the advantage of handling hexadecimal.

ワイル文の判定で使われている[((len(input_) * 8) + 64) % 512]の部分が大きい入力データへの解決策です。

The [((len(input_) * 8) + 64) % 512] part of the While statement is the solution to large input data.

これなら、最後のメッセージブロックの末尾に8バイト(64ビット)分のデータのビット数を残したまま、512の倍数毎にインプットデータを複数のメッセージブロックに分けられます。

This divides the input data into multiple message blocks in multiples of 512, leaving 8 bytes (64 bits) of data in bits at the end of the last message block.

これを確かめる為に512ビットを越えるデータをインプットデータに置き換えてみましょう。

To verify this, let's replace more than 512 bits of data with input data.

このように約10メガバイトのインプットデータを使って確かめてみましょう。

Let's check it using about 10 mega bytes of input data like this.

このインプットデータのビット数(110000 * 8 = 880000ビット)を8バイト長(64ビット長)でビッグエンディアンでデータの末尾に加えます。

The number of bits (110000 * 8 = 880000 bits) of this input data is added to the end of the data in big endian with a length of 8 bytes (64 bits).

ではこの前処理が終わったインプットデータを64バイト毎に配列に格納していきましょう。

Now, let's store this preprocessed input data in an array every 64 bytes.

For文のrangeの三つの引数の内、三つ目の引数はステップ数で、ここに指定した数毎にループをします。つまり今回は64バイト毎にループしていきます。

The third of the three arguments in the range function of the For statement is the number of steps, and the loop is performed for each number specified here. So this time we loop every 64 bytes.

このように各要素が前回作成した64バイトのメッセージブロックになっている配列ができあがりました。

As you can see, we have an array where each element is a 64 byte message block, as we did in the previous video.

後はこのメッセージブロック一つ一つに対して前回の時と同様にメッセージスケジュールに変換します。

Each message block is then converted to a message schedule as in the previous video.

メッセージスケジュールを作成する関数を作成しましょう。

Let's create a function to create a message schedule.

バイトオブジェクトを整数に変換する必要があります。

The byte object must be converted to an integer.

では作成した関数を試してみましょう。

Now let's try out the function we created.

では一気にsha256のスクリプトを完成させましょう。

Let's go ahead and complete the sha 256 script.

ラウンド定数をハードコードします。

Hard-code the round constants.

初期ハッシュ値をハードコードします。

Hard-code the initial hash values.

圧縮ループ内で使う関数を作成していきましょう。

Let's create the functions for use in the compression loop.

因みに前回の動画でChoose関数内で使用したNOT演算用の関数は使用しません。

Note that we don't use the NOT operations function we used in Choose function in the previous video.

何故なら、あの関数はディスプレイ表示を修正するだけの関数です。NOT演算(~)は正常に働きます。

Because it only modifies the display display. The NOT operation (~) works correctly.

表示だけを修正するならビット長を指定してマスクをかけるだけです。

If you only want to modify the display, just specify the bit length and mask it.

では最初のメッセージスケジュールを使ってループ一回目のハッシュ値を求めてみましょう。

Let's use the first message schedule to generate the hash value of the first loop.

このハッシュ値が次の圧縮ループ内で使われます。

This hash value is used in the next compression loop.

そして全てのメッセージスケジュールを使ってこれを繰り返し、最終的なハッシュ値を導く訳です。

We then iterate through all the message schedules to derive the final hash value.

いつもどおりにhashlibモジュールを使って自作のsha256が正しく機能しているかどうかを検証してみましょう。

Use the hashlib module as usual to verify that your own sha256 is working correctly.

では全ての関数や工程をまとめてみましょう。

Let's summarize all the functions and processes.

長いので割愛。詳しくは動画を見てください。
I omitted the contents of the script because it was too long. See the video for more details.

def create_message_blocks(data):
    .....................................

def create_message_schedule(message_block):
    .....................................

def compression_loop(blocks, round_constants):
    .....................................

def cat_final_hash(H):
    .....................................

def right_rotate(bits, shift_num, size=32):
    .....................................

def sigma0(num):
    .....................................

def sigma1(num):
    .....................................

def SIGMA1(num):
    .....................................

def Ch(x, y, z):
    .....................................

def SIGMA0(num):
    .....................................

def Maj(x, y, z):
    .....................................


round_constants = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
]

print('test')
import hashlib
message = "hello world" * 1000000
blocks = create_message_blocks(message)
final_hash_values = compression_loop(blocks, round_constants)

final_hash_values
hashlib.sha256(message.encode('utf-8')).digest()
final_hash_values.hex()
hashlib.sha256(message.encode('utf-8')).hexdigest()

hashlib.sha256(message.encode('utf-8')).digest() == final_hash_values
hashlib.sha256(message.encode('utf-8')).hexdigest() == final_hash_values.hex()

バイナリデータにも対応させる為に、isinstanceメソッドを使用して文字列の場合とバイトオブジェクトの場合で処理を分けます。

To accommodate binary data, use the isinstance method to separate processing for string and byte objects.

では画像ファイルのハッシュ値を求めてみましょう。

Let's get the hash value of the image file.

ご覧の通り今回作成した自作のsha256のPythonスクリプトはhashlibモジュールと比べるとかなり低速です。

As you can see, my own sha256 Python script is quite slow compared to the hashlib module.

ですが、sha256が実際にどのように動作しているかをこれでかなり深く理解できたのではないでしょうか。

However, through this scripting you may now have a much deeper understanding of how sha256 actually works.



以上です。お疲れ様です。

That's all. Thank you for your hard work.