Natural Language Processing with p5.js and numjs - Ranking of similar words

A natural language processing series with p5.js and numjs.


In this article, I'll create a ranking of similar words by cosine similarity using p5.js and numjs.

Of course, p5.js is written in JavaScript, so plain JavaScript can be used directly inside a p5.js sketch. I write most of the code with p5.js, but some things, such as string operations, are awkward in p5.js, so I write those parts directly in JavaScript.


Here's a quick overview.

First of all, the purpose of this blog post is to understand how natural language processing works by building it from scratch as far as possible with numjs, a JavaScript version of Python's NumPy, rather than relying on libraries like ML.js.


Using a machine learning library makes things easy, but I think that if you know the details of how machine learning works, you can apply it to many more things.


This time, I'll use the cosine similarity I briefly described in a previous blog post to create something that displays a ranking of similar words.


I'll show you a simple overall flow for creating similar word rankings using the following diagram.

[Figure: overall flow for creating a ranking of similar words]

Ranking of similar words


Before we create a ranking of similar words, let's briefly review cosine similarity.


The following explanation is taken directly from my previous blog post.


The formula for calculating cosine similarity is as follows.

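\[
\mathrm{similarity}(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \, \|\mathbf{y}\|}
\]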


Briefly, the cosine similarity is 1 if the two vectors point in exactly the same direction, 0 if they are perpendicular, and -1 if they point in opposite directions.
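As a rough illustration of those three cases, here is a minimal sketch that computes cosine similarity for plain JavaScript arrays. The function name `cosineOfArrays` and the small `eps` guard against division by zero are assumptions made for this example only; the series itself uses the numjs-based `cosSimilarity` from the previous post.

```javascript
// Minimal sketch: cosine similarity for plain arrays (no numjs needed).
// `cosineOfArrays` and `eps` are illustrative names, not part of the series code.
function cosineOfArrays(x, y, eps = 1e-8) {
    let dot = 0;
    let normX = 0;
    let normY = 0;
    for (let i = 0; i < x.length; i++) {
        dot += x[i] * y[i];      // accumulate the dot product
        normX += x[i] * x[i];    // accumulate squared length of x
        normY += y[i] * y[i];    // accumulate squared length of y
    }
    // eps keeps the division defined when one vector is all zeros
    return dot / (Math.sqrt(normX) * Math.sqrt(normY) + eps);
}

const sameDirection = cosineOfArrays([1, 2], [2, 4]);   // very close to 1
const opposite = cosineOfArrays([1, 2], [-1, -2]);      // very close to -1
const orthogonal = cosineOfArrays([1, 0], [0, 1]);      // 0
```

Because of `eps`, the first two results are marginally smaller in magnitude than exactly 1, which does not affect a ranking.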


For a more intuitive picture, think of each word as an arrow (a vector), as in the figure below: the closer the directions of two arrows, the higher their cosine similarity.

[Figure: word vectors drawn as arrows; the angle between two arrows reflects their cosine similarity]


Let's start with a script that creates a ranking of similar words from given words.


The contents of the working folder are as follows.

working directory
your working directory
    -- assets
        -- sixLittleMice.txt
    -- nlpExample.html
    -- nlpExample.js
    -- utils
        -- preprocess.js
        -- buildCoOccurrenceMatrix.js
        -- cosSimilarity.js
+       -- rankingOfWords.js


Let's create a function, in [rankingOfWords.js], that builds a ranking of similar words.

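// Comparator for Array.prototype.sort: larger numbers come first.
const descendingOrder = function(a, b) {
    if (a > b) return -1;
    if (a < b) return 1;
    return 0;
};

function rankingOfWords(query, w2id, id2w, vSize, cmat, rankingLimit = 10) {
    let similarWordList = [];

    if (!(query in w2id)) {
        print("This word '" + query + "' is not in the list.");
        return; // undefined tells the caller that the word was not found
    }

    let qid = w2id[query];
    let rows = cmat.tolist(); // convert once instead of on every iteration
    let qvec = nj.array(rows[qid]);
    let similarityList = nj.zeros([vSize]);

    for (let i = 0; i < vSize; i++) {
        let x = nj.array(rows[i]);
        similarityList.set(i, cosSimilarity(x, qvec).get(0));
    }

    // Pair each similarity with its word id before sorting,
    // so that the id is not lost when the order changes.
    let ranked = similarityList.tolist().map((value, id) => [id, value]);
    ranked.sort((a, b) => descendingOrder(a[1], b[1]));

    for (let [id, value] of ranked) {
        if (id2w[id] == query) continue; // skip the query word itself
        similarWordList.push([id2w[id], value]);
        if (similarWordList.length >= rankingLimit) break;
    }
    return similarWordList;
}
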
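One detail is worth spelling out: when a list of similarities is sorted, each value has to stay paired with its word id, otherwise the position in the sorted list no longer says which word the value belongs to. The sketch below shows the idea with plain arrays. `rankByScore`, `scores`, and `idToWord` are hypothetical names, and the numbers are made up for illustration.

```javascript
// Sketch: rank words by score while keeping each score tied to its word.
// All names and numbers here are illustrative assumptions.
function rankByScore(scores, idToWord, query, limit = 10) {
    return scores
        .map((score, id) => [idToWord[id], score])   // pair word with its score
        .filter(([word]) => word !== query)          // drop the query itself
        .sort((a, b) => b[1] - a[1])                 // highest score first
        .slice(0, limit);                            // keep only the top entries
}

const idToWord = { 0: 'six', 1: 'little', 2: 'mice', 3: 'sat' };
const scores = [1.0, 0.2, 0.7, 0.5]; // pretend similarity of each word to 'six'
const top2 = rankByScore(scores, idToWord, 'six', 2);
// top2 is [['mice', 0.7], ['sat', 0.5]]
```

Sorting the bare `scores` array instead would give `[1.0, 0.7, 0.5, 0.2]`, and position 1 of that result would wrongly be read as word id 1 ('little').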
Let's try it.

let result;

function preload() {
    // Load the text file as an array of lines before setup() runs.
    result = loadStrings("./assets/sixLittleMice.txt");
}

let word2Id;
let id2Word;
let corpus;

let arrOfResults;
let vocabSize;

let coOccurenceMatrix;

let arrayOfRanking = [];

function setup() {
    arrOfResults = preprocess(result);

    corpus = arrOfResults[0];
    word2Id = arrOfResults[1];
    id2Word = arrOfResults[2];

    vocabSize = Object.keys(word2Id).length;
    coOccurenceMatrix = createCoMatrix(corpus, vocabSize);
    let testWord = 'i';
    arrayOfRanking = rankingOfWords(testWord, word2Id, id2Word, vocabSize, coOccurenceMatrix);
    print('Similarity of words : ' + testWord + '\n' + arrayOfRanking);
}

Similarity of words : i


Let's make the similarity rankings appear in the browser by user interaction.

let result;

function preload(){
    result = loadStrings("./assets/sixLittleMice.txt");
}

let word2Id;
let id2Word;
let corpus;

let arrOfResults;
let vocabSize;

let coOccurenceMatrix;

let arrayOfRanking = [];
let input, button, notfound;
let placeHolder = [];
let description = "";
let rankingLimit = 10;

function setup(){
    arrOfResults = preprocess(result);

    corpus = arrOfResults[0];
    word2Id = arrOfResults[1];
    id2Word = arrOfResults[2];

    vocabSize = Object.keys(word2Id).length;
    coOccurenceMatrix = createCoMatrix(corpus, vocabSize);

    // gui
    input = createInput();
    input.position(20, 65);

    button = createButton('create');
    button.position(input.x + input.width, 65);
    // Run findSimWords whenever the button is pressed.
    button.mousePressed(findSimWords);

    // Empty elements that are filled in later. Both sit at the same spot;
    // only one of them holds text at a time.
    notfound = createElement('h3', '').position(20, 100);
    description = createElement('h4', '').position(20, 100);

    // One empty <p> per ranking row, reused on every search.
    for (let i = 0; i < rankingLimit; i++){
        placeHolder.push(createElement('p', '').position(20, 150 + (i * 30)));
    }

    createElement('h2', 'Do you want to create a word similarity ranking?').position(20, 5);
}

function findSimWords() {
    // Clear whatever the previous search displayed.
    notfound.html('');
    description.html('');
    placeHolder.forEach(function (element){
        element.html('');
    });

    let testWord = input.value();
    arrayOfRanking = rankingOfWords(testWord, word2Id, id2Word, vocabSize, coOccurenceMatrix, rankingLimit);

    // rankingOfWords returns undefined when the word is not in the vocabulary.
    if (typeof arrayOfRanking == "undefined"){
        notfound.html("This word '" + testWord + "' is not in the list.").position(20, 100);
        return;
    }
    if (arrayOfRanking.length > 0) {
        description.html('It is the best ' + rankingLimit + ' words ranking similar to ' + testWord + '.\n');
        for (let [index, arr] of arrayOfRanking.entries()) {
            placeHolder[index].html((index + 1) + ' : ' + arr[0] + " : " + Math.floor(arr[1] * 100) + '%' + '\n');
        }
    }
}

I will show you the result in the video.


I will finish by explaining some details.

const descendingOrder = function(a, b) {
    if (a > b) return -1;
    if (a < b) return 1;
    return 0;
};

This function is a comparator that orders the elements of an array from largest to smallest. It is used together with the sort method.
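To see why a comparator is needed at all, here is a small sketch. Without one, JavaScript's `sort` compares elements as strings, which gives surprising results for numbers. The name `byDescending` and the sample values are assumptions for this illustration.

```javascript
// Sketch: how a comparator changes the result of Array.prototype.sort.
// A negative return value means "a goes before b", positive means "a goes after b".
const byDescending = (a, b) => {
    if (a > b) return -1;
    if (a < b) return 1;
    return 0;
};

const values = [5, 100, 20, 9];
const defaultSorted = [...values].sort();             // elements compared as strings
const descending = [...values].sort(byDescending);    // elements compared as numbers
// defaultSorted is [100, 20, 5, 9]
// descending is [100, 20, 9, 5]
```

The spread copies (`[...values]`) are there because `sort` reorders the array in place.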

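let ranked = similarityList.tolist().map((value, id) => [id, value]);
ranked.sort((a, b) => descendingOrder(a[1], b[1]));
for (let [id, value] of ranked) {
    if (id2w[id] == query) continue; // skip the query word itself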

Conceptually, the sort works like this: it compares the numbers one pair at a time, starting from the end of the array, and moves the larger number toward the front. It then repeats the same pass over everything except the first element, then over everything except the first two elements, then everything except the first three, and so on. Repeating this to the end of the array leaves the numbers in descending order. (A JavaScript engine may use a different algorithm internally, but the result is the same.)
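The pass-by-pass procedure described here resembles a bubble sort. The following is a minimal sketch of that procedure, written only to make the steps concrete; `bubbleSortDescending` is a hypothetical helper, and the built-in `sort` is not required to work this way.

```javascript
// Sketch: descending bubble sort that walks from the end of the array
// toward the front, as in the description. Illustrative only.
function bubbleSortDescending(input) {
    const arr = [...input]; // work on a copy, leave the input untouched
    for (let start = 0; start < arr.length - 1; start++) {
        // Walk from the last element down to `start`,
        // swapping so the larger of each pair moves toward the front.
        for (let j = arr.length - 1; j > start; j--) {
            if (arr[j] > arr[j - 1]) {
                [arr[j - 1], arr[j]] = [arr[j], arr[j - 1]];
            }
        }
        // Positions 0..start now hold the largest values, already in order,
        // so the next pass can skip them.
    }
    return arr;
}

const sorted = bubbleSortDescending([3, 1, 4, 1, 5]);
// sorted is [5, 4, 3, 1, 1]
```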

[Figures: step-by-step illustration of the descending sort]


The following is a description of the GUI portion.

description = createElement('h4', '').position(20, 100);
for (let i = 0; i < rankingLimit; i++){
    placeHolder.push(createElement('p', '').position(20, 150 + (i * 30)));
}

The createElement method creates the specified HTML element, and the position method places it at the given coordinates. Because the results need to be displayed and cleared many times, I create empty elements up front as placeholders.



When the [create] button is pressed, a function is called that builds the ranking of similar words and draws it on the screen.

function findSimWords() {
    placeHolder.forEach(function (element){
        element.html('');
    });


First, we clear the HTML elements that display the results.

    let testWord = input.value();
    arrayOfRanking = rankingOfWords(testWord, word2Id, id2Word, vocabSize, coOccurenceMatrix, rankingLimit);


It then takes the word from the input element created with createInput, passes it to the rankingOfWords function, and receives the output.

    if (typeof arrayOfRanking == "undefined"){
        notfound.html("This word '" + testWord + "' is not in the list.").position(20, 100);
        return;
    }


If there is no output, the word is not in the vocabulary (or an error occurred), so the function displays a message and returns.

    if (arrayOfRanking.length > 0) {
        description.html('It is the best ' + rankingLimit + ' words ranking similar to ' + testWord + '.\n');
        for (let [index, arr] of arrayOfRanking.entries()) {
            placeHolder[index].html((index + 1) + ' : ' + arr[0] + " : " + Math.floor(arr[1] * 100) + '%' + '\n');
        }
    }
}


If there is output, the entries method is used to loop over it with an index, and each [word, similarity] pair is written into one of the empty p elements created earlier.
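The text of each row can also be built by a small pure function, which makes the formatting easy to check without a browser. This is only a sketch; `formatRankingRow` is a hypothetical helper that reproduces the string concatenation used in the display loop.

```javascript
// Sketch: build the text for one ranking row.
// `formatRankingRow` is an illustrative helper, not part of the article's files.
function formatRankingRow(index, word, similarity) {
    // index is zero-based, so add 1 for a human-readable rank.
    // Math.floor drops the fractional part of the percentage.
    return (index + 1) + ' : ' + word + ' : ' + Math.floor(similarity * 100) + '%';
}

const firstRow = formatRankingRow(0, 'mice', 0.7071);
// firstRow is '1 : mice : 70%'
```

Math.round could be used instead of Math.floor if rounding to the nearest percent is preferred.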


Next time, I'd like to do something interesting with numjs and p5.js again. But I might go back to Python.

See You Next Page!