SkillUp AI Camp (スキルアップAIキャンプ), free to attend
https://www.skillupai.com/skillupai-camp/
At the session held on March 24, 2021,
「RaspberryPiで始めよう AIとIoT」 ("Getting Started with AI and IoT on the Raspberry Pi"),
I'll tinker with a Raspberry Pi again for the first time in a while.
With a camera and some AI, it looks like I can put together something pretty cool.
As a kid, the movie "STAR WARS" filled me with wonder, and Gunpla kits made my heart race. To win back those hours when time simply disappeared, this old dad is taking on robot building with a Raspberry Pi!
$ python -m gensim.scripts.make_wiki enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2 wiki_en_output 1000
2015-05-13 07:49:11,950 : INFO : running /usr/local/lib/python2.7/dist-packages/gensim/scripts/make_wiki.py enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2 wiki_en_output 1000
2015-05-13 07:49:13,406 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2015-05-13 08:00:07,043 : INFO : finished iterating over Wikipedia corpus of 4680 documents with 13500677 positions (total 6280 articles, 13507365 positions before pruning articles shorter than 50 words)
2015-05-13 08:00:07,044 : INFO : built Dictionary(260454 unique tokens: [u'biennials', u'tripolitan', u'unsupportable', u'refreshable', u'nunnery']...) from 4680 documents (total 13500677 corpus positions)
2015-05-13 08:00:11,267 : INFO : discarding 240659 tokens: [(u'ability', 817), (u'able', 1252), (u'about', 2936), (u'abstention', 7), (u'abstentionism', 2), (u'according', 2057), (u'account', 890), (u'accounts', 486), (u'achieved', 648), (u'across', 1240)]...
2015-05-13 08:00:11,268 : INFO : keeping 19795 tokens which were in no less than 20 and no more than 468 (=10.0%) documents
2015-05-13 08:00:12,348 : INFO : resulting dictionary: Dictionary(19795 unique tokens: [u'writings', u'homomorphism', u'hordes', u'yellow', u'gag']...)
2015-05-13 08:00:12,564 : INFO : storing corpus in Matrix Market format to wiki_en_output_bow.mm
2015-05-13 08:00:12,566 : INFO : saving sparse matrix to wiki_en_output_bow.mm
2015-05-13 08:00:14,260 : INFO : PROGRESS: saving document #0
2015-05-13 08:12:06,707 : INFO : finished iterating over Wikipedia corpus of 4680 documents with 13500677 positions (total 6280 articles, 13507365 positions before pruning articles shorter than 50 words)
2015-05-13 08:12:06,709 : INFO : saved 4680x19795 matrix, density=1.896% (1756120/92640600)
2015-05-13 08:12:06,711 : INFO : saving MmCorpus index to wiki_en_output_bow.mm.index
2015-05-13 08:12:06,726 : INFO : saving dictionary mapping to wiki_en_output_wordids.txt.bz2
2015-05-13 08:12:11,686 : INFO : loaded corpus index from wiki_en_output_bow.mm.index
2015-05-13 08:12:11,687 : INFO : initializing corpus reader from wiki_en_output_bow.mm
2015-05-13 08:12:11,688 : INFO : accepted corpus with 4680 documents, 19795 features, 1756120 non-zero entries
2015-05-13 08:12:11,689 : INFO : collecting document frequencies
2015-05-13 08:12:11,753 : INFO : PROGRESS: processing document #0
2015-05-13 08:13:48,804 : INFO : calculating IDF weights for 4680 documents and 19794 features (1756120 matrix non-zeros)
2015-05-13 08:13:48,979 : INFO : storing corpus in Matrix Market format to wiki_en_output_tfidf.mm
2015-05-13 08:13:48,980 : INFO : saving sparse matrix to wiki_en_output_tfidf.mm
2015-05-13 08:13:49,063 : INFO : PROGRESS: saving document #0
2015-05-13 08:17:32,296 : INFO : saved 4680x19795 matrix, density=1.896% (1756120/92640600)
2015-05-13 08:17:32,297 : INFO : saving MmCorpus index to wiki_en_output_tfidf.mm.index
2015-05-13 08:17:32,309 : INFO : finished running make_wiki.py
$
$ ls -l
-rw-r--r-- 1 pi pi 11820881800 Apr  7 07:06 enwiki-latest-pages-articles.xml.bz2
-rw-r--r-- 1 pi pi    46529467 May 13 07:44 enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2
-rw-r--r-- 1 pi pi    21416947 May 13 08:12 wiki_en_output_bow.mm
-rw-r--r-- 1 pi pi       26176 May 13 08:12 wiki_en_output_bow.mm.index
-rw-r--r-- 1 pi pi    46458890 May 13 08:17 wiki_en_output_tfidf.mm
-rw-r--r-- 1 pi pi       27274 May 13 08:17 wiki_en_output_tfidf.mm.index
-rw-r--r-- 1 pi pi      131400 May 13 08:12 wiki_en_output_wordids.txt.bz2
$
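Once make_wiki finishes, the saved dictionary and TF-IDF corpus can be read back with gensim's standard loaders. A minimal sketch, assuming only the file names from the ls output above (the topic model at the end is just an illustrative next step, and num_topics=100 is a placeholder):

from gensim import corpora, models

# load the id -> word mapping and the TF-IDF corpus written by make_wiki
id2word = corpora.Dictionary.load_from_text('wiki_en_output_wordids.txt.bz2')
mm = corpora.MmCorpus('wiki_en_output_tfidf.mm')

# illustrative next step: train a topic model on the corpus
lda = models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=100)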
$ python -m gensim.scripts.make_wiki enwiki-latest-pages-articles.xml.bz2 wiki_en_output
$ python rel_post_01.py
Traceback (most recent call last):
  File "rel_post_01.py", line 20, in <module>
    import nltk.stem
ImportError: No module named nltk.stem
$
$ sudo pip install -U nltk
$ python rel_post_01.py
Traceback (most recent call last):
  File "rel_post_01.py", line 43, in <module>
    min_df=1, stop_words='english', charset_error='ignore')
TypeError: __init__() got an unexpected keyword argument 'charset_error'
$
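The likely cause: scikit-learn renamed charset_error to decode_error (and charset to encoding) around release 0.14, so the book's code no longer matches the newer API. A minimal sketch of the fix, assuming rel_post_01.py defines the stemming vectorizer the way the book does:

import nltk.stem
from sklearn.feature_extraction.text import TfidfVectorizer

english_stemmer = nltk.stem.SnowballStemmer('english')

class StemmedTfidfVectorizer(TfidfVectorizer):
    def build_analyzer(self):
        # stem every token produced by the standard analyzer
        analyzer = super(StemmedTfidfVectorizer, self).build_analyzer()
        return lambda doc: (english_stemmer.stem(w) for w in analyzer(doc))

# decode_error replaces the removed charset_error keyword
vectorizer = StemmedTfidfVectorizer(min_df=1, stop_words='english',
                                    decode_error='ignore')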
$ cd ../data/toy
$ cat *
This is a toy post about machine learning. Actually, it contains not much interesting stuff.
Imaging databases can get huge.
Most imaging databases safe images permanently.
Imaging databases store images.
Imaging databases store images. Imaging databases store images. Imaging databases store images.
$
$ more ch03-1.py
import os
import sys
import scipy as sp

DIR = r"../data/toy"
posts = [open(os.path.join(DIR, f)).read() for f in os.listdir(DIR)]

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(min_df=1)
print(vectorizer)
$
$ python ch03-1.py
Traceback (most recent call last):
  File "ch03-1.py", line 8, in <module>
    vectorizer = CountVectorizer(min_df=1.0)
TypeError: __init__() got an unexpected keyword argument 'min_df'
$
$ python
Python 2.7.3 (default, Mar 18 2014, 05:13:23)
[GCC 4.6.3] on linux2
>>> import sklearn
>>> sklearn.__version__
'0.11'
>>>
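So the preinstalled scikit-learn is 0.11, which apparently predates CountVectorizer's min_df parameter; the book's examples assume a newer release, hence the upgrade: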
$ sudo pip install -U scikit-learn
$ python
Python 2.7.3 (default, Mar 18 2014, 05:13:23)
[GCC 4.6.3] on linux2
>>> import sklearn
>>> sklearn.__version__
'0.16.1'
>>>
$ python ch03-1.py
CountVectorizer(analyzer=u'word', binary=False, decode_error=u'strict',
        dtype=<type 'numpy.int64'>, encoding=u'utf-8', input=u'content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), preprocessor=None, stop_words=None,
        strip_accents=None, token_pattern=u'(?u)\\b\\w\\w+\\b',
        tokenizer=None, vocabulary=None)
$
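Note that the repr now lists encoding and decode_error among the parameters, the renamed successors of the charset/charset_error keywords that triggered the earlier TypeError.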
$ more ch03-1.py
import os
import sys
import scipy as sp

DIR = r"../data/toy"
posts = [open(os.path.join(DIR, f)).read() for f in os.listdir(DIR)]

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(min_df=1)
x_train = vectorizer.fit_transform(posts)
num_samples, num_features = x_train.shape
print("#samples: %d, #features: %d" % (num_samples, num_features))
print("--------------------------------------------\n")
print(vectorizer.get_feature_names())
print("--------------------------------------------\n")

new_post = "imaging databases"
new_post_vec = vectorizer.transform([new_post])
print(new_post)
print(new_post_vec)
print(new_post_vec.toarray())
print("--------------------------------------------\n")

# norm
def dist_raw(v1, v2):
    delta = v1 - v2
    return sp.linalg.norm(delta.toarray())

#
best_dist = sys.maxsize
best_i = None
for i in range(0, num_samples):
    post = posts[i]
    if post == new_post:
        continue
    post_vec = x_train.getrow(i)
    d = dist_raw(post_vec, new_post_vec)
    print("=== Post %i with dist=%.2f: %s" % (i, d, post))
    print(x_train.getrow(i).toarray())
    if d < best_dist:
        best_dist = d
        best_i = i
print("*** Best post is %i with dist=%.2f" % (best_i, best_dist))
$
$ python ch03-1.py
#samples: 5, #features: 24
--------------------------------------------

[u'about', u'actually', u'can', u'contains', u'databases', u'get', u'huge', u'images', u'imaging', u'interesting', u'is', u'it', u'learning', u'machine', u'most', u'much', u'not', u'permanently', u'post', u'safe', u'store', u'stuff', u'this', u'toy']
--------------------------------------------

imaging databases
  (0, 4)  1
  (0, 8)  1
[[0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
--------------------------------------------

=== Post 0 with dist=4.00: This is a toy post about machine learning. Actually, it contains not much interesting stuff.
[[1 1 0 1 0 0 0 0 0 1 1 1 1 1 0 1 1 0 1 0 0 1 1 1]]
=== Post 1 with dist=1.73: Imaging databases can get huge.
[[0 0 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
=== Post 2 with dist=2.00: Most imaging databases safe images permanently.
[[0 0 0 0 1 0 0 1 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0]]
=== Post 3 with dist=5.10: Imaging databases store images. Imaging databases store images. Imaging databases store images.
[[0 0 0 0 3 0 0 3 3 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0]]
=== Post 4 with dist=1.41: Imaging databases store images.
[[0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0]]
*** Best post is 4 with dist=1.41
$
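Post 3 is just Post 4 repeated three times, yet it scores the worst raw distance (5.10 vs. 1.41); that is the motivation for the book's next refinement, which normalizes the count vectors to unit length before measuring. A sketch of that normalized distance, assuming the same sp and sparse-row types as ch03-1.py:

import scipy as sp

def dist_norm(v1, v2):
    # scale both vectors to unit length so repeated text
    # no longer inflates the distance
    v1_normalized = v1 / sp.linalg.norm(v1.toarray())
    v2_normalized = v2 / sp.linalg.norm(v2.toarray())
    delta = v1_normalized - v2_normalized
    return sp.linalg.norm(delta.toarray())

With this metric Post 3 and Post 4 score identically, since their vectors differ only by a scalar factor.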
$ python seeds_threshold.py
Traceback (most recent call last):
  File "seeds_threshold.py", line 12, in <module>
    features, labels = load_dataset('seeds')
  File "/home/pi/bmlswp/ch02/load.py", line 27, in load_dataset
    data.append([float(tk) for tk in tokens[:-1]])
ValueError: could not convert string to float:
$
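The empty string at the end of the ValueError hints that the seeds data file contains consecutive tab characters (the UCI copy is known to be inconsistently delimited), so splitting on '\t' yields empty tokens that float() rejects. A hedged sketch of a more tolerant loader, assuming the same tokens[:-1]/tokens[-1] layout as load.py (the file path is a guess):

data = []
labels = []
with open('../data/seeds.tsv') as ifile:
    for line in ifile:
        tokens = line.split()    # any-whitespace split drops empty fields
        if not tokens:           # skip blank lines entirely
            continue
        data.append([float(tk) for tk in tokens[:-1]])
        labels.append(tokens[-1])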