Beigesoft™ Math projects overview.

Keywords: Python for science, NumPy, SciPy, Support vector machines.

This project never duplicates information from existing top-level sources (the ones that search engines give you). It supplements existing projects (and references them), explains things in more depth, and introduces new ones.

English as the main language of science.

English has long been the international language of science. This is so even though many English words have several meanings, often unrelated and sometimes even opposite, and a scientific report is rather amusingly called a "paper". Russian medicine has borrowed many terms from English, for example сенсибилизация (sensitization), локализованный (localized), синус лифтинг (sinus lifting)... Scientists and engineers translate their work from their native language into English and publish it. The first article, "Support vector machines", is devoted to work authored by Vladimir Vapnik. There appear to be no substantial Russian-language sources on this work. So there is no point in writing the articles in Russian, especially since all the references point to English-language sources.

Why Python (specifically NumPy, SciPy, etc.) seems to be the best math and science tool.

Many scientists and engineers chose Java because programs could be published as web applets. Java is faster and definitely more rigorous than Python, and it also allows invoking C/C++ libraries via JNI. But most scientists and engineers have chosen Python. The reason is probably the ability to run commands in its shell. All scientific tools (MATLAB, Octave, Scilab, etc.) have an interactive shell. But the truth is that many core parts of the scientific Python libraries are written in C (e.g. see the percentage of C code in NumPy and SciPy on GitHub). So, if you want to contribute, you should use C for code that requires a lot of computation, and you should use a good abstraction to create and use different implementations, from a generic one to architecture-dependent ones (e.g. via SSE, GPU, etc.). Besides, you should write Python in a more type-explicit way, e.g. denote floats with a dot, i.e. use 1. instead of 1 in math expressions. You should also be aware of the Python quirks that make life harder, for example:

#Python2:
>>> 3 / 2
1
>>> 3. / 2.
1.5
>>> 3 // 2
1
>>> 3.0 // 2.0
1.0
#Python3:
>>> 3 / 2
1.5
>>> 3. / 2.
1.5
>>> 3 // 2
1
>>> 3.0 // 2.0
1.0
In this example, in contrast to Python 2, Python 3 behaves in a surprising way: it evaluates 3 / 2 as floating-point (true) division even though both operands are integers, so you must use // explicitly for integer division.
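The difference between the two operators can also be checked in a script rather than in the shell. The following is a minimal sketch for Python 3 only; the helper name halves is made up for illustration and is not part of the project.

```python
# Python 3 only: "/" is true division, "//" is floor division.

def halves(n):
    """Return (true quotient, floor quotient) of n divided by 2."""
    return n / 2, n // 2

true_q, floor_q = halves(3)
print(type(true_q).__name__, true_q)    # float 1.5
print(type(floor_q).__name__, floor_q)  # int 1

# Floor division rounds toward negative infinity, not toward zero.
print(-3 // 2)      # -2
print(int(-3 / 2))  # -1 (int() truncates toward zero)
```

Note that // floors the result, so for negative operands it differs from truncating the true quotient.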

If you want to (actually, you must) make a test suite, you will probably run into the note "test.support is not a public module. It is documented here to help Python developers write tests...". In Java, JUnit does everything you need. In Python it is not so clear which testing library to choose. So ordinary Bash (Git Bash is available on MS Windows) seems to be a reliable testing and installation tool (similar to Make in C).
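For comparison, the standard library also ships the public unittest module, which can be driven from a Bash script. The sketch below is only an illustration of that option, not the project's test setup; the function mean and the class MeanTest are hypothetical names.

```python
import unittest


def mean(values):
    """Hypothetical function under test: arithmetic mean of a sequence."""
    return sum(values) / float(len(values))


class MeanTest(unittest.TestCase):
    def test_integers_give_float_mean(self):
        # float() in mean() keeps the result 1.5 on Python 2 as well.
        self.assertEqual(mean([1, 2]), 1.5)

    def test_empty_input_raises(self):
        # Dividing by float(0) raises ZeroDivisionError.
        with self.assertRaises(ZeroDivisionError):
            mean([])
```

Saved as, say, test_mean.py (an assumed file name), the suite can be started from Bash with `python -m unittest test_mean`, and the exit status tells the script whether the tests passed.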

Python performs no static type checking at all. For example, if you create an array variable and then assign a scalar to it, Python never complains. In Java this is an error. In C it is a compiler error or a warning about assigning the wrong pointer type. In addition, in Java SpotBugs and other tools detect many errors automatically. But pychecker and pylint seem to be useless for this, e.g.:

import numpy as np
X = np.array([[.5, 1.2], [1., 1.], [3., 3.], [4., 3.3]])
print(X)
X = 1
print(X)
Python runs this program without any complaint. Neither pychecker nor pylint reports the wrong type assignment. The typing module, new in Python 3.5 (https://docs.python.org/3/library/typing.html), is designed for hinting only, i.e. there are no compile-time errors, and pylint3 is silent about a wrong type assignment (but Jupyter Notebook offers auto-completion on TAB even without such "typing").
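That type hints are not enforced at run time can be seen in a minimal sketch like the one below. It uses plain lists instead of NumPy arrays to stay self-contained; the names Matrix, row_count and checked_row_count are assumptions made for illustration. An external static checker such as mypy (not mentioned in the text above) is meant to report such mismatches, but the interpreter itself does not.

```python
from typing import List

Matrix = List[List[float]]  # hypothetical alias: a list of rows


def row_count(x: Matrix) -> int:
    """Annotated to accept a list of rows; the hint is not enforced."""
    return len(x)


def checked_row_count(x: Matrix) -> int:
    """Same function with an explicit, hand-written run-time check."""
    if not isinstance(x, list):
        raise TypeError("expected a list of rows, got %s" % type(x).__name__)
    return len(x)


print(row_count([[.5, 1.2], [1., 1.]]))  # 2
print(row_count("ab"))                   # 2: a str slips through silently

try:
    checked_row_count(1)
except TypeError as err:
    print(err)  # expected a list of rows, got int
```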

But for mathematicians who write small, final programs (not libraries), those disadvantages do not matter. What they do appreciate is a very good Python feature, Jupyter Notebook, i.e. the ability to publish programs on the web. In contrast to Java applets, this is a server-side technology, which automatically means:

Understanding Support Vector Machines.

This project is intended to understand, explain, and try out the SVM method step by step. Read it to know:

Detecting spam in unlabeled SMS data.

This is about data mining (extracting information) from data without a classification model (one trained on labeled training data). For example, take 5 messages:

"XXXMobileMovieClub: To use your credit, click the WAP link in the next txt message or click here>> http://wap. xxxmobilemovieclub.com?n=QJKGIGHJJGCBL"
Oh k...i'm watching here:)
Eh u remember how 2 spell his name... Yes i did. He v naughty make until i v wet.
Fine if that's the way u feel. That's the way its gota b
"England v Macedonia - dont miss the goals/team news. Txt ur national team to 87077 eg ENGLAND to 87077 Try:WALES, SCOTLAND 4txt/£1.20 POBOXox36504W45WQ 16+"
You can see for yourself that the first and the last messages look like spam. But can math do this? An ordinary expert system can probably do this job very well. But it is said that math can retrieve information from source data that you would notice neither at first glance nor after a week of thorough study. So, this is the first and simplest data set to start with.
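The remark about an ordinary expert system can be illustrated with a minimal rule-based sketch. The three rules, their regular expressions, the threshold of two matches, and the names RULES, score and looks_like_spam are all assumptions made for illustration; none of them comes from the project itself, and the rules are written by hand rather than learned from the data.

```python
import re

MESSAGES = [
    "XXXMobileMovieClub: To use your credit, click the WAP link in the next "
    "txt message or click here>> http://wap. xxxmobilemovieclub.com?n=QJKGIGHJJGCBL",
    "Oh k...i'm watching here:)",
    "Eh u remember how 2 spell his name... Yes i did. He v naughty make until i v wet.",
    "Fine if that's the way u feel. That's the way its gota b",
    "England v Macedonia - dont miss the goals/team news. Txt ur national team "
    "to 87077 eg ENGLAND to 87077 Try:WALES, SCOTLAND 4txt/\u00a31.20 "
    "POBOXox36504W45WQ 16+",
]

# Hand-written rules (assumptions for this sketch, not learned from data).
RULES = {
    "link": re.compile(r"https?://|www\.", re.IGNORECASE),
    "short_code": re.compile(r"\b\d{5}\b"),  # a standalone 5-digit number
    "call_to_action": re.compile(r"\b(?:txt|click|call)\b", re.IGNORECASE),
}


def score(message):
    """Count how many rules match the message."""
    return sum(1 for rule in RULES.values() if rule.search(message))


def looks_like_spam(message, threshold=2):
    """Flag a message when at least `threshold` rules match (arbitrary cutoff)."""
    return score(message) >= threshold


for number, message in enumerate(MESSAGES, start=1):
    print(number, score(message), looks_like_spam(message))
```

On these five messages the sketch flags only the first and the last one, which agrees with the judgment made by eye; it says nothing about how such rules would behave on a larger data set.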

* Plots (graphs) are made with Matplotlib, and the Python source code is included.

Source code of this project: https://github.com/demidenko05/beige-math