Diving into TensorFlow

Do androids dream of electric sheep?

Since February this year I have been learning more and more about neural networks and their recent reincarnation known as deep learning. This approach became famous outside of its usual field not only through its association with Google’s work on AI, but also through a number of articles comparing the results of deep learning to the hallucinations experienced by people in, let’s call it, a chemically excited state of mind.

This almost gives the impression that deep learning is just a good party trick. It is definitely more than that. The concept is to replicate the workings of the human brain and create self-learning machines. In their current incarnation, neural networks are extremely efficient at finding patterns, though as with every ML approach a lot of what they find can be noise (over-fitting), and a lot of effort goes into avoiding this. If you are interested in learning more, check online - there is a lot of information. A few things I could recommend are:

And if you want to try out neural networks online:

let’s go TensorFlow

There are a number of really good frameworks supporting this. I have set my sights on TensorFlow for three reasons:

  • it has the support of Google and an active community
  • it can be operated using Python
  • Google Cloud supports it natively

I have just started a small side project looking at utilising deep learning.

my own copy of TensorFlow

Until recently the easiest way to install TensorFlow was to use a Docker image. This great tool is much better served by Linux than by Windows. The main problem is the need to use VirtualBox as part of the deployment - you can use Linux dockers, but you can’t compile anything for your hardware (you deploy them in a virtual Linux machine), nor can you fully explore the command-line capabilities of this great tool.

An alternative is to install TensorFlow natively. Again, great for Linux, not so good for Windows. While the logical conclusion is to switch to Linux, I still need to use some of my high-performance university machines that have to run Windows.

Over a week ago, Google released v 0.12, which [works natively on the Windows platform](https://developers.googleblog.com/2016/11/tensorflow-0-12-adds-support-for-windows.html). The section below explains how to install it on a PC. Note that we are looking at installing the CPU-based version; it is much slower than the GPU one and generally not recommended for anything but the basics. I tried to install the GPU version, but despite having an NVidia card which should be supported by CUDA I just managed to crash my hardware. I would be interested to hear from people who managed to install it.

  1. Install Anaconda; it’s a great data-science-oriented distribution, so you will find it useful anyway
  2. Create a new environment
conda create -n tensorflow python=3.5
conda info --env
activate tensorflow
  3. Install TensorFlow into it
pip install tensorflow
conda install ipython
conda install scipy
  4. Test your installation
  • run ipython - make sure it identifies the correct Python 3.5.x and does not flag
    missing libraries - install any missing libraries using conda install xxx
    import tensorflow as tf
    hello = tf.constant('Hello, TensorFlow!')
    sess = tf.Session()
    print(sess.run(hello))
    a = tf.constant(10)
    b = tf.constant(32)
    print(sess.run(a + b))
  5. Run your first neural network
  • get env path using conda info --env
  • run example
    cd [YOUR_ENV_PATH]\Lib\site-packages\tensorflow\models\image\mnist\
    python convolutional.py
  • This will take a long time on CPU - now you know why people run NN on GPU-rich servers.
  • if you read through my introduction, this is the equivalent of this ConvnetJS example
  6. If the above steps executed successfully, you have a working TensorFlow - well done!

How can I git python notebooks?

I suspect I am not the only one struggling with using git for version control of Jupyter Notebooks. While they are intended to make it easy to publish code online, they are notorious for mixing code with its output, making it difficult to properly identify changes and bloating repositories with unnecessary output.

There are quite a few solutions discussed on the internet, mostly varying between using a filter or a pre-commit git hook. Both operate in roughly the same manner - output is stripped before the code is committed, so the history focuses on changes in the code only.
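
For illustration, stripping output “by hand” before a commit can be done with nbconvert - a minimal sketch, assuming a notebook called analysis.ipynb (the name is just an example) and a reasonably recent nbconvert:

jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to notebook --inplace analysis.ipynb
git add analysis.ipynb
git commit -m "analysis notebook, outputs stripped"

This works, but it relies on you remembering to do it every time - which is exactly what the filter and hook approaches automate.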

So how should I do it?

So, what is the easiest way to implement those solutions? Use Florian Rathgeber’s pip package. To install it, type:

pip install nbstripout
pip install nbconvert

Then, to initialise a new repo with ipython notebooks type:

git init
nbstripout --install

What is happening in the second step? We are defining a Git filter that will be applied every time we commit; it takes care of the output cells before the code is committed. The exact filter definition can be seen by running cat .git\config.
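
For reference, the entry that nbstripout adds to the repo config looks roughly like this (the exact paths depend on your Python installation, so treat this as an illustration rather than the exact output):

[filter "nbstripout"]
    clean = nbstripout
    smudge = cat
[diff "ipynb"]
    textconv = nbstripout -t

It also registers the filter for notebook files via .gitattributes (*.ipynb filter=nbstripout), so only notebooks are affected.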

From now on, all you need to do is commit as usual, and the filter will work its magic behind the scenes.

keeping certain outputs

What happens if you want to keep a specific output? Use the “Edit Metadata” button on the desired cell to enter:

{
  "keep_output": true
}

or
{ "init_cell": true }

Other approaches

From reading other blogs, it seems that git hooks might be another approach. I will discuss it once I have tried it.

What is the difference between GPS and GNSS then?

In my previous post, I used both the term GPS (describing the American navigation system) and GNSS. The latter is a popular umbrella term for all navigation systems, including the American, Russian, Chinese, European, Japanese and Indian ones.
I have recently read a great GNSS book by Len Jacobson, in which he points out that this usage is not consistent with the original definitions.

The GNSS term was originally defined by the International Civil Aviation Organization (ICAO) as “A worldwide position and time determination system that includes one or more satellite constellations, aircraft receivers and system integrity monitoring, augmented as necessary to support the required navigation performance for the intended operation”.

This includes not only the augmentation constellations (EGNOS, WAAS and MSAS) but also the ground-based safety-of-life infrastructure - a much broader definition than the one we tend to use. Interestingly, both the 2007 and 2012 ICAO documents list only GPS and GLONASS as part of GNSS. This is mostly due to the fact that only L1 and G1 are protected frequencies worldwide. Galileo E1 should soon be accredited and protected as well.

Does it mean that we should properly say: GNSS, excluding ground components and the Chinese BeiDou and Indian IRNSS? No, not really. It is a mouthful and you would most likely not be understood. I think it is more an interesting case of the GNSS term being hijacked by the professional magazines.

BTW, as you might be aware, I will be presenting at ION GNSS+ 2016 in Portland, Oregon next week. If anybody is interested in learning more, get in touch.

In the meantime I am flying and enjoying my new office:)


Care to join me at ION GNSS+ 2016?

ION GNSS+ 2016

This September I will be attending ION GNSS+ 2016, taking place September 12-16 at the Portland Convention Center in Portland, Oregon.

ION GNSS+ 2016 is the world’s largest technical conference focusing on different means of navigation. This is where new GNSS and navigation technologies are showcased and discussed for the first time. This year is special, as ION will be celebrating 30 years of ION conferences (the name has changed from ION GPS through ION GNSS to ION GNSS+, reflecting the growing importance of other constellations and of navigation sensors such as IMS or Locata1). This will involve a 1980s-style celebration, featuring the decade’s best food, games, and music.


My paper

I will be presenting in Session F3: Marine Applications on Thursday, September 15, discussing the University of Nottingham and the Royal Norwegian Naval Academy’s joint research into jamming of the GPS signal.

Why would this be important?

Ship traffic has increased fourfold over the last 25 years and now constitutes 90% of global trade by volume. Analyses of accidents and serious incidents worldwide indicate that half of them are due to navigational error. This ratio increased significantly in the period 1995-2002, concurrent with the introduction of easy-to-use electronic chart displays (ECDIS) and GPS navigation. Recent reports have also identified a lack of knowledge of other systems across crews.
All of this suggests navigators’ over-reliance on GPS.

GPS, due to its weak signal strength, can be reasonably easily jammed using so-called privacy devices - small, low-powered GPS jammers, mostly under $50, used to prevent tracking of goods or vehicles. These are illegal in the USA and Europe, but enforcement remains tricky. Our previous research showed that a typical maritime GPS receiver is easy to jam and may produce visibly erroneous positional information up to 600 meters from a standard GPS L1 jammer, with complete loss of signal within a 200-meter radius.
This creates an opportunity for a cyber-attack that is very difficult to detect and prevent, and which can lead to serious maritime incidents, especially in poor visibility.


Interested in learning more?

I will be presenting my recent results on Thursday, September 15. If you would like to join me and the others in Portland, register here.

  1. Locata is a novel positioning technology from Australia based on the concept of pseudolites, that is, static, earthbound transmitters utilising the GPS signal structure and sometimes even its frequency. What is really interesting about this technology is its very precise network synchronisation, using a technology called TimeLoc, and its use of the WiFi frequency band. I think the most impressive use of the technology is its deployment on the White Sands Missile Range (WSMR) in New Mexico, USA. If you want to learn more, my PhD thesis might be of use.

Don't use pandas for matrix operations

Twilight of Matlab

I am used to doing my matrix calculations in Matlab - it is a great tool for the job. Yet, over the last few years, I have migrated to Python and R. There are many reasons behind the shift, with active discussion online. For a starter, I recommend reading this Matlab forum thread.

In a nutshell, Matlab is amazing for its matrix operations but has weaknesses in other areas. It has fallen behind Python and R in the interactive bits (especially notebooks), and with every new edition it seems to go the Microsoft Office way - simplified for beginner users, made annoying for more experienced ones. It does have advantages: it is by far the easiest to learn, and Matlab Simulink doesn’t have any replacement yet.

For a more in-depth comparison, I also suggest reading this post.

Machine learning in python

Andrew Ng’s Machine Learning course is based on Matlab. I decided to try to do it in both Matlab and Python. I was not the first - Lyn did it last year.

Yet as I progressed I realised that there was a problem. During simple matrix multiplications, results that should be (m,1) became (m,m). To keep the story short - the culprit was pandas’ data frame and its interaction with numpy during matrix operations.

Let’s start with an example1

import pandas as pd
import numpy as np

data = pd.read_csv('ex1data1.txt', header = None)
# assign the first and second columns to the variables X and y, respectively
X = np.c_[np.ones(data.shape[0]), data.iloc[:,0]]
y = data.iloc[:,1]
theta = np.zeros((2, 1))   # model parameters, e.g. initialised to zeros

v = X.dot(theta) - y
v

This will produce an (m,m) matrix (full of NaNs) instead of (m,1). The problem has to do with pandas’ data frame indexes, as explained on stackoverflow.
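
A quick way to see what is going on is to print the shapes involved - a minimal check along these lines (names match the snippet above, with m written symbolically in the comments):

r = X.dot(theta)         # plain numpy array of shape (m, 1)
print(r.shape, y.shape)  # (m, 1) vs (m,) - y is still a pandas Series
print((r - y).shape)     # (m, m) instead of the expected (m, 1)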

What would be the solutions?

keep it simple

Matrix operations are done using numpy, so it is cleaner to reduce the pandas data frame to plain numpy arrays up front and avoid all the problems later on. Let me demonstrate this:

data = pd.read_csv('ex1data1.txt', header = None)
# assign the first and second columns to the variables X and y, respectively
X = np.c_[np.ones(data.shape[0]), data.iloc[:,0]]
y = np.c_[data.iloc[:,1]]    # np.c_ turns the Series into an (m, 1) numpy array
theta = np.zeros((2, 1))     # model parameters, as before

X.dot(theta) - y  # residuals

This will produce the expected (m,1) result.

Takeaway

The lesson for today:
Use the simplest possible variable types for your functions. Do not depend on auto-casting; there might be hidden pitfalls2 that you are not aware of. Introduce forced casting in your functions or classes, just in case.
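
A minimal sketch of what I mean by forced casting (the residuals function here is hypothetical - the point is the np.asarray calls at the top):

import numpy as np

def residuals(X, theta, y):
    # force everything to plain numpy arrays, whatever the caller passed in
    X = np.asarray(X)
    theta = np.asarray(theta).reshape(-1, 1)
    y = np.asarray(y).reshape(-1, 1)
    return X.dot(theta) - y   # always an (m, 1) array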

  1. If you want to run the example yourself, a copy of ‘ex1data1.txt’ can be found in my repo

  2. For more pandas gotchas check here