# Python's Summer of Code 2016 Updates

## May 29, 2016

### Karan_Saxena (italian mars society)

#### Community Bonding Period Updates

So today marks the day we begin our coding period. (Well, I must admit I was a bit late in publishing this blog post.)

I will describe the setup I have put together over the past month.

To recap, my work this summer will be with the Italian Mars Society (IMS), on a project called ERAS - the European MaRs Analog Station. As the name suggests, the aim of ERAS is to build an analog base station that will be used to train astronauts to visit the Red Planet.
The primary stage of this project is to build a virtual version of the station, called V-ERAS or Virtual-ERAS.

## Virtual ERAS station “on Mars”

V-ERAS uses multiple 'Microsoft Kinect for Xbox One' devices to recognize and track users' bodies and movements. Data from the Kinect is also placed on a Tango bus, so that it is available to any other V-ERAS module.
The Kinect feed is used to animate an avatar in the Blender Game Engine. The latter is also responsible for drawing the whole virtual Martian environment, managing interactions among multiple users and allowing them to interface with tools, objects and all-terrain vehicles (ATVs).
V-ERAS also uses Motivity - a static omnidirectional treadmill on which users can walk through the emulated Martian environment.

*Microsoft Kinect for Xbox One*

IMS-ERAS is a project under the Python Software Foundation (PSF) for GSoC'16. The description of all the selected projects for IMS, under PSF for GSoC'16, can be found on this link. The title of my project is "Improving the step recognition algorithm used in ERAS virtual environment simulation".
In brief, the aims of my project are to:
1. improve the feet recognition in the body tracker module written by Vito Gentile during GSoC'15;
2. port the existing code to Kinect v2, using PyKinect2.

## My RGB and depth data from Kinect Sensor

My first step was to get my hands on a Kinect and make sure it works with my laptop.
My laptop configuration is an Intel i5 at 3.3GHz with 12GB RAM and an Nvidia NVS 5400M. One important point to note here is that the Kinect v2 strictly requires USB 3.0.
Initially, I began with Kinect Studio v2.0 and SDK Browser v2.0 (Kinect for Windows) to test the data being received and sent by the Kinect sensor. Kinect v2 sends an overwhelming amount of data; at runtime, it was sending 5GBps of data.

For the project, I will be going forward with PyKinect2, based on the Microsoft Kinect SDK 2.0 (compliant with Kinect for Xbox One). More details about PyKinect2 can be found on GitHub.

During these days, I also set up Tango, both on Windows and Ubuntu. The instructions for this can be found in the ERAS documentation.

Time to start coding now :)

Onwards and Upwards!!

## Recap

I think I will be starting each week's post with a "Recap" section, in which I will explain the motivation and aim behind each week's coding. My project is about making it possible to develop coala bears in literally any programming language. This is useful because an open source community's survival depends on its contributors: the easier you make it to contribute, the higher the chances of an everlasting and successful open source project.

The way I chose to implement this functionality isn't that complicated: basically we have a normal Python bear (the wrapper bear) whose sole purpose (in this world) is to start a process of a given executable (binary, script, etc.), send it one big line containing a JSON string and then wait for one big line with a JSON string in return. All the relevant information (file name, lines and optionally settings) is passed through that JSON, and Result1 and Diff2 objects are passed back the same way. Yes, you might legitimately argue that this adds some overhead, which is totally true, but this is the trade-off we are willing to pay for such a feature.
A modular and extendable way to build such a feature would be to have a class that contains all the basic functionality of these so-called wrapper bears; then, every time we want to create a new one, we just inherit from that class and make small changes (say, the executable file name).
coala already has a class that does something similar: it integrates all kinds of bears that use linters3. So I had a starting point.
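The process-plus-JSON round trip described above can be sketched in a few lines. (The field names and the function name here are my assumptions for illustration, not coala's final protocol.)

```python
import json
import subprocess

def run_external_bear(cmd, filename, file_lines, settings=None):
    """Sketch of a wrapper bear's core: spawn the executable, write one
    JSON line to its stdin, and read one JSON line of results back.
    Field names are illustrative assumptions."""
    payload = json.dumps({"filename": filename,
                          "file": file_lines,
                          "settings": settings or {}})
    proc = subprocess.run(cmd, input=payload + "\n",
                          capture_output=True, text=True)
    # The external process is expected to answer with one JSON line
    # describing the results (Result/Diff data).
    return json.loads(proc.stdout)
```

Here `cmd` is the argv list of the external executable; a real implementation would then turn the decoded JSON into coala Result and Diff objects.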

Back to the recap: my goal for this week was to start that "class", which I named ExternalBearWrap (with the help of the community :D). I chose to implement it similarly to the already existing Linter class, which makes use of Python's decorators.

1. Object that creates the results for coala to display

2. Object that creates the results that also offer a suggested fix

3. Static analysis tools, usually built for a specific language. coala offers a way to integrate those in bears.

## Python decorators

I spent a substantial amount of time this week learning about decorators and their use cases. They make use of a functional programming concept called function closure. I will not be detailing function closures and decorators here; instead I will point you to this link if you want to learn for yourself.

Instead of having a class from which the other wrappers inherit, we provide a decorator for the other wrappers to use. The decorator approach has some advantages, but the most important one is that it minimizes the code written in the wrappers. We want such a feature because developers who choose to write bears in other languages obviously want to write close to zero Python. These wrappers will be auto-generated later on in the project.
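A minimal sketch of the decorator idea (purely illustrative; the real external_bear_wrap in coala does much more than attach attributes):

```python
def external_bear_wrap(executable, **options):
    """Toy version of the decorator approach: instead of inheriting from
    a base class, a wrapper 'bear' just declares its executable."""
    def decorator(cls):
        cls.EXECUTABLE = executable
        cls.OPTIONS = options
        return cls
    return decorator

@external_bear_wrap("some_checker", settings={"max_line_length": 80})
class SomeCheckerBear:
    pass
```

The wrapper class body stays nearly empty, which is the whole point: developers writing bears in other languages write close to zero Python.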

## Wrap Up

To sum up, I managed to write the external_bear_wrap decorator. So far the wrapper bears can send input to and receive output from a given executable passed as a decorator argument. Next week the functionality should be completed by sending and receiving the appropriate JSON and parsing it into coala Result and Diff objects.

### liscju (Mercurial)

#### Coding Period - I Week

In the first week I started to work on the solution (really amazing, isn't it?) :P
The first thing was setting up a weekly meeting time with my mentor to discuss the goals to accomplish each week.

The second thing was discussing some design issues about the solution - how to fit the feature into the working code. The decision for now is to reuse the proto.statlfile implementation for this: before, statlfile was a remote store procedure to check whether a given file is accessible in the remote store. Now it will also return the location of the file.

And the most important thing - I started working on the simplest possible implementation of the feature: client and server on the same machine, with the location for storing large files also on the local machine. One could ask what the point is of working on such a simple subset of the solution, but in my view this is a really important phase. In this phase you must design tests for the solution and make design decisions about how things fit together; working on a subset of the solution and getting feedback as quickly as possible gives space to discuss it and learn from it. I'm in the middle of making changes, so implementation details are constantly changing, and Mercurial's features for rewriting history are really helpful here. If you haven't encountered the evolve extension, I really suggest checking it out:
https://www.mercurial-scm.org/wiki/EvolveExtension

Apart from the project, my patch "send stat calls only for files that are not available locally", which I described in a previous post, was merged into the main repository; you can have a look here:
https://selenic.com/hg/rev/fd288d118074

My patches about migrating largefiles to Python 3 haven't been merged so far, but while working on the solution I found a bug in import-checker.py (an internal tool for checking that imports in source files conform to the style guide) - it didn't properly recognize relative imports of the form
from ..F import G
I fixed this bug and sent the patch to the mailing list, and it's already merged; you can check it here:
https://selenic.com/hg/rev/660d8d4ec7aa

I also got some feedback about the excessive password prompt I described in a previous post and on the mailing list:
https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-May/084490.html
As soon as I accomplish all the goals of this week and have some free time, I'm going to send a new proposal for the solution.

### SanketDG (coala)

#### Summer of Code Week 1

So as I said in my previous blog post, I am working with coala for language independent documentation extraction for this year’s Google Summer of Code.

It has been one week since the coding period has started, and there has been some work done! I would like to explain some stuff before we get started on the real work.

So, my project deals with language independent documentation extraction. It turns out documentation isn't that independent of the language: most programming languages don't have an official documentation specification. But it could be said that documentation is independent of the documentation standard (hereby referred to as docstyle) it uses.

I have to extract parts/metadata from the documentation like descriptions, parameters and their descriptions, return descriptions and perform various analyzing routines on this parsed metadata.

Most of my work is with the DocumentationComment class, where I have to implement routines for each language/docstyle. I started out with Python first for two reasons:

• It's my favourite programming language (Duh!)
• coala is written in Python! (Duh again!)

So Python has its own docstyle, known as “docstrings”, which are clearly defined in PEP 257. Note that PEP 257 is just a general style guide on how to write docstrings:

> The PEP contains conventions, not laws or syntax. It is not a specification.

Several documentation tools, such as Sphinx and Doxygen, support compiling these docstrings into API documentation. I aim to support both of them.

So, I have come up with the following signature for DocumentationComment:

DocumentationComment(documentation, language, docstyle, indent, marker, range)

Now let’s say doc is an instance of DocumentationComment. doc would have a function named parse_documentation() that would do the parsing and get the metadata. So if I have a function with a docstring:

And I load this into the DocumentationComment class and then apply the parsing:

Note: Not all parameters are required for instantiation.

Now printing repr(docdata) would print:

You may ask about the strange formatting. That is because it retains the exact formatting displayed in the docstring. This is important because, whatever analyzing routines I run, I should always be able to “assemble” back the original docstring.
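Since the original code snippets did not survive the formatting here, the following is a hypothetical illustration of the flow (the real DocumentationComment lives in coala and its parsing is far more complete; every name and behaviour below is a sketch, not the actual API):

```python
class DocumentationComment:
    """Toy stand-in for coala's DocumentationComment (sketch only)."""

    def __init__(self, documentation, language=None, docstyle=None,
                 indent=None, marker=None, range=None):
        self.documentation = documentation

    def parse_documentation(self):
        # Naive parse: collect ':param name: description' lines,
        # keeping the description text exactly as written.
        metadata = []
        for line in self.documentation.splitlines():
            line = line.strip()
            if line.startswith(":param"):
                _, name, desc = line.split(":", 2)
                metadata.append(("param",
                                 name.replace("param", "", 1).strip(),
                                 desc.strip()))
        return metadata
```

Usage would then look like `DocumentationComment(doc_text, language="python", docstyle="default").parse_documentation()`.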

That’s it! This was my milestone for week 1: to parse and extract metadata out of Python docstrings. I have already started developing a simple Bear, which I will talk about later this week.

PS: I would really like to thank my mentor Mischa Krüger for his thoughts on the API design and for doing reviews on my ugly code. :P

#### Week 1

By week 1 I finally got a part of the hardest thing regarding this project: a way of maintaining the requirements for the bears easily.

To help myself with that, I created a PackageRequirement class. Now each bear can have a REQUIREMENTS tuple made of instances of the PackageRequirement class.

To automatize working, I worked on two separate things:

• creating a “multiple“ method, which helps you instantiate multiple instances of the class without calling the same class over and over again
• creating subclasses, such as PythonRequirement and NPMRequirement, which have the manager already set to “pip“ and “npm“ respectively.

These classes receive 3 arguments: the manager, the package name and the version.

You can also omit the version, in which case the latest one will automatically be used.
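A sketch of how those pieces could fit together (the field names, the “multiple“ helper and the subclass are assumptions based on the description above, not coala's exact code):

```python
class PackageRequirement:
    """Sketch: a requirement is a (manager, package, version) triple;
    an empty version means 'latest'."""

    def __init__(self, manager, package, version=""):
        self.manager = manager
        self.package = package
        self.version = version

    @classmethod
    def multiple(cls, *specs):
        # Instantiate several requirements without repeating the class name.
        return tuple(cls(*spec) for spec in specs)

class PythonRequirement(PackageRequirement):
    """Subclass with the manager pre-set to 'pip'."""
    def __init__(self, package, version=""):
        super().__init__("pip", package, version)

# A bear's REQUIREMENTS tuple might then look like:
REQUIREMENTS = (PythonRequirement("autopep8", "1.2"),
                PythonRequirement("yapf"))
```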

On the other hand, I am working on a bearUploader tool. This will upload a MockBear (for which I chose cpplintbear, for the basic functionality it provides) to provide continuous integration. This is still work in progress, as the tool only creates a setup.py file for the bear right now, but it should be done next week.

So for the next week: More on the bearUploader tool!

### kaichogami (mne-python)

#### Decoding API

Hello!
It's been a week since the coding period started. A lot of discussion about the approach for refactoring the decoding module took place; however, every idea had some sort of shortcoming.
The refactoring is being done to comply with the scikit-learn pipeline. The pipeline chains various steps of transformers ending with a classifier, which is a requirement.
This week I worked on changing the EpochsVectorizer class. Earlier, this class changed the dimensionality of the Epochs data matrix to a two-dimensional matrix (Epochs is an mne data structure containing events of brain signals, sampled above the Nyquist frequency). After the change, EpochsVectorizer accepts an Epochs object and returns a 2D matrix containing the epochs information, reshaped from the original (n_epochs, n_channels, n_times), along with a vector of shape (n_samples,) holding the event labels. With all the changes in view, a pipeline with mne would look like:

X, y = EpochsVectorizer().fit_transform(epochs)
clf = make_pipeline(Xdawn(epochs.info), MinMax(), LogisticRegression())
# CV with clf
cv = cross_val_score(clf, X, y, ...)

This approach has several benefits. First, and most important, it makes the code compatible with sklearn: using cross_val_score becomes possible, saving a lot of effort both for chaining transformers and for cross-validating. Second, Epochs is an extremely heavy object, which is bound to make processing slow, so working with plain arrays helps.

This was the gist of what we did this week. Another change is being proposed by Jean for a new API: there are shortcomings in passing info between the various steps. Also, info may not be enough to reconstruct the original data, or in some cases it provides a lot more information than needed.
As for my PR you can see it here. Thank you for reading!

## Week 1 Updates

The task of Week 1 was to complete the prototype of the coala-bears --create application. As explained in my previous blog post, this tool offers new bear creators an easy way to generate most of the code that is reused for every bear, and allows the user to simply plug in the values specific to the bear she’s creating. A scaffolding template for both the bear file and the test file is present in the scaffold-templates directory.

While figuring out how to replace the values present in these files with the ones the user provides, I got to know about an interesting method, safe_substitute, present in the Python standard library itself. This function is better than substitute in my case, as the user can choose not to provide values at run time, and there won’t be a KeyError in that case.

The default delimiter is $, which marks a placeholder: basically anything after $ that matches a key of the mapping will be replaced by the value of that key. Example:

>>> s = Template('I am working from $start to $end')
>>> s.substitute(start='Monday', end='Friday')
'I am working from Monday to Friday'
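For comparison, safe_substitute leaves unknown placeholders untouched instead of raising KeyError, which is exactly the behaviour the tool needs when the user skips a value:

```python
from string import Template

s = Template('I am working from $start to $end')
print(s.safe_substitute(start='Monday'))
# -> I am working from Monday to $end
# s.substitute(start='Monday') would raise KeyError: 'end' instead
```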

I am done with implementing the basic functions of the CLI, which is basically asking users some values and appending them and generating the file at directory chosen by the user.

### Further Work for Week 2 involves

• Look into the feasibility of adding Python Prompt Toolkit features
• Add more options if required
• Make a python package so that user can install from pip
• General code cleanup if required

### Here’s a video of the tool in action:

All the code is uploaded on Gitlab at coala / coala-bear-management

## May 28, 2016

### mkatsimpris (MyHDL)

#### Make more tests

Today, Christopher pointed out some changes to make in my code in the pull requests in order to make my module more. I made these minor changes and I will wait for more feedback in order to improve my code in terms of scalability, modularity, and readability. When all the changes are implemented, the pull request will finally be merged into the original repository. Meanwhile, I ran some tests of the

## May 27, 2016

### Yen (scikit-learn)

#### scikit-learn Sparse Utils Now Support Fused Types

Dealing with sparse data is fairly common when analyzing large datasets. However, the sparse function utilities in scikit-learn currently only support float64, and will therefore implicitly convert other input data types, e.g. float32, into float64, which may cause unexpected memory errors. Since Cython fused types allow us to have one type definition that can refer to multiple types, we can solve this potential memory-wasting issue by substituting float64 with Cython fused types.

Below, I’ll briefly introduce sparse function utilities in scikit-learn and describe the work I’ve done to enhance it during GSoC.

## Sparse Function Utilities

In scikit-learn, sparse data is often represented as scipy.sparse.csr_matrix or scipy.sparse.csc_matrix. However, these two matrices do not provide built-in methods to calculate important statistics such as the L2 norm and variance, which are useful when we are playing with data. Therefore, scikit-learn relies on sparsefuncs_fast.pyx, which defines helper methods for sparse matrices, to handle sparse data more conveniently throughout the project.

## Memory Wasting Issue

However, the original implementation of the sparse function utilities in scikit-learn is not memory-efficient.

Let’s take a simple function which does not use Cython fused types and calculates the L2 norm of each row of a CSR matrix X as an example:

def csr_row_norms(X):
    """L2 norm of each row in CSR matrix X."""

    # 1
    cdef:
        unsigned int n_samples = X.shape[0]
        unsigned int n_features = X.shape[1]
        np.ndarray[np.float64_t, ndim=1, mode="c"] norms
        np.ndarray[np.float64_t, ndim=1, mode="c"] data
        np.ndarray[int, ndim=1, mode="c"] indices = X.indices
        np.ndarray[int, ndim=1, mode="c"] indptr = X.indptr

        np.npy_intp i, j
        double sum_

    # 2
    norms = np.zeros(n_samples, dtype=np.float64)

    # 3
    data = np.asarray(X.data, dtype=np.float64)  # Warning: might copy!

    # 4
    for i in range(n_samples):
        sum_ = 0.0
        for j in range(indptr[i], indptr[i + 1]):
            sum_ += data[j] * data[j]
        norms[i] = sum_

    return np.sqrt(norms)
1. Declare Cython’s static-typed (in contrast to Python’s dynamic-typed) variables to store the attributes of the input CSR matrix X, since static-typed variables accelerate the computation a lot.

2. Initialize norms with 0s.

3. Since we’ve already used cdef to declare data as an np.ndarray containing np.float64_t elements in step 1, the data of X needs to be converted to np.float64 if it has another dtype.

4. Calculate the squared sum of each row and then take its square root to get the L2 norm.

As illustrated above, we can see that STEP 3 IS DANGEROUS, because converting the type of data may implicitly copy the data and then cause a memory error unexpectedly. To see how it affects the memory space, we can use memory_profiler to monitor memory usage.

Here is the result of memory profiling if we pass a scipy.sparse.csr_matrix with np.float32 element into our example function:

It is obvious that memory usage increases dramatically, because step 3 copies the data in order to convert it from np.float32 to np.float64.
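The implicit copy is easy to observe directly from Python with NumPy, outside Cython (a small illustration of the same effect):

```python
import numpy as np

x32 = np.ones(1000, dtype=np.float32)
same = np.asarray(x32, dtype=np.float32)    # dtype matches: no copy
upcast = np.asarray(x32, dtype=np.float64)  # dtype differs: silent copy

print(np.shares_memory(x32, same))    # True
print(np.shares_memory(x32, upcast))  # False
```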

To solve this problem, we can introduce Cython fused types to avoid data copying. But firstly, let’s take a brief look at Cython fused types.

## Cython Fused Types

Here is official page’s clear introduction for fused types:

Fused types allow you to have one type definition that can refer to multiple types. This allows you to write a single static-typed cython algorithm that can operate on values of multiple types. Thus fused types allow generic programming and are akin to templates in C++ or generics in languages like Java / C#.

Note that Cython fused types are specialized at compile time, and are constant for a particular function invocation.

By adopting Cython fused types, our function can accept multiple types and therefore doesn’t need to do datatype conversion.

## Common Pitfalls

Intuitively, in order to use Cython fused types to solve the memory issue described above, we would delete step 3 and change step 1 in our function as follows:

# 1
cdef:
    unsigned int n_samples = X.shape[0]
    unsigned int n_features = X.shape[1]

    # Change type from np.float64_t to floating
    np.ndarray[floating, ndim=1, mode="c"] norms
    np.ndarray[floating, ndim=1, mode="c"] data = X.data

    np.ndarray[int, ndim=1, mode="c"] indices = X.indices
    np.ndarray[int, ndim=1, mode="c"] indptr = X.indptr

    np.npy_intp i, j
    double sum_

However, the above changes cause a Cython compile error: Invalid use of fused types, type cannot be specialized.

It seems that Cython doesn’t allow us to declare a fused-type variable and then assign a value to it within a function, if that function doesn’t accept any argument whose type involves the same fused type. Hence, we need to introduce an implementation trick here.

## Enhanced Implementation

The trick I used here is to define a wrapper function and make its underlying implementation function accept fused-type arguments. The reason behind this is mentioned above:

If a function accepts some argument with a particular fused type, it can use cdef to declare and initialize variables of that fused type within its scope.

Code of enhanced implementation is showed below:

# Wrapper function
def csr_row_norms(X):
    """L2 norm of each row in CSR matrix X."""
    return _csr_row_norms(X.data, X.shape, X.indices, X.indptr)

# Underlying implementation function
def _csr_row_norms(np.ndarray[floating, ndim=1, mode="c"] X_data,
                   shape,
                   np.ndarray[int, ndim=1, mode="c"] X_indices,
                   np.ndarray[int, ndim=1, mode="c"] X_indptr):
    cdef:
        unsigned int n_samples = shape[0]
        unsigned int n_features = shape[1]
        np.ndarray[DOUBLE, ndim=1, mode="c"] norms

        np.npy_intp i, j
        double sum_

    norms = np.zeros(n_samples, dtype=np.float64)

    for i in range(n_samples):
        sum_ = 0.0
        for j in range(X_indptr[i], X_indptr[i + 1]):
            sum_ += X_data[j] * X_data[j]
        norms[i] = sum_

    return norms

Finally, to verify our enhancement, here is the result of memory profiling when we pass a scipy.sparse.csr_matrix with np.float32 elements into our enhanced function:

Cool! As the figure shows, our function no longer copies the data.

## Summary

All of the functions in sparsefuncs_fast.pyx now support Cython fused types! Many thanks to all of the reviewers for their useful opinions.

In the next few weeks, my goal is to work on clustering algorithms such as KMeans in scikit-learn so as to make it also support Cython fused types.

### tushar-rishav (coala)

#### coala-html Beta

So I have been working on the coala-html beta version for a few weeks.
The PR was certainly too huge to be reviewed at once, and it soon became cumbersome to keep the changes updated. But credit to my mentor - Attila - and the constructive feedback from Abdeali, I could get it done the right way, making appropriate and meaningful commits with better code.

##### What is coala-html?

coala-html is a console application that runs coala analysis and generates an interactive webpage for the user.

##### How does coala-html work?

coala-html creates a webpage (an Angular app) based on certain JSON files, which are generated the first time coala-html is run on a given repository and updated on subsequent runs. By default, the generated webpage is served by launching a server at localhost, and the JSON files are updated. The user can change this behaviour by providing the nolaunch and noupdate arguments respectively when running coala-html. The user can also provide an optional dir (directory path) argument specifying where the code for the webpage will be stored.
You may see a brief demo below:

Now that the basic functionality is done, I am going to work on improving the UI and writing more tests for maximum coverage in the coming weeks.

Stay tuned! :)

### meetshah1995 (MyHDL)

#### Invitation Game ? no please !

This week I had to get ahead with the pure Python decoder implementation. I surely went ahead, but it was more like one step ahead, two steps back - recursion, more or less.

I had to tread through the never-ending RISC-V ISA specification manual to understand each instruction's encoding, and write tests to make sure my decoder is working as it should - correctly!

Murphy said it rightly: if there were x mistakes one could make, I made them all implementing the decoder.

But nonetheless, a very basic version of the decoder is ready. I still have to polish it and rigorously test it for correctness. I expect to do that by tomorrow.

So see you next week. Decoding isn't a very inviting game though, folks :P

Best,
MS.

### mkatsimpris (MyHDL)

#### Implementing the Color Space conversion module

This week my responsibilities were to understand how the color space conversion module works and to write a test unit for it. However, I moved a bit ahead: I implemented both the module and the test unit in MyHDL. First, I read a lot of material to understand how the conversion from RGB to luminance and chrominance works. The equations used in the conversion are: Y = (0.299*R)+(
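The equations above are truncated, but they appear to be the standard ITU-R BT.601 RGB-to-YCbCr conversion used in JPEG. For reference, that conversion can be written in plain Python as (a reference sketch, not the author's MyHDL code):

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range RGB -> YCbCr using the JPEG (ITU-R BT.601) coefficients."""
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr
```

A hardware implementation would use fixed-point approximations of these coefficients, which is a large part of what the module's test unit has to verify.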

### jbm950 (PyDy)

#### GSoC Week 1

The first week of the Google Summer of Code is now coming to an end, and I feel like I’ve hit the ground running and made a great head start. Most of the week revolved around creating a way to benchmark the KanesMethod and LagrangesMethod classes, so that activities aimed at enhancing the speed of these classes can be tracked. I also worked on moving some code from the pydy repository to the sympy repository, and made my first attempt at reviewing a pull request. Lastly, I continued researching Featherstone’s method of equation of motion generation and started digging into the structure of KanesMethod and LagrangesMethod as I work towards a base equations of motion class.

The week started off by finishing the tkinter GUI and benchmarking code that I had started making from scratch during the community bonding period. I added the ability to filter the graphed results by test, Python version and platform. This code was submitted to the SymPy repository in PR #11154. This PR has since been closed, as Jason Moore pointed out that SymPy already has a benchmarking repository that can do basically what I was achieving with my code, and a better solution would be to simply move my tests there. First I had to learn the airspeed velocity (ASV) package, which is what the benchmarking repository uses to run its tests. After reading through the documentation on ASV’s homepage, I altered my KanesMethod and LagrangesMethod tests to fit ASV’s formatting. This code was submitted to the sympy_benchmarks repository in PR #27. It has since been merged, though during the submission process Jason brought up that it would be a good idea to broaden the scope of testing for the equations of motion generators, and mentioned a few example scripts to look through. My summary of reviewing those scripts can be found on the PR, but basically some of the examples did not use additional methods but simply provided different inputs for testing equations of motion formation, which is still useful.

Among the scripts to review was pydy.models.py, which Jason pointed out would be useful to add to the SymPy repository, as it would give additional code to benchmark and test. The tasks needed for this migration were to remove all dependence of the code on pydy and to come up with some test code, which I worked on during the back half of this week. I also changed the location of the theta coordinate in models.py’s second function at Jason’s request. The submission of this code to the SymPy repository is in PR #11168, which at the time of this writing is awaiting the completion of the Travis CI tests.

The last thing I did related to my project this week was to continue learning the math behind Roy Featherstone’s equations of motion algorithm. I finished reading through his short course on spatial vector algebra slides and their accompanying notes. I also continued reading through A Beginner's Guide to 6-D Vectors (Part 2). Lastly, I began taking notes on KanesMethod’s and LagrangesMethod’s APIs as I work towards creating an equations of motion generation base class.

I also made my first attempt at a PR review this week, on PR #10650. This PR had very little code to look over, and I made some suggestions on basic syntax choices. After the author fixed those, however, I pinged members who deal with that code, as I am not confident in my ability to assess whether the code is OK for merging or whether the fix is necessary.

### Future Directions

Next week my plan is to jump more into figuring out the internals of the KanesMethod and LagrangesMethod classes which will most likely involve dynamics topics I am less familiar with. In addition I will keep making progress on learning Featherstone’s method of equations of motion generation. Thus it seems that next week will be focused more on theoretical learning and less on coding than this week was.

### PR’s and Issues Referenced in Post

• (Closed) Mechanics Benchmarking PR #11154
• (Merged) Added a test for KanesMethod and LagrangesMethod PR #27
• (Open) Fix matrix rank with complicated elements PR #10650
• (Open) Pydy models migration PR #11168

## May 26, 2016

### Shridhar Mishra (italian mars society)

#### GSoc 2016 IMS- Community Bonding

I shall be using the same blog that I created for GSoC 2015.

Huh! So it's good to be back with a new project! Kudos to the Italian Mars Society. I did some work before the actual coding began and got to know the new members of my team as well.

Here's the progress till now.
• Successfully imported and ran all the test programs; the Kinect works fine.
• Stored all the readings in JSON format on my Windows machine for the joint movements.
Next, I have to send them to the Linux virtual machine using Vito's code.

On the Unity end, I have been using a plugin that tracks joint movements, for trial. There is also a paid plugin that uses the Kinect out of the box. A decision has to be made, after consulting my mentor, about the usage of the paid plugin.

Things are now going to be a bit slow, since my exams run from May 30th to June 12th. Karan doesn't have exams at this time, so I can pick up from his work on the Kinect if our tasks coincide. I can work at full capacity after the 12th.

Cheers!
Shridhar

## May 25, 2016

### Riddhish Bhalodia (dipy)

#### What am I doing wrong?

This week was pretty much ups and downs. I have gone through the code and the LocalPCA paper about 100 times and tried all the different things there were to try - to no avail! I am still missing something. So I thought I would do a walkthrough of the algorithm and the code simultaneously, one step at a time.

Update: This walkthrough helped. I picked up on a few mistakes and corrected them; I have put the corrected results after the walkthrough.

So here we go (This one may go pretty long!)

### [1] Read Data

Read the dMRI data and store the necessary parameters. The following are the corresponding lines of code:

fetch_sherbrooke_3shell()
img, gtab = read_sherbrooke_3shell()

data = img.get_data()
affine = img.get_affine()

### [2] Noise Estimation

This is a separate section, so I will divide it into further parts.

#### (2.a) Get Number of b=0 Images

This is pretty easy once we use the gtab object; this was described in the previous blog post as well, hence I am not repeating it here.

#### (2.b) Create matrix X for PCA decomposition

Once we have decided whether to go with SIBE or MUBE, we need to create a mean-normalised matrix X of size N×K, where N = the number of voxels in one image and K = the number of images. We will be using the PCA trick for faster computation.

data0 = data[...,gtab.b0s_mask]    # MUBE Noise Estimate
# Form their matrix
X = data0.reshape(data0.shape[0] * data0.shape[1] * data0.shape[2], data0.shape[3])
# X = NxK
# Now to subtract the mean
M = np.mean(X,axis = 0)
X = X - M

#### (2.c) Create the covariance matrix C = XᵀX and perform its EVD

C is of size KxK; we will then use it for the PCA trick.

C = np.transpose(X).dot(X)
[d,W] = np.linalg.eigh(C)
# d = eigenvalue vector in increasing order, W = normalised eigenvectors

#### (2.d) Get the eigenvectors of XXᵀ

We actually want the eigenvectors of XXᵀ, but for computational efficiency we computed those of XᵀX; now we recover the actual ones (this is the PCA trick).

V = X.dot(W)
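A quick self-contained sanity check of this trick, with an assumed random matrix for illustration: if W holds the eigenvectors of XᵀX with eigenvalues d, then X·W holds (unnormalised) eigenvectors of XXᵀ with the same eigenvalues.

```python
import numpy as np

# Verify the PCA trick on a small random matrix (N >> K):
# (X X^T)(X w) = X (X^T X w) = d (X w), so X @ W are eigenvectors of X X^T.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))   # e.g. N=100 voxels, K=5 images
d, W = np.linalg.eigh(X.T @ X)
V = X @ W                           # unnormalised eigenvectors of X @ X.T
v = V[:, -1]                        # eigenvector of the largest eigenvalue
```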

#### (2.e) Find the eigenvector corresponding to the smallest positive eigenvalue

The noise component is in the eigenvector corresponding to the lowest positive eigenvalue, so we choose that and reshape it to the original image dimensions

d[d < 0] = 0
d_new = d[d != 0]
# the eigenvalues are sorted in increasing order
I = V[:, d.shape[0] - d_new.shape[0]].reshape(data.shape[0], data.shape[1], data.shape[2])

#### (2.f) Computation of local mean of data and local noise variance

We compute the noise field as the local variance of 3x3x3 patches of the image corresponding to the lowest positive eigenvalue. The local mean of the data is needed for the sigma correction in the next step. I is the image obtained in the previous step.

for i in range(1, I.shape[0] - 1):
    for j in range(1, I.shape[1] - 1):
        for k in range(1, I.shape[2] - 1):
            temp = I[i-1:i+2, j-1:j+2, k-1:k+2]
            temp = (temp - np.mean(temp)) * (temp - np.mean(temp))
            sigma[i-1:i+2, j-1:j+2, k-1:k+2] += temp
            temp = data[i-1:i+2, j-1:j+2, k-1:k+2, :]
            mean[i-1:i+2, j-1:j+2, k-1:k+2, :] += np.mean(np.mean(
                np.mean(temp, axis=0), axis=0), axis=0)
            count[i-1:i+2, j-1:j+2, k-1:k+2, :] += 1

sigma = sigma / count[..., 0]
mean = mean / count
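As a possible speed-up (a sketch on my part, not the current implementation), the local variance can be computed without the triple loop using a uniform filter and the identity var = E[x^2] - (E[x])^2:

```python
import numpy as np
from scipy.ndimage import uniform_filter

# Vectorised 3x3x3 local variance of an image I:
# local_var = E[x^2] - (E[x])^2, with E[.] a 3x3x3 uniform (box) filter.
rng = np.random.default_rng(0)
I = rng.standard_normal((8, 8, 8))        # stand-in for the eigenvector image
local_mean = uniform_filter(I, size=3)
local_var = uniform_filter(I * I, size=3) - local_mean ** 2
```

At interior voxels this matches the per-patch variance exactly, while running orders of magnitude faster than nested Python loops.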

#### (2.g) SNR based sigma correction

This was also described in the previous blog post; however, the issue of not getting a satisfactory noise field after rescaling is now solved. (See the corrected figure below.)

# find the SNR and make the correction
# SNR correction
for l in range(data.shape[3]):
    snr = mean[..., l] / np.sqrt(sigma)
    eta = 2 + snr**2 - (np.pi / 8) * np.exp(-0.5 * (snr**2)) * ((2 + snr**2) *
        sp.special.iv(0, 0.25 * (snr**2)) + (snr**2) * sp.special.iv(1, 0.25 * (snr**2)))**2
    sigma_corr[..., l] = sigma / eta

#### (2.h) Regularise using an LPF

sigma_corr = ndimage.gaussian_filter(sigma_corr, 3)

### [3] Local PCA

Now we have obtained the sigma, so now we proceed to the local PCA part.

#### (3.a) For each voxel choose a patch around it

for k in range(patch_radius, arr.shape[2] - patch_radius, 1):
    for j in range(patch_radius, arr.shape[1] - patch_radius, 1):
        for i in range(patch_radius, arr.shape[0] - patch_radius, 1):
            X = np.zeros((arr.shape[3], patch_size * patch_size * patch_size))
            M = np.zeros(arr.shape[3])
            # X = PCA matrix, M = mean container
            temp = arr[i - patch_radius : i + patch_radius + 1,
                       j - patch_radius : j + patch_radius + 1,
                       k - patch_radius : k + patch_radius + 1, :]
            temp = temp.reshape(patch_size * patch_size * patch_size, arr.shape[3])

#### (3.b) Construct matrix X for PCA

We need to construct a matrix X of the size NxK, where N = number of voxels in the patch and K = number of directions.

X = temp.reshape(patch_size * patch_size * patch_size, arr.shape[3])
# compute the mean and normalise
M = np.mean(X,axis=1)
X = X - np.array([M,]*X.shape[1],dtype=np.float64).transpose()

#### (3.c) Construct the covariance matrix and perform its EVD

The covariance matrix will be C = XᵀX

C = np.transpose(X).dot(X)
C = C/arr.shape[3]
# compute EVD of the covariance matrix of X get the matrices W and D
[d,W] = np.linalg.eigh(C)

#### (3.d) Threshold the eigenvalue matrix and get estimated X

# tou = 2.3 * 2.3 * sigma
d[d < tou[i][j][k][:]] = 0
D_hat = np.diag(d)
X_est = X.dot(D_hat)

#### (3.e) Recover the patch and update the matrices for over-complete averaging

We have to recover the patch from the estimated X, compute the theta of the patch, and update the matrices for over-complete averaging.

temp = X_est + np.array([M,]*X_est.shape[1], dtype = np.float64).transpose()
temp = temp.reshape(patch_size, patch_size, patch_size, arr.shape[3])
# update the theta matrix
# update the estimate matrix(thetax) which is X_est * theta

theta[i - patch_radius : i + patch_radius + 1,
      j - patch_radius : j + patch_radius + 1,
      k - patch_radius : k + patch_radius + 1, :] += \
    1 / (1 + np.linalg.norm(d, ord=0))

thetax[i - patch_radius : i + patch_radius + 1,
       j - patch_radius : j + patch_radius + 1,
       k - patch_radius : k + patch_radius + 1, :] += \
    temp / (1 + np.linalg.norm(d, ord=0))

#### (3.f) Get the denoised output

denoised_arr = thetax / theta

### [4] Rician Adaptation

This is done to correct the bias introduced by Rician noise in dMR images, by creating a lookup table (between y and phi) for the expression given below, where phi = (bias-corrected value / sigma) and y = (uncorrected denoised value / sigma).

However, the lookup table for such a large dataset is computationally very expensive. This is the current implementation:

# eta_phi = LUT; the lookup table was generated with
# phi = linspace(0, 15, 1000), and y is the output of the expression.
# We need the index of the closest value of arr/sigma in the table.
corrected_arr = np.zeros_like(denoised_arr)
y = denoised_arr / np.sqrt(sigma)
opt_diff = np.abs(y - eta_phi[0])
for i in range(eta_phi.size):
    if i != 0:
        new_diff = np.abs(y - eta_phi[i])
        corrected_arr[new_diff < opt_diff] = i
        opt_diff[new_diff < opt_diff] = new_diff[new_diff < opt_diff]
# we can recover the true value from the index
corrected_arr = np.sqrt(sigma) * corrected_arr * 15.0 / 1000.0

This concludes the paper and the implementation I have done. All the code can be found in the following PR:
LocalPCA function
Noise Estimation Function
LocalPCA example (currently not using function for easier debugging)
Noise estimation example (currently not using function for easier debugging)

This week I understood the Rician adaptation described in [1] and implemented it. I generate the LUT in the localPCA function itself. The problem is time: we need to compare every element of the denoised array, which is very big (189726720 elements for the Sherbrooke data), with the lookup table and find the nearest value to it. This is a very computationally expensive task, and hence I need to find a better way to use the LUT.
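One possible speed-up (my own suggestion, not the current implementation): if eta_phi is monotonically increasing over phi = linspace(0, 15, 1000), the whole nearest-value scan can be replaced by a single vectorised np.interp call. The eta_phi values below are a stand-in function, not the paper's exact expression:

```python
import numpy as np

# Vectorised inversion of a monotonic LUT with np.interp, assuming the
# mapping phi -> eta_phi is monotonically increasing.
phi = np.linspace(0, 15, 1000)
eta_phi = phi + 0.1 * phi ** 2        # illustrative stand-in for the real LUT
y = np.array([0.5, 3.0, 12.0])        # denoised values divided by sigma
phi_est = np.interp(y, eta_phi, phi)  # one vectorised pass, no Python loop
```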

## Updated Results

The code walkthrough helped me get a few things corrected, so I will be posting a few of the updated results.

1.  The Noise Estimation

This matches reasonably (in visual context) with that in the paper.

2.   Local PCA Denoising

The images are still overly smooth, and the time to denoise one dataset is huge (~1.5 hrs for a (128, 128, 60, 193) dataset). Computing the covariance matrix takes the most time in local PCA, so I will have to work on speeding that up. I will probably also have to look closely at the implementation to figure out why the output is overly smooth.

## Things left to be done …

1. Debug
2. Improve for computational efficiency
3. HPC dataset test
4. Validation

## References

[1] P. Coupé, J. V. Manjón, M. Robles, D. L. Collins. Adaptive Multiresolution Non-Local Means Filter for 3D Magnetic Resonance Image Denoising.
IET Image Processing, Institution of Engineering and Technology, 2011. <hal-00645538>

### fiona (MDAnalysis)

#### What is this MD thing anyway?

With the GSoC coding period now started, here's the background for my project that I promised last post! I've tried to start from the (relative) basics so the context of what I'm trying to achieve with my project is clearer; if you already know how Molecular Dynamics and Umbrella Sampling work (or you're just not that interested), feel free to skip ahead to the overview of my GSoC project or projected timeline.

### So what is MD?

Experimental approaches to biochemistry can tell us a lot about the function and interactions of biologically relevant molecules on larger scales, but sometimes we want to know the exact details – the atomic scale interactions – that drive these processes. You can imagine it’s quite difficult experimentally to see and follow what’s going on at this scale!

This is where Molecular Dynamics, or MD, comes in. What we do is use a computer to model our system – say, a particular protein and its surrounding environment (usually water and ions) – on this scale and simulate how it behaves. To make our simulations feasible, we make some approximations when considering how the atoms interact: in general, they are simply modelled as spheres connected by springs to represent molecular bonds, with additional forces to represent e.g. charge interactions.

The set of parameters that describe these interactions form a ‘force field’ and are chosen to best replicate various experimental and other results. Using this force field, we can add up the total force on each atom due to the relative location of the remaining atoms. We make the approximation that this force is approximately constant over the next small length of time dt, which allows us (given also each particle’s velocity) to calculate how far and in which direction they’ll move over that time period. We move all our particles appropriately, and repeat the process.

The force isn’t really constant, so for our simulations to be accurate this dt needs to be very small, typically on the order of a femtosecond (a millionth of a billionth of a second!), and we need to iterate many, many times before we approach the nanosecond or greater timescales relevant for the events we’re interested in.

It would not be feasible to keep a record of every single position we calculate for every single timestep. Even when our output record of the simulation – the trajectory – is updated at a much lower frequency, a typical simulation may still end up with coordinates (and possibly velocities and forces) for each of tens of thousands of atoms in each of thousands of timesteps – it’s quite easy to get buried in all that data!
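The loop described above can be sketched for a single particle on a spring. Everything here is an illustrative toy of my own (force law, parameters, output stride), not a real force field or production integrator:

```python
# Toy 1D "MD" loop: a particle on a spring, force F = -k*x assumed
# constant over each small timestep dt (semi-implicit Euler update),
# with positions written to the trajectory at a lower frequency.
def simulate(x0, v0, k=1.0, m=1.0, dt=0.001, steps=5000, stride=100):
    x, v = x0, v0
    trajectory = []
    for step in range(steps):
        force = -k * x            # total force at the current position
        v += (force / m) * dt     # update velocity over dt
        x += v * dt               # move the particle over dt
        if step % stride == 0:    # record only every `stride` steps
            trajectory.append(x)
    return trajectory

traj = simulate(x0=1.0, v0=0.0)
```

The `stride` parameter mirrors the point above: the trajectory is written far less often than the integrator steps.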

This is where tools like MDAnalysis help out – by making it easier to read through and manipulate these trajectories, isolating atom selections of interest and performing analysis over the recorded timesteps (known as frames).

If you want to know more about Molecular Dynamics, particularly the more technical aspects, you could head over to the Wikipedia page or read one of the many review articles, such as this one by Adcock and McCammon.

### But what do umbrellas have to do with anything?

Sometimes we want to know more about our system than ‘molecule A and molecule B spend a lot of time together’ – we want to quantify how much molecule A likes molecule B, by calculating a free energy for the interaction. The lower the free energy of an interaction or particular conformation of our system, the more favoured it is. We can also talk about free energy landscapes which show how the free energy of our system changes as we vary part of it, for example bringing two molecules together.

Say you want to figure out which toys your cat likes the best. You can wait and see where he spends his time in relation to them, and if you watch long enough you can make a probability distribution of how likely you are to find your cat at any place at a point in time. This probability will correlate with how much he favours each toy, and so we can calculate from the probability distribution the free energy of cat-toy interactions.

The problem is, you might be watching for a long time.

The same is true in biochemistry. If a particular interaction is very favourable, we’re not likely to see the molecules separate within the timeframe of our MD simulation: without some sort of reference state, we can’t calculate our free energy, and we never see any less-favourable states that might exist.

Instead, what we do is put our cat on a lead to force him to stay near various locations. There's still a bit of give in the lead, so at each location we restrain him, he'll still shift towards the things he likes and away from the things he doesn't.

Combining the probability distributions from each of these, we get a much more complete (though biased) probability distribution than before. These can be unbiased to get our free energy landscape, and from this we can determine the most favoured toy: it'll be the one corresponding to the lowest free energy.

(Naturally, it’s the random box.)

This is more or less the Umbrella Sampling (US) method. To be a little more technical, we determine the free energy between two states (such as two molecules interacting or separate; a ‘binding’ free energy) by first picking a reaction coordinate that will allow us to move from one state to the other (e.g. distance between the two molecules). We then perform a series of MD simulations (or windows) with the system restrained at various points along the reaction coordinate by adding an additional term to the force field, usually a harmonic potential - the shape of which gives the name ‘umbrella’ sampling.
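The harmonic restraint added to the force field in each window can be sketched as follows; the force constant and window centres are arbitrary illustrative values:

```python
# Harmonic umbrella bias: U(x) = 0.5 * k * (x - x0)**2 restrains the
# reaction coordinate x near the window centre x0.
def bias_energy(x, x0, k=10.0):
    return 0.5 * k * (x - x0) ** 2

# A series of windows placed along the reaction coordinate:
window_centres = [0.0, 0.5, 1.0, 1.5, 2.0]
# Energy penalty for a configuration at x = 0.7 in each window:
penalties = [bias_energy(0.7, x0) for x0 in window_centres]
```

The parabola-like shape of this potential is where the name 'umbrella' comes from.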

The probability distributions are calculated by recording the value of the reaction coordinate (or, equivalently, the force due to the harmonic potential) during each window, then unbiased and combined using an algorithm such as WHAM (I'll be talking about this in more detail in the future!). The free energy landscape along the reaction coordinate we get out is called a Potential of Mean Force (PMF) profile.

Again, if you’re interested in reading more, you could try the Wikipedia page, the original US article by Torrie and Valleau, or read up on WHAM.

### So what am I doing for GSoC?

Currently, MDAnalysis doesn't have tools specifically for dealing with US simulations, either in terms of specific analysis (including WHAM) or handling the trajectories and associated extra data from each window. I'm hoping to make this a reality.

My project is currently divided into three main parts. Below is a brief overview for now; I'll talk more about each as I encounter them and form a clearer picture of how they'll play out. Each is associated with an issue on the MDAnalysis GitHub page, linked below, so you can pop over there for more discussion.

• Adding auxiliary files to trajectories. Usually when performing US simulations, the value of the reaction coordinate/force is recorded at a higher frequency than the full set of coordinates stored in the trajectory. Being able to add this or other auxiliary data to a trajectory, and dealing with getting the data aligned when they're saved at different frequencies, will be generally useful to MDAnalysis (not just for US). See the issue on GitHub here.

• An implementation of WHAM, perhaps a wrapper for an existing implementation, so WHAM analysis can be performed from within MDAnalysis. See the issue here.

• An umbrella class, to deal with the trajectories for each window, loading the auxiliary reaction coordinate/force data, and storing other relevant information about the simulations such as the restraining potentials used; passing the relevant information on to run WHAM and allowing other useful analysis, such as calculating sampling or averages of other properties of the system in each window or as a function of the reaction coordinate. Read the issue here.

### Timeline

Here’s my current projected timeline for dealing with these three tasks:

This is likely to change as I encounter new ideas and issues along the way, but hopefully by mid-August I’ll have something nice for dealing with US simulations in MDAnalysis to show off!

Phew, well that ended up being a monster of a post - worry not, I don’t expect any posts for the immediate future to be this long!

I’ll be back soon, with more on the adding-auxiliary-data part of this project and how my first week has gone. I’m currently experimenting with the design on this blog, and writing an ‘About’ page and a proper ‘Home’ page, so watch out for those, and see you again soon!

## May 23, 2016

### srivatsan_r (MyHDL)

#### myhdl.traceSignals()

There is a small problem while using traceSignals() function in MyHDL to trace the waveform and generate a vcd file.

For example-

Here the test_1() function returns the instances which need to be traced when simulating. An instance of clock_driver() is created and returned.

When you try to run this, it gives an error:

myhdl.ExtractHierarchyError: Inconsistent hierarchy – are all instances returned ?

I was struggling with this error for hours and then found out a workaround for this.

The problem can be overcome by storing the instance of clock_driver in a variable and then returning that variable from the test_1() function.

The working code –

I don’t know why the latter works while the former doesn’t.

From MyHDL 1.0 (which is yet to be released) onwards, a new API is to be used for tracing signals, running simulations, etc.

The new API uses @myhdl.block decorator.

The new code –

With this decorator from MyHDL 1.0, both of the methods described above work fine.

### Ranveer Aggarwal (dipy)

#### #0 Dot Init

This past month, I have been working hard to get familiar with (and my mentors have worked harder to get me familiar with) one of the largest visualisation engines available today: VTK. While very powerful, it is a tad light on documentation, and that has resulted in a slower start than I expected. This week I worked on a small UI element.

A button (grey) and a cube (red)

### VTK

Here’s how VTK’s visualisation pipeline looks like:

Image Courtesy: Visualization Toolkit ( VTK ) Tutorial

Here are the resources I am using:

### Creating a button

My first task is to create the most basic visual interface element one can think of - a button. Since we need more control over how the UI should look, we're going to have to do something more than use the built-in button widget.
Currently, I'm using an ImageActor. But since I need to create an overlay, I'll need a 2D actor instead of a 3D one. That's what I'll be doing next. Here's how it looks currently:

A button click moves a cube

I’m maintaining the code here.

### Next Steps

Next up, I’ll be getting this button to work as a 2D overlay and also working on its positioning. Furthermore, I’ll be working on 3D sprites that always face the camera.

### tushar-rishav (coala)

#### PyAutoGUI

Recently, I came across PyAutoGUI, a cross-platform Graphical User Interface automation Python module. The module allows us to programmatically control the mouse and keyboard. That means we can write scripts to automate tasks that involve mouse movements/clicks or keyboard input. To understand this better, let's write a simple script that will draw a Symbol of Peace for us. If you don't have any paint tool, you can try one online for free at SumoPaint.

Before our script executes, we will have the Brush tool selected. We could handle selecting the brush tool from the script, but that depends on the position of the brush tool in the paint application, which differs between paint programs.
So let's get started.
So let’s get started.

Importing required modules. Nothing cool here.

Ideally, we want control over our automation script even when things go wrong. We can ask the script to wait after every function call, giving us a short window to take control of the mouse and keyboard if something goes wrong.
This pause after each function call can be implemented by setting a numeric value on the PAUSE constant in the pyautogui module.

We may also want to add an initial delay to let user select an appropriate paint tool.

The screen size can be obtained using the size() method. If you observe carefully, the Symbol of Peace is a big circle enclosing an inverted Y. A circular path can be traced using the parametric equation of a circle. Let us take the screen centre as the circle centre.
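The parametric sweep itself can be sketched in plain Python; the screen size below is an assumed example, whereas the real script would query pyautogui's size() function:

```python
import math

# Points on a circle via the parametric equation
# x = cx + r*cos(theta), y = cy + r*sin(theta).
width, height = 1920, 1080            # assumed screen size for illustration
cx, cy = width // 2, height // 2      # circle centre = screen centre
radius = 300
points = []
for i in range(100):                  # sweep theta from 0 to 2*pi
    theta = 2 * math.pi * i / 100
    x = cx + radius * math.cos(theta)
    y = cy + radius * math.sin(theta)
    points.append((x, y))
```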

Mouse clicks can be performed using the pyautogui.click method. A mouse click is a combination of two events:

• Pressing the button.
• Releasing the button.

Both combined make one click. pyautogui.click takes the x and y coordinates of the region to click on. If these params are not passed, the click is performed at the current mouse position.

Let’s implement mouse click to focus on the paint region.

Apart from a click, we can also drag the mouse cursor. PyAutoGUI provides the pyautogui.dragTo() and pyautogui.dragRel() functions to drag the mouse cursor to a new location or to a location relative to its current one. dragTo takes the x and y coordinates of the final position, while dragRel takes x and y coordinates interpreted relative to the current position.
The origin lies at the top-left corner of the screen; the x-coordinates increase going right, and the y-coordinates increase going down. All coordinates are positive integers; there are no negative coordinates.

The next few lines create the circular path with the enclosed inverted Y. The idea is to use the parametric equation of a circle and keep incrementing the angle until one complete revolution (an angle of 2*PI) has been swept.

Combining all together

You may try and run this script after selecting a brush tool.
Here is a Demo

Please note that this was only a brief introduction to GUI automation using Python. PyAutoGUI also provides a bunch of other functions, such as performing hotkey operations (Ctrl + C for copy).
If you find this module interesting, you should check out its documentation.

Cheers!

### sahmed95 (dipy)

#### GSoC 2016 begins with Dipy under Python Software Foundation !

Hi, I am excited to announce that the proposal to implement IVIM techniques in Dipy as part of Google Summer of Code 2016 has been accepted and I will be working on it over the summer under the guidance of Ariel Rokem, Eric Peterson and Rafael NH (who was a GSoC 2015 student working on DKI implementation in Dipy). Dipy is a python library for analysis of diffusion­ weighted MRI (dMRI).

Diffusion patterns can reveal microscopic details about tissue architecture and are used in clinical as well as neuroscience research. The intravoxel incoherent motion (IVIM) model describes diffusion and perfusion in the signal acquired with diffusion MRI. Recently, interest has expanded and applications have emerged throughout the body, including the kidneys, liver, and even the heart. Many more applications are now under investigation, such as imaging for cancer (prostate, liver, kidney, pancreas, etc.) and the human placenta. One of its largest uses is in brain mapping and neuroscience research, for example in Parkinson's disease, where it is used to study aging and the structural degeneration of fibre pathways in the brain.

In the presence of the magnetic field gradient pulses of a diffusion MRI sequence, the MRI signal gets attenuated due to motion, through the effects of both diffusion and perfusion. The IVIM model separately describes the diffusion and perfusion contributions using data with multiple diffusion encoding sensitivities (b-values).

The IVIM model represents the attenuation in the signal acquired with diffusion MRI using the following equation given by Le Bihan [1988]

$\frac{S}{S_0} = f_\mathrm{IVIM} F_\text{perf} + (1- f_\mathrm{IVIM}) F_\text{diff} \,$

where $f_\mathrm{IVIM}$ is the volume fraction of incoherently flowing blood in the tissue, $F_\text{perf}$ is the signal attenuation from the IVIM effect, and $F_\text{diff}$ is the signal attenuation from molecular diffusion in the tissue.

I will be writing a new module in Dipy to calculate the various parameters of this model using this equation and hence obtain separate images for diffusion and perfusion. You can find the complete proposal and details here - IVIM in Dipy.
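Under the common assumption that F_perf = exp(-b * D_star) and F_diff = exp(-b * D), the model can be sketched as follows; the parameter values below are illustrative only, not taken from the proposal:

```python
import numpy as np

# Bi-exponential IVIM signal: S/S0 = f*exp(-b*D_star) + (1-f)*exp(-b*D),
# assuming mono-exponential forms for the perfusion and diffusion terms.
def ivim_signal(b, S0, f, D_star, D):
    return S0 * (f * np.exp(-b * D_star) + (1 - f) * np.exp(-b * D))

bvals = np.array([0.0, 10.0, 100.0, 1000.0])   # b-values, illustrative
signal = ivim_signal(bvals, S0=1.0, f=0.1, D_star=0.01, D=0.001)
```

Fitting these parameters (f, D_star, D) to measured signals is what separates the perfusion and diffusion images.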

Le Bihan's original paper : http://www.ncbi.nlm.nih.gov/pubmed/3393671

### meetshah1995 (MyHDL)

#### Community.bond()

Community bonding period is over and now comes the real fun part - coding :D.

To summarize my community bonding period, I completed the following exercises:

• Got familiar with the inherent decorator and generator structure of MyHDL.
• Learnt to model sequential and combinational logic hardware components in MyHDL.
• Learnt to read Chisel and to interpret a model from code, which may be needed in further parts of the RISC-V CPU design.
• Read and understood RISC-V and its instruction set model.
• Searched, understood and assimilated existing RISC-V implementations, and chose one to focus on.
• Got started with the decoder implementation.
Other than these exercises, I also got to know my mentors better and had fruitful discussions on the road ahead.

See you next week !

MS

## Summer trip

So as mentioned in my previous post, I will be meeting some of my mentors and fellow GSoC interns at Europython. For those who haven't heard about Europython, it is a conference revolving around the Python programming language (wow, you did not expect that, did you?). To be honest, I don't know much myself, since this will be the first conference of this kind that I am attending.

## Europython = <3

This summer trip is going to take place between 17 and 25 July. From a student's perspective, covering the expenses for such a trip is not trivial. In my case these expenses include:

• Plane tickets
• Accommodation
• Conference ticket
• On site expenses

Here I would like to thank the people at Europython for sponsoring us by giving us free tickets. It means a lot (of money)¹ to us. With all that being said, I have already booked plane tickets and planned accommodation with my fellow coalanians²,³.

## Coding starts

My post marks the end of the community bonding period. Stuff is about to get real now. It is 23 May 2 a.m. so let's get coding.

1. This is a joke. But it is true

2. People who are part of the coala community

3. I just discovered these footnotes and I am very happy

### Ravi Jain (MyHDL)

#### GSOC 2016 starts off!

Yay, readers! I am glad to inform you that I have been selected for Google Summer of Code 2016. I will be working with the sub-org MyHDL, which comes under the umbrella of the Python Software Foundation (PSF). The project proposal is to design and test a Gigabit Ethernet Media Access Controller (GEMAC) core in MyHDL using Python. The GEMAC is a communication core commonly used for streaming data coming from transmitter and receiver client interfaces to Field Programmable Gate Arrays (FPGAs). The purpose of the project is to create a design that can be easily used and understood by any endpoint user. This project proves the capability of MyHDL to create user-friendly versions of hardware system designs, and it is important because it demonstrates the use of advanced software technologies applied to hardware design. If you wish, you can go ahead and check my complete project proposal.

I will use this space for status updates on my project. The blog is addressed to beginners in the field of FPGA programming, so they can learn alongside me, and to my mentors for evaluation purposes. This post comes just as the bonding period (April 23, 2016 – May 22, 2016) has ended, during which we were supposed to get all prepped up, with the help of our mentors, for the upcoming intense three months of coding the summer out.

UPDATE TILL NOW:

I have set up my Windows 7 PC with the following:
– Python 3.5.1
– MyHDL version 1.0-dev (Install guide)
– Eclipse IDE for Python
– Xilinx ISE Webpack for Verilog / VHDL

I will be using Xilinx Spartan 3E Starter Kit for hardware testing and verification. So during the bonding period I got my hands on the kit and implemented basic logic gates using Verilog in Xilinx ISE to just familiarise myself with the basic hardware.

After that I started looking for resources to get a little insight into the GEMAC core. I found the Xilinx 1-Gigabit Ethernet MAC DS200 datasheet, which defines the modules and interfaces of the core as implemented by Xilinx. Moreover, I used the reference HDL (test_gemac) from the GnuRadio USRP project, referred by Christopher Felton, one of my mentors for the GSoC programme, to help understand the implementation of such cores; I shall use it during the coding period to continuously compare against the core I develop in MyHDL. This online Verilog tutorial helped a lot in getting familiar with the basic syntax of Verilog while going through the reference code.

FURTHER:

So, the coding period starts today, and now we are supposed to give it our all and code.
For the next week I plan to define the interfaces and transactors and start developing the basic modular structure of the top-level sub-modules, using the test_gemac repo and the Xilinx DS200 datasheet.

RSS Feed for subscribing: https://ravijain056.wordpress.com/category/myhdl/feed/

### Prayash Mohapatra (Tryton)

#### Hello Fellow Developers around the World

I spent the time fixing more easy issues. By now I am pretty confident about navigating the codebase to fix bugs (grep -rl to the rescue!). As I have already posted, my project is about porting the CSV Import/Export module from the GTK client (Tryton) to the web client (SAO). I am trying to make sense of the problems people face when they post on the mailing lists, with mentors always there replying. Then again, it feels good that real people whom I may never have known are using a project where I will be contributing.

Honestly, I could have spent more time on the community bonding period; I was just waiting for my semester exams to end.

### mkatsimpris (MyHDL)

#### GSoC Community Bonding Period

Good news! My proposal was accepted by the Python Software Foundation for GSoC 2016. I will be working with the MyHDL sub-organization, writing code from 23 May till 23 August. The main goal of the proposed project is to create the frontend part of a JPEG encoder implemented in MyHDL. The frontend part consists of the color-space converter, the 2D DCT, the top-level FSM, and the input

## Life is never simple

For the last week of the community bonding period, I was finishing off the animation module for KivEnt. I added a JSON parser inside AnimationManager to read and store animation lists kept on disk in JSON format. Making this module was fairly simple: I had to use Python's json library to get a dict from the JSON file for reading, and to convert a dict to a JSON string for storing to a file. You must be thinking this would have been finished pretty easily: add two functions in the AnimationManager and make a simple test JSON file in the twinkling-stars example accompanying this module. But alas, things are never so simple in life. I mean, NEVER!

Every Python user who has worked with Python 2 will have faced this infamous error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in \
position 0: ordinal not in range(128)

What this essentially means is that the Python string we have contains a byte of value 0x80 somewhere, and our poor old ASCII encoding, in which Python 2 strings are assumed to be encoded, has no character for that value. The JSON parser very smartly returns Unicode strings, which use a character encoding different from ASCII. But the animation names in my module were being stored as Python strings, which are assumed to be ASCII, hence the error.
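The mismatch is easy to reproduce. Here is a small sketch, written for Python 3, where the bytes/str split is explicit:

```python
# A byte with value 0x80 has no meaning in ASCII, so decoding fails,
# while a proper Unicode encoding such as UTF-8 handles non-ASCII text.
raw = b"\x80"
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False         # ASCII has no character for 0x80

euro = b"\xe2\x82\xac".decode("utf-8")   # the same idea done right
```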

The JSON parser has to do this because JSON files will almost always be in UTF-8, which follows the Unicode standard. Unicode supports a lot more characters than ASCII (the Euro symbol, for example), and hence is more widely used. Let me explain in detail what encodings are.

## Encodings - Why do they exist?

Computers, being the stupid chips of silicon they are, do not understand text as we do. They love numbers, binary numbers, and hence store, interpret and communicate all kinds of data in that format. So humans needed to devise a system to convert that binary data into an understandable format: for text data, the desired symbols of an alphabet. A very simple example is ASCII, which is basically a table mapping a 7-bit binary number to an English letter or symbol.
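You can see this mapping directly in Python, where ord() and chr() expose the number behind each character:

```python
# ASCII maps 7-bit numbers to English letters and symbols.
assert ord("A") == 65                  # 'A' is stored as the number 65
assert chr(0b1000001) == "A"           # 7-bit binary 1000001 -> 'A'
assert all(ord(c) < 128 for c in "Hello, ASCII!")   # pure ASCII text
```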

This is still quite popular and in use (Python 2 uses this encoding for the str data type, as we saw above). ASCII originally was, and still is, only a 128-symbol encoding (7 bits), but there are a lot of extended versions that make use of the remaining 128 numbers left unassigned. So you have accented characters, currency signs and what not in those extended encodings (ISO Latin-1 is one such example). The problem is that there is no single global standard for these extended encodings. So a file containing accented characters written under one standard might be rendered as something else entirely on another computer that uses a different standard.

That's the problem Python had with the number 128 (or 0x80) appearing as one of the characters in the string: ASCII, the encoding used by Python 2, has no symbol mapped to that number, or to any number above 127.

The number 128 stands for the Euro sign (€) in the Windows-1252 encoding, which is often loosely called Latin-1 (strict ISO Latin-1 maps 0x80 to a control character). What if some European programmer wants to print the Euro sign in a very useful program that calculates your monthly expenditure? Or what if a Japanese programmer wants to output logs in Japanese and not English? ASCII doesn't have characters for that!
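
The same byte means completely different things under different encodings, which is the whole problem. A quick check (note the Euro sign lives at 0x80 in Windows-1252, the encoding commonly confused with Latin-1):

```python
raw = b'\x80'
# Interpreted as Windows-1252, byte 0x80 is the Euro sign
print(raw.decode('cp1252'))             # €
# Interpreted as strict ISO 8859-1 (Latin-1), it is an invisible control character
print(hex(ord(raw.decode('latin-1'))))  # 0x80
```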

Japanese people came up with their own standards to map Japanese symbols, ended up with four different such standards with no easy way to convert between them, and made a mess of the 'Information Interchange' problem ASCII had tried to solve. We need a standard which:

• Has support for all possible symbols and languages in use
• Is followed world-wide
• Can represent any symbol as 8-bit bytes (called octets in the Unicode specification)

## Enter Unicode

Unicode is like a mega, mega version of ASCII. In the simplest sense, what it does is assign a number (a code point) to every possible symbol that you would ever want to display on a screen. These numbers are decided by the Unicode Consortium, a group of large tech companies concerned about text encodings. From Unicode's Wikipedia entry:

The Unicode Standard, the latest version of Unicode contains a repertoire of more than 120,000 characters covering 129 modern and historic scripts, as well as multiple symbol sets.

Okay, so our first two criteria are satisfied. We just need to convert these numbers into an octet format. We could simply convert each number to binary and split it into however many octets are required. Unicode assigns symbols to numbers up to 0x10FFFF, which needs three octets. But this is a very wasteful way to convert Unicode to octets, because most English text files would then use 24 bits for each character when only 7 are required; with each character we would have 17 redundant bits. That is bad for storage and data-transfer bandwidth.

Most of the internet today uses a standard called 'UTF-8' for this conversion. It is an ingenious scheme that can represent all of Unicode's code points (the numbers mapped to symbols).

It can be defined using these very simple rules:

• A byte starting with 0 is a single byte character (0XXXXXXX).
• A byte starting with 110 denotes the starting of a two byte character (110XXXXX 10XXXXXX)
• A byte starting with 10 is a continuation byte to a starting byte for a character.
• 1110XXXX and 11110XXX denote starting bytes for 3 and 4 byte long characters. They will be followed by the required continuation bytes.
• The character’s code point is determined by concatenating all the Xes and reading them as a single binary number.

It becomes immediately obvious from the single-byte rule that such an encoding supports ASCII and represents it by default as single bytes. Any valid ASCII file is a valid UTF-8 file.
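
The rules above can be sketched as a tiny classifier for the lead byte of a UTF-8 sequence (just a teaching sketch, not how a real decoder is written):

```python
def utf8_seq_length(first_byte: int) -> int:
    """Sequence length implied by a UTF-8 lead byte, per the rules above."""
    if first_byte >> 7 == 0b0:      # 0XXXXXXX: single-byte (ASCII) character
        return 1
    if first_byte >> 5 == 0b110:    # 110XXXXX: starts a two-byte character
        return 2
    if first_byte >> 4 == 0b1110:   # 1110XXXX: starts a three-byte character
        return 3
    if first_byte >> 3 == 0b11110:  # 11110XXX: starts a four-byte character
        return 4
    raise ValueError("a continuation byte (10XXXXXX) cannot start a character")

print(utf8_seq_length(0x41))  # 'A' -> 1
print(utf8_seq_length(0xF0))  # emoji lead byte -> 4
```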

As an example, let's try decoding an emoji character:

0xF0 0x9F 0x98 0x81

In binary:

11110000 10011111 10011000 10000001

Look how the first byte tells us it's a four-byte character, so three continuation bytes are to be expected. Removing the UTF-8 header bits (the leading 11110 and 10 prefixes) and concatenating the remaining bits:

000 011111 011000 000001 = 0x1f601

Looking up U+1F601 in the Unicode table, we find that it is the grinning-face emoji :D!
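
Python's built-in decoder agrees with our hand calculation:

```python
data = b'\xf0\x9f\x98\x81'  # the four bytes from the example above
ch = data.decode('utf-8')
print(ch, hex(ord(ch)))     # the grinning face, code point 0x1f601
# And the reverse direction: encoding the code point gives back the same bytes
print(chr(0x1F601).encode('utf-8') == data)  # True
```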

Most of the web has moved to using UTF-8. The problem with the Python 2 str data type is that we need to explicitly use the separate unicode data type if we want support for special characters. And you need to be extra careful when using one in place of the other, in which case you need to convert between them; otherwise Python throws you that headache of an error because it can't decode a few characters as ASCII.

Python 3, on the other hand, uses Unicode for its standard str type, with UTF-8 (which is backward compatible with ASCII) as the default encoding for source files and I/O. I wonder why we haven't all dumped Python 2 in some history archive and moved to Python 3 yet :P! I have had too many UnicodeDecodeErrors!

## Credits

This awesome video by Computerphile <3: https://www.youtube.com/watch?v=MijmeoH9LT4

#### Other News

I'm anxious for the start of the coding period for GSoC! And I also just launched my club's new website: https://stab-iitb.org/electronics-club/ which is getting awesome reviews from my friends, so I'm kinda happy and excited too!

I also recently read 'The Martian' by Andy Weir, after having watched the really great movie. The book is at least 10 times more awesome than the movie! It's a must-read for any hacker, tinkerer and automater!

### Anish Shah (Core Python)

#### GSoC'16: End of Community Bonding

Today is the last day of the community bonding period. I start working full-time on GitHub integration tomorrow. :) I had planned to blog at the end of every week during the community bonding period, but I couldn't. I will definitely blog every weekend during the coding period. This blog post is about all the work that I have done in the last three weeks.

## Linking GitHub Pull Requests in issue comments

On GitHub, #<number> automatically links to an issue or a pull request. Likewise, on the Python issue tracker, strings like "#<number>" and "PEP <number>" get linked to an issue and a PEP respectively. Since we are integrating GitHub into our issue tracker, we want to automatically link strings like "PR <number>" to a GitHub pull request. This was a fairly easy task, as I had become familiar with the code while doing the first task: I just had to add a regex to match strings like "PR <number>", plus some tests for the same. You can find the patch that I submitted here.
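
The idea can be sketched roughly like this; the regex, the URL scheme and the helper name are illustrative guesses, not the actual Roundup patch:

```python
import re

# Hypothetical pattern for strings like "PR 123" (not the exact one in the patch)
PR_RE = re.compile(r'\bPR\s*(\d+)\b')

def link_prs(text, repo='https://github.com/python/cpython'):
    """Replace "PR <number>" with an HTML link to the pull request."""
    return PR_RE.sub(
        lambda m: '<a href="%s/pull/%s">PR %s</a>' % (repo, m.group(1), m.group(1)),
        text)

print(link_prs('See PR 42 for the fix'))
```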

## Adding GitHub PR to a b.p.o issue

GitHub links a pull request to an issue if the title or a comment has a string like "Fixes #123". Likewise, we want to automatically link a GitHub PR to an issue on the Python issue tracker. For example, if a title/comment has a string like "Fixes bpo123", then the PR should be linked to issue 123 on the Python issue tracker. This can be easily done using GitHub webhooks, which let us subscribe to certain events on GitHub.com; when those events are triggered, GitHub sends an HTTP POST to a configured URL. GitHub webhooks are pretty easy to set up and use. However, since I was not very familiar with the Roundup codebase, I was not sure about the correct way to create an endpoint for the webhooks. Thanks to my mentor Maciej, I got a clear picture of the Roundup codebase. At first, the code that I had written was very poor, but I tried refactoring, and I think the current code quality is much better than before, though it can still be improved a lot. I still need to add tests for this. You can find the current work here. After this is done, I also need to show and update the PR status on the issue tracker.
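
A rough sketch of the extraction step such a webhook endpoint might perform; the keywords, pattern and helper name here are hypothetical, not the real code:

```python
import re

# Hypothetical pattern for strings like "Fixes bpo123" in a PR title or comment
BPO_RE = re.compile(r'\b(?:Fixes|Closes)\s+bpo(\d+)\b', re.IGNORECASE)

def linked_issues(title_or_comment):
    """Return the b.p.o issue numbers mentioned in a PR title/comment."""
    return [int(n) for n in BPO_RE.findall(title_or_comment)]

print(linked_issues('Fixes bpo123: repair the widget'))  # [123]
```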

Thanks for reading. I’m very excited for the next 12 weeks. Let me know if you have any questions about my project. :)

## May 21, 2016

### Avishkar Gupta (ScrapingHub)

#### Community Bonding

Hey there, this is the first in the series of posts aimed at documenting my experience as a GSoC 2016 student for regular reporting to the org, and for my own reference to have something to look back on in retrospect. For the first post, let me start with spouting off a little about myself, and then I’ll talk about how the experience has been to date.

I'm (or rather was) a final-year student of Computer Engineering at a college in New Delhi, and I was part of the GSoC program back in 2014, when I worked for the MATE desktop. For the last year or so, I've been dabbling in data science and machine learning, and I came to know of Scrapy when I used it in one such project early on to scrape opinions from an e-commerce site. Fast forward a year: casually browsing through PSF's list of GSoC organizations, I got to know that these guys were also participating and decided to give it a shot. Given how hectic my schedule was back in February, I would be lying if I said that the reason for my selection was me being some sort of big-shot programmer who came in all guns blazing. I approached this as passively as one possibly could; it was thanks to the efforts of the ScrapingHub sub-org admin Paul, who showed interest in my proposal, gave me an interview and put me to work on a bug (which also made me familiar with the actual inner workings of Scrapy), that I was able to gain a footing on the project.

My project for this summer will deal with refactoring the Scrapy signaling API, in an effort to move away from the PyDispatcher library, which should greatly enhance the performance of signals. Django made a similar move away from PyDispatcher and reportedly observed an increase of up to 90% in efficiency. I intend to build off their work and expect we will see similar results in Scrapy.

The Scrapy community are a really active lot, and my "community bonding" started just a couple of days after the announcement of my selection for the project, which is good because I've had exams for the past couple of weeks and was AFK for a major part of them (of course, after informing my mentors about this first). I had a video chat with my mentor Jakob where we figured out how reporting etc. would work for the summer; our next chat is scheduled for the 24th/25th, where we shall discuss the actual implementation of the project and my plans for it. I also finalised the work on a bug I had been working on, which I had submitted in the form of a patch as part of my proposal. The Scrapy community is really responsive, and I'm honored to be a part of it and to be working with all the people here. I hope my work is up to their standards at the end of this. Since there is not much technical content to write about at this point, with the coding period having not yet started, we'll keep this one short; I'll bore you with the technical details in the next one. Thanks for reading, the next post will go up on Sunday the 28th. Signing off.

### tushar-rishav (coala)

#### EuroPython 2016

I am really pumped up for the forthcoming EuroPython conference being held in Bilbao, Spain from July 17-24.

### EuroPython in brief

The EuroPython conference series was initiated by the European Python community in 2002. It started in Charleroi, Belgium, attracting over 200 attendees, and surpassed the 1000-attendee mark in 2014. It's the second-largest Python conference worldwide and the largest in Europe. If you are interested, you should buy tickets while they last. :)

### Purpose of visit

I shall be conducting a session on making a real contribution to an open-source project, aimed at novices, and will also be accompanied by the awesome coalaians attending this conference. We have a sprint scheduled too. :D

I am grateful to the Python Software Foundation and EuroPython for sponsoring the accommodation and ticket for the conference. Without such aid it wouldn't be possible to meet the awesome community out there. I must also express my gratitude to Lasse Schuirmann for encouraging and helping to make this participation actually happen. :)

Stay tuned!

### Aakash Rajpal (italian mars society)

#### 23rd is almost Here

On 23rd May, the official coding period for my GSoC project starts. Hello everyone, my name is Aakash Rajpal. My proposal was selected under the Italian Mars Society, a sub-org under the massive Python Software Foundation (PSF). My project is titled "Italian Mars Society: Implementation of an interactive heads-up display for Oculus Rift for the V-ERAS simulation environment". I received the Oculus Rift DK2 from the IMS just a few days ago and have been trying to set it up ever since, switching between Windows and Ubuntu. Finally, I was able to get it running on Ubuntu. Now I can start coding and begin my GSoC experience.

#### Europython & GSoC start

So this weekend brings two huge things. First of all, as most of you may know, this weekend marks the end of the Community Bonding period, which means the coding session begins on Monday. However, that is not the only thing happening. Two days ago I found out that I'm going to be sponsored by EuroPython with a ticket there!

## What is Europython?

EuroPython was the first major Python programming language community conference ever organized by volunteers. It started in 2002 in Charleroi, Belgium, attracting over 200 attendees.

It is now the largest European Python conference (1200+ participants in both 2014 and 2015), the second-largest Python conference worldwide and a meeting reference for all European programmers, students and companies interested in the Python programming language.

## When?

I have already bought tickets and I’m planning to go there this year. It will be held from 17 to 24 July and it’s probably gonna be one of the most amazing experiences in my life. I’m going there with the amazing coalaians that I’ve been working with on the coala project (see https://github.com/coala-analyzer/coala for more information about what coala is) including (hopefully) my mentor. I’m so hyped about the conferences, the talks, the workshops and especially the coding & drinking sessions I will be having alongside my co-stayers.

### Valera Likhosherstov (Statsmodels)

#### GSoC 2016 #0

Welcome to my blog! Feel free to leave comments or anything. I am a student from Russia; right now I am in my 3rd year at Ural Federal University, studying Computer Science. I am also studying at the Yandex School of Data Analysis, a free Master's-level program by Yandex, the Russian search engine and the biggest IT company in the country.
My interests lie in the areas of Statistics and Machine Learning, and a little in Computer Vision and Natural Language Processing. I really love Mathematics, as well as its applications expressed in code.
My hobbies are listening to hip-hop music, working out in the gym, visiting art galleries and reading novels by Dostoyevsky.
This year I am happy to take part in Google Summer of Code with Statsmodels under the Python Software Foundation. It's a great opportunity, and I hope I will meet all the deadlines successfully :)

## About my GSoC project

The goal is to implement a Python module to do inference for linear state space models with regime switching, i.e. with underlying parameters changing in time according to a Markovian law. A device for that, called the Kim filter, is described in the book "State-Space Models With Regime Switching" by Chang-Jin Kim and Charles R. Nelson.
The Kim filter includes the Kalman filter as a phase, which makes my work much easier and motivates a pure-Python approach, because I can delegate all the heavy lifting to Statsmodels' Cython module that performs the Kalman filter routine.
The next step of my project is to implement well-tested econometric models with regime switching, including Markov-switching autoregression, a dynamic factor model with regime switching, and a time-varying-parameter model with Markov-switching heteroscedasticity. You can find details and exact specifications of the models in my proposal.
To perform testing, I am going to use the Gauss code examples published on Professor Chang-Jin Kim's website.

## Setting up a development environment

To set up the environment, I followed the advice of my mentor Chad Fulton, who helps me a lot with technical, coding, and field-specific issues. This will probably be helpful to anyone who wants to contribute to Statsmodels or any other Python library.
I am using Mac OS, so I performed the following steps:
1. Deleted all versions of Statsmodels in my site-packages directory.
2. Cloned the Statsmodels master repository to ~/projects/statsmodels (or anywhere else in your case).
3. Added ~/projects/statsmodels to my PYTHONPATH environment variable (i.e. included the line export PYTHONPATH=~/projects/statsmodels:$PYTHONPATH at the end of ~/.bash_profile).

Now any changes made to Python files are available when I restart the Python instance or use the reload(module) command in the Python shell. If I pull any changes to Cython files, I recompile them with python setup.py build_ext -i in the statsmodels folder.

## Running Gauss code

The code examples provided by the "State-space Models With Regime Switching" authors require the Gauss language interpreter, which is not free software. But there is the open-source Ox console, which can run Gauss code. However, OxGauss doesn't support some Gauss functions by default, and you have to load analogous Ox language wrappers. In my case that was the function optmum, widely used in the Kim-Nelson code samples; the Ox Console developers provide the M@ximize package for it. Another problem I spent some time figuring out is that M@ximize is incompatible with Ox Console 7, so I have to use the 6th version, which works just fine.

## What's next?

I will post regular reports about my work during the whole summer. I have already implemented a part of my project, so the next report is coming soon. But if you are interested, you can already see the code. Next time I will talk about the design of the Kim filter and the details of its implementation and testing. I'm sure you'll enjoy that!

### mike1808 (ScrapingHub)

#### GSOC 2016: Hi

# Who am I?

Hi. My name is Mikael Manukyan (you can call me Michael or just Mike). I am a student from Armenia. Right now, I am in the last year of Russian-Armenian University, doing my master's degree in Computer Science. I also have almost 3 years of experience in web development using Node.js, and I am the CTO of a local software development company. My fields of interest are coding, Machine Learning, Neural Networks, Computer Vision and Natural Language Processing.
I am a part of a small team of highly motivated students who are passionate about Neural Networks. We are trying to find our place in the enormously fast-growing field of Neural Networks. Our team is called YerevaNN, and our latest work is an implementation of Dynamic Memory Networks. I am happy to take part in this wonderful program which Google organizes each year. The company for which I will work during the summer is Scrapinghub, and the project is Splash.

# Why Splash?

So, let me explain why I chose this particular project. First of all, what is Splash? According to its documentation, Splash is a lightweight, scriptable headless browser with an HTTP API. It is used to:

• properly render web pages that use JavaScript
• interact with them
• get detailed information about requests/responses initiated by a web page
• apply Adblock Plus filters
• take screenshots of the crawled websites as they are seen in a browser

But that isn't the reason why I chose it. The main reason is that Splash consists of:

• Qt - for web page rendering
• Lua - for interaction with web pages
• Python - to glue everything together

The variety of different programming languages and their interrelation is the main reason I thought: "This is a project I will love to work on!". Splash has a feature to interact with web pages using Lua scripts. However, scripting is an experimental feature for now, and it lacks some necessary functions. Making Splash scripts more practical and useful is my main work for the summer.

# What I will do for Splash?

There are three main features/modules that I am going to add to Splash:

1. In Lua scripts, the ability to control the flow of command execution (particularly splash#go) using a new API, splash#wait_for_event
2. In Lua scripts, the ability to interact with DOM elements using a new API, splash#select
3. User plugins support

### splash#wait_for_event

The current implementation of the splash#go method returns control to the main program only when the page which is currently loading emits the loadFinished signal. Signals are a part of Qt, used for communication between various modules. In this particular case, signals are used to notify that a page, e.g., has finished loading or that some error occurred during its load. The current behavior doesn't allow doing anything before the page has fully loaded (e.g. when some resources on the page take a very long time to load). I am going to add a new method which will allow catching various types of signals, and along with it a new parameter for splash#go which will return control back to the Lua interpreter right after its execution, without waiting for the page to load. This will allow controlling the flow not only for splash#go but for all the other methods which depend on signals.

### splash#select

Currently, in Splash scripting there is no convenient way to click on an element, fill an input, scroll to an element, etc. This method will find the specified DOM element and return an instance of an Element class, which will manipulate DOM elements from Lua. Adding new utility functions is the main part of my summer work. I decided that adding utility functions on the splash object (like splash#click) is not as good as adding them to a class (like element = splash:select() and then element:click()).

### User plugins

During my project exploration, I noticed a TODO comment:

TODO: users should be able to expose their own plugins

And I thought: "why not add user plugins support?". It is going to work the following way. If users want to add their own plugins, they specify the --plugins /path/to/plugins argument when starting the Splash server. The plugins folder should contain two subfolders: lua and python, for Lua and Python files respectively.
The lua folder is used to load custom Lua modules. The python folder contains a list of Python classes. These classes should inherit from the SplashScriptingPluginObject class, which allows the user to load Lua modules.

# What did I do during the Community Bonding Period?

As I mentioned at the beginning of the post, I am graduating this year, so during the Community Bonding Period I was working on my final exam preparation and my final project. Hence, I had very little time for GSOC. However, I did quite a lot of work exploring the inner structure of Splash before my GSOC acceptance. During my first PR I understood how the different parts of Splash communicate with each other, how the tests are implemented and how the docs are written. Alongside that, I tried to fix some active issues. Unfortunately, I didn't manage to do it; however, while trying to find the cause of the bugs I dug very deep into the Splash implementation. I am also really thankful to my mentor, who understood my current situation and allowed me to focus on my education.

# What now?

I think this summer will be a productive one and I won't miss any of my deadlines. I wish all GSOC students the same.

### Have fun and code :wink:

## May 20, 2016

### srivatsan_r (MyHDL)

#### Tilde Caution!

Tilde ('~') is a unary operator in many programming languages, used for inverting every bit of an integer. In MyHDL, when you want to invert a single bit of an intbv class object, you can use both '~' and 'not', but 'not' is the safest for inverting a single bit. For example:

a = intbv(10)[10:0] # 'a' now contains 10 represented with 10 bits

Here, a single bit of 'a' can be accessed by using

In [2]: a[2] # To access the 2nd bit, which is 0
Out[2]: False

Now, to invert the bit, using 'not' will give you 'True', whereas using '~' will give you '-1' and not 'True'.

In [3]: not a[2]
Out[3]: True

In [4]: ~a[2]
Out[4]: -1

This happens because a[2] returns 'False', and 'False' is treated as '0'.
Zero is stored as a series of binary 0s in memory, and when we invert it we get a series of binary 1s ('11111…', extending depending on the size of the data type), which corresponds to the binary representation of '-1'. So, using 'not' is the safest approach. However, '~' can be used to invert a single bit with a different approach.

In [5]: a[3:2] # The bit can be accessed like this
Out[5]: intbv(1L)

In [6]: not a[3:2] == ~a[3:2] # Now both give the same result
Out[6]: True

So, if bits are accessed as a[b], use the 'not' operator to invert the bit. But if bits are accessed as a[b+1:b], then both '~' and 'not' can be used to invert them. 'not' cannot be used to invert a series of bits, as it will return only 'False'; only '~' can be used for that.

### liscju (Mercurial)

#### Community Bonding - Part II (and the last one)

This week I have been working on a few issues, as well as preparing the development environment. I prepared a special repository on Bitbucket to publish my commits: https://bitbucket.org/liscju/hg-largefiles-gsoc It should make it easier for mentors/reviewers to see my current work and review it before it is sent to the mercurial-devel mailing list. The second thing I did was to play a bit with Amazon AWS; later it will be used to test my work. Another thing I did was dealing with the verify command for the largefiles extension. So far it sent stat calls to the remote server to get information on whether a given file in the store was valid. I changed it to send stat calls only for files that are not available locally, which reduces the round trips between server and client and also makes using verify without a network connection possible. The patch for this is in the review phase: https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-May/084071.html The next thing I continued to work on was making largefiles compatible with Python 3. This issue deals with removing a cyclic import between localstore and basestore.
So far we (the developers on the mailing list and I) have not found a clear solution for how this should be divided; probably the best we can do now is to move basestore._openstore (which is the cause of the cycle) to a new module. I don't know if this module will have any other functionality besides keeping this method. My previous attempts to resolve this issue can be found here: https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-May/084224.html Another thing I started to work on was the problem of excessive password prompts when cloning a repository with largefiles. The user fills in the password once, and after downloading the "normal" repository, when largefiles wants to download large files from the server, it asks for the password once again. This is because the httppeer object connecting to the server is created separately for the hg core and for the largefiles extension. Each of those objects has its own passwordmanager remembering the password. So far the best solution I have found is to make passwordmanager a singleton object which reuses password information for a given URL and user. I have no idea if this is good enough; so far I have sent a work-in-progress patch to the mailing list: https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-May/084490.html From next week the "proper" part of GSoC begins; I hope it will be great :P

### jbm950 (PyDy)

#### Community Bonding Period

The community bonding period is coming to a close, and so I'd like to write about what I've done and learned during this time. I've had the opportunity to create my first blog, have my first meeting with my mentors, submit a couple of minor pull requests to PyDy and SymPy, add an example script to the PyDy repository, begin learning about spatial vectors, and begin work on some benchmarking code.

This is my first attempt at blogging and I had some trouble initially setting it up. For starters, I did not know what an RSS feed was or what it was used for.
Also, I wanted the blog to be hosted on GitHub Pages, where I currently keep my collection of notes, and thus I decided to try to use the Jekyll static-site backend that GitHub uses. I tried, however, to isolate the blog in its own subfolder with the rest of my GSoC information, but this caused all kinds of problems, with posts showing up multiple times and the RSS feed not updating properly. I eventually decided to stop trying to separate the posts and just centrally locate them, as demonstrated in the Jekyll documentation. For the RSS feed I used a template that I found online. Now the posts appeared properly and the RSS feed updated correctly.

The last thing I wanted to do for my blog before I considered it officially set up was to have some method of allowing people to comment on my posts. I found a blog post online on how to achieve this without the need for anything other than what GitHub Pages offers, and so I set out to try this method. I used the code shown in the blog post without any luck. Prior to this I had zero experience working with JavaScript, so I didn't even know where to begin to debug why the comments were not showing up, and I sent the writer of the blog post an email asking for his assistance. And he replied! He pointed out that I was missing the part where a JavaScript library is loaded for use on the page, and once I added the couple of lines of code, commenting on my blog posts became possible (at least I think that's what the problem was, but again, I have little experience with JavaScript). With the ability to comment added, my blog is completely set up and connected to the correct channels for the Google Summer of Code.

Early in the community bonding period I was able to have my first meeting with my mentors for my project.
During this meeting it was discussed that I could change the later portion of my project from implementing a Newton-Euler method of equations-of-motion generation to implementing the faster Featherstone method. Since I had no great attachment to the Newton-Euler method, I agreed that the faster method would provide a greater benefit for the overall project. Since the meeting I have spent some time reading up on the math involved in the Featherstone method, specifically spatial vectors and their uses in dynamics. To this end I have read "A Beginner's Guide to 6-D Vectors (Part 1)" and started reading both "A Beginner's Guide to 6-D Vectors (Part 2)" and Roy Featherstone's short course on spatial vector algebra.

I have also spent some time beginning to familiarize myself with the code that I will be working with. To begin, I followed Jason Moore's great suggestion of coding through one of the examples from my dynamics course and adding it to the pydy/examples folder in the PyDy repository. The example I chose was a simple pendulum, so that I could focus on the code rather than the complexities of the problem itself. This code and diagram are currently undergoing review in order to be added to the PyDy repository.

Lastly, I have begun work on the benchmarking code which is mentioned as part of my project. In working on this part of the project I learned how to use an SQLite database with Python, to which I had only had brief exposure in the past. This code currently uses Python's timeit library to run a file utilizing Lagrange's method of equations-of-motion generation and another using Kane's method. The code runs each file several thousand times, iterates through this process 30 times, and saves the average of the 30 runs along with several other useful bits of information about the computer and the version of Python used to run the tests.
In addition to the benchmarking code itself, I have been working on a script that allows viewing a graph of the tests using matplotlib and tkinter. This code is close to completion, and the next major addition will be the ability to filter the tests based on which platform and which version of Python was used to run them. This community bonding period has been productive, and I am excited to begin the Google Summer of Code program on Monday.

## May 18, 2016

### tushar-rishav (coala)

#### Run your own race

Most people live - whether physically, intellectually or morally - in a very restricted circle of their potential being. We all have reservoirs of life to draw upon of which we do not dream.

William James

For the past few days, I have been reading The Monk Who Sold His Ferrari by Robin Sharma. The author emphasises the importance of observing the beauty in the most ordinary things and tells a mystical fable that contains the seven virtues for an enriched life. The fable goes like this:

Imagine you are sitting in the middle of a magnificent, lush, green garden. This garden is filled with the most spectacular flowers you have ever seen. The environment is supremely tranquil and silent. Savor the sensual delights of this garden and feel as if you have all the time in the world to enjoy this natural oasis. As you look around, you see that in the center of this magical garden stands a towering red lighthouse, six stories high. Suddenly, the silence of the garden is disturbed by a loud creaking as the door at the base of the lighthouse opens. Out stumbles a nine-foot-tall, nine-hundred-pound Japanese sumo wrestler who casually wanders into the center of the garden. As this sumo wrestler starts to move around the garden, he finds a shiny gold stopwatch which someone left behind many years earlier. He slips it on, and falls to the ground with an enormous thud. The sumo wrestler is rendered unconscious and lies there, silent and still.
Just when you think he has taken his last breath, the wrestler awakens, perhaps stirred by the fragrance of some fresh yellow roses blooming nearby. Energized, the wrestler jumps swiftly to his feet and intuitively looks to his left. He is startled at what he sees. Through the bushes at the very edge of the garden he observes a long winding path covered by millions of sparkling diamonds. Something seems to instruct the wrestler to take the path, and to his credit, he does. This path leads him down the road of everlasting joy and eternal bliss.

The author explains that in the fable the garden is a symbol for the mind, and caring for it is one of the seven timeless virtues for living a gratified life. If we care for our mind, if we nurture it and cultivate it just like a fertile, rich garden, it will blossom far beyond our expectations. But if we let the weeds take root, lasting peace of mind and deep inner harmony will always elude us. Our life is highly influenced by the quality of our thoughts. Thoughts are paramount: little bundles of energy. The author recommends caring for our thoughts as if they were our most prized possessions. Negative thoughts or worries reduce the productivity of that twelve-pound mass sitting between our shoulders, our mind.

To summarize:

• Cultivate the mind with positive thoughts.
• Listen to your conscience and run your own race.

I shall be sharing more as I progress. :)

### Ramana.S (Theano)

#### GSoC: Week 0

A little late post, but better late than never. So, yay! My proposal got accepted for Google Summer of Code 2016 under Theano, a sub-organisation of the Python Software Foundation, under the mentorship of Frédéric Bastien and Pascal Lamblin!! 😄

For those who are unaware of Theano, it is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.
My work for this summer will be focused on improving the traversal of large graphs, serialization of objects, moving computations to the GPU, creating a new optimizer_excluding flag, speeding up the slow optimization phase during compilation, and faster cycle detection in graphs. The entire proposal, with a timeline of deliverables, can be viewed here [1].

As the community bonding period nears its end, I have finally finished my end-semester exams; the last couple of weeks were pretty hectic. I started my work in reverse order with respect to my proposal, with "faster cycle detection in graphs". Work on a new algorithm for detecting cycles in a graph was drafted by Fred last November. I resumed that work and carried it forward until we hit a roadblock: the graphs do not pass the consistency checks with the new algorithm. So I have moved on to the next task, and will come back to this once Fred's schedule eases and he can help me with it more rigorously, as the code complexity is a little beyond my current understanding.

The next task (and my current one) is the optimizations that move computation to the GPU. Stay tuned for more updates!

[1] https://goo.gl/RBBoQl

Cheers. 🍻

### Raffael_T (PyPy)

#### GSoC 2016, let the project begin!

First of all, I am really excited to be a part of this year's Google Summer of Code! From the first moment I heard of this event, I gave it my best to get accepted. I am happy it all worked out :)

## About me

I am a 21-year-old student at the Technical University of Vienna (TU Wien) and currently work on my BSc degree in software and information engineering. I learned about GSoC through a presentation of PyPy explaining the project of a former participant. Since I currently attend compiler construction lectures, I thought this project would greatly increase my knowledge of developing compilers and interpreters.
I was also looking for an interesting project for my bachelor thesis, and those are the things that pretty much led me here.

## My Proposal

The project I am (and will be) working on is PyPy (an alternative implementation of the Python language [2]). You can check out a short description of my work here. Here comes the long but interesting part!

Basically, I work on implementing Python 3.5 features in PyPy. I already started and nearly completed matrix multiplication with the @ operator. It would have been cool to implement the __matmul__ method just like numpy does (a Python package for N-dimensional arrays and matrices, adding matrix multiplication support), but sadly the core of numpy is not yet functional in PyPy. The main advantage of @ is that you can now write:

S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)

instead of:

S = dot((dot(H, beta) - r).T, dot(inv(dot(dot(H, V), H.T)), dot(H, beta) - r))

making code much more readable. [1]

I will continue with the additional unpacking generalizations. The cool thing about this extension is that it allows multiple unpackings in a single function call: calls like function(*args, 3, 4, *args, 5) are now possible. The feature can also be used in variable assignments.

The third part of my proposal is also the main part: asyncio and coroutines with async and await syntax. To keep this short and understandable: coroutines are functions that can finish (or 'return', to be precise) and still remember the state they are in at that moment. When the coroutine is called again at a later moment, it continues where it stopped, including the values of its local variables and the next instruction it has to execute. Asyncio is a Python module that implements those coroutines. Because it is not yet working in PyPy, I will implement this module to make coroutines compatible with PyPy. Python 3.5 also allows those coroutines to be controlled with "async" and "await" syntax; that is also part of my proposal.
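The three 3.5-era features above can all be demonstrated in a few lines of ordinary Python (the toy Mat class is my own illustration; PyPy's task is to support this syntax in its own interpreter):

```python
import asyncio

# The @ operator dispatches to __matmul__ (toy 2x2 matrices):
class Mat:
    def __init__(self, rows):
        self.rows = rows
    def __matmul__(self, other):
        cols = list(zip(*other.rows))
        return Mat([[sum(a * b for a, b in zip(row, col)) for col in cols]
                    for row in self.rows])

m = Mat([[1, 2], [3, 4]]) @ Mat([[0, 1], [1, 0]])

# Additional unpacking generalizations: several * unpackings in one call.
def f(*args):
    return args

combined = f(*[1, 2], 3, 4, *[5])    # (1, 2, 3, 4, 5)

# async/await: the coroutine suspends at the await and later resumes
# with its local state intact.
async def double(x):
    await asyncio.sleep(0)
    return 2 * x

result = asyncio.run(double(21))     # 42
```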
I will explain further details as soon as they become necessary to understand my progress.

## Bonding Period

The bonding period has been a great experience so far. I have to admit it was a bit quiet at first, because I had to finish lots of homework for my studies. But I got to learn a bit about the community and my mentors even before the acceptance date of the projects, so I got a green light to focus on the development of my tasks already, which is great! That is really important for me, because it is not easy to understand the complete structure of PyPy. Luckily there is documentation available (here: http://pypy.readthedocs.io/en/latest/) and my mentors help me quite a bit. My timeline has changed a little, but the change comes with a huge advantage: I will finish my main part earlier than anticipated, allowing me to focus more on the implementation of further features. Until the official start of the coding period I will polish the code of my first task, the matrix multiplication, and read up on the parts of PyPy that I am still a bit uneasy with. My next blog post will tell you about the work of my first official coding weeks; expect good progress!

### Riddhish Bhalodia (dipy)

#### Noise Estimation for LocalPCA

This week, after finishing the basic code for localPCA, my mentors and I have been working on getting a local noise estimate from the data. I will describe the method followed in [1] (which we are following) and then give some initial results and the problems being faced. Before we go on, I am compelled to mention that the featured image of this blog has absolutely nothing to do with the kind of noise we are discussing here 😀. The more you know!

## Method Followed

The noise estimation is divided into two slightly different cases: (i) with multiple b=0 images and (ii) with a single b=0 image.

MUBE (Multiple b=0 image noise variance estimator)

Let us say we have K b=0 images of size [l, w, h].
Then the steps are as follows:

• Construct a matrix X (K x N), where N = l x w x h
• Take the PCA of the matrix
• Find the least significant principal component v (the eigenvector corresponding to the smallest eigenvalue of the covariance matrix XᵀX)
• Reconstruct the image from v (the rationale being that this principal component will predominantly represent the noise)
• Find the local variance of the reconstructed image using 3 x 3 x 3 patches

SIBE (Single b=0 image noise variance estimator)

We follow the same method as for MUBE, the only difference being that we compute the PCA of the gradient-weighted (b ≠ 0) images.

Comments

A few implementation-based comments: I used the GradientTable (gtab) object from DIPY to segregate the data into b=0 and b ≠ 0 images. The code snippet is as follows:

fetch_stanford_hardi()
img, gtab = read_stanford_hardi()
data = img.get_data()
affine = img.get_affine()

# first identify the number of the b0 images
K = gtab.b0s_mask[gtab.b0s_mask].size
if(K > 1):
    # MUBE Noise Estimate
    data0 = data[..., gtab.b0s_mask]
    ...
else:
    # SIBE Noise Estimate
    data0 = data[..., ~gtab.b0s_mask]
    ...

The gtab.b0s_mask gives us a boolean array which is true for b=0 images and false for all others.

## Post-processing

Rescaling

The variance estimate (σ²) is rescaled by the correction factor x(SNR), whose closed-form expression is given in [2]; here SNR = local mean / (local noise standard deviation).

Filtering

Pass the newly corrected noise variance (σ̂², the symbols are getting painful :|) through a low-pass filter (which is empirically decided). For now I am using a Gaussian filter with a kernel size of 5.

## Not so great, but still results!

We have not been able to get satisfactory results from this method yet. We are using the sherbrooke_3shell and stanford_hardi data for now, as they are directly available from DIPY. The Sherbrooke data has only one b=0 image, while the Stanford data has 10 b=0 images.
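For reference, the MUBE steps listed above can be sketched with plain numpy on toy data (this is not the DIPY implementation; the patch loop is deliberately naive, and the sizes are tiny):

```python
import numpy as np

rng = np.random.default_rng(0)
K, l, w, h = 5, 8, 8, 8           # toy sizes, not real data
vols = rng.normal(size=(K, l, w, h))

X = vols.reshape(K, -1)           # K x N with N = l*w*h
Xc = X - X.mean(axis=0)           # center across the K volumes

# Least significant principal component: the right singular vector of the
# centered matrix with the smallest singular value (equivalently, the
# eigenvector of X^T X with the smallest eigenvalue).
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
v = Vt[-1]

# Reconstruct the "noise" image as each volume's projection onto v.
noise = ((Xc @ v)[:, None] * v[None, :]).reshape(K, l, w, h)

# Local variance of the first reconstructed volume over 3x3x3 patches
# (plain loops for clarity; real code would vectorize this).
sigma2 = np.empty((l, w, h))
padded = np.pad(noise[0], 1, mode='edge')
for i in range(l):
    for j in range(w):
        for k in range(h):
            sigma2[i, j, k] = padded[i:i+3, j:j+3, k:k+3].var()
```

Note that the SVD gives v with unit norm, which sidesteps the eigenvector-normalisation question up to sign.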
The code for the noise estimation is given here (it's a temporary measure for testing).

As we can see from these images there are a lot of problems. Everything seems to be working fine up to the rescaling step (I will have to look at it and see if there are any bugs I might have missed). Another problem is that when we take the eigenvectors of the covariance matrix, what should their normalisation be? Any scalar multiple of an eigenvector is also an eigenvector, so trouble! One possible solution could be the rescaling by the correction factor mentioned in the paper, but that is not working so well currently 😦. One thing bothering me is that the SNR we used for rescaling, as defined in the paper, is the ratio of the local mean to the noise standard deviation, instead of the standard notion of SNR as the ratio of intensity to noise standard deviation. This will be explored 😀

## Next Up…

The first task is to look more closely into these noise estimates and solve the problems faced currently. Then we need to put these noise estimates to use and see how the localPCA behaves with them. Even though the code for localPCA is done, I am yet to write a blog post describing the localPCA-based denoising method, so you can expect that in the coming week.

## References

[1] Pierrick Coupé, José Manjón, Montserrat Robles, Louis Collins. Adaptive Multiresolution Non-Local Means Filter for 3D MR Image Denoising. IET Image Processing, Institution of Engineering and Technology, 2011. <hal-00645538>

[2] Cheng Guan Koay, Peter J. Basser. Analytically exact correction scheme for signal extraction from noisy magnitude MR signals. Journal of Magnetic Resonance, 2006.

Until next time!
~ Riddhish

## May 17, 2016

### Karan_Saxena (italian mars society)

#### Starting to apply for Google Summer of Code 2016 !!!

Just a heads up that I'm applying for Google Summer of Code 2016 under the Italian Mars Society, under the Python Software Foundation umbrella.
I hope they select my proposal and allow me to contribute during the summer :)

I will keep you updated.

Onwards and upwards,
Karan

#### Accepted for Google Summer of Code 2016!

Yes!! My project has been accepted :D

Here's to an awesome summer working in Google Summer of Code 2016 with the Italian Mars Society on behalf of the Python Software Foundation, under the supervision of Vito Gentile, Antonio Del Mastro and Ambar Mehrotra.

My project and a short abstract of it:

Official acceptance: Link

You can discover more about IMS and the ERAS project on this link.

Cheers!!

#### Google Summer of Code 2016 Payment Options

Case 1

Bank: SBI
Currency: USD, i.e. conversion done by SBI

From [1]:

Remittance inward: no charge
Foreign currency conversion charge: ₹250
Service tax is calculated as below:
1) 0.145% of the gross amount of currency exchanged, for amounts up to ₹1,00,000, subject to a minimum of ₹35
2) ₹145 plus 0.0725% of the gross amount of currency exchanged exceeding ₹1,00,000, for amounts up to ₹10,00,000

Our transactions will be $505 + $2250 + $2750. Assuming a base rate of 66 (as of today, given by SBI; see [2]), the amounts are 33300, 148500 and 181500. The service tax would therefore be 48.28, 180.16 and 204.08.

Total money deducted on the overall transaction: 250*3 + 48.28 + 180.16 + 204.08 = 1182.52

Note that the banks do not provide live rates. See SBI rates for the day on [2].

Total money received: 363330 - 1183 = 362147
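As a sanity check, the slab rules quoted above fit in a few lines of Python (the base rate of 66 is the assumption stated earlier; small rounding differences against the figures above come from how the rupee amounts were rounded):

```python
def sbi_service_tax(amount_inr):
    """Service tax per the two slabs quoted above (amounts in INR)."""
    if amount_inr <= 100000:
        return max(0.00145 * amount_inr, 35.0)
    return 145 + 0.000725 * (amount_inr - 100000)

rate = 66  # assumed SBI base rate
taxes = [sbi_service_tax(usd * rate) for usd in (505, 2250, 2750)]
```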

All other fees are covered by Google for the 2016 term.

Case 2

Bank: SBI
Currency: INR, i.e. currency converted by Payoneer

Note that Payoneer provides live rate as given on [3].

Payoneer fees

Current rate: 66.60

Amount deducted: 7332.66

Total money received: 363330 - 7333 = 355997

Disclaimer: All these details are true as on 4th May, 2016 and are applicable only for GSoC 2016. Things might change.

### mr-karan (coala)

#### Community Bonding Period

As the purpose of the Community Bonding Period is to discuss your project with your mentor, take feedback, and get familiar with the code base so that you can get started on your project ASAP from day 1, I have written a small overview of what I'll be doing in my GSoC work period. I have listed a brief description of all the tasks, as it will help me and my mentor plan things accordingly.

## Phase 1

I'll be starting my work in the coala-bears repo. `coala-bears --create` will be a command-line tool that asks the user certain questions about the bear they are going to create, such as:

• Is it an autocorrect bear or a regex-based one?
• The bear's name
• The language it will lint

Based on the values entered, a sample bear file will be created: the bare minimum required for a bear to run is set up as boilerplate, along with test files, which the user can then modify to suit the linter executable. It will be packaged as a Python module and uploaded to PyPI.
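A rough sketch of the kind of scaffold such a tool could emit; the template below uses coala's @linter decorator, but the prompt flow, names, and generated file are purely illustrative, not the planned implementation:

```python
# Illustrative scaffold generator for a hypothetical `coala-bears --create`.
TEMPLATE = '''\
from coalib.bearlib.abstractions.Linter import linter


@linter(executable={executable!r},
        output_format='regex',
        output_regex={regex!r})
class {name}:
    """Checks {language} files with {executable}."""
'''

def render_bear(name, language, executable, regex=r'(?P<message>.*)'):
    """Fill the boilerplate with the answers a user would give."""
    return TEMPLATE.format(name=name, language=language,
                           executable=executable, regex=regex)

boilerplate = render_bear('HypotheticalLintBear', 'Python', 'mylint')
```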

## Phase 2

Will be to work on Lint, Output and Results.

• Navigation of Results: This change will be done in the Console Interaction class, where currently the user can only move forward through the errors reported by a bear. It would be nice to have a "go back to the previous result" option as well.
• Embedded Source Code Checking: The code can be divided into different sections and then the appropriate bear can run on each section. The aim is that if a file mixes languages, as PHP files embed HTML code, the linter for PHP should only run on the sections which are PHP.
• Multiline Regex: Some linters emit multiline error messages, but if the regex only parses a single line, the remaining part is lost. I will add a variable to enable multiline matching so it can be used in such cases.
• Multiple Lint Bear: The idea is that if I have a project with files in multiple programming languages, the bears specified by the user in a comma-separated list should only lint the files they are meant to, and not other files.
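The multiline-regex point can be illustrated with re.DOTALL and a lookahead; the linter output and pattern below are my own toy example, not an actual coala bear's regex:

```python
import re

# Toy linter output where the first message spans two lines.
output = """foo.c:3: error: something went wrong
    note continues here
foo.c:7: warning: minor issue"""

# A single-line pattern like r'^(\S+):(\d+): (.*)$' would drop the indented
# continuation. With DOTALL and a lookahead for the start of the next
# message, the continuation stays attached:
pattern = re.compile(
    r'(?P<file>\S+?):(?P<line>\d+): (?P<message>.*?)'
    r'(?=\n\S+?:\d+: |\Z)',
    re.DOTALL)

msgs = [m.group('message') for m in pattern.finditer(output)]
```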

## Phase 3

Will be to modify the UI of coala-bears.

In this phase, I plan to implement features such as autocompletion, syntax highlighting and a status bar for the CLI. I'm going to use an excellent library, Python Prompt Toolkit, to achieve this.

• For autocompletion, I'm going to make sure all bear names are included. Fuzzy string matching can also be implemented.
• For the status bar, I'm planning to show details such as the filename; the bottom toolbar can carry specific messages from the linter. I am still working out what more can be added to these sections.
• For syntax highlighting, I'm going to use Pygments lexers. Languages will be mapped to different Pygments lexers, and for a particular file extension the appropriate lexer will run.
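For the fuzzy matching mentioned above, the standard library's difflib already gets most of the way there; a completer could fall back to it when prefix matching finds nothing (the bear names below are just samples):

```python
import difflib

BEAR_NAMES = ['PEP8Bear', 'PyLintBear', 'GoLintBear', 'HTMLLintBear']

def complete(prefix):
    """Prefix completion with a difflib-based fuzzy fallback."""
    exact = [n for n in BEAR_NAMES if n.lower().startswith(prefix.lower())]
    return exact or difflib.get_close_matches(prefix, BEAR_NAMES,
                                              n=3, cutoff=0.4)
```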

## Phase 4

This phase is an optional one, depending on whether I complete the remaining phases as scheduled and have time remaining in my GSoC period. In this phase, I will work on a website for coala-bears where users can see all the bears listed in one place, categorized by language. Information about every bear will be extracted from its docstrings. A neat table will also be present for every bear, holding statistics, more info about the linter, the author of the bear, and an asciinema URL.

I hope to discuss these points with my mentor and based on that I will be starting my coding on or before May 23.

Stay tuned for the next updates from my side :)

### Preetwinder (ScrapingHub)

#### Google Summer of Code

Hello, my name is Preetwinder. I am a 2nd-year IT student in India. I usually program in Python, but with some practice can find my way around in Java, C++ and Haskell. I have a great deal of interest in information retrieval, programming languages and distributed systems. I have been selected to work on Python 3 support for frontera under the Google Summer of Code program, in which Scrapinghub is participating as a mentoring organization. My mentors will be Paul Tremberth (main mentor), Alexander Sibiryakov and Mikhail Korobov. I am very excited to be a part of this program and the frontera community, and I hope to make a useful contribution to frontera.
Frontera is a distributed web crawling framework which, when coupled with a fetcher (such as Scrapy), allows us to store and prioritize the URL ordering in a scalable manner. You can read about frontera in greater detail here.

The past few weeks have been the community bonding phase of the program, during which the candidates are supposed to get familiar with their mentors and the codebase of their organizations. During this time I have prepared a better timeline, discussed the changes to be made with my mentors, and improved my understanding of the workings of frontera. I have split my task into two phases: in the first phase I will focus on improving tests and bringing Python 3 support to the single-process mode; in the second phase (post mid-term evaluation) I will focus on improving tests and extending Python 3 support to the distributed mode. The major challenges in this project will be testing some components which are a bit tricky to test, and getting unicode/bytes handling to work correctly.
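Getting unicode/bytes right in a 2/3 port usually comes down to small normalization helpers like these (a generic sketch of the pattern, not frontera's actual code):

```python
def to_bytes(value, encoding='utf-8'):
    """Return `value` as bytes, encoding text if needed."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)

def to_unicode(value, encoding='utf-8'):
    """Return `value` as text, decoding bytes if needed."""
    if isinstance(value, str):
        return value
    return value.decode(encoding)

# URLs arriving from either side of the boundary normalize to one type:
url = to_bytes('http://example.com/page')
```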

I hope to successfully port frontera, and have a productive summer.

Google Summer of Code was originally published by preetwinder at preetwinder on May 17, 2016.

### udiboy1209 (kivy)

#### SegFaults At 3 In The Morning

Segmentation faults are nasty suckers. Any moderately experienced C/C++ programmer would know that. But well, I didn't. I had started coding in Java, which has an exception-throwing tendency that is the inverse of Perl's :P! And it throws very clear and precise exceptions, with a stack trace to point out exactly where you messed up. After Java I moved to Python, which is no different in how it behaves with run-time errors. Modern programmers love exception handling and stack traces, it seems! Most of the contributions I have made so far for Kivy have involved only Python code, but Kivy relies heavily on Cython.

## Cython - The backbone of computational python!

Cython is a C extension for Python, which means it compiles Python code to C to optimize it and improve speed and computational power. Cython also lets you write C-like code the "Python way" and integrate it with existing Python code. Naturally Kivy relies heavily on it – any native UI application requires speed. Something like KivEnt would have been out of the question if Cython didn't exist; you just cannot get the speed you need for a game engine with pure Python. A lot of scientific computation libraries like numpy and sympy use it too.

These past two weeks I have been working on a new animation system for KivEnt, which let me get my hands dirty in Cython for the first time. I really liked the experience, and I’ll also need it for most of my GSoC! I have always found python insufficient or lacking when I’m using it for computationally intensive tasks. Cython takes all those worries away!

## KivEnt’s new AnimationSystem

KivEnt essentially works using modular, interdependent systems. These systems define how to process specific types of data components for each entity. Take the PositionSystem and the Renderer (the system which draws objects on the screen): the PositionSystem will update a PositionComponent in the entity's data to change the entity's physical position, and the Renderer will use the values stored inside this PositionComponent to draw that entity on the screen. That's a very basic example; now try to think how a VelocitySystem would interact with the PositionSystem and use a VelocityComponent to modify the PositionComponent, and you'll get the general idea of how KivEnt systems work :P.

The AnimationSystem is something which can render animations. Animations are extremely simple. You display an image for some duration, and then switch it with another image. Each image displayed for a certain duration is called a frame. So for making an animation, you need to specify a bunch of frames which is essentially a list of {image, duration} values.

Rendering images is handled by the Renderer, so all the AnimationSystem has to do is wait for a frame to complete its duration, switch the rendered image for the entity, and then wait for the next frame. Its job is so simple it could have been coded directly in Python. In fact, here is an example displaying just that: Simple Animation

But we need a faster and more powerful alternative, something which can handle thousands of entities in each update. Cython! Plus, we also need an AnimationManager to load and handle all animations, auto-load animations from files, etc. So it needed to be done in Cython!
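In pure Python, the wait-then-swap logic described above is only a few lines (the real KivEnt component is written in Cython; the class and names below are my own illustration):

```python
class AnimationComponent:
    """Sketch of per-entity animation state: a frame list plus a clock."""

    def __init__(self, frames):
        self.frames = frames      # list of (image_name, duration) pairs
        self.index = 0            # current frame
        self.elapsed = 0.0        # time spent on the current frame

    def update(self, dt):
        """Advance by dt seconds; return the new image name on a switch."""
        self.elapsed += dt
        _, duration = self.frames[self.index]
        if self.elapsed >= duration:
            self.elapsed -= duration
            self.index = (self.index + 1) % len(self.frames)   # loop
            return self.frames[self.index][0]
        return None

anim = AnimationComponent([('star1', 0.1), ('star2', 0.1)])
first = anim.update(0.05)    # still on star1: no switch yet
second = anim.update(0.06)   # 0.11s elapsed: switches to star2
```

Each game tick calls update(dt); a switch tells the Renderer which image to draw next.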

## The mother of all SegFaults: Bad Pointers

Wait why do we need to deal with pointers in python? Is there even such a thing?

There is in Cython. I mentioned before that Cython is a C-Extension. That means you have to pre-declare all your variables. With type. Which in turn leads to having to declare whether your variable is some type or a pointer to that type. You don’t have Python here to automatically assume (or work around) them for you during assignment.

The segfault I was encountering was because one of the functions was getting a null value instead of a pointer parameter. I had found this out by adding print statements to every function and checking where my program got stuck. This is a pretty stupid thing to do with segfaults. I wasted one whole day looking in the function which was apparently throwing the segfault, never realizing that the problem was in some other function passing the wrong parameter.

Well, I relayed this to my mentors and they suggested using this awesome tool for debugging: GNU Debugger. It can do a lot of uber-cool ninja-level stuff which I still have to learn but the one thing that it surely does is give me a stack trace of the error which led to the segfault. But again, gdb stack traces for Cythonized C code are nasty as hell.

Here’s an example:

#0  0x000000000000000a in ?? ()
#1  0x00007fffe7ea8630 in __pyx_f_11kivent_core_15memory_handlers_5block_11MemoryBlock_allocate_memory_with_buffer (__pyx_v_self=0x7fffd9fdccd0, __pyx_v_master_buffer=0x8fb4d0 <_Py_NoneStruct>)
at kivent_core/memory_handlers/block.c:1162
#2  0x00007fffe22d0e61 in __pyx_pf_11kivent_core_9rendering_9animation_9FrameList___cinit__ (__pyx_v_name=<optimized out>, __pyx_v_model_manager=<optimized out>, __pyx_v_frame_buffer=<optimized out>,
__pyx_v_frame_count=<optimized out>, __pyx_v_self=0x7fffe0fa5e60) at ./kivent_core/rendering/animation.c:2022
#3  __pyx_pw_11kivent_core_9rendering_9animation_9FrameList_1__cinit__ (__pyx_kwds=<optimized out>, __pyx_args=<optimized out>, __pyx_v_self=0x7fffe0fa5e60) at ./kivent_core/rendering/animation.c:1888
#4  __pyx_tp_new_11kivent_core_9rendering_9animation_FrameList (t=<optimized out>, a=<optimized out>, k=<optimized out>) at ./kivent_core/rendering/animation.c:2900
#5  0x00000000004b6db3 in ?? ()
#6  0x00007fffe24d9280 in __Pyx_PyObject_Call (kw=0x0, arg=0x7fffe544e1b0, func=0x7fffe24d5900 <__pyx_type_11kivent_core_9rendering_9animation_FrameList>) at ./kivent_core/managers/animation_manager.c:2124
#7  __pyx_pf_11kivent_core_8managers_17animation_manager_16AnimationManager_4load_animation (__pyx_v_loop=<optimized out>, __pyx_v_frames=<optimized out>, __pyx_v_frame_count=<optimized out>, __pyx_v_name=<optimized out>,
__pyx_v_self=0x7fffe56d3410) at ./kivent_core/managers/animation_manager.c:1427
#8  __pyx_pw_11kivent_core_8managers_17animation_manager_16AnimationManager_5load_animation (__pyx_v_self=0x7fffe56d3410, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>)
at ./kivent_core/managers/animation_manager.c:1387
#9  0x00000000004c4d5a in PyEval_EvalFrameEx ()
#10 0x00000000004ca39f in PyEval_EvalFrameEx ()
#11 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#12 0x00000000004ded4e in ?? ()
#13 0x00007fffe28ef5a9 in __Pyx_PyObject_Call (kw=0x0, arg=0x7fffe0fac9d0, func=0x7fffe54510c8) at ./kivent_core/gameworld.c:13802
#14 __Pyx__PyObject_CallOneArg (func=func@entry=0x7fffe54510c8, arg=arg@entry=0x7fffe80ca670) at ./kivent_core/gameworld.c:13839
#15 0x00007fffe2901f2d in __Pyx_PyObject_CallOneArg (arg=0x7fffe80ca670, func=0x7fffe54510c8) at ./kivent_core/gameworld.c:13853
#16 __pyx_pf_11kivent_core_9gameworld_9GameWorld_6init_gameworld (__pyx_self=<optimized out>, __pyx_v_callback=<optimized out>, __pyx_v_list_of_systems=<optimized out>, __pyx_v_self=<optimized out>)
at ./kivent_core/gameworld.c:5916
#17 __pyx_pw_11kivent_core_9gameworld_9GameWorld_7init_gameworld (__pyx_self=<optimized out>, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at ./kivent_core/gameworld.c:5630
#18 0x00000000004b1153 in PyObject_Call ()

Google Summer of Code is a really great platform for students to learn, because everybody is assigned one or more mentors to help them out. I do too. So why debug yourself :P. Just kidding! I had no clue how to interpret this because I was kinda new to Cython. Also, it was 3 in the morning at that point and I was just too sleepy to look at any more of this stack trace! My mentor told me to show him the stack trace, and he helped me find the culprit. It was this:

__pyx_v_master_buffer=0x8fb4d0 <_Py_NoneStruct>

The parameter master_buffer is being passed a None value! It was an easy debug after this. I wish I knew about this earlier. But quoting Kovak, one of my mentors:

Some of the most valuable experience is knowing what not to do.

After this I encountered another segfault, and debugging that was a breeze. I had made a pointer assignment inside an if and used it somewhere outside.

## Twinkling stars!

So twinkling stars is an example I was trying to debug my new code with. It loads 50 animations with three frames, each having three successive images of a twinkling star animation. The difference between each of the 50 is the duration of each frame, which is randomly assigned. I thought it would look beautiful.

The results are pretty great:

This was a pretty satisfying result for me :D! I still have to add and test a few features before I can do a performance test, but this has 3000 stars with 1 of 50 different animations, and it runs pretty smooth on my machine!

## May 16, 2016

### shrox (Tryton)

#### Simplified Proposal

In this post, I will explain my GSoC proposal for absolute noobs who have no idea of anything software or open source.

Just as Microsoft Word has docx as its default format for saving documents, the default container for Open Document files, such as those used by LibreOffice or OpenOffice, is odt (which stands for Open Document Text).

The way ODT works is that it stores a number of files in a compressed form, specifically in a zip container. These files are nothing but XML files. XML files, as you must have often heard in the context of the web, are simply files that contain data, sorted with the help of tags. This data could be anything, and so could the tags.

The following is an example of simple XML from w3schools

<note>
<to>Tove</to>
<from>Jani</from>
<body>Don’t forget me this weekend!</body>
</note>

As can be seen from the above XML, an XML file is only human readable text, inside human readable tags.

This makes XML files amazing for version control. Okay, here's another software management term I feel I should explain. Version control is keeping track of versions of your code base so that you can revert to any version later. What's important is that all the versions should be human readable so that they can be compared with each other.

Coming back to ODT, we have seen that an odt file is simply a lot of XML files in a zip container. In simpler words, an odt is a zip file. Yes. You can extract it using any zip-unzip utility. Go on, give it a try. You might need to rename it to .zip if you’re using Windows, though (go Linux!).
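Python's standard zipfile module makes the same point. Here a toy two-member "odt" is built in memory and read back; real ODT files contain more members, such as styles.xml and META-INF/manifest.xml:

```python
import io
import zipfile

# Build a toy "odt" in memory: a zip holding two XML members.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr('content.xml', '<office:text>Hello</office:text>')
    z.writestr('meta.xml', '<office:meta/>')

# An odt really is just a zip: list and read its members directly.
with zipfile.ZipFile(buf) as z:
    names = z.namelist()
    content = z.read('content.xml').decode('utf-8')
```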

All in all, ODT files are zip files that contain a number of human readable XML files that are good for version control. But herein lies the problem - ODT files are not good at all for version control, since zip files need to be extracted and are not human readable!

But what if, what if we could take all those XML files and put the content of all those files into a single XML file? Why compress the files at all, right? That is, after all, where the problem lies.

And that is exactly what my primary work will be to do in this project. I will take the content from all the many XML files, and basically dump them into a single XML file and call it fodt (just as zip files were called odt).

FODT, or flat ODT, files are simply XML files that open just the same in a word processor like LibreOffice. They are not frequently used, as they are bigger than ODT files since they are not compressed; but that makes them very useful if code in a document needs to be compared. Of course, document files are not meant to store code, and hence FODT files may only have a niche use.

Have a look at the title and abstract of my proposal on the GSoC website here.

### aleks_ (Statsmodels)

#### Hello GSoC!

I am very happy that the statsmodels community has accepted my proposal and I am proud to be part of this year's GSoC.

Now the last week of the bonding period (which is meant to get in touch with the community/mentors) is approaching. This also means one week left for final preparations before the coding starts. My feeling regarding the upcoming challenge is good - also because of my helpful mentors. They have given me advice related to the best setup of the development environment and have supplied me with useful papers. Thank you Kevin, thank you Josef!
For all of you curious about the goals of my time-series-related project, take a look at its description.

With that, thanks for reading!

### ghoshbishakh (dipy)

#### Google Summer of Code with Dipy

I know I am a bit late with this blogpost, but as you probably guessed from the title I made it into Google Summer of Code 2016!!

Throughout this summer I will be working with DIPY under the Python Software Foundation.

### So how did I make it?

To be frank, although I had dreamt of getting into GSoC since 10th standard, I had never tried wholeheartedly before, partly because I did not know how and where to start. But this time I was determined and more familiar with different open source projects, and I started getting involved with the community early. After trying many organisations I finally found one where I could contribute something, be it tiny code cleanups or small enhancements. And trust me, it feels just amazing when your first patch (pull request) gets merged into the master branch! Then I selected a project in this organisation and prepared an application, and through the whole process my mentors helped me a lot with their valuable suggestions. And after that, here I am! :)

### Project Overview

The aim of my project is to develop a new website for Dipy from scratch, with a custom content management system and admin functionality for maintenance. Another key feature will be continuous generation of documentation from the dipy repository, linked into the website: whenever a new version of dipy is released, the website will automatically be updated with the new documentation. Other features include a visualization of web analytics and GitHub data, to showcase the fact that the dipy project is spreading worldwide, and a tool to generate documentation for the command line utilities. The backend will be built using Django, along with some other Python libraries like markdown and python-social-auth; for visualization I plan to use the D3.js library. For me the most challenging and interesting part of the project will be the continuous generation of documentation. There are many ways this can be achieved. For now we have thought of a process in which every commit or release triggers a build server, which builds the documentation using Sphinx and uploads it to the website. The documentation of the command line utilities will also have to be generated in this process, and that is a challenge of its own.

### Community Bonding Period

This part of Google Summer of Code (April 23, 2016 - May 22, 2016) is called the Community Bonding Period, and I am discussing and refining ideas with my mentors. We have weekly meetings and frequent communication through email and Gitter. I have also set up my development environment and am getting ready to start work. Although I have developed several small projects using Django for my college and clubs, I have never tried anything of this scale, so I am learning about the different challenges of deployment, security and scalability. I am trying to get familiar with the best practices and design patterns of Django and learning how to test my code.

Hope to have an amazing summer! :)

## May 15, 2016

### Utkarsh (pgmpy)

#### GSoC 2016 with pgmpy

This all started around a year back, when I got introduced to the open source world (Free and Open Source, Free as in speech).

## May 14, 2016

### fiona (MDAnalysis)

#### Hello World

Welcome to my first blog post! I’ve set this blog up to follow my experiences as I participate in Google Summer of Code 2016, but hopefully I can make it generally interesting to anyone who happens to stumble their way here.

I’m participating in GSoC under the Python Software Foundation, working with MDAnalysis. A huge thanks to all three for giving me this opportunity! MDAnalysis is a Python library for analysing molecular dynamics (MD) simulations. I’ll be working on introducing capabilities to deal with the particular set of MD simulations associated with a method known as Umbrella Sampling (US). If, like my feline friend here, none of that makes any sense to you, fear not! I’ll explain more about the background and details of MD, US and my project in an impending future post. For now, here’s a bit more about myself:

I’m currently doing a PhD in Biochemistry at the University of Oxford. My work involves the use of MD simulations, including the US simulations that are the focus of my GSoC project, to study binding of peripheral proteins to cell membranes. (Shameless self-promotion: you can check out my first paper here!)

I did my undergraduate degree, majoring in biochemistry and physics, at the University of Sydney (in-between wrestling crocodiles and drop bears, of course). I grew up in Australia, though I’m originally a (proud) Kiwi.

In my free time I enjoy reading, baking, knitting and other crafts. I already have some ideas for some GSoC project-themed creations, so stay tuned for those!

Well, that’s all from me for now. I’ll be back - with that promised project explanation - sometime soon!

## Why do we need denoising?

Simply put, the images acquired from dMRI scans are highly sensitive to the magnetic field and the acquisition time; since acquisition time is kept short for patient comfort, dMRI images carry a high amount of noise. Due to the acquisition physics this noise usually follows a Rician distribution, which introduces a positive bias to the measurements. Since diffusion imaging is used for precise quantitative comparison of the shapes and orientations of tensors, removing this noise and bias is essential.
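The positive bias is easy to demonstrate numerically: the magnitude of a complex Gaussian-corrupted signal follows a Rician distribution, and its mean sits above the true signal. A small illustrative simulation (not DIPY code, with arbitrary signal and noise levels):

```python
import numpy as np

rng = np.random.default_rng(0)
true_signal = 100.0   # assumed noise-free MR signal level
sigma = 20.0          # assumed Gaussian noise level per channel
n = 200_000

# MR magnitude images: |signal + complex Gaussian noise| is Rician-distributed
noisy = np.abs(true_signal + rng.normal(0, sigma, n)
               + 1j * rng.normal(0, sigma, n))

bias = noisy.mean() - true_signal
print(f"estimated bias: {bias:.2f}")  # positive: the noise inflates the measurement
```

At high SNR the bias is roughly sigma squared over twice the signal, which is why quantitative comparisons of tensor shapes need it removed.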

## Where we currently stand!

As I mentioned in my previous blog post, DIPY has a denoising module that follows a voxelwise non-local means denoising approach, incorporated from the method described in [1]. This oversmooths the images, and the edges are not well preserved.

## What’s Needed?

Well, I'd better list it down:

1. Blockwise averaging functionality in the existing nlmeans [2]
2. Adaptive denoising methods like adaptive soft coefficient matching using the blockwise averaging [3]
3. Local PCA based denoising, which, unlike the other methods, also takes into account the directional information provided by the diffusion data [4]

## Adaptive Soft Coefficient Matching (ASCM)

This builds on the blockwise approach of [2], which is then used for ASCM in [3]. I want to touch upon this specifically because its results are really impressive, and it is a simple but elegant algorithm.

The basic idea is “MIX AND MATCH”: we have a noisy image In, and we perform blockwise averaging on it with two different parameter sets: one with a larger patch size, which oversmooths the image into Io, and one with a smaller patch size, which undersmooths the image into Iu. Io will be very smooth with blurred-out edges, while Iu will retain some noise but have sharper edges and features.

What we aim for in smoothing is uniform denoising over all frequency components. Hence we take bits of both Io and Iu: we take the wavelet transform of both images and select the lower subbands (for the reconstructed image) from Io. Intuitively, the low frequency components in Io are efficiently denoised, whereas its high frequency components have some noise left; analogously for Iu, the high frequency components are well denoised, but the features in the low frequency components are spoiled by oversmoothing, and hence we choose the higher subbands from Iu.
Then we do a weighted combination to mix the coefficients and take the inverse wavelet transform. The whole process is shown in the figure below.
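The mix-and-match step can be sketched with a single-level Haar transform. This is a toy illustration, not the DIPY/ASCM code: the real algorithm uses a data-dependent weighted combination of coefficients, while this sketch does a hard selection (approximation subband from Io, detail subbands from Iu).

```python
import numpy as np

def haar2(img):
    # One-level 2-D Haar transform: average (low) and difference (high) subbands
    lo = (img[0::2, :] + img[1::2, :]) / 2.0
    hi = (img[0::2, :] - img[1::2, :]) / 2.0
    ll = (lo[:, 0::2] + lo[:, 1::2]) / 2.0   # low-frequency approximation
    lh = (lo[:, 0::2] - lo[:, 1::2]) / 2.0
    hl = (hi[:, 0::2] + hi[:, 1::2]) / 2.0
    hh = (hi[:, 0::2] - hi[:, 1::2]) / 2.0   # high-frequency details
    return ll, (lh, hl, hh)

def ihaar2(ll, details):
    # Exact inverse of haar2 (requires even image dimensions)
    lh, hl, hh = details
    lo = np.empty((ll.shape[0], 2 * ll.shape[1]))
    lo[:, 0::2], lo[:, 1::2] = ll + lh, ll - lh
    hi = np.empty_like(lo)
    hi[:, 0::2], hi[:, 1::2] = hl + hh, hl - hh
    out = np.empty((2 * lo.shape[0], lo.shape[1]))
    out[0::2, :], out[1::2, :] = lo + hi, lo - hi
    return out

def ascm_mix(i_over, i_under):
    # Low subband from the oversmoothed image Io,
    # high subbands from the undersmoothed image Iu
    ll_over, _ = haar2(i_over)
    _, details_under = haar2(i_under)
    return ihaar2(ll_over, details_under)
```

In the actual method the two detail sets are blended with weights rather than selected outright, which is where the "adaptive soft coefficient matching" name comes from.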

## Progress .. so far!

I am working on two things right now.

Blockbased Averaging and ASCM

Added a keyword, and modified the code accordingly, which lets us choose between the current voxelwise implementation and the blockwise implementation of non-local means denoising in the nlmeans function. The blockwise approach is taken from Omar’s implementation of the same here. The adaptive soft coefficient matching (ASCM) code is also in the /dipy/denoise/ folder, with the necessary wavelet function in the core functionality of dipy. Things remaining to be done here:

• Optimization of the Cython code that does the blockwise averaging
• Tests and documentation

The current branch to this can be found here

Local PCA Based Denoising

I just started this, and will put more theory about the method and its implementation in the next blog post. In overview:

• 4D dMRI data handling: done
• Local PCA basic Python-based framework: done
• Averaging the denoised voxels in a patchwise, overcomplete manner: done
• Currently working on the Rician bias correction code
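To give a flavour of the local PCA idea, here is a hypothetical sketch of denoising one local patch, not the DIPY implementation, and without the Rician bias correction step: the local data matrix is decomposed with PCA and the noise-dominated components (small eigenvalues below an assumed threshold tau) are discarded.

```python
import numpy as np

def denoise_patch_pca(patch, tau):
    """Suppress noise-dominated principal components of one local patch.

    patch : (n_voxels, n_gradients) matrix of dMRI signals in a local window
    tau   : eigenvalue threshold; components below it are treated as noise
    """
    mean = patch.mean(axis=0)
    centered = patch - mean
    # Local covariance across diffusion gradient directions: this is what lets
    # the method exploit the directional structure of the diffusion data
    cov = centered.T @ centered / patch.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    keep = eigvecs[:, eigvals > tau]        # keep signal-dominated components
    denoised = centered @ keep @ keep.T     # project onto the kept subspace
    return denoised + mean
```

The overcomplete part of the real method comes from running this on overlapping patches and averaging each voxel's denoised values.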

The current branch for this can be found here

Titbits

• Bug report filed for the current voxelwise implementation of nlmeans, and a fix provided for the same. PR
• The Cython build path needs to be specified to check a specific file, in the following manner (you need to include the src folder):

cython -I src/ -a dipy/denoise/denspeed.pyx

## References

[1] Descoteaux, M., Wiest-Daesslé, N., Prima, S., Barillot, C., Deriche, R.
Impact of Rician Adapted Non-Local Means Filtering on HARDI.
MICCAI 2008.

[2] Coupé, P., Yger, P., Prima, S., Hellier, P., Kervrann, C., Barillot, C.
An optimized blockwise nonlocal means denoising filter for 3-D magnetic resonance images.
IEEE Transactions on Medical Imaging. 2008;27(4):425-441. doi:10.1109/TMI.2007.906087.

[3] Coupé, P., Manjón, J., Robles, M., Collins, L.
Adaptive Multiresolution Non-Local Means Filter for 3D MR Image Denoising.
IET Image Processing, Institution of Engineering and Technology, 2011. <hal-00645538>

[4] Manjón, J.V., Coupé, P., Concha, L., Buades, A., Collins, D.L., et al.
Diffusion Weighted Image Denoising Using Overcomplete Local PCA.
PLoS ONE 8(9): e73021 (2013).

## May 10, 2016

#### Community bonding

So in this part of the project, called “community bonding”, I have to prepare things for my project. What better way of doing this than talking to my mentor? Probably none. Today I had my first project-related online conversation on Skype, and it was amazing. I got to talk to the person who is going to help me finish my project throughout this summer.

### What do I have to do?

I have to get my stuff done in this pre-coding part. And what does it involve? Well, as written in my proposal, I will have to settle two things:

• how to make the installation as great as possible, so that the user experience is amazing
• how to make package management as convenient as it can be

### How will I do it?

This is where I go into what was discussed today. The answer to the first one is quite debatable. We both had many ideas, but after brainstorming a little we ended up with an amazing solution: an old-school installation wizard experience.

The installation script will have at first 3 options:

1. Install all the bears
2. Install the recommended bears (which will probably include the general ones and a few for, let’s say, Python, C maybe? this is an easy aspect and not so important)
3. Custom installation

So basically the custom installation will show a numbered list of all the bears in the terminal, and the user will input the numbers of the bears they want. This will probably be the one that takes the most time, but I still think it’s cool, giving the user the liberty of having anything they want.
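The number-picking step could be sketched like this (a hypothetical helper, not actual coala code; in the real wizard the raw string would come from input() after printing the numbered list):

```python
def parse_bear_selection(raw, bears):
    """Map user input like '1 3' to the chosen bears (1-based; junk and
    out-of-range numbers are silently ignored)."""
    chosen = sorted({int(tok) for tok in raw.split() if tok.isdigit()})
    return [bears[i - 1] for i in chosen if 1 <= i <= len(bears)]
```

Using a set makes duplicate entries harmless, and filtering by range keeps a stray "99" from crashing the wizard.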

For the second question, no optimal solution has been reached yet. But we still have a lot of time, right? 13 more days. However, we didn’t want to call it a day without at least one solution. And there it is:

Package management will be done by simply updating the requirements for each bear whenever its requirements change. Sure, in the worst case some bears may use many requirements, but that is rare. Well, it is a solution after all, and while it may not be the most efficient, there’s still time, and it works, which may be enough for what we need in this project.

## Acceptance

First of all, I want to follow my first ever blog post, with good news. As you probably guessed from the title I got accepted for the Google summer of code program!!! Yoohoo!!! I will be working with coala under the Python Software Foundation.

### Community

While the coding hasn't started yet, GSoC is already under way with the first phase, called community bonding. Basically you have to get to know the community and the project's codebase better. This proved to be quite an interesting experience for me, since I have to schedule calls with my mentor, who is from India, and the time gap is something I am not really used to.

Since I also wrote a proposal for OWASP and had the chance to interact with another community, I think it is safe to say at this point that coala has one of the friendliest and best-structured communities. Gitter channels, Google groups, mailing lists, all set up just to make sure everyone stays in touch. And not everything is about coala: for example, a couple of days ago some of us met online to do a bit of bonding via gaming (very good choice, btw). I have to admit that I could not enjoy the game to its fullest because of the latency, but it was still fun. I hope we do more of these gaming/anything meetups.

### Getting ready to work

Yes, it is the community bonding period and indeed you are not supposed to code pretty much anything now, but there are a couple of things that have to be done. In my proposal I left some issues to be discussed later on. My mentor Udayan suggested that this would be a good time to talk about these and "list down the finer details".

### Europython

One of the best parts of being a GSoC student with coala is that we will meet at EuroPython this summer. I really look forward to this trip, since I haven't been to a conference like that before. Also, I have never been to Spain :D

### Wrap up

All in all, I am eager to start doing the stuff that really matters. Or maybe I am just eager for the summer to come faster? I don't really know, but anyway, there goes my first ever blog post after my first ever blog post.

## May 09, 2016

### liscju (Mercurial)

#### Community bonding - Part I(first two weeks)

In the last two weeks, after the code freeze ended, I have been working on a few issues.

The first, which is already merged into the main repository, is batching remote largefiles verify calls, which decreases the number of round-trips between the repository and the server. The patch can be browsed here: https://selenic.com/hg/rev/305f9c36a0f5

Second, I started working on using absolute imports in the largefiles extension code. This task belongs to the bigger effort of porting Mercurial to Python 3. I started working on this because using absolute imports helps with removing import cycles. To remove an import cycle I had to move the store factory from basestore to another module. While working on this I encountered a bug in the import-checker dealing with imports relative to the parent; a patch for this is pushed to hg-committed and can be browsed here: https://www.mercurial-scm.org/repo/hg-committed/rev/660d8d4ec7aa

Another task I'm doing right now is making verify send remote calls only for files that are not available locally; the ones that are local should be verified locally without sending any remote calls.

### srivatsan_r (MyHDL)

#### GSoC Begins!!

It is the community bonding period from 23rd April to 22nd May. GSoC students are expected to get to know more about their project during this period.

I started working on the project with the help of my mentors.

I’m supposed to create HDMI Source/Sink Modules using MyHDL. So, I started by creating the necessary interfaces and transactors for the HDMI module. Initially I didn’t know what Interfaces are, and with the help of my mentors and the MyHDL community I learnt about it.

Interfaces are Python classes whose attributes are MyHDL signals and some extra variables. Transactors are helper functions that perform signal assignment on those attributes; they are used in test benches to simulate the behavior of the interface.

### Vikram Raigur (MyHDL)

#### A play with MyHDL

It has been a long time since an update. Sorry everyone for the late updates.

This week was full of surprises for me. I started using MyHDL and came to know how things work exactly. I will explain my experience with different attributes and modules one by one. I will mostly use Verilog in this blog because I feel comfortable with Verilog.

1.

The MyHDL Signal is similar to the VHDL signal. I see an analogy between the MyHDL Signal’s next attribute and non-blocking Verilog assignments:

always @(posedge clk) begin
    a <= b;   // non-blocking assignment
    c <= d;   // non-blocking assignment
end

Now coming to the MyHDL analogy with the VHDL signal:

@always_seq(clk.posedge, reset=reset)
def logic():
    a.next = b  # non-blocking assignment
    c.next = d  # non-blocking assignment

where a, b, c, d are Signals in MyHDL.

2.

I was practising with MyHDL and I had to write assignments like:

assign a = b
assign c = d

I did a straightforward assignment and it did not work. I contacted Chris Felton about my problem and he provided a nice way to do such assignments:

@always_comb
def assign():
    a.next = b
    c.next = d

return assign

3.

Continuing the journey, I tried to check whether MyHDL accepts a 2-D array as input to a module. Unfortunately I was unable to convert my code, because Verilog does not accept 2-D array inputs; it was a mistake on my part to expect such a feature. This may become a new feature in MyHDL soon.

Then I tried to give a list of Signals as input, and conversion eventually failed. Verilog does not accept a list of signals as input unless the input is declared as a wire array,

i.e. input wire [4:0] inputlist [0:63]

// do not confuse this with a 2-D array; it is a list with 5-bit data in each block

To solve this issue, my mentor Josyb suggested using a wrapper that takes N signals as input, wraps them into an array (not an input array), processes them, and then unwraps them.

I also tried a different method shown as follows :

def test():
    iPixelBlock = [Signal(intbv(0, -1 << 11, -(-1 << 11) + 1)) for _ in range(64)]
    clk = Signal(INACTIVE_HIGH)
    enable_in, enable_out = [Signal(INACTIVE_LOW) for _ in range(2)]
    reset = ResetSignal(1, active=ACTIVE_LOW, async=True)
    inst = huffman(huffman, enable_out, iPixelBlock, enable_in, clk, reset)
    return inst

toVerilog(test)

It works well, and the reason is clear: the list of signals is not an input to the block we are converting.

4.

We have two reference designs on which we have to work.

The VHDL version by Michal Krepa and the Verilog version by David Klun.

Josyb, Mkatsimpris and I decided to focus more on the VHDL version, because its cores are more modular and scalable. They are also well suited to independent testing.

The next post will contain my GitHub link and some modules I designed for practice.

Thanks for going through the post.

Have a nice day.

## May 08, 2016

### tushar-rishav (coala)

#### Hola GSoC'16

Hi there!

Something was holding me back from taking the first step, but finally I’ve started writing a blog. \m/

I am delighted to share the news that I’ve recently been selected as a Google Summer of Code intern, and I shall be working with the coala organisation under the Python Software Foundation.
It all started with my participation in the IndiaHack Open Source competition, a month-long contest in January 2016, through which I got introduced to coala.
I still remember that within a few hours after my first message on the Gitter channel, I started working on an issue that was supposed to improve the coverage of the coala core; specifically, to increase it from 98% to 100%. I first had to get familiar with the codebase and with coala protocols like rebasing for a linear history, making atomic changes with good commit messages, and writing quality code. As I had just started, it was certainly a challenging task for me. But with the humble and ever-supporting nature of the coala developers, I got comfortable fairly soon.
I really look forward to continuing as a contributor to this awesome community, as a GSoC intern during the summer and beyond.

###### Being coalaian

Certainly, it has been a learning experience so far. I’ve learnt a lot over the past couple of months. Inspired by another awesome coalaian, Sils1297, I shall be attending EuroPython 2016 at Bilbao, Spain this coming July. My talk, Guide to make a real contribution to an open source project for novices, can be seen on the website.
We also have a sprint planned during the conference.
Feel free to drop me a mail if you are attending the conference.

I guess that’s all for now. :)
Stay tuned for further updates!

## Introduction:

So my project proposal got accepted at Google Summer of Code’16. Hurray! I will be working with coala under Python Software Foundation.

## How it all began

I discovered coala through the Indiahacks Open Source track and began contributing in the last week of February. In about a week, I had read the documentation and learnt some advanced git workflow, like squashing commits. I also solved some newcomer issues to get a feel for how to contribute to open source software. The experience of contributing to coala has been great, mainly because the community is very friendly and they guide you just enough that you learn to stand on your own feet.

Also, getting your code accepted upstream is not an easy task. The community is very strict about top-notch commit style, which greatly improved not only my skills as a developer but also my ability to communicate effectively with other teammates. This is one thing that I really liked about coala, and I stuck with them because they appreciate the finer details and will accept your bug fixes only if you meet their high standards. As a result, I learnt how to write meaningful commit messages and not just ambiguous one-liners.

## The journey so far

I am really excited about my project, Extend Linter Integration, and I will be implementing the following changes over the coming summer.

• A new coala-bears --create command line tool to ease the process of creating new Bears.
• Work on extending Lint class.
• Provide command line interface improvements using Python Prompt Toolkit.

Contributing to open source is an amazing thing that every self-respecting developer should experience. It helps you grow at a much faster rate and you interact with some of the best minds, sitting in the other half of the world.

Hope to have a great time ahead in the summers. Cheers!

## May 06, 2016

### Aakash Rajpal (italian mars society)

#### GSoC Selection!!

It was a gloomy night. I had been waiting all day, and finally the results arrived. I couldn’t believe it: my name, my name was there. I was selected for GSoC. A dream came true that night. I celebrated long into the night, thinking about how much this meant, and was up the whole night taking calls from friends and family, discussing how I will spend the summer. The project I was selected for is under the Italian Mars Society.

### Abhay Raizada (coala)

#### GSoC: My first blog ever!

My proposal for GSoC ’16 has been selected and so my journey towards an awesome summer has begun.

When I started contributing to open source, I had only the bookish knowledge learned in school and college and only a little practical experience; I had no idea how real-life software was run and maintained. Then I found syncplay. It was software I used daily, and after a little while I got to know that it was open source. By that time I knew what open source meant, but I hadn’t dived deep into it yet, so I began making some changes to its code-base (all hail open source!).

A week or two later I had learned a lot: about decorators, socket programming, the Twisted framework, UTF-8 encoding and some of the nitty-gritties of coding. I knew about the GSoC program (one of my friends had participated earlier), and got to know from the contributors that syncplay wouldn’t be participating, so I started searching for organisations that would be participating and stumbled upon coala, a static code analyzer (though saying just this does it injustice).

The coala community, in one word, is awesome! In my little experience I have never seen a community that is so helpful to newcomers. It took a little time getting used to the software at first, but once I got used to it, it was and still is an amazing experience. Working beside sils, AbdealiJK, Makman2 and the rest of the coala community has been an awesome learning and entertaining experience so far, and I can’t wait to see what it will be like once the summer gets started!

As far as my project goes, it deals with creating indentation algorithms that are language independent. It will automatically correct indentation (thanks to coala's awesome diff management) and will also look for lines that get too long and, once completed, even break those lines.

Open source has been a really fascinating experience so far. Apart from GSoC, I’m looking to contribute to a lot of open source in my spare time: I’d like to finish my PR for syncplay and find some other projects to contribute to. I have a few ideas myself; let’s hope they see the light of day😉.

All in all, I’ve learned a lot in these few months of contributing. I’ve developed habits like watching talks and reading blogs, all of them so informative! I’ve learnt a lot about writing not just code, but efficient, well-formatted code. I’ve had glimpses of frameworks and learnt about new types of software. Being an IT student was never as exciting as it is now.

## May 05, 2016

### shrox (Tryton)

#### The Beginning

This is where my GSoC journey begins, for all practical purposes.

Right now, in the ongoing community bonding period, I hope to do the following:

1. Get done with issue5258 that I had assigned to myself a while back.
2. Get better acquainted with lxml, a Python XML processing library.
3. Refer to Relatorio’s codebase in order to understand how the final converter will be used.
4. Use Relatorio to generate reports with the existing codebase.

## May 03, 2016

### Sheikh Araf (coala)

#### Google Summer of Code '16 with coala

So, my proposal for Google Summer of Code 2016 got accepted. Yay! And now that my exams are over, I finally have some time to blog about it. So here we go.

Earlier this year I decided to dip my toe in the water of contributing to open source projects. I came across coala, a language independent static code analysis framework. I started off by fixing some of the simple newcomer issues. This helped me understand the code base of the project. The best part was that the coala community was extremely friendly and always helpful.

I came with no long-term plans, but I had a really fun time learning new stuff, so I had to stay. Later Google Summer of Code was announced and coala was participating under Python Software Foundation. I submitted a proposal and it got accepted.

So this summer I will be building a plugin to integrate the awesome coala code analysis framework with the Eclipse IDE. I have a coarse idea of the project and I look forward to discussing it with my mentor Harsh Dattani in the next few weeks of the community bonding period.

### kaichogami (mne-python)

#### GSoC 2016 with MNE

Hello everyone. It's been a while since I last posted here. I am glad and excited to say that my project was accepted for GSoC 2016. I will be working on MNE-Python, a library to process brain signals. I am grateful to Denis and Jean for helping me out at every point; the proposal looks so good only because of them.
I will begin working on the project as soon as my exams finish. An update listing the new changes should ideally be out every 2 weeks. I have created a checklist to easily manage the deadlines.
Here is a link to the proposal. In short, my project involves changing various transformers to follow “2-D X, Y” input/output and creating a Pipeline class to chain various transformers.

Thank you for reading!

## May 02, 2016

### Yen (scikit-learn)

#### Hello Google Summer of Code!

This summer I will participate in Google Summer of Code (GSoC for short), a program that offers student developers stipends to write code for various open source projects. My GSoC proposal, Adding fused types to Cython files, which aims to enhance the popular machine learning library scikit-learn, has been accepted and will be supervised by two mentors from the scikit-learn community: Jnothman and Mechcoder.

Below, I’ll briefly describe the work I’d like to achieve during GSoC.

## Proposal Abstract

The current implementations of many algorithms in scikit-learn, such as Stochastic Gradient Descent and Coordinate Descent, only allow input with np.float64 and np.int64 dtypes, because adopting Cython fused types could cause an explosion in the generated C code. However, since scikit-learn has removed the generated C files from the repo and now regenerates them on every build, there is a good opportunity to refactor some of the “.pyx” files by introducing Cython fused types. This will allow those algorithms to support np.float32 and np.int32 data, which is currently cast to np.float64 and np.int64 respectively, and therefore reduce wasted memory.
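The memory cost of that upcast is easy to see with plain NumPy (an illustration of the waste, not scikit-learn code):

```python
import numpy as np

x32 = np.ones(1_000_000, dtype=np.float32)   # ~4 MB of input data
x64 = x32.astype(np.float64)                 # the upcast copy made before fused types

print(x32.nbytes, x64.nbytes)  # the float64 copy needs twice the memory
```

With fused types, the Cython routine compiles a specialization for each dtype, so the float32 array can be processed directly and the doubled-size copy is never created.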

You can find the detailed version of my proposal here!

## Example

Here, I’ll use an example to illustrate how Cython fused types can benefit the whole project.

The mean_variance function in scikit-learn, like some of the algorithms mentioned in my proposal abstract above, explicitly cast np.float32 data to np.float64 before this pull request, which wasted memory. After introducing Cython fused types into this function’s implementation, it can now accept np.float32 data directly.

Results of this enhancement can be visualized via memory profiling figures showed below:

• Memory usage before using fused types

• Memory usage after using fused types

As one can see, the memory usage surrounded by the bracket decreases drastically.

## Summary

I believe that scikit-learn’s memory efficiency can be greatly improved by adding fused types to the existing Cython files in the project.

On the other hand, great thanks to the scikit-learn community for giving me this golden opportunity to work on an open source project I use every day.

Really looking forward to this productive summer!

## May 01, 2016

### Anish Shah (Core Python)

#### GSoC'16: Community Bonding (1st Week)

It is already one week into the community bonding period and I have already done a lot of new things. I talked to my mentor Maciej Szulik over email and he gave me some tasks: to set up the b.p.o environment locally, and then to add a GitHub pull request URL field on the issue page. You can find all the details about what I learnt this week below. :)

## Docker

I have been hearing about Docker for many months now, but I never had an opportunity to use it, as I generally use pip and virtualenv to quickly set up most of my Python projects. What’s awesome about Docker is that it is not limited to Python projects: it allows you to package any project and its dependencies into a single unit. People often think of Docker as a VM, but VMs run a guest OS on top of a hypervisor, whereas Docker creates containers that include the application and its dependencies and run as isolated processes on the host OS.

Virtual Machine and Docker Architecture
Picture courtesy: Docker.com

This allows developers to quickly set up any application on any computer and eliminates environment inconsistencies. I set up the Python issue tracker on my local machine using Docker; if you want to set it up locally, you can find the repository here. You can easily build the Docker image using the following command:

$ docker build -t <image-name> <path-to-Dockerfile>

To run the Docker container, you can use the command below. You can read more about Docker commands here.

$ docker run [OPTIONS] IMAGE[:TAG|@DIGEST] [COMMAND] [ARG...]

## Template Attribute Language (TAL)

I have been using Django and Flask to create some web apps; they have template engines to create dynamic pages. Flask uses the Jinja template engine and Django has its own. The Python issue tracker uses a templating language called Template Attribute Language (TAL) to generate dynamic HTML pages. TAL statements are embedded inside HTML tags, using tal as the namespace prefix. You can read more about TAL here.
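As a flavour of the syntax, a minimal TAL fragment might look like this (the issue/title and issue/messages paths are hypothetical field names for illustration, not the actual b.p.o schema):

<h1 tal:content="issue/title">Placeholder title</h1>
<ul>
  <li tal:repeat="msg issue/messages"
      tal:content="msg/summary">Placeholder message</li>
</ul>

The placeholder text inside the tags is what a designer sees when opening the template as plain HTML; TAL replaces it with real values at render time.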

## First GSoC’16 Task

To get familiar with the b.p.o codebase, my mentor gave me a small task: add a new field on the issue page so that developers can submit GitHub pull request URLs, with the issue page showing a table of GitHub PRs related to the issue. I completed the task and submitted a patch; you can follow the progress here.

That’s it for this week. Thank you for reading. Do comment below with what you think about this post or any questions for me. See you next week.

## April 30, 2016

### Vikram Raigur (MyHDL)

#### New Friends

Hi

First of all thank you MyHDL and Google for providing me an opportunity to participate in Google Summer of Code 2016.

Today I am going to write about my early experience during the Community Bonding period. Since I will be working on the JPEG Encoder this summer, I have been studying the Huffman Encoder module written in Verilog which we are going to use as a reference.

Also, in this early period I have been learning MyHDL syntax and implementing basic architectures with it.

I also talked to my mentor Josy Belton for the first time on Gitter this week, and I contacted Mercurious Katsimpris, the person working on the front end of the JPEG Encoder. Both are really friendly people, and it was a pleasant experience talking to them.

Finally, this week I got a working setup of the MyHDL library. I have also installed Rhea (we are going to contribute there too). I am facing some errors when I try to build the files, which I am going to sort out next week.

Overall the first week was a great one, and now I am set to head home for the summer vacation, taking leave for a couple of days.

Thank you

## April 29, 2016

### Avishkar Gupta (ScrapingHub)

#### The First Post

This blog will contain weekly reports I write as a GSoC student for Scrapinghub.

### Nelson Liu (scikit-learn)

#### (GSoC Week 0.1) How fast is fast, how slow is slow? A look into Cython and Python

The scikit-learn tree module relies heavily on
Cython to perform fast operations on NumPy arrays, so I've been learning the language (if you can even call it that) in order to effectively contribute.

At first, I was a bit skeptical about the purported benefits of Cython -- it's widely said that Python is "slow", but how slow is "slow"? Similarly, C code is known to be "fast", but it's hard to get a grasp on the performance difference between Cython and Python without directly comparing them. This post summarizes a quick (and extremely unscientific) experiment I did comparing the performance of raw Python, Python code running in Cython, and Python code with static typing running in Cython. These results may not generalize to whatever application you have in mind for Cython, but they're suitable for demonstrating the performance differences on a CPU-heavy task.

## Why would I want to use Cython?

Cython combines Python's ease of use with C performance to help developers optimize their Python code or create a fast Python interface to their C code.

To understand how Cython improves the performance of Python code, it is useful to have some knowledge of how code in Python and C is run. Python is dynamically typed -- variables do not have to be fixed at compile time, and a variable that starts as an int can be set to a list or even a custom Python object at any time. On the other hand, C is statically typed -- variable types must be defined at compile time, and they generally keep that type and only that type. Also, Python is an interpreted language, meaning there is no compile step necessary to run the code; C is a compiled language, so files must be compiled before they can be run.

Given Python's nature as a dynamically typed, interpreted language, the interpreter must spend time to figure out what type each variable is at runtime, extract the data from these variables, run the low-level machine instructions, and then place the result into a (possibly new) Python object that is returned. In C, the compiler can figure out at compile time all the details of low-level functions / data to use; a compiled C program spends almost all its runtime calling fast low-level functions, making it much faster than Python. Cython attempts to improve the performance of Python programs by bringing the static typing of C to Python, a dynamic language.
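The dynamic-typing half of this distinction is easy to see directly in Python: a name can be rebound to values of different types at runtime, which is exactly what forces the interpreter to do type checks on every operation. A tiny illustration:

```python
# In Python, a name is just a reference; the object it points to can change
# type at any time, so nothing is fixed at "compile time".
x = 42           # x refers to an int
print(type(x))   # <class 'int'>

x = [1, 2, 3]    # the same name now refers to a list
print(type(x))   # <class 'list'>

# The interpreter must therefore resolve types at runtime, e.g. when adding.
# The same function body works for ints, floats, strings, lists, ...
def add(a, b):
    return a + b

print(add(1, 2))      # 3
print(add("a", "b"))  # ab
```

In C (and in Cython's cdef variables), that per-operation type dispatch disappears because the compiler already knows the types.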

With a few exceptions, valid Python code is also valid Cython. To demonstrate what sort of speed gains are possible with Cython, we turn to the classic example of calculating Fibonacci numbers.

### Python vs Cython

Below is a simple iterative function to calculate the nth Fibonacci number in Python:

def fibonacci_py(n):
    a, b = 0, 1
    for _ in range(1, n):
        a, b = b, a + b
    return b

Let's see how long the Python function takes to calculate several values of fibonacci:

%timeit fibonacci_py(0)
1000000 loops, best of 3: 436 ns per loop
%timeit fibonacci_py(70)
100000 loops, best of 3: 4.89 µs per loop

Now, let's turn the above function into a Cython function without changing anything (remember that most valid Python code is valid Cython) and evaluate performance again.

%%cython
def fibonacci_cy_naive(n):
    a, b = 0, 1
    for _ in range(1, n):
        a, b = b, a + b
    return b
%timeit fibonacci_cy_naive(0)
1000000 loops, best of 3: 227 ns per loop
%timeit fibonacci_cy_naive(70)
100000 loops, best of 3: 2.1 µs per loop

Now let's add static typing to the naive Cython code.

%%cython
def fibonacci_cy_static(n):
    cdef int _
    cdef int a=0, b=1
    for _ in range(1, n):
        a, b = b, a + b
    return b
%timeit fibonacci_cy_static(0)
10000000 loops, best of 3: 59.3 ns per loop
%timeit fibonacci_cy_static(70)
10000000 loops, best of 3: 126 ns per loop

As you can see, Python took 436 ns per loop to calculate fibonacci(0) and 4.89 µs per loop to calculate fibonacci(70). Simply using Cython without any changes to the Python code roughly halved the runtime, to 227 ns per loop for fibonacci(0) and 2.1 µs per loop for fibonacci(70). However, the most dramatic performance increase came from using statically typed C variables (defined with cdef): 59.3 ns per loop when calculating fibonacci(0) and 126 ns per loop when calculating fibonacci(70)! In the case of fibonacci(0), this represents an almost 4x speedup over the naive Cython function and more than a 7x speedup over the Python function. The speedup is even more pronounced for fibonacci(70); statically typed variables gave a speedup of almost 17x over the naive Cython version and approximately 39x over the normal Python version!

Cython gives massive performance gains on this simple Fibonacci example, but it's worth noting that this example is completely CPU bound. The performance difference between Python and Cython on a memory-bound program would likely still be noticeable, but definitely not as dramatic as in this toy example.

## Conclusion

While learning Cython, I wrote a short IPython notebook tutorial on Cython pointers and how they work, geared toward developers relatively fluent in Python but unfamiliar with C -- it's mainly intended to be practice / quick reference material, but you might find it handy if you want to learn more.

Additionally, most of the contents of this post are available as an IPython notebook here.

For next week, I'll be providing a brief introduction to regression trees and some basic splitting criteria such as mean squared error (MSE) and mean absolute error (MAE).

If you have any questions, comments, or suggestions, you're welcome to leave a comment below :)

Thanks to my mentors Raghav RV and Jacob Schreiber for their constant support, and to the larger scikit-learn community for being a great place to contribute.

You're awesome for reading this! Feel free to follow me on GitHub or check out Trello if you want to track the progress of my Summer of Code project.

### Prayash Mohapatra (Tryton)

#### Accepted into GSoC!

I have been accepted into Google Summer of Code 2016. I will be working on Tryton (under the Python Software Foundation), developing the CSV Import/Export feature for their web client, codenamed SAO. I am very enthusiastic about this, as I could finally write a proposal for GSoC after failing to do so for two years in a row.

I will be working mostly in JavaScript and the many tools that come with it. I will be working from home this summer, though I really wanted to visit another city. Just waiting for the semester examinations to end.

~Try Miracle

## April 27, 2016

### tsirif (Theano)

#### Google Summer of Code adventure begins

Finally, about a month after proposal submissions, Google Summer of Code announced which projects will participate in this year’s coding summer adventure. My proposal to the Python Software Foundation was accepted, along with 1,205 other proposals across 178 open-source organizations in total. Check the official blog announcement for more statistics.

So begins my involvement with Theano, an open-source project initiated by people in the MILA lab at the University of Montreal. Theano is a mathematical Python library which allows you to define, optimize, and evaluate symbolic expressions in a way that makes the most of the available computational resources. It uses underlying Python, C and CUDA implementations of generic mathematical operations and combines them according to a user-defined symbolic operation graph, in order to achieve an optimized computation for the available software and hardware on each platform. I am going to contribute by extending GPU support with more implementations of operations and with more functionality for multi-GPU and multi-node/GPU infrastructures.

See here the abstract of my proposal!

If you are interested in this project’s progress, follow my fork on GitHub.

More information on the project details at the next post!

### Ranveer Aggarwal (dipy)

#### A Pythonic Summer

I have been selected for Google Summer of Code (GSoC) 2016. I’ll be working on a SciFi UI using Python-VTK under DIPY (Python Software Foundation) and I’ll be mentored by Eleftherios Garyfallidis and Marc-Alexandre Côté.

DIPY is a Python toolbox for analysis of MR diffusion imaging. It’s an open source research project that implements a broad range of algorithms for denoising, registration, reconstruction, tracking, clustering, visualization, and statistical analysis of MRI data.

### Project Description

Before we go into what I’ll be doing this summer, I’d like you to watch a small (but cool) video.

See those cool, futuristic interfaces? Gee, how cool would it be if those were real? Well, that’s kinda what we’ll be trying to achieve, except that the controls will be used for tractography exploration instead of space.

The main idea is to develop new futuristic widgets directly using VTK (Visualization Toolkit) without calling any external libraries. These new widgets will be useful because we will be able to use them to navigate in tractographies and allow neurosurgeons and other neuroscientists to have a unique impression and user experience when using DIPY’s tools.

The possible deliverable by the end of the summer would consist of field-dialogues, sliding panels with buttons and dynamic actor menus (3D menus on objects).

The fact that the project is completely open ended and involves building something entirely new makes it that much more intriguing.

Google Summer of Code (GSoC) is a yearly internship program by Google to help the open source communities to reach out to student contributors. Organisations pitch projects, and when selected, pick up university students to work on these floated projects or their own ideas related to the organisation’s project(s). Last year, I completed my GSoC with KDE and it was an amazing experience. This year, I hope it’ll be a notch higher.

### sahmed95 (dipy)

Hi,

I will be writing in this blog about my experiences working with DIPY over the summer. I am a Physics and Electronics double major student from BITS Pilani, Goa Campus, India, with an active interest in mathematical modeling, statistics and coding in Python. Over the summer I will be contributing to DIPY by integrating models for diffusion imaging such as IVIM and Rohde.

### meetshah1995 (MyHDL)

#### Introduction

Hello everyone,

I am Meet Shah, and I will be writing here about my experience with the MyHDL community and a few implementation details of my project.

### ljwolf (PySAL)

#### Lit Review Habits

Let’s go, academics. Share your tips on how you keep all your sources and notes straight while doing a literature review.

Zotero’s a great bibliography program that can use a simple button on your browser to import the info and pdfs from places like JSTOR or even just webpages. I use the “extra” line to note whether I’ve read/notated/printed/etc. that particular work as well as using the ‘notes’ feature to put specific info i think might be quotable for my own work.

Man, zotero is amazing, but I feel like I underutilize it pretty hard. I still do most of my reading on paper and stick most of my notes into workflowy, so zotero really just ends up being a link between my bibtex database and pdfs of the articles in question.

If there was something like bibdesk for linux (that’s not jabref, which is annoying as hell to use) I’d probably take that over zotero any day.

## April 26, 2016

### Aron Barreira Bordin (ScrapingHub)

#### GSoC - Support for Spiders in Other Programming Languages

Hello Everyone !

My name is Aron Bordin and I’m Brazilian Computer Science Student and AI Researcher. I’m studying Computer Science at São Paulo State University, and always coding something fun on my free-time :)

I’m very happy to announce that my Google Summer of Code proposal has been accepted :tada:

Scrapy is one of the most popular web crawling and web scraping frameworks. It’s written in Python and known for its good performance, simplicity, and powerful API. However, it’s currently only possible to write Scrapy spiders using the Python language.

The goal of this project is to provide an interface that allows developers to write spiders using any programming language, using JSON objects to make requests, parse web content, extract data, and more. A helper library will also be available for Java, JS, and R.
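As an illustration of the kind of message passing this implies, here is a sketch of what a JSON-encoded message from an external spider might look like. The field names here are hypothetical, chosen for illustration; they are not the project's actual protocol.

```python
import json

# Hypothetical JSON message an external (e.g. Java or R) spider might send
# to the Python side to schedule a new request. All field names are made up.
message = {
    "type": "request",
    "url": "http://example.com/page/1",
    "method": "GET",
    "callback": "parse_page",  # name of the spider callback to invoke later
}

encoded = json.dumps(message)   # what would travel over the interface
decoded = json.loads(encoded)   # what the Python side reconstructs

print(decoded["url"])  # http://example.com/page/1
```

The appeal of JSON as the interchange format is exactly this symmetry: every mainstream language can produce and consume these objects, so the spider logic can live anywhere while Scrapy keeps doing the scheduling and downloading.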

## This Blog

I’ll use this blog to post updates about the project progress.

GSoC - Support for Spiders in Other Programming Languages was originally published by Aron Bordin at GSoC 2016 on April 26, 2016.

### udiboy1209 (kivy)

#### I Got Selected For GSoC 2016!

I’m really excited to say that my project was selected for Google Summer of Code 2016! This is a really great opportunity for me to get to code throughout the summer, and coding is something I dearly love :D !

## GSoC? What’s that?

So this is how GSoC works. There are open source organizations who have a list of projects they want to see implemented, and they are willing to mentor enthusiastic people for it. GSoC provides a way for students all over the world to take up these projects, along with a stipend. As a student you are supposed to submit a proposal for whichever project you want to do to the respective org. Then you await the org mentors to review your proposal, compare it with countless other submissions and finally deem you worthy/unworthy of their mentorship :) .

## Yay!

I submitted my proposal to Kivy, a Python framework to create UI apps for various platforms like Android, iOS, Windows, Raspberry Pi ( :O ! I was surprised too!) and obviously Linux. You couldn’t have imagined my excitement when I saw my name (actually my nick) and my project show up on Python Software Foundation’s projects list.

My project is to implement a module for Tiled maps in Kivy’s game engine KivEnt. Jacob Kovak, Mathieu Vibrel and Akshay Arora will be mentoring me. I’ve always wanted to work on game engines which makes this project all the more fascinating for me!

## How these past months have been

A bunch of seniors in my college have done GSoC in the past years, and they all had the same advice to give: Start contributing to the org you like, it gives you a much better chance of getting selected. So I started contributing to Kivy sometime in the winter last year. It was pretty tough at first, dealing with such a huge codebase. But the people who maintain kivy are really helpful with the tiniest of things. And they are extremely appreciative of the contributions you make too :D! I remember one of them commenting “Beautiful!” on one of my PRs before merging it, which left me wondering what was so great in this teensy contribution. But it did have a positive impact on me. For a beginner, positive feedback never hurts ;)!

I have come quite a way from that beginning stage. I even earned a bounty on one of the bugs I fixed for kivy :D! The experience has been awesome. I’m getting to know this wonderful community of people who work towards kivy’s development and I feel glad I am starting to be a part of that community! And I believe there are many more great times ahead!

## What’s more?

Well, GSoC requires me to blog about the developments of my project. It will help my mentors review my progress. So I will be using this blog to post updates and developments! Stay tuned :D!

### jbm950 (PyDy)

#### GSoC Acceptance

I am excited to announce that I have been accepted for the Google Summer of Code program for the summer of 2016. I will be working with the SymPy open source project’s equation of motion generators. For the project I will mainly be focusing on creating a shared base class for the current equation of motion generators and adding an additional generator.

## April 25, 2016

### chrisittner (pgmpy)

#### GSoC proposal accepted!

My proposal for Google Summer of Code 2016 has been accepted :). This means that I will spend part of my summer working on the pgmpy library, implementing some techniques for Bayesian Network structure learning. You can have a look at my proposal here.

As a first step, I set up this blog to document my progress. It is built with the Pelican static-site generator and hosted on GitHub pages.

### liscju (Mercurial)

#### My GSOC project proposal has been accepted!

My proposal has been accepted; now it's time to speed up work a bit. I should contact my mentor this week to discuss changes to my proposal. Apart from finishing pending fixes, I'm planning to work on the largefiles issue that Mads described in https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-April/083023.html. In general it deals with the largefiles verify command; I sent patches for this issue, but they didn't resolve it correctly.

### SanketDG (coala)

#### GSoC 2016, here I come!

So, I am participating in GSoC 2016 with coala!

My project is based on language-independent documentation extraction, which will involve parsing documentation embedded within code and separating it into description, parameters and return values (even doctests!).

After this is done for several languages (Python, C, C++, Java, PHP), I will implement the following functionality as a bear:

• Checking documentation style as specified by the user.
• Providing aesthetic and grammar fixes.
• Re-formatting the documentation (indentation and spacing).
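A very rough sketch of the kind of splitting involved, assuming a Python :param:/:return:-style docstring; this toy function and its field markers are my own illustration, not coala's actual extraction code, which handles multiple languages and styles:

```python
def split_docstring(doc):
    """Naively split a :param:/:return:-style docstring into parts.

    Toy illustration only; real documentation extraction must handle
    many languages and documentation styles.
    """
    description, params, returns = [], [], []
    for line in doc.strip().splitlines():
        line = line.strip()
        if line.startswith(":param"):
            params.append(line)       # parameter descriptions
        elif line.startswith(":return"):
            returns.append(line)      # return-value descriptions
        elif line:
            description.append(line)  # everything else is prose
    return {"description": " ".join(description),
            "params": params,
            "returns": returns}

doc = """Add two numbers.

:param a: first operand
:param b: second operand
:return: the sum of a and b
"""
result = split_docstring(doc)
print(result["params"])  # [':param a: first operand', ':param b: second operand']
```

Once the documentation is split into structured parts like this, a bear can check each part against the user's configured style and re-emit it with consistent indentation and spacing.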

## April 24, 2016

### Pulkit Goyal (Mercurial)

#### Getting into GSoC

Hello everyone, I recently made it into GSoC 2016, so I will be writing about how to get into GSoC. I have proposed a project under the Python Software Foundation for an organisation named Mercurial. **Mercurial** is a cross-platform, distributed revision control tool for software developers. I will describe my organisation more in upcoming posts; for now I will be talking about GSoC.

### meetshah1995 (MyHDL)

#### ./gsoc init

As it turns out, my proposal has been selected as a project. Looking forward to an awesome summer.

return 0;

### Nelson Liu (scikit-learn)

#### An Intro to Google Summer of Code

I'm participating in the Google Summer of Code, a program in which students work with an open source organization on a three-month programming project over the summer; I'll be working with the scikit-learn project to add several features to the tree module. You can read my proposal here.

The program also requires that I publish a blog about my project and work; as a result, this weekly series of posts will recount progress, what I've been up to, and what I've learned.

I'll be prepending all of the posts in this series with GSoC Week #, and they will be tagged under gsoc.

## April 23, 2016

### Adhityaa Chandrasekar (coala)

#### GSoC '16!

Great news! I've been selected for this year's GSoC (Google Summer of Code) under coala, a powerful static-code analysis tool that is completely modularized. You should definitely use it in your projects if you want a tool that will completely automate huge segments of code review, thereby rapidly fast-forwarding the production cycle.

I've been contributing for a couple of months and the experience has been nothing short of phenomenal! I was recently given contributor status too :)

Over the course of this summer, I'll be working on a project called Settings Guessing. Currently coala needs the user to specify each setting: whether to use spaces or tabs, whether to use snake_casing or camelCasing, whether to use K&R style or Allman style. With this project, these would be guessed automatically! Totally awesome, right?

Stay tuned for more, I'll try to post updates weekly.

#### Accepted!

The day I have waited for so long has finally come. Yesterday at noon the results were announced, and my project with the PSF (Python Software Foundation) & coala was accepted. This is probably my biggest life achievement so far, and as happy as I am inside, I feel a huge responsibility for my personal progress and work. I have committed my summer to working hard on finishing this project.

The project I got accepted for is called Decentralizing Bears. This involves making bears, which are basically plugins for coala, independent packages, which allows for easier management of and improvements to them.

Right now the period is called Community Bonding, and it lasts until the 23rd of May, when the actual work starts. This is when students and mentors get to know each other and talk about implementing the project.

### It is going to be hard.

I am a student in my first year of Computer Science who started learning programming in October last year. This project will be a real challenge for me, one which I am planning to finish, and one which I am planning to work hard on.

### What’s next?

Over the next month, it will be a challenge for me to manage all the homework from my college and also get prepared for this project. What is worse, my exam session starts in the first 3 weeks after the GSoC work period begins. Yes, it’s going to be hard. But nothing comes easy, does it?

### Thanks!

This blog post intends to thank everyone from coala who has helped me achieve this. With the help of that amazing community, I was able to achieve something I never thought I would.

### Anish Shah (Core Python)

#### GSoC'16: Python Software Foundation

Yes! I am a GSoCer now! Something I’ve wanted to say for the last two years.

I have spent the last two years contributing to various open source projects and learning new things. Now, I have the opportunity to contribute full time to open source. I will be working with the Python Software Foundation on Roundup and Git migration. I am excited to work on this project, as it will automate a lot of stuff for Python core contributors which is otherwise very hard manual work. I should thank the PSF for accepting my project proposal. I’m looking forward to a great summer.

You can have a look at my project proposal here

Thank You!

## April 22, 2016

### ljwolf (PySAL)

#### Google Summer of Code

Excited to be working on PySAL for Google Summer of Code!

## April 13, 2016

### liscju (Mercurial)

#### Still before projects announcement time

Recently I was trying to fix a few bugs, some of which weren't related to the largefiles extension. The things I'm working on now are:

• largefiles: makes verify work on local content by default
• update --clean doesn't work on new branch
• unshelve loses merge parents
• Error in hg help -v remove
• run-tests should use a whitelist for environment variables
All details can be found in the bug tracker under the "easy" tag.

## April 08, 2016

### udiboy1209 (kivy)

#### Hello World

This is my first blog post! I’m starting this blog specifically for GSoC 2016, which requires weekly updates on my project through blog posts. But you’ll also see other interesting stuff here that I write about.

## March 25, 2016

### tsirif (Theano)

#### Welcome to COSA!

Friday, 25 March 2016, 05:05 AM, Thessaloniki, Greece

I am beginning a blog in which I am going to narrate my coding adventures.

Its initial purpose is to publicly describe my “yet probable” activities in Google Summer of Code.

### ljwolf (PySAL)

#### Bringing Classifiers Alive in PySAL

I’ve talked a lot to fellow developers about making PySAL objects more than containers for the results of a statistical procedure.

One way I think we can do this is to focus on methods like predict, find, update, or reclassify.

So, here, I’ll show the way I’ve implemented a simple API to update map classifiers by defining their __call__ method.

In [2]:
import pysal as ps

The patch I applied to mapclassify should be in this github branch. To get it, you’ll need to git fetch my repository and check out the reclassify branch. Alternatively, what I added to Map_Classifier is so small, it’s easy to show:

First, I added a call method:

def __call__(self, new_data=None, inplace=False, **kwargs):
    """
    This will allow the classifier to be called like a
    function *after* instantiation
    """
    if inplace:
        self._update(new_data, **kwargs)
    else:
        new = copy.deepcopy(self)
        new._update(new_data, **kwargs)
        return new

This will allow us to do something like:

classifier = pysal.Quantiles(data)
classifier(k=4)
classifier(k=9)
classifier(new_data, inplace=True)

and proceed to interact with the classifier object over and over again. Since there’s an inplace toggle (False by default), users can choose when to mutate or when to copy.

In theory, the __call__ method can support all of the different __init__ declarations possible. I’ve defined it this way because most of the mapclassify methods I can think of use a mandatory data argument and optional keyword arguments. The only one that varies from this is User_Defined, which I overrode to handle this correctly.

The main point here is that this enables users to quickly reclassify and view new classifications using the object they created! Thus, a common use case might be something like this:

In [4]:
In [5]:
Out[5]:
FIPSNO NAME STATE_NAME STATE_FIPS CNTY_FIPS FIPS STFIPS COFIPS SOUTH HR60 BLK90 GI59 GI69 GI79 GI89 FH60 FH70 FH80 FH90 geometry
0 54029 Hancock West Virginia 54 029 54029 54 29 1 1.682864 2.557262 0.223645 0.295377 0.332251 0.363934 9.981297 7.8 9.785797 12.604552 <pysal.cg.shapes.Polygon object at 0x7fc5495eb…
1 54009 Brooke West Virginia 54 009 54009 54 9 1 4.607233 0.748370 0.220407 0.318453 0.314165 0.350569 10.929337 8.0 10.214990 11.242293 <pysal.cg.shapes.Polygon object at 0x7fc5495eb…
2 54069 Ohio West Virginia 54 069 54069 54 69 1 0.974132 3.310334 0.272398 0.358454 0.376963 0.390534 15.621643 12.9 14.716681 17.574021 <pysal.cg.shapes.Polygon object at 0x7fc5495eb…
3 54051 Marshall West Virginia 54 051 54051 54 51 1 0.876248 0.546097 0.227647 0.319580 0.320953 0.377346 11.962834 8.8 8.803253 13.564159 <pysal.cg.shapes.Polygon object at 0x7fc549565…
4 10003 New Castle Delaware 10 003 10003 10 3 1 4.228385 16.480294 0.256106 0.329678 0.365830 0.332703 12.035714 10.7 15.169480 16.380903 <pysal.cg.shapes.Polygon object at 0x7fc549565…

5 rows × 70 columns

In [7]:
data = df['HR60'].values
In [8]:
classifier = ps.Quantiles(data)
In [9]:
classifier
Out[9]:
Quantiles

Lower            Upper              Count
=========================================
x[i] <=  2.497               283
2.497 < x[i] <=  5.104               282
5.104 < x[i] <=  7.621               282
7.621 < x[i] <= 10.981               282
10.981 < x[i] <= 92.937               283

Once estimated, the user can reclassify based on the same API as the constructor:

In [10]:
classifier(k=3)
Out[10]:
Quantiles

Lower            Upper              Count
=========================================
x[i] <=  4.265               471
4.265 < x[i] <=  8.679               470
8.679 < x[i] <= 92.937               471
In [11]:
classifier(k=9)
Out[11]:
Quantiles

Lower            Upper              Count
=========================================
x[i] <=  0.000               180
0.000 < x[i] <=  2.836               134
2.836 < x[i] <=  4.265               157
4.265 < x[i] <=  5.628               157
5.628 < x[i] <=  7.137               156
7.137 < x[i] <=  8.679               157
8.679 < x[i] <= 10.600               157
10.600 < x[i] <= 13.924               157
13.924 < x[i] <= 92.937               157

It doesn’t mutate the object unless inplace is provided and is true:

In [13]:
classifier
Out[13]:
Quantiles

Lower            Upper              Count
=========================================
x[i] <=  2.497               283
2.497 < x[i] <=  5.104               282
5.104 < x[i] <=  7.621               282
7.621 < x[i] <= 10.981               282
10.981 < x[i] <= 92.937               283
In [14]:
classifier(k=6, inplace=True)
In [15]:
classifier
Out[15]:
Quantiles

Lower            Upper              Count
=========================================
x[i] <=  1.993               236
1.993 < x[i] <=  4.265               235
4.265 < x[i] <=  6.245               235
6.245 < x[i] <=  8.679               235
8.679 < x[i] <= 11.850               235
11.850 < x[i] <= 92.937               236

This also enables users to add new data to the classifier.

Now, I bet there are better updating equations for the different classifiers than re-estimating the entire classifier, just as there are for running median problems. I anticipate extending this work with more sophisticated updaters than simply reclassifying the entire set. This is why I split the __call__ method from what really does the updating:

def _update(self, data, *args, **kwargs):
    if data is not None:
        data = np.append(self.y, data.flatten())
    else:
        data = self.y
    self.__init__(data, *args, **kwargs)  # this is the most naive updater

As the comment denotes, this is the most universally-acceptable updater, hence its definition in the Map_Classify base class. Fortunately, this means that any new classifier defined as a subclass of this gets a very naive in-place reclassification method for free.
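A toy sketch of how such a `__call__` can delegate to `_update` (my own illustration, not the actual PySAL source; the class and parameter names are invented):

```python
import copy
import numpy as np

class Sketch:
    def __init__(self, y, k=5):
        self.y = np.asarray(y)
        self.k = k  # in a real classifier, bins would be computed here

    def _update(self, data=None, k=None):
        # naive updater: append new data (if any) and re-run __init__
        if data is not None:
            data = np.append(np.asarray(data).flatten(), self.y)
        else:
            data = self.y
        self.__init__(data, k=k if k is not None else self.k)

    def __call__(self, data=None, inplace=False, **kwargs):
        if inplace:
            self._update(data, **kwargs)      # mutate this object
        else:
            new = copy.deepcopy(self)         # leave the original untouched
            new._update(data, **kwargs)
            return new
```

Without `inplace`, calling the classifier hands back a fresh object, which is what makes the transcripts above leave `classifier` unchanged until `inplace=True` is passed.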

Thus, you can do stuff like:

In [17]:
new_data = df['HR90'].values
In [19]:
classifier(new_data)
Out[19]:
Quantiles

Lower            Upper              Count
=========================================
x[i] <=  3.228               565
3.228 < x[i] <=  5.912               565
5.912 < x[i] <=  8.710               564
8.710 < x[i] <= 12.735               565
12.735 < x[i] <= 92.937               565
In [20]:
classifier(new_data, k=14)
Out[20]:
Quantiles

Lower            Upper              Count
=========================================
x[i] <=  0.000               296
0.000 < x[i] <=  2.200               108
2.200 < x[i] <=  3.469               201
3.469 < x[i] <=  4.483               202
4.483 < x[i] <=  5.394               202
5.394 < x[i] <=  6.282               201
6.282 < x[i] <=  7.297               202
7.297 < x[i] <=  8.266               202
8.266 < x[i] <=  9.348               201
9.348 < x[i] <= 10.628               202
10.628 < x[i] <= 12.217               202
12.217 < x[i] <= 14.603               201
14.603 < x[i] <= 18.544               202
18.544 < x[i] <= 92.937               202
In [21]:
classifier(new_data, k=6, inplace=True)
In [22]:
classifier
Out[22]:
Quantiles

Lower            Upper              Count
=========================================
x[i] <=  2.691               471
2.691 < x[i] <=  5.069               471
5.069 < x[i] <=  7.297               470
7.297 < x[i] <=  9.736               471
9.736 < x[i] <= 13.736               470
13.736 < x[i] <= 92.937               471

So, this is what I mean by “responsive” classes. They should:

1. support updating/reuse w/ new data
2. support augmentation of initial/init-time options/parameters
3. provide __call__ methods that consistently either update or use.

In map classification, I think __call__ would be better suited to find_bin than update_bins. In spatial regression, I think __call__ would be better suited to predict than something else.

__call__ should never alias summary() methods, which probably belong in __repr__, anyway.

## March 24, 2016

### Upendra Kumar (Core Python)

#### Automated testing using unittest

Ref. from : https://pymotw.com/2/unittest/
unittest is a testing framework for Python. It supports :
1. Test automation
2. Sharing of setup and shutdown code for tests
3. Aggregation of tests into collections
4. Independence of tests from the reporting framework

To implement the above mentioned functionalities, it supports some OOP based concepts:

1. test fixture : the set-up work (initializing variables, loading everything needed) performed before the real test logic runs.
2. test case : the smallest unit of testing; it checks for a specific response to a particular set of inputs.
3. test suite : a collection of test cases and/or other test suites.
4. test runner : decides the execution order during testing, provides the interface for the user, and determines the format of the output generated after testing is completed.
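A minimal sketch of how these four concepts fit together (the class and test names are invented):

```python
import unittest

class MathTest(unittest.TestCase):
    def setUp(self):                      # test fixture: runs before each test
        self.values = [1, 2, 3]

    def test_sum(self):                   # a test case
        self.assertEqual(sum(self.values), 6)

    def test_max(self):                   # another test case
        self.assertEqual(max(self.values), 3)

suite = unittest.TestSuite()              # test suite: a collection of cases
suite.addTest(MathTest('test_sum'))
suite.addTest(MathTest('test_max'))

runner = unittest.TextTestRunner()        # test runner: executes and reports
result = runner.run(suite)
```

In day-to-day use you would usually let `unittest.main()` discover and run everything, but building the suite by hand makes the separation of roles explicit.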

## March 23, 2016

### Pranjal Agrawal (MyHDL)

First Post

This is the first post of the Leros development blog for Google Summer of Code 2016. Application ready, fingers crossed. Hopefully lots more to come soon!

### ljwolf (PySAL)

#### Yep, that’s an image view from within a terminal using libsixel

Yep, that’s an image view from within a terminal using libsixel.

This is something I’ve always wanted: a way to quickly dump scientific visualizations to a viewer within a CLI application. It looks like this does exactly that.

## March 22, 2016

### Upendra Kumar (Core Python)

#### My notes on extending Python with C/C++

1. Without third party tools: using C and C++
2. With third party tools:
   1. Cython
   2. cffi
   3. SWIG
   4. Numba

Extending Python with C or C++:

Prerequisites :

1. Programming in C
2. Basic knowledge of Python

Functions performed by extension modules in C and C++:

1. Implement new built-in object types
2. Can call C library functions and system calls

All user-visible entities defined by “Python.h” have the prefix Py or PY. “Python.h” includes stdio.h, string.h, errno.h and stdlib.h automatically.

static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    /* parse a single string argument into `command` */
    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;

    sts = system(command);
    return PyLong_FromLong(sts);
}

Here PyArg_ParseTuple() checks the argument types and converts them to C values.
It returns true (nonzero) if the arguments are of the right type, and false (zero)
if the argument list is invalid.
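To actually build such an extension, the standard approach at the time was a small distutils setup script. A sketch, assuming the C code above lives in a file named spammodule.c and the module is to be called spam (both names are my own choice for illustration):

```python
# Hypothetical setup.py for the spam extension module above.
from distutils.core import setup, Extension

spam = Extension('spam', sources=['spammodule.c'])

setup(name='spam',
      version='1.0',
      description='Demo extension wrapping system(3)',
      ext_modules=[spam])
```

Running `python setup.py build` compiles the extension; after installing it, `import spam; spam.system("ls")` would call through to the C function.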

## March 20, 2016

### mike1808 (ScrapingHub)

#### Welcome to My Blog!

Welcome to my blog!

## March 19, 2016

### aleks_ (Statsmodels)

#### Hello world!

Hello everyone,

this is the blog I set up for this year’s Google Summer of Code (GSoC). During this summer posts will show up here describing my coding experiences – but only in case of a successful application, so let’s hope for the best! : )

Aleks

## March 18, 2016

### Adhityaa Chandrasekar (coala)

#### Performance benchmark: C and Python

Hey everybody! Today I'll be doing a simple performance benchmark between Python and C.

I knew before starting that Python would be slower than C. It has every reason to be: it's an interpreted language, after all. But when I actually saw the results, I was blown away. I found C to be over 22 times faster!

A good way to test the speed of two languages is to make them compute the first N prime numbers, and for this I used the Sieve of Eratosthenes. The reason? It's a simple yet powerful algorithm that is very popular and frequently used, which makes it a good benchmarking workload. Let's dive into the code. [Github repository]

Here is the python code: main.py

import sys

MAX_N = 10000000

prime = [False] * MAX_N

i = 2
while i < MAX_N:
    sys.stdout.write(str(i) + " ")
    j = i * 2
    while j < MAX_N:
        prime[j] = True
        j += i
    i += 1
    while i < MAX_N and prime[i]:
        i += 1

And here is the C code: main.c

#include <stdio.h>

#define MAX_N 10000000

int prime[MAX_N];

int main() {
    int i = 2, j, k = 0;
    while(i < MAX_N) {
        printf("%d ", i);
        j = i * 2;
        while(j < MAX_N) {
            prime[j] = 1;
            j += i;
        }
        i++;
        while(i < MAX_N && prime[i])
            i++;
    }

    return 0;
}

As you may see, the two are almost identical in the steps used. But it's worthwhile to discuss the differences too:

• In Python, due to the lack of something analogous to C's #define, we have to resort to using a normal variable MAX_N. This might lead to slightly slower performance compared to the preprocessor directive.
• In Python, we use i += 1 instead of i++ like we do in C. I'm not too sure about the performance impact of either, but intuitively I feel i++ is faster, since processors may have dedicated instructions for it. Again, I'm unsure about this, but felt it was necessary to point out the difference.
• In Python, you may see prime = [False] * MAX_N compared to the C equivalent int prime[MAX_N]. I concede that this makes it slightly slower, but on further testing I found the impact is really negligible.

And with that out of the way, let's look at the performance!

$ gcc main.c
$ time ./a.out > output_c
./a.out > output_c  0.43s user 0.02s system 99% cpu 0.450 total
$ time python main.py > output_python
python main.py > output_python  9.54s user 0.08s system 100% cpu 9.611 total
$ diff output_c output_python
$

There you go! The Python code takes over 9 seconds to complete the task while C takes just 0.43 seconds! That's blazing fast when you consider that it just found all the primes under 10 million.

So there it is: while I absolutely love Python, it's simply not designed for high-performance tasks. (I'm not saying I'm the first one to discover this, but I had to find it out for myself.)

Until next time,
Adhityaa

### Shubham_Singh (italian mars society)

#### EUROPA INSTALLATION

europa-pso is a platform for AI planning, scheduling, constraint programming and optimization.

EUROPA installation:

1. Download EUROPA for your specific operating system from here. There is not much documentation about installing europa-pso; I have tried using the available documentation and successfully installed EUROPA on my system (operating system: Ubuntu 14.04). Try to follow these steps for an easier installation:
   • JDK -- sudo apt-get install openjdk-7-jdk
   • ANT -- sudo apt-get install ant
   • Python -- sudo apt-get install python (if you are using Ubuntu 14.04 or above, Python is pre-installed, so you can skip this step)
   • subversion -- sudo apt-get install subversion
   • SWIG -- sudo apt-get install swig
   • To install libantlr3c, follow these steps in sequence:
     1. In a terminal, run svn co http://europa-pso.googlecode.com/svn/ThirdParty/trunk plasma.ThirdParty — this will create the plasma.ThirdParty directory.
     2. cd plasma.ThirdParty
     3. Here you will find the libantlr3c-3.1.3.tar.bz2 archive; extract it in the current directory, i.e. inside plasma.ThirdParty.
     4. After extracting: cd libantlr3c-3.1.3
     5. Run ./configure ; make (on a 64-bit system, run ./configure --enable-64bit ; make instead). The output will not be exactly the same as in the image, but the last two lines should match.
     6. Run sudo make install
2. Now unzip the EUROPA zip file outside the plasma.ThirdParty directory. After downloading the appropriate EUROPA distribution for your system (available here), just unzip it and set the EUROPA_HOME environment variable. For example, assuming that you have the EUROPA distribution in your ~/tmp directory and want to install EUROPA in your ~/europa directory, using bash you would do (modify appropriately for your OS and shell):
   • mkdir ~/europa
   • cd ~/europa
   • unzip ~/tmp/europa-2.1.2-linux.zip
   • export EUROPA_HOME=~/europa
   • export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$EUROPA_HOME/lib # DYLD_LIBRARY_PATH on a Mac
   • export DYLD_BIND_AT_LAUNCH=YES # Only needed on Mac OS X
3. Create and build an example project:
   • $EUROPA_HOME/bin/makeproject Light ~
   • cp $EUROPA_HOME/examples/Light/*.nddl ~/Light
   • cp $EUROPA_HOME/examples/Light/*.bsh ~/Light
   • cd ~/Light
   • ant

The EUROPA GUI should appear.

The installation may sometimes produce some practically unavoidable Java errors, but if you follow the above steps sequentially you will get through the installation easily.
If you still get an error, have a doubt about the process, or find something wrong in the above steps, please comment below.

## March 16, 2016

### Kuldeep Singh (kivy)

#### Kivy: Let’s get started

Kivy is an Open Source, cross-platform Python framework for the development of applications that make use of innovative, multi-touch user interfaces.

## What’s so special about kivy?

With the same codebase, you can target Windows, OSX, Linux, Android and iOS (almost all the platforms). Also, Kivy is written in Python and Cython, based on OpenGL ES 2, supports various input devices and has an extensive widget library. All Kivy widgets are built with multitouch support, making it awesome for game development as well.

## Who’s developing it?

There’s a sweet, growing community around Kivy; it’s still small but really efficient. The core developers are these guys, and there are many more.

There are about 10 sister projects going on under Kivy Organisation, take a look and get your hands dirty by diving into their codebase.

## Getting Started

Install Kivy on your platform by following the instructions here.

Let’s try to create a hello world example. (Example taken from the kivy website)

from kivy.app import App
from kivy.uix.button import Button

class TestApp(App):
    def build(self):
        return Button(text='Hello World')

TestApp().run()

Save this program in a .py file and run it; you should see something like this.

That's your first application in Kivy, which will work on all the platforms.

Now try something fun, try to package your application for either of the platforms.

You can find everything in Kivy Documentation. (pdf)

## Found some bug?

Report it via their issue tracker.

## March 12, 2016

#### Pre-application period

What is GSOC? Google Summer of Code is a global program that offers students stipends to write code for open source projects. We have worked with the open source community to identify and fund exciting projects for the upcoming summer. (source: https://www.google-melange.com/gsoc/homepage/google/gsoc2015)

I’ve been in touch with coala for a few weeks now. It’s been one of my best professional experiences so far. They’re amazing people who helped me a lot during my trial, while I was getting some of my first experiences with the open source world.

Right now I’m trying hard to get my application right. It seems that they are counting a lot on GREAT applications, so I think this is my most important step in the process right now.

It took me a lot of time to find the right organization for me; I had been to a lot so far, and they were either cold or just unhelpful. But here, I received help from my first question until my last. It was a bit over a week from my first contribution, when I was struggling with typos, until they gave me write rights. A few days later, I started writing tutorials for them (see http://coala.readthedocs.org/en/latest/Getting_Involved/Newcomers.html and http://coala.readthedocs.org/en/latest/Users/Tutorials/Git_Help.html), just to give back to this organization what it gave me at the start: help. Now I am doing it all out of pleasure, the pleasure of developing professionally while loving what you do.

### mkatsimpris (MyHDL)

#### First Post

From this simple first post starts the fascinating ride through the exciting world of GSoC 2016. If I get accepted to participate in GSoC 2016, the organization I will be working with is called MyHDL; more details can be found at http://myhdl.org.

## March 11, 2016

### ljwolf (PySAL)

#### npr: North Carolina voters are likely to be confused when they...

npr:

North Carolina voters are likely to be confused when they arrive at the polling place on March 15th. In addition to presidential candidates, voters will see congressional primary candidates on the ballot.

But thanks to a federal court decision, the districts those candidates represent no longer exist and any voters in those races won’t count.

Thanks to three judges, two animal shapes and one hastily redrawn map of U.S. House seats, North Carolina politics have been thrown into chaos.

It started to go off the rails on February 5th when a panel of three federal judges determined that the boundaries of the state’s 1st and 12th congressional districts were drawn in such a way to concentrate African American voters and dilute their overall influence.

Coming just five weeks ahead of voting, there was no choice but to “Stop the current election, go back, redraw the lines,” said Josh Lawson, general counsel for the North Carolina State Board of Elections.

## North Carolina’s Congressional Primaries Are A Mess Because Of These Maps

GIF: Noah Veltman/WNYC and Alyson Hurt/NPR

### Adhityaa Chandrasekar (coala)

#### Shifting my blog to a static page

I recently learned that GitHub Pages supports only certain plugins, which means I won't be able to use amazing stuff like jekyll-archives and jekyll-timeago. So I'm shifting my entire blog to a static website (generated by Jekyll locally), unlike before, when I had GitHub build the site for me.

I also made a small script to automate the whole new post process for me. Here it is:

if [[ $# -eq 0 ]] ; then
    echo 'Please give the url ID.'
    exit 0
fi

gvim -f ./_posts/`date +%Y-%m-%d`-$1.md && jekyll build && cd _site && git add -A && git commit -m "Site update" && git push --force && cd .. && echo "Done!"

Just save it as newpost in the Jekyll root directory, run it with ./newpost shift-to-static-webpage for example, and write your post in gvim. When you finish, it automatically force-pushes to master (it's alright, it's just my blog).

I know it's ugly, but hey, it works.

Also I installed vim-instant-markdown with a slight modification to instant-markdown-d: instead of Chrome opening a new tab, I created a simple Chrome extension to open the markdown preview in a window. It's all working perfectly!

## March 10, 2016

### Adhityaa Chandrasekar (coala)

#### Hello, world!

Hi everyone, Adhityaa here! This is my first attempt at making a personal blog. I searched around and decided to go with Jekyll as it's simple and beautiful. I'm still learning my way around this, but I'll get there eventually.

Hope you enjoy your stay here! :)

## March 07, 2016

### Riddhish Bhalodia (dipy)

#### Code Mode On! GSOC 2016 with DIPY

Hi! I am Riddhish. I am very pleased to announce that I will be working on a denoising and segmentation project for DIPY (Python Software Foundation), which is a free and open source platform for computational neuroimaging, specifically dealing with diffusion magnetic resonance imaging (dMRI), as a part of Google Summer of Code 2016. I will be mentored by Eleftherios Garyfallidis, Omar Ocegueda, Rafael Neto Henriques and Julio Villalon.

## What is Google Summer of Code?

Google Summer of Code (GSoC) is a yearly internship program by Google that helps open source communities reach out to student contributors. Organisations give out projects, and students apply for those projects; once selected, the students code throughout the summer to finish the project. This is my first time participating in GSoC and I hope to have a lot of fun 😀

## Project Overview

If you do not get the cartoon, a little bit about diffusion MRI here may help 😀

### Denoising by Local PCA

My project basically deals with denoising diffusion MRI images, which are corrupted by field noise. Currently DIPY uses a non-local means approach for denoising, which gives results as shown below

The method, however, has limited usage, as it needs an estimate of the noise variance of the signal, which is often a bit troublesome; it also does not make full use of the directional information in dMRI datasets. This project proposes a more robust and efficient method for automatically denoising diffusion MRI and structural MR datasets, using Local PCA. Along with an accurate implementation of L-PCA and its adaptation to Rician noise, the project will also aim at optimising the implementation of L-PCA using Cython. I will be following the research paper Diffusion Weighted Image Denoising Using Overcomplete Local PCA [coupe2013].

Apart from this, I have been exploring the use of adaptive denoising for DIPY and have started contributing towards that here; I will describe this method as well once I make the formal PR.

### Brain Extraction

After implementing L-PCA, a method for robust brain extraction needs to be developed. DIPY’s median-Otsu-based implementation is known not to work so well with non-echo-planar diffusion imaging (non-EPI dMRI) data. There are a few possible ways to improve this: one is to generate labels and use a weighted median Otsu. Another idea involves some version of patch-based segmentation, using an image library constructed from previously annotated images as a reference for subsequent extractions.

## More to come…

I will maintain this blog weekly and keep things updated about the progress. Till next week,

~Riddhish

## March 05, 2016

### srivatsan_r (MyHDL)

#### MyHDL Example

I was learning MyHDL recently and wanted to try out something in MyHDL. So, I modeled a Serial Adder using MyHDL.

Personally, I like MyHDL very much as it is very easy to learn, and Python’s power and clarity make MyHDL an ideal solution for high-level modeling.

The circuit was modeled using two shift registers and a full adder.

There are two registers R1 and R2 which are parallel loaded with two numbers, inside the test bench. The final sum R1 + R2 is stored in R2.

At the positive edge of the clock the shift registers shift to right by one bit.

CIRCUIT CONSTRUCTION

The construction implemented in my model is the same as the one given above. R2 is the shift register at the top and R1 is the one at the bottom.

R1’s Least Significant Bit (LSB) output is connected back to the input bit of the shift register R1, so that the number in R1 is restored into R1 by the end of the addition; this way the data in R1 is not lost. R1’s LSB is also connected as an input to the full adder.

R2’s LSB is connected as an input to the full adder. The sum of the two numbers is found out by adding bit by bit of the two numbers from their LSBs.

The output of the full adder is connected to the input of the shift register R2. By doing so, the output sum will be stored in R2.

The carry out obtained by the full adder is given as the carry in for the full adder in the next positive edge of the clock.
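The shift-and-add behaviour described above can be sketched in plain Python (this shows only the logic, not the MyHDL model; the function names are mine):

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum bit, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def serial_add(x, y, nbits):
    """Add x and y one bit per 'clock' step, LSB first, carry fed back."""
    r1, r2 = x, y        # the two shift registers
    result = 0
    carry = 0
    for i in range(nbits):
        s, carry = full_adder(r1 & 1, r2 & 1, carry)
        result |= s << i  # sum bits shift into R2 one per step
        r1 >>= 1          # both registers shift right each clock edge
        r2 >>= 1
    return result

print(serial_add(11, 6, 8))  # 17
```

Each loop iteration plays the role of one positive clock edge: the registers shift right, the full adder consumes the two LSBs, and the carry is held over to the next step.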

The Python code written using the MyHDL module can also be converted to Verilog or VHDL using MyHDL’s functions. The code written has a test bench in it. The conversion to Verilog/VHDL should be done only for the main module, not for the test bench.

The Project Source can be found here.

### ghoshbishakh (dipy)

#### Making rtl8723be Wireless Adapter Work In Linux

Till last year, whenever I encountered a laptop with WiFi not working in Linux, it was a Broadcom wireless adapter.

But this year things are different. Nearly all new HP laptops have problems with WiFi in Linux (Ubuntu, Arch, Manjaro). And surprisingly, the problem is not that the WiFi driver doesn't work at all. It is worse: the received signal strength is so weak that it is absolutely unusable.

A quick lspci | grep Wireless shows the Wireless Adapter in your system. In my case the device causing problem was a Realtek:

RTL8723be

After scanning through numerous threads I finally found the solution in this github issue:

https://github.com/lwfinger/rtlwifi_new/issues/88

## So here is the step by step procedure to solve the issue:

First, make sure the dependencies for building the driver are installed:

In Ubuntu:

In Arch:

Now clone the rtlwifi_new repository:

Checkout the branch rock.new_btcoex

Now build and install the driver

Reboot the system.

Now disable and enable the driver with proper parameters.

NOTE: If this does not work try: sudo modprobe -v rtl8723be ant_sel=1

### srivatsan_r (MyHDL)

#### Virtual Makeup

I was just sitting at my home and whiling away the time during my summer holidays and then I thought why not try something with python.

Why Python? It is simply because I just love python for its neat flawless syntax.

I tried doing color transfer onto an image, i.e. changing the color of an image without affecting the texture present in the image. For doing this I had to use LAB color space.

LAB color space is specially designed to measure colors in the same way human brain interprets them. So, I found this color space to be very apt for this project. I could just change the L, A and B values of each pixel by a constant value and get done with the color transfer.

For more info on LAB color space.

ALGORITHM :

Let ( L1, A1, B1 ) = mean of (L ), mean of(A ), mean of (B ) of all the pixels where color has to be changed and let (L2, A2, B2) be the LAB values of the final color to which the image has to be changed. Then for each pixel of the image,

L has to be increased by (L2 – L1)

A has to be increased by (A2 – A1)

B has to be increased by (B2 – B1)

So, essentially I’m just shifting the mean of each component of the image’s LAB color space to the target color’s LAB value. By doing so, our eyes will perceive it in the same way as they did the target color.
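The algorithm above amounts to a per-channel mean shift. A sketch on a bare array of LAB pixels (my own illustration; the original project used OpenCV for the LAB conversion and masking, which is omitted here):

```python
import numpy as np

def transfer_color(lab_pixels, target_lab):
    """Shift each LAB channel so its mean becomes the target color's value.

    lab_pixels: (n, 3) array of the region's L, A, B values.
    target_lab: the (L2, A2, B2) of the desired color.
    """
    # (L2 - L1, A2 - A1, B2 - B1), added to every pixel
    shift = np.asarray(target_lab) - lab_pixels.mean(axis=0)
    return lab_pixels + shift
```

Because every pixel moves by the same constant, pixel-to-pixel differences (the texture) are untouched while the perceived color changes.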

So, this will change the color of the image without affecting the texture. Then I did the same color transfer with an image of a face, where I applied the color transfer to just the cheeks.

Then, I added gaussian blur from opencv python module to smoothen the edges of the region where color transfer was done. The final result obtained is shown below.

Before

After

Then, I tried the same for applying lipstick and nail polish.

The source code for the project can be found here.

### ghoshbishakh (dipy)

#### Building VTK with python bindings in linux (arch)

I came across VTK while building the docs for DIPY and what I needed was the python bindings.

I use arch linux so installing from pacman is simple:

But this fails to install the python bindings properly and when you try:

it throws the error:

That leaves no other way except to build VTK from source including the python wrapper, for the python version you want to use vtk in.

## So here is the step by step procedure:

From vtk website download the latest source tarball. For me it is VTK-7.0.0.tar.gz then extract it:

Now configure cmake properly for building python wrappers:

This will give you an interface like the one shown. Now use your arrow keys to select the option you want to change and press enter to change its value.

Toggle VTK_WRAP_PYTHON on.

Toggle VTK_WRAP_TCL on.

Change CMAKE_INSTALL_PREFIX to /usr

Change VTK_PYTHON_VERSION to 2.7 (or the version of python you want to use vtk in)

Now press [c] to configure

Then press [g] to generate and exit

Note: Sometimes you need to press c and g again.

Now run:

This will create a directory: Wrapping/Python

Now install the python bindings:

Hopefully that should install vtk properly.

To check, in python run:

This should give something like:

‘/usr/lib/python2.7/site-packages/vtk/__init__.pyc’
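The snippet for that check was lost from the post, but it was presumably of the following form: import the module and print where Python found it. Shown here with a stdlib module so it runs anywhere; substitute vtk for json to verify the VTK install:

```python
# Print the file Python imported a module from; if the VTK bindings
# installed correctly, `import vtk` would report a path like the one above.
import json  # stand-in for vtk so this sketch runs without VTK
print(json.__file__)
```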

#### Started Using Jekyll

So started using jekyll for my blog!

And syntax highlighting yay!!

## Why?

Since I started using linux I have been a huge fan of the open source initiative.
And why wouldn't I? I can use some of the most sophisticated, cutting edge, well designed software for basically no cost. On top of that you get the overwhelmingly helpful communities for each project.

Everyone knows that the best way to become better at something is to practice. When you look for tips on the internet about how to learn some programming language or framework, you get the same reply: "Try working on a project". The problem with this suggestion is that if you have close to zero experience, it's very hard to cook up something on your own. Ideas are not that easy to come by either. Working on an existing project seemed like a feasible idea.

## My story on finding coala.

So I decided that I wanted to start contributing to open source projects. If you are in a college environment, you have undoubtedly heard about Google Summer of Code. I wanted to participate (still want, btw), so I looked up some popular open source projects that get accepted every year. Under the GNOME project ideas proposed for 2016, I found contact details for a possible mentor (later it turned out that he is the founder of coala). We will call him sils, like his github username. sils invited me to the community-wide gitter channel and advised me to introduce myself as a newcomer.

## What is coala?

coala is an open source code analysis tool.

Easy to use COde AnaLysis Application - yes! For all languages! http://coala-analyzer.org/

Its analysis modules are called Bears and they are the reason coala supports (or can support) all languages. The best part about coala is that it's very easy to set up in an existing project. In its current version it is a CLI tool for Linux, OSX and Windows which I prefer because I work in the Linux CLI, but it will have a GUI soon enough. It already has plugins for popular editors such as atom and sublime.

Interestingly coala hasn't participated in Gsoc as an organization by itself but it mentored 2 students under GNOME's umbrella last year (hence me finding coala through GNOME's ideas page).

## Getting started

Like a lot of projects, coala has a github page used to host the repo, track issues, etc. In coala the issues have a difficulty level, so every contributor (even complete noobs like myself) has somewhere to start. Just go over the newcomer issues, pick one (usually they are trivial bugs, but if you have trouble someone will help you for sure on the gitter channel) and proceed to solve it. This way you learn about the coala conventions on commit messages and so on.

It's important to note that the newcomer bugs are intentionally left unsolved so that new people can learn how to contribute to coala.

## What do you get from contributing?

As I wrote above you get experience, in my case this was very much needed. The most important aspect is that you learn by doing, and as a motivation bonus whatever you do will be used in a real life project. You will be given constructive feedback on your contributions, you will be helped when in need, you will be asked to give constructive feedback and help others in need.

What if you already have experience? The second most important aspect of contributing to open source projects (generally, not only coala) is that you get exposure. Your contribution will be there for everyone to see and review months/years after it was accepted. It is a way to demonstrate what you can achieve. Also you will meet new people that have different opinions than yours for the same matters.

It's a win win in both scenarios (noob/veteran).

## Conclusion

This was my experience working with the coala community in the last week. I hope I encouraged you to participate and contribute to some open source project, if that is coala even better. And there goes my first ever blog post.

## January 28, 2016

### Pulkit Goyal (Mercurial)

#### Introduction

Hello readers, as this is an introductory post, I must introduce myself a bit. I am Pulkit Goyal, a sophomore in an engineering college in India, pursuing a bachelor's in Computer Science. I am a good competitive programmer and a data science enthusiast. I love teaching, solving problems and optimizing algorithms, and I have a keen interest in the field of data science, especially machine learning. I believe in a quote by Margaret Fuller that

## October 17, 2015

### kaichogami (mne-python)

#### Huffman loss-less compression

Hello everyone!
This time I wrote scripts to compress text files based on the Huffman compression algorithm. And yes, it really compresses text files, and we all know how many applications compression has in the real world. As always, every piece of code that I write here will be in Python and written in a very easy way. I hope it's clear enough for you all to understand!

Firstly, let's understand the intuition behind the algorithm. Every data type that we use in a language, for example int, has a certain size associated with it. For example, int generally has a size of 4, double a size of 8, a character a size of 1, and so on. These numerical values indicate the number of bytes each type takes up in memory. Each byte is 8 bits, so an int data type takes 32 bits of memory. As a reminder, a bit is just an on or off signal for the computer. Everything that we use in computers is just a combination of 1s (on) and 0s (off).

Text that we use is typically in the ASCII encoding format; for more details you can refer to the Wikipedia page. In the Huffman algorithm we exploit two facts:

• ASCII format contains 128 different characters.
• Frequency of characters in a given text varies. Sometimes a lot.

Let's say a text is “zippy hippy deny meni”. Clearly this text contains a lot of ‘p’. The ASCII code for ‘p’ is 112, whose binary representation is ‘01110000’. So each occurrence of ‘p’ in the text uses one byte of memory. This utilizes a lot of space unnecessarily. What we do is take the frequency of all the characters of the text and assign our own binary representation for each character. The more occurrences of a character in the text, the shorter its binary representation. Pretty cool, isn't it?

How is the idea implemented? That is the main question. In my implementation of the above idea, I have split it into two modules: one for encoding the data, the other for decoding. It uses the binary tree data structure, so you might want to brush up on it in case you have forgotten.

Lets begin with encoding.

Algorithm

1. For each character, create a single-node binary tree. The key is the character and the value of the node is the frequency with which the character occurs in the text.
2. Store these trees in descending order of value (frequency) in a list, so the two least frequent trees sit at the end.
3. Repeatedly pop the last two trees and combine them into one, with the parent's value being the sum of their frequencies and its left and right children being the original trees.
4. Re-insert the combined tree so the list stays sorted.
5. Repeat steps 3 and 4 till only one tree is left.

This creates the so-called encoding tree. If we think of a left edge as 0 and a right edge as 1 and traverse from the root, the path to each leaf gives the new binary code for a character (note that every leaf is a node whose key is a character). Nodes with larger frequency values have shorter paths. This forms the basis of encoding our data into its new representation.
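For readers who want to see steps 1–5 and the 0/1 walk end to end, here is a compact sketch using heapq (a min-heap) in place of the manually re-sorted list described above; plain tuples stand in for the post's tree-node class, so all names here are illustrative:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table: frequent characters get shorter codes."""
    freq = Counter(text)
    # Heap entries are (frequency, tie-breaker, tree); a tree is either a
    # character (leaf) or a (left, right) tuple (internal node).
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate case: a single distinct character still needs a 1-bit code.
        return {heap[0][2]: '0'}
    tiebreak = len(heap)
    while len(heap) > 1:
        # Steps 3-4: merge the two least frequent trees.
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tiebreak, (t1, t2)))
        tiebreak += 1
    codes = {}
    def walk(tree, path):
        # Left edge appends '0', right edge appends '1'; leaves hold chars.
        if isinstance(tree, tuple):
            walk(tree[0], path + '0')
            walk(tree[1], path + '1')
        else:
            codes[tree] = path
    walk(heap[0][2], '')
    return codes

codes = huffman_codes("zippy hippy deny meni")
```

On the example text, 'p' (the most frequent character) receives a code no longer than any other, which is exactly the property described above, and no code is a prefix of another.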

The next part of the challenge is to find the path to each leaf node and save it in a dictionary (hash map). If you think about it a bit, you will notice that the tree is a full binary tree: every node is either a leaf or has two children. Also, as noted earlier, every leaf node holds a character, and the path from the root to a particular leaf is always unique; we represent it with 0s and 1s as mentioned above. The reason for using a tree is that since every character's encoding can be of arbitrary length, we cannot split the bit string into fixed-length chunks and parse it. Walking the binary tree, we know a code is supposed to end when we reach a leaf node.
To find the paths of the leaf nodes, we write a recursive function.

    def _find_path(self, tree, path):
        # A leaf stores a character in its key; return the pair [char, code].
        if type(tree.key) == str:
            return [tree.key, ''.join(path)]

        # Internal node: go left with a '0', right with a '1'.
        left = self._find_path(tree.left, path + '0')
        right = self._find_path(tree.right, path + '1')

        ans = []
        ans.extend(left)
        ans.extend(right)

        return ans

Take a minute to look and understand it. These are best explained when you think about it!😀

Now after finding the paths of each character and storing it in a list, we can easily convert it into a dictionary.

    def _create_dict(self, ans):
        # ans alternates character, code, character, code, ...
        temp_dict = {}
        for x in xrange(0, len(ans), 2):
            temp_dict[ans[x]] = ans[x + 1]

        return temp_dict

Following the above recursive function, we get a list with a character at each even index and its binary code at the following odd index. We simply save these pairs in a dictionary.

The next part is to convert the original string that we wish to compress into a string of 0s and 1s using the dictionary we just created. These 0s and 1s then become the bits of the output bytes, which are written to the file.
To write bits into bytes, we can use the bitarray library for Python. Since the implementation will differ from user to user, I will not go into details and leave it as a task for you. If anyone gets stuck anywhere, please feel free to contact me!
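If you'd rather avoid the external bitarray dependency, the packing and unpacking can be sketched with the standard library alone; the helper names and the one-byte pad header are my own convention here, not the post's file format:

```python
def pack_bits(bitstring):
    # Pad to a whole number of bytes; store the pad length in a header byte
    # so the decoder knows how many trailing bits to ignore.
    pad = (8 - len(bitstring) % 8) % 8
    bitstring += '0' * pad
    data = bytearray([pad])
    for i in range(0, len(bitstring), 8):
        data.append(int(bitstring[i:i + 8], 2))
    return bytes(data)

def unpack_bits(data):
    # Reverse of pack_bits: read the pad header, then expand each byte.
    pad = data[0]
    bits = ''.join(format(byte, '08b') for byte in data[1:])
    return bits[:len(bits) - pad] if pad else bits
```

A round trip (`unpack_bits(pack_bits(s)) == s`) is a quick sanity check for any scheme like this.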

Decoding

Decoding involves retrieving the original text from the compressed file using a meta file (describing the tree we created above) as a source. We could somehow reuse the created tree object directly in the decode program, or we can save the dictionary values in a file and re-create the tree. The former would occupy a lot of space, since the tree is a custom data type. The latter involves re-creating the tree, but since building it is fast for a small input (note that there are few distinct characters to encode, mostly letters and digits), we use the second method.

The challenge is to read individual bits from a byte: Python's file.read() method yields whole bytes only. Here we take the help of bitwise operators, particularly the right shift operator (>>).

    def _get_bits(self, f):
        # Yield the bits of every byte in the file, least significant first.
        byte = (ord(x) for x in f.read())
        for x in byte:
            for i in xrange(8):
                yield (x >> i) & 1

The right shift operator shifts the bits of x right by i positions, discarding the low bits. Say the number is 5, binary 101: 5 >> 1 gives 10 in binary, i.e. 2. Performing an & with 1 then keeps only the lowest bit: 10 & 1 gives 0, so we have successfully extracted the second-lowest bit of the original number. We repeat this for i from 0 to 7 to extract all 8 bits of each byte. Here yield is a Python keyword similar to return, but after returning a value the function resumes from that point on the next call, until the loop ends.
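To see the shift-and-mask trick on a concrete byte, the same extraction can be written as a plain function (using range so it also runs on Python 3):

```python
def bits_of_byte(x):
    # The 8 bits of one byte, least significant first,
    # exactly the order _get_bits yields them in.
    return [(x >> i) & 1 for i in range(8)]
```

For x = 5 (binary 00000101) this gives 1, 0, 1 followed by five 0s.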

After extracting the bits, we can construct the tree from the meta file, and traverse the tree till we reach a leaf node and write the value in a string to give the final output. Here is the code snippet to traverse the tree.

    # temp starts at the root of the re-created tree; original collects the text.
    temp = self.meta
    original = ''
    length = len(self.binary_bits)
    index = 0

    while index < length:
        if temp.key != None:
            # Reached a leaf: emit its character and restart from the root.
            original += temp.key
            temp = self.meta
            continue

        if self.binary_bits[index] == '1':
            temp = temp.right
        else:
            temp = temp.left

        index += 1

    # Emit the final character if the bit string ended exactly on a leaf.
    if temp.key != None:
        original += temp.key

That's about it. The rest involves making it more functional by wrapping it in a class. Here I wrote down the main algorithm and design of the Huffman implementation. If you want to see the whole working code, you can check my GitHub repo and use it.

I hope you learnt something new, at least 10% of what I said. If you found it interesting, go ahead and try it yourself. You can share your implementation here. Thank you for reading, and as always I am open to questions and feedback!

## September 18, 2015

### Preetwinder (ScrapingHub)

#### First Post

Hello! This is my first post on my blog. The blog is hosted on GitHub Pages and was generated using Jekyll. The theme used is So Simple.

First Post was originally published by preetwinder at preetwinder on September 18, 2015.

## September 12, 2015

### kaichogami (mne-python)

Hello everyone!😀
I hope you are all doing well. Whenever I write, I feel like I wrote yesterday! Time passes so quickly!

Just now I finished working on a manga downloader. What it does is let you download any number of manga chapters and save them in your folder. It's especially useful if you like reading manga in one go, get only limited internet connectivity, or happen to be a collector of manga!😛
Its usage is fairly simple. First download this zip file and extract the contents, then run the “main.py” file. You provide arguments on the command line to download, for example “python main.py fairy_tail 2 3”. This will download Fairy Tail manga's chapters 2 and 3 into the download folder. You will have to replace spaces with “_” to make it work. Like it?

This also comes with a spell corrector, which is much needed here. We are not all Japanese, nor do we know Japanese, so it's highly likely that you will make a mistake while typing manga names such as “Karate Shoukoushi Kohinata Kairyuu”. The spell checker included in the software loads a manga list with the names of a huge collection of manga. If the given name is not in the list, it uses the edit distance algorithm to find the nearest string to the user's input. The running time is O(m*n), which is slow in theory, but not so slow considering our string lengths will almost never exceed even 70.
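The O(m*n) edit distance mentioned here is the classic Levenshtein dynamic program; a minimal sketch (this is the standard textbook algorithm, not necessarily the exact code used in the downloader):

```python
def edit_distance(a, b):
    # Classic dynamic-programming edit distance in O(m*n) time,
    # keeping only the previous row to save memory.
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution (or match)
        prev = cur
    return prev[n]
```

The nearest-name lookup is then just `min(manga_list, key=lambda name: edit_distance(query, name))`.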

After correcting the spelling, urllib2 is used to download the images. First the HTML response is captured, then the image link is extracted from the HTML. I have used a very naive logic for this: I simply look for the first “.jpg” in the string and take the link associated with it. There may be several “.jpg” links, but it chooses the first one. I have noticed a bug with this logic: it starts downloading from the 2nd page, probably because the first “.jpg” link leads to the second page. I have yet to confirm this. I also had to set a User-Agent header, as the plain programmatic way of opening the website and saving the image was not working; it always saved a corrupted file, and I could not find a solution to that. Most likely the error resides somewhere in my use of the “urlretrieve” method.
Downloading is done until the end of the chapter is reached; we know we have reached the end of a chapter when the server returns a 404 error. Unfortunately I do not have an elegant way of knowing whether a certain chapter is the last one.
Everything is stored in a folder named “download” in the main directory. It first creates a directory named after the manga and saves each chapter in a different subdirectory.

Make sure to have a look at my manga downloader.
Thank you everyone for reading!

Cheers!

## September 03, 2015

### kaichogami (mne-python)

#### Coding Competition

Hello everyone! I hope you are all doing good!

On 01/08/2015 our college ACM chapter held a coding competition, with me and two other friends, Ashutosh and Sharuya, organizing the event. The problem set mostly consisted of algorithm-based questions, although we were nice enough not to put in any tough algorithm questions. If you want to take a look at the questions, you can download them here.
The questions are not so easy and will definitely make you think. The first question is really easy, and the second one is also okay. The third one is tough: you will have to think a lot to come up with a recursive formula. The fourth question is a harder version of the Tower of Hanoi. The last question is pure thinking, but will require a lot of patience just to read it😛

It was the first time I organised an event. I was surprised that it went so well and smoothly; there was negligible trouble anywhere. Perhaps that is a bad thing, considering organizing something = getting into trouble and solving it smartly. Well, I hope that happens next time. And of course the event was fun!😀

A coding competition is a good way to improve your logical thinking as well as your coding skills, although in the vast field of computer science it's just a small part. That being said, solving questions in online competitions is fun, so take part in them if you enjoy it; otherwise it will just become a burden for you.
Lastly, thank you for reading! This was not as informative as last time, forgive me for that!

## August 19, 2015

### Shridhar Mishra (italian mars society)

#### Finals

The final model of the project is in place and the EUROPA planner is working the way it's supposed to.
The code in the repository is in working condition and has the default NDDL plan on it, which moves the rover from the base to a rock and collects a sample.
Integration with the Husky rover is underway and the code is being wrapped up for the final submission.

Shridhar

## August 06, 2015

### ranveeraggarwal (dipy)

#### GSoC ’15: A Summary

It seems like yesterday when I was an open source newbie on IRC, equipped with the knowledge of programming in C++ learnt through university courses. Little did I know that writing programs is just the prologue of the process called software engineering. Unaware of things like build systems, good practices, and writing code that doesn't just run on my machine but on the machines of all those affected by it, I started the summer struggling through CMake files and compiler errors, and understanding code that was written well, just not by me.

Step 1 was packaging software and publishing it on Launchpad. My mentor, Jonathan Riddell, taught me this process by going through instructions step by step, each time giving me knowledge about tools that I wasn't aware of. After a lot of failed attempts at sending patches upstream, I finally managed to get PackageKit and PackageKit-Qt5 pushed to Launchpad, and they built successfully. This came first because these two packages would be required by everything that followed, and it was a success because I didn't face any problems in the subsequent parts of the project. The packages can be found here. I have also described the sequence of steps learnt in the process in an earlier blog post.

The next part of my summer was spent in understanding KDE terminology, how KDE software works, how to make KDE software work (pun intended), and understanding PackageKit by pinging a lot of people on IRC. After making a compilation of KDE documentation for myself and playing around with Frameworks 5 and Qt, I started working on making an application that would install a given package via PackageKit. This involved understanding the PackageKit API and also PackageKit-Qt, a Qt Wrapper for PackageKit. Building this application took more time than was estimated, but at the end of this exercise, I was pretty much well versed on using PackageKit and building a Frameworks application. This application has been put on KDE’s git repositories and would be helpful to anyone who’d want to do this exercise in the future.

Until this point, I was working on code written by myself. Now was the time to get into a real working application. This was Dolphin. I spent some time experimenting with Dolphin, and working out on how to make it install Samba. Later realizing I had set out on the wrong path, the real place to work it out was KDE-Network-Filesharing. Now, Network-Filesharing wasn’t on KF5 yet and this seemed like a good opportunity to learn some porting, the code for Filesharing not being too large in volume. So, after spending a lot of time on some well written blog posts by people who had done it previously and archived mailing lists (those are real life savers) and also the porting scripts, I was able to successfully port Network-Filesharing to KF5 and it’s now on the frameworks branch. Though it still needs kdelibs4 support, most of it is on KF5 and uses Qt5.

Next up was the task of using what I had learnt in the initial phases (installing software) and using it to do some actual work. And so, after getting a good hang of the Filesharing code, I managed to make it PackageKit compliant and now it installs Samba via PackageKit-Qt5.

Next, I took up Plasma-Desktop and package installs for KCMs, specifically access and locale.

First, I took up KCM access. Now KCM access has this option of enabling a screen reader. The problem? The option stays there even if the screen reader isn’t installed. So, the idea was to have that option only when a screen reader is installed, and if it isn’t, disable that option and have a button to install screen reader. Again, some Qt magic and using my past implementation of PackageKit, this was done – and now, it works as expected.

Next up was K3b. Now, K3b needs some codecs which need an interface for installing. Again, this has been implemented using PackageKit, but the UI part is on the todo list. Thereafter I started working on locale (now known as translations). Here, instead of using PackageKit, the idea was to use libKubuntu, which is a more stable way of doing it, and had been done previously. Yet again, this has been partly implemented, and I hope to complete its implementation soon (this took some time give that this wasn’t the usual PackageKit implementation).

And this pretty much sums up the summer. The most challenging part of it all? Working out how each application works, each programmed differently by different people. The most fun and interesting part? The same. Thanks to exposure to a variety of codebases, I can now work on most KDE applications easily, and the learning curve won't be as steep.

I am indebted to my mentor Jonathan Riddell and all the people on #kde-devel, #packagekit and #kubuntu-devel without the help of whom I would still have been stuck on compiling my first codebase. It’s an amazing community and they are really very helpful. Even the easiest of my doubts were met by great enthusiasm and not once was I not given a response for an issue I faced. This is something that is not very common on IRC. Hats off to the KDE community for that! Google Summer of Code ’15 has been an amazing experience, and I will surely stick around the community as a contributor for as long as would be possible :)

## July 02, 2015

### Shridhar Mishra (italian mars society)

#### Mid - Term Post.

Now that my exams are over I can work with full efficiency on the project.
The current status of my project looks something like this.

Things done:

• Planner in place.
• Basic documentation update of EUROPA's internal workings.
• Scrapped the Pygame simulation of EUROPA.

Things I am working on right now:

• Integrating Siddhant's battery level indicator from the Husky rover diagnostics with the planner, for a more realistic model.
• Fetching things from and posting things to the PyTango server (yet to bring it to a satisfactory level of working).

Things planned for the future:

• Integrate more devices.
• Improve docs.

## June 20, 2015

### Shridhar Mishra (italian mars society)

#### Update! @20/06/2015

Things done:

• Basic code structure of battery.nddl has been set up.
• PlannerConfig.xml is in place.
• pyEUROPA working on the Docker image.

Things to do:

• Test the current code with pyEUROPA.
• Document the working and other functions of pyEUROPA (priority).
• Remove the Arrow server code from the existing model.
• Remove the Pygame simulation and place the model for real-life testing with the Husky rover.
• Plan and integrate more devices for planning.

## June 18, 2015

### ranveeraggarwal (dipy)

#### JEE: Making a Choice

Congratulations! You’ve successfully managed to steer past lakhs of people like you, and you’ve been given the opportunity to choose your path for your life ahead – something that the lakhs of others like you wouldn’t get. If you feel that this is your hard work, you have a right to be proud of yourself. If not, and you feel this was a fluke, then just imagine! Every event that has ever taken place on this planet has led to this moment. So if you ask yourself whether you deserve to be more opportune than others, well yes you do.
But oh wait! You’re throwing away this opportunity? You’re listening to that random uncle/aunty who claim to know everything only because their sons/daughters are making it large at TCS? Huh. What a waste! Please don’t do that.
Now is the time to decide. This decision of yours will influence your life ahead. 20 years down the line, if you look back, everything will boil down to this tiny decision. Are you still going to let Mr. And Mrs. Sharma take this decision for you? I hope not.

### Making a Decision

It’s not that you cannot make a decision. It’s just that you don’t know how to. To make an informed choice, you need to first decide on the parameters. What is more important to you? Is it the so-called packages that Google/Facebook offers or is it the thrill of seeing a spacecraft designed by you land on the moon? So, parameters. I personally feel that while making this choice, you should reflect on your life till now. You must be aged somewhere near 18. In the last 18 years of your life you must have developed interests. Were you one of those people who used to play for endless hours with LEGOs and try and build something new every time? Or maybe you were one of those who, in their 5th grade made a speed boat using a soft-drink bottle and a DC motor. See, things seem clearer already.

### Regarding Packages

*Drumroll* the moment many of you have been waiting for.
Yes. I won’t deny the fact that Computer Science and Engineering is one branch that can ensure that you have a decent pay right when you graduate out of college. But, 10 years hence, the salary package of a guy/girl who graduated as a Chemical Engineer (for example) would be the same as you, if not higher. However, he/she would be doing what he/she actually wanted to, unlike you, who gave a higher preference to money then. He/She will have enjoyed the last 10 years, and you’d have timed in and out with your pass card in an office you know you don’t care about. Again, I’m not saying Computer Science is bad, just that it’s not a field that everyone would enjoy.

### My Experience with Computer Science

Three years ago, I made a choice. Do I regret it? Not in the slightest. I wanted to work with software before it became a fad, and so I went for it.

My preconception? I’ll get to make games and hack the crap out of NSA’s computers. Did I do so? No. Because no one told me that, even if it sounds so, Computer Science is not all about that. It’s more maths than electronics; it’s just raw application of your brain. Some of the greatest advancements in the field of computer science will seem very intuitive to you. “Hah! Ismein kya hai?” (“Hah! What’s so hard about this?”) is what you’ll think on reading up an algorithm that took years to come up with. Computer Science is about understanding computers. It’s about learning to talk, learning to think, learning to write in their way and teaching them your own ways. Writing code is very similar to writing a novel, except that instead of getting your message out to the readers, you need to make the computer understand what you think. And while humans are smart, a computer is an idiot.

Even though it seems like the whole life of a computer scientist revolves around a single machine and that the field is very narrow, it isn’t. The number of mutually exclusive areas in computer science is huge in number, beyond the scope of this article.

Computer Science at IIT Bombay is fun, and competitive and grueling at the same time. While you’ll be saved from the agony of using scary looking machines and turning that block of metal into a cylindrical shaft, you’ll be welcomed by long (and at times painful) assignments, 10-hour lab(s) (You’ll experience at least one of those), and the pain of having to (at times) put in some extra hours while people from (some) other departments are off for a trip to Goa. Otherwise, you’ll have more projects in your CV than many other branches, you’ll learn to know what online presence means and you can become a rock star even if you haven’t picked up a guitar all your life (you’ll get to know what this means later).

So, make a choice that you won’t regret. The four years ahead of you are going to be nothing short of a roller coaster ride. Which ride you want to take, you decide.

## June 09, 2015

### Shridhar Mishra (italian mars society)

#### Coding in full swing.

OK, so all the installations are over after a bit of a hassle while installing EUROPA-pso; all the other installations, like PyTango and pyEUROPA, went well.

Since I had to face a lot of problems installing EUROPA on a 64-bit Ubuntu 14.10 machine, I have decided to write a stepwise procedure for installing it, so that it can be reproduced if required.

These steps have to be followed in this specific order for a successful installation, or it's almost inevitable that you'll get some weird Java errors.

Prerequisites:

• JDK -- sudo apt-get install openjdk-7-jdk
• ANT -- sudo apt-get install ant
• Python -- sudo apt-get install python
• Subversion -- sudo apt-get install subversion
• wget -- sudo apt-get install wget
• SWIG -- sudo apt-get install swig
• libantlr3c (installed from source, below)
• unzip -- sudo apt-get install unzip

Now let us get the necessary packages to install libantlr3c.

svn co http://europa-pso.googlecode.com/svn/ThirdParty/trunk plasma.ThirdParty

Get Europa.

cd ~/plasma.ThirdParty

Install ANTLR-C.
First, extract libantlr3c-3.1.3.tar.bz2 (it is a bzip2 tarball, so use tar xjf rather than unzip).

    cd plasma.ThirdParty/libantlr3c-3.1.3
    ./configure --enable-64bit
    make
    sudo make install

The above commands are for 64-bit machines; for 32-bit machines remove the --enable-64bit flag.

Installing EUROPA.

    mkdir ~/europa
    cd ~/europa
    unzip ~/tmp/europa-2.1.2-linux.zip
    export EUROPA_HOME=~/europa
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$EUROPA_HOME/lib

Add the following lines to ~/.bashrc at the end:

    export EUROPA_HOME=~/europa
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$EUROPA_HOME/lib

Testing.

    $EUROPA_HOME/bin/makeproject Light ~
    cp $EUROPA_HOME/examples/Light/*.nddl ~/Light
    cp $EUROPA_HOME/examples/Light/*.bsh ~/Light

If the install was successful:

    cd ~/Light
    ant

The GUI for EUROPA should appear. If all the steps are followed correctly, it should work.

Links:

• ANTLR-C installation
• EUROPA installation
• Quick start

Apart from this, I have been able to successfully run the Rover example from EUROPA, which is to be modified according to the further needs of the Italian Mars Society.

## May 29, 2015

### Ravi Jain (MyHDL)

#### Summer Project – MyHDL

For people who don't know about HDLs (Hardware Description Languages): an HDL is a computer language facilitating the design and simulation of electronic circuits, mainly digital logic circuits (e.g. flip-flops, RAM, etc.). VHDL and Verilog are the industry-standard HDLs.

MyHDL is a free, open-source package for using Python as a hardware description and verification language. Its goal is to provide the high-end features of Python to hardware designers. More information on MyHDL can be found here. MyHDL also provides support for converting code written in MyHDL to VHDL and Verilog, for the convenience of hardware designers, so they can integrate their work seamlessly. An example of code written in MyHDL and the converted code can be found here.

My summer project aims at adding 2D list conversion support to this suite. You can have a look at the project proposal for a detailed description of my project.

Currently I am trying to figure out how MyHDL actually works from the inside. After a little tracing of the flow of a basic D flip-flop program, I managed to understand to some extent the working of a very basic component of MyHDL, the hierarchy extractor. It uses a profiler hook (sys.setprofile(func)) to get the hierarchy of the source code under simulation.

I aim to understand the working of the simulation and conversion of the basic DFF program by the end of this week (31/5/2015) and will post a summary about it.
## February 20, 2015

### Leland Bybee (Statsmodels)

#### Topic Coherence and News Events

One important issue that has to be dealt with when you get output from a topic model is: do the topics make sense to a reader? An intuitive approach here is to look at the top X words sorted by how common it is for a word to appear with a topic; this is the beta parameter in LDA. However, this approach isn't very rigorous. In order to formalize the approach beyond just eyeballing a word list, a number of coherence measures have been proposed in the literature. I focus on a variant of the UCI measure proposed by Newman et al. [1]

The UCI measure relies on the pointwise mutual information (PMI) to calculate the cohesion of a topic. The PMI for a pair of words indexed i and j is

PMI(w_i, w_j) = log(p(w_i, w_j) / (p(w_i) p(w_j))) = log((D(w_i, w_j) N) / (D(w_i) D(w_j)))

where D(w_i, w_j) is the number of documents where both words appear simultaneously, D(w_i) is the number of documents where word i appears, and N is the number of documents. The way the UCI measure works is that for each pair of words in the top X terms for a given topic, the PMI score is calculated – in the original article using an external corpus, like Wikipedia – and the median PMI score is used as the measure of coherence for the topic. Newman et al. find that this measure performs roughly as well as manually determining the coherence.

I want to use the coherence score to compare a number of methods for estimating topics, as well as a number of different data sets and a number of sortings for the word proportions. However, I also want to have some sort of test to get a sense of what a coherent topic is in general. To do this, I decided to not only compare the coherence scores of the different approaches, but also calculate the probability of observing my coherence scores assuming that the word pairs were drawn at random.
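The document-count form of the PMI above can be computed directly; a small sketch treating each document as a set of words (the function name is mine):

```python
import math

def pmi(docs, wi, wj):
    # PMI(w_i, w_j) = log( D(w_i, w_j) * N / (D(w_i) * D(w_j)) ),
    # with docs given as sets of words and N the number of documents.
    N = len(docs)
    d_i = sum(1 for d in docs if wi in d)
    d_j = sum(1 for d in docs if wj in d)
    d_ij = sum(1 for d in docs if wi in d and wj in d)
    return math.log(d_ij * N / (d_i * d_j))
```

Words that co-occur more often than independence predicts get positive PMI; words that co-occur less often get negative PMI.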
I should note here that I differ from the UCI measure to some extent in that I just use the source corpus instead of an external corpus. I certainly don't think this is a problem for the WSJ abstracts corpus or the NYT corpus, given their size, though it may cause some problems for the WSJ articles corpus. Down the road I'd like to compare the performance to the Wiki corpus, but given that other coherence measures have been developed that work similarly to the UCI measure and use the source corpus [2], I'm not too worried.

To build my null model, I do two different samplings of word pairs from my source corpus. The first is uniform sampling of word pairs; the second weights the sampling by the Tf-Idf score of each word. The second method should give a sample with more word pairs containing words with high Tf-Idf scores than the uniform sampling. The reason for doing both forms of sampling is to test whether high Tf-Idf terms are more coherent with the text corpus as a whole. If this were true, and the terms that appear in the top word lists for my topics have higher Tf-Idf scores in general as well, it could cause trouble for my test. The histogram below shows the distribution of the PMI scores for the uniformly sampled pairs (blue) and the Tf-Idf weighted sampled pairs (red). It looks like the Tf-Idf sampling doesn't have much of an effect on the distribution of PMI scores, and most of the PMI scores are grouped around 0.

So, moving on to the actual data. I wanted to compare two methods for detecting the topics, as well as my three data sets and three different word sortings. The two methods that I am currently playing around with are exploratory factor analysis (EFA) and latent Dirichlet allocation (LDA). EFA isn't a topic model, but the loadings can be thought of as the word proportions in LDA, so I calculate the coherence in the same way.

The three sortings that I want to look at are: no sorting; sorting by the proportion scaled by the corpus proportion, p(w_i | \theta_j) / p(w_i); and a variation on Tf-Idf designed for the top word lists. This Tf-Idf sorting takes the number of top word lists that a word appears in as the document frequency, and uses the raw proportion as the term frequency.

Looking at each model, data set, and sorting, I get the following two tables. The first shows the mean coherence score for each group, while the second shows the proportion of the topics that are significant at the 95% level. The significance is calculated based on the null model with uniform sampling.

| model | word sorting | NYT (25 topics) | WSJ Abs. (40 topics) | WSJ Art. (30 topics) |
| --- | --- | --- | --- | --- |
| EFA | Raw | -0.12 | 0.56 | 0.88 |
| EFA | Post | 0.16 | 0.63 | 0.98 |
| EFA | Tf-Idf | -0.12 | 0.56 | 0.88 |
| LDA | Raw | -0.54 | 0.01 | 0.04 |
| LDA | Post | 0.42 | 1.10 | 0.90 |
| LDA | Tf-Idf | 0.16 | 0.75 | 1.03 |

| model | word sorting | NYT (25 topics) | WSJ Abs. (40 topics) | WSJ Art. (30 topics) |
| --- | --- | --- | --- | --- |
| EFA | Raw | 0.52 | 0.43 | 0.83 |
| EFA | Post | 0.72 | 0.50 | 0.90 |
| EFA | Tf-Idf | 0.48 | 0.43 | 0.83 |
| LDA | Raw | 0.00 | 0.00 | 0.00 |
| LDA | Post | 1.00 | 0.95 | 0.77 |
| LDA | Tf-Idf | 0.56 | 0.55 | 1.00 |

The "Post" sorting refers to sorting by p(w_i | \theta_j) / p(w_i). Additionally, it is worth noting that the topic numbers currently used were selected rather arbitrarily for each data set. I'm still cleaning up some of the results, so I haven't pinned down the optimal number of topics yet; these are all OK approximations for now. Having compared these results to the topic coherence results for other numbers of topics, I don't think it will have a major effect.

The posterior sorting seems to perform the best. In all cases but one (LDA Tf-Idf WSJ Art.), it performs better than the other two sortings. Tf-Idf and unsorted appear to perform comparably for EFA, but there is a difference when you use LDA.

[1] Newman, Bonilla and Buntine. Improving Topic Coherence with Regularized Topic Models. 2011.

[2] Mimno, Wallach, Talley, Leenders, and McCallum. Optimizing Semantic Coherence in Topic Models. 2011.
## February 14, 2015

### Anish Shah (Core Python)

#### Hello Jekyll! :)

I decided to abandon WordPress & Blogger and give Jekyll a try. I can confidently say that I could not be happier about this decision.

### So, what's wrong with WordPress and Blogger?

Many posts on my WordPress blog are infested with spammy comments, and the Blogger UI is just not good for writing code snippets.

### Jekyll

Jekyll is a cool tool for building static sites, so the site loads faster since there are no database queries. Jekyll is tightly integrated with GitHub Pages: just create a repository named like username.github.io, push your Jekyll project into it, and you're done. :) It is as simple as that. You can easily integrate Disqus comments by tweaking the post template, and Google Analytics by tweaking the default template.

## February 06, 2015

### Leland Bybee (Statsmodels)

#### Clustering News Events

I've been working on a project for some time now, with Bryan Kelly, to detect "news events" in a text corpus of Wall Street Journal abstracts that we scraped back in July of 2014. I've written some on this in the past, and the project has gone through a number of iterations since then. We are now working with more data than just the WSJ abstracts: we have also been doing work with a set of WSJ articles for a smaller period of time, along with the first paragraphs of New York Times articles going back to the 19th century.

Right now, my focus has primarily been on making a convincing argument for the existence of these "news events" and their usefulness for explaining other response variables. One way that I've been approaching this problem is to develop a classification system for the topics that we extract. In general, it appears that news events pop up for some subset of the observations and then drop off again as time progresses. What I want to do is start getting a sense of the patterns within the subsets of observations where there is signal.
What I develop here is a clustering system that I use to begin giving some structure to the topics. One way to think of a topic is as a distribution over time. When you think about them this way, you can imagine that each observation for a topic represents the density of that topic in that period, plus some noise. My goal is to isolate the periods where a topic has some discernible signal: first remove the noise from the density, then drop the periods where there doesn't appear to be any signal.

To do this, I first perform local linear regression and use the fitted curves as my new topics. This largely removes the noise and gives me something cleaner to work with. I set the bandwidth using leave-one-out cross validation. With the resulting fitted curves, I then threshold using sd_i \sqrt{2 \log(N) / N} as my threshold, which comes out to about 0.01 for most topics. Any observations with a fitted topic proportion below the threshold are dropped. This leaves me with a subset of observations for each topic where the topic appears to be relevant.

With these subsets I can then produce four groups of topic variants: the first is the raw observations for the full period, the second is the fitted observations for the full period, the third is the raw observations for each topic's corresponding subset of relevant observations, and the fourth is the fitted observations for each topic's corresponding subset of relevant observations. What I want to do then is build clusters for each of the groups to get a sense of the different shapes we see in the topics.

The standard approach for calculating the distance between different time series, in this case the topics, is dynamic time warping (DTW). DTW works by calculating the distance between every pair of points in the two time series and using a function of these distances to get the distance between the two series.
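Going back a step, the thresholding rule described above can be sketched in a few lines. This is a toy illustration on made-up data, not the project code; `threshold_series` is a name I've introduced:

```python
import math

def threshold_series(fitted, sd):
    """Drop periods whose fitted topic proportion falls below
    sd_i * sqrt(2 * log(N) / N); return kept (period, value) pairs
    and the threshold itself."""
    n = len(fitted)
    thresh = sd * math.sqrt(2 * math.log(n) / n)
    kept = [(t, v) for t, v in enumerate(fitted) if v >= thresh]
    return kept, thresh
```

With a per-topic standard deviation near 0.02 and a few hundred periods, the threshold lands around the 0.01 figure quoted above.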
It is a very flexible approach, and the two time series can be of different lengths. What I do here is estimate a matrix of DTW distances between each pair of topics that I get out of LDA (or any other latent variable approach) and do k-means clustering based on the distance matrix.

The following plot shows the explained variation for each group of topic variants over the number of clusters. It seems that for all variants the sweet spot is around 5-6 clusters.

Below are four plots, one for each of the four groups of topic variants, of the topic proportions for each of the 30 topics, with the topics color coded by their corresponding cluster:

- The raw topic proportions over the full period
- The fitted topic proportions over the full period
- The raw topic proportions over the corresponding subset
- The fitted topic proportions over the corresponding subset

I find that the fitted topic proportions over the corresponding subset give the best sense of the actual grouping of the data. One thing that the full period plots do help with, though, is giving a sense of the edge cases. What we see is that two of the clusters represent topics where we don't get to see the full support because some of it lies outside of our observed time series: the clustering devotes one cluster to topics where the right side of the support is cut off and another to topics where the left side is cut off.

The clusters give some sense of the data, though it isn't immediately clear what the difference is between the orange, blue, and green clusters for the fitted subsets. However, looking at each of the curves for the fitted subsets side by side, you can begin to get a sense of the shapes that appear in the topic distributions. An approach that might work well would be to classify the topics based on their skewness and some measure of multimodality.
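The DTW distance underlying the clustering is the standard dynamic program; here is a minimal sketch (a toy implementation with absolute-difference point costs, not necessarily the exact routine used in the project):

```python
def dtw(a, b):
    """Dynamic-time-warping distance between two series,
    which may have different lengths."""
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j]: cost of the best warping path aligning a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

A step function sampled at different rates still has distance zero under DTW, which is exactly why it suits topic curves whose shapes matter more than their exact timing.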
I think it would be best to throw out the cases where the full support doesn't lie within the observed periods, since we can't get a great sense of what the distributions look like there. In addition to classifying based on skewness and multimodality, I would like to try fitting a mixture of skewed normal distributions; topics could then be clustered based on the minimal number of components needed to explain some threshold of variation. These are both approaches I hope to develop more soon.

## October 10, 2014

### Leland Bybee (Statsmodels)

#### News Events and Factor Models Pt. 2

I've been working on the news events project lately, and my primary focus has been on how to get coherent topics out of the model. The problem is that each document in our corpus is a month, which includes a number of different events. As a result, our topics end up being a mixture of different events. To deal with this, I have been looking through the original text of the corpus to try and identify words that most clearly identify coherent topics.

To do this I've played around with a number of information measures: term frequency (Tf), inverse document frequency (Idf), and term frequency-inverse document frequency (Tf-Idf). Ultimately, I would like to find a measure that we could use to automatically remove a set of words that detract from the coherent topics. I played around with these measures to try and find different sets of top words, and looked at these to try and find words that appear similar to the topics that I ultimately want to estimate.

In addition to this, I remove some of the words from the corpus based on their sparsity. Words that only appear in a couple of documents are removed, because otherwise the corpus is simply too large to work with. However, I don't want to remove too many of these, because then we may lose the words that add the most to the coherent topics.
It is helpful to see the top words for the different information measures:

| | Tf | Idf | Tf-Idf |
| --- | --- | --- | --- |
| 1 | new | jul | amp |
| 2 | said | googl | billion |
| 3 | year | obama | yesterday |
| 4 | will | qaeda | decemb |
| 5 | milion | worldcom | railroad |
| 6 | share | etf | ton |
| 7 | compani | prebon | rose |
| 8 | stock | vivendi | septemb |
| 9 | corp | nra | januari |
| 10 | inc | subprim | octob |
| 11 | report | qwest | analyst |
| 12 | price | barack | august |
| 13 | york | yeltsin | airlin |
| 14 | market | spitzer | stockhold |
| 15 | amp | lucent | novemb |
| 16 | week | ipod | depreci |
| 17 | bank | bernank | februari |
| 18 | presid | suv | per |
| 19 | end | euribor | growth |
| 20 | month | frb | dont |
| 21 | say | websit | barrel |
| 22 | first | facebook | china |
| 23 | sale | blog | washingtonth |
| 24 | last | instinet | equival |
| 25 | feder | sar | drug |

The picture changes slightly if you look at the top words when you resample the counts so that each period has the same number of words:

| | Tf | Idf | Tf-Idf |
| --- | --- | --- | --- |
| 1 | new | jul | amp |
| 2 | said | opa | yesterday |
| 3 | year | obama | billion |
| 4 | corp | reconvers | railroad |
| 5 | milion | nra | decemb |
| 6 | share | googl | ton |
| 7 | stock | nav | septemb |
| 8 | will | londonspot | stockhold |
| 9 | compani | decoppet | januari |
| 10 | inc | instinet | rose |
| 11 | york | bidask | octob |
| 12 | amp | kaiserfraz | washingtonth |
| 13 | report | interdeal | august |
| 14 | price | pwa | novemb |
| 15 | week | lendleas | airlin |
| 16 | presid | cwt | quotat |
| 17 | market | vulte | rebruari |
| 18 | bank | yeltsin | per |
| 19 | end | frb | par |
| 20 | exchange | dminus | equival |
| 21 | washington | khrushchev | depreci |
| 22 | secur | salabl | barrel |
| 23 | sale | barack | quot |
| 24 | month | qaeda | cotton |
| 25 | feder | convair | railway |

The first thing to note is that there are some words that are relatively uninformative and represent more general changes in the way the WSJ was written, rather than events. A lot of this is clear from the word tables. One clear example of this is the common way that the months are written in the data. This doesn't represent any real difference in the content of the corpus between these two periods, but it is still picked up by the topic modeling. Additionally, there are a lot of words that appear to be "background noise" that need to be removed.
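The three measures can be computed in a few lines of Python. This is a sketch over a toy tokenised corpus (the actual counts came from the WSJ data, and the exact Idf normalisation there may differ):

```python
import math
from collections import Counter

def information_measures(docs):
    """Compute tf, idf, and tf-idf over a corpus of tokenised documents."""
    tf = Counter()   # total occurrences of each word
    df = Counter()   # number of documents containing each word
    for doc in docs:
        tf.update(doc)
        df.update(set(doc))
    n = len(docs)
    idf = {w: math.log(n / df[w]) for w in df}
    tfidf = {w: tf[w] * idf[w] for w in tf}
    return tf, idf, tfidf
```

A word that appears in every document gets an Idf of zero, which is how Tf-Idf pushes down the uninformative, corpus-wide words discussed above.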
It is interesting to plot some of the words that should be more "bursty", as I suspect more "bursty" words will lead to the sort of topics that we want. For instance, you can clearly see the different U.S. presidents in the data: both of the Bushes, Obama, Roosevelt, Nixon, Reagan, and Reagan's death. Similarly, you can see how the shape of the economy has changed over time, as well as the rise of Asia.

You can also use the words as a proxy for events: you can see the postwar period, potentially the civil rights movement, the rise of terrorism, and the significance of Iran. You can see a similar thing looking at wars. It is interesting to note that the size of the Iraq event is much larger than the Vietnam event. This has interesting implications for the importance of different words for the event topics; ideally we should still see a Vietnam event as well as an Iraq event.

For a little more fun, you can see the quick rise and fall of Al Gore. I put together plots for all the words in the data set, which are available here.

## September 26, 2014

### Leland Bybee (Statsmodels)

#### Network Latent Dirichlet Allocation

A while back I made a post about a model I was developing to investigate trust and influence in media networks. This project has since bifurcated into a project focusing on the network topic model and a project focusing on media networks. The reason for this is that the topic model I developed was relatively new and wasn't entirely tied to the media network idea. Since my earlier post I have changed the model significantly, and I want to focus on that here. The goal of the changes was to simplify the model, to make it more computationally tenable, and to more accurately reflect the network structure that I was interested in. The reasoning behind my new model is slightly different from the previous attempt.
Whereas before I assumed that the parameter \alpha was a time series, and a set of documents was drawn for each author for each period using this time- and author-dependent parameter, the new approach assumes that each document is part of a time series that is tied to the generating author. This means that \alpha can go back to being a hyperparameter, as in LDA. The reason for this change is that it significantly cuts down on the run time by treating each period-author pair as a document.

Where before each document-specific topic proportion was drawn from a weighted sum of the time-dependent \alphas for each author, in the new model each document is drawn from a weighted sum of the previous period's document-specific topic proportions and a new draw from an underlying Normal distribution centered around a weighted value of \alpha. This means that each document is now the product of the previous period's documents from all other authors in the network, as well as an exogenous shock that comes from the distribution centered around the weighted \alpha value. The weight for the \alpha value corresponds to the significance of the external shocks to the i-th author's document-specific topic proportions.

It may help to see the generative process laid out in full:

- For each document, draw \theta_{it} \sim N(\theta_{t-1}^T \gamma_i, \sigma^2) + N(\Gamma_i \alpha, \delta^2)
- For each topic, draw \phi_k \sim Dir(\beta)
- For each word in a document, draw z_n \sim Mult(\phi(\theta_{it})) and w \sim Mult(\phi_z)

The benefit of this procedure over my previous attempt is that it still captures the influence measure (\gamma_i) while also allowing for external shocks (\Gamma_i), and it is in general simpler than the previous attempt, making it easier to implement. For this model, all of the parameters have conjugate forms except for \theta_{it}, which greatly simplifies the implementation. For \theta_{it} I use the Mimno method discussed in the earlier post linked above.
The appendix of my paper on this topic goes over the implementation of the Gibbs sampler in more detail.

## August 07, 2014

### Leland Bybee (Statsmodels)

#### News Events and Factor Models Pt. 1

I've recently been working with a fairly large dataset of article summaries taken from the Wall Street Journal for 1925-2010. The goal of the project is to try and detect events in the news stream and then use these events to build factor models of the stock market. The way to think of an event in this context is as a distinct period during which some information is entering the system. This information, in theory, should change how people behave in the market and could serve as a predictor for different market conditions.

My main focus so far has been the event detection part of the project. This alone isn't a trivial task. To identify the different events we wanted to start with a topic model like LDA. Topic models are currently fairly popular in the event detection literature because they allow you to capture a latent structure that resembles what we would think of as a topic or an idea, or in this case a news event. To put this all together I've primarily been using the tm and topicmodels libraries in R; the initial data processing was done in Python.

Once the topic model is built, the next step is to find clusters. To do this I have just been using k-means clustering. I played around with a Gaussian mixture model, but the results weren't great, though I may return to it later once the data has been cleaned some more. For the clustering, I'm trying to find clusters that best capture the idea of a news event. The way it works is that in each period (months in this case) there is some \theta that represents the proportions of each topic in that period. I'm running the k-means clustering on these proportions to detect periods where a single topic or group of topics dominates the news stream.
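As a sketch of the clustering step, here is a minimal k-means over per-period topic-proportion vectors \theta. This is a toy implementation on made-up data; in practice an existing k-means routine (e.g. in R, as above) would be used:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal k-means on per-period topic-proportion vectors."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    def nearest(p):
        return min(range(k),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
    for _ in range(iters):
        # assign each period to its nearest centre
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p)].append(p)
        # recompute centres (keep the old one if a cluster empties)
        centers = [[sum(dim) / len(g) for dim in zip(*g)] if g else centers[j]
                   for j, g in enumerate(groups)]
    return [nearest(p) for p in points], centers
```

Periods whose \theta is dominated by the same topic end up in the same cluster, which is exactly the "dominating topic" signal being searched for.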
What you see when you look at a plot of the topics is that topics will rise and then fall off, to be replaced by an entirely new topic. What I ultimately want to detect are these distinct periods where one topic dominates. The structure of the topic proportions can be seen more clearly when you look at a plot of the topic proportions over time.

The following plot shows each of the topics along with the cluster that most closely captures the key topic of a period. This was determined by finding the topic that had the largest relative proportion in each cluster; if a topic had already been assigned, then the topic with the second highest relative proportion was chosen. If you allow multiple clusters to be dominated by a single topic, the picture changes. This suggests two things: one, there may be fewer distinct news events than there are topics (at least in the five topic case, which is probably pretty limited); two, there may be multiple-topic news events.

The following plot is from an earlier run where I did not remove as many popular words from the dataset. This resulted in more correlation between topics, which led to a topic dominating multiple clusters. It is also clear that the topics are less distinct here and seem to be more heavily influenced by cyclical trends. Finally, I also included individual plots for each of the topics to make the picture clearer.

There are several key aspects of the project that need to be accounted for before I move forward. First, the number of clusters and topics needs to be optimized. This will obviously also depend on the factor model, but some basic idea of what number of topics and clusters best fits the data would be important. Limited to the 5 topic case, I've put together a plot of the Bayesian information criterion for 2-25 clusters. This supports the idea that the optimal number of clusters is somewhere under 5, probably 4. Finally, I need to run more tests to determine the coherence of the topics.
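For the cluster-number comparison, one common approximation of the BIC for a k-means fit treats the clusters as spherical Gaussians. A sketch, assuming that particular form (the exact criterion used in the project may differ):

```python
import math

def kmeans_bic(points, labels, centers):
    """Approximate BIC for a k-means fit under a spherical-Gaussian view:
    BIC = n*d*log(RSS / (n*d)) + k*d*log(n),
    where RSS is the within-cluster sum of squared distances."""
    n, d, k = len(points), len(points[0]), len(centers)
    rss = sum(sum((a - b) ** 2 for a, b in zip(p, centers[lab]))
              for p, lab in zip(points, labels))
    return n * d * math.log(rss / (n * d)) + k * d * math.log(n)
```

The fit term falls as k grows while the k*d*log(n) penalty rises, so scanning k over 2-25 and taking the minimum gives the kind of elbow shown in the BIC plot.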
The current iteration was run by removing the top 10% and bottom 50% of words in terms of density, which left roughly 3000 unique words. I had to cut the top words because they were messing with the topics and making them indistinguishable, as discussed above. An alternative would be Tf-Idf; however, this produced similar results. Here are the top 50 words for each of the current 5 topics.

| | Topic 1 | Topic 2 | Topic 3 | Topic 4 | Topic 5 |
| --- | --- | --- | --- | --- | --- |
| 1 | amp | dont | amp | amp | analyst |
| 2 | depreci | technolog | airlin | washingtonth | bush |
| 3 | etc | global | dec | quotat | airlin |
| 4 | bushel | doesnt | debentur | bushel | didnt |
| 5 | auditor | employe | oct | debentur | comput |
| 6 | certifi | risk | jan | summari | clinton |
| 7 | washingtonth | isnt | comput | airlin | yen |
| 8 | aggreg | bush | penc | elsewher | stake |
| 9 | deplet | that | yearearli | londonth | network |
| 10 | ten | network | washingtonth | bale | editori |
| 11 | bale | target | aug | wool | jan |
| 12 | quotat | game | analyst | aeronaut | respond |
| 13 | latter | team | nov | yearend | employe |
| 14 | amort | spokesman | electron | televis | russia |
| 15 | roosevelt | focus | sept | furnish | technolog |
| 16 | gallon | stake | quotat | guid | wouldnt |
| 17 | irregular | michael | tokyo | fraction | warner |
| 18 | debentur | analyst | feb | secondari | feb |
| 19 | tone | ventur | mar | ceil | telecommun |
| 20 | mexican | fed | nuclear | postwar | hong |
| 21 | reelect | challeng | spokesman | comparison | doctor |
| 22 | locomot | player | interview | kennedi | phone |
| 23 | territori | cant | ventur | stuart | reform |
| 24 | bag | role | televis | atom | dec |
| 25 | audit | there | werent | complianc | dont |
| 26 | sec | chines | carter | civilian | argu |
| 27 | comparison | patient | wasnt | oversubscrib | entertain |
| 28 | seaboard | wont | wont | zinc | cite |
| 29 | accumul | familiar | takeov | ind | worri |
| 30 | understood | giant | overthecount | washingtona | electron |
| 31 | inquiri | didnt | undisclos | halsey | treatment |
| 32 | prevail | realli | subordin | physic | insid |
| 33 | suffici | access | diversifi | unoffici | isnt |
| 34 | tonnag | strategi | dont | airplan | doesnt |
| 35 | chamber | media | sixmonth | westinghous | global |
| 36 | dull | sec | ottawa | ottawa | sec |
| 37 | omit | client | missil | sec | kong |
| 38 | furnac | rival | taxexempt | tube | media |
| 39 | bros | exampl | medic | subcommitte | stanley |
| 40 | cuba | student | oneyear | theatr | medic |
| 41 | tendenc | art | juri | lumber | composit |
| 42 | admit | kill | yearold | irregular | gop |
| 43 | dominion | track | pershar | compos | partnership |
| 44 | butter | theyr | edit | synthet | lawsuit |
| 45 | pool | typic | pipelin | oddlot | founder |
| 46 | youngstown | wasnt | isnt | pulp | hasnt |
| 47 | resolut | wrote | technolog | alloc | arent |
| 48 | calcul | children | toll | coupon | korea |
| 49 | extent | medic | didnt | locomot | children |
| 50 | plate | yearold | jet | omit | chip |

## May 31, 2014

### Terri Oda (PSF Org admin)

#### You can leave academia, but you can't get the academic spam out of your inbox

When I used to do research on spam, I wound up spending a lot of time listening to people's little pet theories. One that came up plenty was "oh, I just never post my email address on the internet," which is fine enough as a strategy depending on what you do, but is rather infeasible for academics who want to publish, as custom says we've got to put our email addresses on the paper. This leads to a lot of really awesome contacts with other researchers around the world, but sometimes it leads to stuff like the email I got today:

> Dear Terri,
>
> As stated by the Carleton University's electronic repository, you authored the work entitled "Simple Security Policy for the Web" in the framework of your postgraduate degree. We are currently planning publications in this subject field, and we would be glad to know whether you would be interested in publishing the above mentioned work with us.
>
> LAP LAMBERT Academic Publishing is a member of an international publishing group, which has almost 10 years of experience in the publication of high-quality research works from well-known institutions across the globe. Besides producing printed scientific books, we also market them actively through more than 80,000 booksellers. Kindly confirm your interest in receiving more detailed information in this respect.
>
> I am looking forward to hearing from you.
>
> Best regards,
> Sarah Lynch
> Acquisition Editor
>
> LAP LAMBERT Academic Publishing is a trademark of OmniScriptum GmbH & Co. KG
> Heinrich-Böcking-Str. 6-8, 66121, Saarbrücken, Germany
> s.lynch(at)lap-publishing.com / www. lap-publishing .com
> Handelsregister Amtsgericht Saarbrücken HRA 10356
> Identification Number (Verkehrsnummer): 13955
> Partner with unlimited liability: VDM Management GmbH
> Handelsregister Amtsgericht Saarbrücken HRB 18918
> Managing director: Thorsten Ohm (CEO)

Well, I guess it's better than the many misspelled emails I get offering to let me buy a degree (I am *so* not the target audience for that, thanks), and at least it's not incredibly crappy conference spam. In fact, I'd never heard of this before, so I did a bit of searching. Let's just post a few of the summaries from that search.

From Wikipedia:

> The Australian Higher Education Research Data Collection (HERDC) explicitly excludes the books by VDM Verlag and Lambert Academic Publishing from ...

From the well-titled Lambert Academic Publishing (or How Not to Publish Your Thesis):

> Lambert Academic Publishing (LAP) is an imprint of Verlag Dr Muller (VDM), a publisher infamous for selling cobbled-together "books" made ...

And most amusingly, the reason I've included the phrase "academic spam" in the title:

> I was contacted today by a representative of Lambert Academic Publishing requesting that I change the title of my blog post "Academic Spam", ...

So yeah, no. My thesis is already published, thanks, and Simple Security Policy for the Web is freely available on the web for probably obvious reasons. I never did convert the darned thing to html, though, which is mildly unfortunate in context!

#### PlanetPlanet vs iPython Notebook

[RESOLVED: see below]

Short version: I'd like some help figuring out why RSS feeds that include iPython notebook contents (or more specifically, the CSS from iPython notebooks) are showing up as really messed up in the PlanetPlanet blog aggregator. See the Python summer of code aggregator and search for an MNE-Python post to see an example of what's going wrong.
Bigger context: One of the things we ask of Python's Google Summer of Code students is regular blog posts. This is a way of encouraging them to be public about their discoveries and share their process and thoughts with the wider Python community. It's also very helpful to me as an org admin, since it makes it easier for me to share and promote the students' work, and it helps me keep track of everyone's projects without burning myself out trying to keep up with a huge number of mailing lists for each "sub-org" under the Python umbrella. Python sponsors not only students working on the language itself, but also students working on projects that make heavy use of Python. In 2014, we have around 20 sub-orgs, so that's a lot of mailing lists!

One of the tools I use is PlanetPlanet, software often used for making free software "planets" or blog aggregators. It's easy to use and run, and while it's old, it doesn't require me to install and run an entire larger framework which I would then have to keep up to date. It's basically making a static page using a shell script run by a cron job. From a security perspective, all I have to worry about is that my students will post something terrible that then gets aggregated, but I'd have to worry about that no matter what blogroll software I used.

But for some reason, this year we've had some problems with some feeds, and it *looks* like the problem is specifically that PlanetPlanet can't handle iPython notebook formatted stuff in a blog post. This is pretty awkward, as iPython notebook is an awesome tool that I think we should be encouraging students to use for experimenting in Python, and it really irks me that it's not working. It looks like Chrome and Firefox parse the feed reasonably, which makes me think that somehow PlanetPlanet is the thing that's losing a `<style>` tag somewhere. The blogs in question seem to be on Blogger, so it's also possible that it's Google that's munging the stylesheet in a way that PlanetPlanet doesn't parse.
I don't suppose this bug sounds familiar to anyone? I did some quick googling, but unfortunately the terms are all sufficiently popular when used together that I didn't find any reference to this bug. I was hoping for a quick fix from someone else, but I don't mind hacking PlanetPlanet myself if that's what it takes. Anyone got a suggestion of where to start on a fix?

Edit: Just because I saw someone linking this on twitter, I'll update in the main post: I tried Mary's suggestion of Planet Venus (see comments below) out on Monday and it seems to have done the trick, so hurrah!

## April 26, 2014

### Terri Oda (PSF Org admin)

#### Mailman 3.0 Suite Beta!

I'm happy to say that... Mailman 3.0 suite is now in beta!

As many of you know, Mailman's been my open source project of choice for a good many years. It's the most popular open source mailing list manager, with millions of users worldwide, and it's been quietly undergoing a complete re-write and re-working for version 3.0 over the past few years. I'm super excited to have it at the point where more people can really start trying it out.

We've divided it into several pieces: the core, which sends the mails; the web interface, which handles web-based subscriptions and settings; and the new web archiver; plus there's a set of scripts to bundle them all together. (Announcement post with all the links.)

While I've done more work on the web interface and a little on the core, I'm most excited for the world to see the archiver, which is a really huge and beautiful change from the older pipermail. The new archiver is called HyperKitty, and it's a huge change for Mailman. You can take a look at HyperKitty live on the Fedora mailing list archives if you're curious! I'll bet it'll make you want your other open source lists to convert to Mailman 3 sooner rather than later.
Plus, on top of being already cool, it's much easier to work with and extend than the old pipermail, so if you've always wanted to view your lists in some new and cool way, you can dust off your Django skills and join the team!

Do remember that the suite is in beta, so there are still some bugs to fix and probably a few features to add, but we do know that people are running Mailman 3 live on some lists, so it's reasonably safe to use if you want to try it out on some smaller lists. In theory, it can co-exist with Mailman 2, but I admit I haven't tried that out yet. I will be trying it, though: I'm hoping to switch some of my own lists over soon, but probably not for a couple of weeks due to other life commitments.

So yeah, that's what I did at the PyCon sprints this year. Pretty cool, eh?

## March 29, 2014

### Terri Oda (PSF Org admin)

#### Sparkfun's Arduino Day Sale: looking for inspiration!

Sparkfun has a bunch of Arduinos on crazy sale today, and they're allowing backorders. It's a one day sale, ending just before midnight US mountain time, so you've still got time to buy your own! Those $3 minis are amazing.

I wound up buying the maximum amount I could, since I figure if I don't use them myself, they'll make nice presents. I have plans for two of the mini ones already, as part of one of my rainy day projects that's only a little past the drawing board and into the "let's practice arduino coding and reading sensor data" stage. But the rest are waiting for new plans!

I feel a teensy bit guilty about buying so many arduinos when I haven't even found a good use for the Raspberry Pi I got at PyCon last year. I did buy it a pretty rainbow case and a cable, but my original plan to use it as the brains for a homemade cnc machine got scuttled when John went and bought a nice handybot cnc router.

A pretty picture of the pibow rainbow raspberry pi case from this most excellent post about it. They're on sale today too if you order through pimoroni.

I've got a few arty projects with light that might be fun, but I kind of wanted to do something a bit more useful with it. Besides, I've got some arty blinky-light etextile projects that are going to happen first and by the time I'm done those I think I'll want something different.

And then there's the Galileo, which obviously is a big deal at work right now. One of the unexpected perks of my job is the maker community: I've been hearing all about the cool things people have tried with their dev boards and seeing cool projects, and for a while we even had a biweekly meet-up going to chat with some of the local Hillsboro makers. I joined too late to get a chance at a board from the internal program, but I'll likely be picking one up on my own dime once I've figured out how I'm going to use it! (John already has one, and the case he made for it came off the 3d printer this morning, and I'm jealous!)

So... I'm looking for inspiration: what's the neatest arduino/raspberry pi/galileo/etc. project you've seen lately?

## March 02, 2014

### Terri Oda (PSF Org admin)

#### Google Summer of Code: What do I do next?

Python's in as a mentoring organization again this year, and I'm running the show again. Exciting and exhausting!

In an attempt to cut down on the student questions that go directly to me, I made a flow chart of "what to do next":

(there's also a more accessible version posted at the bottom of our ideas page)

I am amused to tell you all that it's already cut down significantly on the amount of "what do I do next?" emails I've gotten as an org admin compared to this time last year. I'm not sure if it's because it's more eye-catching or better placed or what makes it more effective, since those instructions could be found in the section for students before. We'll see its magical powers hold once the student application period opens, though!