# Python's Summer of Code 2015 Updates

## May 29, 2015

### Andres Vargas Gonzalez(Kivy)

#### Bonding period summary

During the bonding period I studied how the Kivy graphics pipeline works. Kivy has two implementations for lines: kivy.graphics.Line and kivy.graphics.SmoothLine. The normal Line has antialiasing problems, which SmoothLine is meant to solve. SmoothLine is an experimental alternative that I got my hands on. While trying to create a SmoothLine object from Python, the following error was produced:

    Exception TypeError: "object of type 'NoneType' has no len()" in 'kivy.graphics.vertex_instructions.SmoothLine.build_smooth' ignored

This was fixed by changing the way the constructor invoked the parent class. The issue shows up only when SmoothLine is instantiated from Python. Once I could instantiate SmoothLine, I tested it by creating lines from points collected from touches. That is how I ran into one of the issues of SmoothLine: when the direction of a line is nearly 180 degrees, a pixel stretches along the horizontal or vertical axis, depending on the direction, as can be seen in the following figure.

This is related to my project, since I can use SmoothLine instead of Line for the gestures section. Three new classes are proposed: Point, Stroke and InkCanvas. kivy.graphics already has an implementation for points, but it stores a list of points; I am looking for a more native representation, so that a stroke can be handled as a list of points. This object representation could also make later ink-processing algorithms easier.

The main goal of my project for the Google Summer of Code, besides rendering matplotlib plots, is to be able to interact with them in a way that differs from already existing backends such as GTK, Qt, etc.

I will publish a post on the MPL workflow implementation over the weekend. That’s it for now; next week I expect to have more work done and a first draft of the MPL modules implementation.

#### Week sync 05

## Last week:

• Rewrite the module using the amoco project.
• Use amoco to symbolically execute gadgets and return a mapper.
• Calculate the sp move to know the gadget's move.
• Tidy up the register relationships to get the gadget's registers.
• Discard mem[xxx] <- zzz gadgets.
• Symbolically execute the gadget and return a mapper.
• Get the symbolic expression of every output register.
• Example: pop rdi; ret, where we need rdi == 0xbeefdead.
• Get rdi <- { | [0:64]->M64(rsp) | } from the mapper.
• Convert this expression to z3 format: expression == 0xbeefdead.
• Using the z3 solver, add the condition above and solve it.
• Get the content of mem[rsp].
• Extract the gadget finder/classifier/solver processes into separate Python classes.
• Merge the module into the binjitsu master branch and submit a PR.
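
The `pop rdi; ret` example above is simple enough to sketch without amoco or z3. The snippet below is a hand-rolled stand-in, not the module's actual code: the gadget address, the layout of the `POP_RDI_RET` dictionary and the `build_chain` helper are all made up for illustration. It shows why the sp move matters: the required value has to sit in the stack slot the gadget reads, and the next return address has to follow it.

```python
import struct

WORD = 8  # bytes per stack slot on x86-64

# Hypothetical summary of what symbolic execution tells us about `pop rdi; ret`:
# rdi is loaded from M64(rsp + 0), and the gadget moves sp by 16 bytes in total
# (8 for the pop, 8 for the ret).
POP_RDI_RET = {
    "address": 0x400123,   # made-up gadget address
    "loads": {"rdi": 0},   # register -> stack offset it is read from
    "sp_move": 2 * WORD,
}


def build_chain(gadget, wanted, next_address):
    """Lay out the stack so each register in `wanted` receives its value."""
    # Data slots sit between the gadget address and the slot consumed by `ret`.
    slots = [0] * (gadget["sp_move"] // WORD - 1)
    for reg, value in wanted.items():
        offset = gadget["loads"][reg]  # KeyError if the gadget cannot set reg
        slots[offset // WORD] = value
    words = [gadget["address"]] + slots + [next_address]
    return struct.pack("<%dQ" % len(words), *words)


chain = build_chain(POP_RDI_RET, {"rdi": 0xBEEFDEAD}, next_address=0x400456)
print(chain.hex())
```

For a single `pop` the constraint can be solved by lookup; a solver such as z3 becomes useful once the output register depends on arithmetic over several stack slots.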

## Next week:

• Make the changes Zach suggested in the PR comments.
• Add doctests/examples for all the methods in this module.
• Fix the bugs in the topological sorting.

## May 28, 2015

### Artem Sobolev(Scikit-learn)

#### NCA

Not to be confused with NSA :-)
So the coding has started!

The first algorithm to implement is Neighbourhood Components Analysis (NCA for short). Unlike other methods, it requires no complicated optimization procedure: the authors propose just plain gradient descent (actually, ascent, since we're going to maximize). Of course, this has its own drawbacks: the target is non-convex, so it's hard to come up with an efficient algorithm that's guaranteed to find the optimum.

The authors propose 2 objective functions with different interpretations. The first one maximizes the expected number of correctly classified points, and its gradient has the following form:

$$\frac{\partial f}{\partial \mathbf L} = 2 \mathbf L \sum_{i} \Bigl( p_i \sum_{k} p_{ik} (x_i - x_k) (x_i - x_k)^T - \sum_{j \in C_i} p_{ij} (x_i - x_j) (x_i - x_j)^T \Bigr)$$

The second one minimizes the KL-divergence, and its gradient is:

$$\frac{\partial f}{\partial \mathbf L} = 2 \mathbf L \sum_{i} \Bigl( \sum_{k} p_{ik} (x_i - x_k) (x_i - x_k)^T - \frac{\sum_{j \in C_i} p_{ij} (x_i - x_j) (x_i - x_j)^T}{p_i} \Bigr)$$

One thing to notice here is the outer product $(x_i - x_k) (x_i - x_k)^T$. In order to speed up the whole algorithm we'd like to precompute these products in advance, but that could take a lot of space: $O(N^2 M^2)$, where $N$ is the number of samples and $M$ is the number of features. Unfortunately, this is too expensive even for medium-sized datasets: for 1000 samples of 50 features, storing all $N^2$ products in doubles takes $1000^2 \cdot 50^2 \cdot 8$ bytes, about 20 GB of RAM (or about 10 GB if only one product per unordered pair is kept).
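
The memory estimate is plain arithmetic, so it is easy to check. The helper below is only an illustration (`outer_product_bytes` is a made-up name): it counts the bytes needed for all pairwise outer products, either for the full $N \times N$ grid of pairs or, using the symmetry between $(i, k)$ and $(k, i)$, for one product per unordered pair.

```python
def outer_product_bytes(n_samples, n_features, itemsize=8, symmetric=False):
    """Bytes needed to store (x_i - x_k)(x_i - x_k)^T for all sample pairs."""
    if symmetric:
        # The product for (i, k) equals the one for (k, i), so keep i <= k only.
        pairs = n_samples * (n_samples + 1) // 2
    else:
        pairs = n_samples ** 2
    return pairs * n_features ** 2 * itemsize


full = outer_product_bytes(1000, 50)
half = outer_product_bytes(1000, 50, symmetric=True)
print("full: %.1f GB, symmetric: %.1f GB" % (full / 1e9, half / 1e9))
```

For 1000 samples of 50 features this gives about 20 GB for the full array and about 10 GB with the symmetry exploited.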

What can be done with it? I can think of several possibilities:

1. Recompute these products over and over again. There is space for various engineering optimizations, for example, we can keep a cache of those products, using it only if $p_{ij}$ is not too small.
2. Restrict ourselves to a diagonal $\mathbf{L}$ case. This is a useful option in general, since it allows running these methods on larger datasets.
3. Do "coordinate-wise" gradient ascent: pick a cell in $\mathbf{L}$ and make a step along the gradient.

The basic implementation goes like this:

    import numpy as np
    from scipy.special import logsumexp  # scipy.misc.logsumexp in older SciPy


    def fit(self, X, y):
        n_samples, n_features = X.shape
        rng = np.random.RandomState(self.random_state)
        L = rng.uniform(0, 1, (self.n_components, n_features))

        # Precompute every outer product (x_i - x_j)(x_i - x_j)^T.
        outers = np.empty((n_samples, n_samples, n_features, n_features))
        for i in range(n_samples):
            for j in range(n_samples):
                d = (X[i, :] - X[j, :])[None, :]
                outers[i, j] = np.dot(d.T, d)

        # Map each class label to the indices of its samples.
        C = {}
        for i in range(n_samples):
            if y[i] not in C:
                C[y[i]] = []
            C[y[i]].append(i)

        for it in range(self.max_iter):
            grad = np.zeros((n_features, n_features))
            fnc = 0
            for i in range(n_samples):
                x = X[i, :]
                A = np.dot(L, x)[None, :] - np.dot(X, L.T)  # n_samples x n_comp
                logp = -(A * A).sum(axis=1)
                logp[i] = -np.inf  # a point is never its own neighbour
                logp -= logsumexp(logp)
                p = np.exp(logp)  # n_samples
                class_neighbours = C[y[i]]
                p_i = p[class_neighbours].sum()
                grad += np.sum(p[:, None, None] * outers[i], axis=0) * p_i - \
                    np.sum(p[class_neighbours, None, None] * outers[i, class_neighbours], axis=0)
                fnc += p_i
            grad = 2 * self.learning_rate * np.dot(L, grad)
            L += grad  # ascent step: the target is maximized
            # Report the outer iteration counter (the original printed i + 1).
            print("Iteration {}, target = {}".format(it + 1, fnc))
        self.L = L
        return self

Moreover, it even works! :-) I took the following example:
Yes, I like XKCD :-) BTW, you can get an XKCD "mod" for matplotlib.

Here we have 2 classes (red and blue) divided into train and test (train is opaque, test is semitransparent). Obviously, 1NN will make a lot of mistakes here: samples are very close according to feature 2, and quite distant according to feature 1. Its decision areas are

So 1NN and 3NN make a lot of mistakes on this artificial problem. Let's plug in NCA as a transformer:

The decision boundary became much more linear, as one would expect from looking at the data. The right plot shows the data space after applying the learned linear transformation $\mathbf{L}$.

The above implementation is just for reference and a better understanding of the algorithm. It uses a lot of memory and is not as efficient as one might want.
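
One way to avoid the 4-D array of outer products is to note that $\sum_k p_{ik} (x_i - x_k)(x_i - x_k)^T = D^T \operatorname{diag}(p_i) D$, where row $k$ of $D$ is $x_i - x_k$. The sketch below is an illustration of that identity rather than the final implementation (`weighted_outer_sum` is a made-up helper); it needs only $O(NM)$ extra memory per sample and is checked against the naive sum.

```python
import numpy as np


def weighted_outer_sum(X, i, p):
    """Return sum_k p[k] * (x_i - x_k)(x_i - x_k)^T without a 4-D array."""
    D = X[i] - X                    # (n_samples, n_features); row k is x_i - x_k
    return (D * p[:, None]).T @ D   # (n_features, n_features)


rng = np.random.RandomState(0)
X = rng.rand(6, 3)
p = rng.rand(6)
p[2] = 0.0          # a point is never its own neighbour
p /= p.sum()

naive = sum(p[k] * np.outer(X[2] - X[k], X[2] - X[k]) for k in range(6))
fast = weighted_outer_sum(X, 2, p)
print(np.allclose(naive, fast))
```

The sum over class neighbours $C_i$ can reuse the same helper by zeroing the entries of `p` that fall outside the class.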

### Aron Barreira Bordin(Kivy)

#### Week 1 - Buildozer integration and new menu

Hi!

I started to work on Kivy Designer this week. In the previous weeks, I studied the Kivy documentation and made some contributions to Kivy Designer. I submitted my first PR today, and I'm very happy about this first week. I developed a completely new menu, with a better design, easier to use, and more powerful.

## What's new

Kivy Designer is now integrated with Buildozer. Now it's possible to build and run your Python application on Desktop, Android and iOS devices! To handle these multiple targets and options, I created a "Build Profile" setting.

There are three default profiles:

• Desktop
• Android - Buildozer
• iOS - Buildozer

The user is able to edit these profiles, create new ones or even delete them. With build profiles I hope to make multi-platform development easier. Now it's only necessary to change the profile to build the same application for the desired target :)

## Bonding period

• I studied a lot about Kivy
• Added initial support for Python 3
• Fixed Kivy Designer code style
• Tried to help users on #Kivy (I'm not yet experienced enough to support most users, but I've always been reading and trying to help when possible)
• While studying Kivy Designer, I found a lot of bugs and had some ideas for new improvements; everything is listed here

## My first PR

I made my first PR to the project :) I'm still waiting for the review.

New improvements:

Bugs(some bugs are related):

## Next week

Next week, I'll be fixing more bugs and developing the Buildozer Settings UI, an easy-to-use interface for editing the buildozer.spec file.

I'll try to improve Kivy Designer performance as well, but I'll try to get some tips from my mentors before I start working on it ;/

And if possible, I'll add support for the Hanga builder.

That's it, thanks for reading :)

Aron Bordin.

## May 27, 2015

### Andrzej Grymkowski(Kivy)

Hi!

Coding time has just started. Before that I worked hard to get into this project. Now, on the Android side, I feel much more confident about how to add new features or compile examples and run them on my phone.

## What have I done

1. Made some small fixes, such as updated support for the vibrator and email on phones with early Android versions. Merged and updated old pull requests, such as the orientation one.
2. Added one facade: audio, with an example and an Android implementation. It is responsible for recording and playing audio on the phone.
3. Updated Kivy to follow PEP 8. In general, the docs and style guide differ a bit between Kivy and Plyer. One reason may be the use of triple single quotes ''' instead of triple double quotes """ for docstrings. On the other hand, this is valid by Python standards. I still can't get accustomed to it.
4. Changed the structure of the facades. For a long time all facades were kept in one file. One of the last merged pull requests splits them into separate files, one facade file per class.

## In progress

There are many branches in progress.
1. plyer-linux-audio: the audio implementation for Linux. It uses pyaudio for recording and playing. The standard built-in Python modules seem too hard for me.
2. plyer-android-hardware: the functionality is implemented, but in a Java file. The main goal is to move it from there to Plyer. Most of it has been moved, but it is not well structured: the current implementation is all-in-one and should be split into separate facades. The example has to be updated as well.
3. plyer-speech-recognition: the Android implementation works well enough. The Linux recognizer recognizes words very badly. For Linux I used the Python package 'SpeechRecognition'; later I plan to test another module. Both platforms have the same issue: they can't listen in the background for very long.
4. plyer-android-bluetooth: still raw. As far as I remember, the facade and implementation still live in the example folder. Features such as toggling enable/disable and scanning for devices work, on Android of course. For Linux I plan to use the package Blues. It should work on OS X as well.
5. android-contact-list-example: I don't know when I will finish it. It is complicated enough that I need some kind of manager to control contacts in an easy way. The current methods are mostly based on the Django manager. I have to check how the Python package SQLAlchemy solves this; it's another ORM, and I don't know what else is out there o0. Another problem is when to load data from contacts. Loading it at start will take some time. Both Android and iOS split contacts into two kinds: people and groups. One contact has a lot of information, like addresses, email addresses, images and the groups it belongs to. By loading a group I mean loading all people in that group. A good solution would be to implement a browser that does not load all the data but searches and paginates it. Another is to load only the names and ids of contacts and fetch the rest of the data in the meantime. I will think about it some more.

## What have I learnt?

• Kivy and plyer style coding
• How the compilation is done to run Kivy on Android (TODO: how it looks on iOS :-3 )
• How to implement Java classes and interfaces in Python
• pep257 and pep8. Go even further and check out hacking project!

best regards,

### Artem Sobolev(Scikit-learn)

#### API design

Having discussed mathematical aspects of the selected metric learners, it's time to move towards more practical things, and think how these methods fit existing scikit-learn conventions.

Since there're no metric learning methods in scikit-learn at the moment, and I'm going to contribute several of them, it makes sense to organize my contributions as a new module called metric_learning.

Many metric learning models aim to aid KNN, so such a model is not an Estimator, but rather a Transformer. One possible application is to transform points from the original space to a new one using the matrix $\mathbf{L}$ (recall $\mathbf{M} = \mathbf{L}^T \mathbf{L}$). This new space is interesting because Euclidean distance in it is exactly the Mahalanobis distance $D_\mathbf{M}$ in the original space, so one can use methods that support Euclidean distance but don't support a custom metric (or for which a custom metric is computationally expensive: calculating $D_\mathbf{M}$ requires matrix multiplication, so it might be preferable to do this multiplication only once per training sample).

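    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import Pipeline

    ml = LMNNTransformer()  # proposed class, not yet in scikit-learn
    knn = KNeighborsClassifier()
    pl = Pipeline([('ml', ml), ('knn', knn)])  # Pipeline takes a list of steps
    pl.fit(X_train, y_train)
    pl.predict(X_test)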
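
The equivalence between the Mahalanobis distance in the original space and the Euclidean distance in the transformed space can be checked numerically. The snippet below is just such a check, with an arbitrary random $\mathbf{L}$ standing in for a learned one; the `mahalanobis` helper is written here for illustration.

```python
import numpy as np


def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y))."""
    d = x - y
    return float(np.sqrt(d @ M @ d))


rng = np.random.RandomState(0)
L = rng.uniform(-1, 1, (2, 3))   # arbitrary linear map, 3 features -> 2 components
M = L.T @ L
x, y = rng.rand(3), rng.rand(3)

d_mahalanobis = mahalanobis(x, y, M)
d_euclidean = float(np.linalg.norm(L @ x - L @ y))
print(abs(d_mahalanobis - d_euclidean) < 1e-12)
```

Both numbers agree up to floating-point error, because $(x - y)^T \mathbf{L}^T \mathbf{L} (x - y) = \lVert \mathbf{L}x - \mathbf{L}y \rVert^2$.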

Another application is similarity learning. There are methods like SpectralClustering that can use a precomputed affinity matrix, so we'd like to be able to compose those with metric learning.

    from sklearn.cluster import SpectralClustering
    from sklearn.pipeline import Pipeline

    ml = LMNNSimilarity()  # proposed class, not yet in scikit-learn
    sc = SpectralClustering(affinity="precomputed")
    pl = Pipeline([('ml', ml), ('sc', sc)])  # Pipeline takes a list of steps
    pl.fit(X_train, y_train)
    pl.predict(X_test)

Accordingly, each algorithm will be shipped in 2 versions: transformer + similarity learner. Of course, I'd like to minimize code duplication, so the actual implementation would be similar to that of SVMs: a base class and a couple of descendants that implement different transforms.
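
A rough sketch of that layout, using plain classes rather than scikit-learn's base classes. Everything here is an assumption made for illustration: the class names, the placeholder fitting rule (a diagonal $\mathbf{L}$ that rescales each feature, standing in for NCA or LMNN) and the choice of $\exp(-D_\mathbf{M}^2)$ as the affinity.

```python
import numpy as np


class BaseMetricLearner:
    """Shared fitting logic; subclasses decide what to expose."""

    def fit(self, X, y=None):
        # Placeholder for a real algorithm (NCA, LMNN, ...): scale each feature
        # by the inverse of its standard deviation, i.e. a diagonal L.
        self.L_ = np.diag(1.0 / np.std(X, axis=0))
        return self

    def _project(self, X):
        return X @ self.L_.T


class MetricTransformer(BaseMetricLearner):
    def transform(self, X):
        """Map points into the space where Euclidean distance equals D_M."""
        return self._project(X)


class MetricSimilarity(BaseMetricLearner):
    def transform(self, X):
        """Return a pairwise affinity matrix exp(-D_M(x_i, x_j)^2)."""
        Z = self._project(X)
        sq_dists = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dists)


X = np.array([[0.0, 0.0], [1.0, 10.0], [2.0, 20.0]])
Z = MetricTransformer().fit(X).transform(X)
S = MetricSimilarity().fit(X).transform(X)
print(Z.shape, S.shape)
```

Only `fit` knows about the learning algorithm, so both descendants share it and differ in a single method.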

## Since my last post, several things have happened:

1) I got accepted to Google Summer of Code 2015 (yaaayy!!!)

2) The project I ended up choosing, telescope observation planning with Python and Astropy, has been in the planning stages for the last month.  Another student who joined the project, Brett Morris, was also accepted to GSoC, and together with our mentors, we've started to plan out the structure of the code, also known as the API.

3) I've been playing around with existing observation planning packages in Python to prepare for the project (namely Skyfield).

---

## So, my first major task is to build the documentation for code that acts as a mock API.

This came in part out of my need (as a newcomer to structured, open source coding) to better visualize and understand the hierarchy of classes, methods, etc. that make up the strawman API we're using to plan out the code base.

I'll be using Sphinx to build these documents, as it's what Astropy and affiliated packages use.  The minimum you need to build visualization such as this class inheritance diagram is a skeleton code of placeholders for future, working functions.

My battle plan:
• Make one .py file with function placeholders & index.rst file
• python setup.py build_sphinx
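
As an illustration of what such a placeholder file might contain, here is a minimal skeleton. The class and method names are hypothetical and are not the project's API; the point is only that signatures and docstrings exist before any working code does, which is what Sphinx's autodoc extension reads when it builds the pages.

```python
"""Placeholder module: signatures and docstrings only, no working code yet."""


class Target:
    """An object in the sky to observe (hypothetical placeholder)."""

    def __init__(self, name, ra, dec):
        self.name = name
        self.ra = ra
        self.dec = dec


class Observatory:
    """A telescope site (hypothetical placeholder)."""

    def __init__(self, name, latitude, longitude):
        self.name = name
        self.latitude = latitude
        self.longitude = longitude

    def rise_time(self, target, date):
        """Return the time `target` rises on `date`. Not implemented yet."""
        raise NotImplementedError


site = Observatory("Example site", latitude=0.0, longitude=0.0)
star = Target("Example star", ra=0.0, dec=0.0)
```

Calling `site.rise_time(star, "2015-06-01")` raises `NotImplementedError`, which makes it obvious which parts of the mock API are still empty.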


### What is Google Summer of Code?

Google Summer of Code is a really great opportunity for early-career astronomers to learn to code with forethought for open source projects that will actually get used by other astronomers — something we often aspire to do, but are rarely taught to do. To begin a GSoC project, you work one-on-one (or in my case, two-on-one) with mentors who are experienced open source developers to prepare a proposal for a software tool you would like to make with their help, including a detailed description of the project's deliverables and timeline.

In the astronomical world, one source of GSoC projects is astropy, our friendly neighborhood Pythonic astronomical Swiss-army knife. There are projects related to the active development on the "core" guts of astropy — like one proposed project by UW graduate student Patti Carroll — in addition to projects on affiliated packages which make use of astropy to do new things for more specific end-users than astropy core.

Your proposal gets written up in a wiki page on the astropy GitHub repository, where it can be revised with the help of your proposed mentors.

### My GSoC2015 project: astroplan

My GSoC 2015 proposal is to help co-develop astroplan (heads up: as of posting in May 2015, this repo will be boring), an observation planning and scheduling tool for observational astronomers. This package will allow observers to enter a table of astronomical targets and a range of observing dates in order to retrieve (1) the sky-coordinates for their targets, (2) rise/set times, moon/sun separation angles, airmass ephemerides, and other essential positional criteria necessary for determining the observability of a target, and (3) a roughly optimized observing schedule for the list of targets. This project will take advantage of the already-developed infrastructure for these calculations in the coordinates, time, and table modules of astropy, plus Astroquery — an astropy-affiliated package. If you don't already know about these powerful tools, check them out!

I will be working with a great team of astroplanners including mentors: Eric Jeschke, Christoph Deil, Erik Tollerud and Adrian Price-Whelan, and co-developer Jazmin Berlanga Medina.

### Call for input

Since we want astroplan to be useful and used by astronomers, I'd be happy to hear your thoughts on what astroplan absolutely must do. If you think you might be an astroplan user one day, leave a comment below or on Twitter with your top-priority observation planning/scheduling features.

### Sahil Shekhawat(PyDy)

#### GSoC Week #1 Update #1

It's started: a three-month-long journey. I had a meeting with Jason and Jim on Friday (22-05-2015) about all the technicalities I discussed in my last post. We decided that it's a bad idea to implement subsystems and to try to do things Simbody's way. We scrapped most of the things except joints and bodies.

## May 25, 2015

### Shridhar Mishra(ERAS Project)

#### First Post!!

I guess it's a bit late for my first post, but here it goes. The official coding period has begun and it's time to start coding. The past few days have been busy, since it's the end of the semester and all the submissions had to be done. I installed europa, but there were quite a few errors on my side which need immediate attention, since all the work depends on it. This has to be ironed out and discussed with the mentors.
I looked into the Docker platform, since it's an easy way to distribute a single image of the system to all the members. I haven't tried it yet, but it seems quite effective.

Better posts coming soon :P
Cheers
Shridhar

### Michael Mueller(Astropy)

#### GSOC 2015

This is my first blog entry for the 2015 Google Summer of Code--I'm excited to take part in the program for a second year, and to continue working with Astropy in particular! I just realized that Python students were expected to make a blog post about Community Bonding Period by May 24th, so this is just a bit late.

My project this year will involve the implementation of database indexing for the Table class in Astropy, which is a central Astropy data structure. I had a Google Hangout meeting with my mentors (Michael Droettboom and Tom Aldcroft) last week in which we discussed possibilities for functionality to implement over the summer. Among these are

• Allowing for multiple indexed columns within a Table, where each "index" is a 1-to-n mapping of keys to values
• Using indices to speed up existing Table operations, such as "join", "group_by", and "unique"
• Implementing other Table methods like "where", "indices", "sort"
• A special index called the "primary key" which might yield algorithmic improvements
• Making sure values in an index can be selected by range
We also noted some open questions to tackle as the project gets further along. My work on the project begins today; so far (during the community bonding period) I've been reading the Astropy docs on current Table operations and looking at Pandas to see how the indexed Series class works under the hood. I discovered that Pandas uses a Python wrapper around a Cython/C hash table engine to implement its Series index; in fact, the same engine is used in the Python ASCII parser I investigated last year (e.g. for a hashmap of NaN values). I haven't figured out yet which data structure is most appropriate for our purposes--candidates include a hash map, B-tree, or a bitmap for certain cases--but that's part of what I'll be looking into this week. I'll also write asv benchmarks for current Table operations and work on passing Table information into a (not yet written) indexing data structure.
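
To make the idea of an index as a 1-to-n mapping with range selection concrete, here is a small sketch built on a sorted key list and `bisect` from the standard library. It is not Astropy code and not one of the candidate structures chosen for the project; `SortedIndex` and its methods are hypothetical names.

```python
import bisect
from collections import defaultdict


class SortedIndex:
    """Maps each key in a column to the list of row numbers holding it."""

    def __init__(self, column):
        rows_by_key = defaultdict(list)
        for row, key in enumerate(column):
            rows_by_key[key].append(row)
        self._keys = sorted(rows_by_key)
        self._rows = [rows_by_key[k] for k in self._keys]

    def where(self, key):
        """Rows whose value equals `key` (empty list if the key is absent)."""
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return list(self._rows[pos])
        return []

    def rows_in_range(self, lower, upper):
        """Rows whose value lies in the closed interval [lower, upper]."""
        lo = bisect.bisect_left(self._keys, lower)
        hi = bisect.bisect_right(self._keys, upper)
        return sorted(r for rows in self._rows[lo:hi] for r in rows)


index = SortedIndex([5, 3, 5, 1, 9])
print(index.where(5), index.rows_in_range(3, 5))
```

A hash map would make `where` faster on average but cannot answer `rows_in_range` without scanning every key, which is one reason ordered structures such as B-trees are among the candidates.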

### Goran Cetusic(GNS3)

#### Google Summer of Code - prepost

This year (my last year as a student!) I decided to apply for GSOC (Google Summer of Code). It's been on my academic TODO list for a few years, but I haven't had the time or the motivation to apply. Since this is my last chance, I decided to have a go.

IMPORTANT: Being picked by the organization the student proposal is meant for doesn't mean you'll get accepted by Google -> after the organizations have been selected and after student proposal deadline, Google assigns student slots to organizations. So if the organization picked 4 students but Google assigns 2 slots, only 2 students will be accepted for that organization. But they may get selected for a different organization. That's why most students submit several proposals to different organizations. Basically, Google has a limited number of slots and distributes them between organizations.

Most organizations get 1-2 slots but organizations that have a vast number of student applications and are longtime contributors to GSOC get more slots. For example, Python Software Foundation (PSF) is a longtime member of GSOC and an umbrella organization. What this means is that this organization applied for GSOC, got accepted and then accepts other Python projects under its fold. Even some projects that haven't been accepted by Google (maybe because of limited number of slots) later get into GSOC through these umbrella organizations. Google is often extremely generous with slots given to umbrella organizations like PSF and Apache Software Foundation but if a large number of projects get under an umbrella organization, organizations might still get only 1-2 slots and some might theoretically even miss out. It depends.

Enough of the short intro, let's get back to what I originally wanted to write about. Since Python is the only programming language I trust myself to use to get the job done without the language being the obstacle, I've decided to work exclusively on Python projects. Now, I don't want to get stuck with a project that I'd do only for the money (and the amount Google pays isn't negligible where I'm from), so I picked three really cool projects to apply for:

• GNS3 - a highly faithful network simulator of real, physical networks.
• NetworkX - a software package for the creation, manipulation, and study of complex networks.
• SunPy - Python for Solar Physics

Cool, right? All three projects I've mentioned are part of the PSF umbrella organization. PSF requires students to keep a blog of their project progress and that's the reason this blog exists. It did take some of my time from writing the actual proposals but now that I'm writing it I think it's actually a nice idea.
The projects are ordered by preference. I'm working on my master's thesis, porting a network simulator developed at my university from FreeBSD to Linux. That's why GNS3 is first on my list. I can elaborate on the idea there and use it in GSOC. Concretely, the project idea is "Docker support for GNS3".

Background: right now GNS3 supports QEMU, VirtualBox and Dynamips (a Cisco IOS emulator). We can think of the nodes in GNS3 and the links between them as virtual machines that have their own network stacks and communicate amongst themselves like separate machines on any other "real" network. While this is nice by itself, QEMU and VirtualBox are "slow" virtualization technologies because they provide full virtualization -> you can run any OS on them. So while QEMU and VirtualBox can run various network services, it's not very efficient. Docker, on the other hand, uses kernel-level virtualization, which means it's the same OS but processes are grouped and different groups are isolated from each other, effectively creating a VM. That's why it's extremely fast and can run thousands of GNS3 nodes -> no calls between host and guest systems, it's the same kernel! Docker is quite versatile when it comes to managing custom-made kernel-based VMs. It takes the load off the programmer, so he/she doesn't have to think about disk space, node startup, isolation etc.

The second project, NetworkX, is basically graph analysis software written in Python. You define your graph with nodes and edges and run various graph algorithms on it. Before Google announced the selected organizations, NetworkX was already on the PSF GSOC wiki page, one of the first. They're the first organization I contacted. While for GNS3 I just chose one of the already available project ideas, for NetworkX I proposed to make a Tkinter GUI, since it doesn't have one. It would enable users to draw edges and graphs without actually writing programs. This wasn't exactly rejected, but one of the core developers explained to me in a lengthy email that while they appreciate the effort, NetworkX is moving away from any kind of GUI development and that I should probably pick one of the existing ideas. So I chose to write a backend API for NetworkX.

Background: Until now, graph data has been represented as volatile Python dictionaries. It would make sense to provide a flexible backend interface that makes it easy to create modules for graph storage, so graphs can be accessed efficiently at any time. This would usually include graph databases, since their data representation is close to what NetworkX does and such databases have efficient algorithms for accessing graph data, but it shouldn't be restricted to such storage. A case in point would be document-store databases, which can more or less directly save Python dictionaries as JSON data and load them. SQL databases are somewhat trickier because their data representation isn't directly compatible with graphs.

The last project, SunPy, is a software package for solar physics computations. Now, while the previous two projects are more along the lines of what I usually do and study, this is more of a whim! I mean, solar physics! Cool! The project idea is to refactor one of its modules, called Lightcurve. I have to admit I don't know a lot (close to nothing) about solar physics, but this project has more to do with actual Python refactoring. I'll probably have to learn something about solar physics, which I'd like to. Although GNS3 was added to the PSF organization list after SunPy, I've put most of my effort into writing the proposal for GNS3 because of my thesis, with a touch of regret that I won't work on software that researches the Sun!
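
A minimal sketch of what such a backend interface could look like. The names `GraphBackend` and `JSONDocumentBackend` are hypothetical and not part of NetworkX, and the "document store" is simulated by an in-memory dictionary of JSON strings so the example stays self-contained.

```python
import json
from abc import ABC, abstractmethod


class GraphBackend(ABC):
    """Hypothetical storage interface; not part of NetworkX."""

    @abstractmethod
    def save(self, name, adjacency):
        """Persist a dict-of-dicts adjacency structure under `name`."""

    @abstractmethod
    def load(self, name):
        """Return the adjacency structure stored under `name`."""


class JSONDocumentBackend(GraphBackend):
    """Stores each graph as one JSON document, as a document store would."""

    def __init__(self):
        self._documents = {}

    def save(self, name, adjacency):
        self._documents[name] = json.dumps(adjacency)

    def load(self, name):
        return json.loads(self._documents[name])


# node -> neighbour -> edge attributes; string node names survive JSON unchanged
graph = {"a": {"b": {"weight": 2}}, "b": {"a": {"weight": 2}}}
backend = JSONDocumentBackend()
backend.save("demo", graph)
restored = backend.load("demo")
print(restored == graph)
```

One caveat the sketch makes visible: JSON object keys are always strings, so integer or tuple node names would need an explicit encoding step in a real backend.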

Whichever project gets selected, I'm sure it's going to be a fun and educational experience for me. Keeping my fingers crossed for the next post where I'll write in more detail about the project I'll (hopefully) work on.

Cheers

#### GNS3 Docker support

So the coding session for GSOC finally began this week. I got accepted with the GNS3 Docker support project, and here is the project introduction and my plan of attack.

GNS3 is a network simulator that faithfully simulates network nodes. Docker is a highly flexible VM platform that uses Linux namespacing and cgroups to isolate processes inside what are effectively virtual machines. Docker support would enable GNS3 users to create their own custom virtual machines and move beyond the limitations of network-oriented nodes and, because of Docker's lightweight implementation, would make it possible to run thousands of standalone servers on GNS3.

## Let's start coding!

### Andres Vargas Gonzalez(Kivy)

#### Beginning Google Summer of Code 2015

My name is Andres, and I am writing this post to officially start my adventure as a Google Summer of Code 2015 student. During this time I have realized how hard the people in my organization work every day to maintain Kivy. Additionally, I had to install an IRC server to keep me online on the chat server. People in general are very active and respond to any question very fast. I expect to contribute by adding new features to Kivy and by being as active in the community as my mentors currently are.

Thanks for the opportunity.

### Sumith(SymPy)

#### Gearing up for GSoC

Greetings! The community bonding period is officially over now. It's time for the coding period. I had promised myself a post every Sunday from the 24th of May 2015, but it seems the first post is a bit late.

### Community bonding

I had discussions with Ondřej and Shivam about the big tasks in hand and how to go about handling the work. In the first discussion, we also assigned ourselves the first task that needs to be completed.
I have to:
* Clean up what is necessary in the PR Shivam sent during his proposal period.
* Implement sub_poly() and mul_poly() with Kronecker substitution in a clean fashion.
Shivam agreed to finish ring_series in SymPy which he has already started working on.
Also together we decided to work on a faster hashtable implementation.
I also discussed with Sushant about the structure of the current SymEngine and cleared my doubts there.
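
For readers unfamiliar with Kronecker substitution, mentioned above for mul_poly(): the idea is to pack the coefficients of each polynomial into one big integer, multiply the two integers, and unpack the product. The sketch below is a plain Python illustration for univariate polynomials with non-negative integer coefficients, not SymEngine's implementation; `mul_poly_kronecker` is a name chosen here.

```python
def mul_poly_kronecker(a, b):
    """Multiply two coefficient lists (lowest degree first, non-negative ints)."""
    if not a or not b:
        return []
    # Every coefficient of the product is at most this bound, so giving each
    # coefficient `bits` bits guarantees that neighbours never overlap.
    bound = min(len(a), len(b)) * max(a) * max(b)
    bits = max(1, bound.bit_length())

    def pack(coeffs):
        # Evaluate the polynomial at x = 2**bits.
        return sum(c << (i * bits) for i, c in enumerate(coeffs))

    product = pack(a) * pack(b)
    mask = (1 << bits) - 1
    n_coeffs = len(a) + len(b) - 1
    return [(product >> (i * bits)) & mask for i in range(n_coeffs)]


# (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
result = mul_poly_kronecker([1, 2, 3], [4, 5])
print(result)
```

The appeal is that the heavy lifting is done by a single big-integer multiplication, which is typically very well optimized.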

As a part of community bonding, I looked into some tools that I'll be using: certain C++11 constructs, the visitor pattern, etc. Even though I am not thorough with them, I think learning as I progress with the work is the best thing to do.

The work I undertook in this period is minimal, but here it is:
Issues
#443: Documentation of SymEngine.
Pull requests
As I read through the code, I felt some clean-ups were necessary, which were done in
#444 and #438: Pending
#451, #442, #441 and #440: Merged

In my proposal, I had promised a Piranha audit, but it didn't happen in such a short period because the code is complex. The best way forward was to start work on Polynomial.
The work regarding Polynomial class has already begun here. I thank the whole SymEngine community for actively participating there and giving their inputs.

### Targets for Week 1

Complete the Polynomial class, need to implement:
* basic functions __hash__, __eq__, compare, from_dict like other SymEngine classes.
* Implement printer and tests for that.
* Implement add_poly(), neg_poly(), sub_poly(), mul_poly(), eval() and respective tests.
If time permits:
* Start working on the hashtable along with Shivam.

I am really excited, as the coding period has officially started. The whole SymEngine community has been active on Gitter as well as in PR discussions, and I am looking forward to an awesome learning experience with them.

That's all for now. Catch you next week.
Freilos(German)

## May 24, 2015

### Abhijeet Kislay(pgmpy)

#### Integer Linear Programming in OpenGM

Today I will be discussing some of the ways in which inference algorithms especially the Linear Programming ones work in OpenGM. So there are many ways to solve an energy minimization problem. From the table below we can get an idea about the different accumulative operations used to do inference: That is, if we need […]

### Rafael Neto Henriques(Dipy)

#### [RNH Post #3] Time to start mapping brain connections and looking to brain properties in vivo

Hi all,

Tomorrow we are starting the coding period :), so it is time to give some details about my project and tell you what was done in the community bonding period.

1) How can we study brain connections and brain's tissue properties in vivo? - A simple introduction for non experts

The trajectories of neuronal connections (tractography) and the quantification of tissue properties in the living human brain can be obtained from measures of water diffusion using MRI scans. To give you an example of how this is done, I will start by describing one of the simplest techniques: diffusion tensor imaging (DTI).

By combining the information of several diffusion weighted images, DTI models the water diffusion for each image element using a tensor which can be represented by an ellipsoid (see Figure below).

Figure 1. Diffusion tensors computed from all voxels of a real brain image. This image was produced using Dipy as described in Dipy's website.

From Figure 1 we can see that diffusion is larger in some directions. In fact, the direction of largest diffusion can be related to the direction of the brain's white matter fibers. The axon myelin sheaths restrict water diffusion, and thus diffusion is smaller in the directions perpendicular to the fibers. On the other hand, diffusion parallel to the fibers is less restricted, so the direction of largest diffusion matches the direction of the fibers.

Based on this, 3D virtual reconstructions of brain connections can be obtained using specific tracking algorithms, a procedure which is named fiber tracking. An example of these 3D maps, obtained from a real brain dataset, is shown below.

Figure 2. Example of corpus callosum fibers. These fibers connect the left and right brain hemispheres. This image was produced using Dipy as described in Dipy's website.

Nowadays, DTI is still one of the most used diffusion-weighted techniques in both clinical applications and research studies; however, it is not always accurate. DTI cannot account properly for the crossing of different populations of white-matter fiber connections. Moreover, it ignores the non-Gaussian properties of diffusion in biological tissues, which can be used to derive interesting and important measures of tissue properties.

2) Project proposal

In this project, I will be implementing an alternative diffusion-weighted technique named diffusion kurtosis imaging (DKI) in an open source software project, Diffusion Imaging in Python (Dipy). DKI overcomes the two major limitations of DTI:
1. It quantifies the non-Gaussian properties of water diffusion in biological tissues by modelling the kurtosis tensor (KT), which can be used to derive important tissue measures such as the density of axonal fibers.
2. Relative to the diffusion tensor, the KT has also been shown to offer a better characterization of the spatial arrangement of tissue microstructure and can be used as a basis for more robust tractography. In particular, DKI-based tractography is able to resolve crossing fibers.
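
For readers who want the model behind the two points above, this is the DKI signal equation as it is usually written in the literature. The symbols are notation assumed here for illustration and do not appear in the original post: $S$ is the diffusion-weighted signal, $S_0$ the signal without diffusion weighting, $b$ the b-value, $\mathbf{n}$ the gradient direction, $D_{ij}$ and $W_{ijkl}$ the elements of the diffusion and kurtosis tensors, and $\bar{D}$ the mean diffusivity.

```latex
\ln S(b, \mathbf{n}) = \ln S_0 - b\, D(\mathbf{n}) + \frac{1}{6}\, b^2 D(\mathbf{n})^2 K(\mathbf{n})

D(\mathbf{n}) = \sum_{i,j=1}^{3} n_i n_j D_{ij}, \qquad
K(\mathbf{n}) = \frac{\bar{D}^2}{D(\mathbf{n})^2} \sum_{i,j,k,l=1}^{3} n_i n_j n_k n_l W_{ijkl}
```

Setting $K(\mathbf{n}) = 0$ recovers the mono-exponential (Gaussian) decay assumed by DTI, which is why the kurtosis term is what captures the non-Gaussian behaviour.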

3) What is done so far

As an update to what I posted previously (see Post #2), I finished the work on DKI's simulations, procedures that will be useful for testing the code that I will be implementing during this summer. In particular, as my mentor suggested, I added some automatic debugging scripts using Nose python testing. These scripts now ensure that the kurtosis tensor is symmetric (as expected) and that the simulations are able to correctly produce the diffusion tensor and kurtosis tensor in both cases of well aligned and crossing fibers.

Many thanks to my mentor for teaching me how to work with nose python testing, and in particular for the useful tip of running the nose tests and finding out which lines the testing scripts cover by using the following command:

nosetests -v dipy/sims/tests/test_voxel.py --with-coverage --cover-package=dipy
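
As an illustration of the symmetry check mentioned above, here is a minimal nose-style sketch in plain Python. It does not use Dipy: the helper names `build_symmetric_kt` and `is_fully_symmetric` are hypothetical, and the tensor is built by hand from its 15 independent elements, whereas the real tests would take it from the simulation code.

```python
from itertools import combinations_with_replacement, permutations, product


def build_symmetric_kt(independent):
    """Expand values keyed by sorted index 4-tuples into a full 3x3x3x3 nested list."""
    W = [[[[0.0] * 3 for _ in range(3)] for _ in range(3)] for _ in range(3)]
    for i, j, k, l in product(range(3), repeat=4):
        W[i][j][k][l] = independent[tuple(sorted((i, j, k, l)))]
    return W


def is_fully_symmetric(W, tol=1e-12):
    """True if W[i][j][k][l] is unchanged under every permutation of its indices."""
    for idx in product(range(3), repeat=4):
        i, j, k, l = idx
        ref = W[i][j][k][l]
        for p, q, r, s in permutations(idx):
            if abs(W[p][q][r][s] - ref) > tol:
                return False
    return True


def test_kurtosis_tensor_symmetry():
    # A fully symmetric rank-4 tensor in 3D has 15 independent elements.
    keys = list(combinations_with_replacement(range(3), 4))
    assert len(keys) == 15
    independent = {key: 0.1 * (n + 1) for n, key in enumerate(keys)}
    W = build_symmetric_kt(independent)
    assert is_fully_symmetric(W)
    # Breaking a single element must be detected.
    W[0][0][0][1] += 0.5
    assert not is_fully_symmetric(W)
```

Because the function name starts with `test_`, nose would collect it automatically when the file is passed to `nosetests`.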

4) Next steps

After merging the DKI simulations into Dipy's master branch, I will start working on the DKI reconstruction modules, based on some preliminary preparation work previously submitted by other Dipy contributors. By the end of the week, I intend to finish the first part of the DKI reconstruction modules: the KT estimation from diffusion-weighted signals. For this I will implement the standard ordinary linear least-squares (OLS) solution of DKI.
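
To give a feel for what an OLS solution involves, here is a deliberately simplified sketch, not the planned Dipy implementation. It fits the apparent diffusivity D and kurtosis K along a single gradient direction from signals at several b-values, by linear least squares on the log-signal. The full DKI fit estimates all diffusion and kurtosis tensor elements jointly across many directions. The function names, units (b in ms/µm², D in µm²/ms) and simulated values are assumptions chosen for illustration.

```python
import math


def solve_linear(A, y):
    """Solve the square system A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_dki_ols(bvals, signals):
    """OLS fit of ln S = ln S0 - b*D + (b^2 * D^2 * K) / 6 along one direction."""
    # Unknowns: [ln S0, D, D^2 * K]; each row of the design matrix is [1, -b, b^2/6].
    rows = [[1.0, -b, b * b / 6.0] for b in bvals]
    y = [math.log(s) for s in signals]
    # Normal equations: (A^T A) x = A^T y
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    ln_s0, D, D2K = solve_linear(AtA, Aty)
    return math.exp(ln_s0), D, D2K / (D * D)


# Usage: simulate noise-free signals and check that the fit recovers the inputs.
true_s0, true_D, true_K = 100.0, 0.9, 0.8
bvals = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
signals = [true_s0 * math.exp(-b * true_D + b * b * true_D ** 2 * true_K / 6.0)
           for b in bvals]
s0, D, K = fit_dki_ols(bvals, signals)
```

Taking the logarithm is what makes the problem linear in the unknowns; the price is that noise in low signals gets amplified, which is one reason weighted variants of the fit exist.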

#### [RNH Post#1]

Hi guys,

This year I am applying to the Google Summer of Code 2015. I am proposing to implement some techniques based on Diffusion Kurtosis Imaging in DIPY (Diffusion Imaging in Python). If you are interested in diffusion MRI techniques and how to implement them using Python, the information you will find here will be useful for you.

In this blog I will report my progress during the summer!

Hope you will enjoy.

Rafael

#### [RNH Post#2] First post after acceptance! =)

Personal Note

Hi all,

I am pleased to inform you that my project proposal was accepted to the Google Summer of Code!

Congrats to everyone that was also accepted!!! This definitely will be an exciting summer!

As I mentioned in my last post, I will be implementing some exciting MRI techniques which allow us to see brain connectivity in vivo - how awesome is that?

In the following weeks I will give you more details about this. Stay tuned and you can explore the brain with me!

Greetings from Cambridge (UK),

Rafael N.H.
PhD Student at the University of Cambridge

Before the Student Coding Period

I am currently working on some simulations that will be useful for testing the imaging techniques that I will be implementing.

I started this work before applying to the GSoC (https://github.com/nipy/dipy/pull/582), and in the last weeks I have been improving it. At the moment, the simulations are almost complete: the code runs without errors and follows the PEP8 standard. Now I only have to add some automatic debugging scripts using Nose python testing.

During the following week, I will discuss the work done so far with my mentors (in particular I want to discuss some minor changes to the current scripts) and fix the problems that I am facing in creating the automatic debugging scripts.

Minor details to discuss with mentors:

1) Suggest changes to the default values of the simulation modules.

2) Discuss whether it is better to remove some unnecessary inputs or to have redundant computing steps.

3) Discuss the definition of some important variables that will be used in future steps.

Problems to fix during this week:

1) Resolve problems in recognizing the paths where the new versions of the modules are located locally.

2) Fix error when trying to run Nose:

Cannot run $nosetests test_voxel.py

ERROR: Failure: ImportError (No module named runspeed)

### Julio Ernesto Villalon Reina(Dipy)

#### Community bonding period

Hi all, this is my first post since I got accepted to GSoC 2015. I am really excited about the start of the coding period and about being part of the greater community of the Python Software Foundation. Honestly, I am a bit scared, but I like the challenge and I am working with the best people. The Dipy team is really great! During the community bonding period I was able to interact with some of my mentors and draw up a general plan for the coding phase.

First, a short intro to my project, which is called "Tissue classification to improve tractography." This is the abstract of my project:

* Diffusion Magnetic Resonance Imaging (dMRI) is used primarily for creating visual representations of the structural connectivity of the brain, also known as tractography. Research has shown that using a tissue classifier can be of great benefit to create more accurate representations of the underlying connections. The goal of this project is to generate tissue classifiers using dMRI or a different MRI modality, e.g. T1-weighted MRI (T1). This reduces to an image segmentation task. I will have to implement popular segmentation algorithms using T1 and invent a new one using dMRI data.

As stated in my initial proposal, the first task for the community bonding period was to read and discuss the paper by Zhang et al., 2001 (Yongyue Zhang; Brady, M.; Smith, S., "Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm," Medical Imaging, IEEE Transactions on, vol. 20, no. 1, pp. 45-57, Jan 2001). This paper gives us a closer idea of how to approach the segmentation algorithm for MRI T1-weighted images of the brain. The goal is to derive partial volume estimates (PVEs) for each of the tissues and compartments of the brain, i.e.
grey matter, white matter, and cerebrospinal fluid. With the mentors we defined the main strategy for coding the segmentation algorithm proposed in the paper, which parts of the theory we would like to implement and which ones not, as well as the general assumptions about the inputs to the program.

### Rupak Kumar Das(SunPy)

#### The start…

Today is the last day of the Community Bonding period. It was an exciting month, studying the codebase and communicating with my mentors. I have already started my project by fixing a bug and adding a small new feature to Ginga (though these are not much).

A funny thing happened today. The next thing on my list was the Cuts plugin, which was to be modified so that it supports the addition of an extra cut. I was racking my brain over how to add it when I discovered that it was already implemented! It was mentioned in the documentation but I had failed to notice it. I have now re-read the documentation in detail to prevent such occurrences.

The meetings will start soon and frankly, I am a little scared! But it will be an interesting and new experience for me so I am looking forward to it!

### Berker Peksag(ScrapingHub)

#### Hello, World

I'll be working on completing the Python 3 port of Scrapy this summer. During the community bonding period,

* I've checked my proposal and made some updates to the Twisted section. twisted.web.static has already been ported to Python 3.
* I've played with Twisted and read some good documentation about asynchronous programming in general.
* I've followed the development of both Scrapy and Twisted on GitHub.

You can follow my work in the python3 branch on GitHub.

### Vivek Jain(pgmpy)

#### Community Bonding Period

Now that the coding period for this year’s Summer of Code is about to start, I am extremely happy that things have been working pretty well with me and my mentors over this community bonding period. We had a group meeting on IRC and all of us are excited to have a more than successful Summer of Code.
## Community Bonding Period

In the community bonding period, I reviewed my proposal again and discussed with my mentors which features are necessary and how things should be implemented, and cleared my doubts. I read the documentation and read the code to understand the flow of execution and how things have been implemented. I also read the documentation of the pyparsing module, which will be used for parsing the UAI file format. Here are some of the notes I made from the documentation so that I can easily find the functions that will be needed at a later stage.

1. Import the pyparsing module as `import pyparsing as pp`.
2. `p.parseString(s)` → the input is “s” and the parser is “p”. If the syntax of s matches the syntax described by p, this expression will return an object that represents the parts that matched. This object will be an instance of class `pp.ParseResults`.
3. The `pp.Word()` class produces a parser that matches a string of letters defined by its first argument.
4. Use `pp.Group(phrase)` to group things. For example, to differentiate models with variable numbers use `pp.Group()`.
5. Use `setResultsName()` to give a name to the string which is returned, e.g. `model_name = pp.Word(pp.alphas).setResultsName('modelName')`.

I also made the grammar for the UAI module.

Grammar for UAI Preamble:

    Preamble --> model_name \n no_variables
    model_name --> MARKOV | BAYES
    no_variables --> IntegerNumber \n domain_variables
    domain_variables --> IntegerNumber* \n no_functions
    no_functions --> IntegerNumber \n function_definition*
    function_definition* --> function_definition | function_definition function_definition*
    function_definition --> size_function " " IntegerNumber*

### Ziye Fan(Theano)

#### Theano: proposal changed

Hi there,

I'm Ziye, and I'm taking part in this year's GSoC. I applied for the Theano project because I use it in my own lab research work (btw, I do some music information retrieval work in the lab).

The original proposal was to decrease the Python overhead generated by Theano.
The new one is to decrease the compile time. It will improve Theano's performance. During the community bonding period, with the help of my mentor Fred, the optimization objective was almost confirmed. In the upcoming coding period I'll put more time into it. It will be interesting.

Ziye

### Vipul Sharma(MoinMoin)

#### GSoC 2015: Improving the Issue Tracker of MoinMoin 2.0

MoinMoin is an open source, advanced, easy to use wiki engine implemented in Python. It is collaborative software that runs a wiki which allows users to create and edit web pages collaboratively. Some of the sites using the MoinMoin wiki engine include: Python, Mercurial, Apache, Ubuntu, Debian, Wireshark and many more.

In GSoC 2015, I'll be working on improving the issue tracker of MoinMoin 2.0.

#### Project Details

MoinMoin 2.0 has an existing implementation of a simple issue tracker. The current issue tracker requires some improvement in its UI/UX and a few more features which would be good to have for better use.

#### Implementation

For the improvement of the current issue tracker, the implementation of the task can be divided into 3 parts:

* Creation of new tickets
* View containing ticket list, searching / filtering of tickets
* Ticket update view

#### Creation of new tickets

* Enter a title for the issue
* Search for possible duplicates: This can be implemented by providing auto suggestion of existing tickets based on the title of the new ticket which the user wants to create. Currently, MoinMoin 2.0 uses Whoosh for search queries; we can use this and jQuery to display real-time suggestions of existing tickets if the title of the new ticket to be created matches any of the existing tickets.
* Improve metadata fields and add new fields
* File upload: add a feature to allow the user to upload a screenshot, media or a patch file

Wireframe for ticket creation view:

#### View containing ticket list (/+tickets), searching / filtering of tickets

Currently the /+tickets view lists all the tickets with some options, including a few important filters like "all", "open", "closed" and "tags". The current view also has an option for creating a new ticket and a search box to find tickets.

In this view, we can include an "Advanced Search" feature where we can add additional filters for filtering tickets based on a particular "author", "tags", "assignee", "difficulty", "effort", "severity" and "priority". For the "Advanced Search" option, a new view /+tickets/query will be created, where additional filters will be provided through which we will filter the results based on the query applied in the view: `/+tickets?<query_type>=<query>`. We will also allow multiple filters to be applied in our advanced search.

Wireframe for ticket list view and advanced search view:

#### Ticket update view

The update ticket view is similar to the ticket create view. In this view, the comment mechanism can be improved by adding features to reply to any comment, delete any comment, use Markdown formatting syntax, and post comments after any updates to the metadata of the ticket.

Wireframe for update ticket view:

#### Community Bonding

In the community bonding period, I reviewed my proposal again and discussed with my mentors which features are necessary and how things should be implemented, and cleared my doubts. I read the documentation and read the code to understand the flow of execution and how things have been implemented.

I learned about form processing using Flatland, and it's really cool :) As explained on their website: Flatland maps between rich, structured Python application data and the string-oriented flat namespace of web forms, key/value stores, text files and user input.
Flatland provides a schema-driven mapping toolkit with optional data validation.

It allows DWIM (Do What I Mean) binding. Here is a run-of-the-mill example to explain:

    >>> from flatland.out.markup import Generator
    >>> from flatland import Form, String
    >>> html = Generator()
    >>> class Name(Form):
    ...     firstname = String
    ...     lastname = String
    ...
    >>> form = Name({'firstname': 'Vipul'})

"Do What I Mean" binding:

    >>> print html.input(form['firstname'])
    <input name="firstname" value="Vipul" />

I also read about the Jinja2 template engine, a beautiful, full-featured template engine for Python. Having tried my hand at Django, I was a little familiar with the Django template language, which was also the inspiration for Jinja2. I read how macros work in Jinja2. Macros can be compared to functions in programming languages. They are a nice tool to promote the DRY (Don't Repeat Yourself) principle while writing templates. The most often used elements can be written in a reusable function, i.e. a macro, which can be called like a function in the templates. A small example to demonstrate how to render a form element:

    {% macro render_input(n, value='', type='text', size=20) -%}
        <input type="{{ type }}" name="{{ n }}" value="{{ value|e }}" size="{{ size }}">
    {%- endmacro %}

This macro can be called like a function:

    <p>{{ render_input('firstname') }}</p>
    <p>{{ render_input('lastname') }}</p>

I also read about Whoosh. It's an amazing and quite robust search library implemented purely in Python. Programmers can use it to add search functionality to their applications and websites. Whoosh allows indexing of free-form or structured text and quick retrieval of matching documents based on simple or complex queries. The speed of Whoosh in processing queries fascinates me very much. It is pretty fast at processing even some complex queries.

### Ankit Kumar(SunPy)

#### Python Software Foundation Phase I : Getting Accepted !!, Community Bonding, Mailing List, Preparation, Welcome Package...
So what’s up people. Long time huh!! It seems that getting done with semesters and getting accepted for GSOC 2015, and that too under such a prestigious organization as the Python Software Foundation, has made me fall into quite a celebratory mood. And that’s why the heading is no longer Google Summer of Code 2015 Phase IV but Python Software Foundation Phase I :D So I’ll just list the things that have been going on with me over the last few weeks.

Let’s start with getting accepted. Well I’ve already talked about it in the first paragraph so not a lot about that, but just how grateful I am to PSF and SunPy for giving me this opportunity. I really look forward to a successful GSOC 2015 completion.

Moving on to the Community Bonding period. This has been a nice phase where I talked to my mentor and we decided upon work timings (due to different time zones), work methods (we are using Trello cards) and of course IRC. I’ve already gone over the few code pieces that I need to understand to start, and in fact will be making my first commit tomorrow itself (with the beginning of the Coding Period).

One thing that obviously characterized this period of Community Bonding was the annoying GSOC mailing list. OMG are people seriously crazy !! :-( :-/ like seriously I had to change the mailing list settings to abridged daily updates because I was getting like 10 mails every day and that too about some really stupid and irrelevant things. But yeah like whatever.

So I guess I covered up to the preparation part. So let’s move on to the Welcome package sent by Google to all accepted students. I must say that over the past few weeks I was excited about this package and it arrived just yesterday. So FTW it contains a Moleskine notebook, a pen-cum-pencil, and a GSOC sticker. It may also contain your payment card, but since I live in India and the only option we have is to opt for bank transfer, my package didn’t have the payment card. For others I am sure it would have.
But now all is done and now it’s time to get some perspective. By that I mean “LESS TALK, MORE CODE” and so signing off there is only one thing on my mind i.e. Let The Coding Begin

#### Google Summer of Code 2015 Phase III : Github, Patch, PR and Proposal

By Ankit Kumar (Mar 21, 2015)

Now why does the heading specially mention Github (it’s common among developers, right !!)? As it turns out it actually was my first time using it and hell it was confusing, so I had to ask my friends and seniors, and I did trouble my mentor a lot and I am really very sorry for that !! (David, if you read this, do know I am really sorry, this is my first time using Github. I promise that that’s the first thing I am going to do after I submit my proposal.)

So finally it seems that I am active now and moving forward. And as the next step I started looking for issues to fix and to make a PR. So I read over the issues on the Github issue tracker and well I decided to deal with issue #798. Now while doing this activity is incredibly crucial for the mentors to judge the ability of us, the new contributors, I on the other hand also had a lot of fun interacting with the SunPy source code. I mean I got to read the original code and then add features to that and I was like how cool is that. So ok I did have a bit of confusion about what exactly the issue was about in the first place, but thanks to David that got sorted out. Then I headed to interact with the particular piece of code I had to improve and fix. And that led me to the source code of the parse_time function, where I found that the issue existed simply because the author of the code never meant to add support for numpy.datetime64 type input, nor for time string input with time zone info in the string.
So I tried to fix it one way which would’ve gotten a bit complicated, but then I came up with an easier workaround by adding an if clause which handled the numpy.datetime64 input by converting it to the datetime.datetime type, and then the function handled it just like it did other datetime.datetime inputs. Moving on, the other problem was to add support for time strings with time zone info attached at the trailing end. This was solved using the parse function from the dateutil library.

So fun it was, and so I ended up making a nice PR which actually failed both tests. But on being urged by David a bit I tweaked it again, made some test cases and tried it again, and now it passed one test, although I am still not sure why it’s failing the travis-ci test. But again I remember that I have to commit again, avoiding code repetition and adding the test cases to the test file using pytest. So more for later; right now I am just going to add some more commits improving my patch and then head straight into making the proposal, which btw I am completely freaked about cause it’s such an important thing. So for now I just hope the proposal writing goes fine and I do get accepted.

I’ll update this post later when I get done with my proposal and am about to submit it, because then that’ll officially be the end of Phase III of GSOC 2015. After that Phase IV will start, that is waiting for the results, but I think I am just gonna start reading up more code and at least set up the skeleton of the code (or make some progress with it) before I go back home for the summer. But let’s right now focus on the proposal that’s in front of us. And to quote David:

The requirements to fulfil are the following:

* Create a PR with a patch. This can be to any part of SunPy, the above would do. (by the way, does not need to be accepted, but better if it is).
* Create a blog with some tags/categories (python, psf, gsoc,.. you choose) so what you write under it the PSF can grab it automatically.
* Write your proposal.
To write your proposal you should try to get familiar with everything, but mostly with the part that you are going to contribute. So, if your project involve lightcurves, it would be good that you understand how they work and how we want them to work (https://github.com/sunpy/sunpy-SEP/pull/6) even if you are not going to do such project. For that, it will be helpful if you know how sunpy.maps work too. The unifiedDownloader is going through deep changes, so keeping an eye on what are they is also good.

#### Google Summer of Code 2015 Phase II : Joining mailing Lists, Introducing myself, Talking to mentors, waiting for replies...

Well I am obviously not gonna list down here the organizations and the projects that made it to my shortlist !! :P (for obvious reasons). I think I’ll only mention the final one that I end up preparing the proposal for, but of course I don’t know what it will be so that’s for later :P. Also seeing that I have got a lot of work to get done I am gonna keep this blog post short….save some talking for interacting with mentors !! Ha.

So well yeah the first of all steps is to join the mailing lists of the development community of the respective organization and introduce yourself there along with the specific idea that you have selected from the pool of ideas of that organization. After that it’s a slight wait for a reply, but the developer community is really very helpful and welcoming and will help you to get on with open source development even if you are a beginner in it (I was !!). I found really good organizations, and the mentors were really very patient in replying to my mails and answering all the questions pretty descriptively.

Well after that comes reading up a bit more on the resources and links shared with you by the mentors and getting a sense of the organization and especially how your selected idea might integrate with their overall mission and code base.
In my case it took a bit of time with a few organizations while with others it was much more rapid. Now based on this newly gained knowledge we have to decide whether we might be able to develop that idea, be interested in it, and whether ultimately we get what the idea is. Well, ultimately, because you just have to, you know, get a gist of what it is, although a bit more holistic gist, because the rest is for the time when we start preparing the proposal. (Note: It may seem that I simply had this all in my mind but no, I had to talk to lots and lots of people: ex-GSOCers, some seniors at my college who were mentors for organizations, and mentors out there in the dev community.)

And finally after all this well you get your final organization !! Right !! Well life ain’t that straight. After doing all this, one random day (two days ago) I was just looking through the Python organizations, because I felt if only there could be a few more interesting organizations with ideas more interesting to me and for me, and there I hit the PSF page and I am like “I definitely didn’t see all these new organizations before”. And so I have sent out the mails and introductions so now let’s see what happens!!

So then the whole process of Phase 2 was repeated and guess what: the final organization that I ended up selecting is SunPy under the Python Software Foundation. What I would especially like to mention here is the speed with which my mentor from SunPy helped me pick up the necessary bits and get started, since obviously I was a bit late. So now here I am finally with one single project, setting up the dev environment and using bits of it. And I guess now let’s move on to Phase III of GSOC 2015. So let’s get our hands dirty now and deal some blows with the SunPy codebase!!

#### Google Summer of Code Phase 1: Shortlisting of Organisations to numbers I can deal with. By Ankit Kumar (Mar 13, 2015...
Ok so here is my first blog post for my Google Summer of Code 2015 proposal to SunPy, Python Software Foundation. So hmm how was my experience applying for GSOC. Hmm let me think of which word is more intense than tiring because woof is it tough man!! So I started looking for organizations right after the list of accepted organizations was posted and my god there were 137 organizations in total and that's kind of a lot!! So how do I filter down to an organization that's suited for me?

You know one important thing about me is that I love to learn and I have specific interests that I like to explore, so I needed an organization that suited my interests, or speaking specifically, one where I would be interested in continuing to work even if they don't pay me at all. See that is how I choose whether I will or will not do anything. That is where I get my persistence from. And this may sound crazy because hey you might not be interested in anything but I am kinda unusual on that note. I am greatly interested in technology, business, entrepreneurship, astronomy, physics, and most importantly programming.

So now 137. Well I know C, C++, Java, Python and web technologies so how do I start? Let's rewind to when exactly I started loving programming, or when exactly it started speaking to me. I started out coding in my first year of college when we had a programming course in C. So it was nice, I got to know a very good programming language, C. And I aced that course too, not to say because I liked it a lot….it felt singular, I mean it was not as complex as talking to people, it was simple and I liked that. Although a lot of times I felt that it was restrictive, I mean I couldn’t do everything that I wanted to do with it. I guess it was because it probably couldn’t be covered in a single semester or that the course was simply an introductory course so they didn’t wanna complicate it so much that others couldn’t follow.
I wanted it to be able to talk to other files, read from them, write to them, and wanted it to do this seamlessly without a lot of hassle, but as it turned out it wasn’t all that easy. It almost always remained in the console. But come second year and I am introduced to an online course on Python and I delve more into it. And soon enough I learn how to make GUI applications in it, read files, write to them, plot graphs, make it talk to the internet, and that was liberating, and that is the story of how I fell in love for the second time. And it was similar to the adrenaline rush that I got when I fell for Physics in ninth grade. So there it was, yeah, I felt liberated and powerful with Python because it enabled me. Another thing that I have been particularly inclined to has been building things and then showing them off to people that it worked !! Ha

So there was my decision: search for the Python tag. And now we were down to 40 organizations and man the real struggle starts now. So now what I do is open up the ideas pages of all the 40 organizations in side tabs and hmm over two days read up the projects, filtering through them. So even 40 is a lot man. So I took up a simple criterion: I am just gonna select the projects, and therefore the organization, if my current skill matched the requisite skill for that idea. I have a fair amount of experience using and developing in Python and its libraries (following from the fact that it made me feel liberated). This took up a while. And guess I ended up with some 20 organization ideas pages. That's nice hah. So moving on I cut through the list by selecting the organizations that also coded about things that interested me.
And this was the most time consuming process of all cause I had to read through each of the ideas, read it like, say, cover to cover, and then google about it, seeing some online examples of what the organization did and what it was used for, and after a hell of a time I ended up with about 8 organizations, which for me was decent to start talking to mentors, to hang out on IRC, introduce myself and you know start looking at a specific idea from each ideas page. So basically that meant 8 ideas selected down from 137 organizations times an average of 7-8 ideas per ideas page, i.e. 959-1096 ideas. Nice huh !! I had my spring break during this time so I was a bit merrier so I guess it took me a bit more time than it should have to get it done. But whatever happens ….. I am moving on to the next phase and that's all that mattered now !! So now let the talking begin !! It was finally time for Phase 2.

### Aman Jhunjhunwala(Astropy)

#### GSOC ’15 Post 1 : Community Bonding Period (07-05-2015 to 24-05-2015)

## Introduction

Now that the coding period for this year’s Summer of Code is about to start, I am extremely happy that things have been working pretty well with me and my organization, Astropy, over this community bonding period. My mentors have been extremely helpful throughout the past 3 months that we have been in touch and all of us are excited to have a more than successful Summer of Code. For the summer, I will be working to revamp astropython.org.

The astropython.org site was launched 5 years ago with the goal of becoming the main community portal for Python in Astronomy. In one sense it has been successful because the site is one of the top two generic informational / resource sites about Python in astronomy. But in another way it has fallen short because there is little community involvement.
This site uses Google App Engine and is basically all custom code built around the bloggart engine. The plan is to start over with Django and modern web tools to bring fresh energy and community involvement into this project. The main components of astropython are forums, tutorials, a wiki of useful resources and storage of code snippets. It is currently slow, outdated and difficult to maintain. I outlined a comprehensive plan to revamp and expand the functionality of the website from its roots up, making the application fast, easy to maintain and scalable across all devices. In addition to redesigning and revamping each of its original components, I proposed certain new features that would be of tremendous productive value to the community.

## Community Bonding Period

We have had a few very productive Hangouts throughout the community bonding phase, where, through detailed interactions with the mentors, the proposal details were refined and I acquainted myself with the expectations and essential requirements. The work done during this period includes:

• Setting up the cloud test server.
• Setting up the development environment on my local machine.
• Setting up the basic project structure on GitHub (https://github.com/x-calibre/astropython).
• Designing the initial layout of the website, keeping true to its old design. The initially designed components are the Homepage, the Single Blog/Tutorial, the Blog/Tutorial Roll and the packages section.
• Acquainting myself with the packages that are to be used, mainly django-moderation, askbot, grappelli, django-reversion, etc., by running their demo projects and reading their documentation to see how everything fits in.
• Framing the initial set of database models and schemas for all the sections.
• Integrating the original proposal, relevant portions of the unconference summary and notes from Python in Astronomy into a top-level design plan. These notes have been created by taking input from a large group of astronomers who actively use Python. The notes are available here. These notes will supersede any previous details of the project written or discussed earlier.
• Studying the existing Google App Engine astropython.org code, understanding the existing schemas and data structures, and getting the code running on a local server.
• 2 main components of the final design that I worked on:
  • While I was framing the model fields, I noticed that the django-tinyMCE plugin, which we were supposed to use for WYSIWYG editing, hasn't been upgraded since 2013 and does not support tinyMCE4. I forked the repo and made customizations (maybe buggy, no testing done) to integrate tinyMCE4.
  • A native categories app: while framing models, I realized that categories are extremely important for navigating the web app and that it would be better to have a central categories app.

## Plan for next week

Work to accomplish till 1st June 2015:

• Complete porting of the old Google App Engine code to the local server.
• Create a parser to extract records from the Google Cloud Bucket into a usable format (JSON, YAML, etc.): use a dump to restore data to the local test server, then use the GAE API to pull records into the preferred format for migration.
• Set up the backend user organization structure.
• Set up versioning and moderation for posts.

### AMiT Kumar(Sympy)

#### GSoC : Getting Up For the Coding Period

### The Start of Coding Period!

The Community Bonding Period is close to its end now, and so are my exams. Tomorrow starts the Coding Period, and I have been waiting for it for some time now. Recently I took a quick look at the proposed timeline in my proposal and decided to swap the 2nd week's work with the 1st. This will help me secure a few credits in my college's Minor Project submission (which has a deadline of 30th May).
### Plan for Week 1

This week, I am planning to work on solving linear systems in solveset. (Currently solveset supports univariate solvers only.) The main functions I will be implementing are:

• eq_to_matrix: a method to convert a system of equations to matrix form.
• linsolve: the general linear system solver.

As mentioned in the proposal, solving systems of linear equations is an important feature of solvers in a CAS. Most CASs have a convenient single function to solve linear systems, for example LinearSolve in Mathematica. The linsolve that I will be implementing is inspired by Matlab and Maxima.

##### Features Overview

We have a lot of reusable code in sympy.matrices and sympy.solvers.solvers, which will be quite useful. One of the most important things I would like to have in linsolve is support for several input formats, even though most CASs support only one. This feature would be quite useful for sympy's linsolve.

###### The three most common input formats, I can recall as of now are:

• Augmented Matrix Form
• List Of Equations Form
• Input A & b Matrix Form (from Ax = b)

It would be great to have all three input formats supported in the public API linsolve method.

Looking forward to the Coding Period, that's all for now.

## May 23, 2015

### Daniil Pakhomov(Scikit-image)

#### GSoc introduction post: Patent-free Face Detection for Scikit-image library.

The following blog post moved here.

### Siddhant Shrivastava(ERAS Project)

#### GSoC '15 Community Bonding

Third Post in the GSoC 2015 series. Here I'll take you through the engaging community bonding experience.

## Introduction to Community Bonding

Community Bonding is arguably one of the most important phases of the Google Summer of Code. In the 2015 edition, it took place from April 27 to May 25. This is what the GSoC FAQ has to say about this period: students get to know mentors, read documentation, and get up to speed to begin working on their projects.
## About the community

The Italian Mars Society is a highly motivated group of incredibly smart and friendly scientists and developers who share the vision of working towards manned missions to Mars. I have been interacting with the community since March 2015 and I've never looked back. I was interested in the projects even during the brief period when it was unclear whether IMS would be able to participate or not. I'm grateful to the community members for applying under the Python Software Foundation umbrella and giving students like me a brilliant opportunity to explore real-world Open Source development. From what I've heard, this organization comes up with the coolest projects for GSoC. And I concur: my project seems to blend in all the cool fields required for exploration, including Robotics, Body-tracking, Virtual Reality, Oculus Rift, Real-time 3-D video streaming, Augmented Reality, etc.

## Understanding the Codebase

To this end, there is a steady amount of software/hardware development shared on the Bitbucket platform. While interacting with Franco and Ezio, I discovered that all students are given write access to the ERAS and V-ERAS repositories, which use the Mercurial revision control system. This places tremendous responsibility on us as new developers, which I very much appreciate since it fosters trust and makes us mature community members. Going through the codebase a couple of weeks ago, I found well-documented code, almost all of which follows the PEP8 guidelines and is written in Python 3.

The heart of the V-ERAS project is the Tango Controls server, which is a distributed device server for Supervisory Control and Data Acquisition systems. This is ideal for a complex environment like ERAS, where multiple hardware and software devices like the Oculus VR, Kinect, Linux machines, Husky Rover and Blender Game Engine applications are involved in a distributed setup. The entire networking subsystem of ERAS is well explained in Ezio's thesis.
## Interacting with the Community

My experience with the Italian Mars Society has been memorable and pleasant right from the word go, when I first entered the hallowed IMS channel on IRC (Internet Relay Chat) and introduced myself. I was promptly pointed to the right person for my project of interest. Within a single IRC chat session with Franco, I got a clear idea of what to expect from this GSoC. The IRC channel, though frequented by a small number of people, is always bustling with activity. We've had fruitful discussions about each and every part of the project, from software architecture diagrams in the proposal, to the collaboration between two GSoC projects, and even some fun interactions about Python software development and Mars exploration. I always appreciate the level of responsibility and feedback that the community members bring to their interactions with students. Helping my fellow GSoC aspirants and seeking help from them is always a refreshing experience.

Apart from IRC and email, I got the chance to video-conference with all the project mentors on two occasions: during my GSoC interview and in the kickoff meeting after the GSoC selection. This was the first time I had a teleconference interview and I thank IMS for that. It felt more like a sincere discussion of the things that I had in mind for the GSoC project than a test of my skills. The trust these guys had in me let me confidently speak my mind, which helped me make my points.

The big GSoC kickoff meeting took place on April 29, 2015, when we all gathered on Google Hangouts to discuss various important points for the upcoming Summer of Code, such as:

1. The importance of blogging
2. Hardware/Software requirements
3. Strategic timeline of events
4. Software engineering guidelines
5. Suggestions of joint code review sessions

I helped prepare the meeting minutes for this session, since some members had connection problems joining the Hangout. These are shared in this document.
## Setup and Technologies

I have been exposed to an ample number of new concepts and technologies with this project.

• Terrain Vehicle Rover - Clearpath Robotics' Husky robot, which is ROS-based
• Microsoft Kinect Sensor for obtaining body-tracking information
• Minoru 3-D webcam for stereo video streaming
• Oculus Rift Development Kit 2 for augmented reality applications

To this end, I set up my workstation for the project requirements. My current machine configuration for this GSoC project is as follows:

- Ubuntu 14.04.2 (Trusty Tahr)
- ROS Indigo
- Python 3
- Blender 2.74
- Tango Controls 1.99
- Linux Kernel 3.2
- Mercurial 3.4
- Hardware: 8 GB RAM, Intel Core i7 processor, Nvidia 2GB GPU GT650M

In the last couple of weeks, I have been busy setting up the various ROS packages which are required for body-tracking-based semi-autonomous teleoperation. The list of ROS packages will be added to the project documentation soon.

## Learning Experience so far

I've learnt an unexpectedly great deal about a lot of different things during this project. I had to do a lot of reading to get up to speed with the existing state of V-ERAS. Franco pointed me to the project documentation pages. I learned about Blender and the Blender Game Engine after pulling an all-nighter. FFMPEG followed soon after, when I had to set up an MJPEG streaming server for the BGE client. That was followed by my first experience with PEP8, Mercurial, architecture diagrams and the Tango Controls system.

My GSoC proposal has been an extensive piece of work with 61 revisions and brilliant feedback from my mentors. The proposal can be found here. A more comprehensive description of the project is taken up in this post.

To be continued... about ROS, Software Testing, Mapping, algorithms, etc.

### Ambar Mehrotra(ERAS Project)

#### GSoC 2015: First Biweekly Report

The past two weeks have been hectic, with my exams going on and finishing just 4 days ago.

##### Choosing the suitable library

In the past few days I have been experimenting to choose a suitable graphical library for building the GUI, the primary contenders being PyQt, WxPython and kivy. I have worked with PyQt before, but WxPython and kivy are completely new to me. I spent a large amount of time reading the kivy docs and trying out the examples. The kivy library seems to be quite stable and is currently under active development. The main thing that fascinated me about it was the support for a markup language. Kivy allows you to attach a separate .kv file to the main kivy app, which can be used to define various elements in the application along with attributes like size, position, color, etc. This feature lets the user easily modify and add features in large applications. A simple example of this can be found here.

Similar to kivy, PyQt can also load the ui file generated in QtCreator and use the elements defined in it to make the application dynamic. WxPython too has a couple of editors, but they seem to be mostly spotty according to the various comments and user reviews I read on various forums (here, here and a few others).

I am mostly planning to stick to either PyQt or kivy for designing the interface and will decide within a day or two after talking with my mentor.

##### Installing Tango on a VMware instance

I set up a VMware instance of Ubuntu 14.10 and set up the Tango server using this Tango Setup guide.
I had to format my computer as it had gone haywire, and hence I decided to do a fresh Tango installation on a VMware instance so that I don't lose any work in case of future problems.

##### Software Engineering Practices and Guidelines

I also went through and soaked in the Software Engineering Practices and Guidelines. I have also begun drafting the Software Architecture Document and it will soon be ready for review and discussions.

This project seems to be a great learning opportunity and work experience for me. I am eagerly looking forward to working with all the members of the community in developing this project and taking it forward.

## May 22, 2015

### Ambar Mehrotra(ERAS Project)

#### GSoC 2015 with Italian Mars Society

2015 has been a great year so far and it continues to bring good news. I got selected for the Google Summer of Code program and will be working with the Italian Mars Society on the ERAS project under the Python Software Foundation umbrella.

The European MaRs Analogue Station for Advanced Technologies Integration (ERAS) is a program led by the Italian Mars Society (IMS) whose main goal is to provide an effective test bed for field operation studies in preparation for manned missions to Mars. Prior to its construction, IMS has started the development of an immersive Virtual Reality (VR) simulation of the ERAS Station (V-ERAS).

##### Proposal Abstract

• This project aims at developing a generic top-level monitoring/alarming interface for the ERAS Habitat that will be able to manage all the relevant information.
• Once the skeleton of the top-level GUI is in place, the aim will shift towards designing a popup / sub-GUI for the health monitor, where different data from V-ERAS will be shown according to the user's needs. This will mainly mean showing data on a preferential basis rather than showing all the data at once.
• Another aim is to enable the GUI to properly interact and interface with the Tango server and to leverage the functionality of the existing Tango Alarm Systems / PyAlarm for notifying the user about specific events.
• A major portion of the project will aim at making the GUI highly customizable and easy to modify for additional data channels as and when there is a need to do so.

I will be mentored by Mario Tambos and Yuval Brodsky.

#### Deliverables

• Habitat Monitoring GUI
• Health Monitoring GUI

The final output will be a top-level monitoring interface for the ERAS Habitat that will be integrated with the Tango Alarm Systems and will be able to manage everything via a highly customizable GUI. It will also include a sub-section specifically dedicated to the health monitor, which will take health-related data of the astronauts in EVA and present it in a manner specified by the user.

I am looking forward to a really exciting summer working on this project.

### Vito Gentile(ERAS Project)

#### Walking on Mars with Kinect and Python

Today is “the” day: the coding period of GSoC 2015 is starting now!

But during the last month I also worked on something related to my project, and with this post I am going to tell you what I have done.

For those who are not aware of what I am going to do this summer, my work will be related to a very nice project by the Italian Mars Society (IMS), called ERAS: the idea is to make an analog station that will be used to train astronauts and allow them to acquire enough competences to reach the red planet (and survive to come back to Earth). But before building a real analog station, a preliminary phase is expected to be completed, and it is called Virtual ERAS or V-ERAS.
V-ERAS aims to build a virtual martian environment (including a virtual habitable station) and let astronauts use it for training purposes, but also to help engineers and designers improve the design of the station and other useful tools. The first V-ERAS-14 mission was conducted in Italy in December 2014, and it proved the strong potential of this system.

V-ERAS uses multiple “Microsoft Kinect for Xbox 360” devices to recognize and track user movements. Data taken from the Kinects are then sent to a Tango bus, to be available to any other ERAS software module. Basically, Kinect data are used to animate an avatar with the Blender Game Engine. The latter is also responsible for drawing the whole virtual martian environment, managing interactions among multiple users and allowing them to interface with tools, objects and ATVs, inside or outside the virtual station. In addition to the previous technologies, ERAS also uses a Motivity static omnidirectional treadmill, on which users can walk to move around the martian environment.

My project for GSoC has been accepted by the Python Software Foundation (PSF) and IMS, and its title is “Enhancement of Kinect integration in V-ERAS”. You can find all information about it at this link, where descriptions of all 4 projects related to ERAS and accepted for GSoC 2015 are available.

Anyway, to summarize a bit, my project can be divided into four significant steps:

• port the existing C# tracker to Python, using a not so common library called PyKinect;
• implement a GUI to manage multiple Kinects from a single PC;
• implement a better algorithm to improve users' navigation by estimating user steps on the Motivity treadmill;
• integrate touchless gesture recognition support.

The first step required some practice with PyKinect, which I had never used before.
Despite the recent release of PyKinect2, I will have to use the first version of this library, because it is the only one based on the Microsoft Kinect SDK 1.8 (compliant with Kinect for Xbox 360). You can find some documentation and the source code of PyKinect on GitHub.

I had few issues installing and configuring PyKinect; the biggest problem was finding working samples. I found just one, but it was not working perfectly (although it was a very good starting point), so I decided to share my findings on how to use PyKinect with everyone by creating three simple code snippets, available on my BitBucket account.

In these days, I also had the opportunity to become more familiar with BitBucket and Mercurial. I created my first pull request, in order to fix some flaws in the ERAS documentation, and fixed them with Ezio Melotti (my mentor for this project).

Together with the other people involved in the ERAS project (mentors and students), we had a conference call on 29th April to get to know each other and agree on guidelines and future plans. Everyone can take a look at this very brief report of what we talked about during that meeting.

Now it's time to start coding! I will use PyKinect, but also some other interesting stuff like Pygame and PGU for drawing the GUI. I will keep you updated with other posts like this one! Stay tuned!

### Sahil Shekhawat(PyDy)

#### GSoC Project Implementational details

## Aim

The aim is to define a system as:

### Jaakko Leppäkangas(MNE-Python)

#### Starting up

Hi all.

I was accepted as a GSoC student for the summer of 2015. The task is to improve MNE-Python's interactive visualization capabilities. The work will rely mainly on the matplotlib libraries, but other options should be explored if they are available. I have previous experience working with PyQt and Qwt, so it would be quite tempting to use them for plotting and interactive functionality. The problem with that approach is the extra dependencies that come with them.
As you all probably know, issues with dependencies can become painful.

I feel a little bit guilty for not being able to contribute more than I have thus far, but it has been quite hectic here in Jyväskylä for the last couple of weeks. I work for the Jyväskylä Centre for Interdisciplinary Brain Research (CIBR) and we have just had our brand new MEG platform installed. The training sessions are now over and we have an opening ceremony next Friday. Exciting times...

Anyway, on Monday I'll be starting with the improvement of the epoch visualization tool. Apparently, some of the users are switching from MNE to other tools to visualize the data. My aim is to put a stop to that. I'll report more when I get things done.

-Jaakko

### Abhijeet Kislay(pgmpy)

#### Getting OpenGM to work …

I have been trying hard to get accustomed to the OpenGM library so that I understand what I will need for my implementation of approximate algorithms. I will jot down my findings here: Below is the most important image that will help me to write off things quite […]

### Zubin Mithra(pwntools)

#### SROP demo examples

This week I've been working on writing 2 demo examples that show how you can use SROP with binjitsu. I've also done a bit of work on the ARM end. I've been testing on an RPi, and so far I've been unsuccessful at loading up R0 when the sigreturn call happens (so that's what I'll be working on next week).

The examples that demonstrate SROP usage can be seen here: https://github.com/binjitsu/examples/pull/4/files

The first example has a regular binary that is statically linked against libc and has a very convenient PoC-style information leak. We use SROP to execve a shell.

The second example is a binary that has a wrapper around the open system call but does not have wrappers for any other system calls. We use this example to demonstrate SROP-ROP integration. First, we make a call to "open" to open up the flag file.
Second, we make a call to "sendfile" to send the file contents over. There is no sendfile wrapper present in the binary, so the library automatically switches over to SROP and makes the corresponding system call.

A description of the box on which the tests were performed can be seen at the top of the patch in the above link.

### Brett Morris(Astropy)

#### Post Zero: Syntax Highlighting in Blogger

## Introductions

The 2015 Google Summer of Code is officially underway! I've met my mentors/teammates through a few very productive Google Hangouts with participating astronomers from Germany to Hawaii, and it's almost time to get to work.

Before the blog posts start flowing about astronomy, astropy and observations, I'm going to make a meta blog post about making blog posts. I trawled the internet for a convenient way to blog via IPython notebooks, my preferred medium for coding, documentation and sharing, and found it to be extremely difficult. Some solutions exist, but I couldn't get any of them working satisfactorily in a reasonable amount of time, so I'm sticking with simple Blogger posts in the hope that I can spend the time that I would have spent wrestling with Nikola and Pelican on writing useful blog posts instead.

In that spirit, I want to help propagate some useful instructions I used to set up my Blogger blog for writing posts about coding in Python. All of the credit for these tips goes to Mahesh Meniya via stackoverflow.

## Setting up SyntaxHighlighter

If you need syntax highlighting in your Blogger blog, start on your Blogger homepage and click the drop-down menu next to the "Go To Post List" button to reach your "Template" page. Click the "Edit HTML" button to get into the guts of your Blogger posts.

Now on your screen you should see a whole lot of syntax-highlighted HTML code, which we're going to edit. Click in that code window and press Ctrl+F/Cmd+F to search for `</b:skin>`. You'll see that the code inside the b:skin tag is folded.
Click the black arrow next to the b:skin tags to expand and see the code inside. Just before the `</b:skin>` tag, paste all of the text on this webpage.

Next, use Find again to search for the `</head>` tag, above which you should paste the following:

For our final insertion, find the `</body>` tag, and place above it:

    <script language='javascript'>dp.SyntaxHighlighter.BloggerMode();dp.SyntaxHighlighter.HighlightAll('code');</script>

and you're good to go! Now create a post to test it out. In the new post, switch from Compose mode to HTML mode, and enter some code that you'd like to post, like this:

    <pre name="code" class="python">
    print("Hello world!")
    </pre>

and it will render like this:

    print("Hello world!")

Great job! Now tell us about that code of yours.

### Deepak Garg(pgmpy)

#### GSoC 2015 with pgmpy

This year I am working with pgmpy, which is a Python library for Probabilistic Graphical Models (PGM), and my project is to add state name support to pgmpy.

Graphical Models are a fairly new technique in machine learning which allow us to compactly represent a joint distribution over some set of random variables and to efficiently compute marginals and conditional marginals over these variables. The random variables have states which they can attain. For example, the random variable for the result of a coin toss can attain two states, heads or tails. Similarly, when working with Graphical Models, each of the random variables has specified states that it can be in. Let's take the famous student example of a Bayesian Network:

[Figure: The student network]

In the above figure you can see a set of random variables connected to each other using directed arrows. Each variable has an associated table known as a Conditional Probability Table, or CPT. Here we have used numbers to represent the states of the variables, like d0, d1, etc. So for the variable Difficulty we can have two states, easy and hard, which have been represented by 0 and 1.
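To make that bookkeeping concrete, here is a minimal sketch in plain Python of a name-to-number mapping for the Difficulty variable. It does not use pgmpy's API: the `STATE_NAMES` dictionary, the helper functions and the probability values are all hypothetical and chosen only for illustration.

```python
# Hypothetical mapping from a variable to its ordered state names.
# The position of a name in the list is the number used internally.
STATE_NAMES = {
    "Difficulty": ["easy", "hard"],
}


def state_to_index(variable, state):
    """Return the integer that stands for a named state."""
    return STATE_NAMES[variable].index(state)


def index_to_state(variable, index):
    """Return the human-readable name for a state number."""
    return STATE_NAMES[variable][index]


# Made-up prior for Difficulty, indexed by state number (d0, d1).
difficulty_prior = [0.6, 0.4]


def probability(variable, state, table):
    """Look up a probability by state name instead of state number."""
    return table[state_to_index(variable, state)]


print(state_to_index("Difficulty", "hard"))
print(index_to_state("Difficulty", 0))
print(probability("Difficulty", "easy", difficulty_prior))
```

Without helpers like these, the caller has to remember that 0 means easy and 1 means hard every time a table is read or written.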
pgmpy also represents the state names using numbers internally. But for a user it is much better to provide input or get output of a state as the name rather than the number. pgmpy lacks this functionality right now, and therefore users have to manually keep track of which number represents which state. This summer I will be adding this functionality to pgmpy so that the user can work with either state names or state numbers. I am still having discussions with my mentors about the best ways to do this and will write about the exact implementation details in my next blog post.

### Mark Wronkiewicz(MNE-Python)

#### Coding-Day (C-Day) minus 5

My motivation for joining Google Summer of Code (GSoC) comes from the idea that we can use our brains to write code that will allow us to better understand neural coding within the brain. Towards this goal, my project is about developing code for recording and analyzing signals from the active human brain.

There exists a wide range of ways to peer into the brain, and one method of parsing the different recording strategies is by judging their “invasiveness” (essentially how much trauma you cause in order to get a particular brain signal). In humans, we typically record brain activity completely non-invasively using techniques called magnetoencephalography (MEG; roughly translated as magnetic brain graph) or its electric counterpart, electroencephalography (EEG; roughly translated as electric brain graph). These passive recording methods are completely safe as they don’t require surgery, but the measurements tend to be more contaminated with noise and have relatively worse resolution in space when compared to invasive techniques. Removing noise from brain recordings is, therefore, one of the biggest challenges faced by MEG (and EEG) researchers.
In magnetoencephalography, anything from the heart beating, to a nearby moving elevator or train, or to the Earth’s magnetic field can cause interference that’s much stronger than the brain’s incredibly feeble magnetic signals. In terms of signal-to-noise ratio, the challenge is akin to hearing a pin drop next to a jet engine; even so, MEG research still provides meaningful results with the right strategy.

Artifact rejection, aimed at removing noise, is a major research endeavor within neuroscience and engineering. Artifact rejection can be as basic as using a threshold to throw out experimental trials when a signal was too strong to have come from the brain. On the other hand, it can be as convoluted as determining vector subspaces occupied mainly by spurious noise sources and projecting them out with matrix multiplication. Past researchers have made quite a bit of progress in artifact rejection, but it’s important to note that many of the methods were hard to generalize and only appropriate for a specific type of noise and dataset, so you almost need to tape, tack, and glue a medley of filters together to address all sources of contamination.

In 2005, a refreshing dose of physics was injected into the MEG and artifact rejection literature. Using only Maxwell’s equations and the geometry of the magnetic sensors, a new method called “Signal Space Separation” (SSS) allowed signals from inside an imaginary sphere enveloping the MEG sensors to be separated from the interference originating outside this sensor sphere. If you find this terribly esoteric, here’s an analogy: Imagine this “MEG sensor sphere” is one of those upside-down head steamer bowls you find in a salon – only this steamer can record your brain activity 1000 times per second down to femto-Tesla resolution.
You need to know if the magnetic measurements being made by your (soon-to-be-patented) steamer bowl are coming from inside the bowl (i.e., your head) or from the electrical engineering class next door learning how to make a frog levitate with a dangerously overpowered MRI device. Just by knowing the exact shape of this steamer bowl, and one of the most important sets of physics equations from the 19th century, this becomes possible with SSS.

This method is not specific to a single noise type (e.g., the heart, implanted metal, or other environmental sources of noise). Instead, it uses spherical harmonics to separate signals by their spatial complexity (as characterized by the spherical harmonic degree and order). In normal speak, there are certain patterns of recorded activity that are too complex to be produced by something inside your salon steamer (i.e., your brain) and must come from some external noise source; by splitting up the signal into a new way of representing the data (i.e., projecting it onto a different “basis”), you can separate signal from noise and obtain nice clean MEG recordings.

My task is to take this beautiful finding and apply it to the MNE-Python library in two ways. First, I will implement the Maxwell filtering method in our open-source library, making the algorithm more transparent and easily applicable for those without the proprietary methods. This permits all the fancy noise rejection I’ve been describing up to this point. Second, I’m planning to include the code to allow all processing and source imaging calculations to be accomplished in SSS space. The reasoning behind this approach is vague unless you have digested the (mathematically harrowing) literature on this subject. Fortunately, the outcomes are not; storing and analyzing data in SSS space should theoretically use only 1/3 the memory storage of the current sensor-based method.
It will also reduce noise from subjects moving their heads in the scanner (making source localization more accurate, especially in infant MEG studies) and permit specific investigations of noise. Oh – and filtering signals post-hoc with software means you don’t need to drop (quite as many) hundreds of thousands of dollars on a triple-shielded high mu MEG shielding room every time you want to set up an MEG scanner.

### Yue Liu(pwntools)

#### GSOC2015 Community Bonding Week 04

week sync 04

## Last week:

Automatic building of ROP chains for x86/x64/arm ELF.

Using topological sorting to solve two issues:

Others:

• Do a survey on Amoco project https://github.com/bdcht/amoco.
• Topological sorting, see the `__build_top_sort()` function.
• Rewrite X64/ARM ROP chain building methods, see `__build_x64()`, `__build_arm()`.

## Next week:

• Need to extract gadget finding into a Python class.
• Try using Amoco instead of the BARF project.
• Merge to new rop module.

## May 21, 2015

### Aniruddh Kanojia(Qtile)

#### Yay GSOC !!

Hi, I am Aniruddh and I am doing GSOC under the sub-org Qtile. I would firstly like to thank my mentors for having faith in me and choosing me for this programme. My project is improving serialization of Qtile. During the community bonding period I began talks with my mentor as to which approach will be the most suitable. Apart from that, nothing else to report as of now.
Cheers, Aniruddh Kanojia

### Stefan Richthofer(Jython)

#### Getting started...

So I am at Gsoc now and really happy that everything worked out so well. Huge thanks to Jim Baker for making this possible! Now let me introduce you to my project. I am the creator of the JyNI-project, see www.jyni.org for more details. Gsoc sponsors the development of the next important milestone: GC-support and hopefully also support for the ctypes-extension. But what am I talking about here... just take a look at my Gsoc-abstract:

JyNI is a compatibility layer with the goal to enable Jython to use native CPython extensions like NumPy or SciPy. It already supports a fair part of Python's C-API and is for instance capable of running basic TKinter code (currently linux-only). However, a main show-stopper for a production-release is its lack of garbage collection. The gap between Jython- (i.e. Java-)gc and CPython-gc goes far beyond the difference between reference-counting- and mark-and-sweep-based approaches. Even more important is the philosophy regarding gc in native interfaces. While CPython exposes its gc-mechanism in native API, allowing native extensions to leverage it by following some instructions (i.e. perform reference counting and gc-registration), Java's native interface (JNI) leaves memory management completely to the native code.

As a preparation for this proposal I worked out a concept how native gc can be emulated for JyNI in a way that is (almost 100%) fully compatible with CPython's native gc. That means a native extension written for CPython would run without modification on JyNI, having gc-emulation behave consistently with Jython gc. This includes weak referencing, finalization and object resurrection (aspects that are famously known to make clean gc support so hard). JyNI-support for weak references includes providing the PyWeakReference builtin-type, which is currently the main show-stopper to support the original native ctypes-extension.
Thus, as an optional/secondary goal I propose to complete JyNI-support of the ctypes-extension. This is proposed softly, under the assumption that no further, still unrecognized hard show-stoppers for that extension come up.

In a few days the bonding period will be over and things will get serious. Since I did not have to learn much about Jython- and JyNI-internals any more, I could use this period to tie up some loose ends in JyNI which were actually unrelated to the Gsoc project, but having this done feels much better now. I also fixed two related Jython-issues. Finally I reasoned about the Gsoc-project and think it would be a crucial debugging-feature if JyNI could dump a memory-allocation history of all its native references at any time (with Java GC-runs in time-line please!). To achieve this, I am currently writing the ReferenceMonitor-class, which will also expose this information in a way that allows writing gc-related unittests. (As this monitoring functionality was not considered in the timeline, it would be good to get it done before coding officially starts.)

Recently Jim Baker - mentor of this project - managed to get two JRuby developers into an email-discussion with us and we gathered some interesting details on how JRuby handles/plans to handle C-extension API. Thanks for this interesting thread!

## May 20, 2015

### Manuel Paz Arribas(Astropy)

#### GSoC bonding period

The bonding period of the Google Summer of Code is ongoing, and I have been in touch with the mentors of my project about once a week. During this time (and also during the application process) I have been getting familiar with the tools needed for the project: installing an appropriate Conda environment, learning how to use git and Github, installing the development versions of Astropy and Gammapy, and making some pull requests to merge some code, add some documentation, and correct some (minor) bugs.
In addition, together with my mentors, I have started working on the API (application programming interface) for the toolbox for background modeling for Gammapy. The approach is to start writing down some typical use cases and from there write high-level pseudo-code that should work with minimal modifications once the necessary tools are implemented. One use case is for instance: given a list of off run event lists from an IACT system (such as H.E.S.S. or CTA), develop the necessary tools to divide the list of runs into specific bins according to the observation properties (for instance altitude and azimuth angles of the IACT system) and build background templates for each bin.

#### Project description

Gamma-ray astronomy has experienced fast development in the past 2 decades with both ground-based imaging atmospheric Cherenkov telescope (IACT) experiments like H.E.S.S., MAGIC and VERITAS and satellites like Fermi. In addition, the next generation IACT experiment, CTA, is in its prototype phase. The instruments developed for detecting gamma-rays accumulate many background events. The majority is rejected either by using an intelligent triggering system for the detectors, or in early stages of the data analysis. Unfortunately, there is still a dominant fraction of events passing the gamma-ray selection cuts, called the gamma-like background. In order to extract the gamma-ray signal, clever algorithms for modeling the gamma-like background are essential.

During the GSoC 2015 I will be working for the Gammapy project. Gammapy is an open source (BSD licensed) gamma-ray astronomy Python package. It is an in-development affiliated package of Astropy, a community effort to develop a single core package for Astronomy in Python. Gammapy builds on the core scientific Python stack to provide tools to simulate and analyze the gamma-ray sky for telescopes such as Fermi, H.E.S.S. and CTA.
Specifically I will implement the most successful background modeling methods largely in use by the gamma-ray community in the Astropy/Gammapy framework. The background methods can be classified in two categories, according to the observation strategy:

1. Background models from OFF observations, where the background is modeled from observations far away from any known sources. IACT experiments require dedicated OFF-source observations for modeling the background.
2. Background models from ON observations, where the background is modeled using observations within or close to the region of interest (a.k.a. ON region).

For more details about background modeling you can read my proposal or the paper by Berge 2007.

In a first step I will implement tools to create background model templates from observations with no or only a few gamma-ray sources in the field of view. In a second step, I will develop algorithms to estimate the background in observations containing gamma-ray sources to detect them and measure their spatial shape and energy spectrum, in some cases using the model templates from the first step.

### Siddhant Shrivastava(ERAS Project)

#### GSoC 2015 with the Italian Mars Society

I got accepted into the eleventh edition of the Google Summer of Code program (GSoC 2015) with the Python Software Foundation umbrella organization. The list of selected students was announced on 28th April, 2015. More specifically, I'll be working with the Italian Mars Society under the ERAS (European MaRs Analogue Station) project. Quoting from the source -

The European MaRs Analogue Station for Advanced Technologies Integration (ERAS) is a program spearheaded by the Italian Mars Society (IMS) which main goal is to provide an effective test bed for field operation studies in preparation for manned missions to Mars.

The focus of this GSoC project is Virtual Reality based Telerobotics for V-ERAS.
Virtual European Mars Analog Station (V-ERAS) is based on immersive real-time environment simulations running on top of the Blender Game Engine (BGE). This project has three distinct components -

1. A ROS-Kinect interface for the Teleoperative control of the Clearpath Husky Robot rover's motion via human body-tracking.
2. Streaming the 3-D stereo camera video feed from the rover to BGE over the network.
3. Processing the video feed into an Augmented Reality experience through a head-mounted Virtual Reality device.

The goal of this V-ERAS project is thus to develop a software and hardware system that enhances the capabilities of the crew members preparing for Mars missions.

I feel elated to be a part of the Italian Mars Society and be able to contribute towards manned space exploration, which is one of the vital aims of the next two decades. GSoC marks my first foray into the world of collaborative Open Source software development. I shall be mentored by two really cool people - Yuval Brodsky and Fabio Nigi, with whom I share my interests in space exploration, robotics, networks, and free software. In addition, I'll be constantly interacting with the IMS-ERAS community - Franco Carbognani, Ezio Melotti, Mario Tambos, Ambar Mehrotra, Shridhar Mishra, Vito Gentile.

Thank you Google for this unique birthday gift :) Looking forward to a great and challenging summer of Code! I'll share the details of the project in the next post in this series.

## May 18, 2015

## May 17, 2015

### Rupak Kumar Das(SunPy)

#### A sunny project

It’s been an exciting week reading code and documentation. With one week to go, I am trying my best to be ready for the coding period. Anyway, let me explain what my project is all about.
My project goals have changed – the original dealt with bringing support for IRIS (a satellite for observing the sun) and other 3D+ data formats (used to store the solar data) to SunPy by integrating a module which was previously developed by a SOCIS (a program like GSOC) student and creating a plugin for Ginga (a toolkit designed for building viewers for scientific image data in Python), which would allow basic manipulation and analysis of solar data. But my org decided to bring back the student to rewrite the module from scratch, so my mentors decided to change the scope of the project.

So I will be working on plugins for Ginga – basically adding features to it that would help the user to easily analyse data. A few goals –

• Cuts plugin upgrade
• Creating a new slit plugin
• Better intensity scaling support
• WCS support

Qt will be used for the plugins, so finally I am going to learn to use a GUI toolkit (GUI development always looked tough). No excuses this time!

### Siddharth Bhat(VisPy)

#### Blog entry #(-1): A bit of information

Setting up my development environment and getting into work with VisPy

### Yask Srivastava(MoinMoin)

#### Fun in the midst of college exams

Fun is working on your own projects! I was working for a while on my pet Facebook application TodJokes. It’s very close to the final development stage now. It’s an awesome web application built with the Django framework which works on top of lots of Facebook APIs and scraping scripts to make Tribe of daradnaak jokes content more organized and interactive.

Its approval from Facebook is pending, so it is not available for users at the moment. But here is the video demo of the app I recorded from my local development machine:

It has tons of amazing features and I am sure this application will blow your mind. :)

I also started working on themes for Moin Moin. We will be using the CSS pre-processor Less.
I was personally in favor of using it since it makes the development flow very organized, and future debugging and changes won’t be hectic since Bootstrap has a very organized collection of Less files.

Instead of using grunt to compile my Less files I am using a paid app called CodeKit. It’s easy to use and I love the auto reload feature as it lets me see visual changes while editing the theme files.

I love the material design color scheme, so this is my implementation of the basic theme for Moin Moin. As suggested by mentors, the current color scheme doesn’t fully match the logo design colors and there should be better contrast for text for readability, so this requires more tweaks.

I also learnt a handy trick in Sublime Text for debugging unclosed div tags. Press command + shift + A to highlight everything inside a particular div.

CS exam tomorrow! Wish me luck :)

### Abhijeet Kislay(pgmpy)

#### GSoC 2015

It had been quite an awesome thing to be selected in Google Summer of Code for the second time. Along with it, I also got an offer to pursue masters in Johns Hopkins University in Computer Science in upcoming Fall. The god has been great to me! I thank almighty for all this . Okay, […]

### Aron Barreira Bordin(Kivy)

#### Kivy Designer Roadmap

Hi!

In the next week, I'm going to start the Kivy Designer development, I'm very excited about it :) I'd like to list my goals with this project, and how you can help me to create a good version.

## My proposal

I'll be working on a more stable and useful version of Kivy Designer. It will be fully integrated with Hanga, Buildozer and Git to help the development process. With these integrations, I hope to work on an IDE that can make mobile development with Python easy.

The Kivy Designer interface is not completely user friendly; some tasks, such as editing a .kv file or a buildozer spec file, can sometimes be boring. I hope to improve it.

Currently Kivy Designer has neither tutorials nor documentation.
So I'll create some guides and examples to show new developers how to get started with this IDE.

I have a big list of open issues to work on in the next months; you can read this list here. And to release a really good version, I'll need your help :)

## Can you help me?

Kivy Designer is not yet a powerful IDE, so I'll need your help with it :) Take a look at how you can help me:

• Have you found a bug?
  • Please, I know that it has a lot of bugs now, but the best way to help me with it is reporting them.
• Is there something missing on Kivy Designer?
  • I really need more ideas. If you'd like to see any kind of new feature on Kivy Designer, please, just let me know.
• Should it be different?
  • If you don't like how something works on Kivy Designer or you just feel uncomfortable about something, please, just let me know.
• Do you have an idea??
  • I love ideas. Share it with me :)
• Contributions are always welcome :)
  • Kivy projects have a big and helpful community. If you are interested, contributions are always welcome :)

If you know how to help me with the topics above, please contact me at aron.bordin@gmail.com, aronbordin on IRC, or just open an issue in my fork.

Useful links:

I'm aronbordin on IRC.

### The main idea of this post is to get ideas from you. Please, if there is something that you can do to help me with this project, don't hesitate to contact me.

Thank you for reading,

Aron Bordin.

## May 16, 2015

### Chienli Ma(Theano)

#### First Gift From Google

Several days ago, I received a Fedex package. When I opened it … Bomb!

#### Payoneer pre-paid MasterCard with Google icon on it!

Good looking card. I guess someone will regret using an existing bank account. Also this is my first own credit card. Finally I can make payments without a phone call from my dad :) Google is ready to pay me now.

“Are you OK?”( Leijun’s ‘Chinglish’ )

Kinda occupied in the last week. Luckily, I finished that business. And now I am back again.
The first feature I will add to theano is a function that allows the user to make a copy of a function and allows functions to run multi-threaded. There exist two similar features: pickle() and copy() of a function. To figure out how I need to work, I need to take 3 steps as preparation:

• Look into the code and see how a function is generated (done)
• Look into the code and understand Function, FunctionMaker, Linker and FunctionGraph (next)
• Have a look at pickle() and copy(), use them, and figure out how they work and the differences. Then think about my own idea.

Ok, now I need to go and take step two.

## May 15, 2015

### Lucas van Dijk(VisPy)

#### GSoC 2015: Visualizing networks with Vispy

So, let's revive this blog a little bit! I'm happy to announce that I've been accepted to the Google Summer of Code 2015! This time as a student under the umbrella organisation of my favourite programming language, the Python Software Foundation! I'll be working on Vispy, a relatively young scientific visualization library, which uses the GPU intensively to achieve high performance even when visualizing large datasets, or when you require real-time interactivity. Read on to see what I want to achieve this summer!

### Yue Liu(pwntools)

#### GSOC2015 Community Bonding Week 03

week sync 03

## Last week:

• Support x86/x64/arm32 now.
• Tests for x86/x64/arm:
• Done an ARM ROP example.
• Functions backward compatibility.
• Fix lots of bugs.
• Support specifying a shared library's base address.
• Optimize ELF32 gadget verification.

## Next week:

• Merge eQu1NoX's rop code.
• Go on optimizing the performance of this module.
• Go on to do some ARM ROP examples.

## May 14, 2015

### Pratyaksh Sharma(pgmpy)

#### It's time.

Maybe a bit late to congratulate me, but I've been selected for Google Summer of Code 2015!

### What's my project?

My accepted proposal is to implement Markov-Chain Monte-Carlo (MCMC) algorithms for approximate inference in probabilistic graphical models.
I'll be working with a fairly young (and proliferating) organisation called pgmpy. It comes under the umbrella of the Python Software Foundation (PSF), and this means I'm going to code in python (yay!).

### What's pgmpy?

Graphical models are incredibly powerful constructs and have vast applications. Being this important means that a large number of people use them in their code. The lack of a robust and fully featured library led to everybody maintaining their own implementations. pgmpy is an attempt (a good one!) to provide such a library for graphical models for python users.

The code is maintained at their GitHub. Stalking their commit logs, it seems that the initial commit dates back to September 20, 2013. Hints that the project is fairly new!

### How's it going?

I got to know of my selection towards the end of last month, but I was quite caught up with my finals at that time. Finals are over now, and I'm a week into my summer vacation.

As per the program's timeline, it's the "Community bonding period" till May 25. It's just a period during which I should discuss my project with the assigned mentors, get familiar with the code base and do a rough sketch of my implementation.

We (the mentors and accepted students) had an IRC chat the day before and discussed the plan ahead. I'm reading up about algorithms that I can implement and will shortly pen down the plan for myself.

Don't think I can wait till May 25 to start coding.

## May 13, 2015

### Palash Ahuja(pgmpy)

#### Dynamic Bayesian Networks in pgmpy

I am really excited about my selection in GSoC'15. I hope that I will live up to the expectations of my mentors and be able to complete my project in the allotted duration.

My project is about dynamic Bayesian networks, which are a time-variant extension of static Bayesian networks and have a wide range of applications in protein sequencing and voice recognition systems.

So what is a Bayesian network?
A Bayesian network is a directed acyclic graph (DAG) that is an efficient and compact representation for a set of conditional independence assumptions about distributions. Bayesian networks are an elegant framework for learning models from data that can be combined with prior expert knowledge.

Here is what a static Bayesian Network looks like. (The conventional student example presented in the book Probabilistic Graphical Models: Principles and Techniques by Daphne Koller and Nir Friedman is as follows.)

The above directed graph represents the random variables as nodes, and the edges represent the direct influence of one variable on another. Here we are trying to monitor the grade that the student is going to get, which is conditionally dependent on the difficulty of the subject. The recommendation letter assigned to the student will in turn be stochastically dependent on the grade assigned by the professor.

Here I have assumed that the grade assigned by the professor is ternary valued and the rest of the variables are binary valued.

• Difficulty: domain $Val(D) = \{d^0(easy), d^1(hard)\}$
• Grade: domain $Val(G) = \{g^1, g^2, g^3\}$
• Letter: domain $Val(L) = \{l^0(strong), l^1(weak)\}$
• Intelligence: domain $Val(I) = \{i^0, i^1\}$
• SAT: domain $Val(S) = \{s^0, s^1\}$

In general each random variable is associated with a Conditional Probability Distribution, also called a CPD, that specifies the distribution over the values of the random variable given its parents. The CPD encodes the distribution of the variable and helps in precisely determining the output of the variable. Here is what a CPD encoded Bayesian network would look like:

One such model $P(I)$ represents the distribution of intelligent versus less intelligent students. Another model $P(D)$ represents the distribution that distinguishes the difficult classes from the less difficult ones.
(Let's call the above Bayesian Network $B_{student}$ for future reference.)

Let's consider a particular case for this example: $P(i^0,d^1,g^2,s^1,l^0)$. By the chain rule, the joint distribution factorizes as

$P(I,D,G,S,L) = P(D)P(I)P(G|I,D)P(S|I)P(L|G)$

so the probability of this event is given by

$P(i^0,d^1,g^2,s^1,l^0) = P(d^1)P(i^0)P(g^2|i^0, d^1)P(s^1|i^0)P(l^0|g^2) = 0.4 \cdot 0.7 \cdot 0.25 \cdot 0.05 \cdot 0.4 = 0.0014$

Thus using the chain rule we can compute the probability of any given state.

Now, let's come to Dynamic Bayesian Networks.

Dynamic Bayesian Networks (DBNs) are static Bayesian networks that are modeled over an arrangement of time-series. In a Dynamic Bayesian Network, each time slice is conditionally dependent on the previous one. Suppose that $B_{student}$ is duplicated for a time series to form a two time slice Bayesian Network (2-TBN). (The variables are shown with a single letter notation for a compact representation. The value in the subscript denotes the time slice that the variable belongs to.) It will look as follows:

Assuming that the output $l$ is the observed output (or the evidence) along the course of time, there are two sets of edges in the above Bayesian Network. The first set of edges are the intra-slice edges that represent the influence between the random variables inside the original static Bayesian Network. Thus, the topology of the initial Bayesian Network is the same as that of $B_{student}$. The second set of edges are the inter-slice edges that represent the conditional influence between random variables in two different slices that are adjacent to each other (here the underlying assumption is that this is a 2-TBN, for lesser complexity; if this were an n-TBN, the inter-slice edges could branch out from the first slice to any slice up to the n-th time slice).
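As a quick numerical check of the chain-rule factorization of $B_{student}$ used earlier in this post, the product can be evaluated in a few lines of Python. This is a minimal sketch: the dictionary keys are just labels chosen for this example, and only the five CPD entries quoted in the post are used, not the full tables.

```python
# CPD entries quoted in the post for the assignment (i0, d1, g2, s1, l0).
# The keys are labels for readability only; the full CPD tables are not needed.
factors = {
    "P(d1)": 0.4,
    "P(i0)": 0.7,
    "P(g2 | i0, d1)": 0.25,
    "P(s1 | i0)": 0.05,
    "P(l0 | g2)": 0.4,
}


def joint_probability(factors):
    """Multiply the local CPD entries, as the chain rule for Bayesian networks prescribes."""
    p = 1.0
    for value in factors.values():
        p *= value
    return p


p = joint_probability(factors)
print(round(p, 6))  # 0.0014
```

Each factor depends only on a variable and its parents, which is why five table lookups suffice instead of a full joint table over all five variables.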
The probabilities in the original distribution now determine the probabilities in the successive time slices. The conditional influence between $D^0$ and $G^0$ will remain the same as that between $D^1$ and $G^1$. Thus the CPDs along the intra-slice edges will remain the same. However, there will be a requirement for CPDs along the inter-slice edges, which will provide more information about the random variables in other states.

Now if we were to determine $P(L_{1})$, it would depend not only on the random variables that are present in the same time slice but also on the variables of the previous slice. In the next blog post, I will explain how to compute the conditional probabilities in a DBN.

Here is another, more complicated example that demonstrates how tedious DBNs can look. This example is from BATmobile: Towards a Bayesian Automated Taxi, suggested by Forbes.

Source: http://bnt.googlecode.com/svn/trunk/docs/usage_dbn.html

### Richard Plangger(PyPy)

#### Trace superword parallelism in PyPy

In this post I would like to present the idea of the vectorization optimization that will be implemented in the PyPy JIT compiler and compare it to the vectorization algorithm of an ahead-of-time compiler.

## The big picture

The benefit of executing vector statements is hardware support. SSE/AVX/NEON are able to execute vector statements nearly as efficiently as doing the computation on only one element in the vector. This speeds up numerical applications. However, if your application only spends a fraction of its time executing vector statements, the gain in speed will be insignificant.

## Let's start

The idea is based on work done in these two sources: Exploiting superword level parallelism with multimedia instruction sets and Compiler optimizations for processors with SIMD instructions (see full references at the end). To understand the following paragraphs the notion of a dependency graph might help.
A dependency graph has a node for each instruction of a basic block or trace. If instruction A at position X depends on instruction B at position Y (Y < X), then B must be executed before A. Using the graph it is possible to verify whether two instructions can be executed at the same time (that is, they are not dependent).

##### Don't despair!

The description below contains a lot of domain-specific terminology. It is not easy to describe the optimization technique without it, and if you are not into programming compilers/VMs you might have a hard time understanding it.

##### Single Instruction Multiple Data (SIMD)

As the name SIMD suggests, such an instruction applies an operation to multiple data elements. SSE and AVX are examples of instruction set extensions that allow e.g. arithmetic to be executed on multiple data elements. The size of a vector register is usually 16-64 bytes, depending on the instruction set available on the x86 CPU.

##### The classical approach

One of the best sources of data elements are (you guessed it) arrays. Loops that iterate over arrays are the target of vectorizing algorithms in compiler backends. Let's consider an example:

    a,b,c = ... # arrays of 32 bit floats
    for i in range(30):
        a[i] = b[i] + c[i]

This is one of the most basic examples to show the power of vectorization. If a vector register could hold up to 30 32-bit floats, the loop could be replaced with a single vector statement.

    a(0:30) = b(0:30) + c(0:30) # Fortran notation

If the hardware is able to execute 30 floating point operations at the same time, the loop finishes 30 times faster. That said, this is the theoretically optimal speedup. In reality things are not that simple, but vector statements can potentially speed up your numerical application.

##### Preserving semantics

How does an optimizer transform such a loop? This is where dependencies between statements of the loop and statements of an iteration come into play.
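
As an aside on the classical loop above, the following Python sketch models why the real speedup is smaller than 30x. It assumes a 16-byte register (the low end of the 16-64 byte range mentioned earlier), so four 32-bit floats fit into one vector operation. The function names and the operation counters are hypothetical and only count how many additions a scalar loop and a chunked "vector" loop would issue. No actual SIMD instructions are involved.

```python
VECTOR_BYTES = 16   # ASSUMPTION: an SSE-sized register
FLOAT_BYTES = 4     # a 32-bit float
LANES = VECTOR_BYTES // FLOAT_BYTES  # elements handled per vector operation


def scalar_add(b, c):
    """One addition per element, as in the original loop."""
    a = [0.0] * len(b)
    ops = 0
    for i in range(len(b)):
        a[i] = b[i] + c[i]
        ops += 1
    return a, ops


def chunked_add(b, c, lanes=LANES):
    """Model each slice of `lanes` elements as a single vector addition."""
    a = [0.0] * len(b)
    ops = 0
    for start in range(0, len(b), lanes):
        stop = min(start + lanes, len(b))
        # In hardware this whole slice would be one vector instruction.
        a[start:stop] = [x + y for x, y in zip(b[start:stop], c[start:stop])]
        ops += 1
    return a, ops


b = [float(i) for i in range(30)]
c = [2.0 * i for i in range(30)]
scalar_result, scalar_ops = scalar_add(b, c)
vector_result, vector_ops = chunked_add(b, c)
print(scalar_ops, "scalar additions versus", vector_ops, "vector additions")
```

With 30 elements and four lanes the last chunk is only half full, which is one of the practical details (remainder handling) a vectorizer has to deal with.
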
Let's adapt the loop from the previous example:

    for i in range(1,29):
        a[i] = b[i-1] + c[i] # S_1
        b[i] = a[i] * 2 # S_2

Let's construct the dependencies on a statement level and across iterations. The solid dependency edge (denoted with delta) is a true dependency: a[i] is written in S_1 and used in the right-hand side expression of S_2. The dashed line is a loop-carried dependency: at iteration i > 1, the value of b[i-1] written in the previous iteration is read. Each step has a previous step that needs to be completed first, making this example impossible to vectorize. (If the loop is unrolled once, it is easier to see that S_2 depends on S_1 across iterations.)

Let's relax the previous example:

    for i in range(1,29):
        a[i] = b[i-1] + c[i] # S_1
        b[i] = c[i] * 2 # S_2

Now S_2 does not depend on S_1, thus it is possible to swap the instructions in the loop body and precompute all values of b[1:29] using vector instructions, yielding the result:

    b(1:29) = c(1:29) * 2 # S_2
    a(1:29) = b(0:28) + c(1:29) # S_1

Vectorization on a loop basis needs a cyclic dependency graph and a way to find cycles. Graph algorithms that operate on cyclic graphs often have worse complexity than those on acyclic graphs or trees.

##### A new approach to vectorize basic blocks

The algorithm chosen and already (partly) implemented in the PyPy JIT backend must be able to operate on a trace. A trace represents a sequence of instructions that were executed quite frequently in the user program. Let's have a look at a trace that could have been generated from the first loop snippet (unrolled once):

    # loop
    for i in range(30):
        a[i] = b[i] + c[i]

    # trace:
    label(a,b,c,i)
    j = i + 1
    guard_true(j <= 30)
    ai = load(a, i) # S_3
    bi = load(b, i) # S_4
    ci = ai + bi # S_5
    store(c, i, ci) # S_6
    k = j + 1
    guard_true(k <= 30)
    aj = load(a, j) # S_9
    bj = load(b, j) # S_10
    cj = aj + bj # S_11
    store(c, j, cj) # S_12
    jump(a,b,c,k)

The trick to find vectorizable statements is very simple: we know that the indices i and j are used to load and store from arrays.
Since j = i + 1, statements S_3 and S_9 access two elements that are adjacent in the array a. We record them as a pair (S_3,S_9). The same is true for (S_4,S_10) and (S_6,S_12). Now the last missing piece of the puzzle is the addition instructions S_5 and S_11. Let's consider the dependency graph (slightly more complicated):

The sky blue lines from the load operations provide a lead to the two addition operations. The algorithm adds new pairs by following the definition-use and use-definition edges in the dependency graph, thus yielding the pair (S_5,S_11).

Having all pairs, the algorithm tries to extend pairs that are adjacent to each other. In the example above there is nothing that can be merged, but if the loop is unrolled once more, there will be pairs that can be merged. If a pair is merged it is called a pack. As an example, if there are two pairs (a,b), (b,c) they are merged into the pack (a,b,c).

Operations that are independent and adjacent in memory have now been merged into packs. The last step is to schedule the instructions using the acyclic dependency graph (see picture above). Scheduling tries to emit grouped operations. For the pair (S_3,S_9) it will schedule all other nodes until both have no preceding dependency and then emit a single v_ai = vec_load(a, i, 2) operation instead of two separate load instructions. The resulting trace looks similar to:

    label(a,b,c,i)
    j = i + 1
    guard_true(j <= 30)
    k = j + 1
    guard_true(k <= 30)
    v_ai = vec_load(a, i, 2)
    v_bi = vec_load(b, i, 2)
    v_ci = vec_add(v_ai,v_bi,2)
    vec_store(c, i, v_ci)
    jump(a,b,c,k)

If we are able to get rid of the redundant index calculation and the weaker guard (j <= 30), then the algorithm has managed to let the loop finish twice as fast. This will be the subject of the next post.

##### Implementation progress

Things are partly working already and some problems have already been solved.
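
As a rough illustration of the pair-to-pack step described above, here is a small Python sketch. It is not the code from the vecopt branch: the function name `merge_pairs` is hypothetical, and it assumes each instruction appears at most once as the left and at most once as the right element of a pair, with no cyclic chains.

```python
def merge_pairs(pairs):
    """Chain pairs such as (a, b) and (b, c) into the pack (a, b, c)."""
    successor = dict(pairs)                  # left element -> right element
    rights = {right for _, right in pairs}   # elements that continue a chain
    packs = []
    for left, _ in pairs:
        if left in rights:
            continue                         # not the first element of a pack
        pack = [left]
        while pack[-1] in successor:
            pack.append(successor[pack[-1]])
        packs.append(tuple(pack))
    return packs


# Pairs found in the trace that was unrolled once: nothing can be merged.
trace_pairs = [("S_3", "S_9"), ("S_4", "S_10"), ("S_5", "S_11"), ("S_6", "S_12")]
print(merge_pairs(trace_pairs))

# The example from the text: (a,b) and (b,c) become the pack (a,b,c).
print(merge_pairs([("a", "b"), ("b", "c")]))
```

A dictionary lookup per element keeps the merge linear in the number of pairs, which matters because the optimizer runs while the program is executing.
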
The PyPy backend in my vecopt branch can build dependency graphs of traces, is able to unroll them and apply the algorithm on the unrolled trace. So it is able to group instructions and emit vector operations. In addition I adapted the x86 backend to emit SSE2 vector instructions and tested it both in the test suite and by building a sample interpreter, which suddenly was able to run a loop of 32-bit integer arithmetic 4x faster (wow!). Still, there is a lot to do this summer! I started to test more complicated cases of NumPy traces and am currently working on constant and variable expansion.

##### Summary

In this post I outlined how an ahead-of-time compiler vectorizes loops. This is not feasible on a trace because traces do not carry explicit loop information in the loop header. In addition the dependency graph is cyclic and cycle checking is needed. It is also hard to apply loop distribution and loop fission on a trace (it is not easy to reconstruct resume state from a guard in a custom tailored trace).

Basic blocks are also suitable for vectorizing statements. Even if the loop is not unrolled, if there is parallelism on the statement level, the algorithm will find it. In fact only a few tricks are needed to reschedule traces in a vectorized form: unroll the loop, find pairs, extend them to packs and reschedule them.

##### References

Larsen, Samuel, and Saman Amarasinghe. "Exploiting superword level parallelism with multimedia instruction sets." Vol. 35. No. 5. ACM, 2000.

Pryanishnikov, Ivan, Andreas Krall, and Nigel Horspool. "Compiler optimizations for processors with SIMD instructions." Software: Practice and Experience 37.1 (2007): 93-113.

### Siddharth Bhat(VisPy)

#### Welcome to GsoC!

Initial GsoC proposal to VisPy

### Zubin Mithra(pwntools)

#### Upto May 13th - Integrating SROP with ROP

I've been working on the integration of SROP into ROP and it's done! You can view the pull request here.
Right now, if you use ROP to call a function that is unresolvable but has a corresponding syscall, the ROP module will automatically switch to SROP to invoke that syscall with the correct arguments. If rop.base is specified, you can even continue your ROP/SROP chain right where you left off. The doctest here is a nice example of how this feature works out, so I'm just going to copy it here for your convenience.

    >>> write('/tmp/rop_elf_x86', make_elf(asm('int 0x80; ret; add esp, 0x10; ret; pop eax; ret')))
    >>> e = ELF('/tmp/rop_elf_x86')
    >>> e.symbols['funcname'] = e.address + 0x1234
    >>> r = ROP(e)
    >>> r.funcname(1, 2)
    >>> r.funcname(3)
    >>> r.execve(4, 5, 6)
    >>> print r.dump()
    0x0000: 0x8049288 (funcname)
    0x0004: 0x8048057 (add esp, 0x10; ret)
    0x0008: 0x1
    0x000c: 0x2
    0x0010: ''
    0x0014: ''
    0x0018: 0x8049288 (funcname)
    0x001c: 0x804805b (pop eax; ret)
    0x0020: 0x3
    0x0024: 0x804805b (pop eax; ret)
    0x0028: 0x77
    0x002c: 0x8048054 (int 0x80)
    0x0030: 0x0 (gs)
    0x0034: 0x0 (fs)
    0x0038: 0x0 (es)
    0x003c: 0x0 (ds)
    0x0040: 0x0 (edi)
    0x0044: 0x0 (esi)
    0x0048: 0x0 (ebp)
    0x004c: 0x0 (esp)
    0x0050: 0x4 (ebx)
    0x0054: 0x6 (edx)
    0x0058: 0x5 (ecx)
    0x005c: 0xb (eax)
    0x0060: 0x0 (trapno)
    0x0064: 0x0 (err)
    0x0068: 0x8048054 (eip)
    0x006c: 0x73 (cs)
    0x0070: 0x0 (eflags)
    0x0074: 0x0 (esp_at_signal)
    0x0078: 0x7b (ss)
    0x007c: 0x0 (fpstate)
    >>> r = ROP(e, 0x8048000)
    >>> r.funcname(1, 2)
    >>> r.funcname(3)
    >>> r.execve(4, 5, 6)
    >>> print r.dump()
    0x8048000: 0x8049288 (funcname)
    0x8048004: 0x8048057 (add esp, 0x10; ret)
    0x8048008: 0x1
    0x804800c: 0x2
    0x8048010: ''
    0x8048014: ''
    0x8048018: 0x8049288 (funcname)
    0x804801c: 0x804805b (pop eax; ret)
    0x8048020: 0x3
    0x8048024: 0x804805b (pop eax; ret)
    0x8048028: 0x77
    0x804802c: 0x8048054 (int 0x80)
    0x8048030: 0x0 (gs)
    0x8048034: 0x0 (fs)
    0x8048038: 0x0 (es)
    0x804803c: 0x0 (ds)
    0x8048040: 0x0 (edi)
    0x8048044: 0x0 (esi)
    0x8048048: 0x0 (ebp)
    0x804804c: 0x8048080 (esp)
    0x8048050: 0x4 (ebx)
    0x8048054: 0x6 (edx)
    0x8048058: 0x5 (ecx)
    0x804805c: 0xb (eax)
    0x8048060: 0x0 (trapno)
    0x8048064: 0x0 (err)
    0x8048068: 0x8048054 (eip)
    0x804806c: 0x73 (cs)
    0x8048070: 0x0 (eflags)
    0x8048074: 0x0 (esp_at_signal)
    0x8048078: 0x7b (ss)
    0x804807c: 0x0 (fpstate)

## May 12, 2015

### Chau Dang Nguyen(Core Python)

#### About me

Greetings, everybody. My name is Chau (pronounced /chow/). You might notice that my nickname is kinggreedy, with "greedy" from "Greedy Algorithm". Why "king" then? Well, I love the greedy algorithm so much that I will make sure no one will ever be greedier than me. That's why it's "king". (Hint: I did choose my GSoC project based on the Greedy Algorithm :P)

##### About my project

This summer, I'm working for Core Python. My project is "Create a RESTful API for Roundup". Roundup is an issue tracker for Python. Currently there is no official web API to get data from Roundup. This project aims to implement a RESTful API for Roundup, giving developers the opportunity to create new services and features that can easily keep track of the data for users. Additionally, I have to develop some basic tools that use the API, for example an improved stats page or dashboard. And my ultimate aim is to contribute it upstream.

### Vivek Jain(pgmpy)

#### GSoC Selection

Got selected for GSoC'15. Feeling awesome. It will be an awesome and challenging summer and a great learning experience. Thanks to all the members for selecting me and placing their confidence in me. I've been busy with exams and travelling back home, so I couldn't post about it earlier. :)

My project for GSoC'15 is parsing from and writing to standard PGM file formats. pgmpy is a Python library for the creation, manipulation and implementation of probabilistic graphical models. There are various standard file formats for representing PGM data. PGM data basically consists of a graph, a table corresponding to each node and a few other attributes of the graph.
pgmpy needs functionality to read networks from and write networks to these standard file formats. Currently pgmpy supports four file formats: ProbModelXML, PomDPX, XMLBIF and XMLBeliefNetwork. The project aims to improve the existing implementations of these file formats and to implement the UAI file format during the course of GSoC. This way models can be specified in a uniform file format and readily converted to Bayesian or Markov model objects.

We recently had a meeting with our mentors to discuss the plan ahead. As part of the community bonding period I am reading about the pyparsing module, which will be used to parse the UAI file format. I am also planning to prepare an abstract grammar for the UAI format, which will help me later during the implementation.

## May 11, 2015

### Chau Dang Nguyen(Core Python)

#### Week -2: Getting my workspace setup

This week, I got my workspace set up on my computer. Usually this part is very boring and it is always a nightmare for me. We all want to get to the fun soon™ and no one wants to spend too much time trying to open the box. But now I have the solution. My mentors are using Vagrant as instructed here https://wiki.python.org/moin/TrackerDevelopment (they will move to Docker soon). So I just needed to install Vagrant and VirtualBox and type "vagrant up", and ta-dah, my VM got installed and configured. The guest home directory is synchronized with a host directory, so I can use PyCharm (yes, I'm a Windows user) to develop without having to write a deployment script. As for performance, the VM is very light: it takes just 400MB of RAM and nothing more. So now I can spend my time tinkering with Roundup.

### Yask Srivastava(MoinMoin)

#### Flipping bits not burgers this summer !

About my GSOC project: I will be working for MoinMoin wiki, which comes under the Python Software Foundation. MoinMoin is a high performance open source wiki engine written in Python.
It provides a feature-rich wiki that has a large community of users, and is very customizable and easy to use. MoinMoin is used by several organizations for their public wikis, such as Ubuntu, Apache, FreeBSD, and more.

Proposal Detailed Description/Timeline:

• Consistency is the most crucial thing in UI/UX design. Currently we lack generic design / color guidelines for moin2. During my initial bonding period, by mutual discussion, I will prepare a generic color scheme for various items.

Template: general colors in MoinMoin, theme Basic:

    Color : Hex-code
    Blue : #______
    Light Blue : #_____
    Purple : #______
    Grey : #_________
    Green : #_______
    … etc

And then general design guidelines with the respective colors:

    Pressed Button : #______
    Unpressed Button: #_______

Red is used to call attention when there are problems or something needs your attention. It should be used sparingly to retain its effect:

• Errors/Warnings
• Notifications
… etc

Blue is used for information design. It represents actionability and continuity ("keep going"):

• Actionable items such as links
• Buttons: emphasized actions
… etc

These guidelines can be used for future reference and will be applied consistently across all MoinMoin wiki content.

### Improvements in basic/modern themes

• Currently we have two themes, Modern and Basic. Basic was done more recently (~2013) and is implemented using Bootstrap. The Modern theme was made much earlier and has hard-coded CSS. Making the Modern theme depend on Bootstrap will have lots of advantages because of its great grid system and the basic styling it provides for most HTML elements. This will also help in making the wiki look great on mobile devices and enable more print-friendly pages for wiki content. Print classes are now documented: http://getbootstrap.com/css/#responsive-utilities-print Similar to the regular responsive classes, use these for toggling content for print.

    Class            Browser    Print
    ----------------------------------------
    .visible-print   Hidden     Visible
    .hidden-print    Visible    Hidden

I will implement our Modern theme design using Bootstrap. A simple file structure similar to that of the Basic theme will be used:

    moin-2.0/MoinMoin/themes/basic/ [ templates & css]
    moin-2.0/MoinMoin/themes/modern -> [/templates & /css]

All the current design of our Modern theme will be re-implemented in the grid system. Bootstrap HTML elements will be used instead. These include:

1. Navigation bar
2. Input boxes for search
3. Forms
4. Breadcrumbs, etc.

Color schemes will first be discussed with mentors and then applied. This will enable a great responsive layout.

• Bootstrap for the Basic theme will be updated to the latest version.

Proper CSS settings for different element containers [moin-footer, moin-container]

If we switch tabs in user settings (Basic and Modern themes), the size of the div .moin-content changes as the content inside the form changes. This results in the following unusual behavior:

The following things need to be observed:

* The footer jumps on changing tabs in the User Settings page.
* Fix form content

This can be fixed by giving moin-content and moin-footer a fixed relative percentage of their parent div's height and width. Changes need to be implemented in both basic.css and modern.css.

Also, I noticed we have constructed forms using <td> <tl> [an old and complicated way of making forms]. Bootstrap forms [http://getbootstrap.com/css/#forms] need to be implemented for every current form we have, which will get rid of the <td> <tl> form design.
Example

This is the typical way we have created forms in moin-2.0/MoinMoin/templates/usersettings_forms.html:

    {% macro password(form) %}
    {{ gen.form.open(form, method="post", action=url_for('frontend.usersettings')) }}
    {{ forms.render_errors(form) }}
    <dl>
    {{ forms.render(form['password_current']) }}
    {{ forms.render(form['password1']) }}
    {{ forms.render(form['password2']) }}
    </dl>
    {{ forms.render_submit(form, 'part', 'password') }}
    {{ gen.form.close() }}
    {% endmacro %}

Without using <dl> </dl>, better forms will be implemented with Bootstrap, so that the HTML rendering of forms [html-source] looks like this instead:

    {{ gen.form.open(form, method="post", action=url_for('frontend.usersettings'), class='form-horizontal') }}
    <div class="form-group">
      <label for="inputEmail3" class="col-sm-2 control-label">Email</label>
      <div class="col-sm-10">
        <!-- forms.render(form['password1']) -->
        <input type="email" class="form-control" id="inputEmail3" placeholder="Email">
      </div>
    </div>
    .............................

We are using flatland + templates to render forms. The instructions for rendering customized form input will be followed from here: http://flatland.readthedocs.org/en/latest/markup.html.

• On the front page [Modern theme] we display the sub-menu items in the footer as well. Having the exact same options in two places seems unnecessary and is inconsistent with the design of other pages [this sub-menu in the footer doesn't exist on any other page, such as +modify]. Thus it should be removed.

* The meta data page can be made more interactive by creating hyperlinks to the mentioned itemids.

### Implement new design for quick links & left vertical menubar [Basic and Modern theme]

* Fix the vertical side bar menu design

The sub-menu of an item in the menu appears in a completely different block below the parent menu block. This is confusing, as it doesn't show the visual relation between menu items and their sub-menu items.

* Implement scrollable quick links inside the menu box, with a search box.
If a user creates lots of quick links, the quick links get stacked up one on top of the other linearly, which gets messy and difficult to maintain. I will implement the following container for quick links inside our left vertical menu box. The quick links will be automatically sorted alphabetically. I will also implement a small search box inside this container to quickly search within the quick links.

* Display items in the menu according to their categories [main navigation menus and quicklinks] for the Modern theme

As of now, newly created quicklinks are added linearly to the main nav-bar, which becomes confusing [note the two home tabs in the image]. This is a better design:

• If the user creates too many quicklinks to be displayed in the nav-bar [maybe more than 7-8], we can display the rest of them in a similar way to how excess bookmarks are shown in the Google Chrome browser.
• Taking inspiration from the Chrome browser, I will develop a similar menu bar for our excess quicklinks using jQuery.

### Better error-notifications

* Fix confusing notifications

As can be observed from the above screenshot, improper data entry resulted in two contradictory notifications: "Changes saved" and "Password can't be blank". The top notification comes from our validation function in views.py, which should not send save notifications while the form entries aren't all properly filled.

• Error messages in forms at various places could be made more visually appealing.

Again, two notifications need a fix. The position of the password field changes on incorrect input, which is a defect. This will be fixed by properly implementing Bootstrap forms.

• Implement tool-tip error messages across all the forms.

This will also fix various design-breaking issues, including some posted on the MoinMoin issue tracker: https://bitbucket.org/thomaswaldmann/moin-2.0/issue/501/error-message-thrown-due-to-wrong-open-id

Files responsible for rendering error notifications: forms.html and views.py.
The errors in templates are rendered by the /moin-2.0/MoinMoin/templates/forms.html template. Code snippet:

    {% macro render_errors(field) %}
    {% if field.errors %}
    <ul class="moin-error">
    {% for error in field.errors %}
    <li>{{ error }}</li>

The list-item line that prints each error needs to be implemented with tool-tips.

• Implement addition of quick links inside the Navigation tab in user settings. Currently there is nothing inside the Navigation tab.

### Tweak UI

These issues will be fixed:

• Download doesn't start on clicking the save button. A download link will be given directly from this button.
* Show the data size unit [KB/MB] on the +history page. Currently it doesn't show the unit of the size. Note the size column in the screenshot.
• Proper color schemes will be applied to all the elements [colors will be decided by discussion with mentors].

### Editor's Improvements:

* Implement the toolbar by adding a JavaScript toolbar widget to the +modify file, to improve productivity while creating/modifying wiki pages.
* Another crucial feature missing in our editor is that it doesn't behave like an editor: the tab key jumps the selection to another element on the page instead of indenting [as expected in Markdown and other kinds of editors]. Another important feature lacking in our editor is that it doesn't make nested indents automatically. This is a huge pain while making nested lists in Markdown, as indentation is crucial for proper Markdown.
* I'll write JavaScript for the "/MoinMoin/templates/modify.html" file to enable auto-indentation while making nested lists.
* We can fix the tab key problem by adding an event listener to our text field with a function that checks whether the pressed key is tab; if so, we can use the preventDefault() function to disable the usual behavior of the tab key and instead insert 4 blank spaces.
[Code javascript]

    myInput.addEventListener('keydown', keyHandler, false);

    function keyHandler(env) {
        var TABKEY = 9;
        if (env.keyCode === TABKEY) {
            // Insert four spaces at the cursor instead of moving focus.
            var start = this.selectionStart;
            this.value = this.value.slice(0, start) + "    " + this.value.slice(this.selectionEnd);
            this.selectionStart = this.selectionEnd = start + 4;
            env.preventDefault();
        }
    }

* Everyone loves emoji, and Twitter recently open-sourced their emojis. We can use them in our editor [http://twitter.github.io/twemoji/]. This will also be integrated into the toolbar of our wiki editor for easy access.
* Display a warning message [that their IP address is being recorded] when a non-logged-in user is editing the wiki.

### Improvements in +modify page:

• On the +modify page we give users too many options to modify, i.e. the page source and the meta-data [too many input fields on one page, which can be overwhelming for a new user].
• Since we already have a meta option in the default main nav-bar menu, we should implement an edit button on the meta page. Currently this page only shows the uneditable meta data without any edit option.
• Once we have implemented the meta page for both display and editing of meta data, we can make our +modify page cleaner by removing the meta-data edit fields. We can still put an "Edit meta-data" link on that page which takes users to our new meta data page.
• So now, to edit wiki source -> modify, and to edit meta data -> meta -> edit.
* While modifying wiki content the submit button says 'Ok'. It should instead say 'Save changes'.
* The current +modify page needs visual improvements and element repositioning. For example, the comment section should be below the editor. This is my mock-up for the +modify page [after cleaning up the meta-data edit fields].
* Clicking help while editing the wiki opens the link in the same tab. This can result in the loss of unsaved changes. It should instead open the help page in another small browser window.
* Implement more visually appealing tags. We can use this open source Bootstrap plugin: http://timschlechter.github.io/bootstrap-tagsinput/examples/ if permitted by the mentors.
If not, I can implement something similar using http://getbootstrap.com/components/#labels and JavaScript.

### Implement better error handling:

• Currently, uploading an incompatible file results in a traceback error. We need better exception handling and an appropriate message displayed after the process [File saved successfully / error for reason ______].

### Schedule of Deliverables

My summer vacation will run from the 10th or 15th of June (approximately) to the 10th of August (as opposed to GSoC's May 19 - Aug 18). This period is slightly shorter, so I plan to take 10 days of leave [10th August - 20th August] from college (official classes only start a week later). I also plan to start working early. In whatever time I can get [during weekends] in late April and early May [end-term exams start in late May] I will try to make up for the time. I also plan to get used to the codebase during this semester. I intend to keep in touch with the mentors throughout the week and ensure that I'm going in the right direction.

• 15th June - 24th June: #Improvements in basic/modern themes
• 25th June - 5th July: #Better error-notifications
• 6th July - 22nd July: #Implement new design for quick links & left vertical menubar [Basic and Modern theme]
• 22nd July - 5th August: #Editor's improvements
• 5th August - 10th August: #Improvements in +modify page
• 10th August - 20th August: #Tweak UI & Better error handling

## May 10, 2015

### Christof Angermueller(Theano)

#### Welcome to GSoC2015!

I am very proud to announce that my Google Summer of Code (GSoC) proposal 'Theano: Interactive visualization of Computational Graphs' was successful, which gives me the chance to enhance Theano's visualization features this summer! Theano is a popular library for defining and automatically differentiating mathematical expressions. It is widely used to implement complex machine learning models such as deep neural networks, which are compiled into a graph structure with interconnected nodes.
Currently, Theano only supports static text and image output, which makes it hard to debug and analyze complex models such as a deep neural network with many layers. During Google Summer of Code, I will extend Theano with a module to interactively visualize and analyze complex computation graphs. The module will be based on the d3.js JavaScript library and will generate HTML documents, which can be opened in any web browser. It will allow users to dynamically a) arrange, collapse, expand and edit nodes, b) pan and zoom to different regions, and c) highlight additional information via mouseover events. An additional goal is implementing an IPython %magic to directly visualize graphs in an IPython notebook!

GSoC2015 will take place between 25 May and 31 August, and will contain a mid-term and an end-term evaluation. I will be mentored by Frédéric Bastien, and will regularly post about my progress. Looking forward to a great summer!

The post Welcome to GSoC2015! appeared first on Christof Angermueller.

### Sumith(SymPy)

#### Google Summer Of Code with SymPy

Hi there! The Google Summer of Code results are out and I have been selected. As mentioned in a previous post, my project, Implementing polynomial module in CSymPy, has been selected and I get to work with SymPy under the Python Software Foundation.

### The excitement

I really thank the community for accepting a freshman to do a project. The community over at SymPy is so helpful and the working environment so much fun that it has motivated me to take up this project. My mentors are Ondřej Čertík himself and Sushant Hiray, who is a previous GSoC-er at SymEngine (then CSymPy). I'd also like to congratulate Shivam, Isuru and Abinash for getting projects under SymEngine, and all others who have been selected under SymPy and the Python Software Foundation in general. I am excited for the summer to follow and the great learning experience ahead.

### SymPy and SymEngine

SymPy is a Python library for symbolic mathematics.
It aims to become a full-featured Computer Algebra System (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymEngine is a fast symbolic manipulation library written in C++. It is planned to be the fast, swappable SymPy core and also a CAS on its own. We are currently writing Ruby and Julia wrappers for it too.

### The Project

SymEngine currently lacks a polynomial module, which is essential for achieving SymEngine's goal of being the fastest CAS ever. Having a polynomial module is a core concern, and implementing a fast one will also help in achieving a fast series module and other modules. Once implemented, SymEngine will be more capable as a fast optional SymPy core, which we think is good to ship before 1.0, and at the same time SymEngine becomes a powerful CAS on its own.

You can find the proposal here for a more detailed description of the project.

Looking forward to a great summer and times to follow.

## May 09, 2015

### Shivam Vats(SymPy)

#### Beginning of GSoC 2015

In an earlier post I talked about my application for Google Summer of Code 2015. I am extremely glad to report that my project for CSymPy (now called SymEngine) and SymPy has been selected under the Python Software Foundation. Ondrej Certik and Thilina Rathnayake will be mentoring me during the course of my project. Ondrej started the SymPy library and is currently leading the development of SymEngine, while Thilina is a two-time GSoC'er with SymPy and SymEngine and has research interests in symbolic computation. Needless to say, I am extremely lucky to work under such talented people.

My project involves writing polynomial-based series expansion modules for SymEngine and SymPy. I have already had very productive discussions with my mentors and hope to do some good work.

Looking forward to a great summer!

Cheers!

## May 08, 2015

### Vipul Sharma(MoinMoin)

#### How I started contributing to Open Source

I started my open source journey in December 2014. I am quite fascinated by the concept of open source; I love contributing code, and it feels great to see my contribution doing something good for the community. You learn every day, and I've learned that it doesn't matter how big your contribution is or how much code you write, you'll still learn a lot. I met some great people on IRC who were very generous, helped me a lot and even replied to my silliest doubts.

In December 2014, I searched for open source projects where I could use my Python/Django skills. Then I heard about www.openhatch.org. I contacted the mentors through IRC and they were really helpful. I deployed the development version of the website (https://github.com/openhatch/oh-mainline/) and started to solve some easy bugs and write some documentation. After a few easy tasks, I thought of writing a bigger feature. I wrote a tutorial on "Using the command line shell" (http://openhatch.org/missions/shell/about). I got great support and guidance from the OpenHatch community and I am really thankful for their help in introducing me to open source and getting me started.

One of my college seniors (a GSoCer in 2014) told me about Google Summer of Code and asked me to search for an organization that works with Python and submit a proposal for GSoC 2015. I looked at PSF's Summer of Code wiki page, https://wiki.python.org/moin/SummerOfCode/2015, where I found MoinMoin pretty interesting (MoinMoin runs that wiki page :) ). I looked at its ideas page, https://moinmo.in/GoogleSoc2015/InitialProjectIdeas, and I got even more interested in contributing to its code. I thought I could use my knowledge of Python to contribute. I read its code, read the documentation, deployed the development environment and started looking for some easy bugs to fix.
I discussed the code with the mentors and they were really helpful. I wrote a proposal to "Improve the Issue Tracker" of MoinMoin-2.0. I discussed with my mentors what can be done to improve the issue tracker (new features, UI/UX improvements, etc.) and I am grateful for their guidance.

I am really happy to tell you that my proposal got selected for GSoC 2015 :) And this was all because of the mentors who helped me get started with open source development and helped me understand moin-2.0. Though I am not very proficient in Python, I will try hard to work out every task. It's just a start and I will work hard to implement the project idea with the guidance of my mentors :)

Currently it's the "Community Bonding" period and I am discussing the implementation of the issue tracker with my mentors, reading the documentation, understanding the technology stack used, understanding the code to be worked upon and finding out what more can be done. I will post more about the "community bonding" phase in my next blog post.

### Luca Puggini(statsmodels)

#### Start of GSoC

### GAM toolbox for StatsModels

My love for open source projects led me to GSoC. My project is for the Python Software Foundation, and in particular for the Statsmodels library. The aim of the project is to develop a GAM toolbox similar to the one available in the R package mgcv.

At the moment we are trying to understand whether the splines implementation that is available in the Patsy library is powerful enough for our needs.

### Prakhar Joshi(Plone)

#### How it started ..

In this blog I will share how I approached GSoC and my experience with Plone over the past few months. I have had a great time working with Plone and I am learning a lot of new technologies with it.
People here in Plone are really helpful and supportive and helped me a lot during my interaction with them. The journey was very good and full of surprises, with success and defeat at different parts of it.

Journey Starts...

After last year's results came out, I decided that I would give GSoC a try the next year, but at that point I had no idea what I would do or how I would proceed. Then during September I started looking on the Melange site for the organizations that had been selected for GSoC last year. At first things were alien to me and I rarely understood anything on that site. I started reading about different organizations and their projects. I mainly concentrated on organizations that involve Python.

Finally, at the end of October, I started reading about Plone. Plone has a huge code base and has a lot of its own coding conventions and other rules. It's always fun working with Plone, as the people on IRC (Internet Relay Chat) are very helpful and they helped me a lot in understanding the workflow of Plone. Finally, after one month, I was able to install the core development environment on my local machine and was in a position to start developing things for Plone.

The Project Idea

After installation, the main problem was to find a problem I could work on. So I started looking through the last few years' GSoC ideas to get an idea of which type and level of projects the organization expected. While I was going through these things I came to know about a ticket (a ticket in Plone means an issue or problem) which had been proposed by Tom a few months ago. He had also tried to make the changes accordingly, but there were a lot of constraints from test cases that made it hard to work on.
Then I saw a GSoC project from 2007 that tried to create a Plone.transform package. After discussing all this at length with Plone people on IRC, the final conclusion was to create a new safe_html package, separate from the existing one, that will contain an HTML filter using lxml and its own test cases. There were a few ways to achieve that goal, and we decided to go this way (creating a separate package). This was the right time to start working on the project and start learning about Plone.

Why We need to create a new safe_html over the existing one ?

Right now we have safe_html under the plone.transform project, but that safe_html depends on CMFDefault and we need to remove that dependency to filter HTML. We will use lxml or html cleaner, which are considerably faster than the current filter system. This will help in increasing speed as well as accuracy of the HTML filtering.

Note

From here on, the things in this blog are related to my project, so most of them are technical. I have tried to explain them in layman's terms as much as possible but can't help more with it. :P

The new safe Experimental html filter

After creating a new add-on we have a few things to do to set it up. Basically we need to work on the generic setup of the transform:

• Generic setup of safe_html for the browser.
• Adding a browser layer for safe_html.
• Create a generic way to add the filter to the browser.
• Register safe_html for the browser.
• Add a control panel for the filter to the browser.
• Create an interface for the filter in controlpanel.py under browser.

By performing all these steps we will be able to set up our safe_html for the browser. This will create an interface for the filter control panel and also register a browser layer for safe_html.

After this we need to set up a profile for the transform by registering the browser layer and control panel under the profile module. We will also register safe_html here.
We will register the profile created above for the safe_html transform in the configuration.zcml file and also configure the deregistration of the safe_html profile there. After that we will configure the post-install import step for safe_html.

After that we will create a marker interface that defines a browser layer. We will also create an interface for the safe_html transform to give users the option to customize the HTML filter.

These are the basic things required for setting up safe_html: they register the browser layer, register the safe_html profile, create the interface for safe_html and register the filter control panel that will be shown in the browser.

We will also have to replace the existing transform with one of the same name, since TinyMCE and p.a.controlpanel address safe_html by its transform name, rather than asking for a conversion from one MIME type to another. (safe_html = getattr(getToolByName(self, 'portal_transforms'), 'safe_html'))

Automatic register and deregister for safe_html

To register the safe_html package automatically for the developer, we will write functions that register and deregister the product. We will register safe_html when the add-on is installed, so we will just write the registration of safe_html in the "post_install" function. We will also take care of deregistering the old safe_html from portal_transforms. So when a developer installs safe_html, the old safe_html will be deregistered and the new safe_html will be registered.

People who are just using a Plone site will use safe_html by adding it as an add-on in their "@@overview-controlpanel". When they add safe_html, the old safe_html from portal_transforms will be deregistered and the new safe_html from the add-on will be registered. This way normal users can also use the safe_html transform.
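As a rough picture of the swap described above, here is a toy sketch. It is not Plone code: ToyTransformRegistry, OldSafeHtml, NewSafeHtml, post_install and uninstall are hypothetical names invented for illustration, and the real portal_transforms tool has its own API. The sketch only shows the idea of deregistering the old transform and registering the new one under the same name, so that callers who look up 'safe_html' by name keep working.

```python
class ToyTransformRegistry:
    """Hypothetical stand-in for portal_transforms: maps names to transforms."""

    def __init__(self):
        self._transforms = {}

    def register(self, name, transform):
        if name in self._transforms:
            raise KeyError("transform %r is already registered" % name)
        self._transforms[name] = transform

    def unregister(self, name):
        # Return the removed transform so it can be restored on uninstall.
        return self._transforms.pop(name)

    def get(self, name):
        return self._transforms[name]


class OldSafeHtml:
    """Placeholder for the existing CMFDefault-based filter."""
    backend = "CMFDefault"


class NewSafeHtml:
    """Placeholder for the add-on's lxml-based filter."""
    backend = "lxml"


def post_install(registry):
    """Swap in the new transform under the same name; return the old one."""
    old = registry.unregister("safe_html")
    registry.register("safe_html", NewSafeHtml())
    return old


def uninstall(registry, old):
    """Undo post_install by putting the previous transform back."""
    registry.unregister("safe_html")
    registry.register("safe_html", old)


registry = ToyTransformRegistry()
registry.register("safe_html", OldSafeHtml())
previous = post_install(registry)
# Callers that look the transform up by name now get the new filter.
print(registry.get("safe_html").backend)  # prints lxml
```

Keeping the old transform object around in post_install is a design choice of this sketch: it makes the deregistration step reversible when the add-on is removed.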
Safe_html transform

Right now the safe_html transform of portal_transforms depends on CMFDefault. The main aim of the project is to write transforms that are independent of CMFDefault. For this we will rewrite the whole safe_html transform without CMFDefault. We will mostly use lxml for filtering HTML; there are other options for filtering HTML in the safe_html transform too, like BeautifulSoup and htmllaundry.

Improve integration of tinyMCE with our HTML filter

Right now there are a lot of issues related to the TinyMCE installation and the HTML filter, for example TinyMCE ignores the HTML filter settings. We need to improve the TinyMCE integration with our own html_filter.

Automated tests for the transform

We will write tests for the safe_html transform: proper tests for the input and output of the HTML, as written in the safe_html of portal_transforms. We will test our HTML filter with real-life documents, check for faults in these cases and improve our HTML filter. We will also develop robot tests for the same.

Few More goals

After discussing the stretch goals at length with the mentors, the proposal can be extended with two more issues:

• Making TinyMCE use the safe_html filter configuration.
• Creating default and customized filter settings for safe_html.

Making tinyMCE use the safe_html filter configuration

As we are going to integrate TinyMCE with our safe_html product, we will try to unify the filtering configuration and match the filter criteria of safe_html and TinyMCE. safe_html does its filtering on the server side and TinyMCE works on the client side with its own filter criteria, so the same text may be filtered differently by safe_html and by TinyMCE. We will therefore try to configure TinyMCE according to safe_html, so that the filter criteria match on both server and client.
In a nutshell, on the server side we will use safe_html, and we will configure TinyMCE to use the safe_html filtering configuration so that the filtering process is unified across client and server.

To create default and customized filter settings for safe_html

Here we will create a preset that just changes the control panel options for allowed tags, instead of the user having to choose tags themselves. So basically we will toggle the control panel settings such that if the user doesn't select tags for filtering, the default filter settings are applied. In a nutshell, our safe_html will have default filter settings, and when the user selects tags we will change things in the control panel and the customized filter will be activated.

These are the two new issues related to the proposal, and they will help in creating a better product. Though I have discussed these things on IRC, there can still be some modifications, which can be handled during the development of the product.

27th April

Finally, after submitting the proposal and discussing so many things with the people on IRC, my proposal has been accepted by the Plone Foundation. Seeing my name in the accepted projects list was one of the unforgettable moments of my life. I worked hard to get through it and finally got accepted for Google Summer of Code 2015. I would really like to continue working with Plone after GSoC as well. I have learnt a lot of new technologies while working with Plone and will learn a lot more this summer.

That was really a nice experience for me, but it was just the beginning: a lot more is yet to come, a lot more to learn, a lot more to code and a lot more to enjoy.

This was the basic overview of the workflow of my work this summer. Hope you enjoyed reading this blog. Happy coding!!
Cheers,

### Himanshu Mishra(NetworkX)

#### Google Summer of Code 2015

So, the results are out and my proposal 'NetworkX : Implementing Add-on System' is accepted. So, here it is, my first summer of college and I'll be coding. What a bliss!

For those who don't know, NetworkX is a Python library which is participating in Google Summer of Code 2015 as a sub-org under the Python Software Foundation.

Coding starts on May 25. I'll have to get familiar with the community first and gather all the knowledge that I'll need during the summer. For that, I've been reading an O'Reilly book, Python in a Nutshell. Part V of the book, 'Extending and Embedding', is very helpful for me. Also, I've found the SciPy tutorials very useful. I've recently watched the one for Cython. I wish NetworkX will have one of those in upcoming years.

That's all for now.

Cheers!

### Yue Liu(pwntools)

#### GSOC2015 Community Bonding Week 04

week sync 04

## Last week:

Automatic building of ROP chains for x86/x64/arm ELF. Using topological sorting to solve two issues:

Others:

• Do a survey on the Amoco project https://github.com/bdcht/amoco.
• For topological sorting, see the __build_top_sort() function.
• Rewrite the X64/ARM ROP chain building methods, see __build_x64(), __build_arm()

## Next week:

• Need to extract gadget finding into a Python class.
• Try using Amoco instead of the BARF project.
• Merge into the new rop module.

## May 07, 2015

### Yask Srivastava(MoinMoin)

#### #GSoC ’15 Started

On 27th April the GSoC results were announced, and I remember myself eagerly waiting for the results on the Google Melange website. At 12:30 (IST) the results were supposed to be out, but the website seemed to have crashed and I was super nervous.

But then I received a beautiful mail with a beautiful tinker-bell notification sound and voila! I shouted, since the email had the GSoC keyword, but... it turned out to be a Google Calendar notification :/ But then another tinker-bell sound rang and voila!! I smiled and quickly told my brother and mom about it :).
I was stoked. I then connected to my organization's channel on IRC and posted a thank-you message.

I was always fascinated by the open source community. People were developing software not for fame or money but because it is their hobby, their passion! Some of the most amazing top technical companies that exist today started out by using open source software and still use and endorse it. The Linux kernel was started by a kid, Linus Torvalds, as a hobby project: he didn't like the idea of UNIX becoming proprietary software and wrote, with the help of the most amazing community, a Unix-like kernel today popularly known as Linux. I can't overstate the value of this open source software. It powers the most powerful computers, most of the web servers and most of the mobile phones (Android).

This was the motivation. I really wanted to get involved in the open source community, but in the beginning it all felt very overwhelming. But luckily my friend Akshay Katyal, who is a Mozilla Rep, told me about the code sprint they were organizing in India. And that's where the real magic happened. I was introduced to many open source enthusiasts. Some had even travelled from outside Delhi just to code for 24 hours straight. Such is their passion. I was confused in the beginning about which project to choose. There were so many! I was sitting next to Akshay Aurora, who happened to be a web development enthusiast and suggested that I work on a Django project called MDN.

And that's when I overcame my fear. I fixed 2 bugs that day and I will always remember my first patch (a regex error). Both of my patches were merged to the main branch and I was happy to see my code running in a popular project used by thousands of users.

And so started my open source journey!

I am super excited to be working on a project that is closely associated with web development and comes under the Python Software Foundation umbrella.
This is the community bonding period and I am interacting with all my assigned mentors. Really glad to have Ajitesh Gupta, Bastian Blank and Saurabh Kathpalia as my mentors. I should thank Roger Hasse, as he has been continuously helping me ever since I joined this community. Also thanks to Thomas Waldmann for believing in me with this project.

Can’t wait to start coding this summer!

### Rupak Kumar Das(SunPy)

#### The journey begins…

Ah… it’s been a long time since my last post! Unfortunately, I was busy with my exams and could not get any time. But now that they are over, it’s time for GSoC! Speaking of which, I have been selected for this year’s GSoC! I can’t believe it even though it has been nearly two weeks since the announcement! Let me give a complete account of it. Hang on tight, this is going to be a long post!

It all started back in December when, one typical evening, I was trying to find some great articles to read on the web. It happened by chance that I came upon a blog by a student who had taken part in GSoC. I had heard about this “Google Summer of Code” before on the site of Blender (an awesome Open Source program used for 3D rendering), which participates every year. I had not looked into it before, so I decided to check it out. I was surprised to find this cool concept by Google with which students can contribute to Open Source development. And they get paid for it too!

I was interested in participating but there was a big problem: I have used a lot of Open Source software but have never contributed before! So I looked into all the details of this program (even contacted a few previous “GSoC’ers”) and came up with three prerequisites to help me get started:

1. Learn Git
2. Select an organisation that participates in GSoC and get to know more about it
3. Try to contribute in any way possible

Thus began my journey with GSoC. I quickly learnt the basics of Git and decided upon an organisation.
I was drawn to the PSF (Python Software Foundation) as I had started learning “THE awesomest language ever”, that is Python, during summer last year, and I was blown away by its power and simplicity. So I wanted to contribute to the PSF, but now I had to choose a sub-org. I have been an astronomy “fan” since I was an 8-year-old child and have always had a deep interest in the mysteries of the universe (though later my interests shifted from astronomy to physics). So after looking at a few astronomy-related sub-orgs, I selected SunPy and TARDIS SN (the software, not the time machine!).

Everything was going well. I had little knowledge about the technical part of SunPy, but I was learning new stuff and trying to contribute as much as I could. Soon the application period started and I began writing my proposals after researching the project ideas. I looked at a few of my fellow students’ proposals and was amazed by their quality. With a heavy heart, I submitted my proposals, thinking I had no chance and all was in vain.

The days passed quickly and soon it was time for the announcement. I was very pessimistic about my chances, so I did not stay awake till 00:30 IST (the time at which the results were to be announced). Next morning, I decided not to check till later in the evening, but I gave in to the suspense. And there it was! My name on the Accepted Students List for the org SunPy! I was so astonished and could not believe my luck (and my eyes) that I shut down my computer and checked again after an hour to see if it was still there!

OK, I am going to stop here as I have been going on for a while now. In the next post I will be talking about SunPy and my project.

### Sahil Shekhawat(PyDy)

#### GSoC 2015 Community Bonding Week 01

The goal of PyDy is to have a modular framework and eventually a physics abstraction layer which utilizes a variety of backends. This project will advance PyDy towards that goal by improving the abstraction layer and creating GUI tools for users.
I am calling this module "PyDy-InGen". :) This week I had a meeting with Tarun, one of my mentors, and we decided that I should work on a more detailed timeline and list the exact interface for the API.

#### Building Heroku inspired PaaS for our Data center

I have always been fascinated by the idea of working on cloud services and now I have the opportunity. Our institute has a very good data center, but there is a problem with it. Most of the students use it for deploying web apps, but the people managing it don't have much experience with or knowledge of it. As a consequence, whenever someone gets permission to deploy a web app, he/she is allotted a whole VM as requested, which costs a ton of resources. For example, a simple RoR app needs at least 2 GB of RAM just for basic services like Ruby. Now imagine this for about 20 different projects, every project running in its own VM with its own services.

### Chienli Ma(Theano)

#### First Look Into Theano Core

It’s been a week and a half since google-melange announced the accepted students for Google Summer of Code. I was lucky enough to be accepted by Theano, a sub-organization of the Python Software Foundation, to help them add a new feature: allowing the user to modify a compiled function. As scheduled, 27th April to 23rd May is the community bonding period. During these days I ought to get familiar with the Theano core code and the Theano dev community.

Before the application period started, I had dived into the Theano core and got the basic idea of what I am going to do. However, to make the idea clearer and to meet the requirement that students should post about their progress every week, I decided to write two posts about the Theano core and how Theano works. This is the first post. It will talk about what a function is and how a function is generated.

## How a function is generated?
Just recall how we compile a function: func = theano.function( [ inputs ], output ). From this we know that we should start our journey from the method function(), which is located in theano/compile/function.py.

In function(), after some data verification, it will call orig_function() or pfunc(), which return the function that the user will get. Since pfunc() also calls orig_function(), we are going to look into pfunc() first.

### pfunc.py

pfunc() has two major tasks:

• Transform input variables into In() instances, and likewise shared variables. (In the function graph, SharedVariables are treated as inputs and updates are treated as outputs.)
• Rebuild the computational graph using updates and inputs, and transform outputs into Out instances.

### orig_function():

Now it is time for a look into orig_function(). orig_function() will again make sure that inputs and outputs are transformed into In and Out. Then it will use the create method of FunctionMaker to make a function, which will be returned.

### FunctionMaker:

FunctionMaker.__init__() is where the fgraph is extracted and optimized. FunctionMaker.create() is where the function will be compiled and linked. In fact, FunctionMaker.linker.make_thunk() is where the function is linked.

## What’s theano.function?

Each function is a callable object. What theano.function returns is not a plain Python function. Instead, it is an instance of a class with a __call__() method. Every function stores its own fgraph, maker, storages and many other configurations. However, the core of a function is a function fn() returned by linker.make_thunk(). Every time a function is called, it will first verify the input data and then call self.fn() to get the output values.

Now I know how a function is born. Also, I know that to complete my mission, I need to focus more on FunctionMaker and Function rather than orig_function() and pfunc(). However, there still exist some questions, such as: what does make_thunk() do? And what are In(), Out() and container? In the next post, I will have a look at these and other related data structures.
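To make the "callable object" idea concrete, here is a minimal pure-Python sketch. It does not use Theano at all: ToyFunction, ToyFunctionMaker and toy_function are hypothetical names that only mimic the roles described above (an entry point, a maker with a create() step, and an object whose __call__() checks the inputs and then delegates to a stored fn).

```python
class ToyFunction:
    """Hypothetical miniature of a compiled function object."""

    def __init__(self, n_inputs, fn):
        self.n_inputs = n_inputs
        self.fn = fn  # plays the role of the thunk returned by the linker

    def __call__(self, *args):
        # Step 1: verify the input data.
        if len(args) != self.n_inputs:
            raise TypeError(
                "expected %d inputs, got %d" % (self.n_inputs, len(args))
            )
        for value in args:
            if not isinstance(value, (int, float)):
                raise TypeError("inputs must be numbers, got %r" % (value,))
        # Step 2: call the stored fn to get the output values.
        return self.fn(*args)


class ToyFunctionMaker:
    """Hypothetical stand-in for the maker that builds the callable."""

    def __init__(self, n_inputs, expression):
        self.n_inputs = n_inputs
        self.expression = expression

    def create(self):
        # The real maker compiles and links here; this toy just wraps
        # a ready-made Python callable.
        return ToyFunction(self.n_inputs, self.expression)


def toy_function(n_inputs, expression):
    """Entry point: validate arguments, then hand over to the maker."""
    if not callable(expression):
        raise TypeError("expression must be callable")
    return ToyFunctionMaker(n_inputs, expression).create()


add = toy_function(2, lambda x, y: x + y)
print(add(1.5, 2.0))  # prints 3.5
```

The point of the split is that the object the user calls stays small: everything expensive happens once in create(), and each call only validates and dispatches.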
### Siddhant Shrivastava(ERAS Project)

#### GSoC '15 - About my Project

Second Post in the GSoC 2015 series. This post is intended to explain my project proposal. The project proposal that I submitted can be found here. To be continued...

## May 06, 2015

### Mridul Seth(NetworkX)

#### GSoC 2015 – Python Software Foundation: NetworkX

I have been accepted as a student to work on NetworkX for summer 2015. Yay! I will be updating this blog regarding the progress made over the summer. :D

### Abraham de Jesus Escalante Avalos(SciPy)

#### My GSoC experience

Hello all,

My name is Abraham Escalante and I'm a Mexican software engineer. The purpose of this blog is to relate my experiences and motivations for participating in the 2015 Google Summer of Code.

I am not much of a blogger (in fact, this is my first blog entry ever), but if you got here, then chances are you are interested in either the GSoC, the personal experience of a GSoCer, or maybe we have a relationship of some sort and you have a personal interest (I'm looking at you, Hélène). Either way, I will do my best to walk you through my experience, with the hope that this may turn out to be useful for someone in the future, be it to help you get into the GSoC programme or just to get to know me a little better if you find that interesting enough.

I have some catching up to do because this journey started for me several months ago. The list of selected student proposals has already been published (**spoiler alert** I got selected) and the coding period will start in about three weeks' time, but for now I just wanted to write a first entry to get the ball rolling and so you get an idea of what you can expect, should you choose to continue reading these blog entries.

I will begin my storytelling soon.

Cheers,
Abraham.
### Aron Barreira Bordin(Kivy)

#### New menu

Hello everyone!

KD will have some new features, so I'm going to adjust the menu. I have just drawn some of my ideas; they can change when I start the development.

This new menu will help us to handle new build options, such as Buildozer and Hanga, with some Run profiles. You can take a look at this photo:

(Check the original image here )

I read a lot about Kivy these weeks, so I'll be able to start my proposal development soon :)

Stay connected to follow the development process.

See you soon.

Aron Bordin.

## May 05, 2015

### Prakhar Joshi(Plone)

#### Nightmare Converted to Dream

Finally, after all the hard work and patience, I got selected for Google Summer of Code 2015 under the Plone Foundation. This is the introductory blog.

## What is the purpose of this blog ?

This blog will contain a summary of all the work done during my project and also the new things I will learn during that period.

## Why a separate blog ?

A separate blog is a good way to keep track of the work that has been done during the project. My old blog contains a few things that are not related to this project, so to keep the project record neat and clean this is the best way.

Hope you enjoy this blog. I will try to share all the new technologies I learn this summer, the same as I did in my other blog.

### Zubin Mithra(pwntools)

#### Updates up to 5th May 2015 - SROP support added in!

So far I've been working on adding SROP support to binjitsu for the x86 and x64 platforms, and it's been merged into the main branch. You can see the code here!

If you wish to use binjitsu to create an SROP frame, you can now simply do something along the lines of:

    >>> context.arch = "amd64"
    >>> s = SigreturnFrame(arch="amd64")
    >>> s.set_regvalue("rax", 0xa)
    >>> s.set_regvalue("rdi", 0x00601000)
    >>> s.set_regvalue("rsi", 0x1000)
    >>> s.set_regvalue("rdx", 0x7)
    >>> frame = s.get_frame()
    >>> assert len(frame) == 248

We hope you find this functionality useful!
Another little something I've been working on is an interesting idea suggested by ebeip90: integrating SROP into rop.py. The idea would be along the lines of what's described here. An excerpt from the GitHub issue link:

"In the event that read is not an exported symbol in any of the available libraries (e.g. if libc is not provided) but a syscall gadget is available, it should transparently switch to SROP without the user knowing."

It's a WIP and you can view the current state at the pull request I've created here.

Cheers!

## May 04, 2015

### Rashid Khan(GNS3)

#### Community Bonding Period

Hey!

The community bonding period for Google Summer of Code runs till May 25th, 2015. In the community bonding period, I have decided to get a good idea of the GNS3-Server code, study the REST API provided by the server and come up with a mock-up UI of how the web application is going to look.

I have started working from the bottom of the list up and decided that I will develop the mock-up UI first. I am planning to have a grid covering about 70% of the screen; the grid will act as the main drawing area with an x-y coordinate system. Networking devices can be dragged from the panel and dropped onto the drawing area. I have seen the grid being implemented by various web applications and need to look deeper into how to implement it.

My end exams are over, which means I will be able to devote more energy and time to the project. Once I have the mock-ups ready, I will be posting them on the blog as well for any suggestions from all the awesome people!

One of the best things about the GNS3 community is the Jungle. Anything related to networks, programming, architecture, etc. can be posted. The community is 275,000 people strong, which means it basically has all the who's who of the networking industry ;).

GNS3 has started weekly development reports, which is good. It sure does put a little pressure on the developer, but in the end it helps the community and the developers.
You can follow the Development Reports for the latest month.

Finally, I worked on my personal website, which had been pending for quite a long time. You can view the site here.

Last weekend was constructive for getting things started.

### Yask Srivastava(MoinMoin)

#### self.note(mysql):

As the title says, this post is intended as a self note containing basic SQL commands.

For the table:

    +-------------+------------+------------+
    | name        | species    | birthdate  |
    +=============+============+============+
    | Andrea      | alpaca     | 2001-01-16 |
    | Bruno       | alpaca     | 2004-09-23 |
    | Charlie     | alpaca     | 2004-09-23 |
    | Della       | alpaca     | 2006-01-09 |
    | Emma        | alpaca     | 2013-03-16 |
    | Fred        | brown bear | 1993-05-02 |
    | George      | brown bear | 1997-06-24 |
    | Molly       | brown bear | 1981-10-17 |
    | Eliezer     | camel      | 1971-03-08 |

To list the total number of animals of each species:

1. group by species (aggregate all the rows with the same species)
2. count(*) as num (count the number of aggregated rows)

    select species, count(*) as num from animals group by species;

    | species    | num |
    +============+=====+
    | alpaca     | 5   |
    | brown bear | 3   |
    | camel      | 3   |
    | dingo      | 3   |
    | echidna    | 1   |
    | ferret     | 5   |
    | gorilla    | 9   |

To list the oldest animal from each species:

    select species, min(birthdate) from animals group by species;

Insertion:

    insert into animals(name,species,birthdate) values ('kid','opossum','2014-02-01');

### Joins

Pseudo code for join:

    select table_name.column_name from table1 , table2 where table1.name = table2.name …. ;

If we had another table called diet:

    | species    | food      |
    +============+===========+
    | alpaca     | plants    |
    | brown bear | fish      |
    | brown bear | meat      |
    | brown bear | plants    |
    | camel      | plants    |

To list individual animals and the food they eat:

    select animals.name, diet.food from animals, diet where animals.species = diet.species;

    +-------------+-----------+
    | name        | food      |
    +=============+===========+
    | Andrea      | plants    |
    | Bruno       | plants    |
    | Charlie     | plants    |
    | Della       | plants    |
    | Emma        | plants    |
    | Fred        | fish      |

You can’t use a where condition with count(*) as num. Use having instead. <3

Find the food that is eaten by only one animal:

1. Aggregate the animals eating the same food
2. Count them
3. Use a having condition

    select diet.food, count(*) as num from diet,animals where diet.species = animals.species group by diet.food having num =1

Will be updated with more stuff soon, sleepy :P

### Sahil Shekhawat(PyDy)

#### Build a blog with Jekyll and Github pages

I recently migrated my blog from Wordpress to Jekyll, a fantastic website generator that's designed for building minimal, static blogs to be hosted on Github Pages. At first I wanted to use Ghost, but then the simplicity of Jekyll's theming layer and writing workflow won me over.

## May 02, 2015

### Nikolay Mayorov(SciPy)

#### GSoC 2015 with scipy

Hello! My name is Nikolay and I live in Russia. This year I was chosen as a student for Google Summer of Code. I’ll be working on one of the core Python scientific libraries, called scipy. My task is to improve the capabilities of nonlinear least squares fitting in scipy.optimize. In this post I’m going to explain what least squares fitting is and demonstrate how it can be done in scipy on a small problem.

Least squares methods arise naturally in the processing of experimental results.
Imagine you are a physicist and you have conducted experiments in your lab, measuring input-output pairs:

$$(x_i, y_i), i = 1, \ldots, m$$

From theory you know which physical law drives the phenomenon, and you know the relation up to some unknown parameters that you wish to estimate. The measurement model can be formulated as follows:

$$y_i = f(x_i; \theta) + r_i,$$

where $r_i$ is the measurement error and $\theta$ is the vector of unknown parameters. The simplest example might be finding a conductor's resistance $R$ from measurements of voltage $U$ and current $I$, keeping in mind Ohm's law $U = I R$. This problem is easy, because the model is linear in the unknown parameter $R$, but we'll be interested in the general case of functions $f(x_i; \theta)$ that are nonlinear in $\theta$.

The simple idea behind least squares is to take as the best value of $\theta$ the minimizer of the sum of squared residuals between observations and predictions:

$$\hat{\theta} = \arg \min_\theta \sum_{i=1}^m (y_i - f(x_i; \theta))^2$$

For a linear model this problem has a closed form solution. In the nonlinear case it is solved iteratively, by linearizing $f(x_i; \theta)$ around the current estimate and finding the new estimate as the solution of a linear least squares problem. This is a very rough explanation, but I don't want to go into details in this post. Instead I will show how one well known problem can be solved by means of scipy, and discuss what shortcomings scipy currently has.

I will consider a problem from chemical kinetics called $\alpha$-pinene isomerization.
Let us denote the concentrations of the different reagents as $y_1, y_2, y_3, y_4, y_5$. During the chemical reaction they obey a system of linear differential equations:

$$\begin{aligned} \dot{y}_1 &= -(\theta_1 + \theta_2) y_1 \\ \dot{y}_2 &= \theta_1 y_1 \\ \dot{y}_3 &= \theta_2 y_1 - (\theta_3 + \theta_4) y_3 + \theta_5 y_5 \\ \dot{y}_4 &= \theta_3 y_3 \\ \dot{y}_5 &= \theta_4 y_3 - \theta_5 y_5 \end{aligned}$$

with the initial conditions $y_1(0) = 100, y_2(0) = y_3(0) = y_4(0) = y_5(0) = 0$. We want to estimate the model parameters $\theta_1, \ldots, \theta_5$ given measurements of each concentration $y_1, \ldots, y_5$ at 8 time stamps, i.e. we have 40 measurements in total.

As we can see, our model is given implicitly through a system of ODEs. Theoretically we can write the closed form solution of this linear system and thus find the explicit function. But as the form of the solution depends on the values of the coefficients $\theta$, this approach is impractical and frankly unnecessary when working with numerics. A much better approach is to compute the function by numerical integration, for which excellent routines are available in scipy.

To ensure better numerical properties of the nonlinear least squares minimization it is very desirable to provide the matrix of partial derivatives of our model function with respect to the parameters $\theta$, called the Jacobian (otherwise it will be estimated with finite differences). We can accomplish this by numerical integration as well. Let's denote $w_{ij}(t) = \partial y_i / \partial \theta_j$. Differentiating the ODE for $y_i$ with respect to $\theta_j$ gives differential equations for these variables:

$$\dot{w}_{ij} = \sum_{k} \dfrac{\partial g_i}{\partial y_k}(y; \theta) \, w_{kj} + \dfrac{\partial g_i}{\partial \theta_j}(y; \theta),$$

with initial conditions $w_{ij}(0) = 0$. Here $g_i(y; \theta)$ is the right hand side function of the ODE for $y_i$, and the first term accounts for the dependence of $y$ itself on $\theta$. Thus by integrating the joint equations for all $y$ and $w$ we will find very precise values of the model function and its Jacobian, the required ingredients for running the method.
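To make the "model given implicitly through an ODE" idea concrete, here is a minimal plain-Python sketch of the model function and its residuals. It is not the code from the original post: it uses a hand-written fixed-step RK4 integrator instead of scipy's routines, the names `rhs`, `rk4_step`, `model` and `residuals` are made up for illustration, and the measurement times in the usage lines are hypothetical. Only the concentrations are integrated here; the sensitivities $w_{ij}$ would be appended to the state vector in the same way.

```python
def rhs(y, theta):
    """Right-hand side g(y; theta) of the alpha-pinene system."""
    t1, t2, t3, t4, t5 = theta
    y1, y2, y3, y4, y5 = y
    return [
        -(t1 + t2) * y1,
        t1 * y1,
        t2 * y1 - (t3 + t4) * y3 + t5 * y5,
        t3 * y3,
        t4 * y3 - t5 * y5,
    ]


def rk4_step(y, theta, h):
    """One classical Runge-Kutta step of size h."""
    k1 = rhs(y, theta)
    k2 = rhs([yi + 0.5 * h * k for yi, k in zip(y, k1)], theta)
    k3 = rhs([yi + 0.5 * h * k for yi, k in zip(y, k2)], theta)
    k4 = rhs([yi + h * k for yi, k in zip(y, k3)], theta)
    return [
        yi + h / 6.0 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


def model(theta, times, steps_per_interval=200):
    """Concentrations y(t) at each requested time, starting from y(0)."""
    y = [100.0, 0.0, 0.0, 0.0, 0.0]
    t = 0.0
    out = []
    for t_end in times:
        h = (t_end - t) / steps_per_interval
        for _ in range(steps_per_interval):
            y = rk4_step(y, theta, h)
        t = t_end
        out.append(list(y))
    return out


def residuals(theta, times, measured):
    """Flat list of y_measured - y_model, the quantity least squares minimizes."""
    predicted = model(theta, times)
    return [
        m - p
        for row_m, row_p in zip(measured, predicted)
        for m, p in zip(row_m, row_p)
    ]


# Initial guess quoted in the post; the time stamps are hypothetical.
theta0 = (5.84e-5, 2.65e-5, 1.63e-5, 2.777e-4, 4.61e-5)
times = [1000.0, 5000.0, 20000.0]
concentrations = model(theta0, times)
```

A function like `residuals` is the kind of callable a least squares solver consumes. A useful sanity check is that the five right-hand sides sum to zero, so the total concentration should stay at 100 throughout the integration.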
And here is the code in Python:

Unfortunately I had to compute the function and the Jacobian separately (i.e. doing excessive computations), because this is how the scipy interface for leastsq works for now. I think I will have time to fix this too.

Now let's examine the results visually. I wish I could insert an IPython notebook here, but here is only the final plot:

Note that I used the initial guess suggested in papers: $\theta^{(0)} = (5.84 \cdot 10^{-5}, 2.65 \cdot 10^{-5}, 1.63 \cdot 10^{-5}, 2.777 \cdot 10^{-4}, 4.61 \cdot 10^{-5})^T$. The fit looks very reasonable and natural (and in fact it's optimal from the least squares point of view).

We see that scipy is capable of solving nonlinear least squares problems quite well. What's missing then?

The first missing feature is the option to specify simple constraints on parameters in the form of upper and lower bounds. It's a simple case, yet an important one: parameters often represent physical quantities which are bounded by their nature (a mass can't be negative, etc). It sounds simple enough to implement, but in fact it's not that simple. The lmfit Python library handles this case with a variable-transformation wrapper around scipy leastsq; you can read about it here. This is a nice and reasonable approach, provided you don't want to modify the optimization solver code. But a) it creates some overhead and b) the theoretical properties of such an approach are not well studied (as far as I know); the link above contains warnings about potential problems when working with bounds. My project is devoted to creating a new type of solver designed specifically for nonlinear least squares problems with bounds.

The second problem is the scalability of the current implementation to problems with a large number of measurements/parameters which have sparse structure (a sparse Jacobian matrix).
Currently scipy relies on a wrapper of MINPACK routines, implemented in Fortran a very long time ago without sparse structure support in mind, and it's more or less impossible to adapt them. Each iteration of the MINPACK algorithm takes $O(m n^2)$ floating point operations, where $m$ is the number of measurements and $n$ is the number of parameters. I have to admit that at the moment I have little idea where to find suitable large-scale sparse problems for benchmarking, but I'll be working on it. It is interesting to notice that in the chemical kinetics problem considered above the Jacobian is in fact rather sparse (but the problem is very small, so it doesn't matter). I assume that such problems are not uncommon.

I will be working on these two problems during the first half of the summer and hopefully will solve them successfully. The details will follow.

### Yask Srivastava(MoinMoin)

#### self.note(Django)

This post is going to be more like a self note with code snippets which I usually need for the basic setup of any django application. This will be updated as I discover more.

### Basic setup of application:

1. Static files

In settings.py:

    STATICFILES_DIRS = (
        os.path.join(BASE_DIR, 'static'),
    )
    TEMPLATE_DIRS = (
        os.path.join(BASE_DIR, 'templates'),
    )
    STATIC_URL = '/static/'

Now put the static files inside each app's template/static/app_name folder. Whenever django gets a request to render a specific template file, it will start searching in every such directory (/template/static/app_name).

Referring to static files from your template:

    {% load staticfiles %}

2. Changing the default database to mysql

In settings.py:

    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            'NAME': 'firstdb',
            'USER': 'root',
            'PASSWORD': 'password',
            'HOST': 'localhost',
            'PORT': '3306',
        }
    }

### Typical setup of creating entities

1. Many-One (with Questions column as foreign key)

In models.py:

    from django.db import models
    # Create your models here.
    class Question(models.Model):
        text = models.CharField(max_length=200)
        date = models.DateTimeField('Date Published')

        def __str__(self):
            return self.text

    class Choice(models.Model):
        question = models.ForeignKey(Question)
        text = models.CharField(max_length=500)
        votes = models.IntegerField(default=0)

        def __str__(self):
            return self.text

2. To display entities in the database nicely from /admin

In the model class:

    def __str__(self):
        return self.attribute_name

3. To register the model's entities to enable their display from /admin

In admin.py:

    from django.contrib import admin
    from polls.models import Question, Choice
    # Register your models here.
    admin.site.register(Question)
    admin.site.register(Choice)

### Typical way of URL mapping

To map the urls of each specific application inside a Django project, add this to the main project's urls.py:

    url(r'^polls/', include('polls.urls')),

Now inside the app's urls.py:

    from app_name import views
    urlpatterns = [
        url(r'^about', views.about),
        # ... more patterns
    ]

### Basic operations on models

    from app_name.models import Model_name
    # get all objects
    Model_name.objects.all()
    # count them
    Model_name.objects.all().count()
    # sort them
    Model_name.objects.all().order_by('votes')
    # add data
    Model_name.objects.create(attribute=value)

To contribute: https://github.com/yask123/Django_cheatcodes [I know it looks ugly there at present, I'll format it soon]

### Chau Dang Nguyen(Core Python)

#### Week -3 : And the journey begins

So finally, after a month of waiting, I have been accepted to join GSoC 2015 this summer. I will spend this summer working for the Python Software Foundation. My project is to add a REST API to Roundup. The project is initially aimed at bugs.python.org, but I also want to contribute the code upstream.

So far, the experience with Python has been very nice. I hope it will be a successful summer.

### Artem Sobolev(Scikit-learn)

#### Introduction

In this article I'll briefly outline the details of the models I chose to implement, so one can understand what's going on without going through the original papers.
Metric Learning is (mostly) about learning a Mahalanobis distance, which is parametrized by a square positive semidefinite (PSD) matrix $\mathbf{M}$:

$$D_\mathbf{M}(x, y) = \sqrt{(x-y)^T \mathbf{M} (x-y)}$$

If we do a Cholesky decomposition $\mathbf{M} = \mathbf{L}^T \mathbf{L}$, then one can view this distance as the Euclidean distance in a space obtained using the linear transformation $\mathbf{L}$:

$$D_\mathbf{M}(x, y) = \sqrt{(x-y)^T \mathbf{L}^T \mathbf{L}(x-y)} = \sqrt{(\mathbf{L}x-\mathbf{L}y)^T (\mathbf{L}x-\mathbf{L}y)} = \| \mathbf{L} x - \mathbf{L} y \|_2$$

So one can either learn a PSD matrix $\mathbf{M}$, or an arbitrary matrix $\mathbf{L}$, but the latter results in a non-convex optimization problem: obviously, if we apply a unitary transformation $U$ to $\mathbf{L}$, we get the same metric, since $(U \mathbf{L})^T U \mathbf{L} = \mathbf{L}^T U^T U \mathbf{L} = \mathbf{L}^T \mathbf{L}$.

## LMNN: Large Margin Nearest Neighbors

LMNN tries to warp the data space to maximize the margin between different classes. Like SVM, it uses hinge loss and, more importantly, has a convex optimization problem formulation. Unlike SVM, it works naturally with multiclass classification.

Let's introduce some useful notation. Let $y_{ij} = 1$ if and only if $y_i = y_j$, and $\eta_{ij} = 1$ iff $x_j$ is one of the $k$ nearest neighbors of $x_i$ in terms of some "prior" distance (Euclidean, for example). Then the cost function is

$$\sum_{ij} \eta_{ij} D_\mathbf{M}(x_i, x_j)^2 + C \sum_{ijk} \eta_{ij} (1-y_{ik}) [1 + D_\mathbf{M}(x_i, x_j)^2 - D_\mathbf{M}(x_i, x_k)^2]_+$$

where $[\cdot]_+$ is the hinge loss. The second term, essentially, tries to make $x_i$ and $x_j$ closer than $x_i$ and $x_k$ by some margin (1 unit of distance), if $x_j$ is one of the $k$ neighbors of $x_i$ and $x_k$ has a different class label. The first term pulls similar points together.
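The identity between the Mahalanobis distance under $\mathbf{M} = \mathbf{L}^T \mathbf{L}$ and the Euclidean distance after applying $\mathbf{L}$ is easy to check numerically. Below is a small plain-Python sketch; the helper names and the example matrices are made up for illustration and are not taken from any library. It also checks the non-uniqueness noted above: rotating $\mathbf{L}$ leaves $\mathbf{M}$ unchanged.

```python
import math


def matvec(A, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def matmul(A, B):
    """Matrix-matrix product for nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def gram(L):
    """M = L^T L, which is PSD by construction."""
    return matmul(transpose(L), L)


def mahalanobis(x, y, M):
    """D_M(x, y) = sqrt((x - y)^T M (x - y))."""
    d = [a - b for a, b in zip(x, y)]
    return math.sqrt(sum(a * b for a, b in zip(d, matvec(M, d))))


def euclidean_after(x, y, L):
    """||Lx - Ly||_2, the same distance seen in the transformed space."""
    Lx, Ly = matvec(L, x), matvec(L, y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(Lx, Ly)))


# Illustrative values only.
L = [[2.0, 0.0], [1.0, 1.0]]
M = gram(L)
x, y = [1.0, 2.0], [3.0, -1.0]

# A rotation U is unitary, so U L induces the same M as L.
angle = 0.7
U = [[math.cos(angle), -math.sin(angle)], [math.sin(angle), math.cos(angle)]]
M_rotated = gram(matmul(U, L))
```

Because many different $\mathbf{L}$ map to one $\mathbf{M}$, an objective written in terms of $\mathbf{L}$ has whole families of equivalent solutions.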
Since $\mathbf{M}$ has to be PSD, this results in an SDP problem:

$$\sum_{ij} \eta_{ij} D_\mathbf{M}(x_i, x_j)^2 + C \sum_{ijk} \eta_{ij} (1-y_{ik}) \xi_{ijk} \rightarrow \min \\ \text{s.t.} \begin{cases} D_\mathbf{M}(x_i, x_k)^2 - D_\mathbf{M}(x_i, x_j)^2 \ge 1 - \xi_{ijk} \\ \xi_{ijk} \ge 0 \\ \mathbf{M} \succeq 0 \end{cases}$$

It can be solved by any general SDP solver. The authors, though, developed their own solver, which works much faster by exploiting some properties of the problem being solved. Its description can be found in this paper, while the code is available here. I'll elaborate on the algorithm in a follow-up post on LMNN.

Pros:

- Convex

Cons:

- Hard optimization problem

LMNN generally performs very well in practice, although it is sometimes prone to overfitting due to the absence of regularization, especially in high dimensions. It is also very sensitive to the ability of the Euclidean distance to select relevant target neighbors.

## NCA: Neighbourhood Components Analysis

NCA's optimization problem minimizes the Leave-One-Out error of KNN on data transformed by the matrix $\mathbf{L}$. Since KNN's decision function is non-continuous, NCA considers a continuous relaxation (which is independent of a particular $k$) using a stochastic approach:

$$p_{ij} = \frac{\exp(-\| \mathbf{L} x_i - \mathbf{L} x_j \|^2) }{ \sum_{k \not= i} \exp(-\| \mathbf{L} x_i - \mathbf{L} x_k \|^2) }$$

This is the probability of choosing $x_j$ as a neighbor of $x_i$. Then the probability of the correct classification of a point $x_i$ is

$$p_i = \sum_{j \in C_i} p_{ij}$$

where $C_i = \{ k | y_k = y_i, k \not= i \}$ is the set of data indices having the same target class as $y_i$. The objective then is to maximize the expected number of correctly classified points:

$$\max_\mathbf{L}\sum_i p_i$$

Also, there's another formulation, which effectively minimizes KL-divergence:

$$\max_\mathbf{L}\sum_i \log p_i$$

Optimization is done using unconstrained Gradient Descent methods.
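The quantities $p_{ij}$, $p_i$ and the objective $\sum_i p_i$ translate almost line by line into code. The following is an illustrative plain-Python sketch rather than any library's implementation; the function names and the toy data set are invented for the example, and it evaluates the objective only, with no gradient step.

```python
import math


def transform(L, x):
    """Apply the linear map L to the point x."""
    return [sum(l * v for l, v in zip(row, x)) for row in L]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nca_probabilities(X, L):
    """P[i][j] = probability that point i picks point j as its neighbor."""
    Z = [transform(L, x) for x in X]
    n = len(Z)
    P = []
    for i in range(n):
        # The k == i term is excluded from the softmax, so p_ii = 0.
        weights = [
            0.0 if k == i else math.exp(-squared_distance(Z[i], Z[k]))
            for k in range(n)
        ]
        total = sum(weights)  # assumes at least one weight does not underflow
        P.append([w / total for w in weights])
    return P


def nca_objective(X, labels, L):
    """Expected number of correctly classified points, sum_i p_i."""
    P = nca_probabilities(X, L)
    n = len(X)
    return sum(
        P[i][j]
        for i in range(n)
        for j in range(n)
        if j != i and labels[j] == labels[i]
    )


# Toy data: two well separated classes, identity transformation.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels = [0, 0, 1, 1]
identity = [[1.0, 0.0], [0.0, 1.0]]
P = nca_probabilities(X, identity)
score = nca_objective(X, labels, identity)
```

On this toy data every point's nearest neighbor shares its label, so the objective is very close to its maximum of 4 (one per point).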
Also, unlike LMNN, NCA has a non-convex objective, so random restarts may be useful.

Pros:

- Since it learns the transformation matrix $\mathbf{L}$ rather than the quadratic form matrix $\mathbf{L}^T \mathbf{L}$, it can do dimensionality reduction.
- Optimization is easier, no constrained programming required

Cons:

- Non-convex loss

## ITML: Information-Theoretic Metric Learning

ITML tries to find a PSD matrix $\mathbf{M}$ such that the distance between similar points is less than some hyperparameter $u$, and the distance between dissimilar points is more than $l$. There could be many such matrices, so ITML also introduces regularization: we're given a prior matrix $\mathbf{M}_0$, and we'd like to keep $\mathbf{M}$ as close as possible to that $\mathbf{M}_0$.

This closeness is defined using Gaussian distributions: for each PSD $\mathbf{M}$ there's a family of Normal distributions of the form $\mathcal{N}(\mu, s \mathbf{M})$. Fix $\mu = 0$ and $s = 1$, and we get a bijection between PSD matrices and a restricted family of probability distributions $p(x; \mathbf{M}) = \mathcal{N}(0, \mathbf{M})$. This allows us to define closeness of PSD matrices via closeness of probability distributions. This is where Information Theory steps in, since the measure of closeness of probability distributions used is the Kullback-Leibler divergence.
Then one can show that the KL divergence in this case boils down to the following:

$$\text{KL}(p(x; \mathbf{M}_0) || p(x; \mathbf{M})) = \frac{1}{2} D_{ld}(\mathbf{M}, \mathbf{M}_0) \rightarrow \min$$

where $D_{ld}$ is the LogDet divergence, which is a special case of the Bregman divergence generated by $\phi(\mathbf{M}) = -\log \det \mathbf{M}$, and has the following formula:

$$D_{ld}(\mathbf{M}, \mathbf{M}_0) = \text{tr}(\mathbf{M}\mathbf{M}_0^{-1}) - \log \det (\mathbf{M} \mathbf{M}_0^{-1}) - d$$

One final note: we don't want the optimization problem to be too restrictive, so slack variables are introduced:

$$\min_{M \succeq 0, \xi} D_{ld}(\mathbf{M}, \mathbf{M}_0) + \gamma D_{ld}(\text{diag}(\xi), \text{diag}(\xi_0)) \\ \text{s.t.} \begin{cases} D_\mathbf{M}(x_i, x_j)^2 \le \xi_{c(i,j)}, (i, j) \in \text{Similar} \\ D_\mathbf{M}(x_i, x_j)^2 \ge \xi_{c(i,j)}, (i, j) \in \text{Dissimilar} \end{cases}$$

where $\xi_0$ is a vector of $u$s and $l$s.

The authors propose an iterative algorithm. Each iteration takes $O(c d^2)$ time, where $c$ is the number of constraints (up to $n^2$, the number of training samples squared, if we do all-vs-all), and $d$ is the dimensionality of the data.

Pros:

- Has regularization
- No eigendecompositions required

Cons:

- 2 hyperparameters $u$ and $l$
- $\mathbf{M}_0$ may affect the goodness of the solution

## May 01, 2015

### Richard Plangger(PyPy)

#### Challenge accepted!

I'm very happy to be accepted for GSoC 2015! Here is my short description of what I am going to work on this summer.

I have already started to implement my proposal. It took some time until I understood the structure of the project. The thing that helped me most is the documentation at PyPy and RPython. I have put a substantial amount of time into the first prototype, which is able to transform traces on the IR level and even compile to SSE2 vector code at runtime.

In the next post I'll describe how the current prototype works internally and what I'm aiming at next.
### Sudhanshu Mishra(SymPy)

#### Google Summer of Code 2015 with SymPy

Once again I got accepted into Google Summer of Code! I'll be working on the assumptions system of SymPy. This time, SymPy is participating under the Python Software Foundation.

SymPy is a Python library for symbolic mathematics. It aims to become a full-featured Computer Algebra System while keeping the code as simple as possible in order to be comprehensible and easily extensible.

Here's what the ideas page says about the project:

> The project is to completely remove our old assumptions system, replacing it with the new one. The difference between the two systems is outlined in the first two sections of this blog post. A list of detailed issues can be found at this issue.
>
> This project is challenging. It requires deep understanding of the core of SymPy, basic logical inference, excellent code organization, and attention to performance. It is also very important and of high value to the SymPy community.
>
> You should take a look at the work started at https://github.com/sympy/sympy/pull/2508.
>
> Numerous related tasks are mentioned in the "Ideas" section.

My mentors are Aaron Meurer and Tim Lahey.

Currently SymPy has two versions of mathematical assumptions. One is called "old assumptions" because a new implementation has been carried out recently. Since the "old assumptions" were developed a long time ago, they are more mature and faster. However, because of their design, they are not capable of doing some interesting things, like assuming something about an expression, e.g. x**2 + 2 > 0.

Old assumptions store assumptions in the object itself. For example, the code x = Symbol('x', finite=True) will store the assumption that x is finite in this object itself.
The two systems expose different APIs to query the facts:

Old:

    In [1]: from sympy import *
    In [2]: x = Symbol('x', imaginary=True)
    In [3]: x.is_real
    Out[3]: False

New:

    In [4]: y = Symbol('y')
    In [5]: ask(Q.real(y), Q.positive(y))
    Out[5]: True

My work includes but is not limited to:

- Identifying inconsistencies between old and new assumptions and eliminating them.
- Improving performance of the new assumptions.
- Making new assumptions read old assumptions.
- Removing assumptions from the core as much as possible.
- Making the API of old assumptions call new assumptions internally.

That's all for now. Looking forward to a great summer!

## April 30, 2015

### Vito Gentile(ERAS Project)

#### Using Kinect Studio and recorded Kinect data

I've just tried Kinect Studio, after having recorded (during the last Easter break, in Italy) some data of my girlfriend moving in front of the Kinect.

My goal was to use Kinect Studio to emulate a "fake Kinect" that provides the data as they were recorded, by simply playing them back in a loop. This would allow me to develop without a physical device, and considerably faster; good news for my Google Summer of Code project!

However, I've just discovered that, in order to use Kinect Studio and recorded Kinect data, you need a Kinect-enabled app AND a Kinect plugged into your PC! That is, if you want to emulate a Kinect… you need a physical Kinect!

Dear Microsoft, why this?

## April 28, 2015

### Vito Gentile(ERAS Project)

#### Accepted for Google Summer of Code 2015!

Yes! My project has been accepted!!!

I will participate in Google Summer of Code, under the supervision of Ezio Melotti and Aldebran, on behalf of the Italian Mars Society and the Python Software Foundation.

Here is my project title and a short abstract of it.
Project title: Enhancement of Kinect integration in V-ERAS

The available virtual reality simulation of the ERAS Station allows users to interact with a simulated Martian environment using Aldebran Motivity, Oculus Rift and MS Kinect. However, the integration of the latter technology is still not complete, and this project aims to enhance it in order to:

- increase manageability of multiple Kinects
- improve user navigation
- reproduce users' movements in real time
- reduce data transfer latency by enhancing Tango integration
- support touchless gestures

You can discover more about the ERAS project and the Italian Mars Society at this link.

### AMiT Kumar(Sympy)

#### Google Summer Of Code with SymPy

Yay! The much awaited results of Google Summer Of Code are out now, and I have been selected to work with SymPy under the Python Software Foundation.

### For those who don't know about GSoC

Google Summer of Code is a global program that offers students stipends to write code for open source projects.

### A bit about SymPy

SymPy is a Python library for symbolic mathematics. It aims to become a full-featured Computer Algebra System (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible.

### About My Project

My project is being mentored by some really awesome guys: Ondřej Čertík, Sean Vig and Harsh Gupta.

The project aims at improving the current equation solvers in SymPy. The current solve is a huge mess. It needs to be broken into various sub-hints, to make the code more robust, modular, and approachable for developers, in line with the new API developed in solveset.

Currently the new API is implemented for univariate equations only. We need to extend it to linear systems, multivariate equations and transcendental equations by rewriting the solvers for these in the new solveset module.

Looking forward to a great summer with SymPy!

## April 27, 2015

### Rashid Khan(GNS3)

#### GSoC Acceptance

Hey!
I have been accepted into Google Summer of Code to work with GNS-3, which falls under the Python Software Foundation. I will be working on the Web Client for GNS3, which currently has a client built on PyQt and Python.

The architecture of the web application will be based on REST services provided by the GNS3-Server. For the project, I have chosen to work with AngularJS for developing the front-end services.

Right now, I'm really excited to be accepted. I'm looking forward to an awesome summer, and this blog will contain updates as the project moves forward.

### Aron Barreira Bordin(Kivy)

#### Methodology

Hello everyone!

I'll be using this first week to learn more about the Kivy Designer source code. It's not easy to understand completely, so I'm going to get an idea of how it works and learn about its internal processes, to be able to add new features and fix bugs.

#### Logs

To keep the whole development process well documented, I'll always open a new pull request to describe the bug, create a new branch for each bug, work in this branch and, when it's OK and tested, merge it into master. Using this methodology it's much easier to keep a log of my development.

#### Issues

I have created a list of issues that I'm going to work on during this Google Summer of Code. Check this list here.

#### Milestones

I have some milestones too, you can check these milestones here.

#### Kivy Designer Development

Hello everyone! This is the first post of the blog.

This afternoon I was accepted into Google Summer of Code 2015! This is great news! I have some goals, and I want to release a more stable version of Kivy Designer at the end of this summer.

You can check the version under development right here

In this blog I'll be posting about the development process weekly, along with some tips about new features and about kivy!

So, see you soon :)

## April 26, 2015

#### Chinese Remainder Theorem

This is a brief introduction to CRT. It is an algorithm to solve simultaneous linear congruences.
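As a preview of the construction this post goes on to describe, here is a minimal Python sketch of CRT for any number of pairwise coprime moduli. The function names `extended_gcd` and `crt` are made up for this illustration; the modular inverses (the solutions of N_i * x = 1 (mod n_i)) are found with the extended Euclidean algorithm, which is one common choice and not something the post prescribes.

```python
def extended_gcd(a, b):
    """Return (g, s, t) such that g = gcd(a, b) = s*a + t*b."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def crt(residues, moduli):
    """Solve x = a_i (mod n_i) for pairwise coprime n_i.

    Returns (x', N): the smallest non-negative solution and the product
    of the moduli, so that every solution has the form x' + k*N.
    """
    N = 1
    for n_i in moduli:
        N *= n_i
    x = 0
    for a_i, n_i in zip(residues, moduli):
        N_i = N // n_i
        # x_i solves N_i * x_i = 1 (mod n_i); it exists only if gcd is 1.
        g, x_i, _ = extended_gcd(N_i % n_i, n_i)
        if g != 1:
            raise ValueError("moduli must be pairwise coprime")
        x += a_i * N_i * x_i
    return x % N, N


# x = 2 (mod 3), x = 3 (mod 5), x = 2 (mod 7)
solution, modulus = crt([2, 3, 2], [3, 5, 7])
print(solution, modulus)
```

For the classic system above the smallest solution is 23 with N = 105, which can be checked directly: 23 leaves remainders 2, 3 and 2 when divided by 3, 5 and 7.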
Consider

    x = a_1 (mod n_1)
    x = a_2 (mod n_2)
    x = a_3 (mod n_3)

If n_1, n_2, n_3 are pairwise coprime, then a solution to this system exists. Let N = n_1 * n_2 * n_3. If x' is one solution to these equations, then the general set of solutions is x = x' (mod N).

## Algorithm

We define

    N_1 = N / n_1
    N_2 = N / n_2
    N_3 = N / n_3

Then we solve the following linear congruences:

    N_1 * x = 1 (mod n_1)
    N_2 * x = 1 (mod n_2)
    N_3 * x = 1 (mod n_3)

Let the solutions be x_1, x_2, x_3. Then one solution is x' = a_1*N_1*x_1 + a_2*N_2*x_2 + a_3*N_3*x_3, and the general solution can be found using the congruence x = x' (mod N) above.

## Application

Quite often we use CRT to take the modulo of a number with respect to a number that is not prime. Or, let's say we have a congruence a*x = b (mod n), where n is not prime. In these cases, we factorize n to split the modulo operation into a number of such congruences, solve them individually and then combine them using CRT.

## April 15, 2015

### Aman Singh(Scikit-image)

#### Optimized Array Access:-

The __getitem__ method (array access using []) of numpy arrays is written for Python, and using it in Cython carries the corresponding overhead. We can get around it using the following suggestions:

If the array is an intermediate in the program, it can be declared as a C or Cython array. But this may not work best for returning values to Python code, etc.

So instead we can use a NumPy array. NumPy arrays are already implemented in C and Cython has a direct interface to them. Accessing elements of NumPy arrays has roughly the same speed as accessing elements of a C array.

For NumPy arrays a mode can also be specified, such as 'c' or 'fortran'. These work best when they match our mode of iteration over the array (row wise or column wise): 'c' mode works best when the iteration is row wise and 'fortran' when it is column wise.
Specifying the mode gives no extra speed if the array is already arranged that way, but if it is not, an exception is raised.

With typed NumPy arrays we can still use the various array operations, and the access time of each element is also improved. But for that we have to access the array element wise.

There are some cons of using NumPy arrays too. Passing NumPy array slices between functions can cause a significant speed loss, since these slices are Python objects.

For this we have memoryviews. They support the same fast indexing as NumPy arrays, and their slices continue to support the optimized buffer access. They also work well when passed through different functions in a module, and they can be used with inline functions as well.

However, passing a memoryview object to a NumPy function is slower, as the memoryview object first has to be converted to a NumPy array. If you need to use NumPy functions and pass memoryviews between functions, you can define a 'memoryview' object that views the same data as your array. If for some reason a memoryview has to be converted into a NumPy array, the np.asarray() function can be used.

A memoryview can also be declared as C contiguous or Fortran contiguous, depending on the situation, using the syntax ::1. For example, a two dimensional, int type, Fortran contiguous array can be defined as 'cdef int [::1 , :]'.

Cython also supports pointers. Many of the operations can be done using pointers with almost the same speed as optimized array lookup, but code readability is compromised heavily. * is not supported in Cython for dereferencing pointers; [] should be used for that purpose.

I will try to upload an example for each of the above statements in upcoming posts, which will make them clearer.

Until then </keep coding>

#### Understanding Python Overhead while passing arrays as function arguments:

Just to understand which factors affect the scale of optimization in Cython, I decided to do some experiments.
I took a basic function which passes large arrays as arguments and tried to analyse the effect of various changes. Also, can we beat the speed of numpy with explicit C looping? We will see :)

The basic function I started with was:

    import numpy as np

    def fun(a):
        x = np.sin(a)
        return x

    def trial(a):
        return fun(a)

Runtime: 195ms

Note:- Using cdef instead of def and defining the type of all variables makes no difference to the runtime.

Then, instead of passing the whole array to the function, I did explicit looping and used the sin function from libc.math. The code was:-

    cimport numpy as np
    from libc.math cimport sin

    cdef fun(a):
        return sin(a)

    def trial(np.ndarray[double] a):
        for i in xrange(a.shape[0]):
            a[i] = fun(a[i])
        return a

The runtime now was: 340ms

There are some more points to note about this function:-

1) If we use def instead of cdef while declaring fun(), the runtime escalates to 500ms. This is the change a pure C loop without Python overhead can bring.

2) Another thing: if np.sin is used in place of the libc sin, the runtime is 16s. np.sin is a Python function which has some Python overhead. When it is called many times, this overhead is added on every call and slows the code heavily. But if we need to pass whole arrays, np.sin works quite well, as was seen in case 1.

Now, if we just define the type of a as double, the runtime comes down to 252ms.

Note:- If in "def trial(np.ndarray[double] a):" we don't define the type of a, the runtime is 1.525s.

Next I removed the function fun() and did the computations in the trial function itself.

    cimport numpy as np
    from libc.math cimport sin

    def trial(np.ndarray[double] a):
        for i in xrange(a.shape[0]):
            a[i] = sin(a[i])
        return a

This time the runtime was 169ms. We have finally beaten the first version.

I have not yet explained the variation in runtime in many of these cases. I will try to do it soon.

NB:

1. The data over which all the calculations were done was generated by a = np.linspace(1,1000,1000000)
2. Using a typed memoryview instead of np.ndarray caused no change.
3. Timings were estimated using the IPython magic function %timeit.
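The effect of an extra function call per element can be observed even without Cython. The sketch below is a plain-Python analogue of the experiment, not the author's benchmark: the helper names and the data size are chosen for illustration, and the absolute numbers will differ from the Cython timings quoted above. It uses the standard `timeit` module instead of the `%timeit` magic.

```python
import math
import timeit


def fun(x):
    # Thin wrapper: adds one Python-level call per element.
    return math.sin(x)


def via_wrapper(values):
    return [fun(v) for v in values]


def direct(values):
    # Same computation with the wrapper call removed.
    return [math.sin(v) for v in values]


def measure(func, values, number=20):
    """Best average time per call over 3 repeats, in seconds."""
    timings = timeit.repeat(lambda: func(values), repeat=3, number=number)
    return min(timings) / number


# 10000 evenly spaced points between 1 and 1000.
values = [1 + 999 * i / 9999 for i in range(10000)]
t_wrapper = measure(via_wrapper, values)
t_direct = measure(direct, values)
print("with wrapper: %.6f s, direct: %.6f s" % (t_wrapper, t_direct))
```

Both variants return identical results; only the per-element call overhead differs, which is the same kind of cost the post attributes to calling a Python-level function inside the loop.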
## April 09, 2015

### Aman Singh(Scikit-image)

#### Optimizing array In Cython

There are many ways to handle arrays in Cython. I tried to find the most optimized of them. For this I took an easy problem: counting the elements above a threshold (20 and 50 in the code below) in two arrays. Then I profiled the versions in IPython using the %prun command and analysed the outcome.

The version I started with was:-

    import numpy as np
    cimport numpy as np

    def fun(a, b):
        c1, c2 = 0, 0
        for i in range(len(a)):
            if a[i] > 20:
                c1 += 1
        for i in range(len(b)):
            if b[i] > 50:
                c2 += 1
        return c1, c2

    def trial(a, b):
        return fun(a, b)

I created an array of 100000 elements using the numpy linspace function and passed it to the above function. The execution time came out to be 10.393 seconds.

Then I made some modifications, typing the arguments of fun and its local variables:-

    [1] def fun(np.ndarray[int, ndim = 1] a, np.ndarray[int, ndim = 1] b):
    [2]     cdef int i, c1 = 0, c2 = 0

The execution time after change [1] was 0.208 seconds, i.e. 50x faster, and after [2] it came down to 0.031 seconds.

As the timing was getting smaller and smaller, I created another data set of 1000000 elements (10x the last one), and now the timing was 0.23 seconds.

Then I thought of testing the speed gain of the memoryview technique and changed the code to:-

    # cython: profile=True
    import numpy as np
    cimport numpy as np

    cdef fun(int[:] a, int[:] b):
        cdef int i, c1 = 0, c2 = 0
        for i in range(len(a)):
            if a[i] > 20:
                c1 += 1
        for i in range(len(b)):
            if b[i] > 50:
                c2 += 1
        return c1, c2

    def trial(a, b):
        return fun(a, b)

This had the same timings as the last one, but when I changed the buffers to 'C type contiguous' I got 0.154 seconds, i.e. 1.5x better than the last numpy array version on the new data set.

The next modification was completely different. I made the function fun a pure C function.
    # cython: profile=True
    import numpy as np
    cimport numpy as np

    cdef fun(int *a, int *b, int sa, int sb):
        cdef int i, c1 = 0, c2 = 0
        for i in range(sa):
            if a[i] > 20:
                c1 += 1
        for i in range(sb):
            if b[i] > 50:
                c2 += 1
        return c1, c2

    def trial(a, b):
        return fun(<int *> np.PyArray_DATA(a), <int *> np.PyArray_DATA(b), len(a), len(b))

This version gave me another 10% speedup and now the timing was 0.144 seconds.

Now, since the fun() function was entirely a C function, I thought of using it without the GIL. I changed the line

    cdef fun(int *a, int *b, int sa, int sb):

to

    cdef fun(int *a, int *b, int sa, int sb) nogil:

This gave me another speedup of 2x.

## April 08, 2015

### Aman Singh(Scikit-image)

#### Experimenting with Numpy-C API in Cython

    import numpy as np
    cimport numpy as np

    np.import_array()  # Required if the NumPy C API is used in the code

    # The wrapper code, with numpy type annotations:-
    def cos_doubles_func(np.ndarray[double, ndim=1] in_array not None,
                         np.ndarray[double, ndim=1] out_array not None):
        cdef int p = 50
        cdef np.ndarray[double, ndim=1, mode="c"] temp = in_array[in_array > p]
        cos_doubles(<double *> np.PyArray_DATA(temp),
                    <double *> np.PyArray_DATA(out_array),
                    temp.shape[0])

    cdef void cos_doubles(double * in_array, double * out_array, int size):
        cdef int i
        print size
        for i in range(size):
            print in_array[i]

    # INPUT:
    [1] a = np.array([1,255,55,45,85]).astype(np.double)
    [2] exp.cos_doubles_func(a, a)

    # OUTPUT:
    255.0
    55.0
    85.0

## April 07, 2015

### Aman Singh(Scikit-image)

#### Using Cython in IPython

IPython is an interactive shell for Python offering enhanced introspection, tab completion and rich history. With a few modifications and some special syntax, we can use it for Cython too. It has some classic tools which make things like memory management and optimization handy.

Installation:-

1. On Linux, type in a terminal: sudo easy_install ipython
2. Then we also need to install readline using the command: sudo easy_install readline

Now we are ready to go.
To use Cython in IPython we need to load the Cython magic using:

    In [1]: %load_ext cythonmagic

This enables all Cython magic functions in IPython. Now let's say we want to write a Cython function that adds two integers. We can do it in the shell itself. Without it we would have to:

1. Write the function in a .pyx file.
2. Compile it using setup.py (either using distutils or manually).
3. Import it in the shell.

With IPython it's much easier. Just add the %%cython magic command as the first line and then write the Cython code. It does all the remaining boring work for you. A Cython function to add two integers:

    In [4]: %%cython
       ...: def sum(int a, int b):
       ...:     cdef int s = a+b
       ...:     return s
       ...:

This is all that you need to do. All the compiling and importing is done by IPython itself. Remember that the %%cython command is valid only for that block. If you want to define another Cython function, start again with the %%cython magic. Now let's test it.

    In [6]: sum(1205, 2565)
    Out[6]: 3770

Yes, it's working…!!! But what about raising errors? Let's try.

    In [6]: sum(152, 'dhn')
    TypeError Traceback (most recent call last)
    in ()
    ----> 1 sum(152, 'dhn')
    /home/aman/.cache/ipython/cython/_cython_magic_161302c79052187807a09738c3cc5bd1.so in _cython_magic_161302c79052187807a09738c3cc5bd1.sum (/home/aman/.cache/ipython/cython/_cython_magic_161302c79052187807a09738c3cc5bd1.c:615)()
    TypeError: an integer is required

It raises an error for wrong data types as well. Now let's come to benchmarking and optimization. IPython provides many tools which automatically measure the execution time of a function and give a detailed report. The first and easiest one to use is %timeit. It reports the best execution time after running the statement many times. Let's use it for the function defined above:

    In [11]: %timeit sum
    10000000 loops, best of 3: 20.4 ns per loop

We can manually modify the number of times it runs.
But for that we need to import some functions:

    In [13]: from timeit import Timer, timeit, repeat

Then type:

    Repeat = repeat("sum(255, 5895)", setup="from __main__ import sum", repeat=3, number=100000)

The repeat argument defines how many times you want to repeat the measurement. The number argument defines how many times the statement is run in each measurement. The output with repeat = 3 comes out to be:

    [0.015902996063232422, 0.006285905838012695, 0.006062984466552734]

There is another very handy tool, %prun. It gives a detailed report of the time taken by various functions during execution. Let's define another function which will help in understanding %prun.

    In [6]: %%cython
       ...: def sq(int a):
       ...:     return a*a
       ...:

    In [7]: %%cython
       ....: from __main__ import sq
       ....: def fun(int a, int b):
       ....:     return sq(a)+sq(b)
       ....:

NB: To import sq the command is from __main__ import sq

Now let's create arguments for the functions:

    In [17]: a = np.linspace(10,1500,150000)
    In [18]: b = np.linspace(100,1500,150000)

We need to run this function many times, since it is very small and a single call takes too little time to benchmark:

    In [24]: %prun [fun(250, 254) for _ in range(1000000)]

The output we got is:

    1000003 function calls in 0.541 seconds

    Ordered by: internal time

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
          1    0.301    0.301    0.541    0.541 :1()
    1000000    0.223    0.000    0.223    0.000 {_cython_magic_83e0908c3b53c6e543c5774b7.fun}
          1    0.017    0.017    0.017    0.017 {range}
          1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

The meaning of the different columns is:

- ncalls: the number of calls,
- tottime: the total time spent in the given function (excluding time spent in calls to sub-functions),
- percall: the quotient of tottime divided by ncalls,
- cumtime: the total time spent in this and all subfunctions (from invocation till exit). This figure is accurate even for recursive functions.
- percall: the quotient of cumtime divided by primitive calls.

There are other tools like %lprun, and some others for memory profiling. We can use them as needed.

IPython has another very important and useful feature. Using it we can directly import any Cython (.pyx) file as we do with Python (.py) files. The file can also be reloaded. The corresponding commands are:

    import pyximport
    pyximport.install(reload_support=True)  # for reloading
    import my_cython_module  # file name is my_cython_module.pyx

That’s all for now. </keep coding>

## April 03, 2015

### Udara Piumal De Silva(MyHDL)

#### Install GHDL

GHDL is a complete VHDL simulator, using the GCC technology. This post is a guide to installing GHDL from source.

1. Download the GCC 4.8.2 source: http://ftp.gnu.org/gnu/gcc/gcc-4.8.2/gcc-4.8.2.tar.bz2
2. Extract the GCC 4.8.2 source to a directory using the following commands:

        $ bunzip2 gcc-4.8.2.tar.bz2
        $ tar xvf gcc-4.8.2.tar

3. Download the GHDL source: http://sourceforge.net/projects/ghdl-updates/
4. Extract the source to a folder called GHDL.
5. Change the current directory to GHDL/translate/gcc/ and execute dist.sh:

        $ cd GHDL/translate/gcc/
        $ ./dist.sh

6. This creates an archive, ghdl-0.31.tar.bz2. Extract it just as before. Inside the extracted folder there will be a folder called vhdl.
7. Copy this folder to the gcc folder which is inside the extracted GCC source.
8. Change the current directory to the folder to which the GCC source was extracted. Configure the GCC build with the following commands:

        $ mkdir build
        $ cd build
        $ ../configure --enable-languages=vhdl --disable-bootstrap

9. This creates the Makefile with the specified configuration. Now simply run make and make install:

        $ make CFLAGS="-O"
        $ sudo make install

## March 31, 2015

### Udara Piumal De Silva(MyHDL)

#### Introduction to MyHDL

MyHDL is a hardware description language written in Python. The motivation behind MyHDL is to bring the rich features of Python to HDL. However, it is not intended to be used as a standalone HDL like Verilog or VHDL. Instead, MyHDL code can be automatically converted into Verilog or VHDL and then synthesized.

Installation guide for MyHDL :
http://myhdl.org/start/installation.html

Source of MyHDL:
https://github.com/jandecaluwe/myhdl

A simple logic unit written in MyHDL and its simulation results look as follows:

The comments are based on the paper as written in this version

Some action items, others listed inline:

• Derive DGR and width with a smoothed image.

• Check the background. Compare with 2MASS. Derive DGR, width and intercept using the Planck and 2MASS images. Compare the derived intercepts. Is a background subtraction necessary before the model fitting?

Interpretation of steps used to calculate DGR and HI width

• For each big region you calculate the center HI velocity (this is a single value per region)

• I recently switched to using the HI center for each LOS. This led to a larger uncertainty in the HI width, which I think is more realistic.

• Compare the following likelihood space, which was calculated with individual centers for each LOS, to Figure 4 in the paper. The derived DGR and width are more uncertain with a varying center.

+ **Action** Use single vel center. We do not expect the center $$HI$$ to
vary within a MC.

• Assume delta_V=50 km/sec, calculate N(HI) for the region

• Using MLE and the whole Av image calculate DGR. Examine residuals, look at PDF and mask out pixels with residuals above 3-sigma

• Repeat until DGR converges.

• Before the next step I bin the images to 1 deg on a side.
• After this apply MLE to estimate DGR and delta_V with the main goal of estimating variance and uncertainties on the parameters. Are you fitting for 2 or 3 parameters here?

• I was originally fitting for three parameters: DGR, HI width, and the intercept. However, the MLE intercept in California was quite negative, about -1 mag. The model was DGR + intercept, leading to negative surface densities. After some consideration I realized that there should not be an intercept, given the simple prescription that = DGR. I am now fitting only 2 parameters, DGR and HI width.
• After initial MLE run Av and N(HI) images were binned to avoid correlated residuals and the MLE parameter search was repeated again.

• Am I correct that in the iterative approach when you follow Planck et al. method you use a single delta_V=50 all the time and just estimate DGR? This is what I think is going on from the text, but in Section 3.3 you state that even delta_V is being calculated. Is this correct?

• I incorrectly summarized the steps in section 3.3. After step 5, there should be a step 6, where the MLE width is used as the initial width in step 1. Repeat steps 1 through 6 until the input width in step 1 is the same as the MLE width derived in step 5. I am performing one iteration inside another iteration, one to create the mask, the other to converge on the width because masking requires the width as a prior.

• I am iteratively solving for in the initial masking until converges between MLE estimates from consecutive masks. I chose an initial value of = 50 km/s, derive an initial , solve for the DGR and , create a model map, exclude pixels with high residuals, then repeat using the MLE as the initial value for the next iteration.

• I can run the analysis with varying initial widths to make sure that the initial width chosen does not affect the converged MLE width.

• In Section 3.2.3 you even say “three model parameters”, do you mean the HI center as well? This really needs to be clear and consistent.

• Previously I included the intercept in Equation (4) in the paper as a parameter in the MLE, but I am no longer using the intercept. This should be “two model parameters”.

Please let me know if this is correct. Please also provide answers to questions below both to me and in appropriate places in the paper.
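For readers who want to see the masking iteration spelled out, here is a minimal sketch. It is not the code used for the paper. It assumes the model Av = DGR × N(HI) with no intercept, replaces the MLE grid search with a least-squares slope (which is the maximum-likelihood estimate only if the noise is Gaussian and uniform), and uses the standard deviation of the unmasked residuals in place of a Gaussian fit to the residual PDF. The names `fit_dgr` and `iterative_dgr`, the convergence tolerance, and the synthetic data are all made up for illustration.

```python
import numpy as np


def fit_dgr(av, nhi, mask):
    """Least-squares slope through the origin for the model av = dgr * nhi."""
    x, y = nhi[mask], av[mask]
    return np.sum(x * y) / np.sum(x * x)


def iterative_dgr(av, nhi, clip=3.0, tol=1e-4, max_iter=50):
    """Fit DGR, mask pixels with residuals above clip*sigma, repeat until DGR converges."""
    mask = np.ones(av.shape, dtype=bool)
    dgr = fit_dgr(av, nhi, mask)
    for _ in range(max_iter):
        resid = av - dgr * nhi
        sigma = np.std(resid[mask])
        new_mask = resid < clip * sigma  # keep pixels without a large excess
        new_dgr = fit_dgr(av, nhi, new_mask)
        converged = abs(new_dgr - dgr) <= tol * abs(dgr)
        dgr, mask = new_dgr, new_mask
        if converged:
            break
    return dgr, mask


# Synthetic example (hypothetical numbers): N(HI) in units of 1e20 cm^-2,
# a true DGR of 0.1 mag per 1e20 cm^-2, and 5% of the pixels carrying 3 mag of
# excess extinction, standing in for gas that HI does not trace.
rng = np.random.default_rng(0)
nhi = rng.uniform(1.0, 10.0, 2000)
av = 0.1 * nhi + rng.normal(0.0, 0.05, nhi.size)
excess = np.zeros(nhi.size, dtype=bool)
excess[:100] = True
av[excess] += 3.0

dgr_all = fit_dgr(av, nhi, np.ones(nhi.size, dtype=bool))
dgr_masked, mask = iterative_dgr(av, nhi)
print(dgr_all, dgr_masked, mask.sum())
```

The unmasked fit is pulled high by the excess pixels, while the iterated fit should land close to the input DGR. The outer loop over the HI width described above would wrap this function, recomputing `nhi` for each trial width.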

——— Points that need to be addressed in the section: ——–

• What does Table 1 show, final values or values derived BEFORE binning the images to avoid correlated residuals?

• Table 1 shows the final values after binning, i.e., the best estimate of the errors on each parameter.

• Action clarify that Table 1 shows final results

• We need to say why is delta_V=50 km/s used in this calculation, especially as the parameter grid goes to 70 km/s. Can you send me (better provide a table in the appendix) estimated DGR values from this first step. We want to see (and comment) how have final DGR values changed relative to these initial values.

• I chose 50 km/s because we expect the width of the cloud to be a few times the CO width. The width converges to a much different value for each of the three clouds.

• Action Show that the step to iterate supplying the derived width as the input width in deriving the mask is necessary. Perhaps this added level of complexity is unnecessary. Perhaps use some value of the peak, like 20% as the width?

• For the masking method, please check the Planck paper to see if their PDF is also NOT centered at 0 mag (peak in Figure 2 is offset from 0 - I wonder if this is ok or shows some background subtraction problem maybe).

• Figure 12 from the Planck paper shows that the portion of residuals corresponding to the white noise is centered at 0 mag. They do include an intercept in their model. Perhaps we should include an intercept in the model, but use the intercept from the residual analysis, i.e., the fitted Gaussian mean from the residual. This means that in the MLE grid search, the fitted Gaussian mean would be added to the model. It is unclear to me how the Planck paper solved for the intercept.

• I ran the test of using the intercept derived in the residual fitting as a prior in the model for the MLE calculation of the binned images. The derived parameters did not vary within the errors.

• Action Plot vs. with the fitted DGR with and without an intercept for each cloud.

• How are the boundaries for MLE parameter search determined? E.g. HI width [0, 70] km/s and the given DGR range. This needs to be mentioned in the paper.

• I chose the width range based on typical cloud widths. I suppose the true upper limit on the range would be the 220 km/s. The DGR range I chose to be conservative. Kim & Martin 1996 found variations in the DGR of ~3 from 5.3 mag cm. Perhaps I should limit the DGR range?
• In section 3.4 and 3.2.2 you talk about 1st moment of square of Tb(v). I am surprised to see that Tb(v) is being squared first. I am used to seeing moments applied on HI spectra in the following way with the 1st moment being a good representation of the velocity where the bulk of HI emission is located. If you use something else then we need some kind of justification/explanation or reference provided.

• This was a mistake in the text. I am deriving the HI center by averaging the spectrum weighted by the square of .

• Action This needs to be justified.

• By briefly looking at results and Sigma_HI plots now for Perseus you get Sigma_HI<10, and also you get generally higher values for taurus and california. I remember that for a long time you were getting small values of Sigma_HI for Taurus. Do you understand when did this change happen? What was happening before to cause essentially an opposite trend in Sigma_HI (higher values for Perseus and California and low values for Taurus)? How do we know that current values are more accurate than the previous ones (from a few months ago).

• This boils down to isolating the HI envelope. If you look at figure 5, the HI spectra are much different for Taurus 2 than Taurus 1. When we were solving for a single HI envelope in Taurus, the MLE width was narrow, as shown for the Taurus region in Figure 5. The width MLE is large for Taurus 2 when we separate Taurus. The cores with low thresholds are the ones in Taurus 1.

• Action Include an additional section showing the differences between the new derived DGR, width for Perseus with Lee+12.

• Looking at Table 1, delta_V for California is about 4 times higher than delta_V for Perseus and most of Taurus - does this make sense when you look at HI spectra? Have you found anything in the literature about HI in California for comparisons?

• I haven’t found anything in the literature, but I will look soon. The spectra for California are much more extended than for Perseus and Taurus. You can see mild evidence of this in Figure 5. I’ll make up a plot of a few sample individual spectra from each cloud.
• How was sigmaHI derived for Figure 5?

• I calculated it as the standard deviation of all pixels in a region for each velocity.
• In section 3.2.1 you say that regions were selected based on regions identified in Lada et al. (2010). Is this the right reference? This paper just talks about global GMC properties. This statement has to be clarified/changed in the paper.

• What is shown in Table 5? Is “residual scale” in degrees?

• Do you mean Table 4? The residual scale is used to exclude residuals below a certain value during the masking. The residual scale times the fitted Gaussian width of the residual PDF is the threshold for masking. Above this residual threshold, I mask the pixel.
• Yes, it would be good to have values in table 2 for comparison.

• The introduction will need to be expanded and provide more detailed information, but this can wait. However, make sure you have enough information for your prelim research talk.

• Sentence starting with “transition has been identified as a threshold…” is not correct. The threshold was found observationally and HI-H2 transition is one possible explanation of this threshold, although still not fully confirmed and understood - that’s why we study our GMCs, otherwise all would be understood.

• Keep Section 2 for now, we can later trim it down. But please go through and provide more information that things are clear and can be followed. Also, be consistent how you call different things, e.g. you have different symbols for dust emissivity.

• I can not follow exactly what is being done from equations 1 and 2. Also, equations 1 and 3 are almost identical but you call quantities differently. It is also not clear to me that part about zero-level from LAB.

• Please check Draine book for calling various variables. E.g. I am used to seeing “total-to-selective extinction ratio” for Rv (not selective extinction curve?). This is very important, plus great as prelim prep.

• Does Fig 1 show your final Av image? As you derive this using two Planck images, how do transition pixels look like? Do you need to do something to ensure smooth transitions? Or are there any sharp transitions present (which can affect other calculations)?

• Fig. 1 does show the final model image. There are indeed some sharp transitions. Especially in the western side of Taurus. See below.

+ You can also find the data here /d/bip3/ezbc/taurus/data/av/taurus_av_planck_5arcmin.fits

• Please state in relevant sections RMS uncertainty of your Av image, 2mass Av image, HI data. This is important. Also mention any limitations Juni’s maps have, e.g. limited to <25mag.

• Section 3: This section has a lot of good material but I can not follow what exactly is being done - sorry. Please go through to ensure logical progression.

• In 3.2.1 say boundaries are set by eye. Was any specific strategy used? In 3.1 at the start, say what corresponds to data and what is the model.

• In 3.2.2 you say you use 2nd moment - is this correct? 1st moment corresponds to mean velocity, often it is calculated by applying intensity weighting. 2nd moment corresponds to velocity dispersion.

• I’ve gone through up to Section 4. I see that section 5 starts abruptly with modeling but will need some basic explanation of the models and various parameters. Also, this should have first a section showing Perseus application and discussion of how results compare with Lee et al.

• You also want to have some discussion of estimated delta_V and DGR parameters somewhere.
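Since the first versus second moment question comes up twice above, here is a small sketch of the definitions. It is an illustration under stated assumptions, not the paper's code: the function names, the uniform channel spacing, and the Gaussian test spectrum are hypothetical. The `power=2` option mimics weighting the spectrum by the square of the brightness temperature, and the column density uses the standard optically thin conversion of 1.823e18 cm^-2 per K km/s.

```python
import numpy as np


def hi_moments(velocity, tb, power=1):
    """Weighted mean velocity and dispersion of a spectrum, with weights tb**power."""
    weights = np.asarray(tb, dtype=float) ** power
    center = np.sum(weights * velocity) / np.sum(weights)
    width = np.sqrt(np.sum(weights * (velocity - center) ** 2) / np.sum(weights))
    return center, width


def column_density(velocity, tb, center, delta_v):
    """Optically thin N(HI) in cm^-2 from the channels within delta_v/2 of center."""
    dv = abs(velocity[1] - velocity[0])  # assumes uniform channel spacing
    window = np.abs(velocity - center) <= delta_v / 2.0
    return 1.823e18 * np.sum(tb[window]) * dv


# Hypothetical Gaussian spectrum: 20 K peak, centred on 5 km/s, sigma of 3 km/s.
velocity = np.arange(-50.0, 50.5, 0.5)
tb = 20.0 * np.exp(-0.5 * ((velocity - 5.0) / 3.0) ** 2)

center, sigma_v = hi_moments(velocity, tb)               # intensity weighting
center_sq, sigma_sq = hi_moments(velocity, tb, power=2)  # Tb**2 weighting
nhi = column_density(velocity, tb, center, delta_v=50.0)
print(center, sigma_v, center_sq, sigma_sq, nhi)
```

For a single symmetric component both weightings return the same center, and the squared weighting narrows the width by a factor of sqrt(2). The two centers only differ when the spectrum has several components of unequal brightness, which is the case that would need justifying.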

## March 27, 2015

### Patricia Carroll(Astropy)

#### GSoC Astropy proposal submitted!

Welcome to my blog for Google Summer of Code 2015!

I am a PhD candidate in Astronomy with plans to pursue a career in data science. The hacking has always been my favorite part of the research process. Although I’m adept at scientific hacking, this means that the science result is the end-product regardless of the quality or final state of the code written to get there. This is why the GSoC opportunity appeals to me!

The GSoC will allow me to grow as a true developer while contributing to the Astropy project, a Python package widely used and highly valued among astronomers for data analysis. I've proposed to develop fast model rasterization methods for Astropy as well as a web app based on JS9 to demo the new capabilities (see my full project proposal here). This meshes well with my research, and it will be fun to incorporate my own data into the testing and demos. If selected, I will be blogging weekly updates on the project.

xx fingers crossed xx

### Himanshu Mishra(NetworkX)

#### Working with networkX

Two months have passed since I found this amazing Python library on Graph Theory and Complex Networks. It's hosted on Github. With the simplicity of python, they have an amazing code base to work on. It's a very mature library even though they are participating in Google Summer of Code 2015. They have participated as sub-org under the umbrella of PSF.

I'm interested in working on a project to introduce add-ons for networkx. The python library is great in itself but there are also some awesome related libraries written in C/C++. If my proposal is approved, I'll have to work on at least two add-ons and give a general idea for other add-ons to be added in future.

That's all for now.
Cheers!

### Ziye Fan(Theano)

#### Hello world!

Hi!
This blog is planned to be used for my GSoC application and for blogging my progress.

### Chienli Ma(Theano)

#### GSoC2015_blog_test

Just a post to test if the category and RSS feed work well.

## March 26, 2015

### Artem Sobolev(Scikit-learn)

#### Hello world

Hi there, and welcome to my GSoC-related blog!

If you want to read my other thoughts, you're welcome to follow barmaley.exe.name

### First blog post

This is my first blog post. Here is my application to GSoC 2015.

## March 25, 2015

### Brett Morris(Astropy)

#### Test post

Here's a test post to show that I'm ready for Google Summer of Code 2015!

print("hello world")

### Elizaveta Guseva(Qtile)

#### GSoC Test Blog Post

Haven’t been here forever.

Hope will be back with GSoC =)

### Gregory Hunt(statsmodels)

#### First Contributions

In furtherance of the goal of contributing to the statsmodels project through PSF and Google Summer of Code I'm starting this blog to track my contributions.

A bit ago I forked the project on github and have been happily hacking away. Today I made a PR for a little project I've been working on, adding some more covariance structures for the GEE functionality of the package.

### Andrzej Grymkowski(Kivy)

#### First Post

First post should be like template so it is.

## March 24, 2015

### Rashid Khan(GNS3)

#### Hello

Hey!

This blog will be used for blogging about progress made on building the web interface for GNS3. Weekly blog posts will be made on the work done and the status of the project could be monitored.

### Manuel Paz Arribas(Astropy)

Link to my application in Astropy wiki.

### Julio Ernesto Villalon Reina(Dipy)

#### Curriculum Vitae

This is my updated CV.

### Shivam Vats(SymPy)

#### Hey there!

I am applying for Google Summer of Code, 2015. It is an annual program sponsored by Google, that pays students to contribute to open source organizations of their choice. My application involves two symbolic computation libraries - Sympy and CSymPy, the former written in Python and the latter in C++. CSymPy is meant to be a fast cousin of Sympy and can be used with optional wrappers.

I am proposing to implement fast series expansion in CSymPy and Sympy. Sympy already has series expansion, but suffers from speed issues and lack of a class structure. You can look at my full proposal here

I will use this blog to write about my GSoC experience, if I am selected.

Cheers!

## March 23, 2015

### Stefan Richthofer(Jython)

#### Welcome

Hello developers, friends and other curious visitors,

welcome to my blog!

I am currently getting the GSoC proposal in line. Keep fingers crossed, it will work out nicely in time...

## Starting project.

Blog for the description of the GSoC 2015 progress.
This is the beginning of the project.
I will write more in the future.

I have added additional cores to the analysis, because why not? Below are the regions selected for each core region. Regions were chosen to include a diffuse part and separate from other structures. The identified regions are from Lombardi et al. (2010), described in this post. I included additional regions besides the ones listed by Lombardi 2010, by labeling the new region as a second part of the nearest region.

Below are the results for each cloud. The contours represent binned counts in logarithmic points. Cores contain a few hundred points each.

Perseus:

Taurus:

California:

### Distance from galactic plane

Next result is furthering the result from the latter half of this post, which discusses the predicted from the K+09 model vs. galactic latitude. Here we take it a step further and consider the core region distance from the galactic plane. To calculate the distance from the galactic plane take the following geometry

where is the scale height of the sun wrt the galactic plane, is the scale height of the cloud wrt the galactic plane, is the cloud distance from the galactic plane along the LOS from the sun, is the distance between the sun and the galactic plane along the LOS, and is the distance to the cloud. Given

we can solve for as

This gives us the following trend between and distance from the galactic plane (not much of a trend)
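The figures and equations for this geometry did not survive in the feed, so here is a hedged sketch of the usual relation rather than the exact expression from the post: a cloud at distance d and galactic latitude b sits at a height z = z_sun + d sin(b) above the galactic plane. The function name and the numbers in the example are hypothetical, and the solar offset `z_sun_pc` is left as a parameter because the value adopted in the post is not given.

```python
import math


def height_above_plane(distance_pc, gal_lat_deg, z_sun_pc=0.0):
    """Height of a cloud above the galactic plane in pc.

    distance_pc : distance from the sun to the cloud
    gal_lat_deg : galactic latitude b of the cloud in degrees
    z_sun_pc    : height of the sun above the plane (assumed input)
    """
    return z_sun_pc + distance_pc * math.sin(math.radians(gal_lat_deg))


# Hypothetical cloud: 200 pc away at b = -30 deg, with the sun 20 pc above the plane.
z_cloud = height_above_plane(200.0, -30.0, z_sun_pc=20.0)
print(z_cloud)
```

A cloud at negative latitude ends up below the plane unless the solar offset outweighs d sin(b), so the sign of the result carries meaning and should be kept when plotting against it.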

## March 22, 2015

#### Which project?

As part of my quest to adventure-ously code for open source projects, I'm applying to Google Summer of Code 2015.   I'm looking at two project areas affiliated with Astropy, the astronomer's best friend* when coding in Python.  My potential mentors are professional astronomers and astrophysicists, as well as long-time contributors to Astropy and similar projects.

One possible project would involve putting together a planning tool for observational work.  Our motivation comes from a need to automate the analysis of scheduling telescope time, and do it in a package that "FITS"** in nicely with pre-existing Python work flows.

Some of the thoughts that literally keep astronomers awake at night:

"When will my star/galaxy/X-ray source/etc. be visible, and for how long?"

"Will the moon/other objects/atmospheric effects get me noisy star spectra?"

"Can my telescope's motors move fast enough to view a second or third galaxy in one night?"

 ???!!!

Tools that answer these questions are constantly being developed by various observatories.  Some existing software packages use languages like C and Javascript.  Since Astropy's goal is to provide a one-stop-shop for all your open source astronomical needs, a Python implementation would be very useful.

The products of such a tool would include the usual plots and tables (airmass, parallactic angle, etc.) and customizable object lists, but would be packaged in an easy-to-use GUI, perhaps with simple animations and web browser capabilities.  The tool would work in requested features from the Astropy community and be used by researchers at various observatories to provide feedback.

...

"Gee, Jaz, sounds great!  But how would you do actually do it?"

For starters, I had to learn how to use Git (a back-up system used by programmers).  Then I dug around in the Astropy code repository on GitHub and a couple different developer mailing lists until I knew enough lingo to go undercover as a "real" programmer.

I also asked the incredibly smart (and super helpful!) astropy-dev community about the procedures for setting up isolated environments for testing code.  I've been practicing my cloning, forking, branching, setup.py'ing, and list-mailing.  So far, so good--no one's blown my cover.

...but this is only the start!  In the next couple of days, I need to:

1. Take a closer look at the code base for Potential Project #2, and run some tests on it.
2. Skype meet with my potential mentors.  We're scattered across 10+ time zones, so that should be easy.
3. Submit a patch (bug fix, documentation edit, etc.) to the Astropy code base.  This is required for applications to GSOC under the Python umbrella.
4. Get my application hammered out.
5. ??? You tell me.  It's like 4AM over here.

Footnotes:

*Even if said astronomer has real, human friends.  Unless said human friends know how to write code in Python.  Still waiting on Astropy to accept forgot_wallet=TRUE option when I run letsgobeer().

** Flexible Image Transport System.

### Vipul Sharma(MoinMoin)

#### Gesture Recognition using OpenCV + Python

This Python script analyses hand gestures using contour detection and the convex hull of the palm, built on the OpenCV computer vision library.

code: https://github.com/vipul-sharma20/gesture-opencv

The video below shows the working of the code:

How?
• Change hand gesture captured to grayscale

•  Blur image
• Thresholding
• Draw contours
• Find convex hull and convexity defects
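The steps above can be sketched without OpenCV to show what each stage does. This is an illustration only, not the linked script: in OpenCV these stages would normally go through `cv2.cvtColor`, `cv2.GaussianBlur`, `cv2.threshold`, `cv2.findContours`, `cv2.convexHull` and `cv2.convexityDefects`. The sketch below swaps in plain NumPy, a box blur instead of a Gaussian one, and Andrew's monotone chain for the hull, and it skips contour tracing and convexity defects. The synthetic frame and the function names are made up.

```python
import numpy as np


def to_gray(rgb):
    """Luminance-weighted grayscale conversion (BT.601 weights)."""
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114


def box_blur(gray, k=3):
    """Mean filter of odd size k, with edge padding."""
    pad = k // 2
    padded = np.pad(gray, pad, mode="edge")
    out = np.zeros(gray.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + gray.shape[0], dx:dx + gray.shape[1]]
    return out / (k * k)


def convex_hull(points):
    """Andrew's monotone chain; vertices are returned without collinear points."""
    pts = sorted(set((int(x), int(y)) for x, y in points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


# Synthetic frame: a white rectangle standing in for the hand on a black background.
frame = np.zeros((40, 40, 3), dtype=float)
frame[10:30, 15:25] = 255.0

gray = to_gray(frame)
blurred = box_blur(gray, k=3)
binary = blurred > 127                      # thresholding
rows, cols = np.nonzero(binary)
hull = convex_hull(zip(cols, rows))         # hull vertices as (x, y)
print(hull)
```

Blurring before thresholding rounds off the corners of the rectangle, so the hull comes out as an octagon rather than four points. On a real hand image the same effect suppresses single-pixel noise before the contour and hull steps.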

## March 21, 2015

### Vito Gentile(ERAS Project)

#### Starting to apply for Google Summer of Code 2015!

This is to let you all know that I’m applying now for GSoC 2015!

I hope to be selected to participate!

Stay tuned, I will let you know!

## March 20, 2015

#### First Post

Hi Mom, Boyfriend, and People I'm Trying To Impress

...this is my blog!  It's called Jazmin's Open Source Adventure.  Like the page says.

I'll be writing about writing open source code here, and related stuff--which, like anything on the internet, always involves cats.

 Image stolen from http://rforcats.net/

## March 19, 2015

#### Sternberg and Krumholz Model Fitting

K+09 derived their column density calculations from the analytic model from krumholz08 of formation and photodissociation of a spherical cloud bathed in a uniform ISRF. See krumholz09 and krumholz08 for details in the derivation, and a summary of the results in lee12. Here we quickly summarize the important assumptions and results.

where is the dust optical depth of the cloud if and dust is well-mixed, is the dust-to-molecule absorption ratio, and is the ratio of number densities between the cloud molecular component and CNM component. They modeled using empirical results from PDR models as a function of the ratio of the rate at which Lyman-Werner photons are absorbed by dust grains to the rate at which LW photons are absorbed by , . K+09 relates to the CNM and WNM properties. They define

where is the CNM number density, and is the minimum CNM number density required for the CNM to remain in pressure equilibrium with the WNM. Written in terms of

We can calculate the CNM temperature, from EQ (19) in K+09. See this module for the solution of given a value. We can plot as a function of galactic longitude:

where we can see that Perseus and Taurus have much lower predicted values than California. This is expected given the low HI surface density thresholds seen in Perseus and Taurus. The locations of cores in galactic coordinates are shown here (taken from Lombardi et al. (2007))

## March 18, 2015

### Julio Ernesto Villalon Reina(Dipy)

This is Julio Villalon's blog. I will be posting new progress done on my research on this blog. Stay tuned, more soon!

## March 17, 2015

### Rupak Kumar Das(SunPy)

#### Hello {OpenSource} world!

Welcome to my blog! This is a log of my progress in the GSOC program . More updates to follow so…stay tuned!

### K+09 Model Discussion

K+09 derived their column density calculations from the analytic model from krumholz08 of formation and photodissociation of a spherical cloud bathed in a uniform ISRF. See krumholz09 and krumholz08 for details in the derivation, and a summary of the results in lee12. Here we quickly summarize the important assumptions and results.

where is the dust optical depth of the cloud if and dust is well-mixed, is the dust-to-molecule absorption ratio, and is the ratio of number densities between the cloud molecular component and CNM component. They modeled using empirical results from PDR models as a function of the ratio of the rate at which Lyman-Werner photons are absorbed by dust grains to the rate at which LW photons are absorbed by , . K+09 relates to the CNM and WNM properties. They define

where is the CNM number density, and is the minimum CNM number density required for the CNM to remain in pressure equilibrium with the WNM. Written in terms of

### Predicted with galactic latitude

We can calculate the CNM temperature, from EQ (19) in K+09. See this module for the solution of given a value. We can plot as a function of galactic longitude:

where we can see that Perseus and Taurus have much lower predicted values than California. This is expected given the low HI surface density thresholds seen in Perseus and Taurus. The locations of cores in galactic coordinates are shown here (taken from Lombardi et al. (2007))

## March 12, 2015

#### Sternberg Model Fitting

I successfully fit the Sternberg model to the vs. relationship. I assumed that our case is a two-sided irradiation by an isotropic field, where they predict an threshold given by

where the variables are described in the previous post.

I am fitting the / ratio as a function of total gas surface density, = where and are the and gas fraction to the total. I solve for as a function of with

Below are the fits to Taurus cores, where only the and parameters were allowed to vary in the Sternberg+14 and Krumholz+09 models, respectively. A metallicity of was assumed. I also assumed for the S+14 model, and for the K+09 model.

Below are Taurus fits. Find the locations of the cores in this post

Below are Perseus fits

## March 11, 2015

#### Sternberg Model Fitting

I successfully fit the Sternberg model to the vs. relationship. I assumed that our case is a two-sided irradiation by an isotropic field, where they predict an threshold given by

where the variables are described in the previous post.

I am fitting the / ratio as a function of total gas surface density, = where and are the and gas fraction to the total. I solve for as a function of with

Below are the fits to Taurus cores, where only the and parameters were allowed to vary in the Sternberg+14 and Krumholz+09 models, respectively. A metallicity of was assumed. I also assumed for the S+14 model, and for the K+09 model.

Below are Taurus fits. Find the locations of the cores in this post

Below are Perseus fits

## March 08, 2015

#### Sternberg Summary

This post is a continuation of this post

## March 06, 2015

#### Sternberg Summary

This post is a continuation of this post, but don’t bother with the previous post: it’s a poor, short summary.

## Summary of Sternberg et al. (2014)

The main predictor of $\Sigma_{HI}$ and $\Sigma_{H_2}$ is the ratio of the free-space FUV field intensity to the gas density, i.e. dissociation rate / $H_2$ formation rate. The next predictors are metallicity and dust-to-gas mass ratio.

##### Breakdown:

The HI to $H_2$ transition profiles and column densities are controlled by the dimensionless parameter $\alpha G$. $\alpha G$ determines the Lyman-Werner optical depth due to dust associated with HI (described as HI-dust). They derive this optical depth as

$$\tau_{1,{\rm tot}} = \log\left[\frac{\alpha G}{2} + 1\right]$$

$\alpha$ is the ratio of the free-space dissociation rate to the $H_2$ formation rate. $G$ is the cloud-averaged self-shielding factor. Together, $\alpha G$ is a measure of the dust-absorption efficiency of the $H_2$-dissociating photons. $G$ depends on the competition between line absorption and dust absorption.
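To get a feel for how the optical depth behaves, here is a minimal sketch that evaluates the expression above. It assumes the logarithm is the natural log (the usual convention for optical depths); the function name is made up for illustration.

```python
import math

def hi_dust_optical_depth(alpha_g):
    """Total HI-dust optical depth, tau_1,tot = ln(alpha*G / 2 + 1).

    Assumes a natural logarithm; alpha_g is the dimensionless product alpha*G.
    """
    if alpha_g < 0:
        raise ValueError("alpha*G must be non-negative")
    # log1p keeps precision in the weak-field limit, alpha*G << 1
    return math.log1p(alpha_g / 2.0)

# Weak-field, intermediate, and strong-field values of alpha*G
for alpha_g in (0.01, 1.0, 100.0):
    print(alpha_g, hi_dust_optical_depth(alpha_g))
```

In the weak-field limit the optical depth is close to $\alpha G / 2$, while for strong fields it only grows logarithmically with $\alpha G$.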

##### dissociation

$H_2$ absorbs LW photons, which excite it from the ground electronic state to an excited state. Rapid decays occur to either a ro-vibrational or a continuum state. Decays to the continuum state result in dissociation. In the end, the mean dissociation probability depends on the total incident LW flux, the dissociation bandwidth, and the mean flux density weighted by dissociation transitions.

##### Absorption

The local dissociation rate at any cloud depth is

where is the self-shielding function which quantifies the reduction of the total dissociation rate due to opacity in all of the absorption lines. is the free-space dissociation rate. is the column density of H. is the dust optical depth, where is the dust-grain LW-photon absorption cross section per H nucleon.

Assuming that the DGR mass ratio is proportional to metallicity

where depends on the grain composition, . is the metallicity relative to solar.

##### Balancing Absorption + Dissociation = HI + column densities

They assume a formation rate coefficient per volume for formation on grains, which depends on gas temperature and metallicity. They pair this with the local dissociation rate given above. The volume density of , , can be written in terms of the volume density, and HI volume density, as . Balancing absorption with dissociation we get

The relationship between the volume densities does not make sense to me. If we plug in , we get this

so the dissociation rate is scaled by $n_1$? They integrate over the volume densities to get

This shows a key insight that the dust opacities associated with the atomic and molecular columns can be considered separately, despite the mixing of and .

They define the dimensionless parameter

where mean flux density weighted by dissociation transitions of and is the total dust cross section. Then they define the dimensionless “G-integral”

where is the -dust dissociating bandwidth. We can relate this to the formation rate of

## March 03, 2015

#### Low Perseus HI Surface Density Thresholds

Below are results from using different maps. See yesterday’s post for how regions are chosen and comparison with Lee+12. Each map leads to similar DGRs, however the K+09 map yields a slightly larger HI width. This leads to thresholds similar to Lee+12.

Planck:

Kainulainen et al. (2009)

Lee et al. (2012) IRIS

## March 02, 2015

#### Low Perseus HI Surface Density Thresholds

Below are results from Planck . We can see that the thresholds are quite low compared to those found in Lee+12.

Here are some examples from Lee+12

The locations of each of the cores and their regions are shown here:

And here is the corresponding likelihood space.

## February 26, 2015

#### Negative H2 surface Densities

I have found systematically negative in California. Below is an example histogram of the calculated during a Monte Carlo simulation for a single core. These values are obviously unphysical.

Below is a screenshot of the HI spectrum at (ra, dec) ~ (4:34:00, 36:30:00) with the Planck Av 4 and 8 mag contours. The median velocity range from the Monte Carlo simulation is ~ -8 to 4 km/s.

To double check the column densities, N(H2) is given by

For the spectrum below, we will integrate from -10 to 5, i.e. across 15 km/s with an average of . So . A typical DGR = mag. = 3.7 mag at this pixel.

This seems quite low. The high DGR value is the culprit.

To calculate and I first calculate using equation (1), compute the surface densities individually by

is then the sum of and .

The likely source of this problem is that the residuals from the masking procedure are not trivial for California. See below. I’m interpreting the large negative residuals to mean that there is excess HI which is not traced by any dust.

## February 25, 2015

#### 2MASS Av of TCP

Jouni Kainulainen shared 2MASS Av image with us today. He provided the following warning:

Attached is a near-IR-derived visual extinction map (A_V) that more-or-less stitches together the maps of California, Taurus, and Perseus from Kainulainen et al. (2009). It is not quite a perfect coverage, but you can have a look and decide if it is adequate.

I did not quite get the full picture of what you are planning to do… But in any case, I think I should emphasise that maps such as these are derived by comparing stellar colours to some reference field. In other words, the maps measure the relative extinction compared to that reference field, not absolute extinction (this is, to my knowledge, true for all near-IR maps out there).

Since this kind of large maps span several degrees on the sky, the reference field colours are likely to change within the map. Therefore, I have used several reference fields and interpolated between them to compute the “zero-point” of the extinction in the map.

Whether the above affects your analysis or not, I don’t know, but I think it is an important point to remember when considering the ability of the map to trace various gas components.

Below is a screenshot of the Av image with the Planck Av 4 and 8 mag contours.

## Summary of Krumholz 2009 fitting

Performed fitting of K+09 model to TCP complex. Included only cores from Taurus 1 region.

## Summary of Sternberg et al. (2014)

The main predictor of $\Sigma_{HI}$ and $\Sigma_{H_2}$ is the ratio of the free-space FUV field intensity to the gas density, i.e. dissociation rate / $H_2$ formation rate. The next predictors are metallicity and dust-to-gas mass ratio.

##### Breakdown:

1. Normalize the UV radiation field with the $H_2$ dissociation rate. Define the derivative of the dissociation bandwidth as the $H_2$ self-shielding function.

2. Define the FUV opacity of dust grains and the $H_2$ formation rate on dust. The dust-to-gas mass ratio scales linearly with metallicity.

3. Define the differential equation for the depth-dependent steady-state $HI / H_2$ formation/destruction. Define the $H_2$-dust-limited dissociation bandwidth.

4. Integrate the differential equation to get the HI column density.

## Summary of Sternberg et al. (2014)

The main predictor of HI and H2 surface densities

## Iterative MLE binning

Created a systematic approach to determine the effect of the residual scale width on the output.

#### Perseus

| dust2gas_ratio | hi_velocity_width | intercept | residual_width_scale |
|---|---|---|---|
| 0.120 | 9.00016 | 0.01 | 1.5 |
| 0.120 | 9.00016 | 0.02 | 2.0 |
| 0.120 | 9.00016 | 0.03 | 2.5 |
| 0.125 | 8.66682 | 0.06 | 3.0 |
| 0.125 | 8.66682 | 0.06 | 3.5 |
| 0.125 | 8.33348 | 0.08 | 4.0 |

#### California

| dust2gas_ratio | hi_velocity_width | intercept | residual_width_scale |
|---|---|---|---|
| 0.280 | 17.66700 | -0.92 | 1.5 |
| 0.300 | 16.66698 | -0.91 | 2.0 |
| 0.335 | 15.00028 | -0.89 | 2.5 |
| 0.355 | 14.33360 | -0.87 | 3.0 |
| 0.365 | 13.66692 | -0.87 | 3.5 |
| 0.375 | 13.33358 | -0.86 | 4.0 |

## Iterative MLE binning

Created a systematic approach to determine the effect of the residual scale width on the output. Below is an example table of the results to expect tomorrow. Note that these results are not real!

| dust2gas_ratio | hi_velocity_width | intercept | residual_width_scale |
|---|---|---|---|
| 0.15 | 7.6668 | -2.220446e-16 | 1.5 |
| 0.15 | 7.6668 | -2.220446e-16 | 2.0 |
| 0.15 | 7.6668 | -2.220446e-16 | 2.5 |
| 0.15 | 7.6668 | -2.220446e-16 | 3.0 |

## Iterative MLE binning

Today I implemented an automated approach that iteratively creates a mask, solves for the best HI width, and uses that best HI width to create the next mask, until the derived HI width converges. Below are the resulting likelihood spaces
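The loop structure can be sketched as a plain fixed-point iteration. This is only an illustration: `iterate_until_converged` and `toy_update` are hypothetical names, and the toy update stands in for the real "build a mask from the current width, then refit the width" step, which is not shown in this post.

```python
def iterate_until_converged(update, width0, tol=1e-3, max_iter=50):
    """Repeat width -> update(width) until successive widths agree within tol."""
    width = width0
    for n in range(1, max_iter + 1):
        new_width = update(width)
        if abs(new_width - width) < tol:
            return new_width, n
        width = new_width
    raise RuntimeError("HI width did not converge")

# Hypothetical stand-in for "mask from width, then MLE refit of the width";
# it is a simple contraction with a fixed point at 8 km/s.
def toy_update(width):
    return 0.5 * width + 4.0

width, n_iter = iterate_until_converged(toy_update, 20.0)
print(width, n_iter)
```

Capping the number of iterations guards against the case where the mask and the fitted width keep oscillating instead of settling.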

## August 23, 2014

### Abhijeet Kislay(pgmpy)

#### Self disparaging thoughts that cleared…

desperate musings.

## August 19, 2014

### Michael Mueller(Astropy)

#### Week 13

This was the final week of Google Summer of Code, and since last Monday was the suggested "pencils down" date, I spent the week focusing on getting the main pull request ready for merging. I began by testing the new fast converter for unusual input, then handled issues Erik noted with the PR, filed an issue with Pandas, and began work on a new branch which implements a different memory scheme in the tokenizer. The PR seems to be in a final review stage, so hopefully it'll be merged by next week.
After testing out xstrtod(), I noticed a couple of problems with extreme input values and fixed them; the most notable problem was an inability to handle subnormals (values with exponent less than -308). As of now, the converter seems to work pretty well for a wide range of input, and the absolute worst-case error seems to be around 3.0 ULP. Interestingly, when I reported the problems with the old xstrtod() as a bug in Pandas, the response I received was that the current code should remain, but a new parameter float_precision might be added to allow for more accurate conversion. Both Tom and I found this response a little bizarre, since the issues with xstrtod() seem quite buggy, but in any case I have an open PR to implement this in Pandas.
Aside from this, Erik pointed out some suggestions and concerns about the PR, which I dealt with in new commits. For example, he suggested that I use the mmap module in Python rather than dealing with platform-dependent memory mapping in C, which seems to make more sense for the sake of portability. He also pointed out that the method FileString.splitlines(), which returns a generator yielding lines from the memory-mapped file, was inefficient due to repeated calls to chr(). I ultimately rewrote it in C, and although its performance is really only important for commented-header files with header line deep into the file, I managed to get more than a 2x speedup on a 10,000-line integer file with a commented header line in the last row with the new approach.
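For readers unfamiliar with the mmap module, here is a small sketch of the idea of yielding lines from a memory-mapped file. It is not the FileString.splitlines() implementation from the PR; `iter_lines_mmap` is a made-up helper that relies only on the standard library.

```python
import mmap
import os
import tempfile

def iter_lines_mmap(path):
    """Yield lines (as bytes, without the newline) from a memory-mapped file."""
    with open(path, "rb") as fh:
        # Length 0 maps the whole file; note that mmap rejects empty files.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            start = 0
            while start < len(mapped):
                end = mapped.find(b"\n", start)
                if end == -1:
                    end = len(mapped)
                yield mapped[start:end]
                start = end + 1

# Small demonstration with a commented header line
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"# a b\n1 2\n3 4\n")
lines = list(iter_lines_mmap(tmp.name))
os.remove(tmp.name)
print(lines)
```

Searching for newlines with `find()` keeps the per-byte work in C, which is the same reason repeated `chr()` calls in a Python-level loop were slow.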
Although it won't be a part of the main PR, I've also been working on a separate branch change-memory-layout which changes the storage of output in memory in the tokenizer. The main purpose of this branch is to reduce the memory footprint of parsing, as the peak memory usage is almost twice that of Pandas; the basic idea is that instead of storing output in char **output_cols, it's stored instead in a single string char *output and an array of pointers, char **line_ptrs, records the beginning of each line for conversion purposes. While I'm still working on memory improvements, I actually managed to get a bit of a speed boost with this approach. Pure floating-point data is now slightly quicker to read with io.ascii than with Pandas, even without multiprocessing enabled!
Since today is the absolute pencils down date, this marks the official end of the coding period and the end of my blog posts. I plan to continue responding to the review of the main PR and finish up the work in my new branch, but the real work of the summer is basically over. It's been a great experience, and I'm glad I was able to learn a lot and get involved in Astropy development!

## August 12, 2014

### Michael Mueller(Astropy)

#### Week 12

There's not too much to report for this week, as I basically worked on making some final changes and double-checking the writing code to make sure it works with the entire functionality of the legacy writer. After improving performance issues related to tokenization and string conversion, I created a final version of the IPython notebook for reading. Since IPython doesn't work well with multiprocessing, I wrote a separate script to test the performance of the fast reader in parallel and output the results in an HTML file; here are the results on my laptop. Parallel reading seems to work well for very large input files, and I guess the goal of beating Pandas (at least for huge input and ordinary data) is basically complete! Writing is still a little slower than the Pandas method to_csv, but I fixed an issue involving custom formatting; the results can be viewed here.
I also wrote up a separate section in the documentation for fast ASCII I/O, although there's still the question of how to incorporate IPython notebooks in the documentation. For now I have the notebooks hosted in a repo called ascii-profiling, but they may be moved to a new repo called astropy-notebooks. More importantly, Tom noticed that there must actually be something wrong with the fast converter (xstrtod()), since increasing the number of significant figures seems to scale the potential conversion error linearly. After looking over xstrtod() and reading more about IEEE floating-point arithmetic, I found a reasonable solution by forcing xstrtod() to stop parsing digits after the 17th digit (since doubles can only have a maximum precision of 17 digits) and by correcting an issue in the second half of xstrtod(), where the significand is scaled by a power of ten. I tested the new version of xstrtod() in the conversion notebook and found that low-precision values are now guaranteed to be within 0.5 ULP, while high-precision values are within 1.0 ULP about 90% of the time with no linear growth in error.
Once I commit the new xstrtod(), my PR should be pretty close to merging--at this point I'll probably write some more tests just to make sure everything works okay. Today is the suggested "pencils down" date of Google Summer of Code, so I guess it's time to wrap up.

## August 05, 2014

### Michael Mueller(Astropy)

#### Week 11

Since the real goal at this point is to finish up my main PR and my multiprocessing branch in order to merge, I ended up spending this week on final changes instead of Erik's dtype idea. My code's gotten some more review and my mentors and I have been investigating some details and various test cases, which should be really useful for documentation.
One nice thing I managed to discover was how well xstrtod() (the Pandas-borrowed fast float conversion function) works for various input precisions. Unlike strtod(), which is guaranteed to be within 0.5 ULP (units in the last place, or the distance between the two closest floating-point numbers) of the correct result, xstrtod() has no general bound and in fact might be off by several ULP for input with numerous significant figures. However, it works pretty well when the number of significant figures is relatively low, so users might prefer to choose use_fast_converter=True for fairly low-precision data. I wrote up an IPython notebook showing a few results, which Tom also built on in another notebook. I plan to include the results in the final documentation, as users might find it useful to know more about rounding issues with parsing and judge whether or not the fast converter is appropriate for their purposes.
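To illustrate how such an error can be measured, the sketch below compares a deliberately simplified converter against Python's correctly rounded `float()` and reports the difference in ULP. `naive_strtod` is a toy written for this example, not the xstrtod() code from Pandas or Astropy, and `math.ulp` needs Python 3.9 or later.

```python
import math

def naive_strtod(text):
    """Toy decimal parser: accumulate digits in a double, then scale by 10**n."""
    mantissa, exponent = text.lower().partition("e")[::2]
    sign = -1.0 if mantissa.startswith("-") else 1.0
    mantissa = mantissa.lstrip("+-")
    int_part, _, frac_part = mantissa.partition(".")
    value = 0.0
    for ch in int_part + frac_part:
        # Each step may round once the running value exceeds 2**53
        value = value * 10.0 + (ord(ch) - ord("0"))
    power = int(exponent or 0) - len(frac_part)
    # Scaling by a power of ten adds another rounding step
    if power >= 0:
        value *= 10.0 ** power
    else:
        value /= 10.0 ** (-power)
    return sign * value

def ulp_error(text):
    """Distance between the toy result and float(text), in units of ULP."""
    exact = float(text)
    return abs(naive_strtod(text) - exact) / math.ulp(exact)

for s in ("0.1", "1.25", "0.12345678901234567890123", "1.7976931348623157e300"):
    print(s, ulp_error(s))
```

Short inputs come out exact because both the digit accumulation and the single division are exactly representable or correctly rounded; the error only appears once there are more digits than a double can hold.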
On the multiprocessing branch, I added in xstrtod() and the use_fast_converter parameter, which defaults to False. After discussion with my mentors, I changed the file reading system so that the parser employs memory mapping whenever it reads from a file; the philosophy is that, aside from speed gains, reading a 1 GB file via memory mapping will save users from having to load a full gigabyte into memory. The main challenge with memory mapping (and later with other compatibility concerns) is getting the code to run correctly on Windows, which turned out to be more frustrating than I expected.
Since Windows doesn't have the POSIX function mmap(), specific Windows memory mapping code has to be wrapped in an #ifdef _WIN32 block, and the fact that Windows has no fork() call for multiprocessing means that memory is not simply copy-on-write as it is on Linux, which leads to a host of other issues. For example, I first ran into a weird issue involving pickling that ultimately turned out to be due to a six bug, which has been noted for versions < 1.7.0. I opened a PR to update the bundled version of six in Astropy, so that issue should be fixed pretty quickly. There were some other problems, such as the fact that processes cannot be created with bound methods in Windows (which I circumvented by turning _read_chunk into a normal method and making CParser picklable via a custom __reduce__ method), but things seem to work correctly on Windows now. I actually found that there was a slowdown in switching to parallel reading, but I couldn't find a cause with cProfile; it might have been the fact that I used a very old laptop for testing, so I'll have to find some other way to see if there's actually a problem.
Tom also wrote a very informative IPython notebook detailing the performance of the new implementation compared to the old readers, genfromtxt(), and Pandas, which I'll include in the documentation for the new fast readers. It was also nice to see an interesting discussion regarding metadata parsing in issue #2810 and a new PR to remove boilerplate code, which is always good. I also made a quick fix to the HTML reading tests and opened a PR to allow for a user-specified backend parser in HTML reading, as Tom pointed out that certain files will work with one backend (e.g. html5lib) and not the default.

## July 29, 2014

### Michael Mueller(Astropy)

#### Week 10

I spent this week mostly working on the multiprocessing branch, fixing up some issues from last week and adding a bit more functionality. Most importantly, I finally switched to a more efficient scheme for reading data when given a filename as input; I'd been meaning to deal with this previously, since profiling indicated that the naïve method of simply reading file data into a single Python string took up something like 15% of processing time, so it's nice to have a more efficient method.
One idea I had thought of previously was to split file reading into each process, but my initial changes showed a speed decrease, which made sense after I came across this comment on StackOverflow explaining that it's best from a performance standpoint to access file data sequentially. I then switched over to reading separate chunks for each process in the main process at the beginning of parsing and then passing each chunk to its respective process, which seems more promising but still runs into issues with memory management. I've been trying to find a better solution today, and I think I should be able to figure it out by tomorrow.
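The chunking step can be sketched as follows: the main process reads the data once and cuts it into pieces that end on line boundaries, so each worker receives whole rows. `split_at_newlines` is a hypothetical helper for illustration, not the code on the multiprocessing branch.

```python
def split_at_newlines(data, n_chunks):
    """Split bytes into roughly n_chunks pieces, each ending on a line boundary."""
    chunks = []
    start = 0
    target = max(1, len(data) // n_chunks)
    while start < len(data):
        # Look for the first newline at or after the nominal chunk end
        end = data.find(b"\n", min(start + target, len(data)) - 1)
        end = len(data) if end == -1 else end + 1
        chunks.append(data[start:end])
        start = end
    return chunks

data = b"1,2\n3,4\n5,6\n7,8\n"
chunks = split_at_newlines(data, 2)
print(chunks)
```

A split like this looks only at raw newlines, so a quoted field that spans several lines could be cut in half.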
Another issue I looked into was finding a faster algorithm for converting strings to doubles, based on Pandas' superior performance. After looking into what Pandas does, I found that a fairly simple conversion function xstrtod() replaces strtod() from the standard library; from what I was told by a Pandas developer, it seems that Pandas considers the speed gains more important than retaining the relatively slow correction loop in strtod() for high-precision values and is therefore willing to have values off by more than 0.5 ULP (units in the last place). numpy's conversion routine doesn't seem to offer a clear benefit (I tried replacing strtod() with numpy's conversion and got mixed results), so I'm not sure if any optimization is possible.
I then added a parallel parameter for reading and patched up the failing tests (due to multi-line quoted values, which Tom suggested should be an acceptable but documented problem with parallelized reading). I also fixed a unicode issue with string conversion in Python 3, which arose because the 'S' dtype in numpy corresponds to bytes in Python3 rather than str. Interestingly I found it a fairly trivial extension of the file reading code to implement memory mapping for reading on non-Windows systems, so I tentatively added it with a memory_map keyword and will test it more thoroughly to see if the performance boost is anything very significant. This does not enable a memory-mapped view of the file inside the output Table, but simply memory-maps the file to a char * pointer for faster reading and then un-maps it after processing is done.
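The 'S' dtype behaviour is easy to reproduce. The snippet below is a small illustration (not the fix from the branch): elements of an 'S' array are bytes under Python 3, and `np.char.decode` is one way to obtain a unicode array.

```python
import numpy as np

raw = np.array([b"alpha", b"beta"], dtype="S5")
print(type(raw[0]))              # a numpy bytes scalar under Python 3

# Decode to a unicode ('U') array; ASCII is assumed for this example
text = np.char.decode(raw, "ascii")
print(text.dtype, text[0] == "alpha")
```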
Next week I'll be working on Erik's idea of creating custom numpy dtypes, and I'll also add documentation, make some final fixes, and otherwise get the engine ready for merging in the meantime.

## Tempita and {S,D,C,Z} BLAS Functions

Developing a fast version of the multivariate Kalman filter for Statsmodels has required dipping into Cython, for fast loops, direct access to memory, and the ability to directly call the Fortran BLAS libraries.

Once you do this, you have to start worrying about the datatype that you're working with. Numpy and Scipy typically do this worrying for you, so that you can, for example, take the dot product of a single precision array with a double complex array, and no problems will result.

In [1]:
import numpy as np
x = np.array([1,2,3,4], dtype=np.float32)
y = np.array([1,2,3,4], dtype=np.complex128) + 1j
z = np.dot(x, y)
print(z, z.dtype)

(30+10j) complex128


Whereas if you use the scipy direct calls to the BLAS libraries with wrong or differing datatypes, in the best case scenario it will perform casts to the required datatype. Notice the warning, below, and the truncation of the complex part.

In [2]:
from scipy.linalg.blas import ddot
z = ddot(x,y)
print(z)

30.0

-c:2: ComplexWarning: Casting complex values to real discards the imaginary part


The scipy.linalg.blas functions do some checking to prevent the BLAS library from getting an argument with the wrong datatype, but in the worst case if something slips through, it could crash Python with a segmentation fault.

### Types and the Kalman filter

This matters for the Cython-based Kalman filter for two reasons. The first is that we likely want to behave as nicely as numpy dot and allow the filter to run on any datatype. The second is that numerical derivatives in Statsmodels are computed via complex step differentiation, which requires the function to be able to deal with at least the double complex case.
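For reference, complex step differentiation approximates $f'(x) \approx \mathrm{Im}[f(x + ih)] / h$ for a tiny step $h$, which is why the function must accept complex input. The sketch below is a generic illustration with made-up names, not the Statsmodels routine.

```python
import cmath

def complex_step_derivative(f, x, h=1e-20):
    """Approximate f'(x) for real x as Im[f(x + i*h)] / h."""
    return f(complex(x, h)).imag / h

def f(z):
    # Any function that accepts complex input works here
    return cmath.exp(z) * z * z

derivative = complex_step_derivative(f, 1.0)
print(derivative)   # analytic value is exp(x) * (x**2 + 2*x) = 3e at x = 1
```

Because no subtraction of nearly equal numbers is involved, the step can be made extremely small without the cancellation error that limits finite differences.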

This means that all of the Cython functions need to be duplicated four times.

Fortunately, all of the underlying BLAS functions are structured with a prefix at the beginning indicating the datatype, followed by the call. For example, dgemm performs matrix multiplication on double precision arrays, whereas zgemm performs matrix multiplication on double precision complex arrays.
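On the Python side, scipy exposes the same prefix convention through `scipy.linalg.get_blas_funcs`, which picks the routine matching the common datatype of its arguments. The example below is a small illustration with made-up arrays.

```python
import numpy as np
from scipy.linalg import get_blas_funcs

a = np.array([[1, 2], [3, 4]], dtype=np.float32)
b = np.array([[1, 0], [0, 1]], dtype=np.complex128) * (1 + 1j)

# Mixed single-precision real and double-precision complex input
gemm = get_blas_funcs("gemm", (a, b))
print(gemm.typecode)          # prefix letter of the selected routine

c = gemm(1.0, a, b)           # computes 1.0 * a.dot(b)
print(np.allclose(c, a.dot(b)))
```

Inside Cython there is no such dispatcher for direct BLAS calls, which is why the prefix has to be chosen when the code is generated.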

The need for relatively simple duplication means that this is a great place for templating, and it turns out that Cython has a great, simple templating engine built in: Tempita.

As an example, take a look at the below code. This generates four functions: sselect_state_cov, dselect_state_cov, cselect_state_cov, and zselect_state_cov which handle part of the Kalman filtering operations for the four different datatypes.

{{py:

TYPES = {
    "s": ("np.float32_t", "np.float32", "np.NPY_FLOAT32"),
    "d": ("np.float64_t", "float", "np.NPY_FLOAT64"),
    "c": ("np.complex64_t", "np.complex64", "np.NPY_COMPLEX64"),
    "z": ("np.complex128_t", "complex", "np.NPY_COMPLEX128"),
}

}}

{{for prefix, types in TYPES.items()}}
{{py:cython_type, dtype, typenum = types}}

# ### Selected state covariance matrix
cdef int {{prefix}}select_state_cov(int k_states, int k_posdef,
                                    {{cython_type}} * tmp,
                                    {{cython_type}} * selection,
                                    {{cython_type}} * state_cov,
                                    {{cython_type}} * selected_state_cov):
    cdef:
        {{cython_type}} alpha = 1.0
        {{cython_type}} beta = 0.0

    # #### Calculate selected state covariance matrix
    # $Q_t^* = R_t Q_t R_t'$
    #
    # Combine the selection matrix and the state covariance matrix to get
    # the simplified (but possibly singular) "selected" state covariance
    # matrix (see e.g. Durbin and Koopman p. 43)

    # tmp0 array used here, dimension $(m \times r)$

    # $\\#_0 = 1.0 * R_t Q_t$
    # $(m \times r) = (m \times r) (r \times r)$
    {{prefix}}gemm("N", "N", &k_states, &k_posdef, &k_posdef,
                   &alpha, selection, &k_states,
                   state_cov, &k_posdef,
                   &beta, tmp, &k_states)
    # $Q_t^* = 1.0 * \\#_0 R_t'$
    # $(m \times m) = (m \times r) (m \times r)'$
    {{prefix}}gemm("N", "T", &k_states, &k_states, &k_posdef,
                   &alpha, tmp, &k_states,
                   selection, &k_states,
                   &beta, selected_state_cov, &k_states)

{{endfor}}


Of course this merely provides the capability to support multiple datatypes, and wrapping that in a user-friendly way is part of another part of the project.
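To see what the loop expands to, the expansion can be imitated with plain Python string formatting. This is a stand-in for Tempita rather than its API, and the one-line signature is shortened for the example.

```python
# Prefix -> Cython type, mirroring the first entry of each TYPES tuple
TYPES = {
    "s": "np.float32_t",
    "d": "np.float64_t",
    "c": "np.complex64_t",
    "z": "np.complex128_t",
}

# Shortened signature, for illustration only
TEMPLATE = "cdef int {prefix}select_state_cov({cython_type} * tmp)"

rendered = [TEMPLATE.format(prefix=p, cython_type=t) for p, t in TYPES.items()]
for line in rendered:
    print(line)
```

Each prefix yields one concrete function, so the body is written once while the compiler still sees four fully typed definitions.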

## June 23, 2014

### The initialization problem

The Kalman filter is a recursion for optimally making inferences about an unknown state variable given a related observed variable. In particular, if the state variable at time $t$ is represented by $\alpha_t$, then the (linear, Gaussian) Kalman filter takes as input the mean and variance of that state conditional on observations up to time $t-1$ and provides as output the filtered mean and variance of the state at time $t$ and the predicted mean and variance of the state at time $t+1$.

More concretely, we denote (see Durbin and Koopman (2012) for all notation)

\begin{align} \alpha_t | Y_{t-1} & \sim N(a_t, P_t) \\ \alpha_t | Y_{t} & \sim N(a_{t|t}, P_{t|t}) \\ \alpha_{t+1} | Y_{t} & \sim N(a_{t+1}, P_{t+1}) \\ \end{align}

Then the inputs to the Kalman filter recursion are $a_t$ and $P_t$ and the outputs are $a_{t|t}, P_{t|t}$ (called filtered values) and $a_{t+1}, P_{t+1}$ (called predicted values).

This process is done for $t = 1, \dots, n$. While the predicted values as outputs of the recursion are available as inputs to subsequent iterations, an important question is initialization: what values should be used as inputs to start the very first recursion?

Specifically, when running the recursion for $t = 1$, we need as inputs $a_1, P_1$. These values define, respectively, the expectation and variance / covariance matrix for the initial state $\alpha_1 | Y_0$. Here, though, $Y_0$ denotes the observation of no data, so in fact we are looking for the unconditional expectation and variance / covariance matrix of $\alpha_1$. The question is how to find these.

In general this is a rather difficult problem (for example for non-stationary processes) but for stationary processes, an analytic solution can be found.

### Stationary processes

A (covariance) stationary process is, very roughly speaking, one for which the mean and covariances are not time-dependent. What this means is that we can solve for the unconditional expectation and variance explicitly (this section follows Hamilton (1994), Chapter 13).

The state equation for a state-space process (to which the Kalman filter is applied) is

$$\alpha_{t+1} = T \alpha_t + \eta_t$$

Below I set up the elements of a typical state equation like that which would be found in the ARMA case, where the transition matrix $T$ is a sort-of companion matrix. I'm setting it up in such a way that I'll be able to adjust the dimension of the state, so we can see how some of the below methods scale.

In [3]:
import numpy as np
from scipy import linalg

def state(m=10):
    T = np.zeros((m, m), dtype=complex)
    T[0,0] = 0.6 + 1j
    idx = np.diag_indices(m-1)
    T[(idx[0]+1, idx[1])] = 1

    Q = np.eye(m)

    return T, Q


#### Unconditional mean

Taking unconditional expectations of both sides yields:

$$E[\alpha_{t+1}] = T E[ \alpha_t] + E[\eta_t]$$

or $(I - T) E[\alpha_t] = 0$ (using $E[\eta_t] = 0$ and $E[\alpha_{t+1}] = E[\alpha_t]$), and given stationarity this means that the unique solution is $E[\alpha_t] = 0$ for all $t$. Thus in initializing the Kalman filter, we set $a_1 = E[\alpha_1] = 0$.

#### Unconditional variance / covariance matrix

Slightly more tricky is the variance / covariance matrix. To find it (as in Hamilton) post-multiply by the transpose of the state and take expectations:

$$E[\alpha_{t+1} \alpha_{t+1}'] = E[(T \alpha_t + \eta_t)(\alpha_t' T' + \eta_t')]$$

This yields an equation of the form (denoting by $\Sigma$ and $Q$ the variance / covariance matrices of the state and disturbance):

$$\Sigma = T \Sigma T' + Q$$

Hamilton then shows that this equation can be solved as:

$$vec(\Sigma) = [I - (T \otimes T)]^{-1} vec(Q)$$

where $\otimes$ refers to the Kronecker product. There are two things that jump out about this equation:

1. It can be easily solved. In Python, it would look something like:
m = T.shape[0]
Sigma = np.linalg.inv(np.eye(m**2) - np.kron(T, T)).dot(Q.reshape(Q.size, 1)).reshape(m, m)

2. It will scale very poorly (in terms of computational time) with the dimension of the state-space ($m$). In particular, you have to take the inverse of an $m^2 \times m^2$ matrix.

Below I take a look at the timing for solving it this way using the code above (direct_inverse) and using the built-in scipy direct method (which uses a linear solver rather than taking the inverse, so it is a bit faster).

In []:
def direct_inverse(A, Q):
    n = A.shape[0]
    # Row-major vec, so the complex case needs kron(A, conj(A))
    return np.linalg.inv(np.eye(n**2) - np.kron(A,A.conj())).dot(Q.reshape(Q.size, 1)).reshape(n,n)

def direct_solver(A, Q):
    return linalg.solve_discrete_lyapunov(A, Q)

# Example
from numpy.testing import assert_allclose
np.set_printoptions(precision=10)
T, Q = state(3)
sol1 = direct_inverse(T, Q)
sol2 = direct_solver(T, Q)

assert_allclose(sol1,sol2)

In [6]:
# Timings for m=1
T, Q = state(1)
%timeit direct_inverse(T, Q)
%timeit direct_solver(T, Q)

10000 loops, best of 3: 58.8 µs per loop
10000 loops, best of 3: 73.4 µs per loop


In [7]:
# Timings for m=5
T, Q = state(5)
%timeit direct_inverse(T, Q)
%timeit direct_solver(T, Q)

10000 loops, best of 3: 152 µs per loop
10000 loops, best of 3: 128 µs per loop


In [8]:
# Timings for m=10
T, Q = state(10)
%timeit direct_inverse(T, Q)
%timeit direct_solver(T, Q)

100 loops, best of 3: 1.71 ms per loop
1000 loops, best of 3: 1.03 ms per loop


In [11]:
# Timings for m=50
T, Q = state(50)
%timeit direct_inverse(T, Q)
%timeit direct_solver(T, Q)

1 loops, best of 3: 12.6 s per loop
1 loops, best of 3: 3.5 s per loop



### Lyapunov equations

As you can notice by looking at the name of the scipy function, the equation describing the unconditional variance / covariance matrix, $\Sigma = T \Sigma T' + Q$ is an example of a discrete Lyapunov equation.

One place to turn to improve performance on matrix-related operations is to the underlying Fortran linear algebra libraries: BLAS and LAPACK; if there exists a special-case solver for discrete time Lyapunov equations, we can call that function and be done.

Unfortunately, no such function exists, but what does exist is a special-case solver for Sylvester equations (*trsyl), which are equations of the form $AX + XB = C$. Furthermore, the continuous Lyapunov equation, $AX + XA^H + Q = 0$, is a special case of a Sylvester equation. Thus if we can transform the discrete to a continuous Lyapunov equation, we can then solve it quickly as a Sylvester equation.

The current implementation of the scipy discrete Lyapunov solver does not do that, although their continuous solver solve_lyapunov does call solve_sylvester which calls *trsyl. So, we need to find a transformation from discrete to continuous and directly call solve_lyapunov which will do the heavy lifting for us.
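The special-case relationship is easy to check numerically. The sketch below uses a small made-up matrix and scipy's `solve_continuous_lyapunov` (which solves $AX + XA^H = Q$) and `solve_sylvester` (which solves $AX + XB = Q$), so the equation $AX + XA^H + Q = 0$ corresponds to passing $-Q$ and $B = A^H$.

```python
import numpy as np
from scipy import linalg

# Small example with eigenvalues -2 and -3, so the solution is unique
A = np.array([[-2.0, 1.0], [0.0, -3.0]])
Q = np.eye(2)

# Continuous Lyapunov equation: A X + X A^H + Q = 0
X_lyap = linalg.solve_continuous_lyapunov(A, -Q)

# The same equation written as a Sylvester equation with B = A^H, C = -Q
X_sylv = linalg.solve_sylvester(A, A.conj().T, -Q)

residual = A.dot(X_lyap) + X_lyap.dot(A.conj().T) + Q
print(np.allclose(X_lyap, X_sylv), np.abs(residual).max())
```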

It turns out that there are several transformations that will do it. See Gajic, Z., and M.T.J. Qureshi. 2008. for details. Below I present two bilinear transformations, and show their timings.
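The algebra behind the second transformation (bilinear2) can be sketched as follows. This is a reconstruction under the assumption that $-1$ is not an eigenvalue of $A$, which holds whenever the discrete system is stable. Start from the discrete equation

$$X = A X A^H + Q$$

and define $B = (A - I)(A + I)^{-1}$. Multiplying $BX + XB^H$ on the left by $(A + I)$ and on the right by $(A + I)^H$ gives

$$(A - I) X (A + I)^H + (A + I) X (A - I)^H = 2 \left( A X A^H - X \right) = -2Q$$

so that $X$ also solves the continuous Lyapunov equation

$$B X + X B^H + C = 0, \qquad C = 2 (A + I)^{-1} Q (A + I)^{-H}$$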

In [34]:
def bilinear1(A, Q):
    A = A.conj().T
    n = A.shape[0]
    eye = np.eye(n)
    B = np.linalg.inv(A - eye).dot(A + eye)
    res = linalg.solve_lyapunov(B.conj().T, -Q)
    return 0.5*(B - eye).conj().T.dot(res).dot(B - eye)

def bilinear2(A, Q):
    A = A.conj().T
    n = A.shape[0]
    eye = np.eye(n)
    AI_inv = np.linalg.inv(A + eye)
    B = (A - eye).dot(AI_inv)
    C = 2*np.linalg.inv(A.conj().T + eye).dot(Q).dot(AI_inv)
    return linalg.solve_lyapunov(B.conj().T, -C)

# Example:
T, Q = state(3)
sol3 = bilinear1(T, Q)
sol4 = bilinear2(T, Q)

assert_allclose(sol1,sol3)
assert_allclose(sol3,sol4)

In [36]:
# Timings for m=1
T, Q = state(1)
%timeit bilinear1(T, Q)
%timeit bilinear2(T, Q)

1000 loops, best of 3: 202 µs per loop
1000 loops, best of 3: 219 µs per loop


In [37]:
# Timings for m=5
T, Q = state(5)
%timeit bilinear1(T, Q)
%timeit bilinear2(T, Q)

1000 loops, best of 3: 257 µs per loop
1000 loops, best of 3: 316 µs per loop


In [38]:
# Timings for m=10
T, Q = state(10)
%timeit bilinear1(T, Q)
%timeit bilinear2(T, Q)

1000 loops, best of 3: 268 µs per loop
1000 loops, best of 3: 298 µs per loop


In [39]:
# Timings for m=50
T, Q = state(50)
%timeit bilinear1(T, Q)
%timeit bilinear2(T, Q)

100 loops, best of 3: 2.31 ms per loop
100 loops, best of 3: 2.66 ms per loop



Notice that this method does so well we can even try $m=500$.

In [40]:
# Timings for m=500
T, Q = state(500)
%timeit bilinear1(T, Q)
%timeit bilinear2(T, Q)

1 loops, best of 3: 1.51 s per loop
1 loops, best of 3: 1.67 s per loop



### Final thoughts

The first thing to notice is how much better the bilinear transformations do as $m$ grows large. They are able to take advantage of the special formulation of the problem so as to avoid many calculations that a generic inverse (or linear solver) would have to do. Second, though, for small $m$, the original analytic solutions are actually better.

I have submitted a pull request to Scipy to augment solve_discrete_lyapunov for large $m$ ($m \geq 10$), using the second bilinear transformation to solve it as a Sylvester equation.

Finally, below is a graph of the timings.




## June 02, 2014

### Getting Started with Python Packaging

The first two weeks of the state-space project have been dedicated to introducing the Kalman filter - which was written in Cython with calls to the BLAS and LAPACK libraries linked to in Scipy - into the Statsmodels build process. A future post may describe why it was not just written in pure Python (in brief, it is because the Kalman filter is a recursive algorithm with a loop over the number of entries in a dataset, where each loop involves many matrix operations on relatively small matrices). For now, though, the source kalman_filter.pyx needs to be "Cythonized" into kalman_filter.c and then compiled into (e.g.) kalman_filter.so, either when the package is installed using pip, or from source (e.g. python setup.py install).

The first thing to figure out was the state of Python's packaging. I've had a vague sense of some of the various tools of Python packaging for a while (especially since it used to be recommended to specify --distribute when making a new virtualenv), but I built all my Cython packages either via python setup.py build_ext --inplace (from the Cython quickstart) or via IPython magic.

The recommended setup.py file from Cython quickstart is:

from distutils.core import setup
from Cython.Build import cythonize

setup(
    name = 'Hello world app',
    ext_modules = cythonize("hello.pyx"),
)


As you can see, this uses the distutils package. However, while distutils is part of base Python and is standard for packaging, from what I could tell distribute was the up-and-coming way to proceed. Would that it were that simple; it turns out that Python packaging is not for the faint of heart. A wonderful stackoverflow answer describes the state-of-the-art (hopefully) as of October 2013. It comes to the conclusion that setuptools is probably the way to go, unless you only need basic packaging, in which case you should use distutils.

### Setuptools

So it appeared that the way to go was to use setuptools (and more than personal preference, Statsmodels uses setuptools). Unfortunately, I have always previously used the above snippet which is distutils based, and as it turns out, the magic that makes that bit of code possible is not available in setuptools. You can read this mailing list conversation from September 2013 for a fascinating back-and-forth about what should be supported where, leading to the surprising conclusion that to make Setuptools automatically call Cython to build *.pyx files, one should trick it into believing there was a fake Pyrex installation.

This approach can be seen at the repository for the existing Kalman filter code, or at https://github.com/njsmith/scikits-sparse (in both cases, look for the "fake_pyrex" directory in the project root).

It's often a good idea, though, to look at NumPy and SciPy for how it should be done, and it turns out that neither of them uses a fake Pyrex directory, nor do they rely on setuptools (or distutils) to Cythonize the *.pyx files. Instead, they use a subprocess call to cythonize directly. Why do this, though?

### NumPy and SciPy

Although at first it seemed like an awfully Byzantine and arbitrary mish-mash of roll-your-owns, where no two parties do things the same way, it turns out that the NumPy / SciPy approach agrees, in spirit, with the latest Cython documentation on compilation. The idea is that Cython should not be a required dependency for installation, and thus the already Cythonized *.c files should be included in the distributed package. These *.c files are generated (cythonized) during the python setup.py sdist process.

So the end result is that setuptools should not be required to cythonize *.pyx files; it only needs to compile and link *.c files (which it has no problem with - no fake pyrex directory needed). Then the question is, how to cythonize the files? It turns out that the common way, as mentioned above, is to use a subprocess call to the cythonize binary directly (see Statsmodels, NumPy, SciPy).
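A minimal sketch of that idea, which could live in a helper used by setup.py, is below. It is only an illustration: the function names are hypothetical, it assumes the `cython` command-line tool is on the PATH, and the actual scripts in Statsmodels, NumPy and SciPy differ in detail.

```python
import os
import subprocess


def is_stale(pyx_path, c_path):
    """True when the .c file is missing or older than its .pyx source."""
    if not os.path.exists(c_path):
        return True
    return os.path.getmtime(c_path) < os.path.getmtime(pyx_path)


def cythonize_if_needed(pyx_paths, run=subprocess.check_call):
    """Regenerate .c files for stale .pyx sources and return the .c paths.

    `run` is injectable so the logic can be exercised without Cython installed.
    """
    c_paths = []
    for pyx_path in pyx_paths:
        c_path = os.path.splitext(pyx_path)[0] + ".c"
        if is_stale(pyx_path, c_path):
            # Assumption: the `cython` command-line tool is available on PATH
            run(["cython", pyx_path])
        c_paths.append(c_path)
    return c_paths
```

The returned *.c paths can then be handed to ordinary Extension objects, so that an end user installing from an sdist never needs Cython.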

## May 31, 2014

### Luca Puggini(statsmodels)

#### Faster python implementation of Sparse PCA

If we are interested only in a small number of sparse principal components, the sklearn implementation of sparse PCA can be slow. Here is an alternative implementation of sparse PCA based on the paper "Sparse principal component analysis via regularized matrix low rank approximation".

The file is still a beta version and is not well documented, but the syntax is similar to the one used in sklearn, so it should be easy to use.

Here is a small benchmark study:

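import time

import numpy as np
import sklearn as sk
import sklearn.preprocessing
from sklearn.decomposition import SparsePCA
# IterativeSPCA is the class defined in the implementation file mentioned above

print("start")
for i in range(10):

    X = np.random.random((100, 30))
    X = sk.preprocessing.scale(X)   # scale the data
    alpha = 1 # set the penalty
    nPCs = 4 # set the number of components

    # time the new algorithm
    start_time = time.time()
    a = IterativeSPCA(Npc=nPCs, alpha=alpha)
    sP, sT = a.fit(X[:])
    time1 = time.time()
    # time the sklearn implementation on the same data
    spca = SparsePCA(n_components=nPCs, alpha=alpha, ridge_alpha=0)
    spca.fit(X[:])
    time2 = time.time()
    print("Time for new SPCA algorithm =", time1-start_time, "time for sklearn SparsePCA=", time2-time1)
print("finish!")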

Time for new SPCA algorithm = 0.09139561653137207 time for sklearn SparsePCA= 4.52611780166626
Time for new SPCA algorithm = 0.0671834945678711 time for sklearn SparsePCA= 0.6769702434539795
Time for new SPCA algorithm = 0.06313467025756836 time for sklearn SparsePCA= 1.1517982482910156
Time for new SPCA algorithm = 0.0669105052947998 time for sklearn SparsePCA= 2.237234592437744
Time for new SPCA algorithm = 0.12893128395080566 time for sklearn SparsePCA= 2.349125623703003
Time for new SPCA algorithm = 0.07786917686462402 time for sklearn SparsePCA= 1.397700548171997
Time for new SPCA algorithm = 0.05494523048400879 time for sklearn SparsePCA= 0.7182488441467285
Time for new SPCA algorithm = 0.052568674087524414 time for sklearn SparsePCA= 1.5870840549468994
Time for new SPCA algorithm = 0.15899276733398438 time for sklearn SparsePCA= 0.9816904067993164
Time for new SPCA algorithm = 0.08110642433166504 time for sklearn SparsePCA= 1.7430102825164795
finish!


### Terri Oda (PSF Org admin)

When I used to do research on spam, I wound up spending a lot of time listening to people's little pet theories. One that came up plenty was "oh, I just never post my email address on the internet" which is fine enough as a strategy depending on what you do, but is rather infeasible for academics who want to publish, as custom says we've got to put our email addresses on the paper. This leads to a lot of really awesome contacts with other researchers around the world, but sometimes it leads to stuff like the email I got today:

Dear Terri,

As stated by the Carleton University's electronic repository, you authored the work entitled "Simple Security Policy for the Web" in the framework of your postgraduate degree.

We are currently planning publications in this subject field, and we would be glad to know whether you would be interested in publishing the above mentioned work with us.

LAP LAMBERT Academic Publishing is a member of an international publishing group, which has almost 10 years of experience in the publication of high-quality research works from well-known institutions across the globe.

Besides producing printed scientific books, we also market them actively through more than 80,000 booksellers.

Kindly confirm your interest in receiving more detailed information in this respect.

I am looking forward to hearing from you.

Best regards,
Sarah Lynch
Acquisition Editor

GmbH & Co. KG

Heinrich-Böcking-Str. 6-8, 66121, Saarbrücken, Germany
s.lynch(at)lap-publishing.com / www. lap-publishing .com

Handelsregister Amtsgericht Saarbrücken HRA 10356
Identification Number (Verkehrsnummer): 13955
Partner with unlimited liability: VDM Management GmbH
Handelsregister Amtsgericht Saarbrücken HRB 18918
Managing director: Thorsten Ohm (CEO)

Well, I guess it's better than the many misspelled emails I get offering to let me buy a degree (I am *so* not the target audience for that, thanks), and at least it's not incredibly crappy conference spam. In fact, I'd never heard of this before, so I did a bit of searching.

Let's just post a few of the summaries from that search:

From wikipedia:
The Australian Higher Education Research Data Collection (HERDC) explicitly excludes the books by VDM Verlag and Lambert Academic Publishing from ...

From the well-titled Lambert Academic Publishing (or How Not to Publish Your Thesis):
Lambert Academic Publishing (LAP) is an imprint of Verlag Dr Muller (VDM), a publisher infamous for selling cobbled-together "books" made ...

And most amusingly, the reason I've included the phrase "academic spam" in the title:
I was contacted today by a representative of Lambert Academic Publishing requesting that I change the title of my blog post "Academic Spam", ...

So yeah, no. My thesis is already published, thanks, and Simple Security Policy for the Web is freely available on the web for probably obvious reasons. I never did convert the darned thing to html, though, which is mildly unfortunate in context!

#### PlanetPlanet vs iPython Notebook [RESOLVED: see below]

Short version:

I'd like some help figuring out why RSS feeds that include iPython notebook contents (or more specifically, the CSS from iPython notebooks) are showing up as really messed up in the PlanetPlanet blog aggregator. See the Python summer of code aggregator and search for a MNE-Python post to see an example of what's going wrong.

Bigger context:

One of the things we ask of Python's Google Summer of Code students is regular blog posts. This is a way of encouraging them to be public about their discoveries and share their process and thoughts with the wider Python community. It's also very helpful to me as an org admin, since it makes it easier for me to share and promote the students' work. It also helps me keep track of everyone's projects without burning myself out trying to keep up with a huge number of mailing lists for each "sub-org" under the Python umbrella. Python sponsors students not only to work on the language itself, but also to work on projects that make heavy use of Python. In 2014, we have around 20 sub-orgs, so that's a lot of mailing lists!

One of the tools I use is PlanetPlanet, software often used for making free software "planets" or blog aggregators. It's easy to use and run, and while it's old, it doesn't require me to install and run an entire larger framework which I would then have to keep up to date. It's basically making a static page using a shell script run by a cron job. From a security perspective, all I have to worry about is that my students will post something terrible that then gets aggregated, but I'd have to worry about that no matter what blogroll software I used.

But for some reason, this year we've had some problems with some feeds, and it *looks* like the problem is specifically that PlanetPlanet can't handle iPython notebook formatted stuff in a blog post. This is pretty awkward, as iPython notebook is an awesome tool that I think we should be encouraging students to use for experimenting in Python, and it really irks me that it's not working. It looks like Chrome and Firefox parse the feed reasonably, which makes me think that somehow PlanetPlanet is the thing that's losing a <style> tag somewhere. The blogs in question seem to be on blogger, so it's also possible that it's google that's munging the stylesheet in a way that planetplanet doesn't parse.

I don't suppose this bug sounds familiar to anyone? I did some quick googling, but unfortunately the terms are all sufficiently popular when used together that I didn't find any reference to this bug. I was hoping for a quick fix from someone else, but I don't mind hacking PlanetPlanet myself if that's what it takes.

Anyone got a suggestion of where to start on a fix?

Edit: Just because I saw someone linking this on twitter, I'll update in the main post: tried Mary's suggestion of Planet Venus (see comments below) out on Monday and it seems to have done the trick, so hurrah!

## May 29, 2014

### Abhijeet Kislay(pgmpy)

#### Benchmarking OpenCV’s integration with Shogun

I have written some code for converting Mat objects from the OpenCV library to CDenseFeatures objects in the Shogun library. There are basically three methods of conversion: using for loops (i.e. the manual way), using the memcpy command, and using constructors. I have listed here the benchmarking output for the conversion of random matrices from Mat to […]

## May 24, 2014

#### Unobserved components models

The first model considered in the state space models GSoC 2015 project is the class of univariate unobserved components models. This blog post lays out the general structure and the different variations that will be allowed.

The basic unobserved components (or structural time series) model can be written (see Durbin and Koopman 2012, Chapter 3 for notation and additional details):

$$y_t = \underbrace{\mu_{t}}_{\text{trend}} + \underbrace{\gamma_{t}}_{\text{seasonal}} + \underbrace{c_{t}}_{\text{cycle}} + \underbrace{\varepsilon_t}_{\text{irregular}}$$

where different specifications for the different individual components can support a range of models.
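As one illustration of such a specification, the trend is often modelled as a local linear trend. This is the standard textbook form and is shown here only as an example, not necessarily the first variation the project will implement:

$$\mu_{t+1} = \mu_t + \nu_t + \xi_t, \qquad \xi_t \sim N(0, \sigma_\xi^2)$$

$$\nu_{t+1} = \nu_t + \zeta_t, \qquad \zeta_t \sim N(0, \sigma_\zeta^2)$$

Dropping the slope $\nu_t$ altogether gives the local level model, in which the trend is a pure random walk.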

## May 18, 2014

#### GSoC 2014 - State Space Models

This summer I will again be participating in Google Summer of Code for the Statsmodels Python project.

Last summer I worked on Threshold Autoregressions, breakpoint tests, and Markov Switching Models. These all broadly fell into the category of non-linear time series.

This summer I have a much tighter focus, that of integrating a high-performance multivariate Kalman filter into the time series package. The Kalman filter is an important tool used in conjunction with state space models; it filters the data to provide optimal estimates of the underlying (unobserved) state. It has many uses in engineering, optimal control, signal processing, econometrics, and others.
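To give a feel for what the filter does, here is a minimal pure-Python sketch of a Kalman filter for the simplest case, a univariate local level model $y_t = \mu_t + \varepsilon_t$, $\mu_{t+1} = \mu_t + \eta_t$. The function name, argument names and the large initial variance are illustrative assumptions; the project's filter is multivariate and written in Cython.

```python
def local_level_filter(y, sigma2_eps, sigma2_eta, a0=0.0, p0=1e6):
    """Return the filtered state means for a local level model."""
    a, p = a0, p0                # predicted state mean and variance
    filtered = []
    for obs in y:
        v = obs - a              # one-step-ahead forecast error
        f = p + sigma2_eps       # forecast error variance
        k = p / f                # Kalman gain
        a_filt = a + k * v       # filtered state mean
        p_filt = p * (1.0 - k)   # filtered state variance
        filtered.append(a_filt)
        # Predict the next period: the state is a random walk
        a, p = a_filt, p_filt + sigma2_eta
    return filtered


# With constant observations the filtered state settles at that constant
estimates = local_level_filter([1.0] * 50, sigma2_eps=1.0, sigma2_eta=0.1)
```

Each pass through the loop depends on the previous one, so the recursion cannot be vectorized away; in the multivariate case every scalar operation above becomes a small matrix operation.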

Statsmodels currently uses a univariate Kalman filter to estimate autoregressive / moving average (ARMA) models in its time series analysis package. It is written in Cython, and was recently updated to include direct calls to the underlying Fortran BLAS/LAPACK libraries for improved speed. My project will continue in this direction, but will give a full multivariate Kalman filter which will allow estimation of arbitrary State Space models.

Much of the underlying code for the filter itself has already been written and can be found at https://github.com/ChadFulton/pykalman_filter, and so the GSOC project is primarily in integration with Statsmodels, adding unit tests, and in developing a framework for the easy definition of state space models in Python, which can then be estimated with the filter. Time permitting, a variety of smoothers will be added as well.

Finally, a specific state space model will be developed for the Vector ARMA (VARMA) models.

## May 04, 2014

### Andres Vargas Gonzalez(Kivy)

#### Pointing godaddy domain to an aws ec2

You just bought a domain and don’t know how to point it to your server? Just follow these easy steps to make it possible:

First we need to set up AWS to provide an IP address for your DNS settings.

1. On EC2 Management console you will have a vertical menu on the left hand side.
2. Under “NETWORK & SECURITY” group click on “Elastic IPs”.
3. On the top menu you will see a blue button “Allocate New Address”; click on it.
4. Just be sure “EIP used in” is set to “EC2”, then click “Yes, Allocate”.
5. A new IP address will be created in the table; select it by clicking on the empty square to the left of the name.
6. Now click on “Associate Address”. In the pop-up, click on instance and select the instance you would like to associate with this IP.
7. Finally click “Associate” and that’s it. From now on, to access via SSH, FTP, etc. you will need to use the new elastic IP.

On the godaddy side we will set the “Points to” address to the new elastic IP.

2. Under the upper menu click “Domains” and then click “Manage my Domains”.
3. Select the domain you would like to change by clicking the link to the domain on the table under “Domain Name” column.
4. In Domain Details there are three tabs, you should click on “DNS Zone File”.
5. Under A(Host) , click on “Edit Record” at the end in “Actions” column.
6. Now change the value on the field “Points to” with the elastic ip of your amazon ec2 instance.
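Once the record has been saved, one way to check that the change has taken effect is to resolve the domain and compare it with the elastic IP. The sketch below is only an illustration: the function name is hypothetical, and the domain and address in the usage note are placeholders to be replaced with your own.

```python
import socket


def domain_points_to(domain, expected_ip, resolve=socket.gethostbyname):
    """Return True when `domain` currently resolves to `expected_ip`."""
    try:
        return resolve(domain) == expected_ip
    except socket.gaierror:
        # Name not resolvable (yet): the record may not have propagated
        return False
```

For example, `domain_points_to("example.com", "203.0.113.10")` returns True only once the A record resolves to that address; DNS changes can take a while to propagate.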

## April 26, 2014

### Terri Oda (PSF Org admin)

#### Mailman 3.0 Suite Beta!

I'm happy to say that...

Mailman 3.0 suite is now in beta!

As many of you know, Mailman's been my open source project of choice for a good many years. It's the most popular open source mailing list manager with millions of users worldwide, and it's been quietly undergoing a complete re-write and re-working for version 3.0 over the past few years. I'm super excited to have it at the point where more people can really start trying it out. We've divided it into several pieces: the core, which sends the mails, the web interface that handles web-based subscriptions and settings, and the new web archiver, plus there's a set of scripts to bundle them all together. (Announcement post with all the links.)

While I've done more work on the web interface and a little on the core, I'm most excited for the world to see the archiver, which is a really huge and beautiful change from the older pipermail. The new archiver is called Hyperkitty, and it's a huge change for Mailman.

You can take a look at hyperkitty live on the fedora mailing list archives if you're curious! I'll bet it'll make you want your other open source lists to convert to Mailman 3 sooner rather than later. Plus, on top of being already cool, it's much easier to work with and extend than the old pipermail, so if you've always wanted to view your lists in some new and cool way, you can dust off your django skills and join the team!

Do remember that the suite is in beta, so there's still some bugs to fix and probably a few features to add, but we do know that people are running Mailman 3 live on some lists, so it's reasonably safe to use if you want to try it out on some smaller lists. In theory, it can co-exist with Mailman 2, but I admit I haven't tried that out yet. I will be trying it, though: I'm hoping to switch some of my own lists over soon, but probably not for a couple of weeks due to other life commitments.

So yeah, that's what I did at the PyCon sprints this year. Pretty cool, eh?

## March 29, 2014

### Terri Oda (PSF Org admin)

Sparkfun has a bunch of Arduinos on crazy sale today, and they're allowing backorders. It's a one day sale, ending just before midnight US mountain time, so you've still got time to buy your own! Those $3 minis are amazing.

I wound up buying the maximum amount I could, since I figure if I don't use them myself, they'll make nice presents. I have plans for two of the mini ones already, as part of one of my rainy day projects that's only a little past drawing board and into "let's practice arduino coding and reading sensor data" stage. But the rest are waiting for new plans!

I feel a teensy bit guilty about buying so many arduinos when I haven't even found a good use for the Raspberry Pi I got at PyCon last year. I did buy it a pretty rainbow case and a cable, but my original plan to use it as the brains for a homemade cnc machine got scuttled when John went and bought a nice handybot cnc router.

A pretty picture of the pibow rainbow raspberry pi case from this most excellent post about it. They're on sale today too if you order through pimoroni

I've got a few arty projects with light that might be fun, but I kind of wanted to do something a bit more useful with it. Besides, I've got some arty blinky-light etextile projects that are going to happen first and by the time I'm done those I think I'll want something different.

And then there's the Galileo, which obviously is a big deal at work right now. One of the unexpected perks of my job is the maker community -- I've been hearing all about the cool things people have tried with their dev boards and seeing cool projects, and for a while we even had a biweekly meet-up going to chat with some of the local Hillsboro makers. I joined too late to get a chance at a board from the internal program, but I'll likely be picking one up on my own dime once I've figured out how I'm going to use it! (John already has one and the case he made for it came off the 3d printer this morning and I'm jealous!)

So... I'm looking for inspiration: what's the neatest arduino/raspberry pi/galileo/etc. project you've seen lately?

## March 02, 2014

### Terri Oda (PSF Org admin)

#### Google Summer of Code: What do I do next?

Python's in as a mentoring organization again this year, and I'm running the show again this year. Exciting and exhausting!

In an attempt to cut down on the student questions that go directly to me, I made a flow chart of "what to do next":

(there's also a more accessible version posted at the bottom of our ideas page)

I am amused to tell you all that it's already cut down significantly on the amount of "what do I do next?" emails I've gotten as an org admin compared to this time last year. I'm not sure if it's because it's more eye-catching or better placed or what makes it more effective, since those instructions could be found in the section for students before. We'll see if its magical powers hold once the student application period opens, though!

## June 12, 2013

### Andres Vargas Gonzalez(Kivy)

#### Install GTSVM in Ubuntu using CUDA 5.0

In order to install this library for fast svm calculation you must download the src from: http://ttic.uchicago.edu/~cotter/projects/gtsvm/. Once downloaded you should type:

tar -xvzf gtsvm_src.tgz
cd gtsvm_src
make

After this you will probably have this error:

headers.hpp cuda_runtime.h no such file or directory

Solution: Open the Makefile and add the path to your cuda_runtime.h file. On line 29:

DEFINE_FLAGS := -I/usr/local/cuda-5.0/include/

That is the path in my installation. You should also comment out line 24 (the “mex” subfolder). After this you should be able to compile. If you want an even faster svm regardless of the accuracy, add -use_fast_math at the beginning of line 36:

NVCC_FLAGS := -use_fast_math

Then a problem could come up:

/usr/bin/ld -lcudart cannot find

Solution: Find the path to libcudart.so; you can use locate libcudart.so. Then add this path to lines 71 and 88 in:

LINKER_FLAGS := -L/usr/local/cuda-5.0/lib64

After this just type make and you should be able to compile the source code.

## May 31, 2013

### Andres Vargas Gonzalez(Kivy)

#### Install Pyopencl and CUDA 5.0 on Ubuntu 13.04 64 bits using nvidia optimus with Bumblebee and Primus

The main motivation for this post is how difficult it was for me to run pyopencl on my fresh ubuntu 13.04 installation. First of all, nvidia drivers don’t work well on ubuntu; I am still unable to run nvidia-settings in order to change xorg.conf to run ubuntu-desktop with the nvidia card. Let’s start sharing what I did to get pyopencl programs running on ubuntu.

Since my graphics card is optimus enabled, I followed this wonderful post in which this guy explains how to use your discrete nvidia card to run steam for linux. He states that you should NOT install nvidia-drivers directly, so you should have a clean installation. Basically, to make optimus work Bumblebee should be installed in our system. These are the summarized steps from cjenkins blog:

### Bumblebee and Primus installation with nVidia proprietary driver

sudo add-apt-repository ppa:bumblebee/stable
sudo add-apt-repository ppa:ubuntu-x-swat/x-updates
sudo apt-get update
sudo apt-get install bumblebee bumblebee-nvidia
sudo shutdown -r now

To run programs with the nvidia card you need to type optirun in a terminal, followed by the name of the program you would like to run, such as:

optirun glxspheres
optirun glxgears

You can check how much the performance increases by running them without optirun, on your low power graphics card.

If you want to get even better performance install Primus:

sudo add-apt-repository ppa:zhurikhin/primus
sudo apt-get update
sudo apt-get install primus

### Test Primus

vblank_mode=0 optirun -b primus glxspheres

### (Optional and recommended: use latest nvidia-drivers)

Just in case this does not work for you or you want to run bumblebee with the latest nvidia drivers, you can try this post. In summary, the author installed the latest nvidia-drivers at that time (nvidia-experimental-310) and then changed the configuration files for bumblebee and for primus as well. I have to say that I followed the steps in the same way, but I hope someone else tries with the latest nvidia-drivers.

sudo apt-get install nvidia-310-updates nvidia-experimental-310 nvidia-settings-310-updates

Modify bumblebee configuration file:

sudo vim /etc/bumblebee/bumblebee.conf

- on line 22, make sure “Driver=” is set to “nvidia”, like this: Driver=nvidia
- change the “KernelDriver=” (on line 55) to “nvidia-experimental-310”, like this: KernelDriver=nvidia-experimental-310
- change “LibraryPath=” (on line 58) to “/usr/lib/nvidia-experimental-310:/usr/lib32/nvidia-experimental-310”, so it looks like this: LibraryPath=/usr/lib/nvidia-experimental-310:/usr/lib32/nvidia-experimental-310
- change the “XorgModulePath=” (line 61) to “XorgModulePath=/usr/lib/nvidia-experimental-310/xorg,/usr/lib/xorg/modules” so it looks like this: XorgModulePath=/usr/lib/nvidia-experimental-310/xorg,/usr/lib/xorg/modules

Restart Bumblebee

sudo service bumblebeed restart

Logout and Login and try

optirun glxspheres

If you are using primus, modify the script /usr/bin/primusrun in line 16, replacing nvidia-current with the nvidia-driver you installed. Same in line 27. After this you should be able to run bumblebee and primus together and get the best from your graphics card.

### Installing CUDA toolkit

Now that our nvidia drivers are working, the next step will be to install the CUDA toolkit to work with it. I will summarize this excellent post in the following steps:

Downloads:

- CUDA pack from developer.nvidia.com
- pyopencl from https://pypi.python.org/pypi/pyopencl

I was just interested in pyopencl so I will focus on that part. Then in a terminal:

sudo vim /etc/environment

==> and add to the PATH line the following: ‘:/usr/local/cuda-5.0/bin’

sudo vim /etc/ld.so.conf

==> and add lines: /usr/local/cuda-5.0/lib and /lib

sudo vim /etc/bash.bashrc

==> add lines to the end

export PATH=/usr/local/cuda-5.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-5.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/lib/:$LD_LIBRARY_PATH

Then run the following commands:

sudo ldconfig
sudo apt-get install freeglut3-dev python-opengl python-pytools python-setuptools python-numpy libboost1.48-all-dev
sudo apt-get install build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
sudo ln -s /usr/lib/x86_64-linux-gnu/libglut.so /usr/lib/libglut.so
sudo sh cuda_5.0.35_linux_64_ubuntu11.10-1.run


Once the installer is running there are some issues I had. The first one: don’t install the nvidia-drivers from the pack, just type no at that point of the installation.
Type y for CUDA toolkit and SAMPLES (optional).

The reason why you should not install the nvidia-drivers from the pack is that you will probably get this error:

‘/lib/modules/3.8.0-22-generic/build/include/linux/version.h’ does
not exist.  The most likely reason for this is that the kernel
source files in ‘/lib/modules/3.8.0-22-generic/build’ have not been
configured.

The next possible error with the toolkit is the following one:

Unsupported compiler: 4.7.3

SOLUTION:

sudo apt-get install gcc gcc-4.4
sudo update-alternatives --remove-all gcc
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.4 20
sudo update-alternatives --config gcc
gcc --version


Error:
Missing required library libglut.so

SOLUTION:

sudo ln -s /usr/lib/x86_64-linux-gnu/libglut.so.3 /usr/lib/libglut.so


### Installing PyOpenCL

Once you have CUDA installed you should proceed to install pyopencl:

python configure.py   --boost-inc-dir=/usr/include/boost   --boost-lib-dir=/usr/lib   --no-use-shipped-boost   --cl-inc-dir=/usr/local/cuda-5.0/include/   --cl-lib-dir=/usr/lib/nvidia-310-updates/  --cl-libname=OpenCL
make
sudo make install


After I managed to install pyopencl successfully, one last error showed up:

SOLUTION:

If you just have this one

ln -s libnvidia-opencl.so.310.44 libnvidia-opencl.so.1
echo libnvidia-opencl.so.1 | sudo tee /etc/OpenCL/vendors/nvidia.icd
cd examples
optirun python benchmark.py


After this you should be able to run pyopencl scripts with optirun
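A quick way to confirm that the installation works is to list the OpenCL platforms and devices that pyopencl can see. This is a minimal sketch: `describe_platforms` and `main` are hypothetical helper names, while `get_platforms()`, `get_devices()` and the `name` attributes are part of the pyopencl API.

```python
def describe_platforms(platforms):
    """Return one line per platform, followed by its indented device names."""
    lines = []
    for platform in platforms:
        lines.append(platform.name)
        for device in platform.get_devices():
            lines.append("  " + device.name)
    return lines


def main():
    # Imported here so the helper above can be used without pyopencl installed
    import pyopencl as cl
    print("\n".join(describe_platforms(cl.get_platforms())))
```

Save it to a file that calls `main()` and run that file with optirun, as with benchmark.py above; the nvidia platform and your graphics card should appear in the output.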

## October 25, 2012

### Lucas van Dijk(VisPy)

#### Our 8x8x8 RGB LED Cube

At Delft University of Technology, we're busy with a huge RGB LED cube. The complete cube will consist of multiple 8x8x8 RGB LED cubes, each controlled by a raspberry pi. Here are the first videos of a working single 8x8x8 cube.

## October 16, 2012

### Lucas van Dijk(VisPy)

#### Controlling a servo with an AVR

Hi there, and welcome to my first AVR tutorial on this site. We'll be doing something basic today, namely controlling a servo. There are a lot of tutorials on how to control it with an Arduino, but fewer tutorials using only a bare AVR chip. In this tutorial we'll be using the ATTiny44, a small and cheap microcontroller, which also contains a 16 bit timer, which will make our life a bit easier.

Servos are often used to move robot arms and the like, because they can rotate by a specific number of degrees very precisely, depending on the pulse width you feed them with the microcontroller. They can also be used as a motor (you'll need special 'continuous rotation' servos for that); you'll often find them in RC cars.
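The relation between angle and pulse width can be sketched with a little arithmetic. The numbers below are assumptions, not values taken from the tutorial: many hobby servos map roughly 1 ms to 2 ms pulses onto their travel, and the timer clock (1 MHz by default here) depends on how the chip is configured, so check your servo's datasheet and your clock settings. The function names are hypothetical.

```python
def servo_pulse_us(angle_deg, min_us=1000.0, max_us=2000.0, max_angle=180.0):
    """Pulse width in microseconds for a target angle (linear mapping)."""
    if not 0.0 <= angle_deg <= max_angle:
        raise ValueError("angle out of range")
    return min_us + (max_us - min_us) * angle_deg / max_angle


def timer_ticks(pulse_us, f_cpu_hz=1000000, prescaler=1):
    """Timer counts for a pulse, given the timer clock f_cpu_hz / prescaler."""
    return int(round(pulse_us * f_cpu_hz / (prescaler * 1e6)))
```

With these assumed values, the middle position (90 degrees) corresponds to a 1500 microsecond pulse, which is the number a 16 bit timer compare register would be loaded with at a 1 MHz timer clock.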

So let's get started, and see how you actually control a servo!

## June 14, 2012

### Lucas van Dijk(VisPy)

#### Multithreading with C++11: Protecting data

It took a little bit longer than expected, but we're back! Welcome to this second part in a series of articles about multithreading with C++11. In the previous part, I briefly explained what a thread is, and how to create one with the new C++ thread library. This time, we will be writing a lot more code, so open up your favourite IDE if you want to try the examples while you're reading. ;)

In the previous article we also saw that sometimes, the output wasn't completely right when running multiple threads simultaneously. Today, we'll see that there are some other problems with sharing a resource between threads, and of course, provide some solutions to these problems.

## May 03, 2012

### Lucas van Dijk(VisPy)

#### Introduction to threads with C++11

The free lunch is over. The time that our complex algorithm was running extremely slow on a computer, but ran extremely fast a few years later, because the processor speeds exploded, is gone. The trend with current processors is to add more cores, rather than increasing the clock frequency.

As a programmer, you should be aware of this. Of course, processors will keep getting faster each year, but more slowly than before. Currently, a lot of programs can benefit the most by using multiple threads, because of today's multicore processors.

In this article I'll briefly explain what a thread is, and how you can create them with the new threading library in C++11. I'm planning to write multiple articles about this topic, each going a little bit more in depth.