Python's Summer of Code 2015 Updates

July 06, 2015

Mridul Seth(NetworkX)

GSOC 2015 PYTHON SOFTWARE FOUNDATION NETWORKX BIWEEKLY REPORT 3

Hello folks, this blog post will cover the work done in week 5 and week 6.

As decided in #1592, G.degree() is now a hybrid of the old G.degree() and G.degree_iter(). This is implemented in #1617 and merged into the iter_refactor branch. We also decided (#1591) to keep the old interface of G.adjacency_iter() for the new method G.adjacency() and to remove G.adjacency_list() from the Di/Multi/Graphs classes. The methods G.nodes_with_selfloops() and G.selfloop_edges() now return an iterator instead of a list (#1634).

And with these changes merged into the iter_refactor branch, the work on the core Di/Multi/Graphs classes is done. We plan to do an extensive review before merging it into the master branch, which will also require a review of the documentation.

Just a recap:

G.func() now works like G.func_iter() did, and the original G.func() is gone: only iterators, no lists. Here func is one of (nodes, edges, neighbors, successors, predecessors, in_edges, out_edges). And G.degree() now returns the degree of the node if a single node is passed, and works like G.degree_iter() if a bunch of nodes (or nothing) is passed. Same behaviour for in_degree() and out_degree().
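The hybrid behaviour described above can be sketched in plain Python. This is not NetworkX's actual implementation, just a minimal illustration of the "single node gives a value, a bunch of nodes (or nothing) gives an iterator" pattern:

```python
# Minimal sketch of the hybrid degree() behaviour (NOT NetworkX's real code).

class TinyGraph:
    def __init__(self):
        self.adj = {}  # node -> set of neighbours

    def add_edge(self, u, v):
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)

    def degree(self, nbunch=None):
        # Single node: return its degree directly, like the old G.degree(n).
        if nbunch is not None and not isinstance(nbunch, (list, tuple, set)):
            return len(self.adj[nbunch])
        # A bunch of nodes, or nothing: behave like the old degree_iter().
        nodes = self.adj if nbunch is None else nbunch
        return ((n, len(self.adj[n])) for n in nodes)

g = TinyGraph()
g.add_edge(1, 2)
g.add_edge(1, 3)
print(g.degree(1))         # degree of a single node -> 2
print(sorted(g.degree()))  # iterator of (node, degree) pairs
```

Note that real NetworkX distinguishes a single node from a container of nodes more carefully; the `isinstance` check here is only a shortcut for the sketch.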

The summer is going really fast. Mid-term evaluations are already done. I passed :)

Cheers!

PS: I wrote something regarding these changes at http://mseth.me/moving-from-networkx-1-xx-to-networkx-2-0/

AMiT Kumar(Sympy)

GSoC : This week in SymPy #6

Hi there! It's been six weeks into GSoC, which marks the halfway point. The mid-term evaluations are done now, and Google was pretty quick about it: I received the passing mail within 15 minutes of the deadline for filling in evaluations, so basically the GSoC admin did the following (I guess):

SELECT * FROM GSoCStudents
WHERE EvaluationResult='PASS';

and SendThemMail


(Don't Judge my SQL, I am not good at it!)

Progress of Week 6

Last week my Linsolve PR #9438 finally got merged. Thanks to @hargup @moorepants @flacjacket @debugger22 for reviewing it and suggesting constructive changes.

This week I worked on intersections of a FiniteSet with symbolic elements, which was a blocking issue for a lot of things. I managed to fix the failing test which I mentioned in my last post. Eventually this PR got merged as well, which has opened doors for a lot of improvements.

Thanks to @jksuom & @hargup for iterating over this PR and making some very useful comments to get it into mergeable shape.

I had a couple of hangout meetings with @hargup this week (though he has now left for SciPy for a couple of weeks). We discussed the further plan for making solveset more robust, such as returning the domain of the invert when calling invert_real; see #9617.

Motivation for this:

In [8]: x = Symbol('x', real=True)

In [9]: n = Symbol('n', real=True)

In [12]: solveset(Abs(x) - n, x)
Out[12]: {-n, n}


The solution returned above is not actually complete unless we somehow state that n should be positive for the output set to be non-empty. See #9588.
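One way to make the assumption explicit is to declare n positive when constructing the symbol. A hedged sketch, assuming a recent SymPy where solveset handles Abs over the reals (behaviour at the time of this post may differ):

```python
from sympy import Abs, S, Symbol, solveset

x = Symbol('x', real=True)
n_pos = Symbol('n', positive=True)  # positivity stated up front

# With n declared positive, {-n, n} is a genuinely complete solution:
# Abs(x) = n then always has exactly these two real roots.
sol = solveset(Abs(x) - n_pos, x, S.Reals)
print(sol)
```

With the positivity assumption in place, the returned set is valid unconditionally, which is exactly what the unadorned real symbol above fails to guarantee.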

from __future__ import plan Week #7:

This week I plan to work on making invert_real more robust.

Relevant Issues:

$ git log

PR #9618 : Add test for solveset(x**2 + a, x) issue 9557
PR #9587 : Add Linsolve Docs
PR #9500 : Documenting solveset
PR #9612 : solveset return solution for solveset(True, ..)
PR #9540 : Intersection's of FiniteSet with symbolic elements
PR #9438 : Linsolve
PR #9463 : ComplexPlane
PR #9527 : Printing of ProductSets
PR #9524 : Fix solveset returned solution making denom zero

That's all for now, looking forward to week #7. :grinning:

July 05, 2015

Jazmin's Open Source Adventure

Quick Update - Thursday, 2 July 2015

Quick update! Today, I:

1) Pushed a PR with functions and example notebooks for airmass and parallactic angle plots.
2) Worked on plot_sky issues.

Jaakko Leppäkanga(MNE-Python)

ICA

I somehow managed to catch a cold in the middle of the summer, so this last week I've been working at half strength, but I got at least something done. The browser family got a new member as the ICA plotter was added. It basically uses the same functions as mne browse raw, but with small modifications. This meant heavy refactoring of the code I wrote earlier in June. I also made some smaller fixes to existing code. The next step is to add a function for plotting independent components as epochs.

Udara Piumal De Silva(MyHDL)

Conversion error

Today I tried to convert my design into Verilog and VHDL, since I need to verify my design on hardware. Unfortunately I found several issues that prevent the conversion from happening. There are mainly two types of errors: one says that a signal has multiple drivers, and the other says that a signal in multiple lists is not supported:

Signal in multiple list is not supported: activeRow_x[1]

I plan to contact my mentors regarding this issue and fix it before I develop the controller further.

Christof Angermueller(Theano)

GSoC: Week four and five

Theano graphs become editable! By clicking on nodes, it is now possible to change their label. This allows shortening default labels or extending them with additional information.
Moving the cursor over nodes will now also highlight all incoming and outgoing edges. You can find three examples here. I started to work on curved edges that minimize intersections with nodes, but everything is still in development. Apart from that, I fixed a couple of bugs and revised the backend to support visualizing more detailed graph information in the future, such as timing information or nested graphs. I welcome any feedback and ideas to further improve the visualization!

July 04, 2015

Udara Piumal De Silva(MyHDL)

Load Mode Programming

Today the load mode programming of the model was improved so that the model output shows in which mode the SDRAM is used. In my particular case the controller is using the SDRAM in burst write mode with a CAS latency of 3 cycles and a burst length of 1. Apart from the load mode programming, an initial auto_refresh command was also added to the state machine. However, refreshing of the SDRAM still needs most of the work.

Pratyaksh Sharma(pgmpy)

Wait, how do I order a Markov Chain? (Part 1)

Clearly, not any Markov chain would do. At the expense of sounding banal, let me describe (again) what will fit our bill. We wish to sample from a Bayesian network (given some evidence) or from a Markov network, both of which are in general hard to sample from. Till now, we've figured out that using a Markov chain can solve our worries. But the question still remains: how do we come up with the right Markov chain?

We want precisely one property: that the Markov chain have the same stationary distribution $\pi$ as the distribution $P(\textbf{X}|\textbf{E}=\textbf{e})$ we wish to sample from.

Factored State Space

First, we define the states of our Markov chain. Naturally, as we want our samples to be instantiations of the variables of our model (Bayesian network or Markov network), we let our states be these instantiations.
Each state of the Markov chain now represents a complete assignment (a particle). At first it would seem that we would have to maintain an exponentially large number of objects, but as it turns out, that isn't really required: we just maintain the current state and modify it as we perform a run.

Multiple Kernels

In theory, we can define a single transition function $\mathcal{T}$ that takes the current state and gives the probability distribution over the next states. But in practice it is more convenient to work with multiple transition models, one per variable. We have a transition model $\mathcal{T}_i$ for the $i$th variable of the model. To simulate a run of the Markov chain, we:

1. Start with a starting state $S_0$, which is a complete assignment to all variables in the model.
2. In a certain order of variables, transition to the next state (assignment) of that variable.
3. Do this for all variables in a pre-defined order. This completes a single step of our random walk and generates a single sample.

Repeat the above steps with the sampled state as the new starting state, and we have our sampling algorithm. I haven't yet described how we are supposed to get the ideal $\mathcal{T}_i$s; I'll probably save that for the next post. Till then, check out the implementation of the above here.

Chienli Ma(Theano)

Evaluation Passed and the Next Step: OpFromGraph

The PR of function.copy() is ready to be merged; we only need Fred to fix a small bug. And this Friday I passed the mid-term evaluation, so it's time to take the next step. In the original proposal, the next step was to swap output and updates. After a discussion with Fred, we decided this feature is not useful, so we skip it and head directly to the next feature: OpFromGraph.

Goal: make the class OpFromGraph work.

Big how? OpFromGraph should init a gof.Op that is no different from other Ops and can be optimized; otherwise it makes no sense.
For this, we need to make it work on the GPU, make sure it works with C code, and document it. Make sure infer_shape() and grad() work with it; ideally, make R_op() work too.

Detailed how:

• Implement the __hash__() and __eq__() methods so it is a basic Op.
• Implement the infer_shape() method so that it's optimizable.
• Test if it works with a shared variable as input, and if not, make it work. Add a test for that.
• Move it correctly to the GPU. We can do it quickly for the old back-end: move all float32 inputs to the GPU. Otherwise, we need to compile the inner function, see which inputs get moved to the GPU, then create a new OpFromGraph with the corresponding inputs on the GPU. #2982
• Make grad() work. This should remove the grad_depth parameter.

First step, infer_shape: the main idea is to calculate the shapes of the outputs from given input shapes. This is a process similar to executing a function: we cannot know the shape of a variable before knowing the shapes of the variables it depends on. So we can mimic the make_thunk() method to calculate the shapes from output to input. I have a draft now, and need some help with the test case.

Zubin Mithra(pwntools)

Tests for AMD64 and aarch64

This week I've been working on adding an integration test into srop.py for AMD64. You can see the merged PR here. Writing an integration test involves writing mako templates for read and sigreturn. I've also been working on setting up an AARCH64 qemu machine with proper networking settings. Next week, I'll be working on getting AARCH64 merged in along with its doctest, and the rest of the integration tests.

Isuru Fernando(SymPy)

GSoC Week 6

This week, I worked on improving the testing and making Sage wrappers. First, building with Clang had several issues and these were not tested. One issue was a Clang bug when the -ffast-math optimization is used. This flag makes floating point arithmetic perform better, but it may do arithmetic not allowed by the IEEE floating point standard.
Since it performs faster, we have enabled it in Release mode, and due to a bug in Clang, a compiler error is given saying error: unknown type name '__extern_always_inline'. This was fixed by first checking in CMake whether the error occurs and, if so, adding the flag -D__extern_always_inline=inline. Another issue was that the type_traits header was not found. This was fixed by upgrading the standard C++ library, libstdc++.

This week, I finished the wrappers for Sage. Now converters to and from Sage can be found at sage.symbolic.symengine. For this module to convert using the C++ level members, the definitions of the classes in symengine_wrapper.pyx were taken out and declared in symengine_wrapper.pxd and implemented in the pyx file. To install symengine in Sage, https://github.com/sympy/symengine/issues/474 has to be resolved. A CMake check will be added to find whether this issue exists, and if so, the flag -Wa,-q will be added to the list of flags.

We have to make a release of symengine if we are to make spkgs to install symengine in Sage, so some of my time next week will go into getting symengine ready for a release and then making spkgs for everyone to try out symengine.

Shivam Vats(SymPy)

GSoC Week 6

I successfully passed my mid-term evaluation this week, and now the second half of my project has begun! It has been a challenging journey so far that has made me explore new algorithms (some very ingenious) and read a lot of code (much more difficult than I had imagined). This week Mario, whose code I am working on, helped in a big way by showing how and where to improve the algorithms. It is clear now that all the functions need to guarantee the order of the series they output. We were planning to keep it optional, but since ring_series functions call each other, an error in the order will propagate and eventually become unpredictable.

So Far

PR 9575 is ready for merge except for a bug in sympify that converts PythonRational into float. I had a discussion with Mario on PR 9495.
He has suggested a lot of improvements for dealing with fractional exponents, especially the fact that Newton's method may not be ideal in these cases. It is very interesting to try and compare different algorithms and come up with ways to optimise them for our needs. The scope for improvement is immense, and we'll need to decide the order in which we push the optimisations.

I started writing the RingSeries class for series evaluation in PR 9614. I was supposed to update my blog yesterday, but I dozed off while working on it. According to my current approach, I am writing classes for all the functions so that they can be represented symbolically. Another issue that needs to be tackled is the expansion of nested functions, things like sin(cos(sin(x))). This will need some work, as there are many approaches to tackle it. Currently, I evaluate the inner functions (if they exist) recursively with prec + 1. This will work in simple cases, but not if there are cancellations, e.g. sin(cos(x)) - sin(cos(x**2)).

Next Week

• Get PR 9575 merged.
• Improve PR 9495 and get it merged.
• Finalise the series class hierarchy and the series evaluation function.

The next phase of my project is in Symengine, and there have been a lot of improvements and changes there. I will need to play with the new stuff and perhaps also think of ways to port ring_series there. Cheers!

July 03, 2015

Lucas van Dijk(VisPy)

GSoC 2015: Midterm summary

Hi all! It's midterm time, and therefore it is time for a summary. What did I learn these past few weeks, and what were the main road blocks?

What I learned

This is my first project where I use OpenGL, and a lot has become clearer about how this system works: the pipeline, the individual shaders and GLSL, and how they're used for drawing 2D and 3D shapes. Of course, I've only scratched the surface so far, but this is a very good basis for more advanced techniques.
I've also learned about some mathematical techniques for drawing 2D sprites, and gained a bit more Git experience in a situation where I'm not the only developer of the repository. This has been a great experience, and the core developers of Vispy are very active and responsive.

Challenges

I was a bit fooled by my almost lecture-free college schedule in May/June, but the final personal assignments were a bit tougher and bigger than expected, so combining GSoC with all these study assignments was sometimes quite a challenge. But the college year is almost over, and after next week I can focus 100% on GSoC.

In terms of code: I don't think I've encountered really big roadblocks. It maybe took a bit more time before every piece of a lot of shader code fell together, but I think I'm starting to get a good understanding of both the Vispy architecture and OpenGL.

Past week

The past week I've been trying to flesh out the requirements for the network API a bit, and I've also been investigating the required changes for the arrow head visual, because there's a scenegraph and visual system overhaul coming: https://github.com/vispy/vispy/pull/928. Until next time!

Abraham de Jesus Escalante Avalos(SciPy)

Mid-term summary

Hello all,

We're reaching the halfway mark for GSoC and it's been a great journey so far. I have had some off-court issues. I was hesitant to write about them because I don't want my blog to turn into me ranting and complaining, but I have decided to mention them briefly on this occasion because they are relevant, and at this point they are all but overcome. Long story short, I was denied the scholarship that I needed to be able to go to Sheffield, so I had to start looking for financing options from scratch. Almost at the same time, I was offered a place at the University of Toronto (which was originally my first choice).
The reason this is relevant to GSoC is that it coincided with the beginning of the program, so I was forced to cope not just with the summer of code but also with searching and applying for funding, plus paperwork for the U of T, which combined to make for a lot of work and a tough first month.

I will be honest and say that I got a little worried around weeks 3 and 4 because things didn't seem to be going the way I had foreseen in my proposal for GSoC. In my previous post I wrote about how I had to change my approach, and I knew I had to commit to it so it would eventually pay off. At this point I am feeling pretty good about the way the project is shaping up. As I mentioned, I had to make some changes, but of about 40 open issues, only 23 now remain; I have lined up PRs for another 8, and I have started discussion (either with the community or with my mentor) on almost all that remain, including some of the longer ones like NaN handling, which will span the entire scipy.stats module and is likely to become a long-term community effort depending on what road NumPy and Pandas take on this matter in the future.

I am happy to look at the things that are still left and find that I at least have a decent idea of what I must do. This was definitely not the case three or four weeks ago, and I'm glad about the decision I made when choosing a community and a project. My mentor is always willing to help me understand unknown concepts and point me in the right direction so that I can learn for myself, and the community is engaging and active, which helps me keep things going. My girlfriend, Hélène, has also played a major role in helping me keep my motivation when it seems like things amount to more than I can handle.

I realise that this blog (since the first post) has been a lot more about my personal journey than about technical details of the project.
I do apologise if this is not what you expect, but I reckon this makes it easier to appreciate for a reader who is not familiar with 'scipy.stats'; and if you are familiar with it, you probably follow the issues or the developers' mailing list (where I post a weekly update), so technical details would be redundant to you. I also think that the setup of the project, which revolves around solving many issues, makes it too difficult to write about specific details without branching into too many tangents for a reader to enjoy.

If you would like to know more about the technical aspect of the project, you can look at the PRs, contact me directly (via a comment here or the SciPy community), or even better, download SciPy and play around with it. If you find something wrong with the statistics module, chances are it's my fault; feel free to let me know. If you like it, you can thank guys like Ralf Gommers (my mentor), Evgeni Burovski and Josef Perktold (to name just a few of the most active members of 'scipy.stats') for their hard work and support of the community. I encourage anyone who is interested to go here to see my proposal, or here to see currently open tasks, to find out more about the project. I will be happy to fill you in on the details if you reach me personally.

Sincerely,
Abraham.

Sumith(SymPy)

GSoC Progress - Week 6

Hello. I received a mail a few minutes into typing this: I passed the mid-term review successfully :) It just left me wondering how these guys process so many evaluations so quickly; I do have to confirm with Ondřej about this. Anyway, the project goes on, and here is this week's summary.

Progress

SymEngine successfully moved to using Catch as a testing framework. The Travis builds for Clang were breaking, which led me to play around with Travis and Clang builds to fix this issue. The Linux Clang build used to break because we were mixing and linking libraries like GMP compiled with different standard libraries.
Thanks to Isuru for lending a helping hand and fixing it in his PR.

The next task was to make SYMENGINE_ASSERT not use the standard assert(), so I wrote a custom assert that simulates the internal assert. Now we can add DNDEBUG as a release flag when Piranha is a dependency; this was also done.

Work on the Expression wrapper started; a PR that picks up from Francesco's work was sent in.

I investigated the slowdown in benchmarks that I have been reporting in the last couple of posts. Using git bisect (amazing tool, good to see binary search in action!), the first bad commit was tracked down. We realized that the inclusion of the piranha.hpp header caused the slowdown, and this was resolved by including only mp_integer.hpp, the header actually required. With immense help from Francesco, the problem was cornered to this:

* Inclusion of thread_pool leads to the slowdown; a global variable that it declares is suspected to be the specific cause.
* In general, a multi-threaded application may cause some compiler optimizations to be turned off, hence the slowdown.
* Since this benchmark is memory-allocation intensive, another speculation is that the compiler allocates memory differently.

This SO question asked by @bluescarni should lead to very interesting developments. We have to investigate this problem and get it sorted, not only because we depend on Piranha, but also because we might have multi-threading in SymEngine later too.

Report

No benchmarking was done this week. Here is my PR report.

WIP
* #500 - Expression wrapper

Merged
* #493 - The PR with Catch got merged.
* #498 - Made SYMENGINE_ASSERT use a custom assert instead of assert(), and DNDEBUG as a release flag with Piranha.
* #502 - Make poly_mul use mpz_addmul (FMA); nice speedup of expand2b.
* #496 - En route to fixing SYMENGINE_ASSERT, a minor fix in one of the assert statements.
* #491 - Minor fix in compiler choice documentation.

Targets for Week 7

• Get the Expression class merged.
• Investigate and fix the slowdowns.
The rest of the tasks can be finalized in later discussions with Ondřej. That's all for this week. Ciao!

Yue Liu(pwntools)

GSOC2015 Students coding Week 05

week sync 10

Last week:

• issues #37: set ESP/RSP fixed, and a simple implementation of the migrate method.
• All testcases in issues #38 passed.
• All testcases in issues #39 passed.
• All testcases in issues #36 passed, but more testcases are needed.

Next week:

• Optimize and fix potential bugs.
• Add some doctests and pass the example doctests.

Keerthan Jaic(MyHDL)

GSoC Midterm Summary

So far, I've fixed a release-blocking bug, updated the documentation and revamped the core tests. Most of my pull requests have been merged into master. I've also worked on refactoring some of the core decorators and improving the conversion tests. However, these are not yet ready to be merged. In the second period, I will focus on improving the conversion modules. More details can be found in my proposal.

July 02, 2015

Manuel Paz Arribas(Astropy)

Mid-Term Summary

Mid-term has arrived and quite some work has been done for Gammapy, especially in the observation, dataset and background modules. At the same time I have learnt a lot about Gammapy, Astropy (especially tables, quantities, angles, times and FITS file handling), and Python (especially numpy and matplotlib.pyplot). But the most useful thing I'm learning is to produce good code via code reviews. The code review process is sometimes hard and frustrating, but very necessary in order to produce clear code that can be read and used by others.

This last week I have been working on a method to filter observation tables like the one presented in the figure in the first report. The method is intended to be used to select observations according to different criteria (for instance data quality, or within a certain region in the sky) that should be used for a particular analysis.
In the case of background modeling this is important in order to separate observations taken on or close to known sources from those taken far from them.

In addition, the observations can be grouped according to similar observation conditions, for instance observations taken under a similar zenith angle. This parameter is very important in gamma-ray observations. The zenith angle of the telescopes is defined as the angle between the vertical (zenith) and the direction where the telescopes are pointing. The smaller the zenith angle, the more vertically the telescopes are pointing, and the thinner the atmosphere layer. This has large consequences for the amount and properties of the gamma-rays detected by the telescopes.

Gamma-rays interact in the upper atmosphere and produce Cherenkov light, which is detected by the telescopes. The amount of light produced is directly proportional to the energy of the gamma-ray. In addition, the light is emitted in a narrow cone along the direction of the gamma-ray. At lower zenith angles the Cherenkov light has to travel a smaller distance through the atmosphere, so there is less absorption; this means that lower energy gamma-rays can be detected. At higher zenith angles the Cherenkov light of low-energy gamma-rays is totally absorbed, but the Cherenkov light cones of the high-energy ones are longer, and hence the section of ground covered is larger, so particles that fall further away from the telescopes can be detected, increasing the amount of detected high-energy gamma-rays. The zenith angle is maybe the most important parameter when grouping observations in order to produce models of the background.

The method implemented can filter the observations according to this (and other) parameters.
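The selection logic just described can be sketched in plain Python. This is a simplified stand-in, not the actual Gammapy method (which operates on Astropy observation tables); the record fields and the `inverted` flag below mirror the description, not the real API:

```python
# Simplified sketch of range-based observation filtering with an
# "inverted" flag, mirroring the behaviour described above.
# NOT the actual Gammapy implementation.

def filter_observations(obs_table, variable, vmin, vmax, inverted=False):
    """Keep rows whose `variable` lies in [vmin, vmax] (or outside, if inverted)."""
    selected = []
    for row in obs_table:
        inside = vmin <= row[variable] <= vmax
        if inside != inverted:  # XOR: flip the selection when inverted
            selected.append(row)
    return selected

# Dummy observation table: altitude angle in degrees (altitude = 90 - zenith).
obs_table = [
    {'obs_id': 1, 'alt': 50.0},
    {'obs_id': 2, 'alt': 65.0},
    {'obs_id': 3, 'alt': 88.0},
]

# Keep zenith angles of 20-30 deg, i.e. altitudes of 60-70 deg:
print(filter_observations(obs_table, 'alt', 60.0, 70.0))
# The inverted selection keeps everything outside that range:
print(filter_observations(obs_table, 'alt', 60.0, 70.0, inverted=True))
```

The same pattern extends to any column of the table (data quality, sky region, time), which is what makes a single generic filter method sufficient.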
An example using a dummy observation table generated with the tool presented in the first report is shown here (please click on the picture for an enlarged view). Notice that instead of the mentioned zenith angle, the altitude, i.e. the zenith's complementary angle (altitude_angle = 90 deg - zenith_angle), is used. In this case, the first table was generated with random altitude angles between 45 deg and 90 deg (0 deg to 45 deg in zenith), while the second table is filtered to keep only zenith angles in the range of 20 deg to 30 deg (60 deg to 70 deg in altitude).

The tool can be used to apply selections on any variable present in the observation table. In addition, an 'inverted' flag has been implemented in order to be able to keep the values outside the selection range instead of inside it.

Recapitulating the progress made until now, the next steps will be to finish the tools that I am implementing at the moment: the filter observations method described above and the background cube model class from the previous report. In both cases there is still some work to do: an inline application for filtering observations, and more methods to create cube background models. The big milestone is to have a working chain to produce cube background models from existing event lists within a couple of weeks.

Vito Gentile(ERAS Project)

Enhancement of Kinect integration in V-ERAS: Mid-term summary

This is my third report on what I have done for my GSoC project. If you don't know what it is about and want more information, please refer to this page and this blog post.

In this report, I will summarize what I have done until now, and also describe what I will do during the next weeks.

My project is about the enhancement of Kinect integration in V-ERAS, which was all based on C#, in order to use the official Microsoft API (SDK version: 1.8).
However, the whole ERAS source code is mainly written in Python, so the first step was to port the C# body tracker to Python using PyKinect. This also required rewriting all of the GUI (using PGU).

Then I integrated the height estimation of the user into the body tracker, using skeletal information to calculate it. This has been implemented as a Tango command, so that it can be executed by any device connected to the Tango bus. This feature will be very useful for scaling the avatar before starting a simulation in V-ERAS.

I have also taken a look at the webplotter module, which will be useful for the upcoming AMADEE mission to verify the effect of virtual reality interaction on users' movements. What I have done is edit the server.py script, which was not able to manage numpy arrays. These structures are used by PyTango for attributes defined as "SPECTRUM"; in order to correctly save user data as JSON, I had to add a custom JSON encoder (see this commit for more information).

What I am starting on now is perhaps the most significant part of my project: the implementation of user step estimation. At the moment, this feature lives in the V-ERAS Blender repository as a Python Blender script. The idea now is to change the architecture to be event-based: every time a Kinect frame with skeletal data is read by the body tracker, it will calculate the user's movements in terms of body orientation and linear distance, and push a new change event. This event will be read by a new module, being developed by Siddhant (another student participating in GSoC 2015 with IMS and PSF), to move a virtual rover (or any other humanoid avatar) according to the user's movements.

I have started developing the event-based architecture, and what I will start to do in these days is to integrate the step estimation algorithm, starting from the one that is currently implemented in V-ERAS Blender.
Then I will improve it, in particular the linear distance estimation; the body orientation is quite well calculated with the current algorithm, so although I will check its validity, hopefully it can simply be used as it is now.

The last stage of my project will be to implement gesture recognition, in particular the possibility to recognize whether the user's hands are closed or not. Recently I had to implement this feature in C# for a project that I am developing for my PhD research. With Microsoft Kinect SDK 1.8 it is possible by using KinectInteraction, but I am still not sure about the feasibility of this feature with only PyKinect (which is a sort of binding of the C++ Microsoft API). I will find out more about this in the next weeks.

I will let you know about every bit of progress in the next updates. Stay tuned!

Rafael Neto Henriques(Dipy)

[RNH Post #6] Mid-Term Summary

We are now at the middle of the GSoC 2015 coding period, so it is time to summarize the progress made so far and update the plan for the work of the second half of the program.

Progress summary

Overall a lot was achieved! As planned in my project proposal, during the first half of the coding period I finalized the implementation of the first version of the diffusion kurtosis imaging (DKI) reconstruction module. Moreover, some exciting extra steps were taken!

Accomplishing the first steps of the project proposal

1) The first achievement was merging the work done during the community bonding period into the main master Dipy repository. This work consisted of some DKI simulation modules that can be used to study the expected ground truth kurtosis values of white matter brain fibers. In this project, these simulations were useful to test the real brain DKI processing module. The documentation of this work can already be found on Dipy's website.

2) The second achievement was finalizing the procedures to fit the DKI model on real brain data.
This was done by inheriting from a module class already implemented in Dipy, which contains the implementation of the simpler diffusion tensor model (for more details see my previous post). Completion of the DKI fitting procedure was followed by the implementation of functions to compute the ordinary linear least squares fit solution of the DKI model. By establishing the inheritance between the DKI and diffusion tensor modules, code duplication was avoided and the standard diffusion tensor measures were automatically incorporated. The figure below shows an example of these standard measures obtained from the new DKI module after the implementation of the relevant fitting functions.

Figure 1 - Real-brain standard diffusion tensor measures obtained from the DKI module: the fractional anisotropy (FA), the mean diffusivity (MD), the axial diffusivity (AD), and the radial diffusivity (RD). The raw brain dataset used for the image reconstruction was kindly provided by Maurizio Marrale (University of Palermo).

3) Finally, on top of the developed DKI fitting functions, the standard kurtosis measures were implemented. These were based on the analytical solutions proposed by Tabesh and colleagues, which required, for instance, the implementation of sub-functions to rotate 4D matrices and to compute Carlson's incomplete elliptical integrals. Having implemented the analytical solutions of the standard kurtosis measure functions, I accomplished all the work proposed for the first half of the GSoC. Below I show the first real-brain kurtosis images reconstructed with the newly implemented modules.

Figure 2 - Real-brain standard kurtosis tensor measures obtained from the DKI module: the mean kurtosis (MK), the axial kurtosis (AK), and the radial kurtosis (RK). The raw brain dataset used for the image reconstruction was kindly provided by Maurizio Marrale (University of Palermo).
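To give a feel for what an ordinary least squares DKI fit estimates, here is a toy single-direction sketch (my own illustration, not Dipy's implementation): the kurtosis signal representation is ln S(b) = ln S0 - b*D + b^2*D^2*K/6, so an OLS quadratic fit in b recovers the apparent diffusivity D and kurtosis K.

```python
import numpy as np

# synthetic single-direction DKI signal with known ground truth
S0, D, K = 100.0, 1.0, 0.8                # D in um^2/ms, K dimensionless
b = np.array([0.0, 0.5, 1.0, 1.5, 2.0])  # b-values in ms/um^2
S = S0 * np.exp(-b * D + (b ** 2) * (D ** 2) * K / 6.0)

# OLS fit of ln S as a quadratic in b: c2*b^2 + c1*b + c0
c2, c1, c0 = np.polyfit(b, np.log(S), 2)

D_fit = -c1                   # linear coefficient is -D
K_fit = 6.0 * c2 / D_fit**2   # quadratic coefficient is D^2 * K / 6
```

On noiseless data this recovers the ground truth; the full DKI model generalizes this to all gradient directions via the diffusion and kurtosis tensors.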
Extra steps accomplished

Some extra steps were also accomplished during the first half of the GSoC program. In particular, based on the feedback I received at the International Society for Magnetic Resonance in Medicine (ISMRM) conference (see my fourth post), I decided to implement an additional DKI fitting solution: the weighted linear least squares DKI fit. This fit is considered one of the most robust fitting approaches in the recent DKI literature (for more details see my previous post). Having this method implemented, I am ensuring that the new Dipy DKI modules follow the most advanced DKI state of the art. To show how productive the ISMRM conference was for the project, I am sharing a photo that I took at the conference with one of the lead developers of Dipy, Eleftherios Garyfallidis.

Figure 3 - Photo taken at the ISMRM conference - I am wearing the Dipy T-shirt on the right side of the photo, and on the left side you can see the lead Dipy developer Eleftherios Garyfallidis.

Next steps

After discussing with my mentor, we agreed that we should dedicate more time to the first part of the project proposal, i.e. improving the DKI reconstruction module. Due to the large extent of the code and the mathematical complexity of this module, I will dedicate a couple more weeks to improving the module's performance, code quality, testing, and documentation. We therefore decided to postpone the two last milestones initially planned for the second half of the GSoC to the last three weeks of the coding period. The next steps of the updated project plan are as follows:

1) Merge the pull requests that contain the new DKI modules into Dipy's master repository. To facilitate the review of the implemented functions by the mentoring organization, I will split my initial pull request into smaller pull requests.
2) While the previously developed code is being reviewed, I will implement new features in the kurtosis parameter estimation functions to reduce processing time. For instance, I will add optional arguments that allow each method to receive a Boolean mask indicating the image voxels to be processed. This will save the time currently wasted on processing unnecessary voxels, such as the background.

3) I will also implement simpler numerical methods for faster estimation of the standard DKI measures. These numerical methods are expected to be less accurate than the analytical solutions already implemented, but they provide less computationally demanding alternatives. Moreover, they offer a simpler mathematical framework that can be used to further validate the analytical solutions.

4) Further improvements to the weighted linear least squares solution will be made. In particular, the weight estimates used in the fit will be improved by an iterative algorithm, as described in the recent DKI literature.

5) Finally, the procedures to estimate concrete biophysical measures and white matter fiber directions from DKI will be implemented, as described in the initial project proposal.

Shridhar Mishra(ERAS Project)

Mid-Term Post

Now that my exams are over, I can work on the project with full efficiency. The current status of my project looks something like this.

Things done:
• Planner in place.
• Basic documentation update of Europa's internal workings.
• Scrapped the pygame simulation of Europa.

Things I am working on right now:
• Integrating Siddhant's battery level indicator from the Husky rover diagnostics with the planner, for a more realistic model.
• Fetching and posting things on the PyTango server. (Yet to bring it to a satisfactory level of working.)

Things planned for the future:
• Integrate more devices.
• Improve docs.
Ambar Mehrotra(ERAS Project)

GSoC 2015: Mid-Term and 4th Biweekly Report

Google Summer of Code 2015 started on May 25th and the midterm is already here. I am glad to note that my progress has been in accordance with the timeline I initially provided. This includes all the work I mentioned up to the last blog post in this series, as well as the work done during the previous week. During the past week I was busy working on data aggregation and summary creation for the various branches in the tree.

The basic structure and functionality of the tree is as follows:
• The tree can have several nodes inside it.
• Each node can either be a branch (can have more branches or leaves as children) or a leaf (cannot have any children).
• Each node has its raw data and a summary.
• The raw data for a leaf node is the data coming in directly from the device servers, while the raw data for a branch is the summaries of its individual children.
• The summary for a leaf node can be defined as the minimum/maximum/average value of the sensor readings over a period of time. Later, the user will be able to supply a custom function for defining the summary.
• The summary for a branch is the minimum/maximum/average value of its children.

Implementation: To summarize information over time at different levels of the hierarchy, it was necessary to keep logging the data coming in from the device servers. I decided to go with MongoDB, as a JSON-style database seemed like the best option for storing and retrieving data for different levels of the hierarchy, and MongoDB is quite popular for such tasks. As soon as the user creates the summary for a data source, I start a thread that polls the device server at regular intervals and logs the data in the MongoDB database. Similar threads are created for each level of the hierarchy, where each node holds its raw data and summary and knows its immediate children. This kind of structure simplified the process of managing the hierarchy at different levels.
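The branch/leaf summary scheme described above can be sketched roughly as follows (the class and method names are my own illustration, not the actual ERAS code):

```python
class Node:
    """A tree node: a leaf holds raw sensor readings, a branch holds children."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []   # empty list -> leaf node
        self.readings = []               # raw data, filled only for leaves

    def summary(self, func=min):
        """Summarize readings (leaf) or child summaries (branch) with func."""
        if not self.children:
            return func(self.readings)
        return func(child.summary(func) for child in self.children)

# two temperature sensors aggregated under one branch
t1 = Node("temp1"); t1.readings = [20.5, 21.0, 22.5]
t2 = Node("temp2"); t2.readings = [19.0, 19.5]
room = Node("room", [t1, t2])
room.summary(min)   # minimum over the whole subtree
room.summary(max)   # maximum over the whole subtree
```

The recursion mirrors the description in the post: a branch's summary is computed from its children's summaries, so the same call works at every level of the hierarchy.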
When the user clicks a node, its information - raw data and summary - is shown on the right panel in different tabs. The user also has the option of modifying the summary if he wants to do so. Here is a screenshot of the raw data view.

In the upcoming weeks and the later part of the program, I am planning to work on various bug fixes, on handling multiple attributes from a device server, and on integration with the Tango Alarm Systems and alarm monitoring. Happy Coding!

Udara Piumal De Silva(MyHDL)

Mid-Term Summary

I started my project by developing an SDRAM model which can be used to test the SDRAM controller's functionality. I have incorporated timing delays in the model so that the controller can be tested to a greater extent.

How the simulator works
1. At each clock cycle the model looks at its pins cs, ras, cas, and we and prints the decoded command, which can be one of the following:
• NOP, ACTIVE, READ, WRITE, PRECHARGE, AUTO_REFRESH, LOAD_MODE
2. The SDRAM has its own state machine running inside. Each bank can work independently of the others, so there is a separate state machine for each bank.
3. The model can actually store written values and read them back later, so the read/write operations of the SDRAM controller can be tested against the model.
4. The SDRAM model has some self-tests which check for illegal behaviour, such as trying to read or write without activating a row. These errors are printed to the console and must be avoided for proper controller functionality.

My project's main objective is to design a synthesizable SDRAM controller with MyHDL. I have started this work, and at the moment the controller can perform read and write operations. However, it still requires additional work on row refreshing.

How the controller works
1. A write operation is done by writing the address and data to the addr and data_i ports, in that order, and waiting until done_o goes high.
2.
A read operation is done by writing the address to the addr port and waiting until done_o goes high. When done_o goes high, the data_o bus contains the value at the read address.

Results: The following screenshot shows the results of TestBench.py, which uses the controller to write to and read from memory. As the output shows, it can retrieve the read value from the SDRAM without any errors.

After midterm... The controller needs more work on row refreshing, which will be the first priority. I will improve the model accordingly so that the SDRAM can be tested for refresh functionality. Hardware verification of the design is the next priority; I plan to test my design on a Xula2 board with its built-in SDRAM chip.

Jakob de Maeyer(ScrapingHub)

Meet the Add-on Manager

Previously, I introduced the concept of Scrapy add-ons and how it will improve the experience of both users and developers. Users will have a single entry point for enabling and configuring add-ons, without being required to learn about Scrapy's internal settings structure. Developers will gain better control over enforcing and checking proper configuration of their Scrapy extensions. In addition to their extension, they can provide a Scrapy add-on. An add-on is any Python object that provides the add-on interface. The interface, in turn, consists of a few descriptive variables (name, version, ...) and two callbacks: one for enforcing configuration, called before the initialisation of Scrapy's crawler, and one for post-init checks, called immediately before crawling begins. This post describes the current state of, and open issues with, the implementation of add-on management in Scrapy.

Current state

The pull request with the current work in progress can be found on GitHub. Besides a lot of infrastructure (base classes, interfaces, helper functions, tests), its heart is the AddonManager. The add-on manager 'holds' all loaded add-ons and has methods to load configuration files, add add-ons, and check dependency issues.
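As a rough illustration of the add-on interface described above (the callback names and signatures are my guesses from this post, not the final Scrapy API): an add-on is just an object with some descriptive attributes plus the two callbacks, which the manager calls around crawler initialisation.

```python
class MySQLPipeAddon:
    """Sketch of a Scrapy add-on object (hypothetical interface)."""
    name = "mysql_pipe"
    version = "1.0"

    def update_settings(self, config, settings):
        # called before the crawler is initialised:
        # translate the add-on's own config into Scrapy settings
        settings["ITEM_PIPELINES"] = {"path.to.mysql_pipe.Pipeline": 300}
        settings["MYSQL_DATABASE"] = config.get("database", "localhost")

    def check_configuration(self, config, crawler):
        # called immediately before crawling begins: post-init sanity checks
        if not crawler.settings.get("MYSQL_DATABASE"):
            raise ValueError("mysql_pipe: no database configured")
```

The point of the split is that `update_settings` may still change configuration, while `check_configuration` only verifies the fully initialised crawler.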
Furthermore, it is the entry point for calling the add-ons' callbacks. The 'loading' and 'holding' parts can be used independently of one another, but in my eyes there are too many cross-dependencies in the 'normal' intended usage to justify separating them into two classes.

Two "single" entry points?

From a user's perspective, Scrapy settings are controlled from two configuration files: scrapy.cfg and settings.py. This distinction is not some historical backwards-compatible leftover; it has a sensible reason: Scrapy uses projects as its organisational structure. All spiders, extensions, declarations of what can be scraped, etc. live in a Scrapy project. Every project has a settings.py in which crawling-related settings are stored. However, there are other settings that cannot or should not live in settings.py. This (obviously) includes the path to settings.py itself (for ease of understanding, I will always write settings.py for the settings module, although it can be any Python module), and settings that are not bound to a particular project. Most prominently, Scrapyd, an application for deploying and running Scrapy spiders, uses scrapy.cfg to store information on deployment targets (i.e. the address and auth info of the server you want to deploy your Scrapy spiders to). Now, add-ons are bound to a project as much as crawling settings are; consequently, add-on configuration should live in settings.py. However, Python is a programming language, not a standard for configuration files, and its syntax is therefore (for the purpose of configuration) less user-friendly.
An ini configuration like this:

# In scrapy.cfg
[addon:path.to.mysql_pipe]
database = some.server
user = some_user
password = some!password

would (could) look similar to this in Python syntax:

# In settings.py
addon_mysqlpipe = dict(
    _name = 'path.to.mysql_pipe',
    database = 'some.server',
    user = 'some_user',
    password = 'some!password',
)

While I much prefer the first version, putting add-on configuration into scrapy.cfg would be very inconsistent with the previous distinction between the two configuration files. It will therefore probably end up in settings.py. The syntax is a little less user-friendly, but after all, most Scrapy users should be familiar with Python. For now, I have decided to write code that reads from both.

Allowing add-ons to load and configure other add-ons

In some cases, it might be helpful if add-ons were allowed to load and configure other add-ons. For example, there might be 'umbrella add-ons' that decide which subordinate add-ons need to be enabled and configured, given some configuration values. Or an add-on might depend on some other add-on being configured in a specific way. The big issue with this is that, in the current implementation, the first time an add-on's methods are called is during the first round of callbacks to update_settings(). Should an add-on load or reconfigure another add-on here, other add-ons might already have been called. While it is possible to ensure that the update_settings() method of the newly added add-on is called, there is no guarantee (and in fact, it is quite unlikely) that all add-ons see the same add-on configuration in their update_settings(). I see three possible approaches:

1. Forbid add-ons from loading or configuring other add-ons. In this case 'umbrella add-ons' would not be possible, and all cross-configuration dependencies would again be burdened onto the user.
2.
Forbid add-ons from doing any kind of settings introspection in update_settings(), and instead only allow them to make changes to the settings object or load other add-ons. In this case, configuring already-enabled add-ons should be avoided, as there is no guarantee that their update_settings() method has not already been called.
3. Add a third callback, update_addons(config, addonmgr), to the add-on interface. Only loading and configuring other add-ons should be done in this method. While it may be allowed, developers should be aware that depending on the config (of their own add-on, i.e. the one whose update_addons() is currently called) is fragile, as, once again, there is no guarantee of the order in which add-ons will be called back.

I have not put too much thought into it just yet, but I think I prefer option 3.

Julio Ernesto Villalon Reina(Dipy)

Midterm Summary

So, the first part of GSoC is over and the first midterm is due today. Here is a summary of this period. The main goal of the project is to implement a segmentation program that is able to estimate the partial volume (PV) of the three main tissue types of the brain (i.e. white matter, cerebrospinal fluid (CSF), and grey matter). The input to the algorithm is a T1-weighted Magnetic Resonance Image (MRI) of the brain. I looked back at what I have worked on so far, and these are my two big accomplishments:

- The Iterated Conditional Modes (ICM) algorithm for Maximum a Posteriori - Markov Random Field (MAP-MRF) segmentation. This part is at the core of the segmentation, as it minimizes the posterior energy of each voxel given its neighborhood, which is equivalent to estimating the MAP.
- The Expectation Maximization (EM) algorithm, used to update the tissue/label parameters (mean and variance of each label). This technique is used because this is an "incomplete" problem: we know the probability distribution of the tissue intensities but not how each tissue contributes to it.
By combining these two powerful algorithms I was able to obtain 1) the brain segmented into three classes and 2) the PV estimates (PVE) for each tissue type. The images below show an example coronal slice of a normal brain and the corresponding outputs.

What comes next? Tests, tests, tests... Since the segmentation algorithm is already up and running, I have to run many tests for input parameters such as the number of iterations used to update the parameters in the EM algorithm, the beta value (which determines the importance of the neighborhood voxels), and the size of the neighborhood. Validation scripts must be implemented as well, to compare the resulting segmentation with that of publicly available programs. These validation scripts will initially compute measures such as the Dice and Jaccard coefficients to verify how close my method's results are to the others'. For an updated version of the code and all its development since my first pull request, please go to: https://github.com/nipy/dipy/pull/670#partial-pull-merging

T1 original image
Segmented image. Red: white matter, Yellow: grey matter, Light Blue: cerebrospinal fluid
Cerebrospinal fluid PVE
Grey matter PVE
White matter PVE

Jazmin's Open Source Adventure

Quick Update - Wednesday, 1 July 2015

Quick update! Today, I:
1) Made a PR for plot_airmass and plot_parallactic, as well as some example notebooks for their use.

Aron Barreira Bordin(Kivy)

Mid-Term Summary

Hi! We are at the middle of the program, so let's get an overview of my proposal: what I've done and what I'll be doing in the second part of the project. I'll also write about my experience so far, good and bad aspects, and how I'll work to do a good job.

Project Development

I'm really happy to have been able to work on extra features not listed in my proposal. Since I made good progress on my proposal, I also worked on some interesting and important improvements to Kivy Designer.
In the second part of the program, I'll finish coding my proposal and try to add as many new features and bug fixes as possible.

Blockers

Unfortunately, my university has a different calendar this year and I'll have classes until August 31 ;/, so I'm really sad not to be able to work full time on my project; sometimes I need to divide my time between study and work. As I wrote above, I'm really happy to have made good progress, but I'd love to be able to do even more.

Second period

In this second period, I'll focus my development on releasing a more stable version of Kivy Designer. Right now Kivy Designer is an alpha tool and, honestly, isn't yet a nice tool to use, but by the end of the project my goal is to invert this point of view. To improve the project's stability, I'd like to add unit tests and documentation to the project. Thx, Aron Bordin.

July 01, 2015

Siddhant Shrivastava(ERAS Project)

Mid-term Report - GSoC '15

Hi all! I made it through the first half of the GSoC 2015 program. This is the evaluation week of the Google Summer of Code 2015 program with the Python Software Foundation and the Italian Mars Society ERAS Project. Mentors and students evaluate the journey so far by answering some questions about their students and mentors, respectively. My entire project can be visualized in the following diagram.

Achievements

Husky-ROS-Tango Interface
• ROS-Tango interfaces to connect the Telerobotics module with the rest of ERAS.
• ROS interfaces for navigation and control of the Husky.
• Logging diagnostics of the robot to the Tango bus.
• Driving the Husky around using human commands.

Video Streaming
• Single-camera video streaming to the Blender Game Engine. This is how it works: ffmpeg is used as the streaming server, to which the Blender Game Engine subscribes.
The ffserver.conf file, which describes the characteristics of the stream, is configured as follows:

Port 8190
BindAddress 0.0.0.0
MaxClients 10
MaxBandwidth 50000
NoDaemon

<Feed webcam.ffm>
file /tmp/webcam.ffm
FileMaxSize 2000M
</Feed>

<Stream webcam.mjpeg>
Feed webcam.ffm
Format mjpeg
VideoSize 640x480
VideoFrameRate 30
VideoBitRate 24300
VideoQMin 1
VideoQMax 5
</Stream>

Then the Blender Game Engine and its associated Python library bge kick in to display the stream on the video texture:

# Get an instance of the video texture
bge.logic.video = bge.texture.Texture(obj, ID)
# a ffmpeg server is streaming the feed on the IP:PORT/FILE
# specified in FFMPEG_PARAM;
# BGE reads the stream from the mjpeg file.
bge.logic.video.source = bge.texture.VideoFFmpeg(FFMPEG_PARAM)
bge.logic.video.source.play()
bge.logic.video.refresh(True)

The entire source code for single-camera streaming can be found in this repository.

• Setting up the Minoru camera for stereo vision. It turns out this camera can stream at 30 frames per second from both cameras. The last week has been particularly challenging: the optimal settings for the Minoru webcam depend on the video buffer memory allocated by the Linux kernel for libuvc- and v4l2-compatible webcams, and different kernel versions result in different performance, so at the moment I am unable to stream at the full frame rate.
• Setting up the Oculus Rift DK1 for the virtual reality work in the upcoming second term.

Crash-testing and Roadblocks

This project was not without its share of obstacles. A few memorable roadblocks come to mind:
1. Remote Husky testing - Matt (from Canada), Franco (from Italy), and I (from India) tested whether we could remotely control the Husky. The main issue we faced was network connectivity: we were all on different networks geographically, which ROS on our machines could not resolve. Thus some messages (like GPS) were accessible whereas others (like Husky status messages) were not.
The solution we settled on is to create a Virtual Private Network for our computers for future testing.
2. Minoru camera performance differences - Since the Minoru's performance varies with the kernel version, I had to bump the frame rate down to 15 fps for both cameras to stream them into the Blender Game Engine. This temporary hack should be resolved as ERAS moves to newer Linux versions.
3. Tango-related - Tango-Controls is a sophisticated SCADA library with a server database for maintaining device server lists. It was painful to use the provided GUI (Jive) to configure the device servers. To bring the process in line with other development activities, I wrote a little CLI-based interactive script for device server registration and de-registration. A blog post explains this in detail.
4. Common testing platform - I needed to use ROS Indigo, which is supported only on Ubuntu 14.04, while ERAS is currently using Ubuntu 14.10. In order to enable the Italian Mars Society members to execute my scripts, they needed my version of Ubuntu. Solution: virtual Linux containers. We are using a Docker image which my mentors can use on their machines regardless of their native OS. This post explains this point.

Expectations from the second term

This is a huge project in that I have to deal with many different technologies:
1. Robot Operating System
2. FFmpeg
3. Blender Game Engine
4. Oculus VR SDK
5. Tango-Controls

So far, the journey has been exciting and there has been a lot of learning and development. The second term will be intense, challenging, and above all, fun.

To-do list:
1. Get the Minoru webcam to work with ffmpeg streaming
2. Use the Oculus for an augmented reality application (Source)
3. Integrate body tracking with Telerobotics
4. Automate Husky movement and use a UR5 manipulator
5. Set up a PPTP or OpenVPN server for ERAS

Time really flies when I am learning new things.
GSoC so far has taught me not only how not to be a bad software engineer, but also how to be a good open source community contributor. That is what the spirit of Google Summer of Code is about, and I have imbibed a lot of it. Besides, working with the Italian Mars Society has also motivated me to learn the Italian language, so Python is not the only language I'm practicing this summer ;) Here's to the second term of Google Summer of Code 2015! Ciao :)

Sartaj Singh(SymPy)

GSoC: Update Week-5

Midterm evaluation has started and is scheduled to end by the 3rd of July. So far, GSoC has been a very good experience, and hopefully the next half will be even better. Yesterday, I had a meeting with my mentor @jcrist. It was decided that we will meet every Tuesday on Gitter at 7:30 P.M. IST. We discussed my next steps in implementing the algorithm and completing FormalPowerSeries.

Highlights:
• Most of my time was spent on writing a rough implementation of the second part of the algorithm. Currently it is able to compute series for some functions, but it fails for a lot of them. Some early testing indicates this may be due to rsolve not being able to solve some types of recurrence equations.
• FourierSeries and FormalPowerSeries no longer compute the series of a function. Computation is performed inside the fourier_series and fps functions, respectively. Both classes are now used only for representing the series.
• I decided it was time to add Sphinx documentation for sequences, so I opened #9590. It will probably be best to add documentation at the same time as the implementation from now on.
• Also opened #9599, which allows Idx instances to be used as limits in both Sum and sequence.

Tasks Week-6:
• #9523's review is mostly done and it should get merged soon. Also, add the documentation for Fourier series.
• Polish #9572 and make it more robust.
• Improve the range of functions for which series can be computed.
I will probably need to improve the algorithm for solving the recurrence equations. This week is going to be fun. Lots to do :)

Jazmin's Open Source Adventure

Quick Update - Monday, 29 June and Tuesday, 30 June 2015

Quick update! The last two days, I:
1) Updated plots.py to reflect the updated core.py functions.
2) Updated example notebooks to include those Astroplan objects/functions.

Quick Update - Friday, June 26 2015

Quick update! Today, I:
1) Worked on plots.py
2) Worked on plotting example ipython notebooks.

Richard Plangger(PyPy)

GSOC Mid Term

Now the first half of the GSoC 2015 program is over, and it has been a great time. I checked against the timeline just recently, and I have almost finished all the work that needed to be done for the whole proposal. Here is a list of what I have implemented:
• The tracing intermediate representation now has operations for SIMD instructions (named vec_XYZ) and vector variables.
• The proposed algorithm was implemented in the optimization backend of the JIT compiler.
• Guard strength reduction that handles arithmetic arguments.
• Routines to build a dependency graph and reschedule the trace.
• Extended the backend to emit SSE4.1 SIMD instructions for the new tracing operations.
• Ran some benchmark programs and evaluated the current gain.

I even extended the algorithm to handle simple reduction patterns, which was not included in my proposal. This means that numpy.sum or numpy.prod can be executed with SIMD instructions. Here is a preview of the trace loop speedup the optimization currently achieves. The setup for all programs is the following: create two vectors (or one for the last three) of 10,000 elements and execute the operation (e.g. multiply) on the specified datatype. It looks something like this:

a = np.zeros(10000, dtype='float')
b = np.ones(10000, dtype='float')
np.multiply(a, b, out=a)

After about 1000 iterations of multiplying, the tracing JIT records and optimizes the trace.
The time is recorded before jumping to and after exiting the trace; the difference is what you see in the plot above. Note that there is still a problem with any/all, and that this is only a micro benchmark: it does not necessarily tell anything about the whole runtime of a program. For multiply-float64 the theoretical maximum speedup is nearly achieved!

Expectations for the next two months

One thing I'm looking forward to is the Python conference in Bilbao. I have not met my mentors and other developers yet. This will be awesome! I have also been promised that we will take a look at the optimization together so that I can improve it further. To get even better results I will also need to restructure some parts of the Micro-NumPy library in PyPy. I think I'm quite close to the end of the implementation (because I started in February already), and I expect that for the rest of the GSoC program I will extend, test, polish, restructure, and benchmark the optimization.

Prakhar Joshi(Plone)

The Transform Finally!!

Hello everyone, today I will tell you how I implemented the safe_html transform using the lxml library of Python. I ported safe_html away from its CMFDefault dependencies to lxml and installed my new transform in place of the old safe_html transform. So whenever our add-on is installed, it will uninstall safe_html and install our new transform. There are a lot of questions that come to mind about lxml and why we use it, so let's explore them.

What is lxml?

The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible with, but superior to, the well-known ElementTree API.

Why do we need to port the transform to lxml?
Earlier, the safe_html transform depended on CMFDefault, and we are all working to make Plone free of CMFDefault dependencies: those dependencies made the transform slow, and the code base for safe_html was old and needed to be updated, or rather replaced. And since lxml is fast, we chose it for our transform.

How to implement our transform using lxml?

So far so good: we have decided what to use to remove the CMFDefault dependencies. But the main question now is how to implement the new transform with lxml so that it behaves the same as the old safe_html transform. For that I had to dig into the lxml libraries and find the modules useful for our transform. I found that we can use the Cleaner class of the lxml package. This class has several methods, such as "__init__" and "__call__". So I inherited from the Cleaner class in my HTMLParser class and overrode the "__call__" method according to the requirements of our transform. I also created a new function named "fragment_fromstring()" which returns the string with the nasty tags and elements removed.
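The actual add-on subclasses lxml's Cleaner; as a self-contained illustration of the same idea, here is a minimal sketch that drops "nasty" elements from an HTML fragment using plain lxml.html (the tag list and function name are illustrative, not the add-on's real configuration):

```python
import lxml.html

NASTY_TAGS = ['script', 'embed', 'object']  # illustrative defaults

def strip_nasty(html):
    """Parse an HTML fragment and drop nasty elements entirely."""
    # wrap the fragment in a <div> so multiple top-level elements are allowed
    root = lxml.html.fragment_fromstring(html, create_parent='div')
    for tag in NASTY_TAGS:
        for el in root.findall('.//' + tag):
            el.drop_tree()   # removes the element but keeps surrounding text
    return lxml.html.tostring(root).decode()

strip_nasty('<p>hi</p><script>alert(1)</script><p>bye</p>')
```

Subclassing Cleaner, as the post describes, gets much richer behaviour (attribute filtering, javascript: links, styles) for free; the loop above only shows the whitelist/blacklist core of the transform.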
Here is the snippet for the function:

def fragment_fromstring(html, create_parent=False,
                        parser=None, base_url=None, **kw):
    if not isinstance(html, _strings):
        raise TypeError('string required')
    accept_leading_text = bool(create_parent)
    elements = fragments_fromstring(
        html, parser=parser,
        no_leading_text=not accept_leading_text,
        base_url=base_url, **kw)
    if not elements:
        raise etree.ParserError('No elements found')
    temp2 = []
    if len(elements) > 1:
        for i in range(len(elements)):
            result = elements[i]
            if result.tail and result.tail.strip():
                raise etree.ParserError(
                    'Element followed by text: %r' % result.tail)
            result.tail = None
            temp2.append(result)
    else:
        result = elements[0]
        if result.tail and result.tail.strip():
            raise etree.ParserError(
                'Element followed by text: %r' % result.tail)
        result.tail = None
        temp2.append(result)
    return temp2

After that I created the main class for our transform, named SafeHTML, in which I defined the pre-configured transform state, i.e. the initial nasty tags and valid tags. The transform takes its data as a stream and also outputs a stream; we create a data object of the IDataStream class. The convert function then takes the data as input and performs the required operations: if the user supplies nasty tags and valid tags, it filters the input HTML accordingly, and if not, it falls back to the transform's default configuration. After writing the transform, I tested it with a lot of HTML inputs and checked their outputs as well. They were all as required. The test cases were passing and the safe_html transform script we created was working perfectly. So the last thing left was to register our transform and remove the old safe_html transform from PortalTransforms.

Register the new transform and remove the old safe_html transform on add-on installation
Now the transform is ready and we have to integrate it with Plone. For that we modify the setuphandlers.py file, which holds the add-on configuration that runs on installation. It has a "post_install" function, so we configure our transform and remove the old safe_html transform there. Two things have to happen on add-on installation:

1) The old safe_html transform of PortalTransforms has to be uninstalled/unregistered.
2) The new transform we created above, named "exp_safe_html", has to be installed.

To uninstall the old transform, we unregister it by name using the transform engine of PortalTransforms: getToolByName(context, 'portal_transforms') gives us the portal_transforms tool, and we simply uninstall the transform named safe_html, logging "safe_html transform unregistered" to confirm. After unregistering the old safe_html it is time to register our new exp_safe_html transform. For that we use pkgutil to locate the module containing the new transform and register it through the same portal_transforms tool, logging a message on successful registration. When I ran the test cases after implementing this, I saw the log message "Unregistering safe_html" followed by "Registering exp_safe_html". Finally the new transform is registered and the old one unregistered! I have tried to explain the work as much as possible, but most of it was in the code itself, so it is better to read the code; it is quite impossible to detail every minute thing here.
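As a rough sketch of the registration step described above (the portal_transforms method names are from Products.PortalTransforms as I recall them, and the module path and log wording are illustrative, so treat this as a sketch rather than the actual add-on code):

```python
# Sketch of the post_install swap: unregister the old safe_html transform
# and register the new one. 'exp.safe_html.transform' is a hypothetical
# module path; the real setuphandlers.py differs.
import logging

logger = logging.getLogger('exp.safe_html')

def swap_safe_html(portal_transforms):
    """Unregister old safe_html, register exp_safe_html."""
    if 'safe_html' in portal_transforms.objectIds():
        portal_transforms.unregisterTransform('safe_html')
        logger.info('safe_html transform unregistered')
    portal_transforms.manage_addTransform(
        'exp_safe_html', 'exp.safe_html.transform')
    logger.info('exp_safe_html transform registered')
```

In setuphandlers.py this would be called from post_install with the tool obtained via getToolByName(context, 'portal_transforms').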
Hope you will understand. Cheers!!

Goran Cetusic(GNS3)

GSOC GNS3 Docker support: The road so far

So midterm evaluations are ending soon and I'd like to write about my progress before that. If you remember, my last update was about how to write a new GNS3 module. Probably the biggest issue you'll run into is implementing links between various nodes. This is because GNS3 is a jungle of different technologies, each with its own networking stack. Implementing Docker links is no different. Docker is a different kind of virtualization than what GNS3 has been using until now -> OS-level virtualization. VMware, for instance, uses full virtualization. You can read more about the difference in one of the million articles on the Internet. An important thing to note is that Docker uses namespaces to manage its network interfaces. More on this here: https://docs.docker.com/articles/networking/#container-networking. It's great, go read it!

GNS3 uses UDP tunnels for connecting its various VM technologies. This means that after creating a network interface on the virtual machine, it allocates a UDP port on that interface. But this is REALLY not easy to do in Docker, because while a lot of virtualization technologies have UDP tunnels built in, Docker doesn't. Assuming you've read the article above, this is how it will work (I'm still having trouble with it):
1. Create a veth pair
2. Allocate a UDP port on one end of the veth pair
3. Wait for the container to start and then push the other interface into the container namespace
4. Connect the interface to ubridge
If you're wondering what ubridge is -> it's a great little piece of technology that allows you to connect UDP tunnels and interfaces.
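The veth steps above boil down to a handful of `ip` invocations. A sketch of what steps 1 and 3 translate to (interface names and the container PID are hypothetical; a real implementation would run these via subprocess with root privileges once the container is up):

```python
# Build the `ip` commands for one GNS3<->container veth link.
# Names like veth-gns3-0 are made up for illustration.

def veth_link_commands(container_pid, host_if="veth-gns3-0",
                       guest_if="veth-gns3-1"):
    """Return the shell commands for creating a veth pair and pushing
    one end into a running container's network namespace."""
    return [
        # 1. create the veth pair on the host
        ["ip", "link", "add", host_if, "type", "veth",
         "peer", "name", guest_if],
        # 3. move one end into the container's namespace (needs its PID)
        ["ip", "link", "set", guest_if, "netns", str(container_pid)],
        ["ip", "link", "set", host_if, "up"],
    ]

for cmd in veth_link_commands(12345):
    print(" ".join(cmd))
```

Step 2 (allocating the UDP port) and step 4 then happen on the host side, where ubridge bridges the host interface to a UDP socket.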
Hardly anyone's heard of it, but GNS3 has been using it for its VMware machines for quite some time: https://github.com/GNS3/ubridge

The biggest problem is that all of this is hidden deep inside GNS3 code, which makes you constantly ask the question: "Where the hell should I override this??" You also have to take into consideration unforeseen problems like the one I mentioned earlier: you have to actually start the container in order to create the namespace and push the veth interface into it. Another major problem that was solved is that Docker containers require a running process, without which they'll just terminate. I've decided to make an official Docker image to be used for Docker containers: https://github.com/gcetusic/vroot-linux. It's not yet merged as part of GNS3. Basically, it uses a sleep command to act as a dummy init process and also installs packages like ip, tcpdump, netstat etc. It's a great piece of code and you can use it independently of GNS3. In the future I expect there'll be a setting, something like "Startup command", so users will be able to use their own Docker images with their own init process. It's been a bumpy road so far, solving problems I hadn't really thought about when I was writing the proposal, but Docker support is slowly getting there.

GNS3 Docker support

So the coding session for GSOC finally began this week. I got accepted with the GNS3 Docker support project, and here are the project introduction and my plan of attack. GNS3 is a network simulator that faithfully simulates network nodes. Docker is a highly flexible VM platform that uses Linux namespacing and cgroups to isolate processes inside what are effectively virtual machines. Docker support would enable GNS3 users to create their own custom virtual machines and move beyond the limitations of network-oriented nodes, and because of its lightweight implementation it would make it possible to run thousands of standalone servers on GNS3.
Right now GNS3 supports QEMU, VirtualBox and Dynamips (a Cisco IOS emulator). The nodes in GNS3 and the links between them can be thought of as virtual machines that have their own network stacks and communicate among themselves like separate machines on any other "real" network. While this is nice by itself, QEMU and VirtualBox are "slow" virtualization technologies because they provide full virtualization -> they can run any OS, but this comes at a price. So while QEMU and VirtualBox can run various network services, it's not very efficient. Docker, on the other hand, uses kernel-level virtualization, which means the guests share the host OS kernel but processes are grouped together and the different groups isolated from one another, effectively creating virtual machines. That's why Docker and similar virtualization solutions are extremely fast and can run thousands of GNS3 nodes -> there is no translation layer between host and guest systems because they run the same kernel. Docker is quite versatile when it comes to managing custom-made kernel-based VMs. It takes the load off the programmer, who no longer has to think about disk space, node startup, process isolation etc.

Links between Docker containers pose an additional problem. In its current form, GNS3 uses UDP networking (tunnels) for all communication between nodes. The advantage is that this is done in userland; it is very simple and works on all OSes without requiring admin privileges. However, UDP tunnels have made it more difficult to integrate new emulators into GNS3, because they usually do not support UDP networking out of the box. OpenvSwitch is a production-quality, multilayer virtual switch; interconnecting Docker containers, and alleviating the problem of new emulators, requires at least basic support for OpenvSwitch ports in GNS3.
Additionally, this would enable Docker links in GNS3 to be manipulated through Linux utilities like netem and tc that are specialized for such tasks, something not possible with UDP tunnels. Let's start coding!

Udara Piumal De Silva(MyHDL)

Completed the read operation

I was able to fix the read operation bug today. There were two underlying causes. Firstly, I had reduced a timing parameter so that the write command needs to be held for only one clock cycle, but the same parameter was used for the read operation, which made the model make the data available one clock cycle too early. As an easy fix, I introduced a separate parameter for reading. Secondly, my model waited exactly 2 cycles before making data available, whereas in the actual device the data should be ready and stable by the second cycle. To fix this I introduced an additional block which writes the data on the negative edge of the first cycle. With these changes, my controller can do read and write operations without any errors.

June 30, 2015

Sudhanshu Mishra(SymPy)

GSoC'15: Mixing both assumption systems, Midterm updates

It's been very long since I've written anything here. Here are some of the pull requests that I've created during this period: There's also this patch which makes changes in the Symbol itself to make this work.
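Independent of the patch, the two assumption systems being mixed can be exercised side by side in stock SymPy (a quick illustration, not part of the patch itself):

```python
# The old system stores assumptions on the Symbol; the new system answers
# queries via ask(), which also consults global_assumptions.
from sympy import Symbol, Q, ask
from sympy.assumptions.assume import global_assumptions

x = Symbol('x', positive=True)
print(x.is_positive)        # old system
print(ask(Q.positive(x)))   # new system, reading the old assumptions

# The new system can also take facts from the global store:
y = Symbol('y')
global_assumptions.add(Q.positive(y))
print(ask(Q.positive(y)))
global_assumptions.clear()
```

The patch above essentially automates the `global_assumptions.add(...)` step whenever a Symbol is created with keyword assumptions.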
commit de49998cc22c1873799539237d6202134a463956
Author: Sudhanshu Mishra <mrsud94@gmail.com>
Date:   Tue Jun 23 16:35:13 2015 +0530

    Symbol creation adds provided assumptions to global assumptions

diff --git a/sympy/core/symbol.py b/sympy/core/symbol.py
index 3945fa1..45be26d 100644
--- a/sympy/core/symbol.py
+++ b/sympy/core/symbol.py
@@ -96,8 +96,41 @@ def __new__(cls, name, **assumptions):
         False

         """
+        from sympy.assumptions.assume import global_assumptions
+        from sympy.assumptions.ask import Q
+
         cls._sanitize(assumptions, cls)
-        return Symbol.__xnew_cached_(cls, name, **assumptions)
+        sym = Symbol.__xnew_cached_(cls, name, **assumptions)
+
+        items_to_remove = []
+        # Remove previous assumptions on the symbol with same name.
+        # Note: This doesn't check expressions e.g. Q.real(x) and
+        # Q.positive(x + 1) are not contradicting.
+        for assumption in global_assumptions:
+            if isinstance(assumption.arg, cls):
+                if str(assumption.arg) == name:
+                    items_to_remove.append(assumption)
+
+        for item in items_to_remove:
+            global_assumptions.remove(item)
+
+        for key, value in assumptions.items():
+            if not hasattr(Q, key):
+                continue
+            # Special case to handle commutative key as this is true
+            # by default
+            if key == 'commutative':
+                if not assumptions[key]:
+                    global_assumptions.add(~getattr(Q, key)(sym))
+                continue
+
+            if value:
+                global_assumptions.add(getattr(Q, key)(sym))
+            elif value is False:
+                global_assumptions.add(~getattr(Q, key)(sym))
+
+        return sym

     def __new_stage2__(cls, name, **assumptions):
         if not isinstance(name, string_types):

Master:

    In [1]: from sympy import *
    In [2]: %time x = Symbol('x', positive=True, real=True, integer=True)
    CPU times: user 233 µs, sys: 29 µs, total: 262 µs
    Wall time: 231 µs

This branch:

    In [1]: from sympy import *
    In [2]: %time x = Symbol('x', positive=True, real=True, integer=True)
    CPU times: user 652 µs, sys: 42 µs, total: 694 µs
    Wall time: 657 µs

I did a small benchmarking by creating 100 symbols and setting assumptions over them and
then later asserting them. It turns out that the one with changes in the ask handlers performs better than the other two. Here's the report of the benchmarking:

When Symbol is modified

    Line #    Mem usage    Increment   Line Contents
    ================================================
         6     30.2 MiB      0.0 MiB   @profile
         7                             def mem_test():
         8     30.5 MiB      0.3 MiB       _syms = [Symbol('x_' + str(i), real=True, positive=True) for i in range(1, 101)]
         9     34.7 MiB      4.2 MiB       for i in _syms:
        10     34.7 MiB      0.0 MiB           assert ask(Q.positive(i)) is True

pyinstrument report

When ask handlers are modified

    Line #    Mem usage    Increment   Line Contents
    ================================================
         6     30.2 MiB      0.0 MiB   @profile
         7                             def mem_test():
         8     30.4 MiB      0.2 MiB       _syms = [Symbol('x_' + str(i), real=True, positive=True) for i in range(1, 101)]
         9     31.5 MiB      1.1 MiB       for i in _syms:
        10     31.5 MiB      0.0 MiB           assert ask(Q.positive(i)) is True

pyinstrument report

When satask handlers are modified

    Line #    Mem usage    Increment   Line Contents
    ================================================
         6     30.2 MiB      0.0 MiB   @profile
         7                             def mem_test():
         8     30.4 MiB      0.2 MiB       _syms = [Symbol('x_' + str(i), real=True, positive=True) for i in range(1, 101)]
         9     41.1 MiB     10.7 MiB       for i in _syms:
        10     41.1 MiB      0.0 MiB           assert ask(Q.positive(i)) is True

pyinstrument report

On the other hand, the documentation PR is almost ready to go. As of now I'm working on fixing the inconsistencies between the two assumption systems. After that I'll move on to reducing autosimplification based on the assumptions in the core. That's all for now. Cheers!

Sahil Shekhawat(PyDy)

GSoC Week #5 Update #2

I raised many PRs, including unit tests for various parts of the API: one giant PR including everything, and atomic PRs for each individual component. The main challenge was to balance between the symbolic and numeric parts of the API.
Palash Ahuja(pgmpy)

Inference in Dynamic Bayesian Network (continued)

For the past 2 weeks I have spent some time understanding the algorithmic implementation for inference and implementing it. Today I will be talking about the junction tree algorithm for inference in Dynamic Bayesian Networks. The algorithm proceeds in the following steps:

1) Initialization: This requires constructing the two initial junction trees, J1 and Jt.
1. J1 is the junction tree created from timeslice 1 of the 2-TBN (2-timeslice Bayesian network), and Jt is the junction tree created from timeslice 2 of the 2-TBN. The time counter is initialized to 0. Also, let the interface nodes (denoted by I1 and I2 for timeslices 1 and 2 respectively) be those nodes that have children in the following timeslice.
2. If queries are performed on the initial timeslice, the results can be output by the standard VariableElimination procedure, where the model having timeslice 1 of the Bayesian network serves as the base for inference.
3. For evidence, if the current time in the evidence is 0, the evidence is applied to the initial static Bayesian network. Otherwise, it is applied to the second timeslice of the 2-TBN.
4. The junction tree J1 is created as follows:
   1. Moralize the initial static Bayesian network.
   2. Add edges between the interface nodes so as to make I1 a clique.
   3. The rest of the procedure is the same as before; the step above is the only difference.
5. For the junction tree Jt a similar procedure is followed, where a clique is formed for I2 as well.

2) Inference procedure: In this procedure, the clique potential from the interface nodes is passed on to the interface clique (similar to the message passing algorithm). The time counter is incremented accordingly.
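As a toy illustration of the inference procedure (not pgmpy code): with a single binary interface node, passing the out-clique potential of slice t forward as the in-clique potential of slice t+1 degenerates to the familiar forward recursion. All numbers here are made up.

```python
# P(X_t+1 | X_t): the conditional distribution encoded by the 2-TBN slice.
transition = {
    0: {0: 0.9, 1: 0.1},
    1: {0: 0.2, 1: 0.8},
}

def advance(interface_potential, evidence_likelihood=None):
    """One inference step: absorb the incoming interface potential,
    apply evidence (if any), and emit the next interface potential."""
    nxt = {s: 0.0 for s in (0, 1)}
    for s, p in interface_potential.items():
        for s2, t in transition[s].items():
            nxt[s2] += p * t
    if evidence_likelihood:
        for s in nxt:
            nxt[s] *= evidence_likelihood[s]
    z = sum(nxt.values())           # normalize
    return {s: v / z for s, v in nxt.items()}

belief = {0: 0.5, 1: 0.5}
for _ in range(3):                  # time counter incremented each step
    belief = advance(belief)
print(belief)
```

With full interface cliques over several nodes, the same forward pass happens over joint potentials instead of a single marginal, which is where the junction tree machinery comes in.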
So basically the junction tree Jt acts as some sort of engine, where the in-clique is where the values are supplied and the out-clique is where the values are obtained, given the evidence. The variables in the query are taken out as always at each step, and the evidence is applied as well. The best part about this procedure is that it eliminates entanglement, and only the out-clique potential is required for inference. The implementation is still in progress.

Vivek Jain(pgmpy)

ProbModelXML Reader And Writer

I worked on the ProbModelXML reader and writer module for this project. My work involved solving various bugs which were present in the module, as well as several of the remaining TODOs. Some of the TODOs are:

DecisionCriteria: The tag DecisionCriteria is used in multicriteria decision making, as follows:

    <DecisionCriteria>
        <Criterion name = string >
            <AdditionalProperties />0..1
        </Criterion>2..n
    </DecisionCriteria>

Potential: The tag Potential defines a potential, as follows:

    <Potential type="" name="">
        <Variables>
            <Variable name="string"/>
        </Variables>
        <Values></Values>
    </Potential>

My project involved parsing the above kinds of XML in the reader module. For the writer class, my project involved creating a ProbModelXML file from a given instance of a Bayesian Model.

Michael Mueller(Astropy)

Week 5

In our last meeting, my mentors and I talked about breaking up this summer's work into multiple major pull requests, as opposed to last year's enormous pull request, which was merged toward the very end of the summer. It'll be nice to do this in pieces just to make sure everything gets into the master branch of Astropy as intended, so we're planning on getting a PR in very soon (we discussed a deadline of 1-2 weeks past last Wednesday's meeting).
The idea is to have working code that handles changes to Table indices when the Table itself is modified; after this PR we can focus more on speeding up the index system and adding more functionality. With that in mind, I mostly spent this week working on previous #TODOs, fixing bugs, and generally getting ready for a PR. Having previously ignored some of the subtleties of Table and Column copying, I found it pretty irritating to ensure that indices are preserved/copied/deep-copied as appropriate when doing things like constructing one Table from another, slicing a Table by rows, etc. -- mostly because there are some intricacies involved in subclassing numpy.ndarray that I wasn't aware of before running across them. Also, while I managed to get this working correctly, there might end up being relevant time bottlenecks we need to take into consideration. I also moved the relevant tests for Table indices to a new file, test_index.py (adding some new tests), and fixed a couple of issues, including a bug in Table.remove_rows when the argument passed is a slice object. For the actual indexing engine, I found a library called bintrees which provides C-based binary tree and red-black tree classes, so for now I'm using this as the default engine (with bintrees as an optional dependency, falling back on my pure-Python classes if it isn't present). I'm looking forward to figuring out the plan for a PR at this Wednesday's meeting, and from there moving on to optimization and increasing functionality.

Julio Ernesto Villalon Reina(Dipy)

Hi all, I mentioned before that I was at a conference (Organization for Human Brain Mapping, 2015, http://ohbm.loni.usc.edu/) where I had the great chance to meet with my mentors. Now it's time to report on what was done during those days and during the week after (last week).
As stated in my proposal, the project consists of classifying a brain T1 MRI into "tissue classes" and estimating the partial volume at the boundary between those tissues; consequently, this is a brain segmentation problem. We decided to use a segmentation method based on Markov Random Field modeling, specifically the Maximum a Posteriori MRF approach (MAP-MRF). The implementation of MAP-MRF estimation for brain tissue segmentation is based on the Expectation Maximization (EM) algorithm, as described in Zhang et al. 2001 ("Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm," Medical Imaging, IEEE Transactions on, vol. 20, no. 1, pp. 45-57, Jan 2001). The maximization step is performed using the Iterative Conditional Modes (ICM) algorithm. Thus, together with my mentors, we decided to work on the ICM algorithm first. I started on it during the hackathon at OHBM and finished it up last week. It is working now and I have already shared it publicly with the rest of the DIPY team. I submitted my first pull request, called "WIP: Tissue classification using MAP-MRF": https://github.com/nipy/dipy/pull/670#partial-pull-merging. There was a lot of feedback from the whole team, especially regarding how to make it faster. The plan for this week is to include the EM on top of the ICM and provide the first partial volume estimates. I will do some testing and validation of the method to see how it performs compared to other publicly available methods such as FAST from FSL (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FAST).

June 29, 2015

Sahil Shekhawat(PyDy)

GSoC Week 6

After numerous discussions and iterations we have a better understanding of the problem we want to solve. Dynamics is a challenge for me and has held me back more times than a few. So, after our last meeting we decided to redefine the timeline, and this blog post is all about that.
Udara Piumal De Silva(MyHDL)

Read Operation Bug

Currently I'm stuck at a bug in reading data using the controller. The following code snippet is the cause of the error:

    if rdPipeline_r.val[2] == READ_C:
        sdramData_x.next = sd_intf.dq.val
    else:
        sdramData_x.next = sdramData_r.val

Here rdPipeline_r is a shift register with length CAS+2. At each read operation a READ_C bit ('1') is added to the end of the register, so the shift register's first bit position will hold a READ_C exactly after the CAS latency. The design repeatedly checks the first bit position, and when it equals READ_C it assigns sd_intf.dq.val to the sdramData_x variable. However, I keep getting an error about assigning a None-type value where an int or intbv is expected.

Wei Xue(Scikit-learn)

GSoC Week 5

Week 5 began with a discussion on whether we should deprecate params. I fixed some bugs in the checking functions, the random number generator and one of the covariance-updating methods. In the following days I completed the main functions of GaussianMixture and all test cases, except AIC, BIC and the sampling functions. The tests are somewhat challenging, since the current implementation in the master branch contains very old test cases imported from Weiss's implementation, which never got improved. I simplified the test cases and wrote more tests for things not covered by the current implementation, such as covariance estimation, ground-truth parameter prediction, and other user-friendly warnings and errors. Next week I will begin to code BayesianGaussianMixture.

Zubin Mithra(pwntools)

MIPS and MIPSel now in, doctests added

I was travelling for most of last week, and that's why this post is coming out a bit late. Right now we have doctests for ARM, MIPS and MIPSel added, and srop.py has been changed to use an <offset: reg> representation internally.
I've made a pull request with squashed commits at https://github.com/binjitsu/binjitsu/pull/44 for those of you who wish to see the diff involved.

Mark Wronkiewicz(MNE-Python)

Bug Hunt

C-Day plus 34

For over a week now, the name of the game has been bug hunting. I have had a finished first draft since the last blog post, so I've been trying to get the output of my custom SSS filter to match the proprietary version on sample data. One issue that took a couple of days to track down was a simple but erroneous switch of two angles in a complex spherical-to-Cartesian coordinate gradient transformation matrix. I can't say that this is a new class of obstacle: spherical coordinates have thrown wrench after wrench into my code, since different mathematicians regularly define these coordinates in different ways. (Is it just me, or is having seven separately accepted conventions for the spherical coordinate system a bit absurd?) My project crosses a couple of domains of mathematics, so wrestling with these different conventions has helped me deeply appreciate the other mathematical concepts that do have a single accepted formulation. Regardless, weeding out the spherical coordinate issue and a menagerie of other bugs has left me with a filter whose output is similar to (but not exactly matching) that of the proprietary code (see some example output below). Luckily, I do have several checkpoints in the filter's processing chain, and I know the problem lies between the last checkpoint and the final output. My mentors have been fantastic so far, and we have a potential bead on the last issue: the weak magnetic signals produced by the brain are measured with two major classes of MEG pickup coils, magnetometers and gradiometers.
In a very simple sense, one measures the magnetic field while the other measures the spatial derivative of the magnetic field, and because of this difference they provide readings on very different scales, which I have yet to normalize. Given some luck, this last patch could fix the issue and yield a working solution to the first half of my GSoC project! (Knock on wood.)

[Figure: Exemplar data showing raw unfiltered MEG signal and the same data after the benchmark SSS filter and my own custom filtering (top). Difference between the benchmark and my custom implementation (bottom). The filter in progress is close, but not quite the same as the benchmark, implying there remain some bugs to fix.]

June 28, 2015

Stefan Richthofer(Jython)

Midterm evaluation

The midterm-evaluation milestone is as follows: have JyNI detect and break reference cycles in native objects, backed by Java-GC. This must be done by Java-GC in order to deal with interfering non-native PyObjects. Further, this functionality must be monitorable, so that it can transparently be observed and confirmed.

Sketch of some issues

The issues to overcome for this milestone were manifold:

• The ordinary reference counting for scenarios that should actually work without GC contained a lot of bugs in JyNI C code. This had to be fixed. When I wrote this code initially, the GC concept was still an early draft, and in many scenarios it was unclear whether and how reference counting should be applied. All of this needed to be fixed now (and there are probably still remaining issues of this type).
• JNI defines a clear policy for dealing with provided jobject pointers. Some of them must be freed explicitly; on the other hand, some might be freed implicitly by the JVM, without your intention, if you don't get it right. On this front too, vast clean-up of JyNI code was needed, also to avoid immortal trash.
• JyNI used to keep alive Java-side PyObjects that were needed by native objects indefinitely.
Now these must instead be kept alive by the Java-side copy of the native reference graph. It was hard to make this mechanism sufficiently robust. Several bugs caused reference loss and had to be found to make the entire construct work; on the other hand, some bugs caused hard references to persist, which kept Java-GC from collecting the right objects and from triggering JyNI's GC mechanism.
• Issues with converting self-containing PyObjects between the native side and the Java side had to be solved. These were actually bugs unrelated to GC, but they still had to be solved to achieve the milestone.
• A mechanism to monitor native references from the Java side, especially their malloc/free actions, had to be established. Macros reporting these actions to Java/JyNI were inserted into JyNI's native code directly before the actual calls to malloc or free. What makes this tricky is the fact that some objects are not freed by native code (which was largely inherited from CPython 2.7) but cached for future use (e.g. one-letter strings, small numbers, short tuples, short lists). Acquiring/returning an object from/to such a cache is now also reported as malloc/free, but specially flagged. For all these actions JyNI records timestamps and maintains a native object log where one can transparently see the lifetime cycle of each native object.
• The original plan to explore a native object's connectivity in the GC_Track method is not feasible, because for tuples and lists this method is usually called before the object is populated. JyNI will have a mechanism to make it robust against invalid exploration attempts, but this mechanism should not be used for normal basic operation (tuple allocation happens for every method call, for instance) but only for edge cases, e.g. if an extension defines its own types, registers instances of them with JyNI-GC and then does odd stuff with them.
So now GC_track saves objects in a todo-list for exploration, and the actual exploration is performed at certain critical JyNI operations, such as object sync-on-init or just before releasing the GIL. It is likely that this strategy will have to be fine-tuned later.

Proof of the milestone

To prove the achievement of the explained milestone, I wrote a script that creates a reference cycle between a tuple and a list such that naive reference counting would not be sufficient to break it; CPython would have to use its garbage collector to free the corresponding references.

1. I pass the self-containing tuple/list to a native method call to let JyNI create native counterparts of the objects.
2. I demonstrate that JyNI's reference monitor can display the corresponding native objects ("leaks" in some sense).
3. The script runs Java-GC and confirms that it collects the Jython-side objects (using a weak reference).
4. JyNI's GC mechanism reports native references to clear. It found them because the corresponding JyNI GC-heads were collected by Java-GC.
5. Using JyNI's reference monitor again, I confirm that all native objects were freed, including those in the cycle.

The GC demonstration script:

    import time
    from JyNI import JyNI
    from JyNI import JyReferenceMonitor as monitor
    from JyNI.gc import JyWeakReferenceGC
    from java.lang import System
    from java.lang.ref import WeakReference
    import DemoExtension

    #Note:
    # For now we attempt to verify JyNI's GC-functionality independently from
    # Jython concepts like Jython weak references or Jython GC-module.
    # So we use java.lang.ref.WeakReference and java.lang.System.gc
    # to monitor and control Java-gc.

    JyNI.JyRefMonitor_setMemDebugFlags(1)
    JyWeakReferenceGC.monitorNativeCollection = True

    l = (123, [0, "test"])
    l[1][0] = l

    #We create weak reference to l to monitor collection by Java-GC:
    wkl = WeakReference(l)
    print "weak(l): "+str(wkl.get())

    # We pass down l to some native method.
    # We don't care for the method itself,
    # but conversion to native side causes creation of native PyObjects that
    # correspond to l and its elements. We will then track the life-cycle of these.
    print "make l native..."
    DemoExtension.argCountToString(l)
    print "Delete l... (but GC not yet ran)"
    del l
    print "weak(l) after del: "+str(wkl.get())
    print ""

    # monitor.list-methods display the following format:
    # [native pointer]{'' | '_GC_J' | '_J'} ([type]) #[native ref-count]: [repr] *[creation time]
    # _GC_J means that JyNI tracks the object
    # _J means that a JyNI-GC-head exists, but the object is not actually treated by GC
    # This can serve monitoring purposes or soft-keep-alive (c.f. java.lang.ref.SoftReference)
    # for caching.
    print "Leaks before GC:"
    monitor.listLeaks()
    print ""

    # By inserting this line you can confirm that native
    # leaks would persist if JyNI-GC is not working:
    #JyWeakReferenceGC.nativecollectionEnabled = False

    print "calling Java-GC..."
    System.gc()
    time.sleep(2)
    print "weak(l) after GC: "+str(wkl.get())
    print ""
    monitor.listWouldDeleteNative()
    print ""
    print "leaks after GC:"
    monitor.listLeaks()
    print ""
    print "===="
    print "exit"
    print "===="

It is contained in JyNI in the file JyNI-Demo/src/JyNIRefMonitor.py

Instructions to reproduce this evaluation

1. Get the JyNI sources by calling git clone https://github.com/Stewori/JyNI and switch to the JyNI folder: cd JyNI
2. (On Linux with gcc) edit the makefile (for OSX with llvm/clang: makefile.osx) to contain the right paths for JAVA_HOME etc. You can place a symlink to jython.jar (2.7.0 or newer!) in the JyNI folder or adjust the Jython path in the makefile.
3. Run make (on Linux with gcc; for OSX with clang use make -f makefile.osx)
4. To build the DemoExtension, enter its folder (cd DemoExtension) and run setup.py: python setup.py build, then cd ..
5. Confirm that JyNI works: ./JyNI_unittest.sh
6. ./JyNI_GCDemo.sh

Discussion of the output

Running JyNI_GCDemo.sh:

    JyNI: memDebug enabled!
    weak(l): (123, [(123, [...]), 'test'])
    make l native...
    Delete l... (but GC not yet ran)
    weak(l) after del: (123, [(123, [...]), 'test'])

    Leaks before GC:
    Current native leaks:
    139971370108712_GC_J (list) #2: "[(123, [...]), 'test']" *28
    139971370123336_J (str) #2: "test" *28
    139971370119272_GC_J (tuple) #1: "((123, [(123, [...]), 'test']),)" *28
    139971370108616_GC_J (tuple) #3: "(123, [(123, [...]), 'test'])" *28

    calling Java-GC...
    weak(l) after GC: None

    Native delete-attempts:
    139971370108712_GC_J (list) #0: -jfreed- *28
    139971370123336_J (str) #0: -jfreed- *28
    139971370119272_GC_J (tuple) #0: -jfreed- *28
    139971370108616_GC_J (tuple) #0: -jfreed- *28

    leaks after GC:
    no leaks recorded

    ====
    exit
    ====

Let's briefly discuss this output. We created a self-containing tuple called l. To allow it to contain itself, we must put a list in between. Using a Java WeakReference, we confirm that Java-GC collects our tuple. Before that, we let JyNI's reference monitor print a list of the native objects currently allocated. We refer to them as "leaks" because all native calls are over and there is no obvious need for natively allocated objects at this point. #x gives the current native ref-count. The list explains as follows (observe that it contains a cycle):

139971370108712_GC_J (list) #2: "[(123, [...]), 'test']"
This is l[1]. One reference is from JyNI to keep it alive; the second one is from l.

139971370123336_J (str) #2: "test"
This is l[1][1]. One reference is from JyNI to keep it alive; the second one is from l[1].

139971370119272_GC_J (tuple) #1: "((123, [(123, [...]), 'test']),)"
This is the argument tuple that was used to pass l to the native method. The reference is from JyNI to keep it alive.

139971370108616_GC_J (tuple) #3: "(123, [(123, [...]), 'test'])"
This is l. One reference is from JyNI to keep it alive, the second one is from the argument tuple (139971370119272) and the third one is from l[1]. Thus it builds a reference cycle with l[1].
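As an aside, the cycle-and-weak-reference behaviour being demonstrated can be reproduced in plain CPython with the stdlib gc and weakref modules (an illustration of the concept only, nothing JyNI-specific):

```python
import gc
import weakref

class List(list):
    """list subclass, so that we can take a weak reference to it."""

# Same shape as in the demo: a tuple holding a list that refers back to
# the tuple -- a cycle that pure reference counting can never free.
inner = List([0, "test"])
l = (123, inner)
inner[0] = l

wkl = weakref.ref(inner)
del l, inner          # drop all external references; the cycle remains

gc.collect()          # the cycle collector breaks the cycle
print(wkl())          # the weak reference is now cleared
```

CPython's cycle collector plays the role here that Java-GC plays for JyNI: without it, the objects in the cycle would leak.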
After running Java-GC (and giving it some time to finish), we confirm that our weak reference to l was cleared. And indeed, JyNI's GC mechanism reported some references to clear, all reported leaks among them. Finally, another call to JyNI's reference monitor does not list leaks any more.

Check that this behavior is not self-evident

In JyNI-Demo/src/JyNIRefMonitor.py go to the section:

# By inserting this line you can confirm that native
# leaks would persist if JyNI-GC is not working:
#JyWeakReferenceGC.nativecollectionEnabled = False

Change it to

# By inserting this line you can confirm that native
# leaks would persist if JyNI-GC is not working:
JyWeakReferenceGC.nativecollectionEnabled = False

Run JyNI_GCDemo.sh again. You will notice that the native leaks persist.

Next steps

The mechanism currently does not cover all native types. While many should already work, I expect that some bugfixing and clean-up will be required to make this actually work. With the demonstrated reference-monitor mechanism, the necessary tools to make this debugging straightforward are now available. After fixing the remaining types and providing some tests for this, I will implement an improvement to the GC mechanism that makes it robust against silent modification of native PyObjects (e.g. via macros), and provide tests for this. Finally, I will add support for the PyWeakReference builtin type. As far as time allows after that, I'll try to get ctypes working.

June 27, 2015

Yask Srivastava(MoinMoin)

New UserSettings and various tweaks

My last commit for XStatic was finally merged. The less file compiled for both themes successfully and there were no issues even with the Basic theme. Instead of making a todo list in Etherpad, I have started creating issues on Bitbucket, since the theme has started coming out with basic functionality. Other people who notice a bug may also create issues there.
Issues Page: RogerHaase pointed out another bug: the weird overlay of forms and menu when the hamburger button was clicked to collapse the navbar in the menu bar. This issue was fixed in [cumulative patch #2 of CR](https://codereview.appspot.com/245670043/)

New User Setting

I finally implemented a new user settings page which uses Bootstrap forms. This wasn't as easy as it sounds. We use Flatland for forms. The way we rendered a form was through pre-defined macros, but the pre-defined macros also rendered unwanted stuff such as label, table, td, etc.

So this is how forms work in MoinMoin: there are HTML macros defined in forms.html, and there is a forms.py file which contains Flatland form-related constants. So let's say we wish to render a form for a css input field. We have the form's class defined in the views.py file; this class provides the basic skeleton for the form. The forms.py file detects the kind of HTML tag required for the form field (for example: input text, checkbox, submit, etc.) and renders the macros present in the forms.html file.

For convenience we have macros defined which contain some unwanted stuff, such as labels with a table form design (td, dd, dt). Editing this file would have changed the behavior in other non-Bootstrap themes which depend on this design, so I had to make an exclusive forms.html template file for the modernized theme. I also changed the settings-tab design to match the current design of the theme.

Another issue I encountered was with common.css. It contains global CSS style rules that are supposed to be used by all themes, but Bootstrap contains its own style rules. I was inheriting style rules from both files, which resulted in a weird layout. The only hack was to override these styles.
If only there was something like this: So I ended up opening developer tools, and under the Style tab it showed me the properties which were being inherited; I manually overrode those styles in my modernized theme.less file. This hack fixes the weird table layout in the global history template page.

Code Review patch (pending): ChangeLogs for the patch

Anyway, this is how it looked (before / now):

Isuru Fernando(SymPy)

GSoC Week 5

This week, I looked into installing SymEngine in Sage and the wrappers. One issue was that we have a header named complex.h, and some libraries look inside $SAGE_LOCAL/include for the C header complex, which leads to errors. It was decided to install the symengine headers inside a folder called symengine, so that a header is accessed in the symengine/basic.h form, avoiding clashes.

I looked at how this is done in other libraries. Some libraries keep the installed headers in a separate folder like include/project_name, but the .cpp files under src. Some libraries have headers and source files in the same folder, named after the project. Since SymEngine has headers and sources in the same folder, we decided to rename src to symengine. This led to another problem: the Python wrappers folder was also named symengine. So I moved the wrappers to symengine/python/symengine to enable using the symengine Python wrappers inside the build directory (although you have to change directory into symengine/python to use them).

Some other work involved making sure make install installs all the Python tests, and installing dependencies on Travis CI without using sudo.
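For reference, the sudo-less setup relies on Travis CI's container-based infrastructure, where apt packages are declared through the apt addon instead of installed via sudo. A sketch of such a .travis.yml fragment (the package names here are placeholders, not SymEngine's actual dependency list):

```yaml
# Hypothetical .travis.yml fragment -- package names are illustrative.
sudo: false            # opt in to the container-based infrastructure
addons:
  apt:
    packages:          # declared here instead of `sudo apt-get install`
      - cmake
      - libgmp-dev
```

The trade-off: container builds start faster and cache better, but anything not whitelisted for the apt addon has to be built from source in the build script.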

That's all the updates for this week. Mid-term evaluations are coming up this week, and I hope to get a SymEngine spkg done this week as well.

AMiT Kumar(Sympy)

GSoC : This week in SymPy #4 & #5

Hi there! It's been five weeks into GSoC. This week, I worked on polishing my previous PRs to improve coverage and on fixing some bugs.

Progress of Week 4 & 5

During the last couple of weeks my ComplexPlane class PR #9438 finally got merged. Thanks to Harsh for thoroughly reviewing it and suggesting constructive changes.

For this I managed to improve its coverage to a perfect 100%, which is indeed satisfying, as it means all the new code being pushed is completely tested.

This week I also improved the exception handling and coverage in my linsolve PR; it also has 100% coverage.

Coverage Report

• [1] gauss_jordan_solve : 100 %
• [2] linsolve : 100 %

Ref:

It's good to be merged now.
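For readers unfamiliar with the routine named in the coverage report: gauss_jordan_solve reduces the augmented matrix [A|b] to reduced row-echelon form. A minimal pure-Python sketch of that reduction for a uniquely-solvable square system (illustrative only; SymPy's implementation also handles rectangular systems and free variables):

```python
from fractions import Fraction

def gauss_jordan(aug):
    """Reduce an augmented matrix [A|b] to RREF and return the solution
    column. Assumes A is square and invertible -- a toy sketch of the
    idea behind gauss_jordan_solve, not SymPy's implementation."""
    m = [[Fraction(x) for x in row] for row in aug]
    n = len(m)
    for col in range(n):
        # Find a usable pivot row and swap it into place.
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        # Normalize the pivot row, then eliminate the column elsewhere.
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [row[-1] for row in m]

# x + y = 3, x - y = 1  ->  x = 2, y = 1
print(gauss_jordan([[1, 1, 3], [1, -1, 1]]))
```

Exact Fraction arithmetic stands in for SymPy's exact rationals; a floating-point version would need partial pivoting for numerical stability.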

Blocking Issue: Intersection of FiniteSet with symbolic elements

During Week 5, while working on the transcendental equation solver in solveset.py, I discovered a blocking issue in FiniteSet: the intersection of a FiniteSet containing symbolic elements, for example:

In [2]: a = Symbol('a', real=True)

In [3]: FiniteSet(log(a)).intersect(S.Reals)
Out[3]: EmptySet()


Currently, FiniteSet either manages to evaluate the intersection, or it returns an EmptySet(). (See 9536 & 8217.)

To fix this, I have opened PR #9540. Currently it fixes both issues (9536 & 8217), but there are some failing tests that rely on the current behaviour of FiniteSet.

For example:

In [16]: x, y, z = symbols('x y z', integer=True)

In [19]: f1 = FiniteSet(x, y)

In [20]: f2 = FiniteSet(x, z)

• In Master:
In [23]: f1.intersect(f2)
Out[23]: {x}

• It should rather be:
In [5]: f1.intersect(f2)
Out[5]: {x} U Intersection({y}, {x, z})


The current behavior of FiniteSet in master is not acceptable: in the above example x, y, z are integer symbols, so each can be any integer, but in master they are assumed to be distinct, which is wrong. There are such failing tests in test_wester.py here, which is updated here: aktech@e8e6a0b to incorporate the right behaviour.
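The behaviour argued for above can be modelled with a toy intersection: keep the members that are definitely in both sets, and return the undecidable part separately instead of silently dropping it. A pure-Python sketch (names and structure hypothetical, not SymPy code):

```python
def finite_intersect(f1, f2):
    """Toy model of intersecting finite sets of symbols.
    Members present in both sets are definitely shared; a symbol that is
    only *possibly* equal to something in the other set must not be
    discarded, so it is returned as an undecided remainder -- mirroring
    {x} U Intersection({y}, {x, z}) above. Not SymPy's implementation."""
    definite = f1 & f2        # members literally common to both sets
    undecided = f1 - definite # could still equal some member of f2
    return definite, undecided

f1 = {'x', 'y'}
f2 = {'x', 'z'}
definite, undecided = finite_intersect(f1, f2)
print(definite)    # the part master already returns
print(undecided)   # must survive as an unevaluated Intersection, not vanish
```

SymPy's actual fix keeps the undecided part as an unevaluated Intersection object rather than a separate return value; the sketch only illustrates why the remainder must not be discarded.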

As of now there are a couple of failing tests which need to pass before we can merge #9540.

from future import plan Week #6:

This week I plan to fix the intersection of FiniteSet with symbolic elements & start working on the LambertW solver in solveset.
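Equations of the form x*exp(x) = a are exactly what a LambertW-based solver inverts. As a purely numeric illustration of the function being inverted (Newton iteration on the principal branch; not the symbolic approach solveset will take):

```python
import math

def lambertw(a, tol=1e-12):
    """Solve w * exp(w) = a for w (principal branch, a > 0) by Newton's
    method. Illustrative only -- a solveset LambertW solver would return
    exact symbolic results, not floats."""
    # Rough starting point: log(a) for large a, else 1.0 (requires a > 0).
    w = math.log(a) if a > math.e else 1.0
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - a) / (ew * (w + 1))
        w -= step
        if abs(step) < tol:
            break
    return w

# W(e) = 1, since 1 * exp(1) = e
print(round(lambertw(math.e), 10))
```

The symbolic solver's job is the inverse direction: recognizing that solutions of x*exp(x) = a can be written as LambertW(a) without any iteration.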

$git log

• PR #9540 : Intersection's of FiniteSet with symbolic elements
• PR #9438 : Linsolve
• PR #9463 : ComplexPlane
• PR #9527 : Printing of ProductSets
• PR #9524 : Fix solveset returned solution making denom zero

That's all for now, looking forward to week #6. :grinning:

June 26, 2015

Aman Jhunjhunwala(Astropy)

GSOC ’15 Post 3 : Mid Term Evaluation

Mid Term Report – AstroPython (Astropy)
Week 3 – Week 6 Report
Date : 26th June, 2015

The mid term evaluations of Google Summer of Code 2015 are here! It has been an incredible 6 weeks of coding. Before I get into all the boring geeky stuff, a big Hello from the new AstroPython web app!

Now that you've met AstroPython, here's a summary of the effort that has gone into it in the past 3 weeks (the last report was on 5th June, 2015):

The Creation Wizard which I had put up earlier was riddled with flaws and was unnecessarily complex code-wise. So I revamped the entire section, right from creation to displaying each article. The creation form is now a single-step "Save Draft" or "Publish" kind of form. Users who save a draft can come back later and complete their post to publish it on the website. Articles are not moderated unless they are published. Once an article is "published" by a user, it awaits admin approval before showing up. An email is sent to all moderators stating that a post has entered the moderation queue; when an article is approved, the user gets an email stating so. Bypassing moderation for non-published articles was another speed bump for the project, and after failing at IRCs, StackExchange and other forums, I was happy to come up with a non-conventional solution to the problem and resolve it soon!

A unique feature of the creation form is the availability of 2 types of editors: a WYSIWYG CKEditor for rich-text functionality and an advanced TeX-supported GFM Markdown editor. This was one of the most difficult parts of the project: integrating 2 editors to be dynamically replaceable in a form.
Markdown has only recently become popular, and the lack of any fully functional "plug and play" Javascript editor meant that I had to fork one according to my needs. After trying out Epic, MarkItUp, BMarkdown, etc., I successfully forked CKEditor and the Pandao editor to my needs. Additional functionality for adding code snippets was added, completely removing the need for a separate ACE code editor for the code-snippet section.

This was followed by developing creation forms for the other sections. Here, I used the relatively little-known Django Dynamic Forms to allow for maximum re-usability of the existing infrastructure. This created creation forms for all sections in less than 10 lines of code.

The next challenging portion was displaying rendered Markdown text on the website. I tried a lot of Markdown parsers, but each one was lacking some feature. So in the end, I used the "preview mode" of the current Markdown editor to feed it raw Markdown and generate HTML content to display on our web application. This was extended by displaying forms from each section in a centralized framework.

Moderator-approved user editing of posts from the front end was implemented successfully next. Edit forms are displayed dynamically in modals on selection. Disqus comments and social sharing plugins (share on FB, Twitter, email, etc.) were integrated next, finished off by a custom "upvote – downvote – unvote" function for each post, which works quite well even for anonymous users (it generates a unique key based on the IP address and user META address). Anonymous users, too, can successfully create or edit articles on the web app.

After this, we had our first round of "code cleaning", during which we decided to "de-compartmentalize" the old app and unify all the sections to use a common main framework. After this, most of our sections – Announcements, Blogs, Events, Educational Resources, News, Packages, Snippets, Tutorials, Wiki – lay complete.
This was definitely one of the high points of the project and greatly benefited the timeline. "Search" was the next feature to be integrated. I initially started off with a Whoosh-powered Haystack modular search, but later shifted to the Watson engine. One can now search and display results from each section, or all sections, in a sorted manner. Filtering and sorting section rolls was next: sorting by popularity, hits and date created, and filtering by tags – which were discussed in the proposal – were successfully implemented. Then a "My Posts" section was created to store the list of all complete and incomplete posts from all the sections written by the user. This allows users to resume editing of raw articles easily. The sidebar was populated next with recent and popular posts, and our basic site was finally up!

On the front end, a lot of work has gone into the styling and layout of these pages through LESS, CSS and Bootstrap, after taking into account the feedback from all my mentors. A question-and-answer forum has just been integrated; testing and customization for it remain.

This completes the work summary of these 3 weeks! The test server is running at http://amanjhunjhunwala.pythonanywhere.com and the Github repository is accessible at http://github.com/x-calibre/astropython. I filled in the mid-term evaluation forms on the Melange website and had an extensively detailed "mid-term review" Google Hangout today with my mentors. I was glad that I far exceeded my mentors' expectations and passed the mid-term review with "flying colors". I hope I can carry on this work and make the GSoC project a huge hit! The next blog will be up in about 15 days from now, when we open the app to a limited audience for preview! Till then, Happy Coding…

Jazmin's Open Source Adventure

Quick Update - Thursday 26 June 2015

Quick update! Today, I:

1) Solved some module import problems by fixing a path issue.
2) Made an IPython notebook with examples of how to use the plot_airmass function.
3) Modified plots.py.

Prakhar Joshi(Plone)

Getting a control panel for the add-on

Last time I was able to finish things up with registering and deregistering the add-on in the Plone site, so that whenever I activate my add-on from the ZMI on a Plone instance, it registers the default profile of my add-on and also registers the browser layer of the add-on. So these things went well, and some issues related to the versions have also been solved lately.

What's Next?

After the registration of the add-on we need to create a view on the Plone site, so that whenever we click on the add-on we get a page and can configure our add-on from the Plone site. There is a default configuration for the transform script already there, and we can customize that configuration. So for that we have to create a control panel for our add-on, so that users get a platform to customize the configuration.

There were 2 ways to create a control panel:

1) Overwrite the old control panel of the PortalTransforms safe_html.
2) Create a separate control panel for our new safe_html add-on.

I chose the 2nd way and created a separate control panel for the add-on.

How to create a control panel in a Plone add-on?

For creating a control panel in a Plone add-on we have to:

1) Create a schema for the control panel.
2) Register the control panel schema.
3) Create permissions for registering the control panel.

Let's start with the 1st step.

Create a schema for the control panel

We will create the schema for the control panel in a file, where we will define FilterTagSchema, which will contain space for nasty tags, stripped tags and custom tags. Similarly we will create IFilterAttributeSchema and IFilterEditorSchema, and finally in IFilterSchema we will include all the above-mentioned classes. After that we will create FilterControlPanelForm, which will expose the above-defined schema on the Plone site.
Here is the snippet for FilterControlPanelForm:

class FilterControlPanelForm(controlpanel.RegistryEditForm):

    id = "FilterControlPanel"
    label = _("SAFE HTML Filter settings")
    description = _("Plone filters HTML tags that are considered security "
                    "risks. Be aware of the implications before making "
                    "changes below. By default only tags defined in XHTML "
                    "are permitted. In particular, to allow 'embed' as a tag "
                    "you must both remove it from 'Nasty tags' and add it to "
                    "'Custom tags'. Although the form will update "
                    "immediately to show any changes you make, your changes "
                    "are not saved until you press the 'Save' button.")
    form_name = _("HTML Filter settings")
    schema = IFilterSchema
    schema_prefix = "plone"

    def updateFields(self):
        super(FilterControlPanelForm, self).updateFields()

Observe here that we used IFilterSchema as the schema for the filter control form, which in turn includes all the classes mentioned above. Finally, we wrap the control panel form, and this gets our control panel onto the Plone site.

Register the control panel

That was just the first step; after defining the control panel we have to register it in configuration.zcml in the generic way. Here we register the browser page with the name safe_html_transfrom-settings, for the IPloneSiteRoot of CMFPlone, using our own add-on browser layer and importing our control panel class.

Adding permissions for the control panel

Notice that we add the permissions at the end of the setup; for that we create a separate file named permissions.zcml and include it in configuration.zcml. After adding these permissions to the generic setup and configuring the control panel, we are able to see the control panel on the Plone site. Finally, the control panel is working perfectly.
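A browser-page registration of the kind described above generally looks like the following zcml; only the view name comes from the text, while the class and layer paths here are hypothetical reconstructions:

```xml
<!-- Hypothetical reconstruction of the registration in configuration.zcml;
     class and layer dotted paths are illustrative placeholders. -->
<browser:page
    name="safe_html_transfrom-settings"
    for="Products.CMFPlone.interfaces.IPloneSiteRoot"
    class=".controlpanel.FilterControlPanelView"
    layer=".interfaces.ISafeHtmlLayer"
    permission="cmf.ManagePortal"
    />
```

The `for` attribute binds the view to the site root, `layer` restricts it to the add-on's browser layer, and `permission` gates who may open the settings form.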
What is Next?

The main thing left before the mid-term evaluation is to register the safe_html transform in the add-on, and BTW the safe_html transform is almost ready. I will explain that in the next blog. Hope you like it!! Cheers!! Happy Coding.

Andres Vargas Gonzalez(Kivy)

Kivy backend using Line and Mesh graphics instructions

This is the first prototype of the backend: points are extracted from the path and transformed into polygons or lines. Line and Mesh are used on the Kivy side to render these objects in a widget canvas. Labels are used for displaying text. Below some examples can be found. The lines are not well defined yet, and the next step is to optimize this drawing as well as the text. Some attributes should be added to the labels; positioning is another problem.

Daniil Pakhomov(Scikit-image)

Google Summer Of Code: Optimizing existing code. Creating an object detection module.

This post describes the steps that were made in order to speed up face detection.

Rewriting the MB-LBP function in Cython

The MB-LBP function is called many times during face detection. For example, in a region of an image that contains a face of size (42, 35), the function was called 3491 times (a sliding-window approach was used). These numbers will be much greater for bigger images. This is why the function was rewritten in Cython. In order to make it fast, all the Python calls were eliminated and the function now runs in nogil mode.

Implementing the cascade function and rewriting it in Cython

In the approach that we use for face detection, a cascade of classifiers is used to detect faces. Only faces pass all stages and are detected; all non-faces are rejected at some stage of the cascade. The cascade function is also called a lot of times. This is why the class that holds all the data was written in Cython. As opposed to native Python classes, cdef classes are implemented using a C struct. Python classes use a dict for attribute and method lookup, which is slow.
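For readers unfamiliar with the descriptor mentioned above, the MB-LBP code for one window can be sketched in pure Python. This is only an illustration of the idea (and of why it is so call-heavy): the block layout and bit order here are illustrative, and the real Cython version computes each block sum in O(1) from an integral image with no Python-level calls.

```python
def mb_lbp(img, r, c, bh, bw):
    """Multi-block LBP at (r, c): the sums of the 8 blocks surrounding the
    central bh x bw block in a 3x3 grid are compared against the central
    block's sum, producing an 8-bit code. Plain-Python illustration only."""
    def block_sum(br, bc):
        # Naive O(bh*bw) sum; the real version uses an integral image.
        return sum(img[y][x]
                   for y in range(br, br + bh)
                   for x in range(bc, bc + bw))

    center = block_sum(r + bh, c + bw)
    # The 8 neighbouring blocks, enumerated clockwise from the top-left.
    neighbours = [(r, c), (r, c + bw), (r, c + 2 * bw),
                  (r + bh, c + 2 * bw), (r + 2 * bh, c + 2 * bw),
                  (r + 2 * bh, c + bw), (r + 2 * bh, c),
                  (r + bh, c)]
    code = 0
    for bit, (br, bc) in enumerate(neighbours):
        if block_sum(br, bc) >= center:
            code |= 1 << bit
    return code
```

Sliding this over every position and several block scales is what produces the thousands of calls per face region quoted above, which is why eliminating the per-call Python overhead matters.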
Other additional entities that are needed for the cascade to work were implemented using pure C structs.

New module

For the current project I decided to put all my work in skimage.future.objdetect. I did this because the functions can still change a lot in the future. The name objdetect was chosen because the approach I use makes it possible to detect not only faces but any object on which the classifier can be trained.

Sumith(SymPy)

GSoC Progress - Week 5

Hello, this post contains the fourth report of my GSoC progress. We hit Piranha's speed, the highlight of this week.

Progress

We were able to reach Piranha's speed. At an average of 14-ish ms on the benchmark, we are happy enough (it can still be improved) to start wrapping this low-level implementation into a Polynomial class. Last week I had reported a speed of 23ms, and this week we are better. We had missed a compiler flag, -DNDEBUG, which indicates Release mode of Piranha, leading to the slow-down (#482). Adding this compiler flag means we should not be using assert statements, which SymEngine does in SYMENGINE_ASSERT and in the test files too. These had to be sorted out if Piranha were to be a hard dependency of SymEngine's polynomial module. Hence, the issue of moving the test suite from asserts to a well-developed test framework came up again (#282). We explored a couple, but Catch still seemed to be the best option. Catch was adopted, which is a benefit to SymEngine in the long run too. As for SYMENGINE_ASSERT, we decided to change our macro to raise an exception or just abort the program. Catch is a very good tool; we thank Phil Nash and all the contributors for making it.

Next up, wrapping into Polynomial:

• We need some functionality to convert a SymEngine expression (Basic) into one of the hashset representations directly. Currently I convert Basic to poly and then to hashset, as just getting the speed right was the issue.
• Domains of coefficients need to be thought of.
SymPy and Sage will need to be looked into and their APIs studied. We need ZZ, QQ and EX; the work for EX has been done by Francesco Biscani, and it will be patched for the latest master and committed in his name. There could also be an automatic mode which figures out the fastest representation for the given expression, at the price of a slightly slower conversion, as it needs to traverse the expression to figure out what representation fits.
• Tuple-to-packed conversion when exponents don't fit. Also, encode supports signed ints, which is a boon to us, as we don't have to worry about negative exponents. For rational exponents we use tuples.

I still haven't figured out the reason for the slow-down of expand2 and expand2b in my packint branch. I have been suggested to use git bisect; will do next week.

Report

expand2d results of 10 executions:
14ms 14ms 14ms 15ms 14ms 15ms 14ms 14ms 15ms 14ms

Maximum: 15ms
Minimum: 14ms
Average: 14.3ms

Here, evaluate_sparsity() gave the following result for the hash_set:
0, 11488
1, 3605
2, 1206
3, 85

Piranha has the following results:

Average: 13.421ms
Maximum: 13.875ms
Minimum: 12.964ms

A more detailed report of benchmarks and comparisons can be found here. A minor PR in which MPFR was added as a Piranha dependency, 472, was merged. Another PR, in which the tests were moved to Catch, is good to play with and merge, minor build nits remaining, 484.

Targets for Week 5

• Figure out the reason for the slow-down in benchmarks and fix it.
• Change the SYMENGINE_ASSERT macro to raise an exception.
• Add the -DNDEBUG flag for builds with Piranha, as SymEngine no longer uses assert; close issue #482.
• Port @bluescarni's work on EX to SymEngine.
• Wrap the lower level into a Polynomial class for signed integer exponents in the ZZ domain, with functionality at least that of UnivariatePolynomial.

That's all this week. Sbohem

Shivam Vats(SymPy)

GSoC Week 5

I am almost half way through my project and it has been an amazing learning experience.
According to my timeline I am supposed to finish the ring_series work in SymPy by the mid-evals (i.e., by next week) and start porting it to SymEngine after that. Though I faced some hiccups, especially in figuring out how to deal with a symbolic ring, the deadline is achievable.

Till now

• Series reversion has been added to ring_series after this PR of mine got merged.
• I have started modifying the functions to operate on constant terms in the input series with this PR. I plan to finish it by this week. We are using the EX domain to represent the symbolic coefficients.
• The PR to add Puiseux series has not yet been merged. I have added a check_precision function that gives an approximate number of iterations with which the ring_series functions will output the series of the requested order.

Next Week

I expect both pending pull requests to get merged by this week. After that the only major task remaining would be to write the new Series class structure. So, the tasks are:

• Discuss and write the new Series class structure.
• Finish pending tasks, if any, and write more tests.

Yue Liu(pwntools)

GSOC2015 Students coding Week 05

week sync 09

Last week:

• issues #36 fixed, support call/jmp/bx/blx anywhere.
• issues #37 (set ESP/RSP) fixed, but the migrate() method needs a re-write.
• issues #38 partially fixed.
• issues #39 partially fixed.
• Updated some doctests.

Next week:

• Optimize and fix potential bugs.
• Add some doctests and pass the example doctests.

Aron Barreira Bordin(Kivy)

Kivy Designer Development

Hi! I had some tests at my university last week, so I've made smaller progress in my development.

Events/Properties viewer UI

I did some modifications to the Events and Properties UI, and fixed some related bugs.
• Add custom event is working
• Added a canvas line between property and event names
• Displaying the info bubble in the correct position (information about the current event being modified)

Designer Code Input

• Added line numbers to KvLangArea
• Implemented ListSetting radio

Designer Tabbed Panel

Implemented a closable and auto-sizable TabHeader for Designer Code Input. Now it's possible to close open tabs :) Implemented "smart" tab design: the tab style changes to inform you if there is something wrong with the source code, if the content has modifications, and to show the git status. (Only the design, not yet working.)

Bug fixes

I fixed a small bug in the Python Console. Edit Content View was not working with DesignerCodeInput; it's working now.

That's it, thanks for reading :)
Aron Bordin.

June 25, 2015

Yask Srivastava(MoinMoin)

GSoC Updates | Hackathon | Teaching Django

Informal Intro

Ah! This week was a bit hectic, but I was able to do a considerable amount of work. @yask123

I got all my pending code reviewed and committed the changes to my repo after resolving issues. The patch had to go through a number of iterations to resolve the issues in prior patches. The last patch fixed all the major bugs.

As I mentioned in my previous post, I ported the modernized Stylus theme to Bootstrap by making changes in the global templates. But Roger Haase suggested making exclusive templates for Bootstrap themes, as making changes to the global templates would restrict all the other theme developers from using Bootstrap components such as row, col-md-x, nav, panel, etc. I also made changes in the global templates to make sure they don't conflict with any Bootstrap theme that works on top of them.

Show me the code!!

Some of the bugs in previous CRs were:

• HTML validation errors due to the use of a form inside a ul, and unclosed div tags. This is fixed in my last commit.
• Design break issues in mobile views. Fixed in commit:
• Design break issue when the breadcrumb's path is too large. Fixed in this commit.
The issues with the last CRs were discussed with the mentors and fixed.

Quick summary of my commits: I made a new branch in my fork of the repo called improvethemes. Since I am doing things step by step and some things get broken in intermediary stages, it wouldn't have been right to commit the changes to the main branch. The branch can easily be merged when this feature is working 100% without any bugs. Now back to the summary: I have made 3 commits as yet, and a 4th one with improvements to the user settings page is expected soon :). Anyway:

1. Commit #1: Created a new branch improvethemes
2. Commit #2: Wrote a new modernized theme based on Bootstrap and also made its template files (layout.html, global_history.html). The templates contain all the basic components such as navbar, sub menu, item menu, breadcrumb, footer, etc.
3. Commit #3: Further improvements in the modernized theme and a few style fixes in the Basic theme. Improvements in the modernized theme:
   • Added common.css
   • The currently opened tab now highlights in the menu
   • Various CSS rules written to work on top of / with common.css
   • Fixed the 'jumping of footer while changing tabs in user settings' issue
   • Fixed an issue with breadcrumbs when the location address gets too long
   • Fixed the footer jump while changing tabs in user settings in the Basic theme
   • Fixed a design break issue in the Basic theme's subscription box: http://i.imgur.com/4s1CIb3.png
   • Fixed a design break at small resolutions and removed a form from under 'ul'
4. Commit #4: Fixed an HTML validation error due to an unclosed div tag

I also updated XStatic Bootstrap; here is the commit. This updates Bootstrap to version 3.3.5.

Show me the screenshots!!

Other Updates?

Hackathon

Yea! I participated in the continuous 34-hour AngelHack hackathon this week. It was a great experience: we made an open-source chat summarizer tool, and I am really proud of this app. We worked together all night and all day! Well done Vinayak Mehta, Ketan Bhutt, Pranu!
About this app: Summarize It is a chat summarizer plugin for instant messaging applications. It summarizes the large content of chat logs, which enables users to quickly understand the current context of the conversation. Currently Summarize It works on top of Slack as a plugin. App Link

One last thing…!

Teaching Django

I have started teaching Django web development to college students as a part of their summer training. The first class was on Tuesday, an introductory class. All of the students are enthusiastic! I really like Django and this is going to be a great experience.

Sartaj Singh(SymPy)

GSoC: Update Week-4

This week was mostly spent on completing the rational algorithm for computing the Formal Power Series of a function. It took some time, mainly due to testing. The rational algorithm is now mostly complete and I have opened PR-#9572 bringing in the changes. It is still a work in progress. There is still lots to do and test, so I am going to spend the next few weeks implementing the rest of the algorithm. So far, the results are good: it is in general faster than the series function already implemented in SymPy.

Tasks Week-5:

• Get PR-#9523 polished and merged.
• Improve the FormalPowerSeries class.
• Start implementing the SimpleDE and DEtoRE functions.

I guess that's it. See you later.

Saket Choudhary(statsmodels)

Week 5 Update

This week was a bummer. I tried profiling MixedLM, and made all the mistakes possible. My foolish attempts are discussed here: https://github.com/statsmodels/statsmodels/issues/2452

Andres Vargas Gonzalez(Kivy)

Writing a matplotlib renderer for Kivy

A renderer is a class that knows all the graphics instructions to draw in a canvas. The first implementation involved using the canonical Agg renderer to get a figure which we could texture onto a rectangle; this is fully explained in a previous post. The objective of my implementation is to have a backend implemented using Kivy graphics instructions.
The RendererBase class defines all the required methods to do so. RendererKivy extends it and implements draw_path; 5 other methods can also be implemented, but the most important is draw_path. This method provides all the information required to draw on the canvas; however, another class is also needed, to change the style of the rendered elements. That class is GraphicsContext, a middle layer between a renderer and an Artist. Artist is one of the layers in the hierarchical architecture of matplotlib; the other two layers are backend and scripting. Basically, everything that can be drawn in a canvas is an Artist object, such as Axis, Tick, Figure, etc.

A GraphicsContextKivy is defined, which extends GraphicsContextBase and specifically translates matplotlib context instructions such as line width, line style, caps, joins, colors, etc. into the Kivy equivalents. As can be seen in the figure below, the lines are dashed and the foreground of the canvas was changed. At the moment there are three issues: first, Kivy implements line caps as {'square', 'round', 'none'} while matplotlib uses {'butt', 'projecting', 'round'}; second, dashes in Kivy are supported depending on the line width; and finally, Kivy does not support dynamic values for dash offset and dash length.

The renderer receives this GraphicsContext object and applies the styles defined there to Kivy context and vertex instructions. As can be seen in the figure, a very simple drawing of the information received in draw_path is performed. Basically, lines are created with all the vertices received, but a path is more complex than that: it represents a series of possibly disconnected, possibly closed, line and curve segments. The next post will be about this Path class and which Kivy instructions are being used to implement the renderer.

June 24, 2015

Siddhant Shrivastava(ERAS Project)

The Half-Life of Telerobotics

Hi all!
If you've been following my previous posts, you'll know that the Telerobotics module has been simmering for a couple of weeks. I'm happy to announce that it is almost complete and will hopefully be integrated with Vito's Bodytracking module. The last two weeks (weeks four and five) were the busiest of GSoC for me.

Learning Experience

• I learnt A LOT about Python software development
• Different types of software architectures
• The development process of Python, from one of the members of the Italian Mars Society who is the reason I'm able to write more Pythonic code - Ezio Melotti
• PyTango development
• IPython and how helpful it can be for Tango applications
• Message queues - both ROS and Tango utilize ZeroMQ, which makes integrating ROS and Tango much more scalable
• SIFT in Python - I will be working with my mentor Fabio Nigi on this very soon
• Making my own stereo camera

Deliverables

• A ROS node which collects information from all interesting topics on the Husky robot. This can be found here
• A Tango server which integrates with ROS to provide diagnostic information from the robot (battery status, temperature levels, current draw, voltage, error conditions)
• A simulated version of the Tango server for the Planning and Scheduling application that Shridhar is working on. These can be accessed here
• A soft real-time network-streaming FFmpeg server and Blender client for a single-camera video stream. This can be found here

Under heavy development

• Integration of Bodytracking with Telerobotics.
The following message format has been decided upon by the mentors and students:

# Attribute definitions for various diagnostic messages
moves = attribute(label="Linear and angular displacement",
                  dtype=(float,),
                  display_level=DispLevel.EXPERT,
                  access=AttrWriteType.READ,
                  unit="(meters, radians)",
                  fget="getMoves",
                  polling_period=POLLING,
                  max_dim_x=2, max_dim_y=1,
                  doc="An attribute for Linear and angular displacements")

Vito's Bodytracker will publish events in the form of Tango events. The associated data will be a float tuple of dimensions 2,1 (2 columns, 1 row). Such a tuple, like (3.4, 1.2), specifies a relative linear and angular displacement of the astronaut. My Telerobotics module will subscribe to this Tango event and transform the data into a Twist message that the Husky can understand.

• Extension of camera streaming to a dual-camera setup. I am extending the streaming capability to a stereo camera.

Mid-term evaluations start tomorrow! Eagerly looking forward to them. It has been an eventful and productive half summer of code. I hope the next half is even more exciting and challenging than the one that passed. Ciao

Chau Dang Nguyen(Core Python)

Week 4

In the previous week, I have been testing the code and preparing the documentation. For testing, I was initially interested in pyresttest because of its simplicity. However, it doesn't support dynamic variable assignment between tests, so I decided to switch to JavaScript. This also brings convenience for the next step: I will need to write a demo client in JavaScript. For documentation, Swagger.io is a great tool: it is user-friendly, allows quick testing, and supports authentication and API keys. I still feel the code needs more polishing and is not good enough for demonstration, so I decided to hold back the demo version for a while. In the meantime I will give more details about the project in blog posts. The REST handler will be separated from the rest of the tracker.
Any request URI beginning with '/rest' will be forwarded to the REST handler. At this point, requests are divided into 3 groups - 'class', 'object', and 'attribute' - with 5 possible actions: 'GET', 'POST', 'PUT', 'DELETE' and 'PATCH'. Example:

• 'issue' is a class; a 'GET' request to /rest/issue will return the whole collection of the "issue" class.
• 'issue12' is an object; a 'DELETE' request to /rest/issue12 will delete issue12 from the database.
• 'title' is an attribute of 'issue12'; a 'PUT' request to /rest/issue12/title with the form "data=new title" will make the title of issue12 become "new title".

The REST handler also accepts the HTTP header 'X-HTTP-Method-Override' to override GET and POST with PUT, DELETE or PATCH in case the client cannot perform those methods. For PUT and POST, the object will use the form-data format. Error status will appear in both the HTTP status code and the response body, so if the client cannot access the response header, it can retrieve the status from the response object. Detailed information on the standard will be published in the documentation.

Sahil Shekhawat(PyDy)

GSoC Week #4/5 Update #1

We finally released PyDy 0.3.0. It was a big release and included a lot of new things, like our new visualizer. Here is the official release statement from Jason.

Keerthan Jaic(MyHDL)

MyHDL GSoC Update

MyHDL has two major components – core and conversion. The core allows users to model and simulate concurrent, event-driven systems (such as digital hardware) in Python. A subset of MyHDL models can be converted to Verilog or VHDL. Over the last couple of weeks, I've been working on improving MyHDL's test suite. The core tests were written using unittest, and we used py.test as the test runner. However, MyHDL's core relies on a few hacks such as global variables. This did not play well with pytest and prevented us from automating boring test activities with tox. Additionally, one core test relied on the behaviour of the garbage collector and could not be run with PyPy.
I’ve converted all the core tests to pytest, and PyPy can run our entire testsuite again. Now, we can also use tox and pytest-xdist to rapidly verify that tests pass in all the platforms we support. The conversion tests are a little trickier. MyHDL uses external simulators such as iVerilog, GHDL and Modelsim to verify the correctness of converted designs. The test suite currently uses global variables to pick the simulator, and the suite must be repeated for each simulator. This is cumbersome and inefficient because MyHDL’s conversion and simulation modules are re-run for every simulator. I’m currently working on using a combination of auto detection and pytest custom CLI options to simplify the process of testing against multiple simulators. Furthermore, The test suite generates a number of converted Verilog/VHDL files and intermediate simulation results which are used for validation. These files are clobbered every time the tests are run. This makes it harder to compare the conversion results of different branches or commits. I’ve implemented a proof of concept using pytest’s tmpfile fixture to isolate the results of each run. Along the same lines, I’ve uploaded a small utility which uses tox to help analyze the conversion results of different versions of MyHDL and Python. I’ve also made a few minor improvements to the conversion test suite: A bug fix for Modelsim 10.4b, and support for the nvc VHDL simulator. Finally, I’ve been exploring ways to reduce the redundancy in MyHDL’s core decorators and conversion machinery. After I finish improving the conversion tests, I will send a PR upstream and begin working on improving the robustness of Verilog/VHDL conversion. June 23, 2015 Gregory Hunt(statsmodels) Working Log Likelihood Got a working log likelihood function it would seem. Still need to do some more testing. Opened an issue on github which fairly accurately describes the situation. 
We're now going to start thinking about implementing the score function. We'll have to think about how to numerically evaluate its integrals, which may not be as nice as those in the log-likelihood function. There were a couple of issues in implementing the log-likelihood function. The last main issue was doing enough algebra in the code that numpy didn't have to handle potentially problematic large numbers.

Sahil Shekhawat(PyDy)

GSoC Week 5

I raised many PRs including unit tests for various parts of the API, including one giant PR with everything and atomic PRs for every single component. The main challenge was balancing the symbolic and numeric parts of the API.

Michael Mueller(Astropy)

Week 4

At this point, the infrastructure for indexes is pretty much in place, and the goal now is to get as much speed as possible out of each index. This week, I've been trying a couple of things to improve the performance of the basic binary-tree index implementation - at this stage mostly algorithmic. As a first step I implemented a red-black tree (in Python for now, though I hope to push it down to the C level); its advantage is that all major operations (add, remove, find) are O(log n), guaranteed by the relative balancing of the tree height. The self-balancing part involves maintaining a standard set of invariants on the tree, whose nodes each have a "color" of either red or black: the root must be black, every red node must have two black children, leaf (null) nodes are black, and every path from a node to its descendant leaves must contain the same number of black nodes. Currently, the red-black tree index outperforms the vanilla binary search tree, but is still not as fast as I'd like. Per my mentors' suggestion, I also implemented a sorted-array index that can find elements in O(log n), while adding elements has a worst-case running time of O(n).
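The sorted-array idea just described can be sketched with the standard library's bisect module. The class below is purely illustrative (not Astropy's actual implementation): a fast initial sort, O(n) worst-case in-place insertion, and an O(log n) binary-search lookup.

```python
import bisect

class SortedArrayIndex:
    """Toy index over (key, row_id) pairs: O(log n) find, O(n) worst-case add."""

    def __init__(self, keys, row_ids):
        # Initial build: one fast sort (numpy-backed in the real implementation).
        self.pairs = sorted(zip(keys, row_ids))

    def add(self, key, row_id):
        # In-place insertion shifts elements on one side: O(n) worst case.
        bisect.insort(self.pairs, (key, row_id))

    def find(self, key):
        # Binary search for the leftmost matching pair: O(log n).
        lo = bisect.bisect_left(self.pairs, (key,))
        rows = []
        for k, rid in self.pairs[lo:]:
            if k != key:
                break
            rows.append(rid)
        return rows

idx = SortedArrayIndex(keys=[5, 1, 3, 1], row_ids=[0, 1, 2, 3])
idx.add(1, 4)
print(idx.find(1))   # → [1, 3, 4]
```

The trade-off in the post is visible here: find only bisects, but add may shift a large part of the underlying list, which is why an unmodified Table is the sweet spot for this index.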
While this makes sorted arrays worse when there are many modifications, the common use case of an index on an unmodified Table seems to justify their use. After the sorted array is initialized with a fast (probably C-level) numpy sort, modifications to the array occur in place (presumably the underlying Python list shifts elements on one side) and the find() method uses a binary search. Currently, this implementation outperforms the non-indexed version by about 10% for group_by, and I'm looking to improve its performance for where(), which is probably the most time-critical function.

June 22, 2015

Vipul Sharma(MoinMoin)

GSoC 2015: Coding Period (7th June - 22nd June)

In the 3rd week of the coding period, I worked on the file upload feature, to upload files in the ticket create and modify views. Now one can upload any patch file, media file or screenshot. CR (for the file upload feature): https://codereview.appspot.com/246020043/

We had a meeting over our IRC channel where I discussed my work and cleared up a few doubts with my mentors. I also worked on improving the UI of the ticket create and modify views and made them look more consistent in both the basic and modernized themes.

Basic Theme (Before) Basic Theme (After) Modernized Theme (Before) Modernized Theme (After) 360x640 view

Lucas van Dijk(VisPy)

GSoC 2015: Vispy progress report

Another two weeks have passed! At some point every piece fell together and all the arrow shader code became clear to me, which has resulted in all Glumpy arrows being ported to Vispy! A few selected examples can be seen in the image below. It's not perfect yet: the OpenGL shader tries to automatically calculate the orientation of the arrow head, but it is often slightly off. Also note that I've added the new "inhibitor" arrow head. We've decided to document the principles behind this, and I spent a few days writing a tutorial about it, which can be found here.
However, there's a big update coming to Vispy that changes quite a lot about the visual and scene system, so my code requires a few changes before it can be merged. In the coming weeks I'll start thinking about the design of the network API!

Wei Xue(Scikit-learn)

GSoC Week 5

Week 5 began with a discussion about whether we should deprecate params. I also fixed some bugs in the checking functions and PRNG handling.

Ziye Fan(Theano)

[GSoC 2015 Week 4]

This week I'm working on an optimization of inplace_elemwise_optimizer. The idea is described here. In the current version, when inplace_elemwise_optimizer tries to replace the outputs of a node, the graph can become invalid, so validate() is called frequently to make sure the graph is unbroken. But validate() is very time-consuming, and the goal of this optimization is to make the optimizer more efficient by applying a new validation strategy.

However, this optimization did not work as expected. The total optimization time became 10 times larger:

370.541s for fgraph.validate()
...1497.519540s - ('inplace_elemwise_optimizer', 'FromFunctionOptimizer', 33) - 315.055s

The original version:

72.644s for fgraph.validate()
...143.351832s - ('inplace_elemwise_optimizer', 'FromFunctionOptimizer', 34) - 30.064s

After several small optimizations pointed out by my mentor, the time came down to 1178s. Why is it slower? I think it is because we are now trying to apply the optimizer successfully on all nodes. It is a trade-off between the time taken by validate() and the number of nodes optimized. In the past, all failed nodes were ignored directly, so it was fast. Now we try them again and again, so validate() is called many more times than before. Here is a figure I plotted to show the number of nodes to optimize in each iteration. From this figure, we can see that although the optimizer is slower now, more nodes are optimized. A better balance is needed in this trade-off - maybe making the iteration stop earlier is a good choice?
Or maybe validate() itself can be optimized? I'm still working on this. Please tell me if you have any ideas. Thank you.

[GSoC 2015 Week 3]

Hi, this is my third weekly record for GSoC 2015. What I did this week was implement an optimization of the local_dimshuffle_lift optimizer. This optimizer performs the following transformations:

DimShuffle(Elemwise(x, y)) => Elemwise(DimShuffle(x), DimShuffle(y))
DimShuffle(DimShuffle(x)) => DimShuffle(x)

This optimizer is a local optimizer, which means it will be called by global optimizers several times on different nodes. For example, here is a function graph:

DimShuffle{1,0} [@A] ''
 |Elemwise{mul,no_inplace} [@B] ''
 | |DimShuffle{x,0} [@C] ''
 | | |Elemwise{add,no_inplace} [@D] ''
 | | | |<TensorType(float64, vector)> [@E]
 | | | |DimShuffle{x} [@F] ''
 | | | | |TensorConstant{42} [@G]
 | |Elemwise{add,no_inplace} [@H] ''
 | | |<TensorType(float64, matrix)> [@I]
 | | |DimShuffle{x,x} [@J] ''
 | | | |TensorConstant{84} [@K]

If we apply local_dimshuffle_lift to this graph, it can be applied at least 6 times. Looking carefully at how the optimization works, we find that applying it to node @A results in two new sub-nodes that it can be applied to again. So the idea is to apply the optimizer recursively, so that it is invoked fewer times than before. An additional test case was also added.

I think perhaps the recursion could be replaced with an iterative approach? I'll do some experiments first to measure the recursion's efficiency. This optimization also reminds me of another one on the to-do list, which will also be done recursively. Is there a possibility of extracting this pattern out?

Mridul Seth(NetworkX)

GSoC 2015 Python Software Foundation NetworkX Biweekly Report 2

Hello folks, this blog post covers the work done in Week 3 and Week 4 of Google Summer of Code 2015.
After some discussion in #1546 we decided to make a new branch, iter_refactor, and to split the changes into multiple pull requests, as that will be easier to review and will help avoid merge conflicts, since these changes touch a lot of files. The work done in weeks 1 and 2 is now merged. The following methods now return an iterator instead of a list, and their *_iter counterparts are removed. A simple example of these changes, using a directed graph with two edges, one from node 1 to node 2 and one from node 2 to node 3:

In [1]: import networkx as nx
In [2]: G = nx.DiGraph()
In [3]: G.add_edge(1, 2)
In [4]: G.add_edge(2, 3)
In [5]: G.nodes()
Out[5]: <dictionary-keyiterator at 0x10dcd9578>
In [6]: list(G.nodes())
Out[6]: [1, 2, 3]
In [7]: G.edges()
Out[7]: <generator object edges at 0x10dcd5a00>
In [8]: list(G.edges())
Out[8]: [(1, 2), (2, 3)]
In [9]: G.in_edges(2)
Out[9]: <generator object in_edges at 0x10dcd5aa0>
In [10]: list(G.in_edges(2))
Out[10]: [(1, 2)]
In [11]: G.out_edges(2)
Out[11]: <generator object edges at 0x10dcd5eb0>
In [12]: list(G.out_edges(2))
Out[12]: [(2, 3)]
In [13]: G.neighbors(2)
Out[13]: <dictionary-keyiterator at 0x10dcd9c00>
In [14]: list(G.neighbors(2))
Out[14]: [3]
In [15]: G.successors(2)
Out[15]: <dictionary-keyiterator at 0x10dcd9db8>
In [16]: list(G.successors(2))
Out[16]: [3]
In [17]: G.predecessors(2)
Out[17]: <dictionary-keyiterator at 0x10dcd9f18>
In [18]: list(G.predecessors(2))
Out[18]: [1]

During the review we also found a bug in the core DiGraph class, which was surprising since this code has been there since 2010 - five years is a long time for a bug like this in such a crucial place. The bug is now fixed (#1607).

We started working on the degree (#1592) and adjacency (#1591) methods. After a detailed conversation we decided on a new interface for degree: G.degree() will return the degree of a node if a single node is passed as an argument, and an iterator if a bunch of nodes, or nothing, is passed.
The implementation work is in progress at #1617. I plan to complete it this week.

In [1]: import networkx as nx
In [2]: G = nx.path_graph(5)
In [3]: G.degree(2)
Out[3]: 2
In [4]: G.degree()
Out[4]: <generator object d_iter at 0x11004ef00>
In [5]: list(G.degree())
Out[5]: [(0, 1), (1, 2), (2, 2), (3, 2), (4, 1)]

We have also started a wiki for the various ideas discussed for NX 2.0. We also have a release candidate for v1.10; everyone is welcome to try it out and report any issues here. As a side project I have also started making NetworkX tutorials based on IPython notebooks. Feel free to correct me and contribute to it: NetworkX Tutorial :)

Cheers!

PS: A note on my workflow regarding this work.

Rupak Kumar Das(SunPy)

Mid-Term time

And I am back with a report! The last two weeks were spent mostly reading code for the implementations. Unfortunately, the curved-cut implementation for the Cuts plugin seemed complex and time-consuming, so it has been put off till later. I completed the Save feature by adding save support to the MultiDim plugin: it can now save a slice as an image and also generate a movie. And I finally figured out the Slit plugin! It was only a simple matter of plotting the array as an image, but I had overthought it. The only thing left to figure out is how to display it using Ginga's viewer. I am also working on the Line Profile View feature, which will plot a pixel's intensity against all wavelengths in the data. Cheers!

Shivam Vats(SymPy)

GSoC Week 4

Though I had resolved to update my blog by Friday every week, this post is quite late. This is mostly because this week was one of the most confusing yet, in terms of figuring out how to get things done within the existing code structure of SymPy. And that process is still ongoing.

So Far

PR 9495 is under review. It took some time to decide how the precision/order of Puiseux series should be handled. We had an interesting discussion on reverting a series here.
Fredrik Johansson suggested a fast algorithm of his for it. I also got to know a very ingenious way to expand trigonometric functions. For example, for the exponential:

def exp_series(A, n):
    B = [exp(A[0])]
    for k in range(1, n):
        B.append(sum(j*A[j]*B[k-j]/k for j in range(1, min(len(A), k+1))))
    return B

Possibly the most confusing part of the project is getting ring_series to work over an Expression domain, i.e., one where the coefficients can be any SymPy symbolic expression. Multivariate series need to have multiple gens, implying that in a multivariate series the coefficients can be symbolic functions of PolyElement objects. However, the PolyElement class is of type CantSympify, which means I can't use it in SymPy functions. I had quite a few discussions with my mentors about it and I now know what the issues are. I need to solve them next week.

Next Week

• Finalise how to handle symbolic coefficients and finish it.
• Read Fredrik's paper and try to implement it.

Cheers!

June 21, 2015

Patricia Carroll(Astropy)

Modeling Galaxies

In 1962, J.L. Sersic empirically derived a functional form for the way light is spread out across a galaxy. This is called the Sersic surface brightness profile. The Sersic index n determines how steeply the light intensity drops off away from a galaxy's center, and different values of n describe different galaxy populations. An index of n=4, for example (a.k.a. the de Vaucouleurs profile), describes giant elliptical galaxies well, whereas smaller star-forming spiral galaxies like the Milky Way are best described by an exponential profile, n=1. Coming soon to Astropy are the Sersic1D and Sersic2D model classes. This is my first substantial code contribution to the project and I hope it proves useful to the astronomy community. It was also a great stepping stone toward developing more complex functionality as I move forward with implementing bounding boxes and fast image rasterization.
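The Sersic profile itself is simple to write down. As a quick standalone sketch (this is not Astropy's Sersic1D; it uses the common approximation b_n ≈ 2n − 1/3 instead of solving for b_n exactly):

```python
from math import exp

def sersic_1d(r, amplitude=1.0, r_eff=5.0, n=4.0):
    """Sersic surface brightness I(r) = A * exp(-b_n * ((r/r_eff)**(1/n) - 1)).

    amplitude is the brightness at the effective radius r_eff; b_n is chosen so
    that r_eff encloses half the total light (approximated here, not solved for).
    """
    b_n = 2.0 * n - 1.0 / 3.0   # common approximation, reasonable for n ≳ 0.5
    return amplitude * exp(-b_n * ((r / r_eff) ** (1.0 / n) - 1.0))

# At r = r_eff the exponent vanishes, so I(r_eff) == amplitude:
print(sersic_1d(5.0, amplitude=1.0, r_eff=5.0, n=4.0))   # → 1.0
# n=1 gives the exponential disk profile, n=4 the de Vaucouleurs profile.
```

Larger n concentrates more light toward the center while producing extended wings, which is exactly the family of curves plotted in the notebook below.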
In [1]: from IPython.display import Image
        Image(filename="sersic_eqn.jpg", width=500)
Out[1]:

In [2]: import os
        os.chdir('/Users/Patti/gsoc/astropy/')

In [3]: %matplotlib inline
        import numpy as np
        import matplotlib.pyplot as plt
        from astropy.modeling.models import Sersic1D, Sersic2D
        from astropy.visualization import LogStretch
        from astropy.visualization.mpl_normalize import ImageNormalize
        import seaborn as sns
        sns.set_context('poster')
        sns.set_style('white', {'grid': False})

In [4]: with plt.style.context('dark_background'):
            plt.figure(figsize=(25, 8))
            plt.subplots_adjust(wspace=0)
            plt.subplot(121, xscale='log', yscale='log')
            s1 = Sersic1D(amplitude=1, r_eff=5)
            r = np.arange(0, 100, .01)
            for n in range(1, 10):
                s1.n = n
                plt.plot(r, s1(r), alpha=1, lw=2)
            yl = plt.ylim(1e-2, 1e3)
            yl = plt.xlim(1e-1, 30)
            plt.xticks([])
            plt.yticks([])
            plt.ylabel('log Surface Brightness', fontsize=25)
            plt.xlabel('log Radius', fontsize=25)
            t = plt.text(.25, 1.5, 'n=1', fontsize=30)
            t = plt.text(.25, 300, 'n=10', fontsize=30)
            plt.title('Sersic1D model', fontsize=30)

            amplitude = 1.
            r_eff = 25.
            n = 4.
            x_0 = 50.
            y_0 = 50.
            ellip = .5
            theta = -1.
            x, y = np.meshgrid(np.arange(1000), np.arange(1000))
            mod = Sersic2D(amplitude=1, r_eff=250, n=4,
                           x_0=500, y_0=500, ellip=.5, theta=-1)
            img = mod(x, y)
            norm = ImageNormalize(vmin=1e-2, vmax=50, stretch=LogStretch())
            plt.subplot(122)
            plt.imshow(img, origin='lower', interpolation='nearest',
                       cmap='binary_r', norm=norm)
            plt.xticks([])
            plt.yticks([])
            cbar = plt.colorbar()
            cbar.set_label('Surface Brightness', rotation=270, labelpad=40, fontsize=30)
            cbar.set_ticks([.1, 1, 10], update_ticks=True)
            plt.title('Sersic2D model, $n=4$', fontsize=30)
            plt.show()

Jaakko Leppäkanga(MNE-Python)

Update

Okay, the epoch viewer got merged and the butterfly plotter is coming along nicely; see the pictures below. Now I'm starting to shift focus to other visualization issues. I have already made small tweaks to the raw plotter as well. Now it has the same awesome scaling features that the epoch plotter has.
We also decided to make a todo list for the GSoC. It already has quite a few items, so I think I have the next couple of weeks planned out for me. Here it is: https://github.com/mne-tools/mne-python/issues/2213#issuecomment-113504189

Chad Fulton(Statsmodels)

Estimating a Real Business Cycle DSGE Model by Maximum Likelihood in Python

This post demonstrates how to set up, solve, and estimate a simple real business cycle model in Python. The model is very standard; the setup and notation here is a hybrid of Ruge-Murcia (2007) and DeJong and Dave (2011). Since we will be proceeding step by step, the code will match that progression by generating a series of child classes, so that we can add functionality incrementally. Of course, in practice a single class incorporating all the functionality would be all you need.

Lucas van Dijk(VisPy)

Drawing arbitrary shapes with OpenGL points

Part of my Google Summer of Code project involves porting several arrow heads from Glumpy to Vispy. I also want to make a slight change to them: the arrow heads in Glumpy include an arrow body, and I want to remove that to make sure you can put an arrow head on any type of line you want. Making a change like that requires understanding how those shapes are drawn, and for someone without a background in computer graphics this took some thorough investigation of the code and the techniques used. This article is aimed at people like me: good enough programming skills and linear algebra knowledge, but almost no prior experience with OpenGL or computer graphics in general.

Pratyaksh Sharma(pgmpy)

Mingling Markov Chains

We're still at the same problem: we wish to generate samples from a probability distribution $P$ that is intractable to sample from. In our case, such a problem arises when we wish to sample from, say, a Bayesian network (given some evidence), or even a Markov network. A Markov chain is a commonly used construct to tackle this problem.

What are Markov chains?
Figure 1: Example of a Markov chain

To put it simply, a Markov chain is a weighted directed graph $\mathbb{G} = (\textbf{V}, \textbf{E})$, where the out-edges from a node $\textbf{x}$ define a transition probability $\mathcal{T}(\textbf{x}\rightarrow\textbf{x'})$ of moving to another node $\textbf{x'}$. Pick a node $x^{(0)}$ as the start state of the Markov chain. We define a run as the sequence of nodes (states) $(x^{(0)}, x^{(1)}, \ldots, x^{(n)})$, where $x^{(i)}$ is sampled from $P(\textbf{x}) = \mathcal{T}(x^{(i-1)} \rightarrow \textbf{x})$. At the $(t+1)$-th step of a run, we can define the distribution over the states as:

$$P^{(t+1)}(\textbf{X}^{(t+1)} = \textbf{x'}) = \sum_{\textbf{x}\in Val(\textbf{X})} P^{(t)}(\textbf{X}^{(t)} = \textbf{x}) \mathcal{T}(\textbf{x}\rightarrow \textbf{x'})$$

The process is said to converge when $P^{(t+1)}$ is close to $P^{(t)}$. At convergence, we call $P = \pi$ the stationary distribution of the Markov chain:

$$\pi(\textbf{X}^{(t+1)} = \textbf{x'}) = \sum_{\textbf{x}\in Val(\textbf{X})} \pi(\textbf{X}^{(t)} = \textbf{x}) \mathcal{T}(\textbf{x}\rightarrow \textbf{x'})$$

The useful property here is that the more samples we generate from our Markov chain, the closer they are to samples from its stationary distribution.

Implementation

Check out the pull request!

Zubin Mithra(pwntools)

MIPS SROP support

Over the past week I've been working on getting SROP to work on MIPS and MIPSel. It was quite interesting, as MIPS and MIPSel introduced a new set of requirements:

1. The SROP register slots on MIPS and MIPSel are both 64-bit. Due to endianness, MIPS needs 4 bytes of padding after the actual register value, and MIPSel needs 4 bytes of padding before the actual register value.
2. The offsets at which the MIPS and MIPSel registers start are different.

We tried a couple of approaches before deciding to represent registers in the form {offset: register}.
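The {offset: register} layout and the endianness padding just described can be illustrated with a toy packer. The offsets below are made up for the example (the real ones come from the kernel's sigcontext layout, and pwntools' actual implementation differs):

```python
import struct

# Hypothetical register offsets within the sigreturn frame (illustrative only;
# real MIPS/MIPSel offsets come from the kernel's sigcontext layout).
MIPS_REGISTERS = {0: "pc", 8: "r0", 16: "sp"}

def pack_frame(values, endian="big", frame_size=24):
    """Pack 32-bit register values into 64-bit slots.

    Each slot is 8 bytes: big-endian MIPS pads AFTER the 4-byte value,
    little-endian MIPSel pads BEFORE it.
    """
    frame = bytearray(frame_size)          # zero-filled, so padding is already there
    fmt = ">I" if endian == "big" else "<I"
    for offset, reg in MIPS_REGISTERS.items():
        value = struct.pack(fmt, values.get(reg, 0))
        if endian == "big":
            frame[offset:offset + 4] = value        # value first, padding after
        else:
            frame[offset + 4:offset + 8] = value    # padding first, value after
    return bytes(frame)

frame = pack_frame({"pc": 0xDEADBEEF, "sp": 0x7FFF0000}, endian="big")
print(frame[:4].hex())   # → 'deadbeef'
```

Keying the table by offset rather than by register order means the two architectures can share the same packing code and differ only in their tables.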
This is clean and meets our requirements. The coming week I'll be travelling, but I'll be working on finishing this, getting it merged, and writing tests for it.

June 20, 2015

Shridhar Mishra(ERAS Project)

Update! @20/06/2015

Things done:

• The basic code structure of battery.nddl has been set up.
• PlannerConfig.xml is in place.
• PyEUROPA is working on the Docker image.

Things to do:

• Test the current code with pyEUROPA.
• Document the workings and other functions of pyEUROPA (priority).
• Remove the Arrow server code from the existing model.
• Remove the Pygame simulation and prepare the model for real-life testing with the Husky rover.
• Plan and integrate more devices for planning.

Chienli Ma(Theano)

The Second Two-week

I almost forgot to post an update. In these two weeks, I finished the first feature, "allow user to regenerate a function from a compiled one"; this feature "can be merged, but there's another PR that needs to rebase". So, it's done. I also have a draft of the code that allows the user to swap a SharedVariable. By 'draft' I mean that I've finished the code as well as the test case, and they work. I'll make a PR for review at the beginning of next week. I also have some new ideas to discuss with Fred. I hope I can finish all 3 features in the first 6 weeks: copy, swap_sharedvariable and delete_update, so that I can focus on OpFromGraph in the second half. It seems that someone has started working on it now. I hope he did not 'rob' my job. :)

Tarashish Mishra(ScrapingHub)

I'm porting stuff to Python 3. And I'm loving it.

GSoC update time! In case you didn't read my previous post, I'm participating in GSoC, porting Splash to Python 3. A quick update on what has been done so far: the pull request to add support for Qt5 and PyQt5 has been merged into the qt5 branch. The plan is to merge it into master after the Python 3 porting and some other cleanup (fixing the docs, Vagrantfile, etc.) is done. So now on to Python 3 porting.
The main roadblock in porting Splash to Python 3 is that some dependencies don't (fully) support Python 3 yet, the major one being Twisted. The good thing is that the most-used parts of Twisted already support Python 3, and the developers behind Twisted are actively working on porting more and more modules. Twisted also has a fairly well laid out guide for Python 3 porting, and the community is really responsive with feedback and reviews. Thanks to that, I have already ported one module and am now working on porting twisted.web.proxy. Among the other dependencies, my fork of qt5reactor is Python 3 compatible, and pyre2, a faster drop-in replacement for the re module from the standard library, is now Python 3 compatible after my PR was merged. For now, I'm porting the Splash code base one test at a time. Splash has good test coverage and lots of tests, so that's working in my favor. That and pdb. That's all I have to share for now. Thanks for reading.

Himanshu Mishra(NetworkX)

GSoC '15 Progress: Second Report

The past couple of weeks have been fun! I learnt many new and interesting things about Python. The modified source code of METIS has gotten in, followed by its Cython wrappers. Thanks again to Yingchong Situ for all his legacy work. Nevertheless, things were not smooth and there were lots of hiccups and things to learn. One of the modules in the package was named types, and it was being imported with an absolute import. Unaware of the fact that types is also a built-in Python module, I found the situation a mystery. Thanks to IPython, which told me this:

In [2]: types
Out[2]: <module 'types' from '/usr/lib/python2.7/types.pyc'>

This alerted me to all the differences, pros and cons of absolute and relative imports. Now one may ask (does anyone read these blog posts?) why I didn't go with the following in the first place:

In [3]: from . import types

Actually, networkx-metis is supposed to be installed as a namespace package in networkx, and the presence of __init__.py is prohibited in a namespace package. Hence from . import types would raise a "relative import from a non-package" error. We are now following Google's style guide for Python [1]. Being licensed under Apache License Version 2, we also had to add a file named NOTICE clearly stating the modifications we made to the library networkx-metis is a derivative work of. The next important items on my TODO list are:

• Finalizing everything for namespace packaging
• Setting up Travis CI
• Hosting docs on readthedocs.org

That's all for now. Happy Coding!

[1] https://google-styleguide.googlecode.com/svn/trunk/pyguide.html

Isuru Fernando(SymPy)

GSoC 2015 - Week 4

This week I got access to OS X, so I decided to do all the OS X related work while I had access. First of all, I worked on getting CMake to build in Sage on OS X 10.10. CMake is supported to be built with clang on OS X and does not support gcc; since Sage uses gcc for building packages, I tried building CMake in Sage. (Thanks to +Rajith for giving me the chance to work on his Mac.) The main problem with CMake on OS X 10.10 was that it uses an Apple header, <CoreFoundation/CoreFoundation.h>, which is a collection of headers including <CoreFoundation/CFStream.h>, which in turn includes a faulty Apple header, '/usr/local/include/dispatch/dispatch.h'. After going through the CMake code, it seemed that although 'CoreFoundation.h' was included, 'CFStream.h' was not needed. So I included only the specific headers needed (<CoreFoundation/CFBundle.h>, etc.) and CMake was successfully built on Sage. Testing the CMake installation resulted in 6 out of 387 tests failing. Other good news: we got access to test SymEngine on Travis CI with OS X. We are testing both clang and gcc to make sure symengine builds on both.
Building with clang was successful, but with gcc there were a couple of problems, and they were hard to check on Travis CI as there were huge queuing times for OS X builds. One issue is that on OS X, the gmp library we link to is installed by Homebrew. g++ was using a different C++ standard library than the one gmp was compiled with, and hence linking errors occurred. A CMake check was added to try to compile a simple program with gmpxx and, if it fails, give an error message at configuration time. Another issue was that uint was used in some places instead of unsigned int. On Linux and with OS X clang, uint is typedef'd to unsigned int, so the problem was not detected by the automated tests on Travis CI. Since uint is not a standard C++ type, it was changed to unsigned int. Next week, I will try to figure out why 6 tests in the CMake test suite fail, try to fix those, and get CMake into the optional packages. I will also work on the Sage wrappers for SymEngine.

Manuel Paz Arribas (Astropy)

Progress report

The last two weeks have been a bit tough. After finishing the observation table generator mentioned in the previous post, I started working on the background module of Gammapy. I am currently working on a container class for 3D background models (a.k.a. cube background models). The three dimensions of the model are detector coordinates (X and Y) and energy. This kind of background model is largely in use by Fermi. The development of tools for creating such background models is an important milestone in my project, so implementing a class to handle them is crucial. For now, the class can read cube models from FITS files and slice the 3D models to produce 2D plots of the background rate. Two kinds of plots are produced:

1. Color maps of the background rate for each energy slice defined in the model.
2. 1D curves of the spectrum of the background rate for each detector bin (X, Y) defined in the model.
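The two kinds of slices can be sketched with plain Python; this is a minimal illustration assuming a nested-list layout bg[energy][y][x] (the real class reads the cube from a FITS file, and the values here are made up):

```python
# Hypothetical cube background model, indexed as bg[energy][y][x].
n_energy, n_y, n_x = 3, 4, 4
bg = [[[1.0 + e + 0.1 * y + 0.01 * x for x in range(n_x)]
       for y in range(n_y)]
      for e in range(n_energy)]

# 1. A 2D color map of the background rate for one energy slice:
map_e0 = bg[0]

# 2. A 1D spectrum of the background rate for one detector bin (X, Y):
spectrum_xy = [bg[e][2][3] for e in range(n_energy)]
```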
The attached figures have been produced using a sample background model FITS file from the Gammalib repository. (Please click for an enlarged view.) More functionality should come soon into this class, for instance methods to create the background models from event lists, and smoothing of the models to attenuate the effects of the statistical nature of event detection. As mentioned, I had some trouble developing a more complex class in Python, and this task is taking more time than expected. I am working hard to stay on track.

Sumith (SymPy)

GSoC Progress - Week 3 and 4

Hello, this post contains the third report of my GSoC progress. At one point, I had changed the deadline from Sundays to Fridays, but I seem to be running a week late on the post names. That has been corrected now.

Progress

We decided to replace mpz_class with piranha::integer for coefficients and std::unordered_map with piranha::hash_set for the container. We got the lower level working with this data structure in the last week. We decided to depend on Piranha for the Polynomial, or else our module won't be up to the speed we expect of it. In the future, we can write our own hashtable and Integer as and when needed.

Report

The benchmark results are:

expand2b: Average: 108.2ms, Maximum: 114ms, Minimum: 107ms
while the latest expand2d has: Average: 23ms, Maximum: 23ms, Minimum: 23ms

which is a nice 4-5x speed-up. The code for this (experimental) can be found in #470. A more detailed report can be found here. I also sent in a minor PR with some clean-ups that I felt necessary in #472.

Targets for Week 5

• Wrap the lower level into a Polynomial class.
• Have the functionality of at least the UnivariatePolynomial class.
• Explore what kinds of coefficients can be passed; since we have piranha::integer, we need to think about having rational and symbolic coefficients now itself.
• Think of the various places where the Polynomial class needs to integrate into SymEngine, for example expand().

That's all this week.
Vaarwel!

June 19, 2015

Nikolay Mayorov (SciPy)

Algorithm Benchmarks

This post was updated to improve its clarity and to incorporate new information about “leastsqbound”. Before I present the results I want to make a few notes.

1. Initially I wanted to find a very accurate reference optimal value for each problem and measure the accuracy of an optimization process by comparison with it. I abandoned this idea for several reasons. a) In local optimization there isn't a single correct minimum; all local minima are equally good. So ideally we should find all local minima, which can be hard, and the comparison logic with several minima becomes awkward. b) Sources with problem descriptions often provide inaccurate (or plain incorrect) reference values, provide them only in single precision, or don't provide them at all. Finding optimal values with MATLAB (for example) is cumbersome, and still we can't be 100% sure of the required accuracy.

2. It is desirable to compare algorithms with identical termination conditions. But this requirement is never satisfied in practice, as we work with already implemented algorithms. Also there is no single correct way to specify a termination condition. So the termination conditions for all algorithms will be somewhat different, but there is nothing we can do about it.

The methods benchmarked were dogbox, Trust Region Reflective, leastsqbound (also used in lmfit) and l-bfgs-b (this method doesn't take into account the structure of the problem and works with $f = \lVert r \rVert^2$ and $g = 2 J^T r$). The columns have the following meaning:

• n – the number of independent variables.
• m – the number of residuals.
• solver – the algorithm used. The suffix “-s” means additional scaling of variables to equalize their influence on the objective function; this has nothing to do with the scaling applied in Trust Region Reflective. The equivalent point of view is the use of an elliptical trust region.
In the constrained case this scaling usually degrades performance, so I don't show the results for it.

• nfev – the number of function evaluations done by the algorithm.
• g norm – first-order (gradient) optimality. In dogbox it is the infinity-norm of the gradient with respect to the variables which aren't on the boundary (optimality of the active variables is assured by the algorithm). For the other algorithms it is the infinity-norm of the gradient scaled by the Coleman-Li matrix; read my post about it.
• value – the value of the objective function we are minimizing, the final sum of squares. It could serve as a rough measure of an algorithm's adequacy (by comparison with “value” for the other algorithms).
• active – the number of active constraints at the solution. Absolutely accurate for “dogbox”, somewhat arbitrary for the other algorithms (determined with a tolerance threshold).

The most important columns are “nfev” and “g norm”. For all runs I used the tolerance parameters ftol = xtol = gtol = EPS**0.5, where EPS is the machine epsilon for double-precision floating-point numbers. As I said above, termination conditions vary from method to method, so it would be a tedious job to explain each parameter for each method. The benchmark problems were taken from “The MINPACK-2 test problem collection” and “Moré, J.J., Garbow, B.S. and Hillstrom, K.E., Testing Unconstrained Optimization Software”; constraints for the latter collection were added according to “Gay, D.M., A trust-region approach to linearly constrained optimization”. Here is the very helpful page I used. All problems were run with an analytical (user-supplied) Jacobian computation routine. The discussion of the results is below the tables.
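For reference, the shared tolerance used above works out to roughly 1.5e-8:

```python
import sys

# ftol = xtol = gtol = EPS**0.5, with EPS the double-precision machine epsilon.
EPS = sys.float_info.epsilon   # about 2.22e-16
TOL = EPS ** 0.5               # about 1.49e-8
```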
Unbounded problems

problem (n, m)
    solver        nfev     g norm      value  active  status
--------------------------------------------------------------
Beale (3, 2)
    dogbox           8   4.80e-11   1.02e-22    0    1
    dogbox-s         9   7.69e-12   2.58e-24    0    1
    trf              7   3.21e-11   4.50e-23    0    1
    trf-s            9   4.07e-11   7.26e-23    0    1
    leastsq         10   3.66e-15   5.92e-31    0    2
    leastsq-s        9   6.66e-16   4.93e-32    0    2
    l-bfgs-b        16   1.77e-07   1.95e-15    0    0
Biggs (6, 13)
    dogbox         140   1.84e-02   3.03e-01    0    2
    dogbox-s       600   3.92e-03   8.80e-03    0    0
    trf             65   2.18e-16   2.56e-31    0    1
    trf-s           43   1.18e-14   3.60e-29    0    1
    leastsq         74   7.24e-16   4.56e-31    0    2
    leastsq-s       40   1.65e-15   1.53e-30    0    2
    l-bfgs-b        42   1.23e-06   5.66e-03    0    0
Box3D (3, 10)
    dogbox           6   3.92e-10   1.14e-19    0    1
    dogbox-s         6   3.92e-10   1.14e-19    0    1
    trf              6   3.92e-10   1.14e-19    0    1
    trf-s            6   3.92e-10   1.14e-19    0    1
    leastsq          7   2.80e-16   4.62e-32    0    2
    leastsq-s        7   2.80e-16   4.62e-32    0    2
    l-bfgs-b        37   1.41e-07   3.42e-13    0    0
BrownAndDennis (4, 20)
    dogbox         144   4.89e+00   8.58e+04    0    2
    dogbox-s       153   3.46e+00   8.58e+04    0    2
    trf             26   1.38e+00   8.58e+04    0    2
    trf-s          275   4.28e+00   8.58e+04    0    2
    leastsq         26   1.15e+00   8.58e+04    0    1
    leastsq-s      254   3.63e+00   8.58e+04    0    1
    l-bfgs-b        17   9.66e-01   8.58e+04    0    0
BrownBadlyScaled (2, 3)
    dogbox          26   0.00e+00   0.00e+00    0    1
    dogbox-s        29   0.00e+00   0.00e+00    0    1
    trf             23   0.00e+00   0.00e+00    0    1
    trf-s           23   0.00e+00   0.00e+00    0    1
    leastsq         17   0.00e+00   0.00e+00    0    2
    leastsq-s       16   0.00e+00   0.00e+00    0    2
    l-bfgs-b        25   2.14e-02   2.06e-15    0    0
ChebyshevQuadrature10 (10, 10)
    dogbox          83   3.53e-05   6.50e-03    0    2
    dogbox-s        72   1.70e-05   6.50e-03    0    2
    trf             21   6.50e-07   6.50e-03    0    2
    trf-s           21   5.93e-06   6.50e-03    0    2
    leastsq         18   2.78e-06   6.50e-03    0    1
    leastsq-s       25   3.22e-06   6.50e-03    0    1
    l-bfgs-b        29   1.48e-05   6.50e-03    0    0
ChebyshevQuadrature11 (11, 11)
    dogbox         154   2.75e-05   2.80e-03    0    2
    dogbox-s       196   5.01e-05   2.80e-03    0    2
    trf             37   7.25e-06   2.80e-03    0    2
    trf-s           44   7.85e-06   2.80e-03    0    2
    leastsq         45   4.99e-06   2.80e-03    0    1
    leastsq-s       47   7.70e-06   2.80e-03    0    1
    l-bfgs-b        32   4.79e-04   2.80e-03    0    0
ChebyshevQuadrature7 (7, 7)
    dogbox           8   1.17e-12   4.82e-25    0    1
    dogbox-s        10   7.22e-15   1.62e-29    0    1
    trf              8   3.10e-15   2.60e-30    0    1
    trf-s            9   1.04e-08   3.76e-17    0    1
    leastsq          9   8.43e-16   7.65e-32    0    2
    leastsq-s        9   1.37e-15   1.96e-31    0    2
    l-bfgs-b        18   1.16e-05   7.14e-11    0    0
ChebyshevQuadrature8 (8, 8)
    dogbox          20   4.12e-06   3.52e-03    0    2
    dogbox-s        56   5.73e-06   3.52e-03    0    2
    trf             33   5.40e-06   3.52e-03    0    2
    trf-s           39   9.26e-06   3.52e-03    0    2
    leastsq         32   2.71e-06   3.52e-03    0    1
    leastsq-s       39   7.99e-06   3.52e-03    0    1
    l-bfgs-b        27   5.13e-06   3.52e-03    0    0
ChebyshevQuadrature9 (9, 9)
    dogbox          14   3.55e-15   9.00e-30    0    1
    dogbox-s        11   6.02e-13   2.72e-25    0    1
    trf             12   6.46e-13   3.15e-25    0    1
    trf-s            9   2.22e-10   6.38e-20    0    1
    leastsq         13   8.47e-16   7.64e-32    0    2
    leastsq-s       12   5.95e-16   5.85e-32    0    2
    l-bfgs-b        27   2.07e-05   2.53e-10    0    0
CoatingThickness (134, 252)
    dogbox           7   2.33e-05   5.05e-01    0    2
    dogbox-s         7   2.33e-05   5.05e-01    0    2
    trf              7   2.33e-05   5.05e-01    0    2
    trf-s            7   2.33e-05   5.05e-01    0    2
    leastsq          7   2.33e-05   5.05e-01    0    1
    leastsq-s        7   2.33e-05   5.05e-01    0    1
    l-bfgs-b       281   5.54e-04   5.11e-01    0    0
EnzymeReaction (4, 11)
    dogbox          23   1.32e-07   3.08e-04    0    2
    dogbox-s        21   1.36e-07   3.08e-04    0    2
    trf             20   1.30e-07   3.08e-04    0    2
    trf-s           24   1.17e-07   3.08e-04    0    2
    leastsq         23   7.13e-08   3.08e-04    0    1
    leastsq-s       18   7.96e-08   3.08e-04    0    1
    l-bfgs-b        30   2.53e-06   3.08e-04    0    0
ExponentialFitting (5, 33)
    dogbox          10   1.56e-10   5.46e-05    0    1
    dogbox-s        10   1.56e-10   5.46e-05    0    1
    trf             19   1.29e-08   5.46e-05    0    1
    trf-s           20   3.28e-08   5.46e-05    0    2
    leastsq         20   3.23e-08   5.46e-05    0    1
    leastsq-s       18   1.94e-08   5.46e-05    0    1
    l-bfgs-b        44   9.98e-05   7.68e-05    0    0
ExtendedPowellSingular (4, 4)
    dogbox          13   2.33e-09   5.72e-13    0    1
    dogbox-s        13   2.33e-09   5.72e-13    0    1
    trf             13   2.33e-09   5.72e-13    0    1
    trf-s           13   2.33e-09   5.72e-13    0    1
    leastsq         37   4.93e-31   7.22e-42    0    4
    leastsq-s       37   4.93e-31   7.22e-42    0    4
    l-bfgs-b        27   1.77e-04   3.70e-08    0    0
FreudensteinAndRoth (2, 2)
    dogbox           6   0.00e+00   0.00e+00    0    1
    dogbox-s         9   1.84e-11   1.95e-25    0    1
    trf              6   1.57e-10   1.41e-23    0    1
    trf-s            9   8.41e-11   4.07e-24    0    1
    leastsq          8   1.78e-14   3.16e-30    0    2
    leastsq-s       10   0.00e+00   0.00e+00    0    2
    l-bfgs-b        15   5.29e-06   1.54e-13    0    0
GaussianFittingI (11, 65)
    dogbox          14   1.27e-07   4.01e-02    0    2
    dogbox-s        15   3.25e-07   4.01e-02    0    2
    trf             13   1.75e-07   4.01e-02    0    2
    trf-s           16   1.93e-07   4.01e-02    0    2
    leastsq         13   5.89e-07   4.01e-02    0    1
    leastsq-s       16   1.77e-06   4.01e-02    0    1
    l-bfgs-b        69   8.67e-05   4.01e-02    0    0
GaussianFittingII (3, 15)
    dogbox           3   5.93e-13   1.13e-08    0    1
    dogbox-s         3   5.93e-13   1.13e-08    0    1
    trf              3   5.93e-13   1.13e-08    0    1
    trf-s            3   5.93e-13   1.13e-08    0    1
    leastsq          4   1.25e-16   1.13e-08    0    2
    leastsq-s        4   1.25e-16   1.13e-08    0    2
    l-bfgs-b         4   5.81e-06   1.18e-08    0    0
GulfRnD (3, 100)
    dogbox          20   9.12e-09   1.83e-18    0    1
    dogbox-s        22   1.00e-15   5.87e-31    0    1
    trf             16   1.00e-15   5.87e-31    0    1
    trf-s           25   1.26e-08   3.51e-18    0    1
    leastsq         16   1.61e-15   7.12e-31    0    2
    leastsq-s       23   1.61e-15   7.12e-31    0    2
    l-bfgs-b        60   2.51e-06   1.25e-12    0    0
HelicalValley (3, 3)
    dogbox           9   8.37e-12   3.13e-25    0    1
    dogbox-s        19   5.81e-09   3.94e-19    0    1
    trf             13   1.68e-13   1.16e-28    0    1
    trf-s           13   1.78e-11   1.26e-24    0    1
    leastsq         16   2.50e-29   2.46e-60    0    2
    leastsq-s       11   1.58e-15   9.87e-33    0    2
    l-bfgs-b        32   6.79e-07   3.28e-15    0    0
JenrichAndSampson10 (2, 10)
    dogbox          22   7.17e-02   1.24e+02    0    2
    dogbox-s        21   5.51e-02   1.24e+02    0    2
    trf             20   6.21e-03   1.24e+02    0    2
    trf-s           20   5.86e-02   1.24e+02    0    2
    leastsq         20   2.76e-04   1.24e+02    0    1
    leastsq-s       21   2.90e-02   1.24e+02    0    1
    l-bfgs-b        63   1.86e+03        nan    0    2
PenaltyI (10, 11)
    dogbox          35   4.90e-09   7.09e-05    0    1
    dogbox-s        25   2.89e-09   7.09e-05    0    1
    trf             38   3.98e-08   7.09e-05    0    2
    trf-s           69   2.93e-08   7.09e-05    0    2
    leastsq         26   7.54e-08   7.09e-05    0    1
    leastsq-s       79   1.67e-08   7.09e-05    0    1
    l-bfgs-b        20   8.58e-06   7.45e-05    0    0
PenaltyII10 (10, 20)
    dogbox          71   8.64e-07   2.91e-04    0    2
    dogbox-s        33   4.15e-06   2.91e-04    0    2
    trf             50   2.01e-06   2.91e-04    0    2
    trf-s           32   4.98e-07   2.91e-04    0    2
    leastsq         47   2.77e-07   2.91e-04    0    1
    leastsq-s       58   6.64e-08   2.91e-04    0    1
    l-bfgs-b        14   2.31e-06   2.91e-04    0    0
PenaltyII4 (4, 8)
    dogbox          27   1.64e-07   9.31e-06    0    2
    dogbox-s        29   2.96e-07   9.31e-06    0    2
    trf             24   3.42e-07   9.31e-06    0    2
    trf-s           85   8.46e-08   9.31e-06    0    2
    leastsq         70   7.35e-08   9.31e-06    0    1
    leastsq-s      111   2.74e-08   9.31e-06    0    1
    l-bfgs-b        19   1.16e-06   9.61e-06    0    0
PowellBadlyScaled (2, 2)
    dogbox          43   4.89e-09   2.89e-27    0    1
    dogbox-s        67   2.02e-11   9.86e-32    0    1
    trf             43   4.90e-09   2.90e-27    0    1
    trf-s           19   0.00e+00   0.00e+00    0    1
    leastsq         72   1.01e-11   6.16e-32    0    2
    leastsq-s       19   1.01e-11   1.23e-32    0    2
    l-bfgs-b         4   1.35e-01   1.35e-01    0    0
Rosenbrock (2, 2)
    dogbox          20   0.00e+00   0.00e+00    0    1
    dogbox-s        18   0.00e+00   0.00e+00    0    1
    trf             18   0.00e+00   0.00e+00    0    1
    trf-s           20   0.00e+00   0.00e+00    0    1
    leastsq         15   0.00e+00   0.00e+00    0    4
    leastsq-s       14   0.00e+00   0.00e+00    0    4
    l-bfgs-b        47   1.89e-06   1.31e-14    0    0
ThermistorResistance (3, 16)
    dogbox         300   5.91e+06   1.61e+02    0    0
    dogbox-s       291   1.95e+00   8.79e+01    0    2
    trf            262   5.32e-04   8.79e+01    0    2
    trf-s          202   3.10e-04   8.79e+01    0    3
    leastsq        279   7.49e-04   8.79e+01    0    2
    leastsq-s      216   1.68e+01   8.79e+01    0    3
    l-bfgs-b       633   3.16e+00   3.17e+04    0    0
Trigonometric (10, 10)
    dogbox          10   1.54e-11   9.90e-22    0    1
    dogbox-s        65   3.66e-07   2.80e-05    0    2
    trf             26   1.42e-07   2.80e-05    0    2
    trf-s           31   2.08e-08   2.80e-05    0    2
    leastsq         25   3.84e-08   2.80e-05    0    1
    leastsq-s       28   5.67e-08   2.80e-05    0    1
    l-bfgs-b        28   1.63e-06   2.80e-05    0    0
Watson12 (12, 31)
    dogbox           7   5.77e-10   4.72e-10    0    1
    dogbox-s        12   1.50e-13   4.72e-10    0    1
    trf              6   1.56e-10   5.98e-10    0    1
    trf-s            8   2.19e-10   2.16e-09    0    1
    leastsq          9   8.94e-14   4.72e-10    0    2
    leastsq-s        9   3.63e-11   4.72e-10    0    3
    l-bfgs-b        52   4.20e-05   1.35e-05    0    0
Watson20 (20, 31)
    dogbox          11   4.90e-12   2.48e-20    0    1
    dogbox-s        19   6.35e-10   2.60e-20    0    1
    trf              7   1.36e-12   1.63e-19    0    1
    trf-s            8   1.32e-08   7.10e-18    0    1
    leastsq         17   8.65e-13   2.48e-20    0    2
    leastsq-s       20   1.08e-11   2.49e-20    0    2
    l-bfgs-b        69   2.66e-05   7.28e-06    0    0
Watson6 (6, 31)
    dogbox           8   5.16e-08   2.29e-03    0    2
    dogbox-s        10   5.62e-08   2.29e-03    0    2
    trf              8   5.16e-08   2.29e-03    0    2
    trf-s           11   1.11e-07   2.29e-03    0    2
    leastsq          8   5.16e-08   2.29e-03    0    1
    leastsq-s        8   5.16e-08   2.29e-03    0    1
    l-bfgs-b        44   5.28e-06   2.29e-03    0    0
Watson9 (9, 31)
    dogbox           7   3.21e-13   1.40e-06    0    1
    dogbox-s         9   8.26e-12   1.40e-06    0    1
    trf              6   2.91e-11   1.40e-06    0    1
    trf-s           10   4.29e-12   1.40e-06    0    1
    leastsq          7   1.47e-11   1.40e-06    0    1
    leastsq-s        7   1.70e-11   1.40e-06    0    4
    l-bfgs-b        40   1.27e-04   6.51e-05    0    0
Wood (4, 6)
    dogbox          73   0.00e+00   0.00e+00    0    1
    dogbox-s        66   0.00e+00   0.00e+00    0    1
    trf             74   0.00e+00   0.00e+00    0    1
    trf-s           67   5.53e-12   9.26e-26    0    1
    leastsq         69   0.00e+00   0.00e+00    0    2
    leastsq-s       70   0.00e+00   0.00e+00    0    2
    l-bfgs-b        20   6.39e-04   7.88e+00    0    0

Bounded problems

problem (n, m)
    solver        nfev     g norm      value  active  status
--------------------------------------------------------------
Beale_B (3, 2)
    dogbox           4   0.00e+00   0.00e+00    0    1
    trf             19   8.74e-09   2.11e-10    0    1
    leastsqbound    12   1.17e-09   1.99e-20    1    2
    l-bfgs-b         5   5.83e-15   4.44e-31    1    0
Biggs_B (6, 13)
    dogbox          32   1.61e-08   5.32e-04    2    2
    trf             24   3.96e-10   5.32e-04    2    1
    leastsqbound    63   5.95e-04   5.90e-04    2    1
    l-bfgs-b        70   1.52e-03   5.79e-04    2    0
Box3D_B (3, 10)
    dogbox           8   1.45e-10   1.14e-04    1    1
    trf             13   6.55e-09   1.14e-04    0    1
    leastsqbound    18   1.75e-08   1.14e-04    0    1
    l-bfgs-b        16   1.08e-03   1.18e-04    0    0
BrownAndDennis_B (4, 20)
    dogbox          78   9.28e+00   8.89e+04    2    2
    trf             41   4.98e+01   8.89e+04    0    2
    leastsqbound   271   8.63e-01   8.89e+04    1    1
    l-bfgs-b        18   5.66e-01   8.89e+04    2    0
BrownBadlyScaled_B (2, 3)
    dogbox          33   1.11e-10   7.84e+02    1    1
    trf             39   8.25e-05   7.84e+02    1    3
    leastsqbound   300   3.14e+00   7.87e+02    0    5
    l-bfgs-b         7   1.44e-11   7.84e+02    1    0
ChebyshevQuadrature10_B (10, 10)
    dogbox         147   1.64e-05   6.50e-03    0    2
    trf             40   1.03e-06   4.77e-03    0    2
    leastsqbound    55   1.14e-06   4.77e-03    0    1
    l-bfgs-b        50   1.78e-06   4.77e-03    0    0
ChebyshevQuadrature7_B (7, 7)
    dogbox          15   2.75e-07   6.03e-04    2    2
    trf             15   3.95e-08   6.03e-04    2    2
    leastsqbound    33   9.61e-08   6.03e-04    0    1
    l-bfgs-b        29   2.18e-05   6.03e-04    2    0
ChebyshevQuadrature8_B (8, 8)
    dogbox          81   5.34e-06   3.59e-03    1    2
    trf            127   1.12e-06   3.59e-03    0    2
    leastsqbound   900   1.33e-06   3.59e-03    0    5
    l-bfgs-b        46   2.92e-06   3.59e-03    1    0
ExtendedPowellSingular_B (4, 4)
    dogbox          20   2.42e-07   1.88e-04    1    2
    trf             16   7.36e-09   1.88e-04    1    1
    leastsqbound    23   5.59e-08   1.88e-04    1    1
    l-bfgs-b        29   6.06e-05   1.88e-04    1    0
GaussianFittingII_B (3, 15)
    dogbox           3   5.93e-13   1.13e-08    0    1
    trf              5   2.64e-10   1.13e-08    0    1
    leastsqbound    12   1.43e-15   1.13e-08    0    1
    l-bfgs-b         5   7.93e-09   1.84e-08    0    0
GulfRnD_B (3, 100)
    dogbox          10   7.29e-05   5.29e+00    2    2
    trf              9   9.03e-07   5.29e+00    1    2
    leastsqbound    22   4.71e-05   5.29e+00    0    1
    l-bfgs-b        29   4.11e-01   6.49e+00    0    0
HelicalValley_B (3, 3)
    dogbox           9   2.69e-05   9.90e-01    1    2
    trf             14   6.10e-05   9.90e-01    1    2
    leastsqbound   125   4.24e-03   9.90e-01    1    1
    l-bfgs-b        17   2.63e-05   9.90e-01    1    0
PenaltyI_B (10, 11)
    dogbox          16   1.00e-05   7.56e+00    3    2
    trf             17   8.72e-04   7.56e+00    3    2
    leastsqbound   328   2.56e-06   7.56e+00    3    1
    l-bfgs-b         5   8.56e-04   7.56e+00    3    0
PenaltyII10_B (10, 20)
    dogbox          30   9.20e-07   2.91e-04    2    2
    trf            304   5.20e-06   2.91e-04    1    2
    leastsqbound   297   2.18e-07   2.91e-04    2    1
    l-bfgs-b        23   2.59e-04   2.92e-04    0    0
PenaltyII4_B (4, 8)
    dogbox          29   1.40e-12   9.35e-06    2    1
    trf            193   2.91e-09   9.35e-06    0    1
    leastsqbound    78   1.39e-08   9.35e-06    1    1
    l-bfgs-b        14   3.07e-05   9.50e-06    0    0
PowellBadlyScaled_B (2, 2)
    dogbox          38   4.07e-12   1.51e-10    1    1
    trf            100   3.02e-11   2.07e-10    0    1
    leastsqbound   220   1.82e-06   1.51e-10    1    1
    l-bfgs-b         5   1.08e+00   1.35e-01    0    0
Rosenbrock_B_0 (2, 2)
    dogbox          17   0.00e+00   0.00e+00    0    1
    trf             23   0.00e+00   0.00e+00    1    1
    leastsqbound     6   1.11e-13   1.97e-29    1    2
    l-bfgs-b        47   1.89e-06   1.31e-14    1    0
Rosenbrock_B_1 (2, 2)
    dogbox           6   4.97e-09   5.04e-02    1    1
    trf              9   1.59e-07   5.04e-02    1    2
    leastsqbound    21   2.66e-07   5.04e-02    1    1
    l-bfgs-b        23   1.68e-06   5.04e-02    1    0
Rosenbrock_B_2 (2, 2)
    dogbox           6   2.27e-06   4.94e+00    1    2
    trf              9   4.36e-07   4.94e+00    1    2
    leastsqbound    18   3.44e-05   4.94e+00    1    1
    l-bfgs-b        19   1.04e-05   4.94e+00    1    0
Rosenbrock_B_3 (2, 2)
    dogbox           3   0.00e+00   2.50e+01    2    1
    trf              8   3.27e-09   2.50e+01    2    1
    leastsqbound    19   4.38e-09   2.50e+01    2    1
    l-bfgs-b         3   0.00e+00   2.50e+01    2    0
Rosenbrock_B_4 (2, 2)
    dogbox           6   4.97e-09   5.04e-02    1    1
    trf             14   1.06e-08   5.04e-02    1    1
    leastsqbound    20   7.03e-06   5.04e-02    0    1
    l-bfgs-b        21   2.73e-10   5.04e-02    1    0
Rosenbrock_B_5 (2, 2)
    dogbox          12   0.00e+00   2.50e-01    1    1
    trf             20   5.05e-06   2.50e-01    1    2
    leastsqbound    24   8.47e-08   2.50e-01    1    1
    l-bfgs-b        27   0.00e+00   2.50e-01    1    0
Trigonometric_B (10, 10)
    dogbox         117   3.31e-07   2.80e-05    0    2
    trf             37   7.67e-07   2.80e-05    0    2
    leastsqbound    64   3.99e-08   2.80e-05    0    1
    l-bfgs-b        34   2.61e-04   4.22e-05    0    0
Watson12_B (12, 31)
    dogbox        1200   1.30e-03   7.17e-02    5    0
    trf            171   4.50e-05   7.16e-02    6    2
    leastsqbound    13   1.02e+02   1.71e+01   12    1
    l-bfgs-b       101   9.37e-02   7.28e-02    6    0
Watson9_B (9, 31)
    dogbox           5   1.79e+01   4.91e+00    3    2
    trf             26   1.87e-09   3.74e-02    5    1
    leastsqbound   462   2.89e-03   3.91e-02    2    1
    l-bfgs-b       285   5.72e-05   3.74e-02    5    0
Wood_B (4, 6)
    dogbox          63   5.05e-07   1.56e+00    1    2
    trf             29   2.12e-08   1.56e+00    1    2
    leastsqbound    43   4.17e-05   1.56e+00    1    1
    l-bfgs-b        20   8.38e-03   1.56e+00    1    0

For unbounded problems “leastsq” and “trf” are generally comparable, with “leastsq” being modestly better. This is easily explained, as the algorithms are almost equivalent, but “leastsq” uses a smarter strategy for decreasing the trust region radius; perhaps this issue is worth investigating. My second algorithm, “dogbox”, is less robust and fails on some problems (most of them have a rank-deficient Jacobian). The general-purpose “l-bfgs-b” is generally not as good as the lsq algorithms, but might be used with satisfactory results. On bounded problems “trf”, “dogbox” and “l-bfgs-b” do reasonably well, with performance varying over problems. I see one big failure of “dogbox” in “Watson9_B”; all other problems were solved relatively successfully by all 3 methods. I suspect that the performance of “l-bfgs-b” might degrade in high-dimensional problems, but for small constrained problems this method proved to be very solid, so use it! (At least until I add new lsq methods to scipy.) And I just fixed leastsqbound, so now it works OK!

Rafael Neto Henriques (Dipy)

[RNH Post #5] Progress Report (DKI simulations merged and DKI real data fitted)

I have made great progress in the last two weeks of coding! In particular, two major achievements were accomplished:

1 - By solving the couple of problems mentioned in my previous post, the DKI simulations were finally merged into Dipy's master repository.
2 - The first part of the reconstruction modules to process DKI on real brain data was finalized.

The details of these two achievements and the project's next steps are given in the sections below.

1) DKI simulations on Dipy's master repository

Just to give an idea of the work done, I am posting an example of how to use the DKI simulations that I developed.
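As a rough sketch of the kind of inputs these simulations take (the numeric values below are illustrative assumptions, not necessarily the ones used in the actual example), the four compartments, their orientations and volume fractions can be set up like this before being passed, together with a gradient table, to multi_tensor_dki:

```python
import math

# Eigenvalues of each compartment's diffusion tensor (illustrative values):
# intra- and extra-cellular media for each of the two fiber populations.
mevals = [
    [0.00099, 0.0, 0.0],          # fiber 1, intra-cellular
    [0.00226, 0.00087, 0.00087],  # fiber 1, extra-cellular
    [0.00099, 0.0, 0.0],          # fiber 2, intra-cellular
    [0.00226, 0.00087, 0.00087],  # fiber 2, extra-cellular
]

# Orientations in polar coordinates (theta, phi) in degrees: fiber 1 along
# the x-axis, fiber 2 in the x-z plane, 70 degrees away from fiber 1.
angles = [(90.0, 0.0), (90.0, 0.0), (20.0, 0.0), (20.0, 0.0)]
fractions = [20.0, 30.0, 20.0, 30.0]  # volume fractions in percent

def direction(theta_deg, phi_deg):
    """Unit vector for polar angles theta (from z) and phi (from x)."""
    t, p = math.radians(theta_deg), math.radians(phi_deg)
    return (math.sin(t) * math.cos(p), math.sin(t) * math.sin(p), math.cos(t))

# Sanity check: the two fiber populations really cross at 70 degrees.
d1, d2 = direction(*angles[0]), direction(*angles[2])
crossing = math.degrees(math.acos(sum(a * b for a, b in zip(d1, d2))))
```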
More details on the mathematical basis of these simulations can be found here.

1.1) Importing python modules and defining MRI parameters

First of all, we have to import the relevant modules (see the code lines below). The main DKI simulations function multi_tensor_dki can be imported from Dipy's simulations sub-module dipy.sims.voxel (line 19 shown below). To perform the simulations, some parameters of the MRI acquisition have to be considered. For instance, the intensity of the MRI's diffusion-weighted signal depends on the diffusion weighting used on the MRI scanner (measured as the b-value) and the directions in which the diffusion measurements are made (given by the b-vectors). This information, for example, can be obtained from Dipy's real dataset samples. Dipy's dataset 'small_64D' was acquired with only one diffusion-weighting intensity. Since DKI requires data from more than one non-zero b-value, a second b-value is artificially added. To convert the artificially produced b-values and b-vectors to the format assumed by Dipy's functions, the function gradient_table has to be called.

1.2) Defining biological parameters

Having all the scanner parameters set, the biophysical parameters of the simulations have to be defined. Simulations are based on multi-compartmental models, which allow us to take into account the heterogeneity of the brain's white matter. For example, to simulate two crossing fibers with two different media (representing the intra- and extra-cellular media), a total of four heterogeneous components are taken into account. The diffusion parameters of each compartment are defined below (the first two compartments correspond to the intra- and extra-cellular media of the first fiber population, while the others correspond to the media of the second fiber population). The orientation of each fiber is given in polar coordinates.
To simulate crossing fibers at 70 degrees, the compartments of the first fiber are aligned to the x-axis, while the compartments of the second fiber are aligned to the x-z plane with an angular deviation of 70 degrees from the first one. Finally, the volume fractions of the compartments are defined.

1.3) Using the main DKI simulation function

Having defined the parameters for all tissue compartments, the elements of the diffusion tensor (dt), the elements of the kurtosis tensor (kt) and the DW signals simulated from the DKI model (signal_dki) can be obtained using the function multi_tensor_dki. As I mentioned in my previous post, these simulations are useful for testing the performance of the DKI reconstruction codes that I am currently working on. In particular, when we apply the reconstruction modules to signal_dki, the estimated diffusion and kurtosis tensors have to match the ground-truth kt and dt produced here.

2) Progress on the development of the DKI reconstruction module

Finalizing the DKI reconstruction module is the milestone that I proposed to achieve before the mid-term evaluation. Basically, the work on this is on schedule! Since DKI is an extension of DTI, the classes of the DKI modules were defined through inheritance from the classes defined in Dipy's DTI module (a nice post on class inheritance can be found here). Having established this inheritance, the DKI modules are compatible with all the standard diffusion statistical measures previously defined in Dipy. I carried on with the development of the DKI module by implementing the estimation of the diffusion and kurtosis tensors from the DKI model.
Two strategies were implemented: the DKI ordinary linear least squares (OLS) solution, which is a simple and computationally cheap approach, and the weighted linear least squares (WLS) solution, which is considered one of the most robust estimation approaches in the recent DKI literature.

Currently, I am validating the DKI implementation using the nose testing modules. Both the OLS and WLS implementations seem to reproduce the ground-truth diffusion and kurtosis tensors when applied to the diffusion signal simulated with my DKI simulation modules. In addition, the DKI modules are also producing the expected standard diffusion parameter images when applied to real data (see Figure 1).

Figure 1. Comparison between real brain parameter maps of the diffusion fractional anisotropy (FA), mean diffusivity (MD), axial diffusivity (AD), and radial diffusivity (RD) obtained from the DKI modules (upper panels) and the DTI module (lower panels).

From the figure, we can see that the standard diffusion measures obtained from DKI are noisier than the DTI measurements. This is a well-known pitfall of DKI: since it involves fitting a larger number of parameters, DKI is more sensitive to noise than DTI. Nevertheless, diffusion measures from DKI were shown to be less sensitive to bias. Moreover, as I have been mentioning in my previous posts, DKI allows the estimation of the standard kurtosis measures.

3) Next Steps

Before the mid-term evaluation, a first version of the DKI reconstruction will be completed with the implementation of the standard kurtosis measures, such as the mean, axial and radial kurtosis, from the already estimated kurtosis tensors. Details on the usage of the DKI reconstruction modules and the meaning of the standard kurtosis measures will be summarized in my next post.

Ambar Mehrotra (ERAS Project)

GSoC 2015: 3rd Biweekly Report

I worked on several things during the past two weeks.
Open MCT

Mission Control Technologies (MCT) brings information from many sources to the user through one consistent, intuitive interface. It is software developed by NASA which helps the user compose the information he/she needs. MCT has a collection of user objects that correspond to the things users are interested in, along with the capability of displaying the same thing in different ways for different purposes. Data can be added from multiple sources, updated, modified, and represented in multiple composable views. This project is very similar to the framework I have to develop for my project, although MCT has been developed in Java, while PSF requires us to write code in Python.

• Decision to use Jython: In order to utilize this project directly, there was a decision to use Jython (Python running on the JVM), which can combine both Java and Python. I was able to import MCT as a dependency in a Java project but ran into trouble while using Jython. After spending a lot of time setting things up on Jython, I decided it would be better to develop this completely in-house using PyQt.

Working with PyQt: I spent the later part of the past two weeks designing the interface in PyQt. These are the features that I implemented:

• Add a new device source: A user can add a new device by entering its Tango server address.
• Tree view implementation: Implemented a tree view to categorize various data sources, collector devices and custom groups or branches. Some work is left in this section and I'll be focusing on it for the upcoming two weeks.
• Real-time graph of data sources: A user can click on a data source to view its real-time graph.
• Creating custom branches: A user can create custom branches. He will be presented with the list of available data sources, from which he can select the data sources he wants to add to that specific branch.

In the upcoming weeks I'll mainly be working on making the tree view more concrete and presenting more data inside it.
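GUI aside, the tree of device sources and custom branches, and a summary aggregated over a branch's children, can be modeled independently of PyQt; this is a minimal sketch with hypothetical names, not the actual project API:

```python
class Branch:
    """A tree node: either a data source (leaf, carries a reading) or a
    branch grouping sources and other branches (hypothetical sketch)."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value          # leaf data sources carry a reading
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def leaves(self):
        """All leaf nodes below (or including) this node."""
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

    def summary(self):
        """Average reading over all data sources below this branch."""
        values = [leaf.value for leaf in self.leaves() if leaf.value is not None]
        return sum(values) / len(values) if values else None

root = Branch('Sources')
habitat = root.add(Branch('Habitat sensors'))
habitat.add(Branch('sys/temp/1', value=21.5))  # hypothetical Tango addresses
habitat.add(Branch('sys/temp/2', value=22.5))
```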
Also, a major point of focus will be data aggregation and summary creation from the children of a branch.

Stefan Richthofer (Jython)

GSoC status update for 2015-06-19

I finally completed the core GC routine that explores the native PyObject reference-connectivity graph and reproduces it on the Java side. Why mirror it on the Java side? Let me summarize the reasoning here. Java performs a mark-and-sweep GC on its Java objects, but there is no way to extend this to native objects. On the other hand, using CPython's reference-counting approach for native objects is not always feasible, because there are cases where a native object must keep its Java counterpart alive (JNI provides a mechanism for this), allowing it to participate in an untraceable reference cycle. So we go the other way round here, and let the Java GC track a reproduction of the native reference-connectivity graph. Whenever we observe that it deletes a node, we can discard the underlying native object. Keeping the graph up to date is still a tricky task, which we will deal with in the second half of GSoC. The native reference graph is explored using the CPython-style traverseproc mechanism, which is also implemented by extensions that expect to use GC at all. To mirror the graph on the Java side I distinguish 8 regular cases, displayed in the following sketch. These cases deal with representing the connection between the native side and the managed side (the JVM). In the sketch you can see that native objects have a so-called Java-GC-head assigned that keeps the native object alive (non-dashed arrow), but is only weakly reachable from it (dashed arrow). The two left-most cases deal with objects that only exist natively. The non-GC case usually needs no Java-GC-head, as it cannot cause reference cycles. Only in GIL-free mode would we still track it, as a replacement for reference counting. However, GIL-free mode is currently a vague consideration and out of scope for this GSoC project.
Cases 3 and 4 from the left deal with objects for which Jython has no corresponding type; JyNI uses a generic PyCPeer object - a PyObject subclass forwarding the magic methods to the native side. PyCPeer, in both variants, also serves as a Java-GC-head. The CStub cases refer to situations where the native object needs a Java object as a backend. In these cases the Java-GC-head must keep alive not only other GC-heads, but also the Java backend. Finally, in mirror mode both the native and managed representations can be discarded independently of each other at any time, but for performance reasons we try to softly keep the counterparts alive for a while. On the Java side we can use a soft reference for this. PyList is a special case in several ways. It is a mutable type that can be modified by C macros at any time. Usually we move the java.util.List backend to the native side. For this it is replaced by JyList - a List implementation that is backed by native memory, thus allowing C macros to work on it. The following sketch illustrates how we deal with this case. It works roughly like mirror mode, but with the difference that the Jython PyList must keep its Java-GC-head alive. For the most compact solution we build the GC-head functionality into JyList. Today I finished the implementation of the regular cases, but testing and debugging still need to be done. I can hopefully round this up for the midterm evaluation and also include the PyList case.

Jakob de Maeyer (ScrapingHub)

Towards an Add-on Framework

Last time, we learned that most Scrapy extension hooks are controlled via dictionary-like settings variables. We allowed updating these settings from different places, without having to worry about order, by extending Scrapy's priority-based settings system to dictionaries. The corresponding pull request is ready for final review by now and includes complete tests and documentation.
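The priority idea can be sketched with a small pure-Python class (illustrative only; Scrapy's real settings API differs):

```python
class PrioritySettings:
    """Per key, keep the value that was set with the highest priority."""

    def __init__(self):
        self._values = {}
        self._priorities = {}

    def set(self, key, value, priority):
        # Writes at a lower priority are ignored; at equal priority the
        # later write wins, so project settings can refine defaults.
        if priority >= self._priorities.get(key, float('-inf')):
            self._values[key] = value
            self._priorities[key] = priority

    def get(self, key, default=None):
        return self._values.get(key, default)

settings = PrioritySettings()
settings.set('ITEM_PIPELINES', {'myproject.pipelines.mysql_pipe': 0}, priority=20)
settings.set('ITEM_PIPELINES', {}, priority=0)  # lower priority: ignored
```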
Now that this is (almost) out of the way, how can we “[improve] both user and developer experience by implementing a simplified interface to managing Scrapy extensions”, as I promised in my initial blog post?

The Concept of Add-ons

Often, extension developers will provide their users with small manuals that show which settings they need to modify in which way. The idea behind add-ons is to provide developers with mechanisms allowing them to apply these basic settings themselves. The user, on the other hand, no longer needs to understand Scrapy’s internal structure. Instead, she only needs to “plug in” the add-on at a single unified entry point, possibly through a single line. If necessary, she can also configure the add-on at this entry point, e.g. to supply database credentials. Let us assume that we have a simple pipeline that saves items into a MySQL database. Currently, the user has to configure her settings.py file similar to this:

# In settings.py
ITEM_PIPELINES = {
    # Possible further pipelines here
    'myproject.pipelines.mysql_pipe': 0,
}
MYSQL_DB = 'some.server'
MYSQL_USER = 'some_user'
MYSQL_PASSWORD = 'some!password'

This has several shortcomings:
• the user is required to either edit settings blindly (Why ITEM_PIPELINES?
What does the 0 mean?), or learn about Scrapy internals
• all settings are exposed into the global settings namespace, creating potential for name clashes
• the add-on developer has no way to check for dependencies and proper configuration

With the add-on system, the user experience would be closer to this:

# In scrapy.cfg
[addon:mysql_pipe]
database = some.server
user = some_user
password = some!password

Note that:
• Scrapy’s internals (ITEM_PIPELINES, 0) are hidden
• Specifying a complete Python path (myproject.pipelines.mysql_pipe) is no longer necessary
• The database credentials are no longer independent settings, but local to the add-on section

Add-ons from a Developer’s Point of View

With the add-on system, developers gain greater control over Scrapy’s configuration. All they have to do is write a (any!) Python object that implements Scrapy’s add-on interface. The interface could be provided in a Python module, a separate class, or alongside the extension class they wrote. The interface consists of two attributes and two callbacks:
• NAME: string with a human-readable add-on name
• VERSION: tuple containing the major/minor/patchlevel version of the add-on
• update_settings()
• check_configuration()

While the two attributes can be used for dependency management (e.g. “My add-on needs add-on X > 1.1.0”), the two callbacks are where developers gain control over Scrapy’s settings, freeing them from relying on their users to properly follow their configuration manuals. In update_settings(), the add-on receives its (local) configuration from scrapy.cfg and the Scrapy Settings object. It can then internally configure the extensions and expose settings into the global namespace as it sees fit. The second callback, check_configuration(), is called after Scrapy’s crawler is fully initialised, and should be used for dependency checks and post-init tests.
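A minimal sketch of what such an add-on could look like for the MySQL pipeline example above — the Settings stub and the exact callback signatures are my assumptions for illustration, not the final Scrapy API:

```python
# Settings stub and callback signatures are illustrative assumptions,
# not the final Scrapy add-on API.
class Settings(dict):
    def set(self, name, value, priority='addon'):
        self[name] = value

class MySQLPipelineAddon(object):
    NAME = 'mysql_pipe'
    VERSION = (0, 1, 0)

    def update_settings(self, config, settings):
        # translate the local [addon:mysql_pipe] section into the global
        # settings the pipeline expects, hiding ITEM_PIPELINES from the user
        settings.set('ITEM_PIPELINES', {'myproject.pipelines.mysql_pipe': 0})
        settings.set('MYSQL_DB', config['database'])
        settings.set('MYSQL_USER', config['user'])
        settings.set('MYSQL_PASSWORD', config['password'])

    def check_configuration(self, settings):
        # called after the crawler is initialised: fail early on bad config
        if not settings.get('MYSQL_DB'):
            raise ValueError('mysql_pipe: no database configured')

# roughly what a loader would do with the parsed [addon:mysql_pipe] section
settings = Settings()
addon = MySQLPipelineAddon()
addon.update_settings({'database': 'some.server',
                       'user': 'some_user',
                       'password': 'some!password'}, settings)
addon.check_configuration(settings)
```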
Current State

So far, I have redrafted an existing Scrapy Extension Proposal (SEP) with an outline of the add-on implementation. Code-wise, I have already written loaders that read add-on configuration from Scrapy’s config files, then search for and initialise the add-on objects. Where exactly the add-on objects should live is still up for debate. Currently, I plan on writing a small helper class that holds the add-on objects and provides helpers to access their attributes. This ‘holder’ would then live on the crawler, which is Scrapy’s central entry point object for all extensions and which manages the crawling process. You can follow my progress in my Add-ons pull request.

Nikolay Mayorov(SciPy)

Dogbox Algorithm

The idea for this simple algorithm is taken from this paper. We use a rectangular trust region, so the intersection of the trust region and the rectangular feasible region is again a rectangle. Thus at each iteration we need to solve the following constrained quadratic problem $\displaystyle \min_p m(p) = \frac{1}{2} p^T B p + g^T p \text{, s. t. } \tilde{l} \leq p \leq \tilde{u}$ This problem is interesting by itself, and I’m sure there is a lot of theory and many methods for solving it. But we are going to use perhaps the simplest approach, called “dogleg”. It can be applied when $B$ is positive definite. In this case we compute the Newton (Gauss-Newton for least squares) step $p^N = -B^{-1} g$ and the so-called Cauchy step, which is the unconstrained minimizer of our quadratic model along the anti-gradient: $\displaystyle p^C = -\frac{g^T g}{g^T B g} g$ And define the dogleg path $p(\tau)$ as follows: $p(\tau) = \begin{cases} \tau p^C & 0 \leq \tau \leq 1 \\ p^C + (\tau - 1) (p^N - p^C) & 1 < \tau \leq 2 \end{cases}$ First we move towards the Cauchy point and then from the Cauchy point to the Newton point. It can be proven that $m(p(\tau))$ is a decreasing function of $\tau$, which means we should follow the dogleg path as long as we stay in the trust region.
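A minimal NumPy sketch of a single dogleg step, written for a spherical trust region (the post's rectangular region changes only the boundary-intersection logic) and assuming $B$ is positive definite:

```python
import numpy as np

def dogleg_step(B, g, trust_radius):
    # Approximately solve min 0.5 p^T B p + g^T p, ||p|| <= trust_radius,
    # assuming B is positive definite (spherical region for simplicity).
    p_newton = -np.linalg.solve(B, g)        # Newton (Gauss-Newton) point
    if np.linalg.norm(p_newton) <= trust_radius:
        return p_newton                      # the full Newton step fits
    # Cauchy point: unconstrained minimizer along the anti-gradient
    p_cauchy = -(g @ g) / (g @ B @ g) * g
    pc_norm = np.linalg.norm(p_cauchy)
    if pc_norm >= trust_radius:
        # even the Cauchy point is outside: truncate the anti-gradient
        return -(trust_radius / np.linalg.norm(g)) * g
    # otherwise follow the second leg p_cauchy + t * (p_newton - p_cauchy)
    # until it crosses the boundary: solve ||p_cauchy + t d||^2 = Delta^2
    d = p_newton - p_cauchy
    a, b = d @ d, 2 * (p_cauchy @ d)
    c = pc_norm**2 - trust_radius**2
    t = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return p_cauchy + t * d
```

For the rectangular region of dogbox, the quadratic in the last branch would be replaced by the per-component intersection of the path with the box.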
For a spherical trust region there is no more than one intersection of the dogleg path and the boundary, which makes the method especially simple and clear. The last statement is not true for a rectangular trust region, but it is still not hard to find the optimum along the dogleg path within the trust region, or we can just stop at the first intersection; a slightly improved strategy is suggested in the paper I linked above. If during the iterations some variable hits the initial bounds (from $l$ and $u$) and the corresponding component of the anti-gradient points outside the feasible region, then a dogleg step cannot make any progress. At this state such variables satisfy the first-order optimality conditions and we should exclude them before taking the next dogleg step. When $B$ is only positive semidefinite we don’t have a proper Newton step; in this case we should compute a regularized Newton step by increasing the diagonal elements of $B$ by a proper amount. As far as I know there is no universal and 100% satisfactory recipe for doing this (but I mentioned a paper in the previous post with some solution). And that’s basically the whole algorithm. It seems a little hackish, but from experience it works adequately in unconstrained least-squares problems when $J$ is not rank deficient. (I haven’t implemented any tweaks for this case so far.) In bounded problems its performance varies, but so does that of Trust Region Reflective. There is a notion of “combinatorial difficulty of inequality-constrained problems” which burdens methods that try to determine which constraints are active in the optimal solution (active set methods). The dogbox algorithm does something like that, but to my shame I have no idea how it will work in this regard when the number of variables becomes large. On the other hand, Trust Region Reflective is expected to work very well in a high-dimensional setting; it was mentioned as its strongest point by the authors.
So far I have been focusing on the general logic and workflow of each algorithm and have tested them on small problems; I will publish the results in the next post.

Vito Gentile(ERAS Project)

Enhancement of Kinect integration in V-ERAS: Second report

This is my second report about what I have done for my GSoC project. If you don’t know what it is about and want to find more information, please refer to this page and this blog post. The first problem I had to solve was to implement a valid height estimation algorithm. You can find the code of my implementation at this link, while for the algorithm itself, I have also recently discussed it in an answer on StackOverflow. After the height estimation (which we decided to implement as a Tango command), the next step was to update the Tango classes that I had added in the previous commits, in order to use the new Tango API. You can find more about this topic in this very useful documentation page on the “High level server API”. This update allowed me to reduce the number of lines of code, and it is also much simpler to implement commands or events in the Tango server now. However I had some issues with data types and with starting the server (which kept me stuck for a bit). Thankfully I finally fixed them in my last commit, yesterday evening. I have also worked on the documentation. In particular, I have updated the Software Architecture Document (SAD) for the Body Tracker application, by adding the CLI section and updating the GUI part with some new features introduced together with the Python-based interface. I have also removed a redundant document, named “Execution of body tracker”, that described how to execute the old tracker (which was written in C# and is still available in the repository, but is essentially deprecated). For more information about my project and the other ones supported by the Italian Mars Society and the Python Software Foundation, refer to the GSoC2015 page of the ERAS website.
Julio Ernesto Villalon Reina(Dipy)

OHBM 2015 Hackathon

Hi all, It has been a busy week. The Organization for Human Brain Mapping (OHBM) conference in Hawaii just finished today (http://ohbm.loni.usc.edu/). I had the chance to meet with my mentors in person and to get help from them directly. We participated in the Hackathon that took place two days before the conference (http://ohbm.loni.usc.edu/hackathon-2015/). We had the chance to work on the code and set up goals for the midterm. I also had the opportunity to talk about my GSoC project to other Hackathon participants and conference attendees. They all shared ideas with me and gave me good advice. I will be flying back home this weekend and will write another post with a detailed description of what we worked on this week plus some preliminary results. This is a photo with my mentors and other contributors to the DIPY project (Diffusion Imaging in Python). Mahalo!

Abraham de Jesus Escalante Avalos(SciPy)

Scipy and the first few GSoC weeks

Hi all, We're about three (and a half) weeks into the GSoC and it's been one crazy ride so far. This being my first experience working on OpenSource projects, and not being much of an expert in statistics, it was challenging at first, but I think I might be getting the hang of it now. First off, for those of you still wondering what I'm actually doing, here is an abridged version of the abstract from my proposal to the GSoC (or you can click here for the full proposal): "scipy.stats is one of the largest and most heavily used modules in Scipy. [...] it must be ensured that the quality of this module is up to par and [..] there are still some milestones to be reached. [...] Milestones include a number of enhancements and [...] maintenance issues; most of the scope is already outlined and described by the community in the form of open issues or proposed enhancements."
So basically, the bulk of my project consists of working on open issues from the StatisticsCleanup milestone within the statistics module of SciPy (a Python-based OpenSource library for scientific computing). I suppose this is an unusual approach for a GSoC project since it focuses on maintaining and streamlining an already stable module (in preparation for the release of SciPy 1.0), rather than adding a new module or a specific function within one. The unusual approach allows me to make several small contributions and it gives me a wide (although not as deep) scope, rather than a narrow one. This is precisely why I chose it. I feel like I can benefit (and contribute) a lot more this way, while I get acquainted with the OpenSource way, and it also helps me to find new personal interests (win-win). However, there are also some nuances that may be uncommon. During the first few weeks I have discovered that my proposal did not account for the normal life-cycle of issues and PRs in scipy; my estimates were too optimistic. One of OpenSource's greatest strengths is the community getting involved in peer reviews; this allows a developer to "in the face of ambiguity, refuse the temptation to guess". If you didn't get that, [spoiler alert] it was a reference to the Zen of Python (and if you're still reading this and your name is Hélène, I love you). The problem with this is that even smooth PRs can take much longer than one week to be merged because of the back and forth of community feedback and code updates (if it's a controversial topic, discussions can take months). Originally, I had planned to work on four or five open issues a week, have the PRs merged and then continue with the next four or five issues the following week, but this was too naive, so I have had to make some changes.
I spent the last week compiling a list of next steps for pretty much all of the open issues and I am now trying to work on as many as I can at a time, thus minimising the impact of waiting periods between feedback cycles for each PR. I can already feel the snowball effect it is having on the project and on my motivation. I am learning a lot more (and in less time) than before, which was the whole idea behind doing the Summer of Code. I will get back in touch soon. I feel like I have rambled on for too long, so I will stop and let you continue to be awesome and get on with your day. Cheers, Abraham.

Nikolay Mayorov(SciPy)

Trust Region Reflective Algorithm

The most relevant description of this algorithm can be found in the paper “A subspace, interior and conjugate gradient method for large-scale bound-constrained minimization problems” by Coleman and Li; some insights on its implementation can be found in the MATLAB documentation here and here. The difficulty was that the algorithm incorporates several ideas, but it was not very clear how to combine them all together in the actual code. I will describe each idea separately and then outline the algorithm in general. I will consider the algorithm applied to a problem with known Hessian; in least squares we replace it by $J^T J$. I won’t give any explanation or motivation for some things; if you are really interested, try digging into the original papers.

Interior Trust-Region Approach and Scaling Matrix

The minimization problem is stated as follows: $\min f(x), \: x \in \mathcal{F} = \{x: l \leq x \leq u\}$ Some of the components of $l$ and $u$ can be infinite, meaning there is no bound in that direction. Let’s use the notation $g(x) = \nabla f(x)$ and $H(x) = \nabla^2 f(x)$.
The first-order necessary conditions for $x_*$ to be a local minimum: $g(x_*)_i = 0 \text{ if } l_i < x_i < u_i$ $g(x_*)_i \leq 0 \text{ if } x_i = u_i$ $g(x_*)_i \geq 0 \text{ if } x_i = l_i$ Define a vector $v(x)$ with the following components: $v(x)_i = \begin{cases} u_i - x_i & g_i < 0 \text{ and } u_i < \infty \\ x_i - l_i & g_i > 0 \text{ and } l_i > -\infty \\ 1 & \text{otherwise} \end{cases}$ Its components are the distances to the bounds at which the anti-gradient points (if this distance is finite). Define a matrix $D(x) = \mathrm{diag}(v(x)^{1/2})$; the first-order optimality can then be stated as $D(x_*)^2 g(x_*) = 0$. Now we can think of our optimization problem as the diagonal system of nonlinear equations (I would say this is the main idea of this part): $D^2(x) g(x) = 0$. The Jacobian of the left-hand side exists whenever $v(x)_i \neq 0$ for all $i$, which is true when $x \in \mathrm{int}(\mathcal{F})$ (not on the bounds). Assume that this holds; then the Newton step for this system satisfies: $(D^2 H + \mathrm{diag}(g) J^v) p = - D^2 g$ Here $J^v$ is the diagonal Jacobian matrix of $v(x)$, whose elements take values $\pm 1$ or $0$; note that all elements of the matrix $C =\mathrm{diag}(g) J^v$ are non-negative. Now introduce the change of variables $x = D \hat{x}$. In the new variables the Newton step satisfies: $\hat{B} \hat{p} = -\hat{g}$ where $\hat{B} = D H D + C$, $\hat{g} = D g$ (note that $\hat{g}$ is the proper gradient of $f$ with respect to the “hat” variables). Looking at this Newton step we formulate the corresponding trust-region problem: $\displaystyle \min_{\hat{p}} \: \hat{m}(\hat{p}) = \frac{1}{2} \hat{p}^T \hat{B} \hat{p} + \hat{g}^T \hat{p}, \text{ s. t. } \lVert \hat{p} \rVert \leq \Delta$. In the original space we have: $B = H + D^{-1} C D^{-1}$, and the equivalent trust-region problem $\displaystyle \min_{p} \: m(p) = \frac{1}{2} p^T B p + g^T p, \text{ s. t. } \lVert D^{-1} p \rVert \leq \Delta$.
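The scaling vector $v(x)$ and matrix $D(x)$ defined above are straightforward to compute; a small NumPy sketch (the helper name is mine):

```python
import numpy as np

def scaling_vector(x, g, l, u):
    # v(x)_i from the definition above: the distance to the bound the
    # anti-gradient points at, or 1 if there is no such finite bound
    v = np.ones_like(x)
    m = (g < 0) & np.isfinite(u)
    v[m] = u[m] - x[m]
    m = (g > 0) & np.isfinite(l)
    v[m] = x[m] - l[m]
    return v

# D(x) = diag(v(x)^(1/2)); D^2 g -> 0 expresses first-order optimality
x = np.array([0.5, 0.5, 0.5])
g = np.array([-1.0, 1.0, 1.0])
l = np.array([0.0, 0.0, -np.inf])
u = np.array([1.0, np.inf, np.inf])
v = scaling_vector(x, g, l, u)
D = np.diag(np.sqrt(v))
```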
From my experience the better approach is to solve the trust-region problem in the “hat” space, so we don’t need to compute $D^{-1}$, which can become arbitrarily large when the optimum is on the boundary and the algorithm approaches it. A modified improvement ratio of our trust-region solution is computed as follows: $\displaystyle \rho = \frac{f(x + p) - f(x) + \frac{1}{2}\hat{p}^T C \hat{p} } {\hat{m}(\hat{p})}$ Based on $\rho$ we adjust the radius of the trust region using some reasonable strategy. Now a summary and conclusion for this section. Motivated by the first-order optimality condition we introduced a matrix $D$ and reformulated our problem as a system of nonlinear equations. Then, motivated by the Newton process for this system, we formulated the corresponding trust-region problem. The purpose of the matrix $D$ is to prevent steps directly into bounds, so that other variables can also be explored during the step. It absolutely doesn’t mean that after introducing such a matrix we can ignore the bounds; specifically, our estimates $x_k$ must remain strictly feasible. The full algorithm will be described below.

Reflective Transformation

This idea comes from another paper, “On the convergence of reflective Newton methods for large-scale nonlinear minimization subject to bounds”, by the same authors. Conceptually we apply a special transformation $x = R(y)$, such that $y$ is an unbounded variable, and try to solve the unconstrained problem $\min_y f(R(y))$. The authors suggest a reflective transformation: a piecewise linear function, equal to the identity when $y$ satisfies the initial bound constraints, otherwise reflected from the bounds like a beam of light (I hope you got the idea).
I implemented it as follows (although I don’t use this code anywhere):

import numpy as np

def reflective_transformation(y, l, u):
    if l is None:
        l = np.full_like(y, -np.inf)
    if u is None:
        u = np.full_like(y, np.inf)
    l_fin = np.isfinite(l)
    u_fin = np.isfinite(u)
    x = y.copy()
    m = l_fin & ~u_fin
    x[m] = np.maximum(y[m], 2 * l[m] - y[m])
    m = ~l_fin & u_fin
    x[m] = np.minimum(y[m], 2 * u[m] - y[m])
    m = l_fin & u_fin
    d = u - l
    t = np.remainder(y[m] - l[m], 2 * d[m])
    x[m] = l[m] + np.minimum(t, 2 * d[m] - t)
    return x

This transformation is simple and doesn’t significantly increase the complexity of the function to minimize. But it is not differentiable when $x$ is on the bounds, thus we again use strictly feasible iterates. The general idea of the reflective Newton method is to do a line search along the reflective path (equivalently, a traditional straight line in $y$ space). According to the authors this method has cool properties, but it is used very modestly in the final large-scale Trust Region Reflective.

Large Scale Trust-Region Problem

In the previous post I conceptually described how to accurately solve trust-region subproblems arising in least-squares minimization. Here I again focus on the least-squares setting and briefly describe how the subproblem can be solved approximately in the large-scale case.

1. Steihaug Conjugate Gradient. Apply the conjugate gradient method to the normal equation until the current approximate solution falls outside the trust region (or a direction of non-positive curvature is found, which can happen if $J$ is rank deficient). This actually might be just the best approach for least squares, as we don’t have negative curvature directions in $J^T J$, and the only criticism of Steihaug-CG I have read is that it can terminate before finding the negative curvature direction. I would assume that this is not very important in the positive semidefinite case.

2. Two-dimensional subspace minimization. We form a basis consisting of two “good” vectors, then solve the two-dimensional trust-region problem with the exact method.
The first vector is the gradient, the second is an approximate solution of the linear least-squares problem with the current $J$ (computed by LSQR or LSMR). When the Jacobian is rank deficient the situation is somewhat problematic; as I noticed, in this case a least-norm solution is useless for approximating a trust-region solution. In this case we need to add a (not too big) diagonal regularization term to $J^T J$. A recipe for this situation is given in “Approximate solution of the trust region problem by minimization over two-dimensional subspaces”.

Outline of Trust Region Reflective

Here is the high-level description.
1. Consider the trust-region problem in the “hat” space as described in the first section.
2. Find its solution by whatever method is appropriate (exact for small problems, approximate for large scale). Compute the corresponding solution in the original space $p = D \hat{p}$.
3. Restrict this trust-region step to lie within the bounds if necessary. Step back from the bounds by $\theta = \min(0.05, \lVert D^2 g \rVert)$ times the step length. Do this for all types of steps below.
4. Consider a single reflection of the trust-region step if a bound was encountered in step 3. Use 1-d minimization of the quadratic model to find the minimum along the reflected direction (this is trivial).
5. Find the minimum of the quadratic model along $\hat{g}$. (Rarely, it can be better than the trust-region step because of the bounds.)
6. Choose the best step among steps 3, 4 and 5. Compute the corresponding step in the original space as in step 2, and update $x$.
7. Update the trust-region radius by computing $\rho$ as described in the first section.
8. Check for convergence and go to step 1 if the algorithm has not converged.

In the next two posts I will describe another type of algorithm, which we call “dogbox”, and provide comparison benchmark results.

Yue Liu(pwntools)

GSOC2015 Students coding Week 04 week sync 08

Last week:
• Advanced features supported, see issue #27.
• ARM ROP chain supported, see the example armpwn exploit
• Simplify the gadgets when the binary is large enough.
• Drop gadgets whose branch count >= 2, except call reg; xxx; ret / blx reg; xxx; pop{.*pc} / int 0x80; xxx; ret; / svc; xxx; ret
• All in pull request #24

Next week:

Aron Barreira Bordin(Kivy)

Progress Report 1

<p>Hi!</p> <p>This week I developed some extra features for <strong>Kivy Designer</strong> that were not listed in my proposal. In the first weeks I made good progress on my proposal, so now I have some time to add some important features to the project.</p> <p>Check out the video with some of these features working:</p> <div align="center"> <iframe width="560" height="315" src="https://www.youtube.com/embed/wMdBLvUT0wc" frameborder="0" allowfullscreen></iframe> </div> <h3>Better Python Code Input - Jedi</h3> <p>Something completely <strong>essential to any IDE is autocompletion</strong>. For this extra feature, I added some improvements to the Python Code Input. Now it&#39;s possible to <strong>change the theme</strong>, it shows the <strong>line number</strong> on the left, and, most importantly: <strong>Jedi integration</strong>.</p> <h3>Jedi - an awesome autocompletion/static analysis library for Python</h3> <p>Jedi is a static analysis tool for Python that can be used in IDEs/editors. Its historic focus is autocompletion, but it does static analysis as well. Jedi is fast and very well tested. It understands Python on a deeper level than all other static analysis frameworks for Python.</p> <h3>Next week</h3> <h4>Kivy Console</h4> <p>The current version of Kivy Console (a terminal emulator in Kivy Designer) has some bugs: it&#39;s not compatible with Python 3 and it gets slower with long processes.
Now I&#39;m evaluating what is best: fixing some parts, or rewriting this widget.</p> <h4>PRs</h4> <p>I have 3 PRs waiting for review and some branches waiting for r+ on these PRs.</p> <p>That&#39;s it, thanks for reading :)</p> <p>Aron Bordin.</p>

June 18, 2015

Siddhant Shrivastava(ERAS Project)

When two Distributed Systems meet!

Hi! This post is meant to be an insight into the experience and progress of the third and fourth weeks of my (a)vocation with the Google Summer of Code program. Things got much faster and smoother in the past two weeks. I've been able to get a stable codebase up and running with respect to the aims discussed in the timeline. I had to totally restructure my programming workspace for the second time to support intelligent IDE-like features, since the Python packages I am working with (ROS and Tango) have a fair number of modules whose documentation I need to read on the fly while coding away. Thus I set up both my Vim and Sublime Text environments to support intelli-sense, code completion, block syntax completion, etc. I also added a dual-monitor setup with the unused LCD television at my home to make for an efficient programming ambience.

Telerobotics Code Pushed

As I mentioned in my first post, the contributors of the Italian Mars Society are given write access to the online Bitbucket repository. This is a tremendous responsibility: I have to ensure that my updates don't disturb the stability of the project. To work with this, I follow the simple and effective advice of my mentors -

hg pull
hg update
hg add .
hg commit -m "My awesome Commit Message"
hg push

This simple workflow ensures that all students can work at their own pace without breaking the system. This simple tutorial can help the uninitiated understand what I just said. So while working with Tango servers for my project, I had to constantly use the bundled GUI - Jive - which works as a one-stop solution for Device Servers.
But my primordial hacker instincts prompted me to write a CLI solution to add and remove device servers using the amazing PyTango API. Thanks to Ezio's excellent comments on my commits, I've been able to contribute a Pythonic solution for working with Device Servers in a jiffy. The script can be found here. It has a nice UI to help the user figure out what he/she needs to enter. I have yet to correct some formatting errors to make it more consistent with PEP8 and the EAFP idiom. The current stage of argument validation is more like LBYL (Look Before You Leap), which is slow for the script's use-case. The second module I pushed is the Husky test script, which checks whether the Husky installation works on a particular setup by making the Husky move with a given linear and angular velocity. The Software Architecture Document was also updated to account for the new changes in the ROS-Tango interface architecture. A better understanding of the SAD can be had from an earlier post.

Docker

I explained the Docker setup and distribution in a quick mini-post. I tested that the X-errors don't interfere with the scripts that I have been developing, since ROS topics can be accessed from the command line as well. This is a good thing. The Docker repository for my workspace can be found here.

Python Reading

I have been voraciously consulting the following sources for getting the knack of Python and PyTango programming - The happiest point of all this reading kicked in when I could help Vito to reduce fifty lines of code to just two with the use of the exec construct in Python.
In case you're wondering, this is the code written by Vito -

joints = [
    'skeleton_head', 'skeleton_neck',
    'skeleton_left_shoulder', 'skeleton_right_shoulder',
    'skeleton_left_elbow', 'skeleton_right_elbow',
    'skeleton_left_hand', 'skeleton_right_hand',
    'skeleton_torso',
    'skeleton_left_hip', 'skeleton_right_hip',
    'skeleton_left_knee', 'skeleton_right_knee',
    'skeleton_left_foot', 'skeleton_right_foot'
]
attr_init_params = dict(
    dtype=('float32',),
    unit='m',
    max_dim_x=3,
    polling_period=POLLING
)
for joint in joints:
    exec "%s = attribute(**attr_init_params)" % joint

Note that without the exec usage, an attribute line would have to be written manually for each of the joints in the joints list.

Ongoing Stuff

There are certain deliverables in the pipeline currently waiting to be pushed to the online repository over the course of the next week. I have been working on -
• ROS-feedback Aggregator Device Server for Tango
• ROS Commander Node for the Husky
• Tango Client to understand Husky status (battery levels, sensor monitor, etc.)
• Mathematical Transformations and Named Tuples for different structures that Telerobotics requires.

GSoC with PSF and the Italian Mars Society is turning out to be fun-and-challenging. Mid-term evaluations start in a week. Lots of work to do. I strongly hope my next post will be a celebratory one highlighting the pushed code I described in Ongoing Stuff. Until then, Ciao!

Richard Plangger(PyPy)

It is ... alive!!!

I have been quite busy the last weeks improving my solution. Most of the time I have dedicated to the accumulation of values. But first I have to tell you about the ...

Breakthrough

I have measured speedup on my sample interpreter already, but not in the NumPy library. I have now tested and hardened the edge cases and it is possible to measure speedup using the NumPy library.
Micro benchmark

a = np.arange(1000.0)
b = np.arange(1000.0)
for i in range(10000):
    a = a + b

Invoking this program one can measure a speedup of ~1.33 in program execution. Well, that is not quite the theoretical maximum of 2.00 (SSE4). I then spent time analyzing the behavior using several profiling utilities. The included Python profiler did not do the job, because it is unaware of the underlying JIT. Thus I used the brand new vmprof and gprof. Sidenote: I used gprof only to verify, but if a statistical profiler is enough for your Python program, go for vmprof! The overhead is minimal and it is possible to get live profiling feedback of your application! In combination with the jitviewer you can find out where your time is spent. It helped me a lot: the above loop spends about half of its time copying memory. So if the loop body is exchanged with ufunc.add(a, b, out=a), the speedup increases up to 1.70-1.80. That is better, but where is the rest of the time spent? Sadly, the profiling says it is in the loop around the NumPy call. One of my mentors has suggested that there might be possibilities to improve the register allocation, and I'm currently evaluating a way to exchange and add some heuristics to improve the allocator. The loop itself is a magnitude faster than the scalar loop. So I'm quite happy that my idea really worked out.

Accumulation

That is another big thing that I have been working on. I did not suggest this improvement in my GSoC proposal, but still I want to include it. Frequently used functions in scientific computing are sum, prod, any, all, max, min, ... Some of them consider the whole array, some of them bail out if an element has been found. There is potential to use SIMD instructions for these operations. Let's consider sum(...). Addition is commutative: x + y = y + x for all x, y in R. Thus I have added a temporary vector register for summation, the accumulator.
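The accumulator idea can be emulated in plain NumPy to see the arithmetic: partial sums are kept in a small "vector register" and only combined with a horizontal add at the end (the function below is my illustration, not the actual JIT code):

```python
import numpy as np

def accumulator_sum(a, width=2):
    # keep `width` partial sums (one per SIMD lane); only at the end
    # (the "guard exit") are the lanes combined with a horizontal add
    n = len(a) - len(a) % width
    acc = np.zeros(width)
    for i in range(0, n, width):
        acc += a[i:i + width]          # one vector add per iteration
    return acc.sum() + a[n:].sum()     # horizontal add + scalar tail
```

With width 2 and float64 this mirrors the factor-2 theoretical speedup on SSE4 mentioned below: each loop iteration performs one vector add instead of two scalar adds.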
Instead of resolving the dependency using a horizontal add (supported by x86 SSE4), the loop partially sums the array. At every guard exit the accumulator is then horizontally added. Again the theoretical speedup is a factor of 2 when using float64 on SSE4. I have not yet managed to compile a version that fully works for sum, but I'm quite close to it. Other functions like all or any are more complex. It is not so easy to recognize the reduction pattern if more than one operation is involved. I will add a pattern matcher for those instructions. Let's have a look at the following example (for all):

d = int_and(a,b)
guard_true(d)
e = int_and(c,d)
guard_true(e)

And output the following vector statements (excluding guard compensation code):

v = vec_int_and([a,c], accum)
guard_true(v)

I did not expect...

I have evaluated the possibility of vectorizing arbitrary PyPy traces using the array module. This does not work for PyPy traces yet; it works in my test toy language (located here). Let's take a look at the following user program:

while i < 100:
    a[i] = b[i] + c[i] * 3.14
    i += 1

a, b and c are array objects of the Python array module. Their elements are homogeneous and adjacent in memory. The resulting trace could be transformed into a vectorized form. Two current limitations make it impossible to vectorize the user program: 1) Python checks array boundaries (also negative) for each load/store operation. This adds an additional guard to the trace. 2) The boxing of integer values. The index variable will be recreated at the end of the loop and incremented. This adds several non-pure operations to the trace, including the memory allocation of an integer box every iteration. I do not yet know how I will get around these problems, but for the second limitation I'm quite sure that the creation of the integer box can be moved to the guard exit.

Nikolay Mayorov(SciPy)

Basic Algorithms for Nonlinear Least Squares

Hi!
The last two weeks of GSoC I spent developing algorithm drafts, testing and tuning them. It is an addicting process and I have perhaps already spent too much time on it. The results are pretty good, but now I seriously need to stop and start writing production-quality code. The first thing we decided I should do is to provide a plain-English description of the methods I studied and implemented. To make it more understandable I decided to make this post an introduction to nonlinear least-squares optimization. So here I start.

Nonlinear least-squares optimization problem

The objective function we seek to minimize has the form $f(x) = \dfrac{1}{2} \sum\limits_{i=1}^m r^2_i(x) = \frac{1}{2} \lVert r(x) \rVert^2$, where we introduced the residual vector $r(x)$; we want to minimize the square of its Euclidean norm. Each component $r_i(x)$ is a smooth function from $\mathbb{R}^n$ to $\mathbb{R}$. We treat $r(x)$ as a vector-valued function whose Jacobian matrix $J(x)$ is defined as follows: $J_{ij}(x) = \dfrac{\partial r_i(x)}{\partial x_j}.$ In other words, the $i$-th row of $J$ contains the transposed gradient $\nabla r_i(x)^T$. Now it is easy to verify that the gradient and Hessian (the matrix containing all second partial derivatives) of the objective function have the form:

$\nabla f(x) = J^T(x) r(x)$

$\displaystyle \nabla^2 f(x) = J^T(x) J(x) + \sum_{j=1}^m r_j(x) \nabla^2 r_j(x)$

Notice that the second term in the Hessian will be small if a) the residuals $r_j(x)$ are small near the optimum, or b) the residuals depend on $x$ approximately linearly (possibly only near the optimum). Both conditions are often satisfied in practice, which leads to the main idea of least-squares minimization: use an approximation of the Hessian of the form $\nabla^2 f(x) \approx J^T(x) J(x)$. So the distinctive feature of least-squares optimization is the availability of a good Hessian approximation using only first-order derivative information.
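As a quick sanity check of the gradient formula above, here is a small NumPy snippet (my own illustration, not from the post) that compares $J^T r$ against a finite-difference gradient of $f(x) = \frac{1}{2}\lVert r(x)\rVert^2$ for a toy residual function:

```python
import numpy as np

def r(x):
    # toy residual vector: m = 3 residuals, n = 2 parameters
    return np.array([x[0] - 1.0, x[1] - 2.0, x[0] * x[1]])

def jac(x):
    # analytic Jacobian, J_ij = dr_i / dx_j
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [x[1], x[0]]])

def f(x):
    return 0.5 * np.dot(r(x), r(x))

x = np.array([0.3, -1.7])
g_analytic = jac(x).T @ r(x)          # gradient = J^T r

# central finite differences for comparison
h = 1e-6
g_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                 for e in np.eye(2)])
assert np.allclose(g_analytic, g_fd, atol=1e-5)
```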
Note that we can then apply any optimization method which employs Hessian information, with a good chance of satisfactory results. Most local optimization algorithms are iterative: starting from an initial guess $x_0$ they generate a sequence $\{x_k\}$ which should converge to a local minimum. I will denote steps taken by an algorithm by the letter $p$, such that $x_{k+1} = x_k + p_k$. I will also denote quantities at a point $x_k$ with the index $k$, for example $J_k \equiv J(x_k)$ and so on.

Gauss-Newton method

This is just an adaptation of Newton's method where, instead of computing the Newton step at each iteration from $\nabla^2 f(x_k) p_k^N = -\nabla f(x_k)$, we compute the Gauss-Newton step using the aforementioned Hessian approximation: $J^T(x_k) J(x_k) p_k^{GN} = -J^T(x_k) r(x_k)$. To make the algorithm globally convergent we invoke a line search along the computed direction to satisfy a "sufficient decrease condition"; see the chapter on line search in "Numerical Optimization" by Nocedal and Wright. Note that the equation for $p_k^{GN}$ is the normal equation of the following linear least-squares problem: $\dfrac{1}{2} \lVert J_k p + r_k \rVert^2 \rightarrow \min\limits_p$. It means that $p_k^{GN}$ can be found by whatever linear least-squares method you think is appropriate:

1. Through the normal equation with Cholesky factorization. Pros: effectiveness when $m \gg n$; $J^T J$ and $J^T r$ can be computed using $O(n^2 + m)$ memory; solving the equation is fast. Cons: potentially bad accuracy, since the condition number of $J^T J$ is the square of that of $J$, and Cholesky factorization can incorrectly fail when $J$ is nearly rank deficient.

2. Through QR factorization with pivoting of $J$. Pros: more reliable for ill-conditioned $J$. Cons: slower than the previous; the rank-deficient case is still problematic.

3. Through singular value decomposition (SVD) of $J$.
Pros: the most robust approach; gives full information about the sensitivity of the solution to perturbations of the input; allows finding the least-norm solution in the rank-deficient case; allows zeroing of very small singular values (thus avoiding excessive sensitivity). Cons: slower than the previous two.

4. Conjugate gradient and similar methods (LSQR, LSMR), which only require the ability to compute $J u$ and $J^T v$ for arbitrary vectors $u$ and $v$. Used in the sparse, large-scale setting.

The third approach is used in numpy.linalg.lstsq. At the moment I don't have a good idea of how much slower the SVD approach is than QR.

If the singular values of $J$ are uniformly bounded away from zero in the region of interest, then the Gauss-Newton method is globally convergent, and the convergence rate is no worse than linear and approaches quadratic as the second term in the true Hessian becomes insignificant compared to $J^T J$. But note that the convergence rate of an infinite sequence is measured after some $k$, which can be arbitrarily large. It means that we may observe such a rate of convergence only within some (perhaps small) neighborhood of the optimal point.

Levenberg-Marquardt method

The original idea of Levenberg was to use a "regularized" Gauss-Newton step, which satisfies the following equation: $(J_k^T J_k + \alpha_k I) p_k^{LM} = -J_k^T r_k$, where $\alpha_k \ge 0$ is adjusted from iteration to iteration depending on some measure of success of the previous step. A line search is not used in this algorithm. The rough explanation is as follows: when $\alpha_k$ is small we are taking full Gauss-Newton steps, which are known to be good near the optimum; when $\alpha_k$ is big we are taking steps along the anti-gradient, thus assuring global convergence. If I'm not mistaken, Marquardt's contribution was the suggestion to use a more general term $\alpha_k D_k$ instead of $\alpha_k I$, with $D_k = \mathrm{diag}(J_k^T J_k)$.
This is a question of variable scaling, and for simplicity I will ignore it here. Algorithms directly adjusting $\alpha_k$ are considered obsolete, but nevertheless such an algorithm is used in MATLAB, for instance, and it works well. The more recent view is to consider Levenberg-Marquardt as a trust-region type algorithm. In a trust-region approach we obtain the step by solving the following constrained quadratic subproblem:

$\min\limits_{p} \: m_k(p) = \frac{1}{2} p^T B_k p + g_k^T p, \: \text{s. t. } \lVert p \rVert \le \Delta_k$,

where $B_k$ is the approximation to the Hessian at the current point, $g_k$ is the gradient, and $\Delta_k$ is the radius of the trust region. The radius $\Delta$ is adjusted by observing the ratio of actual to predicted change in the objective function (as a measure of the adequacy of the quadratic model within the trust region):

$\rho_k = \dfrac{f(x_k + p_k) - f(x_k)}{m_k(p_k)}$

The update rules for $\Delta$ are approximately as follows: if $\rho_k < 0.25$ then $\Delta_{k+1} = 0.25 \Delta_k$; if $\rho_k > 0.75$ and $\lVert p_k \rVert = \Delta_k$ then $\Delta_{k+1} = 2 \Delta_k$; otherwise $\Delta_{k+1} = \Delta_k$. If $\rho_k$ is negative (no actual decrease), then the computed step is not taken and is recomputed with $\Delta_{k+1}$ from the current point again (the threshold for this might be higher; for example, 0.25 is stated in "Numerical Optimization"). In least-squares problems we have $B_k = J_k^T J_k$, $g_k = J_k^T r_k$.

The connection between the original Levenberg method and a trust-region method is given by the following theorem: the solution $p^*$ of a trust-region problem satisfies the equation $(B + \alpha I) p^* = -g$ for some $\alpha \ge 0$ such that $B + \alpha I$ is positive semidefinite and $\alpha (\lVert p^* \rVert - \Delta) = 0$. The last condition means that either $\alpha = 0$ or the optimal solution lies on the boundary.
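The correspondence behind this characterization can be checked numerically on a tiny example (my own illustration; toy matrices, not from the post): the norm of the regularized step $p(\alpha) = -(B + \alpha I)^{-1} g$ decreases monotonically in $\alpha$, so each radius $\Delta$ corresponds to exactly one $\alpha$.

```python
import numpy as np

# toy positive definite "Hessian" and gradient
B = np.array([[4.0, 1.0],
              [1.0, 3.0]])
g = np.array([1.0, 2.0])

def step_norm(alpha):
    # ||p(alpha)|| = ||(B + alpha*I)^{-1} g||
    p = np.linalg.solve(B + alpha * np.eye(2), -g)
    return np.linalg.norm(p)

# larger alpha => shorter (more conservative) step
norms = [step_norm(a) for a in (0.0, 0.5, 1.0, 5.0, 50.0)]
assert all(n1 > n2 for n1, n2 in zip(norms, norms[1:]))
```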
This theorem tells us that there is a one-to-one correspondence between $\Delta$ and $\alpha$, and it suggests the following conceptual algorithm for solving the trust-region problem:

1. If $B$ is positive definite, compute the Newton step $p = -B^{-1} g$; if it is within the trust region, we have found our solution.
2. Otherwise find $\alpha^* > 0$ s. t. $\lVert p(\alpha^*) \rVert = \Delta$ using some root-finding algorithm, then compute $p^*$ using $\alpha^*$.

Step 2 is not particularly easy; there is also an additional difficult case when $B + \alpha^* I$ is only positive semidefinite and we can't compute $p^*$ just from the equation. For a proper discussion of the problem refer to "Numerical Optimization", Chapter 4. A suitable and detailed algorithm for solving the trust-region subproblem arising in least squares is given in the paper "The Levenberg-Marquardt Algorithm: Implementation and Theory" by J. J. Moré (implemented in MINPACK, which scipy.optimize.leastsq wraps). The author analyzes the problem in terms of the SVD, but then suggests an implementation using QR decomposition and Givens rotations (the approach generally chosen in MINPACK). I decided to stick with the SVD; its only potential disadvantage is speed (and even that is questionable), while in all other aspects this approach is great, including simplicity of implementation.

Let's introduce the function we want to find a zero of (see step 2 of the trust-region algorithm above):

$\phi(\alpha) = \lVert p(\alpha) \rVert - \Delta = \lVert (J^T J + \alpha I)^{-1} J^T r \rVert - \Delta$

If we have the SVD $J = U \Sigma V^T$ and $z = U^T r$, then

$\displaystyle\phi(\alpha) = \left( \sum_{i=1}^n \frac{\sigma_i^2 z_i^2}{(\sigma_i^2 + \alpha)^2} \right)^{1/2} - \Delta$

We have an explicit function of $\alpha$ and can easily compute its derivative too. In numpy/scipy this function can be implemented as follows and will work correctly even when $m < n$.
    import numpy as np
    from scipy.linalg import svd

    def phi_and_derivative(J, f, alpha, Delta):
        # f is the residual vector r(x)
        U, s, VT = svd(J, full_matrices=False)
        suf = s * U.T.dot(f)
        denom = s**2 + alpha
        p_norm = np.linalg.norm(suf / denom)
        phi = p_norm - Delta
        phi_prime = -np.sum(suf**2 / denom**3) / p_norm
        return phi, phi_prime

Then an iterative Newton-like zero-finding method is run (with some safeguarding described in the paper) until $|\phi(\alpha)| \leq 0.1 \Delta$; it usually converges in 1-3 iterations. Of course, in a real implementation the SVD should be precomputed only once (this is not true for the QR approach described in the paper); also note that we need only the "thin SVD" (full_matrices=False). So with almost the same amount of work as numpy.linalg.lstsq we accurately solve a trust-region least-squares subproblem.

This (near) exact procedure of solving a trust-region problem is important, but in a large-scale setting one of the approximate methods must be chosen. I will outline them in future posts. Perhaps the most efficient and accurate one is to solve the problem in a subspace spanned by 2 properly selected vectors; in this case we apply the exact procedure to an $m \times 2$ matrix.

Summary

Gauss-Newton and Levenberg-Marquardt methods share the same convergence properties, but Levenberg-Marquardt can handle rank-deficient Jacobians and, as far as I understand, generally works better in practice (although I don't have experience with Gauss-Newton). So LM is usually the method of choice for unconstrained least-squares optimization. In the next posts I will describe the algorithm I was studying and implementing for bound-constrained least squares. Its unofficial title is "Trust Region Reflective".

Raghav R V(Scikit-learn)

GSoC 2015 - PSF / scikit-learn - Nested Cross Validation

Nested cross validation is simply cross validation done for hyper-parameter tuning as well as for evaluation of the tuned model(s).
This is necessary to have an unbiased measure of the tuned estimators' performance score. To elaborate a bit, we will start with model selection.

The basic process in tuning a model is trying out different models (parameter combinations) and choosing the one which has the highest cross-validated score. The cross validation makes the scores unbiased by avoiding the optimistic evaluation that happens when the model is tested with the training data itself.

Now the best model's cross-validated score, found using the entire dataset, cannot be considered an unbiased estimate of this tuned model's performance on unseen data. This is because information about the dataset could have leaked into the model through the selection of the best hyper-parameters (i.e., the hyper-parameters could have been optimized for this input dataset, which was also used to obtain the cross-validated score).

To avoid this, we could simply partition the initial dataset into a tuning set and a testing set, tune the model using the tuning set, and finally evaluate it on the testing set. This would give us a fairly unbiased estimate of the tuned model as long as the tuning and testing sets are similar in their distribution. However, partitioning the dataset and not utilizing the testing set for model building is a bit uneconomical, especially when the number of samples is small, considering that we get only a single evaluation of only one best model. Using cross validation to do this is economical and efficient, as it produces one best model and its unbiased score for each iteration. This makes it possible to check if there is any variance amongst the different models or their scores.

In a nested CV approach to model selection, there are three main parts.

The outer CV loop

• The outer CV loop has n iterations (the number of iterations depends on the selected CV strategy).
• For each iteration:
  • The data is split into a tuning set and a testing set.
  • The tuning set is then passed on to the search algorithm, which returns the best hyper-parameter setting for that tuning set.
  • This model is then evaluated using the testing set to obtain a score which will be an unbiased estimate of the estimator's performance on unseen data.
• The variance in each model's hyper-parameter setting and its score is studied to get a better picture of the best models.

The parameter search

• The parameter search module is given the estimator, the range of hyper-parameters and a tuning set.
• The various possible combinations of the hyper-parameters are generated.
• For each combination of the hyper-parameters:
  • The estimator is constructed using this combination.
  • This estimator and the tuning set are passed to the inner CV loop to fit and evaluate the model for that particular combination.
  • If the inner CV loop has m iterations, there will be m such performance scores.
  • The mean of these m performance scores gives an average measure of the estimator for that particular combination of hyper-parameters.
• The combination which has the best performance measure amongst the various combinations is chosen as the best model.

The inner CV loop

• The inner CV loop gets the unfitted estimator with the particular combination of the hyper-parameters and the tuning set.
• Similar to the outer CV loop, there are multiple iterations in the inner CV loop, say m (the number of ways in which the data is partitioned).
• In each iteration:
  • The tuning set is split into a training set and a testing set.
  • The training set is used to fit the estimator.
  • The testing set is used to evaluate the estimator's score.
  • Since the testing set is not used while training, this eliminates the possibility of a bias in the computed score owing to overfitting of the model to the training data.
• The m scores are then returned to the search module, which averages them to get an unbiased measure of the model's performance.
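The three parts described above can be sketched in a library-agnostic way (my own sketch; the split functions, fit and score are hypothetical placeholders, not scikit-learn API):

```python
import statistics

def nested_cv(data, outer_splits, inner_splits, param_grid, fit, score):
    """Library-agnostic sketch of nested cross validation.

    outer_splits/inner_splits yield (train, test) pairs; fit and
    score stand in for model training and evaluation.
    """
    outer_scores = []
    for tuning_set, testing_set in outer_splits(data):
        # parameter search: the inner CV loop scores each
        # combination on the tuning set only
        best_params = max(
            param_grid,
            key=lambda params: statistics.mean(
                score(fit(train, params), test)
                for train, test in inner_splits(tuning_set)))
        # unbiased evaluation: refit on the whole tuning set,
        # score on data never seen during tuning
        model = fit(tuning_set, best_params)
        outer_scores.append(score(model, testing_set))
    return outer_scores
```

One score per outer iteration comes back, so the variance of the tuned models can be inspected directly.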
So to perform nested CV we require two CV iterators, say outer_cv and inner_cv. Since the cross validation iterators (as seen in the previous blog post) are data dependent, both these objects need to be constructed by passing in characteristics of the data, such as the number of samples or the labels.

While it is easy to set up the outer_cv object, since the entire data is passed to it, constructing the inner_cv object ranges from difficult to impossible depending on the CV strategy. This is because the characteristics of the data (tuning set) generated by the outer CV loop for each iteration are not easily known, which makes it impossible to construct the inner_cv object that requires this information. By making CV iterators data independent, we will no longer have this limitation.

To illustrate nested CV, let's consider a small example which, despite the data dependency of the iterators, is possible owing to the selected CV strategy. Let's again work with the iris dataset and SVC.

    import numpy as np
    from sklearn import datasets
    from sklearn.cross_validation import ShuffleSplit
    from sklearn.svm import SVC

    iris = datasets.load_iris()
    n_samples = iris.data.shape[0]

We will be using GridSearchCV for the parameter search. For the inner CV loop let us choose the default [Stratified*]KFolds CV strategy with the number of folds k = 4.

NOTE: KFolds has been deliberately chosen to illustrate nested CV without getting bitten by data dependency. The reason this works is that the CV iterator is constructed implicitly by the cross_val_score function which is placed inside GridSearchCV. It thus has knowledge of the tuning set and hence is able to supply the data-dependent parameters required for the construction of the inner CV loop. This eliminates the need for us to explicitly construct the CV object by initializing it with the data characteristics.
    from sklearn.cross_validation import StratifiedShuffleSplit, cross_val_score
    from sklearn.grid_search import GridSearchCV

    X, y = iris.data, iris.target

    p_grid = {'C': [1, 10, 100, 1000],
              'gamma': [1e-1, 1e-2, 1e-3, 1e-4],
              'degree': [1, 2, 3]}

    grid_search = GridSearchCV(SVC(random_state=0), param_grid=p_grid, cv=4)

(The imports and the X, y definitions were implicit in the original post; they are spelled out here so the snippet runs as-is.) The outer CV loop can now be freely chosen since we have the dataset X, y:

    cv_outer = StratifiedShuffleSplit(y, n_iter=5, test_size=0.3, random_state=0)

Now let's nest this cross-validated parameter search inside the outer CV loop and get the best parameters and the scores for 5 iterations.

    >>> for training_set_indices_i, testing_set_indices_i in cv_outer:
    ...     training_set_i = X[training_set_indices_i], y[training_set_indices_i]
    ...     testing_set_i = X[testing_set_indices_i], y[testing_set_indices_i]
    ...     grid_search.fit(*training_set_i)
    ...     print grid_search.best_params_, '\t\t', grid_search.score(*testing_set_i)
    {'C': 10, 'gamma': 0.1, 'degree': 1}        1.0
    {'C': 10, 'gamma': 0.1, 'degree': 1}        0.977777777778
    {'C': 100, 'gamma': 0.01, 'degree': 1}      0.977777777778
    {'C': 1000, 'gamma': 0.001, 'degree': 1}    0.977777777778
    {'C': 10, 'gamma': 0.1, 'degree': 1}        0.977777777778

The scores alone can be obtained using cross_val_score:

    >>> cross_val_score(grid_search, X, y, cv=cv_outer)
    array([ 0.97777778,  0.95555556,  1.        ,  0.95555556,  1.        ])

To provide more flexibility in choosing the inner CV strategy, we require the CV iterators to be data independent, so they can be constructed without prior knowledge of the data.

*Stratification is used in classification tasks to make the subsets (folds/strata) homogeneous (i.e. the percentage of samples per class remains the same).

My Progress

The month of May went really slowly, with me doing very little work. :/ I've been working hard the past few weeks to catch up with all the pending issues/PRs, both GSoC related and otherwise. WRT my GSoC work, so far I've finished the model_selection refactor (Goal 1) and have done a quick draft of the data independent CV iterator, which I will refine, add tests to and document before pushing.
I hope to finish Goal 1 and Goal 2 (excluding reviews/revisions) before this Sunday and publish my next blog post by Monday/Tuesday, which will be on how the new data independent CV iterator makes nested CV easier, with a few examples.

June 17, 2015

Wei Xue(Scikit-learn)

GSoC Week 4: Progress Report

Updated in Jun 24. Here is the task check-list.

1. [x] Completes derivation report.
2. [x] Adds new classes: one abstract class _BasesMixture and three derived classes GaussianMixture, BayesianGaussianMixture, DirichletProcessGaussianMixture.
3. [ ] Decouples large functions, especially in DirichletProcessGaussianMixture and BayesianGaussianMixture.
4. [x] Removes numerical stability fixes for HMM. It seems that whenever there is a numerical issue, the code always adds 10*EPS in the computation. I think in some cases there is a better way to address the problem, such as normalizing the extremely small variables earlier, or simply removing 10*EPS, which is only for HMM.
5. [ ] Writes updating functions for BayesianGaussianMixtureModel and DirichletProcessGaussianMixtureModel according to the report.
6. [x] Provides methods that allow users to initialize the model with user-provided data.
7. [x] Corrects kmeans initialization. It is weird that when using kmeans initialization only the means are initialized; the weights and covariances are initialized by averaging.
8. [x] Writes several checking functions for the initialization data.
9. [x] Adjusts the verbose messages. When verbose > 1, it displays the log-likelihood and time used in the same line as the message "Iteration x".
10. [ ] Adjusts the time at which the log-likelihood is computed. The code in the current master branch computes the log-likelihood of the model after the E-step, which is actually the score of the last iteration, and misses the score immediately after initialization.
11. [x] Simplifies fit_predict.
12. [x] Adds a warning for params != 'wmc'.
13. [ ] Studies and contrasts the convergence of classical MLE / EM GMM with Bayesian GMM against the number of samples and the number of components.
14. [ ] Friendly warning and error messages, or automatic handling where possible (e.g. random re-initialization of singular components).
15. [ ] Examples that show how models can over-fit by comparing likelihood on training and validation sets (normalized by the number of samples), for instance extending the BIC score example with a cross-validated likelihood plot.
16. [ ] Testing on 1-D dimensions.
17. [ ] Testing on degenerate cases.
18. [ ] AIC, BIC for VBGMM and DPGMM.
19. [ ] Old Faithful geyser data set.
20. [optional] Add a partial_fit function for incremental / out-of-core fitting of (classical) GMM, for instance http://arxiv.org/abs/0712.4273
21. [optional] ledoit_wolf covariance estimation.

The most important progress I have made is the derivation report, which includes the updating functions, log-probability, and predictive distribution for all three models, and the implementation of the base class. Compared with the current scikit-learn math derivation documents, my report is consistent with PRML. It clearly shows that the updating functions of the three models share a lot of patterns. We could abstract common functions into the abstract base class _MixtureBase; the three models could inherit from it and override the updating methods. Next week I will finish the GaussianMixture model with the necessary testing functions.

Palash Ahuja(pgmpy)

Inference in Dynamic Bayesian Networks

Today I will be talking about how inference works in Dynamic Bayesian Networks. We could have applied the following methods:

1) Naive method: we could unroll the Bayesian network as much as we'd like and then apply the inference methods used for a standard Bayesian network. However, this method leads to exponentially large graphs, thereby increasing the time that inference takes. (It was a surprise to me that we could do so at all, but it leads to a lot of problems.)

2) Forward and backward algorithm: we could apply this algorithm, but it applies exclusively to hidden Markov models (HMMs). It involves converting each of the nodes into state spaces, substantially increasing their sizes and again leading to huge complexity.

To reduce the complexity of inference, we can instead compute a prior belief state by recursive estimation and then propagate the state forward (the recursive update formulas appeared as figures in the original post). Applying this update repeatedly gives the filtering algorithm. Other algorithms are the frontier algorithm and the interface algorithm, which are more popular for inference on a large scale.
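For the simplest hidden-state case, the recursive belief-state propagation described above can be sketched as a plain-Python HMM forward filter (a generic illustration, not pgmpy code):

```python
def forward_filter(prior, transition, emission, observations):
    """Recursive belief-state update (filtering) for a discrete HMM.

    prior[i]          -- P(state_0 = i)
    transition[i][j]  -- P(state_{t+1} = j | state_t = i)
    emission[i][o]    -- P(obs = o | state = i)
    """
    belief = list(prior)
    for obs in observations:
        # predict: push the current belief through the transition model
        predicted = [sum(belief[i] * transition[i][j]
                         for i in range(len(belief)))
                     for j in range(len(belief))]
        # update: weight by the evidence and renormalize
        unnorm = [predicted[j] * emission[j][obs]
                  for j in range(len(predicted))]
        z = sum(unnorm)
        belief = [u / z for u in unnorm]
    return belief
```

Frontier and interface algorithms generalize this idea from a chain of single state variables to slices of a DBN.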
Apart from that, there could be certain expectation maximization algorithms which may help in computing the most probable path. This is what I am planning to implement in a couple of weeks.

Yask Srivastava(MoinMoin)

GSoC Update

Informal Intro

Phew! My end-term exams ended on 15th June, but I had long gaps between exams, so I was contributing to MoinMoin on the side in my free time. Now that my exams have finally ended, I can devote myself 100% to this. \o/

Progress

As I mentioned in my introductory blog post, we decided to use Less. I started with the modernized theme, which used Stylus for CSS preprocessing. Since our new basic theme works on top of Bootstrap's Less files, we decided to redesign and port the theme to work on top of it. I finished writing the new base theme for modernized and also rewrote the base template layout.html to use this theme.

Show me the code!!

Code review: in process.

Description: the post included screenshots of the previous modernized theme, the restyled menu and sub-menu tabs (http://i.imgur.com/0gaA1qh.png), the new Roboto fonts for the wiki contents (http://i.imgur.com/uYuMYoj.png), and the complete new look.

The code review is still in process, so no commits have been made yet. We also had our weekly meeting yesterday on the IRC channel #moin-dev, where we discussed our progress and future plans with all our mentors.

My development configuration

I like to see changes as I make them, so compiling the Less files to CSS after every minor change was a big no-no. I have added the complete project to CodeKit, which automatically compiles and refreshes the page as soon as it detects any changes in the source code. :) We use Mercurial as our version control system and the project is hosted on Bitbucket. I like to use Sublime Text as it's super light.

Future Plans

1. Make changes in the make.py file to automate the compilation of Less files for the modernized theme.
2. Write CSS rules for all the elements.
3. Design the footer, user settings page, etc.
4. Implement the changes mentioned by my mentors in the previous CR.

Saket Choudhary(statsmodels)

Week 3 Update

This week was mostly spent debugging the current MixedLM code. It looks like convergence depends a lot on the optimizer being used. For example, here are two notebooks:

1. http://nbviewer.ipython.org/github/saketkc/statsmodels/blob/kerby_mixedlm_notebooks/examples/notebooks/MixedLM_compare_lr_test-Converging.ipynb
2. http://nbviewer.ipython.org/github/saketkc/statsmodels/blob/kerby_mixedlm_notebooks/examples/notebooks/MixedLM_compare_lr_test.ipynb

The first uses Nelder-Mead as its optimizer while the latter relies on BFGS. lme4 by default uses BOBYQA. However, switching the default optimizer to Nelder-Mead in MixedLM results in a lot of difference from the expected results (following lme4's results). There is an existing issue tracking this [1] and an existing PR to Kerby's branch [2].

This week, I plan to profile and probably finish off the optimization work. My goals have been well summarised by Kerby in [1].

[1] https://github.com/statsmodels/statsmodels/issues/2452
[2] https://github.com/kshedden/statsmodels/pull/1

AMiT Kumar(Sympy)

GSoC : This week in SymPy #3

Hi there! It's been three weeks into GSoC, and I have managed to pick up some pace. This week I worked on creating the ComplexPlane class.

Progress of Week 3

The major portion of this week went into creating the ComplexPlane class: PR #9438.

Design

The design of the ComplexPlane class supports both forms of representation of complex regions in the complex plane.

• Polar form: a complex number is denoted by the length (r) of its vector (otherwise known as the magnitude, absolute value, or modulus) and its angle (θ).
• Rectangular form: a complex number is denoted by its respective horizontal (x) and vertical (y) components.
Initial Approach

While writing code for the ComplexPlane class, we started with the following design: input the intervals of a and b, as follows:

    ComplexPlane(a_interval, b_interval, polar=True)

where a_interval and b_interval are the respective intervals of x and y for a complex number in rectangular form, or the respective intervals of r and θ for a complex number in polar form when the polar flag is True.

The problem with this approach is that we can't represent two different regions in a single ComplexPlane call. For example, suppose we have two rectangular regions; we have to represent them with two ComplexPlane calls:

    rect1 = ComplexPlane(Interval(1, 4), Interval(1, 2))
    rect2 = ComplexPlane(Interval(5, 6), Interval(2, 8))
    shaded_region = Union(rect1, rect2)

Similarly for the following polar region:

    halfdisk1 = ComplexPlane(Interval(0, 2), Interval(0, pi), polar=True)
    halfdisk2 = ComplexPlane(Interval(0, 1), Interval(1, 2*pi), polar=True)
    shaded_region = Union(halfdisk1, halfdisk2)

Better Approach

The solution to the above problem is to wrap the two ComplexPlane calls into one. To do this a better input API was needed, and the problem was solved with the help of ProductSet. Now we take input in the form of a ProductSet or a Union of ProductSets. The region above is represented as follows:

• For rectangular form:

    psets = Union(Interval(1, 4)*Interval(1, 2), Interval(5, 6)*Interval(2, 8))
    shaded_region = ComplexPlane(psets)

• For polar form:

    psets = Union(Interval(0, 2)*Interval(0, pi), Interval(0, 1)*Interval(1, 2*pi))
    shaded_region = ComplexPlane(psets, polar=True)

Note: the input θ interval for polar form tolerates any interval in terms of π. This is handled by the function normalize_theta_set (written using the _pi_coeff function), which normalizes a θ set to an equivalent interval in [0, 2π); this simplifies various other methods such as _union and _intersect.
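The idea behind that normalization can be illustrated with a plain-Python sketch (my own toy numeric version; the real normalize_theta_set works symbolically on whole SymPy sets):

```python
import math

def normalize_theta(theta):
    """Map an angle to its equivalent value in [0, 2*pi).

    Toy numeric illustration of the idea behind normalize_theta_set;
    interval endpoints can be normalized with the same rule.
    """
    return theta % (2 * math.pi)

# 5*pi/2 is the same direction as pi/2; -pi/2 is the same as 3*pi/2
assert math.isclose(normalize_theta(5 * math.pi / 2), math.pi / 2)
assert math.isclose(normalize_theta(-math.pi / 2), 3 * math.pi / 2)
```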
from future import plan

Week #4: This week I plan to polish my pending PRs to get them merged & start working on the LambertW solver in solveset.

git log

PR #9438 : Linsolve

PR #9463 : ComplexPlane

PR #9527 : Printing of ProductSets

PR #9524 : Fix solveset returned solution making denom zero

That's all for now, looking forward to week #4. :grinning:

June 16, 2015

Chau Dang Nguyen(Core Python)

Week 3

So far, my Roundup REST interface is live and working. Next week, I will publish the documentation so that people can start giving feedback on it.

This week:
Roundup can perform GET, POST, PUT, DELETE and return the data
Errors and Exception handling

Next week:
Perform PATCH
Documentation
Simple client that uses REST from Roundup
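As an illustration only (not Roundup's actual code, whose handler and method names differ), a REST layer of this kind typically dispatches on the HTTP verb; a toy in-memory sketch:

```python
class RESTHandler:
    """Toy dispatcher mapping HTTP verbs to handler methods.
    All names here are hypothetical, for illustration."""
    def __init__(self):
        self.items = {}

    def dispatch(self, method, item_id=None, body=None):
        # Look up a do_<verb> method; unknown verbs get 405.
        handler = getattr(self, 'do_' + method.lower(), None)
        if handler is None:
            return 405, {'error': 'method not allowed'}
        return handler(item_id, body)

    def do_get(self, item_id, body):
        if item_id in self.items:
            return 200, self.items[item_id]
        return 404, {'error': 'not found'}

    def do_post(self, item_id, body):
        new_id = str(len(self.items) + 1)
        self.items[new_id] = body
        return 201, {'id': new_id}

    def do_put(self, item_id, body):
        self.items[item_id] = body
        return 200, body

    def do_delete(self, item_id, body):
        self.items.pop(item_id, None)
        return 204, None
```

Adding PATCH later then amounts to adding one more do_patch method, which is why keeping the module clean pays off.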

Christof Angermueller(Theano)

GSoC week three

Further steps towards a more agile visualization…

Last week, I revised my implementation to improve the visualization of complex graphs with many nodes. Specifically, I

• added buttons to rearrange all nodes in a force layout,
• implemented double-click events to release single nodes from a fixed position,
• colored edges consistently with pydotprint.

You can play around with three different examples here!

The post GSoC week three appeared first on Christof Angermueller.

What's the difference between UTC and UT1?

Keeping time is a messy business. Depending on your perspective, you may want one of two (or more) time systems:
1. As humans, we want a time system that ticks in seconds on the surface of the Earth contiguously and forever, both backwards and forwards in time.
2. As astronomers, we want a time system that will place stationary astronomical objects (like distant quasars) at the same position in the sky with predictable periodicity.
It turns out that reconciling these distinct systems is a difficult task because the Earth's rotation period is constantly changing due to tidal forces and changes in the Earth's moment of inertia. As a result, the number of seconds in a mean solar day or year changes with time in ways that are (at present) impossible to predict, since the variations depend on plate tectonics, large-scale weather patterns, earthquakes, and other stochastic events.

The solution is to keep these two time systems independent.

Coordinated Universal Time (UTC)

The first time system is kept by atomic clocks which tick with no regard for the Earth's rotation. If that system was left uncorrected over many years, solar noon would no longer occur at noon on the atomic clocks, because 24 hours × 60 minutes × 60 seconds is not precisely the rotation period of the Earth. To make up for this, the atomic clock timekeeping system gets leap seconds added to it every so often to keep the atomic clock time as close as possible (within 0.9 seconds) to mean solar time. We call this Coordinated Universal Time (UTC).

Universal Time 1 (UT1)

The second time system is kept very precisely by, for example, measuring the positions of distant quasars using Very Long Baseline Interferometry. This time is therefore defined by the rotation of the Earth, and it varies with respect to UTC as the Earth's rotation period changes. The orientation of the Earth, which must be measured continuously to keep UT1 accurate, is logged by the International Earth Rotation and Reference Systems Service (IERS). They update a "bulletin" with the most recent measurements of the Earth's orientation, called Bulletin B, referred to within astropy as the IERS B table.

Calculating UT1-UTC with astropy

The difference between UTC and UT1 is therefore modulated by (1) changes in the Earth's rotation period and (2) leap seconds introduced to try to keep the two conventions as close to each other as possible. Computing the difference between the two is simple with astropy, and it reveals the strange history of our dynamic time system.

The following code and plots are available in an IPython notebook for your forking pleasure.

Using IERS B for backwards conversion

from __future__ import print_function
import numpy as np
import datetime
import matplotlib.pyplot as plt

# Make the plots pretty
import seaborn as sns
sns.set(context='talk')

# Generate a range of times from 1960 (before leap seconds)
# to near the present day
dt_range = np.array([datetime.datetime(1960, 1, 1) +
                     i*datetime.timedelta(days=3.65) for
                     i in range(5600)])

# Convert to astropy time object
from astropy.time import Time
time_range = Time(dt_range)

# Calculate the difference between UTC and UT1 at those times,
# allowing times "outside of the table"
DUT1, success = time_range.get_delta_ut1_utc(return_status=True)

# Compare input times to the times available in the table. See
# https://github.com/astropy/astropy/blob/master/astropy/utils/iers/iers.py#L80
from astropy.utils.iers import (TIME_BEFORE_IERS_RANGE, TIME_BEYOND_IERS_RANGE,
                                FROM_IERS_A, FROM_IERS_B)
extrapolated_beyond_table = success == TIME_BEYOND_IERS_RANGE
extrapolated_before_table = success == TIME_BEFORE_IERS_RANGE
in_table = success == FROM_IERS_B

# Make a plot of the time difference
fig, ax = plt.subplots(figsize=(10, 8))
ax.axhline(0, color='k', ls='--', lw=2)
ax.plot_date(dt_range[in_table], DUT1[in_table], '-',
             label='In IERS B table')
ax.plot_date(dt_range[extrapolated_beyond_table],
             DUT1[extrapolated_beyond_table], '-',
             label='Extrapolated forwards')
ax.plot_date(dt_range[extrapolated_before_table],
             DUT1[extrapolated_before_table], '-',
             label='Extrapolated backwards')
ax.set(xlabel='Year', ylabel='UT1-UTC [seconds]')
ax.legend(loc='lower left')
plt.show()

There have been 25 leap seconds to date (as of summer 2015) since they were introduced in 1972.

Using IERS A for forwards conversion

# Download and cache the IERS A and B tables
from astropy.utils.iers import IERS_A, IERS_A_URL, IERS_B, IERS_B_URL
from astropy.utils.data import download_file
iers_a_file = download_file(IERS_A_URL, cache=True)
iers_a = IERS_A.open(iers_a_file)
iers_b_file = download_file(IERS_B_URL, cache=True)
iers_b = IERS_B.open(iers_b_file)

# Generate a range of times from 1970 to near the present day
dt_range = np.array([datetime.datetime(1970, 1, 1) +
                     i*datetime.timedelta(days=36.5) for
                     i in range(525)])

# Convert to astropy time object
from astropy.time import Time
time_range = Time(dt_range)

# Calculate the difference between UTC and UT1 at those times,
# allowing times "outside of the table"
DUT1_a, success_a = time_range.get_delta_ut1_utc(return_status=True,
                                                 iers_table=iers_a)
DUT1_b, success_b = time_range.get_delta_ut1_utc(return_status=True,
                                                 iers_table=iers_b)

# Compare input times to the times available in the table. See
# https://github.com/astropy/astropy/blob/master/astropy/utils/iers/iers.py#L80
from astropy.utils.iers import (TIME_BEFORE_IERS_RANGE, TIME_BEYOND_IERS_RANGE,
                                FROM_IERS_A, FROM_IERS_B)
in_table_b = success_b == FROM_IERS_B

# Make a plot of the time difference
fig, ax = plt.subplots(figsize=(10, 8))
ax.axhline(0, color='k', ls='--', lw=2)
ax.plot_date(dt_range, DUT1_a, '-',
             label='IERS A table')
ax.plot_date(dt_range[in_table_b], DUT1_b[in_table_b], 'r--',
             label='IERS B table')
ax.set(xlabel='Year', ylabel='UT1-UTC [seconds]')
ax.legend(loc='upper right')
plt.show()

The IERS A table knows about near-future leap seconds and provides more accurate predictions forward in time.

June 15, 2015

Sartaj Singh(SymPy)

GSoC: Update Week-3

This week PR-#9435, introducing the sequences module in SymPy, finally got merged. A big thanks to my mentors for patiently reviewing my PR.

During the week I worked on the SeriesBase and FourierSeries classes. Much of the time was spent polishing these classes. I have opened PR-#9523 implementing the two.

I also revised my notes on the algorithm for computing Formal Power Series (FPS), as described in the paper Formal Power Series by Dominik Gruntz and Wolfram Koepf.

What is a Formal Power Series?

As wikipedia states

In mathematics, formal power series are a generalization of polynomials as formal objects, where the number of terms is allowed to be infinite; this implies giving up the possibility to substitute arbitrary values for indeterminates. This perspective contrasts with that of power series, whose variables designate numerical values, and which series therefore only have a definite value if convergence can be established. Formal power series are often used merely to represent the whole collection of their coefficients.

The formal power series of a function is of the form $$f(x) = \sum\limits_{k=0}^\infty a_k x^k$$

Algorithm for computing FPS of a function

1. If $$f(x)$$ or some derivative $$f^{(k)}(x)$$ is a rational function, apply the rational algorithm:

1. Calculate a complex partial fraction decomposition (PFD) of f.
2. The coefficients can be found by expanding each term into a binomial series.

$$\frac{c}{(x - \alpha)^j} \to \frac{(-1)^j c}{\alpha^{j + k}} \binom{j + k - 1}{k}$$

2. Find a simpleDE

1. Fix a number $$N_{max} \in \mathbb{N}$$, the maximal order of the DE searched for; a suitable value is $$N_{max} := 4$$.
2. Set N := 1
3. Search for a DE of the form

$$f^{(N)}(x) + \sum\limits_{j=0}^{N-1} A_j f^{(j)}(x) = 0$$

where the $$A_j$$ should be rational functions in x.

4. If (3) is unsuccessful, increase N by 1 and go back to (3), until N = $$N_{max}$$

3. Find the corresponding RE of the form

$$a_{n + 1} = \sum\limits_{k=0}^M r_{k} a_{n-k}$$

RE can be found by the following substitution into the DE

$$x^l f^{(k)}(x) \to (n + 1 - l)_k \, a_{n + k - l}$$

4. Check the type of RE

1. If the RE contains only one summand $$r_{j} a_{n-j}$$ on its right-hand side, then f is of hypergeometric type and the RE can be solved using some initial conditions.
2. If the DE has constant coefficients, then f is exp-like and can also be solved.
3. If the RE is of neither type, there are two options:
1. Look for a DE of higher degree, so that a RE of hypergeometric type can be found.
2. Try to solve RE using known RE solvers.
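The binomial-expansion rule from the rational algorithm can be checked numerically. The sketch below (illustrative only; the actual SymPy implementation is symbolic) computes the k-th coefficient of a single PFD term c/(x - α)^j via the formula c·(-1)^j·C(j+k-1, k)/α^(j+k):

```python
from fractions import Fraction
from math import comb

def pfd_term_coeff(c, alpha, j, k):
    """k-th power series coefficient of c/(x - alpha)**j,
    via c * (-1)**j / alpha**(j + k) * C(j + k - 1, k)."""
    return Fraction(c * (-1)**j, alpha**(j + k)) * comb(j + k - 1, k)

# Sanity check against the known expansion
# 1/(x - 2)**2 = sum_k (k + 1) * x**k / 2**(k + 2):
for k in range(6):
    assert pfd_term_coeff(1, 2, 2, k) == Fraction(k + 1, 2**(k + 2))
```

Since each rational function splits into finitely many such terms, the full coefficient a_k is just a finite sum of these closed forms, which is what makes the rational algorithm so fast.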

• I have started implementing the rational algorithm. Initial results are promising: in sympy-master, computing the first 100 terms of the series expansion of $$\ln(1 + x)$$ takes about 2.92 s, while the rational algorithm with sequences takes about 15.3 ms, faster by roughly a factor of 190.

• Implement a FormalPowerSeries class that, for now, supports functions that are rational or whose derivatives are rational.

• If I am able to complete this, next on my list will be computing a simple DE. This can be tricky: sometimes the DE found is not suitable, in which case a DE of higher degree must be found. I will have to work out a fast way of doing this.

That's all for now. See you next week.

Sahil Shekhawat(PyDy)

GSoC Week 4

We finally released PyDy 0.3.0. It was a big release and included a lot of new things like our new visualizer. Here is the official release statement from Jason.

Andres Vargas Gonzalez(Kivy)

Connecting events between kivy and matplotlib

Matplotlib provides a list of events that can be connected to an external backend. This list of events can be found in backend_bases.py, the events are:

events = [
'resize_event',
'draw_event',
'key_press_event',
'key_release_event',
'button_press_event',
'button_release_event',
'scroll_event',
'motion_notify_event',
'pick_event',
'idle_event',
'figure_enter_event',
'figure_leave_event',
'axes_enter_event',
'axes_leave_event',
'close_event'
]



In order to connect these events to Kivy events, we first need to know the corresponding Kivy events that can be matched with the ones in the previous list:

widget_events = [
'on_touch_down',
'on_touch_up',
]

keyboard_events = [
'on_key_down',
'on_key_up',
]

window_events = [
'on_close',
]

attribute_events = [
'mouse_pos', # from Window
'size',
]



The definition of each mpl event and how it is connected to the corresponding Kivy event is explained below:

– ‘button_press_event’ :
On FigureCanvasKivy ‘on_touch_down’ will be triggered when a user touches the widget. A call to FigureCanvasBase will be performed on ‘button_press_event’ with the corresponding touch.x and touch.y arguments.

– ‘button_release_event’ :
On FigureCanvasKivy ‘on_touch_up’ will be triggered when a user releases a touch. A call to FigureCanvasBase will be performed on ‘button_release_event’ with the corresponding touch.x and touch.y arguments.

– ‘draw_event’ :
Internally dispatched by matplotlib.

– ‘key_press_event’ :
On FigureCanvasKivy a keyboard will be requested and ‘on_key_down’ will be bound to an internal function ‘on_keyboard_down’. On this function a call to FigureCanvasBase will be performed on ‘key_press_event’ with the corresponding keycode argument.

– ‘key_release_event’ :
On FigureCanvasKivy a keyboard will be requested and ‘on_key_up’ will be bound to an internal function ‘on_keyboard_up’. On this function a call to FigureCanvasBase will be performed on ‘key_release_event’ with the corresponding keycode argument.

– ‘motion_notify_event’ :
Window is imported, and on FigureCanvasKivy ‘mouse_pos’ is bound to an internal function ‘_on_mouse_move’. In this function a call to FigureCanvasBase is performed on ‘motion_notify_event’ with the corresponding x and y mouse coordinates.

– ‘pick_event’ :
Event dispatched by matplotlib internally called by Artist.

– ‘resize_event’ :
On FigureCanvasKivy, ‘size’ is bound to an internal function ‘_on_size_changed’. In this function a call to FigureCanvasBase is performed on ‘resize_event’ with the corresponding arguments.

– ‘scroll_event’ :
On FigureCanvasKivy ‘on_touch_down’ will be triggered; on this event a call to FigureCanvasBase is performed on ‘scroll_event’ with the touch values (if touch.button is scrollup or scrolldown, step will be positive or negative accordingly).

– ‘figure_enter_event’ :
Window is imported, and on FigureCanvasKivy ‘mouse_pos’ is bound to an internal function ‘_on_mouse_move’. In this function a call to FigureCanvasBase on ‘enter_notify_event’ is performed (if the mouse position collides with the canvas) with the respective arguments.

– ‘figure_leave_event’ :
Window is imported, and on FigureCanvasKivy ‘mouse_pos’ is bound to an internal function ‘_on_mouse_move’. In this function a call to FigureCanvasBase on ‘leave_notify_event’ is performed (if the mouse position does not collide with the canvas but did before) with the respective arguments.

– ‘axes_enter_event’ :
Event dispatched by matplotlib.

– ‘axes_leave_event’ :
Event dispatched by matplotlib.

– ‘close_event’ :
Window is imported, and on FigureCanvasKivy ‘window.on_close’ is bound to an internal function ‘on_close’. In this function a call to FigureCanvasKivy on ‘close_event’ is performed with the respective arguments.
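Schematically, the whole connection boils down to a lookup from Kivy callbacks to the matplotlib event names above. A toy sketch of that dispatch (not the actual backend code, which lives in FigureCanvasKivy; the ToyCanvas class is hypothetical):

```python
# Mapping from Kivy callbacks to mpl event names, mirroring the
# pairings described above.
KIVY_TO_MPL = {
    'on_touch_down': 'button_press_event',
    'on_touch_up': 'button_release_event',
    'on_key_down': 'key_press_event',
    'on_key_up': 'key_release_event',
    'mouse_pos': 'motion_notify_event',
    'size': 'resize_event',
    'on_close': 'close_event',
}

class ToyCanvas:
    """Stand-in for FigureCanvasBase: just records dispatched events."""
    def __init__(self):
        self.log = []

    def dispatch(self, kivy_event, **kwargs):
        # Translate the Kivy callback name and forward the arguments,
        # the way the real backend forwards touch.x/touch.y etc.
        mpl_event = KIVY_TO_MPL[kivy_event]
        self.log.append((mpl_event, kwargs))

canvas = ToyCanvas()
canvas.dispatch('on_touch_down', x=10, y=20)
canvas.dispatch('on_touch_up', x=10, y=20)
```

Events like ‘draw_event’, ‘pick_event’, and the axes enter/leave events are absent from the table because matplotlib dispatches them internally.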

This can be seen in my latest commit on the mpl_kivy branch: https://github.com/andnovar/kivy/tree/mpl_kivy

Michael Mueller(Astropy)

Week 3

I spent most of this week integrating the new indexing system into the existing system of table operations, with the goal of making sure that everything works correctly before homing in on performance improvements. (I'm keeping thoughts about improvements in the back of my mind, though, e.g. changing the indexing data structure to something like a red-black tree, rewriting code in Cython/C, etc.) So far the results have come along reasonably well: the indexing system is now used in group_by() and in a new function called where(), which is under construction. The goal of the where() function is to mimic the SQL "where" query, so that one can write something like this for a table full of names:

results = table.where('first_name = {0} and last_name = {1}', 'Michael', 'Mueller')

This will ultimately be a sort of mini-language, so I'll have to worry about parsing. For now it only handles a couple of the easiest cases.

I've also modified the Index structure to allow for composite indices, in which an index keeps track of keys from more than one column. The idea is to store the node keys in the underlying implementation (currently a BST) as a tuple containing values from each column, in order; for example, a key for an index on the columns "first_name" and "last_name" might be ("Michael", "Mueller"). Clearly the order of the columns is important, because ("A", "B") < ("B", "A"), so where() queries involving multiple columns will have to use a leftmost prefix of the column set for a composite index in order to be usable. For example, if an index is on ("first_name", "last_name"), then querying for "first_name" should be allowed but querying only for "last_name" should not be allowed, as otherwise the advantage of an index is no longer relevant.
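The leftmost-prefix rule can be expressed as a small predicate. This is an illustrative sketch, not the actual astropy code (the function and variable names are mine):

```python
def index_usable(index_columns, query_columns):
    """Return True if the queried columns form a leftmost prefix of the
    composite index's column order, i.e. the index can serve the query."""
    return (len(query_columns) <= len(index_columns) and
            all(q == i for q, i in zip(query_columns, index_columns)))

# An index on ("first_name", "last_name"):
idx = ("first_name", "last_name")
```

Because tuple keys compare lexicographically, only a leftmost prefix keeps the tree traversal contiguous; querying "last_name" alone would force a full scan, which is why index_usable rejects it.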

Going into next week, I plan on continuing the integration of Index into the Table system, more specifically the where() functionality. Luckily the basic infrastructure works now, though figuring out a couple of errors with pickling/memory views in the Table and Column classes was a bit annoying this week and the week prior. As of now, indices are stored in their container Columns and, conversely, each index contains a list of references (in order) to its column(s).

Gregory Hunt(statsmodels)

GLMMs, Loglikelihood and Laplacian approximations

Kerby did some of the heavy lifting helping out with the Laplacian approximation to a Gaussian integral. This forms the basis of our approximation of the log-likelihood. I wrote up some notes on what we're trying to accomplish here. Note 1

Wei Xue(Scikit-learn)

GSoC Week 3

Week 3 had a very exciting start. I finished the derivation of DPGMM, as well as the lower bound and the predictive probability for each model.

The difference between my derivation and the current document is that the current models assume a simpler approximation. The model defined in PRML is more accurate and provides more knobs. The two approximations both appear in the literature. Maybe we should do some experiments to decide which one is better.

With regard to the new names of DPGMM and VBGMM, I think these two names are not suitable, just as calling SVM 'SMO' would not be. Actually, the models are Bayesian GMM and Dirichlet Process Bayesian GMM (DPGMM is often used), respectively. Both of them are solved by variational inference; in other words, VBGMM is not a good name. The new names, I think, should convey 'Bayesian GMM solved by VB' and 'DP(B)GMM solved by VB'.

I also took a close look at the code base. The code is not well maintained. The problems I am going to address are as follows.

• decouple some large functions, such as _fit
• use abstract class and inheritance to reduce code redundancy
• numerical stability: whenever there is a numerical issue, the code simply adds EPS. In some places there is a better way to address the problem, such as normalizing the extremely small variables earlier.
• write updating functions for BayesianGaussianMixtureModel and DirichletProcessGaussianMixtureModel
• provide methods that allow users to initialize the model before fit
• correct kmeans initialization. It is weird that when using kmeans initialization, only the means are initialized; the weights and covariances are initialized by averaging.
• write several checking functions for the initialization data
• [optional] add a partial_fit function for incremental / out-of-core fitting of (classical) GMM, for instance http://arxiv.org/abs/0712.4273
• [optional] ledoit_wolf covariance estimation

In the last days of this week I implemented the structure of the new classes: _MixtureModelBase, GaussianMixtureModel, BayesianMixtureModel, DirichletProcessMixtureModel. It gives a big picture of the classes I am going to implement. I am looking forward to feedback.

June 14, 2015

Mark Wronkiewicz(MNE-Python)

Inner workings of the Maxwell filter

C-Day plus 20

This week, I finished a first draft of the Maxwell filtering project. Remember my goal here is to implement an open-source version of this filter that uses physics to separate brain signal from environmental garbage picked up by the MEG sensors. Now comes the fun part of this project: trying to add all the small tweaks required to precisely match the proprietary Maxwell filter, which I cannot access. I’m sure this will devolve into a tedious comparison between what I’ve implemented and the black box version, so here’s to hoping the proprietary code follows the original white papers.

Most of the Maxwell filter work up until this point was focused on enabling the calculation of the multipolar moment space (comprised of the spherical harmonics), which is the foundation of this Maxwell filter. These multipolar moments are the basis set I’ve mentioned earlier that allow the brain signals to be divided into two parts: those coming from within a sphere and those originating from outside a slightly larger sphere (to see this graphically, cf. Fig 6 in Taulu, et al., 2005). In essence, representing the brain signals as a sum of multipolar moments permits the separation of brain signal from external noise sources like power lines, large moving objects, the Earth’s magnetic field, etc. My most recent code actually projects the brain signals onto this multipolar moment space (i.e., representing MEG data as a sum of these moments), and then reconstructs the signal of interest. These calculations are all standard linear algebra. From Taulu and Simola, 2006 (pg 1762):

Takeaway: The equations below show how Maxwell filtering is accomplished once the appropriate space has been calculated. We take brain signals recorded using MEG, represent them in a different space (the truncated multipolar moment space), and then reconstruct the MEG signal to apply the Maxwell filter and greatly reduce the presence of environmental noise.

ϕ represents the MEG recordings
S represents the multipolar moment space (each column vector is a spherical harmonic)
x represents the ideal weight of each multipolar moment (how much of each basis vector is present)
hat represents an estimate
in, out refer to the internal space (brain signals) and the external space (noise), respectively
S_pinv is the pseudoinverse of S

In the ideal case, the signal we recorded can also be represented as a weighted combination of our multipolar moments:
ϕ = S * x
The S matrix contains multipolar moments but only up to a certain complexity (or degree), so it has been truncated. See my first post (end of 3rd paragraph) about why we cut out the very complex signals.

Since we can break up the multipolar moments and their weights into an internal and external space (comprised of brain signal and noise), this is equivalent to the last equation:
ϕ = [S_in, S_out] * [x_in, x_out]^T

However, we're not in an ideal world, so we need to estimate these multipolar moment weights. x is the unknown, so isolate it by taking the pseudoinverse of S to solve for an estimate of the multipolar moment weights:

S_pinv * ϕ = S_pinv * S * x
S_pinv * ϕ = x_hat
x_hat = S_pinv * ϕ
or equivalently,
[x_in_hat, x_out_hat]^T = S_pinv * ϕ

With the multipolar weight estimates in hand, we can finally reconstruct our original MEG recordings, which effectively applies the Maxwell filter. Again, since S_in and S_out have been truncated, they only recreate signals up to a certain spatial complexity, cutting out the noise.

ϕ_in_hat = S_in * x_in_hat
ϕ_out_hat = S_out * x_out_hat

The above ϕ matrices are a cleaner version of the brain signal we started with and the world is now a much better place to live in.
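Numerically, the whole procedure is standard linear algebra. A hedged NumPy sketch with random stand-in matrices (the real S_in/S_out come from the spherical-harmonic expansion, and real MEG systems have far more sensors; the sizes here are illustrative only):

```python
import numpy as np

# Hypothetical sizes, for illustration only
rng = np.random.RandomState(0)
n_sensors, n_in, n_out = 50, 8, 3

# Random stand-ins for the internal and external multipolar bases
S_in = rng.randn(n_sensors, n_in)
S_out = rng.randn(n_sensors, n_out)
S = np.hstack([S_in, S_out])

# Simulated recordings: phi = S * [x_in; x_out]
x_true = rng.randn(n_in + n_out)
phi = S @ x_true

# Estimate the moment weights with the pseudoinverse, then
# reconstruct only the internal (brain) part of the signal
x_hat = np.linalg.pinv(S) @ phi
phi_in_hat = S_in @ x_hat[:n_in]
```

When S has full column rank, S_pinv * S is the identity, so the weights are recovered exactly in this noiseless toy; with real data the truncation is precisely what discards the external noise.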

Isuru Fernando(SymPy)

GSoC 2015 : Week 3

This week, I continued my work from the previous two weeks to complete the floating point support for SymEngine.

RealDouble is the wrapper for double used in SymEngine, and it used to return NaN for numerical evaluations that resulted in a complex number. Checks were added to ensure that if the result is complex, a ComplexDouble is returned. So now, asin for RealDouble looks like this:

virtual RCP<const Basic> asin(const Basic &x) const {
    SYMENGINE_ASSERT(is_a<RealDouble>(x))
    double d = static_cast<const RealDouble &>(x).i;
    if (d <= 1.0 && d >= -1.0) {
        return number(std::asin(d));
    } else {
        return number(std::asin(std::complex<double>(d)));
    }
}
Here, when the result is real, a RealDouble is returned; when it is complex, the value is converted to std::complex<double>, evaluated, and then a ComplexDouble is returned.
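The same dispatch idea, sketched in Python terms rather than SymEngine's C++ (the function name is mine):

```python
import math
import cmath

def asin_auto(d):
    """Return a real result when the input lies in the real domain of
    asin, and fall back to complex evaluation otherwise."""
    if -1.0 <= d <= 1.0:
        return math.asin(d)         # real result, like RealDouble
    return cmath.asin(complex(d))   # complex result, like ComplexDouble
```

asin_auto(0.5) stays a float, while asin_auto(2.0) comes back as a complex number instead of NaN, which is exactly the behaviour change described above.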

Implementing RealMPFR took more time than I thought, because of several design decisions that were needed to be taken.

Basic classes in SymEngine are immutable. So when constructing a RealMPFR, the mpfr_t value has to be passed into the RealMPFR's constructor, which means that when working with RealMPFR we would first construct an mpfr_t, initialize it, set a value, and then construct the RealMPFR. Therefore, when exceptions are raised and caught, memory leaks happen, because the mpfr_t is not managed. This also shows up in the EvalMPFR class: if we had an expression like 3 + x that we want to evaluate numerically, and an exception is raised while trying to evaluate x, a memory leak happens. The solution was to implement a managed mpfr_class that acts like mpz_class.

To avoid adding a new dependency, I implemented a simple mpfr_class that would manage the mpfr_t inside it.

class mpfr_class {
private:
    mpfr_t mp;
public:
    mpfr_ptr get_mpfr_t() { return mp; }
    mpfr_srcptr get_mpfr_t() const { return mp; }
    mpfr_class(mpfr_t m) {
        mpfr_init2(mp, mpfr_get_prec(m));
        mpfr_set(mp, m, MPFR_RNDN);
    }
    mpfr_class(mpfr_prec_t prec = 53) {
        mpfr_init2(mp, prec);
    }
    mpfr_class(const mpfr_class& other) {
        mpfr_init2(mp, mpfr_get_prec(other.get_mpfr_t()));
        mpfr_set(mp, other.get_mpfr_t(), MPFR_RNDN);
    }
    mpfr_class(mpfr_class&& other) {
        mp->_mpfr_d = nullptr;
        mpfr_swap(mp, other.get_mpfr_t());
    }
    mpfr_class& operator=(const mpfr_class& other) {
        mpfr_set_prec(mp, mpfr_get_prec(other.get_mpfr_t()));
        mpfr_set(mp, other.get_mpfr_t(), MPFR_RNDN);
        return *this;
    }
    mpfr_class& operator=(mpfr_class&& other) {
        mpfr_swap(mp, other.get_mpfr_t());
        return *this;
    }
    ~mpfr_class() {
        if (mp->_mpfr_d != nullptr) {
            mpfr_clear(mp);
        }
    }
};

To add the move constructor and destructor, we needed to know whether an mpfr_class had been initialized. This was achieved by setting the _mpfr_d pointer to null when the move constructor is called, to avoid adding an extra data member to mpfr_class. Thanks to +Ondřej Čertík and +Francesco Biscani for the ideas and comments.

Similarly, an mpc_class was introduced to manage an mpc_t object. Another decision taken was to use a default rounding mode for operations on RealMPFRs and ComplexMPCs.

When adding two RealMPFRs of different precisions, it was decided to promote the RealMPFR with lower precision to the higher precision. An example of this is given below.
RCP<const Number> RealMPFR::addreal(const RealMPFR &other) const {
    mpfr_class t(std::max(get_prec(), other.get_prec()));
    mpfr_add(t.get_mpfr_t(), i.get_mpfr_t(), other.i.get_mpfr_t(), MPFR_RNDN);
    return rcp(new RealMPFR(std::move(t)));
}
To add SymEngine to Sage, I had to add CMake to Sage as well, since CMake is a dependency of SymEngine. The problem with including CMake is that it fails to build on OS X 10.10 due to a bug in a header shipped with OS X 10.10 (/usr/include/dispatch/object.h). This header compiles fine with clang but gives an error with the gcc included in Sage. This seemed to be a problem with building the bundled curl in CMake, but I later realized it was not: after building curl and linking against it in CMake, I got the same error on OS X. I got access to OS X 10.10 only this week, but will keep trying next week as time permits.

For next week, I am going to implement the SymEngine wrappers for Sage (i.e., conversions to and from Sage).

June 13, 2015

Zubin Mithra(pwntools)

SROP support for ARM merged in, MIPS pending

ARM support for SROP is merged in and you can see the corresponding PR here.

This week I've been working on adding support for SROP on MIPS (and mipsel). It was simpler compared to ARM, as there weren't any specific flag checks. If you have the offsets correct, you can simply set the registers as you want and set the rest to "\x00". This was my first time working with MIPS, and it was quite interesting. The syscall number is passed in the register v0 and arguments in a0, a1 ... a3. There is a "ra" register which is pretty much the same as the "lr" register on ARM.

The pull request for SROP on MIPS can be found here. You'll notice that the registers have a "JUNK" value between them; this becomes clearer when you inspect the kernel source here. The register values go inside "sc_regs[32]", whose type is "unsigned long long", implying that the sigreturn frames for MIPS 64 and MIPS 32 are the same. This makes sense when you see that the ABIs for MIPS 32 and MIPS 64 are pretty much the same (they only seem to differ in how the 5th and 6th arguments are passed to the system call; see here), which is not the case with x86 and x64.
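To make the layout idea concrete, here is a toy frame builder in Python. The field order and register indices are hypothetical; the real layout must be read off the kernel's struct sigcontext, and pwntools' own SROP helpers do this properly:

```python
import struct

def build_toy_frame(regs, n_regs=32):
    """Pack register values as consecutive unsigned 64-bit slots,
    mirroring the 'unsigned long long sc_regs[32]' observation.
    Unset registers default to 0, standing in for the "\x00" filler."""
    slots = [regs.get(i, 0) for i in range(n_regs)]
    return struct.pack('<%dQ' % n_regs, *slots)

# e.g. set v0 (hypothetically at register index 2) to a syscall number
frame = build_toy_frame({2: 4011})
```

The point of the 64-bit slot width is exactly the one made above: because each register occupies an unsigned long long, the same frame shape serves both MIPS 32 and MIPS 64.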

When you're setting up the network connection with QEMU, make sure the interface that brings in your internet connection does not have an IP, and that the interface acting as the bridged connection has that IP instead.

June 12, 2015

Siddhant Shrivastava(ERAS Project)

All for Docker; Docker for all!

Hi! This is going to be a short post about my developments in the Week 3 of my GSoC project. Since my last post, I have had the chance to work with some exciting state-of-the-art technologies which allow easy distribution and scalability. These are -

1. Docker
2. Tango-Controls

I used the Ubuntu 14.04 Docker container to set up my system, which can be used by anyone in the world as a common platform to test the applications that I am working on. This has multiple advantages -

• Setup time for collaborators is zero. The developer sets up the Docker container and the community members can use it directly.
• Host platform-independent. It doesn't matter whether the collaborator's host system is Arch Linux, Windows 8, or a specific version of Ubuntu. Docker uses Linux namespaces and ensures a separation of concerns.
• Revision control mechanism. The developer works with a Docker image just as he/she would with any other distributed revision control system. I push my changes to the repository (Docker image) and my mentors can simply pull the updates to get the new system configuration.

So far, I have set up Tango-Controls, ROS Indigo, and the Husky libraries for my Docker image. These can be found in the Docker Registry Hub.

The issues that I am currently facing are -

• Graphics problems: X-server BadDrawable errors. A way to get around this will be to better understand how ROS applications use the X-server and then give Docker the appropriate graphics capabilities. This does not, however, impede the command-line applications of ROS and Tango, which I have been working on.
• MySQL connection problems. The workaround currently is to use the host OS's Tango host. I observed that it works fine that way.

That's it for this post. I mainly discussed Docker, which was an important thing we talked about in the all-hands meeting on 8th June. I'll go into much more detail on Tango-Controls in the upcoming blog posts and the biweekly reports.

Ciao!

Chau Dang Nguyen(Core Python)

Week 2

In the previous week, I implemented a GET prototype using the same method as the xmlrpc handler. With this implementation, I achieve better manipulation of information compared with the previous approach.

The current challenge is keeping my module as clean and easy to extend as possible, while taking full advantage of Roundup's design.

So that's it for this week. My target is having a working REST by the end of this week.
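The routing idea can be sketched roughly like this (a hypothetical helper, not Roundup's actual code): a REST-style GET path names a class and optionally an item, which the handler can then dispatch on, much as the xmlrpc handler dispatches on method names.

```python
# Hypothetical sketch, not Roundup's actual API: split a REST-style GET
# path into the designator pieces a tracker handler would dispatch on.
def parse_rest_path(path):
    parts = [p for p in path.strip('/').split('/') if p]
    if not parts or parts[0] != 'rest':
        raise ValueError('not a REST path: %r' % path)
    klass = parts[1] if len(parts) > 1 else None   # e.g. 'issue'
    itemid = parts[2] if len(parts) > 2 else None  # e.g. '42'
    return klass, itemid

print(parse_rest_path('/rest/issue/42'))  # ('issue', '42')
print(parse_rest_path('/rest/issue'))     # ('issue', None)
```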

Sumith(SymPy)

GSoC Progress - Week 2

Hello, this post contains the second report of my GSoC progress. At one point I changed the deadline from Sundays to Fridays, but I seem to be running a week late on the post names. This will be corrected next week.

Progress

We decided that instead of coding up the Polynomial class upfront, we would try to speed up the expand2b benchmark, i.e. nail speed at a lower level first, then think about the design decisions and wrap it up into a Polynomial class.
* Add support for Piranha in SymEngine CMake
* Implement packing of exponents and a check function to ensure they fit; use this to speed up expand2b
* Use Piranha's integer and benchmark expand2b again

The faster hashtable was kept for later.
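The exponent-packing idea can be illustrated with a small Python sketch (the names and bit width here are illustrative, not SymEngine's actual layout): each exponent occupies a fixed number of bits inside one integer key, so multiplying two monomials reduces to a single integer addition of their packed keys, and the check function guards against exponents that don't fit.

```python
# Illustrative sketch of exponent packing (hypothetical names, not
# SymEngine's API): BITS bits per exponent inside one integer, so
# monomial multiplication becomes integer addition of packed keys.
BITS = 16
MASK = (1 << BITS) - 1

def check_fits(exponents):
    # every exponent must fit in BITS bits, or packing is invalid
    return all(0 <= e <= MASK for e in exponents)

def pack(exponents):
    assert check_fits(exponents)
    key = 0
    for e in reversed(exponents):
        key = (key << BITS) | e
    return key

def unpack(key, nvars):
    return [(key >> (BITS * i)) & MASK for i in range(nvars)]

a = pack([2, 0, 1])      # x^2 * z
b = pack([1, 3, 0])      # x * y^3
print(unpack(a + b, 3))  # [3, 3, 1]  -- exponents added in one operation
```

The check function matters because an addition can overflow a field; in that case one falls back to an unpacked representation.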

Report

PR 464 was merged.
It implements support for Piranha in CMake along with its dependencies, Boost and PTHREAD.
These two dependencies come as separate CMake options as well. We feel that the Boost support can be improved; that can be done at a later stage.

PR 470 Speeding up the benchmark.
* The pack and check function were implemented
* Used std::valarray instead of std::vector (inspired by issue 111), but the benchmark slowed down, hence the change was not adopted
* Implemented functions poly2packed() and packed2poly(), for converting between the two representations
* Implemented function poly_mul2() for multiplying the packed polynomials
* Re-wrote expand2b to use packed, now expand2c

A very nice speedup was obtained from using the packed structure; a more detailed report can be found here.
But we are still far from Piranha and we have lots to do :)

Most of the week's time went into learning to link libraries and writing CMake files for my own projects so that I could figure out what was happening in PR 464. Now I feel it was a very easy task and shouldn't have consumed the time it did.

Targets for Week 3

My aim is to get all the code and optimization possible at this level done by next week, so that we can start wrapping in the coming weeks.
* Use piranha::integer for coefficients, benchmark
* Implement switching between packed structure and tuple depending on whether exponents fit or not

If time permits, I want to implement a system that uses int for small coefficients and switches to mpz_class for large ones.
We have lots to do to hit the speed that we expect.

That's all folks.
Au Revoir

Yue Liu(pwntools)

GSOC2015 Students coding Week 03

week sync 07

Last week:

• Cache Gadget objects using ZODB.
• Merge the classify pass into the gadget-find pass, classifying gadgets while finding them.
• Add support for Thumb instructions.
• Tried to solve the advanced feature in issue #27, but it is not finished yet.
• Fix some bugs.

• Solve armpwn using thumb gadgets.

June 11, 2015

Ziye Fan(Theano)

[GSoC 2015 Week 2]

In week 2 I implemented one optimization to the Equilibrium Optimizer. The PR is here.

In this optimization, a "final optimization" procedure is added to the equilibrium optimization. Final optimizers are a list of global optimizers applied at the end of every equilibrium optimization pass. By making the right optimizers final ones, the number of optimization passes is expected to decrease.
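The pass structure can be modeled with a toy sketch (plain Python, not Theano's actual optimizer classes): local rewrites run to a fixed point, then the final optimizers run once at the end of each pass, and the whole cycle repeats until nothing changes.

```python
# Toy model of an equilibrium optimizer with "final" optimizers
# (illustrative only; Theano's real optimizers work on graphs, not strings).
def equilibrium(expr, local_rules, final_rules):
    passes = 0
    while True:
        passes += 1
        changed = True
        while changed:              # local rules to a fixed point
            changed = False
            for rule in local_rules:
                new = rule(expr)
                if new != expr:
                    expr, changed = new, True
        done = True
        for rule in final_rules:    # final optimizers, once per pass
            new = rule(expr)
            if new != expr:
                expr, done = new, False
        if done:
            return expr, passes

# toy rewrite rules on strings: "a+a" -> "2a" locally, then a final
# global cleanup "2a+2a" -> "4a" that ends the equilibrium early
local = [lambda e: e.replace("a+a", "2a")]
final = [lambda e: e.replace("2a+2a", "4a")]
print(equilibrium("a+a+a+a", local, final))  # ('4a', 2)
```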

Another change is to delete a node's function graph reference when pruning it from a function graph, so the merge optimizer can easily tell whether a node belongs to a graph. This will be useful in other optimizers too.

In the next week, I'm going to work on the next two optimizations in the to-do list:

* Make local_dimshuffle_list lift through many elemwise at once
* Speed up inplace elemwise optimizer

Thanks. Any feedback is welcome!

Andres Vargas Gonzalez(Kivy)

Backend for kivy in matplotlib, first steps

A backend in matplotlib is a class that creates a link between matplotlib and any frontend framework in which the data can be rendered. Matplotlib provides a template which works as an excellent starting point for anyone who would like to develop a backend. It is a class which by itself works as a backend but does nothing. The very first thing I have been doing over the last three days is reading the comments in this file and analyzing how backends for other frameworks are implemented.

The first objective towards implementing a fully functional backend in kivy is to write one with a canonical renderer such as Agg. The first class I modified from the list of classes that the file provides is FigureCanvas, which I renamed FigureCanvasKivyAgg. This class extends both FigureCanvasAgg and a Kivy Widget. It extends FigureCanvasAgg because that class returns an Agg renderer which contains the graphical information of the graph. Additionally, it extends Widget because it is handy to have a widget when someone wants to embed matplotlib graphs, or use it with pyplot to provide a fully functional App.

python ../../../examples/mpl/test_plt.py -dmodule://backend_kivy

In FigureCanvasKivyAgg two methods were overridden. The first one is draw, which defines the context in which the renderer will be placed. From the FigureCanvasAgg I get the rgba buffer and place it in a texture, which can then be added to a widget, as you can see in the following snippet.

texture = Texture.create(size=(w, h))
texture.blit_buffer(buffer_rgba, colorfmt='rgba', bufferfmt='ubyte')
texture.flip_vertical()
with self.canvas:
    Rectangle(texture=texture, pos=self.pos, size=(w, h))



The second method is blit, which, given a bounding box, defines the rectangular area of the graph to be drawn.

self.blitbox = bbox


FigureCanvasKivyAgg can be embedded in any Kivy application, given that it is a Widget.

Matplotlib graphs can be embedded in, or run as, a Kivy application. Pyplot is responsible for allowing total independence from the way the information is visualized. The following example contains not a single line of Kivy code; however, internally it will create and run an App:

import matplotlib
#matplotlib.use('module://../../kivy/ext/mpl/backend_kivy')
import numpy as np
import matplotlib.pyplot as plt

N = 5
menMeans = (20, 35, 30, 35, 27)
menStd = (2, 3, 4, 1, 2)

ind = np.arange(N)  # the x locations for the groups
width = 0.35       # the width of the bars

figure, ax = plt.subplots()
fig1 = plt.gcf()
rects1 = ax.bar(ind, menMeans, width, color='r', yerr=menStd)

womenMeans = (25, 32, 34, 20, 25)
womenStd = (3, 5, 2, 3, 3)
rects2 = ax.bar(ind + width, womenMeans, width, color='y', yerr=womenStd)

# add some text for labels, title and axes ticks
ax.set_ylabel('Scores')
ax.set_title('Scores by group and gender')
ax.set_xticks(ind + width)
ax.set_xticklabels(('G1', 'G2', 'G3', 'G4', 'G5'))
ax.legend((rects1[0], rects2[0]), ('Men', 'Women'))

def autolabel(rects):
    # attach some text labels
    for rect in rects:
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width() / 2., 1.05 * height,
                '%d' % int(height), ha='center', va='bottom')

autolabel(rects1)
autolabel(rects2)

plt.draw()
plt.show()



Pyplot looks for a method in the backend called new_figure_manager. In this method a figure is created with the information coming from the main app (in the case above, figure). Once the Figure is created, a FigureCanvas is created, and in that instantiation an application is set up. All this happens before plt.draw(), which calls the draw method in FigureCanvas. Finally, to run the app with plt.show(), one needs to override the show function in the FigureManager and send the app to run.

For the complete file you can check the code on GitHub. This is the branch where it is implemented:

https://github.com/andnovar/kivy/tree/mpl_kivy

So far

My PR 9262 finally got merged. However, we have not yet decided which algorithm to use for the trigonometric and hyperbolic functions. In most cases there are multiple options:

• Expansion using closed form formula
• Expansion of tan and tanh using a form of Newton's method
• Expansion of sin, cos, sinh and cosh from tan and tanh

Newton's Method

Newton's method is rather interesting. It lets you calculate the series expansion of a function, given the series expansion of its inverse. Now, the formula for the expansion of tan is much more complicated than that of atan. So using atan's series expansion for tan makes sense. The basic idea is as follows:

Say I have a continuous and differentiable function f(x), whose inverse is g(x), i.e, f(g(x)) = x. Assume that we have an efficient implementation for g(x) and we want to use it for f(x). Let h(x) = y - g(x). On solving this equation, we'll get a root c, such that: h(c) = 0
or y = g(c)
or c = f(y)

Now, using Newton's method we have the following iteration:
x_{j+1} = x_j + (y - g(x_j)) / g'(x_j)

The more iterations we do, the higher the precision of our series expansion. If you are interested in the mathematics behind it, you can refer to this.

As of now, ring_series has the code for plain vanilla formula based expansion too. I need to properly benchmark the different methods to conclude anything about their relative performance.
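The iteration above can be demonstrated with a small self-contained sketch (plain coefficient lists over exact Fractions, not the actual ring_series code): taking g = atan, whose series is simple, the Newton iteration recovers the series of f = tan.

```python
# Newton's method for series reversion: recover tan's series from atan's.
# Series are truncated coefficient lists [c0, c1, ...] over Fractions.
from fractions import Fraction as F

def mul(a, b, n):
    """Product of two truncated power series."""
    c = [F(0)] * n
    for i, ai in enumerate(a[:n]):
        if ai:
            for j, bj in enumerate(b[:n - i]):
                c[i + j] += ai * bj
    return c

def compose(a, b, n):
    """a(b(y)) truncated to n terms; b must have zero constant term."""
    out = [F(0)] * n
    power = [F(1)] + [F(0)] * (n - 1)   # b**0
    for coeff in a[:n]:
        out = [o + coeff * p for o, p in zip(out, power)]
        power = mul(power, b, n)
    return out

def reciprocal(a, n):
    """1/a(y) truncated to n terms; requires a[0] != 0."""
    inv = [F(1) / a[0]] + [F(0)] * (n - 1)
    for i in range(1, n):
        inv[i] = -sum(a[j] * inv[i - j] for j in range(1, i + 1)) / a[0]
    return inv

n = 8
atan = [F(0)] * n                        # atan(u) = u - u^3/3 + u^5/5 - ...
for k in range(1, n, 2):
    atan[k] = F((-1) ** ((k - 1) // 2), k)
datan = [(k + 1) * atan[k + 1] for k in range(n - 1)] + [F(0)]  # 1/(1+u^2)

x = [F(0), F(1)] + [F(0)] * (n - 2)      # initial guess: x = y
for _ in range(4):                       # each step roughly doubles precision
    r = [-c for c in compose(atan, x, n)]
    r[1] += 1                            # residual y - g(x_j)
    step = mul(r, reciprocal(compose(datan, x, n), n), n)
    x = [xi + si for xi, si in zip(x, step)]

print(x)  # coefficients of tan y: [0, 1, 0, 1/3, 0, 2/15, 0, 17/315]
```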

Puiseux Series

This week I am working on PR 9495, which will let us manipulate Puiseux series in ring_series. A Puiseux series is a series which can have fractional exponents. By definition, a polynomial should have only positive integer exponents, so I was hoping that using a Rational wouldn't break anything in the polys module. Unfortunately, my PR is causing some tests to fail. I hope I don't need to make major changes because of it.

Once Puiseux series gets up and running, I will add the remaining functions dependent on it.

Next Week

A Symbolic Ring

Just as the type of series decides what the exponents of the series can be, the coefficients are determined by the ring over which the polynomial is defined. To be able to expand functions whose arguments have a constant term with respect to the expansion variable, the series needs to be allowed to have symbolic expressions as coefficients. For example, a 2nd-order expansion about 0 of sin(x + y), wrt x, is sin(y) + x*cos(y). To do this with ring_series, the polynomial needs to be defined over the EX or expression ring, which is SymPy's implementation of a symbolic ring. Currently, ring_series works with the EX ring, but it cannot handle a series with constant terms.
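A quick numeric sanity check of that expansion (pure Python with math, not ring_series): for small x, sin(x + y) should agree with sin(y) + x*cos(y) up to a remainder of order x².

```python
import math

# 2nd-order expansion of sin(x + y) about x = 0, wrt x.
def expansion(x, y):
    return math.sin(y) + x * math.cos(y)

y = 0.7
for x in (1e-2, 1e-3):
    err = abs(math.sin(x + y) - expansion(x, y))
    # Taylor remainder is x**2/2 * |sin(y + xi)| for some xi, so < x**2
    assert err < x**2
```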

So, my major tasks for next week are:

• Get ring_series working with symbolic ring
• Implement series expansion of polylog and series reversion
• Discuss the structure of Series class and send a PR for it.

Cheers!

June 10, 2015

Brett Morris(Astropy)

Anti-Racism, Pro-Astronomy: Week 2

For background on what this is all about, check out my first post on Anti-Racism, Pro-Astronomy.

This week, I've gotten the ball rolling on the Diversity Journal Club (DJC) blog idea, which I'm calling astroDJC. Before committing to a name, I briefly considered renaming Diversity Journal Club to a new name with less contention, like "inclusion" or "equity" rather than "diversity". After a brief Twitter discussion about alternatives, I decided to stick with Diversity simply because many institutions have DJCs by that name, and its goals and purpose are widely recognized. If you have strong opinions about the name and what alternatives you'd prefer, I'd love to hear them on Twitter or in the comments.

astroDJC

The first iteration of the blog is now live(!) with two posts contributed by Nell Byler (with Russell Deitrick) about the Genius Effect and the Matilda Effect. I encourage you to read these posts, first for content and then for format, and give feedback about how these posts work as a template for future submissions.

Submitting a post to astroDJC

I created a GitHub repository for suggested posts for astroDJC where anyone can contribute their resources and discussion questions for DJC presentations. For those unfamiliar with GitHub, there is a wiki page (still in development) with a tutorial on how to contribute a post to astroDJC on GitHub in the browser, without any command line nonsense.

The workflow goes something like this:

1. A contributor will take a template post file styled with markdown, and fill it with content.
2. Once they are happy with their draft post, they can submit it to astroDJC for review via a pull request, where we can collaborate on improvements and make corrections.
3. The finalized file will be merged to the repository where it will be stored, converted into HTML with pandoc, and posted to the astroDJC blog.

Why GitHub?

Using GitHub for contributed posts ensures a few things that are important to me:

• The ability to post must be open to everyone. Pull requests can be submitted by anyone, removing the need for a moderator or gatekeeper – which has been a sticking point in some social media circles lately... This way, if an undergraduate or physics graduate student wants to contribute but wouldn't have the credentials to prove that they're an astronomer (though they may be an expert on DJC issues), the content of their post is all that will matter to be considered for a submission.
• The collaborative dialogue on each post – from the moment it's submitted via pull request to the moment it's merged – is done in public, where those who are interested can contribute and those who aren't can ignore it. GitHub's notification settings are flexible and easy to use, allowing you to get as much or as little notification about each pending update as you like.
• Appropriate attribution is natural – you choose how you'd like to be referred to in the final blog post, and the history of your contribution is logged in GitHub for bragging rights/reference.
• Writing posts in markdown enables contributors to have some control over the formatting of the post without prior knowledge of HTML (though of course, this is in exchange for prior knowledge of markdown, but I think this is a preferable exchange).
If you would like to submit a post and have any difficulty, reach out to me and I'll help you and work to update the tutorial to make it more complete and intuitive.

Make a submission and gimme feedback!

I'd really like to hear what you think about the blog, the post template, and the example posts that are up. The best way to get good feedback would be to have you give it a test drive – if you've given a DJC talk, try putting it into astroDJC format and submit a pull request. Then be sure to make suggestions about how we can make this tool more effective and easy to use.

Patricia Carroll(Astropy)

Bounding Boxes & Benchmarking

Last week, I played around with modeling simple sources with both Astropy and Sherpa. Sherpa is a modelling and fitting application developed for analysis of Chandra x-ray data. The recently released Sherpa for Python package offers a very useful comparison to existing Astropy methods.

Astropy vs. Sherpa

Here I've generated a mock source with random Poisson noise and fit it with a 2D Gaussian using both Astropy and Sherpa. Both use the Levenberg-Marquardt algorithm and the least squares statistic, and begin with the same initial guesses.

In [5]:
%matplotlib inline
import numpy as np

import warnings
warnings.filterwarnings('ignore')
import sherpa.astro.ui as ui

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_context('poster')
sns.set_style('white',{'grid':False})

In [2]:
%%capture
%%writefile benchmarking.py

import numpy as np
import sherpa.astro.ui as ui
from astropy.modeling.models import Gaussian2D
from astropy import table
from astropy.nddata.utils import add_array

def gen_source_table(imshape, nsrc, stddev=3., mean_amp=10.):
    """
    Populate a source table with randomly placed 2D gaussians
    of constant width and variable amplitude.

    Parameters
    ----------
    imshape : tuple
        Shape of the image.
    nsrc : int
        Number of sources.
    stddev : float, optional
        Standard deviation in pixels.
    mean_amp : float, optional
        Mean amplitude.
    """

    # Buffer the edge of the image
    buffer = np.ceil(stddev*10.)

    data = {}
    data['x_mean'] = np.around(np.random.rand(nsrc)*(imshape[1]-buffer))+buffer/2.
    data['y_mean'] = np.around(np.random.rand(nsrc)*(imshape[0]-buffer))+buffer/2.
    data['amplitude'] = np.abs(np.random.randn(nsrc)+mean_amp)
    data['x_stddev'] = np.ones(nsrc)*stddev
    data['y_stddev'] = np.ones(nsrc)*stddev
    data['theta'] = np.zeros(nsrc)

    return table.Table(data)

def make_gaussian_sources(image, source_table):
    """
    A simplified version of ~photutils.datasets.make_gaussian_sources.
    Populates an image with 2D gaussian sources.

    Parameters
    ----------
    image : ndarray
        Image to populate.
    source_table : astropy.table.table.Table
        Table of sources with model parameters.
    """

    y, x = np.indices(image.shape)

    for i, source in enumerate(source_table):
        model = Gaussian2D(amplitude=source['amplitude'], x_mean=source['x_mean'],
                           y_mean=source['y_mean'],
                           x_stddev=source['x_stddev'],
                           y_stddev=source['y_stddev'], theta=source['theta'])

        image += model(x, y)

    return image

def make_gaussian_sources_sherpa(image, source_table):
    """
    A simplified version of ~photutils.datasets.make_gaussian_sources.
    Populates an image with 2D gaussian sources generated with the Sherpa python package.

    Parameters
    ----------
    image : ndarray
        Image to populate.
    source_table : astropy.table.table.Table
        Table of sources with model parameters.
    """

    ui.set_source(ui.gauss2d.g2)
    ui.freeze(g2)
    for i, source in enumerate(source_table):
        g2.xpos = source['x_mean']+1
        g2.ypos = source['y_mean']+1
        g2.ellip = 0.
        g2.fwhm = sigma2fwhm(source['x_stddev'])
        g2.ampl = source['amplitude']
        g2.theta = 0.
        mod = ui.get_model_image().y
        image += mod

    return image

def make_gaussian_sources_bb(image, source_table, width_factor=5):
    """
    A simplified version of ~photutils.datasets.make_gaussian_sources.
    Populates an image with 2D gaussian sources.
    Uses a bounding box around each source to increase speed.

    Parameters
    ----------
    image : ndarray
        Image to populate.
    source_table : astropy.table.table.Table
        Table of sources with model parameters.
    width_factor : int
        Multiple of the standard deviation within which to bound the source.
    """

    for i, source in enumerate(source_table):
        dx, dy = np.ceil(width_factor*source['x_stddev']), np.ceil(width_factor*source['y_stddev'])
        subimg = (2*dx, 2*dy)
        x, y = np.meshgrid(np.arange(-dx, dx)+source['x_mean'], np.arange(-dy, dy)+source['y_mean'])

        model = Gaussian2D(amplitude=source['amplitude'], x_mean=source['x_mean'],
                           y_mean=source['y_mean'],
                           x_stddev=source['x_stddev'],
                           y_stddev=source['y_stddev'], theta=source['theta'])

        image = add_array(image, model(x, y), (source['y_mean'], source['x_mean']))

    return image

sigma2fwhm = lambda x: 2.*np.sqrt(2.*np.log(2.))*x

fwhm2sigma  = lambda x:x/(2.*np.sqrt(2.*np.log(2.)))

In [3]:
from astropy.io import fits
from astropy.modeling import fitting,models
import sherpa.astro.ui as ui
from photutils.datasets import make_noise_image
from benchmarking import *
import logging
logger = logging.getLogger("sherpa")
logger.setLevel(logging.ERROR)

npix=100
x,y = np.meshgrid(range(npix),range(npix))
data_model = models.Gaussian2D(amplitude=10., x_mean=npix/2, y_mean=npix/2,
                               x_stddev=5., y_stddev=10., theta=np.pi/4.)
data = data_model(x,y)+make_noise_image((npix,npix), type=u'poisson',mean=.5,stddev=.25)

#this doesn't work so I'm writing to fits for reading by Sherpa

hdu = fits.PrimaryHDU(data)
hdulist = fits.HDUList([hdu])
hdulist.writeto('ap_data.fits',clobber=True)

ui.set_source(ui.gauss2d.g2)
g2.xpos=npix/2+1
g2.ypos=npix/2+1
g2.ellip = .5
g2.fwhm = sigma2fwhm(10.)
g2.fwhm.min = 1
g2.fwhm.max = 50.
g2.ampl=10.
g2.ampl.min=1
g2.ampl.max=100
g2.theta=3*np.pi/4.
ui.thaw(g2)

ui.set_stat("leastsq")
ui.set_method('levmar')
logger.setLevel(logging.ERROR)
print 'Astropy:'
fit_g = fitting.LevMarLSQFitter()
amod = fit_g(data_model,x,y,data)(x,y)
t1 = %timeit -o -r 3 -n 3 fit_g(data_model,x,y,data)
print '%i iterations' % fit_g.fit_info['nfev']
print '%.2f ms per model evaluation' % (t1.best/fit_g.fit_info['nfev']*1000.)

print '\n'
print 'Sherpa:'
t1 = %timeit -o -r 3 -n 3 ui.fit()
ui.fit()
smod = ui.get_model_image().y
f=ui.get_fit_results()
print '%i iterations' % f.nfev
print '%.2f ms per model evaluation' % (t1.best/f.nfev*1000.)

plt.figure(figsize=(15,9))
titles='Data', 'Astropy Fit','Sherpa Fit','Astropy Residual','Sherpa Residual','Astropy - Sherpa'
for i, im in enumerate([data, amod, smod, data-amod, data-smod, amod-smod]):
    plt.subplot(2, 3, i+1)
    plt.imshow(im)
    cbar = plt.colorbar()
    plt.setp(plt.getp(cbar.ax.axes, 'yticklabels'), color='w')
    plt.xticks([])
    plt.yticks([])
    title = plt.title(titles[i])
    plt.setp(title, color='w')

Astropy:
3 loops, best of 3: 30.1 ms per loop
7 iterations
4.31 ms per model evaluation

Sherpa:
3 loops, best of 3: 23.2 ms per loop
8 iterations
2.90 ms per model evaluation


While Sherpa performs more iterations (likely due to a lower error-tolerance threshold), there's no contest: Sherpa wins. So what makes it faster? I'm not sure yet. It's worth finding out, but for now I want to implement a very simple improvement to speed up Astropy.

Bounding Boxes

When you have a large image of the sky containing many discrete sources (stars and galaxies) with lots of empty space in between, it makes little sense to evaluate each source model across the entire image. In the case of our 2D gaussian, 99.9999% of the flux is contained within a 5-sigma radius.
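That flux figure follows from the radial profile of a circular 2D Gaussian: integrating it out to radius r·sigma gives an enclosed fraction of 1 - exp(-r²/2), which a quick check confirms.

```python
import math

# Fraction of a circular 2D Gaussian's total flux inside radius r*sigma:
# integrating the radial profile 2*pi*s*exp(-s**2/2)/(2*pi) from 0 to r
# gives 1 - exp(-r**2 / 2).
def enclosed_flux(r_sigma):
    return 1.0 - math.exp(-r_sigma**2 / 2.0)

print(enclosed_flux(5.0))  # ~0.9999963, i.e. "99.9999%" of the flux
```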

What I've done is simply evaluate each source only within these limits. Here I compare Sherpa, Astropy, and Astropy with bounding boxes by timing how long it takes to model 10 sources as a function of image size.

In [7]:
N_sources = 10
width_factor=5
im_sides = np.arange(50,550,50)
t1all,t2all,t3all=[],[],[]
for i in im_sides:
    image = np.zeros((i, i), dtype=np.float64)
    hdu = fits.PrimaryHDU(image)
    hdulist = fits.HDUList([hdu])
    hdulist.writeto('image.fits', clobber=True)
    source_table = gen_source_table((i, i), N_sources, stddev=1)
    t = %timeit -r 30 -n 1 -o -q make_gaussian_sources_bb(image, source_table,\
        width_factor=width_factor)
    t1all.append(t)
    t = %timeit -r 30 -n 1 -o -q make_gaussian_sources(image, source_table)
    t2all.append(t)
    t = %timeit -r 30 -n 1 -o -q make_gaussian_sources_sherpa(image, source_table)
    t3all.append(t)

In [8]:
mt=np.vstack([[0.]*len(im_sides)]*3)
et=np.vstack([[0.]*len(im_sides)]*3)

for j, tall in enumerate([t1all, t2all, t3all]):
    for i in range(len(im_sides)):
        mt[j][i] = np.mean(tall[i].all_runs)
        et[j][i] = np.std(tall[i].all_runs)

t1,t2,t3 = mt*1000.
e_t1,e_t2,e_t3=et*1000.

In [11]:
plt.figure(figsize=(12,8))
with plt.style.context((['dark_background'])):
    plt.errorbar(im_sides[:-1], t3[:-1], e_t3[:-1], fmt='r.-', label='Sherpa', lw=3, alpha=1)
    plt.errorbar(im_sides[:-1], t2[:-1], e_t2[:-1], fmt='y.-', label='Astropy', lw=3, alpha=1)
    plt.errorbar(im_sides[:-1], t1[:-1], e_t1[:-1], fmt='c.-', label='Astropy-BB', lw=3, alpha=1)
    plt.legend(frameon=False, loc='upper left')
    plt.xlabel('Image Pixels/Side')
    plt.ylabel('Average timing (ms)')
    xl = plt.xlim(0, 550)


Sherpa clearly excels over Astropy for any image size up to about 100,000 pixels. At that point, the bounding boxes really start to shine. This limit will of course differ depending on the model used, the pixel scale, and the source density; but given the same set of sources and pixel scale, the boxes don't change, so the time is independent of the total image size. The more sparsely populated your image is, the bigger the improvement you will see with bounding boxes.


June 09, 2015

Shridhar Mishra(ERAS Project)

Coding in full swing.

OK, so all the installations are done. There was a bit of a hassle while installing EUROPA-pso, but the other installations, like PyTango and pyEUROPA, went well.

Since I had to face a lot of problems installing EUROPA on a 64-bit Ubuntu 14.10 machine, I have decided to write a stepwise procedure for installing it so that, if required, it can be done again.

These steps have to be followed in this specific order for a successful installation, or it's almost inevitable that you'll get some weird Java errors.

Prerequisites.

• JDK -- sudo apt-get install openjdk-7-jdk
• ANT -- sudo apt-get install ant
• Python -- sudo apt-get install python
• subversion -- sudo apt-get install subversion
• wget -- sudo apt-get install wget
• SWIG -- sudo apt-get install swig
• libantlr3c (built from source below)
• unzip -- sudo apt-get install unzip

Now let us get the necessary packages to install libantlr3c.

svn co http://europa-pso.googlecode.com/svn/ThirdParty/trunk plasma.ThirdParty

Get Europa.

cd ~/plasma.ThirdParty

Install ANTLR-C.
First, unzip libantlr3c-3.1.3.tar.bz2, then:

cd plasma.ThirdParty/libantlr3c-3.1.3
./configure --enable-64bit
make
sudo make install

The above commands are for 64-bit machines;
for 32-bit machines remove the --enable-64bit flag.

Installing EUROPA.

mkdir ~/europa
cd ~/europa
unzip ~/tmp/europa-2.1.2-linux.zip
export EUROPA_HOME=~/europa
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$EUROPA_HOME/lib

Add the following lines to the end of ~/.bashrc:

EUROPA_HOME=~/europa
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$EUROPA_HOME/lib

Testing.

$EUROPA_HOME/bin/makeproject Light ~
cp $EUROPA_HOME/examples/Light/*.nddl ~/Light
cp $EUROPA_HOME/examples/Light/*.bsh ~/Light

If the install was successful:

cd ~/Light
ant

The GUI should appear for EUROPA. If all the steps are correctly followed it should work.

Links.

ANTLR-C installation
Europa Installation
Quick start

Apart from this, I have been able to successfully run the Rover example from EUROPA, which is to be modified according to the further needs of the Italian Mars Society.

Andrzej Grymkowski(Kivy)

More about Plyer

What is it?

Plyer provides an API through which platform features like desktop notifications, dialogs, or taking a picture can be used in the same easy way, on many platforms, via one class. The implementation covers OS X, Windows, Linux, and even mobile: Android and iOS.

When do you need Plyer? For multi-platform apps. For example, the code below:

>>> from plyer import battery
>>> battery.status
{"isCharging": True, "percentage": 0.5}

works on all of the platforms listed above and gives you the same result.

Installation

For Linux, just type in a console:

pip install plyer

If you don't have pip installed, please look at how to install pip.

For Android I recommend buildozer. It's a wrapper for the package python-for-android, and most of the examples use buildozer. For OS X you will need the Python-to-Objective-C bridge called pyobjus. For iOS both pyobjus and kivy-ios are needed.

Examples

To demonstrate the implemented features, Plyer uses the Kivy framework. Kivy is a framework for making applications; it provides a window with widgets similar to Android or iOS widgets. In order to run the examples, install Kivy (see How to install Kivy).

To run an example, go to the examples folder and pick the feature you want to test, e.g. battery. In that folder open a terminal and:

• for desktop, type: python main.py
• for Android, connect the device to your PC and type in the console: buildozer debug deploy run

buildozer will compile (the debug parameter) kivy, python, plyer and many other libraries, and will build the Android app.
Compilation takes about 5 minutes; then the app is transferred (the deploy parameter) to your device and run :)

I wish you many successes testing the examples. Best regards.

Artem Sobolev(Scikit-learn)

Week 1 + 2: Takeoff

The first two weeks have ended, and it's time for a weekly (ahem) report.

The basic implementation outlined in the previous post was rewritten almost from scratch. Now there are two implementations of the cost function calculation: a fully vectorized one (which doesn't scale, but should work fast) and a semi-vectorized one (which loops through training samples, but vectorizes all other operations). Meanwhile I'm working on a large-scale version; more on that below. I also wrote a simple benchmark that shows the improved accuracy of 1NN with the learned distance, and compares the two implementations.

There are several issues to solve. The first and major one is scalability: it takes $O(N^2 M^2)$ time to compute NCA's gradient, which is way too much even for medium-size datasets. Some ideas I have in mind:

1. Stochastic Gradient Descent. NCA's loss is a sum of each sample's contribution, so we can do stochastic optimization on it, reducing the computational complexity down to $O(w N M^2)$ where $w$ is the number of iterations.
2. There's a paper on Fast NCA. I briefly skimmed through it, but my concern is that they look for the $K$ nearest neighbors, which takes them $O(K N^2)$ time, which doesn't look like much of an improvement (though it certainly is if you want to project some high-dimensional data to a lower-dimensional space).

Another thing to do, which is not an issue but still needs to be done, is choosing an optimization algorithm. For now there are three methods: gradient descent, gradient descent with AdaGrad, and scipy's scipy.optimize.minimize. I don't think it's a good idea to overwhelm a user with a variety of settings that make no particular difference in the outcome, so we should get rid of features that are known to be useless.

Also, unit tests and documentation are planned as well.
Saket Choudhary(statsmodels)

Week 2 Update

This week I brushed up a little bit of theory on heteroscedasticity for linear mixed models. As per my timeline, I have another week to wrap it up.

On a slight digression, I moved Josef's compare_lr_test method (with minor changes) to allow likelihood ratio tests for fixed and random effects. A notebook comparing R's equivalent is here: http://nbviewer.ipython.org/github/saketkc/statsmodels/blob/kerby_mixedlm_notebooks/examples/notebooks/MixedLM_compare_lr_test.ipynb

The p-values provided by R's 'anova' and statsmodels' compare_lr_test seem to mostly agree. The problem, however, arises because the ML estimates in statsmodels do not seem to converge (even after playing around with tolerances and max iterations, which will require another look).

The pull request reflecting the minor change is more of a WIP here: https://github.com/statsmodels/statsmodels/pull/2440

Suggestions and criticism are always welcome.

Week 3 goals:

- Finish up heteroscedasticity support
- Add some solid unit tests for compare_lr_test

Prakhar Joshi(Plone)

New Releases at plone , releases my sweat !!

Hello everyone! In this blog I will share my experience of how things can become terrifying when a new version of some product is released in Plone. The main problem occurs when we have not pinned the products to specific versions in the buildouts. That's a lot at one shot, but don't worry, we will go through it step by step. Let's start.

Plone uses buildouts to structure its code. There are buildout configuration files (buildout.cfg) to set up Plone projects.

What is a buildout?

Buildout is a Python-based build system for creating, assembling and deploying applications from multiple parts, some of which may be non-Python-based. It lets you create a buildout configuration and reproduce the same software later. So Plone uses these buildouts for setup.
I had also configured the buildout for my project, and things were going great until plone 5.0.dev was released, replacing plone 4.3.6 (the previous version). Here is the snippet of buildout.cfg:

What happens when a new version is released?

When a new version is released, the buildout pulls the code from the latest version of each product unless we have pinned the product to a particular version. To pin a particular product to a specific version I used versions.cfg, which is extended from base.cfg, as you can see in the "[extends]" section of the snapshot above. Here is the snippet of versions.cfg:

Here we can see that I have specified the versions of some products. There are other products too, but since we have not pinned them, running "./bin/buildout" will install their latest versions.

What is the reason to pin the products? Why not just download the latest version of each product?

It's good to keep the code up to date, but sometimes things depend on previous versions. In my case, the latest version of the CMFPlone product had been released, but I had been working on Plone 4.3.6, and this caused the Travis failure. I had actually pinned CMFPlone to 4.3.6, but another product named "plone.app.widget", which had not been pinned in the buildout, pulls in CMFPlone. Since that product was unpinned, it always pulled in the latest version of CMFPlone, while we needed CMFPlone 4.3.6, so it created a test failure for me.
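The pinning described in the post looks roughly like this (the snippets referenced above were images; section layout and the version number follow the post, but this fragment is illustrative, not the author's actual file):

```ini
[buildout]
extends = base.cfg
versions = versions

[versions]
# Pinned products install exactly this release; any product left out of
# this section (like plone.app.widget, before the fix) gets its latest
# release on the next ./bin/buildout run.
Products.CMFPlone = 4.3.6
```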
Here is the snippet for the travis failure :- The main problem was how to resolve that issue. It is easy to say now that the error was in plone.app.widget because I had not pinned it, but the error log makes no mention of plone.app.widget; it just says "there is a version conflict for CMFPlone". Looking at our buildouts, we can see that CMFPlone is pinned to 4.3.6, so if that product were pulled in directly it should be installed as version 4.3.6 and not as 5.0.dev, yet the latest version was installed. That made it hard for me to locate and solve the problem. How to detect where the problem is? Which product to pin? There were two ways: either start pinning every product one by one, which would eventually solve the issue but is a terrible and redundant solution, or find the specific product causing the problem. With the help of jensens (IRC name), who suggested the "grep" method, I searched for the products trying to pull in the latest version of CMFPlone, and found that plone.app.widget had not been pinned yet, so buildout kept installing its latest version, which in turn pulled in the latest version of CMFPlone, 5.0.dev. So that is how I found the solution to my problem: once I pinned plone.app.widget in versions.cfg, it worked and Travis finally passed. Here is the snippet :- People on IRC really helped me solve this issue, and I learned a lot from it. Thanks for reading, hope you enjoyed it!! Cheers!! Happy Coding :) Sudhanshu Mishra(SymPy) GSoC'15: Week two The second week of GSoC is over. I learned a lot this week about the assumptions system. The goal of this week was to finish the documentation PR which I started a few days back. I think it's complete now and ready for the final review.
We also merged a very old and crucial PR for the new assumptions started by Aaron. Now we really need to improve the performance of satask. This week I'll start working on adding assumptions on Symbols to the global assumptions context. That's all for now. Cheers! Mark Wronkiewicz(MNE-Python) Progress and Paris C-Day plus 13 · I’ve spent the past couple of weeks getting my hands dirty with the underlying physics equations behind solving for forward solutions in source imaging (mostly with Mosher et al., 1999 and Hämäläinen & Sarvas, 1989). The forward solution is a matrix that relates each point on the cortex to the MEEG sensors. Once you have the forward solution, you can use some pretty fancy mathematics (including Tikhonov regularization) to find a pseudoinverse for this matrix, called the inverse solution. Then you’re able to relate the sensor measurements to estimates of what areas of the brain are active, which is the fundamental motivation for source imaging. · One component of my project is focused on modifying this forward solution, so learning something about the way it was originally formulated has been a useful endeavor. I also found out from the algorithm’s creator that no material exists to help bridge the gap between the idealized, published equations and the optimized but cryptic code. Therefore, I’ve added quite a few comments and docstring improvements to make this easier for the next programmer wrestling with these equations. Later in the summer, I’m hoping to use this knowledge to find the relationship between the cortical surface and the multipolar moment space I started describing in my last post (see the discussion on SSS and spherical harmonics). This should provide a number of benefits that I’ll discuss when I pick this portion of the work back up. · For now, I’m going to try to make some headway on the first aim of my project: Maxwell filtering.
Again, the Maxwell filter implemented in SSS is just an elegant way to exclude noise from MEEG recordings using physics (see my earlier post about steam bowls and floating frogs for more description, or Taulu 2005 for one of the SSS papers). · Last thing: the heads of the MNE-Python project have generously offered to fly me to Paris for a weeklong coding sprint in July! I’m pretty excited to finally meet all of the MNE-Python crew and learn more about how the Europeans view science and research. Aniruddh Kanojia(Qtile) GSOC Update - Week 1 Hi, This week we tried using Python's built-in tools to pickle the state of various layouts. However, most of the Qtile classes are not picklable because of some recursive links. We therefore had to make some changes to the classes to make them picklable. Instead of doing a major redesign of Qtile's code architecture, we decided to implement the __getstate__() and __setstate__() methods for the classes with problems. This was implemented for almost all layouts and seems to be working fine for them. However, some layouts are still not working. This is all for this week. Cheers, Aniruddh Kanojia Keerthan Jaic(MyHDL) GSOC 2015 with MyHDL MyHDL is a Python library for hardware description and verification. The goal of the MyHDL project is to empower hardware designers with the elegance and simplicity of Python. Designers can use the full power of Python to elegantly model and simulate hardware. Additionally, MyHDL designs which satisfy certain restrictions can be converted to Verilog or VHDL. This feature can be used to integrate MyHDL designs with conventional EDA flows. I started exploring alternative hardware description languages while working on a research project which involved designing an FPGA-based network intrusion detection system (NIDS). During the initial stages of the project, we were using SystemVerilog for designing hardware and writing tests.
However, seemingly simple tasks such as generating network packets for testing felt cumbersome. On the other hand, Python is a concise and dynamic language, and a great choice for creating quick prototypes. I decided to try MyHDL because I was already comfortable with Python. MyHDL greatly simplified the process of generating test data and validating results since I was able to use existing Python modules in simulation code. MyHDL also enabled me to rapidly iterate on both the hardware and software components of the NIDS. Over the course of the project, I got involved in MyHDL’s development and started contributing code. Most notably, I implemented interface conversion support and helped make MyHDL compatible with Python 3. This summer, I have the opportunity to spend a considerable amount of time working on MyHDL since my proposal to the Python Software Foundation has been accepted for Google Summer of Code 2015! My agenda for the first few weeks is to clean up the code base and the test suite before I focus on the business logic. MyHDL was first released in 2003. Over the years, it has gathered lots of duplicated and dead code. I think that refactoring the code will make it easier for me and other contributors to extend MyHDL’s functionality. After the initial refactoring, I’m going to simplify the core modules. MyHDL relies heavily on parsing the abstract syntax tree (AST) of various code objects. AST parsing code is hard to debug, and sometimes causes incomprehensible errors. I plan to explore various ways to reduce MyHDL’s reliance on AST parsing. My eventual goal is to increase the robustness of MyHDL’s conversion modules and improve MyHDL’s maintainability. I’m currently working on squashing interface conversion bugs and finishing documentation for a stable release before I start making big changes to the code. I’ll be writing periodically with status updates and technical details. Thanks for reading!
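To give a feel for the kind of AST inspection involved, here is a rough illustration using only the standard-library ast module. This is not MyHDL's actual converter; the toy process and the `.next` assignment idiom merely mimic MyHDL's style.

```python
import ast

# A toy MyHDL-style hardware process, held as source text.
# `count.next = ...` is the signal-assignment idiom a converter
# would have to recognize in the parsed tree.
source = """
def counter_logic(count, enable):
    if enable:
        count.next = count + 1
"""

tree = ast.parse(source)
func = tree.body[0]

# Names of the function's arguments (the "ports" of the process):
arg_names = [a.arg for a in func.args.args]

# Attribute stores, i.e. the `.next = ...` signal assignments:
signal_writes = [node.attr for node in ast.walk(tree)
                 if isinstance(node, ast.Attribute)
                 and isinstance(node.ctx, ast.Store)]
```

Walking trees like this for every modeled function is exactly why such code is hard to debug: one unexpected node type and the conversion fails with a confusing error far from the user's source.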
June 08, 2015 Aman Singh(Scikit-image) Scipy.ndimage module structure and initial plan for rewriting Originally posted on Aman Singh: Image processing functions are generally thought to operate over two-dimensional arrays of values. There are, however, a number of cases where we need to operate over images with more than two dimensions. The scipy.ndimage module is an excellent collection of general image processing functions which are designed to operate over arrays with arbitrary dimensions. The module is a Python extension written in C using the Python C API to improve its speed. The whole module can be broadly divided into three categories:- • Files containing wrapper functions:- This includes the nd_image.h and nd_image.c files. nd_image.c mainly contains the functions required for extending the module in C, viz. all the wrapper functions along with the module initialization function and the method table. • Files containing basic constructs:- These are in the files ni_support.c and ni_support.h. These constructs include a mixture of containers, macros and various functions. These constructs are like the arteries of… Christof Angermueller(Theano) GSoC week two Theano is becoming more colourful! Last week, I • improved the graph layout • revised colors and shapes of nodes • improved the visualization of edges and mouseover events • scaled the visualization to the full page size You can find two examples here! The post GSoC week two appeared first on Christof Angermueller. Goran Cetusic(GNS3) GNS3 architecture and writing a new VM implementation Last time I wrote a post I talked about what GNS3 does and how Docker fits into this. What I failed to mention, and what some of you already familiar with GNS3 may know, is that the GNS3 software suite actually comes in two relatively separate parts: 1. https://github.com/GNS3/gns3-server 2.
https://github.com/GNS3/gns3-gui The GUI is a Qt-based management interface that sends HTTP requests to specific endpoints on the server, defined in one of its files. These endpoints handle the basic things you would expect to do with a VM instance: start/stop/suspend/restart/delete. For example, sending a POST request to /projects/{project_id}/virtualbox/vms creates a new VirtualBox instance, handled in virtualbox_handler.py. You might run into some trouble getting the GUI to run, especially if you're using the latest development code like me, because in the latest development version Qt4 was replaced with Qt5 and a lot of Linux distributions don't yet have Qt5 in their repositories. The installation instructions only deal with Qt4 and Ubuntu, so it's up to you to trudge through numerous compile and requirement errors. Generally, every virtualization technology (Dynamips, VirtualBox, QEMU) has a request handler, a VM manager responsible for managing available VM instances, and a VM handler that knows how to start/stop/suspend/restart/delete. Going back to the request handler: if we wanted to start a previously created VM instance, sending a POST request to /projects/{project_id}/docker/images/{id}/start would do it. Once the request gets routed to a specific method, it usually fetches the singleton manager object responsible for that particular VM technology, like VirtualBox or Docker, which can fetch the Python object representing the VM instance based on the ID in the request. This VM instance object has the methods that can do various things with the instance, but they are specific to VirtualBox, Docker or QEMU.
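The route shapes above can be captured in a couple of tiny helpers. The paths mirror the two examples quoted in this post; the helpers themselves are just an illustration, not actual GNS3 code, and the docker/images-vs-vms distinction is taken from those examples rather than from the full API.

```python
def create_vm_path(project_id, module):
    """Path the GUI POSTs to when creating a VM instance,
    e.g. /projects/{project_id}/virtualbox/vms."""
    return "/projects/{}/{}/vms".format(project_id, module)


def vm_action_path(project_id, module, vm_id, action):
    """Path for acting on an existing instance,
    e.g. /projects/{project_id}/docker/images/{id}/start."""
    collection = "images" if module == "docker" else "vms"
    return "/projects/{}/{}/{}/{}/{}".format(
        project_id, module, collection, vm_id, action)


create_url = create_vm_path("a1b2", "virtualbox")
start_url = vm_action_path("a1b2", "docker", "c3d4", "start")
```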
Here are some important files that the current Docker implementation uses; there are equivalent files for other kinds of virtual devices: SERVER • handlers/docker_handler.py - HTTP request handlers calling the Docker manager • modules/docker/__init__.py - Manager class that knows how to fetch and list Docker containers • modules/docker/docker_vm.py - Docker VM class whose methods manipulate specific containers • modules/docker/docker_error.py - Error class that mostly just overrides a base error class • schemas/docker.py - request schemas determining allowed and required arguments in requests GUI • modules/docker/dialogs - folder containing code for GUI Qt dialogs • docker_vm_wizard.py - Wizard to create a new VM type, configure it and save it as a template from which VM instances will be instantiated. Concretely, for Docker you choose from a list of available images from which a container will be created, but this really depends on what you're using for virtualization. • modules/docker/ui - folder with Qt specifications that generate Python files that are then used to define how the GUI reacts to user interactions • modules/docker/pages - GUI pages and interactions are defined here by manipulating the previously generated Python Qt class objects • modules/docker/__init__.py - class that handles the Docker-specific functionality of the GUI, like loading and saving of settings • modules/docker/docker_vm.py - this part does the actual server requests and handles the responses • modules/docker/settings.py - general Docker and container-specific settings This seems like quite a complicated setup, but the important thing to remember is that if you want to add your own virtualization technology you have to create equivalent files in a new folder. My advice is to copy the files of an already existing similar VM technology and go from there.
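Whatever you copy still has to satisfy the contract imposed by the server's base classes. In Python that pattern boils down to abstract methods; the sketch below is generic, assuming a hypothetical BaseVM/ContainerVM pair, and is not GNS3's actual class hierarchy.

```python
from abc import ABC, abstractmethod


class BaseVM(ABC):
    """Stand-in for a VM base class: subclasses that fail to implement
    the required methods cannot even be instantiated."""

    @abstractmethod
    def start(self):
        ...

    @abstractmethod
    def stop(self):
        ...


class ContainerVM(BaseVM):
    """Hypothetical new virtualization backend."""

    def __init__(self):
        self.running = False

    def start(self):
        # A real backend would call its management API here
        self.running = True

    def stop(self):
        self.running = False


vm = ContainerVM()
vm.start()
```

Trying to instantiate BaseVM directly, or a subclass missing start() or stop(), raises TypeError, which is the "fail spectacularly" safety net the base classes provide.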
All of the classes inherit from base classes that require some methods to exist, otherwise things will fail spectacularly. As is true in every object-oriented language, you should try to leave most of the work to the base classes by overriding the required methods, but if the methods they use make no sense for your technology, write custom code that circumvents their usage completely. A lot of the code used in one technology may seem useless and redundant in another. For example, VirtualBox has a lot of boilerplate code that manages the location of its vboxmanage command, which is completely useless in Docker, which uses docker-py to handle all container-related actions. The core of GNS3 is written with modularity in mind, but with all the (very) different virtualization technologies it supports you're bound to need a hack here or there if you don't want to completely restructure the rest of the code a couple of times. Cheers until next time, when I'll be talking about how to connect vastly different VMs via links. Manuel Jacob(PyPy) Progress Report - Week 1 & 2 GSoC Project Overview Since this is my first blog post here, I'll describe shortly what my GSoC project is about. The goal is to bring forward PyPy's Python 3.x support. As this is a large project, it can't be finished this summer. However, here is a rough schedule: 1. Release PyPy3 2.6.0 (notably for CFFI 1.1 support, around June 12th) 2. Finish Python 3.3 support (scheduled for release around July 3rd) 3. Work on Python 3.4 support (open project) Current Status In the first week I did a lot of merging. Merging is necessary to bring the latest features and optimizations from the default branch, which implements Python 2.7, to the py3k branch, which implements Python 3.2. The previous merge was done on February 25th, so this created a lot of merge conflicts. In the second week I spent most of my time fixing tests (all tests in the py3k branch pass now) and bugs reported by users in PyPy's issue tracker.
Next week I will fix more issues from the bug tracker and release PyPy3 2.6.0. Siddharth Bhat(VisPy) Blog entry of week 1 and 2 I've been hard at work getting stuff done in Vispy. For the entire article, visit here. Mridul Seth(NetworkX) GSoC 2015 – Python Software Foundation – NetworkX – Biweekly report 1 NetworkX is preparing for a new release v1.10, and as discussed we are planning to deprecate the *_iter functions of the base classes of Di/Multi/Graphs. They are now deprecated (https://github.com/networkx/networkx/pull/1547). I started working on the first part of my project, removing the *_iter functions. So far I have worked on the following functions: • edges_iter for Di/Multi/Graphs • out_edges_iter and in_edges_iter for Multi/DiGraphs • neighbors_iter for Di/Multi/Graphs • predecessors_iter and successors_iter for Multi/DiGraphs The progress can be seen in this pull request: (https://github.com/networkx/networkx/pull/1546) I will also soon set up a wiki page for further discussion and planning of various issues regarding the API and for investigating it further. Michael Mueller(Astropy) Week 2 This week, I worked on the backbone of the indexing system I'll be working on--I have an Index class that will be compatible with a number of data structure implementations, and for now a binary search tree implementation. See https://github.com/mdmueller/astropy/blob/table-indexing/astropy/table/index.py and the implementation, https://github.com/mdmueller/astropy/blob/table-indexing/astropy/table/bst.py. I also have tests to make sure that the BST structure works as expected. (Nodes of the BST are of the form (column value, row number) for now, although this will probably change to something like ((column 1 value, ..., column n value), [row 1, ..., row m]) later to deal with composite indices and repeated column values across multiple rows.)
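A stripped-down version of that idea, a binary search tree keyed on column values with the matching row numbers collected at each node, might look like this. It is a sketch of the concept only, not the astropy implementation, and it already stores a row list per key to handle repeated column values.

```python
class Node:
    def __init__(self, key, row):
        self.key = key
        self.rows = [row]          # all rows sharing this column value
        self.left = self.right = None


class BST:
    """Toy column index: maps a column value to the rows containing it."""

    def __init__(self):
        self.root = None

    def add(self, key, row):
        if self.root is None:
            self.root = Node(key, row)
            return
        node = self.root
        while True:
            if key == node.key:
                node.rows.append(row)   # duplicate value, extend row list
                return
            branch = "left" if key < node.key else "right"
            child = getattr(node, branch)
            if child is None:
                setattr(node, branch, Node(key, row))
                return
            node = child

    def find(self, key):
        node = self.root
        while node is not None and node.key != key:
            node = node.left if key < node.key else node.right
        return node.rows if node is not None else []


index = BST()
for row, value in enumerate([5, 2, 8, 2]):
    index.add(value, row)
```

A real index would also need balancing (or a different structure entirely) to keep lookups logarithmic under adversarial insertion orders.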
Next week I'll be integrating the new Index class into the existing functionality of the Table class, as well as implementing composite indices (indices on multiple columns) and a new where() function to find rows by column values. Abraham de Jesus Escalante Avalos(SciPy) My motivation and how I got started Hello all, It's been a busy couple of weeks. The GSoC has officially begun and I've been coding away, but before I go heavy into details, I think I should give a brief introduction on how I found SciPy and my motivations, as well as the reasons why I think I got selected. The first thing to know is that this is my first time contributing to open source. I had been wanting to get into it for quite a while but I just didn't know where to start. I thought the GSoC was the perfect opportunity: I would have a list of interesting organisations with many sorts of projects and an outline of the requirements to be selected, which I could use as a roadmap for my integration with the open-source community. Being selected provided extra motivation, and having deadlines was perfect for making sure I stuck to it. I started searching for a project that was novice friendly, preferably in Python because I'm good at it and I enjoy using it, but of course the project had to be interesting. Long story short, I found in SciPy a healthy and welcoming community, so I decided this might be the perfect fit for me. The first thing I did was try to find an easy-fix issue to get the ball rolling by making my first contribution and allowing one thing to lead to another, which is exactly what happened; before I knew it I was getting familiarised with the code, getting involved in discussions and exchanging ideas with some of the most active members of the SciPy community. In short, what I'm trying to say is: find your motivation, then find something that suits that motivation and get involved, do your homework and start contributing. Become active in the community and things will follow.
Even if you don't make it into the GSoC, joining a community is a great learning opportunity. Cheers, Abraham. Siddhant Shrivastava(ERAS Project) The Docker Chronicles June 07, 2015 Chad Fulton(Statsmodels) State space diagnostics It is important to run post-estimation diagnostics on all types of models. In state space models, if the model is correctly specified, the standardized one-step-ahead forecast errors should be independent and identically Normally distributed. Thus, one way to assess whether or not the model adequately describes the data is to compute the standardized residuals and apply diagnostic tests to check that they meet these distributional assumptions. Although there are many available tests, Durbin and Koopman (2012) and Harvey (1990) suggest three basic tests as a starting point. These have been added to Statsmodels in this pull request (2431), and their results are added as an additional table at the bottom of the summary output (see the table below for an example). Furthermore, graphical tools can be useful in assessing these assumptions. Durbin and Koopman (2012) suggest the following four plots as a starting point: 1. A time-series plot of the standardized residuals themselves 2. A histogram and kernel density estimate of the standardized residuals, with a reference plot of the Normal(0,1) density 3. A Q-Q plot against Normal quantiles 4. A correlogram To that end, I have also added a plot_diagnostics method which creates those four plots. Vipul Sharma(MoinMoin) GSoC 2015: Coding Period (25th May - 7th June) The coding period started 2 weeks ago, on 25th May. In these two weeks I worked according to the timeline which I created during the community bonding period. I worked on implementing the Ajax-based searching of duplicate tickets feature. Initially, I thought that it would have to be created using Whoosh and jQuery from scratch, but it turned out that something similar was implemented in the /+search view.
The /+search view has an Ajax-based search form which displays content suggestions for various contenttypes, name term suggestions and content term suggestions, along with information like revision id, size, date of creation and file type. So, I reused the code of the /+search view to implement the duplicate ticket search. I made some changes in the existing code to allow it to search tickets, and added a few lines of code to the ajaxsearch.html template to render duplicate ticket suggestions, as the /+search view displayed some results which were not necessary as suggestions for duplicate tickets. Just a few lines of CSS were required to keep the rendered result tidy. Obviously, I was not able to code it in one go, as I don't have much experience working on a large codebase. But my mentors were very helpful. They guided me, reviewed my code and gave me suggestions on how to reduce a few redundant code segments. Their advice was really helpful, as I removed a lot of redundant code and now it looks pretty. Codereview: https://codereview.appspot.com/236490043 This is how it looks: I am currently working on the file upload feature, so that a user can upload any patch file, media or screenshot. I've implemented it by creating a new item for every file uploaded. I have a few issues regarding how to deal with item_name and itemids which I am discussing with my mentor, and I hope that I'll figure them out very soon :) Sartaj Singh(SymPy) GSoC: Update Week-2 Time flies fast. Two weeks have already passed since I started working on the project. Here's what I have been up to. Week-2: I was hoping #9435 would get merged this week, but that didn't really go my way. The PR went through another round of review and I got plenty of good suggestions to work with. It still needs a little more love to get merged. I also started my work on the series-based classes. I worked on the SeriesBase class. It will be the base class for all the other types of series. It defines a common interface that other classes should follow.
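To illustrate what such a common interface buys you: subclasses only define how to produce the n-th term, and shared behaviour like truncation is written once against that interface. This is a hypothetical sketch of the pattern, not SymPy's actual SeriesBase, whose API is considerably richer.

```python
class SeriesBase:
    """Hypothetical common series interface: subclasses supply term(n)."""

    def term(self, n):
        raise NotImplementedError

    def truncate(self, num_terms):
        # Shared behaviour, implemented once for every subclass
        return [self.term(n) for n in range(num_terms)]


class GeometricSeries(SeriesBase):
    """1 + r + r**2 + ... expressed through the common interface."""

    def __init__(self, ratio):
        self.ratio = ratio

    def term(self, n):
        return self.ratio ** n


partial = GeometricSeries(0.5).truncate(4)
```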
A long time ago (before GSoC), I wrote some code for computing Fourier series (#9050) that never got completed. So, I gave porting it to the new sequences-based system a try. It's still not complete and needs some more work. The biggest challenge is speed: it's slow because integration is slow. Speeding up integration could be a whole new project on its own, so I am not going to touch integration for now, but I am still trying to speed things up in the best way I can. On a side note, I had been working on Fractional Part, or frac(x) (#9342), for quite some time now (almost a month). Today it also got merged into the codebase. A big thanks to @jksuom for patiently working with me. So, overall it's been an average GSoC week with SymPy. Tasks for Week-3: • Polish #9435 and get it merged. • Complete SeriesBase, FourierSeries • Start with FormalPowerSeries That's all for now. Catch you later. Isuru Fernando(SymPy) GSoC 2015 : Weeks 1 & 2 These two weeks, I worked on finishing up the floating-point support for SymEngine. First, a ComplexDouble class was introduced to hold a std::complex<double>. With the automatic simplification interface completed, implementing this class did not touch any of the other classes except RealDouble, which uses the ComplexDouble class when an operation on a RealDouble results in a complex number. SymEngine had eval_arb and eval_double for numerical evaluations, but eval_double is double precision only, and since arb is not a standard package in Sage, SymEngine needed to use another library already in Sage for that. The eval_mpfr method was introduced to use MPFR to get arbitrary-precision numerical approximations of symbolic expressions. The MPFR library is a C library for multiple-precision floating-point computations with correct rounding. MPFR is also a dependency of arb, and therefore SymEngine already had support for linking against it in cmake. See PR.
The next step is to interface a library for arbitrary-precision complex floating-point computations; MPC, which is a standard Sage package, is used. MPC extends the functionality of MPFR to complex numbers. eval_mpc was written to evaluate expressions using MPC. This weekend, I hope to finish eval_mpc, add RealMPFR and ComplexMPC classes to store mpfr_t and mpc_t respectively, and then write an eval function that combines all four methods, eval_double, eval_complex_double, eval_mpfr and eval_mpc, and finally wrap it in Python so that a user can call it with expression.n(number_of_bits). In the 3rd and 4th weeks, I am going to concentrate on getting the wrappers for Sage done. CSymPy (the Python wrappers of SymEngine) to Sage and Sage to CSymPy conversion support is going to be added to Sage via Cython. This PR has some work on CSymPy to Sage conversions, but it was then decided to implement this in the Sage symbolic directory to avoid a cyclic dependency. Jaakko Leppäkanga(MNE-Python) First two weeks Time flies... It's been two weeks already and I've got up to speed with the coding despite the slow start. I was occupied with other stuff for the first couple of days of coding and had to work extra hours over the weekends to catch up. The epoch viewer is nearly done and you can expect a merge early next week. The biggest difficulties I faced concern compatibility issues with OS X, which are really hard to solve since I don't have a Mac at my disposal. I also faced some problems with Git, but that's nothing new. Once we get the compatibility issues with OS X sorted out, I can start implementing the butterfly plotter for the epochs. I have a pretty good picture of how to implement it and I think it'll be ready pretty soon. Overall, I feel pretty satisfied with the plotter thus far.
Here's a link to the pull request: https://github.com/mne-tools/mne-python/pull/2154#issuecomment-107392548 Zubin Mithra(pwntools) OABI and EABI - notes I thought I'd document what I learnt last week about the OABI and EABI ARM system call application binary interfaces. As you probably know, the ABI describes, amongst other things, how the system call number and arguments are passed to the kernel. First, a hardware (read: floating-point-related) perspective. There is quite a bit on this out there, so I'll make this short and sweet. ARM processors don't really have opcodes and instructions to manipulate and work with floating-point numbers in their core instruction set. Currently, if you need to work with floating-point numbers there are two options: either you have an ARM machine that comes with a coprocessor (number 10) that supports floating-point instructions and registers (ARM coprocessors introduce a certain set of new opcodes and registers for added functionality), or you can use software-emulated floating-point instructions. Now that we have that out of the way, let's look at OABI vs EABI. 1. OABI came out first. It assumed that your underlying machine supports floating-point instructions and generated code to run on it. If your machine did not have support for floating-point instructions, an exception was generated and the operation was emulated in the kernel (see section 2 here for more). The kernel support for this is termed NWFPE. Using NWFPE means you have a context switch into the kernel for each floating-point instruction. NWFPE is no longer present in the Linux ARM kernel. 2. EABI to the rescue. You pass "-mfloat-abi=soft" to gcc and the compiler converts floating-point operations to library calls that implement these operations in userspace. Newer ARM processors also come with coprocessors such as VFP (section 3.1) (-mfpu=softfp/hard) and NEON (section 3.2) (-mfpu=neon).
From an exploitation perspective (and this is where it's related to my GSoC work), OABI and EABI primarily differ in how system call numbers are passed to the kernel. • If you look here you can see that system call numbers are defined relative to a __NR_SYSCALL_BASE. For example: #define __NR_sigreturn (__NR_SYSCALL_BASE+119) If the kernel is compiled for Thumb or with EABI support, __NR_SYSCALL_BASE is set to 0. Otherwise, it is set to 0x900000. What this means is that if you are writing a SROP exploit for a Raspberry Pi, you can set r7 to 0x77. If you're trying the exploit on QEMU (user emulation mode) against a kernel compiled with OABI, there is a chance that your syscall number needs to be 0x900077. You can see the common entry point for syscalls here, where it extracts the system call number from the swi/svc instruction or from r7. Rupak Kumar Das(SunPy) Two weeks in Time flies quickly! It has been nearly two weeks since the start of the coding period, so let me give a small report on my progress. The first week was spent trying to figure out how to create the Slit plugin for Ginga. It is an extension of the existing Cuts plugin, but instead of plotting the pixel values, it plots the time values. Unfortunately, I was unsure of the implementation, so it was a fruitless week spent reading the Cuts code to figure out how it plotted the data and how to modify it. So I decided to work on the Save feature instead, which will save the cuts plot and data. Fortunately, in a meeting with my mentor, it was decided that I should focus on the Cuts plugin for the time being, improving it by fixing bugs and adding a new type of curved cut to plot the data (so that Slit would have the same functionality and stability). I have implemented the save function and am currently working on the curved cut. Hopefully, I will have completed it in a few days' time. I will try to be more productive in the coming weeks.
The next update will be in two weeks, so till then, ciao! AMiT Kumar(Sympy) GSoC : This week in SymPy #2 Hi there! It's been two weeks into GSoC, & I have managed to flip some bits. This week, I started working on the ComplexPlane class, & also worked on improving linsolve. Progress of Week 2 The major portion of this week went into improving the linsolve function, which I wrote last week, PR : #9438. Jason suggested letting the Matrix code be the core source for all linear solve operations (i.e. removing all algorithmic solve code from everywhere else in SymPy), and then, for any linear solve stuff that can't be handled by the Matrix code base, implementing it in linsolve. It was indeed a good idea, since solving linear systems is more Matrix stuff than solver stuff in a CAS. So we introduced a new solver in matrices.py named: • gauss_jordan_solve() : It solves Ax = b using Gauss-Jordan elimination. There may be zero, one, or infinitely many solutions. If one solution exists, it will be returned. If infinitely many solutions exist, they will be returned parametrically in terms of Dummy parameters. If no solution exists, it will raise ValueError. Now linsolve is a light wrapper around the gauss_jordan_solve() method: it basically converts all the input types into the standard A & b form, calls A.gauss_jordan_solve(), and replaces the dummy parameters with the symbols input by the user. Plan for Week 3: This week I plan to complete the ComplexPlane class & get the following PR's merged: That's all for now, looking forward to week #3. June 06, 2015 Pratyaksh Sharma(pgmpy) Sampling is one honking great idea -- let's do more of that! If the math below appears garbled, read this post here.
A primer on inference by sampling

The quintessential inference task in graphical models is to compute $P(\textbf{Y} = \textbf{y} | \textbf{E} = \textbf{e})$, which is the probability that the variables $\textbf{Y} = (Y_1, Y_2, ..., Y_n)$ take the values $\textbf{y} = (y_1, y_2, ..., y_n)$, given that we have observed an instantiation $\textbf{e} = (e_1, e_2, ..., e_m)$ of the other variables of the model. It turns out that this seemingly unassuming task is computationally hard ($\mathcal{NP}$-hard to be precise, though I will not bother with the details here). The good news, however, is that there exist numerous approximate algorithms that solve the problem at hand.

A popular approach is to estimate $P(\textbf{Y} | \textbf{e})$ by sampling. Let us surmise that we have a set of 'particles', $\mathcal{D} = \{\xi[1], ..., \xi[M]\}$, where $\xi[i]$ represents an instantiation of all variables $\mathcal{X}$ of our graphical model. Then we can estimate $P(\textbf{y})$ from this sample as simply the fraction of particles where we have seen the event $\textbf{y}$, $$\hat{P}_\mathcal{D} = \frac{1}{M} \sum_{m=1}^{M}\mathbb{I}\{\xi[m] = \textbf{y}\}$$ where $\mathbb{I}$ is the indicator random variable, and $\xi[m]$ also has the overloaded meaning 'assignment to $\textbf{Y}$'.

Getting back to our original problem of estimating $P(\textbf{Y} | \textbf{E} = \textbf{e})$, we can instead filter the set $\mathcal{D}$ to comprise only those 'particles' which do not contradict the evidence $\textbf{E} = \textbf{e}$, and then proceed as we did for estimating $P(\textbf{Y})$.

But, what am I doing? Wait, we didn't see how we can generate the sample $\mathcal{D}$. Given a Bayesian network, we'll see that it is straightforward to do so. But things are rather onerous in the case of Markov networks. A Bayesian network is represented by a directed graph $\mathbb{G} = (V, E)$. Let us impose a topological ordering on the vertices $V$. Then, start with $V_1$; it does not have any parent(s).
So, we sample $V_1$ from its possible values $\{v^1_1, v^2_1, ...\}$, according to the given probability weights $P(V_1)$. Say we sampled $V_1 = v^t_1$. Now we can go ahead and sample the next variable $V_2$ in the topological ordering. The fact that we are proceeding in this order makes known the sampled values of a variable's parents before we head out to sample that variable. This is known as forward sampling.

To answer the inference query when we are given evidence $\textbf{E} = \textbf{e}$, we can merely reject those samples which do not comply with our evidence. This is known as rejection sampling. All is well, until we notice that the number of samples kept by rejection sampling is proportional to $P(\textbf{E})$. The fewer samples we generate, the worse our approximation of the intended quantity.

Let us tweak forward sampling to generate only the observed values of the variables in our evidence, and then proceed as we always have. As we look more closely into this approach, we see that it has some flaws. Imagine the simple Bayesian network with just two nodes and an edge between them: [Intelligence]-->[Grade]. Suppose we have the evidence that the grade is an $A$. From our usual way of sampling, we sample a value for [Intelligence]. Then, instead of sampling the value of [Grade], we take it to be $A$. In the resulting particles, the value of [Intelligence] will be distributed as given by the probability weights associated with that variable. Given that we have observed [Grade] $= A$, we should expect the particles to demonstrate a higher value of [Intelligence] (assuming higher intelligence tends to produce better grades), but this is not the case. The issue here is that the observed values of the variables are not able to influence the distribution of their ancestors.

We make another tweak to address this. Now, with each particle, we shall associate a weight - the likelihood of that particle being generated.
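The weighting idea just described can be sketched on the two-node [Intelligence]-->[Grade] network above (the CPD numbers here are made up for illustration, and this is plain Python, not pgmpy):

```python
import random

random.seed(0)

# Hypothetical CPDs for the [Intelligence] --> [Grade] network
P_I = {'high': 0.3, 'low': 0.7}
P_G_given_I = {'high': {'A': 0.8, 'B': 0.2},
               'low':  {'A': 0.2, 'B': 0.8}}

def likelihood_weighted_sample(evidence_grade='A'):
    """One particle: sample the unobserved variable, clamp the evidence,
    and weight the particle by the likelihood of that evidence."""
    i = random.choices(list(P_I), weights=P_I.values())[0]
    g = evidence_grade                 # clamped to the evidence, not sampled
    w = P_G_given_I[i][g]              # weight = P(evidence | parents)
    return i, w

# Estimate P(Intelligence = high | Grade = A)
particles = [likelihood_weighted_sample() for _ in range(50000)]
num = sum(w for i, w in particles if i == 'high')
den = sum(w for i, w in particles)
estimate = num / den
# Exact posterior for these CPDs: 0.3*0.8 / (0.3*0.8 + 0.7*0.2) ~ 0.632
```

The weights are exactly what lets the observed grade pull the distribution of its ancestor upward, which plain clamping could not do.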
$$w = \prod_{i=1}^m{P(\textbf{E}_i = \textbf{e}_i | Parents(\textbf{E}_i))}$$ We then modify our estimate of $P(\textbf{y} | \textbf{e})$ as follows: $$\hat{P}_\mathcal{D}(\textbf{y}|\textbf{e}) = \frac{\sum_{m=1}^{M}w[m]\cdot{\mathbb{I}\{\xi[m] = \textbf{y}\}}}{\sum_{m=1}^{M}{w[m]}}$$ This is known as likelihood weighted sampling. And with this, we have a decent enough way to estimate conditional probability queries on a Bayesian network. In case you are still wondering, I've implemented forward sampling, rejection sampling and likelihood-weighted sampling so far. See the pull request.

Aman Jhunjhunwala(Astropy)

GSOC ’15 Post 2 : Coding Phase : Weeks 1 & 2

Work Summary Report Date : 5th June, 2015

The coding phase for Google Summer of Code 2015 has officially started, and the following sections (relating to the components worked on) summarize the progress of the 2 weeks - the successes, difficulties, roadblocks and surprises:

1. Data Porting

The old Astropython web application is a legacy Google App Engine application written in 2010-2011. The entire data is stored in a Google Cloud Datastore in an unstructured format (as a soup of keys and values). During the bonding period it was decided that the data would first be extracted and structured in a friendly format – JSON, YAML, etc. – and then a “population” script would be written to use that data to populate the database irrespective of the DB technology used in the future.

To accomplish this, I completed a few Google App Engine tutorials to get acquainted with the technology and also set up the infrastructure on my local machine. This included the official Google “Guestbook” tutorial, which was great fun to learn. To extract the data, I first needed to get the old Astropython app running on localhost, then use a “bulkloader dump” of the data (provided to me by my mentor) to restore the data into my local app, and then pull the data from there into whatever format I liked.
Just 2 days into using GAE, this was extremely challenging, but I was glad that I could get through. Initially, I was trying to update the old code so that it could run with the latest versions of the dependencies (GAE SDK, WebApp2 and Django). This took a lot of time and code manipulation, but there were too many legacy dependencies (libraries, functions, etc.) to satisfy, so I ditched the process. Instead, I re-created the environment used to code that project – Python 2.5, Django 0.96 and webapp beta – and a few manipulations later, I got the old Astropython web app running on my localhost.

Next, I restored the data to my local app using a local datastore and tried to extract the records, but was unable to do so for 2 days. It was here that I reached a dead end. Any method I tried would output nothing but a random mess of data. In the end, I replaced the view function with a new view that showed the XML of the data using the built-in to_xml() function. Then a simple Python script converted that XML to JSON, keeping in mind the character encoding problems. After this, a Python population script parsed that JSON data and stored it in our new web app's database in the desired way. The script is found in the project's GitHub repo. A very difficult, tiring and challenging week with little sleep, but mission accomplished in the end!

2. Teach and Learn Section

Now that I had entered my comfort zone, things began to move more quickly and comfortably. Framing the models and setting up moderation abilities on them was followed by creating a multi-step Creation Wizard to create a tutorial / code snippet / tutorial series or educational resource. Initially I was using Django form-tools' built-in CreationWizard class, but it was quite inflexible for our needs, so I decided to create my own custom Creation Wizard views.
After getting the basic infrastructure up, I modified it to be more robust and added protection to it – making it usable and mature (saving un-submitted forms, resuming from where the user stopped, etc.). After finishing this, I jumped onto coding the infrastructure for displaying each article (from any of the categories), after which a secure anonymous voting mechanism was added, which takes into account the IP address and user ID of a user to generate a unique token for a vote. This was followed by finishing the last part of the basic Teach and Learn section infrastructure – the aggregation pages, which display all the posts and sort them in terms of popularity or date created. Lastly, pagination abilities were incorporated and CSS styling was completed to end the week on a high!

It has been two excellent weeks of coding with a lot of new things to learn. Before the mid-term evaluation, the Teach and Learn section is expected to be absolutely complete, mature and tested. The next update will be in 2 weeks! Till then, Happy Coding!

Rafael Neto Henriques(Dipy)

[RNH Post #4] First report (1st week of coding, challenges and ISMRM conference)

The coding period started in a challenging way. As I mentioned in my previous post, I started the coding period by merging the work done during the community bonding period into the main Dipy master repository. This was not as trivial as expected, since some of the latest updates on the Dipy master repository were causing conflicts with my code.

Rebasing

To solve these conflicts, I used git rebase (I want to thank my mentor Dr Ariel Rokem for his useful tips on how to do this). For more details on rebasing you can find a nice tutorial here.
To summarize, below you can find the essential steps to rebase a branch:

1) Make sure that the master branch on your computer has the latest changes:

git checkout master
git pull upstream master

2) Start rebasing by moving the work done on your branch onto the updated version of master:

git checkout your_branch
git rebase master

3) If there is a conflict, automatic rebasing stops so you can manually update the files. The conflicting parts of the file are marked between the >>>> and ==== markers.

4) After manually resolving a conflict, you can add the corrected files and continue rebasing using:

git add file
git rebase --continue

5) When rebasing is accomplished, you can push the changes to your fork by typing:

git push -f origin your_branch

After rebasing Problem #1

After rebasing, I noticed a problem with Dipy's master compilation. Fortunately, with the help of the amazing Dipy team, this problem was quickly addressed (for more information see here).

After rebasing Problem #2 - the right order to reconstruct the diffusion tensor

This is a good example of why testing modules are so important. After solving problem #1, one of my testing modules was failing, since the simulated diffusion tensors were giving unexpected results (to learn what a diffusion tensor is, read my previous post). Basically, in my simulations, the diffusion tensors are reconstructed from the eigenvalue and eigenvector decomposition. After some hours of debugging, I realized the cause of the logic error: the eigenvalues given by an updated function in master were transposed relative to the previous version.
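The transposition pitfall just described is easy to check numerically. A small self-contained sketch (the eigenvalues and rotation below are made up for illustration, not Dipy code) confirms that the R.T · diag · R order, with the eigenvectors as the rows of R, recovers the intended spectrum:

```python
import numpy as np

# Hypothetical eigenvalues of an axially symmetric diffusion tensor
mevals = np.array([1.7e-3, 3.0e-4, 3.0e-4])

# Rows of R are the eigenvectors; here a rotation of 30 degrees about z
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
R = np.array([[c, s, 0.0],
              [-s, c, 0.0],
              [0.0, 0.0, 1.0]])

# Reconstruct with the order discussed: R.T @ diag(mevals) @ R
tensor = R.T @ np.diag(mevals) @ R

# Sanity check: the reconstructed tensor has the intended eigenvalues
recovered = np.sort(np.linalg.eigvalsh(tensor))
```

Swapping R and R.T here would silently apply the inverse rotation, which is exactly the kind of bug the failing test module caught.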
So, for those also working with second-order tensors, make sure that when reconstructing the tensor from its eigenvalue and eigenvector decomposition you use the matrix multiplication in the following order:

import numpy as np
from numpy import dot

Tensor = dot(dot(R.T, np.diag(mevals)), R)

where R = [eigenvector1, eigenvector2, eigenvector3] and mevals = [eigenvalue1, eigenvalue2, eigenvalue3].

ISMRM Conference

As I mentioned in my proposal, this week I also attended the 23rd annual meeting of the International Society for Magnetic Resonance in Medicine (ISMRM). The conference was very productive. In particular, I had some nice discussions with the top experts on diffusion kurtosis imaging (DKI), and their feedback will be taken into account so that my work during the summer of code follows the most recent state of the art of the field. It was also great to personally meet the Dipy developers and promote the open-source software. I got very nice feedback from different research groups, and there were many new researchers interested in using Dipy and/or willing to collaborate on its development. Soon I will post some photos of the conference =).

Next steps

My mentor suggested a small change to the example codes for the DKI simulation usage. I am currently finalizing this, so soon I will be posting the final version of the DKI simulations. In the following days, I will also create a pull request with the work started on the DKI reconstruction modules. As mentioned in my proposal, the implementation of these modules is the objective for my midterm evaluation.

June 05, 2015

Andres Vargas Gonzalez(Kivy)

Canvas to handle strokes and stroke processing.

During my milestone 1 period, we came up with 4 new classes for ink processing.

StrokePoint: Class with coordinates x and y. This is the native structure for the StrokeCanvas behavior.

StrokeRect: Logical rectangle whose purpose is to bound a stroke generated in the StrokeCanvas.
Provides methods to: know if a point is contained in the rectangle; know if another rectangle overlaps it.

Stroke: A stroke contains a list of points. The list of points forms a line. A stroke has different drawing attributes: color → change the color of the Stroke. width → change the size of the graphic line. is_highlighter → changes the visibility. A stroke provides: a get_bounds method to get the enclosing rectangle in which a stroke is.

StrokeCanvas: A StrokeCanvas contains a list of strokes. A StrokeCanvas provides: events to access the strokes when they are added or removed; an event to access the mode when this is changed. A StrokeCanvas is the visual representation of a Stroke using Lines.

The branch for this can be found at: https://github.com/andnovar/kivy/tree/mpl_kivy and you need the colors from here: https://github.com/andnovar/kivy/tree/colors_in_utils.py although these colors are going to change eventually for more descriptive ones.

Manuel Paz Arribas(Astropy)

First report

Wow! Two weeks are already gone?! Time goes fast when working on an interesting project!

After a few iterations on the API with my mentors, I started working on the observation and dataset modules of Gammapy. These modules are essential for observation bookkeeping and data access. Since the background subtraction methods should act on the data, a good implementation of these modules is important for developing the toolbox mentioned in my proposal.

Before explaining what I am working on exactly, I will try to clarify how TeV gamma-ray observations are carried out and organized. A typical observation would be to point the telescope system (for instance H.E.S.S. or CTA) at a particular position in the sky (i.e. a source like the Crab nebula) for a certain amount of time (typically ~30 min).
Since most gamma-ray sources need deep exposures (typically several hours to ~100 h) to obtain a significant signal, and hence actually "see something", an object is observed during multiple observations of ~30 min each. Each of these observations delivers a large amount of data (raw data): telescope images of interesting events (plus also a lot of noise), atmosphere monitoring information, telescope calibration information, etc. Each experiment or observatory has a team of experts that filter the data, derive the properties of the primary particles (ideally gamma-rays, but also a lot of gamma-like background events) and produce the so-called event list files: one file per observation. These files contain an ordered list of events with their properties (arrival direction, energy, etc.) and a header with the information common to all of them, since they belong to the same observation (observation time, telescope pointing direction, etc.). These event list files are the data that is delivered to the observer/analyzer, and they are the input for the analysis tools.

In order to keep track of these observations and organize the event lists needed for a specific analysis, observation tables are used. These are simply lists of observation IDs that identify the corresponding event lists, together with interesting properties that help the analyzer in the selection of the observations needed for a specific analysis. These properties are for instance the observation time, the pointing position, the observation duration, the quality of the data obtained, etc.

My work over the last 2 weeks has focused especially on the development of a tool that generates dummy observation tables that I will be able to use for testing some of the tools that I need to develop for the background module of Gammapy. In addition, I am working together with my mentors on defining the data format that Gammapy observation tables will support. A link to the pull request can be found here.
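A dummy-table generator like this needs random telescope pointings. One way to draw them evenly distributed in solid angle (a NumPy sketch with assumed ranges, not the actual Gammapy code) is to sample the azimuth uniformly and the altitude uniformly in its sine:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 10

# Azimuth: uniform in [0, 360) deg
az = rng.uniform(0.0, 360.0, n_obs)

# Altitude: for an even distribution on the sphere, draw uniformly
# in sin(alt); here restricted to the 45-90 deg band
sin_alt = rng.uniform(np.sin(np.radians(45.0)), 1.0, n_obs)
alt = np.degrees(np.arcsin(sin_alt))
```

Sampling the altitude directly in degrees instead would over-represent pointings near the zenith, since the solid angle per degree of altitude shrinks with the cosine of the altitude.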
The work is not yet finished, but the tool already generates test observation tables with a few columns. Here is a screenshot with a 10-observation example table (please click on the picture for an enlarged view):

The algorithm of the tool generates random pairs of azimuth and altitude evenly distributed in spherical coordinates. Azimuth values are generated in the interval (0 deg, 360 deg), and altitude values in the interval (45 deg, 90 deg). It also generates random time values between the start of 2010 and the end of 2014. The (az, alt) pairs are then translated into celestial coordinates (RA, dec) for a specific observatory. More on coordinates and their transformations can be found in the Astropy docs here, or in a less technical way on Wikipedia here. This tool could also be used in the future to generate example observation lists.

Items that need to be improved:

• Write the doc specifying the supported format for observation tables in Gammapy.
• Add additional columns to the dummy observation table generator.
• Add a header to the observation table with, for instance, the name of the observatory/experiment (i.e. H.E.S.S. or CTA).
• The times in the observation table generator should be written in a more convenient format.
• Optional: since TeV gamma-ray observations are carried out at night, it would be nice if the randomly generated observation times could be restricted to night hours (i.e. from 22h to 4h).
• Optional: develop a tool to filter observation tables depending on a few parameters like:
• Time interval: define t_min, t_max.
• Pointing position: define a circle/box in the sky.

Lucas van Dijk(VisPy)

First steps with OpenGL and Vispy

Wow, we've already passed the first two weeks of the Google Summer of Code! So it's time for a little report on the current progress. I do have quite a lot of study assignments, so unfortunately I was not able to devote as much time to the GSoC as I planned to, but there is definitely some progress!
Bezier Lines

I've ported the code for drawing curved lines from Glumpy to Vispy. This was quite easy, as the code written by Nicolas Rougier didn't have any exotic dependencies; I actually just copied the file and fixed some imports. I did write an example of how to use this Bezier lines module, which can be found here.

Arrow heads

This has been a more challenging assignment. Glumpy has several OpenGL shaders for drawing arrows. I want to port these arrows, but with a slight change: I only want to draw the arrow heads. This way the user can draw any line he or she wants, and optionally include any arrow head, with automatic orientation of the head.

For someone without a computer graphics background and without much OpenGL experience, some of this shader code looks like a lot of magic, but luckily there's one thing that sort of saves me: my linear algebra knowledge. With a notebook beside my keyboard where I can write down the mathematical formulas, visualize the vertices used, etc., its inner workings are slowly revealed. The current result can be seen in the picture below. I'm not sure yet why the first arrow head is oriented the wrong way, and the curved arrow heads are still a bit too big, but we're getting there! I'll write a separate blog post in the coming weeks on the mathematical principles behind drawing these shapes.

Anyway, until next time!

Sudhanshu Mishra(SymPy)

GSoC'15: Week one

The first week ended a few days back. I managed to do a few things during this period. I finished #9228 and got it merged into master. I also started working on documenting the new assumptions. Thanks to Aaron for quick feedback on that. There are a few hiccups in the latter: after changing all predicate keys to properties, some of them started giving None when used in ask. I hope that I'll be able to sort it out before Monday. I'm also looking forward to merging #2508 this week. You are welcome to help me review this one.
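Returning to the VisPy Bezier work above: the curve evaluation underlying that kind of line drawing can be sketched with de Casteljau's algorithm (a generic sketch, not the ported Glumpy shader code):

```python
import numpy as np

def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] via de Casteljau:
    repeatedly interpolate between neighbouring points until one remains."""
    pts = np.asarray(control_points, dtype=float)
    while len(pts) > 1:
        pts = (1.0 - t) * pts[:-1] + t * pts[1:]
    return pts[0]

# Cubic curve: the endpoints are interpolated, the inner points shape the arc
ctrl = [(0, 0), (0, 1), (1, 1), (1, 0)]
mid = bezier_point(ctrl, 0.5)  # midpoint of this symmetric cubic: (0.5, 0.75)
```

The same repeated-interpolation idea also yields the tangent direction (the last interpolation segment), which is what automatic arrow-head orientation needs.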
I'll start working on global assumptions once I'm done with the documentation. That's all for now. Cheers!

Jakob de Maeyer(ScrapingHub)

Introducing the Technical Details

In my previous post, I introduced the Scrapy framework and its use cases, and touched on the fact that it provides a broad variety of hooks and mechanisms to extend its functionality. My GSoC project is to ease the use of these mechanisms for both users and developers by providing an extra layer – the add-on system – to Scrapy’s extension management. Here, we take a look at Scrapy’s settings API, where extensions are configured, and introduce the first sub-project needed for the implementation of an add-on system.

Scrapy Settings

When you work with Scrapy, you typically organise your code in a Scrapy project. The project is a structured folder that holds your:

• items (where you define the structure of the data you wish to extract),
• spiders (where you parse web page content into items),
• pipelines (where you filter and export items), and,
• if you want, much more code (e.g. downloader middlewares if you want to mess with the requests Scrapy sends to web servers).

The project also contains a settings.py file. This file is the entry point for changing many of Scrapy’s internal settings. For example, you would change the user agent that is reported to web servers by defining the USER_AGENT setting, or throttle how fast Scrapy sends requests to the same server by setting a DOWNLOAD_DELAY.

Setting Priorities

The settings.py file is not the only place where settings are defined. Some setting defaults are overwritten when you call a specific scrapy command, or you can temporarily overwrite settings by passing command line arguments.
To honour the precedence of settings set at different locations – command line arguments take higher precedence than settings.py, which in turn takes precedence over the default settings – without having to care in which order settings are read in the code, Julia Medina introduced settings priorities in last year's Summer of Code. Internally, settings are saved in an instance of a Settings class. This class maps keys to values, much like a dictionary. However, the Settings.set() method that is used to write to it has a priority keyword argument. When a setting is saved, the given priority is saved along with the value. On the next call to set(), the setting is overwritten if, and only if, the given priority exceeds or equals the priority of the already saved value. This frees Scrapy's codebase from having to watch out for order when overwriting settings. If a setting is given via the command line, it is simply saved with a high priority, and is then guaranteed not to be overwritten by a lower priority.

Compound Type Settings

Scrapy knows not only simple (non-compound) settings, such as DOWNLOAD_DELAY or USER_AGENT. In particular, many of the settings related to enabling extensions are dictionaries. Typically, the dictionary keys represent where an extension can be found, and the values determine in which order the extensions should be called. For example, if you have two filter pipelines and an export pipeline, you can make sure the filters are called before exporting by enabling your pipelines in the following way:

# In settings.py
ITEM_PIPELINES = {
    'myproject.pipelines.filter_one': 0,
    'myproject.pipelines.filter_two': 10,
    'myproject.pipelines.mongodb_exporter': 20,
}

There is a problem with settings priorities and the compound type settings: there are no per-key priorities. In other words, the whole dictionary has only a single priority. Moreover, the complete dictionary is overwritten, instead of updated, every time the setting is written to.
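The set()-with-priority rule described above can be modelled in a few lines (a toy sketch, not Scrapy's actual Settings class):

```python
class ToySettings:
    """Toy model of priority-aware settings: equal or higher priority wins."""
    def __init__(self):
        self._store = {}  # name -> (value, priority)

    def set(self, name, value, priority=0):
        # Overwrite only if the new priority >= the stored one
        if name not in self._store or priority >= self._store[name][1]:
            self._store[name] = (value, priority)

    def get(self, name, default=None):
        return self._store.get(name, (default, None))[0]

s = ToySettings()
s.set('DOWNLOAD_DELAY', 0.25, priority=0)   # built-in default
s.set('DOWNLOAD_DELAY', 5.0, priority=20)   # command line
s.set('DOWNLOAD_DELAY', 1.0, priority=10)   # settings.py, read later but weaker
```

Even though the settings.py value arrives last, the command-line value survives, which is exactly why the codebase no longer has to care about read order.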
This has some unpleasant consequences. For example, if I want to temporarily disable the second filter (from above) via the command line, I cannot simply pass that settings update (encoded in JSON) via

scrapy crawl -s "ITEM_PIPELINES={'myproject.pipelines.filter_two': null}" myspider

as the ITEM_PIPELINES setting would be completely overwritten, such that the only entry it now holds is a disabled filter_two.

Introducing per-key priorities

As the add-ons will mostly update the dictionary-like settings, my first sub-project aims at updating these kinds of settings from multiple locations such that:

• the dictionary is not completely overwritten, but
• new keys are inserted and their per-key priority saved, and
• existing keys are updated, if the update priority is high enough.

This is achieved by promoting the (default) dictionary settings from dict instances to Settings instances. This requires:

• completing the dictionary-like interface of the Settings class
• rerouting set() calls for these settings to their update() methods

It has further benefits for the structure of the Scrapy settings. Previously, some of the dictionary settings had default values; e.g. there was a larger number of DOWNLOADER_MIDDLEWARES enabled by default. To avoid these being completely overwritten, and instead only updated, when users set their own DOWNLOADER_MIDDLEWARES in settings.py, they were outsourced into a DOWNLOADER_MIDDLEWARES_BASE setting. When the downloader middleware list was then compiled, the two dictionaries were merged (with higher priority given to the user-defined DOWNLOADER_MIDDLEWARES). Implementing per-key priorities for dictionary-like settings makes this structure obsolete. The default downloader middlewares (and similar components) can now simply be saved in DOWNLOADER_MIDDLEWARES without fear that they're overwritten (unless specifically wanted) when the user sets their own. The pull request for this sub-project will soon be finished.
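The per-key behaviour aimed for here can be sketched the same way (hypothetical class and names, not the actual implementation in the pull request):

```python
class ToyPriorityDict:
    """Dict-like setting where each key remembers its own priority."""
    def __init__(self):
        self._values, self._priorities = {}, {}

    def update(self, other, priority=0):
        # Merge key by key instead of replacing the whole dictionary
        for key, value in other.items():
            if priority >= self._priorities.get(key, float('-inf')):
                self._values[key] = value
                self._priorities[key] = priority

    def asdict(self):
        return dict(self._values)

pipelines = ToyPriorityDict()
# Defaults (what used to live in a separate _BASE setting)
pipelines.update({'filter_one': 0, 'filter_two': 10, 'exporter': 20},
                 priority=0)
# Command-line override disables only filter_two; the other keys survive
pipelines.update({'filter_two': None}, priority=20)
```

With per-key merging, the command-line JSON from the example above would disable filter_two without wiping out the rest of ITEM_PIPELINES.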
It includes code cleanup for the now-deprecated _BASE settings as well as fixes for some previously existing inconsistencies (not all components could be disabled by setting their order to None), and can be found on GitHub.

Ambar Mehrotra(ERAS Project)

GSoC 2015: Second Biweekly Report

The past two weeks were good, and I primarily devoted them to two main things. Firstly, a major portion of the time was spent on developing and finalizing the Software Architecture Document for my project; secondly, I devoted some time to building the main GUI of the Habitat Monitor.

Software Architecture Document

A Software Architecture Document provides an architectural overview of the system, depicting its various aspects. It is often used to convey the important architectural decisions which have been made regarding the whole system. Link to Habitat Monitor Software Architecture Document

Designing the GUI interface

I also spent a major portion of the time deciding on a suitable library and getting started with designing the interface. I spent a lot of time experimenting with Kivy and PyQt, and I finally decided to go ahead with PyQt. This decision was based on the fact that Kivy is still fairly new and lacks the large developer and user community of PyQt. I tried building the GUI with Kivy first and got stuck at certain places, and all the Google searches directed me to the same few threads, which did not help much. Kivy is good and is developing rapidly, but it will take some time for it to build a large user community so that users can find what they are looking for quickly. PyQt on the other hand has a very large user community and can also leverage the help available on the Qt forums, as PyQt is only a wrapper for the famous Qt library. I went ahead and started working on the GUI, and below are a few screenshots.
The GUI currently asks for the address of a data source to get data from. It then pings the address and checks if the server is running; if it is, it requests the server for all the attributes available and saves them in a configuration file. This information can later be shifted to a database.

Enter the device address
The device address entered is correct
The device address entered is incorrect
User presses cancel button

Understanding the Open Mission Control Technologies

I also spent some time trying to understand the Open Mission Control Technologies, as this software can help me build the GUI in a much more effective manner. It would, however, require coding in Java, and I am still discussing the prospects of using it with my mentor.

Vito Gentile(ERAS Project)

Enhancement of Kinect integration in V-ERAS: First report

This is the first report about what I have done in these weeks for my GSoC project. If you don’t know what it is about and want to find more information, please refer to this page and this blog post.

I started with some documentation, by updating the old one and adding a SAD document that describes my project. You can read it at this link, in the ERAS repository.

The main part of my work during these weeks was focused on porting to Python the existing C# code to track skeletal joints with Kinect. I did it by using PyKinect, and you can find the code in the ERAS repository, under the body_tracker module. I also had the opportunity to test my code by using some recordings of a user walking on a Motivity. These recordings were made using Kinect Studio, and fortunately PyKinect can also be linked to this tool. In this way, it is possible to simulate depth and color Kinect data by using previous recordings.
I also had to reorganize the two modules related to skeletal tracking that were present in the repo:

• body_tracker_ms, which included the C# implementation (based on the Microsoft API), was removed, but the C# code is still there, under the old folder in the body_tracker module;
• body_tracker is now the only tracker module, and it includes the Python files to track the skeletal joints (tracker folder), the old C# implementation of the skeletal tracker (old folder), the only Blender-related file still included in this module (blender/blenderCoordinates.py) and the documentation.

No more C++ files are there, because they used OpenNI, and ERAS has now moved to the Microsoft API. Also the old documentation (which was related to a non-working version of the tracker) was removed.

As for the Python code, I have written mainly two scripts: tracker.py and gui.py. In tracker.py there are the classes related to Tango, and a Tracker class that can be used to manage a single tracker. The script can be executed as it is (mainly for test purposes), and it is also able to record tracked joints in a JSON file and to simulate tracking by reading them back from that JSON file (without the need for a Kinect to be plugged in). In practice, tracker.py is not executed as a standalone script, but is imported in gui.py, where the Tracker class is used. The gui.py script is the one that displays the GUI to manage up to 4 Kinects simultaneously connected and working on a single PC. It uses the aforementioned Tracker class to instantiate up to four “trackers”, each of which takes data from a single Kinect.

The very last feature that I have started to implement in these days is the estimation of the user’s height, and the publishing of it on the Tango bus. The estimation is done by calculating distances between some joints.
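Such a height estimate boils down to summing Euclidean distances along a chain of joints; a sketch with made-up joint positions (not the actual ERAS code, and the chain of joints is an assumption for illustration):

```python
import math

def joint_distance(a, b):
    """Euclidean distance between two (x, y, z) joint positions, in metres."""
    return math.dist(a, b)

# Hypothetical tracked joints, head to ankle, roughly along the vertical axis
chain = [
    (0.00, 1.70, 0.0),  # head
    (0.00, 1.45, 0.0),  # shoulder center
    (0.00, 1.00, 0.0),  # hip center
    (0.05, 0.55, 0.0),  # knee
    (0.05, 0.10, 0.0),  # ankle
]

# Height estimate: sum of segment lengths along the chain
height = sum(joint_distance(a, b) for a, b in zip(chain, chain[1:]))
```

Summing segment lengths rather than taking the head-to-ankle straight line makes the estimate robust to a bent posture, e.g. a user walking on the Motivity.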
You can see some code at this link (although it is still at an early stage), but in order to understand it better, it may be useful to take a look at the following image, which represents the positions of the skeletal joints tracked by the Microsoft Kinect SDK 1.x:

For more information about my project and the other ones supported by the Italian Mars Society and the Python Software Foundation, refer to the GSoC2015 page of the ERAS website.

Ankit Kumar(SunPy)

Python Software Foundation Phase II : Coding, Coding, Coding<br>By Ankit Kumar (JUN 05, 2015)

Status Update:

Last two weeks:
• Implemented the URL pattern and data format for the STEREO SIT instrument - a few changes to be made
• Implemented the data format for all the other instruments (committing to PR soon)
• Many changes to the code, a few of which have been accommodated, while the rest have been kept to deal with next week
• Struggled for a couple of days to read in headers along with data, but eventually gave up and read them separately
• Already made 3 PRs, about to make 3 more (1 for each instrument, different branches)

I got a bit intimidated when so many changes were requested on my PR, and when trying to read headers along with data. Also, I guess I lost a sense of direction a bit early, since I made a significant PR on the first day of coding itself, and there were ongoing changes in the SunPy platform which I hadn’t expected to affect my development. But soon I figured out what part of my development would remain unaffected and coded all of that first, so as to get some time to start developing on the changing parts of the platform.

Next two weeks:
• Implement the URL pattern for all the instruments, using the scraper PR to download files based on the URL generated, and then
• Fitting it finally into dataretriever.sources
• Complete the instrument class implementations and dataretriever.sources

I hope to be more productive in the coming two weeks than I was in the last two.
Cheers, Ankit Kumar -- Delivered by Feed43 service

Python Software Foundation Phase I: Getting Accepted!!, Community Bonding, Mailing List, Preparation, Welcome Package

So what's up, people. Long time, huh!! It seems that getting done with semesters and getting accepted for GSoC 2015, and that too under such a prestigious organization as the Python Software Foundation, has put me in quite a celebratory mood. And that's why the heading is no longer Google Summer of Code 2015 Phase IV but Python Software Foundation Phase I :D So I'll just list the things that have been going on with me over the last few weeks. Let's start with getting accepted. Well, I've already talked about it in the first paragraph, so not a lot about that, just how grateful I am to the PSF and SunPy for giving me this opportunity. I really look forward to a successful GSoC 2015 completion. Moving on to the Community Bonding period. This has been a nice phase where I talked to my mentor; we decided upon work timings (due to different time zones), work methods (we are using Trello cards) and of course IRC. I've already gone over the few pieces of code that I need to understand to start, and in fact will be making my first commit tomorrow itself (with the beginning of the coding period). One thing that characterized this period of Community Bonding was the annoying GSoC mailing list. OMG, are people seriously crazy!! :-( :-/ Like, seriously, I had to change the mailing list settings to abridged daily updates because I was getting like 10 mails every day, and that too about some really irrelevant things. But yeah, whatever. So I guess I covered everything up till the preparation part. So let's move on to the welcome package sent by Google to all accepted students. I must say that over the past few weeks I was excited about this package, and it arrived just yesterday. So FTW, it contains a Moleskine notebook, a pen-cum-pencil, and a GSoC sticker.
It may also contain your payment card, but since I live in India and the only option we have is bank transfer, my package didn't have the payment card. For others, I'm sure it would have been there. But now all is done and it's time to get some perspective. By that I mean "LESS TALK, MORE CODE", and so, signing off, there is only one thing on my mind: Let The Coding Begin

Google Summer of Code 2015 Phase III: GitHub, Patch, PR and Proposal
By Ankit Kumar (MAR 21, 2015)

Now why does the heading specially mention GitHub (it's common among developers, right!!)? As it turns out, it was actually my first time using it, and hell, it was confusing, so I had to ask my friends and seniors, and I did trouble my mentor a lot, and I am really very sorry for that!! (David, if you read this, do know I am really sorry; this is my first time using GitHub. I promise that's the first thing I am going to do after I submit my proposal.) So finally it seems that I am active now and moving forward. As the next step, I started looking for issues to fix and to make a PR. So I read over the issues on the GitHub issue tracker, and I decided to deal with issue #798. Now, while this activity is incredibly crucial for the mentors to judge the ability of us, the new contributors, I on the other hand also had a lot of fun interacting with the SunPy source code. I mean, I got to read the original code and then add features to it, and I was like, how cool is that. I did have a bit of confusion about what exactly the issue was about in the first place, but thanks to David that got sorted out. Then I headed off to interact with the particular piece of code I had to improve and fix. That led me to the source code of the parse_time function, where I found that the issue existed simply because the author of the code never meant to add support for numpy.datetime64 input, or for time-string input with time zone info in the string.
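A hedged sketch of how such support might look. This is not SunPy's actual parse_time implementation: the accepted string formats are illustrative, the numpy.datetime64 branch is checked by type name so the sketch runs without numpy installed, and the stdlib strptime with %z stands in for the dateutil parser.

```python
from datetime import datetime

def parse_time(value):
    """Illustrative time parser handling the two cases from the issue."""
    # Case 1: convert a numpy.datetime64 to datetime.datetime, then fall
    # through. (Checked by type name so the sketch does not need numpy;
    # the astype chain is the usual numpy conversion idiom.)
    if type(value).__name__ == "datetime64":
        value = value.astype("datetime64[us]").astype(datetime)
    if isinstance(value, datetime):
        return value
    # Case 2: accept strings, including ones with trailing time-zone info.
    for fmt in ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            pass
    raise ValueError("unrecognised time format: %r" % (value,))

print(parse_time("2015-06-01T12:00:00+0530"))
```

The real fix described in the post used dateutil's parse function, which accepts far more formats than this fixed list.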
So I tried to fix it one way, which would have gotten a bit complicated, but then I came up with an easier workaround: adding an if clause which handled the numpy.datetime64 input by converting it to the datetime.datetime type, after which the function handled it just like any other datetime.datetime. Moving on, the other problem was to add support for time strings with time zone info attached at the trailing end. This was solved using the parse function from the dateutil library. Fun it was, and so I ended up making a nice PR which actually failed both tests. But on being urged by David a bit, I tweaked it again, made some test cases and tried once more, and now it passes one test, although I am still not sure why it's failing the Travis CI test. But I remember that I have to commit again, avoiding code repetition and adding the test cases to the test file using pytest. So, more for later; right now I am just going to add some more commits improving my patch and then head straight into making the proposal, which btw I am completely freaked out about because it's such an important thing. For now I just hope the proposal writing goes fine and I do get accepted. I'll update this post later when I am done with my proposal and about to submit it, because then that will officially be the end of Phase III of GSoC 2015. After that, Phase IV will start, which is waiting for the results, but I think I am just going to start reading up more code and at least set up the skeleton of the code (or make some progress with it) before I go back home for the summer. But let's focus right now on the proposal that's in front of us. And to quote David, the requirements to fulfil are the following:
• Create a PR with a patch. This can be to any part of SunPy; the above would do. (By the way, it does not need to be accepted, but better if it is.)
• Create a blog with some tags/categories (python, psf, gsoc, ... you choose) so that what you write under it the PSF can grab automatically.
• Write your proposal.
To write your proposal you should try to get familiar with everything, but mostly with the part that you are going to contribute to. So, if your project involves lightcurves, it would be good to understand how they work and how we want them to work (https://github.com/sunpy/sunpy-SEP/pull/6), even if you are not going to do such a project. For that, it will be helpful if you know how sunpy.maps work too. The unifiedDownloader is going through deep changes, so keeping an eye on what they are is also good.

Google Summer of Code 2015 Phase II: Joining mailing lists, Introducing myself, Talking to mentors, waiting for replies...

Well, I am obviously not going to list down here the organizations and the projects that made it to my shortlist!! :P (for obvious reasons). I think I'll only mention the final one that I end up preparing the proposal for, but of course I don't know yet what it will be, so that's for later :P. Also, seeing that I have got a lot of work to get done, I am going to keep this blog post short... save some talking for interacting with mentors!! Ha. So, well, the first of all steps is to join the mailing lists of the development community of the respective organization and introduce yourself there, along with the specific idea that you have selected from the pool of ideas of that organization. After that it's a slight wait for a reply, but the developer community is really very helpful and welcoming and will help you get on with open-source development even if you are a beginner at it (I was!!). I found really good organizations, and the mentors were really very patient in replying to my mails and answering all my questions pretty descriptively. After that comes reading up a bit more on the resources and links shared with you by the mentors and getting a sense of the organization, and especially how your selected idea might integrate with their overall mission and code base.
In my case it took a bit of time with a few organizations, while with others it was much more rapid. Now, based on this newly gained knowledge, we have to decide whether we might be able to develop that idea, whether we are interested in it, and whether we ultimately get what the idea is. Ultimately, because you just have to get a gist of what it is, although a somewhat holistic gist, because the rest is for the time when we start preparing the proposal. (Note: it may seem that I simply had this all in my mind, but no, I had to talk to lots and lots of people: ex-GSoCers, some seniors at my college who were mentors for organizations, and mentors out there in the dev community.) And finally, after all this, you get your final organization!! Right!! Well, life ain't that straight. After doing all this, one random day (two days ago) I was just looking through the Python organizations, because I felt that if only there could be some more organizations with ideas more interesting to me and for me, and there I hit the PSF page, and I am like, "I definitely didn't see all these new organizations before". And so I sent out the mails and introductions, so now let's see what happens!! So then the whole process of Phase 2 was repeated, and guess what: the final organization that I ended up selecting is SunPy, under the Python Software Foundation. What I would especially like to mention here is the speed with which my mentor from SunPy helped me pick up the necessary bits and get started, since obviously I was a bit late. So now here I am, finally, with one single project, setting up the dev environment and using bits of it. And I guess now let's move on to Phase III of GSoC 2015. So let's get our hands dirty now and deal some blows with the SunPy codebase!!

Google Summer of Code Phase 1: Shortlisting of organisations to numbers I can deal with
By Ankit Kumar (Mar 13, 2015)
OK, so here is my first blog post for my Google Summer of Code 2015 proposal to SunPy, Python Software Foundation. So, hmm, how was my experience applying for GSoC? Hmm, let me think of which word is more intense than tiring, because woof, is it tough, man!! So I started looking for organizations right after the list of accepted organizations was posted, and my god, there were 137 organizations in total, and that's kind of a lot!! So how do I filter down to an organization that's suited for me? You know, one important thing about me is that I love to learn, and I have specific interests that I like to explore, so I needed an organization that suited my interests, or, speaking specifically, one that I would be interested in continuing to work with even if they didn't pay me at all. See, that is how I choose whether or not I will do anything. That is where I get my persistence from. And this may sound crazy, because hey, you might not be interested in anything, but I am kind of unusual on that note. I am greatly interested in technology, business, entrepreneurship, astronomy, physics, and, most importantly, programming. So, now, 137. Well, I know C, C++, Java, Python and web technologies, so how do I start? Let's rewind to when exactly I started loving programming, or when exactly it started speaking to me. I started out coding in my first year of college, when we had a programming course in C. So it was nice; I got to know a very good programming language, C. And I aced that course too, not least because I liked it a lot... it felt singular; I mean, it was not as complex as talking to people, it was simple, and I liked that. Although a lot of the time I felt that it was restrictive: I mean, I couldn't do everything that I wanted to do with it. I guess it was because it probably couldn't all be covered in a single semester, or that the course was simply an introductory course, so they didn't want to complicate it so much that others couldn't follow.
I wanted it to be able to talk to other files, read from them, write to them, and do all this seamlessly without a lot of hassle, but as it turned out, that wasn't easy at all. It almost always remained in the console. But come second year, I was introduced to an online course on Python, and I delved deeper into it. Soon enough I learned how to make GUI applications with it, read files, write to them, plot graphs, make it talk to the internet, and that was liberating; and that is the story of how I fell in love for the second time. It was similar to the adrenaline rush I got when I fell for physics in ninth grade. So there it was; yeah, I felt liberated and powerful with Python, because it enabled me. Another thing I have been particularly inclined to has been building things and then showing them off to people, showing that they worked!! Ha. So there was my decision: search for the Python tag. And now we were down to 40 organizations, and man, the real struggle starts now. So what I did was open up the ideas pages of all 40 organizations in side tabs and, hmm, over two days read up on the projects, filtering through them. Even 40 is a lot, man. So I took up a simple criterion: I would select a project, and therefore an organization, only if my current skills matched the requisite skills for that idea. I have a fair amount of experience using and developing in Python and its libraries (following from the fact that it made me feel liberated). This took a while. And I guess I ended up with some 20 organizations' ideas pages. That's nice, hah. So, moving on, I cut through the list by selecting the organizations that also coded about things that interested me.
And this was the most time-consuming process of all, because I had to read through each idea, reading it cover to cover, so to speak, then googling about it, seeing some online examples of what the organization did and what its software was used for. After a hell of a lot of time I ended up with about 8 organizations, which for me was a decent number to start talking to mentors, hanging out on IRC, introducing myself and, you know, looking at a specific idea from each ideas page. So basically that meant 8 ideas selected down from 137 organizations times an average of 7-8 ideas per ideas page, i.e. 959-1096 ideas. Nice, huh!! I had my spring break during this time, so I was a bit merrier, and I guess it took me a bit more time than it should have to get it done. But whatever happens... I am moving on to the next phase, and that's all that mattered now!! So now let the talking begin!! It was finally time for Phase 2.

Stefan Richthofer (Jython)

Establish a proper referencing paradigm (as preparation for the GC implementation)

In the last days I tidied up a lot of reference-related stuff and clarified what reference types the type-conversion methods deliver. Before starting with the actual GC work, much clean-up regarding ordinary reference counting needed to be done, especially related to singletons and pre-allocated small integers and length-one strings/unicode objects (characters). Both CPython and Jython pre-allocate such frequently used small objects, and I felt these caches should be linked to improve conversion performance. Now the integer-conversion methods check whether the integers are small pre-allocated objects and, if so, establish a permanent linkage between the CPython and Jython occurrences of the converted object. More importantly, I established some rules regarding referencing and worked out their JyNI-wide application. - JyObjects (i.e.
JyNI-specific pre-headers of native PyObjects) store a link to a corresponding jobject (as they have since the very beginning of JyNI development). It is now specified that this is always of the JNI type WeakGlobalReference. Accordingly, I re-declared the JyObject struct to use the jweak type instead of jobject. - The conversion method jobject JyNI_JythonPyObject_FromPyObject(PyObject* op) is now specified to return a JNI LocalReference. If this is obtained from the jweak stored in a JyObject, it is checked to be still alive. If so, a LocalReference is created to keep it alive until the caller is done with it. If the caller wants to store it, they must create a GlobalReference from it; see http://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#weak_global_references for more details on JNI references. The macros JyNIClearRef and JyNIToGlobalRef are now provided as efficient helpers in this context. - The conversion method PyObject* JyNI_PyObject_FromJythonPyObject(jobject jythonPyObject) now consistently always returns a *new* reference, i.e. the caller must decref it after using it. (GIL-free mode will be another story, but will most likely just ignore ref-counting, so the decref command won't harm this mode.) These rules clarify the usage of the conversion functions. Fixing all calls to these methods JyNI-wide to comply with the new semantics (having to use decref or JyNIClearRef etc.) is still in progress. However, having this referencing paradigm specified and cleaned up is a crucial initial step towards implementing a proper garbage collection in JyNI.

Brett Morris (Astropy)

Anti-Racism, Pro-Astronomy: Week 1

What's this about? I'm participating in a six-month Anti-Racism, Pro-Astronomy experiment led by Sarah Tuttle, to educate myself on matters of diversity in STEM each week, and to act on that education. For the two weeks leading up to the start date, I tried to convince myself that my first idea for an action plan would be hard, but I've decided it's worth a shot.
Before I tell you my plan, some background. At the UW, graduate students and faculty involved in our Pre-MAP program hold meetings called Diversity Journal Club. It's like an astro-ph journal club for matters of diversity in STEM, though the requirements to present recent papers are relaxed. One or two speakers laboriously hunt through social science literature and pop-sci articles looking for sources to cite and to build presentations around. They prepare a presentation, and end on discussion questions for the audience, often leaving room for people to discuss personal experiences or how to take positive action. In my time at UW, the present Diversity Journal Club owes its organization and sources of topics to a big group of people including Sarah Garner, Russell Deitrick, Nell Byler, Eddie Schwieterman, Michael Tremmel, Erin Hicks, and Christine Edgar, among others. I have gained much more actionable knowledge at these Diversity Journal Clubs (DJCs) than I ever have from an astro-ph journal club, and I think they hold an important place in the discussion about how to combat racism in astronomy. DJCs give us a common vocabulary that may help to ease the flow of conversation in our office or at home about race, which is so often talked about as a "hot-button issue" that we need to actively remind each other that it's a personal matter for nearly half of the members of our community. DJCs create a venue for unaware perpetrators of microaggressions to (potentially) recognize their actions. DJCs confront scientists in their own language and challenge them to draw on statistically quantified results that tell us what we're doing right or wrong. I think we could all use more of that kind of active introspection. What's the plan? I've always been a tremendous fan of astrobites for summarizing nearly impenetrable journal articles, making it possible for undergraduates to find accessible introductions to the papers they are nearly ready to read.
As an undergraduate, I found that astrobites articles met me half-way in my half-English half-astronomer language which gave me enough context and background to then read the full journal article without stopping to Google. I want to start a blog in the spirit of astrobites dedicated to finding primary sources that can act as the seeds of Diversity Journal Club discussions. These can include peer-reviewed journal articles or primary-source personal accounts by under-represented minorities in any field which speakers feel can help teach majority astronomers about the experience of the non-majority. Each post will also include a brief description of the paper and its results or major take-aways, and a handful of questions that speakers can use to start discussions after their presentation. My primary goal is to lower the barrier of entry so that interested astronomers at other institutions have no excuse not to host a Diversity Journal Club with borrowed momentum. Need a source that describes a Hawaiian perspective on TMT? We'll have that. Need some pointed questions to get your colleagues challenging their views on the issue? We'll work on those. At first, I will source these posts from the Diversity Journal Club presentations that have already happened at UW by collecting sources from my peers. I also want to collect information on how they found their sources and collect those resources in one place. The posts will not be presentations in a box, but they will point you in a few directions to base a presentation on. This week I've done a bit of research on options for multi-author blogging and on how Diversity Journal Club runs here. Next week, I plan to tap every past DJC speaker I can find to chat about their resources and how they come upon them, and take input from peers and other Anti-Racism, Pro-Astronomy participants on how best to do this blog. 
If you have good resources (or even know of another blog that does this and makes my idea unnecessary), let me know in the comments or on Twitter.

Daniil Pakhomov (Scikit-image)

Google Summer of Code: reading the OpenCV trained file. Building the detector.

A post about how I read the OpenCV trained file and implemented a prototype of the detector.

Example of a face evaluation by the detector. This is an example of what stages and respective weak classifiers are stored in the OpenCV trained file for MB-LBP. There are 20 stages overall, with increasing accuracy. The cascade structure speeds up the process by orders of magnitude, because non-face patches are rejected in the first stages of the classifier.

Sliding window and scale example. I also implemented a naive sliding-window and scale search. Because the classifier is trained only for a 24x24 window size, we have to scale the image and also slide the window to find a face. To fire, the face should fit into the 24x24 window. This is an example: I started with a (118, 118) image with a face. At this scale the classifier didn't detect any faces, which is correct. Then the program scaled the image by a ratio of 0.5 (size (59, 59)) and produced one detection. Actually it's only a part of the face, and we don't want detections like this, but that can be overcome by using a pruning technique. Then the image was downscaled once again, and this is where the face was found. Once again, this large number of detections of the same face can be removed by pruning.

The code. This is temporary code whose only purpose was to make sure that the classifier works. It will be optimized in the following stages.

Sumith (SymPy)

GSoC Progress - Week 1

Hi again, this post contains the first report of my GSoC progress, even though it's week 2. Progress: I have completed the UnivariatePolynomial implementation in PR 454; the PR is reviewed and merged. This SymEngine class can handle univariate polynomials and all the basic polynomial manipulation.
The current functionality of UnivariatePolynomial is:
* two constructors of the UnivariatePolynomial class, one using a dict of degree to coefficient and the other using a dense vector of coefficients. Note that this implementation is sparse.
* printing, with the same output pattern as that of SymPy
* from_dict, which returns the appropriate Basic type on passing the dict
* dict_add_term, to add a new term to the dict
* max_coef(), diff(), eval(), as the names suggest
* some bool check functions for specific cases, like is_zero(), is_one(), etc.
* also the add, sub, neg and mul functions.

What I learnt here was that having a testing environment set up first speeds up the process of implementation, and things go in the right direction.

Report: The UnivariatePolynomial uses std::map; I plan to switch to std::unordered_map or other specialized data structures before benchmarking the class and comparing speeds, so that we get a decent speed. The multivariate class, yet to be implemented, will be called Polynomial. Note that these two classes are high level, because they can take part in SymPy expressions. The plan is to implement lower-level classes with various data structures, as well as using Piranha. These lower-level classes do not use RCP at all, so they could be faster for some applications. The user could then call specialized classes if needed for a given application (if we implement any).

Targets for Week 2 and Week 3: The first aim is to use the already implemented polynomials in rings in SymEngine, look at the expand2b benchmark and try to speed it up by:
* making use of Piranha's integer
* using Kronecker packing for exponents

If we get satisfactory speed, we wrap it in the Polynomial class. This can further be optimized using our very own class Integer, wherein we switch between int and mpz_class automatically (we should use it everywhere in SymEngine instead of mpz_class), and a hashmap in the future.
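The sparse dict-of-degree-to-coefficient representation described above can be sketched in plain Python. This is a toy illustration of the data structure, not SymEngine's C++ implementation; the function names are invented for the example.

```python
def poly_add(p, q):
    """Add two sparse polynomials given as {degree: coefficient} dicts."""
    out = dict(p)
    for deg, c in q.items():
        out[deg] = out.get(deg, 0) + c
    return {d: c for d, c in out.items() if c != 0}  # drop cancelled terms

def poly_mul(p, q):
    """Multiply term by term; degrees add, coefficients multiply."""
    out = {}
    for d1, c1 in p.items():
        for d2, c2 in q.items():
            out[d1 + d2] = out.get(d1 + d2, 0) + c1 * c2
    return out

def poly_eval(p, x):
    return sum(c * x ** d for d, c in p.items())

def poly_diff(p):
    return {d - 1: d * c for d, c in p.items() if d > 0}

# x**2 + 2*x + 1, i.e. (x + 1)**2
p = {2: 1, 1: 2, 0: 1}
print(poly_eval(p, 3))                        # 16
print(poly_mul({1: 1, 0: 1}, {1: 1, 0: 1}))  # {2: 1, 1: 2, 0: 1}
```

Because only nonzero terms are stored, a sparse polynomial like x**1000 + 1 takes two entries rather than a length-1001 coefficient vector, which is exactly the trade-off the sparse design targets.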
Tasks:
* have an option for Piranha in cmake
* code for packing exponents into a machine int64
* try to use Piranha's integer to see how it performs

If time permits:
* implement a faster hashmap this weekend with Shivam

That's all for now. Catch you next week. Adiós

Shivam Vats (SymPy)

GSoC Week 2

The 2nd week is now coming to an end, and by now I have a pretty good idea about how things often don't work the way we want them to. So far I trimmed down PR 9262 to remove all the parts not yet ready. It is undergoing review and should get merged soon. Last week I had planned to complete the work on Laurent series by now, but it is still a work in progress. Fortunately, while discussing it with my mentors, we realized that handling Puiseux series is not as difficult as I had initially thought. Internally, all polynomials are stored in the form of a dictionary, with a tuple of exponents as the key. So x + x*y + x**2*y**3 is stored as {(1, 0): 1, (1, 1): 1, (2, 3): 1}. For Puiseux series I need to be able to have rational exponents as keys in the dict. This isn't an issue in Python 3, as it evaluates 1/4 to 0.25 and then uses the decimal value as the key. It doesn't work in Python 2, as it evaluates 1/4 to 0, and hence all fractions less than 1 become 0. The solution is to use SymPy's Rational data type, which lets us use the exact fraction as a key. This means that, hopefully, I will not need to make any complex changes in the code of ring. I still have a few days to go in this week, during which I will further explore how to make the required changes. There hasn't been much progress on the hashtable, as both Sumith and I have been busy with our PRs. Hopefully we will look into it during the weekend.

Next Week:
• Make ring_series work with Puiseux series
• Write an interface to better handle the series, especially so that it works with the rest of SymPy.

Cheers!
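The exponent-key issue described above can be demonstrated with the stdlib fractions.Fraction, which plays the same role here that SymPy's Rational plays in the real fix: an exact, hashable rational that works as a dict key on both Python 2 and 3.

```python
from fractions import Fraction

# In Python 3, 1/4 evaluates to the float 0.25; in Python 2 it truncates
# to 0, collapsing every exponent below 1 onto the same key. An exact
# rational key avoids both problems. The series below is illustrative:
# x**(1/4) + 3*x**(1/2) + 2*x**(3/4), keyed by exponent tuples as in ring.
series = {
    (Fraction(1, 4),): 1,
    (Fraction(1, 2),): 3,
    (Fraction(3, 4),): 2,
}

# Keys stay exact, distinct, and arithmetic on them is exact too:
assert (Fraction(1, 4),) in series
assert Fraction(1, 4) + Fraction(1, 4) == Fraction(1, 2)
print(sorted(k[0] for k in series))
```

Multiplying two terms then just adds their Fraction exponents, with no rounding and no Python 2 truncation.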
Yue Liu (pwntools)

GSoC 2015 Students coding Week 02 week sync 06

Last week:
• Using capstone's Instruction.groups to filter the opcode.
• Tried to support the call/blx instruction.
• Rewrote the setRegisters() function, but not yet completed.
• Fixed some bugs.
• Tried to solve the armpwn challenge using the new ROP module.

Next week:
• Bug fixes and performance optimization.
• Do more examples to find more problems.
• Do a shellcode (including ROP payloads) encoder.

Aron Barreira Bordin (Kivy)

Report 1 - Bonding period and two weeks of code

Hi! I started to work with Kivy Designer two weeks ago. I made good progress with my proposal, and coded and learned a lot with it :)

Bonding period
• I studied a lot about Kivy, reading the documentation and fixing some small bugs
• Added initial support for Python 3
• Fixed Kivy Designer code style
• Tried to help users on #kivy (I'm not yet experienced enough to support some users and am usually busy, but when possible I check the IRC to see if there is something that I can do)
• While studying Kivy Designer, I found a lot of bugs and had some ideas for new improvements; everything is listed here

I have coded...

Buildozer integration: Kivy Designer is now integrated with Buildozer. Now it's possible to build and run your Python application on desktop, Android and iOS devices! To handle these multiple targets and options, I created a "Build Profile" settings system.

Build profiles: There are three default profiles:
• Desktop
• Android - Buildozer
• iOS - Buildozer

The user is able to edit these profiles, create new ones or even delete them. With build profiles I hope to make multi-platform development easier. Now it's just necessary to change the profile to build the same application for the desired target :)

buildozer.spec editor: Buildozer requires a .spec file (INI format) to read the project settings. You can check the default spec here. I implemented a GUI to edit this file.
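Since the .spec file is plain INI, a GUI editor like the one described can load it with Python's stdlib configparser. A minimal sketch; the excerpt below is in the spirit of a buildozer.spec, but treat the exact option names as illustrative rather than authoritative.

```python
import configparser
from io import StringIO

# A tiny excerpt in the style of a buildozer.spec [app] section.
SPEC = """\
[app]
title = My Kivy App
package.name = myapp
package.domain = org.example
"""

config = configparser.ConfigParser()
config.read_file(StringIO(SPEC))   # a real editor would use config.read(path)
print(config["app"]["title"])         # My Kivy App
print(config["app"]["package.name"])  # myapp

# Writing edited values back is the inverse operation:
config["app"]["title"] = "Renamed App"
```

An editor GUI then only needs to map each section/option pair to a widget and call config.write(f) to save.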
And to make some properties easier to edit, when possible, all allowed values are already provided.

My progress: The major part of my code is waiting for review, but I made good progress on my proposal, so I'm OK with my schedule. That's it, thanks for reading :) Aron Bordin.

June 04, 2015

Ziye Fan (Theano)

GSoC started! [GSoC 2015 Week 1]

Theano is a symbolic expression calculator. Users write expressions and functions in symbolic form and then compile them to make them executable, so that they can do real calculations with them, feed data in and get output. As I mentioned in my last post, my task here is to make Theano compile faster. So the very first step is profiling. During the community bonding period, I profiled an autoencoder model, which is very simple, and a more complicated model provided by my mentor. The first interesting thing I found was that the sum of the times in the profiling output is shorter than the time the compilation actually takes. I plan to do some more profiling to find out the reason and to make it covered by the Theano profiling module, but not now. In bigger models, optimization is the most time-consuming procedure during compilation. Optimizations are introduced to make the calculation more efficient, by changing the function graph without side effects. (http://deeplearning.net/software/theano/tutorial/symbolic_graphs.html#optimizations) Optimizations are useful, but they sometimes take too much time to apply. So here is an optimizations to-do list. As I'm not very familiar with the compilation procedure yet, I didn't propose any optimization, but I will when I can. The items listed now were added by my mentor and other developers, thank you. This week I did the first optimization, which is removing an unnecessary operator chain of gpu_from_host and host_from_gpu. Theano can calculate on the GPU; when optimizing, Theano tries to move operators and data onto the GPU, and gpu_from_host is introduced to do that.
Things like gpu_from_host(host_from_gpu(x_whatever_already_in_gpu)) can be generated; although they can be replaced with x_whatever_already_in_gpu in the subsequently applied optimizer "gpu_cut_transfer", it takes time. Here I added some checks to make sure gpu_from_host(host_from_gpu(x)) is not introduced when moving data to the GPU. The pull request is here. I'm thinking about a possible optimization similar to this one, but I still need more experience and experiments. It was really interesting to read Theano's code.

Richard Plangger (PyPy)

PyPy unrolling trace loops

Note: please see my last post, "GSoC the first two weeks", where I already outlined the big picture of the final optimization. In many places I assumed certain conditions and added simplifications to more easily explain the concept. In this and the next posts I will present much more detail about all the different parts that I have added to RPython. Note that I might recapitulate some basics of the terminology to make it more understandable for those who are not familiar with tracing or method JIT compilers.

RPython's optimizations: The basic data structure is a list of operations called a "trace". A trace has no entry points other than its header instruction. In addition, it may only be exited at a special instruction called a "guard" instruction. Note that a trace need not necessarily have a backwards jump, but in the following only trace loops are considered. Here is an example of a trace. The red edge shows what is not allowed to happen: jumps are only valid to the label instruction. The optimizer takes this list of operations and transforms it into an equivalent trace. Many of the optimizations only pass through the list of operations once, gather information, and emit, transform or leave out the current instruction. Optimizations that are done in other compilers are easier to implement here, because it is a single basic block instead of a region or control flow graph.
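A single forward pass of this style can be sketched in a few lines of Python, here rewriting the redundant gpu_from_host(host_from_gpu(x)) round-trips mentioned in the Theano section above. The three-tuple operation format is invented for the sketch; neither Theano nor RPython represents operations this way.

```python
def remove_roundtrips(trace):
    """One forward pass over (result, opname, arg) operations, replacing
    gpu_from_host(host_from_gpu(x)) with x and dropping the dead transfer."""
    producers = {}   # result name -> (opname, arg) that produced it
    rename = {}      # result name -> replacement name
    out = []
    for res, op, arg in trace:
        arg = rename.get(arg, arg)           # apply earlier rewrites
        inner = producers.get(arg)
        if op == "gpu_from_host" and inner and inner[0] == "host_from_gpu":
            rename[res] = inner[1]           # forward the original GPU value
            continue                         # drop the redundant transfer
        producers[res] = (op, arg)
        out.append((res, op, arg))
    return out

trace = [
    ("h1", "host_from_gpu", "g0"),
    ("g1", "gpu_from_host", "h1"),   # round-trip: same value as g0
    ("g2", "gpu_add", "g1"),
]
print(remove_roundtrips(trace))
# [('h1', 'host_from_gpu', 'g0'), ('g2', 'gpu_add', 'g0')]
```

Because the trace is one straight-line block, a single dict of producers is enough; no control-flow analysis is needed, which is exactly the point made above.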
There is one optimization (one of the most powerful) that is different, though: the unrolling optimization.

Trace loop unrolling

The proposed algorithm (henceforth called VecOpt) needs a way to unroll the loop body several times. So why not reuse the unrolling optimization already present in PyPy? I tried to reuse it (and did, in a certain way), but the unrolling itself I have duplicated, for two reasons:

• Unrolling does not only unroll the trace loop; it also gathers information about the trace and applies many more optimizations than just peeling one loop iteration. This includes guard strength reduction, guard implication, loop invariant code motion, and more.

• The unrolling factor is not easily adjustable. It has the sole purpose of peeling the loop body just once.

The only requirement VecOpt has is a way to unroll the loop X times and correctly rename the variables. This is implemented in fewer than 200 lines of code and is specifically tailored to VecOpt. Renaming a loop iteration needs a mapping (a dict) from the jump arguments to the label arguments, and it must instantiate new variables for operations that create variables. In each unrolling iteration the mapping is used to rename the arguments of operations. The idea is also described in the loop unrolling optimization article here (chapter 5).

Benefit

VecOpt benefits from the unrolling optimization. Without it, the chances are very low that a vectorizable loop can be found and transformed into a faster trace loop. There are two reasons for that:

• The peeled loop in most cases contains significantly fewer guards than the original trace.

• Most of the loop overhead has been removed.

Guards increase the dependency edge count in a trace. In many cases they make two instructions dependent even though, without the guard, they could in theory be executed in parallel. And if loop overhead is not removed, the gain from using SIMD instructions might not be that significant.
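The unrolling-with-renaming step described above can be sketched in plain Python (a toy model with invented names, not RPython's implementation): a dict maps each label argument to the corresponding jump argument of the previous iteration, and every result variable gets a fresh name.

```python
def unroll(label_args, jump_args, body, factor):
    """Toy trace unrolling. Operations are (opname, result, [args]) tuples."""
    counter = [0]

    def fresh(var):
        counter[0] += 1
        return "%s_%d" % (var, counter[0])

    out = list(body)
    cur_jump = list(jump_args)
    for _ in range(factor):
        # Map the label arguments of the peeled iteration to the jump
        # arguments of the previous iteration.
        mapping = dict(zip(label_args, cur_jump))
        for opname, result, args in body:
            new_args = [mapping.get(a, a) for a in args]
            new_result = fresh(result)   # ops that create variables get new ones
            mapping[result] = new_result
            out.append((opname, new_result, new_args))
        cur_jump = [mapping.get(a, a) for a in jump_args]
    return out
```

Unrolling `[("int_add", "i1", ["i0", "c1"])]` once with label argument `i0` and jump argument `i1` appends a renamed copy whose input is the previous iteration's result.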
Many loops, after the unrolling optimization has been applied, contain less than half of the original instructions.

Summary

VecOpt is applied to the peeled loop that the unrolling optimization outputs. This smaller loop is then unrolled X times, where X is determined by the smallest type that is loaded in the trace loop. Without the unrolling already done by PyPy there would be little chance to find parallel instructions and group them into vector statements. In a nutshell: VecOpt unrolls an already unrolled loop.

GSoC the first two weeks

The community bonding phase ended almost two weeks ago and I have been very busy working on my proposal. My situation might be quite different from other GSoC participants': I started my proposal in early February and also wrote code at that time, so I did not have a rough start with the project setup.

The major thing I have been working on is ironing out some problems in the dependency construction. I use a dependency graph to reschedule the PyPy trace, and there were some corner cases I did not think of (trace operations with side effects, and guards). Missing edges are fatal, because instructions might end up in the wrong order. In addition, I designed and added a cost model to decide whether or not it is reasonable to vectorize a trace loop. This prevents the optimizing backend from spending time on traces that will not be faster. I already compared the idea to classical vectorization in my last post. The biggest problem is that without a cost model, non-vectorizable loops can sometimes end up with a single vector load statement that later unpacks the vector elements. This is not desirable, and the cost model prevents it from happening. Looking at the schedule in my proposal, I'm quite far along already and am working on things I proposed to do in the second term of GSoC.
This is great, because there is lots of time to fine-tune, and I have already discussed with my mentors and project members where we could improve even further. My next post explains the setup I started working on back in February/March and shows how I unroll loops.

Prakhar Joshi(Plone)

Plone add-on setup with registering and deregistering the add-on

For my project I have been working on an add-on package, and the first task was to set that package up so we could start working on it. So we created an add-on package named experimental.safe_html_transform. We have to set up the testing environment, the GenericSetup profiles (default and uninstall), the browser layer, and the filter control panel for the browser layer. These are a few of the essential things that have to be configured when setting up an add-on, and they are part of the project. In this blog post we will discuss how we set these things up and what I referred to while doing so. So let's start...

To set up the testing environment we have to add plone.app.testing to the setup files, so that we can create automated tests, including robot tests, for the add-on. Here is the code to be added in setup.py:

extras_require={
    'test': [
        'plone.app.testing',
        'plone.app.contenttypes',
        'plone.app.robotframework[debug]',
    ],
},

This adds the three packages required for setting up the testing environment. After that we set up the base module for unit testing in testing.py. Then we have our test module, in which we write both robot tests and unit tests. Robot tests exercise the Plone site on a robot server and check whether it is working, while unit tests check the functionality of our add-on.
We will also write some integration tests for the setup, e.g. testing that the product is installed correctly, that the browser layer is set up correctly, etc. After that we are left with writing unit tests that check the transform script and its accuracy with as many test cases as possible. This is the basic overview of setting up the testing environment for the add-on. The testing documentation is very good; it helped me a lot in understanding the flow of setting up the testing module and how it works. To run the tests we just invoke "./bin/test --all" (for all tests, including robot and unit tests) and "./bin/code-analysis" to check the code with flake8.

After setting up the testing environment we do the generic setup of profiles for our add-on. What is GenericSetup? GenericSetup is an XML-based way to import and export Plone site configurations. It is mainly used to prepare the Plone site for add-on products, by registering CSS files, registering JavaScript files, etc. So we also configure GenericSetup for the profiles of our add-on. We register the "default" and "uninstall" profiles in the configure.zcml (Zope configuration) file. Here is the XML snippet for the registration:

<genericsetup:registerProfile
    name="default"
    title="experimental.safe_html_transform"
    directory="profiles/default"
    description="Installs the experimental.safe_html_transform add-on."
    provides="Products.GenericSetup.interfaces.EXTENSION"
    />

<genericsetup:registerProfile
    name="uninstall"
    title="experimental.safe_html_transform uninstall"
    directory="profiles/uninstall"
    description="Uninstalls the experimental.safe_html_transform package"
    provides="Products.GenericSetup.interfaces.EXTENSION"
    />

Now that we have registered the profiles in the generic way, we have to create the two profiles themselves: default and uninstall.
The default profile contains things that have to be done when the add-on is installed, and the uninstall profile contains things that have to be done when our add-on is uninstalled. First let's create the default profile. In it we set up the browser layer, the control panel, the metadata, and the registry; note that all of these files are .xml, so here too we use XML to set things up for a particular profile. You can have a look at the profile setup of my add-on. Similarly we create an uninstall profile; for creating uninstall profiles in Plone there is a great blog post written by keul. It is a good way to understand how profile uninstallation works, and it helped me a lot; with its help I created an uninstall profile for the add-on.

So far we have created the testing environment, the GenericSetup registration for the add-on, and the two profiles (default and uninstall). We are still left with the browser layer and the filter control panel for it. So let's do it...

Next we set up the browser layer, through which the filter control panel will be registered. We create a browser layer interface, a marker that makes our add-on's views active only when the add-on is installed; this is done in interfaces.py, where all the interfaces live. Here is the snippet:

class IExperimentalSafeHtmlTransformLayer(IDefaultBrowserLayer):
    """Marker interface that defines a browser layer."""

This creates a browser layer for the add-on. This is really a nice thing ;). Now that the browser layer is set up, we set up the filter control panel for the add-on through GenericSetup in the configure.zcml file inside the browser module.
Here is the snippet of the code:

<!-- Filter Control Panel -->
<browser:page
    name="filter-controlpanel"
    for=".controlpanel.IPloneSiteRoot"
    layer="..interfaces.IExperimentalSafeHtmlTransformLayer"
    class=".controlpanel.FilterControlPanel"
    permission="plone.app.controlpanel.Filtering"
    />

This registers the filter control panel page for the browser layer. Next we configure the control panel that has just been set up; this is done in the controlpanel.py file under the browser module. You can check the code on GitHub. The control panel is set up through the API only, and this way we have both set up the control panel for the browser layer and configured it.

With this we have set up the basic requirements of the add-on. This is just the start; the main focus from now on will be on writing the transform for filtering HTML using lxml. That will be covered in the next blog post. Hope it helps you and you enjoyed reading it. It is just a start: a lot more yet to come, a lot more yet to learn, and a lot more yet to develop. Cheers!! Happy coding :)

June 03, 2015

Palash Ahuja(pgmpy)

Hidden Markov Models

In the previous week, I described what a dynamic Bayesian network is, and I wrote the basic framework that a dynamic Bayesian network encompasses. Currently I am planning the hidden Markov model framework. The functionality of the hidden Markov model will be similar to that of MATLAB's.

1st week report

I have finished coding the dynamic Bayesian network, with methods for unrolling and removing time slices.

What are hidden Markov models (HMMs)? They are basically dynamic Bayesian networks on a smaller scale, which makes them less complex and more efficient. The example above consists of the nodes Q1 and Q2, which are known as hidden nodes, while the nodes Y1 and Y2 are known as observed nodes. The hidden states are represented by discrete random variables.
Again, this is a 2-TBN, similar to the dynamic Bayesian networks. The probabilities of the nodes are given by the formula

$P(Z_t|Z_{t-1}) = \prod_{i = 1}^{N} P(Z_t^{(i)} | Pa(Z_t^{(i)}))$

where $Z_t^{(i)}$ is the $i^{th}$ node in the time slice $t$ (with $N = N_h + N_o$), and $Pa(Z_t^{(i)})$ are the parents of the $Z_t^{(i)}$ node, which may be in the same time slice or the previous one. The above figure shows a 2-TBN. The probabilities can be given as follows:

$P(Q_{1:2}, Y_{1:2}) = P(Q_1) P(Y_1|Q_1) \times P(Q_2|Q_1) P(Y_2|Q_2)$

The general equation becomes:

$P(Q_{1:T}, Y_{1:T}) = P(Q_1)P(Y_1|Q_1) \times \prod_{t=2}^{T} P(Q_t|Q_{t-1}) P(Y_t|Q_t)$

There are more sophisticated standard models, such as coupled HMMs, which look like this. In the above chain there are several hidden variables interacting with each other, along with the observed variables $Y_1, Y_2$ and $Y_3$. The chains interact directly with each other, affecting the adjacent chains, so there might be a possibility of $X_2'$ interacting with the 1st chain if unrolled further. Coupled HMMs are used, for example, in detecting temperatures for fire alarms, and help in explicitly encoding the relations between the chains.

Another application the HMM helps with is finding the most probable sequence of states over a series of time slices. The most probable state sequence is found by the popular Viterbi algorithm. The states are found as follows:

\begin{array}{rcl} V_{1,k} &=& \mathrm{P}\big( y_1 \ | \ k \big) \cdot \pi_k \\ V_{t,k} &=& \max_{x \in S} \left( \mathrm{P}\big( y_t \ | \ k \big) \cdot a_{x,k} \cdot V_{t-1,x}\right) \end{array}

Basically, at every step the maximum probable predecessor state is chosen, and the probability of the state is computed using the chain rule. The algorithm is remarkably similar to the belief propagation used in Bayesian models.
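The Viterbi recursion above can be sketched in a few lines of plain Python (a toy illustration with dict-based tables, not pgmpy's API):

```python
def viterbi(states, init, trans, emit, obs):
    """Most probable state sequence for observations obs.
    Implements V[1,k] = P(y1|k)*pi_k and
    V[t,k] = max_x P(yt|k) * a[x,k] * V[t-1,x], then backtracks."""
    V = [{k: emit[k][obs[0]] * init[k] for k in states}]
    back = []
    for y in obs[1:]:
        prev = V[-1]
        cur, ptr = {}, {}
        for k in states:
            # Best predecessor state x for landing in k at this step.
            x_best = max(states, key=lambda x: prev[x] * trans[x][k])
            ptr[k] = x_best
            cur[k] = emit[k][y] * trans[x_best][k] * prev[x_best]
        V.append(cur)
        back.append(ptr)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

With the classic healthy/fever toy model (states Healthy and Fever, observations normal, cold, dizzy), this returns the sequence Healthy, Healthy, Fever.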
These algorithms make hidden Markov models remarkably significant, with serious applications to speech recognition and convolutional codes. This is what I am planning to implement within the next 2 weeks. Then I will start with inference in dynamic Bayesian networks.

Himanshu Mishra(NetworkX)

GSoC '15 Progress : First report

We are past the community bonding period and in the second week of the coding period. My experience has been very good till now! I started working on building Python wrappers around the METIS library, which is written in C, for graph partitioning. It is extremely fast, and the feature was not present in NetworkX because the problem is categorized as NP-hard: it needs a lot of approximations, and Python isn't good at raw speed either. A big part of this work was already done by my mentor Yingchong Situ last year. However, there were some hiccups: METIS is licensed under the Apache License Version 2.0, but it uses some code licensed under the GNU Lesser General Public License, while NetworkX has a BSD license. The work is supposed to be hosted under the NetworkX umbrella and shipped as an add-on named networkx-metis, so a big question was what the appropriate license for this add-on should be. We had pretty interesting discussions over it with Aric, Dan, and other NetworkX developers, and finally decided to remove the LGPL dependencies from the source code and go with Apache. This required some changes: I had to replace a couple of C files that used qsort with a C++ file using std::sort. In the process I learned about extern "C", which is a way to export the correct ABI to the library using it: extern "C" gives a function name in C++ 'C' linkage, so that a C client can use the function through a C-compatible header file containing just the declaration of the function. I also learned how useful C macros are when writing functions over undetermined data types.
I got my first PR merged! Next up are the wrappers. After that and the setup requirements, I think the add-on will be ready to go. That's all for now. Happy coding!

June 02, 2015

Christof Angermueller(Theano)

GSoC week one

In the first week of GSoC, I a) learned more about JavaScript and the d3.js library, b) initialized my Theano repository, and c) started to implement the module d3printing, which provides the function d3print to convert a static Theano graph (such as the one shown) into a dynamic graph. You can find the source code here. Nodes are arranged in a force layout, and it is possible to pan and zoom, to drag and drop nodes, and to highlight edge information via mouse-over events. The visualization is far from perfect! Next week, I will

• revise the layout to arrange nodes,
• use the same colors and shapes for nodes as pydotprint,
• revise the visualization of edges,
• improve mouse-over events, and
• use the full page width and height to visualize the graph.

Let's go! The post GSoC week one appeared first on Christof Angermueller.

Sartaj Singh(SymPy)

GSoC: Update Week-1

So, the first week of the coding period is over, and I successfully wrote a bunch of lines this week.

Week 1: In my proposal I had planned to implement sequences and binary operations like addition and multiplication in the first two weeks of the coding period. I was able to complete the main tasks this week. I have made a PR (#9435) which is now under review by my mentors Jim Crist and Sean Vig.

This Week:

• Polish the PR and get it merged.
• Start with the implementation of series-based classes (SeriesData, SeriesX). The implementation details will have to be discussed with my mentors.

That's about it. Will catch up by the end of the week.

Tarashish Mishra(ScrapingHub)

On Summer of Code

Bit of a late news, but yes, I am participating in Google Summer of Code 2015 under the Python Software Foundation. I am working on Splash, developed by Scrapinghub.
Splash is a lightweight, scriptable web browser with an HTTP API, written on top of Qt and Twisted. Currently it supports Python 2.7+ only and runs on Qt4/PyQt4. My goal is to add Python 3 support and make Splash use Qt5/PyQt5. So far the first week of the development phase has passed, and Splash is almost ready to run on Qt5/PyQt5. We are using qt5reactor, pulled out of the Aether project, for integrating the Twisted and Qt5 event loops. So far the transition from Qt4/PyQt4 to Qt5/PyQt5 has been rather seamless. I'll work on completing the Qt5/PyQt5 transition and porting some tests to Python 3 next week. That's all I have to share for now. Thanks for reading. Here's to a fruitful summer. Ciao.

Luca Puggini(statsmodels)

End of week 1

During week 1 we modified the patsy library in order to be able to extract spline derivatives from the R-style formula. There is still a lot of work to do here, but at least we have something that we can use. We also wrote the code for the GAM penalization function. The plan now is to integrate it with the general framework in order to have the full cost function for all the families of distributions.

June 01, 2015

Nikolay Mayorov(SciPy)

GSoC: Review of the first week

As nothing exciting is done yet, I will just briefly review the current situation and perspectives.

1) It turned out that there is no reasonable implementation of numerical differentiation in scipy, and I will certainly need one, so I started to implement it. It might seem like quite a lot of code, but in fact it is a very basic finite difference derivative estimation for vector-valued functions. The features are:

1. Classical forward and central finite difference schemes, plus a 3-point forward scheme with second order accuracy (as a fallback for the central difference near the boundaries, see below).
2. The capability of automatic step selection, which will be good in a lot of cases.
3. The ability to specify bounds of a rectangular region where the function is defined.
The step and/or finite difference scheme can be automatically adjusted to fit into the bounds. Some really good suggestions appeared recently, so perhaps it will take another couple of days to incorporate them and merge.

2) According to the plan, this week should have been devoted to adding some benchmark problems for least squares methods. Well, it didn't take a lot of time. I added about 10 problems from the MINPACK-2 problem set (all unconstrained) and a very simple benchmarking class (for the ASV framework). Then Evgeny Burovski (one of my mentors) merged it unexpectedly fast. I guess it's fine since it's not public. I will certainly come back to these benchmarks; meanwhile I have some basic problems to work with.

3) I implemented a draft version of the optimization solver I was planning to work on (here is the link with a description). It turned out it doesn't work for least squares problems: it was originally designed to solve nonlinear systems of equations (by minimizing the squared norm of a residual), so I naively thought that it could be trivially generalized to least squares. You need to be careful with math :). But things are looking good. At the moment I'm strongly considering implementing a method based on a rectangular trust-region approach. The idea is simple: if we intersect such a trust region with a rectangular feasible region, we again get a rectangular region. So all we need is to solve quadratic subproblems subject to bound constraints. The simplest method of doing that is again a dogleg approach (maybe with a little tweak), and it is very well suited for large-scale problems. Here is the link to a detailed description of the method (very likely I will describe it myself in one of the future posts). So far a draft implementation of this "dogbox" method has been able to successfully solve all problems from the MINPACK-2 problem set and a few constrained problems (many more are needed for sure).
In the coming days I will continue to investigate its properties. I think that's all for now. Sorry for this post being not very well thought out; I hope to come back with more interesting updates.

Himanshu Mishra(NetworkX)

Just a blog : A quick guide to git and GitHub (Part 2)

So, now we are going to take a step further. In this post we'll deal with one of the most important concepts of distributed collaboration, branching. We will also learn about contributing to real-world open source libraries. Several concepts are involved; I'll try to deal with the most relevant ones, and this time I'll try to explain things in a better way.

Branching: There's a mantra, branch early and branch often. Branches are used to develop new features simultaneously but in isolation. Suppose you are at one point in your commit history. A new idea has come to your mind about your project and you want to implement it. Let's name the new feature feature_x. Now why do we need branching? Why not implement it right away? What's up with the master? What is that? master is the branch your project works on when you initialize a git repository; it's the mainstream by default. Now suppose you start adding your new feature right on master, and at some point you realize you want to implement another new feature, feature_y. Obviously you'll have to stop one piece of work to proceed with the other. This new feature might have bugs, or you might need to add tests. You might also want it to be reviewed, and you'll have to wait for the review. At some point you might come to know that this new feature is totally screwed up and you want to go back to where you started, but you'll realize that you've also made some commits on the other feature you were working on simultaneously. At some point, you could give up on your project! And that would be unfortunate, because you forgot the mantra: Branch! Branch! and Branch!

Okay, back in time. We have a new feature, feature_x.
You do this:

$ git branch feature_x
$ git checkout feature_x

The first command creates a new branch feature_x, and the second command takes you away from master. This means that your next commit will not affect your master. Cheers! You've got freedom. That's what git is for. The above two commands can also be combined into one:

$ git checkout -b feature_x
This creates the branch and moves you to it simultaneously.

You can check which branch you are currently on by
$ git branch
* feature_x
  master

* denotes that you are on feature_x. But this branch won't be visible on GitHub until you push it by

$ git push origin feature_x
Okay now, it's add, commit and push time! The commit you made has taken your feature_x branch one step ahead of master. If you think your new feature is ready, it's time to merge. Check out your master and merge your new branch into it.
$ git checkout master
$ git merge feature_x
You can check the log now (git log) which has been updated with your new feature.

After your work is done, you can delete the branch locally by
$ git branch -d feature_x

and over the GitHub repository by

$ git push origin :feature_x

Fork : Now we are going to make a contribution to an open source library. I'll choose networkx. It's a Python package for the creation, manipulation and study of the structure, dynamics and functions of complex networks.
Go to https://github.com/networkx/networkx. On the right top, you'll see three buttons, Watch, Star and Fork.
Watch means you'll get notifications for all the activities on the repository.
Star is just a feature of Github to Star a project like you upvote an answer on Quora.
Fork lets you own a copy of the code under your username.

So, after you fork the repo, you can see the exact same package over https://github.com/<username>/networkx. Pretty cool huh!

Clone the forked repository onto your computer. Make a new feature branch. Add, commit, and push the branch to GitHub. DO NOT merge the branch into your master, because your master should not carry your own work; it should stay in sync with the official source code repository.

After you push the changes with git push origin feature_x, over your https://github.com/<username>/networkx you will find an option to 'Compare & Pull Request'

This option lets you compare the changes you are going to make over the original networkx repository. This will create a Pull Request.

A pull request is a method of submitting your contributions. The PR is reviewed, analyzed, discussed and then can be either merged or closed. So, go on. Choose a project and get your first PR merged. And once again, do not forget to make a branch.

Dealing with upstream: Your local repository often gets outdated because of the active changes in the official repository. There's a method to update it whenever you want. And you must update your master before you make a new branch!
To add networkx/networkx as an upstream, do
$ git remote add upstream https://github.com/networkx/networkx

So now, whenever you have to update your local master branch, do this:

$ git fetch upstream
$ git merge upstream/master

To update your GitHub repo, do:

$ git push origin

Welcome to the open source world!

Happy coding!

Saket Choudhary(statsmodels)

Week 1 Update

I had a slow start. I spent a lot of time reviewing mixed effects theory. The lme4 book[1], West's book[2] and McCulloch's book[3] have been really helpful. I have also realised that the content is harder than I had thought.

Taking a very short step, and in sync with my first-week goals, I put up a notebook that explains part of the theory and discusses a few examples.

Week 2 goals:

• Support for heteroscedastic residual errors
• Add notebooks explaining everything with examples
• Documentation and test cases

Besides this, I realised the nested effects model was way too slow. I plan to benchmark it against R's implementation. Week 1 was not very productive, though I got to learn a lot! The notebook is here: http://nbviewer.ipython.org/github/saketkc/statsmodels/blob/kerby_mixedlm_notebooks/examples/notebooks/Mixed_Linear_Models.ipynb

Proposal for improvements to Mixed Linear Models

This post comes a bit late. I am going to work with Josef and Kerby from the statsmodels project on improving linear mixed effects models.

The proposal is hosted on statsmodels wiki:

https://github.com/statsmodels/statsmodels/wiki/GSoC-2015-Proposal:-Improvements-to-Mixed-Effects-Models

Here's to a fruitful summer!

Michael Mueller(Astropy)

Week 1

It's the end of the first week of coding--I spent this week writing benchmarks to test current Table functionality (see repo: https://github.com/mdmueller/astropy-benchmarks-1) and creating the groundwork for an Index class that will maintain sorted order on one or more columns. The results of the ASV benchmarks (http://mdmueller.github.io/astropy-benchmarks-1/) were essentially what I expected; for example, the cost of grouping by column scales by a factor of 10 when a table is made 10 times as large, confirming that current operations are O(n). Over time, we should see this scaling improve and become O(log n). A couple other comparisons are viewable at http://i.imgur.com/uxBcFvQ.png and http://i.imgur.com/lQ49dWZ.png.

Apart from this, I wrote a basic skeleton of the Index class and adjusted parts of the Table class to update Table indices upon modification of the Table. My work will be done in the branch "table-indexing" on my Astropy fork (https://github.com/mdmueller/astropy/tree/table-indexing). More to come in a bit.
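The idea behind such an index can be illustrated with a toy sketch using the stdlib bisect module (a hypothetical SortedIndex class, not the actual astropy Index implementation, whose design is still in flux): keeping (value, row) pairs sorted makes lookups binary searches, O(log n) instead of O(n).

```python
import bisect

class SortedIndex:
    """Toy index over one column: (value, row) pairs kept in sorted order."""

    def __init__(self, column):
        self.pairs = sorted((v, i) for i, v in enumerate(column))
        self.keys = [v for v, _ in self.pairs]

    def where_equal(self, value):
        # Binary search for the contiguous run of matching keys.
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        return [row for _, row in self.pairs[lo:hi]]
```

For example, SortedIndex([5, 3, 5, 1]).where_equal(5) returns the row positions [0, 2]. The hard part the real Index class has to solve on top of this is keeping the sorted structure valid when the Table is modified.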

Chau Dang Nguyen(Core Python)

Week 1: Warming up with Roundup

This week, I have been playing a maze game with Roundup; the project structure is quite complex, I must say. However, my experiment in creating a REST-like GET page, based on the template that displays the list of issues, was a success.

The first step is reached, but I'm not satisfied with it. It relies on the web template and so is restricted in many ways: it is loaded via Zope, limited to displaying only 50 items in the list, and slow (600 ms delay on localhost; I'm sure it can be improved), ...

So next week I'm looking to hook my code in at a lower level. My target is the same place as the web and xmlrpc handlers in client.py, to cut out unnecessary processing and thus increase the speed.

AMiT Kumar(Sympy)

GSoC : This week in SymPy #1

Hi there! The first week of the coding period has come to an end. This week has been very hectic for me due to my practicals and minor project submission at college, though I managed to reach the goal for the week.

This week I worked on the linear system solver linsolve in the solveset module, as I mentioned in my last post about my goals for Week 1.

Progress of Week 1

I implemented the following two functions in PR #9438. It's almost ready to merge after a final review by flacjacket & hargup.

• linear_eq_to_matrix : a method to convert a system of linear equations to matrix form.

• linsolve : the general linear system solver.

Thanks to Jason for reviewing my initial implementation & suggesting useful changes.

Algorithm Used

The algorithm used in linsolve is Gauss-Jordan elimination, which, after elimination, results in a matrix in reduced row echelon form (using the rref() method of matrices).
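For illustration, here is a minimal Gauss-Jordan elimination in plain Python (a toy sketch using exact fractions, independent of SymPy's rref()):

```python
from fractions import Fraction

def rref(matrix):
    """Reduce an augmented matrix to reduced row echelon form (Gauss-Jordan)."""
    M = [[Fraction(v) for v in row] for row in matrix]
    rows, cols = len(M), len(M[0])
    lead = 0
    for r in range(rows):
        if lead >= cols:
            break
        i = r
        while M[i][lead] == 0:          # find a row with a nonzero pivot
            i += 1
            if i == rows:
                i, lead = r, lead + 1
                if lead == cols:
                    return M
        M[i], M[r] = M[r], M[i]         # swap the pivot row into place
        M[r] = [v / M[r][lead] for v in M[r]]
        for j in range(rows):           # eliminate the column everywhere else
            if j != r:
                M[j] = [a - M[j][lead] * b for a, b in zip(M[j], M[r])]
        lead += 1
    return M
```

Running it on the augmented matrix of the well-behaved system below, [[3, 2, -1, 1], [2, -2, 4, -2], [-1, Fraction(1, 2), -1, 0]], yields the identity on the left and the solution (1, -2, -2) in the last column.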

Capabilities of Linsolve

linsolve is a powerful linear system solver: it can solve all types of linear systems and accepts all the common input forms, hence providing a user-friendly public API.

• under-determined:
In []: A = Matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

In []: b = Matrix([3, 6, 9])

In []: linsolve((A, b), [x, y, z])
Out[]:{(z - 1, -2*z + 2, z)}

• well-behaved:
In []: Eqns = [3*x + 2*y - z - 1, 2*x - 2*y + 4*z + 2, - x + S(1)/2*y - z]

In []: linsolve(Eqns, x, y, z)
Out[]:{(1, -2, -2)}

• over-determined:
# Parametrized solution
In []: A = Matrix([[1, 5, 3], [2, 10, 6], [3, 15, 9], [1, 4, 3]])

In []: b = Matrix([0, 0, 0, 1])

In []: linsolve((A, b), [x, y, z])
Out[]:{(-3*z + 5, -1, z)}

# No solution
In []: A = Matrix([[1, 5, 3], [2, 1, 6], [1, 7, 9], [1, 4, 3]])

In []: b = Matrix([0, 0, 0, 1])

In []: linsolve((A, b), [x, y, z])
Out[]: ....
ValueError: Linear system has no solution


The input formats supported:

(as mentioned in my last post)

• Augmented Matrix Form
• List Of Equations Form
• Input A & b Matrix Form (from Ax = b)

Plan for Week 2:

This week I plan to work on Complex Sets.

That's all for now, looking forward to week #2.

May 31, 2015

Vivek Jain(pgmpy)

GSoC Week1

The first week of the coding period is now almost over.

This week I worked on improving the XMLBIF module. The reader class of the XMLBIF module was working fine, but the writer class was not implemented. Also, the reader class didn't have any method to return a model instance (for example, a Bayesian or Markov model instance). Since I was not very familiar with Bayesian and Markov models, my mentors helped me understand them, so that I can easily implement them for the next set of modules at a later stage.

Also, this week I worked on writing the writer class of the module, which is now complete. I have sent a PR, and hopefully it will be merged by next week.

Details about the Writer class

Writer class takes a model_data as input.

An example of sample model_data is

self.model_data =
{'variables': ['light-on', 'bowel-problem', 'dog-out', 'hear-bark', 'family-out'],
'states': {'bowel-problem': ['true', 'false'],
'dog-out': ['true', 'false'],
'family-out': ['true', 'false'],
'hear-bark': ['true', 'false'],
'light-on': ['true', 'false']},
'property': {'bowel-problem': ['position = (190, 69)'],
'dog-out': ['position = (155, 165)'],
'family-out': ['position = (112, 69)'],
'hear-bark': ['position = (154, 241)'],
'light-on': ['position = (73, 165)']},
'parents': {'bowel-problem': [],
'dog-out': ['family-out', 'bowel-problem'],
'family-out': [],
'hear-bark': ['dog-out'],
'light-on': ['family-out']},
'cpds': {'bowel-problem': np.array([[0.01],[0.99]]),
'dog-out': np.array([[0.99, 0.01, 0.97, 0.03],[0.9, 0.1, 0.3, 0.7]]),
'family-out': np.array([[0.15],[0.85]]),
'hear-bark': np.array([[0.7, 0.3],[0.01, 0.99]]),
'light-on': np.array([[0.6, 0.4],[0.05, 0.95]])}}

The writer class has the following methods:
1. A method that adds variable tags to the file.
2. A method that adds definition tags to the file.
3. A method that adds table tags to the file.
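As an illustration of how those three steps might map onto standard-library calls, here is a hypothetical sketch using xml.etree.ElementTree (the function name and details are assumptions, not pgmpy's actual writer):

```python
import xml.etree.ElementTree as ET

def write_xmlbif(model_data):
    """Illustrative sketch (not pgmpy's actual writer class) of how
    variable, definition and table tags could be emitted."""
    bif = ET.Element('BIF', version='0.3')
    network = ET.SubElement(bif, 'NETWORK')
    # step 1: one VARIABLE per variable, with its states and properties
    for var in model_data['variables']:
        v = ET.SubElement(network, 'VARIABLE', TYPE='nature')
        for state in model_data['states'][var]:
            ET.SubElement(v, 'OUTCOME').text = state
        for prop in model_data['property'][var]:
            ET.SubElement(v, 'PROPERTY').text = prop
    # steps 2 and 3: a DEFINITION with FOR/GIVEN tags and a flattened TABLE
    for var in model_data['variables']:
        d = ET.SubElement(network, 'DEFINITION')
        ET.SubElement(d, 'FOR').text = var
        for parent in model_data['parents'][var]:
            ET.SubElement(d, 'GIVEN').text = parent
        cpd = model_data['cpds'][var]
        ET.SubElement(d, 'TABLE').text = ' '.join(
            str(v) for row in cpd for v in row)
    return ET.tostring(bif, encoding='unicode')
```

Feeding it a model_data dict like the one above produces XML in the same shape as the output shown below.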
And finally, the file returned by the Writer class is as follows:
<BIF version="0.3">
<NETWORK>
<VARIABLE TYPE="nature">
<OUTCOME>true</OUTCOME>
<OUTCOME>false</OUTCOME>
<PROPERTY>position = (190, 69)</PROPERTY>
</VARIABLE>
<VARIABLE TYPE="nature">
<OUTCOME>true</OUTCOME>
<OUTCOME>false</OUTCOME>
<PROPERTY>position = (155, 165)</PROPERTY>
</VARIABLE>
<VARIABLE TYPE="nature">
<OUTCOME>true</OUTCOME>
<OUTCOME>false</OUTCOME>
<PROPERTY>position = (112, 69)</PROPERTY>
</VARIABLE>
<VARIABLE TYPE="nature">
<OUTCOME>true</OUTCOME>
<OUTCOME>false</OUTCOME>
<PROPERTY>position = (154, 241)</PROPERTY>
</VARIABLE>
<VARIABLE TYPE="nature">
<OUTCOME>true</OUTCOME>
<OUTCOME>false</OUTCOME>
<PROPERTY>position = (73, 165)</PROPERTY>
</VARIABLE>
<DEFINITION>
<FOR>bowel-problem</FOR>
<TABLE>0.01 0.99 </TABLE>
</DEFINITION>
<DEFINITION>
<FOR>dog-out</FOR>
<GIVEN>bowel-problem</GIVEN>
<GIVEN>family-out</GIVEN>
<TABLE>0.99 0.01 0.97 0.03 0.9 0.1 0.3 0.7 </TABLE>
</DEFINITION>
<DEFINITION>
<FOR>family-out</FOR>
<TABLE>0.15 0.85 </TABLE>
</DEFINITION>
<DEFINITION>
<FOR>hear-bark</FOR>
<GIVEN>dog-out</GIVEN>
<TABLE>0.7 0.3 0.01 0.99 </TABLE>
</DEFINITION>
<DEFINITION>
<FOR>light-on</FOR>
<GIVEN>family-out</GIVEN>
<TABLE>0.6 0.4 0.05 0.95 </TABLE>
</DEFINITION>
</NETWORK>
</BIF>

May 29, 2015

Daniil Pakhomov(Scikit-image)

Google Summer of Code: Multi-Block Local Binary patterns implementation. Pure Python

A post describing my way of implementing MB-LBP and achieved results.

The selected API

The API of the function was taken from OpenCv. This was done in order to allow users to train their object detector with OpenCv and connect it to Scikit-Image later. The training part will also be implemented for Scikit-Image; this step just gives users more flexibility, and it also means the Scikit-Image face detection framework can be tested at a much earlier stage.

This is the visual description of MB-LBP that OpenCv uses:

Implemented functions

For this part, two functions were implemented:

1. Function that computes the MB-LBP given the left-top corner coordinates, width and height of one of 9 equal rectangles (See the previous part with OpenCv API).

2. Function that takes the computed MB-LBP feature and visualizes it on the selected image.
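A minimal NumPy sketch of the first function's idea (the name, argument order and bit ordering here are illustrative assumptions, not scikit-image's actual API):

```python
import numpy as np

def multiblock_lbp(img, r, c, width, height):
    """Sketch of computing an MB-LBP code. (r, c) is the top-left
    corner of the top-left rectangle of the 3x3 grid; each of the
    9 equal rectangles is height x width pixels."""
    img = np.asarray(img, dtype=float)
    # summed intensity of each of the 9 blocks, row by row
    sums = [img[r + i * height: r + (i + 1) * height,
                c + j * width: c + (j + 1) * width].sum()
            for i in range(3) for j in range(3)]
    central = sums[4]
    # visit the 8 neighbours clockwise starting from the top-left block;
    # a bit is set when a neighbour's sum is >= the central sum, so the
    # hatched blocks in the visualization are exactly the zero bits
    order = [0, 1, 2, 5, 8, 7, 6, 3]
    code = 0
    for bit, idx in enumerate(order):
        if sums[idx] >= central:
            code |= 1 << bit
    return code
```

For instance, on a 3x3 image whose centre pixel dominates, every neighbour comparison fails and the code is 0; on a constant image all eight comparisons succeed and the code is 255.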

MB-LBP visualization

The hatched regions show the rectangles that have a lower summed intensity value than the central one.

Future plans

The next stage will be to implement the same thing using Cython.

Code

The code of functions can be found here. The code that was used for generating images in the post is here.

Test coverage

Implemented functions were covered with tests.

May 28, 2015

Artem Sobolev(Scikit-learn)

NCA

Not to be confused with NSA :-)
So the coding has started!

The first algorithm to implement is Neighbourhood Components Analysis (NCA for short). Unlike other methods, no complicated optimization procedure is required: the authors propose plain gradient descent (actually ascent, since we're going to maximize). Of course, this has its own drawbacks: the objective is non-convex, so it's hard to come up with an efficient algorithm that's guaranteed to find the optimum.

The authors propose 2 objective functions with different interpretations. The first one maximizes the expected number of correctly classified points, and has a gradient of the following form:$$\frac{\partial f}{\partial \mathbf L} = 2 \mathbf L \sum_{i} \Bigl( p_i \sum_{k} p_{ik} (x_i - x_k) (x_i - x_k)^T - \sum_{j \in C_i} p_{ij} (x_i - x_j) (x_i - x_j)^T \Bigr)$$And the second one minimizes a KL-divergence; its gradient is:$$\frac{\partial f}{\partial \mathbf L} = 2 \mathbf L \sum_{i} \Bigl( \sum_{k} p_{ik} (x_i - x_k) (x_i - x_k)^T - \frac{\sum_{j \in C_i} p_{ij} (x_i - x_j) (x_i - x_j)^T}{p_i} \Bigr)$$
One thing to notice here is the $(x_i - x_k) (x_i - x_k)^T$ outer product. In order to speed up the whole algorithm we'd like to precompute these products in advance, but that could take a lot of space: $O(N^2 M^2)$, where $N$ is the number of samples and $M$ is the number of features. Unfortunately, this is too expensive even for medium-sized datasets (for example, for 1000 samples of 50 features it'd require ~10 GB of RAM if stored in doubles).

What can be done with it? I can think of several possibilities:

1. Recompute these products over and over again. There is space for various engineering optimizations, for example, we can keep a cache of those products, using it only if $p_{ij}$ is not too small.
2. Restrict ourselves to a diagonal $\mathbf{L}$ case. This is a useful option in general, since it allows to run these methods on larger datasets.
3. Do "coordinate-wise" gradient ascent: pick a cell in $\mathbf{L}$ and make a step along the gradient.
The basic implementation goes like this:

# assumes: import numpy as np; import scipy as sp
def fit(self, X, y):
    n_samples, n_features = X.shape
    rng = np.random.RandomState(self.random_state)
    L = rng.uniform(0, 1, (self.n_components, n_features))
    # precompute all pairwise outer products (x_i - x_j)(x_i - x_j)^T
    outers = np.ndarray((n_samples, n_samples, n_features, n_features))
    for i in range(n_samples):
        for j in range(n_samples):
            d = (X[i, :] - X[j, :])[None, :]
            outers[i, j] = np.dot(d.T, d)
    # group sample indices by class
    C = {}
    for i in range(n_samples):
        if y[i] not in C:
            C[y[i]] = []
        C[y[i]].append(i)
    for it in range(self.max_iter):
        grad = np.zeros((n_features, n_features))
        fnc = 0
        for i in range(n_samples):
            x = X[i, :]
            A = np.dot(L, x)[None, :] - np.dot(X, L.T)  # n_samples x n_comp
            logp = -(A * A).sum(axis=1)
            logp[i] = -np.inf
            logp -= sp.misc.logsumexp(logp)
            p = np.exp(logp)  # n_samples
            class_neighbours = C[y[i]]
            p_i = p[class_neighbours].sum()
            grad += np.sum(p[:, None, None] * outers[i], axis=0) * p_i - \
                np.sum(p[class_neighbours, None, None] * outers[i, class_neighbours], axis=0)
            fnc += p_i
        grad = 2 * self.learning_rate * np.dot(L, grad)
        L += grad
        print("Iteration {}, target = {}".format(it + 1, fnc))
    self.L = L
    return self
Moreover, it even works! :-) I took the following example:
Yes, I like XKCD :-) BTW, you can get an XKCD "mod" for matplotlib.

Here we have 2 classes (red and blue) divided into train and test (train is opaque, test is semi-transparent). Obviously, 1NN will make a lot of mistakes here: samples are very close according to feature 2, and quite distant according to feature 1. Its decision areas are

So 1NN and 3NN make a lot of mistakes on this artificial problem. Let's plug in NCA as a transformer:

Decision boundary became much more linear, as one would assume looking at data. Right plot shows data space after applying learned linear transformation $\mathbf{L}$.

The above implementation is just for reference and better understanding of the algorithm. It uses a lot of memory, and is not as efficient as one might want.

Aron Barreira Bordin(Kivy)

Week 1 - Buildozer integration and new menu

Hi!

I started to work on Kivy Designer this week. In the last weeks, I studied the Kivy documentation and made some contributions to Kivy Designer. I submitted my first PR today, and I'm very happy about this first week. I developed a completely new menu, with a better design, easier to use, and more powerful.

What's new

Kivy Designer is now integrated with Buildozer. It's now possible to build and run your Python application on desktop, Android and iOS devices! To handle these multiple targets and options, I created "Build Profile" settings.

There are three default profiles:

• Desktop
• Android - Buildozer
• iOS - Buildozer

The user is able to edit these profiles, create new ones, or even delete them. With build profiles I hope to make multi-platform development easier. Now it's just necessary to change the profile to build the same application for the desired target :)

Bonding period

• I studied a lot about Kivy
• Added initial support for Python 3
• Fixed Kivy Designer code style
• Tried to help users on #Kivy (I'm not yet experienced enough to support most users, but I've always been reading and trying to help when possible)
• While studying Kivy Designer, I found a lot of bugs and had some ideas for new improvements; everything is listed here

My first PR

I did my first PR to the project :) I'm still waiting for the review.

New improvements:

Bugs (some bugs are related):

Next week

In the next week, I'll be fixing more bugs and developing the Buildozer Settings UI, an easy-to-use interface to edit the buildozer.spec file.

I'll try to improve Kivy Designer performance as well, but I'll try to get some tips from my mentors before starting to work on it ;/

And if possible, I'll add support to the Hanga builder.

That's it, thanks for reading :)

Aron Bordin.

May 27, 2015

Andrzej Grymkowski(Kivy)

Hi!

Coding time has just started. Before that, I worked hard to get into this project. On the Android side I now feel much more confident about how to add new features or how to compile examples and run them on my phone.

What have I done

1. Some small fixes, like updated support for the vibrator and email facades on early Android phones. Merged and updated old pull requests, like orientation.
2. One facade has been added - audio, with an example and an implementation for Android. It's responsible for recording and playing audio on the phone.
3. Updated Kivy pep8 checks. Generally the docs and style guides differ a bit between Kivy and Plyer. One reason is the use of triple single quotes ''' instead of triple double quotes """ for docstrings. On the other hand, both are correct by Python standards. Still, I can't get accustomed to it.
4. The structure of facades has been changed. For a long time all facades were put in one file. One of the last merged pull requests splits the facades into separate files: one facade file per class.

In progress

There are many branches in progress.
1. Audio implementation for Linux, plyer-linux-audio. It uses pyaudio for recording and playing. The standard built-in Python modules seem too hard for me.
2. plyer-android-hardware: it's implemented, but in a Java file. The main goal is to move the functionality from there to Plyer. Most of it has been moved, but it's not well structured; it should be split into separate facades, and the example has to be updated as well. The current implementation is all-in-one.
3. plyer-speech-recognition - the Android implementation works well enough. The Linux recognizer recognizes words very badly. For Linux I used the Python package 'SpeechRecognition'; later I plan to test another module. Both platforms have the same issue - they can't listen in the background for very long.
4. plyer-android-bluetooth - still quite raw. As I remember, the facade and implementation still live in the example folder. Features like toggling enable/disable and scanning for devices work - for Android, of course. For Linux I plan to use the package Blues. It should work on OSX as well.
5. android-contact-list-example - I don't know when I'll finish it. It's complicated enough that I need some kind of manager to control contacts in an easy way. The current methods are mostly based on the Django manager. I have to check how the Python package SQLAlchemy solves this; it's another ORM. Another problem is when to load the contact data: loading it all at start will take some time. Both Android and iOS split contacts into two groups: people and groups. One contact holds a lot of information, like addresses, email addresses, images and the groups it belongs to. By loading a group I mean loading all people in that group. A fine solution would be a browser that does not load all the data, but searches and paginates only the data it needs. Another option is to load only the names and ids of contacts and fetch the rest in the meantime. I will think about it some more.

What have I learnt?

• Kivy and Plyer coding style
• How compilation is done to run Kivy on Android (TODO: how it looks on iOS :-3 )
• How to implement Java classes and interfaces in Python
• pep257 and pep8. Go even further and check out the hacking project!

best regards,

Artem Sobolev(Scikit-learn)

API design

Having discussed mathematical aspects of the selected metric learners, it's time to move towards more practical things, and think how these methods fit existing scikit-learn conventions.

Since there're no metric learning methods in scikit-learn at the moment, and I'm going to contribute several of them, it makes sense to organize my contributions as a new module called metric_learning.

Many metric learning models aim to aid KNN, so such a model is not an Estimator but rather a Transformer. One possible application is to transform points from the original space to a new one using the matrix $\mathbf{L}$ (recall $\mathbf{M} = \mathbf{L}^T \mathbf{L}$). This new space is interesting because Euclidean distance in it is exactly the Mahalanobis distance $D_\mathbf{M}$ in the original space, so one can use methods that support Euclidean distance but don't support a custom metric (or for which a custom metric is computationally expensive, since calculating $D_\mathbf{M}$ requires a matrix multiplication, so it may be preferable to do this multiplication only once per training sample).
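That equivalence is easy to verify numerically (a throwaway NumPy check with an arbitrary random $\mathbf{L}$, not part of the proposed API):

```python
import numpy as np

rng = np.random.RandomState(0)
L = rng.rand(2, 3)              # some learned linear map (arbitrary here)
M = L.T.dot(L)                  # the corresponding Mahalanobis matrix
x, y = rng.rand(3), rng.rand(3)

# squared Mahalanobis distance in the original space...
d_mahalanobis = (x - y).dot(M).dot(x - y)
# ...equals squared Euclidean distance after mapping through L
d_euclidean = np.sum((L.dot(x) - L.dot(y)) ** 2)
assert np.isclose(d_mahalanobis, d_euclidean)
```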

ml = LMNNTransformer()
knn = KNeighborsClassifier()
pl = Pipeline([('ml', ml), ('knn', knn)])
pl.fit(X_train, y_train)
pl.predict(X_test)

Another application is similarity learning. There are methods like SpectralClustering that can use precomputed affinity matrix, so we'd like to be able to compose those with metric learning.

ml = LMNNSimilarity()
sc = SpectralClustering(affinity="precomputed")
pl = Pipeline([('ml', ml), ('sc', sc)])
pl.fit(X_train, y_train)
pl.predict(X_test)
Accordingly, each algorithm will be shipped in 2 versions: transformer + similarity learner. Of course, I'd like to minimize code duplication, so the actual implementation would be similar to that of SVMs: the base class and a couple of descendants that implement different transforms.

May 26, 2015

What is Google Summer of Code?

Google Summer of Code is a really great opportunity for early-career astronomers to learn to code with forethought for open source projects that will actually get used by other astronomers — something we often aspire to do, but are rarely taught to do. To begin a GSoC project, you work one-on-one (or in my case, two-on-one) with mentors who are experienced open source developers to prepare a proposal for a software tool you would like to make with their help, including a detailed description of the project's deliverables and timeline.

In the astronomical world, one source of GSoC projects is astropy, our friendly neighborhood Pythonic astronomical Swiss-army knife. There are projects related to the active development on the "core" guts of astropy — like one proposed project by UW graduate student Patti Carroll — in addition to projects on affiliated packages which make use of astropy to do new things for more specific end-users than astropy core.

Your proposal gets written up in a wiki page on the astropy GitHub repository, where it can be revised with the help of your proposed mentors.

My GSoC2015 project: astroplan

My GSoC 2015 proposal is to help co-develop astroplan (heads up: as of posting in May 2015, this repo will be boring), an observation planning and scheduling tool for observational astronomers. This package will allow observers to enter a table of astronomical targets and a range of observing dates in order to retrieve (1) the sky-coordinates for their targets, (2) rise/set times, moon/sun separation angles, airmass ephemerides, and other essential positional criteria necessary for determining the observability of a target, and (3) a roughly optimized observing schedule for the list of targets. This project will take advantage of the already-developed infrastructure for these calculations in the coordinates, time, and table modules of astropy, plus Astroquery — an astropy-affiliated package. If you don't already know about these powerful tools, check them out!

I will be working with a great team of astroplanners including mentors: Eric Jeschke, Christoph Deil, Erik Tollerud and Adrian Price-Whelan, and co-developer Jazmin Berlanga Medina.

Call for input

Since we want astroplan to be useful and used by astronomers, I'd be happy to hear your thoughts on what astroplan absolutely must do. If you think you might be an astroplan user one day, leave a comment below or on Twitter with your top-priority observation planning/scheduling features.

Aman Singh(Scikit-image)

Array Iterators in ndimage

Iterators are one of the basic pillars on which the ndimage module stands. As the name suggests, they are used to iterate over arbitrary-dimensional input arrays. Even though the NumPy library contains a complete iterator container, PyArrayIterObject, with all the required members, the ndimage module has its own optimized iterator structure, NI_Iterator.

But before jumping directly to iterators, it would be better to first understand PyArrayObject; it will help us decipher the iterators. In C, every ndarray is a pointer to a PyArrayObject structure, which contains all the information required to deal with an ndarray in C. All instances of ndarray have this structure. It is defined as:

typedef struct PyArrayObject {
char *data;
int nd;
npy_intp *dimensions;
npy_intp *strides;
PyObject *base;
PyArray_Descr *descr;
int flags;
PyObject *weakreflist;
} PyArrayObject;
• data is the pointer to the first element of the array.
• nd refers to number of dimensions in the array.
• dimensions is an array of integers which tells the shape of each dimension.
• strides is an array of integers providing for each dimension the number of bytes that must be skipped to get to the next element in that dimensions.

Rest of the members of the container are not much of our use as of now. So for now we can safely ignore them.

Now let’s come back to PyArrayIterObject. It is another container defined in Numpy containing information required to iterate through the array.

The NI_Iterator container defined in ni_support.h looks something like:-

typedef struct {
int rank_m1;
npy_intp dimensions[MAXDIM];
npy_intp coordinates[MAXDIM];
npy_intp strides[MAXDIM];
npy_intp backstrides[MAXDIM];
} NI_Iterator;
• rank_m1 is basically the rank of the array to be iterated. Its value is equal to N - 1, where N is the number of dimensions of the underlying array.
• dimensions is an array containing the size of each dimension the iterator iterates over. Each value is one less than the corresponding dimension of the PyArrayObject of the array being iterated.
• coordinates is the N-dimensional index of the array, i.e. it tells us the last position visited by the iterator.
• strides gives the stride along each dimension of the array. It is the same as the strides of the PyArrayObject of the array being iterated.
• For each dimension, backstrides tells us the number of bytes needed to jump from the end of that dimension back to its beginning. Note that backstride[k] = dimension[k] * stride[k]. It is stored only as an optimization.
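To make the roles of strides and backstrides concrete, here is a pure-Python sketch of how such a point iterator advances (illustrative only; the real NI_Iterator is driven by C macros over raw byte offsets):

```python
def iterate_points(dimensions, strides):
    """Sketch of NI_Iterator's stepping rule: bump the last coordinate
    and add its stride; on overflow, jump back by that dimension's
    backstride, reset the coordinate and carry into the next-slower
    dimension. Yields byte offsets into the array data."""
    ndim = len(dimensions)
    backstrides = [s * (d - 1) for s, d in zip(strides, dimensions)]
    coords = [0] * ndim
    offset, total = 0, 1
    for d in dimensions:
        total *= d
    for _ in range(total):
        yield offset
        for axis in reversed(range(ndim)):
            if coords[axis] < dimensions[axis] - 1:
                coords[axis] += 1
                offset += strides[axis]
                break
            coords[axis] = 0
            offset -= backstrides[axis]
```

For a C-contiguous 2x3 array of 1-byte elements (strides (3, 1)) this yields offsets 0 through 5 in order, while a Fortran-ordered 2x2 array (strides (1, 2)) yields 0, 2, 1, 3.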

The PyArrayIterObject is a richer version of NI_Iterator. Along with the members of NI_Iterator, it contains some extra members which provide extra functionality. Following is the exact struct of PyArrayIterObject, annotated with the function of each of its members:

typedef struct {
/* Same as rank_m1 in NI_Iterator */
int nd_m1;
/* The current 1-D index into the array. */
npy_intp index;
/* The total size of the Array to be iterated. */
npy_intp size;
/* Same as coordinates in NI_Iterator */
npy_intp coordinates[NPY_MAXDIMS];
/* Same as dimensions in NI_Iterator */
npy_intp dims_m1[NPY_MAXDIMS];
/* Same as strides in NI_Iterator*/
npy_intp strides[NPY_MAXDIMS];
/* Same as backstrides in NI_Iterator */
npy_intp backstrides[NPY_MAXDIMS];
/* This array is used to convert 1-D array to N-D array*/
npy_intp factors[NPY_MAXDIMS];
/* The pointer to the underlying array */
PyArrayObject *ao;
/* Pointer to element in the ndarray indicated by the index*/
char *dataptr;
/* This flag is true if Underlying array is C- contiguous.*/
/* It is used to simplify calculations. */
Bool contiguous;
} PyArrayIterObject;

The members annotated "Same as ... in NI_Iterator" are shared with NI_Iterator. As we can see, the extra members are mostly convenience values that can be derived when required.
To iterate through any ndarray, the members of NI_Iterator must be set according to the ndarray to be iterated. This is done by the NI_InitPointIterator function: it takes values from the input array and initializes all the members of NI_Iterator. The function is quite simple to understand and is defined as:

int NI_InitPointIterator(PyArrayObject *array, NI_Iterator *iterator)
{
int ii;
iterator->rank_m1 = array->nd - 1;
for(ii = 0; ii < array->nd; ii++) {
iterator->dimensions[ii] = array->dimensions[ii] - 1;
iterator->coordinates[ii] = 0;
iterator->strides[ii] = array->strides[ii];
iterator->backstrides[ii] =
array->strides[ii] * iterator->dimensions[ii];
}
return 1;
}

This sets NI_Iterator to iterate over all the dimensions of the input array. But there may be cases when we don't need to iterate over the whole ndarray, only some axes. For this, two variants of NI_InitPointIterator are available:

• NI_SubspaceIterator: sets the iterator to iterate through a given set of axes. It takes two arguments, an iterator and the set of axes to be iterated, and initializes the iterator. Its code can be seen here.
• NI_LineIterator: sets the iterator to iterate over only one given axis. It basically calls NI_SubspaceIterator with a single axis in the axes argument. Its code can be seen here.

I would like to thank my mentor Jaime who helped me in understanding this.

In the next blog I will give a detailed explanation of NI_LineBuffer and NI_FilterIterator.

May 25, 2015

Aman Singh(Scikit-image)

Scipy.ndimage module structure and Initial plan for rewriting

Image processing functions are generally thought of as operating over two-dimensional arrays of values. There are, however, a number of cases where we need to operate on images with more than two dimensions. The scipy.ndimage module is an excellent collection of general image processing functions designed to operate over arrays with arbitrary dimensions. The module is a Python extension written in C, using the Python C API, for speed. The whole module can be broadly divided into 3 categories:

• Files containing wrapper functions: this includes the nd_image.h and nd_image.c files. The nd_image.c file mainly contains the functions required to build the module as a C extension, viz. all the wrapper functions along with the module initialization function and method table.
• Files containing basic constructs: these are in the files ni_support.c and ni_support.h. The constructs are a mixture of containers, macros and various functions. They are like the arteries of the module and can be summarised in three constructs:
1. NI_Iterators and its other variants, NI_LineIterator and NI_SubspaceIterator
2. NI_FilterIterator
3. NI_LineBuffer

I will explain these constructs in detail in further blogs.

• Files containing the underlying functions: this includes the files ni_filter.c, ni_fourier.c, ni_interpolation.c, ni_measure.c and ni_morphology.c. All the general image processing functions of the module are defined in these files.

For rewriting the module, our plan of action, as suggested by my mentor Jaime, is as follows:

1. Implement basic iteration
2. Implement a couple of selected functions that only use basic iteration
3. Implement iteration over buffered lines
4. Implement a couple of selected functions that only use these two types of iteration
5. Implement iteration over a neighborhood of a point, i.e. filter iteration
6. Implement a couple of selected functions that use the three types of iteration
7. Is there any other basic functionality needed by any function? While true, repeat the above two-step pattern.
8. Once all basic functionality is in place, implement all other missing functions.

In order to identify specific functions for points 2, 4 and 6, I have made a list of all constructs used by the various functions in the module. It can be seen here. We will use this list to decide the order of porting of functions.

</keep coding>

Shridhar Mishra(ERAS Project)

First Post!!

I guess it's a bit late for my first post, but here it goes. The official coding period has begun and it's time to start coding. The past few days have been busy, since it's the end of the semester and all the submissions had to be done. I installed Europa, but there were quite a few errors on my side which need immediate attention, since all the work depends on it. This has to be ironed out and discussed with the mentors.
I looked into the Docker platform, since it's an easy way to distribute a single image of the system to all the members. I haven't tried it yet, but it seems quite effective.

Better posts coming soon :P
Cheers
Shridhar

Goran Cetusic(GNS3)

Google Summer of Code - prepost

This year (my last year as a student!) I decided to apply for GSOC (Google Summer of Code). It's been on my academic TODO list for a few years, but I haven't had the time or the motivation to apply. Since this is my last chance, I decided to have a go.

Now, a (not so) short intro... Google Summer of Code is a stipend programme for open source software that students (BSc, MSc, PhD) can apply to. Basically, different organizations like GNOME, Debian, CERN and MIT register and submit their proposals and open source project ideas to Google. Google then goes through the proposals and selects somewhere between 100 and 200 organizations. Once the organizations have been selected, students can contact them regarding different project ideas. These ideas are mostly already defined by the organizations, but students can propose their own. Once the students find a couple of organizations of interest, they usually contact them before sending the official proposal to Google and before student registrations open. It's important to start as early as possible to discuss project ideas, or even come to an agreement that the organization will pick you as one of its students for the project. Once a student feels confident enough, and once Google opens student registrations, he/she can register, provide whatever info Google requires and submit their proposals. Some organizations require additional info in the proposal, so make sure to check that with your prospective orgs. Organizations assign mentors to specific project ideas and pick student proposals; Google then evaluates the proposals and finally picks the students who will be given a stipend during their GSOC enrollment.

IMPORTANT: Being picked by the organization the student proposal is meant for doesn't mean you'll get accepted by Google -> after the organizations have been selected and after student proposal deadline, Google assigns student slots to organizations. So if the organization picked 4 students but Google assigns 2 slots, only 2 students will be accepted for that organization. But they may get selected for a different organization. That's why most students submit several proposals to different organizations. Basically, Google has a limited number of slots and distributes them between organizations.

Most organizations get 1-2 slots but organizations that have a vast number of student applications and are longtime contributors to GSOC get more slots. For example, Python Software Foundation (PSF) is a longtime member of GSOC and an umbrella organization. What this means is that this organization applied for GSOC, got accepted and then accepts other Python projects under its fold. Even some projects that haven't been accepted by Google (maybe because of limited number of slots) later get into GSOC through these umbrella organizations. Google is often extremely generous with slots given to umbrella organizations like PSF and Apache Software Foundation but if a large number of projects get under an umbrella organization, organizations might still get only 1-2 slots and some might theoretically even miss out. It depends.

Enough of the short intro, let's get back to what I originally wanted to write about. Since Python is the only programming language I trust myself to use to get the job done without the language being the obstacle, I've decided to work exclusively on Python projects. Now, I don't want to get stuck with a project that I'd do only for the money (and the amount Google pays isn't negligible from where I'm from) so I picked up three really cool projects to apply for:

• GNS3 - a highly veracious simulator of real, physical networks.
• NetworkX - a software package for the creation, manipulation, and study of complex networks.
• SunPy - Python for Solar Physics

Cool, right? All three projects I've mentioned are part of the PSF umbrella organization. PSF requires students to keep a blog of their project progress and that's the reason this blog exists. It did take some of my time from writing the actual proposals but now that I'm writing it I think it's actually a nice idea.
They're ordered based on preference. I'm working on my masters thesis, porting a network simulator developed at my university from FreeBSD to Linux. That's why GNS3 is first on my list. I can elaborate on the idea there and use it in GSOC. Concretely, the project idea is "Docker support for GNS3".

Background: right now GNS3 supports QEMU, VirtualBox and Dynamips (a Cisco IOS emulator). We can think of the nodes in GNS3 and the links between them as virtual machines that have their own network stacks and communicate amongst themselves like separate machines on any other "real" network. While this is nice by itself, QEMU and VirtualBox are "slow" virtualization technologies because they provide full virtualization -> you can run any OS on them. So while QEMU and VirtualBox can run various network services, it's not very efficient. Docker, on the other hand, uses kernel-level virtualization, which means it's the same OS but processes are grouped and different groups are isolated from each other, effectively creating a VM. That's why it's extremely fast and can run thousands of GNS3 nodes -> no calls between host and guest systems, it's the same kernel! Docker is quite versatile when it comes to managing custom-made kernel-based VMs. It takes the load off the programmer, so he/she doesn't have to think about disk space, node startup, isolation, etc.

The second project, NetworkX, is basically graph analysis software written in Python. You define your graph with nodes and edges and run various graph algorithms on it. Even before Google announced the selected organizations, NetworkX had been on the PSF GSoC wiki page, one of the first entries, and they're the first organization I contacted. While for GNS3 I just chose one of the already available project ideas, for NetworkX I proposed to build a Tkinter GUI, since it doesn't have one. It would enable users to draw edges and graphs without actually writing programs. This wasn't exactly rejected, but one of the core developers explained to me in a lengthy email that, while they appreciate the effort, NetworkX is moving away from any kind of GUI development and that I should probably pick one of the existing ideas. So I chose to write a backend API for NetworkX.

Background: until now, graph data has been represented as volatile Python dictionaries. It would make sense to provide a flexible backend interface for creating modules that store graphs and access them efficiently at any time. This would usually mean graph databases, since their data representation is close to what NetworkX does and such databases have efficient algorithms for accessing graph data, but it shouldn't be restricted to such storages. A case in point would be document-store databases, which can more or less directly save Python dictionaries as JSON data and load them back. SQL databases are somewhat trickier because their data representation isn't directly compatible with graphs.
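To make the idea concrete, here is a minimal sketch of what such a pluggable storage backend could look like. The class and method names are purely illustrative (this is not NetworkX's actual API); it round-trips an adjacency dict through JSON, the way a document-store backend could:

```python
import json
import tempfile

# Hypothetical backend sketch (names are illustrative, not a real
# NetworkX API): persist a graph's adjacency dict as JSON, the way
# a document-store backend could.
class DictJSONBackend:
    def save(self, adj, path):
        with open(path, "w") as f:
            json.dump(adj, f)

    def load(self, path):
        with open(path) as f:
            return json.load(f)

# Round-trip an adjacency dict (node -> neighbor -> edge attributes).
adj = {"a": {"b": {"weight": 3}}, "b": {"a": {"weight": 3}}}
backend = DictJSONBackend()
with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as tmp:
    path = tmp.name
backend.save(adj, path)
assert backend.load(path) == adj
```

A graph-database or SQL backend could implement the same two methods with a very different storage layout underneath, which is exactly the point of a backend interface.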

The last project, SunPy, is a software package for solar physics computations. Now, while the previous two projects are more along the lines of what I usually do and study, this one is more of a whim! I mean, solar physics! Cool! The project idea is to refactor one of its modules, called Lightcurve. I have to admit I know close to nothing about solar physics, but this project has more to do with actual Python refactoring. I'll probably have to learn some solar physics along the way, which I'd like to. Still, although GNS3 was added to the PSF organization list after SunPy, I put most of my effort into writing the proposal for GNS3 because of my thesis, with a touch of regret that I won't work on software that researches the Sun!

Whichever project gets selected, I'm sure it's going to be a fun and educational experience for me. Keeping my fingers crossed for the next post, where I'll write in more detail about the project I'll (hopefully) work on.

Cheers

May 24, 2015

Abhijeet Kislay(pgmpy)

Integer Linear Programming in OpenGM

Today I will be discussing some of the ways in which inference algorithms, especially the Linear Programming ones, work in OpenGM. There are many ways to solve an energy minimization problem. From the table below we can get an idea of the different accumulative operations used to do inference: that is, if we need […]

Rafael Neto Henriques(Dipy)

[RNH Post #3] Time to start mapping brain connections and looking to brain properties in vivo

Hi all,

Tomorrow we are starting the coding period :), so it is time for some details about my project and an update on what was done during the community bonding period.

1) How can we study brain connections and the brain's tissue properties in vivo? - A simple introduction for non-experts

The trajectories of neuronal connections (tractography) and quantification of tissue properties in the living human brain can be obtained from measures of water diffusion using MRI scans. To give you an example of how this is done, I will start by describing one of the simplest techniques - diffusion tensor imaging (DTI).

By combining the information from several diffusion-weighted images, DTI models the water diffusion for each image element using a tensor, which can be represented by an ellipsoid (see the figure below).

Figure 1. Diffusion tensors computed from all voxels of a real brain image. This image was produced using Dipy as described in Dipy's website.

From Figure 1 we can see that diffusion is larger in some directions. In fact, the direction of largest diffusion can be related to the direction of the brain's white matter fibers. The axons' myelin sheaths restrict water diffusion, so diffusion is smaller in the directions perpendicular to the fibers. On the other hand, diffusion parallel to the fibers is less restricted and therefore matches their direction.
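As a small numerical illustration of this idea (with made-up tensor values, not Dipy code): the tensor's principal eigenvector points along the direction of largest diffusion, i.e. the estimated fiber direction.

```python
import numpy as np

# Toy diffusion tensor (illustrative values): diffusion along x is
# much larger than along y and z, mimicking a fiber oriented along x.
D = np.diag([1.7e-3, 0.3e-3, 0.3e-3])

evals, evecs = np.linalg.eigh(D)             # eigenvalues in ascending order
main_direction = evecs[:, np.argmax(evals)]  # principal diffusion direction
# main_direction is (up to sign) the x axis, the simulated fiber direction
```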

Based on this, 3D virtual reconstructions of brain connections can be obtained using specific tracking algorithms - a procedure named fiber tracking. An example of these 3D maps obtained from a real brain dataset is shown below.

Figure 2. Example of corpus callosum fibers. These fibers connect the left and right brain hemispheres. This image was produced using Dipy as described in Dipy's website.

Nowadays, DTI is still one of the most used diffusion-weighted techniques in both clinical applications and many research studies; however, it is not always accurate. DTI cannot properly account for the crossing of different populations of white-matter fiber connections. Moreover, it ignores the non-Gaussian properties of diffusion in biological tissues, which can be used to derive interesting and important measures of tissue properties.

2) Project proposal

In this project, I will be implementing an alternative diffusion-weighted technique, named diffusion kurtosis imaging (DKI), in an open source software project, Diffusion Imaging in Python (Dipy). DKI overcomes the two major limitations of DTI:
1. It quantifies the non-Gaussian properties of water diffusion in biological tissues by modelling the kurtosis tensor (KT), which can be used to derive important tissue measures such as the density of axonal fibers.
2. Relative to the diffusion tensor, the KT has also been shown to offer a better characterization of the spatial arrangement of tissue microstructure and can be used as a basis for more robust tractography. In particular, DKI-based tractography is able to resolve crossing fibers.

3) What is done so far

As an update on what I posted previously (see Post #2), I finished the work on DKI's simulations - procedures that will be useful for testing the code that I will be implementing during the summer. In particular, as my mentor suggested, I added some automatic debugging scripts using Nose Python testing. These scripts now ensure that the kurtosis tensor is symmetric (as expected) and that the simulations correctly produce the diffusion and kurtosis tensors in both cases of well-aligned and crossing fibers.

Many thanks to my mentor for teaching me how to work with Nose Python testing - in particular, the useful tip of running the nose tests and finding out which lines the testing scripts cover with the following command:

nosetests -v dipy/sims/tests/test_voxel.py --with-coverage --cover-package=dipy
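To give a flavour of what such a symmetry check can look like (an illustrative nose-style test with synthetic data, not Dipy's actual test code): a fourth-order tensor built from outer products of a single vector is fully symmetric by construction, so exchanging any pair of indices must leave it unchanged.

```python
import numpy as np
from numpy.testing import assert_array_almost_equal

# Illustrative nose-style test (not Dipy's actual code): a 4th-order
# tensor built from outer products of one vector is fully symmetric,
# so swapping any pair of indices must leave it unchanged.
def test_tensor_symmetry():
    rng = np.random.default_rng(0)
    v = rng.standard_normal(3)
    W = np.einsum('i,j,k,l->ijkl', v, v, v, v)  # symmetric by construction
    assert_array_almost_equal(W, W.transpose(1, 0, 2, 3))
    assert_array_almost_equal(W, W.transpose(0, 1, 3, 2))
```

Nose discovers any function whose name starts with `test_`, so dropping a file like this under `dipy/sims/tests/` is enough for the command above to pick it up.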

4) Next steps

After merging the DKI simulations into Dipy's master branch, I will start working on the DKI reconstruction modules, based on some preliminary preparation work previously submitted by other Dipy contributors. By the end of the week, I intend to finish the first part of the DKI reconstruction modules - the KT estimation from diffusion-weighted signals. For this I will implement the standard ordinary least-squares (OLS) solution of DKI.
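As a preview of the OLS idea (a toy single-direction sketch under simplifying assumptions, not Dipy's implementation): along one gradient direction the DKI signal model, ln S(b) = ln S0 - b·D + (1/6)·b²·D²·K, is linear in the parameters [ln S0, D, D²K/6], so they can be recovered with ordinary least squares on the log-signal.

```python
import numpy as np

# Hedged sketch (not Dipy's code): OLS fit of the single-direction
# DKI model  ln S(b) = ln S0 - b*D + (1/6) * b**2 * D**2 * K,
# which is linear in the parameters [ln S0, D, D**2 * K / 6].
def fit_dki_single_direction(bvals, signals):
    A = np.column_stack([np.ones_like(bvals), -bvals, bvals**2])
    params, *_ = np.linalg.lstsq(A, np.log(signals), rcond=None)
    ln_s0, D, c = params
    K = 6.0 * c / D**2
    return np.exp(ln_s0), D, K

# Synthetic noiseless check with known D and K (illustrative values).
bvals = np.array([0.0, 500.0, 1000.0, 1500.0, 2000.0])
D_true, K_true, S0 = 1.0e-3, 1.0, 100.0
S = S0 * np.exp(-bvals * D_true + (bvals**2) * (D_true**2) * K_true / 6.0)
S0_est, D_est, K_est = fit_dki_single_direction(bvals, S)
```

The full KT estimation generalizes this to all gradient directions at once, with 15 independent kurtosis tensor elements in the design matrix instead of a single scalar.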

[RNH Post#1]

Hi guys,

This year I am applying to the Google Summer of Code 2015. I am proposing to implement some techniques based on Diffusion Kurtosis Imaging in DIPY (Diffusion Imaging in Python). If you are interested in diffusion MRI techniques and how to implement them using Python, you will find useful information here.

In this blog I will report my progress during the summer!

Hope you will enjoy.

Rafael

Julio Ernesto Villalon Reina(Dipy)

Community bonding period

Hi all, this is my first post since I got accepted to GSoC 2015. I am really excited about the start of the coding period and about being part of the greater community of the Python Software Foundation. Honestly, I am a bit scared, but I like the challenge and I am working with the best people. The Dipy team is really great! During the community bonding period I was able to interact with some of my mentors and draw a general plan of the coding phase.

First, a short intro to my project which is called: "Tissue classification to improve tractography."
This is the abstract of my project:

• Diffusion Magnetic Resonance Imaging (dMRI) is used primarily for creating visual representations of the structural connectivity of the brain, also known as tractography. Research has shown that using a tissue classifier can be of great benefit in creating more accurate representations of the underlying connections. The goal of this project is to generate tissue classifiers using dMRI or a different MRI modality, e.g. T1-weighted MRI (T1). This reduces to an image segmentation task: I will have to implement popular segmentation algorithms using T1 and invent a new one using dMRI data.

As stated in my initial proposal, the first task for the community bonding period was to read and discuss the paper by Zhang et al., 2001 (Yongyue Zhang; Brady, M.; Smith, S., "Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm," Medical Imaging, IEEE Transactions on, vol. 20, no. 1, pp. 45-57, Jan 2001). This paper gives us a closer idea of how to approach the segmentation algorithm for T1-weighted MR images of the brain. The goal is to derive partial volume estimates (PVEs) for each of the tissues and compartments of the brain, i.e. grey matter, white matter and cerebrospinal fluid. With the mentors we defined the main strategy for coding the segmentation algorithm proposed in the paper, which parts of the theory we would like to implement and which ones not, as well as the general assumptions about the inputs to the program.
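To give a flavour of the intensity-model side of the paper (a toy sketch with made-up numbers, omitting the hidden Markov random field prior and the EM updates that are the paper's actual contribution): if each tissue class has a Gaussian intensity model, every voxel can be assigned soft class memberships that act as a crude partial volume estimate.

```python
import numpy as np

# Illustrative class means for CSF / grey matter / white matter on a
# T1 image (values are made up) and a shared intensity std. deviation.
means = np.array([30.0, 80.0, 120.0])
sigma = 15.0

voxels = np.array([25.0, 85.0, 118.0, 60.0])  # toy voxel intensities

# Gaussian log-likelihood of each voxel under each class, normalized
# into soft memberships - a crude PVE without the spatial MRF prior.
ll = -((voxels[:, None] - means[None, :]) ** 2) / (2 * sigma**2)
pve = np.exp(ll) / np.exp(ll).sum(axis=1, keepdims=True)
# Each row of pve sums to 1: per-voxel CSF/GM/WM membership weights.
```

The HMRF-EM algorithm replaces this independent-voxel step with one that also conditions on neighbouring labels, and re-estimates the class means and variances iteratively.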

Rupak Kumar Das(SunPy)

The start…

Today is the last day of the Community Bonding period. It was an exciting month, studying the codebase and communicating with my mentors. I have already started my project by fixing a bug and adding a small new feature to Ginga (though these are not much).

A funny thing happened today. The next thing on my list was the Cuts plugin, which was to be modified so that it supports the addition of an extra cut. I was racking my brain over how to add it when I discovered that it was already implemented! It was mentioned in the documentation but I had failed to notice it. I have now re-read the documentation in detail to prevent such occurrences.

The meetings will start soon and frankly, I am a little scared! But it will be an interesting and new experience for me so I am looking forward to it!

Unobserved components models

The first model considered in the state space models GSoC 2015 project is the class of univariate unobserved components models. This blog post lays out the general structure and the different variations that will be allowed.

The basic unobserved components (or structural time series) model can be written (see Durbin and Koopman 2012, Chapter 3 for notation and additional details):

$$y_t = \underbrace{\mu_{t}}_{\text{trend}} + \underbrace{\gamma_{t}}_{\text{seasonal}} + \underbrace{c_{t}}_{\text{cycle}} + \underbrace{\varepsilon_t}_{\text{irregular}}$$

where different specifications for the different individual components can support a range of models.
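The additive structure above can be illustrated with a quick simulation (the variances and seasonal shape below are made up for the example, and the cycle $c_t$ is omitted for brevity): a random-walk trend, a deterministic seasonal pattern, and white-noise irregular terms sum to the observed series.

```python
import numpy as np

# Simulate y_t = mu_t + gamma_t + eps_t (illustrative parameters):
rng = np.random.default_rng(42)
n, period = 200, 12

mu = np.cumsum(rng.normal(0, 0.1, n))     # local level trend (random walk)
gamma = np.tile(np.sin(2 * np.pi * np.arange(period) / period),
                n // period + 1)[:n]      # deterministic seasonal component
eps = rng.normal(0, 0.5, n)               # irregular (white noise)

y = mu + gamma + eps                      # observed series
```

Different specifications in the project then amount to choosing which of these components are present and whether each is deterministic or stochastic.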

Berker Peksag(ScrapingHub)

Hello, World

I'll be working on completing the Python 3 port of Scrapy this summer. During the community bonding period,

• I've checked my proposal and made some updates to the Twisted section. twisted.web.static has already been ported to Python 3.
• I've played with Twisted and read some good documentation about asynchronous programming in general.
• I've followed the development of both Scrapy and Twisted on GitHub.
You can follow my work in the python3 branch on GitHub.

Vivek Jain(pgmpy)

Community Bonding Period

Now that the coding period for this year's Summer of Code is about to start, I am extremely happy that things have been working pretty well between me and my mentors over this community bonding period. We had a group meeting on IRC and all of us are excited to have a more than successful Summer of Code.

Community Bonding Period

In the community bonding period, I reviewed my proposal again, discussed with my mentors which features are necessary and how things should be implemented, and cleared my doubts. I read the documentation and the code to understand the flow of execution and how things have been implemented.

I read the documentation of the pyparsing module, which will be used for parsing the UAI file format. Here are some notes I made from the documentation so that I can easily find the functions that will be needed at a later stage.
1. Import the pyparsing module as import pyparsing as pp.
2. p.parseString(s): the input is "s" and the parser is "p". If the syntax of s matches the syntax described by p, this expression returns an object that represents the parts that matched. This object is an instance of the class pp.ParseResults.
3. The pp.Word() class produces a parser that matches a string of letters defined by its first argument.
4. Use pp.Group(phrase) to group things, for example to keep a model's variable numbers together.
5. Use setResultsName() to give a name to the string that is returned, e.g. model_name = pp.Word(pp.alphas).setResultsName('modelName').

I also made the grammar for the UAI module.

Grammar for UAI Preamble:
Preamble --> model_name \n no_variables
model_name --> MARKOV | BAYES
no_variables --> IntegerNumber \n domain_variables
domain_variables --> IntegerNumber* \n no_functions
no_functions --> IntegerNumber \n function_definition*
function_definition* --> function_definition | function_definition function_definition*
function_definition --> size_function " " IntegerNumber*
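Putting the notes and the first few grammar rules together, a minimal sketch of the preamble parser could start like this (a simplified fragment for illustration, not the final pgmpy parser):

```python
import pyparsing as pp

# Simplified sketch of the first preamble rules (illustrative only):
model_name = (pp.Keyword("MARKOV") | pp.Keyword("BAYES")).setResultsName("modelName")
integer = pp.Word(pp.nums)
no_variables = integer.setResultsName("nVars")
domain_variables = pp.Group(pp.OneOrMore(integer)).setResultsName("domains")

preamble = model_name + no_variables + domain_variables

# Parse the start of a tiny UAI preamble: model type, variable count,
# then the cardinality of each variable's domain.
result = preamble.parseString("MARKOV\n3\n2 2 3")
```

pyparsing skips whitespace (including newlines) between tokens by default, which is why the line structure of the preamble does not need to be encoded explicitly here; the real grammar would continue with the function definitions.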

Ziye Fan(Theano)

Theano: proposal changed

Hi there, I'm Ziye, and I'm taking part in this year's GSoC. I applied for the Theano project because I use Theano in my own lab research work (by the way, I do some music information retrieval work in the lab).

The original proposal was to decrease the Python overhead generated by Theano. The new one is to decrease the compilation time, which will improve Theano's performance. During the community bonding period, with the help of my mentor Fred, the optimization objective was mostly settled.

In the upcoming coding period I'll put more time into it. It will be interesting.

Ziye

Vipul Sharma(MoinMoin)

GSoC 2015: Improving the Issue Tracker of MoinMoin 2.0

MoinMoin is an open source, advanced, easy-to-use wiki engine implemented in Python. It is collaborative software that runs a wiki, allowing users to create and edit web pages collaboratively. Some of the sites using the MoinMoin wiki engine include Python, Mercurial, Apache, Ubuntu, Debian, Wireshark and many more.

In GSoC 2015, I'll be working on improving the issue tracker of MoinMoin 2.0.

Project Details

MoinMoin 2.0 has an existing implementation of a simple issue tracker. The current issue tracker needs some improvements in its UI/UX and a few more features which would be good to have for better use.

Implementation

For improvement of the current issue tracker, the implementation of the task can be divided into 3 parts:
• Creation of new tickets
• View containing ticket list, searching / filtering of tickets
• Ticket update view

Creation of new tickets

• Enter title to the issue
• Search for possible duplicates: this can be implemented by auto-suggesting existing tickets based on the title of the new ticket which the user wants to create. Currently, MoinMoin 2.0 uses Whoosh for search queries; we can use this together with jQuery to display real-time suggestions of existing tickets whose titles match the title of the new ticket being created.
• Improve metadata fields and add new fields
• File upload: add a feature to allow users to upload a screenshot, media or a patch file
Wireframe for ticket creation view:

View containing ticket list (/+tickets), searching / filtering of tickets

Currently the /+tickets view lists all the tickets with some options, including a few important filters like "all", "open", "closed" and "tags". The current view also contains an option for creating a new ticket and a search box to find tickets. In this view, we can include an "Advanced Search" feature with additional filters for selecting tickets based on a particular "author", "tags", "assignee", "difficulty", "effort", "severity" and "priority".

For the "Advanced Search" option, a new view /+tickets/query will be created, where additional filters will be provided through which we will filter the results based on the query applied in the view: /+tickets?<query_type>=<query>. We will also allow multiple filters to be applied in our advanced search.
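The multiple-filter semantics could be sketched like this (a plain-Python illustration of AND-combined filters over ticket metadata; the actual view would translate the query parameters into a Whoosh query rather than filtering dicts in Python):

```python
# Sketch of AND-combined filtering over ticket metadata (illustrative
# only; field names mirror the filters listed above).
def filter_tickets(tickets, **filters):
    def matches(ticket):
        return all(ticket.get(field) == value
                   for field, value in filters.items())
    return [t for t in tickets if matches(t)]

tickets = [
    {"id": 1, "author": "alice", "severity": "high", "tags": "ui"},
    {"id": 2, "author": "bob", "severity": "low", "tags": "ui"},
    {"id": 3, "author": "alice", "severity": "low", "tags": "search"},
]

# e.g. /+tickets/query?author=alice&severity=low should match ticket 3
result = filter_tickets(tickets, author="alice", severity="low")
```

Requiring every supplied filter to match (rather than any) keeps the advanced search predictable: adding a filter can only narrow the result set.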

Wireframe for ticket list view and advanced search view:

Ticket update view

The ticket update view is similar to the ticket create view. In this view, the comment mechanism can be improved by adding features to reply to any comment, delete any comment, use Markdown formatting syntax, and post comments after any updates to the metadata of the ticket.

Wireframe for update ticket view:

Community Bonding

In the community bonding period, I reviewed my proposal again and discussed with my mentors about what features are necessary, how things should be implemented and cleared my doubts. I read the documentation, read the code to understand the flow of execution and how things have been implemented.

I learned about form processing using Flatland, and it's really cool :) As explained on their website:

Flatland maps between rich, structured Python application data and the string-oriented flat namespace of web forms, key/value stores, text files and user input. Flatland provides a schema-driven mapping toolkit with optional